#159: Inside the new PyPI launch Transcript
00:00 Michael Kennedy: Python is often described as a batteries included language and ecosystem. In fact, that's been taken so far, there's even a delightful Easter egg in the Python REPL. Just type, import antigravity, to see what I mean. Where do these powerful packages come from? Well, the Python Package Index, or PyPI. On this episode, you will meet Nicole Harris, Ernest Durbin III, and Dustin Ingram. They were part of the team that has just launched the new version of PyPI over at, pypi.org. Not only have they given us a great new website around packaging and Python, they've laid the foundation for innovation in the space for years to come. This is Talk Python To Me, Episode 159, recorded April 18th, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Active State and Codacy. Please check out what they're offering during their segments, it really helps support the show. Hey, everyone, before we get to the exciting news about the new PyPI launch, I want to tell you about a brand new course we just launched. It's called Python 3, An Illustrated Tour, and it's a five hour visual and code-based tour of all the features in Python 3. It's written by Matt Harrison, who has authored 15 technical books and is a bestselling Python author. Check it out over at talkpython.fm/illustrated and if you get the course this week, we'll throw in Matt's newest Python book for free, which is a perfect complement for the course. If you have the everything bundle already, then you should definitely check out the course because it's included in your bundle and you can just go take it. I hope you love this new course. 
We have many more coming down the pipe and I'm looking forward to sharing those with you as well. Now, let's hear about the new PyPI. Nicole, Dustin, Ernest, welcome to Talk Python.
02:15 Panelists: Hey, thanks. It's great to be here. Thanks for having us.
02:17 Michael Kennedy: Yeah, you all have done something amazing. It's almost like you've caught a unicorn in the mythical sense of there's been this talk of a new PyPI website and infrastructure for so long and then here it is. You all are really central to doing this, so I'm super excited to talk about this, the rollout, the technology behind it, new features we're going to get, we have already gotten, things like that. But before we get to that, let's start with your story, just briefly, since there are three of you. How did you get in programming Python? Nicole, go first.
02:52 Panelist (Nicole Harris): I started off with programming generally about 10 years ago. My degree's actually in film and photography and I wanted to make a website to put up my animation works and that kind of led me to HTML and CSS which are still my specialization. From that, kind of, I became what was back then a sort of generic web designer before we had lots of different specializations. Then my husband is actually a Python programmer, so that's how I got involved in the Python community and I don't program in Python very much these days but I do sort of dabble in it every now and again.
03:28 Michael Kennedy: Yeah, yeah, very nice. Ernest, how about yourself?
03:31 Panelists: I graduated from school with a degree in physics and math in the sort of peak of the recession back in 2007-8 era and eventually conned my way into a job as a business analyst. At that point I started programming in order to stop using Excel, and then years later, I've come to this point.
03:52 Michael Kennedy: Very cool, I love how you sort of took your career and just kind of laddered it up or leveled it up. Right, math and physics, I'm not going to work at CERN, so now what? Then you just, you know, worked your way up that ladder like I also, I've said this several times on the show, of course, but, I also was working on my PhD in Math and then kind of abandoned for my self-taught developer path many years ago. Dustin, how about you?
04:14 Panelists: Yeah, so I went to school for computer science and I'm not really sure when I first was introduced to Python, but I do remember at some point, after having done a lot of C and C++, in my studies, coming across this Python thing and being like, oh, this looks so much nicer. I slowly sort of worked that in as much as I could, and yeah, now I could probably call myself a Python developer.
04:37 Michael Kennedy: That's pretty awesome. So, you're like, this can't work, there's only five lines, right, like, in C++, I'd definitely have to write a whole app around this. But it works, it's the beauty of Python, right? Nice. Okay, first of all, I want to start with a big piece of news which we've been hinting at, or I've been hinting at, but, has a particular date. So the new pypi.org, which by the way, for a while was pypi.io, I'm going to ask you about that. But pypi.org has launched, and legacy PyPI is shutting down April 30th, right? That's on the blog recently announced. Congratulations, how do you all feel about that?
05:16 Panelists: Thanks, I think we're super excited. Yeah, I don't think there's anything negative to say about it, I mean, it's just, to see the culmination of the effort come to, like, the moment, has been incredible and there will be another sort of celebratory secondary, on the 30th when we sort of say goodbye to something that's been around for so long.
05:35 Michael Kennedy: Yeah, we're going to have to get used to less gray, more red, or more blue, right? It's blue, isn't it? Is that your work, Nicole? It sounds like you might have done a fair amount of the redesign, HTML, and Bootstrap type of thing.
05:48 Panelists: I joined the project back in 2015 because Donald, who's our lead developer, I think you've already met and interviewed.
05:58 Michael Kennedy: Yeah, he's been on the show twice, he's great.
05:59 Panelists: Yeah, so he put a call out basically to say, I'm rebuilding this thing, but I'm terrible at design so is there anybody out there who can help? I got in touch and so that's kind of how I ended up in charge of both the user interface, user experience, and I also took charge of the HTML and the SCSS code base as well, so kind of front end minus JavaScript.
06:25 Michael Kennedy: Yeah, that's really cool.
06:27 Panelists: Anything that looks good is Nicole's doing, and not any of the rest of us.
06:31 Michael Kennedy: I got to say, congratulations because I do feel like it looks really modern, not overly designed, but it definitely feels like you know, 2018, somewhere you want to be. It doesn't look old, neglected, gray, and just like default browser font style, right? The look is really really good, and I think on one hand design, how much does it matter, right? It's like a package warehouse, but on the other, I think it sends a message to the community like, this place is special, we care about it, we put in effort to style it and make it really look, look good and be usable, right?
07:02 Panelists: Yeah, and I think a lot of the design focus for me was thinking about how much Python is a teaching language and how, for how many programmers, it might be their first experience dealing with a package index. So, it was really important to me that it looked friendly and it reflected the values of the Python community. Both in terms of the design, but also in terms of the accessibility features that we've built into the front end code base. We're trying to make sure that it's serving as many people as well as possible.
07:35 Michael Kennedy: That's cool, and do you mean things like ARIA, like Screen Reader, indicators and stuff like that?
07:40 Panelists: We've done a reasonable amount of work on that so far and we've actually got an accessibility audit happening this week as well. So there'll be more improvements on that side but given there's so many users of the site currently, it's just from a percentage perspective, you know, that there is going to be a portion of those users who are going to be using assistive technology. So we need to be looking after them, and I think that reflects the Python community and the way that we go about things offline as well.
08:08 Michael Kennedy: Very nice. Let's touch on the contributions the other two of you have made. So Ernest, what was your major part in this whole project here?
08:16 Panelists: Sure. So, since about 2013, 12 or 13-ish, I've been contributing to the Python Software Foundation's infrastructure and so this is the servers and services behind Python.org, www.python.org, mail, wiki, etc. PyPI is one of the largest and most, most used of the services provided by the PSF, and I got involved primarily just keeping things turned on. In 2013, there was a large contribution that I did to modernize the infrastructure that hosted the old PyPI, and over the past few years, I've continued that work and adding to the reliability and telemetry of PyPI. And so with the Warehouse project, Donald Stufft and myself both sort of took a step back and said, if we were going to do it all over again, how can we make sure we have excellent infrastructure warehouse? So my main contribution in the most recent work has been a mixture primarily of the infrastructure behind pypi.org and also some code changes that features as well as just stuff to make it more compatible and easier to operate and do so reliably.
09:27 Michael Kennedy: Yeah, very cool. Dustin, how about yourself?
09:30 Panelists: I joined the project just as a volunteer contributor about two years ago. I think I just happened to come across it looking at Donald's GitHub and I was like, wow, this is a really usable PyPI, but it's not finished, and as a new contributor, I was pretty attracted to it because I knew I could actually contribute to it. Legacy is a behemoth and has very few tests and even to run it locally, you have to actually go in and comment out a bunch of code, so it's really abrasive for new contributors and Warehouse is not like that at all. So I sort of started making some contributions, doing some Elasticsearch tuning and that kind of thing, and just adding elements to the UI that weren't there before that I think were necessary, and also making a lot of contributions to the tooling ecosystem, that's other projects like Twine and pip and things like that, just to work with the new PyPI.
10:28 Michael Kennedy: Yeah, very cool. Now, the three of you are here but you all have mentioned Donald Stufft who's been spearheading this and deserves a lot of credit as well, so congratulations to him. Who else? Is there anyone else who we should sort of give a shout out to while we're talking to you all?
10:43 Panelists: Yeah, absolutely, I want to point out Sumana. Sumana, actually I've never tried to pronounce Sumana's last name out loud. So Sumana H took an incredible role in the project management and leadership over the past few months and making, and bringing this together. Absolutely was a huge driver in a lot of the work that we did to encourage and welcome and have sticky contributors to the project. So, I sort of said this a few, I don't remember when exactly, but there was a point where whenever Donald or I would tweet "the PyPI team", what we meant was, whichever one of us happened to have done something that week or that month with PyPI. And I attribute personally a lot of the reason why, when I say "the PyPI team" now, it's a collection of more than, like, five, I mean it's probably closer to, like, seven or eight people who were regularly contributing and there's a team and when I say that now, I say it earnestly, pardon the pun. But, yeah, so Sumana must be, in my opinion, must be sort of encouraged and called out here as well. Yeah, I just want to say, I don't think the project would have been as big of a success as it was if Sumana hadn't been sort of organizing and herding us along the way. She did an exceptional job.
12:00 Michael Kennedy: So glad to hear it. So congratulations to you all. I've two sort of burning questions around the new pypi.org. Ah, three. Let's start with the simple one. We have pypi.python.org/pypi which is a crazy location on the internet cause why the duplication. But anyway, we have that. And then for a little while you had pypi.io and then you switched to pypi.org for like where the actual new Warehouse, the new Python packaging index lives. Why did you change it halfway along the way there?
12:36 Panelists: Sure, it's a good story. So it started at python.org/pypi, is where it initially lived. Then it moved to pypi.python.org/pypi because it was easy to change the domain and not so easy to change the URLs.
12:52 Michael Kennedy: More easy to separate the infrastructure to another server rather than behind a load balancer or something like that, right?
12:57 Panelists: Mmhmm.
12:58 Michael Kennedy: Okay.
12:59 Panelists: And then eventually pypi.io, I don't remember when we got it, but we've been using that for sort of the internal domain for PyPI for a long time. So for the actual servers behind the scenes. When Warehouse started to get to the point where Donald's like oh this is real. We could start deploying this somewhere. We went with pypi.io because we had it. The frustrating part is that pypi.org was not owned by the PSF or a Python community member for a long time. So basically the reason why it switched mid stream was that pypi.org was successfully obtained by the PSF and by the PyPI maintainers.
13:41 Michael Kennedy: Oh, okay.
13:42 Panelists: It was sort of the gold standard of the domain that we desired but it wasn't ours until, I don't remember when that happened, but when it became ours, we immediately switched.
13:54 Michael Kennedy: I see, so that was what you wanted all along but there was just this like squatter type of situation thing going on. It is the internet, isn't it. Alright, so whoever wants to take this one, feel free to jump in. One thing that I'm wondering is, what features or benefits do we get other than the underlying system is more polished, easier to contribute to and so on. As just a user, suppose I don't care about that. It could be written in PHP for all I care. But when I go to it, what do I get to do that's better? Or different?
14:27 Panelists: Honestly, there's not much different. Most of the goal of this project was to move to a system that would allow us to more easily add new and exciting features. So we have a lot of ideas. Like new APIs and ability to deprecate packages and things like that, that are now going to be, not trivial, but much much much easier to implement. Much easier than they would've been on legacy. So a lot of this is just modernization efforts and taking what was originally just a proof of concept that became PyPI into something that's actually been thought through and designed and robust.
14:59 Michael Kennedy: Yeah, I think you mentioned earlier, and Donald himself had said this previously, that the original PyPI, the gray one, not the blue one, was really based on almost like custom web programming. Not even like Pyramid or Flask, or something. It was really hard to get, people would come and say hey I want to contribute to a new feature. They would look and go actually, not that much. They would go away, right. So now, maybe this is a good place to switch into it. We could talk a little bit about what the underlying technology for that is. So maybe, Dustin and Ernest talk about the backend and Nicole we could touch on the front end as well. Cause that also got super modernized I'm sure.
15:44 Panelists: Yeah, so the thing about legacy is that it was written at a time that predates a lot of the frameworks and tools that we know exist today. So it was doing the best with what it had I think.
15:54 Michael Kennedy: It's not a real direct criticism of it but it just came into existence like really early before much of the other stuff. Like, you pip install flask, but where are you going to do that from if you don't have it.
16:06 Panelists: It's old. Yeah the modern PyPI framework we chose to use is Pyramid. That was after a little bit of experimentation that sort of, just, Pyramid allows us to have a little more control over various things that we need to do to be PyPI. I think a big part of this project was the infrastructure work that Ernest did and I think he should talk about that more.
16:26 Michael Kennedy: Yeah, go for it.
16:27 Panelists: We're now deployed on top of a nice, buzzy framework, a piece of infrastructure called Kubernetes. We sort of looked at that as getting to the point where, as a technology, Kubernetes has come so far and by the time Warehouse is going to be really real, Donald and I were both comfortable with sort of targeting that. The biggest drawback that it has as a platform is that right now the industry standard, the de facto way of deploying to Kubernetes, is you write a bunch of YAML or you use something to generate a template for YAML. The goal was basically to have a lot of similar features to other platform-as-a-service offerings and do so without really having to have Warehouse maintainers or PyPI maintainers worry too much about what's actually happening. So a project came out of this work called Cabotage which is a platform within Kubernetes and a web app and worker on top of it that just basically manage continuous deployment. So you can set and configure your environment variables and such and then deploy your service and it pops up at a known URL.
17:40 Michael Kennedy: That's really sweet. So you basically, as a contributor, I do a check-in to a Git branch, maybe PR and when that is accepted, that will trigger Kubernetes to pull down a new version and just kick off, sort of reroute the request? What happens there?
17:54 Panelists: Not yet.
17:55 Michael Kennedy: That's the dream?
17:56 Panelists: Yeah, that is something, that is another long term benefit that we can sort of foresee out of this. Right now the biggest benefit that we get from this is we have incredible flexibility in the way that we deploy Warehouse and how we change how many resources it has effectively. So all of the primitives of the platform, or of Kubernetes, effectively are really excellent. It's just that you have to bring them together and that's the part that's sort of difficult. So one of the biggest benefits we get is the zero downtime deployments. So since PyPI went live on Monday, we've already deployed like 30 times and nobody noticed. Which is great! Then also just being able to be really flexible. We have, I think it's like five different types of things happening behind pypi.org and we're running certain workloads under Gunicorn because they perform very well under Gunicorn and the primary site is deployed using Twisted for that purpose. Overall, just having a little more flexibility and scalability was the main driver and down the line, we're really excited to see about doing things like you mentioned. Being able to do branch based deploys, etc.
19:12 Michael Kennedy: Yeah that's really cool. Go, Dustin.
19:14 Panelists: I just wanted to mention, I totally forgot there is one feature that I'm super proud of that pypi.org does that legacy did not. Can't believe I forgot about this, 'cause this is my baby for a long time. But you can now write Markdown descriptions on PyPI.
19:30 Michael Kennedy: Yeah, that's awesome.
19:31 Panelists: A feature that people have wanted for a really long time. That's really the one big thing that super excited to say that the new PyPI does.
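For package authors, opting in to a Markdown description is a small metadata change. The sketch below assumes setuptools and its `long_description_content_type` field; the package name, version, and README text are placeholders invented for illustration, not anything discussed on the show.

```python
# Hypothetical package metadata; the name, version, and README text are
# placeholders for illustration only.
README = "# example-pkg\n\nA *Markdown* description for the project page.\n"


def build_metadata(readme_text):
    """Return setup() keyword arguments declaring a Markdown long description."""
    return {
        "name": "example-pkg",
        "version": "0.1.0",
        "long_description": readme_text,
        # Declares that the description is Markdown, so the index does not
        # try to render it as reStructuredText.
        "long_description_content_type": "text/markdown",
    }


metadata = build_metadata(README)

# In a real setup.py these would be passed straight to setuptools:
#     from setuptools import setup
#     setup(**metadata)
```

In practice the README text would usually be read from a `README.md` file next to `setup.py` rather than written inline.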
19:39 Michael Kennedy: That's cool and that's part of the modernization that you're talking about, right? Like Markdown, I don't know what people would've thought that meant back when it was created, but now obviously it's like the de facto way of formatting structured input that doesn't break the site because it's missing a div or something right? So really cool, really cool.
19:58 Panelists: Markdown didn't even exist when PyPI was first created.
20:01 Michael Kennedy: Yeah I'm sure it didn't. Nicole, how did the sort of redesign look? Did you try to take what was there and like patch it up? Or were you like I'm just going to recreate this from scratch? And style it up from scratch? What was that process like?
20:17 Panelists: Before I answer that question, I actually have something to add on the infrastructure question that you asked. One of the things that I really appreciate about the project is how easy it is to set up as a contributing developer. So I'm not the most technical contributor but I found the project really really easy to set up with Docker and Docker Compose. So the infrastructure that the team has set up, in terms of being able to hack on this project, is really really amazing and it really lowers the barrier to entry for a lot of people. We've seen people who've made their first open source pull request on this project.
20:59 Michael Kennedy: That's really great, yeah.
21:00 Panelists: It's really accessible for people to actually come and contribute to the project. So I don't want to undersell that aspect. I think that's really important.
21:08 Michael Kennedy: I agree that it is and I think, I think that's one of the real powers of this whole Docker thing is right. It kind of comes all together but Docker on its own brings almost equally many difficulties or challenges at the same time and bringing in Kubernetes kind of to make all the pieces fit, I think is really really clever. So, quite nice. This portion of Talk Python to Me is brought to you by Active State. Active State gives you a faster way to build and secure open source run times, from your first line of code through to production. Every second you spend building your Python distro or trying to secure your Python programs, is less time spent doing the work you love. You've got better things to do than trying to resolve dependencies or making sure that you tick off all the security boxes when you ship to production. Standardize on your Python builds so you can have less friction in the development cycle and you can deliver apps faster. You can also get a unique server-side way to verify your Python applications at run time. Bake security right into your code without impacting performance. Go faster, spend more time doing the work you love and comply with your enterprise security needs. Try Active State and see why it was chosen by IBM, Microsoft, NSA, Siemens, Pepsi Co., and more. Join millions of developers who trust Active State to build their open source language distros. Visit talkpython.fm/activestate for a special offer. That's talkpython.fm/activestate.
22:31 Panelists: On your other question, so in terms of the redesign, basically Donald just gave me free rein to do whatever I needed to do because I hadn't... To give you an impression of the old code base, Donald basically said, don't even go and touch that. Don't look at anything over there. Don't set it up.
22:51 Michael Kennedy: Not our problems.
22:53 Panelists: Just avoid it at all costs because he knew that would be a world of pain for me. I didn't really take any of the HTML, the CSS or the design from that. It was just like, okay, so we've got this fresh new thing. We want to show that it's a fresh new thing. We want to bring it to the modern design standards that people expect. We want it to be responsive so it works across all devices and we want it to be accessible. So I basically started from a completely clean slate. That's not true. Donald had put together some templates, but basically like throw that in the bin and start again. So that's what I did.
23:31 Michael Kennedy: That's really cool. So what are some of the technologies in the new one? It looks to me like it's probably Bootstrap based which I'm a fan of so that's cool. And what else?
23:40 Panelists: No, it's not Bootstrap.
23:41 Michael Kennedy: No, it's not Bootstrap?
23:42 Panelists: No.
23:43 Michael Kennedy: Okay, what's involved there?
23:44 Panelists: Okay, so we're going to go into a bit of CSS and HTML naming methodology. So the HTML uses BEM, which is a naming methodology for controlling the specificity of your CSS. Then basically each of the areas of the front end is a separate block or component within our SCSS code base. So basically the idea is we built up a custom reusable CSS code base.
24:12 Michael Kennedy: Yeah, that's really nice. And you're using SASS you said? Or S-A-S-S, which is like programmable CSS that then compiles or transpiles to CSS, which is real nice. So it sounds like if people want to contribute to the UI side of things it's pretty modern and fresh if they want to drop in.
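To make the naming methodology concrete: BEM (Block, Element, Modifier) builds class names of the form `block`, `block__element`, and `block--modifier`, so each rule targets a single flat class and specificity stays low. The helper and the component names below are hypothetical, written only to illustrate the pattern; they are not taken from the Warehouse code base.

```python
def bem(block, element=None, modifier=None):
    """Build a BEM class name: block, block__element, optionally --modifier."""
    name = block
    if element:
        name += "__" + element
    if modifier:
        name += "--" + modifier
    return name


# Hypothetical component names, for illustration only.
card = bem("package-card")
title = bem("package-card", element="title")
featured = bem("package-card", modifier="featured")

print(card, title, featured)
```

Because the block name appears in every class, inspecting an element in the browser points directly at the stylesheet that owns it.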
24:28 Panelists: It is and it's documented as well. So it's fairly clear how that system works if you want to change variables, if you want to change what are called mixins which are kind of like reusable functions in our SCSS. And if you want to modify a certain part of the code base, it's really obvious when you inspect the HTML, it's really obvious where the corresponding CSS is for that within the code base. So it's quite logical in terms of the way that the file structure is being set up. I don't take credit for that so it uses a system from a CSS guru called Nicolas Gallagher. I mean, if anyone's into CSS, that's someone you should be following. So it uses the IT CSS system from him.
25:11 Michael Kennedy: That's cool. I feel CSS and a lot of the web design stuff kind of gets the short end of the stick, but it can either be a serious drag to work on or it can be really beautiful depending on how you do it, right?
25:22 Panelists: The challenge with CSS is kind of achieving something at scale. I think most people can write a decent CSS code base for small projects, but when you start to scale projects, that's when you kind of have all this complexity with the cascade. Things starting to break where you don't expect them to break. So that's why from the beginning, I introduced these kinds of systems that I knew would allow us to scale the code base as we add new features.
25:51 Michael Kennedy: I can't remember who on my show said it before, but somebody said they feel like CSS in large projects becomes write only. Like you don't actually change anything, you only go to the bottom and maybe override it or add another file that replaces it or adds to it. Pretty interesting. So let's talk about the actual roll out, because... Actually before we talk about the roll out, let's talk about the traffic. I don't know, maybe Ernest, this is most clear on your mind, but this site and this underlying infrastructure, it handles a little bit of data, right?
26:21 Panelists: In total, PyPI does, the numbers are not directly in front of me, why did I do that. But I have a slide deck somewhere that has this information.
26:31 Michael Kennedy: It's like 30 or 50 terabytes a month, like something to that size, I think? It's a tremendous amount.
26:38 Panelists: I think it's like 10 billion requests per month is our running average right now. Let's go look at numbers. If we go and look at the old service. In the last month, so that excludes two days, we did a total of 6.5 billion requests, at the edge, 6.8 billion requests per month and 1.5 petabytes of data at the edge.
27:06 Michael Kennedy: Petabytes! Holy moley...
27:07 Panelists: Right, so we're also doing that at around 150 milliseconds of latency and with not that many errors all things considered. It's always important when we talk about these huge numbers to take one step back and go, yes, that is what the service in total does. But it's all thanks to Fastly which is the CDN provider.
27:31 Michael Kennedy: Right, because of the CDN, yeah.
27:33 Panelists: Which is the CDN provider that offered to front PyPI many years ago. So just that one change was the most significant thing that happened to PyPI until, in my opinion, Monday. But at the backend, we still do something like 25 to 30 requests per second across myriad of different routes.
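A back-of-the-envelope calculation shows how much of that load the CDN absorbs. This sketch uses only the figures quoted above (6.8 billion edge requests per month, 25 to 30 backend requests per second) and assumes a 30-day month and evenly spread traffic, both of which are simplifications.

```python
edge_requests_per_month = 6.8e9          # figure quoted in the conversation
seconds_per_month = 30 * 24 * 60 * 60    # assumption: a 30-day month

# Average request rate seen at the CDN edge.
edge_rps = edge_requests_per_month / seconds_per_month

# Backend rate quoted as "25 to 30 requests per second".
backend_rps_low, backend_rps_high = 25, 30

share_low = backend_rps_low / edge_rps
share_high = backend_rps_high / edge_rps

print(f"edge: ~{edge_rps:,.0f} requests/second")
print(f"share reaching the backend: {share_low:.1%} to {share_high:.1%}")
```

Under these assumptions only about one request in a hundred reaches the backend servers.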
27:54 Michael Kennedy: Yeah, that's really cool. Pyramid's working out pretty well for you? Like, my entire site, my core site, my podcast site, and various other pieces of infrastructure are almost all Pyramid or Flask and I think it's been rock solid. So I've enjoyed it a lot but how's it working for you?
28:10 Panelists: I've had no complaints. I mean I didn't really use Pyramid before I started contributing to the project and now it's definitely my preferred framework for more intensive web application in Python. So I like it a lot. Yeah it broke my brain. I got to the point where now I'm like, ah, of course, this is how this works. I was like, oh, aw, I can't do that here. And so overall, I think I agree with what Dustin sort of alluded to earlier around the control and precision that you can get from Pyramid that other frameworks sort of make you run around to do.
28:49 Michael Kennedy: Yeah nice. So the roll out. I set the stage with how much data you guys do, how much traffic you do. When you flip the switch on that, that's got to be a, so did you just go, it all goes here? Or did you like do some sort of like take 1% of 1% of the traffic and like slowly roll it over? What was it like?
29:07 Panelists: The main traffic sources for PyPI are pip installs, XML RPCs, so we have an XML RPC API and that gets a lot of traffic because it's mostly POST requests and it's hard to cache that. Then a very small fraction of that is actual web traffic. So, pypi.org existed for a long time before the launch and you could go and do everything via the web interface that you could do on regular legacy PyPI. So that didn't require a lot of traffic and worked fine. So what we did was sort of some incremental load testing where we would switch certain, either some pip traffic or XML RPC traffic over to pypi.org and see how it stood up. Yeah, so once again, Fastly was sort of predominant in that effort. So because we were doing those redirects at the edge, we were able to set rules there and so right now actually, there's still quite a bit of traffic going to the legacy PyPI backend and we can do that because we're not redirecting the traffic over PyPI. So we were able to like tune it at like 5, 10, 15, 20 percent for the heavy hitting stuff and test ahead of time. So when we switched, basically all we did was started issuing redirects from the old service. So it was a one time click, but there were weeks and weeks of like incremental quick load tests where we would throw a bunch of traffic at it. There was some replaying we did ahead of time as well.
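The idea of sending 5, 10, 15, then 20 percent of traffic to the new backend can be sketched as a deterministic bucketing rule. This is not the actual edge configuration, which the conversation does not show; the function name, the hashing scheme, and the request keys are all assumptions made for illustration.

```python
import hashlib


def routes_to_new_backend(request_key, percent):
    """Deterministically send roughly `percent`% of keys to the new backend."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return bucket < percent


# Hypothetical request keys standing in for real client traffic.
keys = [f"client-{i}" for i in range(10_000)]

for percent in (5, 10, 15, 20):
    routed = sum(routes_to_new_backend(k, percent) for k in keys)
    print(f"{percent}% target -> {routed / len(keys):.1%} routed to new backend")
```

Hashing the key rather than picking at random means a given client stays on the same side as the percentage is raised, which makes problems easier to reproduce.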
30:36 Michael Kennedy: Yeah, oh replaying, that's pretty cool. That's basically capture the exact web traffic and you replay it against the domain and just see how it behaves, right?
30:44 Panelists: It wasn't the exact traffic. We were taking measured percentage stuff and then re-dispatching a request that looks like it. Because the problem is, we can't just do every request blindly or people would dual submit an action or something.
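One simple way to avoid double-submitting an action during a replay is to keep only requests that do not modify state. The sketch below filters on HTTP method, which is an assumption about how such a filter could work rather than a description of the team's tooling; the recorded requests and paths are made up.

```python
# Methods that are defined as safe (read-only) in HTTP.
SAFE_METHODS = {"GET", "HEAD"}


def replayable(requests):
    """Keep only recorded requests that are safe to send a second time."""
    return [r for r in requests if r["method"] in SAFE_METHODS]


# Hypothetical sample of recorded traffic; paths are illustrative.
recorded = [
    {"method": "GET", "path": "/simple/example-pkg/"},
    {"method": "POST", "path": "/upload"},  # an upload: must not be replayed
    {"method": "HEAD", "path": "/project/example-pkg/"},
]

to_replay = replayable(recorded)
print([r["path"] for r in to_replay])
```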
31:01 Michael Kennedy: That's true, right. You got to have non-modifying type of stuff or test data or something I guess, yeah. Yeah, pretty cool. So how did it go?
31:09 Panelists: Perfectly, not a thing went wrong. It was good for the first 15 minutes. I think we were all really excited.
31:15 Michael Kennedy: It's working! Wait a minute, it's not working.
31:19 Panelists: Then the issues started rolling in.
31:20 Michael Kennedy: What did you guys run into?
31:22 Panelists: Sure, so previously, all files uploaded by users to PyPI (that's maintainers uploading your packages) were hosted under the same domain. So packages were hosted at pypi.python.org/packages, some stuff. During this switch, we decided to make a separate service, a separate domain for hosting user content. If you've ever seen the documentation that used to be hosted at, or is still hosted, I'm sorry, at pythonhosted.org, the main reason for that is that serving user generated content from the same domain that you're actually operating a service from can be dangerous from some security perspectives. So the thing that went wrong is that when we switched over, there were redirect loops and all sorts of craziness happening for people trying to download files from files.pythonhosted.org, our new host. Ultimately it was a bewildering and sort of bizarre thing because we had a number of factors at play. We had files that were cached, they were fine. Files that weren't cached were going to end up in this redirect loop. We had some host names involved and overall it was, just, we realized it at sort of the worst possible time. So if you go to status.python.org, and scroll down a little bit, you can read an incident report that sort of describes in more detail what went wrong. But effectively we were making this change as part of the roll out and an esoteric thing that occurs occasionally when you try to move a host name from one backend or one CDN configuration to another CDN configuration, we mishandled that. It was a one line, the fix was one line and it was like 13 characters, but it resolved it. So not everything can go perfectly.
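A redirect loop of the kind described is easy to picture with a small model: follow each redirect and stop as soon as a URL repeats. The redirect map and host names below are hypothetical and do not reproduce the actual misconfiguration, which is documented in the incident report.

```python
def follow_redirects(start, redirects):
    """Follow a redirect map; return (last_url, looped)."""
    seen = set()
    url = start
    while url in redirects:
        if url in seen:
            return url, True  # came back to a URL already visited
        seen.add(url)
        url = redirects[url]
    return url, False


# Hypothetical redirect rules; example.org hosts are placeholders.
healthy = {"old.example.org/packages/x": "files.example.org/packages/x"}
broken = {
    "old.example.org/packages/x": "files.example.org/packages/x",
    "files.example.org/packages/x": "old.example.org/packages/x",
}

print(follow_redirects("old.example.org/packages/x", healthy))
print(follow_redirects("old.example.org/packages/x", broken))
```

A cached file never triggers the lookup at all, which matches the observation that cached files were fine while uncached ones looped.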
33:17 Michael Kennedy: Well sometimes the best, most memorable lessons are taught in production. Well we talked about, before we started the official recording, I was listening to, as a group your overall thought was this was a big success even if there was like a blip here or there.
33:35 Panelists: Yeah. Yes, absolutely. Aside from that files outage, which is kind of the core use of PyPI, so that's kind of a big deal. But aside from that, everything else worked great and continues to work great so we're generally pleased. Like 99% of things worked perfectly.
33:53 Michael Kennedy: That's really great. So I think this is one of those things. I'm sure people were concerned about switching, what might go wrong, would we break Netflix's deployment because they can't get a pip install to work on some Docker container in a continuous build because, these types of, I may be affecting this, but you sort of had to go through that to be on the better side of the world, right? So now, you have Nicole's design job, the Pyramid app that you all built is up and now it's just there to be polished and built upon, right?
34:26 Panelists: Yeah I think that we're, our hands are, well once legacy is shut down, our hands are untied and we can make progress in places that we wouldn't have been able to, in the future. Something that I like to point out about PyPI, the historical PyPI, is that there was a point where it was pretty much the only non-static web host that python.org had. So it would end up getting a bunch of features thrown into it that weren't necessarily critical to its operation. So as we split into Warehouse, features were removed from PyPI legacy and sort of while they were both simultaneously in existence, we had to be very strategic about which things we added to Warehouse or pypi.org. So once legacy is shut down, we can start to make much more progress and do so much more quickly and much more safely than we ever have been able to before. So that alone is probably the biggest long term benefit of this, is being able to do the things that people need.
35:34 Michael Kennedy: Whether it's design or functionality, I think if you have to remain on parity with this older system that totally, you're not designing one thing, you're designing almost two things or you're constrained really painfully. So you'll be free--
35:51 Panelists: They share a database, so that also is a huge complicating factor.
35:56 Michael Kennedy: Very interesting. I guess a couple questions just really quick on that and then I want to kind of talk about where things are going. You said they share a database, what database is that? Where is the actual web apps running, your Kubernetes containers running these days?
36:10 Panelists: We use Postgres for our database and we have a very generous donation for in kind service basically. So AWS said, yeah you can run PyPI here. So right now we run the entire stack in Ohio. I picked where it deployed, so I picked Ohio. But in the Ohio region, for AWS, we've got, I think like nine medium-ish size servers running Kubernetes and we're using RDS and ElastiCache for Postgres and Redis and such.
36:39 Michael Kennedy: That's cool. And Dustin, I heard you talk about Elasticsearch, right? That's involved as well?
36:46 Panelists: They're another sponsor in kind. The search on pypi.org is far, far better than it was on legacy which is basically a super naive search. Now we can do full text search across descriptions and summaries and package names and even author maintainer names and it's a little more performant than previous search. A little more reliable and better results.
37:10 Michael Kennedy: Yeah, perfect, perfect. Alright, so let's talk about where things are going, I guess. So you have a roadmap laid out at wiki.python.org/psf/WarehouseRoadmap, I'll put that in the show notes of course. So the very first thing, you have a bunch of stuff which is pretty awesome. It's like, here's a milestone, closed. Here's a milestone, closed, completed, right? Dut dut dut, these are great. Then the current one that's coming, in progress, is shut down legacy PyPI. You all want to talk about that? That's coming on the 30th, right? Like we're recording right now on April 18th, so 12 days. Yeah, go Dustin.
37:47 Panelists: We kept the legacy up for now just because there are a few big users of PyPI that weren't able to make the migration in time. So, it is to just keep it up for a little bit longer and then fully, the domain will continue to exist so pypi.python.org will redirect to pypi.org but the legacy service will cease to exist.
38:07 Michael Kennedy: That's the big change you were talking about, Ernest, where you'll kind of be free to build this thing as its own creation, right? Not mirroring that.
38:15 Panelists: Yeah, it's interesting. I think the first thing that Warehouse ever did that was production impacting, was take control of the database schema. That was years ago. So we started tracking database changes there. Then uploads came and then the actual web app came up and was usable and such and we added features there to get to parity. So everything that the project has sort of undertaken up to this point, except for markdown descriptions, I think that's it.
38:47 Michael Kennedy: And the design, yeah?
38:48 Panelists: Of course the refreshed design has been just to make sure that we're doing everything we can to keep from breaking too many people. It's impossible for us not to. I mean, it's impossible for any service to make progress without at some point deprecating older APIs and such. So we're really getting to the point where we've pared down a lot of things and we can start looking forward to value add features if you will where it's like security features, audit features, accessibility is a big thing that we're looking forward to as well.
39:23 Michael Kennedy: Yeah, very cool. So that comes on the 30th and it'll be officially, the chains will be broken and Warehouse will be its own thing and that'll be great. This portion of Talk Python to Me is brought to you by Codacy. If you want to improve code quality, prevent bugs and security issues from making it into production and at the same time, speed up your code review process by 20%, then you need to try Codacy. That's C-O-D-A-C-Y. Codacy makes it easy to track code quality and identify and fix issues by automatically analyzing your commits and pull requests with all the most widely used static analysis tools. Codacy helps great teams build great software. Join companies like Delivery Hero, Paypal, Samsung, and more. Try your first code review by visiting talkpython.fm/codacy and linking your GitHub or Bitbucket account. You can also just click on the Codacy link in the show notes. So then you have under your roadmap, you have another section called post legacy shutdown. Then kind of beyond that, you have cool, but not urgent. Which is a nice way to categorize it. So maybe we could kind of touch on those and whoever feels most like it's in their space, just grab it. So like Dustin, there's something called incremental search indexing coming? Tell us about that.
40:42 Panelists: Yeah, so right now the way the search index works, you upload a package, our index runs, I think now it's every three hours roughly. When it actually runs. There's a lot of packages indexed and we don't have at the moment, a way to sort of incrementally update the index so as soon as you publish a package, it shows up in search results.
41:04 Michael Kennedy: I see, so you could say like this part is super stale because I know it just got updated, so rerun the search but only on this package for example?
41:10 Panelists: The goal was we got search up and running on PyPI and it was still a lot better than legacy, so it was good enough for launch but it can be better. So that's one of the things we're focused on adding.
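The difference between the periodic full reindex and an incremental update can be sketched with a toy in-memory inverted index. This is only an illustration: the `SearchIndex` class, its method names, and the sample summaries are hypothetical, and Warehouse's actual search is backed by Elasticsearch rather than anything like this.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index: token -> set of package names (hypothetical)."""

    def __init__(self):
        self._tokens = defaultdict(set)
        self._docs = {}  # package name -> tokens currently indexed for it

    @staticmethod
    def _tokenize(text):
        return set(text.lower().replace("-", " ").split())

    def update(self, name, summary):
        """Incrementally (re)index one package without touching the rest."""
        self.remove(name)
        tokens = self._tokenize(f"{name} {summary}")
        self._docs[name] = tokens
        for token in tokens:
            self._tokens[token].add(name)

    def remove(self, name):
        """Drop one package from the index, if present."""
        for token in self._docs.pop(name, set()):
            self._tokens[token].discard(name)

    def rebuild(self, packages):
        """Full reindex: the batch job that runs every few hours."""
        self._tokens.clear()
        self._docs.clear()
        for name, summary in packages.items():
            self.update(name, summary)

    def search(self, query):
        """Return the packages that match every token in the query."""
        tokens = self._tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._tokens.get(t, set()) for t in tokens))
        return sorted(hits)


index = SearchIndex()
index.rebuild({
    "requests": "HTTP for humans",
    "pretend": "A library for stubbing in Python",
})
# A new upload becomes searchable immediately, with no full rebuild.
index.update("factory-boy", "Test fixtures replacement for Python")
```

With incremental updates, the cost of publishing one package is proportional to that package's text, not to the size of the whole index.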
41:21 Michael Kennedy: While you're on it, there's the auto complete for search which will be pretty nice. There's also a search API. That's pretty cool. Is there a way to search now, in the future, is this going to be a thing, what's the story?
41:36 Panelists: Technically we have the XML RPC API that is technically deprecated. You probably shouldn't be depending on it or using it or adding new things that depend on it.
41:43 Michael Kennedy: It does have the words XML RPC in it, right?
41:47 Panelists: That should be an indicator that it's deprecated but, no. You can technically search from this API and this is how, if you type, pip search, whatever that's how you get results through there but XML RPC, like I think I said before, is really hard for us to cache. It's a big consumer of our bandwidth and our backend resources so the idea is to move to something that is a little more cache-able. So this would be, we have a lot of ideas about future APIs for pypi.org and something that might be included in that is a search API.
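One way to see why XML-RPC is hard to cache is to look at what a call looks like on the wire. The sketch below uses only the standard library's `xmlrpc.client` to build a request body offline; the `search` method name follows the discussion above, while the `example.invalid` URL and its `/search` path are purely hypothetical and do not describe any real or planned pypi.org endpoint.

```python
import xmlrpc.client
from urllib.parse import urlencode

# Every XML-RPC call is an HTTP POST to one endpoint. The method name and
# arguments live in the XML body, so a cache keyed on the URL cannot tell
# one query from another.
body = xmlrpc.client.dumps(({"name": "requests"},), methodname="search")

# Hypothetical cache-friendly alternative: the query is part of a GET URL,
# so a CDN can key its cache on the URL alone.
cacheable_url = "https://example.invalid/search?" + urlencode({"name": "requests"})

# Decoding the body recovers the call, much as a server would.
params, method = xmlrpc.client.loads(body)
```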
42:17 Michael Kennedy: Another one that's interesting to me is the... The psycopg2 warning. I guess that's just, you guys are using Postgres basically. Are you using the asynchronous stuff or just synchronously?
42:31 Panelists: Warehouse is all synchronous right now.
42:33 Michael Kennedy: Are you thinking of any way to get something async in there or does it not matter?
42:38 Panelists: So a number of the services that are behind the entire sort of service, it's like the service umbrella if you will, of what PyPI is. So PyPI, it has been broken up into hunks. For some things it truly does matter, the way these requests are handled. A lot of the really incredible work that was done initially on Warehouse by Donald, was just how aggressively cached everything is. The goal was basically to make as few requests require a transit to the backend as possible. So we don't have a ton of concurrency concerns around that. But for some services that do see lots of traffic, like we have a service that just translates old URLs to new URLs and that is effectively proxying information. So that's a knockout use case for async stuff.
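A URL-translating service of that kind spends nearly all of its time waiting on the network, which is where async code helps. The following is a minimal sketch with `asyncio`, assuming the only rule is a host swap from pypi.python.org to pypi.org; the `translate`, `handle`, and `main` functions are hypothetical, and the real service's path-mapping rules are not described in this conversation.

```python
import asyncio
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "pypi.python.org"
NEW_HOST = "pypi.org"


def translate(url):
    """Rewrite only the host; real path-mapping rules are out of scope."""
    parts = urlsplit(url)
    if parts.netloc != LEGACY_HOST:
        return url
    return urlunsplit(parts._replace(netloc=NEW_HOST))


async def handle(url):
    # Stand-in for awaiting network I/O (reading a request, sending a redirect).
    await asyncio.sleep(0)
    return translate(url)


async def main(urls):
    # Many in-flight requests share one thread while each waits on I/O.
    return await asyncio.gather(*(handle(u) for u in urls))


results = asyncio.run(main([
    "https://pypi.python.org/simple/requests/",
    "https://pypi.org/simple/pretend/",
]))
```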
43:32 Michael Kennedy: Yeah, pretty interesting. Let's see what else is in your post legacy shutdown here. We have, stop having a staging environment. Is that because of the Kubernetes stuff? It makes it not required?
43:41 Panelists: So that's talking about test PyPI which would be at test.pypi.org now. It currently exists so that people can do stuff and not worry about being on the real PyPI. So you can practice uploading a package. See how it looks on PyPI. I think there's a lot of reasons for it to exist, sort of just as an experimental and educational tool but I think the main reason people use it, is to see if their reStructuredText descriptions are going to break or not because historically, PyPI would just, it's sort of all or nothing. You either get a perfect description or it just looks like plain text. There's some ideas about doing some new things that might obviate the need for test PyPI. Like the ability to stage your releases. So, you're going to make a new release for your package. You can upload them all to PyPI but they're not actually published yet. You can go and look at them, but no one can see them. Then you hit a button and they are released.
44:36 Michael Kennedy: Yeah, that's very cool.
44:36 Panelists: And a big reason why that's important is we have immutable releases on PyPI. So right now, there's a lot of frustration that comes from users around, they upload a package, they don't like what they saw, they try to delete it and they get a warning that says when you delete this, you won't be able to re-upload it and then they go to re-upload it and they're frustrated. Then they try to delete the project. Then they go and re-upload it again and it says no, you still can't re-upload that file. So this is around primarily a caching and immutability thing to basically say that files can't be replaced. So if you've been installing files from PyPI for however long, it will still be there. So giving people a way to trial things without committing, basically, to the permanence of the thing, is a big reason for that as well.
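The interplay between staged releases and immutable filenames can be sketched as a small state model. Everything here is an assumption for illustration: the `Project` class, the `publish` step, and the `FilenameReused` exception are hypothetical names, and the staging feature is described above only as an idea, not an implemented design.

```python
class FilenameReused(Exception):
    """Raised when a previously published filename is uploaded again."""


class Project:
    def __init__(self):
        self.staged = {}              # filename -> bytes, maintainers only
        self.published = {}           # filename -> bytes, visible to everyone
        self._ever_published = set()  # filenames are burned once published

    def upload(self, filename, content):
        if filename in self._ever_published:
            raise FilenameReused(filename)
        self.staged[filename] = content  # staged files may be replaced freely

    def publish(self):
        """Hypothetical 'hit a button' step: staged files become permanent."""
        self.published.update(self.staged)
        self._ever_published.update(self.staged)
        self.staged.clear()

    def delete(self, filename):
        # Deleting hides the file but does not free the filename.
        self.published.pop(filename, None)


project = Project()
project.upload("demo-1.0.tar.gz", b"first try")
project.upload("demo-1.0.tar.gz", b"fixed description")  # fine while staged
project.publish()
project.delete("demo-1.0.tar.gz")
try:
    project.upload("demo-1.0.tar.gz", b"third try")
    reupload_allowed = True
except FilenameReused:
    reupload_allowed = False
```

The point of the staging area in this model is that mistakes can be corrected before `publish`, so the permanence rule only bites once the maintainer has seen the result.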
45:27 Michael Kennedy: Alright when you get billions of requests, one pip freeze can make it part of the history of the software, right, for sure. Alright, so just really quick, some other things. You have GitHub sign on coming along. Renaming projects, a few other cool things. In the cool but not urgent, the one that stood out most to me was a mobile app. What's the story with the mobile app there? Nicole, are you going to be designing a new mobile app for PyPI?
45:52 Panelists: I don't know whether or not. I mean, this has been a suggestion from the community and I think we're still working out whether or not that is something that's justifiable in terms of our time and the resources that we have on the project.
46:04 Michael Kennedy: What exactly, do you guys know the goal, I mean, you're definitely not going to pip install onto your mobile phone, that wouldn't mean anything right? Is it more about management and seeing stats?
46:15 Panelists: I think it was more just about can we offer this user interface as a mobile app as opposed to a responsive website. For me, I'm not sure how much value that would bring. We probably have like, I mean I don't have the statistics in front of me, but it's less than 10% of our users are using a mobile or tablet device. The site works on mobile now, better than the old one. I'm not sure whether or not we'll go down that road. What I think is most interesting about the mobile app being tracked there is that a prerequisite for it is effectively the next generation of an API for interacting with PyPI. That's one of the biggest things that is intended to be undertaken at the PyCon sprints this year. So now putting my PyCon hat on and my Warehouse hat on at the same time, I think it'd be an excellent idea for people who are interested in helping to contribute to the discussion and design of the next generation of APIs for PyPI to consider attending the sprints after PyCon this year. The sprints are Monday, Tuesday, Wednesday, Thursday after and there will be a number of contributors to the project around and that's one of the main things we plan on discussing.
47:33 Michael Kennedy: Sounds really good, yeah, very very nice. Couple others, let's see, that are pretty nice too. Package update feeds, so that's like I can subscribe to real time changes to the backend data so I know if I've pulled that down or something. I could refresh say my local PyPI caching server type thing?
47:53 Panelists: Yeah, so there's a lot of third party services that depend on PyPI that kind of want real time notifications about new package uploads or removals and that kind of thing. So this is just going to be a new API for, like a tool like PyUp which lets you automatically upgrade your dependencies when they're released.
48:10 Michael Kennedy: Yeah, I use PyUp on my stuff. I love it.
48:12 Panelists: Yeah it's great. So we want to be able to support them. Make it easier for them to do their job and use PyPI. So that's one of the things we're thinking about.
48:19 Michael Kennedy: Yeah, you don't want to have to suck all that data down just to get a new batch. Kind of like your incremental search. This is like the external version of it sort of.
48:25 Panelists: Yeah, exactly.
48:26 Michael Kennedy: So another one that's really closely related to it, and to pyup.io like you just mentioned, is a security notification system for Python packages. That sounds really useful. We just had, this year or is it end of last year, some sort of test malicious stuff uploaded to PyPI. Alright, a couple packages that were sort of hitting on typosquatting. Didn't really seem to do anything but still kind of scary. So knowing about security notifications, I guess not necessarily just people uploading malware, but like, hey we actually forgot to check the password in this login field, you probably want to get the newer version that checks the password type of thing, right?
49:07 Panelists: On legacy you could do this thing called hiding releases which just made them not show up but they basically still existed and it's not going to prevent you from using them. One of the things we're thinking about doing with the new PyPI is either adding the ability to deprecate a release, saying, like you should not use this anymore, or it doesn't work, or being able to mark it as insecure in some way. So there's like a known vulnerability in it and you should upgrade to the new version. And this is something that's going to have to change in a lot of different parts of the code packaging ecosystem. So like pip needs to be able to say, hey you told me to install this version and PyPI says it's insecure and tell the user. Give them a warning or whatever, but yeah.
49:44 Michael Kennedy: Yeah I mean just related to that, I would love to be able to type pip security checkup on an environment or something and go these two things have security warnings, these have updates but they're feature only or something to that effect, right? That would be cool.
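The idea of an index flagging a release and an installer warning about it could look roughly like the sketch below. The `Status` values, the `Release` record, and `check_before_install` are all hypothetical; neither PyPI nor pip is claimed to expose anything with these names.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    DEPRECATED = "deprecated"
    INSECURE = "insecure"


@dataclass(frozen=True)
class Release:
    name: str
    version: str
    status: Status = Status.OK  # hypothetical flag set by the index


def check_before_install(release):
    """Warn, but do not block, when the index has flagged a release."""
    if release.status is Status.OK:
        return None
    message = f"{release.name} {release.version} is marked {release.status.value}"
    warnings.warn(message, UserWarning, stacklevel=2)
    return message
```

Warning rather than refusing keeps pinned installs working, while still telling the user that an upgrade is advisable.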
49:58 Panelists: Yeah, and to be clear, it doesn't happen very often that there are security vulnerabilities in Python packages but it's still something that does, could happen, might happen. We want to be able to support it.
50:06 Michael Kennedy: Yeah, for example, Django had one or two minor security issues patched, right? You'd want to know if you were built upon Django. Hey you should probably install a new version before people start doing anything with that, right?
50:18 Panelists: Yeah.
50:19 Michael Kennedy: Very cool. So just a super quick, we're about out of time, but just touched on one more thing. This week, I think, pip 10 was released, right?
50:27 Panelists: That's correct.
50:28 Michael Kennedy: I don't know how much any of you all had to do with that but still pretty good news, right?
50:31 Panelists: Yeah it's great. It had been a long time since we had a pip release, so. Yeah it's really exciting. I mean, the biggest thing is it's a pretty foundational refactoring of a lot of the internal stuff and it puts, in my opinion anyway, one of the things I'm most excited about it is it puts a lot of the internal tooling of pip and makes it more available for more interesting things built around and on top of pip. Not necessarily at a CLI basis. Cause right now you've got to like, if you want to use pip stuff you used to have to jump into super private API to do it which isn't so great.
51:04 Michael Kennedy: That's really cool. Probably will make pairing it with work that you're doing on the server side easier as well. Alright, so I think I have other things I would love to talk to you about, but I think we're running low on time. So let's get to a couple of things here at the end. Final two questions, just quick since there's three of you. Nicole, start with you. If you're going to do some work on this project, what typical editor do you use?
51:30 Panelists: I use Atom.
51:31 Michael Kennedy: Okay, alright I don't think that I know about that one. Tell me a little bit about it.
51:36 Panelists: You're talking about text editor?
51:37 Michael Kennedy: Yeah.
51:38 Panelists: Yeah so it's Atom which is developed by GitHub.
51:40 Michael Kennedy: Oh Atom? Oh yeah, sorry sorry. I must have misheard you. Atom, of course I know Atom, yeah.
51:44 Panelists: Sorry my accent.
51:46 Michael Kennedy: No, no. Cool, Dustin?
51:48 Panelists: I'm a Vim user.
51:49 Michael Kennedy: Vim, right on. Ernest?
51:51 Panelists: I'm also a Vim user.
51:52 Michael Kennedy: Nice, alright now this particular question I ask of everybody but it's kind of interesting cause you're both on the inside and the outside, so. Notable PyPI package, Ernest, how about you go first?
52:03 Panelists: Notable in what way?
52:06 Michael Kennedy: Notable in that like, it's probably not necessarily the most popular thing. People always say requests which is fine. I learned about this thing, you should totally check it out sort of notable. It's not necessarily totally known but it's actually amazing and it's just a pip install away.
52:23 Panelists: Recently with the typosquatting thing we sort of talked about, I was on the hunt for something that would just tell me all of the standard lib module names. And that exists and go figure, it is called, I think it's called standard lib module names.
52:39 Michael Kennedy: Descriptive names are good.
52:41 Panelists: Yeah, and so we were able to add that to PyPI and very quickly be able to have a good block of that first line of defense. Somebody didn't try to pip install regex or something.
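A block list of standard library names can be sketched as follows. Note the swap: instead of the third-party package mentioned above, whose exact name is left uncertain in the conversation, this sketch reads `sys.stdlib_module_names`, which only exists on Python 3.10 and later. The `normalize` and `is_prohibited` helpers are hypothetical, and the normalization is only loosely in the spirit of PEP 503.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+; the fallback is a tiny
# hand-picked sample so the sketch still runs on older interpreters.
STDLIB_NAMES = frozenset(
    getattr(sys, "stdlib_module_names", {"re", "os", "sys", "json", "asyncio"})
)


def normalize(name):
    """Lowercase and treat '-', '_' and '.' alike."""
    return name.lower().replace("-", "_").replace(".", "_")


NORMALIZED_STDLIB = frozenset(normalize(n) for n in STDLIB_NAMES)


def is_prohibited(project_name):
    """Reject new project names that shadow a standard library module."""
    return normalize(project_name) in NORMALIZED_STDLIB
```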
52:54 Michael Kennedy: Right, right, right, yeah. pip install re, no not doing that. Dustin?
52:58 Panelists: In the course of this project, I had this sort of favorite Python package I'd learned about which is pretend. Which we use pretty heavily on Warehouse for sort of mocking things out in tests so the new PyPI has like 100% test coverage. So it's a lot of mocking going on. That's I think Alex Gaynor's tool and it's been really helpful. I think as of lately, my favorite package is not actually on PyPI but I just discovered it the other day. I'm kind of a sucker for funny little hacks or jokes. So this guy Dominic Miedzinski, he made this project called import PyPI and it's really interesting. What it does is, it sort of wraps the import command and if you don't have a given package on your system, it will go out to PyPI, get it, and install it and it will just work so you never actually have to pip install anything again.
53:48 Michael Kennedy: I ran across that as well and that's pretty interesting. It's quite ironic it's not on PyPI but yeah--
53:53 Panelists: Does it do that on the fly? Yeah it does.
53:56 Michael Kennedy: I think it does.
53:57 Panelists: I don't think it's really recommended for production grade usage but it's a fun little hack.
54:03 Michael Kennedy: It is quite interesting for what it's worth. Hope it puts a --user on it at least. Alright, Nicole, do you have one?
54:10 Panelists: Oh, yeah, I do. As I said, I only dabble in Python but when I was dabbling, I got really into testing and I really liked factoryboy which creates factories.
54:20 Michael Kennedy: Factory boy, okay.
54:23 Panelists: So I used that a lot for running Selenium tests. Running over my Django code base when I was developing with Python. It's a really cool project. I think it's actually based off of a Ruby project originally. Yeah, off Thoughtbot's factory bot. So yeah it's a really great project to work with.
54:40 Michael Kennedy: Awesome, that sounds like a great one. Alright, well, thank you all for being on the show. I want to give you one final chance for a call to action. There's people who have packages they maintain. They should probably play with your stuff, right? Try the new thing? We have people who maybe want to contribute to open source. Ernest, you spoke about the sprints. What should people do?
55:00 Panelists: They should come to the, if they're going to be at PyCon, they should come to the packaging sprints. So I'll be there. Some part of Ernest will be there after running PyCon. We'll see what's left of him. We're going to just sprint on the packaging ecosystem including PyPI and see what we can build.
55:16 Michael Kennedy: Awesome.
55:17 Panelists: On the janitorial side, you should go verify your email address. That's super helpful for us.
55:26 Michael Kennedy: Yeah that's awesome. Yeah, Nicole?
55:28 Panelists: The other thing is I'm planning on running a sprint also at EuroPython this year in Edinburgh. So for people who are based in Europe who want to contribute to the project, we'll be running a sprint there as well. And the other thing is, people should consider donating to the Python packaging working group because we actually were lucky enough to receive an award from Mozilla to be able to fund working on our Warehouse for the last few months, but that money is about to run out. We have used that money to get to our goal which is to launch the new PyPI and shut down legacy but in terms of the future development of the project, any funding that we can secure is obviously going to mean that we can move faster and more reliably and be less reliant on our volunteers for our sort of core infrastructure. I know the PSF is currently running a fundraising campaign. So certainly consider donating to the working group. There's a handy link actually at the top of the new site, if you do want to donate. So yeah, any contributions would be most welcome.
56:34 Michael Kennedy: That is a great suggestion. I think people should definitely do that. I forgot to call out the Mozilla open source foundation and say thank you for that but like the reason we're here having this conversation and it got this major boost is largely, that was a major factor in it right? Dustin, you wanted to add something.
56:53 Panelists: Yeah, the Mozilla award is definitely the reason why this is all possible. I wanted to have a call to action. Anyone that wants to contribute to the project or just contribute to open source, come and find us on GitHub. We are a pretty friendly group and we have a bunch of issues tagged good first issue that you could take a crack at and we'd like to see more contributors every day.
57:13 Michael Kennedy: Absolutely, and it's much easier, as you all have laid out, for various reasons why that's the case. Ernest.
57:19 Panelists: Yeah I definitely wanted to just, like, I'm shaking here going how did we not talk a little bit more about Mozilla Open Source Support Grant Program. Indeed, it is the sole reason why pypi.org launched on Monday and not in another year or 18 months because just the amount of work that went in to making this all possible, I think in retrospect, and without being super optimistic, looking forward, was incredible. Just based on looking back, it probably would have been an indefinite period of time before this occurred without being able to have people committed and thinking and soliciting the community to help as well. So Mozilla was instrumental and we're forever indebted to them for how much they made this happen.
58:12 Michael Kennedy: Yeah that's really awesome and thank you to them. That's great. Want to add one final thing. People should donate to the Python Packaging Working Group but they should also, if they have a company that massively depends on Python, they should say, dear company, you're running a five billion dollar business on this, can we set up a thousand dollar recurring donation monthly to this because without this, your business goes away or at least a good chunk of it.
58:37 Panelists: Yeah the number of organizations and companies that depend on PyPI are most of them it seems like. So, yeah it's now possible to make recurring donations. So we definitely appreciate the support.
58:48 Michael Kennedy: Right, awesome. Alright, well, let's leave it there. Thank you all for being on the show. It's been a great conversation and congratulations on the launch. I'm super excited to see it.
58:56 Panelists: Thanks Michael. Thanks Michael. Thank you.
59:00 Michael Kennedy: This has been another episode of Talk Python to Me. Our guests have been Nicole Harris, Ernest Durbin III, and Dustin Ingram and this episode has been brought to you by Active State and Codacy. Active State gives you a faster way to build and secure open source run times. From your first line of code through to production. Check it out at talkpython.fm/activestate. Review less, merge faster with Codacy. Check code style, security, duplication, complexity, and coverage on every change while tracking code quality throughout your sprints. Try them at talkpython.fm/codacy, C-O-D-A-C-Y. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new 100 Days of Code in Python. If you're interested in more than one course, be sure to check out the everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.