Inside the Python Package Index

Episode #64, published Fri, Jun 24, 2016, recorded Wed, Jun 22, 2016

Episode Deep Dive Transcript

What is the most powerful part of the Python ecosystem? Well, the ability to say "pip install magic_library" has to be right near the top. But do you know what powers the Python Package Index, and who the people behind it are? Did you know it ships over 300 TB of traffic each month these days?

Join me as we chat with Donald Stufft to look inside Python's packaging infrastructure.

Links from the show:

Donald on Twitter: @dstufft
Donald on the web: caremad.io
Powering the Python Package Index:
caremad.io/2016/05/powering-pypi/
A Year of PyPI Downloads:
caremad.io/2015/04/a-year-of-pypi-downloads
Donate to PyPI: donate.pypi.io
PyPI (Legacy): pypi.python.org/pypi
Warehouse (new PyPI): pypi.io
BigQuery Data Source:
mail.python.org/pipermail/distutils-sig/2016-May/028986.html

Episode Deep Dive

Guest Introduction and Background

Donald Stufft is a seasoned Python developer who has been deeply involved in Python packaging for many years. He worked at Rackspace, splitting his time 50/50 between open source packaging and the company's own projects, before joining Hewlett Packard Enterprise full-time to improve the Python Package Index (PyPI) and the overall packaging ecosystem. Donald is a core maintainer of pip, a driving force behind the packaging toolchain, and one of the primary engineers behind PyPI’s ongoing modernization effort called “Warehouse.” He’s also a Python Software Foundation Fellow.

What to Know If You’re New to Python

If you’re new to Python and want to get the most out of this conversation about the Python Package Index (PyPI), here are a few basics:

Pip and PyPI: pip is the standard tool to install packages from the Python Package Index (PyPI). Anytime you write pip install some_package, you are fetching code from PyPI.
Virtual Environments: Often, we use virtual environments (via venv) to isolate Python dependencies to avoid conflicts.
Packaging Evolution: Much of Python’s evolving ecosystem (tools like pip, setuptools, twine, etc.) revolves around simplifying how you share and install code.
A common question for new folks is “Why is PyPI so critical?” This episode will clarify the massive scale, complexity, and importance of that system for anyone using Python.
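For readers new to virtual environments, here is a minimal sketch of creating one from Python itself with the standard library's venv module. It skips installing pip (with_pip=False) so it runs offline, and the directory name demo-env is only an illustration; in day-to-day work you would more likely run python -m venv from a shell.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "demo-env") -> Path:
    """Create an isolated environment under `parent`; return its interpreter path."""
    env_dir = parent / name
    # Symlink the interpreter on POSIX (as `python -m venv` does); copy it on Windows.
    # with_pip=False keeps this sketch offline; a real setup would normally keep pip.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(Path(tmp))
        print(python, python.exists())
```

The shell equivalent is python -m venv demo-env, after which demo-env/bin/pip install requests (demo-env\Scripts\pip on Windows) installs into that environment only, leaving the system Python untouched.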

Key Points and Takeaways

Why PyPI Is So Critical to Python
The Python Package Index is often considered the beating heart of the Python ecosystem. Almost every library or framework (beyond the standard library) is distributed and installed via PyPI. The ability to run pip install [library_name] has become second nature for developers and data scientists alike, making PyPI one of the most relied-upon services in the Python world.
- Links and Tools:
  - PyPI main site (Legacy)
  - PyPI next-gen (Warehouse) or https://pypi.io
The Scale and Traffic of PyPI
Donald shares that PyPI serves a staggering amount of data: hundreds of terabytes and billions of requests every month. Docker, CI/CD, and cloud-based workflows keep adding installs, which pushes PyPI’s bandwidth requirements ever higher. Much of the hosting and bandwidth is donated by companies, such as Fastly for CDN services, but operating at this scale still poses major challenges.
- Links and Tools:
  - Fastly (CDN)
  - HP Enterprise
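To put those numbers in perspective, here is some back-of-the-envelope arithmetic using the figures quoted in the transcript: about 293 TB in April 2016, about 343 TB in May 2016, and roughly 3 billion requests per month. Decimal units (1 TB = 10^12 bytes) are an assumption, and because the request count is rounded down, the per-request average is slightly overstated.

```python
# Back-of-the-envelope numbers for PyPI's monthly traffic.
# Inputs are the figures quoted in the episode; decimal units are an assumption.
TB = 10**12  # bytes per terabyte (decimal)


def avg_bytes_per_request(total_bytes: float, requests: float) -> float:
    return total_bytes / requests


def growth_percent(before: float, after: float) -> float:
    return (after - before) / before * 100


def avg_bits_per_second(total_bytes: float, days: int) -> float:
    return total_bytes * 8 / (days * 24 * 60 * 60)


april = 293 * TB      # April 2016
may = 343 * TB        # May 2016
requests = 3 * 10**9  # "3 billion and change", so a rounded-down count

print(f"average response size: {avg_bytes_per_request(may, requests) / 1000:.0f} kB")
print(f"April to May growth: {growth_percent(april, may):.1f}%")
print(f"average throughput in May: {avg_bits_per_second(may, 31) / 10**6:.0f} Mbit/s")
```

With these inputs it works out to roughly 114 kB per request, about 17% month-over-month growth (somewhat more than the rough 10% guessed in the conversation), and a sustained average of about 1 Gbit/s around the clock.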
Funding and the Tragedy of the Commons
Running PyPI costs around $35,000 per month in infrastructure and bandwidth alone, excluding the cost of volunteer time. Many big companies depend heavily on PyPI, yet only a small handful sponsor it, reflecting a “tragedy of the commons” scenario. Donations, sponsor support, and developer time from the community are crucial to keep PyPI stable and thriving.
- Links and Tools:
  - Donate to PyPI (Redirects via the PSF)
  - Python Software Foundation
Inside PyPI’s Old Codebase
The current (legacy) PyPI codebase is roughly 15 years old, predating modern frameworks like Django and Flask. It was never meant to be permanent; it started as a proof of concept. A lack of tests, reliance on CGI, and outdated design choices make improvements extremely risky and challenging.
- Links and Tools:
  - PyPI “Legacy” Repository on GitHub (Archived) (Note: The codebase is not typically recommended for new contributors.)
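The CGI point deserves a concrete picture. WSGI’s environ dictionary is defined to carry the same variables as CGI (REQUEST_METHOD, QUERY_STRING, and so on), which is what makes it possible to bolt WSGI support onto code written for CGI, as the transcript describes. Below is a minimal sketch of such an adapter, not PyPI’s actual code: legacy_cgi_handler and cgi_to_wsgi are hypothetical names, the handler writes to a file object it is given rather than to real stdout, and CRLF line endings are assumed.

```python
import io
from wsgiref.util import setup_testing_defaults


def legacy_cgi_handler(env, out):
    """Hypothetical CGI-style handler: reads CGI variables, writes headers, a blank line, a body."""
    name = env.get("QUERY_STRING") or "world"
    out.write("Content-Type: text/plain\r\n")
    out.write("\r\n")  # the blank line ends the CGI header block
    out.write(f"hello {name}\n")


def cgi_to_wsgi(handler):
    """Wrap a CGI-style handler so any WSGI server can call it."""
    def app(environ, start_response):
        out = io.StringIO()
        handler(environ, out)  # environ already uses CGI variable names
        raw_headers, _, body = out.getvalue().partition("\r\n\r\n")
        status, headers = "200 OK", []
        for line in raw_headers.split("\r\n"):
            if not line:
                continue
            key, _, value = line.partition(":")
            if key.strip().lower() == "status":  # CGI's way to send e.g. "404 Not Found"
                status = value.strip()
            else:
                headers.append((key.strip(), value.strip()))
        start_response(status, headers)
        return [body.encode("utf-8")]
    return app


if __name__ == "__main__":
    environ = {"QUERY_STRING": "pypi"}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    app = cgi_to_wsgi(legacy_cgi_handler)
    print(b"".join(app(environ, lambda status, headers: None)))
```

The design trade-off is visible even in this toy: the adapter buffers the whole response and re-parses headers the handler printed, which works but is exactly the kind of layering that makes an old CGI codebase hard to evolve.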
Warehouse: The Next-Gen PyPI (PyPI.io)
To address these structural problems, the Python Packaging Authority (PyPA) is developing “Warehouse.” It’s a modern, tested, and maintainable web app built on Pyramid, SQLAlchemy, and Jinja2. Warehouse is accessible at pypi.io for testing and will eventually replace the legacy PyPI site once it’s fully ready.
- Links and Tools:
  - Warehouse on GitHub
  - Pyramid Web Framework
Framework Decisions Behind Warehouse
Donald Stufft tried a few approaches: first raw WSGI, then Flask, then Django. He settled on Pyramid for Warehouse due to its flexibility and its non-restrictive design, allowing a mix of SQLAlchemy, Jinja2, and minimal “thread local” global usage. This structure lets developers test, extend, and contribute far more easily.
Corporate Sponsorship and Volunteer Effort
Hewlett Packard Enterprise sponsors Donald’s full-time work on packaging. Beyond that, volunteers like Ernest Durbin (operations) and Richard Jones (original PyPI author) still step in. However, most people contributing to PyPI and pip do so in their free time, raising concerns about sustainability and the future of Python packaging.
- Links and Tools:
  - PyPA (Python Packaging Authority)
  - pip
pip vs. easy_install and Virtual Environments
easy_install was an older way of installing packages before pip took over. pip relies on virtual environments as a best practice to isolate dependencies, whereas easy_install allows multi-version installs in a single environment. The recommended approach today is clearly pip plus virtual environments.
- Links and Tools:
  - pip
  - virtualenv (often now replaced by python -m venv)
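The transcript describes easy_install’s multi-version trick as building a dynamic environment in memory by munging sys.path at runtime. Here is a heavily simplified sketch of that idea. It is not how setuptools actually implements it (the real mechanism worked through .egg installs and setuptools’ pkg_resources), and the one-directory-per-version layout and the demo_pkg name are hypothetical.

```python
import importlib
import sys
from pathlib import Path


def activate(version_root: Path, package: str, version: str) -> None:
    """Make `import <package>` resolve to one specific installed version.

    Assumes a hypothetical layout of one directory per version, e.g.
    version_root/demo_pkg-1.0/demo_pkg.py and version_root/demo_pkg-2.0/demo_pkg.py.
    """
    prefix = str(version_root / f"{package}-")
    # Remove any other version of this package from the import path...
    sys.path[:] = [entry for entry in sys.path if not entry.startswith(prefix)]
    # ...then put the requested version first so it wins the lookup.
    sys.path.insert(0, str(version_root / f"{package}-{version}"))
    # Forget an already-imported copy and refresh the import system's directory caches.
    sys.modules.pop(package, None)
    importlib.invalidate_caches()
```

pip’s approach avoids this runtime juggling entirely: each virtual environment holds exactly one version of each package, so a plain import always finds the right one.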
CI Systems, Docker, and Repeated Installs
Large CI providers like Travis, GitHub Actions, and others spin up fresh environments for each build, which triggers frequent downloads. Docker containers also often download packages repeatedly from scratch. Together, these drive PyPI’s huge bandwidth usage and underscore the significance of caching strategies like pip’s built-in caching.
- Links and Tools:
  - Travis CI
  - GitHub Actions
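The caching point is worth unpacking. In the conversation Donald explains that since pip 6, pip caches downloads using ordinary HTTP caching, and that PyPI serves package files with 10-year cache headers, so each file is effectively fetched once per machine. The sketch below shows the core freshness rule a client applies to a Cache-Control max-age header. It is an illustration, not pip’s code (pip delegates this to a vendored HTTP-caching library), and the exact header value PyPI sends is assumed.

```python
import re
from typing import Optional

TEN_YEARS = 10 * 365 * 24 * 60 * 60  # 315,360,000 seconds (ignores leap days)


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A cached response can be reused without contacting the server while age < max-age."""
    directives = cache_control.lower()
    if "no-store" in directives or "no-cache" in directives:
        return False  # simplified: treat both as "always go back to the server"
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit
```

With a ten-year max-age, a file cached three years ago is still fresh, so a repeat pip install on the same machine never touches the network for it. That is also why throwaway CI containers, which start with an empty cache, weigh so heavily on PyPI’s bandwidth.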
Improving Uploads with Twine
Upload failures on legacy PyPI sometimes reached 10%. Twine, a tool built specifically for uploading Python packages, can direct uploads to the new Warehouse, reducing errors. Donald recommends Twine for a smoother, more secure distribution process.

Links and Tools:
- Twine
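Twine’s job is to send the files in your build output to the index, typically with twine upload dist/*. Since PyPI publishes a SHA-256 digest for every file it hosts, recording your own digests before uploading gives you something to compare against afterwards. The sketch below uses only hashlib; it is not part of Twine, and dist/ is the conventional, assumed output directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives don't sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dist_checksums(dist_dir: Path = Path("dist")) -> dict:
    """Map each file in the build output directory to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(dist_dir.iterdir()) if p.is_file()}


if __name__ == "__main__":
    dist = Path("dist")
    if dist.is_dir():
        for name, digest in dist_checksums(dist).items():
            print(f"{digest}  {name}")
```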

bpython: A Handy REPL Alternative
When asked about lesser-known Python packages, Donald Stufft highlighted bpython. It’s a more advanced REPL that provides syntax highlighting, in-line autocomplete, and other features that significantly improve the interactive Python experience.

Links and Tools:
- bpython on PyPI

Call for Community Involvement
From code contributions to corporate sponsorship, PyPI’s future relies on the community that uses it. Donald invites both individual developers and large organizations to donate, sponsor, or simply help fix bugs in tools like Warehouse or pip. With PyPI being mission-critical, a little help goes a long way.

Links and Tools:
- PyPI: Info and Contribution (Project docs may live here or on GitHub)

Interesting Quotes and Stories

"We have 343 terabytes of traffic and 3 billion requests a month. When it goes down, people definitely notice." -- Donald Stufft on the sheer scale of PyPI

"It was never meant to be permanent, just a proof of concept that stuck around for 15 years." -- Donald Stufft on PyPI’s original codebase

Key Definitions and Terms

PyPI (Python Package Index): The main repository of third-party Python packages where users publish and install libraries.
Warehouse: The next-generation PyPI replacement (pypi.io) built on Pyramid and designed for modern maintainability.
pip: The standard package manager for Python, used to install packages from PyPI.
Twine: A secure and more reliable tool to upload Python packages to PyPI.
PyPA (Python Packaging Authority): A collective that oversees and maintains core packaging projects like pip, virtualenv, and Warehouse.
Virtual Environment: An isolated Python environment to manage dependencies without interfering with system-level packages.
easy_install: An older package-install tool from setuptools, mostly replaced by pip.
Fastly: A content delivery network (CDN) that donates bandwidth and caching services to PyPI.

Learning Resources

Here are some curated links to help you go deeper into Python in general and aspects of packaging touched on in the show:

Python for Absolute Beginners: Ideal for anyone just getting started with Python, covers fundamentals with a friendly, project-based approach.
Write Pythonic Code Like a Seasoned Developer: Learn how to adopt more modern, idiomatic Pythonic coding patterns, which can help if you’re packaging or distributing libraries.
Getting Started with pytest: If you plan to contribute to Warehouse or other Python projects, a testing background is critical. Pytest is a powerful framework to ensure code reliability.

Overall Takeaway

PyPI sits at the center of the Python universe, hosting and distributing the libraries that data scientists and software developers rely on every day. Despite handling massive traffic and underpinning thousands of projects, it’s largely sustained by volunteer efforts and donations. This episode underscores both the importance of PyPI and the vital need for ongoing community support, financially and through code contributions, to keep the entire Python ecosystem healthy and growing. If you depend on pip install for your day-to-day work, consider giving back in some way to ensure PyPI remains reliable for everyone.

Episode Transcript

00:00 What is the most powerful part of the Python ecosystem?

00:02 Well, the ability to say pip install magic library has to be right near the top.

00:07 But what powers the Python package index and who are the people behind it?

00:11 Did you know that they ship over 300 terabytes of traffic each month these days?

00:16 Join me as we chat with Donald Stufft to look inside Python's packaging infrastructure.

00:21 This is Talk Python To Me, episode 64, recorded Wednesday, June 22, 2016.

00:26 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:56 ecosystem, and the personalities.

00:58 This is your host, Michael Kennedy.

01:00 Follow me on Twitter where I'm @mkennedy.

01:02 Keep up with the show and listen to past episodes at talkpython.fm.

01:06 And follow the show on Twitter via @talkpython.

01:08 This episode has been brought to you by SnapCI and Rollbar.

01:12 Hey, everyone.

01:14 I have a really fabulous look inside the Python package index on deck for you.

01:18 Donald and I just scratched the surface, but there are a ton of fascinating topics that

01:22 we cover.

01:22 Before we get to our talk with Donald, I have one announcement for you.

01:26 I've released my second online course called Write Pythonic Code Like a Seasoned Developer.

01:30 It's over four hours and 50 concrete examples of how you can write more Pythonic code.

01:35 It's jam-packed with tips that you can incorporate into your projects immediately.

01:39 Topics covered include the expansive use of dictionaries, hacking Python's memory usage

01:44 via slots, using generators, comprehensions, and generator expressions, creating subsets of collections

01:50 via slices all the way to the database, and much more.

01:53 Several of these topics are Python 3-only features, so you'll have even more reason to adopt Python

01:58 3 for your next project.

01:59 The response to the first two days has been super positive.

02:03 I hope you'll take a moment to see what the course is all about at talkpython.fm/pythonic.

02:07 Now, let's get to Donald.

02:09 Donald, welcome to the show.

02:11 Yes, hi.

02:12 Hey, thanks for coming today.

02:13 It's going to be really fun to talk about packaging and PyPI and all these sorts of things.

02:18 But, of course, like always, before we get into the details, let's hear your story.

02:23 How did you get started in programming in Python?

02:25 Yeah, so programming in general.

02:27 I was playing a video game back in high school on EverQuest, and I started to get into hacking

02:34 that video game.

02:35 And there was a tool called MacroQuest that I started writing add-ons for and such, and it

02:40 sort of got me involved in programming at all.

02:44 And then I started to get jobs in programming.

02:48 My first job was with PHP.

02:51 And when I was working with Drupal at the time, I kept hearing about this cool framework called

02:58 Django, but it was written in Python.

02:59 So I picked up a Python book and sort of taught myself Python over a week or so, so that I could

03:08 then use Django to do websites instead of Drupal.

03:13 So I was feeling constrained by the sort of CMS aspects of Drupal.

03:17 Yeah, okay, that's cool.

03:18 Drupal's PHP, right?

03:20 Yes, Drupal's a PHP CMS that's sort of got some framework-y aspects to it, or at least it

03:28 did back then.

03:29 That was 2007 or so.

03:31 I haven't really looked at it.

03:32 I did recently.

03:33 Yeah, it's definitely still going.

03:35 So was it refreshing to get into Python from PHP?

03:39 Yes, I found Python to be incredibly useful,

03:43 concise, and great to work with.

03:44 I like the enforced white space coming from, you know, somewhere where PHP attends.

03:52 And particularly at the time, a lot of the examples you would find online didn't always have the

03:57 greatest formatting.

03:58 So that sort of enforced formatting helped a lot with me reading other people's code to

04:05 figure out what they were doing and how it worked.

04:08 And sort of just a general grokking of the code bases to help me learn over time.

04:15 Yeah, yeah.

04:15 That's really cool.

04:16 I'm sure it was.

04:17 It was nice.

04:18 So I think maybe a good place to start this conversation would be to talk about what you

04:24 do for your day job.

04:26 Yeah, so my day job is I'm employed by Hewlett Packard Enterprise as a full-time employee.

04:33 And my sort of mandate is basically make Python packaging better.

04:38 So from there, you know, I work on PyPI.

04:40 I work on pip.

04:43 I work on a project called Warehouse, which is essentially PyPI 2.0.

04:48 The Warehouse name is not really going to be exposed to end users other than if they start

04:53 to work on the backend.

04:54 But we call it Warehouse just to distinguish it.

04:57 Then there's other little small tools like Twine and, you know, sort of a lot of the background

05:03 efforts, you know, writing PEPs and coordinating things and whatnot sort of all falls under my

05:10 banner of what I do for my day job.

05:13 Yeah, so that's really awesome that HP is making that investment to more or less fund a developer

05:20 to continuously work on Python packaging.

05:24 Yeah, you know, you know, I think it's great.

05:26 You know, they are a big contributor to the OpenStack community.

05:30 And OpenStack is, I think it's not entirely written in Python now, but it's largely written

05:34 in Python.

05:34 So they're heavy users of, you know, Python and PyPI and pip and whatnot.

05:40 You know, so HP felt that, you know, they're depending on these things, you know, made sense

05:46 to invest in these things to make sure that they continue to be running and work and whatnot.

05:52 Yeah, that's cool.

05:53 So is this a position that you somehow managed to create, or was there like a job announcement

06:00 for hiring a packaging support person?

06:05 Yeah.

06:05 So it was created for me.

06:07 At the time there was somebody named Monty Taylor inside HPE who sort of led the effort

06:15 to convince the higher ups that this was something that they should do.

06:19 And, you know, he's the one who reached out to me and offered this to me.

06:22 You know, prior to this, I was working at Rackspace where they gave me half time.

06:26 You know, 50% time to work on packaging, 50% time to work on their own projects.

06:32 You know, so he sort of took that idea and just amped it up to the next level.

06:38 And, you know, he really pushed for getting that done in HPE and convinced me to come over.

06:45 Yeah, that's awesome.

06:46 Yeah.

06:47 I think we'll dig in a little bit more later on how companies are supporting PyPI and so on.

06:54 But let's keep it high level for a little bit.

06:56 So I've had a lot of interesting conversations with people basically around the pronunciation

07:02 of P-Y-P-I.

07:04 And I've heard a lot of people say Pie-P-I.

07:09 And I've heard a fair number of people say Pie-Pie.

07:13 What's the official way to say it?

07:15 The official way to say it is Pie-P-I.

07:18 It's Python Package Index.

07:20 So, you know, you say the P-I as letters.

07:23 A lot of people do call it Pie-Pie.

07:26 I've heard Pippi and PeePee and all sorts of pronunciations.

07:31 I really don't like the name.

07:33 I think it's a confusing name that has this big sort of problem with pronunciation.

07:39 Everyone pronounces it a little bit differently.

07:41 And one of the most common pronunciations, Pie-Pie, clashes with PyPy, P-Y-P-Y,

07:49 the alternative Python implementation. However, I've sort of thus far been unsuccessful at convincing people that it's worth it to change the name

08:01 since P-Y-P-I is so ingrained and everywhere across the ecosystem.

08:09 Yeah, it's deeply ingrained, but it's also an insider term, right?

08:13 Like, you can't walk up to somebody who's barely familiar with Python and say PyPI and have them go,

08:21 oh, yeah, of course, right?

08:23 But, you know, that's true, I think, with lots of the packaging systems.

08:27 Like, if you said NPM to somebody, they wouldn't know.

08:31 If you said Gems, they wouldn't know.

08:34 You know, NuGet.

08:35 Like, there's all these packaging systems.

08:37 They all seem to have poor names, but PyPI, okay, that's great.

08:41 So I'm glad to hear that I've been saying it right, but I've been following Guido's lead,

08:47 and I figured that's a pretty safe lead to follow.

08:49 Yeah, yep.

08:50 And, you know, PyPI is what it is, although I do slip up every once in a while and pronounce it wrong.

08:56 Yeah.

08:57 Okay, cool.

08:58 So PyPI, one thing I always wondered about, maybe you can explain this for me.

09:03 If I type pypi.python.org and hit enter, I get pypi.python.org slash pypi.

09:09 Like, why does it appear on both ends?

09:11 Is it just, like, really beloved?

09:14 There's a long history there, and I'm, you know, I don't fully know it because a lot of it comes from before my time.

09:22 But I believe PyPI was originally deployed under just python.org slash pypi,

09:29 and then they eventually moved it to its own domain name.

09:33 It's pypi.python.org, and they just kept the slash pypi because that prefix was baked into scripts,

09:42 and it was easier to just change the domain name rather than changing the whole path.

09:47 Oh, right.

09:48 So it's just a compatibility thing.

09:50 Yeah.

09:50 And it was actually just recently we, and I knew that because we just recently, within the last year,

09:58 broke compatibility for people who were still using python.org slash pypi,

10:03 and apparently there were still people who had scripts and whatnot pointing to that

10:09 because when we broke that, we got people yelling at us for it.

10:13 You heard about it, huh?

10:14 Yes.

10:15 Well, that's the quickest, easiest way to tell if a service or a database

10:20 or something like that is required is like, turn it off and see if anyone screams, right?

10:25 Yes.

10:26 And with PyPI, that's one of our main ways we figure out what's needed or not

10:33 because it's never been documented well what you can depend on in PyPI, and sort of the entire history of PyPI has been someone depended on something weird in it,

10:44 some sort of implementation detail, and suddenly that became the API.

10:47 So a lot of things in PyPI are we have no idea what people are depending on,

10:53 so we just change things and pray.

10:56 Yeah.

10:56 So you step very lightly making changes, I suspect.

11:00 Yeah.

11:01 Particularly on what we call legacy PyPI, which is what's running now because it's a very old code base.

11:09 It's like 15 years old or so, and it's got no tests.

11:13 It doesn't run locally very easily.

11:16 Like, to actually run it locally, you have to modify the code and comment things out to get it to start up.

11:23 So it's fun.

11:26 Yeah.

11:26 I'm sure this is a really interesting experience in long-lived Python code.

11:33 So maybe it's a good place to start talking about the history.

11:36 Like, PyPI is not as old as Python, right?

11:40 And so there was a while where Python was just a thing.

11:43 There was no packaging, right?

11:44 And then PyPI came along.

11:46 What's the story there?

11:47 So, again, this predates me.

11:49 So this is what my understanding is from what people have told me.

11:53 So Python did originally have no packaging story.

11:59 So people started sort of making their own story with Makefiles.

12:02 Makefiles obviously are not super repeatable.

12:05 You had to write a whole new Makefile every time you switched to a different project.

12:09 They don't run by default on Windows, you know, et cetera, et cetera.

12:13 So someone whose name escapes me right now came up with distutils to replace all these Makefiles with a Python script that will run everywhere and it will sort of do all that stuff for you.

12:26 So, which at the time, great.

12:29 It was, you know, a really big step forward for Python and packaging in general.

12:35 And I think roughly around that same time, maybe a little bit after, there were several sort of efforts to create sort of a CPAN, but for Python, one name that I know of was Vaults of Parnassus.

12:49 I don't fully know all the names or what their specific implementations were.

12:54 But then Richard Jones, I believe, came up with the idea of PyPI and sort of implemented a proof of concept of what PyPI could be.

13:04 And they deployed it to Python.org.

13:06 And we're still running that today.

13:09 That proof of concept that was originally designed to be replaced quickly with the real thing.

13:14 I'm always a little suspicious or leery of proof of concepts, showing them to people who are like, oh, this is amazing.

13:24 Let's use this.

13:25 We don't even have to.

13:25 We're almost done.

13:26 Let's go.

13:27 Right.

13:27 Yes.

13:28 It can be a shifty, soft foundation, I suppose.

13:32 Yes.

13:33 And an important thing to realize about PyPI's code quality, it's not great.

13:40 It was written 15, mostly 15 years ago with people hacking on it since then.

13:45 And it predates much of what makes up the modern web stack.

13:51 Django didn't exist.

13:52 Pyramid didn't exist.

13:54 Flask didn't exist.

13:55 I think WSGI might have existed barely, or they were like contemporaries.

14:00 So the original PyPI didn't use WSGI.

14:03 It just sort of wrote its own Python handler.

14:07 I think it used CGI.

14:10 So a lot of that code is still there.

14:12 And a lot of what we've done to try and semi-modernize it has been how do we hack in WSGI support into this thing that expects CGI?

14:22 How do we deploy this?

14:24 How do we use a modern web server instead of this little script that just happens to sit inside the thing?

14:29 Yeah.

14:30 So it's got a lot of SOAP in it because it was big 15 years ago.

14:35 Right.

14:35 Of course.

14:36 A lot of its problems really just stemmed from the fact it was written before we knew how to do good websites.

14:43 Yeah.

14:44 What year was that when the prototype was created?

14:47 I believe it was 2003.

14:48 Okay.

14:50 So there were some examples of serious web activity, serious websites, right?

14:56 That's, like, just post-dot-com days.

14:59 But still, Python was not nearly as polished around the web story there, was it?

15:05 Yeah.

15:05 Correct.

15:05 Cool.

15:06 Yeah.

15:06 Okay.

15:06 So that's really interesting.

15:07 So what's the relationship with easy install these days?

15:11 Easy install lets you install packages.

15:13 pip lets us install packages.

15:15 It's all pip these days, right?

15:17 Is there still a reason to use easy install?

15:19 So easy install still exists.

15:21 It does a few things that pip doesn't do.

15:25 And pip largely doesn't do them on purpose.

15:28 Like easy install supports multi-version installs where you have a single Python environment and you can install multiple versions of, say, requests into that single Python environment.

15:41 And then at runtime, it will generate a sys.path that has the correct version for whatever thing you're running on that sys.path.

15:51 pip doesn't do that, which I believe is a good thing. pip prefers to use...

15:57 Is that because pip really assumes the presence of virtual environments and it's like, if that's what you want, just create a virtual environment, it's a different version?

16:04 Yes.

16:05 Correct.

16:05 So pip says use virtual environments, create explicit environments, and install things into there.

16:13 Easy install works with virtual environments too, but it takes a different approach.

16:19 Easy install sort of says you declare in your script or in your script wrappers, et cetera, what you depend on.

16:29 And we will sort of create a dynamic virtual environment in memory by munging the sys.path, whereas with virtual environments you create an explicit environment, named by your file system path.

16:42 And then, you know, pip will install things flat, you know, just a single version into that environment.

16:48 That's a cool feature that you could hack it together and have multiple versions, but it seems like it's almost more trouble than it's worth.

16:55 Yeah, so it's certainly an interesting feature.

16:58 I don't think one is necessarily better than the other.

17:01 They each have their trade-offs.

17:04 We've sort of settled around using the virtual environment, a single flat install system, and I don't think there's enough of a reason to go back to the easy install sort of multi-version install system.

17:17 Just because, you know, I think everyone's sort of figured out how to use the virtual environments.

17:23 And, you know, I think trying to change gears now would just be a big disruption for not a whole lot of benefit.

17:29 But if we had never done that, I think we would be in a perfectly fine place now.

17:34 We just have a different mechanism for doing things.

17:37 Okay, cool.

17:39 So what's the relationship between the Python Packaging Authority and the Python Software Foundation?

17:46 Yeah, so the PyPA is sort of not a real thing, sort of is a real thing.

17:52 You know, it's not an official organization.

17:55 We don't have a 501(c)(3) or whatever they're called.

17:59 So Python Packaging Authority is like a shadow organization pulling the strings of Python packaging.

18:04 Yeah, sort of.

18:05 Yeah, yeah.

18:07 You can think of it that way.

18:09 You know, it sort of came around because pip was putting itself on GitHub and they needed an organization name.

18:18 So they just kind of jokingly called it PyPI or PyPA.

18:21 And I have too many things in my head.

18:23 It's good.

18:23 P-Y-P something.

18:25 I confuse them.

18:27 Yeah, so they sort of jokingly called it PyPA.

18:29 And then as we sort of, this sort of push in the past couple of years to really standardize around these things and fix things and get moved forward, we just sort of started to take that name and say, okay, we're going to really take this name that started out as a sort of joke.

18:44 And we're going to use this for a real thing.

18:46 And it's sort of just an umbrella organization where it's like, okay, you're working on something that's packaging in Python.

18:52 You can bring your project into PyPA.

18:55 We don't have very many rules.

18:57 I think the only rule we really have is that you have to, your project has to be governed by our code of conduct.

19:02 And beyond that, you know, you can run your project however you want, use whatever VCS you want, et cetera, et cetera.

19:08 But it sort of provides a central location for someone who's, okay, I want to work on packaging stuff.

19:13 You know, go to the PyPA, you know, pages.

19:16 You know, you can see the list of projects.

19:19 And then since there's a lot of cross-pollination between people who work on these projects, kind of makes managing permissions and stuff a bit easier.

19:26 But, like, funding, we're starting to try and get funding and stuff available for that.

19:32 So that's largely going through the PSF.

19:34 You know, so the PSF is sort of the legal entity that we hang our stuff off of.

19:39 But the PyPA is sort of, you know, just our little unofficial organization.

19:43 You can think of it similar to Python Dev, how Python Dev manages CPython.

19:48 But, you know, the legal trademarks and stuff sort of hang off of the PSF.

19:54 Right.

19:55 Okay.

19:55 So since the Packaging Authority is just, like, a loose group of people who work together on packaging, there's no legal component there.

20:05 Like, for example, I made a donation to PyPI when you guys announced, like, hey, we need some supporters and whatnot.

20:13 Small one, but still, to do that, I actually went to, through the PSF website.

20:17 And I sort of donated it to them.

20:19 And then they forwarded it on, right?

20:21 So for things like that, where you need an entity, PSF is kind of there to support you.

20:26 Yeah, exactly.

20:27 It frees us up from having to deal with, you know, getting the board of directors and dealing with the mundane legalities.

20:35 PSF already does that.

20:37 And I believe they call it fiscal sponsorship or something along those lines.

20:41 But, you know, it's basically, you know, they manage being the legal entity behind this stuff.

20:47 And we manage actually doing the, you know, the code work and, you know, the roadmaps about how this is going to go forward and, you know, making those sort of decisions.

21:11 SnapCI is a continuous delivery tool from ThoughtWorks that lets you reliably test and deploy your code through multi-stage pipelines in the cloud without the hassle of managing hardware.

21:23 Automate and visualize your deployments with ease and make pushing to production an effortless item on your to-do list.

21:29 Snap also supports Docker and in-browser debugging, and they integrate with AWS and Heroku.

21:36 Thank SnapCI for sponsoring this episode by trying them with no obligation for 30 days by going to snap.ci/talkpython.

21:54 So I kind of want to focus on three themes around PyPI.

21:58 One is looking inside the infrastructure and the traffic and all that.

22:03 One is about this new version you call Warehouse.

22:06 And then also, like, the funding and support and that kind of thing.

22:09 So let's start with the traffic and infrastructure.

22:11 You wrote a cool blog post called Powering the Python Package Index.

22:17 In there, you really laid out a lot of the underlying technology.

22:20 You talked about some of the bandwidth requirements.

22:23 Like, I felt like I had bandwidth requirements until I saw what you guys are doing.

22:29 Like, you had a comment where you said, We had 293 terabytes of traffic serving 3 billion requests in April, for example.

22:41 Yes.

22:43 Do you want to talk a little bit about that?

22:44 I mean, that's pretty darn impressive.

22:45 Like, you know, you've got to go to the video places, the Vimeos, the YouTubes, and the Netflixes to get more traffic or more bandwidth than that, in some sense, right?

22:56 Yeah, you know, and I mean, the vast bulk of that bandwidth is taken up by the package files themselves, although a not insignificant amount of that is taken up by API requests and stuff.

23:07 I was actually curious.

23:08 I looked at May's numbers, and May's numbers are 343 terabytes and, you know, 3 billion and change HTTP requests again.
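The April and May figures quoted here are easy to sanity-check. This short sketch assumes decimal units (1 TB = 10^12 bytes) and rounds "3 billion and change" down to exactly 3 billion requests, so its results are approximations, not official PyPI statistics.

```python
# Rough arithmetic on the April and May 2016 traffic figures quoted above.
# Assumptions: decimal units (1 TB = 10**12 bytes), and "3 billion and
# change" rounded down to exactly 3 billion requests per month.
TB = 10**12

april_bytes = 293 * TB
may_bytes = 343 * TB
requests_per_month = 3_000_000_000

# Month-over-month growth in bandwidth.
growth = may_bytes / april_bytes - 1

# Average response size across all requests (package files and API calls).
avg_bytes_april = april_bytes / requests_per_month

print(f"Bandwidth growth April -> May: {growth:.1%}")
print(f"Average bytes per request in April: {avg_bytes_april / 1000:.1f} KB")
```

On these assumptions, May's bandwidth is roughly 17% above April's, somewhat more than the "10%" guessed in conversation, and the average request moves a little under 100 KB.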

23:20 That's really amazing.

23:21 And so it sounds like May had, I don't know, what is that, like 10% more traffic in terms of bandwidth than it did in April.

23:31 So does that mean, how do you interpret it?

23:34 Does that mean that we have more popularity of Python, like the popularity and usage of Python is growing?

23:40 Or does that mean people are spinning up more little tiny VMs and pip install requirements.txt more often?

23:49 Like, is this a use case variation or is this an adoption variation, you think, as these numbers are going up?

23:55 Yeah, so I actually don't have a whole lot of insight into exactly what it is.

23:59 It's something that I've tried to get some insight into, but it's kind of hard.

24:04 I will say one thing is that as of pip 6, which was released in, I want to say, the end of 2014, I think.

24:13 Don't quote me on that, though.

24:15 pip sort of aggressively caches locally.

24:19 So you type pip install requests and you get requests version 3.0 or 2.0 or whatever.

24:27 It only downloads that file once per computer, basically.

24:30 It uses HTTP caching.

24:32 We have 10-year-long cache headers on those.

24:36 So it's basically once per 10 years per file.
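To illustrate why a ten-year cache lifetime amounts to "once per machine": an HTTP cache treats a stored response as fresh while its age is below the `max-age` value in its `Cache-Control` header. This is a minimal sketch, assuming a header of the form `max-age=315360000` (ten 365-day years in seconds); the exact header PyPI sends isn't stated in the conversation, and pip's real caching goes through its own HTTP caching layer rather than a function like this.

```python
from typing import Optional

# Sketch of HTTP freshness checking based on Cache-Control: max-age.
# Assumption: the header value below (ten 365-day years) is illustrative,
# not a captured PyPI/Fastly response header.
TEN_YEARS = 10 * 365 * 24 * 60 * 60  # 315,360,000 seconds


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if absent or invalid."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A stored response can be reused without refetching while its age < max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age


header = f"max-age={TEN_YEARS}, public"
one_year = 365 * 24 * 60 * 60

print(is_fresh(header, one_year))       # True: a year-old cached file is still fresh
print(is_fresh(header, TEN_YEARS + 1))  # False: only now would it be refetched
```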

24:39 It's basically once per machine, right?

24:43 Like, probably the machine goes away before the cache does.

24:45 Yeah, unless you blow away the cache or something along those lines.

24:51 So we're definitely, I believe, not seeing an increase in people doing pip install on their own machine multiple times.

25:01 Or rather, I shouldn't say we're not seeing an increase.

25:03 We don't know if that's increasing or not because they download once and then we never see that download again from them.

25:09 No matter how many times.

25:10 Right.

25:10 It's probably not reflected in those numbers.

25:12 Yeah.

25:13 Yeah.

25:13 So this is going to be either new users or new machines, cloud machines.

25:19 You know, I think probably more people switching to cloud-based workflows, you know, have an impact on this.

25:25 You know, because each time you bring up a new cloud, you know, Docker containers, yes.

25:30 Each time you bring up a new Docker container or a new cloud machine, you start with a fresh cache that downloads everything again.

25:37 I think people are doing CI more and particularly things like Travis and such.

25:41 Unless you go out of your way to cache things between runs, Travis is going to give you a brand new cache each time.

25:49 You know, so I think a lot of it is things like that.

25:51 Yeah, I use SnapCI, and every time I check in something, it definitely pip installs a bunch of stuff.

25:57 I'm not sure if it has the cache populated before that or not.

26:02 But yeah, you're right.

26:03 Every check-in in some sense triggers that kind of behavior.

26:07 Yeah.

26:07 You know, and plus I do think we are – I don't know if Python itself is growing in usage, but I think pip is, particularly since there's been a lot more push from the Python docs side and things.

26:19 Say, hey, you know, here's a thing that you can use to download other packages.

26:24 I think particularly Python 2.6 and 2.7 are getting kind of long in the tooth too.

26:29 And people are wanting some of the new features from Python 3 without actually switching to Python 3.

26:33 So they're installing backports and stuff, or they're reaching for things that aren't included in the standard library much more than they used to.

26:43 You know, I also think there is a – previously, PyPI wasn't very reliable.

26:51 You know, we went down fairly regularly, sometimes even in the middle of file downloads.

26:55 And sort of in the past three to five years, I believe it's gotten to a point where you can pip install something and be pretty confident you're going to download something and install it.

27:07 So I think just usability overall has made people more willing to use PyPI than they were in the past.

27:15 Yeah, and kudos to you guys for that, right?

27:17 Yeah, although, I mean, I would admit a vast amount of that has been Fastly.

27:22 I say Fastly is our secret scaling sauce because, you know, we really would not be able to do near the amount of traffic with the skeleton crew we have without offloading a lot of that to the CDN.

27:38 Right.

27:38 You guys don't have, like, a massive data center in San Antonio or something that's, like, in a bunker that you manage.

27:45 You're pushing this to the cloud like all the modern companies, right?

27:48 Yep, yep.

28:49 And actually, here's what PyPI Legacy runs on.

27:53 We have three web nodes for all of that.

27:56 And we have a Heroku database server.

27:59 And, you know, we store our files in S3.

28:03 And that's really about it for the infrastructure for PyPI itself.

28:08 Okay, excellent.

28:09 And can you give us a sense of what it costs to run PyPI?

28:13 Yeah, so for us, the cost is zero.

28:17 Besides the people's time, right, of course.

28:21 Yeah, because we are lucky to have all these companies donate services.

28:28 I think I clocked it at somewhere around $35,000 a month.

28:32 You know, some of those numbers are a little fuzzy because, like, for Fastly, the billing numbers we have are for all of Python's use of Fastly.

28:39 Right.

28:40 But you probably represent the vast majority of traffic.

28:45 Yeah, well, like, in May, we said we have 340-some terabytes for PyPI.

28:51 For Fastly in general, we had 399 terabytes.

28:55 You know, and our Fastly bill in May was $33,000.

29:01 You know, we have $6,000-some in Rackspace for all of Python.

29:06 And then we have, you know, DNS and such there.

29:10 So not counting people time, you know, we're roughly $35,000, maybe a little bit more a month for just PyPI.

29:17 Okay.

29:18 This is not something that you could just run if you wanted to, right?

29:23 This takes the support of the community and companies like HP that are really backing it and Fastly.

29:29 Oh, absolutely.

29:30 Because there's no person that's going to go, you know, I really believe in this project, so here's $35,000 a month to pay for bandwidth.

29:38 Yeah, you know, I mean.

29:39 I guess there could be people.

29:41 Maybe you can find, you know, Bill Gates or someone.

29:44 You could pitch it to them.

29:46 That's almost $400,000 a year.

29:49 That's, you know, that's more than most people make.

29:51 But outside that, yeah.
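The cost figures mentioned above can be put side by side with a little arithmetic. The sketch uses only numbers quoted in this conversation; the PyPI share of the Fastly bill is a naive estimate that assumes cost scales linearly with bandwidth, which real CDN pricing need not do.

```python
# Back-of-the-envelope annualization of the donated-infrastructure figures
# quoted in the conversation (May 2016). Pro-rating the Fastly bill by
# bandwidth share is an assumption, not how Fastly actually bills.
pypi_monthly_estimate = 35_000  # "roughly $35,000 ... a month for just PyPI"
fastly_bill_may = 33_000        # Fastly bill for all of Python in May
pypi_tb_may = 343               # PyPI traffic in May, in TB
all_python_tb_may = 399         # all Python Fastly traffic in May, in TB

annual_estimate = pypi_monthly_estimate * 12
pypi_share = pypi_tb_may / all_python_tb_may
pro_rated_fastly = fastly_bill_may * pypi_share

print(f"Annualized: ${annual_estimate:,}")
print(f"PyPI share of Fastly bandwidth: {pypi_share:.0%}")
print(f"Naive PyPI share of the Fastly bill: ${pro_rated_fastly:,.0f}")
```

At $35,000 a month the annualized total is $420,000, so "almost $400,000 a year" is, if anything, an underestimate.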

29:53 Let's talk about the people involved.

29:56 So we know that HP is basically supporting you to do what needs to be done for Python packaging, mostly around PyPI, but in the general sense.

30:06 How many people work on these projects?

30:09 And how many people would you say are responsible for keeping "pip install a thing" possible?

30:16 Ernest Durbin helps a lot with PyPI itself.

30:21 He is not paid for that.

30:22 He's completely volunteer.

30:24 He's sort of the ops side of things.

30:27 I do a little bit of ops.

30:29 I'm not great at it.

30:30 So he does a lot of the operation stuff and is super helpful with that.

30:35 You know, Richard Jones does things on a volunteer basis also.

30:39 He's largely stepped back lately.

30:41 Just because his time has been taken up by other things.

30:44 But he still comes around and helps with support requests and stuff, which I don't have the time to do.

30:50 You know, then we start getting into things like setuptools, which pip install needs in order to handle source distributions.

30:58 You know, that's largely Jason Coombs, jaraco.

31:03 That's largely his baby now.

31:05 You know, then you start talking about pip.

31:06 You know, I'm there as well.

31:08 You know, there's Paul Moore, and Marcus is there, and a few other people, but their time is all limited because they're doing it in their spare time, which I do in my spare time as well, on top of the HP time.

31:26 But most people outside of myself have a lot more limitations to the time they can spend on it just because they're completely volunteer based.

31:39 Yeah.

31:39 So just a handful of people full time.

31:43 Oh, yeah.

31:44 And everyone else is just a little here and a little there as they can.

31:47 Yeah.

31:48 Do you feel like PyPI is underfunded?

31:50 Yeah.

31:51 You know, I do believe PyPI is underfunded.

31:55 You know, it's sort of the tragedy of the commons.

32:00 You know, it's used by a lot of people, you know, as evidenced by our traffic numbers.

32:06 That or someone's using it a lot.

32:09 While we do get support from, you know, a number of companies, realistically, I forget the exact number off the top of my head, but it's fewer than 10 companies who support us.

32:20 And I'm pretty sure more than that use PyPI.

32:23 They could be just installing a lot of stuff.

32:26 Of course, I mean, everybody uses it.

32:29 It's absolutely one of the foundational things, right?

32:33 It is the thing that facilitates batteries included in the broad sense of Python, right?

32:40 Yeah.

32:41 You know, I absolutely believe so.

32:43 You know, I think it's one of the most important things, if not the most important thing, provided by the PSF infrastructure.

32:53 It's debatable versus, you know, hg.python.org and the Mercurial instances and stuff.

32:57 I think it's one of the most important things that, you know, the PSF infrastructure does provide.

33:03 You know, and certainly when it goes down, people are very quick to notice. I get notifications through Twitter and email and IRC before our monitoring even notices it's down.

33:19 People are telling me it's down.

33:20 A lot of people depend on it.

33:22 Is it stressful to you to be responsible for it?

33:26 So there's definitely stress.

33:27 Over the years, I've gotten better at dealing with it.

33:30 You know, a lot of people, it's one of those things where, you know, when it's working, people don't think about it.

33:36 And when it's not working, they're out in droves to tell you it's not working.

33:42 You know, and I've never actually been technically on call for PyPI.

33:48 Ernest has been, Noah Kantrowitz has been technically on call.

33:53 But I say, you know, it's basically impossible for me to not be on call unless I completely unplug myself from the internet.

34:03 You have to go into the forest and not look at skywriting airplanes.

34:07 Yeah, because, you know, I'm so publicly known and associated with PyPI that, you know, people are reaching out to me as soon as it's down.

34:20 And to their credit, they're mostly trying to be helpful to let me know it's down.

34:24 But, you know, so it becomes a flood of communication anytime it suffers serious downtime.

34:31 Now we've sort of engineered it so that pip install very, very rarely goes down.

34:38 But uploads and stuff kind of go down more often than that.

34:43 But that affects a much smaller percentage of people.

34:46 So that doesn't get as bad.

34:48 Absolutely.

34:49 I'm really excited to tell you about a new sponsor of the show, Rollbar.

35:07 One of the frustrating things about being a developer is dealing with errors.

35:11 Relying on users to report errors, digging through log files, trying to debug issues, or a million alerts just flooding your inbox and ruining your day.

35:18 With Rollbar's full stack error monitoring, you get the context, insights, and control you need to find and fix bugs faster.

35:25 It's easy to install.

35:26 Start tracking production errors and deployments in eight minutes or less.

35:30 Rollbar works with all the major languages and frameworks, including the Python ones.

35:35 Django, Flask, Pyramid, as well as Ruby, JavaScript, Node, iOS, and Android.

35:40 You can integrate Rollbar into your existing workflow.

35:43 Send error alerts to Slack or HipChat, or automatically create new issues in Jira, Pivotal Tracker, and lots more.

35:50 We have a special offer for Talk Python listeners.

35:53 Visit Rollbar.com slash Talk Python To Me and get the bootstrap plan for free for 90 days.

35:58 That's 300,000 errors tracked for free.

36:01 But hey, just between you and me, I really hope you don't need that many errors.

36:05 Rollbar is loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch, and more.

36:12 Give them a try today.

36:13 Go to Rollbar.com slash Talk Python To Me.

36:16 If we had more people working on it and more people whose time was more seriously dedicated to it like yours is,

36:31 we could probably do some amazing stuff, right?

36:35 So there's a way to donate to this.

36:39 You actually have donate.pypi.io as the landing domain, I guess you would call it, because it redirects.

36:47 But that allows people to donate.

36:50 Like, what if people are working for a company and they're like, yeah, our whole company does $100 million of revenue and depends upon this thing?

36:58 And we've actually not even helped support it.

37:01 Maybe we should.

37:02 Like how would they get involved?

37:04 Maybe those would be the biggest bang for the buck if we could get some big companies to step up.

37:08 Yeah, so obviously they can donate.

37:10 If that's the way that they want to get involved, donations are easy.

37:13 And they're a tax write-off in the U.S.

37:16 But if they want to donate developer time, one of the easiest things to do would be to contribute to Warehouse,

37:25 which is on GitHub at github.com/pypa/warehouse.

37:30 That is deployed now to PyPI.io.

37:33 That's where PyPI is going to live in the future.

37:36 And, you know, my big push right now is trying to get Warehouse to the point where we can say, okay, this is ready enough.

37:43 Let's start directing people here by default.

37:46 Because PyPI is sort of an old code base.

37:49 It's slowly falling apart.

37:52 Uploads, we get a sort of base 10% error rate on uploads right now.

37:57 You know, it's just falling apart.

37:59 It's sort of a full-time job to keep it from falling apart, which doesn't give me any time to work on other things.

38:07 All your fingers are plugging holes in the dam rather than, like, building the dam.

38:11 Yeah?

38:12 Yeah.

38:13 So I've sort of ignored some of the holes willfully to try and free up time to get Warehouse ready to launch.

38:20 Because, you know, the idea is that hopefully Warehouse will get slotted into place and a lot of things will be better.

38:28 I'm sure it's going to break things for a lot of people.

38:30 But it will become more joyous to work on, and maybe you'll get more contributors using stuff that people understand, right?

38:37 Yeah.

38:38 Yeah.

38:38 Oh, absolutely.

38:39 Like, PyPI Legacy is largely two files, both over 4,000 lines long.

38:46 And I think I'm the only person left that actually understands it.

38:50 Richard hasn't touched it recently, and enough has changed that he would have to ramp back up.

38:57 And so far, I've personally had a 100% failure rate on getting new people involved in that, just because they take one look at the code and then they just kind of disappear.

39:06 It's not a fun code base to work on.

39:09 You know, it's particularly because you can't really run it locally.

39:13 So a lot of times, a change means: make the change, push it to production, or maybe to Test PyPI, which is sort of a staging/sandbox instance we have, depending on what the exact problem is.

39:26 And then just sort of pray and watch whether Sentry starts yelling at us.

39:29 Yep.

39:30 All right.

39:31 So Warehouse is a huge topic that people are interested in.

39:35 I want to get to that.

39:36 But before we move off this topic, you also wrote another blog post called A Year of PyPI Downloads.

39:41 And there was a lot of insight that could be gleaned from those numbers.

39:45 Can you talk about that really quick?

39:47 Yeah.

39:48 So sort of in general, in January of 2014, we started saving and archiving all of our download logs.

39:59 In 2014 and again in 2015, I believe it was, I sort of pulled those all down and crunched some numbers on those and pulled out some what I thought were interesting numbers largely around, you know, what versions of Python are being used, you know, what versions of packaging tools are being used, things like that.

40:15 And those blog posts were very well received.

40:20 So what I've done in the meantime is improve our metrics stream to the point where now every single download generates a row in a BigQuery database.

40:35 And that has got all sorts of information, like what version of Python downloaded it, what tool downloaded it, what country it came from, stuff like that.

40:45 And as of, let's say, a month ago, that data is now completely public for anyone to go and query as long as they have a Google account.

40:52 Nice.

40:52 Yeah.

40:53 Give me the link to where that is and I'll put it in the show notes.

40:56 Yeah.

40:56 So I'm not really that great at data visualization or pulling information from data.

41:03 You know, I tried to do my best with my year of PyPI downloads posts.

41:09 But I'm hoping that by making that completely public, A, people can make data-driven decisions about what versions of Python they support.

41:17 Because you can very easily, using an SQL-esque language, say what versions of Python are downloading X package over some period of time.
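The kind of question Donald describes, "which Python versions downloaded package X over some period", boils down to a filter followed by a group-by. This sketch shows that logic in plain Python over a few hand-made rows; the field names (`project`, `python`, `timestamp`) and the sample data are hypothetical stand-ins, not the BigQuery table's actual schema.

```python
from collections import Counter
from datetime import date

# Hypothetical download rows; the real BigQuery table has its own schema.
rows = [
    {"project": "requests", "python": "2.7.11", "timestamp": date(2016, 5, 2)},
    {"project": "requests", "python": "3.5.1", "timestamp": date(2016, 5, 3)},
    {"project": "requests", "python": "2.7.10", "timestamp": date(2016, 5, 9)},
    {"project": "requests", "python": "2.6.9", "timestamp": date(2016, 6, 1)},
    {"project": "six", "python": "3.4.3", "timestamp": date(2016, 5, 4)},
]


def downloads_by_python(rows, project, start, end):
    """Count downloads of `project` per major.minor Python version in [start, end)."""
    counts = Counter()
    for row in rows:
        # Filter: the project we care about, inside the half-open date window.
        if row["project"] == project and start <= row["timestamp"] < end:
            # Group: collapse "2.7.11" and "2.7.10" into the same "2.7" bucket.
            major_minor = ".".join(row["python"].split(".")[:2])
            counts[major_minor] += 1
    return counts


print(downloads_by_python(rows, "requests", date(2016, 5, 1), date(2016, 6, 1)))
```

In BigQuery the same filter and grouping would be expressed in its SQL dialect and run over the full download history rather than a handful of rows.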

41:26 You know, I'm hoping that people who are actually good at data visualization and pulling meaning from data can take a look at it and come up with some interesting information.

41:36 Particularly maybe as it relates to, you know, correlating that with, you know, GitHub data or Bitbucket data or, you know, some other kind of data.

41:45 Yeah, that sounds really cool.

41:47 And if you give me a link to the details, I'll be sure to put it in the show notes.

41:51 Yep, yep.

41:52 It's not greatly documented right now.

41:54 It's basically a post to distutils-sig telling people about it.

41:59 The historical data is not in there yet.

42:01 You know, so that data starts, I want to say, March-ish, sometime late March.

42:07 I'm backfilling from January of 2016 right now.

42:13 And then I have to backfill back beyond that.

42:17 But that takes a bit more effort because the logs aren't in the correct format.

42:21 So I have to come up with something to munge those logs into the new correct format.

42:27 Yeah, okay.

42:27 Well, that sounds really promising.

42:28 And, you know, there's a lot of data scientists and data visualization professionals that listen to my show.

42:34 And when I put data out, they seem to do amazing stuff with it in a surprisingly short amount of time.

42:40 So maybe someone will come up with something cool for that.

42:43 Let's talk about Warehouse because that's a really interesting project.

42:47 And I sent some messages out on Twitter.

42:50 I said, hey, what should I ask Donald while we're talking?

42:53 And everybody came back with some variation of talking about Warehouse.

42:58 So Warehouse can be found at PyPI.io.

43:01 What is it?

43:02 Yeah, so PyPI.io is the production-ish.

43:07 I say ish because we're not monitoring it or anything.

43:11 Deployment of Warehouse, which is PyPI 2.0.

43:15 It is backed by the same PostgreSQL database, the same S3 instance, et cetera.

43:21 So anything that changes on PyPI changes on Warehouse, and the reverse is true.

43:27 Right.

43:28 So they're watching the same data stores.

43:30 It's more of the front-end and API implementation, right?

43:34 Correct.

43:34 So right now, the read-only portions of it, which is 90-some percent of our traffic, are pretty much done.

43:43 There are a few.

43:43 If you go through the UI, you'll see a few to-do stuff that we have to either finish or comment out before we make it official.

43:52 A lot of the author UI stuff is not done yet.

43:56 Okay.

43:56 The client side, the read-only UI bit is really nice.

44:01 It looks, you know, I feel like I'm going from SourceForge to GitHub equivalent of experience.

44:07 It's much better than the current story.

44:11 So I think that's going to be delightful.

44:13 That's cool.

44:14 Yeah, yeah.

44:15 And, you know, I think it was Nicole Harris who's done the design for that, and I think she's done a phenomenal job on it so far.

44:24 You know, particularly going from what we had to what this is, you know, it really does feel like you're jumping forward an era or two, you know, as far as what modern design looks like.

44:37 Oh, yeah, absolutely.

44:38 No, I was just saying, you know, she's put a big focus on trying to make PyPI a whole lot nicer and better to use, surfacing the information people care about and hiding information that people really don't care about, or maybe omitting it completely if it's just confusing information or information that only a few people really need to know.

45:01 Yeah, nice.

45:02 So can you talk a little bit about what frameworks and internals you used to create it?

45:07 Yeah, so Warehouse is written in Pyramid.

45:10 Now it is.

45:12 Warehouse has sort of a sordid history: I started out using just Werkzeug and making my own framework, then it went to Flask, then it went to Django.

45:24 Now I've finally settled on Pyramid.

45:26 Why do you make those changes?

45:28 Like, what kept moving you along?

45:31 Yeah, so one of the things I wanted to do with Warehouse was to have 100% test coverage across the board.

45:40 You know, coming from PyPI, where we had zero test coverage.

45:44 One of the things that made any change very painful for me was figuring out what the impact was going to be, whether I was going to break code that I thought was unrelated but was really related.

45:55 So I really wanted to make sure we had great test coverage.

45:58 Unfortunately, a lot of the web frameworks out there tend to use a fair amount of globals.

46:05 There's some argument about whether they're thread locals or technically globals, but I call them globals.

46:10 I think they're globals.

46:11 You know, and, you know, that makes it more difficult to test things.

46:16 So I created my own framework on top of Werkzeug that was heavily influenced by Gary Bernhardt's Boundaries talk, where there were very few of these sort of bags of items.

46:32 There wasn't a user class that had, well, that was a data model and it had, you know, all these sorts of things you could do to a user hanging off of it.

46:40 So that makes it hard to test because you have to pass through a huge interface.

46:44 It's sort of like a miniature global.

46:46 You kind of have to pass through this huge interface, and to actually test things,

46:49 you have to provide an object that implements all those things, or else your tests are not really testing anything.
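The testing problem described here can be illustrated with a small, hypothetical contrast; this is not Warehouse code. A function that needs a whole `User` model forces tests to build or fake everything the model exposes, while a function that takes only the plain values it needs can be tested with literals. The second style is in the spirit of the Boundaries talk: keep decision logic in small, pure functions.

```python
from datetime import datetime, timedelta


# Hypothetical "bag of items" model: data plus many behaviours hanging off it.
class User:
    def __init__(self, username, last_login, db):
        self.username = username
        self.last_login = last_login
        self._db = db  # a real database handle in production

    def is_inactive(self, now):
        return now - self.last_login > timedelta(days=365)

    def send_reset_email(self): ...
    def delete(self): ...
    # ...and many more methods a test double would have to imitate.


# Boundary-friendly alternative: the decision depends only on plain values.
def is_inactive(last_login: datetime, now: datetime, limit_days: int = 365) -> bool:
    """Pure function: no model, no database, trivially testable."""
    return now - last_login > timedelta(days=limit_days)


now = datetime(2016, 6, 22)
print(is_inactive(datetime(2015, 1, 1), now))  # True: over a year since login
print(is_inactive(datetime(2016, 6, 1), now))  # False: three weeks ago
```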

46:56 And so we didn't have an ORM; it was using SQLAlchemy's expression layer to some degree.

47:03 It was also using just raw SQL in some places.

47:07 And one of the things that we, that I discovered doing that was, A, I was reinventing a lot of things, which, while fun, wasn't necessarily the best use of my time.

47:17 You know, two, it brought us back to the same problem we had with PyPI, where other people found it hard to contribute to it because it was using something that, while it fit my headspace quite well,

47:30 didn't necessarily fit other people's headspaces.

47:33 And, you know, three.

47:34 Right.

47:34 Nobody had experience with it, right?

47:36 Yeah.

47:37 And, you know, and three, it, a lot of decisions were bottlenecked on me because it was, well, how, how do we do X thing in this completely custom framework?

47:47 Well, nobody knows except Donald because Donald has to invent it.

47:50 And so a lot of things were bottlenecked on me.

47:52 So, you know, me and Richard sort of talked about it and we decided, you know, we need to move to something more standard.

47:58 And I can't recall exactly if it was Flask or Django that came first.

48:02 I know it was ported to Flask and I wasn't really too fond of that because one of Flask's big API things is thread locals as part of the API.

48:13 And my experience with that is it becomes hard to then do anything in Flask without adding increasing amounts of thread locals.

48:23 And so I kind of nixed that because I just didn't like the thread locals.
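To make the thread-local concern concrete, here is a framework-free sketch, not Flask's or Pyramid's actual internals. In the first style, code anywhere reaches into a per-thread "current request" global, so a test must set up that hidden state first; in the second, the view receives the request as an argument, which is how Pyramid views are normally written (`def view(request): ...`). The `Request` class and helper names are hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical minimal request object."""
    params: dict = field(default_factory=dict)


# Style 1: an implicit, per-thread "current request" (a thread-local global).
_local = threading.local()


def current_request() -> Request:
    return _local.request  # AttributeError if nothing set it up first


def search_view_implicit() -> str:
    query = current_request().params.get("q", "")
    return f"results for {query!r}"


# Style 2: the request is passed in explicitly, as in a Pyramid view.
def search_view_explicit(request: Request) -> str:
    query = request.params.get("q", "")
    return f"results for {query!r}"


# Testing the explicit version needs only an argument...
print(search_view_explicit(Request(params={"q": "pip"})))

# ...while the implicit version needs hidden state set up beforehand.
_local.request = Request(params={"q": "pip"})
print(search_view_implicit())
```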

48:26 I went to Django and Django is a great code base.

48:30 You know, I've used Django a lot.

48:33 I just started with Django.

48:34 I started porting it to that and I discovered that the Django ORM is not really powerful enough to handle all of the things that PyPI does in its database that it's sort of accumulated over time.

48:48 We have, you know, we have tables with composite keys.

48:51 We have composite foreign keys.

48:53 We have tables without any primary keys.

48:55 You know, we have, you know, a number of things.
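For readers unfamiliar with these schema features, here is a small sketch in SQLite via Python's standard `sqlite3` module. The table and column names are hypothetical and far simpler than PyPI's real schema; the point is only what a composite primary key, a composite foreign key, and a table without a primary key look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript(
    """
    CREATE TABLE releases (
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        PRIMARY KEY (name, version)          -- composite primary key
    );

    CREATE TABLE release_files (
        name     TEXT NOT NULL,
        version  TEXT NOT NULL,
        filename TEXT NOT NULL,
        FOREIGN KEY (name, version)          -- composite foreign key
            REFERENCES releases (name, version)
    );

    CREATE TABLE journal (                   -- a table with no primary key at all
        name   TEXT,
        action TEXT
    );
    """
)

conn.execute("INSERT INTO releases VALUES ('requests', '2.10.0')")
conn.execute(
    "INSERT INTO release_files VALUES ('requests', '2.10.0', 'requests-2.10.0.tar.gz')"
)

rejected = False
try:
    # No ('requests', '9.9.9') release exists, so the composite FK rejects this.
    conn.execute("INSERT INTO release_files VALUES ('requests', '9.9.9', 'x.tar.gz')")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

An ORM that assumes every model has a single-column primary key has to be bent quite a bit to map tables like these.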

48:58 And, you know, I did get the user things ported over to Django.

49:02 Then I just sort of gave up and said, you know, this is too much work to actually get this to slot it into the shape that Django wants it to be.

49:10 And I said, I'm going to just have to use SQLAlchemy for this, which is another great ORM.

49:16 But then once you sort of throw away the Django ORM, you lose a lot of the power of Django.

49:22 And I also wasn't using the Django template language because prior to all of this happening, I worked on a sort of another alternative front end to PyPI called crate.io.

49:32 This was back before I was a PyPI administrator, and it was written in Django.

49:35 I used the Django template language in that.

49:38 And that became a bottleneck because of how big some of our HTML pages were to list all the packages.

49:44 That became a serious bottleneck.

49:46 So then we're sitting there.

49:48 So we use Jinja2.

49:50 So then I'm sitting there looking, okay, we have Django.

49:54 It's kind of hard to fit our database into Django's ORM.

49:58 So we can't use that.

49:59 The DTL was too slow at the time to really use that.

50:02 I don't know if it is now.

50:04 So we've sort of pared Django down to a glorified request router.

50:09 And we've thrown away a lot of the power of Django.

50:12 There's no third-party apps.

50:14 There's no admin.

50:15 So then I took a look at Pyramid.

50:17 And Pyramid had a lot of the things I liked about Django.

50:20 There were really no thread locals.

50:22 There is a thread local, but it's optional whether you use it or not.

50:27 And, you know, it had enough flexibility in it to use whatever tools we wanted.

50:33 So I could bring in SQLAlchemy and use that.

50:35 I could bring in, you know, sort of change things around to sort of suit my purposes better than I could with Django.

50:43 Now, the flip side of that is it doesn't do as much out of the box.

50:47 You have to kind of configure it and make it do what you want to do.

50:51 But, you know, given the long history of PyPI and all the sorts of little weird things that's grown over time,

50:57 that fit a lot nicer with what we needed to do than Django does.

51:02 Because while Django is great, once you sort of get out of the Django workflow,

51:08 you start fighting the framework a lot more than you do with Pyramid.

51:12 Yeah, okay.

51:13 That kind of makes sense to me.

51:15 The testing part with Flask, that I didn't know about.

51:18 But I can imagine the stuff with Django.

51:20 You know, I think you and I settled on, for very different projects, the same technology stack, more or less, like for all my web properties,

51:27 I'm using SQLAlchemy and Pyramid and whatnot.

51:29 So, yeah, very interesting.

51:32 Let's see, we have just a few minutes left, and I have a couple of questions from the listeners,

51:37 and then maybe one more thing I wanted to make sure we touch on.

51:41 Mahmoud Hashemi, who's also been a guest on the show, asks if real-time download counters are coming back.

51:47 Yeah, they are.

51:49 So, the old metric stack sort of fell apart and died.

51:54 because it was sort of hacked onto the side of PyPI, like a lot of what PyPI does has been.

51:58 The new metric stack, which is based around BigQuery and such, is sort of designed to allow us to bring that back,

52:05 as well as give us this sort of archival ability to query all sorts of things.

52:11 I disabled them just because they were zero all the time, because the thing had broken, and I didn't have time to fix it.

52:18 But we are planning on bringing those back, and hopefully they'll actually work this time, and it'd be nice.

52:26 Yeah, would they reappear in the Warehouse time frame, or would you bring them back to the legacy version?

52:30 It'll probably be in the Warehouse time frame.

52:33 Might not be until after the launch of Warehouse, unless someone feels like coming around and figuring out how to do all that beforehand.

52:42 I'm new to BigQuery.

52:44 So far, I really enjoy using it, and it seems to work really great.

52:48 But one of its constraints is that queries can take a couple seconds to a minute or two to run,

52:56 which is fine if you're just looking at it for data archival.

53:00 Not great to do in the middle of a request-response cycle.

53:04 Yeah, that's for sure.

53:05 So we need to do some work around figuring out, okay, how do we take this data that's in BigQuery

53:10 and put it in a format that we can look at in Warehouse in the request-response cycle,

53:18 and it'll just take someone to sit down and figure that out.
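One common way to bridge slow analytical queries and fast page loads, offered only as a hypothetical sketch (the conversation doesn't say how Warehouse will do it), is to run the expensive query periodically outside any request and keep the results in a small lookup structure that request handlers can read instantly. Here `fake_slow_query` stands in for a BigQuery query that might take a minute; the class, its methods, and the numbers are invented for illustration.

```python
import threading
import time
from typing import Callable, Dict


class DownloadCountCache:
    """Hypothetical cache: refreshed out-of-band, read in the request path."""

    def __init__(self, fetch_counts: Callable[[], Dict[str, int]]):
        self._fetch_counts = fetch_counts  # slow query, e.g. against BigQuery
        self._counts: Dict[str, int] = {}
        self._refreshed_at = 0.0
        self._lock = threading.Lock()

    def refresh(self) -> None:
        """Run the slow query outside any request, then swap the results in."""
        counts = self._fetch_counts()
        with self._lock:
            self._counts = counts
            self._refreshed_at = time.time()

    def get(self, project: str) -> int:
        """Constant-time lookup, safe to call while handling a web request."""
        with self._lock:
            return self._counts.get(project, 0)


def fake_slow_query() -> Dict[str, int]:
    # Stand-in for a query that could take seconds to minutes; made-up numbers.
    return {"requests": 1_200_000, "six": 950_000}


cache = DownloadCountCache(fake_slow_query)
cache.refresh()  # in production this might run on a schedule, not per request
print(cache.get("requests"), cache.get("unknown-project"))
```

The trade-off is staleness: counts are only as fresh as the last refresh, which is usually acceptable for download statistics.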

53:22 I just haven't had the time to do that.

53:23 Sure.

53:24 So, final thing on Warehouse.

53:27 Nicola Kentar asks, what's the time frame for shipping that?

53:31 So I've given a few dates before.

53:33 We've passed them all so far.

53:35 We've not historically been great at estimating how long until it's ready to go.

53:40 I think we're pretty close, though.

53:41 I've started – I just recently told people on distutils-sig to start switching their uploads to Warehouse,

53:49 largely because we have a 10% standard failure rate on uploads to legacy.

53:55 So far, everyone who's done that has said it's worked great.

53:58 We just recently committed to Twine, which is a tool to replace setup.py upload.

54:04 We switched the master branch to default to uploading to Warehouse; that's not been released yet.

54:10 But I'm hoping that that will be released in the next couple weeks.

54:14 And if that works out great, hopefully we will get CPython itself switched to Warehouse for uploads.

54:21 And, you know, then that will hopefully propagate out and people will use that and that will solve a big problem.
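Until a Twine release with the new default is out, uploads can be pointed at Warehouse explicitly through a `~/.pypirc` file, which Twine reads. This sketch generates such a file's contents with Python's `configparser`. The URL `https://upload.pypi.io/legacy/` is an assumption based on the pypi.io deployment discussed here (today's PyPI upload endpoint is `https://upload.pypi.org/legacy/`), and the username is a placeholder.

```python
import configparser
import io

# Hypothetical ~/.pypirc contents pointing uploads at Warehouse.
# Assumption: the repository URL matches the 2016 pypi.io deployment.
config = configparser.ConfigParser()
config["distutils"] = {"index-servers": "warehouse"}
config["warehouse"] = {
    "repository": "https://upload.pypi.io/legacy/",
    "username": "your-username",  # placeholder
}

buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

With a file like that in place, `twine upload -r warehouse dist/*` would select the `warehouse` section.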

54:28 As far as switching the actual, you know, like redirecting the old domain, I think it's soon.

54:34 I don't have a good target for exactly when.

54:38 But hopefully – I'm really, really hoping it'll be in 2016 because I'm tired of the old code base.

54:47 I want it to die a thousand deaths.

54:51 Yeah, I can imagine.

54:53 That sounds like a really cool new version.

54:56 And it sounds like it's going to be great for everybody when it's working and it's the default.

55:01 Certainly, my playing around with it, you know, it seems like a nice place to be.

55:05 So I have a bunch of other questions I'd love to talk to you about.

55:09 But we're just running out of time.

55:11 So maybe we'll have to leave it there.

55:14 Maybe when you guys do ship it, when it flips, maybe we can come back and do some kind of celebratory show.

55:20 Sure.

55:20 To celebrate the actual flipping of the DNS or the redirect.

55:24 Cool.

55:25 So, there are a couple of questions I always ask at the end of the show, and I think this one is particularly interesting for you given your relationship with PyPI.

55:35 I ask all my guests this: there are over 80,000 packages on PyPI these days.

55:41 And like we talked about, there are so many amazing little packages people can grab to make their programs awesome.

55:49 Which one do you think is amazing and would like to call attention to, maybe not Requests, which everybody knows?

55:54 You know, something like that.

55:55 Yeah.

55:56 I would have to say bpython.

55:59 It's sort of an alternative REPL for Python.

56:04 It's got the, you know, syntax highlighting.

56:06 You know, it's got autocomplete as you type things out.

56:11 It works really well, and I install it in all my virtualenvs because it works a lot nicer than the built-in one, I think.

56:19 I use it a lot.

56:20 Okay.

56:21 That's awesome.

56:21 And when you write Python code, what editor do you use?

56:25 So lately I've been using Atom.

56:27 I've been trying it out for the past two or three months.

56:30 So far I've liked it.

56:31 You know, previously I was using Sublime Text 3.

56:34 Lately it's been Atom.

56:36 Okay, cool.

56:37 Yeah.

56:37 It seems like Sublime and Atom are on the same level.

56:41 They appeal to a similar group of people.

56:45 That's cool.

56:46 All right.

56:47 Any final call to action while you've got the mic?

56:50 Yeah, I would love it if anyone who uses PyPI could come and contribute to it or to pip, or talk to your companies about contributing developer resources or even just some money.

57:03 Anything helps.

57:04 And, you know, hopefully we can keep moving things forward and everyone will be happy and stop yelling at me when things go down.

57:12 That would be amazing.

57:14 I just want to second that as well.

57:15 Like, try to convince your companies, if they depend heavily on Python, to contribute just a little bit.

57:22 Because imagine a world where pip install basically didn't work.

57:26 You had to go piece that all back together.

57:28 That's something we don't want to see happen.

57:31 And it would be really great if we could make it a much more stable, supported, active thing instead of putting all the weight on you, Donald.

57:37 And the few other guys we mentioned, right?

57:40 Yep.

57:40 And remember, donations are tax deductible in the U.S.

57:45 Awesome.

57:46 So there's just so many more things we could talk about, but we're going to have to leave it here just for the sake of time.

57:51 So, Donald, thanks for being on the show.

57:53 It was great to talk to you.

57:53 Yep.

57:54 Thanks for having me.

57:54 Yeah.

57:55 Bye-bye.

57:56 This has been another episode of Talk Python To Me.

57:59 Today's guest was Donald Stufft, and this episode has been sponsored by SnapCI and Rollbar.

58:04 Thank you both for supporting the show.

58:06 SnapCI is modern continuous integration and delivery.

58:09 Build, test, and deploy your code directly from GitHub, all in your browser with debugging, Docker, and parallelism included.

58:15 Try them for free at snap.ci/talkpython.

58:19 Rollbar takes the pain out of errors.

58:21 They give you the context and insight you need to quickly locate errors that might have otherwise gone unnoticed, until your users complain to you, of course.

58:28 As Talk Python To Me listeners, you can track a ridiculous number of errors for free.

58:32 Just go to rollbar.com/talkpythontome to get started.

58:37 Are you or a colleague trying to learn Python?

58:39 Have you tried books and videos that left you bored by just covering topics point by point?

58:43 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course, to experience a more engaging way to learn Python.

58:51 You can find the links from this episode at talkpython.fm/episodes/show/64.

58:57 Be sure to subscribe to the show.

59:00 Open your favorite podcatcher and search for Python.

59:02 We should be right at the top.

59:04 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

59:12 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

59:17 You can hear the entire song at talkpython.fm/music.

59:21 This is your host, Michael Kennedy.

59:23 Thanks so much for listening.

59:24 I really appreciate it.

59:25 Smix, let's get out of here.

59:28 Stating with my voice, there's no norm that I can feel within.

59:31 Haven't been sleeping.

59:33 I've been using lots of rest.

59:34 I'll pass the mic back to who rocked it best.

59:49 Thank you.