
#377: Python Packaging and PyPI in 2022 Transcript

Recorded on Thursday, Aug 11, 2022.

00:00 PyPI has been in the news for a bunch of reasons lately, many of them great, but also some with a bit of drama or mixed reactions. On this episode, we have Dustin Ingram, one of the PyPI maintainers and one of the directors of the PSF, here to discuss the whole 2FA story, securing the supply chain, and plenty more related topics. This is another important episode that people deeply committed to the Python space will want to hear. This is Talk Python To Me episode 377, recorded August 11, 2022. Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:09 This episode of Talk Python To Me is brought to you by Compiler from Red Hat. Listen to an episode of their podcast as they demystify the tech industry over at talkpython.fm/compiler. It's also brought to you by the IRL podcast, an original podcast from Mozilla. This season, they're looking at AI in real life. Listen to an episode at talkpython.fm/irl. Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai.

01:44 Dustin, welcome back to Talk Python To Me.

01:47 Yeah, it's great to be back. Good to see you.

01:49 Yeah, good to see you as well. It's lovely to have you back. We talked about the PyPA, the Python Packaging Authority, and about PyPI and all these things previously, and we're back to talk about them some more, with a particular focus on security.

02:05 Yes. Which is kind of my new focus in my day-to-day, with my job hat, my PSF hat, all that stuff.

02:11 It's fantastic when the job that you're paid to do lines up with these other things, right, you can kind of learn on the job and then it really applies quickly. So maybe let's just start there. You're at Google working on security there. Maybe tell us about what you're up to and how it ties together.

02:30 Last time we talked, I was working for Google Cloud as a dev advocate. I think people mostly know me from that: a lot of conference talks and things like that. But since then I've switched to a brand new team at Google that I'm really excited about. In general, we're an open source security team, and we don't work just on Google's open source libraries or whatever; we broadly work on open source security across the entire open source ecosystem. And not just the Python ecosystem, but every open source ecosystem. So we have our hands in a lot of pots, and I think you're probably aware there's this incredible wave of focus on software security, and on open source software security in particular. So we're kind of riding that wave a little bit. But, yeah, it's a dream team. Everyone I work with is super talented. We're working on some really interesting new security stuff, and I really love it.

03:20 I bet it's very exciting. And you also have a chance to make a big impact, right?

03:24 I've been working kind of tangentially on software security ever since I started working on PyPI, and I cared about it for a long time, but it's like, really, I think it's validating to sort of see that, oh, now everyone kind of gets it. Everyone's like, oh, this is a thing we need to focus and make better. So it's cool to be there and be, like, ready to do it and have the tools to make it happen.

03:43 Yeah, absolutely. And you have a lot of resources behind you through Google and the team and so on.

03:49 Absolutely. Like, an incredible amount of resources. Yeah. So nice.

03:52 Most people probably don't fully appreciate it, right?

03:54 Yeah.

03:55 So that is fantastic. The other thing that you're doing is working as the director of the PSF, right?

04:01 Well, not the director. The PSF has a board of directors, so I call myself one of the directors of the PSF. Oh, yeah.

04:08 Tell us about your role at PSF.

04:09 Yeah, so I joined the board, I think, about two years ago. We just had an election; we sit for three-year terms, so I've got another year left before I have to run again. It's been really nice to work on the PSF from the inside and do some community stuff. It's been a really weird time to join the board as well. It was the start of the pandemic, and the PSF derives a lot of its income from events like PyCon. That was always identified as kind of an existential threat to the PSF, but it very much became a reality very quickly. So there was a lot of work done before I joined and after I joined to adapt to that. And I think the PSF did an amazing job. We actually did really well, partly thanks to all of our sponsors and donors who still continued to step up, even though we weren't doing an in-person PyCon. We did a bunch of virtual PyCons. They went pretty well. Not quite as fun for me; I like to see folks in person. But yeah, I think we made it through to the other side amazingly. It's been great. We've got a really great board now. We just brought on a couple of new folks as well, and I'm really excited to see what we're going to do for the next couple of years.

05:13 I don't know that people fully appreciate how important PyCon is to the existence and financial wellbeing of the PSF. Maybe elaborate a bit on that.

05:23 Yeah, I think the statistic is that, at its peak, PyCon revenue was about 85% of the operating budget of the PSF. So almost all of the money the PSF needs to run and operate, which means paying staff, paying for infrastructure, all that kind of stuff, came from ticket...

05:40 Ticket sales for PyCon and sponsorship money and things like that.

05:43 Yeah, it's a little gray, because there are sponsors that both sponsor PyCon and sponsor the PSF, and that money just gets used by the PSF. But a lot of that sponsorship is really tied to the in-person event. So one thing we've done recently: if you're not a PSF sponsor, you should go to psf.org/sponsor. There's a new menu for sponsorship, and we adapted it so that it's not focused on the event. You don't have to show up to PyCon to be a PSF sponsor, and you'll still get a lot of benefit from it, including supporting things like PyPI and other infrastructure projects.

06:17 It seems like there's a bit of a wave of large companies coming in and properly sponsoring the PSF. I don't know if this is in reaction to what happened with PyCon and COVID, or if it just happens to be the timing and the growth, especially the growth of Python in a more business, corporate sense.

06:36 I think it's a couple of things. One is that the PSF very much needs the support, right? And I think that's been made obvious to the organizations that use Python and our infrastructure, like PyPI. The other thing is, I think a lot of organizations are taking open source as a dependency a lot more seriously, so they're making sure that they're in some way contributing to or providing support for the infrastructure, tools, and software that they use. The other thing I want to call out here is that the PSF staff is incredible. They've done an amazing job of making it a really attractive thing to be a sponsor of the PSF, of following through on commitments, organizations' commitments to us and our commitments to them, and of finding new and interesting ways to get funding as well. So we started doing an interesting thing a couple of years ago where we started applying for grants for work on PyPI. I think our first podcast episode was actually about some funded work that I got hired to do as a contractor. Then we repeated that, and we brought in a ton of money to fund really big stuff. Big stuff that a volunteer would never get done in a year of weekends. It's just never going to happen that a volunteer is going to sit down and have the time to do this. So it's been really successful in terms of shipping big, large-scale stuff that users need.

07:52 It seems like a while back now, but when you go to pypi.org, this is still kind of shiny and new, right? It got rewritten a couple of years ago, polished up, and made a lot more modern, right?

08:04 Yeah, I think we launched this in 2018, and it hasn't changed much visually since then. There have been a lot of new features in development. The whole point behind the rewrite was to make it a lot easier to build on top of. Legacy PyPI was ancient; essentially it predated everything. So it was kind of wacky. But yeah, now it's super modern.

08:23 Yeah, that's fantastic.

08:24 It's more sustainable now as well, right? We have better commitments from our in-kind donors for infrastructure, like Fastly, which pays our entire infrastructure bill and is an amazing sponsor of PyPI. But also, we just hired an infrastructure engineer to work on PyPI, which is super exciting, and on other PSF infrastructure as well. So yeah, it's a little more sustainable than it used to be. We have better core volunteers, moderators, all that.

08:51 We talked back in 2018, I think, maybe with you and Donald Stufft. I don't remember if you were on together or if there were two separate shows, but you were both involved, and one of the challenges was that PyPI, the web app, was so bespoke, sort of its own tangled mess, that people would want to contribute and then they'd be like, you know what, now that I see this, maybe not so much. It sounds like it's in a better place. We have some PEPs that we're going to talk about, about extending some of its functionality and those sorts of things, which is probably a spin-off of just making it easier to work with.

09:25 We did a full-stack rewrite for all those reasons: it's easier for us to maintain, it's easier for other users to contribute, and it's easier to propose new changes.

09:33 And I think maybe the undertone for this entire interview is that there needs to be progress. We can't just get to a point where it's like, that's it, that's good.

09:43 It's a constantly shifting landscape. So if we, the PSF and PyPI, want to continue to be successful and popular (and Python is doing amazing right now), we have to adapt to that to some extent.

09:53 Yeah, it's not the same world it was built for when it first came out. And also, these 393,000 packages are probably not something that was expected when this whole idea was put together.

10:04 It's scaled impressively well over the years, I think. Almost 4 million individual releases, and millions of artifacts. That's a lot of stuff.

10:15 Let's talk just a bit about the whole infrastructure side, not the tech or anything. We've covered that before, and it was really interesting. But just how much data and expense there is to run this thing.

10:26 I wrote a blog post sometime last year. It was essentially a five-year update to a previous post that Donald, who's one of the other PyPI maintainers, had written about what it takes to power PyPI, and it has some statistics in it that are, at this point, out of date. But I think at this point we serve over 2 billion requests a day. We transfer more than 60 terabytes from pypi.org, and that doesn't include files. When we serve the actual files, the actual distributions, that's almost 10 terabytes a day. Per day. That's a lot.

11:02 Yeah. If we had to pay retail costs for the bandwidth from our CDN (almost 99% of PyPI is served from the CDN), it would be in the millions of dollars. It's a substantial infrastructure just to serve the files and serve the requests. And it's not going down. It's not plateauing either. It's definitely going up, which is good in the sense that we want it to be popular, but there are sustainability questions that come with that as well, as we grow sort of unfettered, and we're figuring that out.
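The daily totals quoted above are easier to picture as per-second rates. The following back-of-envelope sketch converts two of them (2 billion requests a day and roughly 10 TB a day of distribution files) into average rates. It assumes traffic is spread evenly across the day (real traffic is burstier, so peaks are higher) and uses decimal terabytes; it says nothing about actual CDN pricing.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Average per-second rate, assuming traffic is spread evenly over the day."""
    return daily_total / SECONDS_PER_DAY


def tb_per_day_to_gbps(terabytes: float) -> float:
    """Convert decimal terabytes per day into average gigabits per second."""
    bits_per_day = terabytes * 1e12 * 8
    return per_second(bits_per_day) / 1e9


requests_per_day = 2_000_000_000  # "over 2 billion requests a day"
file_tb_per_day = 10              # "almost 10 terabytes a day" of distributions

print(f"{per_second(requests_per_day):,.0f} requests/s on average")      # 23,148
print(f"{tb_per_day_to_gbps(file_tb_per_day):.2f} Gbit/s on average")    # 0.93
```

So even the averaged figure is on the order of twenty thousand requests every second, around the clock.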

11:32 Honestly, that kind of blows my mind. I was just wondering, what would you possibly do if you didn't have companies like Fastly really supporting you?

11:40 Honestly, it would be very hard to keep PyPI running if we didn't have the support of all our sponsors. And I think it's really important to make this distinction between PyPI and other indices, like npm, for example, which is owned by a massive corporation and has a whole support staff and a whole engineering staff. PyPI is a couple of folks and a bunch of donated stuff, and it's on the same scale. It's just as useful.

12:05 When I think about PyPI and npm and RubyGems (and this is not to call out Python specifically, but all of these), it reminds me of the early Internet. Maybe not back when we didn't have passwords, but when it was kind of like, oh, well, we don't really need encryption here. It was from a time when things were simpler, and it feels like it's getting a little more complicated, security-wise and so on.

12:32 Yeah, definitely. There was a point when, as PyPI maintainers and administrators, we never had to respond to takedowns for malicious stuff. It just never happened. And now my inbox is on fire because I get reports every day. I think part of it is that people are trying to hit security bounties and do research with PyPI, which is not the intended use case for PyPI, but there's definitely an uptick.

12:57 Yes. There's been a lot of talk on the Internet about things that might fix it, like signing packages and whatnot. We'll talk about whether that actually has anything to offer there. One thing I did want to give a quick shout-out to is the OpenSSF. They just made a big donation to make PyPI a little bit better, right? They committed $400,000 in order to create a new role. Tell us a bit about this. What is this?

13:28 Yeah, they were very excited to announce that they're planning to support us with a new role. It hasn't been finalized; the contract hasn't been signed yet. The OpenSSF is a fairly new organization with a bunch of member organizations, including Google, Microsoft, and others, to essentially support software security. They just got started pretty recently, and I think their marketing team kind of outpaced the legal side here. So we haven't signed the contract yet, but I feel confident saying that this is most likely going to happen. So yeah, they committed, I think, $400,000 to a developer in residence that's security focused. This is sort of piggybacking on something that I helped start two years ago at this point, which is the CPython Developer in Residence. That was started with funding from Google, and Łukasz became the CPython Developer in Residence. And I love to see this, because I'm very happy to say Google is not the sponsor of the CPython Developer in Residence this year. It's Facebook. And that's great, because I think this is something that can be shared by all the PSF sponsors, funding it each year, that kind of thing. So in a similar way, we're going to ideally hire someone who will focus on just security for Python. That might be security for CPython; it might be security for PyPI. They also want, I think, to fund a security audit of some critical tooling for the Python ecosystem, and that might be PyPI.

14:51 But yeah, this is super cool. And they've also announced funding for some other organizations like Eclipse Foundation.

14:57 This is fantastic news, and it's too bad that it's not signed yet, but it sounds like it's on its way. When it becomes official, I'll give it another shout-out, because this is probably going to make a big difference. That's a big chunk of money to contribute.

15:11 It's similar to the CPython Developer in Residence role. We're going to do interviews and hire someone for that role. So there'll be a job posting if this happens, and I'll definitely be tweeting and sharing that, trying to get people interested in applying, because this is a super cool role.

15:26 It sure is.

15:29 This portion of Talk Python To Me is brought to you by the Compiler podcast from Red Hat. Just like you, I'm a big fan of podcasts, and I'm happy to share a new one from a highly respected open source company: Compiler, an original podcast from Red Hat. With more and more of us working from home, it's important to keep our human connection with technology. With Compiler, you'll do just that. The Compiler podcast unravels industry topics, trends, and things you've always wanted to know about tech through interviews with people who know it best. These conversations include answering big questions like: What is technical debt? What are hiring managers actually looking for? And do you have to know how to code to get started in open source? I was a guest on Red Hat's previous podcast, Command Line Heroes, and Compiler follows along in that excellent and polished style we came to expect from that show. I just listened to episode twelve of Compiler, "How Should We Handle Failure?" I really value their conversation about making space for developers to fail so that they can learn and grow without fear of making mistakes or taking down the production website. It's a conversation we can all relate to, I'm sure. Listen to an episode of Compiler by visiting talkpython.fm/compiler. The link is in your podcast player's show notes. You can listen to Compiler on Apple Podcasts, Overcast, Spotify, or anywhere you listen to your podcasts. And yes, of course you could subscribe by just searching for it in your podcast player, but do so by following talkpython.fm/compiler so that they know you came from Talk Python To Me. My thanks to the Compiler podcast for keeping this podcast going strong.

17:08 Let's talk about 2FA. That's been a bit of a flashpoint, and I don't feel like it should have been, but it has been. What's the story of 2FA and critical packages on PyPI?

17:20 Yes, flashpoint. Kind of unexpected for me. I think I'm just so close to the security space and PyPI and all that stuff that the reaction was a little stronger than I think anyone expected.

17:34 It feels to me like the reaction was as if you set up a rule that said, hey, you can't have the letter A as your password; everyone knows the letter A, so you have to change it. To me it feels like that level of requirement change, and yet it just blew up, right?

17:51 Yeah. Let me give some background and then we can talk about realistically what it means. So yeah, we made an announcement that we were going to designate some projects on PyPI as critical. Essentially, we determined that based on download count, because it's not a great metric, but it's kind of the best metric we have for determining how many people would be affected if a project was compromised (and I'll talk about how that might happen). If we measure the number of times it's downloaded a day, that's a pretty good proxy for impact in terms of something being compromised.

18:24 Right?

18:24 So yeah, we made this designation and announced that at some point in the future (we did not announce a date, and we did not enforce the requirement at this point) we're going to require those maintainers to have 2FA enabled for their accounts. And we paired this with an incentive. My team at Google actually funded the purchase of a bunch of Titan Security Keys. These are hardware keys for two-factor authentication that Google manufactures, and we essentially gave away discount codes to the maintainers of projects that have been designated as critical, so they can get not one but two for free. And the designation was 1%: we decided the top 1% of projects would, at this point, be designated as critical.

19:07 Right. I feel like there was a bit of confusion when people saw this announcement. They thought, wait a minute, you're making me adopt hardware-based 2FA because I have a PyPI package? But the requirement is not that you have to use hardware keys if you have a couple of packages.

19:24 I would love it if everyone used hardware keys, because I think they're generally considered to be a little bit more secure. But no, the idea is that everyone should turn on 2FA, and PyPI supports both TOTP, which is what you're used to from authenticator apps on a phone or other device, and security keys.
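For readers who haven't looked under the hood, TOTP is the time-based one-time password scheme standardized in RFC 6238, built on the HOTP algorithm from RFC 4226. The sketch below shows the core computation with only the standard library: an HMAC-SHA1 over the number of 30-second steps since the Unix epoch, truncated to a short decimal code. It illustrates the standard, not PyPI's own implementation; the 30-second step and SHA-1 are the RFC defaults that most authenticator apps use, and `key_from_base32` assumes the secret is shared as base32 text (as it usually is in enrollment QR codes).

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a single counter value."""
    # The counter is hashed as an 8-byte big-endian integer.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte choose a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP over the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(key, int(now // step), digits)


def key_from_base32(secret: str) -> bytes:
    """Decode a base32 secret, restoring any padding that was stripped."""
    cleaned = secret.replace(" ", "").upper()
    return base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))


# RFC 6238 test secret; with SHA-1, T=59 seconds yields the 8-digit code 94287082.
rfc_key = b"12345678901234567890"
print(totp(rfc_key, now=59, digits=8))  # 94287082
```

In real use, `totp(key)` reads the current time, and a server checking a submitted code would typically also accept the neighbouring time steps to tolerate clock drift and compare codes with `hmac.compare_digest`; a maintained library such as pyotp handles those details.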

19:43 Security keys is pretty broad now. That doesn't just include USB devices; you can also use WebAuthn with phones and other physical hardware.

19:54 The integration with browsers is pretty good now. A lot of support.

19:58 Yeah, Michael in the audience out there is asking, do they need to be hardware keys, or just regular 2FA? It's just regular 2FA, right? And that's why I said I don't feel like it's that big of a deal. It's like, well, you have to have a secure password, or you have to have 2FA, or whatever.

20:10 The kind of immediate reaction from some folks with really big megaphones, essentially, was that this is a slippery slope. PyPI is asking something of its users. We don't do that very often. We sort of let users do whatever they want, and we have some baseline requirements for how to use PyPI, but we don't often ask people to do extra stuff. There's a good reason why we're interested in asking people to do 2FA. And it's not because Google has secretly conspired to do it so that it can own open source security.

20:39 There's a very good reason.

20:40 There's a whole undercurrent, a whole thread of, well, it's these big corporate companies that are adopting Python that are making us do different security to support them. And that wasn't it at all, was it?

20:52 Here's the main reason, right? And that part is valid: there are big corporations that consume stuff from PyPI, and they would love to have more assurances that their projects haven't been compromised. I don't think 2FA is exactly the right way to do that. At the end of the day, 2FA protects against two kinds of critical attacks that could happen to a Python package. One is just phishing, right? 2FA essentially eliminates the potential to get phished. I've never seen someone get phished on PyPI. I've never heard about a phishing attack there. But PyPI is as susceptible to phishing as a bank or anything else. It could happen to anyone. So that's one thing. The other thing is maybe more specific to PyPI itself, which is what we call domain resurrection attacks. Developers really love their vanity domains, their personal domains, their personal email addresses. So, unlike maybe your bank, the users on PyPI are more likely to have these one-off domains. And those domains expire. People forget about them, they lose access to them, and they get registered by someone else. And when that email address has the ability to reset the password on the PyPI account, an attacker can keep an eye on your domain, watch when it expires, go and register it, do a password reset, and then take over your account and publish whatever they want. So 2FA, in a similar way to phishing, protects against that attack as well.
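To make the domain resurrection scenario concrete, here is a deliberately simplified, hypothetical model; the `Account`, `reset_password`, and `log_in` names are illustrative and are not PyPI's actual code. It assumes the password-reset link goes to the account's email address, and that once 2FA is enabled, logging in requires a second factor in addition to the password, which is the protection described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical, simplified account model (not PyPI's real data model)."""
    email: str
    password: str                      # plaintext only for brevity; real systems store hashes
    totp_secret: Optional[str] = None  # None means 2FA is not enabled


def reset_password(account: Account, inbox_owner_email: str, new_password: str) -> bool:
    """Whoever can read mail sent to the account's address can complete a reset."""
    if inbox_owner_email != account.email:
        return False
    account.password = new_password
    return True


def log_in(account: Account, password: str, second_factor_ok: bool = False) -> bool:
    """Without 2FA the password is enough; with 2FA a valid second factor is also required."""
    if password != account.password:
        return False
    return account.totp_secret is None or second_factor_ok


# The maintainer's vanity domain lapses and an attacker re-registers it,
# so mail for this address now reaches the attacker.
lapsed = "me@my-vanity-domain.example"

no_2fa = Account(email=lapsed, password="original-password")
reset_password(no_2fa, lapsed, "attacker-chosen")
print(log_in(no_2fa, "attacker-chosen"))    # True: the account is taken over

with_2fa = Account(email=lapsed, password="original-password", totp_secret="JBSWY3DPEHPK3PXP")
reset_password(with_2fa, lapsed, "attacker-chosen")
print(log_in(with_2fa, "attacker-chosen"))  # False: the new password alone is not enough
```

The attacker still completes the reset in both cases; the difference is that with 2FA enabled, controlling the inbox no longer amounts to controlling the account.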

22:07 I never really thought about that. That's a little bit like the SIM card equivalent, but for email. The SIM card problem is, I could call up someone else's phone provider and say, I lost my SIM card, please issue me a new one. And then you start getting their SMS messages for SMS authentication and stuff.

22:28 You've taken over their domain. Nothing malicious happened on their side; maybe they just let it go, or their credit card expired or something. And then you snatch it up, set up some MX records, and off you go.

22:38 Our ultimate goal, as PyPI administrators: I'd love to protect all users from attacks that could be prevented by 2FA, but it's actually a little bit more for our own benefit, right, those kinds of attacks. One happened recently. The ctx package had a domain takeover and a malicious release published, and we wrote a very long incident report about it. It took a lot of our time, and essentially, it's not sustainable for these to happen.

23:03 We don't have a support team. We can't manually remove these packages and monitor things.

23:09 We just can't handle it. So 2FA is the folks who maintain PyPI asking users, hey, help us out a little bit. Just do this thing for us, to cut down on the potential for this and make it easier for us to do the things we actually want to do for PyPI, and not just respond to security incidents.

23:25 Right. Because there's only a couple of you, and if you're spending all your time putting out these fires, you're not adding JSON endpoints and other beneficial things.

23:33 Yeah, all sorts of stuff. The more time we spend putting out fires, the less we can do useful and interesting things for PyPI.

23:39 Yeah. So why the top 1% of packages? Why are those critical? And also, how does the designation work over time?

23:46 The designation applies if at any point the project was in the top 1%, and I think we recompute this every day. So since we announced this, projects have moved into the 1%, because it's constantly shifting. But yeah, why 1%? That's a question that came up a lot in the discussion after we made this announcement. The secret to the 1% is that in reality, if you were to go and figure out how much traffic, how many downloads, this 1% of packages actually represents for PyPI, it's over 95%. It's close to 99%. Most of what people are using from PyPI is in this 1%. So by saying 1%, we also essentially said that we care a little bit less about the long tail of PyPI that people aren't using. We're going to cover the majority of downloads. Like I said, we sort of maximize coverage of the potential impact if something was compromised. And we also had to keep it to just 1%, because another thing that folks didn't really realize is that there's an incredible maintenance burden for supporting 2FA. We have to handle account recovery requests, because people lose their phones, they lose their keys. People are humans, right? This happens all the time, and it's expensive for us to handle. We can't just say, okay, great, you lost your 2FA, I turned it off for your account, go wild, because that's essentially a perfect way to circumvent 2FA. Instead we have to do a very manual process where we verify other identities, like emails. If you have a GitHub account associated, we ask you to do something on GitHub, just to prove that you own that account. And even then it's really not perfect. Someone who did have 2FA enabled could still be compromised by someone who takes over this account or that account and pretends they need an account recovery. But yeah, this is a huge maintenance burden.
So we can actually barely handle account recovery requests right now, and I'm a little wary of how many we're going to get now that folks have really started turning on 2FA. But we think it's worthwhile.
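The designation rule described above (top 1% by downloads, recomputed daily, and sticky once a project has been in that set) can be sketched as follows. This is a hypothetical illustration, not PyPI's implementation: the function names, rounding 1% up to at least one project, and the synthetic download numbers are all assumptions. It also shows how a tiny fraction of projects can account for nearly all downloads when the distribution is heavily skewed.

```python
import math
from typing import Dict, Set


def top_one_percent(downloads: Dict[str, int]) -> Set[str]:
    """Projects in the top 1% by download count (rounded up, at least one)."""
    n = max(1, math.ceil(len(downloads) * 0.01))
    ranked = sorted(downloads, key=downloads.get, reverse=True)
    return set(ranked[:n])


def update_critical(previous: Set[str], downloads: Dict[str, int]) -> Set[str]:
    """Sticky designation: once a project is in the top 1%, it stays critical."""
    return previous | top_one_percent(downloads)


def download_share(projects: Set[str], downloads: Dict[str, int]) -> float:
    """Fraction of all downloads that the given projects account for."""
    total = sum(downloads.values())
    return sum(downloads[p] for p in projects) / total if total else 0.0


# Synthetic, made-up numbers: a long tail of 198 small projects plus two popular ones.
day1 = {f"tail-{i}": 10 for i in range(198)}
day1.update({"popular-a": 500_000, "popular-b": 300_000})
critical = update_critical(set(), day1)
print(sorted(critical))                          # ['popular-a', 'popular-b']
print(f"{download_share(critical, day1):.1%}")   # 99.8%

# The next day one tail project surges and popular-b drops; popular-b stays critical.
day2 = dict(day1, **{"popular-b": 5, "tail-0": 400_000})
critical = update_critical(critical, day2)
print(sorted(critical))                          # ['popular-a', 'popular-b', 'tail-0']
```

Keeping the set sticky means a project can't drop out of the requirement simply because its downloads dipped for a day, while the daily recomputation picks up newly popular projects.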

25:43 That's probably why 1% and not 100%, right?

25:46 Oh, yeah. There's zero chance we could handle 100% of everyone on PyPI with 2FA enabled. We just couldn't handle it. I would love that. That'd be great. But unfortunately, with the number of people losing their stuff and having to come to us for resets, the burden is really high.

26:02 Sure. For me, I use Authy for my 2FA, which syncs across devices, so at least if I lose one, I can get it back.

26:09 Yeah, Google Authenticator works really well for TOTP as well. And I think you can export the codes or store them externally, so if you lose your phone, you can regain access to those TOTP codes. There's also emulated TOTP stuff where you can run it on your laptop. It's maybe not technically a true second factor, but a lot of people use that because it's more convenient.

26:30 It's way better than nothing, right?

26:31 Oh, better than nothing. Exactly.

26:33 Let's talk about James Bennett and opinions. You called out this article and I also read this. I think this is really good. What are some of your takeaways here?

26:40 Yeah, James absolutely nailed the response here. We got a lot of feedback after this. I'm not going to say it was bad feedback; some of it was maybe uninformed or somewhat sensational, but some of it was totally valid. At the end of the day, we are asking users to put in a little more effort, and some people don't want to do that.

27:03 None of the PyPI administrators actually explicitly responded to a lot of this. I think we were all a little bit depressed about how upset some people were about a 2FA requirement that didn't even exist yet. But yeah, really, shout out to James, because I read this and I was like, I really could not have written it better than he did. He really called it out.

27:23 There's a lot there and I think it's very well thought out.

27:26 Yeah, shout out to James. There were kind of two arguments he was making. One is that a lot of people were concerned this would be a slippery slope. I don't really foresee PyPI making too many more mandates about stuff like this. Not because of the feedback, but because I don't think we're ever going to mandate signing, for example. That's always going to be the option of the maintainer. But things like 2FA for certain high-profile projects really help PyPI continue to exist. Right. That's actually the motivation here.

27:56 I definitely want to echo the message that you said about the overhead.

28:01 The people who would otherwise be constructively working on this have to deal with these problems.

28:05 Yeah, every day. I mean, it's like I don't get paid to do it. I do it out of love, but it becomes larger and larger every day. And yes, we're keeping our head above water right now, but yes, there's plans also to make that better.

28:16 How much do you think the reaction (I'll put it out there: I think it was an overreaction) came from people perceiving that it had to be a hardware key, versus just straight 2FA? Do you think people really rejected 2FA itself, or did it seem like a bigger burden than just adding it to your Google Authenticator?

28:34 If I were to say whether we made some sort of failure here when we announced it, I would say we didn't message this super well, right? And that's because I'm a software engineer. I'm not a marketer; I'm an okay communicator. And the same is true for the rest of us. We don't have copywriters or anything like that. We don't have a PR team. So, yeah, there was some stuff that people kind of missed, and one of the things that was missed was that the mandate doesn't exist right now. We're just talking about enforcing it in the future. The other was what is actually being required of you today, which for most folks was nothing. It was only that if you want to get a pair of free security keys, you have to do this today. And by the way, those are still available.

29:11 I'm sure you all saw this as a positive. Like, hey, we got this cool thing for people that they can get if they want or they just do 2FA. But people are like, what is this? You're saying there's still some available for folks who want to get it, right?

29:21 Yeah. So through October 1, and we might call this out at the end as well. But yeah, if you go to pypi.org/securitykeygiveaway, you can check if you're a critical maintainer, and you can get a pair of keys, actually. People also weren't really sure why we were giving out pairs of keys, but the main reason is to help you not lose all access if you lose one. So if you have two keys and you've registered both of them, you have

29:46 Some redundancy. You can stick one in the garage or somewhere else. Hand it to a friend, bury it in your backyard.

29:54 Exactly.

29:58 This episode of Talk Python is brought to you by the IRL podcast, an original podcast from Mozilla. If you're like me, you care about the ideas behind technology, not just the tech itself.

30:10 We know that tech has an enormous influence on society. Many of these effects are hugely beneficial. Just think about how much information we carry with us every day through our cell phones.

30:22 Other tech influences can be more negative. I really appreciate that Mozilla is always on the lookout for and working to mitigate negative influences: tech for all of us. If those kinds of ideas resonate with you, you should definitely check out the IRL podcast. It's hosted by Bridget Todd, and this season of IRL looks at AI in real life. Who can AI help? Who can it harm? The show features fascinating conversations with people who are working to build a more trustworthy AI. For example, there's an episode on how the world is mapped with AI, but it's the data that's missing from those maps that tells as much of the story as the data that is there. Another episode is about gig workers who depend on apps for their livelihood. It looks at how they're pushing back against algorithms that control how much they get paid and how they're seeking new ways to gain power over data and create better working conditions for all of them. And for you political junkies, there's even an episode about the role that AI plays when it comes to the spread of disinformation around elections. Obviously a huge concern for democracies around the world. I just listened to "The Tech That We Won't Build," which explores when developers and data scientists should consider saying no to projects that can be harmful to society, even though we do have the tech to build them. Does this sound like an interesting show? Please use the link talkpython.fm/irl to subscribe. Yes, you can search for it in your podcast player, but use the link talkpython.fm/irl to let them know that you came from us. The link is in your podcast player's show notes. Thank you to IRL and Mozilla for supporting Talk Python to Me. I guess part of the reason this is so much in the public awareness is because of this project called atomicwrites.

32:12 Yes.

32:16 Give us the rundown of why we're talking about this package. Let me just give people a really quick background. The atomicwrites package lets you, within a with block, like you would do with open() for a file, instead say atomic_write, and it will write to a temporary file and only commit those changes to the real file at the very end, all in one shot. Pretty useful. Not super hard to do your own version of with a couple of built-in things in Python, like the tempfile module, but still kind of...

32:44 No longer necessary for modern Python, is my understanding. Like, this is a couple of lines of modern Python. You don't have to worry about it. But it used to be something that

32:51 You would use, right? Exactly. How does this relate to 2FA? That has nothing to do with 2FA, does it?
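To make the "couple of lines of modern Python" point concrete, here is a minimal sketch of an atomic write using only the standard library: write to a temporary file in the same directory, flush it to disk, then swap it into place with `os.replace()`. The `atomic_write` name here is our own hypothetical helper, not the atomicwrites package's API, and the sketch skips details a real library handles (preserving file permissions, binary mode, fsyncing the directory).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path, encoding="utf-8"):
    """Hypothetical helper: write text to a temp file, then swap it into `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            yield tmp
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        # os.replace() overwrites the target in a single rename step;
        # on POSIX filesystems that rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure: discard the partial temp file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Usage mirrors `open()`: `with atomic_write("settings.json") as f: f.write(data)`. If the body raises, readers of `settings.json` keep seeing the old contents rather than a half-written file.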

32:56 There's a thing that happens all the time, right? So PyPI has this policy that everything on PyPI is essentially immutable. That means that individual files, whose file names include a project name, a version, and a distribution type, are immutable. So if you upload something to PyPI, like a source distribution for some version, once you publish, it's there. You can't overwrite it, and you can't surreptitiously change what that points to. So anyone installing it is always going to get the same thing, same SHA, everything. But that also means if you want to delete something, you delete it and it's gone forever. You can't come back and overwrite it with something else.
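Because a published file can never change, its SHA-256 digest works as a stable fingerprint. A minimal sketch of checking a downloaded distribution against a known digest with `hashlib`; pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) applies the same idea. The function names are illustrative, not part of any tool.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large wheels don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_distribution(path: str, expected_sha256: str) -> bool:
    # Hex digests are case-insensitive; normalize before comparing.
    # compare_digest avoids timing side channels (overkill here, but harmless).
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())
```

Since PyPI never lets a filename point at different bytes, a digest recorded today should still match that file years from now.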

33:31 And I don't encourage people to delete stuff from PyPI generally, because you're almost definitely going to break somebody. There are better methods for marking something as not useful and telling pip not to install it. That's yanking, which is a whole PEP unto itself. But yeah, this thing happens all the time, though. We have a huge warning banner, a big red button, everything telling you that if you delete this thing, you're not going to be able to get it back. So what happened here is that this maintainer didn't want to comply with 2FA. The project was marked as critical because a lot of people were still using it, and they thought they'd discovered a cool hack where if they deleted it and then recreated it later, the mandate would no longer apply. And that was kind of true because, like I said, our computation for critical projects runs once a day. So when they brought it back, it didn't have that flag. Within 24 hours, that flag was added back to the project, but for a brief period of time, yeah, it was not marked as critical. But what happened was all these versions went away, and a lot of people, actual users of this project, were depending on them.
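Yanking is specified in PEP 592. A simplified model of the rule it describes: an installer skips yanked releases whenever a non-yanked one satisfies the request, and should only fall back to a yanked release when the requirement pins that exact version with `==`. The version handling below is a deliberately naive tuple comparison rather than real PEP 440 parsing, only `==` and `>=` are supported, and `pick_release` is a hypothetical name.

```python
from typing import Optional


def parse_version(text: str) -> tuple:
    # Naive: "1.10.2" -> (1, 10, 2). Real installers use full PEP 440 parsing.
    return tuple(int(part) for part in text.split("."))


def pick_release(releases: dict, spec: Optional[str] = None) -> Optional[str]:
    """releases maps version string -> is_yanked; spec is None, '==X', or '>=X'."""
    if spec is None:
        floor = None
    elif spec.startswith("=="):
        pinned = spec[2:]
        # An exact pin may still select a yanked release, so old lockfiles keep working.
        return pinned if pinned in releases else None
    elif spec.startswith(">="):
        floor = parse_version(spec[2:])
    else:
        raise ValueError(f"unsupported specifier in this sketch: {spec!r}")
    # For open-ended requests, yanked releases are simply never candidates.
    candidates = [
        version
        for version, yanked in releases.items()
        if not yanked and (floor is None or parse_version(version) >= floor)
    ]
    return max(candidates, key=parse_version, default=None)
```

This is why yanking is gentler than deleting: anyone who pinned the bad version still gets it, while new installs quietly move past it.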

34:32 There's a long discussion happening now about whether it should even be possible to delete stuff from PyPI, and there are good arguments on both sides of the coin, right?

34:40 Yeah, well, that was one of my first thoughts: wait, you can delete releases? I knew they were immutable; you can't update them. But deleting? So what's the trade-off there? Why can you delete them now, and why maybe wouldn't you be able to in the future?

34:53 This is like the npm left-pad incident, essentially. Right now, there's potential for a high-profile enough package... This package wasn't super high-profile, but it was in the critical list, it was in the top 1%. There's potential for some maintainer to decide, and it's their prerogative right now. Right. There are no guarantees that these things continue to exist on PyPI. No one's necessarily paying for this. So, yeah, a maintainer absolutely has the ability right now to just wipe something super popular and necessary off the face of PyPI, and that's the current status quo. It's not the same in a lot of other ecosystems.

35:25 Some of them don't have that policy. Some of them do. But yeah, there's a bit of debate about whether it should be necessary, especially when we have stuff like yanking, which is actually a more meaningful way to remove something.

35:35 Yeah, so let's suppose somebody at Pallets or whatever erases Flask tomorrow.

35:41 David, don't do it.

35:42 David, please. Keep going, man.

35:45 Is there a way to get a hold of the actual wheels and stuff as a community and put it back up under potentially a different name, or is it just gone? How seriously gone is it when it's gone?

35:57 Yeah. Nothing published to PyPI is actually gone unless we're legally required to remove it. We don't delete any actual files off of our data store. So the bucket that everything goes into, everything that's ever been published to PyPI, is still there. And this actually played out. It's good that we have this, because it's exactly what we used in this case, with atomicwrites. The maintainer was like, oh, I made a mistake. They were kind of humble about it: yeah, okay, this is a mistake, I shouldn't have done this. And then they asked us to essentially restore the project from scratch. We don't really have mechanisms to do that. Right. That's not something that we do often. I can only remember maybe once when we've done that before, maybe not even once. We generally just don't do this: if you delete something, we say it's gone, you need to publish a new version. But in this case, we did decide to take the time. I think it took Donald almost an hour to do this because it's a super manual process. But yeah, the files are still there.

36:50 Something you don't do very often, right?

36:52 No, almost never.

36:53 How can I even do this?

36:55 The files are still there and they're still externally addressable too, so they're always going to be available. And if something like that happened (David, don't do it), I think folks would probably be okay. We'd find ways around it. But yeah, it's a strong argument for not allowing it to happen. And when people publish stuff to PyPI, our ToS essentially says you give us the right to distribute it as we see fit, forever. So PyPI is within its rights. But there are arguments for giving maintainers the ability to do it, for various reasons.

37:26 Have there been any thoughts about putting levels on what pip will install?

37:32 For example, I'm thinking I want to set up my pip so that it will only accept things that have 2FA set up, or it will only accept things with a certain number of downloads. Like, I can only pip install something with 10,000 or more downloads, because maybe I'm trying to avoid typosquatting for very edge-case things. Something in that general realm. Have you all thought about this?

37:53 Probably yes, we have it on our list to talk about later. But yeah, I mean, there's definitely potential, right? There's all sorts of signals that you could potentially take into account here. TBD, like how meaningful some of them actually will be or how much they will actually protect you, but sure, yeah. People have talked about like essentially defining a policy for what they'll consume and either having that be part of pip or something external. Yeah, it's definitely been discussed.

38:16 Okay. For example, in web browsers, you can have no blocking, you can block third-party cookies, or you can block third-party cookies and trackers, and you can decide: how broken do I want my web to be versus how safe do I want my web to be? I feel like there might be something like that in the pip world.
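Nothing like this exists in pip today; it's a thought experiment. A minimal sketch of browser-style strictness levels for what you're willing to install, assuming hypothetical per-package metadata about download counts and whether every maintainer uses 2FA. `PackageInfo`, the level names, and the 10,000-download threshold are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Strictness(Enum):
    OFF = "off"            # install anything, like a browser with no blocking
    STANDARD = "standard"  # skip obscure packages
    STRICT = "strict"      # also require 2FA on every maintainer account


@dataclass
class PackageInfo:
    # Hypothetical metadata; PyPI does not expose exactly these fields.
    name: str
    monthly_downloads: int
    all_maintainers_use_2fa: bool


def allowed(pkg: PackageInfo, level: Strictness, min_downloads: int = 10_000) -> bool:
    if level is Strictness.OFF:
        return True
    # A crude popularity floor; it mainly helps against fresh typosquats.
    popular_enough = pkg.monthly_downloads >= min_downloads
    if level is Strictness.STANDARD:
        return popular_enough
    return popular_enough and pkg.all_maintainers_use_2fa
```

The trade-off mirrors the browser one: the stricter the level, the more legitimate dependencies it would reject, which is exactly the enforceability problem raised next.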

38:35 I think the reality, at least right now, is that any kind of policy like that would not be enforceable because there's going to be some edge case, some dependency that's super old or whatever.

38:46 Python is not nearly as bad as an ecosystem like NPM in terms of breadth of dependencies for a given thing. Right? Yeah.

38:52 Usually the dependencies are like thicker. Right. You don't have like three lines of code you're depending on, you just put that in your code.

38:59 It does exist. But yeah, generally, even still, I think you'd have a hard time saying, I'm going to only consume packages that have 2FA enabled, because there are so few of them right now. If all the tooling for that existed, yeah, sure.

39:15 Okay. Interesting. So in this atomicwrites story, everything was put back, but it shows unintended consequences.

39:22 And kind of ironic too, actually, because PyPI is an open source project as well, and people were upset we were making demands of users to do a certain thing, but at the end of the day, someone is making demands of us to use our time in ways that we don't necessarily want to.

39:37 Again. See James Bennett's article, right? A lot of those ideas were really well spelled out there.

39:42 Absolutely.

39:43 Okay.

39:43 People forget that it's a volunteer open source project and not like run by some corporation.

39:48 I thought it was this conglomerate of corporate overlords.

39:52 So that's what I got from Reddit. Shadowy cabal. What's this?

39:57 The PyPI 2FA dashboard here. This looks pretty cool. Tell us about this project. I'll link to it in the show notes, of course.

40:03 In the top right, switch to the past three months or something like that, and you can really see the bump there. So, yeah, this is the dashboard we put together essentially for us to monitor how the rollout of 2FA and the security key giveaway was going. But we made it public so anyone can check it out. I think we'll put a link in the show notes, but yeah, the numbers are great. The one thing that it isn't actually showing is how many security keys we've given away. As of when I checked earlier, we've given away more than 500 keys, which is awesome, and it's only a fraction of what we have to give away. So if anyone listening wants a key and has a critical project, go and get the keys, and we'll find something to do with the rest if we don't give them all away by the time the giveaway ends in October. But yeah, a bunch of keys given away. I didn't mention this, but as part of this, we also turned on a feature that allows any project to manually require 2FA for all its maintainers. So any project that wasn't critical but wanted to opt into this could do that too. Almost 300 projects have done that. And we're so close to hitting 30,000 users on PyPI with two-factor enabled, which is huge, and that's up from about 27,000 before we did the giveaway.

41:13 Cool. Well, I was one of the 27,000 before, because my packages on PyPI are not super significant, but they are there. So I definitely put 2FA on there and just have it running through my phone, basically.

41:26 I appreciate that.

41:27 Yeah. So people can go and see the progress here of how it's coming along.

41:32 Yeah. And how many projects we've classified as critical. Right. Like, how many is 1%? Well, right now it's almost 4,000 projects, which is a lot, and there are a lot of maintainers of those projects.

41:42 Sure. Well, two things are interesting. One is, I can go to PyPI.org, see 393,000 projects, and say, well, 3,930 are probably designated critical, right? But as you said, it's computed over time. So maybe there's something that was critical but is no longer, or something that becomes critical. Right. So this number could sort of outpace the actual number that's just 1% of the total projects.

42:09 Yeah. It will grow above 1% over time, because once something's been designated as critical, it retains that designation indefinitely. And then the other thing that we kind of snuck in here is that anything that's a dependency of PyPI itself is also critical. We just figured that's a good idea for us. There are a couple of projects that maybe wouldn't normally be included, but we include them because we personally care about them.

42:32 Is that like, maybe Pyramid or stuff like that?
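The designation rules described in this conversation are: the top 1% of projects by downloads, recomputed once a day, sticky once assigned, plus PyPI's own dependencies. A sketch of that logic with a hypothetical `recompute_critical` function; PyPI's real job surely differs in details such as how downloads are counted and how ties are broken.

```python
def recompute_critical(downloads, previously_critical, pypi_dependencies):
    """Hypothetical daily job.

    downloads: project name -> download count
    previously_critical: yesterday's critical set (the designation is sticky)
    pypi_dependencies: projects PyPI itself depends on
    """
    # Ceiling of 1% of all projects, so any non-empty index flags at least one.
    cutoff = -(-len(downloads) // 100)
    ranked = sorted(downloads, key=downloads.get, reverse=True)
    top_one_percent = set(ranked[:cutoff])
    # The union with yesterday's set is what makes the count only drift upward.
    return top_one_percent | set(previously_critical) | set(pypi_dependencies)
```

With the roughly 393,000 projects mentioned above, the cutoff is 3,930 projects; stickiness and PyPI's own dependencies then push the real total somewhat higher.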

42:35 Yeah. I don't know what the difference between those two sets is necessarily, but it would be interesting to figure out if

42:39 There's anything that otherwise wouldn't have been critical but is because of that. Okay.

42:42 Yeah.

42:42 The other thing that's interesting is there's 8400 users identified as critical even though there's 3900 packages. So I guess because multiple people can be designated as a maintainer.

42:54 Yeah, exactly. So it looks like the average is about two maintainers per critical project, which is a little scary. I think in reality, there's a lot that just have one and a lot that have a lot more.

43:03 Yeah, it's a bimodal sort of distribution. Right. A whole bunch with one maintainer and then these groups of people.

43:10 Yes.

43:11 Interesting.

43:11 Okay.

43:12 This is cool. So people can check this out and see how it's going. This whole idea of critical and requiring 2FA. NPM is also doing something like this. Right. So it's not completely out of the blue.

43:23 Yes. I mean, a lot of organizations are doing it. I think RubyGems said that they were working on a mandate or had proposed one as well. And npm started with a pretty small cohort, only the top 100 projects, but they're going to expand that. And then the big one, which I think a lot of people aren't aware of, is that GitHub announced that they are going to require 2FA for anyone that contributes code on GitHub, which is, I guess, everyone that uses GitHub. I don't know who the user is that doesn't contribute code on GitHub. But yeah, everyone's going to have 2FA enabled, which is huge, by the end of 2023, so they have some time. But I can't comprehend the size of the support team they must be hiring right now to handle that, because it's crazy.

44:02 Well, probably a lot of support team and a lot of automation that they're trying to get in place.

44:07 Here's what you got to do.

44:08 But the secret here is the OpenSSF. It has a bunch of working groups, and we have a new one that's pretty fun for me because it's about securing software repositories. Essentially everyone that maintains a software repository, including folks from npm, crates.io, RubyGems, Maven Central, all that stuff, comes together once every two weeks to talk about this kind of stuff. So we've all been talking about 2FA for a while now, working on our plans together and sharing notes, that kind of thing.

44:38 Yeah. Fantastic. I can only think that there are probably some users who just clone repos and post issues but don't really do any check-ins. They're just there to make a copy. Yeah, that's actually probably not the main use. Right. Probably the main use is people contributing to their private repos, or maybe, less so, public repos. Probably mostly private repos, I would guess. Okay. But yeah, that's going to be a big deal. And I guess it also leads us into the whole supply chain integrity side of things. Right. Because it's one thing to say your account on PyPI has to be secured with 2FA and better security.

45:22 But what if somebody can just put in bad code through a very complex PR that happens to sneak a less-than where

45:30 A greater-than used to be?

45:31 Some weird little edge case into a PR. Or take over somebody's GitHub account, change the code, and they don't realize it. Right. That's probably more likely, because there's no notification that you did your own commit in a significant way. Right. So locking down GitHub will have really important knock-on effects for Python and all the other open source package registries. Right.

45:54 And there's a lot of work being done here, just in terms of making your source repository more secure, making your builds more secure if you're building artifacts on GitHub, making your publishing method more secure. All of it is getting a ton of improvement right now.

46:09 Yeah, you don't want something sneaking in, mixing some code into the CI stuff, or any of those types of things.

46:16 Just so many insertion points, right? Yes, there's a lot of spots.

46:20 Exactly. So I guess, what are some of the thoughts on what can be done? This is kind of what I was going back to: there used to be this world with no encryption and lax or no passwords, and now all of a sudden we need all this junk. And I feel like there's a little bit of that with people going, well, let's just try to abuse PyPI, let's try to abuse npm and sneak in what we can. Maybe talk about some of the steps you all have taken to mitigate that and what you think can be done. There's a lot of talk about package signing, and I'm not so sure how useful straight-up signing is, but yeah, maybe let's start there. What have you thought about?

46:53 The thing I want to start with is just that none of these is a panacea, right? One of the arguments raised with the two-factor stuff was, well, this isn't going to protect us from a vulnerability or a maintainer going rogue. It's like, yeah, no, obviously not. There's not one thing that's going to protect you from that. There's a combination of features that are going to protect you from a combination of threat actors or vectors, and we have to use all of them if we want to feel fully protected. So, yeah, we just spent a lot of time talking about 2FA, but like I said, it eliminates entire classes of attacks. Please turn on 2FA. Another interesting one, since we're talking about GitHub, is security hardening with OIDC. If you haven't heard of OIDC, it's OpenID Connect. It's built on the OAuth protocol, but essentially it allows you to give things identity. So each individual GitHub Actions workflow, like each run of the workflow, actually gets its own identity, and that identity is cryptographically verifiable. On PyPI, we're working on implementing support right now, and it will exist very soon, for what is essentially going to be credential-less publication from a GitHub Actions workflow. Okay, so that means no password, no API token, nothing. You essentially say, okay, I trust this workflow, and it has the ability to publish directly in a secure way. It's super cool. And hopefully a lot of other CI providers will support OIDC as well. So, yeah, we'll probably see this in a bunch of places.
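To make "identity for a workflow run" concrete: an OIDC token is a signed JWT whose payload carries claims about the run. A minimal sketch of reading those claims and applying an "I trust this repository and this workflow" rule. The claim names `iss`, `repository`, and `workflow_ref` are ones GitHub documents for Actions tokens, but the trust rule and function names are our own assumptions, not PyPI's implementation. The sketch deliberately skips signature verification, which any real verifier must do against the issuer's published signing keys.

```python
import base64
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def unverified_claims(token: str) -> dict:
    """Return the JWT payload WITHOUT checking its signature (illustration only)."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


def matches_trusted_workflow(claims: dict, repository: str, workflow_file: str) -> bool:
    # Hypothetical trust rule: right issuer, right repo, right workflow file.
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    if claims.get("repository") != repository:
        return False
    # workflow_ref looks like "owner/repo/.github/workflows/release.yml@refs/..."
    workflow_path = claims.get("workflow_ref", "").split("@", 1)[0]
    return workflow_path == f"{repository}/.github/workflows/{workflow_file}"
```

The point of the design is that nothing long-lived is stored in CI: the token is minted per run, and the index decides whether to accept it based on who issued it and which workflow it describes.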

48:17 All right, so short-lived tokens directly from your cloud provider?

48:22 Yeah, essentially. It works for Google Cloud right now as well, and a couple of other things, but yeah, it's essentially a way to verify the identity of a really tightly scoped thing, like an Actions run, authenticate it, and give it permission to do something like publish to PyPI.

48:39 This looks useful.

48:41 Handing out API keys or embedding username passwords into CI/CD doesn't sound like a great idea.

48:48 Yeah, I know, especially with stuff like the Codecov attack. Travis had a similar attack where all environment variables were exposed. Everyone had to go and roll everything at that point. It's such a mess.

48:59 It's one thing to say, okay, we've got to go reset your password. Fine. There is a race. As soon as that happens, there's a race between people looking to use those credentials and you trying to make sure nobody but you uses them. The real problem, I think, is not the people who are paying attention. It's probably the maintainer who set up some project, hasn't touched it in a year, and hasn't checked their email. They're just not super engaged, and that could stay open for a super long time, in a bad way.

49:25 Exactly.

49:26 All right, package signing. Donald Stufft wrote about package signing and why package signing is not the Holy Grail. People say... it's a little bit like the 2FA stuff: if you just sign your packages to prove they come from you, everything is going to be fine. Unless the person just goes rogue. You can't protect against crazy. There was one of the packages that got messed up, I think it was on PyPI, and that same person was arrested for bomb-making materials in New York. I don't know if I can find the article. Clearly that's not a well-functioning sort of person, and it doesn't matter if they sign the package if they just go bonkers.

50:08 Yeah, that's not the attack we're trying to protect against with signing, but

50:12 Yeah, what can signing help with, and what is it not going to help us with?

50:14 Yeah, Donald wrote a post that's kind of canon at this point for why package signing is not the Holy Grail, but he's really talking about GPG.

50:22 Why GPG signing is not the Holy Grail. There are good points in here. Right. So like you said, package signing doesn't protect against an actual account compromise. If someone compromises your GPG key, they can sign whatever they want, so there are no protections there. And then there are other problems with GPG as well. There are UX and usability issues. There are issues with the web of trust, like actually establishing... okay, great, you signed this thing, but how do I establish that the person who signed it is actually a person I trust, and that someone hasn't provided me with a malicious public key? Right? How do you actually do that?

50:53 So it's a little equivalent to people saying, well, just check that the URL is HTTPS. Well, that's not identity, that's just encryption.

51:00 Yeah, you can still serve crap over HTTPS. I think a lot of people also don't realize that PyPI has supported uploading GPG signatures for a very long time. It still does; that hasn't gone away, and nobody does it. It's just not used. It's too hard to use, it doesn't work right, or it's just not worth doing. So I think Donald's post is right in the context of a world, and this was written in 2013, where GPG is the only signing tooling you have available. But that's not true anymore. And I'm really excited about this new tech called Sigstore, partly because I work with it; people on my team work on it. It's really interesting technology. Sigstore is essentially a new way to sign things, and it's not based on long-lived, maintained keys. It actually uses ephemeral keys. When you sign something with it, you generate a public/private key pair, you sign very quickly, and then you throw those keys away. You don't actually maintain them. They can't get leaked, they never get written to disk, nothing. They don't exist.

52:01 That's interesting.

52:02 Is this also based on OIDC?

52:03 Is there a chain of ownership for those keys? Is the key vouched for by some other trusted key, or by something that really is tied to you? Something like that?

52:12 No, it's literally just a key that you generate out of thin air; you sign with it and then you throw it away. The way Sigstore works is that it's also built on top of OIDC. So you have these identities, right? You have your email, your Gmail account, your GitHub account, that kind of thing, and they all offer essentially an online identity. You essentially sign with these identities instead. So you sign into something like Gmail or something like GitHub, and you share this identity with a certificate authority that Sigstore runs. This binds the identity to the one-time, ephemeral public/private key pair that you generated. Then that certificate is published on a transparency log, so it's there forever. There's a record of everything that gets signed. But then the thing you have to trust is not some however-many-digit-long alphanumeric public key ID. It's an email address, like di@python.org. Someone can still lose access to that identity via compromise of that account, but it's much, much easier to use, sign, and maintain, and it's a little bit less likely that the actual identity is going to get compromised.
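What makes a transparency log "there forever" is that entries are committed to with a hash tree, so quietly editing or removing an old entry changes the published root. A minimal sketch of the Merkle Tree Hash defined in RFC 6962 (Certificate Transparency), which is the style of log assumed here as a model; this is not Sigstore's own code, and real logs also serve inclusion and consistency proofs that this sketch omits.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaves with 0x00 so a leaf can't be mistaken for a node.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes get a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(entries: list) -> bytes:
    """Merkle Tree Hash of an ordered list of log entries (RFC 6962, section 2.1)."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]), merkle_tree_hash(entries[k:]))
```

Once a root has been published and witnessed, changing any earlier entry (say, a signing certificate someone would rather hide) produces a different root, which is what lets outsiders detect tampering.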

53:21 Okay, here's the one I was thinking of. I'm sorry, it was npm, not PyPI: npm libraries colors and faker sabotaged in protest by their maintainer, a person named Marak Squires. And following up quickly from that, we have: resident of Queens home with suspected bomb-making materials arrested, Marak Squires, and so on.

53:44 This is what I was talking about when I say no amount of signing or 2FA is going to help against this. And I think it's just part of the deal: if you accept code from people, you've got to vet that.

54:00 We've essentially always said we make no guarantees. You can't trust something just because it's on PyPI. You need to take your own steps to learn about and build trust in the things that are on there. We can give you some tools to help you do that. But yeah, essentially you're giving someone commit access to your project, your application, whatever; you're allowing them to introduce code into your project and run alongside your application.

54:22 What are your thoughts on privately hosted package systems, like a devpi server or whatever?

54:30 Those where you can create local ones and maybe even mirror stuff from PyPI?

54:34 Yeah, I mean, if you're going to do a really robust auditing pipeline where you pull stuff from public PyPI, spend some time looking at it, maybe run it through some tests, and then introduce it to your private server, that's a really good way to insulate yourself from a couple of different types of attacks, like dependency confusion attacks, typosquatting, that kind of thing. If you're pointing your install at a private server that just doesn't have any of that stuff on it because you've manually curated it, then yeah, that's a pretty good practice.

55:01 Like Artifactory, Google Cloud Artifact Registry, all those things. They're all sort of similar in that regard, I guess.
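Why a single curated index helps with dependency confusion: if an installer merges candidates from several indexes and simply takes the highest version, a public upload with a huge version number beats your internal package. pip's `--extra-index-url` behaves this way, since pip doesn't prefer one index over another. A simplified model with hypothetical package and index names and naive version parsing:

```python
def parse_version(text: str) -> tuple:
    # Naive "1.4.0" -> (1, 4, 0); real installers use PEP 440 ordering.
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, indexes: list) -> tuple:
    """indexes: list of (index_name, {package: [versions]}); returns (version, index_name)."""
    candidates = [
        (version, index_name)
        for index_name, packages in indexes
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    # Highest version wins, no matter which index it came from.
    return max(candidates, key=lambda candidate: parse_version(candidate[0]))
```

With only the curated private index configured, the attacker's public upload is simply never a candidate.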

55:06 Related to that: if you automatically install the latest continuously, that maybe puts you at a higher level of risk than if you choose when to upgrade a package. Like pinning versus not pinning.

55:18 Yeah, exactly.
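On the pinning point, a quick sketch that flags requirements not pinned to one exact version. It only understands simple `name==version` lines (optionally followed by options like `--hash=...`), skips comments and option lines, and treats wildcards such as `==1.*` as unpinned. Real requirements files have more syntax (environment markers, includes, line continuations), so treat `unpinned_requirements` as a hypothetical helper, not a replacement for pip's own parser.

```python
import re

# name, optional [extras], then == or === and a version with no wildcard,
# followed by whitespace, a marker separator, or end of line.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*===?\s*[^\s;,*]+(?=\s|;|$)")


def unpinned_requirements(text: str) -> list:
    flagged = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith(("#", "-")):
            continue  # blank lines, comments, and options like --index-url
        if not PINNED.match(line):
            flagged.append(line)
    return flagged
```

Running something like this in CI is one cheap way to make "upgrade when I choose to" the default rather than "whatever is newest today".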

55:19 We're getting pretty short on time here, Dustin. What else can we cover? Maybe pip-audit or Scorecard, or what do you want to focus on for a couple of minutes here?

55:29 Let me just quickly say that for Sigstore, we built a Python client, so you can install sigstore and then sign, verify, do whatever from there. And I'm super excited to say that for the upcoming Python 3.11 release (the releases are usually signed with GPG), we're going to start signing with Sigstore as well. The release manager, Pablo, is going to sign it.

55:48 Can it be signed with two things at once?

55:50 It can, yeah, multiple people can sign it. We're just going to have the release manager sign it. But yeah, that's super exciting. So another area that we've been working on a lot lately is vulnerability auditing or remediation. We have a new tool called pip-audit. It's not part of pip right now, and there's some discussion about whether it should be or not, but essentially this is a tool that allows you to audit your local environment, your Docker container, your requirements file, whatever, for known vulnerabilities. Not unknown vulnerabilities, but stuff that's been reported and is known. So it'll tell you if it finds a CVE or something like that. It also uses a Python-specific advisory database that we built, which pairs with the Open Source Vulnerabilities (OSV) service, and it works pretty well. I'm pretty pleased with it. I would encourage everyone to just run it on their machine and see what vulnerabilities you have lurking about right now, but also to integrate it into your CI pipeline. Run an audit, and make sure that your application is not going to have a vulnerability introduced. It pairs really nicely with things like Dependabot.

56:50 There's a bunch of other stuff, like work on SLSA. If you're not familiar with SLSA, it's essentially a framework for thinking about how secure your build pipeline is when you're producing and publishing artifacts. So if you're a maintainer, you might think about whether tampering is possible, that kind of thing. It's a way to think about how good a job your build pipeline is doing in that regard.

57:10 Yeah, build pipelines are a little scary. They have huge value, but you could also sneak stuff in through them without even changing the code of the original repo, and all sorts of stuff. I saw this is managed in collaboration with Trail of Bits, and also you're a maintainer.

57:26 What's the origin story of pip-audit?

57:28 Yeah. So Trail of Bits is a security consultancy. They've done a lot of work in the Python space and the software security space for a long time now. Folks like William Woodruff were actually involved, way back when, in implementing two-factor and some other stuff on PyPI. So my team, the open source security team at Google, hired them as contractors to do some of this work: maintenance, building these open source projects, that kind of thing. So you'll see William and Alex and some other folks all over these projects, because they've been working really hard to make them useful, work really well, and be really secure.
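pip-audit does this properly. The sketch below only shows the shape of the underlying idea: enumerate what's installed with `importlib.metadata` and build one query per package in the JSON format the OSV API's `/v1/query` endpoint accepts. It doesn't send anything or interpret results, and `build_osv_queries` is our own hypothetical name.

```python
from importlib.metadata import distributions

# Each payload below would be POSTed here; this sketch stops short of sending.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_queries() -> list:
    """One OSV query payload per installed distribution in this environment."""
    queries = []
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip broken installs that have no metadata
        queries.append({
            "version": dist.version,
            "package": {"name": name, "ecosystem": "PyPI"},
        })
    return queries
```

The "known vulnerabilities only" caveat from the conversation applies directly: this can only match versions that already appear in an advisory.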

58:00 Yeah. Fantastic. Let's maybe round things out with the various PEPs.

58:06 Oh, yeah.

58:06 Maybe the API. Let's go through the PEPs real quick, and then we'll probably have covered enough.

58:10 Yes, I stuck my name on a couple of PEPs recently.

58:12 Or as many as we have time for.

58:13 Yeah, I think for most of these I've provided some minimal input. I can't claim that I authored them myself, but what I'm super excited about is PEP 621. This is a way to do essentially static metadata for Python packages. This includes source distributions, which means that you don't have to use setup.py anymore. And you shouldn't need setup.py anymore: it's essentially arbitrary code execution at install time, which is super scary and should not happen.

58:42 Pair that thought with this resident queens for bomb making.

58:47 If you want to have that person running arbitrary code on your machine, probably not at install time.

58:53 Yeah, exactly. Right before I joined this, I actually saw a tweet that setuptools has full support for this. There's a bunch of folks working really hard on it. So yeah, essentially you don't need to use setup.py anymore. It's really nice. And a lot of tools have converged on pyproject.toml as the standard for metadata and configuration, so it's nice to see that convergence. Yeah, it's great. Shout out to Brett, who I think mostly led this PEP. He did an amazing job.

59:22 Very cool.

59:23 PEP 691 is exciting as well. PyPI has a couple of different APIs. Most of them are not standardized. One that is standardized is the simple API, which is essentially just an HTML page that tools like pip use.

59:35 It's kind of insane, but if you go to pypi.org/simple...

59:38 Are you really going to do it? Yeah. This is going to blow up your browser for sure. Because essentially, yeah. Tools don't actually use this page; they use the individual pages for these projects. But tools like pip essentially have to parse HTML to interact with PyPI, and it's not great. It used to work okay; it doesn't scale well now. So we're in the process of standardizing a lot of our JSON APIs, and this is one of them. Donald led this PEP, with some input from Pradyun and then Cooper and myself. It's essentially the same data, the same API, the same files, everything that pip needs, but it's not HTML. It's just JSON, so tools can use the standard library JSON parser to request it, get it, and do the stuff that pip needs to do to be pip.
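The contrast between the two formats of the simple API can be sketched with the standard library alone. The HTML form (PEP 503) has to be scraped for `<a>` tags, while the JSON form (PEP 691) is requested with the `application/vnd.pypi.simple.v1+json` `Accept` header and read with `json.loads`. The sample pages below are hand-written, hypothetical responses trimmed to a single file, and `simple_request` only builds the request rather than sending it.

```python
from __future__ import annotations

import json
import urllib.request
from html.parser import HTMLParser

# Content type PEP 691 defines for version 1 of the JSON Simple API.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


class LinkCollector(HTMLParser):
    """Collect (text, href) for every <a> tag on a PEP 503 HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def filenames_from_html(page: str) -> list[str]:
    """Scrape file names out of the HTML form of a project page."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return [text for text, _ in parser.links]


def filenames_from_json(body: str) -> list[str]:
    """Read file names from the JSON form of a project page."""
    return [f["filename"] for f in json.loads(body)["files"]]


def simple_request(project: str, index: str = "https://pypi.org/simple/") -> urllib.request.Request:
    """Build (but do not send) a request that asks the index for the JSON form."""
    return urllib.request.Request(f"{index}{project}/", headers={"Accept": SIMPLE_JSON})


# Hypothetical responses for a project named "demo".
HTML_PAGE = (
    "<html><body>"
    '<a href="https://files.example.org/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>'
    "</body></html>"
)
JSON_PAGE = json.dumps({
    "meta": {"api-version": "1.0"},
    "name": "demo",
    "files": [{
        "filename": "demo-1.0.tar.gz",
        "url": "https://files.example.org/demo-1.0.tar.gz",
        "hashes": {"sha256": "abc123"},
    }],
})
```

Both functions recover the same file list, but the JSON path needs no HTML scraping, and fields such as `hashes` arrive as structured data instead of a URL fragment.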

01:00:16 Yeah, probably make it a little more efficient, easier for other people to consume, right?

01:00:20 Yeah.

01:00:20 Is that something where you encourage other applications to go mess with the PyPI APIs, or is it public but you'd rather people don't mess with it? What are your thoughts there?

01:00:31 The stuff that we have standardized, you can definitely depend on, right? It will continue to exist, and by standardizing it, we've said this is what you should expect; unless we change the standard, it's what it's going to continue to do. We have other APIs that existed before we were standardizing stuff. There's a legacy JSON API, and there's this XML-RPC API that's such a nightmare to maintain. We've kept them running just because a lot of people use them. For example, Poetry uses our unofficial JSON API. And yes, there are times when we need to change things there because it isn't scaling, or there are trade-offs we need to make, and it's like, well, we're probably going to have to break someone in order to keep this afloat. But the standard APIs we've spent a lot of time designing, planning, and standardizing. Those are definitely 100% cool to integrate against.

01:01:18 Okay, fantastic. Traditionally on these PEPs, you'll see it's accepted and planned for some version of Python. This one doesn't have that, right? Because it just goes against the web app.

01:01:28 This is a packaging PEP, so we use the same process Python uses for its enhancements. But this is about packaging, so it isn't necessarily tied to an individual Python release.

01:01:38 You don't ship it in a binary that people get, right?

01:01:41 Well, in a way, we do ship support for it in a binary, and it's implemented on the PyPI side. Now pip uses this.

01:01:48 Now, if I have the latest pip on my machine and I pip install something or do other pip actions, does it hit the old simple API or the new one?

01:01:56 If I'm remembering correctly, support has been added to pip. I think it's been released, yeah, I'm pretty sure. I could be wrong; I'm not a pip maintainer.

01:02:05 Okay, cool. And we have one more to touch on.

01:02:09 You only really care about this if you're really working with PyPI. But as I said, we're standardizing all our APIs, so there's going to be a new upload API. The existing API has a lot of problems. It's fairly old. It's essentially just a big POST request with metadata in it. So this should be a bit better, and it also enables things like draft releases, where you can publish something to PyPI in a draft state. You can review it and practice installing it before you actually publish it, and while it's in draft, you're allowed to overwrite it.
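The draft-release behavior described here can be modeled as a tiny state machine. This is a hypothetical illustration only, not the proposed upload API: while a release is a draft, a file may be replaced; once published, an existing filename can never be overwritten, which matches PyPI's rule that a filename is never reused, while new files can still be added.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ReleaseError(Exception):
    """Raised when an upload or publish is not allowed."""


@dataclass
class Release:
    """Hypothetical model of a release that is editable while it is a draft."""

    version: str
    files: dict[str, bytes] = field(default_factory=dict)
    published: bool = False

    def upload(self, filename: str, content: bytes) -> None:
        if self.published and filename in self.files:
            # Once published, an existing filename is never replaced.
            raise ReleaseError(f"{filename} is already published for {self.version}")
        # Drafts may overwrite a file; published releases may still gain new files.
        self.files[filename] = content

    def publish(self) -> None:
        if not self.files:
            raise ReleaseError("refusing to publish a release with no files")
        self.published = True


release = Release("1.0")
release.upload("demo-1.0.tar.gz", b"first try")
release.upload("demo-1.0.tar.gz", b"fixed")  # allowed: still a draft
release.publish()
```

The point of the draft state is visible in the last three lines: the mistake in the first upload is fixed before anything becomes permanent.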

01:02:38 Do things that we don't fix a problem.

01:02:41 Yeah, okay, exactly.

01:02:42 Is this an alternative, a replacement, or just another safety net compared to, say, TestPyPI versus production PyPI?

01:02:50 You mean like the draft stuff? Yeah, I think it would be the...

01:02:53 Preferred? Would stuff automatically upload as a draft, and you'd have to flip it, so that you don't accidentally publish something that's not ready because you forgot to use TestPyPI?

01:03:02 Yeah, TestPyPI is kind of weird, because it actually existed to be our test environment for PyPI, since we didn't have a great test suite with the old PyPI, and now it sort of hangs around as a playground or sandbox. We don't care. But yeah, some people do have it in their production release flow: upload here first, make sure that it works, and then... So I think this will be a better fit for that use case and will consolidate folks onto using PyPI for everything. And we can eventually shut down TestPyPI, because it's not as useful as it could be.

01:03:32 All right, well, I think that's probably all the time we've got to talk about this. We could go on and on; we only touched on some of the stuff we were thinking about talking about. But before we're done, let me ask the final two questions. If you're going to write some code, what editor do you use?

01:03:46 I do almost everything in Vim, and have since I started doing any kind of development work. That said, I try to do as much as I can in the browser, in the GitHub UI. I hit speed bumps there sometimes and can't be quite as fast, but for little stuff I like seeing what I can do there.

01:04:04 Yeah, right on. Do you ever press the dot on GitHub?

01:04:07 Yeah, all the time. Love the dot.

01:04:09 Yes. The dot converts it to a hosted VS Code, basically. And then, a package you want to give a shout-out to?

01:04:15 My biased answer to this is: check out the sigstore package on PyPI and the pip-audit package. Those are what I've been working on and what my team has been working on. I'm really proud of the way that they work, and we're going to be integrating them into more use cases and more patterns. Check them out, try them out. I'd love to get feedback on them.

01:04:34 Yeah, fantastic. They seem to fill a super important hole, backfilling some of the security and supply chain stability and so on.

01:04:42 And then my unbiased answer: I did want to give a shout-out to the pip-tools project on PyPI, which is maintained by the Jazzband team. It's a roving, revolving door of maintainers, but they do a good job keeping it up and running. It satisfies a really important use case that I think a lot of people don't handle with their Python dependencies: it allows you to compile your dependencies into a requirements file that has all the versions pinned, all the sub-dependencies, and their hashes. Please use pip's hash checking and hash all your dependencies. It definitely protects you against another whole class of attacks. But yeah, pip-tools is great for that. I'd love a future where that was part of pip. I don't know if that's going to happen or not; it's up to the maintainers. But yeah, it's super cool.
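The hash checking mentioned here boils down to comparing an archive's digest with the hashes pinned next to the requirement. `pip-compile --generate-hashes` writes those `--hash=sha256:...` options, and `pip install --require-hashes -r requirements.txt` enforces them. The sketch below reimplements only the comparison, with a hypothetical requirement entry and archive bytes, to show why a tampered download is rejected.

```python
from __future__ import annotations

import hashlib
import re

# Matches each "--hash=sha256:<64 hex chars>" option, whether the entry is on one
# line or continued across lines with backslashes as pip-compile writes it.
HASH_OPTION = re.compile(r"--hash=sha256:([0-9a-f]{64})")


def pinned_hashes(requirement_entry: str) -> set[str]:
    """Collect every SHA-256 hash pinned for one requirement entry."""
    return set(HASH_OPTION.findall(requirement_entry))


def archive_is_allowed(archive: bytes, requirement_entry: str) -> bool:
    """Accept the archive only if its digest matches one of the pinned hashes."""
    return hashlib.sha256(archive).hexdigest() in pinned_hashes(requirement_entry)


# Hypothetical archive bytes and requirement entry.
archive = b"pretend these are the bytes of demo-1.0.tar.gz"
entry = "demo==1.0 \\\n    --hash=sha256:" + hashlib.sha256(archive).hexdigest()
```

Pinning the version alone would still accept a swapped-out file with the same name; pinning the hash means only the exact bytes that were reviewed can be installed.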

01:05:24 I have switched to using pip-tools for all of my packages and I love it. It's fantastic.

01:05:29 Yeah, me too. We use it for PyPI, and we use it for a bunch of other stuff.

01:05:32 All right, final call to action. People want to get more involved, maybe make their things more secure?

01:05:37 I'd say the call to action is: go to pypi.org/security-key-giveaway and see if you're eligible, as a critical maintainer, for security keys. If not, please just turn on 2FA anyway while you're there as a maintainer. I'd really appreciate that. Keep your eye on the security space. There's a lot of interesting stuff happening, a lot of focus, and a lot of resources going into it right now. It's a good time to try to adopt some additional security to protect yourself, your users, everything.

01:06:02 Yes, I definitely second that. Dustin, thanks so much for coming and sharing all this. It's been great to talk about it and get insight into some of the ideas and thoughts behind all these decisions.

01:06:13 Michael, it's always really great to talk to you. So glad to be here.

01:06:16 You too. Thanks again. See ya.

01:06:17 See ya.

01:06:19 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show.

01:06:27 Listen to an episode of Compiler, an original podcast from Red Hat. Compiler unravels industry topics, trends, and things you've always wanted to know about tech through interviews with the people who know it best. Subscribe today by following talkpython.fm/compiler. You care about the ideas behind technology, not just the tech itself, and you know that tech has an enormous influence on society. So check out the IRL podcast. It's hosted by Bridget Todd, and this season of IRL looks at AI in real life. Listen to an episode at talkpython.fm/irl. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:07:33 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
