
#435: PyPI Security Transcript

Recorded on Monday, Sep 18, 2023.

00:00 Do you worry about your developer data science supply chain safety?

00:03 All the packages for the Python ecosystem are much of what makes Python awesome.

00:09 But they are also a bit of an open door to your code and machine.

00:13 Luckily, the PSF is taking this seriously and hired Mike Fiedler as the full-time PyPI Safety and Security Engineer, not to be confused with the Security Developer in Residence role filled by Seth Michael Larson.

00:26 Mike Fiedler is here to give us the state of PyPI security and their plans for the future.

00:32 This is Talk Python to Me, episode 435, recorded September 18th, 2023.

00:51 Welcome to Talk Python to Me, a weekly podcast on Python.

00:54 This is your host, Michael Kennedy.

00:56 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org.

01:04 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:09 We've started streaming most of our episodes live on YouTube.

01:13 Subscribe to our YouTube channel over at talkpython.fm/YouTube to get notified about upcoming shows and be part of that episode.

01:21 This episode is sponsored by Sentry.

01:23 Don't let those errors go unnoticed.

01:25 Use Sentry.

01:26 Get started at talkpython.fm/Sentry.

01:30 And it's also brought to you by us over at Talk Python Training.

01:34 Did you know that we have over 250 hours of Python courses?

01:38 Yeah, that's right.

01:39 Check them out at talkpython.fm/courses.

01:42 Hey, Mike.

01:43 Hey, Michael.

01:45 Welcome to Talk Python to Me.

01:46 It's awesome to have you here.

01:47 Thanks for having me.

01:48 I'm really excited to be here.

01:49 Yeah, I'm excited to have you.

01:52 It's interesting to talk about security.

01:53 I got to tell you, talking about security just makes me nervous.

01:58 Why is that?

01:59 Well, two reasons.

02:00 I feel like when you talk about security, you're kind of sticking your head up and people are like, let me see if I could whack that.

02:05 You know, not everybody, but a few people in the world, right?

02:08 But it is the internet.

02:09 So if you take a very, very small percentage and multiply it by billions, it becomes non-zero.

02:16 And then, you know, it's just one of those things.

02:18 It's like trying to prove the absence of something.

02:22 It's very hard to prove that you're not missing stuff, some step.

02:27 It's very hard to prove that you've got all the controls and there's not one control you forgot.

02:32 Right.

02:33 In that regard, probably more so.

02:36 It's pretty tricky.

02:37 The way I've often thought about security is it's a spectrum, right?

02:41 I used this quote a million years ago.

02:42 I don't know who said it first, but the most secure computer is powered off and buried under six feet of concrete, right?

02:50 But it's useless, right?

02:53 It's very secure, but nothing in there is useful.

02:56 So if we take that as like a crazy extreme of secure and say the most insecure computer is, you know, powered on, has zero password control, connected to the internet and auto publishing IP data, so that way anyone can come and do whatever they want.

03:10 All right.

03:11 So that's the other end of the spectrum.

03:13 That's a really bad situation.

03:15 There's a fine balance that every software application, system, and company has to navigate to figure out where they fall between those two crazy extremes and where their risk tolerance thresholds are.

03:32 Like what would it cost me to add more security?

03:34 Well, I could, you know, lock down all of my users and not allow them in unless they come to the front door and show a picture ID, right?

03:43 Like, okay, if that's how we want to secure our building, that's one way to do it, but that'll slow down the ingress to our building.

03:51 So we issue our employees badge cards, and we assume that they act in good faith, that they don't lose them, and that they report it if they do.

03:59 Oh, great.

04:00 So that's kind of a middle ground where you delegate some of the security to the individuals. You have to figure out where your security is and what you're willing to do and sacrifice in order to get it.

04:15 Yeah, I totally agree.

04:17 Wild sidebar.

04:18 I can't believe the internet in its early days was like you described, like no NAT firewalls that stop direct access, no passwords.

04:29 We might want to know who you are just so we can assign the files more conveniently to you, you know?

04:34 Yeah.

04:35 I hearken back to like the bulletin board days where you would dial up into somebody's random computer and you would do stuff in there.

04:41 And I hosted a BBS and I interacted with others and it was like, we were all generally operating in good faith because we wanted to kind of play together.

04:51 And not until much later did bad actors start saying, you know what, I see how I could take advantage of this in a way that suits me and not you.

05:01 To which we started to say, all right, well then how do we control for these things?

05:05 Today that conversation comes into modern systems development of secure by design, right?

05:11 Or a lot of folks will say shift left, right?

05:15 Take security into account much earlier into the life cycle as opposed to, oh, we have to tack this on at the end.

05:22 So I think the evolution of the internet was necessary for us to get to here.

05:29 But as we're seeing newer protocols develop, those are taking this more secure by design approach.

05:35 Yeah.

05:36 In depth and with layers.

05:37 Were you a Trade Wars fan?

05:39 Oh man, that's a name I have not heard in a very long time.

05:43 That was a good one though.

05:44 Yeah.

05:45 I was very much a news and mail kind of relay kind of kid.

05:49 Just wanted to see what was going on.

05:51 Got very much involved in understanding how Pretty Good Privacy (PGP) would allow you to sign your messages so that other folks could believe that they were from you.

06:02 Kind of like attesting to the truth, and that kind of fell apart because, again, these are all imperfect systems.

06:08 They were, but it was such a world full of possibilities back in those days.

06:13 I remember even just sending mail and getting it back through that whole system of relays was mind-boggling.

06:20 At the time I was living on top of a mountain in the middle of Israel, and I had the ability to connect with a variety of people I was never going to meet otherwise.

06:35 Like, oh, this opened up the world.

06:37 Right.

06:38 Yeah.

06:39 And that kind of fueled my desire to like, okay, what else can I do with these computers with these systems?

06:44 And, oh wait, there's this internet thing.

06:46 All right.

06:47 Well, my mom's going to be ticked off because I'm tying up the phone line for hours and like, all right, well, let's just have some fun.

06:54 Yeah.

06:55 That's when call waiting was the nemesis.

06:57 So I focus on that a little bit because we're going to talk about things that are not necessarily positive: people trying to do negative things to something that we all love and that has been a very positive thing for the Python ecosystem.

07:12 I do want to point out mostly technology is doing really awesome things for people like opening these doors and educating and connecting.

07:20 It's just that some bad people like to connect in bad ways.

07:23 So before we get too far down that path, let's just have you give people a quick introduction about yourself.

07:28 So they all know you.

07:29 Hey everyone.

07:30 I'm Mike Fiedler.

07:31 I'm in New York City, and that's where I've been living for the last 15 years, I think.

07:37 And I've been working in software development systems engineering for over 30 years across a couple of continents, variety of different companies.

07:45 And for the past two or three years, I think, I've been an active contributor to pypi.org.

07:53 Prior to that, I was contributing to a lot of Ruby projects in the Chef ecosystem.

07:59 And I've worked at a variety of different companies, both startups and enterprises.

08:04 You may have heard of some like Datadog, Warby Parker, MongoDB, Capital One, just kind of like working through different scenarios and learning different industries along the way.

08:15 For the past year, I've been, well, since January, I've been focusing pretty much purely on pypi.org.

08:22 You work for the PSF officially or what's the story?

08:25 Yeah.

08:26 As of August, I was hired to come on full-time.

08:28 That's thanks to grants from Amazon Web Services (AWS) and some other folks who are chipping in to fund this PyPI safety and security role.

08:38 The PSF got some funding, and I am the first full-time engineer to focus on pypi.org.

08:47 In the past, you've spoken to some other folks who were contracted out to build out different aspects or features, but now I'm a full-time maintainer.

08:56 Yeah, that's really cool.

08:58 You know, Łukasz Langa is the Developer in Residence, working in that role now.

09:04 I feel like that was the first one of these types of roles, but now there's a couple, right?

09:08 Yeah.

09:09 I mean, the PSF is a nonprofit organization, very small staff.

09:13 I think we number a total of 12 and of those 12, I think only about five of us are engineers and everything else is volunteer based.

09:23 The first Developer in Residence program, which is Łukasz, has been successful enough that we got another organization and grants to fund the Security Developer in Residence, which Seth Larson is doing.

09:36 And he is kind of focusing on the wider Python ecosystem as a whole.

09:41 Whereas my role is very much more narrowly focused on PyPI.org and the ecosystem surrounding that.

09:49 So that way, you know, we can focus on specific targets around security for the packaging world as opposed to the Python core.

09:59 Okay.

10:00 Well, I do believe if you talk to people about why they like Python and especially why they stick with Python, the language is good.

10:07 You can do cool stuff with it, but it's pip install plus the name of your useful library that just brings so much and makes it so sticky and useful and productive.

10:22 And so making sure that we have trust in pip install is really important.

10:26 Last year, I think Dustin Ingram came on and talked about some of the stats that he had pulled together that speak about like how much PyPI.org is used.

10:38 That doesn't even count the countless folks out there who are mirroring PyPI packages.

10:44 So that way they can have a local cache, you know, deal with corporate firewalls or whatever need, right.

10:50 But it's true.

10:52 There's the very popular requests library or the Django project.

10:56 pip install Django, and you have all the things that you need to start a Django project.

11:01 Right.

11:02 And the tooling that folks have built, like pip and some of the other alternatives out there, to let users get those packages quickly is such a wonderful tool in anyone's toolbox.

11:18 But very often folks forget that there is an entire package universe behind what they just did as a consumer.

11:27 Right.

11:28 So pip install Django is, yeah, I got this thing.

11:31 It installed it.

11:32 Where did it install it from?

11:33 How did it get there?

11:34 Who put it up there?

11:35 Why is it there?

11:36 All of those questions, most people go their entire career without even having to worry or think about.

11:41 They're just like on the consumer side.

11:43 But then on the producer side, for the package maintainer or project maintainer, there's a whole other slew of things that one has to worry about.

11:52 Yeah.

11:53 There's some stuff we'll talk about in there, which will be really fun.

11:55 I think also there's the third level of just the people who run PyPI and the infrastructure and the stats behind it.

12:02 I mean, maybe give us a quick, I kind of started us off down this path.

12:06 Maybe give us a quick statement for those who don't necessarily know what PyPI is, but, I think more interestingly, maybe try to give us some of the stats about the scale of things behind the scenes.

12:15 Sure.

12:16 I mean, I haven't computed the runtime stats in a little bit, but PyPI, at pypi.org, stands for the Python Package Index.

12:23 And it's distinct from other things that have PyPy in their name, which is a different runtime.

12:31 But pypi.org is a package index, very much kind of a grocery or a store where you would pick up ingredients for the thing that you want to bake, right?

12:42 If you wanted to bake a cake, you need your ingredients.

12:44 What kind of flour are you going to use?

12:46 What kind of sugar?

12:47 Sure.

12:48 There's different kinds of flour and sugar.

12:49 Which one do you want?

12:50 How do you know? You go and find one, and where the package index helps is that we store and publish all the different kinds of flour and sugar you might want that other people have spent time developing.

13:04 That doesn't mean that there's only one type of flour, but there's a variety and we just make it easy for people to publish their projects.

13:13 And as you've highlighted, there are over 480,000 projects live on PyPI right now and over 4.8, almost 4.9, million releases.

13:25 And releases don't map one-to-one to projects.

13:27 A project may have many releases.

13:29 So for instance, if there's the requests library and they publish a new version that comes as a release.

13:37 And then beyond that we have files, and files map to releases, since a release can have a source distribution.

13:43 That's literally the source code of a given release, or you could have compiled wheels for different platforms.

13:51 So there's a lot more files than there are releases and there's a lot more releases than there are projects.
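The project, release, and file relationship described here can be sketched as a tiny data model showing why the counts fan out. This is a minimal illustration, not Warehouse's actual schema; the class names, field names, the "sdist"/"wheel" labels, and the example project and filenames are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    """One uploaded artifact, such as a source distribution or a wheel (illustrative)."""
    filename: str
    kind: str  # assumed labels: "sdist" or "wheel"


@dataclass
class Release:
    """A single version of a project; one release can carry many files."""
    version: str
    files: list[File] = field(default_factory=list)


@dataclass
class Project:
    """A named project; one project can have many releases."""
    name: str
    releases: list[Release] = field(default_factory=list)


def count_entities(projects: list[Project]) -> tuple[int, int, int]:
    """Return (projects, releases, files) to show how the numbers grow at each level."""
    releases = [r for p in projects for r in p.releases]
    files = [f for r in releases for f in r.files]
    return len(projects), len(releases), len(files)


# Hypothetical example: one project, two releases, each with an sdist and a wheel.
example = Project(
    name="example-project",
    releases=[
        Release("1.0.0", [File("example_project-1.0.0.tar.gz", "sdist"),
                          File("example_project-1.0.0-py3-none-any.whl", "wheel")]),
        Release("1.1.0", [File("example_project-1.1.0.tar.gz", "sdist"),
                          File("example_project-1.1.0-py3-none-any.whl", "wheel")]),
    ],
)

print(count_entities([example]))  # (1, 2, 4)
```

Even this toy project has twice as many files as releases; projects that ship platform-specific wheels multiply the file count much further.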

13:57 And then on the, like the last stat that we show on the front page is the users.

14:02 We do have over 740,000 users on pypi.org.

14:07 That doesn't mean that these are active users, but they have at some point signed up for an account on pypi.org.

14:13 That's a huge number.

14:14 And these are not people who might pip install a thing.

14:16 These are people who for some reason or other are interested in potentially creating content for others to use.

14:23 Exactly.

14:24 Today, the only way you can publish a project on PyPI is by having a user or, you know, it starts with a user.

14:31 There's other ways to publish, but you have to have a user to kind of start the process.

14:37 And a lot of folks have started to kind of get the idea that if this project needs long-term maintainership, right, it's not just me.

14:47 Maybe I should ask somebody else to help co-maintain this.

14:50 So it's also not a one-to-one mapping of users to projects or releases or something like that.

14:55 For sure.

14:58 This portion of Talk Python to Me is brought to you by Sentry.

15:01 You know Sentry for their error tracking service.

15:03 But did you know you can take that all the way through your multi-tiered and distributed app with their distributed tracing feature?

15:10 Distributed tracing is a debugging technique that involves tracking requests through your system starting from the very beginning, like a user action, all the way to the back end, database, and third-party services.

15:21 This can help you identify if the cause of an error in one project is due to the error in another.

15:26 Every system can benefit from distributed tracing, but it's especially useful for microservices.

15:32 In this architecture, logs won't give you the full picture, so you can't debug every request in full just by reading the logs.

15:39 Distributed tracing with a platform like Sentry gives you a visual overview about which services were called during the execution of certain requests.

15:47 Aside from debugging and visualizing your architecture, distributed tracing also helps you identify performance bottlenecks.

15:54 Through a visual like a Gantt chart, you can see if a particular span in your stack took longer than expected and how it could be causing slowdowns in other parts of your app.

16:03 Learn more and see some examples in the tracing section at docs.sentry.io to take advantage of all the features of the Sentry platform.

16:10 Just create your free account.

16:12 And for all of you Talk Python listeners, use the code TALKPYTHON, all one word, and you'll activate a free month of their premium paid features.

16:21 Get started today at talkpython.fm/sentry-trace.

16:25 That link is in your podcast player show notes and the episode page.

16:29 Thank you to Sentry for supporting Talk Python to Me.

16:33 Some of the changes coming, I think, allow for like, almost like a GitHub organization within PyPI, right?

16:41 Rather than, well, we're going to create an account and that one account is for all of AWS, for example, which is not really the right granularity, probably.

16:49 It definitely isn't, but it historically has been, right?

16:51 Like that is just a feature we had never built.

16:54 It was never a focus, but over the past year or so, I think we got funded to build out some of the organizations aspect.

17:04 We have launched the community organizations.

17:08 So if you're running an open source project or an ecosystem, you can sign up today and get an organization name.

17:17 We are still working through a long backlog of organizations in order to approve them.

17:21 It still requires an admin to do so. We are also still working through some of the complexities around corporate organizations: as a nonprofit, how can we figure out how to support corporations properly?

17:36 Yeah.

17:37 I've always thought that that was something of an opportunity to work with corporations more closely on PyPI and indirectly through the PSF.

17:47 Your role exists because of these grants, because of connections with certain high-profile tech companies that are heavy consumers of Python, right?

17:54 Like AWS and others.

17:56 But there are tons of companies with packages that support their products, or that at least their developers work with, and having a way to make them feel more at home on PyPI is, I think, a good idea.

18:09 Beyond that, what lots of organizations may do is, you know, have some of their in-house engineers contribute to PyPI.org, to the Warehouse code base.

18:19 It's open source.

18:20 Everything you're looking at is open source.

18:22 That's where I started.

18:24 And that's the easiest way of like, oh, you want this thing?

18:28 Open an issue, talk about it with us.

18:29 You know, if you want to go ahead and put some effort behind it, we'll welcome that too.

18:34 But there is a wiki page out there of fundable packaging improvement projects, like, all right, if you're considering throwing some money at the problem, here are some things we've thought about and would love your assistance with.

18:48 And then there are other ways, like straight-up funding a role that can focus on a particular thing.

18:54 Excellent.

18:55 All right.

18:56 Let's talk about supply chain issues.

18:58 We were talking before we went live here about how probably the biggest side of security, or at least, from my perspective, what seems like a huge opportunity for people to do bad things, is to just upload malware in different ways.

19:15 Right?

19:16 And I don't want to talk about hacking PyPI.org itself or other stuff; I think that's probably quite well covered.

19:22 It's more about, can I trick somebody, through various ways, into installing something that they didn't intend to?

19:29 And that generally falls under the supply chain security side of things.

19:33 So I wanted to just point out three examples that show this is an industry-wide problem, not necessarily a PyPI problem, but there is a PyPI manifestation of it.

19:44 Right?

19:45 Yeah.

19:46 So just to lay the groundwork for folks who aren't familiar with supply chain attacks, the notion is that instead of an attacker trying to get onto your computer directly, they're going to go after something that they know with high probability is going to be on your computer, like SolarWinds, which was kind of an administrative tool.

20:07 Well, you know, many, many copies of SolarWinds were installed on servers, on computers.

20:12 That's part of the supply chain that it's not, I'm not going directly after you.

20:16 I'm going after something you consume.

20:18 Right.

20:20 And it can be very, very meta, right?

20:22 So one of the examples that I would say falls under that is this thing called XcodeGhost.

20:27 And so I believe this was primarily a problem in China, basically because there were a lot of App Store developers there who either weren't registered as Apple developers or, for whatever reason, maybe just a latency thing,

20:41 didn't go through the App Store or the developer portal to get their Xcode.

20:44 They just found like a local mirror.

20:47 What are those local mirrors?

20:49 What could go wrong?

20:50 I just, I'll just get it from, you know, this IP address instead of apple.com.

20:54 Right.

20:55 Yeah.

20:56 So what it did was it was a backdoored version of Xcode.

21:00 So they weren't even attacking the things that people were using.

21:03 They said, let's take over the developers' toolchain.

21:07 So whatever they happen to be building, we don't know what that is, but we'll install a virus into their app.

21:11 That app goes into the App Store, and then whoever installs that app will have it, right.

21:15 These things get very indirect.

21:17 This is kind of the challenge: until somebody surfaced this as an attack, right?

21:24 Nobody thought this was a problem.

21:25 This goes back to your earlier comment: how do you disprove the existence of a problem?

21:33 And a lot of it is just like, all right, we've got to think about every aspect that goes into producing a given piece of software.

21:42 But like the strongest answer here is don't download random stuff from people on the internet.

21:47 Right.

21:48 Like I'm sure that this one in particular had a good reason for having a local mirror, but if you're going to local mirror it, then who is the local mirror and what is there?

21:57 What are they doing?

21:58 Right.

21:59 What kind of attestation or assurances do you have that they haven't modified anything in the process?

22:05 It's very tricky because I might absolutely trust some company out there that's building a very popular app with, like, 10 million downloads.

22:13 Surely that's fine.

22:14 But one of their developers, or one of the consultants to one of their developers, may have, you know, gotten their tools from somewhere they shouldn't have.

22:22 And it's very hard from the outside to even know that that could be a problem.

22:26 So these things are tricky.

22:27 Yeah.

22:28 I mean, the good news is that there's a large number of security companies out there who, you know, make their bread and butter by scanning and looking for patterns that look, you know, sneaky, tricky, and they spend a lot of investigative time digging into these.

22:43 We get lots of reports from those types of folks, like, here, this is a new package.

22:50 It looks fishy and here's why.

22:52 And then we take action on those.

22:54 I hear you.

22:55 Typo squatting was a big issue for a while.

22:59 That's a form of supply chain attack.

23:01 Like here, with XcodeGhost, we're going to get people to use a fake or broken, bad Xcode that they think is fine.

23:10 Right.

23:11 Instead of trying to, say, take over Django the package and do something malicious to it, try to take over a misspelling of Django or, you know, whatever, right?

23:20 Some common misspelling of it, and upload that package.

23:24 And you could even embed Django.

23:26 Right.

23:27 And so it still functions.

23:29 I don't remember it being spelled this way, but it's working.

23:32 So it's gotta be fine.

23:33 Yeah.

23:34 Typo squatting is very much a prevalent problem.

23:37 Right.

23:38 And because like, I can't prevent you from making a typo.

23:40 Like, I literally can't. If you type in a misspelled Django, that's it, game over.

23:44 Right.

23:45 What I can do is look for, or receive reports, that a misspelled Django exists.

23:50 It looks like malware, so let's just take that down.

23:52 Let's not do that.

23:53 Right.
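Since a typo can't be prevented at the keyboard, lookalike names have to be caught after upload. One way a scanner might flag them is to compare a new name against a list of popular names with fuzzy string matching. The sketch below uses the standard library's `difflib`; it is an illustration under assumptions, not how PyPI actually screens uploads, and the popular-name list, the function name, and the 0.8 similarity cutoff are all made up.

```python
import difflib
import re

# Hypothetical list of popular project names; a real check would use a far larger list.
POPULAR = ["django", "requests", "numpy", "flask", "pandas"]


def normalize(name: str) -> str:
    """PEP 503-style normalization: runs of '-', '_' and '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def looks_like_typosquat(candidate: str, popular: list[str] = POPULAR,
                         cutoff: float = 0.8) -> list[str]:
    """Return popular names that `candidate` closely resembles without matching exactly.

    Similarity is difflib's ratio; the cutoff is an arbitrary assumption.
    """
    name = normalize(candidate)
    if name in popular:
        return []  # an exact (normalized) match is the popular project itself
    return difflib.get_close_matches(name, popular, n=3, cutoff=cutoff)


print(looks_like_typosquat("djanga"))  # ['django']
print(looks_like_typosquat("Django"))  # []
```

A check like this only raises a flag for a human to review; plenty of legitimate projects have names close to popular ones, which is why reports and investigation still matter.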

23:54 Yeah.

23:55 And I was going to say, when it comes to typo squatting, I was reminded of an article I remember reading about DNS record bit flipping, where some computers, some browsers, would not properly process a given bit in a memory register for a DNS record.

24:17 So this author figured out what those bit flips would be for popular DNS names, registered those DNS names and started just harvesting traffic and said, you know what, this is not anything you can do.

24:28 This is just how browsers and memory work.

24:31 And that was, I don't know, about six, seven years ago.

24:33 And I believe it's been fixed since, but it was like, yeah, sometimes there's just not anything that you did wrong.

24:39 It's that the ecosystem you're in is doing things in a way you don't expect, for something as nefarious as DNS bit flipping.
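The bit-flipping idea is easy to see in code: flip one bit in one character of a domain label and keep only the results that are still valid hostname characters. Those are the names an attacker would register. This is a minimal sketch for illustration; it only handles ASCII labels, considers a single flipped bit, and uses "pypi" purely as an example label.

```python
import string

# Characters allowed in a DNS label (names are case-insensitive, so lowercase only).
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitflip_variants(label: str) -> set[str]:
    """Return every label reachable by flipping exactly one bit in one character,
    keeping only results that are still syntactically valid DNS labels."""
    label = label.lower()
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in VALID_CHARS:
                continue  # case-only flips resolve to the same name; others are invalid
            candidate = label[:i] + flipped + label[i + 1:]
            # A DNS label may not start or end with a hyphen.
            if not candidate.startswith("-") and not candidate.endswith("-"):
                variants.add(candidate)
    return variants


print(sorted(bitflip_variants("pypi")))
```

The list is short enough that someone could register every variant of a popular name, which is what made the attack practical.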

24:49 Like this is where like having outbound firewalls can help a whole lot to say, don't allow traffic that I didn't initiate in some manner.

24:59 Or, have I initiated traffic to this address before?

25:03 Do you remember ZoneAlarm from the early 2000s?

25:07 Yes.

25:08 So this harkens back to a slightly less naive version of, I can't believe there were no passwords on the accounts, just on the open internet.

25:18 But with Windows 95 and 98, there were no firewalls.

25:22 And I was at a company that was based inside of a university, where we all got Ethernet and every computer that plugged in got its own IP address and all sorts of crazy stuff, but there were no firewalls.

25:36 And I remember when that thing came out, I thought, you know what, maybe I'm just gonna go around and put this on all the dev machines.

25:41 Like it's kind of insane that we have this incredibly insecure software just on the open internet.

25:47 And so I did. When I started, it used to say, do you want to let such and such a thing act as a server?

25:53 Do you want to let IIS or, you know, NGINX or this type of thing.

25:58 Sure, that can be a server.

25:59 Then the next pop-up was, do you want to let notepad.exe be a server?

26:03 I'm like, huh, that's not probably what it should be doing.

26:06 Yeah.

26:07 That doesn't sound right.

26:09 I said, no.

26:10 And then the next one, and the next one; the whole company's notepad.exe instances were acting as servers, and I'm like, this can't be good.

26:17 And it turned out something had infected it.

26:20 And until I put on one of those outbound firewalls, how do you know?

26:24 Right.

26:25 No one knew. There was no indication; we had, you know, super fast internet.

26:27 It wasn't like it was dragging it down.

26:29 I don't even remember what it was doing, but it was bad.
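The ZoneAlarm-style behavior described here, asking the user the first time a program wants to do something on the network and remembering the answer, can be modeled as a small decision table. This is a toy sketch for illustration only: real application firewalls enforce this inside the operating system's network stack, and the class name, method names, and program names other than notepad.exe are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PerProgramFirewall:
    """Toy model of an application firewall: each (program, action) pair is
    allowed or denied once the user decides; anything unseen triggers a prompt."""
    decisions: dict[tuple[str, str], bool] = field(default_factory=dict)

    def check(self, program: str, action: str) -> str:
        """Return "allow", "deny", or "prompt" for a program's network action."""
        decision = self.decisions.get((program, action))
        if decision is None:
            return "prompt"  # never seen before: ask the human at the keyboard
        return "allow" if decision else "deny"

    def record(self, program: str, action: str, allowed: bool) -> None:
        """Remember the user's answer so the same request is not prompted again."""
        self.decisions[(program, action)] = allowed


fw = PerProgramFirewall()
fw.record("webserver.exe", "act as server", allowed=True)
fw.record("notepad.exe", "act as server", allowed=False)
print(fw.check("webserver.exe", "act as server"))  # allow
print(fw.check("notepad.exe", "act as server"))    # deny
print(fw.check("editor.exe", "act as server"))     # prompt
```

The design pushes the hard decision to the person at the keyboard, which is exactly why the prompts only help when someone actually reads them.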

26:31 The number one thing that I think we can learn from all of those things is that awareness is the biggest part of security, because if folks aren't aware that downloading something from the internet could be a danger, then they're just going to download it and run it.

26:46 If some software you've had a previous version of working on your machine suddenly pops up and says, this has been modified,

26:56 are you sure you want to open it?

26:58 So many of us just click okay, without reading the dialogue.

27:01 It's like, well, wait, think about that for just a second, because you are the biggest kind of enabler and disabler of security, the human behind the keyboard, because you probably have some administrative rights on your computer that allows you to do some stuff.

27:17 And in the example with notepad.exe, I think today, if we were to try to do that on some popular developer environment like VS Code, well, VS Code does act as a server in a lot of cases.

27:30 So it's like, I don't know, should this work as an inbound server or not?

27:34 I don't know.

27:35 Maybe it's just part of the local language server that I need for autocomplete.

27:40 Or maybe it's not.

27:41 It's getting more subtle every day.

27:43 It is absolutely getting more subtle.

27:45 Even Zoom had like a local loopback web server thing, I think for a while.

27:49 So before we move off of this typo squatting part of the conversation, out in the audience, we've got a pretty decent question here.

27:56 What's the possibility of something like a verified badge for popular packages?

28:00 I mean, if Twitter can charge $8 a month.

28:02 No, I'm just kidding.

28:03 I don't think they're called Twitter anymore.

28:07 But the artist formerly known as Twitter.

28:11 The challenge there is what does verified mean?

28:14 This is something we've introduced some features around, which we'll talk about later on.

28:19 But this notion of verified is like, well, verified by whom?

28:22 Where does the level of trust come from?

28:24 Because a supply chain attack could happen for Django. So if you were to search for Django here on PyPI.org, all right, we've got Django, and on the second line, Django 4.2.5.

28:36 And if we were to enter there, like, how do we know?

28:39 This is a thing, right?

28:40 So I could add a badge here, but that doesn't give me any confidence that one of the Django folks, who are great people, didn't get compromised and suddenly have a new version pushed.

28:51 So verified: what does that mean, to whom, and why?

28:56 Because the last thing I want to do is give people a false sense of security when, honestly, you're downloading software from the internet.

29:05 If you don't have a process to vet that what you're installing does what it says, then you should probably look at that aspect: we vetted this version of Django.

29:15 We got these hashes, we got these releases, we pin this dependency.

29:19 We're happy with this.

29:21 And then when you upgrade, you kind of do a similar evaluation.
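Pinning a vetted release together with its hashes means any later download can be checked against the digest you recorded. pip supports this natively through its hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`). The sketch below only shows the underlying check using the standard library; the function names are illustrative, not part of pip.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream a file from disk and return its SHA-256 digest as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: str, expected_sha256: str) -> bool:
    """True if the file's digest equals the hash recorded when the release was vetted."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.strip().lower())
```

For example, calling `matches_pinned_hash` on a downloaded wheel with the hash you recorded would return False if a mirror or a compromised upload changed even one byte of the file.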

29:25 There's a bunch of projects out there, like PyUp and Safety and others, that will scan for and publish advisories.

29:32 There's also the PyPA advisory database for packages that we know have some problems with them.

29:41 So that way you can use other tools to audit what you have installed to see if you have something smelly.
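At its core, auditing an environment means listing what is installed and comparing it with known-bad versions. The sketch below reads installed distributions with `importlib.metadata`. The advisory data is a hypothetical stand-in and matches exact versions only, while real auditing tools such as pip-audit query live vulnerability data and handle version ranges.

```python
import re
from importlib.metadata import distributions

# Hypothetical advisory data: normalized project name -> affected versions.
ADVISORIES: dict[str, set[str]] = {
    "example-vulnerable-pkg": {"1.0.0", "1.0.1"},
}


def normalize(name: str) -> str:
    """PEP 503-style name normalization so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions() -> dict[str, str]:
    """Map each installed distribution's normalized name to its version."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def audit(installed: dict[str, str],
          advisories: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (name, version) pairs whose exact version appears in an advisory."""
    return sorted(
        (name, version)
        for name, version in installed.items()
        if version in advisories.get(name, set())
    )


print(audit(installed_versions(), ADVISORIES))
```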

29:47 But we are thinking about what it would look like to indicate that this release and these files of a given project have been published under, you know, more stringent, more secure methods.

30:00 Yeah.

30:01 I certainly see that a verified badge wouldn't prove that the Django devs hadn't been compromised. You know, somebody could have taken over their computer, swapped out Twine or Poetry or whatever they're using to upload the package, and done exactly what they did with XcodeGhost, basically.

30:17 Right.

30:18 Something equivalent to that.

30:19 So the last thing we want to do is give people a false sense of security, where they say, well, PyPI told me this was okay.

30:27 And then they find out it wasn't because then that looks really bad for us.

30:31 But on the flip side, we are looking at how do we provide mechanisms and measures to publishers to reduce the potential for the situations that you described to happen.

30:46 This portion of Talk Python to Me is brought to you by us over at Talk Python Training.

30:52 Let me tell you about one of our really popular courses.

30:55 HTMX + Flask: Modern Python Web Apps, Hold the JavaScript.

31:00 HTMX is one of the hottest properties in web development today.

31:04 And for good reason, you might even remember all the stuff we talked about with Carson Gross back on episode 321.

31:10 HTMX, along with the libraries and techniques we introduce in our new course, will have you writing the best Python web apps you've ever written: clean, fast, and interactive, all without that front-end overhead.

31:21 If you're a Python web developer that has wanted to build more dynamic interactive apps, but don't want to or can't write a significant portion of your app in rich front end JavaScript frameworks, you'll absolutely love HTMX.

31:34 Check it out over at talkpython.fm/HTMX or just click the link in your podcast player show notes.

31:39 All right, let me throw some ideas out to you, and tell me what you think.

31:46 So as I think about this, especially the very first news a couple of years ago, I can't remember exactly the time frame, but not very long ago, about the first malicious PyPI package: you know, NPM had been getting whacked on for a while, because JavaScript.

32:01 But you know, when it came to PyPI, I was like, okay, this seems to be a little more serious and more pervasive.

32:06 And they were often typosquatting types of issues, or people would introduce some package and say, here's a cool thing.

32:14 You should check it out.

32:15 And it's really a virus or one of those types of things.

32:17 And so one of my thoughts, one of the metrics I would like to apply to my local Python environment, maybe in the future, is: don't let me install packages that are too new, or don't let me install packages that have too few downloads.

32:36 And, like, give me a mechanism to say that I don't ever want to pip install something that has existed on PyPI for less than a week.

32:44 I don't ever want to be able to say pip install something.

32:46 And that thing has fewer than a thousand or 10,000, whatever, downloads. And it could say, nope, you can't install that.

32:53 It breaks your rules.

32:54 You could say, okay, no, I actually uploaded this.

32:56 I really need to. You know, you could do a pip install dash dash force, or some kind of override.

33:02 But by default, if I could just say, you know, it has to have at least 5,000 downloads, or I just don't want it.

33:07 I feel like at that point, somebody would have discovered, oh, you know what, this is actually using 100% CPU and crypto mining, or whatever it happens to be doing.

33:15 I don't want to be the first guinea pig in the world to discover this.
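Michael's "don't let me install anything too new" rule could look something like this. A sketch assuming you've already fetched a project's metadata from PyPI's JSON API (`https://pypi.org/pypi/<project>/json`) and that each file entry carries an `upload_time_iso_8601` field; the seven-day threshold is an arbitrary example. It deliberately ignores download counts, since those are easy to inflate.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def first_upload(project_json: dict) -> datetime | None:
    """Earliest upload time across all files of all releases in a PyPI JSON payload."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in project_json.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def old_enough(project_json: dict,
               min_age: timedelta = timedelta(days=7),
               now: datetime | None = None) -> bool:
    """True if the project's first file was uploaded at least `min_age` ago."""
    now = now or datetime.now(timezone.utc)
    first = first_upload(project_json)
    return first is not None and now - first >= min_age
```

A wrapper around pip could refuse to proceed when `old_enough` is False unless an explicit override is given; that wrapper is hypothetical, not an existing pip feature.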

33:18 What do you think about this idea?

33:19 The download count one is always an interesting one, right?

33:22 It's a topic that comes up a lot.

33:24 And like, I can tell you personally from experience that writing a little loop to increase download counts is super easy.

33:32 Interesting.

33:33 Okay.

33:34 Like, write a while True: pip install something, and you'll drive up download counts that will be meaningless in the grand scheme of things.

33:41 So you could say, well, maybe make it like, it's got to have, you know, a thousand distinct IP addresses, but then, you know, if you own a botnet, then you're good to go.

33:49 Okay.

33:50 Fair.

33:51 This becomes the, like the cat and mouse game of like, all right, well, what is something that is good?

33:55 Today we have a mechanism where we don't advertise new packages that have been there for, I think under a week to any kind of crawlers.

34:05 So any search engine crawlers.

34:07 So if you were going to like Google for Python, Django, and it was a brand new package, you wouldn't find it via Google because we wouldn't advertise that for indexing yet.

34:18 Right.

34:19 But after a week, like we do.

34:21 So that's one method that we have for preventing some of these like newer packages from getting widespread visibility because they, you know, everything is a webpage.

34:32 They are all subject to search engine optimization.

34:34 Somebody could craft their readme to, you know, be the best hit on Google and therefore they'll show up first.

34:41 And with all this crazy AI stuff, it's only getting easier.

34:44 Hey ChatGPT, I would like to create a page that is like the Django PyPI page, but I want it to rank highly for this.

34:52 Something that we are talking about internally is how we take packages that are brand new, based on some heuristic like a brand-new user, a brand-new version, or a version that differs enough from the previous versions, and put those in a holding or time-out zone. That gives our security research partners, who are really excellent at listening to the package feeds and running all their analysis on them, first crack.

35:21 Right.

35:22 And then when they see, okay, out of these hundred thousand packages that were published in the last 24 hours, 1% need to be addressed or reviewed by a human.

35:31 They can raise those red flags and then we can kind of apply the administrative action that is necessary in order to keep the users from getting too much of the bad stuff on their computers.

35:43 What about some kind of whitelist, or a check back to, like, Snyk or one of these other companies that you referenced just a moment ago?

35:52 Having like published allow lists, right?

35:54 These are very prevalent in large corporations that have very strong security policies and they have teams of folks that will maintain internal mirrors of a package index.

36:05 So they will disallow any pip install of anything unless you're using their package index.

36:11 And I think that is another tool in the security toolbox to have people who are that like security focused to say, we will only allow in the things that we have already tested to be true.

36:25 We vetted them and those kind of match our heuristics.
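A published allow list only works if names are compared consistently, since PyPI treats `Django`, `django`, and `DJANGO` as the same project. This sketch normalizes names the way PEP 503 specifies before checking them against a hypothetical `ALLOWED` set; in a real organization the enforcement usually lives in the internal package index rather than in a script like this.

```python
from __future__ import annotations

import re

# Hypothetical allow list maintained by a security team (illustrative only).
ALLOWED = {"Django", "requests", "pytest"}


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, and runs of -, _ and . become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def not_allowed(requested: list[str], allowed: set[str] = ALLOWED) -> list[str]:
    """Return the requested project names that are not on the allow list."""
    allowed_norm = {normalize(name) for name in allowed}
    return [name for name in requested if normalize(name) not in allowed_norm]
```

Note that `py_test` normalizes to `py-test`, which is a different project from `pytest`, so it is rejected.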

36:30 If you scroll down a little bit on the Django page, almost every sidebar to every one of these has these statistics.

36:38 This particular one shows GitHub statistics because this package has a GitHub URL, but there's also libraries.io, which is not affiliated with pypi.org.

36:47 They're just a really great service and you can search for packages of any shape, kind of any ecosystem, but they have a really good kind of ranking system.

36:57 Again, if it works for you, the crux of it, don't install garbage off the internet, right?

37:01 Check out what you're doing.

37:03 But by using something like libraries.io, which, I don't know why that didn't load.

37:07 Probably was just getting a virus.

37:08 I probably misspelled it.

37:09 Oh yeah.

37:10 Just kidding.

37:11 But they offer a nice set of stats around a given package.

37:17 So you can try and be a little bit more informed on your own.

37:21 The challenge there remains that nothing is going to tell you on libraries.io or PyPI if somebody has uploaded malicious software and this is a bad one.

37:33 The best we can do is once we know about it, we handle it.

37:35 Yeah.

37:36 I feel like PyPI has been pretty on top of it.

37:39 We try.

37:40 I published a blog earlier today where I pulled together a lot of analytics and stats from our inbound malware reporters and it's looking pretty good.

37:50 We handle over 80% of inbound reports in under 60 minutes.

37:55 In the article I go into the whys and wherefores of why timeliness and response time matter, because the longer something is out there, the more it can spread to other folks.

38:08 So we try to do it as quickly as possible, often in under five to 10 minutes, but we also have to do some investigation and confirm that the report is accurate.

38:18 We don't want false positives.

38:19 Most of our researchers don't give us false positives.

38:22 So shout out to all those folk, but it's hard and time consuming.

38:27 I remember one of the more recent PyPI supply chain issues, where somebody uploaded something bad, was attributed to, well, all these different APT and hacking groups have cutesy names. Like, the SolarWinds one was by something bear.

38:43 Hold on.

38:44 Which bear?

38:45 Cozy bear.

38:46 That's the kind of bear it was, which is really Russian state-actor hacking.

38:49 Right.

38:50 And one of the PyPI ones was North Korea.

38:53 I think they were doing crypto mining on computers, which seems like a real big waste if I have access to a server in a bank.

39:01 But anyway, it works for them.

39:03 It works for them, you know. But the reason I bring this up is that you all have a serious challenge: if you're up against state actors from a security perspective, that's not just script kiddies or some weird automation. You've got to be on top of your game.

39:21 Right.

39:22 This is again, where I think relying on our ecosystem of security partners is so important because they will corroborate intelligence that they've garnered from other ecosystems that are beyond PyPI and be able to identify these kinds of actors.

39:37 Me, I see kind of just a slice of what the universe has.

39:42 They're going to see a different slice, but broader in spectrum and not necessarily as focused on one particular ecosystem.

39:50 So working together, we can kind of do the best that we can for all the users out there.

39:56 Excellent.

39:57 So let's talk about typosquatting, which is serious, but also kind of the silliest, kind of not that big of a deal, because the recommendation could be, you know, actually use a requirements management system rather than typing pip install X, Y, and Z every time you create a new environment.

40:14 Like, the chances you might fat-finger that, versus pip install -r requirements.txt, or, you know, something with Poetry or whatever.

40:22 Right.

40:23 So that helps a lot, although it's not perfect.
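One cheap extra check against typosquatting is to flag requested names that are suspiciously close to popular ones. This sketch uses the standard library's `difflib` similarity matching; the `POPULAR` list and the 0.8 cutoff are illustrative assumptions, not a real popularity ranking or a tuned threshold, and close names are sometimes perfectly legitimate.

```python
import difflib

# Illustrative stand-in for a real list of popular project names.
POPULAR = ["django", "requests", "numpy", "pandas", "flask"]


def possible_typosquat(name, popular=POPULAR, cutoff=0.8):
    """Return the popular name that `name` closely resembles, or None."""
    name = name.lower()
    if name in popular:
        return None  # exact match: it's the real thing
    matches = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Usage: `possible_typosquat("reqests")` points back at `requests`, which is a good moment to double-check what you typed.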

40:25 The other one is more the XcodeGhost style.

40:28 Like, what if somebody were to take over one of the other systems? And you all have, over here, a new two-factor requirement for PyPI.

40:39 You want to talk about that?

40:40 Yeah, absolutely.

40:41 This also was covered on an earlier episode of Talk Python, where I think in 2022 we had announced that we were starting to ratchet down the amount of potential...

40:56 I think you got the wrong link there.

40:57 I do have the wrong link.

40:58 Keep going.

41:00 It's Dustin, Dustin Ingram's.

41:02 Yes, exactly.

41:03 I thought I pulled it up.

41:04 I put the other one twice.

41:05 There we go.

41:06 The 2FA story is largely, again, we talked about there's about 740,000 users, right?

41:11 These are the publishers of packages, right?

41:15 So if in our use case, we talked about Django devs, right?

41:19 And I'm sorry to pick on Django.

41:20 They're just the one that's up there.

41:21 But say one of the Django devs was hit by a classic problem, which is an email expiry or a domain expiry attack.

41:29 So let's say I'm a Django admin maintainer and I use miketheman.com as my email address, right?

41:37 And that's great.

41:38 Because we don't want to ever, we don't use Gmail.

41:39 We don't want to use or, you know, the dot me or Outlook.

41:43 I'm a good citizen of the internet.

41:45 So I got my own domain.

41:46 Yeah.

41:47 I just haven't been paying attention.

41:48 Right.

41:49 I haven't been paying attention this year.

41:50 Right.

41:51 And then let's say I let it expire.

41:52 Whoops.

41:53 You know, like that happens.

41:54 People forget to pay their bills.

41:56 Or your credit card gets stolen and canceled.

41:58 You forget to renew it there.

42:00 And then the other thing goes to spam.

42:01 Like it could actually be super easy that that happens.

42:04 And it happens all the time, right?

42:05 Like, there are numerous domains that I've registered over the years where I was like, yeah, I don't need that anymore.

42:11 Obviously I have never used anything from that domain to sign up for anything securely that's there.

42:16 But then someone else can come along and register miketheman.com, set up an email server, request a password reset, and get that email.

42:24 And now they can do anything I could have done before.

42:28 With 2FA, that entire set of problems goes away.

42:32 And we're not even talking about like phishing.

42:34 If somebody phishes my password, or if I made the mistake of using the same password on two websites and one website stored it insecurely, and they pop that in a breach, now they have my username and password.

42:48 2FA just solves that.

42:49 Do you discourage using the same username and password?

42:51 I absolutely discourage that.

42:53 I find it very inconvenient to have a separate password.

42:55 I just use the letter A. Yeah, that's a choice, right?

42:59 It's a bad choice.

43:00 No, like the amount of tooling out there today, both free and paid for password management is just so pervasive.

43:10 It's almost like irresponsible to not use one.

43:12 I 100% agree.

43:13 Yeah, I use 1Password.

43:15 I don't know if it'll tell me how many I have in here, but I think it's coming up on, like, 1,500. No, not quite, just under 1,000 different distinct passwords and accounts.

43:28 But you know, a lot of people don't want to pay for it.

43:30 Bitwarden.

43:31 Bitwarden is fantastic.

43:32 It's open source.

43:33 I don't know if you got a recommendation, but you're right.

43:35 It's irresponsible.

43:36 I mean, I'm a 1Password fan.

43:38 It's just a great tool.

43:39 I used it back when it was a single thing, and I used it as an organization account, right?

43:46 Like I was an admin for our org and like managing that lifecycle was pretty sweet.

43:52 And then it's like, OK, we have this as an organization.

43:54 We have over 400 employees.

43:55 Why doesn't everyone have this right now?

43:57 So you know, it became a good rollout.

44:00 But having a second factor, 2FA, or multifactor, MFA, is this notion of something you have versus something you know.

44:10 So let's say that even by using a password manager, you don't know that password anymore, right?

44:16 You don't remember it, but let's say you do.

44:18 Right.

44:19 Like let's say somebody gets your entire vault of passwords.

44:21 They still don't have this second factor, which is often a time-based one-time password or a web authentication (WebAuthn) device, which could be a hardware device or a browser fingerprint.

44:35 Like they don't have that.

44:36 Right.

44:37 It's a defense-in-depth kind of problem that it's solving, where you need to have two things in order to get through this door.

44:45 And if you only have one, that's not good enough.
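The time-based one-time password is standardized (RFC 6238, built on the HOTP algorithm from RFC 4226) and small enough to sketch: the server and your authenticator app share a secret, and each independently computes an HMAC over the current 30-second time window. This is an illustration of the algorithm checked against the RFCs' published test secret, not PyPI's implementation; use a maintained library for anything real.

```python
from __future__ import annotations

import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: float | None = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP where the counter is the current time window."""
    at = time.time() if at is None else at
    return hotp(secret, int(at // step), digits)
```

Someone holding only your password still can't produce the code, because they don't have the shared secret stored on your device.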

44:49 And using that capability, and having that ability in PyPI user management, has enabled us to roll out a higher grade of security for the packages and maintainers of those packages by attesting that, well, we know that this maintainer or this publisher of this package has already secured themselves against these kinds of attacks.

45:17 Yeah, I can just hear the voices.

45:20 In fact, they don't come through an audio form.

45:22 They come in email like, you know, on that last episode.

45:26 Sometimes they come through on the artist formerly known as Twitter.

45:29 Sometimes they come through an email.

45:31 But, like, you know, Michael, you said that two-factor will help; you don't seem to realize... I'm saying I realize, so I don't get this email.

45:38 Please don't email me that this doesn't stop phishing.

45:41 Like, people could still phish you.

45:42 You go and they could ask you your name and password and they'll ask for your time based authentication and then they're in.

45:48 Yes, that's true.

45:49 But it stops some things and stopping some things rather than going, well, it's not good enough.

45:55 That is certainly not a responsible way to go.

46:01 I think it's kind of like making the argument that if nothing is perfect, don't do anything else.

46:05 Exactly right.

46:06 That's a fallacy.

46:07 If you're going to die, don't get out of bed.

46:09 Right.

46:10 Like, no, like we get out of bed, we go to work, we do our things right.

46:13 We ultimately as sad as it is right.

46:15 We have an end date.

46:17 We hopefully don't know what that is.

46:19 But like, do the best you can while you can.

46:22 That's where I come to from like, this is the best we know.

46:26 Yes.

46:27 Will there be something new and exciting tomorrow that is even better?

46:30 Maybe.

46:31 But until then, let's do the thing that we know to be the best that we can do right now.

46:35 Right.

46:36 Maybe passkeys will be awesome.

46:37 I don't know about that.

46:38 Yeah.

46:39 But for example, you know, from a phishing perspective, things like 1Password and Bitwarden have plugins for your browsers, and they will suggest autofill on the right domains.

46:49 But if you're on pypy.io instead of pypi.org, or, you know, whatever, right?

46:56 If you're on some kind of phishing domain, they will not suggest autofill.

47:00 Right.

47:01 If you find yourself going to your password manager and going, God, why does this not work?

47:05 Like, let me just copy this over.

47:06 Stop, figure out why it's not working really, really, really well before you somehow subvert this broken extension that won't autofill.

47:15 Right.

47:16 There are ways to limit phishing through these mechanisms, even if they're not perfect.
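The autofill protection works because the password manager compares the page's hostname with the one saved alongside the credential, and a lookalike domain simply doesn't match. A deliberately simplified sketch assuming exact hostname matching; real password managers apply more nuanced rules (subdomains, public suffixes, and so on).

```python
from urllib.parse import urlsplit


def should_autofill(saved_host: str, page_url: str) -> bool:
    """Offer a saved credential only when the page's hostname matches exactly."""
    host = (urlsplit(page_url).hostname or "").lower()  # hostname is lowercased by urlsplit
    return host == saved_host.lower()
```

`pypy.io` and `pypi.org.evil.example` both fail the check, which is exactly the "why won't this autofill?" moment worth stopping on.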

47:21 Exactly.

47:22 I think I said this before, but like, I'll reiterate it.

47:26 You the human are the best defender.

47:29 Use your logic, use your sense.

47:31 Like don't just click at things mindlessly.

47:34 Take a moment, take a look, see that error message.

47:37 That looks weird.

47:38 Why does that look weird?

47:39 The domain I'm on looks a little odd.

47:42 The little browser lock symbol isn't locked.

47:45 Why is that?

47:46 Hmm.

47:47 Take a moment.

47:48 Notepad.exe wants to act as a server.

47:49 Yes, I want to load it.

47:50 Come on.

47:51 Yes, let it.

47:52 I got it.

47:53 Yeah.

47:54 I think the news around 2FA for PyPI.org is not that it exists, but that it's required now.

48:03 I think that's what's different since I spoke with Dustin.

48:05 We've been on a path and as you've got this blog post open, we've been on a path of like starting with the carrot.

48:12 We want to provide as many people in the packaging ecosystem, all the incentive, all the time, all the kind of expectation that they could have in order to set this up voluntarily.

48:25 There was even a wonderful giveaway of hardware security keys that Google sponsored, which is excellent.

48:33 That doesn't mean you need a physical security key.

48:35 You can use them.

48:36 You can use software security keys, Google Authenticator or any other tool.

48:40 Duo Labs has a nice one, but anything in order to kind of move the bar on this 2FA engagement.

48:50 We've seen some decent adoption and it's like, okay, well now let's set a timeline.

48:56 This post by Donald kind of starts the clock on that.

49:00 We are basically drawing a line in the sand that's saying at the end of 2023, if you want to publish a new package, like that's it, you need to have 2FA.

49:11 We've started on that process by requiring 2FA for new users.

49:17 So if you registered today, you need to set up 2FA.

49:21 Like if you've been around for a while and you don't have it yet, we'll still allow you to upload, but we'll send you a notice that's saying, here's what's going to happen at the end of this year.

49:30 And we've slowly been ratcheting down the areas in which 2FA is not required, with the intent of enabling the requirement on all accounts on, basically, December 31st / January 1st, 2024.

49:46 So that way we can kind of walk away from the problem of, well, I guess one of the Django maintainers got phished and that's why we had a big issue in the ecosystem.

49:57 Like I don't want that to be the problem.

49:59 But again, apologies to Django, y'all are awesome.

50:02 It's because they're so popular and loved that you pick on them, I can tell.

50:05 Yes.

50:06 Again, this doesn't completely solve all phishing attempts, but it certainly is another layer of defense.

50:13 So I think it's certainly worth doing.

50:15 Now there was a bit of a pushback.

50:18 I think somebody even like rage quit their package temporarily and then said, oh no, I want it back on PyPI when this came out as if it was a big deal.

50:27 And this is, you know, this blog post was from May.

50:31 The deadline is end of 2023.

50:33 In between those two times, GitHub just comes out and goes, everyone gets 2FA right now.

50:37 I don't care.

50:38 Right.

50:39 And it's such a broader, more impactful thing: many people who use Python are not creating packages, but almost everyone who uses Python is also in some way using GitHub.

50:50 And so it just touches so much more of the ecosystem and people are like, oh, okay.

50:53 I don't know why there was so much blowback in one and not the other.

50:56 But it's an odd thing, right?

51:00 Because on the one hand, PyPI or the index itself, right, has been around for about 20 years.

51:06 This is a long lived concept in the Python ecosystem of having a place where people can publish software freely, no charge, and others can install that software.

51:20 This requirement is a shift, right?

51:24 And a lot of folks are like, well, what else is going to happen?

51:27 It's like, well, probably nothing, right?

51:29 I don't see us talking about other requirements or enforcements unless they're necessary.

51:35 Again, I can't predict the future.

51:38 And if somebody says that passkeys are the best way and TOTP is broken, and proves it, and the industry decides, oh, wow, this is not a good idea.

51:50 Let's do this other thing.

51:51 Then maybe we'll do that.

51:53 But until then, this is the best we've got.

51:56 The requirement for 2FA is even on the OWASP top 10 list of why you should be doing this.

52:05 And it's like, this is what governments use, companies use, and auditors use to say, we are adhering to the best practices.

52:13 Because if you had a security vulnerability reported to your company because you weren't using 2FA, auditors will say, well, why not?

52:21 It's in the top 10 list.

52:23 It's like the SQL injection of yesteryear.

52:26 Yeah.

52:27 Just like, just do this, right?

52:29 Just solve this class of problem.

52:31 You will have other problems.

52:33 We all have problems, but solve the ones that we know are relatively easy to solve.

52:38 Good advice.

52:39 I feel like, you know, with two-factor it's like, software isn't good enough, these YubiKeys and stuff are too tricky, so we're just going to go back to SMS.

52:47 We're just going to go back to SMS.

52:49 Like that's, that's where it's.

52:51 I cannot believe that my bank won't let me use app-based 2FA.

52:54 They forced me to use SMS.

52:56 You might want to check out for different banks.

52:58 Well, it's like one of the top four banks in the U.S.

53:01 It's nuts.

53:02 They also have limits on the length, not lower bounds, upper bounds on the length of the password.

53:08 My, that, that, that I understand why, right?

53:13 Upper bounds.

53:14 I understand why, but it usually boils down to like database design and like the cost of doing a database migration.

53:19 I hear like, I think it's like 12 or something.

53:21 It's very short.

53:22 Oh, that's short.

53:23 That's way too short.

53:24 But here's the thing.

53:25 Do you know, it doesn't matter if you have one letter or a hundred letters, the hash is still the same length.

53:29 Depending on how you're hashing it.

53:30 Yeah.

53:32 But they will not be stored.

53:33 Like if they're not storing the hash, it makes me extra nervous.
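The point about hash length is easy to demonstrate: a key-derivation function produces a fixed-size digest no matter how long the input is, so storage is not a reason to cap password length. This sketch uses the standard library's PBKDF2-HMAC-SHA256 with a random salt; the 600,000 iteration count is an assumed work factor, and algorithms like scrypt or Argon2 are common alternatives.

```python
from __future__ import annotations

import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and threat model


def hash_password(password: str, salt: bytes | None = None,
                  iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Derive a 32-byte PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Whether the password is the letter `A` or a 100-character passphrase, the stored digest is 32 bytes.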

53:35 Anyway, onward.

53:36 I'm glad they got the SMS 2FA backing it up.

53:41 Yeah.

53:42 Another thing that I do want to plug on the security spectrum, and to address the question around verifiable releases, is something that we launched earlier this year, which is called Trusted Publishers.

53:55 That's right.

53:56 That's linked to, there we go, on our docs.pypi.org, which explains what it is.

54:02 Links in the show notes, people can check it out.

54:04 Where we leverage an open standard called OpenID Connect.

54:08 And today we only implement this with one publishing tool, GitHub Actions, where the GitHub Actions service is now delegated to be a trusted publisher for your project.

54:22 When you set this up, you have to opt into this completely.

54:25 We didn't do this for you, but you can now opt in to say GitHub Actions is allowed to publish my project.

54:32 And then you can say, you know what?

54:34 None of my humans are allowed to publish the project.

54:37 The computer that is getting a short-lived token, for like five or 10 minutes, whatever it is, is allowed to publish this package, and no one else.

54:46 And that's how we can start to build the levels of attestation and kind of the software supply chain security to say, I know where the source code is.

54:57 I know the source code that built it.

55:00 I know the builder who built it.

55:02 I know the builder who published it and no one else tampered with it in the interim.

55:08 We're not there yet, to prove that nobody else tampered, but we are there to say, I can now delegate authority to GitHub Actions to perform this release for me, as opposed to me creating a token in PyPI and giving that token to GitHub Actions.

55:25 That's how we did it before.

55:26 Right.

55:27 A long lived permanent token that you put in plain text somewhere, right?

55:33 What could go wrong?

55:34 I mean, usually it's an environment variable or a secret in GitHub Actions, and they have pretty good ways of securing data, but again, it's long-lived.

55:42 So if anything ever happened over there, if anybody dumped a debug log that they shouldn't have, that token could be there.

55:48 So by using a trusted publisher flow, you can now have GitHub Actions deploy directly to pypi.org once the artifact is complete, without having to do that token management.
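Under trusted publishing, the credential GitHub Actions presents is an OpenID Connect ID token: a signed JWT whose standard claims include an issuer (`iss`), an audience (`aud`), and an expiry (`exp`), which PyPI exchanges for a short-lived upload token. The sketch below only decodes the claims to show how short-lived such a token is. It does NOT verify the signature (PyPI does that against the issuer's published keys), so treat it as an inspection aid, and the helper names are hypothetical.

```python
from __future__ import annotations

import base64
import json
import time


def jwt_claims_unverified(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT checking its signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_left(claims: dict, now: float | None = None) -> float:
    """Seconds until the token's `exp` claim passes (negative once expired)."""
    now = time.time() if now is None else now
    return claims["exp"] - now
```

Contrast this with a long-lived API token pasted into a CI secret: if this one leaks in a debug log, it stops working within minutes.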

56:03 So we're getting short on time, Mike.

56:05 What else do you want people to know about what you all in particular, what you're doing at PyPI and some of the initiatives and maybe how they can help?

56:14 The top of mind for me right now is the malware reporting project that we're engaged in.

56:19 And that's kind of linked to at the very bottom of my blog from today, the inbound malware reporting blog, where we are looking to establish what a kind of machine readable protocol would be to interact with security researchers.

56:33 A few of them have chimed in already on what they think of, and we're just kind of building the conversation around what it would look like to report.

56:41 How do you like to report?

56:43 And then we'll proceed with whatever guidance we get there and kind of build out the payloads and stuff like that all the way at the bottom, very bottom, all the way at the bottom.

56:54 And once we have this format in place, we're going to be building out like the infrastructure and ecosystem in order to submit those payloads and then figure out how to kind of put packages in timeout while these payloads are being investigated.

57:09 So that way we can continue to provide a secure ecosystem for all users of pypi.org.

57:15 I think that's great.

57:16 I certainly, you know, these companies that are checking out and just monitoring the flow of packages and scanning them, that's a huge service.

57:25 There probably never will be, like, a bug bounty equivalent, will there?

57:31 I mean, never say never, but.

57:32 Never say never.

57:33 From that perspective, it becomes a bit of a challenge because then you could start funneling money through a bug bounty program because we are offering an ability for people to create packages and then saying, we're giving you a monetary incentive to report them to us.

57:50 So it's like, well, now we've given you a pipeline for money.

57:54 There's a whole shadow industry of like, you first create it, then you get it popular, then you report it.

57:58 Yeah, yeah, yeah.

57:59 No, I hear you.

58:00 Yeah.

58:01 But you know, no, no ideas too farfetched.

58:04 We like talking about ideas and figuring out what, what makes sense and kind of, again, with a lot of security work is like, okay, well, how can this go wrong?

58:14 How can this fail?

58:15 Right.

58:16 How can it be gamed?

58:17 Yeah, absolutely.

58:18 Well, I, for one, feel better that you're putting all your time and energy into focusing on these problems and seeing how we can make PyPI better for everyone.

58:27 Almost everyone, not for everyone, but for 99.9% of us, for most people, just want to use it in a solid way to build Python software.

58:34 That's kind of why I was drawn to it, right?

58:37 Like, contributing to it: it's such a foundational piece of modern-day infrastructure that it's important that it be safe, secure, convenient, and useful to anybody who wants to use it, because Python itself is such a ubiquitous language across the planet and beyond. So, you know, we want to make it the right thing.

58:59 Yeah.

59:00 Surprisingly, every time you say that statement, it's more true.

59:02 Like that just, that graph continues to go up and in surprising ways.

59:06 All right.

59:07 Before you get out of here, I'll ask you one of the final questions, a notable PyPI package, not malware ridden, but a good, useful one.

59:16 What do you recommend?

59:17 Anything you come across that's awesome lately?

59:18 I'm a huge fan of pytest, and I know that, you know, you're big pals with Brian Okken (hey, Brian), who talks a lot about testing, and pytest plugins are a wonderful extension to pytest.

59:30 Yes.

59:31 And there's so many of them out there and there's even like an awesome pytest aggregator of these.

59:37 And I think I have one on here, which is called pytest-socket.

59:41 Nice.

59:42 Which I maintain till today.

59:44 But the one that I want to point out is one that I recently learned about, which is called icdiff.

59:49 I, the letter C, diff.

59:52 I don't even know if it's on this.

59:53 It's the letter C.

59:54 I gotcha.

59:55 Yeah, there it is.

59:56 So that's not the pytest package, but there's an extension, pytest-icdiff.

01:00:02 We'll get there.

01:00:03 So this uses that other one.

01:00:04 But the notion here is a lot of times you get big pytest output if you're comparing, you know, dictionaries, lists or stuff that has lots of data.

01:00:14 Sometimes detecting the difference is very hard in the terminal, and the pytest-icdiff extension will help highlight a lot of these with colors and spacing, which makes finding the problem much easier.
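The core idea behind a plugin like pytest-icdiff, turning two large structures into something you can compare line by line, can be approximated with the standard library. This is not how pytest-icdiff itself is implemented (it renders a colored side-by-side view); it's a sketch that pretty-prints both objects with `pprint.pformat` and runs them through `difflib.unified_diff`. The `structural_diff` name is hypothetical.

```python
import difflib
from pprint import pformat


def structural_diff(expected, actual, width=40) -> str:
    """Unified diff of two pretty-printed objects; empty string if they render identically."""
    exp_lines = pformat(expected, width=width).splitlines()
    act_lines = pformat(actual, width=width).splitlines()
    return "\n".join(
        difflib.unified_diff(exp_lines, act_lines, "expected", "actual", lineterm="")
    )
```

Pretty-printing first matters: with one key per line, the diff points at the single field that changed instead of one enormous line.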

01:00:28 Yeah.

01:00:29 That seems super helpful right there.

01:00:30 And it does a partial character by character diff and line by line diff with different colors.

01:00:36 Yeah.

01:00:37 And the "here's what we expected."

01:00:38 Here's what you got.

01:00:39 Yeah.

01:00:40 So I'm learning that there's even more madness to the pretty print.

01:00:44 So you could say from pprint import pprint, but there's also apparently a pprintpp on PyPI.

01:00:52 Okay.

01:00:53 Yeah.

01:00:54 I don't know.

01:00:55 That's more things to explore.

01:00:56 It's always, it's going to be in one of those 400,000 packages on PyPI.

01:00:59 Right.

01:01:00 It's got to be there.

01:01:02 And it might be a little different.

01:01:03 It might be just enough different to meet this use case that is, you know, perfect.

01:01:08 Yeah.

01:01:09 So pprint plus plus, that's what the PP is, like CPP up there.

01:01:14 Okay.

01:01:15 Got it.

01:01:16 Notepad++.exe.

01:01:17 It wants to act as a server.

01:01:19 All right.

01:01:20 Let's leave it with that.

01:01:21 I guess a final thing: people are excited to hear about this.

01:01:24 They want to get engaged.

01:01:25 You know, they have ideas.

01:01:26 They want to reach out to you.

01:01:27 What do you say?

01:01:28 Open an issue for us on, you know, the Warehouse repository, if it's relevant to the Warehouse codebase.

01:01:34 If you need to reach me directly, I'm on GitHub as miketheman.

01:01:38 I'm on Mastodon as @miketheman@hachyderm.io.

01:01:42 Or if all of that fails, go ahead and email me at Mike@python.org.

01:01:47 Awesome.

01:01:48 Thank you so much.

01:01:49 Thanks for being on the show and giving us a status report here.

01:01:52 Absolutely.

01:01:53 Thanks for having me, Michael.

01:01:55 This has been another episode of Talk Python to Me.

01:01:58 Thank you to our sponsors.

01:01:59 Be sure to check out what they're offering.

01:02:01 It really helps support the show.

01:02:03 Take some stress out of your life.

01:02:05 Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

01:02:11 Just visit talkpython.fm/Sentry and get started for free.

01:02:16 Be sure to use the promo code talkpython, all one word.

01:02:19 Want to level up your Python?

01:02:22 We have one of the largest catalogs of Python video courses over at Talk Python.

01:02:26 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:02:31 And best of all, there's not a subscription in sight.

01:02:33 Check it out for yourself at training.talkpython.fm.

01:02:37 Be sure to subscribe to the show.

01:02:38 Open your favorite podcast app and search for Python.

01:02:41 We should be right at the top.

01:02:43 You can also find the iTunes feed at /iTunes, the Google Play feed at /play and the direct RSS feed at /RSS on talkpython.fm.

01:02:52 We're live streaming most of our recordings these days.

01:02:55 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/YouTube.

01:03:03 This is your host, Michael Kennedy.

01:03:05 Thanks so much for listening.

01:03:06 I really appreciate it.

01:03:07 Now get out there and write some Python code.
