PyPI Security

Episode #435, published Wed, Oct 25, 2023, recorded Mon, Sep 18, 2023

Do you worry about your developer / data science supply chain safety? All the packages for the Python ecosystem are much of what makes Python awesome. But they are also a bit of an open door to your code and machine. Luckily the PSF is taking this seriously and hired Mike Fiedler as the full-time PyPI Safety & Security Engineer (not to be confused with the Security Developer in Residence, staffed by Seth Michael Larson). Mike is here to give us the state of PyPI security and plans for the future.

Episode Deep Dive

Guests Introduction and Background

Mike Fiedler is the full-time PyPI Safety & Security Engineer at the Python Software Foundation (PSF). With over 30 years of software and systems engineering experience at companies like Datadog, Warby Parker, and MongoDB, Mike focuses on strengthening PyPI's security to protect the Python ecosystem from malicious packages and other supply chain threats. He also coordinates with many of the PSF’s security partners to ensure PyPI can respond rapidly to emerging issues.

What to Know If You’re New to Python

If you're new to Python and want to understand this conversation about securing packages, here are a few essentials:

It’s common to install Python libraries with tools like pip, which download code from PyPI.org.
Take a moment to double-check the library names to avoid installing lookalike or typo-squatted packages.
Learn how virtual environments (via venv or other tools) help isolate and manage project dependencies safely.
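
As a small illustration of the virtual environment point above, here is a minimal sketch using only the standard library `venv` module. The function names and the `demo-env` directory are made up for this example, and pip is left out of the environment only to keep the sketch fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def create_environment(target: Path) -> Path:
    """Create a bare virtual environment at `target` and return its config file path."""
    # with_pip=False is an assumption made for this sketch; real projects
    # usually want pip available inside the environment.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / "demo-env")
        print("environment created:", cfg.exists())
    print("running inside a venv:", in_virtual_environment())
```

In day-to-day work most people run `python -m venv` from a terminal instead; the point is simply that each project gets its own isolated set of installed packages.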

Key Points and Takeaways

1) PyPI’s Upcoming 2FA Mandate

PyPI is requiring all publishers to enable two-factor authentication (2FA) by the end of 2023. This move helps safeguard package uploads from account hijacking and password reuse vulnerabilities. While 2FA won’t block every kind of phishing attack, it significantly raises the bar for attackers who rely on leaked or stolen credentials.

Links and Tools:
- PyPI.org
- TOTP Authenticator Apps (example)
- Security Keys (e.g., YubiKey)
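
To make the "one-time passcode" idea concrete, here is a minimal sketch of how an authenticator app typically derives a code, following the TOTP algorithm from RFC 6238 (HMAC-SHA1 variant). This is an illustration only, not PyPI's server-side code; the function name `totp` and its parameters are chosen for this example, and the secret used in the demo is the published RFC test key rather than a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32: str, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Derive a time-based one-time passcode (RFC 6238, HMAC-SHA1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at_time is None else at_time
    # The moving factor is the number of `step`-second windows since the epoch.
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Test key from the RFC; a real secret comes from the QR code shown at enrollment.
    demo_secret = base64.b32encode(b"12345678901234567890").decode("ascii")
    print("current code:", totp(demo_secret))
```

Because the code depends on both the shared secret and the current time window, a leaked password alone is not enough to log in, which is the property the 2FA mandate relies on.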

2) Role of the PSF and Funding for Security

Mike explained how the PSF’s small full-time staff, aided by grants from AWS and others, can now focus directly on PyPI security issues. This includes continuous monitoring, rapid response to malicious packages, and building new features like organizations and enhanced malware reporting.

Links and Tools:
- Python Software Foundation

3) PyPI's Scale and Usage Stats

PyPI hosts over 480,000 projects and nearly 4.9 million releases, with over 740,000 user accounts. Mike noted that PyPI’s sheer size means any popular Python library can quickly reach millions of developers, making secure operations essential.

Links and Tools:
- Libraries.io – A resource for statistics on packages across ecosystems
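
In the episode Mike describes how these numbers relate: a project has many releases, and each release has one or more files (a source distribution and/or wheels). The sketch below models that relationship with plain dataclasses. The class names, fields, and the `example_pkg` filenames are hypothetical and are not PyPI's actual database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    filename: str
    kind: str  # "sdist" for source code, "wheel" for a built distribution


@dataclass
class Release:
    version: str
    files: List[File] = field(default_factory=list)


@dataclass
class Project:
    name: str
    releases: List[Release] = field(default_factory=list)

    def file_count(self) -> int:
        # Files outnumber releases, which in turn outnumber projects.
        return sum(len(release.files) for release in self.releases)


def build_example() -> Project:
    """Build a hypothetical project with two releases."""
    return Project(
        name="example_pkg",
        releases=[
            Release("1.0.0", [File("example_pkg-1.0.0.tar.gz", "sdist")]),
            Release("1.1.0", [
                File("example_pkg-1.1.0.tar.gz", "sdist"),
                File("example_pkg-1.1.0-py3-none-any.whl", "wheel"),
            ]),
        ],
    )


if __name__ == "__main__":
    project = build_example()
    print(project.name, len(project.releases), "releases,", project.file_count(), "files")
```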

4) Supply Chain Attacks and Typo-Squatting

A big risk in open-source ecosystems is malicious actors uploading “typo-squatted” or misleading packages (e.g., misspelling something like “Django” as “Dangu”). Attackers aim to trick users into installing malware by making package names look familiar or only slightly changed.

Links and Tools:
- XcodeGhost (example of supply chain attack)
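
One way to picture a defense on the consumer side is a lookalike check run before installing anything. The sketch below flags names that sit within a small edit distance of a short allow-list. The allow-list, the threshold of 2, and the function names are assumptions made for this example; it is not how PyPI or any particular scanner detects typo-squatting, and it only lowercases names rather than applying full package-name normalization.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Hypothetical allow-list; a real one would come from your own dependency files.
KNOWN_PROJECTS = ("requests", "django", "numpy")


def lookalike_of(name: str, known=KNOWN_PROJECTS, max_distance: int = 2):
    """Return the known project that `name` resembles, or None if it looks fine."""
    candidate = name.lower()
    if candidate in known:
        return None  # exact match: this is the real project name
    for project in known:
        if edit_distance(candidate, project) <= max_distance:
            return project
    return None


if __name__ == "__main__":
    for name in ("requests", "reqests", "Dangu", "flask"):
        print(name, "->", lookalike_of(name))
```

A check like this can produce false positives for legitimately similar names, so it works best as a prompt to look twice rather than as a hard block.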

5) Trusted Publishers with GitHub Actions

PyPI rolled out “trusted publishers,” using OpenID Connect to grant short-lived tokens to CI/CD services (like GitHub Actions) instead of manually storing long-lived API keys. This eliminates a large class of token-leak vulnerabilities and automatically verifies where the package build originates.

Links and Tools:
- GitHub Actions
- PyPI Trusted Publisher Docs
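
The idea behind this exchange can be sketched as follows: the CI service presents a signed identity token whose claims say which repository and workflow are running, and the index compares those claims with what the project owner configured. The code below is a simplified illustration, not PyPI's implementation. It skips signature verification entirely, the demo token is unsigned, and the repository and workflow names are hypothetical. The claim names `iss`, `repository`, and `job_workflow_ref` are ones GitHub Actions includes in its OIDC tokens.

```python
import base64
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def read_claims(token: str) -> dict:
    """Decode the payload of a JWT. NOTE: no signature check in this sketch."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def matches_publisher(claims: dict, publisher: dict) -> bool:
    """Compare token claims with a configured (hypothetical) trusted publisher."""
    expected_prefix = (
        f"{publisher['repository']}/.github/workflows/{publisher['workflow']}@"
    )
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("repository") == publisher["repository"]
        and claims.get("job_workflow_ref", "").startswith(expected_prefix)
    )


def make_demo_token(claims: dict) -> str:
    """Build an unsigned token for demonstration purposes only."""
    header = {"alg": "none", "typ": "JWT"}
    return ".".join([
        b64url_encode(json.dumps(header).encode("utf-8")),
        b64url_encode(json.dumps(claims).encode("utf-8")),
        "unsigned",
    ])


if __name__ == "__main__":
    publisher = {"repository": "example-org/example-pkg", "workflow": "release.yml"}
    token = make_demo_token({
        "iss": GITHUB_ISSUER,
        "repository": "example-org/example-pkg",
        "job_workflow_ref": "example-org/example-pkg/.github/workflows/release.yml@refs/tags/v1.0.0",
    })
    print("publisher matches:", matches_publisher(read_claims(token), publisher))
```

In a real deployment the signature must be verified against the issuer's published keys before any claim is trusted; only then does a match lead to a short-lived upload token being issued.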

6) Collaborating with Security Researchers

Mike highlighted how PyPI depends on security firms and volunteer researchers who constantly scan new packages, looking for suspicious patterns. Quick removal of newly flagged malicious uploads is a top priority, with PyPI often acting within an hour of a verified report.

Links and Tools:
- Report Inbound Malware Proposal – Mike’s blog post on shaping a new reporting standard

7) Human Factor in Security

Both Mike and Michael emphasized that “you, the human, are the best defender.” Phishing, domain expiration takeovers, and reusing passwords all remain major sources of compromise. Using password managers, scanning logs for suspicious activity, and employing mindful security practices go a long way to preventing attacks.

Links and Tools:
- 1Password
- Bitwarden

8) Testing and Essential Tools

On the testing side, libraries like pytest remain central to Python’s developer culture. Plugins such as pytest-icdiff can improve clarity when comparing large data structures. The conversation also mentioned how scanning Python packages and checking for vulnerabilities should be part of standard development and testing workflows.

Links and Tools:
- pytest
- pytest-icdiff

Interesting Quotes and Stories

On security trade-offs: “The most secure computer is one that’s powered off and buried in concrete—useless, but perfectly safe.”
Human vigilance: “You, the human, are the best defender. Use your logic—don’t just click at things mindlessly.”
Typo-squatting hazard: “I can’t prevent anyone from making a typo. But I can help remove packages once we know they’re malicious.”

Key Definitions and Terms

2FA (Two-Factor Authentication): Adds a second layer of identity verification, often using a one-time passcode or security key, in addition to a password.
Typo-Squatting: Uploading malicious packages under names resembling popular packages (e.g., reqests for requests) to trick users who make spelling errors.
Trusted Publishers (PyPI): A mechanism allowing short-lived CI/CD tokens to publish packages without storing permanent credentials.
OpenID Connect: An identity layer on top of OAuth 2.0 used by PyPI for its trusted publisher feature, verifying a build environment’s identity.

Learning Resources

Below are a few in-depth learning materials to deepen your Python skills and better manage dependencies and security:

Managing Python Dependencies: Learn how to handle Python dependencies effectively, avoid version conflicts, and understand best practices for safe package usage.
Modern Python Projects: Covers the entire Python project lifecycle, from setting up a project structure to CI/CD, testing, and deployment—excellent for building secure, well-managed applications.

Overall Takeaway

Python’s ecosystem is thriving due to PyPI’s ease of publishing and installing packages. However, with so many packages and contributors, security can’t be taken for granted. The PSF and dedicated engineers like Mike Fiedler are working diligently—through 2FA, rapid malware takedowns, and features like trusted publishers—to protect the Python community. Staying vigilant yourself is key: use secure practices, enable 2FA, and always double-check what you’re installing.

Links from the show

Mike on Twitter: @mikefiedler
Mike on Mastodon: @miketheman@hachyderm.io

Supply Chain examples
SolarWinds: csoonline.com
XcodeGhost: wikipedia.org
Google Ad Malware: medium.com

PyPI: pypi.org
OWASP Top 10: owasp.org
Trusted Publishers: docs.pypi.org
libraries.io: libraries.io
GitHub Full 2FA: github.blog
Mike's Latest Blog Post: blog.pypi.org
pprintpp package: github.com
ICDiff: github.com
Watch this episode on YouTube: youtube.com
Episode #435 deep-dive: talkpython.fm/435
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Do you worry about your developer/data science supply chain safety?

00:03 All the packages for the Python ecosystem are much of what makes Python awesome,

00:08 but they are also a bit of an open door to your code and machine.

00:13 Luckily, the PSF is taking this seriously and hired Mike Fiedler as the full-time PyPI safety and security engineer,

00:20 not to be confused with the security developer in residence, staffed by Seth Michael Larson.

00:25 Mike Fiedler is here to give us the state of PyPI security and their plans for the future.

00:31 This is Talk Python To Me, episode 435, recorded September 18th, 2023.

00:36 Welcome to Talk Python To Me, a weekly podcast on Python.

00:54 This is your host, Michael Kennedy.

00:56 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org.

01:03 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:09 We've started streaming most of our episodes live on YouTube.

01:12 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:20 This episode is sponsored by Sentry.

01:22 Sentry.

01:23 Don't let those errors go unnoticed.

01:25 Use Sentry.

01:25 Get started at talkpython.fm/sentry.

01:28 And it's also brought to you by us over at Talk Python Training.

01:33 Did you know that we have over 250 hours of Python courses?

01:38 Yeah, that's right.

01:39 Check them out at talkpython.fm/courses.

01:43 Hey, Mike.

01:43 Hey, Michael.

01:44 Welcome to Talk Python.

01:45 I mean, it's awesome to have you here.

01:46 Oh, thanks for having me.

01:48 I'm really excited to be here.

01:49 Yeah, I'm excited to have you.

01:51 Always interesting to talk about security.

01:53 I got to tell you, talking about security just makes me nervous.

01:57 Oh, why is that?

01:58 Well, two reasons.

02:00 I feel like when you talk about security, you're kind of sticking your head up and people are like,

02:03 let me see if I could whack that.

02:05 You know, not everybody, but a few people in the world, right?

02:07 But it is the internet.

02:09 So if you take a very, very small percentage and multiply it by billions, it becomes non-zero.

02:15 And then, you know, it's just one of those things.

02:18 It's like trying to prove the absence of something.

02:22 It's very hard to prove that you're not missing some step.

02:27 It's very hard to prove that you haven't, that there's not a, you know, you've got all the

02:30 controls and there's not one control you forgot, right?

02:33 In that regard, probably more so.

02:35 It's pretty tricky.

02:36 The way I've often thought about security is it's a spectrum, right?

02:40 I used this quote a million years ago.

02:42 I don't know who said it first, but the most secure computer is powered off and buried

02:47 under six feet of concrete, right?

02:50 Like, but it's useless, right?

02:52 Like it's very secure, but nothing in there is useful.

02:55 So if we take that as like a crazy extreme of secure and say the most insecure computer

03:01 is, you know, powered on, has zero password control connected to the internet and auto

03:06 publishing IP data.

03:08 So that way anyone can come and do whatever they want.

03:10 All right.

03:11 So that's the other end of the spectrum.

03:12 That's a really bad situation.

03:14 There's a fine balance that every software application system company has to kind of navigate

03:22 to figure out where along those two crazy extremes they fall and where their kind

03:28 of risk tolerance thresholds are.

03:31 Like, what would it cost me to add more security?

03:34 Well, I could, you know, lock down all of my users and not allow them in unless they come

03:39 to the front door and show a picture ID, right?

03:43 Like, okay, if that's how we want to secure our building, that's one way to do it, but

03:48 that'll slow down the ingress to our building.

03:51 So we issue our employees badge cards and we assume that they act in good faith and they

03:56 don't kind of lose them and report if they lose them.

03:58 Oh, great.

03:59 So that's kind of a middle ground where you kind of delegate some of the security to the

04:05 individuals and, and just kind of, you have to figure out where, where your security

04:10 is and what you're willing to do and sacrifice in order to get it.

04:15 Yeah.

04:15 I totally agree.

04:17 Wild sidebar.

04:18 I can't believe the internet in its early days was like you described, like no NAT firewalls

04:24 that stopped direct access, no passwords, just, we might want to know who you are just

04:30 so we can assign the files more conveniently to you, you know?

04:34 Yeah.

04:34 I hearken back to like the bulletin board days where you would dial up into somebody's random

04:39 computer and you would do stuff in there.

04:41 And I hosted a BBS and I interacted with others and it was like, we were all generally operating

04:48 in good faith because we wanted to kind of play together.

04:51 And not until much later did, you know, bad actors saying, you know what?

04:56 I see how I could take advantage of this in a way that suits me and not you.

05:01 Yeah.

05:01 To which we started to say, all right, well then how do we control for these things today?

05:06 That conversation comes into, you know, modern systems development of secure by design, right?

05:11 Or, you know, a lot of folks will say shift left, right?

05:14 Take security into account much earlier into the life cycle as opposed to, oh, we have

05:20 to tack this on at the end.

05:22 So I think, you know, the evolution of the internet was necessary for us to get to here.

05:28 But as we're seeing newer protocols develop, those are taking this more secure by design approach.

05:35 Yeah.

05:35 In depth with layers.

05:37 Were you a Trade Wars fan?

05:39 Oh man.

05:40 That's a name I have not heard in a very long time.

05:43 That was a good one though.

05:44 Yeah.

05:44 I was very much a news and mail kind of relay kind of kid.

05:48 Just wanted to see what was going on.

05:50 Got very much involved in like understanding how Pretty Good Privacy (PGP) would allow you to

05:57 sign your messages.

05:59 So that way other folks could believe that those were you.

06:02 Right.

06:02 And kind of like attest to truth.

06:04 And that kind of fell apart because again, these are all imperfect systems.

06:08 They were, but it was such a world full of possibilities back in those days.

06:13 I remember even just sending a mail and getting it back through that whole system of relays

06:18 was, was mind boggling.

06:19 At the time I was living on top of a mountain in, you know, in the middle of, of Israel and

06:25 having that ability to connect with other people who there's no way I was ever going to see

06:33 this variety of people back then, like, Oh, this opened the world.

06:37 Right.

06:37 Yeah.

06:38 And that kind of fueled my, my desire to like, okay, what else can I do with these computers,

06:43 with these systems?

06:43 And Oh wait, there's this internet thing.

06:46 All right.

06:46 Well, my mom's going to be ticked off because I'm tying up the phone line for hours and like,

06:52 all right, well, let's just have some fun.

06:54 Yeah.

06:54 That's when call waiting was the nemesis.

06:57 So I bring, I kind of focus on that a little bit because while we're going to talk about

07:03 things that are not necessarily positive or people trying to do negative things to something

07:08 that we all love and has been a very positive thing for the Python ecosystem.

07:12 I do want to point out mostly technology is doing really awesome things for people like

07:16 opening these doors and educating and connecting.

07:19 It's just some of the bad people, they like to connect in bad ways.

07:23 So before we get too far down that let's, let's just have you give people a quick introduction

07:27 about yourself.

07:28 So, so they all know you.

07:29 Hey everyone.

07:29 I'm Mike Fiedler.

07:30 I'm in New York city and that's where I've been living for the last 15 years, I say, I

07:36 think.

07:36 And I've been working in software development systems engineering for over 30 years across

07:42 a couple of continents, variety of different companies.

07:45 And for the past two years, I think, or three, I've been an active contributor to pypi.org.

07:53 Prior to that, I was contributing to a lot of Ruby projects, the Chef ecosystem, and I've

07:59 worked at a variety of different companies, both startups and enterprises.

08:04 You may have heard of some like Datadog, Warby Parker, MongoDB, Capital One, just kind of like

08:10 working through different scenarios and learning different industries along the way.

08:14 For the past year, I've been, well, since January, I've been focusing pretty much purely

08:20 on pypi.org.

08:22 You work for the PSF officially or what's the story?

08:25 Yeah.

08:25 As of August, I was hired to come on full-time.

08:28 With thanks to our grants from Amazon Web Services, AWS, and some other folks that are

08:33 chipping in to fund this PyPI safety and security role.

08:37 But the PSF got some funding and I am the first full-time engineer to focus on pypi.org as a

08:46 full-time.

08:46 In the past, you've spoken to some other folks who were contracted out to build out different

08:52 aspects or features.

08:53 But now I'm a full-time maintainer.

08:56 Yeah.

08:57 That's really cool.

08:58 You know, the developer in residence role that Łukasz Langa is working in now.

09:03 I feel like that was the first one of these types of roles, but now there's a couple, right?

09:07 Yeah.

09:08 I mean, the PSF is a nonprofit organization, very small staff.

09:13 I think we number a total of 12.

09:16 And of those 12, I think only about five of us are engineers.

09:20 And everything else is volunteer-based.

09:22 The first developer in residence program, which is Łukasz, has been successful enough that we got

09:28 another organization and grants to fund the security developer in residence, which Seth Larson is doing.

09:35 And he's kind of focusing on the wider Python ecosystem as a whole.

09:40 Whereas my role is very much more narrowly focused on PyPI.org and the ecosystem surrounding that.

09:48 So that way, you know, we can focus on specific targets around security for the packaging world, as opposed to the Python core.

09:59 Okay.

09:59 Well, I do believe if you talk to people about why they like Python and especially why they stick with Python, the language is good.

10:07 You can do cool stuff with it, but it's pip install, say your name.

10:12 Say the name of your useful library that just brings so much and makes it so sticky and useful and productive.

10:21 And so making sure that we have trust in pip install is really important.

10:26 Last year, I think Dustin Ingram came on and talked about some of the stats that he had pulled together that speak about, like how much PyPI.org is used.

10:37 That doesn't even count for the countless folk out there who are mirroring PyPI packages.

10:44 So that way they can have a local cache, you know, deal with corporate firewalls or whatever need, right?

10:50 But it's true.

10:51 There's the very popular requests library or the Django project.

10:56 pip install Django and you have all the things that you need to start a Django project, right?

11:02 And the speed at which the folks who are kind of working on the tooling like pip or some of the other alternatives out there to enable users to get those packages is such a wonderful tool in anyone's toolbox.

11:17 But then very often folks forget that there is an entire kind of package universe behind what they just did as a consumer, right?

11:27 So pip install Django is, yeah, I got this thing.

11:30 It installed it.

11:31 Where did it install it from?

11:32 How did it get there?

11:33 Who put it up there?

11:35 Why is it there?

11:36 All of those questions, most people go their entire career with not even having to worry about or think about.

11:41 They're just like on the consumer side.

11:43 But then on the producer side or the package maintainer or project maintainer, there's a whole other slew of things that one has to worry about.

11:52 Yeah, there's some stuff we'll talk about in there, which will be really fun.

11:55 I think also there's the third level of just the people who run PyPI and the infrastructure and the stats behind it.

12:02 I mean, maybe give us a quick, I kind of started us off down this path.

12:06 Maybe give us a quick statement for those who don't necessarily know what PyPI is, but I think more interestingly, maybe try to give us some of the stats about the scale of things behind that.

12:15 Sure.

12:15 I mean, I haven't computed the runtime stats in a little bit, but PyPI stands for the Python Package Index.

12:23 And it's distinct from other things that have PyPy in their name, which is a different runtime.

12:30 But PyPI.org is a package index, very much kind of a grocery or a store where you would pick up ingredients for the thing that you want to bake.

12:41 Right.

12:42 If you wanted to bake a cake, you need your ingredients.

12:44 What kind of flour are you going to use?

12:46 What kind of sugar?

12:47 Sure.

12:47 There's different kinds of flour and sugar.

12:49 Which one do you want?

12:50 How do you know?

12:51 You go and find one and where the package index helps is we store and publish all the different kinds of flour and sugar that you might want that other people have spent time developing.

13:03 That doesn't mean that there is only one type of flour, but there is a variety.

13:08 And we just make it easy for people to publish their projects.

13:13 And as you've highlighted, there's over 480,000 projects live on PyPI right now and over 4.8 or almost 4.9 million releases.

13:24 And a release is not a one-to-one to a project.

13:27 A project may have many releases.

13:29 So, for instance, if there is the requests library and they publish a new version, that comes as a release.

13:36 And then beyond that, we have files.

13:39 And files map to releases as you could have a source distribution.

13:43 So, there's like literally the source code of a given release.

13:47 Or you could have compiled wheels for different platforms.

13:51 So, there's a lot more files than there are releases.

13:54 And there's a lot more releases than there are projects.

13:57 Yeah.

13:57 And then on the last stat that we show on the front page is the users.

14:01 We do have over 740,000 users on PyPI.org.

14:06 That doesn't mean that these are active users, but they have at some point signed up for an account on PyPI.org.

14:13 That's a huge number.

14:14 And these are not people who might pip install a thing.

14:16 These are people who, for some reason or other, are interested in potentially creating content for others to use.

14:22 Exactly.

14:24 Today, the only way you can publish a project on PyPI is by having a user.

14:29 Or, you know, it starts with a user.

14:31 There's other ways to publish.

14:32 But you have to have a user to kind of start the process.

14:35 And a lot of folks have started to kind of get the idea that if this project needs long-term maintainership, right, it's not just me.

14:46 Maybe I should ask somebody else to help co-maintain this.

14:50 So it's also not a one-to-one mapping of users to projects or releases or something like that.

14:55 For sure.

14:57 This portion of Talk Python To Me is brought to you by Sentry.

15:00 You know Sentry for their error tracking service.

15:03 But did you know you can take that all the way through your multi-tiered and distributed app with their distributed tracing feature?

15:09 Distributed tracing is a debugging technique that involves tracking requests of your system, starting from the very beginning, like a user action, all the way to the back-end, database, and third-party services.

15:21 This can help you identify if the cause of an error in one project is due to the error in another.

15:26 Every system can benefit from distributed tracing, but they are especially useful for microservices.

15:31 In this architecture, logs won't give you the full picture, so you can't debug every request in full just by reading the logs.

15:38 Distributed tracing with a platform like Sentry gives you a visual overview about which services were called during the execution of certain requests.

15:47 Aside from debugging and visualizing architecture, distributed tracing also helps you identify performance bottlenecks.

15:53 Through a visual like a Gantt chart, you can see if a particular span in your stack took longer than expected and how it could be causing slowdowns in other parts of your app.

16:02 Learn more and see some examples in the tracing section at docs.sentry.io.

16:07 To take advantage of all the features of the Sentry platform, just create your free account.

16:12 And for all of you Talk Python listeners, use the code TALKPYTHON, all one word, and you'll activate a free month of their premium paid features.

16:20 Get started today at talkpython.fm/sentry-trace.

16:25 That link is in your podcast player show notes and the episode page.

16:28 Thank you to Sentry for supporting Talk Python To Me.

16:33 Some of the changes coming, I think, allow for almost like a GitHub organization within PyPI, right?

16:40 Rather than, well, we're going to create an account and that one account is for all of AWS, for example.

16:45 Which is not really the right granularity, probably.

16:48 It definitely isn't, but it historically has been, right?

16:51 Like, that is just a feature we had never built.

16:54 It was never a focus.

16:55 But over the past year or so, I think we got funded to build out some of the organizations aspect.

17:04 We have launched the community organizations.

17:08 So that way, if you're running an open source project or an ecosystem there, you can sign up today and get an organization name.

17:16 We are still working through a long backlog of organizations in order to approve them.

17:21 It still requires an admin to do so.

17:24 But we are still working through some of the complexities around corporate organizations when it comes to just as a nonprofit,

17:32 how can we kind of figure out how to support corporations properly?

17:36 Yeah, I've always thought that that was something of an opportunity to work with corporations more closely on PyPI and indirectly through the PSF.

17:47 Your role exists because of these grants, because of connections with certain high-profile tech companies that are heavy consumers of Python, right?

17:54 Like AWS and others.

17:56 But there's tons of companies that have things that support their product and at least their developers work with.

18:04 And having a way to make them feel more at home on PyPI, I think is a good idea.

18:09 Beyond what lots of organizations may do is, you know, have some of their in-house engineers contribute to PyPI.org, to the Warehouse code base.

18:19 It's open source.

18:20 Everything you're looking at is open source.

18:22 That's where I started.

18:24 And that's the easiest way of like, oh, you want this thing?

18:27 Open an issue.

18:28 Talk about it with us.

18:29 You know, if you want to go ahead and put some effort behind it, we'll welcome that too.

18:34 But there is a wiki page out there of like packaging fundable improvement projects of like, all right, if you're considering throwing some money at the problem,

18:45 here are some things we've thought about and would love your assistance with beyond that.

18:48 Like there's other ways of just like straight up funding a role that can focus on a particular thing.

18:54 Excellent.

18:55 All right.

18:55 Let's talk about supply chain issues.

18:58 We were talking before we went live here that probably the biggest side of security or the biggest, at least from my perspective,

19:07 what seems like a very huge opportunity for people to do bad things is to just upload malware basically of different ways, right?

19:15 Sure, you could talk about hacking PyPI.org itself or other stuff, but I think that that's probably quite well covered.

19:22 And it's more about, can I trick somebody, through various ways, into installing something that they didn't intend to.

19:29 And that generally falls under the supply chain security sort of thing.

19:33 So I wanted to just point out three examples that just show this is an industry-wide problem, not necessarily a PyPI problem, but there is a PyPI manifestation of it, right?

19:43 Yeah.

19:44 And just to kind of lay the groundwork for folks who aren't familiar with supply chain attacks, the notion is that instead of an attacker trying to get onto your computer,

19:55 they're going to go after something that they have a high probability of knowing is going to be on your computer, for SolarWinds, through kind of an administrative action.

20:07 Well, you know, many, many copies of SolarWinds were installed on servers, on computers.

20:12 That's part of the supply chain that it's not, I'm not going directly after you.

20:16 I'm going after something you consume, right?

20:19 Right.

20:19 And it can be very, very meta, right?

20:22 So one of the examples that I would say falls under that is this thing called XcodeGhost.

20:27 And so I believe this was primarily a Chinese problem, basically because in China there were a lot of App Store developers who weren't,

20:35 either weren't registered as Apple developers or for whatever reason didn't go,

20:39 maybe it's just a latency thing, didn't go through the App Store to get their Xcode or go through the developer portal.

20:44 They just found like a local mirror.

20:46 And what are those local mirrors?

20:48 What could go wrong?

20:50 I'll just get it from, you know, this IP address instead of apple.com.

20:54 Right.

20:54 Yeah.

20:55 So what it did was it was a backdoored version of Xcode.

20:59 So they weren't attacking even the things that people were using.

21:03 They said, let's take over the developer's tool chain.

21:06 So whatever they happen to be building, we don't know what that is, but we'll install a virus into their app.

21:11 That app will go in the App Store.

21:12 Then whoever installs that app will have it, right?

21:14 These things get very indirect.

21:16 This is kind of the challenge is like nobody until somebody surfaced this as an attack, right?

21:23 Nobody thought this was a problem.

21:25 This is kind of earlier to your comment of like, how do you disprove the existence of a problem?

21:32 And a lot of it is just like, all right, we got to think about every aspect that goes into producing a given piece of software.

21:41 But like the strongest answer here is don't download random stuff from people on the internet, right?

21:47 Like I'm sure that this one in particular had a good reason for having a local mirror.

21:52 But if you're going to local mirror it, then who is the local mirror and what is there?

21:57 What are they doing, right?

21:58 What kind of attestation or assurances do you have that they haven't modified anything in the process?

22:05 It's very tricky because I might absolutely trust some company out there that's building a very popular.

22:10 They have 10 million downloads like that.

22:12 Surely that's fine.

22:14 But one of their developers or one of their consultants to one of their developers may have, you know, misappropriately gotten their tools.

22:21 And it's very hard from the outside to even know that that could be a problem.

22:26 So these things are tricky.

22:27 Yeah.

22:27 I mean, the good news is that there's a large volume of security companies out there who, you know, make their bread and butter by scanning and looking for patterns that look, you know, sneaky, tricky.

22:39 And they spend a lot of investigative time digging into these.

22:43 We get lots of reports from those types of folk of like, here, this is a new package.

22:49 It looks, you know, fishy and here's why.

22:51 And then we take action on those.

22:53 I hear you.

22:54 Typo squatting was a big issue for a while.

22:58 That's a form of supply chain attack.

23:01 Like here, this XcodeGhost is, we're going to get people to use a fake Xcode or a broken, bad Xcode that they think is fine.

23:10 Right. Instead of trying to say, take over Django, the package, and do something malicious to it, try to take over Django or, you know, whatever.

23:19 Right. Some common misspelling of that and upload that package.

23:24 And you could even embed Django.

23:26 Right.

23:26 And so it still functions.

23:28 It's like, I don't remember it being spelled this way, but it's working.

23:32 So it's got to be fine.

23:33 Yeah.

23:33 Typo squatting is very much a prevalent problem.

23:36 Right.

23:37 Because like, I can't prevent you from making a typo.

23:40 Like I literally can't.

23:41 If you type in Django, that's it.

23:43 Game over.

23:44 Right.

23:44 What I can do is look or receive reports that Django exists.

23:50 It looks like malware.

23:50 And let's just take that down.

23:52 Let's not do that.

23:53 Right.

23:53 Yeah.
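
One way to catch a typo before it reaches pip is to compare the name you typed against the names you actually meant to install. The following is a minimal sketch, not anything PyPI itself runs: the `KNOWN_GOOD` list, the `near_misses` helper, and the 0.75 similarity cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical list of project names you actually intend to install.
KNOWN_GOOD = ["django", "requests", "numpy", "flask"]


def near_misses(name, known=KNOWN_GOOD, cutoff=0.75):
    """Return known names that `name` closely resembles but does not equal."""
    candidate = name.lower()
    if candidate in known:
        return []  # exact match: the spelling is not suspicious
    # difflib scores similarity between 0 and 1; keep only close matches.
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)


print(near_misses("djnago"))  # resembles a known name, so worth a second look
print(near_misses("django"))  # exact match, nothing to report
```

A non-empty result does not prove a package is malicious. It only signals that the spelling deserves a second look before installing.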

23:53 The other thing we can do when it comes to typo, oh, well, you talked about typosquatting.

23:59 And I was reminded of an article I remember reading around DNS record bit flipping, where some computers, some browsers would not properly process a given bit in a memory register for a DNS record.

24:16 So this author figured out what those bit flips would be for popular DNS names, registered those DNS names and started just harvesting traffic and said, you know what, this is not anything you can do.

24:28 This is just how browsers and memory work.

24:30 And that was, I don't know, about six, seven years ago.

24:33 And I believe it's been fixed since, but it was like, yeah, there's sometimes there's just not anything that you did wrong.

24:39 It's the ecosystem you're in is doing things in a way that you don't expect for something as nefarious as, as like DNS bit flipping.

24:49 Like this is where like having outbound firewalls can help a whole lot to say, don't allow traffic that I didn't initiate in some manner.

24:58 And if I did, have I, have I initiated the traffic to this address before?
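
The bit-flipping idea can be made concrete with a short sketch that lists every name differing from a given label by exactly one bit while still being a legal hostname label. This is an illustration of the concept only, not the article author's code; the `bitflip_variants` helper is a hypothetical name.

```python
import string

# Characters allowed in a lowercase hostname label.
VALID = set(string.ascii_lowercase + string.digits + "-")


def bitflip_variants(label):
    """Return labels that differ from `label` by exactly one flipped bit."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in VALID:
                continue  # the flipped byte is not a legal hostname character
            variant = label[:i] + flipped + label[i + 1:]
            # Labels may not begin or end with a hyphen.
            if variant.startswith("-") or variant.endswith("-"):
                continue
            variants.add(variant)
    return sorted(variants)


print(bitflip_variants("pypi"))
```

Each returned label is a name someone could register in the hope of catching traffic from a single corrupted bit.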

25:03 Do you remember ZoneAlarm from the early 2000s?

25:07 Yes.

25:08 So this is before, this is, this harkens back to a slightly less naive version of, I can't believe there was no passwords on the accounts, just on the open internet.

25:17 But Windows 95, 98, there were no firewalls.

25:22 And I was at a company that was based inside of a university where we all got ethernet and every computer that plugged in got its own IP address and all sorts of crazy stuff.

25:34 But there were no firewalls.

25:35 And I remember when that thing came out, I thought, you know what, maybe I'm just gonna go around and put this on all the dev machines.

25:41 Like it's kind of insane that we have this incredibly insecure software just on the open internet.

25:47 Yeah.

25:47 And so I did in all the, when I started, it used to say, do you want to let such and such thing act as a server?

25:53 Do you want to let IIS or, you know, Nginx or this type of thing act?

25:57 Sure.

25:58 That can be a server.

25:59 Then the next pop-up was, do you want to let notepad.exe be a server?

26:03 I'm like, huh, that's not probably what it should be doing.

26:06 Yeah.

26:07 That doesn't sound right.

26:08 That doesn't sound right.

26:09 I said no.

26:10 And then the next one and the next one, the whole company's notepad.exe were being servers.

26:15 And I'm like, this can't be good.

26:17 And it turned out they had something had infected it.

26:20 And until I put on one of those outbound firewalls, how do you know?

26:23 Right.

26:24 No one knew there was no indication.

26:25 We had, you know, super fast internet.

26:27 It wasn't like it was dragging it down.

26:29 I don't even remember what it was doing, but it was bad.

26:31 The number one thing that I think we can learn from all of those things is that awareness is the biggest part of security.

26:38 Because if folks aren't aware that downloading something from the internet could be a danger, then they're just going to download it and run it.

26:46 If somebody who had a previous version of, you know, software working on their machine suddenly gets a pop-up that says, this has been modified.

26:56 Are you sure you want to open it?

26:58 So many of us just click OK without reading the dialogue.

27:01 It's like, well, wait, think about that for just a second.

27:04 Because you are the biggest kind of enabler and disabler of security, the human behind the keyboard.

27:11 Because you probably have some administrative rights on your computer that allows you to do some stuff.

27:16 And in the example with Notepad.exe, I think today, if we were to try to do that on some popular developer environment like VS Code,

27:27 VS Code does act as a server in a lot of cases.

27:30 So it's like, I don't know, should this work as an inbound server or not?

27:34 I don't know.

27:35 Maybe this is just part of the local language server that I need for autocomplete.

27:39 Yeah.

27:39 Or maybe it's not.

27:41 It's getting more subtle every day.

27:43 It is absolutely getting more subtle.

27:45 Even Zoom had like a local loopback web server thing, I think, for a while.

27:48 All right.

27:49 So before we move off of this typo squatting part of the conversation, out in the audience, we have a pretty decent question here.

27:55 What's the possibility of something like a verified badge for popular packages?

27:59 I mean, if Twitter can charge $8 a month.

28:02 No, I'm just kidding.

28:04 I don't think they're called Twitter anymore.

28:06 But...

28:08 The artist formerly known as Twitter.

28:09 Yeah.

28:10 The challenge there is, what does verified mean, right?

28:14 This is something that we kind of introduce some features later on that we'll talk about.

28:18 But this notion of verified is like, well, verified by whom?

28:22 Where does the level of trust?

28:23 Because if a supply chain attack happens for Django, so if you were to like search Django here in PyPI.org, and we get Django, all right, we've got Django, the second line, Django 4.2.5.

28:35 And if we were to enter there, like, how do we know?

28:38 Yeah.

28:38 This is a thing, right?

28:40 So I could add a badge here, but that doesn't give me any confidence that any of the Django folk, which are, you know, great people, that one of them didn't get compromised and suddenly a new version was pushed.

28:50 So verified, I guess, it's what does that mean to whom and why?

28:56 Because the last thing I want to do is tell people, give them a false sense of security when, honestly, you're downloading software from the internet.

29:04 If you don't have a process to vet what it is you're doing is doing the thing, then you should probably look at that aspect of a, we vetted this version of Django.

29:15 We got these hashes.

29:16 We got these releases.

29:17 We pin this dependency.

29:19 We're happy with this.

29:20 And then when you upgrade, you kind of do a similar evaluation.
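
The "we got these hashes" step can be sketched as a check that a downloaded file still matches the SHA-256 digest recorded when it was vetted. This is a minimal illustration using only the standard library; the function names are hypothetical and the pinned digest is assumed to come from your own review process.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned_hex):
    """Return True only if the file's digest equals the pinned digest."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())
```

pip can enforce the same idea at install time through its hash-checking mode (`--require-hashes` with hashes recorded in the requirements file), so a hand-rolled check like this is mostly useful for understanding what is being verified.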

29:25 There's a bunch of projects out there like PyUp and Safety and others that will publish, you know, and scan for advisories.

29:32 There's also the PyPA Advisory Database for packages that we know have some problems with them.

29:40 So that way you can use other tools to audit what you have installed to see if you have something smelly.
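
As a rough sketch of what "audit what you have installed" means, the snippet below lists the installed distributions and flags any whose exact version appears in an advisory mapping. The `ADVISORIES` data here is invented for illustration; a real audit would load entries from an actual advisory source and handle version ranges, which this sketch does not.

```python
from importlib import metadata

# Hypothetical advisory data: project name -> versions known to be affected.
ADVISORIES = {"examplepkg": {"1.0.0", "1.0.1"}}


def installed_versions():
    """Map lowercase project name -> installed version for this environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[name.lower()] = dist.version
    return found


def flagged(installed, advisories=ADVISORIES):
    """Return (name, version) pairs whose exact version has an advisory."""
    return sorted(
        (name, version)
        for name, version in installed.items()
        if version in advisories.get(name, set())
    )


print(flagged(installed_versions()))
```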

29:46 But we are thinking about what it would look like to add a, this release and these files of a given project have been published under, you know, stringent, you know, more secure methods.

30:00 Yeah, I certainly see that a verified wouldn't prove that the Django devs hadn't, you know, somebody could have taken over their computer and swapped out like twine or poetry or whatever they're using to upload the package and do exactly what they did with XcodeGhost, basically.

30:17 Right.

30:17 Something equivalent to that.

30:19 So the last part we want to do is like, we don't want to give people a false sense of security and say, well, PyPI told me this was okay.

30:26 And then they find out it wasn't because then that looks really bad for us.

30:31 But on the flip side, we are looking at how do we provide mechanisms and measures to publishers to reduce the potential for the situations that you described to happen.

30:46 This portion of Talk Python To Me is brought to you by us over at Talk Python Training.

30:51 Let me tell you about one of our really popular courses.

30:55 HTMX plus Flask: Modern Python Web Apps, Hold the JavaScript.

30:59 HTMX is one of the hottest properties in web development today.

31:03 And for good reason.

31:04 You might even remember all the stuff we talked about with Carson Gross back on episode 321.

31:10 HTMX, along with the libraries and techniques we introduced in our new course, will have you writing the best Python web apps you've ever written.

31:17 Clean, fast, and interactive.

31:19 All without that front-end overhead.

31:20 If you're a Python web developer that has wanted to build more dynamic, interactive apps, but don't want to or can't write a significant portion of your app in rich front-end JavaScript frameworks,

31:30 you'll absolutely love HTMX.

31:33 Check it out over at talkpython.fm/HTMX or just click the link in your podcast player show notes.

31:39 All right, let me throw some ideas out to you and tell me what you think.

31:46 So as I think about this, especially when the very first news a couple years ago, I can't remember exactly the time frame, but not very long ago, the first malicious PyPI package.

31:55 You know, NPM had been getting whacked on for a while because JavaScript, YOLO.

32:01 But, you know, when it came to PyPI, I was like, okay, this seems to be a little more serious, a little more pervasive.

32:06 And they were often typo-squatting type of issues.

32:10 Or people would introduce some package and say, here's a cool thing, you should check it out, and it's really a virus.

32:16 Or one of those types of things.

32:17 So one of my thoughts, one of the metrics I would have liked, or maybe in the future will like to apply to my local Python environment is,

32:27 don't let me install packages that are too new.

32:31 Or don't let me install packages that have too few downloads.

32:35 And give me a mechanism to say that.

32:37 Like, I don't want to ever say pip install something and that something has existed on PyPI for less than a week.

32:43 I don't ever want to be able to say pip install something and that thing has less than a thousand or ten thousand, whatever, downloads.

32:51 Unless, and they could say, nope, you can't install that.

32:53 It breaks your rules.

32:54 You could say, okay, no, I actually uploaded this.

32:56 I really need to, you know, you could do like a pip install of force, --force.

33:01 You know, some kind of override.

33:02 But by default, if I could just say, you know, it has to have at least 5,000 downloads.

33:06 Or I just don't want it.

33:07 I feel like at that point, somebody would have discovered, oh, you know what?

33:10 It's actually using 100% CPU usage and crypto mining or whatever it happens to be doing.

33:15 I don't want to be the first guinea pig in the world to discover this.

33:18 What do you think about this idea?
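
The "too new" rule could be sketched on the client side using the upload timestamps that PyPI's JSON API (`https://pypi.org/pypi/<project>/json`) reports per file. The sketch below takes the already-parsed JSON as a dictionary so it involves no network call; `first_upload`, `old_enough`, and the seven-day default are hypothetical choices, and pip has no built-in option like this.

```python
from datetime import datetime, timedelta, timezone


def first_upload(project_json):
    """Earliest file upload time across all releases, or None if no files."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in project_json.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def old_enough(project_json, min_age=timedelta(days=7), now=None):
    """Return True if the project's first upload is at least `min_age` old."""
    now = now or datetime.now(timezone.utc)
    first = first_upload(project_json)
    return first is not None and now - first >= min_age
```

The check only looks at age. A download-count threshold is deliberately left out of the sketch.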

33:19 The download count one is always an interesting one, right?

33:22 It's a topic that comes up a lot.

33:23 And like, I can tell you personally from experience that writing a little loop to increase download counts is super easy.

33:32 Interesting.

33:32 Like, write a while True pip install something and like, you'll drive up download counts.

33:39 It will be meaningless in the grand scheme of things.

33:41 So you could say, well, maybe make it like, it's got to have, you know, a thousand distinct IP addresses.

33:46 But then, you know, if you own a botnet, then you're good to go.

33:48 Okay.

33:49 Fair.

33:49 This becomes like the cat and mouse game of like, all right, well, what is something that is good?

33:55 Today, we have a mechanism where we don't advertise new packages that have been there for, I think, under a week to any kind of crawlers.

34:05 So any search engine crawlers.

34:07 So if you were going to like Google for Python Jangu and it was a brand new package, you wouldn't find it via Google because we wouldn't advertise that for indexing yet.

34:18 Right.

34:19 But after a week, like we do.

34:20 So that's one method that we have for preventing some of these like newer packages from getting widespread visibility because they, you know, everything is a webpage.

34:32 They are all subject to search engine optimization.

34:34 Somebody could craft their readme to, you know, be the best hit on Google and therefore they'll show up first.

34:41 And with all this crazy AI stuff, it's only getting easier.

34:44 Hey, ChatGPT.

34:45 I would like to create a page that is like the Django PyPI page, but I want it to rank highly for this.

34:51 Something that we are talking about internally of like, how do we put packages that are brand new, either from some heuristic of a brand new user or a brand new version or differs enough from the previous versions and kind of put those in kind of a holding or a timeout zone to let our security research partners who are really excellent at like just listening to the package feeds and going after and just running all their analysis on them.

35:19 To give them first crack, right?

35:49 What was the kind of reference there just a moment ago?

35:51 Having like published allow lists, right?

35:54 These are very prevalent in large corporations that have very strong security policies and they have teams of folks that will maintain internal mirrors of a package index.

36:04 So they will disallow any pip install of anything unless you're using their package index.

36:11 And I think that is another tool in the security toolbox to have people who are that like security focused to say we will only allow in the things that we have already tested to be true.

36:25 We vetted them.

36:26 And those kind of match our heuristic.
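
A published allow list can also be applied locally, before anything is installed. The sketch below checks the project names in simple requirements lines against an approved list, normalizing names the way PEP 503 describes. The helper names are hypothetical, and the parsing is deliberately naive: it ignores option lines and does not handle URL or path requirements.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line):
    """Extract the project name from a simple requirements.txt line."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or an option such as -r
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return normalize(match.group(0)) if match else None


def disallowed(lines, allow_list):
    """Return normalized names from `lines` that are not on the allow list."""
    allowed = {normalize(n) for n in allow_list}
    names = (requirement_name(line) for line in lines)
    return [n for n in names if n and n not in allowed]


reqs = ["Django==4.2.5", "djnago>=1.0", "# tooling", "typing_extensions"]
print(disallowed(reqs, ["django", "typing-extensions"]))
```

Organizations that run an internal mirror usually enforce this at the index level instead, by pointing pip at their own index so nothing outside it can be installed.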

36:29 If you scroll down a little bit on the Django page, almost every sidebar to every one of these has these statistics.

36:37 This particular one shows GitHub statistics because this package has a GitHub URL.

36:41 But there's also libraries.io, which is not affiliated with PyPI.org.

36:47 They're just a really great service.

36:49 And you can search for packages of any shape, kind of any ecosystem.

36:53 But they have a really good kind of ranking system.

36:57 Again, if it works for you, the crux of it, don't install garbage off the internet, right?

37:01 Check out what you're doing.

37:02 But by using something like libraries.io, which I don't know why that didn't load.

37:07 Probably was just getting a virus.

37:08 I probably misspelled it.

37:09 Oh, yeah.

37:09 It's good.

37:11 But they offer a nice set of stats around a given package so you can try and be a little bit more informed on your own.

37:21 The challenge there remains that nothing is going to tell you on libraries.io or PyPI if somebody has uploaded malicious software and this is a bad one.

37:32 The best we can do is once we know about it, we handle it.

37:35 Yeah, I feel like PyPI has been pretty on top of it.

37:39 We try.

37:39 I published a blog earlier today where I pulled together a lot of analytics and stats from our inbound malware reporters.

37:47 And it's looking pretty good.

37:50 We handle over 80% of inbound reports in under 60 minutes.

37:54 You know, I go into the article about like the whys and wherefores, the timeliness matters and the response time.

38:00 Because the longer something is out there, the further it can spread to other, you know, other folks.

38:07 Yeah.

38:07 So we try and do it as quickly as possible, often under like five to 10 minutes.

38:12 But we also have to do some investigation and kind of like confirm that the report is accurate.

38:17 We don't want false positives.

38:19 Most of our researchers don't give us false positives.

38:22 So shout out to all those folk.

38:24 But it's hard and time consuming.

38:27 I remember one of the more recent PyPI supply chain issues where somebody uploaded something bad was attributed to, all these different APT and hacking groups have cutesy names, like SolarWinds was by something bear.

38:43 Which bear?

38:44 Which bear?

38:44 Cozy bear.

38:45 That's the kind of bear it was.

38:46 Which is really Russia state actor hacking, right?

38:50 And one of the PyPI ones was North Korea.

38:53 And I think they were doing crypto mining on computers, which seems like a real big waste of I have access to the server in a bank.

39:01 But anyway, it works for them.

39:03 It works for them, you know?

39:04 But the reason I bring this up is like, you all have a serious challenge in that if you're up against state actors from a security perspective, like that's not just script kiddies or some weird automation or, you know, like you guys got to be on top of your game, right?

39:21 This is, again, where I think relying on our ecosystem of security partners is so important because they will corroborate intelligence that they've garnered from other ecosystems that are beyond PyPI and be able to identify these kinds of actors.

39:37 Me, I see kind of just a slice of what the universe has.

39:41 They're going to see a different slice, but broader in spectrum and not necessarily as focused on one particular ecosystem.

39:50 So working together, we can kind of do the best that we can for all the users out there.

39:56 Excellent.

39:56 So we talked about typosquatting, which is serious, but also kind of the silliest, kind of not that big of a deal because recommendations could be like, you know, actually use a requirements management system rather than just every time you create a new environment, just type pip install X, Y, and Z.

40:14 Like the chances you might fat finger that versus pip install -r requirements.txt or, you know, poet, something with poetry or whatever.

40:22 Right.

40:22 So that helps a lot, although it's not perfect.

40:25 The other one is more the XcodeGhost style.

40:28 Like what if somebody were to take over one of the other systems and you all had over here, you have a new two factor requirement for PyPI.

40:39 Do you want to talk about that?

40:40 Yeah, absolutely.

40:41 This also was covered on an earlier podcast of Talk Python where I think in 2022, we had announced that we were starting to ratchet down the amount of potential.

40:55 I think you got the wrong link there.

40:57 I do have the wrong link.

40:58 Keep going.

40:59 It's Dustin.

41:00 Dustin Ingram.

41:02 Yes, exactly.

41:03 I thought I pulled it up.

41:04 I put the other one twice.

41:05 There we go.

41:05 The 2FA story is largely, again, we talked about there's about 740,000 users, right?

41:11 These are the publishers of packages, right?

41:14 So if in our use case, we talked about Django devs, right?

41:18 And I'm sorry to pick on Django.

41:20 They're just the one that's up there.

41:21 But if one of the Django devs was using a classic problem, which is an email expiry or a domain expiry attack.

41:29 So let's say I'm a Django admin maintainer and I use MikeTheMan.com as my email address, right?

41:36 And that's great.

41:37 Because we don't want to use Gmail.

41:39 We don't want to use or, you know, the .me or Outlook.

41:43 I'm a good citizen of the internet, so I got my own domain.

41:46 Yeah.

41:46 I just haven't been paying attention, right?

41:48 I haven't been paying attention this year.

41:49 Right.

41:50 And then let's say I let it expire.

41:52 Whoops.

41:53 You know, like that happens.

41:54 People forget to pay their bills.

41:56 Or your credit card gets stolen and canceled.

41:58 You forget to renew it there.

41:59 And then the other thing goes to spam.

42:01 Like it could actually be super easy that that happens.

42:04 And it happens all the time, right?

42:05 Like people, there are numerous domains that I've registered over the years that I was like, yeah, I don't need that anymore.

42:10 Hopefully I have never used anything from that domain to sign up for anything securely that's there.

42:16 But then someone else can come along and register, MikeTheMan.com, set up an email server, request a password reset, get that email.

42:24 And now they can do anything I could have done before.

42:28 With 2FA, that entire set of problems goes away.

42:32 And we're not even talking about like phishing.

42:34 If somebody phishes my password or if they use the same, if I made the mistake and used the same password on two websites and one website stored it insecurely and they pop that in a breach.

42:45 And, you know, now they have my username and password.

42:47 2FA just solves.

42:49 Do you discourage that using the same username and password?

42:51 I absolutely discourage that.

42:53 I find it very inconvenient to have a separate password.

42:55 I just use the letter A.

42:56 Yeah.

42:56 That's a choice, right?

42:58 It's a bad choice.

43:00 No, like the amount of tooling out there today, both free and paid for password management is just so pervasive.

43:09 It's almost like irresponsible to not use one.

43:12 I 100% agree.

43:13 Yeah.

43:14 I always use 1Password.

43:15 I think, I don't know if it'll tell me how many I have in here, but I think it's coming up on like 1,500 and not quite, just under a thousand different distinct passwords and accounts.

43:28 You know, a lot of people don't want to pay for it.

43:30 Bitwarden.

43:30 Bitwarden is fantastic.

43:32 It's open source.

43:32 I don't know if you got a recommendation, but you're right.

43:35 It's irresponsible.

43:36 I mean, I'm a 1Password fan.

43:37 It's just a great tool.

43:39 I used it back when, when it was like a single thing.

43:42 Then, you know, I used it as an organization account, right?

43:46 Like the, I was an admin for our org and like managing that life cycle was pretty sweet.

43:51 And then it's like, okay, we have this as an organization.

43:54 We have over 400 employees.

43:55 Why doesn't everyone have this right now?

43:57 So, you know, it became a good rollout, but, but having a second factor, a 2FA or multi-factor MFA, I think is this notion of something you have versus something, you know.

44:10 So let's say that even by using a password manager, you don't know that password anymore, right?

44:15 Like you don't remember it, but let's say you do, right?

44:18 Like, let's say somebody gets your entire vault of passwords.

44:21 They still don't have this second factor, which is often a time-based one-time password or web authentication device, which could be a hardware device or a browser fingerprint.

44:34 Like they don't have that, right?

44:36 It's a defense in depth kind of problem that is solving where it's like, you need, you need to have two things in order to get through this door.

44:45 And if you only have one, that's not good enough.
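
To show what a time-based one-time password actually is, here is a compact sketch of the algorithm described in RFC 4226 (HOTP) and RFC 6238 (TOTP), using only the standard library. It is for understanding, not for production use, and it says nothing about how PyPI implements its own verification.

```python
import hashlib
import hmac
import struct


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, unix_time, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, int(unix_time) // step, digits)


# Shared secret taken from the RFC test vectors, not a real credential.
print(totp(b"12345678901234567890", 59, digits=8))  # RFC 6238 vector: 94287082
```

The server and the authenticator app share the secret once at enrollment. After that, each side derives the same short code from the current time, which is why a stolen password alone is not enough to log in.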

44:49 Using that capability and having that ability on PyPI user management has enabled us to roll out a higher grade of security for the packages and maintainers of those packages by attesting that, well, we know that this maintainer or this publisher of this package has already secured themselves.

45:14 So against these kinds of attacks.

45:17 Yeah.

45:18 I can just hear the voices.

45:20 In fact, they don't come through an audio form.

45:22 They come in email.

45:23 Like, you know, on that last episode, sometimes they come through on the artist formerly known as Twitter.

45:28 Sometimes they come through an email, but like, you know, Michael, you said that two factor will help.

45:33 You realize, you don't seem to realize, I'm saying I realize, so I don't get this email.

45:38 Please don't email me.

45:39 That this doesn't stop phishing.

45:40 Like people could still phish you.

45:42 You could go in, they could ask for your username and password, then they'll ask for your time-based authentication.

45:47 And then they're in.

45:48 Yes, that's true.

45:49 But it stops some things.

45:52 And stopping some things rather than going, well, it's not good enough, so I'll do nothing is certainly not a responsible way to go, I think.

46:00 It's kind of like making the argument that if nothing is perfect, don't do anything at all.

46:05 Yes, exactly.

46:05 Right.

46:06 That's a fallacy.

46:07 If you're going to die, don't get out of bed.

46:09 Right?

46:10 Like, no, like, we get out of bed, we go to work, we do our things, right?

46:13 We ultimately, as sad as it is, right, we have an end date.

46:17 We hopefully don't know what that is.

46:19 But like, do the best you can while you can.

46:21 That's where I come to from.

46:23 Like, this is the best we know.

46:25 Yes.

46:26 Will there be something new and exciting tomorrow that is even better?

46:29 Maybe.

46:30 But until then, let's do the thing that we know to be the best that we can do right now.

46:35 Right.

46:35 Maybe passkeys will be awesome.

46:37 I don't know about that.

46:38 But for example, you know, from a phishing perspective, things like 1Password and Bitwarden have plugins for your browsers, and they will suggest to autofill on the right domains.

46:49 But if you're on pypi.io instead of pypi.org or, you know, whatever, right?

46:56 If they're on some kind of phishing domain, they will not suggest to autofill.

47:00 Right.

47:00 If you find yourself going to your password manager and going, God, why does this not work?

47:04 Like, let me just copy this over.

47:06 Stop.

47:07 Figure out why it's not working really, really, really well before you somehow subvert this broken extension that won't autofill.

47:15 Right.

47:15 So there are ways to limit phishing through these mechanisms, even if they're not perfect.
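
The autofill behavior described here comes down to an exact comparison between the host of the page and the host the credential was saved for. The sketch below shows that idea in its simplest form; real password managers apply more elaborate matching rules, and `should_autofill` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def should_autofill(page_url, saved_host):
    """Offer a saved credential only on HTTPS pages whose host matches exactly."""
    parts = urlsplit(page_url)
    return parts.scheme == "https" and parts.hostname == saved_host.lower()


print(should_autofill("https://pypi.org/account/login/", "pypi.org"))
print(should_autofill("https://pypi.io/account/login/", "pypi.org"))
```

A lookalike domain fails the comparison even when the page looks identical to a human, which is why a missing autofill prompt is worth treating as a warning.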

47:21 Exactly.

47:22 I think I said this before, but like, I'll reiterate it.

47:25 You, the human, are the best defender.

47:29 Use your logic.

47:30 Use your sense.

47:31 Like, don't just click at things mindlessly.

47:33 Take a moment.

47:34 Take a look.

47:35 See, that error message, that looks weird.

47:38 Why does that look weird?

47:39 The domain I'm on looks a little odd.

47:42 The little browser lock symbol isn't locked.

47:45 Why is that?

47:46 Hmm.

47:47 Take a moment.

47:48 Nope.

47:48 Notepad.exe wants to act as a server.

47:50 Yes, I want to load it.

47:51 Come on.

47:51 Yes, let it.

47:52 Just, I got it.

47:54 Yeah.

47:54 The reason, I think the news around the 2FA for PyPI.org is not that it exists, but that it's required now.

48:03 I think that's what's different since I spoke with Dustin.

48:05 We've been on a path, and as you've got this blog post open, we've been on a path of like, starting with the carrot.

48:12 We want to like, provide as many people in the packaging ecosystem, all the incentive, all the time, all the kind of expectation that they could have in order to set this up voluntarily, right?

48:24 Like, there was even a wonderful giveaway of hardware security keys that like Google sponsored, which is excellent.

48:32 That doesn't mean you need a physical security key.

48:35 You can use them.

48:36 You can use software security keys.

48:38 Google Authenticator or any other tool.

48:40 Duo Labs has a nice one.

48:42 But like, anything in order to kind of move the bar on this 2FA engagement.

48:48 And we've seen some decent adoption.

48:52 And it's like, okay, well, now let's set a timeline.

48:55 This post by Donald kind of starts the clock on that.

48:59 And we are basically drawing a line in the sand that's saying at the end of 2023, if you want to publish a new package, like, that's it.

49:08 You need to have 2FA.

49:09 We've started on that process by requiring 2FA for new users.

49:17 So if you registered today, you need to set up 2FA.

49:20 Like, if you've been around for a while and you don't have it yet, we'll still allow you to upload, but we'll send you a notice that's saying, here's what's going to happen at the end of this year.

49:30 And we've slowly been kind of ratcheting down the areas at which 2FA is not required with the intent on basically January, December 31st, January 1st, 2024, enabling the requirement on all accounts.

49:46 So that way we can kind of walk away from the problem of, well, I guess one of the Django maintainers got phished.

49:54 And that's why we had a big issue in the ecosystem.

49:57 Like, I don't want that to be the problem.

49:59 And again, apologies to Django.

50:00 Y'all are awesome.

50:01 It's because they're so popular and loved that you pick on them, I can tell.

50:04 Yes, yes.

50:05 Again, this doesn't completely solve all phishing attempts, but it certainly is another layer of defense.

50:13 So I think it's certainly worth doing.

50:15 Now, there was a bit of a pushback.

50:18 I think somebody even like rage quit their package temporarily and then said, oh, no, I want it back on PyPI when this came out as if it was a big deal.

50:26 And this is, you know, this blog post was from May.

50:30 The deadline is end of 2023.

50:33 In between those two times, GitHub just comes out and goes, everyone gets 2FA right now.

50:37 I don't care.

50:37 Right.

50:38 And this is such a broader, broader, more impactful thing in terms of the many people use Python who are not creating packages.

50:46 But almost everyone who uses Python is also in some way using GitHub.

50:49 And so it just touches so much more of the ecosystem and people are like, oh, OK, I don't know why there was so much blowback in one and not the other.

50:56 But it's an odd thing.

50:59 Right.

50:59 Because on the one hand, PyPI or the index itself, right, has been around for about 20 years.

51:06 This is a long lived concept in the Python ecosystem of having a place where people can publish software freely, no charge, and others can install that software.

51:19 This requirement is a shift.

51:23 Right.

51:24 Yeah.

51:24 And a lot of folks are like, well, what else is going to happen?

51:27 It's like, well, probably nothing.

51:29 Right.

51:29 I don't see us talking about other requirements or enforcements unless they're necessary.

51:36 Again, I can't predict the future.

51:37 And if somebody says that passkeys are the best way and TOTP is broken and proves it and the industry-wide decides, oh, wow, this is not a good idea.

51:49 Let's do this other thing.

51:51 Then maybe we'll do that.

51:52 But until then, this is the best we've got.

51:56 The requirement for 2FA is even on the OWASP top 10 list of why you should be doing this.

52:05 And it's like, this is what governments use, companies use, and auditors use to say, we are adhering to the best practices.

52:13 Because if you had a security vulnerability reported to your company because you weren't using 2FA, you know, auditors will say, well, why not?

52:21 It's in the, like, top 10 list.

52:22 It's like the SQL injection of yesteryear.

52:26 Yeah.

52:26 Just like, just do this.

52:28 Right.

52:29 Just solve this class of problem.

52:31 You will have other problems.

52:32 We all have problems.

52:33 But solve the ones that we know are relatively easy to solve.

52:37 Good advice.

52:38 I feel like, you know, when the two-factor software problem, like, that's not good enough.

52:44 You know, these YubiKeys and stuff are too tricky.

52:47 We're just going to go back to SMS.

52:49 Like, that's, that's where it's, what?

52:51 I cannot believe that my bank will let me use 2FA.

52:54 They forced me to use SMS.

52:56 You might want to check out different banks.

52:58 Well, it's like one of the top four banks in the US.

53:01 It's nuts.

53:01 They also have limits on the length, not lower bounds, upper bounds on the length of the password.

53:07 My, ooh, that.

53:09 That, that, I understand why.

53:11 Right.

53:13 Upper bounds, I understand why.

53:14 But it usually boils down to, like, database design and, like, the cost of doing a database migration.

53:19 I hear you.

53:20 Like, I think it's like 12 or something.

53:21 It's very short.

53:22 Oh, that's short.

53:23 That's way too short.

53:23 But here's the thing.

53:25 Do you know, it doesn't matter if you have one letter or a hundred letters.

53:27 The hash is still the same length.

53:29 Depending on how you're hashing it.
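
The point about hash length can be checked with a minimal sketch using Python's standard library. The choice of PBKDF2-HMAC-SHA256, the iteration count, and the `hash_password` helper are assumptions made for illustration; nothing in the conversation says how any particular bank stores passwords.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-size digest from a password of any length.

    PBKDF2-HMAC-SHA256 and the iteration count are illustrative choices,
    not a recommendation for a specific system.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)  # a fresh random salt per password
short_digest = hash_password("a", salt)         # one-letter password
long_digest = hash_password("a" * 100, salt)    # hundred-letter password

# Both digests are 32 bytes, so the storage column never needs to grow
# with the password length.
print(len(short_digest), len(long_digest))
```

The "depending on how you're hashing it" caveat is real: bcrypt, for example, only considers the first 72 bytes of its input, which is one reason some systems do impose an upper bound.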

53:30 Yeah.

53:31 But they will not be stored.

53:32 Like, if they're not storing the hash, it makes me extra nervous.

53:35 Yeah.

53:36 Anyway.

53:36 Onward.

53:37 I'm glad they got the SMS 2FA backing it up.

53:41 Yeah.

53:42 Another thing that I do want to kind of plug on the security spectrum, and kind of to address the question around verifiable releases, is something that we launched earlier this year, which is called Trusted Publishers.

53:54 That's right.

53:55 That's linked to in the, there we go, on our docs.pypi.org, which explains what it is.

54:02 Links in the show notes.

54:03 People can check it out.

54:04 Yeah.

54:04 So, you know, it's a great thing where we leverage an open standard called OpenID Connect.

54:08 And today we only implement this with one publishing tool called, you know, GitHub Actions, where the service GitHub Actions is now delegated to be a trusted publisher for your project.

54:22 When you set this up, you have to opt into this completely.

54:25 We didn't do this for you, but you can now opt in to say GitHub Actions is allowed to publish my project.

54:31 And then you can say, you know what?

54:34 None of my humans are allowed to publish the project.

54:37 The computer that is getting a short lived token for like five minutes or 10 minutes, whatever it is, is allowed to publish this package and no one else is.

54:46 And that's how we can start to build the levels of attestation and kind of the software supply chain security to say, I know where the source code is.

54:57 I know the source code that built it.

55:00 I know the builder who built it.

55:01 I know the builder who published it and no one else tampered with it in the, in the interim.

55:07 We're not there to like prove that nobody else tampered, but we are there to say, I can now delegate authority to GitHub, GitHub Actions to perform this release for me, as opposed to me creating a token in PyPI and giving that token to GitHub Actions.

55:25 That's how we did it before.

55:26 Right.

55:27 A long lived permanent token that you put in plain text somewhere, right?

55:32 What could go wrong?

55:33 I mean, usually it's like an environment variable or secrets in GitHub Actions, and they have pretty good ways of securing data.

55:40 But again, it's long lived.

55:41 So if anything ever happened over there, if anybody dumped a debug log that they shouldn't have, that token could be there.

55:48 So by using a trusted publisher flow, you can now have your GitHub Actions deploy directly to PyPI.org once the artifact is complete and not have to do that token management.
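
The token exchange described above can be sketched roughly as follows. This is a hedged illustration, not PyPI's or GitHub's official client: in practice a publishing action usually does this for you. The `mint_pypi_token` and `_http_json` helpers are hypothetical names, and the `MINT_TOKEN_URL` endpoint, the `audience=pypi` parameter, and the `value` / `token` response fields reflect my understanding of the PyPI Trusted Publishers documentation and should be verified there. The two `ACTIONS_ID_TOKEN_REQUEST_*` environment variables are set by GitHub Actions when the job has the `id-token: write` permission.

```python
import json
import urllib.request

# Assumption: endpoint as described in PyPI's Trusted Publishers docs; verify before use.
MINT_TOKEN_URL = "https://pypi.org/_/oidc/mint-token"


def _http_json(url, headers=None, body=None):
    """Tiny JSON-over-HTTP helper: GET when body is None, POST otherwise."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def mint_pypi_token(env, http=_http_json):
    """Exchange the CI job's OIDC identity for a short-lived PyPI API token."""
    # Step 1: ask GitHub Actions for an OIDC token whose audience is PyPI.
    oidc_response = http(
        env["ACTIONS_ID_TOKEN_REQUEST_URL"] + "&audience=pypi",
        headers={"Authorization": "bearer " + env["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]},
    )
    # Step 2: hand that identity token to PyPI, which checks it against the
    # trusted publisher configured for the project and returns an upload
    # token that expires after a few minutes.
    api_response = http(MINT_TOKEN_URL, body={"token": oidc_response["value"]})
    return api_response["token"]
```

Inside a workflow this would be called as `mint_pypi_token(os.environ)`. The `http` parameter exists only so the exchange can be exercised without network access. Note that no long-lived secret appears anywhere: the only credentials involved are issued per job and expire on their own.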

56:03 So we're getting short on time, Mike.

56:04 What else do you want people to know about what you all, in particular, what you're doing at PyPI and some of the initiatives and maybe how they can help?

56:14 The top of mind for me right now is the malware reporting project that we're engaged in.

56:19 And that's kind of linked to at the very bottom of my blog from today, the Inbound Malware Reporting Blog, where we are looking to establish what a kind of machine readable protocol would be to interact with security researchers.

56:33 A few of them have chimed in already on what they think of, and we're just kind of building the conversation around what it would look like to report, how do you like to report?

56:42 And then we'll proceed with whatever guidance we get there and kind of build out the payloads and stuff like that all the way at the bottom, very bottom, all the way at the bottom.

56:53 There we go.

56:54 And once we have this format in place, we're going to be building out like the infrastructure and ecosystem in order to submit those payloads and then figure out how to kind of put packages in timeout while these payloads are being investigated.

57:08 So that way we can continue to provide a secure ecosystem for all users of PyPI.org.

57:15 I think that's great.

57:16 Certainly, you know, these companies that are checking out and just monitoring the flow of packages and scanning them, that's a huge service.

57:24 There probably is not, and never will be, like a bug bounty equivalent?

57:30 Is there?

57:30 I mean, never say never, but.

57:32 Never say never.

57:32 From that perspective, it becomes a bit of a challenge because then you could start funneling money through a bug bounty program because we are offering an ability for people to create packages.

57:44 And then saying, we're giving you a monetary incentive to report them to us.

57:49 So it's like, well, now we've given you a pipeline for money.

57:53 There's a whole shadow industry of like, you first create it, then you get it popular, then you report it.

57:58 Yeah.

57:59 Yeah.

57:59 No, I hear you.

58:00 Yeah.

58:00 But, you know, no, no idea is too far fetched.

58:04 We like talking about ideas and figuring out what, what makes sense.

58:07 And kind of, again, with, with a lot of security work is like, okay, well, how can this go wrong?

58:12 How can this fail?

58:14 Right.

58:15 How can it be gamed?

58:16 Yeah, absolutely.

58:18 Well, I, for one, feel better that you're putting all your time and energy into focusing on these problems and seeing how we can make PyPI better for everyone.

58:26 Almost everyone.

58:27 Not for everyone.

58:28 For the 99.9% of us.

58:30 For most people.

58:31 Just wanting to use it in a solid way to build Python software.

58:34 That's kind of why I was drawn to it, right?

58:36 Like, to contributing to it. It is such a foundational piece of modern day infrastructure that it's important that it be safe, secure, convenient, useful to anybody who wants to use it.

58:50 Because Python itself is such a ubiquitous language across the planet and beyond that, you know, we want to make it the right thing.

58:58 Yeah.

58:59 Surprisingly, every time you say that statement, it's more true.

59:02 Like, that graph continues to go up in surprising ways.

59:06 All right.

59:07 Before we get out of here, I'll ask you one of the final questions.

59:09 Notable PyPI package, not malware-ridden, but a good, useful one.

59:15 What do you recommend?

59:16 Anything you come across that's awesome lately?

59:18 A huge fan of pytest.

59:19 And I know that, you know, you're big pals with Brian Okken.

59:22 Hey, Brian.

59:23 Who talks a lot about testing.

59:24 And pytest plugins are a wonderful extension to pytest.

59:29 Yes.

59:30 And there's so many of them out there.

59:32 And there's even like an awesome pytest aggregator of these.

59:37 And I think I have one on here, which is called pytest-socket.

59:41 Nice.

59:41 Which I maintain to this day.

59:43 But the one that I want to point out is one that I recently learned about, which is called icdiff.

59:49 I, the letter C, diff.

59:51 I don't even know if it's on this.

59:53 It's the letter C.

59:54 I gotcha.

59:55 Yeah.

59:55 There it is.

59:56 So that's not the pytest package, but there's an extension, pytest-icdiff.

01:00:02 We'll get there.

01:00:03 So this uses that other one.

01:00:04 But the notion here is a lot of times you get big pytest output if you're comparing, you know, dictionaries, lists, or stuff that has lots of data.

01:00:13 Sometimes detecting the difference is very hard in the terminal.

01:00:18 And the pytest-icdiff extension will help highlight a lot of these with colors, with spacing, which makes finding the problem much easier.

01:00:28 Yeah.

01:00:28 That seems super helpful right there.

01:00:30 And it does a partial character by character diff and line by line diff with different colors.

01:00:35 Yeah.

01:00:36 And here's what we expected.

01:00:37 Here's what you got.

01:00:38 Yeah.

01:00:39 Yeah.

01:00:39 Also, I'm learning that there's even more madness to the pretty print.

01:00:44 So it could say from pprint import pprint.