#457: Software Supply Chain Security with Phylum Transcript
00:00 We've spoken previously about security and software supply chains, and we're back at it
00:04 on this episode. We're diving in again with Charlie Coggins. Charlie works at a software
00:09 supply chain company and is on the episode to give us an insider's look and a defender's
00:15 perspective on how to keep our Python apps and infrastructure safe. This is Talk Python to Me,
00:21 episode 457, recorded January 24th, 2024. Welcome to Talk Python to Me, a weekly podcast on Python.
00:43 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy, and follow the
00:48 podcast using @talkpython, both on fosstodon.org. Keep up with the show and listen to over seven
00:54 years of past episodes at talkpython.fm. We've started streaming most of our episodes live on
01:00 YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about
01:06 upcoming shows and be part of that episode. This episode is brought to you by Sentry. Don't let
01:12 those errors go unnoticed. Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry.
01:20 And it's brought to you by Mailtrap, an email delivery platform that developers love. Use their
01:25 email sandbox to inspect and debug emails in staging, dev, and QA environments before sending
01:31 them to recipients in production. Try Mailtrap for free at talkpython.fm/mailtrap. Hey, Charlie,
01:38 welcome to Talk Python To Me. Hi, Michael. Great to have you here. We have corresponded back and
01:44 forth about security things. And now, are you here to scare us? Is that what's going to happen?
01:49 It's going to seem that way. There are threats everywhere, especially when you start looking.
01:55 And that's the problem. You look, you'll find them. If you're not looking, you might get
02:00 affected without even knowing it. Yeah, but that's true. But we're also going to come with
02:05 some tools and techniques and tips on how to avoid security problems with your Python code.
02:11 Yes, absolutely. Yeah. I think it's especially concerning. That certainly catches my attention
02:18 that if you mess with somebody's software, like the software builders, the developers,
02:25 it gets shipped to however many users are on the other side of that equation, right? It's not like
02:30 I just took over some teenager's gaming PC and now what can I do? It's like, I took over, name
02:37 your big web app, and now we're going to start shipping some stuff around. All right. That's
02:42 where the multiplicative aspect of this gets more concerning than just standard personal computer
02:49 safety, right? Oh, absolutely. A single developer can have very broad impacts. Maybe they publish
02:58 one package, but that one package could be included in hundreds, thousands of other packages as a
03:04 dependency. And then everyone using those packages could be affected. Whether the code is good and
03:11 works as intended or poorly written and has bugs and vulnerabilities. Yeah. Or it's malicious.
03:16 It's not to say there's any chance of there being a problem with Pydantic, but just to make your
03:21 point, if you go to like Pydantic or Requests or something like that, a lot of these have
03:26 "Used by" projects, right? And Pydantic is used by 315,000 people, not people, software
03:34 projects that themselves have users, right? And so that's the kind of stuff that I'm thinking
03:39 about when I said that multiplicative effect, right? It's a big multiplier, not just a couple.
03:44 Oh yeah. Yeah, for sure. Yeah. Now, before we dive into our main topic, of course,
03:49 tell people a bit about yourself. All right. Well, my name is Charles Coggins. I usually go by
03:54 Charlie and I'm a Python developer. I'm a software developer, but not through the traditional sense.
04:01 I don't have a computer science degree. I didn't come to this straight out of school. I got my
04:08 first taste of programming long enough ago, back in the '80s, in 1987. My dad got a computer for
04:16 us and I was messing around on there with some games, always with games, right? When at the time
04:24 it was BASIC, it was this bowling game that my brother and I would play. And I saw that I could
04:29 look at the code, I could look at the source. And I went in there and modified it a bit to make it
04:34 so that I would always win whenever I played them. But then-
04:38 How long did it take him to catch on?
04:40 Oh, he figured out pretty quickly. And he was in there too, changing ball speed and how often he
04:46 could get a gutter or make him get a gutter. But yeah, I took a class or two in high school and
04:53 college, but I was an electrical engineering major and then went to work for the government
04:59 doing something that wasn't even really that. So I spent 10 years working for the government before
05:07 they stood up the US Cyber Command and decided or figured out that they needed to hire 6,000 new
05:17 developers to fill the positions. And there weren't that many available in the industry,
05:23 let alone those who could pass the clearances and work in that environment. So they looked to
05:29 people already working in the government and I raised my hand. I said, "Yes, yes, I want to
05:33 cross-train. I'll be a developer." And so they trained me.
05:38 What did they teach you for language in that program?
05:41 We started with C, C++, and then there was some Python. So I went through a couple of boot camps
05:49 and a lot of self-learning, self-teaching. Python's the one that really clicked for me.
05:54 It just made sense in my head.
05:56 Yeah, of course. If you're learning to do cybersecurity stuff, you know, a lot of times
06:01 I'd be happy to tell people like, "Ah, you don't really need to learn C or Rust or Java." If you
06:07 just know Python, you're probably 90% of the time golden. But if you're trying to do cybersecurity,
06:14 a lot of times it's about the machine level stuff, right? Understanding things like C and
06:19 pointers and buffer overflows and all of that kind of stuff is where you actually kind of need to be.
06:24 And they taught us all that as well. In fact, we learned assembly language as well,
06:29 and that one really didn't fit in my brain.
06:32 You're like, "I want to become an assembly language programmer." I mean, yeah, that's a whole different breed.
06:41 Yeah, it sure is. And it used to be, I remember when I first got into programming, I was doing
06:48 some C, C++, and inline assembly was something people would do a lot to optimize. A lot like
06:54 people might do Cython or Numba or something like that to make Python fast. Like, "We'll find this
06:59 little part and we'll rewrite it in this way." And be like, "We're just going to do inline
07:02 assembly." I'm like, "That just doesn't seem like worthwhile. I don't need that much performance.
07:08 We're going to not do that." Fun. So now you're working at Phylum. Is it Python-focused or just software security?
07:20 It's not Python-focused. In fact, the company primarily develops with Rust, as you were
07:28 mentioning. We've got some excellent Rust developers at our company, and I think that's
07:35 what's attracted a lot of them is that that is the primary language we use. But we also have some
07:42 developments in Python. And when I came on board, I got assigned to work on our integrations.
07:48 So like GitHub integrations, GitLab, pre-commit hooks, things like that. And so I was able to
07:58 kind of architect it the way I thought best. And because I love Python, I made it all in Python and
08:04 exposed it through Docker containers.
08:07 Are you doing direct integration with Rust, like PyO3? Or is it more just issuing commands out?
08:18 The Rust elements that our company works on, like our API, the command line interface,
08:24 a lot of the backend, it's just written straight Rust. And then the Python is just plain Python.
08:30 There's no interface between the two, really.
08:33 Yeah. Okay. Consuming APIs and Docker containers and stuff like that.
08:38 Right, right, right. Although I am interested in PyO3, and I think there's room to
08:44 bridge the two languages at our company.
08:48 I mean, for sure, people are adopting Rust for the performance foundations of Python.
08:54 It's pretty interesting.
08:57 Yeah, yeah. I've been at the company almost two years now. I keep saying it's what I'm
09:03 going to learn next, is Rust. And I felt like I would just kind of absorb it by going through
09:08 code reviews and the people on my team. It hasn't happened yet. I can kind of understand what's
09:13 going on by reading it, but I just, yeah, I need to jump in.
09:16 fn, def, okay, got it. Those are the same. Okay, got it.
09:19 Yeah, yeah.
09:20 No, it's interesting. Okay. Well, we're not here to talk about Rust, although I do think
09:26 it's becoming one of those things that is sort of, I don't know, if you need to be a little
09:31 one level deeper in the Python space, that used to be C, and now it's, I think it's pretty solidly
09:37 moving to be Rust, right? There's a lot of popular things, Pydantic, for example, I pulled up earlier, where that's the foundation,
09:44 but that also seems to be where the momentum is.
09:46 Yeah. The oxidation of Python libraries is a real thing. I mean, look at Ruff.
09:52 Yeah. Ruff. I just heard about how Granian, I think it was, which is a new, similar to
10:02 Gunicorn and uWSGI, is a Rust-based async server. It goes on and on.
10:10 This portion of Talk Python to Me is brought to you by OpenTelemetry support at Sentry.
10:15 In the previous two episodes, you heard how we use Sentry's error monitoring at Talk Python,
10:21 and how distributed tracing connects errors, performance and slowdowns and more
10:26 across services and tiers. But you may be thinking, our company uses OpenTelemetry,
10:31 so it doesn't make sense for us to switch to Sentry. After all, OpenTelemetry is a standard,
10:37 and you've already adopted it, right? Well, did you know, with just a couple of lines of code,
10:42 you can connect OpenTelemetry's monitoring and reporting to Sentry's backend. OpenTelemetry
10:48 does not come with a backend to store your data, analytics on top of that data, a UI,
10:53 or error monitoring. And that's exactly what you get when you integrate Sentry with your
10:58 OpenTelemetry setup. Don't fly blind, fix and monitor code faster with Sentry. Integrate your
11:04 OpenTelemetry systems with Sentry and see what you've been missing. Create your Sentry account
11:09 at talkpython.fm/sentry-telemetry. And when you sign up, use the code TALKPYTHON, all caps,
11:16 no spaces. It's good for two free months of Sentry's business plan, which will give you
11:21 20 times as many monthly events as well as other features. My thanks to Sentry for supporting
11:26 Talk Python to Me. All right, well, let's talk about software security, though. You know, like,
11:33 we touched on it a little bit with the multiplicative aspect of like why software
11:37 developers should care. But maybe let's start with some ways in which viruses might get on
11:44 your computer from a software perspective. Not from like, "Oh, you know, I found this cool app
11:49 on BitTorrent and normally it's paid, but this one's free." It's like, "Hmm, maybe don't install
11:53 that." But, you know, not that kind of advice, right? But, you know, specifically for software
11:58 developers. Right, right. So for software developers, I think the primary
12:03 vector, you know, for malicious code running in your environment or really any developer
12:10 environment along the way, it doesn't just have to be your system. It could be your CI/CD servers
12:15 and your runners. It's going to be software dependencies, third-party code, right?
12:21 Code from strangers on the internet, right? That's really what it boils down to.
12:26 They just, Charlie, they're just here to help out. They're just giving you the code to help out.
12:32 They have no bad intentions. Right, right. Except for that one. That one over there, don't take it.
12:37 Yeah. And it's hard to tell, you know, what's good, what's bad. And I think we all rely on
12:48 third-party code. I mean, I think it's a rare company, rare project that writes everything
12:54 from scratch on their own without any dependencies. So that's a vector for sure, is allowing code from
13:04 strangers on the internet to run. I think like the name of the game, right, for attackers and
13:11 threat actors is arbitrary code execution. Like that's the key phrase, arbitrary code execution.
13:17 If I can get arbitrary code execution with this vulnerability, then I've won. I can attack your
13:23 score of nine or above. So right there. Yeah, exactly. And that's for vulnerabilities. That's
13:29 just, you know, poorly written code or code with bugs. But forget about vulnerabilities. I mean,
13:35 if you're an attacker, you're a threat actor, you've already got the perfect means to run
13:40 arbitrary code, to gain arbitrary code execution on a developer system. That's with third-party
13:46 dependencies. Open source software is just the perfect target for writing malware.
13:53 You're slipping malware into packages.
13:56 Now, when people hear this, we've talked about it enough. It actually came as quite a surprise
14:01 a few years ago. People theoretically knew that it could happen, but that it was happening is that
14:07 packages on package stores like PyPI and NPM and so on got published vulnerabilities that people
14:14 could then install and make part of theirs. But there's a whole software supply chain, right?
14:19 Maybe talk us through some of the different elements that make that up. Only one of which
14:23 is these libraries, right? That's right. That's right. So the software supply chain is, it's
14:29 really, it's using third-party code securely, as well as securing the end-to-end development process.
14:35 So that process is, you know, very broadly broken into three phases. You've got the source phase,
14:43 that's, you know, source control management systems, and then actual, actually coding,
14:49 developers coding on their systems, you know, committing to repositories.
14:54 Yeah. You know, you mentioned the dependencies like pip install this or that. There's also,
15:02 for many of the really popular IDEs and editors, there's a whole massive array of varying
15:09 levels of trusted plugins or extensions, right? As well.
15:14 That's right. Yeah. Like Visual Studio Code. That's what I use for my IDE. You know, it's got
15:19 an extensive extension ecosystem. Just about anything you want to do. I get a little pop-up
15:27 when I open a new project and it says, "Oh, I recognize you're using a YAML file. Do you want
15:31 to download this extension that will lint YAML files?" Right? Like, there's an extension for that.
15:35 Yeah. I got one for CSVs. It was like rainbow CSV syntax highlighter. So then I'm like,
15:42 "You know what? That's not really made by a trusted company. It's probably fine.
15:47 But I don't need my CSV files highlighted so much so that I'm willing to just like run
15:54 arbitrary code from a stranger on the internet." That's right.
15:58 Right. And, you know, I use both PyCharm and VS Code and they both, especially PyCharm, has sort of a warning that says, "This is untrusted. It's a third-party
16:08 thing. Are you sure you want it?" Like, you know, that's a pretty light warning.
16:12 And also they're not the same, right? Is it installed by a million people
16:17 used every day or is it for you the fourth person to use it? And it hasn't, you know,
16:22 had the experience of people going, "Why is it opening a network socket? What's it doing?"
16:28 You know, something like that.
16:31 Yeah. Yeah. That's another entry point that you got to be careful about.
16:36 All right. Well, I cut you off. We're only in like square one of maybe nine.
16:40 Yeah. Yeah. Square one, source code, and then there's the build phase. That's where
16:47 you take the code, you take the commits that have gone into source control,
16:50 and you build something with it, right? This usually happens in, you know, your CI/CD systems,
16:57 the GitHubs and GitLabs of the world. And it's at that point where, you know,
17:05 your third-party dependencies get included and wrapped up into your artifacts, right?
17:12 Which brings us to the third stage of the software supply chain, which is the package and deploy
17:20 phase. That's where you're creating your artifacts and making them available to the world to use.
17:28 Could be anything. It could be a wheel for a library that other parts of your company use
17:33 to build software. It could be some app you ship. It could actually be a website, an API,
17:39 who knows, right?
17:40 Yeah. A Docker container. Yeah.
17:43 Yeah. Yeah, exactly.
17:44 And then by the time you get to that, you know, the end of the supply chain and, you know,
17:50 the products or the packaged product that people are going to see and use and work with,
17:57 you know, you've baked in so many elements at that point, you know, from your third-party
18:02 dependencies to, you know, any other external resources that are getting called. So there's
18:13 lots of points along the way that it's possible to...
18:16 And one of the things that can be sneaky is, you know, it doesn't happen that often in Python,
18:23 but if you're shipping like a Windows or a Mac app, there's a digital signature, proof of,
18:29 we're going to sign this with our trusted certificate. So it doesn't even give you
18:33 any warnings. Like, look, this is, it's signed by the company. It is trusted. Here you go.
18:38 Pick it. Right. And somewhere upstream from that, there's an issue like with packages or other
18:45 things. Well, that issue is now that that problem is signed and verified as well.
18:50 Yeah. Yeah. You know, so you mentioned code signing, the research team at our company,
18:57 I mean, they're amazing, amazing group there. They're always finding new and novel attacks.
19:03 And one they found just this past week involved something kind of cool where the attacker had
19:11 bundled up a valid Microsoft binary, had been signed by Microsoft, but they bundled it with
19:19 the DLL that was malicious. It was named something to be expected. Right. So when you run the
19:27 executable on the binary, you know, you could see that there's this Microsoft-signed application
19:35 looking for permissions, looking to continue. And you think, oh yeah, great. Signed by Microsoft,
19:39 no problem. But then it uses this technique called like DLL search order hijacking
19:47 technique. Right. So if you have a DLL that's being called by the application more locally
19:53 than not, it's looking in the same directory. Yeah. It'll look for the name of
20:00 the DLL in the same directory first, basically is what's happening. Right. Right. It shipped
20:06 their bad DLL with a good binary. So you pick something in System32 that's got like a real
20:13 common name, like vcruntime-whatever.dll, or, you know, some of the standard ones,
20:20 but then you completely reprogram it and stick it in there with that app. Or maybe not completely
20:25 because you need the app to not crash, but you give it some extra boost when it does something.
20:30 Right. Yeah. Yeah. In this case, they had just copied all the files needed for execution into
20:36 a new directory, including the known good binary, the known bad DLL, and then, you know,
20:42 it had everything it needed in that directory to run. And it looked like it was legitimate.
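The lookup order described above can be illustrated with a toy model in Python. This is a sketch, not how the Windows loader is implemented: the real search order has more steps and depends on system settings, and the directory names, `example.dll`, and the `resolve_library` helper are all hypothetical. It only shows why a same-named file placed next to the executable wins.

```python
import tempfile
from pathlib import Path


def resolve_library(name, search_dirs):
    """Return the first path in search_dirs that contains a file called name."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    app_dir = Path(root) / "app"        # stands in for the executable's directory
    system_dir = Path(root) / "system"  # stands in for a system library directory
    app_dir.mkdir()
    system_dir.mkdir()

    # The legitimate library lives only in the "system" directory.
    (system_dir / "example.dll").write_text("legitimate")
    resolved_before = resolve_library("example.dll", [app_dir, system_dir]).read_text()

    # An attacker ships a same-named file alongside the signed executable.
    (app_dir / "example.dll").write_text("malicious")
    resolved_after = resolve_library("example.dll", [app_dir, system_dir]).read_text()

print(resolved_before)
print(resolved_after)
```

The executable itself never changes in this model, which is why a signature check on the binary alone does not catch the swap.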
20:45 Right. A lot of these OS checks are on the executable, not
20:51 the system libraries that they use. Right. Right. Right. You'll see like this, this executable is
20:56 downloaded from the internet. Are you sure you want to run it? Like that doesn't say this executable,
21:01 which you trust is maybe possibly using a library that you downloaded. Like it doesn't say that.
21:06 Right. Yeah. Cause we could never get work done if there was that level of checking all over the
21:11 place. This is an updated somewhere. This portion of Talk Python to Me is brought to you by Mailtrap.
21:19 We're going to keep this super short. So please pay attention or you'll miss it. Mailtrap is an
21:23 email delivery platform that developers love. An email sending solution with industry best analytics,
21:29 SMTP, and email APIs and SDKs for major programming languages with 24/7 human support. What makes
21:36 them unique is their email sandbox. Use email sandbox to inspect and debug emails in staging
21:42 dev and QA environments before sending them to recipients in production. Try Mailtrap for free
21:48 at talkpython.fm/mailtrap. That's kind of the space that we're talking about, right? We've got
21:56 editors, we've got libraries that you use, CI/CD pipelines, containers are super interesting as
22:04 well, and all the tools to go with those. So let's talk through some of the posts that you've
22:10 written and also just selected about some of these things and maybe starting at the front of that
22:15 list there with lock files. Yeah. Okay. So yes, I wrote a blog post. I guess it's looking at the
22:23 date on your screen. It looks like it was over a year ago now. And probably seems like yesterday,
22:28 but no. Yeah, that's right. 2022 it was. So I'm sure the landscape has changed since then a bit,
22:34 and maybe there's some new players out there. But yeah, I think one thing you can do as a
22:41 developer, a big one I would recommend is use lock files for your dependencies, right? And
22:50 what's a lock file? Well, it's the fully resolved set of dependencies that are used
22:59 by your application, your package. And if nothing else, you should know what's going into your code,
23:09 right? Not just your direct code. Yeah, exactly. That's a bit of a challenge, right? And I think
23:17 I'll admit when I first got into Python, I didn't do this that well. And to me, it felt like probably
23:23 the biggest issue I might run into is instability in my app, right? Like for example, if I don't
23:28 pin a dependency, some new thing comes out, I reinstall it on a new computer, maybe it gets an
23:34 upgraded version, and there's some library that doesn't work, right? I mean, there's been certainly
23:38 popular libraries that just said, we're having a major version change and we're fixing the mistakes
23:43 we made 10 years ago, and these three functions are changing or whatever, right? That would break
23:48 it. But it could also be there's now a malicious version of library X, that's version two. But if
23:55 you pinned it on version one, even though it's bad, you're still not getting the bad one, at least
24:00 for a while, right? Absolutely, yes. So I think I gotta look it up. I always forget. PEP 665.
24:09 Okay. Yeah, PEP 665. It's a rejected PEP, unfortunately, but it was written by Brett
24:16 Cannon, some others. I know you've had Brett on the show a number of times. I love the stuff he
24:22 does. He really understands all of this. And it's kind of a shame this was rejected, but this PEP
24:27 tried to create a standard lock file format for Python. And if you look into the PEP a little
24:38 bit, there's some motivation about why you'd want to do this and four big reasons. And the third one
24:43 is one I really key on, which is that lock files allow for reproducibility. And reproducibility is
24:50 just more secure. I'm quoting here from the PEP, it says, "When you control exactly what files are
24:55 installed, you can make sure no malicious actor is attempting to slip nefarious code into your
25:00 application, i.e. some supply chain attacks. By using a lock file, which always leads to
25:05 reproducible installs, we can avoid certain risks entirely." And I mean, that's the name of the game.
25:12 That's what our company focuses on, which is avoiding those risks by ensuring you know which
25:21 dependencies you're using and you're knowing that those dependencies are benign or good,
25:26 doing no harm.
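One way to "control exactly what files are installed" is to record a hash for each artifact and refuse anything that does not match. pip supports this through `--hash=sha256:...` entries in a requirements file together with `--require-hashes`. The following is a minimal sketch of the underlying check only, not pip's implementation; the file name and the `verify_artifact` helper are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return True only if the file's digest matches the recorded one."""
    return sha256_of(path) == expected_sha256


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in for a downloaded wheel; the contents are placeholder bytes.
    artifact = Path(tmp) / "example_pkg-1.0.0-py3-none-any.whl"
    artifact.write_bytes(b"original contents")
    pinned_hash = sha256_of(artifact)  # what a lock file would record

    unchanged_ok = verify_artifact(artifact, pinned_hash)

    # Simulate the file being swapped for something else after it was locked.
    artifact.write_bytes(b"tampered contents")
    tampered_ok = verify_artifact(artifact, pinned_hash)

print(unchanged_ok, tampered_ok)
```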
25:27 Even if there's something that happens, usually it's going to happen to a popular library because
25:34 you're using it, hence probably other people are using it other than typosquatting,
25:39 which we can talk about. But if you pin your dependencies, chances are these things only
25:46 stick around for a little while. It's not like, "Oh, they discovered it had been there for eight
25:49 months." It's like, "Oh my gosh, we heard about it. A few people got it and then we got rid of it."
25:54 Right?
25:55 Yes.
25:55 The folks at PyPI are pretty excellent. So it's to some degree a timing issue as well.
26:00 Yes. Vulnerabilities are different, right? That's what a lot of people focus on. A lot of the
26:06 tooling exists to discover vulnerabilities in your dependencies, which is good to know about those,
26:13 but those exist for a long time, right? You have CVEs for known vulnerabilities and they end up in
26:19 these databases and they're there for years. And if you're using old dependencies or maybe
26:25 transitive dependencies are using old ones and you're stuck on it, then you're going to be
26:30 exposed to those vulnerabilities. But what's different about-
26:32 Examples. Sorry. Examples of those include the WebP library not too long ago, right? That was
26:39 baked into Python and then also OpenSSL, right? So people discovered issues in those. Those are
26:45 baked into different aspects of Python or some of the libraries. And it's like, well, all of a
26:50 sudden there's this fire drill, which is different than somebody going, "I'm going to sneak a thing
26:54 into the library system." Right. And then it is a timing matter. So malicious dependencies, that's a whole other story. Because if a malicious package is discovered,
27:05 there's not a CVE created for it. The package is just taken off of the registry. You report it to
27:11 good people at PyPI and they'll review the submission and take it down. I've done a few
27:17 of those myself and they're really fast, but there's still a window of time where
27:24 that malicious package, that malicious dependency is up and available. And that's-
27:29 Yeah. I do think pinning your dependency-
27:31 Often all that's needed.
27:32 Yeah, exactly. I do think having a pinned dependency there is worthwhile. Because if you
27:36 make a commit, your CI runs, et cetera, et cetera, right? The chances that you just bump the version
27:42 to this malicious thing is pretty low. Yeah, exactly. So yeah. And having version
27:48 ranges is not enough. You need to have explicit versions.
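A rough way to check the "explicit versions" rule is to scan a requirements file for entries that are not exact `==` pins. This is a sketch with a deliberately simplified pattern; it does not implement the full requirement grammar, and the package versions in the sample are illustrative only. The `unpinned_requirements` helper is hypothetical.

```python
import re

# Simplified pattern: name, optional [extras], "==", one exact version,
# optional environment marker. Wildcards such as "==4.2.*" do not count as pins.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;,*]+(\s*;.*)?$"
)


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in text.splitlines():
        # Drop comments that start a line or follow whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        # Skip blank lines and option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


sample = """\
# direct dependencies
httpx>=0.25,<0.27
pydantic==2.5.3
requests[socks]==2.31.0  # exact pin
urllib3
"""

loose = unpinned_requirements(sample)
print(loose)
```

In this sample the version range and the bare name are flagged, while the two `==` entries pass.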
27:53 Let's talk more about these lock files then.
27:55 There's actually a bunch of choices these days. And Brett's PEP tried to make it less of a choice.
28:03 Say, "Well, it doesn't matter if you use hatch or pip or poetry or whatever, the outcome is the same."
28:09 And for reasons that I haven't learned enough about, I don't know why that didn't work. But
28:14 let's talk about what's out there now. Because there's a couple options at this point.
28:19 Sure. I think the... Yeah. So most Python developers are going to be most familiar with
28:24 pip, right? That's the standard. And pip has requirements files. And they're unique in the
28:36 lock file world because they can be named anything, right? Most other lock files have a defined name.
28:43 We were talking about Rust earlier. They're the gold standard for a lot of this stuff. And
28:48 they're very clear. They have Cargo.lock. That's their lock file. You can't name it anything else.
28:53 Its contents are well-defined. It is what it is. But in Python with pip,
28:58 I mean, you could name it whatever you want. You know, dev-requirements.txt. You could name it
29:04 Cargo.lock, but it can contain Python dependencies in it.
29:08 Surprise. I'm not Rust.
29:10 Basically, you can just put more or less arbitrary commands that are sent to pip
29:16 in any text file, right? Which is more or less what it is. Yeah.
29:19 Yeah. Any command line option you can feed to pip, you can put in a requirements file.
29:25 It's cool because you can import by saying -r some other file.
29:30 Yes. Yes.
29:31 But it's also not...
29:32 Get the hierarchy that way.
29:33 Yeah. Yeah.
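The `-r` hierarchy mentioned above can be sketched as a small flattener that follows includes. This is a simplified illustration, not pip's parser: it handles only `-r` / `--requirement` lines and plain entries, and the file names, versions, and the `collect_requirements` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_requirements(path, seen=None):
    """Flatten a requirements file, following -r / --requirement includes."""
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen:  # guard against include cycles
        return []
    seen.add(path)

    entries = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r ") or line.startswith("--requirement "):
            included = line.split(None, 1)[1].strip()
            # Included paths are resolved relative to the including file.
            entries.extend(collect_requirements(path.parent / included, seen))
        else:
            entries.append(line)
    return entries


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "requirements.txt"
    dev = Path(tmp) / "dev-requirements.txt"
    base.write_text("httpx==0.26.0\npydantic==2.5.3\n")
    dev.write_text("-r requirements.txt\npytest==7.4.4\n")
    flattened = collect_requirements(dev)

print(flattened)
```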
29:34 So there are some tools available to turn those loose requirements files,
29:42 the pip requirements files, into strict lock files, right? Where every entry is
29:49 pinned to a specific version. And pip itself can do it with the pip freeze command.
29:54 So that's the one most people know about. But that one's kind of not so great because it only
30:01 freezes the packages for the environment that you ran pip freeze in. And maybe you're trying to
30:08 publish your lock file for users of a different platform or system.
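To make the limitation concrete, here is a rough approximation of what a freeze produces, written with the standard library's `importlib.metadata`. It is a sketch, not pip's implementation, and the `freeze_current_environment` helper is hypothetical. The point is that the output describes only the interpreter and platform it ran on.

```python
from importlib.metadata import distributions


def freeze_current_environment():
    """List name==version for every distribution visible to this interpreter."""
    pins = set()
    for dist in distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


frozen = freeze_current_environment()
print("\n".join(frozen))
```

Running this on a different operating system or Python version can give a different list, because some packages declare platform-specific or version-specific dependencies.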
30:13 The other thing that I don't like about it is you want to put just the things you actually
30:17 use into your requirements file. Like I'm using HTTPX and Pydantic. That's it. But what it really
30:24 installs when you run that is the transitive closure of all those things, which is fine.
30:29 But you're not necessarily expressing that with just your requirements.txt, right?
30:35 Right. Yeah. Yeah. Your two packages could balloon to 100 dependencies. And that's not uncommon. It's
30:43 not even that bad. Like in the JavaScript ecosystem, the same handful of top level
30:49 dependencies could have two orders of magnitude explosion where you end up with thousands.
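The "transitive closure" idea can be shown on a toy dependency graph. The package names and edges below are made up for illustration and do not reflect any real package's metadata; `transitive_closure` is a hypothetical helper.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it requires directly.
DEPENDS_ON = {
    "app": ["http-client", "validator"],
    "http-client": ["tls-certs", "url-parser", "async-core"],
    "validator": ["typing-helpers", "validator-core"],
    "validator-core": ["typing-helpers"],
    "async-core": ["url-parser"],
}


def transitive_closure(roots, graph):
    """Return every package reachable from roots by following dependencies."""
    seen = set()
    queue = deque(roots)
    while queue:
        package = queue.popleft()
        for dep in graph.get(package, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


direct = DEPENDS_ON["app"]
# Everything that ends up installed: the direct requirements plus all they pull in.
closure = transitive_closure(direct, DEPENDS_ON) | set(direct)
print(len(direct), "direct ->", len(closure), "installed")
```

Even in this small graph, two direct requirements become seven installed packages, and only the full set is what a lock file needs to pin.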
30:53 There's a really... Oh, gosh. I can't find it. You know what? I think it's on... I think I put
30:59 it on Python Bytes. But there's a really funny... I want to be able to pull this up for
31:02 people so they can find it. There's a funny, funny thing that somebody did. Well, for some
31:09 definition of funny. They put... Somebody created an npm package called everything.
31:16 Yes. I saw this.
31:19 Everything becomes too much. The npm package chaos of 2024. An npm user named PatrickJS
31:26 launched a troll campaign with a package called everything, which depends on every package in npm.
31:32 Yeah. Yeah. I think it's the npm's the largest package registry out there. So, it's already
31:38 massive. I remember your early episodes, you would recount how many packages were on PyPI.
31:46 I don't even know. Are we past half a million?
31:49 Well, yeah. I remember it was a big deal. It got up to 100,000. And now it's probably,
31:53 what? 400,000? 500,000?
31:55 508,509 by rounding. Yeah. Half a million. Congratulations, world. Amazing.
32:03 I just added two new ones last week. So, I guess I made a huge difference in that number.
32:08 Nice.
32:10 Yeah. So, basically, pip is awesome and it does a bunch of great stuff. And one of the
32:14 things I really like about working with pip is I don't need to teach people anything if they want
32:19 to work with my project. I don't need to teach them like, "Oh, I know you love poetry, but I'm
32:24 using a combination of the Hatch build backend with PDM." You're like, "What? I don't even know
32:29 what those are." There's a lot of ways in which you work that are brought in with a lot of these
32:36 tools here. So, pip is kind of like, it just kind of works, right?
32:39 Yes.
32:40 But having this transitive closure managed is not part of what it does, but it's super important
32:46 because if I need to upgrade something, I can't just change my version number in my requirements
32:51 because that doesn't affect its dependency possibly, right? It depends on what it said.
32:55 So, I'm a huge fan of pip-tools. This is actually what I do most of the time.
32:59 Yes. pip-tools is another one. It's great. I think it has this pip-compile
33:07 command that will take as an input, I think, just about any Python manifest type that's out there.
33:15 So, you can do setup.py, requirements.txt. I'm forgetting the other ones.
33:23 The Pipenv lock file, maybe.
33:27 Setup.cfg, pyproject.toml. It just recognizes all the different ways people could express
33:36 their loose requirements, the manifest files. Yeah. So, yeah.
33:41 Yeah. I really like it. And you can say, "pip-compile --upgrade," and it'll look at all the
33:47 dependencies and upgrade them all as high as they can go. But what's nice about that is,
33:51 you'll be working for a while, then you choose, "Well, let me just do a refresh on the dependencies
33:56 right now and re-pin them and see how that works," and then just carry on with your business for a
34:00 while, right? And it'll manage that transitive closure as well with actually a really nice
34:06 lock file where it describes, "These are all the things in the lock file." And the reason that,
34:10 for example, in your blog post, you say, "There's certifi, this version," and it's there because
34:14 you asked for it and because requests needs it. If you're like, "Why is this in my virtual
34:19 environment? Why do I have this weird thing that I don't know?" It'll tell you, "Here's why it's
34:23 there." Yeah. Yeah. One of the downsides, though, I think pip-tools has this issue. I know pip does,
34:30 is that in determining that transitive dependency resolution, it is very possible,
34:39 in fact, it usually happens that you have arbitrary code execution on your system, right?
34:43 If you start with the two top-level dependencies, like you mentioned, and it lists dependencies,
34:48 well, then it'll pull those in and it acquires the metadata from the wheel if that exists.
34:54 But if it doesn't, it'll build the package just to get the metadata file,
34:58 just to figure out which dependencies that needs. And so you end up-
35:01 Are you saying I should set up a Docker container to execute this?
35:06 Yeah. That's kind of what's happening.
35:07 Maybe I should. Yeah.
35:08 Yeah. Running in a sandbox is another option, right? That's what my company, Phylum, that's one
35:18 of the solutions we offer. We have extensions for our CLI where you can wrap pip by just calling
35:26 phylum pip, and then everything runs in a sandbox. So that's another solution.
35:31 Yeah. Yeah. Yeah. Because I mean, pip is a funny one because they even have a command line option
35:37 called dry run, --dry-run, which you would think, "Oh, nothing's going to happen on my
35:42 system." It's just-
35:43 Except for running code from strangers on the internet.
35:45 But it does. Yes. Dry run, even using dry run for pip install and pip download commands
35:52 will or has the possibility of downloading and running arbitrary code from strangers on the
35:57 internet. Yeah.
35:59 If we had, oh, like wheels came along far after pip, right? And we've got the source distributions
36:05 and setup.py and all that kind of stuff. And so if wheels existed from day one, it very well
36:11 may be the case that this is not a problem, right? But what is pip supposed to do? It has to
36:16 evaluate this dynamic thing to figure out what it wants in a sense.
36:18 Yes. Yes. Yeah. Yeah. Wheels are great because they have a metadata file in there that clearly
36:26 lays out what the dependencies are. And there's no arbitrary code running when you install a wheel.
36:32 It's just extracting and copying. A wheel is just a zip file. You extract that zip file and then
36:39 copy the contents to various locations. But yes, as you said, because we've had source distributions,
36:46 tarballs, and then even eggs before that, and probably never going to fully get rid of those,
36:54 it just takes one. One dependency anywhere in your chain that is only distributed as a source
37:00 distribution before now you're downloading and building a package just to get metadata to
37:07 continue.
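A minimal sketch of the wheel point made above: because a wheel is a zip archive with a static METADATA file, its declared dependencies can be read without running any of the project's code. The package name demo_pkg, its version, and its requirements are made up for illustration, and the archive is built in memory rather than downloaded from an index.

```python
import io
import zipfile
from email.parser import Parser


def build_fake_wheel() -> bytes:
    """Create a tiny in-memory archive laid out like a wheel (hypothetical package)."""
    metadata = (
        "Metadata-Version: 2.1\n"
        "Name: demo-pkg\n"
        "Version: 1.0\n"
        "Requires-Dist: requests>=2.31\n"
        "Requires-Dist: certifi\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "")
        archive.writestr("demo_pkg-1.0.dist-info/METADATA", metadata)
    return buffer.getvalue()


def read_requirements(wheel_bytes: bytes) -> list[str]:
    """Read Requires-Dist entries by unzipping and parsing text; nothing is executed."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        metadata_name = next(
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = archive.read(metadata_name).decode("utf-8")
    # METADATA uses an email-header style format, so the stdlib parser can read it.
    message = Parser().parsestr(text)
    return message.get_all("Requires-Dist") or []


if __name__ == "__main__":
    print(read_requirements(build_fake_wheel()))
```

A source distribution offers no such guarantee, which is why a single sdist-only package in the chain can force a build step just to learn its dependencies.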
37:07 And maybe you didn't actually choose that, right? It's the dependency of a dependency
37:11 of a dependency.
37:12 Absolutely. Yeah. Yeah. That's, yeah. Yeah. People often respond to some of the findings our company
37:22 has where we'll post these malicious packages with all sorts of crazy names. And people will
37:27 respond to say, why would I install that? Why would I ever install this random package that
37:35 no one's heard of? It's like, well, you wouldn't. But it could be included in the transitive dependencies.
37:43 Right? If it gets added to a slightly more legitimate package or worked up the chain that
37:50 way, then yes, eventually you'll be running it unknowingly.
37:55 Yeah. I think there's two important things we should talk about this before we move on,
37:58 because there are some interesting ways in which you might unknowingly, you might even try to do
38:03 the right thing and you might actually shoot yourself in the foot by doing so. So number one,
38:09 these super strict lock files are awesome when you're building an application. I want to ship
38:15 Talk Python Training out. It's got a strict API as it runs on this version. It uses that
38:19 version of Pydantic, that version of Beanie and whatever. I want that to be fixed, fixed,
38:25 zero flexibility until I decide to maybe do a pip-compile upgrade or whatever when I want a new one.
38:30 However, if I was building a library that someone else was using, I would cause them many headaches
38:36 and do them a disservice to say, I depend on Pydantic 2.7.0. You're like, well, my other library needs
38:44 Pydantic 8.8 and I can't use it and your library together. So you need the, it's a different story
38:51 when you're building a library that others are going to consume than it is when you're building
38:55 an application. And there was some disagreement, I guess, about the recommendation of pipenv for a
39:01 while. And it's because I believe the pipenv is really focused on the application side. And it,
39:06 I don't think it was made super clear that maybe it doesn't make as much sense for libraries.
39:10 Right. So you want to speak to that a little? Yeah. Yeah. I'm an advocate for lock files for
39:15 everyone. Right. Applications for sure, but also libraries and their developers. Right. Because
39:21 when you distribute a library, sure, loose dependencies are probably the way to go there.
39:31 But library developers, people who want to contribute to your projects, the developers themselves, maybe you work on a team, having a lock file alongside
39:42 your library is still going to be useful. Right. Like, yeah. Cause that way you can say everyone,
39:47 if somebody makes a change or they report a bug or whatever, they're not bringing in a change from a
39:52 different version of a dependency or like maybe something changed. Right. Yes. Yes. Yeah. And
39:58 then, and it, plus it still allows you to start from a known good spot. And then maybe, maybe if
40:06 you know you want to get the latest, then you can do it in a controlled environment,
40:13 like a sandbox or maybe a CI in a throwaway runner that has no access to any secrets or
40:22 sensitive data. I hadn't really thought about having a specific requirements lock file type of thing
40:30 for the libraries that I've been working on for the developers. Right. For people who want to
40:34 contribute because it's just been like a loose requirement so that people that built against it
40:40 aren't pinned into some very specific thing. But yeah, that makes a lot of sense. I think.
40:43 Yeah. There's a link in that blog post. It's kind of dated now, but it's from
40:48 the folks who built Yarn, you know, JavaScript ecosystem, but they say it a lot more
40:54 eloquently than I can. Yeah. That's the one. Lock files should be committed. On all projects. Yeah.
41:00 It's, I mean, it's a bit old now, but they go down the list and spell it out a lot more
41:05 clearly than me about why libraries even can benefit from, from publishing a lock file.
41:12 Yeah. People can check that out. That's cool. Yeah. And Yarn, that's the JavaScript package
41:16 manager. So in JavaScript years, like a hundred years or something, it's been a couple of years.
41:19 That's right.
41:20 You got dog years, you got JavaScript years, JavaScript years just tick by like
41:25 the second hand. Yeah. Yeah. All right. Cool. So I see we're making great progress here on our
41:30 list of things to talk about here. I've gone through three and I have like 15 left. We'll have
41:35 plenty of time. So yeah, let's see. So another one, another PEP I think we're talking about
41:44 here is 517, a build-system independent format for source trees. I have no idea what this is.
41:50 What is this?
41:51 Yeah. PEP 517 and 518 kind of go together. This is, this was like the transition away from
41:58 setup.py towards pyproject.toml. 518 is the one that specifies pyproject.toml
42:04 kind of things that go in it. And then 517 is all about build systems and build backends.
42:13 So like in your pyproject.toml, in your build-system key, you'll often see things like poetry-core
42:23 or flit or hatchling or these kinds of things. And so 517 is specifying what it means to be
42:31 one of those build backends. It's really just defining two mandatory hooks. What does it mean
42:38 to build_wheel and build_sdist? There's three optional hooks as well. And I think there's even
42:44 another PEP that followed on from this that talks about building editable packages or-
42:49 Right. The dash E equivalent.
42:54 Yeah. Yeah, exactly. But really it just boils down to defining a way to build a wheel and build a source distribution.
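The hook interface described above can be sketched as a toy backend. The hook names and signatures follow PEP 517 as I understand it; the bodies are placeholders that build nothing, and the missing_mandatory_hooks helper is a hypothetical checker added for illustration, not part of any real tool.

```python
import types

# The two hooks every PEP 517 backend must provide, and the three it may provide.
MANDATORY_HOOKS = ("build_wheel", "build_sdist")
OPTIONAL_HOOKS = (
    "get_requires_for_build_wheel",
    "prepare_metadata_for_build_wheel",
    "get_requires_for_build_sdist",
)


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """A real backend writes a .whl into wheel_directory and returns its file name."""
    raise NotImplementedError("sketch only: no wheel is built here")


def build_sdist(sdist_directory, config_settings=None):
    """A real backend writes a .tar.gz into sdist_directory and returns its file name."""
    raise NotImplementedError("sketch only: no sdist is built here")


def missing_mandatory_hooks(backend) -> list[str]:
    """Hypothetical helper: list the mandatory hooks a backend object lacks."""
    return [
        name for name in MANDATORY_HOOKS
        if not callable(getattr(backend, name, None))
    ]


if __name__ == "__main__":
    toy_backend = types.SimpleNamespace(build_wheel=build_wheel, build_sdist=build_sdist)
    print(missing_mandatory_hooks(toy_backend))
```

Both mandatory hooks are ordinary Python functions, so whatever a backend does inside them runs on the machine doing the build.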
43:02 Yeah. And this is part of what opened up all the different choices we now have for package
43:08 management and things like that, right? Because now there's a common way they can all work together.
43:14 A little bit like WSGI.
43:15 Yes. Yeah.
43:16 Yeah. I've been using hatchling for my build backend recently and it's been working real
43:19 nicely.
43:20 Okay. Yeah. I was just looking at hatchling the other day and they've got- Yeah. Yeah.
43:27 They're one of the build backends that offers build hooks, which- So prior to pyproject.toml
43:38 and wheels and bdist_wheel and you go back to the source distributions and your setup.py files,
43:45 where it's just Python code. You can be doing anything in your setup.py file,
43:52 which runs when you install the package. Well, now we're starting to see methods to do the same
43:58 thing in these more modern packaging or build backends. So like Hatch has their
44:02 build hooks, build system hooks where you can point it to, I think, yeah, just Python code and
44:11 have it run as part of the build.
44:14 Yeah. At least it only runs at build time, not install time. Right?
44:21 I'm looking at the documentation now. Yeah. This is still new to me, but there might be
44:25 hooks for install as well.
44:28 Okay. While you're thinking about it, one of the things, I got a couple of questions I want to
44:34 highlight from the audience here, but also one of the things that I think maybe was considered,
44:41 I have no awareness of this, but if it wasn't, it would be excellent is what if the people at PyPI
44:47 just pre-computed all that metadata from, at least for the common platforms that you would get,
44:54 that pip needs to download, run setup.py and then throw it away just to get that data.
44:59 Like for Mac, Windows, and Linux, if it would just go, okay, we're just going to, as you upload it,
45:05 it would just kick off a job that does that on those three platforms and puts it in a JSON blob.
45:09 It seems like that would be worthwhile.
45:12 I'm fairly certain there's discussions already around that type of a solution and maybe even a
45:18 PEP or proposal for it, but yeah, getting away from having to build a package just to get metadata.
45:24 You got packages that are downloaded billions of times with a B, it's insane.
45:31 And if somebody could do that three times instead of a billion times, it would make it work faster and it would also make it safe. Right? I think it'd be great.
45:40 All right. A couple of questions here. This one. So Tony in the audience says,
45:47 pip-compile's great for finding your transitive dependencies. One interesting thing that they've
45:53 done is package up code with Pants build, which supports lock files, just to look through what
45:58 code gets packaged up. Is this anything you've explored?
46:01 I've heard of pants. I haven't looked into it myself yet.
46:06 Okay. Yeah. So just use it like, okay, you're going to have to build this thing and give me
46:11 a little manifest and whatnot. And then we can just look at that. That's cool. And then Tamir
46:15 says, do you have a solution for taking already locked dependencies with you when you start a new
46:20 app? I'm guessing, you know, maybe, yeah, I don't know. I guess maybe you've already got a project
46:26 you're working on and you want to say like, I want this project to use that. Probably you could
46:29 just copy the lock file. Right? Yeah. Yeah. If you, I mean, if you really, I mean, really,
46:35 you're going to, if you start a new project or new application, you're going to, you're going
46:39 to have new manifest file, you know, pyproject.toml, maybe you have the same dependencies,
46:44 the top level dependencies or not, but the, the fully resolved set of dependencies that makes up
46:50 your lock file, that can very easily be different. So I'm not exactly sure how you'd just
46:56 port over one to another. One more bit from Tony. And this is something that I now remember
47:02 from pants is this, if it just looks through your code and if you use the import statement,
47:07 regardless of whether you've put it in your requirements files, it'll figure out what
47:12 your requirements files should have been. If you were a bad developer, basically,
47:16 that's cool. Just to see what it uses. Yeah. Nice. All right. On to the next thing,
47:22 PEP 518, specifying minimum build system requirements for Python projects.
47:28 Yeah. This is pyproject.toml. This is the, this is the, the PEP for that.
47:33 There's not much to it other than to say that they've settled on that name,
47:38 rejected a bunch of other possibilities. And then they've got the, you know, the,
47:42 the few entries that are required, like for defining your build system.
47:46 Yeah. You don't have to have a pyproject.toml for Python, but if you're building a Python library
47:54 and you don't want to use setup.py, then you're much better off having a pyproject.toml, right?
47:59 Yes. Yeah. Yeah. It's more in the library side that it, I mean, it's not that you can't use it
48:03 on an application, but it's more required on the library side. Yeah. That's the thing. All right.
48:09 So let's talk about some of the ways in which your packages might go wrong. We've already
48:14 talked about typosquatting and we also talked about everything that's different. Yeah. But yeah,
48:20 you know, typosquatting is, it is tricky. I think it's pretty well understood at this point,
48:25 but maybe just tell people real quick to cover that base, you know?
48:29 Sure. Typosquatting is, you know, publishing a package with a name that's similar,
48:35 but not the same as an existing known good package. Right. So like, instead of requests,
48:43 maybe you get request without the S or, you know, one that gets me, because I
48:49 make the typo all the time, is the cryptography package. Like, if I, you know,
48:54 if I put you on the spot, would you know how to spell cryptography? I always get the first couple
48:59 of letters, you know, jumbled up a bit and, and there have been malicious packages published and
49:04 then taken down with, with you know, spelled C-R-P-Y instead of C-R-Y-P, cryptography. Right.
49:13 Yeah. But, but the idea is that, you know you, you can overlook a package cause it looks like a,
49:21 it looks like a good one. It's not necessarily that you're going to, you're going to install
49:25 it because you type it wrong. Although that is, that is, you know, one technique, right?
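As a rough illustration of the look-alike names described above, here is a small sketch that flags a name that is close to, but not the same as, a known good one. The allow-list and the 0.8 similarity cutoff are assumptions chosen for this example, and plain string similarity is only one weak signal, not how any particular scanner works.

```python
import difflib

# Hypothetical allow-list; a real one would come from your own dependency set.
KNOWN_GOOD = ["requests", "cryptography", "pytest", "numpy"]


def possible_typosquat(name, known=KNOWN_GOOD, cutoff=0.8):
    """Return the known package this name closely resembles, or None."""
    candidate = name.lower()
    if candidate in known:
        return None  # an exact match is the real package, not a look-alike
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for name in ["request", "crpytography", "requests", "flask"]:
        print(name, "->", possible_typosquat(name))
```

A check like this is most useful in code review or CI, where a near-miss name in a changed manifest can be surfaced to a human before anything is installed.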
49:30 The drive-by installs where someone just fat-fingers the package name. But really having a
49:38 typosquatted package is going to allow these threat actors to be a little more stealthy
49:44 in their inclusion of that package in, in legitimate code reviews and commits and
49:50 dependencies of dependencies. Right. And so the other, the other thing that goes with
49:55 typosquatting, I don't know if I had a link for you there yet, is star jacking. So
50:00 a lot of times if you're going to typosquat on a known good package, okay, there it is.
50:07 You know, these, these, these threat actors, they just, they just straight up copy the known
50:13 good project, right. It's just clone the repository and then change the package name.
50:19 And, and then when they, when they post the package to PyPI, for instance, the metadata
50:27 that goes with the package still exists, right. So on PyPI for a given package, you can see on
50:34 the left-hand side, it shows like some, some statistics. If, if the URL was given to like a
50:42 GitHub hosted project, for instance, it'll go in there and tell you how many stars.
50:48 Right, right, right. That's actually a signal that it seems like it should be good, right. It'll have.
50:54 Yeah. That's what star jacking is doing is just copying the metadata of a known good package.
51:02 So that on first look, yeah, there you go. You can see.
51:05 I did pull up pytest and it says statistics, GitHub statistics, 11,000 stars,
51:11 2000 forks. Okay. This is legit. Let's install it.
51:13 Right. So I could go clone pytest repository right now, change the name to pytest spelled P-I-T-E-S-T.
51:20 And then, and then push the math version of testing. Yeah. And you're going to get these
51:25 same statistics and you're going to get the same maintainers that you see if you scroll down a
51:29 little bit in the, the metadata. Yeah. So you get the maintainers list, all of that metadata that
51:37 you, you, you enter in your pyproject.toml or setup.py file gets read here on PyPI and just,
51:45 just published. So you can, you can fake people out, right?
51:48 Yeah. That's actually really, okay. Well, there's a new terrifying thing that I hadn't thought about.
51:52 Yeah. Yeah. So, so star jacking and typosquatting where you just take a known good package, clone
51:58 it, and then maybe you make a change to, you know, an existing function, you know, the function
52:04 does what it's supposed to do, but it also does some other stuff like ship off secrets from your
52:09 CI server or, you know, it could lie dormant and wait for some sort of production environment
52:16 and grab some SSH keys or something terrible. Yeah. Yeah. Yeah. That's, that's, that's the
52:21 other, the other dependency confusion. Okay. That's the next one you've got up.
52:26 Yeah. This is the one we kind of talked, it's similar to what we talked about before with,
52:30 I can't remember, but I said, there's, there's, we're going to come back to this. So here,
52:34 here it is again, this is a dependency confusion where if you get the wrong version or the wrong
52:40 name, it could actually, you try to be safe by having a white listed list or say, well, it's,
52:46 it's, so this is one where it's the same, same package name, different source of where you
52:52 acquire that package. So this is you'll, these attacks are mostly like companies, enterprises,
52:59 yeah. Yeah. So it's an artifactory and we, we only put our stuff there and we're,
53:07 we're going to call it like, you know, international company underscore data access.
53:12 That's right. And, and it's, and it's, and it's tricky because if you don't know, like if you
53:17 don't have your build system set up in a way, and then your CI server set up in a way to install
53:23 your dependencies in the proper order, like excluding public registries first, and only
53:28 looking for packages in your private registry, then it's very easy, especially with pip, which
53:34 defaults to looking on PyPI, the public registry first, and then only falling back to your, your
53:40 extra index URL specifications second, that if someone had the knowledge or just guessed
53:49 at the package name that you had published on your internal registry, and then they made their
53:54 own package, the same name, but put it on PyPI, that's the one that's going to get installed.
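The attack described above can be modeled with a toy resolver. This sketch assumes the installer pools candidates from every configured index and takes the highest version; that is my understanding of how pip treats its main and extra indexes together, which differs slightly from the "PyPI first" ordering mentioned here, so treat it as an assumption and check pip's documentation. The package name, versions, and index labels are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # e.g. (1, 4, 0); tuples compare element by element
    index: str      # label for where the file was found


def pick(candidates, name, allowed_indexes=None):
    """Pick the highest version of `name`, optionally restricted to some indexes."""
    pool = [
        c for c in candidates
        if c.name == name and (allowed_indexes is None or c.index in allowed_indexes)
    ]
    return max(pool, key=lambda c: c.version) if pool else None


# Hypothetical internal package plus an attacker's same-named public upload.
CANDIDATES = [
    Candidate("international_company_data_access", (1, 4, 0), "internal"),
    Candidate("international_company_data_access", (99, 0, 0), "public"),
]

if __name__ == "__main__":
    name = "international_company_data_access"
    print(pick(CANDIDATES, name).index)                # all indexes pooled
    print(pick(CANDIDATES, name, {"internal"}).index)  # internal registry only
```

Under either ordering model the outcome is the one described: unless the public registry is excluded for internal names, the attacker's same-named upload can be the one selected.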
53:58 And there was like a whole series of, you know, bug bounties that were claimed over this back a
54:06 few years ago, because people just went around, you know, guessing at internal package names,
54:11 or maybe they used to work there or knew people. Yeah. Yeah. Yeah.
54:14 Just share your requirements.txt with me. Right. Right. Right. Right.
54:19 You know, it's, it's kind of, it's extra sneaky because it only affects people. It only affects
54:28 people who are going out of their way to be more secure, right? They're going out of their way to
54:33 say, we're only going to, we're going to actually set up a whole server and we're going to whitelist
54:38 a bunch of stuff. You can only ask for the names of the things on this server and, ah, you know.
54:43 Yes. And that, that might still work if you limit it to your internal registry only, or a mirror,
54:50 perhaps, of, of the, the public registries.
54:53 What do you think about that? It's pretty easy to create your own internal copy,
54:59 download a bunch of extra ones and mirror them locally and say like, these are the ones that are pre-approved at our company. Nothing else.
55:06 Yeah. Yeah. I've worked in an environment where that's exactly what we did. And,
55:12 I think there is merit to that. You just have to know that anything you're mirroring
55:17 to the trusted internal network is in fact secure. You know?
55:21 Yeah. Yeah, for sure. I think, you know, it doesn't really make sense except for a few,
55:27 very rare cases to say you cannot use external dependencies.
55:31 Right. Right.
55:32 You're just saying what we want is to not build software, but while the rest of the world does,
55:36 you know, because that's part of the magic. We just saw there's over half a million libraries
55:42 you can choose from. When you say we have zero of those, you're really, really constraining
55:47 the type of software and the velocity at which you can build.
55:51 Yeah. Yeah. It reminds me of, there's that line, you know, like, why, why do you rob banks?
55:58 Because they have the money.
56:00 Because that's where the money is. Right. It's like, well, why do attackers,
56:03 why are attackers going after open source software now? Like, well, that's, that's where
56:08 it's easiest to get arbitrary code to run. That's where developers are. That's what.
56:13 To be fair though, it's not only that, right? There's SolarWinds,
56:17 which really had almost nothing to do with open source, but it had to do with CI/CD systems and
56:22 other sneakiness. Right. Yeah. Yeah. And got into places that, you know, instead of getting into
56:28 libraries, you get into the build system and you just give it a little extra, a little extra include
56:32 tag there, bringing that deal out. Like you said, right. So dependency confusion is sneaky
56:39 because you're asking for a local version off a local server. It doesn't exist on PyPI, but if it
56:44 could be made to exist on PyPI, all of a sudden that gets installed. That's potentially, that's
56:49 not good.
56:50 Potentially. Yeah. Yeah. It's, it's, that's, that's how it works in all the, in all the default
56:54 cases. And it's, it's pretty tricky actually to, to exclude, to do it in the correct order and
56:59 exclude those public registries.
57:01 Yeah. What I do to help this is I just run the UUID command to get one of those
57:08 16 digit arbitrary hex things. And I just name all my libraries that, and so it's like, oh, you have
57:13 the F3DC. That's the API one. That's right. That's that, right. No one is going to do this.
57:21 It's such a safe space. I tell you. All right. Onto the next one.
57:26 That, that would work.
57:27 Expired author domains. This is super sneaky.
57:32 Yeah. Yeah. So this is one, you know, it, it might be less of a factor now. I think,
57:41 I think it was just earlier this month that PyPI enforced two factor authentication for
57:47 all their users. But a lot of sites and, you know, even PyPI, I think before this month,
57:57 have, you know, password reset features where if, if you lose access to your account or you
58:03 forget your password, just, you know, send me an email, reset your password. But it's,
58:08 it's, it's very possible that people, you know, years ago submitted a package. They,
58:14 they don't maintain it anymore. They submitted it under an old email account that has expired.
58:20 Right. Maybe they had some domain. Yeah. Special doesn't work that well for Gmail or Outlook.
58:26 You had a custom domain and as would be awesome. Have your own, you know,
58:33 Michael@talkpython.fm that kind of thing. Yeah. Yeah. Say you, you win the lottery and,
58:39 you know, decide to quit your job. Yeah. Then you let your domain expire and
58:44 well, maybe there's still a linkage for the Talk Python domain to PyPI. And then I go and
58:51 buy that domain and, you know, request a password reset. Yeah. Yeah. And then now I can
58:58 publish new versions of the packages there. Yeah. Yeah. It's not good. Yeah. Yeah. So I don't really
59:06 know what to do about that one, but there's an amazing, amazing joke that I found on Mastodon.
59:10 Somebody posted it here. It's the two big red buttons. Think Ren and Stimpy or whatever. And
59:19 one of the red buttons says, admit to yourself that your dream is dead. The other one says,
59:23 pay $12 for domain renewal. Right. I mean, it's funny, but there's plenty of people who will get
59:30 a domain and I totally go. And then it's like, you know what? I haven't done anything with that
59:34 for like five years. I'm not paying another 12 bucks, but if they had set up an account under
59:38 that, right, this is what you're talking about. Yeah. Yeah, exactly. Yep. That's why you got to
59:44 buy your domains for that a hundred year renewal period. Exactly. Take out that loan. You get your
59:51 domain. All right. We're getting short on time here. I want to, let me, let's just go through.
59:57 I'll just list off a few real quick. Maybe we do a lightning round. Okay. Okay. Unverifiable dependency.
01:00:02 Okay. These are for specifying dependencies that are not necessarily published to PyPI, right? So
01:00:11 that maybe you're pointing to a GitHub repository. You know, pip calls these VCS project URLs. You
01:00:19 know, if you look in their help output. Yeah. It's like pip install git+https
01:00:24 to a thing that has a project. And that thing, it can point to a repository.
01:00:30 Maybe it points to a tag. Maybe it points to a branch. None of that is stable, right? Like you,
01:00:37 the tag could change out from under you or the code that's related to that tag could change
01:00:43 out from under you. The code at the branch you're pointing to could change while the name remains
01:00:48 the same. So, you know, those are, those are, those are risky for that reason, right? If you're
01:00:52 not pinning to a very specific version or a very specific hash, right. If you're going to point to
01:00:57 a repository or a Git URL. Yeah. Make sure it's true. I've gotten to feel a lot of times like
01:01:02 the hash is maybe a little bit redundant given the immutability of PyPI. But if you're pointing
01:01:07 at something like this, then maybe all of a sudden you really do want that. For sure. Yeah. Okay.
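One way to picture the advice about pinning Git dependencies is a small check that tells a moving ref (a branch or tag) apart from a full commit hash. This is a sketch under assumptions: it only handles URLs of the git+https://host/path@ref shape, it treats a 40-character lowercase hex string as a full SHA-1 commit hash, and the example.com URLs are placeholders.

```python
import re
from urllib.parse import urlsplit

# A full SHA-1 commit hash: 40 lowercase hex characters (assumption for this sketch).
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def vcs_ref(requirement_url):
    """Return the text after '@' in the URL path (branch, tag, or commit), if any."""
    path = urlsplit(requirement_url).path  # any '#egg=...' fragment is split off
    if "@" not in path:
        return None
    return path.rsplit("@", 1)[1]


def is_pinned_to_commit(requirement_url):
    """True only when the ref looks like a full commit hash rather than a name."""
    ref = vcs_ref(requirement_url)
    return bool(ref and FULL_SHA.fullmatch(ref))


if __name__ == "__main__":
    base = "git+https://example.com/org/project.git"
    for url in [base, base + "@main", base + "@v1.0", base + "@" + "a" * 40]:
        print(url, "->", is_pinned_to_commit(url))
```

A branch or tag name can be repointed at different code later, while a commit hash identifies one specific snapshot, which is the distinction being drawn here.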
01:01:12 Repo jacking. Yeah. This is similar to the expired author domain, right? So if someone was,
01:01:21 you know, pointing to one of those Git dependencies, a VCS project URL as pip calls it,
01:01:27 and you know, that account went dormant or expired, relinquished, whatever,
01:01:33 and someone else took it over, then yeah, they can now, they can now dictate what's there. Yeah.
01:01:40 Yeah, exactly. People are requiring. All right. And then maybe last bit, get a chance to talk a
01:01:46 bit about your Phylum CI project. I do want to point out really quick though, that Phylum was
01:01:53 a sponsor of the show a while ago, but this is not a sponsored episode. This is just, you and I had
01:01:58 been talking prior to that actually, and decided to like put the show together. So just to be clear,
01:02:03 but let's talk about this, what this project you guys got anyway. Yeah. Yeah. So you can pip install
01:02:09 phylum right now, or like I prefer pipx, pipx install phylum. Yeah. I love pipx. It's awesome.
01:02:16 Yeah, me too. Yeah. I think I heard about it from you actually.
01:02:19 So the circle goes. Yeah. Yes. Yes. So this package, it does two main things. One is it can,
01:02:27 it'll expose two entry points. One of them is called phylum-init, and that'll get you the Phylum
01:02:34 command line interface written in Rust, but installed with Python. It'll get you the Phylum
01:02:44 CLI locally. And then the other one is, it's called phylum-ci. That's just a catch all entry
01:02:50 point. The thing that gets exposed through our Docker container to handle almost all of our
01:02:55 integrations. So if you want to monitor your PRs on GitHub, for instance, we've got an integration
01:03:03 for that. So the idea is basically that I could set this up in GitHub, a PR comes in, I could set
01:03:08 up an action, Phylum will scan it for known mischievousness and make that part of the PR,
01:03:16 or maybe even block it out, right? Yeah, exactly. It'll fail your build if you don't pass your
01:03:21 default policy or established policy on any of your given lock files or manifests. We deal with
01:03:28 manifests as well. And you mentioned GitHub. So even with GitHub, we went a step further. We have
01:03:33 an app as well. So you don't even have to modify a workflow. You could just install a GitHub app and
01:03:38 automatically monitor your repositories. But a lot of the other ecosystems don't have that. So we
01:03:47 just provide Docker containers. I love the Docker container. So you use docker run against your code
01:03:54 or whatever. Yeah. And then there's even a pre-commit hook we expose as well. Nice.
01:04:04 I genuinely don't know the answer to this question. Does this cost money?
01:04:08 No. Anyone can sign up for free. There's a community edition where you can have up to
01:04:16 five projects. Okay, cool. You guys have to eat. There must be some way you charge for something.
01:04:21 Oh, exactly. Yeah. So there's the paid version, right? Which, you know, unlimited projects,
01:04:26 you get access to group-based management. You know, there's a few extra features. It's a
01:04:31 freemium model. A little more of a Teams, enterprise-y angle. Yeah. But for this audience,
01:04:36 I would love if everyone just went that little extra step of securing their open source software
01:04:44 and go with the free option. I'm not trying to sell you anything here. Just
01:04:47 monitor your manifest, your lock files, make sure that you remain secure and are not exposing
01:04:56 your secrets. Because that's what we're finding now, is that developers are the new high-value
01:05:01 targets. That's what attackers want to go after because we know that developers,
01:05:06 they have the secrets. They've got the keys. We write the code that then gets run on the
01:05:12 production server inside the firewalls. Yeah. We have all the access, all the secrets, all the
01:05:19 keys. So, you know, if you can find a way to get arbitrary code from strangers to run on developer
01:05:26 systems, you're going to have a much better chance. You'll have a good time. Yeah. You'll have a good time.
01:05:30 By which I mean having a bad time. Right. Yeah. Doing bad things. Okay. Let's not do that.
01:05:37 Awesome. Well, excellent work. I think probably we'll kind of just leave it there. We're pretty
01:05:40 much out of time for the rest of the stuff, but close it out for us, Charlie. People maybe
01:05:46 have a few new tools and techniques to work with, but are maybe also a little freaked out.
01:05:51 What do you tell them? I recommend that everyone restrict their use of dependencies to lock files.
01:05:56 And then carefully gate, or guard, the inclusion of new lock files or updates of existing ones,
01:06:04 or sorry, dependencies in those lock files with careful analysis. Don't allow arbitrary code to
01:06:10 run anywhere in your development process and give Phylum a try. You know, we've got the free
01:06:14 community edition. We will provide that analysis and ensure that you don't have malware running on
01:06:20 your system through bad dependencies. Awesome. All right. Well, it's been very interesting and
01:06:25 a lot of new things to think about. So thanks for being here. Thank you, Michael. Yep. See you later.
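Charlie's closing advice, to install only from lock files and to gate changes to them, can be partly sketched in code. The following is a minimal illustration, not Phylum's analysis: it only checks that every entry in a pip-style requirements lock file is pinned to an exact version and carries a hash, and it says nothing about whether a package is malicious. The function name `unpinned_requirements` and the sample package lines are hypothetical, and the parsing assumes the simple `name==version --hash=...` layout.

```python
import re

# Matches "name==version" or "name[extra]==version" at the start of a line.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines lacking an exact '==' pin or a '--hash=' entry."""
    problems = []
    # Join backslash continuations so each requirement is one logical line.
    logical = text.replace("\\\n", " ")
    for raw in logical.splitlines():
        line = raw.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        if not PINNED.match(line) or "--hash=" not in line:
            problems.append(line)
    return problems


if __name__ == "__main__":
    # Hypothetical lock file contents, for illustration only.
    sample = "requests==2.31.0 \\\n    --hash=sha256:abc\nflask>=2.0\n"
    for bad in unpinned_requirements(sample):
        print("not locked:", bad)
```

A check like this could run in CI or a pre-commit hook so that a change introducing a loose or unhashed dependency fails before it is merged. pip's own `--require-hashes` install mode enforces the hash requirement at install time.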
01:06:29 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check
01:06:35 out what they're offering. It really helps support the show. Take some stress out of your life. Get
01:06:40 notified immediately about errors and performance issues in your web or mobile applications with
01:06:45 Sentry. Just visit talkpython.fm/sentry and get started for free. And be sure to use the promo
01:06:52 code Talk Python, all one word. Mailtrap, an email delivery platform that developers love. Use their
01:06:59 email sandbox to inspect and debug emails in staging, dev, and QA environments before sending
01:07:04 them to recipients in production. Try Mailtrap for free at talkpython.fm/mailtrap. Want to level
01:07:11 up your Python? We have one of the largest catalogs of Python video courses over at Talk Python.
01:07:16 Our content ranges from true beginners to deeply advanced topics like memory and async. And best
01:07:21 of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.
01:07:26 Be sure to subscribe to the show. Open your favorite podcast app and search for Python.
01:07:31 We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed
01:07:36 at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of
01:07:43 our recordings these days. If you want to be part of the show and have your comments featured on the
01:07:48 air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host,
01:07:54 Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and
01:07:58 write some Python code. [Music]