#387: Build All the Things with Pants Build System Transcript

Recorded on Thursday, Oct 6, 2022.

00:00 Do you have a large or growing Python code base? If you struggle to run builds, tests, linting,

00:05 and other quality checks regularly or quickly, you'll want to hear what Benji Weinberger has

00:10 to say. He's here to introduce Pants Build to us. Pants is a fast, scalable, user-friendly build

00:17 system for code bases of all sizes. It's currently focused on Python, Go, Java, Scala, Kotlin,

00:23 Shell, and Docker with more languages to come. So you can use it even on a project that has

00:27 multiple languages at play. This is Talk Python to Me, episode 387, recorded October 6th, 2022.

00:35 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy.

00:53 Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past

00:57 episodes at talkpython.fm and follow the show on Twitter via @talkpython. We've started

01:03 streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at

01:09 talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:13 This episode is brought to you by the Local Maximum Podcast over at localmaxradio.com

01:20 and Microsoft for Startups Founders Hub. Get support for your startup at

01:25 talkpython.fm/foundershub. Transcripts for this and all of our episodes are brought to you by AssemblyAI.

01:31 Do you need a great automatic speech to text API? Get human level accuracy in just a few lines of code.

01:36 Visit talkpython.fm/assemblyai. Benji, welcome to Talk Python to Me.

01:42 Thank you. It's great to be here.

01:43 It's great to have you here. I'm quite excited to talk about pants and build and bringing a little bit more structure and automation to the developer workflow using this tool that you all built.

01:55 Very happy to talk about that. Something I've been very passionate about for a long time.

02:00 Yeah, you've been working on this quite a long time, as we will see. But before we get into all the details there, let's start with your story.

02:06 How do you get into programming and Python?

02:07 I love this question because I've been a software developer for 25 years or so. So I've been around for a while.

02:15 You and me both, almost the same duration there.

02:17 So, you know, we've seen some stuff. But I first got into computers when I was about 10 years old.

02:21 And my uncle, who is a big gadget nut, bought a very, very early home computer. This was in the UK. So it was the Sinclair ZX80, if anyone's familiar with that.

02:32 1K of RAM, you know, 8-bit machine. It was, I had just never seen anything like it. And I was instantly smitten by it and got really into it then.

02:43 And at some point, got my own home computer and at some point realized, oh, this is sort of a thing I want to do for the rest of my life.

02:50 Now, granted, at that time, I also thought I would play with Legos for the rest of my life.

02:54 So it's not like I'm always right about that. But in this case, I absolutely was.

02:57 It's pretty awesome that we get to do that, right?

03:00 You know, it's like, oh, this is such a neat little project. And I would just do this for fun.

03:04 But if people pay me, I get to build even more ambitious things.

03:08 And a lot of times those are just the thoughts and dreams of kids who don't know better, right?

03:13 Just wait till you get in the real world. But as programmers, that's not true. Like we get to do it all the time.

03:18 It's absolutely unbelievable that your childhood hobby can become your grown up profession if you have the right hobby.

03:23 Yeah, exactly. It's important to pick good hobbies, kids.

03:26 If you're out there listening and you haven't picked a hobby yet and you're not there, yeah, pick a good hobby.

03:30 Yeah.

03:30 You talked about the 1K of RAM and the retro thing. I just started working on this project last night.

03:38 No, two days ago using CircuitPython. And for, what is that thing? $17, I got an ESP32 feather.

03:45 Whoa.

03:46 Little microchip. It has a Wi-Fi built in. It's got like a temperature sensor and all these things.

03:52 And all you need to make it go is just plug in a USB-C. That thing is like a 240 megahertz.

03:58 Two megs of RAM, four megs of storage, which is not much at all. But it fits in, like, two thirds of your hand.

04:08 And it's unimaginably powerful compared to what the types of computers you're talking about, right?

04:12 Oh, as I said, 1K of RAM. The processor was the Zilog Z80, which I believe was clocked at about four megahertz or something.

04:22 It was, yeah. I mean, and at the time, you know, when you're a child, a million sounds like a lot.

04:28 So I'm sort of, what do you mean megahertz? Like I can't count to a million.

04:30 Yeah, that's unbelievable. That's right. And here we are, well, well past the early Pentiums for $14 or $17.

04:37 Anyway, it's just, it's really interesting to think about the different types of computers that we have to work with and where we all start.

04:45 The other thing that I always find interesting is thinking back to early 90s, late 80s, those computers and their interfaces were so basic.

04:54 And yet the possibility that at least I experienced when I worked with them, it seemed so great and so amazing that even stuff today doesn't come close to where you're like, I see where this is going.

05:03 It's going to be incredible.

05:05 There's definitely a joy. I mean, when you use the word basic, I'm assuming in lowercase, but the interface to many of them was literally the language basic all uppercase.

05:12 Or you could just write machine code directly. And those were your only two options essentially.

05:18 And so you were either, you know, 10 print your name, 20 go to 10, or you were messing around with registers.

05:27 And once you learned how to do that, it was such a joy.

05:32 You're essentially, you're melding with the hardware in some way.

05:35 And it's, yeah, today, obviously, mostly for good, we are 19 layers removed from the hardware.

05:42 And if you're going to be removed from the hardware, Python's a good way to do it.

05:45 But yeah, definitely some joy has been lost and replaced with other joys.

05:50 It's a new kind of joy.

05:51 Now I pip install something in three lines.

05:53 I have like a cluster of servers at my command.

05:55 It's a different kind of joy than working with registers.

05:58 All right, well, let's get into the main topic.

06:01 Let's talk about pants, this project that you all have created.

06:04 Its role, as we stated briefly at the beginning, is really about helping orchestrate common tasks that we have to do to build and run and prepare software.

06:15 And it's only getting more and more complex.

06:17 I guess Python grows up, as you sort of put it, as we were chatting before I hit record.

06:21 As Python grows up, as it's being used on larger projects, as it's being used across larger teams, the expectations of what it means to have a piece of software and run it is changing and evolving, right?

06:35 So maybe we could just start by talking about some of the pain points of large software, scaling software, Python in particular.

06:42 Sure. An important piece of background here that I'm sure all the listeners will be familiar with is what has happened to Python in the last 10 to 15 years.

06:50 I mean, I started programming in Python about 20 years ago on Python 2.2.

06:55 It was sort of fancy bash at that point, right?

06:58 It was just the language, the little bit of glue you used around the edges of your real code, which was written in C or whatever.

07:05 And fast forward to today, and if you look at the process of particularly over the last 10 years, Python is this absolutely critical language that has grown up incredibly and is now being used to build large systems.

07:17 It's being used as the language of choice for data science.

07:20 It's being used as the language of choice for DevOps.

07:22 It is absolutely crucial language that large, growing, scalable code bases are increasingly being built out of.

07:31 And that's presumably everyone listening is a fan of Python as I am.

07:34 And so there are good reasons why this is the case.

07:37 But we're still a little behind the curve on the tooling that you need to grow a Python code base.

07:44 The standalone Python tooling, and there are so many great tools in that toolkit, they're pretty single use.

07:52 And they tend to be designed to assume that you have a small, single use, sort of standalone binary that's the only thing that your code base contains.

08:04 But now increasingly, you have these large, growing code bases, sometimes referred to as monorepos, where you have a lot of Python code, possibly code in other languages as well.

08:15 You're trying to share code across a bunch of different projects and a bunch of different binaries.

08:20 You have a lot of, you may be deploying out of a single code base.

08:23 You want to deploy many microservices.

08:25 You want to deploy AWS Lambdas.

08:28 You want to deploy many different Docker images.

08:31 You have this complexity that you need to manage.

08:34 And so with other languages, there have been tools around to help you deal with this problem.

08:41 And for more contemporary languages, many of the solutions tend to come standard with that language's tool chain. You know, Go just comes with a Swiss Army knife and Rust comes with a Swiss Army knife.

08:57 Python does not, for better or worse, come with a Swiss Army knife.

09:00 It comes with, you know, there are many, many blades out there.

09:03 And Pants, particularly in its focus on Python, is designed to help you grow and scale that Python code base, so that all of the steps you need to take to go from authoring some code to having it be validated, tested, checked, you know, have passed all its quality control checks, and be ready to be deployed or used in production, are handled for you.

09:25 Rather than you manually having to figure out, well, which tools do I need to invoke in which order?

09:32 How do I need to ensure that they are consistent?

09:33 How do I do the least amount of work in the least amount of time that is necessary to assure those quality checks?

09:41 That is all a very automatable problem.

09:44 And that's essentially what Pants is.

09:46 It can look at your code base.

09:47 It can look at changes to your code base and say, oh, you want to run tests?

09:51 Here's the actual work that needs to happen.

09:53 And here are the tools that need to be invoked.

09:54 Sure.

09:55 So maybe you want to run pytest.

09:57 And as we'll learn, Pants has some great ways to speed up things like executing tests.

10:04 Some of that's parallelism.

10:05 Some of that's going, you know what?

10:07 We already did that work.

10:08 Nothing's changed.

10:08 Carry on.

10:09 Exactly.

10:10 Pants is essentially the layer between either you as a developer working on your laptop or

10:15 your CI environment and the underlying tools of which, you know, there are so many.

10:22 And Pants, I think, supports well over 20 Python tools.

10:25 And it's fairly easy to add more.

10:28 So, for example, test being a very important example.

10:31 You can say, hey, Pants, test, you know, run all the tests on everything that is affected

10:35 by my current set of changes.

10:37 The system can look at your code, look at the dependencies, perform all this analysis and

10:42 say, well, that means I actually need to invoke pytest on these underlying tests.

10:47 And if that means I need to first install pytest in a hermetic environment, I will do that.

10:52 Pants also runs everything in these hermetic sandboxes, which means they neither consume

10:59 nor create side effects, which means all of this work can be cached at a very fine-grained

11:05 level.

11:05 You can cache the result of an individual test, apply concurrency at the level of an individual

11:10 test.

11:10 And so if you have eight cores on your machine, you can run eight tests at the same time and

11:16 you only need to run the ones that have not, whose inputs have changed because everything

11:20 else is cached potentially.
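
The caching idea described here can be illustrated with a minimal Python sketch. This is not Pants's actual implementation; the names `fingerprint`, `run_cached`, and the in-memory `_cache` are hypothetical, and a real build system would also fold tool versions, options, and the environment into the key. The point is only that when a step has no side effects, its result can be stored under a hash of its inputs and reused until one of those inputs changes.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache: fingerprint of the inputs -> recorded result.
_cache: Dict[str, str] = {}


def fingerprint(tool: str, inputs: Dict[str, bytes]) -> str:
    """Hash the tool name plus every input file's path and contents."""
    digest = hashlib.sha256(tool.encode())
    for path in sorted(inputs):  # sorted so the key does not depend on dict order
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


def run_cached(
    tool: str,
    inputs: Dict[str, bytes],
    action: Callable[[Dict[str, bytes]], str],
) -> Tuple[str, bool]:
    """Return (result, was_cache_hit); the action runs only for unseen inputs."""
    key = fingerprint(tool, inputs)
    if key in _cache:
        return _cache[key], True
    result = action(inputs)
    _cache[key] = result
    return result, False


if __name__ == "__main__":
    files = {"app.py": b"def add(a, b): return a + b\n"}
    fake_test_run = lambda inputs: f"ran against {len(inputs)} file(s)"
    print(run_cached("pytest", files, fake_test_run))  # executes the action
    print(run_cached("pytest", files, fake_test_run))  # served from the cache
    files["app.py"] = b"def add(a, b): return a - b\n"
    print(run_cached("pytest", files, fake_test_run))  # input changed, runs again
```

Keying on content rather than timestamps is what would let a cache like this be shared between machines, since the same inputs hash to the same key anywhere.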

11:22 Yeah.

11:22 Yeah.

11:23 So that's the test example.

11:25 Then you could look at another really important quality control check is linting and formatting.

11:30 There are, I don't know, eight or 10 different linters and formatters for Python and Pants can

11:35 orchestrate them all in the right order.

11:37 It understands the distinction between linters, which don't modify your source code so they can

11:42 all run concurrently, and formatters which do modify your code so they have to run sequentially.
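
The linter versus formatter distinction can be sketched in a few lines of Python. This is an illustration only, not how Pants is implemented: the toy formatters and linters below are hypothetical stand-ins for real tools, and running formatters before linters is an assumption of this sketch. Formatters rewrite the source, so each one has to see the previous one's output; linters only read it, so they can all run at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Formatter = Callable[[str], str]      # takes source, returns rewritten source
Linter = Callable[[str], List[str]]   # takes source, returns warnings


def run_quality_checks(
    source: str,
    formatters: Sequence[Formatter],
    linters: Sequence[Linter],
) -> Tuple[str, List[str]]:
    # Formatters modify the code, so they are chained one after another.
    for fmt in formatters:
        source = fmt(source)
    # Linters never modify the code, so they can safely run concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda lint: lint(source), linters))
    warnings = [warning for result in results for warning in result]
    return source, warnings


# Toy tools, purely for illustration.
def strip_trailing_whitespace(src: str) -> str:
    return "\n".join(line.rstrip() for line in src.split("\n"))


def ensure_final_newline(src: str) -> str:
    return src if src.endswith("\n") else src + "\n"


def no_tabs(src: str) -> List[str]:
    return [
        f"line {number}: tab character"
        for number, line in enumerate(src.split("\n"), start=1)
        if "\t" in line
    ]


if __name__ == "__main__":
    formatted, found = run_quality_checks(
        "x = 1   \n\ty = 2",
        formatters=[strip_trailing_whitespace, ensure_final_newline],
        linters=[no_tabs],
    )
    print(repr(formatted))
    print(found)
```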

11:49 So it's a lot of, it's this layer between you that allows you to not worry about which tools

11:53 do I need?

11:53 How do I install them?

11:55 How do I isolate them?

11:56 How do I cache their results?

11:57 How do I reason about concurrency and what can and can't be run together?

12:01 It just takes that away from you.

12:03 One of the most powerful things for me personally, when I see all of this, how it brings all of

12:09 these things together is a lot of times, well, yeah, I should probably run the linter on that,

12:15 but it's fine.

12:17 Maybe I should run the tests, but I didn't change that much.

12:21 I mean, it was only a couple of lines of code.

12:23 I don't really, you know, like there's these different steps you got to keep in mind.

12:27 And at each level, you're like, do I need to do this?

12:30 Or do I need to remember to do it?

12:32 Or is it justified to disrupt what I'm thinking about?

12:35 And if it's just, you know, pants build or pants tests, and you don't have to worry about it,

12:40 and the system just does it all for you, you, at least personally, I am more willing to adopt

12:46 more software engineering practices and guards on my code.

12:51 If it doesn't feel like I have to do them, you know what I mean?

12:54 If I don't actually have to remember like, well, I was doing five steps and that was a lot.

12:58 Now I'm doing six and I'm really sick of these steps now, you know, if it's the same number

13:02 of steps and it's fast enough because of the caching and parallelism, then why not adopt it?

13:07 Exactly.

13:07 And I think an underappreciated complication is when it's not an individual adopting it,

13:13 but a team.

13:13 So a team wants to adopt some best practices. Say you want to adopt a new

13:18 linter or a new quality control check of any kind. That's more cognitive load on behalf of everyone on your team.

13:25 You know, now what do you do?

13:26 Do you send an email to everyone on the team saying, well, now you have to run this as well.

13:30 You can set up your existing build layer pants in this case to add, apply that new linter.

13:36 And it just happens.

13:37 Nobody has to change their workflow.

13:39 What, where we want to get to is essentially, and we're in many cases, very close to this

13:45 is you run a single command on, as a developer, you run that same single command in CI and the

13:52 right thing just magically happens.

13:54 And the ability to do the right thing magically depends on the ability to do dependency analysis,

14:00 to build a fine grain workflow, to apply concurrency and caching to it.

14:04 And as I think we'll, we'll get to maybe later, even not just concurrency and caching on your

14:10 machine, but remote execution in a cluster and shared remote caching so that work is being

14:17 shared.

14:17 Not just, you know, it's not just past work that you have done, but past work that anyone

14:21 individually or in CI has done.

14:24 Yeah, that's fantastic.

14:25 And if you've got a large code base, that starts to pay off.

14:28 Yeah.

14:28 Yeah.

14:28 I want to talk to you about monorepos, but before we get to there, maybe you could just

14:33 give a quick shout out about some of the language support.

14:37 I mean, obviously we're talking about Python tools for Python code bases for Python developers

14:42 and data scientists, but we might also live in multilingual heterogeneous environments at

14:48 work and on our projects.

14:50 And we might have, you know, some Kotlin for a mobile app and our Python for APIs or

14:55 something like that.

14:57 So yes, Pants is not a Python only tool, but it is a Python first or a Python centric tool

15:04 in that when we, there is a long history to the project, but the current iteration of Pants,

15:09 which we very unimaginatively called Pants V2, because we're not great at naming, it launched

15:16 almost exactly two years ago, two years ago at the end of the month.

15:18 And with just support for just Python.

15:21 And since then we've added support for Go, for Java, for Scala, for Kotlin, for Shell.

15:27 The next thing we're looking at very closely, obviously, is JavaScript and TypeScript.

15:32 Can't ignore those.

15:33 But one of the things that makes Pants stand out and, you know, the P in the name is no

15:39 accident is the recognition that Python is not, is no longer this third cousin that you sort

15:46 of put at the end of the list of languages to deal with.

15:48 And it's sort of an afterthought.

15:49 But really this thing is designed for Java.

15:51 Python was really one of the big driving use cases.

15:54 But if you have many languages in your repo, or if you have no Python at all in your repo,

15:59 Pants is still a useful tool for you.

16:01 Because you still get the benefit of all the analysis that it does on your behalf.

16:05 Indeed.

16:06 And people will see that Pants is written sort of top layer in Python and lower

16:12 layer in Rust and the extensibility layers in Python.

16:15 So there's a lot of Python first, as you say there.

16:18 But I did want to call out, it does work on these other languages.

16:22 So if you're trying to adopt some kind of automation that involves multiple languages, this might

16:27 work for you.

16:27 Yep.

16:28 And we are always interested in people who have opinions about what the next languages

16:34 we support should be.

16:35 Obviously, as I said, JavaScript, TypeScript, very high on the list.

16:37 I suspect Rust is very high on the list, partly because we use it and partly because it is a

16:42 very up-and-coming language, and for very good reasons.

16:47 This portion of Talk Python to Me is brought to you by the Local Maximum Podcast.

16:50 It's an interesting and technical podcast that dives into trends in technology, stats, and more.

16:55 But rather than tell you about it, let's hear from Max and Aaron about their show.

17:00 We are now on with Talk Python to Me.

17:03 Let's say hi to all the Python fans.

17:05 Hi, Python fans.

17:06 I'm Max Sklar.

17:07 I have actually done a lot with Python myself.

17:10 So I am a fan of Talk Python.

17:12 Do you know Python, Aaron?

17:13 I took a course years ago, but I am a little rusty.

17:16 We are here today to talk about our podcast, The Local Maximum.

17:20 We've been on a roll lately with a new episode every week, and I wanted to share with you what

17:25 we've been up to.

17:26 Here on The Local Maximum, we tackle subjects in software and technology, topics as diverse

17:32 as the philosophy of probability to Elon Musk's next move.

17:36 For Talk Python listeners, I want to highlight a couple of recent episodes of The Local Maximum.

17:40 In 248, for example, I found out about an open source library that maps the world into

17:46 hexagons.

17:46 And some pentagons.

17:47 I had a discussion with an author about games and puzzles and another on a novel approach

17:52 to doing the job search well.

17:53 We discussed the ramifications of AI-generated art.

17:56 Have we reached peak creativity, or is this just another Local Maximum?

18:00 All developer tools kind of come of age when they can make themselves.

18:12 They're now fully independent where a language or a tool builds its own self with its own

18:18 features.

18:19 So yeah, if you could do that for Rust, then it can kind of be a part of that group.

18:25 Yep.

18:25 Self-hosting is a major milestone in any sort of build type project.

18:30 Indeed.

18:31 All right.

18:31 Quick question from the audience.

18:32 Mustafa out there says, how does Pants handle bulk publishing of packages where I might have

18:36 a set of preconditions to auto-publish,

18:39 at intervals, all packages that meet those conditions, or something along those lines?

18:43 Great question.

18:43 So Pants, as I should mention, there are many different types of deployable that it can build.

18:48 I mentioned like AWS Lambda or Google Cloud Functions, or we have a format that's specifically

18:55 of interest to Python users called PEX, which stands for Python Executable.

18:59 And it's basically a single file that contains your Python code and all of its transitive dependencies.

19:06 So it is ready to run as long as there's a Python interpreter on the system you run it on.

19:11 And it even knows how to find that interpreter.
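
For a feel of the "single runnable file" idea, here is a small sketch that uses the standard library's `zipapp` module rather than PEX itself. It is only an analogy: unlike what is described for PEX, `zipapp` does not bundle third-party dependencies on its own and does not search for a suitable interpreter beyond the shebang line. The module names `greeting` and `hello.pyz` are made up for the example.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_single_file_app(workdir: Path) -> Path:
    """Package a tiny two-module program into one executable archive."""
    src = workdir / "src"
    src.mkdir()
    (src / "greeting.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )
    # __main__.py is the entry point the interpreter runs inside the archive.
    (src / "__main__.py").write_text(
        "from greeting import greet\nprint(greet('Pants'))\n"
    )
    target = workdir / "hello.pyz"
    zipapp.create_archive(src, target=target, interpreter="/usr/bin/env python3")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_single_file_app(Path(tmp))
        # The archive is one file; any Python 3 interpreter can execute it.
        completed = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True, text=True, check=True,
        )
        print(completed.stdout, end="")
```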

19:13 Interesting.

19:13 PEX came from you guys.

19:15 Yes.

19:16 Oh, I had no idea.

19:17 I mean, I've heard of PEX, but I didn't associate it with Pants.

19:20 That's cool.

19:20 Yeah.

19:20 I mean, other systems can also build PEX, and PEX has a standalone command line tool that

19:25 you can use to build PEXs.

19:26 But Pants is the home base of PEX.

19:29 But I think the question was about building and publishing Python distributions, for example,

19:36 to PyPI, which Pants can obviously do.

19:40 And I'm not 100% sure I'm answering the question appropriately.

19:43 But I think one of the ways that Pants can help you here is it knows when code has changed.

19:50 So if you're publishing a large number of packages from your repo, it can do the math of, you know,

19:56 here are the, by tracking dependencies, it can do the math to say, based on the changes,

20:01 since the last time this thing was published, it now has changed and needs to be republished.

20:07 So it can give you a lot of that logic.
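
The "do the math" step can be sketched as a walk over a dependency graph. This is a simplified illustration under stated assumptions, not Pants's algorithm: the function name `affected_by` and the package names in the example are hypothetical, and the graph is given by hand instead of being derived from the code. Given what each package depends on, it inverts the edges and collects everything that transitively depends on a changed package; those are the candidates for re-testing or republishing.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def affected_by(changed: Iterable[str], deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert "A depends on B" into "B is depended on by A".
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for package, imported in deps.items():
        for dep in imported:
            dependents[dep].add(package)

    affected = set(changed)
    queue = deque(affected)
    while queue:  # breadth-first walk over the inverted edges
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    # Hypothetical repo: each key depends on the packages in its set.
    deps = {
        "models": set(),
        "api": {"models"},
        "notebook_lib": {"models"},
        "cli": {"api"},
        "docs": set(),
    }
    print(sorted(affected_by({"models"}, deps)))
    print(sorted(affected_by({"cli"}, deps)))
```

A scheduled job could call something like this with the files changed since the last release and publish only the packages that come back.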

20:09 When it comes to auto-publishing at intervals, I guess I would say Pants can tell you whether

20:15 it meets the conditions based on dependency analysis, etc.

20:19 There is no auto-publishing per se.

20:22 Pants is a tool you have to invoke.

20:24 So you could cron around it or something like that.

20:27 Yeah.

20:28 Cool.

20:28 You mentioned monorepos.

20:30 Now, you also mentioned sharing code.

20:33 If I've got, say, some SQLModel definitions that point at what my database looks like, well,

20:41 my API code probably needs access to the right version of those.

20:46 But so do my data scientists for their library that talks using SQLModel to get the data into their notebooks.

20:53 And if those things get out of sync, as we know, SQLAlchemy will go bonkers and say,

20:59 you're missing a column here.

21:01 Done.

21:01 Crash, right?

21:03 And so keeping that stuff in sync across these different projects can be challenging.

21:07 Is that the idea behind these monorepos?

21:10 Yes.

21:10 That's one of the reasons why they are increasing in popularity.

21:16 So the problem is, as your codebase grows, when your codebase is small, there are no problems.

21:20 And as your codebase grows, you're faced with kind of two, you're faced with a decision on how to manage that.

21:26 One is to keep breaking it up into multiple smaller repos, essentially, each one with their own build and their own practices.

21:33 And the way you consume code across them is through publishing, through version publishing.

21:38 Maybe you make your SQL models a data package and you publish it to an internal PyPI and everyone consumes it and they pin their

21:45 versions just like...

21:46 Exactly.

21:47 So the problem with that is that there are several problems with it, but a big one is that you're inviting the famous dependency hell problem,

21:54 which is already bad enough with third-party requirements, into your first-party code.

21:59 So the problem is, when you make changes to a library, you have no way of knowing who is consuming you upstream from you and therefore what changes they might need.

22:09 Now, you might say, well, not my problem because everything is supposed to be versioned,

22:13 but that breaks as soon as anyone needs to upgrade anything because now they have this horrific upgrade problem that is happening potentially weeks or months after the changes that are breaking them have happened.

22:25 So you're kind of pushing the problem off.

22:28 Where a monorepo is helpful is that you get this visibility into all of the upstream dependencies.

22:36 Essentially, if all the tests in the monorepo pass, you know that your changes have not broken your coworkers.

22:42 You know that you can use in-repo tools like git grep, or any kind of discovery tools and dependency analysis that tools like Pants offer within the repo, to find out the impact of your changes.

22:56 And this is why monorepos are increasingly popular.

23:00 And, you know, it's not to say that because a bunch of other companies are doing it, you should be doing it.

23:04 But it is instructive to note that Google and Facebook and Twitter and a lot of successful companies have gone in that direction or in Google's case started out in that direction.

23:15 Right.

23:16 It has to be said that with monorepos or without, you need appropriate tooling.

23:21 So at some point you have to pick your poison.

23:23 But the reason I am biased towards monorepos, having worked at companies that have had one unified code base and companies that have had a very fragmented code base, is that the structure of your code base tends to recapitulate the cohesion and structure of your organization itself.

23:39 And if everybody is collaborating on a single large repo within reason, you don't necessarily need to have literally one repo for the entire company, but a small number of large repos with boundaries between them that makes sense because they don't mutually depend on each other, let's say.

23:55 Then that mutuality and that sharing of code creates more cohesion at the organizational level.

24:02 And when you have a very fragmented code base, you tend to have a fragmented organization.

24:06 Now your organization resembles, you know, a loose collection of warring tribes more than a single unified organization.

24:14 So I am biased towards monorepos.

24:17 And while you can use pants very effectively in even, you know, multiple smaller repos, I do think it supports the monorepo architecture really well.

24:26 And the last thing I would say about this is just to really clarify, because we get a lot of questions about this.

24:30 Monorepo is about your repo architecture, nothing to do with your deployment architecture.

24:35 So it is not the opposite of microservices.

24:38 For example, if you have many microservices, you probably want them to be in a single repo because they share, as you said, data models, they share code.

24:46 And it is actually easier to deploy many microservices out of a single repo than constantly creating new teeny repos and having to go through the whole publish and consume dance every time you want to publish a microservice.

25:00 And so publishing many microservices out of a single monolithic repo is actually a common pattern and a very effective one, in my opinion.

25:08 Yeah, I guess if you have a monolith where the code is architectured into one giant thing, it necessarily means that you're probably just going to have it in one repo.

25:18 But if it's microservices, there may be this temptation to have, well, we've got 10 microservices.

25:23 So we've got 20 repos because here's each one for the service and then the shared bits got to be broken out into their own so they can be reused.

25:30 Yes.

25:30 Just essentially, you get a much, in a monorepo, you get a much tighter development loop because you are cutting out all of the publishing.

25:37 Everything is consuming.

25:38 All the intra-codebase consumption is happening at HEAD.

25:41 So you don't have this constant publishing and consuming.

25:45 Yeah.

25:45 Yeah, I hadn't really thought about it that way.

25:47 But a lot of the tools, the really good tools that we have, things like PyCharm and stuff, we can open them up and go to a function or a variable or a class and right-click and say, show me all the uses of this.

25:58 But if there's a bunch of different consumers of your library, well, you don't really know because anybody could be grabbing something and using it.

26:05 But if it's a monorepo and it says no usages found, well, that means more.

26:09 Exactly.

26:09 The consumption metadata on published artifacts goes the wrong way, right?

26:15 It says, the metadata that gets published with the wheel says, here is what this wheel consumes, but it has no idea who consumes it.

26:22 And so if you want to figure that out, now you need a whole bunch of tooling.

26:25 So why not just cut that out entirely?

26:28 Yeah, and you get things like refactor and rename across the whole company, sort of in interesting ways as well.

26:35 Okay, cool.

26:36 Maybe we should touch a little bit on the history of pants.

26:40 There's, I know pants 1.0 has been around for a long time, and then there's this 2.0 version.

26:47 You want to tell people a bit about the changes there?

26:49 They maybe have experienced it a while ago.

26:52 Sure thing.

26:52 So pants, what we now refer to as pants v1, was a project that started as an internal project at Twitter,

26:59 and it was focused primarily on Scala and how can we speed up Scala builds and make them more organized and more tractable.

27:07 It then got open sourced out of Twitter and was used at a handful of other Scala-using companies, notably Foursquare.

27:15 Square used it as well.

27:17 There were a few companies of that vintage of early 2010s, Silicon Valley startups that were using Scala in a big way.

27:24 So, in fact, v1 is basically gone at this point.

27:27 I think there's a handful of organizations still maybe using it.

27:30 We're desperately trying to get them onto v2.

27:32 The pants v2 is a thing that we launched two years ago.

27:36 It's a complete ground-up re-implementation.

27:40 It really only shares a name with the old one and sort of project home with the old one.

27:47 The code is entirely new.

27:48 As you alluded to earlier, one of the big differences in the implementation is that the execution engine in v2 is written in Rust,

27:56 and the APIs are async Python 3.

28:00 But that is an important difference.

28:02 But the bigger difference is the design itself is very different in that the v2 system learned a lot from our experience with v1,

28:11 both in terms of the implementation and how to make important features like caching and concurrency just fall out of the design

28:21 rather than be this laborious thing you have to add at every corner.

28:26 And also, an equally important lesson was, unlike many other systems, including Pants v1,

28:31 which came out of a single company and were really tailored for that company's use cases,

28:36 with v2, we wanted to build something that was for everyone, that any organization, large or small, could use and get.

28:42 You shouldn't have to work at Microsoft or Google or Twitter to get this quality of build experience.

28:48 Anyone should be able to.

28:49 And that required looking at the use cases of a lot of organizations of different languages and different sizes.

28:55 And one thing we learned was nobody wants to write a ton of build metadata.

28:59 If anyone's used a system like Pants v1 or Bazel or Buck or something similar,

29:03 you start by potentially refactoring your code base to be what the system expects.

29:09 And then you write thousands and thousands of lines of so-called build files.

29:13 We wanted to eliminate all of that.

29:16 And so the system is designed to accept your code base as it is.

29:19 And it doesn't require huge amounts of build metadata.

29:22 It requires small amounts that it can mostly generate.

29:25 But the important information, which is the dependencies, it actually infers at runtime by looking at import statements and various other tricks and heuristics

29:35 for figuring out what your code's actual dependencies are.
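
As a rough illustration of inferring dependencies from import statements, here is a minimal sketch using Python's standard `ast` module. It is not Pants's actual implementation; the function name `inferred_imports` and the sample module names are made up for the example, and it ignores relative imports and the other heuristics mentioned above.

```python
import ast


def inferred_imports(source: str) -> set[str]:
    """Return the absolute module names that a piece of Python source imports."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes "a.b"
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" contributes "a.b"; relative imports are skipped here
            modules.add(node.module)
    return modules


# Hypothetical file contents, used only for illustration.
example = "import os\nfrom myapp.models import User\nimport numpy as np\n"
print(sorted(inferred_imports(example)))
```

A real tool would then map each module name back to a file in the repository or to a third-party requirement.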

29:38 So that saves a huge amount of time and makes it a lot easier to use and a lot easier to adopt.

29:43 So that's kind of why v2 came about.

29:45 We wanted to build something that wasn't like, here's something we built for Twitter, throw it over the wall and you can use it if you want to.

29:51 But here's a thing that was designed for you, designed for Python, designed to be easy to adopt,

29:56 designed to be easy to use, designed to be easy to extend, has a robust API that is async Python 3 essentially.

30:03 And that's where that project came from.

30:05 Yeah, it's really interesting how it came from kind of this big tech side of things.

30:10 But second take, it's like, well, how do we just make this for all the projects, not just the large ones?

30:15 There's that, there was an interesting article and it's, you know, quoted in various forms a lot.

30:20 Like, you're not Facebook, you're not Google, you're not LinkedIn.

30:23 Speaking to most people, right?

30:25 I mean, there are people who actually are there, but like for most people who look at these architectures

30:30 and what's happening, they say, oh, look how they're scaling this.

30:33 And like, yeah, but you just have a hundred users.

30:35 You don't need to like go to that much architecture and crazy designs for, you know, what you're doing.

30:40 And so it's, I can see how it would be a temptation to have like an overly complicated system that comes along.

30:48 But it looks to me like this is really easy to adopt.

30:50 It is significantly easier and we are constantly working on automating the adoption.

30:55 There is a, one of the commands in Pants is, this is the only Pants related pun we've allowed ourselves in the system is called tailor because it quote unquote tailors your metadata.

31:05 It essentially generates, it does inspection of your code and generates a bunch of metadata, not including dependencies.

31:12 Those, as I mentioned, are inferred at runtime, but this is kind of a thing you run periodically because this is metadata that you may want to manually tweak.

31:21 And so we're constantly working on making that easier and easier to adopt for real world cases.

31:27 So just one obvious example is many repos have dependency tangles and circular dependencies and Pants v2 can handle that.

31:35 Those other systems cannot, including Pants v1.

31:38 And those other systems were not really designed for easy adoption because they didn't need to be, because they were only designed to be adopted once by a captive audience of all the developers at that company.

31:48 We want this to be adopted thousands of times by thousands of organizations.

31:53 So we want it to be much, much easier.

31:55 This portion of Talk Python to Me is brought to you by Microsoft for Startups Founders Hub.

32:02 Starting a business is hard.

32:04 By some estimates, over 90% of startups will go out of business in just their first year.

32:09 With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges.

32:18 Microsoft for Startups Founders Hub was born.

32:21 Founders Hub provides all founders at any stage with free resources to solve their startup challenges.

32:27 The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more.

32:36 Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor-backed or third-party validated to participate.

32:46 Founders Hub is truly open to all.

32:48 So what do you get if you join them?

32:50 You speed up your development with free access to GitHub and Microsoft Cloud computing resources and the ability to unlock more credits over time.

32:58 To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts.

33:08 Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know.

33:14 You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points.

33:27 You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders themselves.

33:32 Make your idea a reality today with the critical support you'll get from Founders Hub.

33:37 To join the program, just visit talkpython.fm/founders hub, all one word. The link is in your show notes.

33:43 Thank you to Microsoft for supporting the show.

33:46 You talked about some of the tools that you could use.

33:51 Maybe we could go through that list of common tools real quickly and you could just give us your thoughts on why you think it's great and maybe why you might want to adopt it and make it part of your flow.

34:00 Because with Pants, you don't have to add more steps, as we said.

34:04 So some of the tools you've called out are mypy.

34:06 I know you're a fan of Python 3 and type annotations.

34:10 Tell people quick about mypy.

34:12 So for those who aren't familiar, mypy brings a level of rigor to your Python quality control that is fantastic.

34:19 Essentially, you add type annotations to your Python 3 code and mypy performs static type checking and is absolutely tremendous.

34:29 You know, it is essentially a sort of compilation step for Python.

34:33 Right.

34:33 It's not actually generating code, but it is performing type checks that find a wide variety of bugs and issues.
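
To make the idea concrete, here is a small sketch of annotated Python 3 code of the kind a static type checker such as mypy can analyze. The function names and data are invented for the example.

```python
from typing import Optional


def find_user_id(name: str, users: dict[str, int]) -> Optional[int]:
    """Look up a user id; the return type says the result may be None."""
    return users.get(name)


def next_id(user_id: int) -> int:
    return user_id + 1


users = {"ada": 1}
found = find_user_id("ada", users)

# Calling next_id(found) directly is the kind of mistake a static type checker
# is designed to flag, because `found` may be None. Narrowing the type first
# satisfies the checker and avoids a TypeError at runtime.
if found is not None:
    print(next_id(found))
```

The annotations do nothing when the program runs; the value comes from running the checker over the code before it ships.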

34:40 And I would never go back to non-type checked Python.

34:43 Interesting.

34:44 Yeah.

34:44 So it's like, if I were to compile it, what would happen?

34:47 We're not actually going to, but let's go through that and give you a report, like sort of print out the warnings and errors that you would have seen in a compiled language.

34:54 And then we'll carry on.

34:57 You know, a common way these things are referred to in Python is type hints, which kind of implies they have no effect.

35:03 But with mypy, it's a little bit closer, right?

35:06 Right.

35:06 So they have no runtime effect, it is true, but they have the type annotations.

35:11 Well, they can in certain circumstances: Pants itself uses its own code's type annotations at runtime in an interesting way.

35:21 But generally, running mypy just is an extremely effective quality control check in your code.

35:28 But getting set up with mypy can be complicated.

35:31 And with pants, you just pants check all my code or pants check all my code that has changed since my last, you know, since my last edits.

35:41 And it will install mypy and it will set it up and it will run it.

35:46 First time you do that, you'll get many, many errors, I'm sure.

35:49 I'm sure.

35:50 I'm sure you will.

35:51 Another one you've given a shout out to is Protobuf for protocol buffers.

35:56 I don't know that I've spoken very much about those at all on the show.

35:59 I mean, people know REST and JSON.

36:02 They may have scars from SOAP and XML.

36:06 I'm not sure how many people are doing it.

36:07 It depends on how long they've been doing this kind of stuff.

36:10 But what's Protobuf?

36:12 I just had XML PTSD for a second there.

36:14 Yeah, I'm sorry.

36:15 I'll send a therapist your way after the show.

36:18 So Protobuf is a really fabulous tool out of Google that generates code in many languages from a .proto file,

36:30 which is a language neutral interface definition language.

36:34 And that works well with gRPC, which is Google's RPC framework, where it actually generates RPC code and stubs so that you can use protocol buffers.

36:47 These over-the-wire protocols are referred to as protocol buffers.

36:51 And you can just use them as this binary interchange format that is very efficient over the wire.

36:57 So exchanging binary data, if you say, here's four bytes, that's the account.

37:03 And then here's the length of the string is the next four bytes.

37:06 And then doing that manually is super tricky.

37:10 And so protocol buffers is a formalization of that.

37:13 And then this tool would maybe write the Python code that understands a particular exchange.
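
For a sense of what "doing that manually" looks like, here is a hand-rolled sketch using Python's `struct` module: four bytes for a count, four bytes for the string length, then the string. This layout is invented for illustration and is not the Protocol Buffers wire format; it only shows the bookkeeping that generated code takes off your hands.

```python
import struct

HEADER = ">II"  # two big-endian unsigned 32-bit integers: count, then payload length
HEADER_SIZE = struct.calcsize(HEADER)  # 8 bytes


def pack_message(count: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return struct.pack(HEADER, count, len(payload)) + payload


def unpack_message(data: bytes) -> tuple[int, str]:
    count, length = struct.unpack_from(HEADER, data, 0)
    payload = data[HEADER_SIZE:HEADER_SIZE + length]
    return count, payload.decode("utf-8")


wire = pack_message(3, "hello")
print(len(wire), unpack_message(wire))
```

Every reader and writer has to agree on byte order, field order, and sizes by convention, which is exactly what an interface definition file makes explicit.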

37:18 Okay.

37:18 So Pants knows how to do code generation in general.

37:21 It supports many code generators, including Thrift, which is a sort of similar in spirit to Protobuf.

37:27 But Protobuf is a very prominent one.

37:29 And the idea is that it will generate this Python code out of, you have this very succinct interface definition,

37:35 and it generates fairly elaborate Python code that can serialize and deserialize these messages

37:41 and send them via RPC interfaces and send and receive them via RPC interfaces over the wire,

37:46 where the thing on the other side of that RPC interface might not be Python at all.

37:50 But they're all talking the same IDL.

37:52 And yeah, protos are very efficient binary formats.

37:56 So they use things like variable length integer encoding.

37:59 So the same message will be significantly more compact in Protobuf than it will be in JSON.
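
Variable-length integer encoding can be sketched in a few lines. The version below follows the base-128 "varint" scheme for unsigned integers: seven bits of the value per byte, with the high bit set on every byte except the last. It is a standalone illustration with made-up function names, not code from the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F  # lowest seven bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(1).hex(), encode_varint(300).hex())
```

Small numbers take a single byte, so fields that usually hold small values cost very little on the wire.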

38:06 That said, in probably many, many cases, JSON is absolutely fine, right?

38:10 Exactly. There's a ton of value to be like, I can point my web browser or postman or something at that and see the answer.

38:18 Yes.

38:19 Right? Like, that goes a long ways.

38:21 But if you're exchanging really low latency, lots of data as fast as you can, then, you know, JSON is probably not it.

38:28 Certainly XML with namespaces and XSLT is definitely not the right thing.

38:33 So this sounds like a cool, more modern way to do it.

38:37 Some other tools. We've already talked about pytest. People know what pytest is.

38:41 Black formatting, right? Auto-formatting of code.

38:44 So is that much?

38:45 Stop the indentation arguments.

38:47 Yes.

38:47 Oh my God.

38:48 When we adopted Black in the Pants repo, everybody, myself included, got upset for about 10 minutes and then realized that far more important than which format is that there is a format that is enforced automatically.

39:00 I personally used to at least prefer two space indents to four space indents.

39:06 The reason it's called Black, for those who don't know, is that famous line: you can have any color you want as long as it's black.

39:10 I think that was Henry Ford.

39:11 I think so. Model T.

39:13 So basically, it's a very opinionated formatter that just says this is what Python code should look like.

39:18 And you know what? I embrace robot overlords in this case.

39:22 Yeah.

39:23 It's absolutely wonderful. I can just do pants format and it just formats all the code.

39:28 That is the, there are no more arguments. Like the true formatting is whatever black outputs.

39:32 And pants, again, makes it very easy to adopt black.

39:35 It also makes it easy, I should mention, for linters, for formatters, even for mypy, pants has affordances in it to help you adopt them incrementally, which you kind of have to do.

39:44 If you have an existing code base, you may want to adopt, you know, it takes time to, so black is very aggressive, but there are many other linters that pants can run.

39:53 There's docformatter and Flake8 and Pylint and Bandit and so on.

39:57 And isort. And you may want to adopt them incrementally, and Pants has ways to help you do that.

40:03 And certainly with mypy,

40:07 you kind of have to adopt in dependency order because it relies on upstream type annotations.

40:12 And so there are ways to do that.

40:14 Right. To get the most out of it, you've got to have the, we call these three functions.

40:18 Well, those three functions have to have type information and be valid.

40:22 And I got to start at the foundation with those.

40:23 Bandit's interesting.

40:25 I don't know how many people run Bandit.

40:27 Probably people who accept user input probably should or input from the internet.

40:33 Tell people about Bandit.

40:34 I'm not super familiar.

40:35 I'm no expert on Bandit, but it's just about security checks.

40:39 So it will automatically find common security issues in your code.

40:43 And again, Pants makes it very easy to adopt it.

40:46 You essentially just enable the Bandit plugin.

40:48 Pants, I should mention, has this plugin architecture where you can opt in to whichever sort of bits of functionality you want.

40:55 So you enable the Bandit plugin.

40:56 And that's kind of it.

40:58 And the next time you run Lint, it will run the Bandit checks.

41:01 And it will yell at you about all the security issues it's found in your code.

41:04 That's cool.

41:05 So you have these different categories.

41:06 Like you have a test category, a Lint category, and so on.

41:09 And then in your configuration, you can say, when I say Lint, I mean these three things.

41:14 Correct.

41:15 So what you refer to as a category in the sort of Pants jargon is referred to as a goal.

41:20 So it's basically what you type on the command line.

41:22 You type Pants test means run test.

41:25 Pants Lint means run all the Linters.

41:27 But what all the Linters means depends on your configuration.

41:30 Sure.

41:31 That makes a lot of sense.

41:32 Maybe one more.

41:33 I know the list is kind of unbounded in a sense.

41:36 But AWS Lambda or sort of functions, serverless functions in general.

41:42 There's probably other ones.

41:43 You could probably do Azure functions and other things as well.

41:46 The two we support at the moment are Google Cloud Functions and AWS Lambdas.

41:50 Yes.

41:50 So Pants knows how to take your Python code and package it into a Lambda function that you can

41:56 deploy to AWS or a Cloud Function you can deploy to GCP.

42:00 The management of that kind of stuff is super picky because you might have 20 functions coming

42:04 out of all these different pieces of code.

42:06 Did you forget to push that to that particular function point?

42:09 That's really hard to do to keep that straight.

42:11 If you're using serverless, if you're using Cloud Functions, you probably have many of them.

42:16 And a tool that can tell you which ones need to be redeployed based on changes, because it's

42:22 very simple with Pants to say which Cloud Functions have been affected by any transitive change since

42:29 this tag, since this deployment tag.

42:31 And it will tell you, it'll just give you a list and you can just repackage them and redeploy

42:35 them.

42:36 Without having to repackage and redeploy everything every single time.

42:39 Awesome.

42:40 Yeah, that sounds incredibly helpful if you're using them.

42:42 So the reason I wanted to go through that list here is like if adopting those tools and those

42:47 features sounds interesting, the more you adopt, the more the tools like Pants can help lower

42:54 the burden and just make that automatic, right?

42:56 Exactly.

42:57 It generally just takes away the pain of how do I adopt this tool?

43:00 How do I run it?

43:01 How do I configure it?

43:03 How do I make sure that everyone in my team is on the same page about how to use these things?

43:07 It just automates away all of that.

43:10 It also supports things like you can run a Python REPL that contains all the dependencies of the bit of code you're interested in.

43:18 It can do, because it has this fine-grained dependency analysis, it can know that even if you have like a big requirements.txt file, it knows which sub parts of that requirements.txt and their transitive dependencies are relevant to every given binary.

43:34 So now you're not managing many different sets of requirements for each of your, let's say, Cloud Functions or each of your Docker images.

43:41 That's all happening automatically for you.

43:43 Nice.

43:43 Let's talk about running these tasks and caching, performance, parallelization, and so on.

43:50 So one thing it has here right on the pantsbuild.org page, and you've hinted at this a few times, is that it speaks Git.

44:00 And so it has this way of understanding changes.

44:03 Yes.

44:04 So built into it is the ability to say, like things like when you want to run tests, you can say run all the tests.

44:12 You can say run all the tests in this directory.

44:13 You can say run all the tests, run this specific test.

44:16 You can say run all the tests that have this tag, this like a tagging mechanism sort of where you can label things.

44:22 Or you can say run all the tests that are affected by changes since this other Git state.

44:30 So you can say, you know, as you're working on your laptop, you can say, all right, run the tests that could possibly be affected based on my changes since the main branch.

44:40 It will internally use Git tooling to figure out what that means.

44:45 So it'll say, OK, which files have you changed and which things are downstream from those changes.

44:50 When you say affected by, how is that determined?

44:54 Does that mean I saw there's some Python code that my, here's my test and my test is importing this and these import those?

45:00 Or does it do like code coverage?

45:03 What does it base its opinion on of what's changed?

45:07 It bases its opinions on its view of your code's dependencies.

45:11 Now, almost all of that view comes from analysis of import statements.

45:16 Occasionally, you may have to override those.

45:18 So, for example, if you're using Django, we have good support for Django, but Django notoriously does a huge amount of dynamic loading based on strings in settings.py.

45:27 Right, right, right.

45:28 Occasionally, sometimes pants can actually look at those strings and figure it out.

45:31 It has a mode where you can tell it that if it sees strings in a file that look like module names to assume that those are like imports.

45:38 But sometimes that doesn't work.

45:39 And so you can manually override the dependency inference and you can say, actually, here's a dependency that's important and you failed to infer it.

45:49 Or the opposite, you can exclude a dependency that pants mistakenly inferred.

45:53 But that's extremely rare.

45:55 So it bases that on its automatic static analysis of your dependencies.
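
The "affected by" computation can be pictured as a walk over the dependency graph in reverse: start from the changed files and collect everything that depends on them, directly or transitively. The sketch below assumes a hypothetical `deps` mapping from each file to the files it imports; in practice the changed set would come from Git and the mapping from dependency inference.

```python
from collections import deque


def affected_by(changed: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Return the changed files plus everything downstream of them."""
    # Invert "file -> what it depends on" into "file -> what depends on it".
    dependents: dict[str, set[str]] = {}
    for target, its_deps in deps.items():
        for dep in its_deps:
            dependents.setdefault(dep, set()).add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical repository layout, for illustration only.
deps = {
    "app/models.py": set(),
    "app/utils.py": set(),
    "app/views.py": {"app/models.py"},
    "tests/test_views.py": {"app/views.py"},
    "tests/test_utils.py": {"app/utils.py"},
}
print(sorted(affected_by({"app/models.py"}, deps)))
```

Here a change to `app/models.py` pulls in `tests/test_views.py` but leaves `tests/test_utils.py` alone, so only one of the two test files needs to run.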

46:00 Cool.

46:00 So one of the things that might be important when you're running these steps is, for example, the protobuf thing.

46:08 Right.

46:09 Its step has an output and maybe that output is consumed by some other, maybe by the tests or maybe the tests load up the Python file that was generated by that to go talk to some binary blob and see if it understands it.

46:21 It's really important those run in order, right?

46:23 Yes.

46:23 So from your dependencies plus its understanding of which jobs need which inputs, needs to consume which inputs and which jobs produce those inputs as their outputs, it constructs this very fine-grained workflow graph.

46:38 And that is exactly where the caching and the concurrency comes from.

46:41 So every node in this graph, and it could be thousands even, every very fine-grained node in this workflow graph knows exactly which inputs it needs.

46:49 And the work is done in the right sequence.

46:53 So if you have two pieces of work that neither of which depend on each other, right, in the DAG sense, they are independent, they can run concurrently, and they will.

47:01 So presumably, if you have multiple cores on your machine, they literally will run, they could run in parallel at the same time.

47:06 But obviously, if a little work unit needs the output of some other work unit to be its input, then it will wait for those.
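
The scheduling idea described here, independent work units running concurrently while dependent ones wait for their inputs, can be sketched with the standard library's `graphlib` and `concurrent.futures`. This is only an analogy for how a DAG scheduler behaves, not how the Pants engine is written; `run_graph` and the node names are invented for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(graph: dict[str, set[str]], work) -> list[str]:
    """Run `work(node)` for every node, respecting dependencies.

    `graph` maps each node to the set of nodes it must wait for.
    Returns the nodes in the order they finished.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    finished: list[str] = []
    running = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Everything whose inputs are ready can be started right away.
            for node in sorter.get_ready():
                running[pool.submit(work, node)] = node
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the work unit
                node = running.pop(future)
                finished.append(node)
                sorter.done(node)  # may unblock downstream nodes
    return finished


graph = {"codegen": set(), "lint": set(), "test": {"codegen"}}
order = run_graph(graph, lambda node: None)
print(order)
```

In this example `codegen` and `lint` can start together, while `test` is only submitted after `codegen` has completed.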

47:14 And how all that is strung together is using all of the work as described.

47:20 As I mentioned, the API for this is Python 3, is async Python 3.

47:23 And so you have these things called rules, which are these async coroutines.

47:30 And the Rust engine strings together executions of these rules based on data dependencies.

47:36 And the data dependencies use type annotations, use just regular Python 3 type annotations, to say, to describe the types of the inputs and outputs.

47:46 And so the engine can say, ah, you, this rule needs to consume an output of this type.

47:52 I have found a rule that produces an output of that type based on some other input that I already have.

47:58 So I will string them together.

47:59 And it will do that recursively until it ends up at sort of the initial data, which obviously it has.
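
A toy version of this type-driven wiring might look like the following. It is a simplified sketch, not the Pants rule API: the `rule` decorator, the `request` function, and the three data classes are all hypothetical, and the real engine does this in Rust with caching and concurrency on top.

```python
import asyncio
from dataclasses import dataclass
from typing import get_type_hints


@dataclass(frozen=True)
class SourceFile:
    path: str


@dataclass(frozen=True)
class ParsedImports:
    modules: tuple[str, ...]


@dataclass(frozen=True)
class LintReport:
    message: str


RULES = {}  # maps an output type to the coroutine function that produces it


def rule(func):
    """Register a coroutine function under its annotated return type."""
    RULES[get_type_hints(func)["return"]] = func
    return func


@rule
async def parse_imports(source: SourceFile) -> ParsedImports:
    return ParsedImports(("os",))  # stand-in for real parsing


@rule
async def lint(imports: ParsedImports) -> LintReport:
    return LintReport(f"{len(imports.modules)} import(s) checked")


async def request(output_type, available: dict):
    """Produce a value of `output_type`, chaining rules by their annotations."""
    if output_type in available:
        return available[output_type]  # initial data we already have
    func = RULES[output_type]
    hints = get_type_hints(func)
    hints.pop("return")
    args = [await request(param_type, available) for param_type in hints.values()]
    return await func(*args)


report = asyncio.run(request(LintReport, {SourceFile: SourceFile("app.py")}))
print(report.message)
```

Asking for a `LintReport` leads the resolver to `lint`, which needs `ParsedImports`, which leads to `parse_imports`, which needs the `SourceFile` that was supplied up front.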

48:06 Right.

48:06 We'll start here, go down this path in that order.

48:09 Some of them, though, can be parallel, right?

48:11 Like Bandit and Pyflakes.

48:13 Those just, mypy, all those just look at the code and say, looks good or doesn't look good.

48:19 And here's your warning message, right?

48:20 And so you can, does it by default or just optionally parallelize that?

48:25 No, it will happen by default.

48:27 It will always, I mean, you can control the amount of concurrency.

48:30 Let's say you're running multiple things on the machine and you don't want it to consume all your cores.

48:34 You can tune that down.

48:36 But just normally by default, it will use as much concurrency as the graph allows.

48:41 So every one of those small work units is a candidate for being executed concurrently with other ones.

48:49 But it is also, and this is really important, a candidate for being cached.

48:52 So many, many intermediate steps can be cached.

48:56 In a typical iterative run when you're iterating on your laptop and you're developing and you make some changes and you run some tests and you make some changes and you run the tests again.

49:05 So much work that normally would be repeated will not be repeated by pants because it will be, the outputs will be pulled from cache.

49:12 Yeah, that's fantastic.

49:13 Because every one of those nodes has its, it runs, as I said, with no side effects.

49:17 It runs in a sandbox with no side effects and its inputs and outputs are statically defined.

49:23 So it can be correctly cached every time.

49:27 So you don't, you as an author of a plugin, for example, don't have to think about caching or concurrency.

49:32 You just write to the API and caching or concurrency fall out of the design.
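
Why side-effect-free steps with declared inputs are safe to cache can be shown with a small content-addressed cache: the key is a hash of the tool name plus the contents of every input, so identical inputs always map to the same stored result. The class and method names below are invented for the sketch and do not reflect Pants internals.

```python
import hashlib


class ProcessCache:
    """Cache results keyed on a digest of the tool name and its input files."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.executions = 0  # how many times real work was done

    def run(self, tool: str, inputs: dict[str, bytes], func) -> str:
        digest = hashlib.sha256(tool.encode("utf-8"))
        for path in sorted(inputs):  # sorted so the key is order-independent
            digest.update(path.encode("utf-8"))
            digest.update(hashlib.sha256(inputs[path]).digest())
        key = digest.hexdigest()
        if key not in self._results:
            self.executions += 1
            self._results[key] = func(inputs)
        return self._results[key]


def count_lines(inputs: dict[str, bytes]) -> str:
    return str(sum(data.count(b"\n") for data in inputs.values()))


cache = ProcessCache()
files = {"app.py": b"import os\nprint(os.name)\n"}
cache.run("count-lines", files, count_lines)
cache.run("count-lines", files, count_lines)  # same inputs: served from cache
print(cache.executions)
```

Editing any input file changes the key, so the step reruns only when something it actually reads has changed.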

49:36 That's really neat.

49:37 And I can imagine on large code bases, things like Bandit and mypy, all those analysis tools, they can take a while.

49:44 And if you just change one file, and especially if you're in the monorepo business, where it's not just for your website, but there's a ton of stuff, you don't want to rerun all those things.

49:55 So it could really get a lot faster.

49:56 You can get a lot of speed increases that way.

49:59 Yes.

49:59 Yeah.

49:59 One of the hesitations to adopt tools like this is it keeps rerunning from scratch over and over, and that makes it slow.

50:06 The other one is while you may run it, your teammates may have a less buy-in on some of these linting, formatting, testing ideas.

50:18 And if you just do that in CI, well, when you check it in, it works.

50:22 Someone else gets their code.

50:24 You merge theirs.

50:24 You check it in again.

50:25 And it breaks the build.

50:27 And it can be super frustrating.

50:28 Is there a way, like a pre-commit hook or some other mechanism to encourage these to be run by everyone?

50:36 You certainly can have a-

50:37 Or is that a personnel problem?

50:39 That really depends on the organization.

50:41 We tend to see adoption of pants at the team level.

50:45 Usually what happens is someone on the team who is just fed up with a not great status quo drives adoption of it.

50:53 And other team members see, there's obviously initial skepticism because there always is whenever you try to introduce new tooling.

51:00 I'm as guilty of this as anyone.

51:02 But then there's that aha moment of, oh, things got more rigorous and faster.

51:06 That is a trade-off I will gladly take.

51:09 Yeah.

51:09 So very often if CI runs pants, and that is therefore in some sense the definition of correct quality control checks, because whether it's testing, linting, packaging, etc.

51:24 Because you can't merge until you pass CI, there's strong incentive there for you to use the same thing that CI is doing on your laptop.

51:33 Because if you get that to pass on your laptop, it is overwhelmingly more likely to pass in CI, which just gets you to merge faster.

51:40 Yeah, I guess it depends a little bit on how you work as a team and source control, right?

51:44 If everyone can just commit to main, it's much harder to have that.

51:49 But if you've got to go through kind of a Git flow, like you work on your branch and then you merge it into main when it's approved and CI passes, well, then all of a sudden, if CI is not passing, you're not merging.

51:58 And yeah, then it trickles back until it gets fixed.

52:02 I think this is a big part of Python, quote unquote, growing up as a language like now, again, it's not fancy shell scripts, it is a workhorse language that people are building massive businesses and systems and data science capabilities out of.

52:18 And you need to, you know, that comes with responsibility to be rigorous about quality control.

52:26 And essentially, you know, having really good CI, having really good iterative development practices is something that is really important for these growing repos.

52:35 And it's why pants exists.

52:37 It is to make that much, much easier and much, much faster than it would otherwise be.

52:42 If you are sort of not running tests, not really running any checks, pushing directly to main, you know, because, you know, historically, Python repos were these tiny toy things that you could do that in.

52:53 You know, you're asking for trouble sooner or later.

52:57 So I'm asking you, how do I get this tool that applies all these, automates all of these engineering best practices?

53:03 And you're suggesting that maybe you start by, start at the core and work your way out, right?

53:10 Like, like just the way that you work together as a team through source control, like you formalize that a little bit, then everything else becomes easy.

53:16 I think so.

53:17 I mean, having CI that is the, you need some way of saying what is correct.

53:22 You need some way of saying, if this CI is green, that means you can merge this change.

53:27 If this CI is green, it means that you can deploy to production.

53:30 You need some way, automated way of saying this code is good.

53:33 Pants makes it very easy to build that ability.

53:38 And once you have that, you never want to go back.

53:42 There is a hurdle you have to, you know, it's less convenient than not doing any quality control, but you sort of have to.

53:49 Well, it's less convenient up front.

53:51 It's less convenient up front.

53:52 It's easier to just not worry about it.

53:54 But you.

53:55 It is.

53:55 But as soon as you spend the whole weekend trying to figure out why the thing doesn't work and you're supposed to release it and it turns out it was somebody else's problem and they didn't test.

54:03 And then like, then all of a sudden that, that little bit of work up front didn't seem so big.

54:07 Nobody goes backwards, right?

54:09 Nobody goes in the direction of fewer quality control checks.

54:12 There is a point in the evolution of your repo where you start adopting them and you just adopt more and more of them.

54:18 And right.

54:19 You don't go.

54:19 It's a ratchet.

54:20 It's a one way ratchet, basically.

54:21 And with good reason.

54:22 In my mind, all these conversations I've been having, I've been around like teams, I guess, maybe seeded by the idea of you talking about it coming from places like Twitter and so on.

54:32 What is the pants story for open source repos?

54:36 Like if I was in charge of HTTPX, I'm not.

54:39 But if I were, what does pants offer me?

54:41 So we do, we're starting to see open source repos adopt pants now, tends to be the larger ones where things like how can I speed up tests become a question even in that regard.

54:53 One thing that we haven't talked about really is security and sort of protecting your own software supply chain, especially if you're an open source project where you are typically part of other people's software supply chain.

55:05 So one of the features of Pants is it has very strong support for lock files, universal lock files that are valid across platforms.

55:16 Essentially, you generate a lock file that pins every single transitive dependency, including the SHA-256s of all of the wheel files.

55:26 And pants then knows how to very efficiently build virtual environments out of the subset of those that is actually necessary in any given situation.

55:36 So if a test only needs some small subset, it will only use those.
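
The two ideas here, pinning every artifact by hash and using only the subset of pins a given target needs, can be sketched as follows. The `LOCK` structure, package relationships, and "wheel" contents are placeholders made up for the example; real lock files have their own format.

```python
import hashlib

# Placeholder artifacts and requirement edges, for illustration only.
WHEELS = {
    "requests": b"fake requests wheel",
    "urllib3": b"fake urllib3 wheel",
    "numpy": b"fake numpy wheel",
}
REQUIRES = {"requests": ["urllib3"], "urllib3": [], "numpy": []}

# A hypothetical in-memory "lock file": one pinned hash per package.
LOCK = {
    name: {"sha256": hashlib.sha256(WHEELS[name]).hexdigest(), "requires": REQUIRES[name]}
    for name in WHEELS
}


def verify(name: str, data: bytes) -> bool:
    """Check downloaded bytes against the hash pinned in the lock."""
    return hashlib.sha256(data).hexdigest() == LOCK[name]["sha256"]


def subset_for(roots: list[str]) -> set[str]:
    """Return the pinned packages transitively needed by `roots`."""
    needed: set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(LOCK[name]["requires"])
    return needed


print(sorted(subset_for(["requests"])), verify("requests", b"tampered bytes"))
```

A test that only imports `requests` never sees the `numpy` pin, so bumping `numpy` in the lock would not touch that test's inputs.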

55:39 The advantage being that that test gets invalidated a lot less, because its results don't get invalidated if an unrelated upstream requirement changes.

55:51 So even for smaller repos or for, you know, open source repos, apart from all the other benefits, one benefit that I think is worth looking at is lock files and just locking down your supply chain.

56:02 It means you don't have like the left pad issue, things like that.

56:06 Your build is much more robust, much less impacted by changes on PyPI, by changes in the world at large.

56:14 Yeah, in the homepage, it says it has out of the box support for multiple dependency resolvers in addition to these lock files, right?

56:23 So is this like your own private PyPI server that you can limit what goes in there?

56:29 What does that mean?

56:29 Well, you can.

56:30 I think what that was referring to was that you can have multiple of these lock files.

56:34 So you have a large code base.

56:36 You might have different parts of it that genuinely need conflicting dependencies.

56:41 But you can sort of say, OK, here are like two or three lock files that you are allowed to, you know, you have to pick one for a piece of code or a piece of code can be compatible with multiple, but you have to pick one when you come to build a binary or something like that.

56:54 So, for example, it's pretty common to have, you know, here is a lock file that is for my web application code.

57:00 And here's a different lock file for my data science code, because there are just conflicts between them that can't be resolved.

57:06 There may be no reason to install JupyterLab on your FastAPI server, right?

57:12 Well, that wouldn't happen anyway, because Pants would know, even if you had a single lock file that included, say, you know, NumPy, Pants would know that nothing in your web app imports NumPy, so it wouldn't bring it in.

57:24 This is more when you actually have multiple lock files.

57:27 Yeah, so Pants is very good about shaving down the dependencies, both the internal and the external ones, to just what you actually need.
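As a rough illustration of shaving dependencies down to what is actually imported, the sketch below reads the import statements of a source file and keeps only the matching lock file entries. It assumes, unrealistically, that a package's import name equals its name in the lock file, and the versions are placeholders. It is a toy version of the idea, not Pants's dependency inference.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which are internal code.
            modules.add(node.module.split(".")[0])
    return modules


def needed_requirements(source: str, lockfile: dict[str, str]) -> dict[str, str]:
    """Keep only the lock file entries that the source actually imports."""
    imported = imported_top_level_modules(source)
    return {name: version for name, version in lockfile.items() if name in imported}


# Hypothetical single lock file shared by web and data science code.
lockfile = {"fastapi": "0.85.0", "numpy": "1.23.3"}
web_source = "from fastapi import FastAPI\n\napp = FastAPI()\n"

print(needed_requirements(web_source, lockfile))  # prints {'fastapi': '0.85.0'}
```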

57:34 But where multiple lock files come in is when you have conflicts, when your code base is large enough that you genuinely cannot have the entire code base be in lockstep on a single set of dependencies.

57:45 We don't encourage that.

57:47 It's not a great way to be.

57:48 It's better if you can have a single, consistent resolve across your entire code base, but it's not always possible.

57:54 And this is an example of where we designed for the world as it is and not the world as we would like it to be.

58:00 Yeah, that makes a lot of sense.

58:01 I mean, if one of the APIs is written in Django 1 and the person who built it left and there's no reason for it to change, like, just don't touch it.

58:10 Just leave that alone over there.

58:11 But the other part needs to use, you know, newer libraries and sure.

58:15 Yes, that's a great example.

58:17 Yeah, cool.

58:17 All right, Benji.

58:18 I think we might be getting short on time here, but let me close this out with one final question.

58:25 So you talked a lot about the caching and the parallelization and how, like, the dependency understanding.

58:33 So if I want to run these tests, I can just say, run since this last Git tag or whatever, a SHA or whatever it is you're going back to.

58:40 What is your personal workflow or common workflows you see for managing that?

58:47 Because at some point I'm like, okay, the stuff up to here is good.

58:50 Now I want to, it's been a few days.

58:52 I want to move forward.

58:54 I know the older stuff is good and we're not changing it.

58:57 Like, how do you sort of evolve this developer workflow sort of history?

59:01 What's your workflow there?

59:02 I rely very heavily on the Git comparison logic.

59:07 So I should mention I do not code very much anymore because I'm now, I'm the co-founder of Toolchain, which is a company that actually provides SaaS and support and services around builds, Python and otherwise.

59:21 And obviously that's where a lot of the pants expertise comes from.

59:25 So we provide things like remote caching, remote execution as a service.

59:31 So I don't code that much anymore.

59:33 But, you know, occasionally when I do, I rely very heavily on the Git diff functionality.
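The idea behind a Git-diff-driven workflow can be sketched as a graph walk: start from the changed files and follow the dependency graph backwards to everything that depends on them. The graph and file names below are invented for illustration; in practice the changed set might come from something like `git diff --name-only <ref>`, and Pants works this out from its own dependency inference.

```python
from collections import deque


def affected_files(changed: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return the changed files plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a file to its dependents.
    dependents: dict[str, set[str]] = {}
    for path, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(path)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical dependency graph: file -> files it imports.
graph = {
    "app.py": {"util.py"},
    "test_app.py": {"app.py"},
    "other.py": set(),
    "test_other.py": {"other.py"},
}

affected = affected_files({"util.py"}, graph)
tests_to_run = sorted(p for p in affected if p.startswith("test_"))
print(tests_to_run)  # prints ['test_app.py']
```

Here `test_other.py` is skipped because nothing it depends on changed.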

59:39 So my command lines are just basically, and one thing Pants has is macros where you can create these, sorry, macros is something else.

59:48 What I was actually referring to is command aliases, so I can run something like pants green.

01:00:21 And what you can actually do is run this in a loop.

01:00:24 So you can have Pants just sort of watch the file system.

01:00:27 It watches your file system for changes and automatically reruns that logic every time you save.

01:00:32 So often by the time I tab into my terminal, those checks have already run or are at least running.
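A watch-and-rerun loop can be approximated with plain polling, as in the sketch below. This is only a stand-in to show the shape of the loop: the function names are hypothetical, and polling modification times is an assumption made here for simplicity, not a description of how Pants watches files.

```python
import os
import time


def snapshot(root: str) -> dict[str, float]:
    """Map every file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime
            except OSError:
                # The file disappeared between listing and stat; skip it.
                continue
    return mtimes


def changed_paths(old: dict[str, float], new: dict[str, float]) -> set[str]:
    """Files that were added, removed, or modified between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def watch(root: str, on_change, polls: int, interval: float = 1.0) -> None:
    """Poll root a fixed number of times, calling on_change with changed paths."""
    previous = snapshot(root)
    for _ in range(polls):
        time.sleep(interval)
        current = snapshot(root)
        changed = changed_paths(previous, current)
        if changed:
            on_change(changed)
        previous = current
```

A call such as `watch(".", print, polls=10)` would print the set of changed paths each time something is saved during the next ten seconds; a real callback would rerun the checks instead of printing.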

01:00:39 Yeah.

01:00:40 Okay.

01:00:41 That's my workflow.

01:00:42 This command alias is a cool idea as well.

01:00:45 I'm sure people will dig that.

01:00:46 All right.

01:00:47 Well, congratulations on Toolchain.

01:00:48 Thank you.

01:00:49 It seems like a cool thing to be working on and clearly builds on a ton of work you all have been doing.

01:00:54 Thank you.

01:00:55 Yeah.

01:00:56 We basically feel both on the open source side and on the company side that you should not have to work at Google or Microsoft or Facebook to have a really fast, stable, powerful build experience.

01:01:06 You should have that when you're a 20 person company and when you're a 200 person company and when you're a 2,000 person company.

01:01:11 You should not have to wait to be a 100,000 person company to get that.

01:01:14 Yeah.

01:01:15 You shouldn't have to have somebody whose job it is to set up CI.

01:01:17 I mean, their whole job, not just something they do as part of their job.

01:01:20 Yeah, I guess that's the other side of things.

01:01:22 I asked about the developer workflow.

01:01:23 What does it look like?

01:01:24 It's like, let's suppose I have a GitHub and I'm using GitHub Actions as my CI.

01:01:30 How do I get pants to work over there?

01:01:32 So that's an interesting area that we are looking at more and more closely.

01:01:36 And we will have some interesting announcements about that over the next few weeks.

01:01:40 And just a heads up, this episode will probably, for people not watching the live stream, will be out probably in three or four weeks.

01:01:46 So it might actually be real as they hear these words.

01:01:50 We'll see.

01:01:51 Maybe.

01:01:52 But it is very easy to set up GitHub Actions or CircleCI or BuildKite or whatever you're using to run pants commands.

01:01:59 And those pants commands in turn take away a lot of the or handle a lot of the concurrency and caching concerns that normally you would have to really mess with.

01:02:09 You'd have to really drill into your CI config in order to get.

01:02:13 So it essentially makes it much, much, much easier to configure CI because the complexity of, well, how do I get caching?

01:02:21 How do I get concurrency?

01:02:23 How do I speed things up?

01:02:24 Is handled automatically by pants instead of you manually having to write tons of YAML or whatever your CI provider's config is in order to get that concurrency.

01:02:37 Here, there's a lot of heavy lifting going on where the system itself is analyzing your code and saying, oh, here are opportunities for concurrency.

01:02:45 Here are opportunities for caching.

01:02:46 Whereas today, with CI of all kinds, CI workflows of all kinds, either you do that yourself, very manually, or you don't get it.

01:02:56 Interesting.

01:02:57 And maybe as part of the caching, you just say, like you described for the developer workflow, right?

01:03:01 Everything other than, you know, compare that against main and now run it on that diff.

01:03:06 Exactly.

01:03:07 And CI is basically, you set up CI to call pants.

01:03:10 Pants does its magic.

01:03:11 Correct.

01:03:11 Answers come out.

01:03:12 So you don't have to write tons of CI config because a lot of, so much of the reason you would have to do that is now handled by pants itself.
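As a small sketch of "set up CI to call pants", a CI step might boil down to assembling one command that compares against the main branch. The helper name and the `launcher` default are assumptions made for this example; `--changed-since` and the `lint`, `check`, and `test` goals are Pants features, though the exact invocation should be checked against the Pants docs for the version in use.

```python
def pants_ci_command(base_ref: str, goals: list[str], launcher: str = "pants") -> list[str]:
    """Build the argument list for running goals on files changed since base_ref."""
    # `launcher` is however Pants is invoked in the repo; "pants" is assumed here.
    return [launcher, f"--changed-since={base_ref}", *goals]


command = pants_ci_command("origin/main", ["lint", "check", "test"])
print(" ".join(command))  # prints pants --changed-since=origin/main lint check test
```

The CI config then only needs to check out the repo and run that one command, for example via `subprocess.run(command, check=True)` in a script. Pants also has an option for including the dependents of changed files; its name has varied between versions, so it is left out here.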

01:03:21 Yeah.

01:03:21 Awesome.

01:03:22 All right.

01:03:23 Well, very cool project and definitely something to be checking out.

01:03:28 Now, before you get out of here, final two questions.

01:03:31 If you're going to write some code, even if you do a little bit less these days, what editor are you using?

01:03:35 I use PyCharm.

01:03:36 Yeah.

01:03:37 Actually, technically I use IntelliJ with the PyCharm plugin because of just habit of, I used to write JVM code and I never lost the habit, but effectively PyCharm.

01:03:47 Yeah.

01:03:48 Right on.

01:03:48 Cool.

01:03:48 And then notable PyPI package.

01:03:51 I really like click.

01:03:52 We don't use it for various reasons.

01:03:54 We need like a lot of control over the CLI, but I really like click for just cobbling together cool tools that have really good CLI interfaces.

01:04:02 Excellent.

01:04:03 And let me hijack the end here just for a moment.

01:04:05 Maybe I should have asked you before to install pants.

01:04:09 It's not pip install pants, is it?

01:04:11 Nope.

01:04:11 It's, so if you go to our website, pantsbuild.org, there's very simple steps for walking through it.

01:04:17 But essentially there's a wrapper script that does things like install pants for you in a virtual env and keep it up to date.

01:04:25 So you don't have to worry about where is this virtual env, how to, which version of pants is in it.

01:04:31 It will look at the version that's in your pants config file.

01:04:34 There's a pants.toml file that contains a bunch of pants config.

01:04:36 One of them is which version of pants is this repo supposed to be using?

01:04:40 And the script will make sure that is the version being used.
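The version-pinning step can be sketched as reading `pants_version` from the `[GLOBAL]` section of `pants.toml`. The parser below is a deliberately simple line scan written for this example, and the version number in the sample config is a placeholder; the real wrapper script does considerably more.

```python
import re
from typing import Optional


def pinned_pants_version(config_text: str) -> Optional[str]:
    """Return the pants_version set in the [GLOBAL] section, if any."""
    in_global = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # Track which section we are in; only [GLOBAL] matters here.
            in_global = line == "[GLOBAL]"
        elif in_global:
            match = re.match(r'pants_version\s*=\s*"([^"]+)"', line)
            if match:
                return match.group(1)
    return None


# Example pants.toml contents; the version shown is only a placeholder.
example_config = """
[GLOBAL]
pants_version = "2.14.0"
backend_packages = ["pants.backend.python"]
"""

print(pinned_pants_version(example_config))  # prints 2.14.0
```

A launcher built on this would compare the pinned version with whatever is installed in its managed virtual env and install the pinned one when they differ.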

01:04:43 Yeah.

01:04:43 So you don't pip install it.

01:04:45 You run this script and it does a bunch of magic on top of the vanilla virtual env experience.

01:04:50 Fantastic.

01:04:51 Okay.

01:04:52 Yeah.

01:04:52 Just pantsbuild.org slash docs slash installation and off you go.

01:04:57 All right.

01:04:57 Thank you so much for being here.

01:04:59 Final call to action.

01:05:00 People want to get started with pants.

01:05:01 What do they do?

01:05:01 So pantsbuild.org and probably one of the best resources is our Slack channel.

01:05:06 So if you go to pantsbuild.org and click on community on that community link at the top,

01:05:11 it'll take you straight into like how to come chat with us on Slack.

01:05:14 So obviously you're going to try and get started without that, but it's, we have a very friendly,

01:05:20 helpful community that firmly believes that there are no bad questions, only bad documentation.

01:05:26 And so Slack is a great place to kind of sample the community, come chat with us, tell us about

01:05:32 your needs, tell us about how pants can meet them or how it can't meet them.

01:05:35 It's open source.

01:05:37 And we have a lot of contributors from all sorts of companies and all sorts of organizations

01:05:41 and all sorts of teams who started that way and got really enamored with what pants can

01:05:47 do and got really involved both in improving the developer experience at their organizations

01:05:53 and also in improving pants itself.

01:05:55 So we really, best call to action is come say hi.

01:05:58 Congrats on a cool project and thanks for coming and sharing it with us.

01:06:01 Thank you.

01:06:02 It was my pleasure.

01:06:03 You bet.

01:06:03 Bye.

01:06:03 This has been another episode of Talk Python to Me.

01:06:07 Thank you to our sponsors.

01:06:09 Be sure to check out what they're offering.

01:06:11 It really helps support the show.

01:06:12 Listen to the Local Maximum podcast.

01:06:14 Learn about topics as diverse as the philosophy of probability and Elon Musk's next move.

01:06:20 Just search for Local Maximum in your favorite podcast player.

01:06:23 Starting a business is hard.

01:06:26 Microsoft for Startups, Founders Hub provides all founders at any stage with free resources

01:06:32 and connections to solve startup challenges.

01:06:34 Apply for free today at talkpython.fm/foundershub.

01:06:39 Want to level up your Python?

01:06:41 We have one of the largest catalogs of Python video courses over at Talk Python.

01:06:45 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:06:50 And best of all, there's not a subscription in sight.

01:06:52 Check it out for yourself at training.talkpython.fm.

01:06:56 Be sure to subscribe to the show.

01:06:57 Open your favorite podcast app and search for Python.

01:07:00 We should be right at the top.

01:07:01 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:07:07 and the direct RSS feed at /rss on talkpython.fm.

01:07:12 We're live streaming most of our recordings these days.

01:07:14 If you want to be part of the show and have your comments featured on the air,

01:07:18 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:07:22 This is your host, Michael Kennedy.

01:07:24 Thanks so much for listening.

01:07:25 I really appreciate it.

01:07:26 Now get out there and write some Python code.
