#436: An Unbiased Evaluation of Environment and Packaging Tools Transcript

Recorded on Thursday, Sep 21, 2023.

00:00 How well do you know your Python packaging tools?

00:02 There are things like pip, which install a project's dependencies and their dependencies and so on.

00:08 But in this mix, we also have more modern tools such as Poetry, Flit, Hatch, and others, and even tools outside of Python itself which may attempt to manage Python in addition to the libraries.

00:21 To make sense of all this, we welcome back Anna-Lena Popkes for an unbiased evaluation of environment and packaging tools in Python.

00:29 This is Talk Python to Me, episode 436, recorded September 21st, 2023.

00:35 Welcome to Talk Python to Me, a weekly podcast on Python.

00:52 This is your host, Michael Kennedy.

00:54 Follow me on Mastodon, where I'm @mkennedy and follow the podcast @talkpython, both on fosstodon.org.

01:01 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:07 We've started streaming most of our episodes live on YouTube.

01:10 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:18 This episode is brought to you by IRL, an original podcast from Mozilla.

01:24 When it comes to artificial intelligence, AI, what's good for trillion dollar companies isn't necessarily good for people.

01:30 That's the theme of season seven of IRL, Mozilla's multi-award winning podcast hosted by Bridget Todd.

01:36 Season seven is all about putting people over profit in AI.

01:40 Check them out and listen to an episode at talkpython.fm/IRL.

01:44 It's brought to you by Sentry.

01:46 They have a special live event, like a mini online conference where you can connect with the team and take a deep dive into different products and services every day for a week.

01:56 Join them for launch week, new product releases, exclusive demos, and discussions with experts from their community on the latest with Sentry.

02:04 You'll see how Sentry's latest product updates can make your work life easier.

02:08 Visit talkpython.fm/Sentry-launch-week to register for free.

02:15 Hey folks, before we jump into the interview, I want to tell you about a new course we just launched, Data Science Jumpstart with 10 Projects.

02:22 This is written by Matt Harrison, who has years of data science and Python teaching experience.

02:27 He brings his tips and guidance to you across 10 different datasets and projects in this new three-hour course.

02:35 You want to up your data science game, I encourage you to check it out at talkpython.fm/data-sci-jumpstart.

02:41 I learned a lot from this course and I'm sure that you will too.

02:44 The link is in your podcast player show notes, so be sure to check it out.

02:48 Now, on to that interview.

02:50 Anna-Lena, welcome back to Talk Python to Me.

02:54 It's awesome to have you here.

02:55 Thanks for having me again.

02:57 It's always good to have you on the show.

03:00 We had you on several times before.

03:03 We talked about testing and mocking out dependencies in Python.

03:08 And the very first time, this is quite a while ago.

03:11 Yeah, it was.

03:12 Way back in 2018, we talked about the magical universe, 100 days of Python by learning through Harry Potter themed problems, which is very fun.

03:22 I learned so much in that project.

03:25 It was really nice.

03:25 I can still recommend it to anyone to do this 100 days of code.

03:29 It's super fun.

03:31 Let's maybe do a quick catch up before we dive into Python packaging, comparisons and positioning.

03:38 What have you been up to?

03:39 I'm still a machine learning engineer.

03:41 I'm in a German company.

03:43 I think I was there the last time as well.

03:45 It's called inovex.

03:46 And we do all kinds of machine learning projects with customers.

03:51 So I rotate from project to project.

03:54 Right now I'm working at a company called Babbel.

03:56 Not sure if you heard of that.

03:58 Oh, yeah.

03:58 It's really a fantastic company since they enable users to learn new languages.

04:02 And I'm working in the speech recognition team, which I like a lot since talking is such an important part of learning a new language.

04:10 And yeah, I'm there as a senior machine learning engineer and helping them build their product, develop it further.

04:16 And I really love it there.

04:17 Yeah, that sounds like such a fun problem to be working on.

04:20 And machine learning is evolving so quickly, right?

04:24 Yeah, especially now with the generative AI, there's so much going on, so much you can do that it's hard to keep track of what's happening sometimes.

04:33 Yeah, it seems like as soon as you have it figured out, something new comes along.

04:36 Are you able to talk about what libraries you're using for that project?

04:40 No, I don't think so.

04:41 No worries.

04:41 I checked before that I'm allowed to say that I work on speech recognition, but that's basically it.

04:47 But that's, I'm going to guess it probably has something to do with Python, but we'll leave it there.

04:51 I won't put you on the spot.

04:52 Awesome.

04:52 So, and that's not a surprise, right?

04:54 To say people are doing machine learning with Python.

04:56 That's by far the most popular way to do it these days.

04:59 Yeah.

05:01 Cool.

05:01 Well, again, that sounds like a super fun thing to be working on right on the cutting edge.

05:06 And tech, understanding spoken word is especially tricky, right?

05:10 For me, it's also so nice since I use the software myself to learn a language and working on something that is useful, not only for you, but for so many people, since it can be so hard to learn a different language.

05:23 That's really nice.

05:23 Yeah.

05:24 It's really fun to work on software.

05:25 It's more fun to work on software that you know, other people, many other people are using it.

05:30 It's a special kind of joy, I think.

05:32 Absolutely.

05:32 Would you say that projects that you work on have to use Python dependencies and virtual environments and stuff?

05:38 Yeah, so much.

05:39 That's actually also why I did this talk in the beginning or why I started working on the talk on this topic, since I was in a different project.

05:48 And the people there asked me, OK, which packaging tool should we use?

05:51 And I was like, wow, this is so difficult.

05:53 I cannot even answer it, since I know that there are so many tools out there.

05:58 I didn't have a good overview of them and also especially not of the differences and what they are good for, what they can do, what they are not good for.

06:07 And then I started digging into the topic and I was like, wow, this is just so complex and so many different tools.

06:14 And yeah, it was really time for a good overview.

06:18 And I think you did a really fantastic job writing this up.

06:21 And you did it in two varieties, right?

06:23 You have the article on your blog and then you also have, you gave a talk at PyCon DE, right?

06:31 So depending how people want to experience it.

06:33 I also gave one at EuroPython, which is a more updated version, I guess, since Rye came out after I gave the talk at the German PyCon.

06:41 So the new video from EuroPython, which is not on YouTube yet, also features Rye.

06:48 Yeah, we'll talk about Rye.

06:49 It's crazy since it already shows that in a few months there was another change and another tool came up, which is so popular now.

06:56 It's only a few months old, your PyCon DE talk and it's already a little outdated, right?

07:01 That highlights what you're talking about, doesn't it?

07:03 Yes.

07:04 Let's start by thinking about Python from a beginner's perspective, because one of the first things that people are like, okay, Python is awesome.

07:13 Whether they're a machine learning engineer who wants to use PyTorch, they're a web developer who is all excited about FastAPI or whatever.

07:21 One of the huge powers of Python is that you have almost half a million libraries on pypi.org to work with, right?

07:30 And so if you pick one of them, it's awesome.

07:33 It says, okay, this somewhere requires, like, for example, FastAPI requires Python 3.7 or above.

07:40 That's a pretty low bar these days, but there's already two things you have to deal with.

07:44 FastAPI, the package, the version of it so it can't clash, and the version of Python that it runs on.

07:50 And somehow as a beginner, you have to figure out, okay, how do I put all these things together, right?

07:56 And how do you get started with these environments?

07:58 So maybe speak to that just a little bit.

07:59 I think that's a good starting point, since especially when you begin, I remember that for me, the concept of a virtual environment already was really confusing, although it's such a simple thing to understand.

08:11 And then to go from there, maybe how you install packages in the best way, since I think it's always a good idea to start with a virtual environment.

08:20 Since then you have your dependencies in this nice little box or environment where it's isolated from the rest, and you do not have these dependency issues between different projects you might have or might work on that require the same package, but in a different version.
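The "nice little box" idea can be sketched with the standard-library venv module. This is a minimal illustration, not something shown in the episode: the directory name demo-env is made up, and with_pip=False is chosen only so the sketch runs quickly and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # system_site_packages=False keeps globally installed packages out.
    # with_pip=False keeps the sketch fast; real projects usually want pip.
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=False)
    builder.create(env_dir)
    return env_dir / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_isolated_env(Path(tmp) / "demo-env")  # hypothetical name
    cfg_text = cfg.read_text()
    print(cfg_text)
```

The printed pyvenv.cfg records which base interpreter the environment was created from and that system site packages are excluded, which is what keeps two projects with conflicting versions of the same package apart.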

08:36 So you can create this virtual environment or maybe the local environment.

08:41 There's also this new variant, and then you can install packages there.

08:45 So for example, with the FastAPI you just showed, you could use pip to install it, given that you have the right Python version.

08:52 Yeah, wow, this already shows right now, since this is already the next category, right?

08:58 Python version management.

08:59 Yeah, it can be quite confusing.

09:01 So you need the right Python version, which you can handle using a tool or different tools are available for that.

09:09 Then you need to be able to install the package with pip, for example, or another tool.

09:13 And it would be nice or it's always nice to have a virtual environment for your different projects, which you will also need a tool for.

09:20 Yeah, of course.

09:22 So I guess it's already three things, just if you want to get started installing a package.

09:26 So in your article, in your talk, you broke it down into five different categories, these tools might work in, right?

09:33 And depending on the tool chain you choose, you might need to use two or three different tools just to get started, right?

09:38 Or you use one tool that can do it all.

09:40 I think that for most would be the ultimate goal, I guess that we have the single one tool.

09:46 And I remember that you had this panel on packaging, right, where you also talked about the difficulties of creating this tool and why it is so hard to do that in Python.

09:57 And yeah, so I identified five main categories.

10:00 One is Python version management, which we just mentioned already.

10:04 Then you have environment management, where you can create, manage your virtual environments.

10:10 We have package management, which is basically about installing packages and upgrading them when you need a new version.

10:19 And then when it comes to packaging, I first thought, okay, there's just packaging, but there's actually also tools that can just do the package build step.

10:30 And then there are tools that just do the publishing.

10:32 So I split it up into two categories, one for building and one for publishing.

10:36 I think that makes sense.

10:37 Yeah, the publishing stuff, people have less exposure to, right?

10:41 That's farther down the line, you're not really a beginner at that point, not usually anyway.

10:46 Absolutely.

10:47 But I think since many of the tools that do the building step also do the publishing step, you most of the time are going to use a tool that could do it anyway.

10:57 Sure.

10:58 Yeah.

10:58 A bit of real-time follow-up from the audience here.

11:00 Tushar says, actually, the EuroPython videos came out just yesterday.

11:06 So how about that?

11:07 People can check that.

11:08 Oh, that's so nice.

11:09 Yeah.

11:09 Great.

11:10 Yeah, we can link it in the show notes.

11:11 Yeah, we absolutely can.

11:12 Cool.

11:13 Thanks for letting us know.

11:14 So we've already started talking about the categorization here.

11:18 And I'll give a quick shout out to some of the tools.

11:22 Obviously, pip is involved, venv, but also virtualenv.

11:27 Then you might start talking about, well, some of the tools that do more, like you talked about.

11:32 So maybe Poetry, PDM, Hatch, Rye.

11:35 But stuff people might not know about too much is Maturin or Enscons.

11:40 There's a wide-ranging set of tools.

11:42 And what you did really nicely in your article and talk is you said, these five categories, let's create Venn diagrams and put into the overlaps.

11:52 And then you talked about the various tools like PDM can do package publishing and building and environment management and package management, not Python version management.

12:03 So that's kind of the way that you evaluated.

12:06 That's the unbiased aspect, right?

12:08 Is that you're like, okay, let's just create some categories and create some ways to evaluate how full featured or how good is this?

12:15 And then you go through it, right?
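One way to picture the Venn-diagram evaluation is as a mapping from each tool to the set of categories it covers. The sketch below is only an illustration and deliberately partial: it includes just the PDM coverage stated above plus the well-known roles of pip, venv, and virtualenv. The variable and function names are made up.

```python
# The five categories used in the evaluation.
CATEGORIES = {
    "python version management",
    "environment management",
    "package management",
    "package building",
    "package publishing",
}

# Partial mapping; other tools are left out on purpose.
TOOLS = {
    "pip": {"package management"},
    "venv": {"environment management"},
    "virtualenv": {"environment management"},
    "pdm": {
        "environment management",
        "package management",
        "package building",
        "package publishing",
    },
}


def missing_categories(tool: str) -> set[str]:
    """Categories a tool does not cover, i.e. where another tool is needed."""
    return CATEGORIES - TOOLS[tool]


def tools_for(category: str) -> list[str]:
    """All listed tools that sit inside a given circle of the Venn diagram."""
    return sorted(name for name, cats in TOOLS.items() if category in cats)


print(missing_categories("pdm"))
print(tools_for("environment management"))
```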

12:16 For the packaging tools like Hatch, PDM, Poetry and so on, I also thought about features.

12:23 And if you scroll down, there's like a feature list and things I thought about what should these tools be able to do or how do they differ?

12:33 For example, yeah, that's the one.

12:36 So one is whether it allows you to manage your dependencies, and whether it resolves and locks dependencies, since there are some tools, like Hatch, which cannot do that at the moment. I know that it's supposed to do it in the future.

12:50 But if you want that functionality, then for example, you might not want to use Hatch at the moment.

12:55 And then there's a large number of PEPs on packaging, but I picked out two specific ones, one on editable installs, which I think can be quite useful, especially if you develop your package yourself and you want to install it in editable mode.

13:12 Yeah, maybe it's good to mention what this is.

13:14 Yeah, tell people why you care about that.

13:17 Yeah, exactly. So if you develop your project yourself, and you want to make sure that during development, the changes to your package are directly reflected in your environment, you would install the package with pip install -e, for the editable flag, and the name of the package.

13:36 And then you do not have to reinstall it every time you make a change.
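As a small sketch of what that looks like in practice, the snippet below builds the editable-install command for the current interpreter. The helper name editable_install_command is made up for illustration, and the command is only printed here, not executed.

```python
import sys
from pathlib import Path


def editable_install_command(project_dir: Path) -> list[str]:
    """Build the `pip install -e <dir>` command for this interpreter."""
    # Using `python -m pip` ensures pip installs into the environment
    # that belongs to the interpreter currently running.
    return [sys.executable, "-m", "pip", "install", "-e", str(project_dir)]


cmd = editable_install_command(Path("."))
print(" ".join(cmd))
# To actually run it from inside a project directory:
#   import subprocess; subprocess.run(cmd, check=True)
```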

13:40 This is very useful. And then there's one PEP on how to specify your project metadata in the pyproject.toml file, which is like the basic file you need when you create a package, where you put all your general information: the name of the package, the website, your author name, and so on.

14:01 But also the dependencies, and you can define scripts there. And there's one tool, namely Poetry, which has its own way of defining the metadata, I think, because it was developed before this PEP was accepted.

14:15 And they also promised to change it at some point, but they still haven't done that. So I guess that's also something you should at least be aware of when you choose a tool like Poetry: it might have a few differences in how to specify things in the pyproject.toml file.

14:32 I'd like to hear your thoughts.

14:36 This portion of Talk Python to Me is brought to you by IRL, an original podcast from Mozilla. When it comes to artificial intelligence, AI, what's good for trillion dollar companies isn't necessarily good for people. Can the risk and rewards be balanced?

14:49 That's the theme of season seven of IRL, Mozilla's multi award winning podcast hosted by Bridget Todd. Season seven is all about putting people over profit in AI.

15:00 I think you'll find episode two pretty interesting. As you surely know, LLMs like ChatGPT are all the rage these days. Do they seem like magic? Well, it turns out that much of their power comes from millions of people entering and correcting data in these LLMs.

15:15 Episode two, the humans in the machine, gives us a glimpse into the world of these people behind the AIs.

15:21 For policy junkies, IRL looks at the idea that we're all just guinea pigs in a big AI experiment, like the meal planning app that suggests bizarre recipes such as Oreo vegetable stir fries and flawed technologies that compose more deadly risks when it comes to something going wrong, like self-driving cars blocking emergency responders.

15:40 You'll also hear from people building more responsible ways to test new AI technology. And we find out why it's taking so long to regulate this massive industry.

15:49 That's IRL season seven from Mozilla. Check them out and listen to an episode at talkpython.fm/IRL. The link is in your podcast player show notes. Thank you to IRL and Mozilla for sponsoring the show.

16:04 To me, it seems like a lot of these tools like Poetry or Flit or others as their own thing, they're pretty self-contained and they kind of do the job for most things you need to do for your package management, project management, installing.

16:20 Hatch doesn't lock, but as long as you kind of stick to them, you can more or less solve all the problems you need with one. But figuring out how to choose which one is really hard.

16:30 And kind of like with your Rye example, the reason you chose one six months ago might not hold anymore; there might be a better choice now. So it's good to see them side by side, don't you think?

16:40 Yes, absolutely. And also, that's why I wanted to do the unbiased evaluation. There's often personal preference that comes in with packaging tools. For example, one of my colleagues, hates might be a strong word, but he very strongly dislikes Poetry, since in the past they once released a new version that broke something in the older versions, and they did not tell the users beforehand.

17:06 And several people got very upset. And they just said, okay, I'm not using Poetry anymore. Now, if you are in a team with people, and you choose a tool, and one of them says, no, we are not going to use Poetry, then yeah, this also has an influence on people.

17:23 So I think having an unbiased view of these tools can be very difficult, since it's often also a team decision. If you're already used to using a tool, it might be easier to just use that in your team. Or if something is already working for you, then it might not be worth putting in the effort to learn or get caught up with a new tool.

17:43 Yeah, you don't necessarily have to keep switching to the newest, shiniest one of these, right? Like if it's working for you, whatever you're doing, maybe that's fine, right?

17:52 Definitely.

17:53 So before we get into this, many of these things we're going to talk about don't come with Python itself. We have pip, we have venv, and setuptools. And I think that's it out of this big long list of things. Do you have a preference or a tendency to stick with what comes with Python, so you don't have to install other things? Or do you see the advantages of these external tools to be greater and worth it?

18:17 I do see the use. And sometimes for me, although I've created so many virtual environments, I sometimes cannot remember how I need to call venv correctly to create a new virtual environment. And with virtualenv, it's just virtualenv and then the name of the environment, and it's just simpler. And I can remember that. So sometimes it can be that easy that it makes it more useful.

18:41 Other tools solve a different problem that's not really related, like pipx and pyenv. There's no real Python equivalent.

18:51 That's true.

18:51 Well, let's go through and I guess talk about probably one that people do less, but is also really important, not package management, but Python management. Want to tell us about that one?

19:04 Okay, so Python version management, I always included a short definition, since there are no proper definitions of these categories. It's just what I thought would be useful. So for me, Python version management means that the tool is able to install Python versions and lets you switch between them easily.

19:22 And yeah, most popular for that is pyenv. And it's also one of the few tools actually that can do that. You can also do Python version management with conda. And then there's Rye, which can do it, and also pyflow.

19:35 But pyflow, I excluded it from my list, since it's not actively developed anymore. It's still in the Venn diagram. But yeah, I'm not sure if it's that up to date anymore.

19:46 But yeah, with pyenv, for example, you can say pyenv install and then like 3.10.4, and it will get that version of Python and install it on your system. And then you can just switch between the different versions you have installed.
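As a rough sketch of that workflow, the snippet below lists the pyenv subcommands typically involved in installing a version and switching to it. They are meant to be typed in a terminal; the Python wrapper only prints them, and the helper name pyenv_commands is made up for illustration.

```python
import shutil


def pyenv_commands(version: str) -> list[list[str]]:
    """The pyenv subcommands for installing and selecting a version."""
    return [
        ["pyenv", "install", version],  # download and build that version
        ["pyenv", "versions"],          # list everything installed
        ["pyenv", "global", version],   # default for the whole user account
        ["pyenv", "local", version],    # default for the current directory
        ["pyenv", "shell", version],    # only for the current shell session
    ]


commands = pyenv_commands("3.10.4")
print("pyenv found on PATH:", shutil.which("pyenv") is not None)
for command in commands:
    print(" ".join(command))
```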

20:01 And yeah, this can be useful in many different ways. For example, if you have projects that support multiple Python versions, or maybe you just want to install the newest one and check out a few of the features it has to offer.

20:15 Yeah, it can just be nice to be able to switch between the versions yourself or set them for your current shell session and so on.

20:23 Yeah, and this starts to get us into an interesting philosophy here. Many of these tools kind of take over your day to day flow of working with your code and the tools, right?

20:35 So for example, with pyenv, you do pyenv local, pyenv global. I guess maybe even more with things like Hatch and Flit and so on. Instead of just saying python my code or python -m something, you would say, like, flit run something, sort of. You've got to adopt its way of working on the terminal a little bit to get the most value out of it.

21:00 Right.

21:00 And that's also something I found confusing in the beginning. I remember that when I used Poetry for the first time, I didn't really understand why I couldn't run my package or my code anymore with python. I always had to put poetry run python my_script.py.

21:18 Yeah.

21:18 Once you understand that this enables the tool to run your code within a virtual environment for you with all the dependencies installed, and you do not have to do anything, then it clicked for me. And it made sense. But in the beginning, I was thinking that it was just more complicated. And I didn't really see the point.
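A quick way to see what such a run command changes is to ask the interpreter whether it is executing inside a virtual environment. This is a minimal sketch using only the standard library; the function name is made up.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

Saved as a script, this would be expected to report a different interpreter path when launched through the tool's run command than when launched with the system Python directly.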

21:37 Yeah. And going back to the beginner type of thing, it helps you in a lot of ways, but it's also a new thing that you have to learn in order to get started. It alleviates the need to say, well, now you know the venv command; you don't just run it, you say python -m venv, and then you activate it. And it's different on Windows. I don't know why it's different on Windows, but it just is. So you just do that. But at the same time, you now have to learn a slightly different way to run it. And so I think that that's an interesting trade-off that a lot of these tools make.

22:05 Another thing that I think about when I think about these tools is, like you were saying, you can't just run your Python code, because a lot of times the environment these tools manage lives in some kind of hidden place in your user profile or somewhere, right? Whereas if I say python -m venv, it makes a folder wherever I run that. And so I can activate it. So for example, if one of these tools were to make the environment locally, instead of in some kind of obscure location it picks, then I could still just activate it and do regular Python things. But if it hides it from me, then I'm required, basically, for any practical purpose, to go through its terminal commands, its shell CLI, right?
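When the environment is just a local folder, the interpreter and the activation script sit at predictable paths, and those paths are the part that differs on Windows. The sketch below computes them for a folder named .venv, which is a common convention assumed here rather than something stated in the conversation. The helper name is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_paths(env_dir: str, windows: bool) -> dict[str, str]:
    """Where the interpreter and activation script live in a venv folder."""
    if windows:
        # Windows environments use a Scripts folder and .exe/.bat files.
        base = PureWindowsPath(env_dir) / "Scripts"
        return {
            "python": str(base / "python.exe"),
            "activate": str(base / "activate.bat"),
        }
    # POSIX environments use a bin folder; activation is sourced by the shell.
    base = PurePosixPath(env_dir) / "bin"
    return {
        "python": str(base / "python"),
        "activate": f"source {base / 'activate'}",
    }


print(venv_paths(".venv", windows=False))
print(venv_paths(".venv", windows=True))
```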

22:53 Yeah.

22:53 Is that something you considered? Like which ones have overrides to put them locally or do it by default? Or is that anything you considered here?

23:00 No, actually, I did not. But that's a very good point for an extension of the post. I just got used to calling, for example, with Poetry, you can just say poetry shell, and then it will activate the virtual environment for you, right? So it's just, I guess, getting used to a different way of activating your virtual environment. So that worked well for me. But I think it just depends on how much you have used the other virtual environment functionality before. For me, it wasn't that hard to switch, I guess, since I work a lot with packages, and then it can be very convenient.

23:39 I totally agree. I guess what I didn't really say before, when I was talking about using the new CLI stuff, is that when you're doing this for yourself, you kind of adopt one and you get used to it. And you're like, this is great. But if I'm following, say, a tutorial on some docs, it'll say, you know, okay, activate the virtual environment this way. You're like, wait, that's not how I do it. You know, now run Python. Wait, that's not how I run it. And so it's this putting it together of, I know what I'm doing, and I see what the thing tells me to do, but how do you make sure that those are lining up?

24:08 And if there's a local environment, that's kind of equivalent, like you could sort of follow the steps and it might still work. It's kind of what I was thinking.

24:16 That's a very useful idea. I will put that on my list of things to look at.

24:21 This could be your PyCon US talk.

24:23 Yeah, I actually have an idea for PyCon US next year already, which I want to work on. I did a lot of packaging work over the last months, but I really want to keep this post updated, since I find it useful myself.

24:38 And I was asked by so many people after the talks and also at my company now, many people ask me, which tool am I supposed to use now? And we have these requirements and so on.

24:48 And having the virtual environments somewhere where you can also activate them yourself, I think that's a very useful thing to do.

24:56 I never thought about this following a tutorial point of view, but it is very important for learners, I guess.

25:02 Yeah, especially when you're a beginner. Yeah, absolutely. Absolutely.

25:05 It's also worth pointing out there are some tools that you just didn't really evaluate, because you didn't quite necessarily see that they fit totally in the picture like pip-tools, which I'm a big fan of actually.

25:15 But also Mike out there asked, was pyenv-virtualenv evaluated as well?

25:21 No, it wasn't. I should write that down. I will do that right now.

25:25 I can guess, but I have no experience with pyenv-virtualenv.

25:29 Me neither.

25:30 The speaking of it is tricky. Okay.

25:33 But I think I put pip-tools at the very end of the post, in the category of tools which don't really fit in. Yeah, also tox is there. And the author of tox... tox, which I only knew from testing, where it allows you to specify different Python versions which you want to run your tests with.

25:53 But tox can also be used to handle virtual environments. And I was completely unaware of that. So the picture with the five categories is still not complete, but it's already complicated enough, I guess.

26:07 Yeah, I don't know if it, how much value it adds to like really say, we're going to completely cover everything because part of the value is making a few recommendations as well, I think.

26:17 Yes.

26:18 Not just going, here's a complete list. That sounds more like an awesome list of packaging, which maybe that exists. I don't know.

26:23 Maybe that's also a good point to add for everyone that wants the, like the solution now. I do not have the solution. I cannot like give you the best tool. It really depends on what you want to do, what your team is doing, what your personal preferences are.

26:39 Hopefully at some point, we might have the one tool, which can do everything and is adopted by most people. But at the moment, lots of these tools are really used and also can be useful.

26:51 And there's more variations, not fewer. So I would like to see that too. But it is tricky. And I think another one of the challenges to think about, which I know I've seen previously around, say, pipenv versus some of the other tools, is: are you building a library that you want others to use?

27:09 Or are you building an application?

27:10 Yes.

27:11 Right. Because if you over-constrain, say, your lock file, no one can use your library, but that's exactly what you want for your application, so it's totally stable, right? These kinds of tensions are in there. And so it could also be: pick the right tool for the right situation.
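The tension can be illustrated with a toy version check. In this sketch, two hypothetical applications have each locked a different exact version of the same dependency, and a library is tested against them twice: once declaring a version range, once pinning an exact version. All names and version numbers are invented, and the version parsing is deliberately simplistic.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '2.31.0' into (2, 31, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(candidate: str, low: str, high: str) -> bool:
    """True when low <= candidate < high."""
    return parse(low) <= parse(candidate) < parse(high)


# Exact versions two hypothetical applications have in their lock files.
APP_LOCKS = {"app_a": "2.31.0", "app_b": "2.28.1"}

# A library that declares a range works with both applications.
range_results = {
    app: in_range(locked, "2.0", "3.0") for app, locked in APP_LOCKS.items()
}

# A library that pins one exact version only works with one of them.
pin_results = {
    app: parse(locked) == parse("2.31.0") for app, locked in APP_LOCKS.items()
}

print("library with range:", range_results)
print("library with pin:  ", pin_results)
```

The exact pin is what makes each application reproducible, and the same exact pin in a library is what would lock the second application out.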

27:26 In the beginning, when I first worked on the talk, it is 45 minutes long, which is the longest time slot you can get at PyCon, I had this point about applications versus libraries in it, but I really had to get rid of stuff, since it was way too long. And it was really hard to decide what to talk about and what not, since there are so many points to consider that it can be hard to boil it down to the most important facts.

27:53 Yeah.

27:53 I also had live demos in the talk in the beginning for the different tools, but that also took up way too much time. But it can be fun to play around with them to get to know them better.

28:04 It's super fun. The live demos that involve downloading stuff from the internet are scary, though, because at conferences, the internet can be pretty sketchy sometimes.

28:12 That's true.

28:12 All right. So the first area was virtual environment management, and that's tools like venv, virtualenv, pipenv, which we just talked about, conda, and then even Rye. So let's maybe talk about how some of these compare. We talked a bit about using venv versus virtualenv. I've always just stuck with the built-in one for the reason that it's built in. But, you know, it sounds like you use virtualenv more. What do you find better about it?

28:41 I use both actually. Sometimes I use one and sometimes the other.

28:45 My understanding is virtualenv is faster, but it's not something I'm doing a ton of. So it's like I'll set one up for a project and then I'm good to go. So it doesn't really motivate me. One area that I think is important to cover is maybe the files that specify your project and your dependencies.

29:02 Yes, maybe let's do that.

29:04 Yeah, yeah. So traditionally there's been this requirements.txt, which is just lines in a text file, but there's been a, in almost all of these tools, a move towards pyproject.toml.

29:14 That's a very important point to know about if you talk about packaging in general: you have one file, which is pyproject.toml. For me, TOML was in the beginning a new config format I didn't know about.

29:29 So it's like you have YAML files and JSON files, and TOML is this other format, which is quite simple. And it was decided to use the TOML format. And in the PEP where pyproject.toml was introduced, they also discussed the different formats.

29:45 So it's quite an interesting read. It's the central file in your package. I already mentioned that you put general information there, like the name of the package, the author names, where the readme is and so on.

29:59 But it allows you to do very complex things as well. Now you can configure tools there. Like if you want to do formatting with Black or style checks and so on, you can define that there.

30:12 Yeah, and you can put your dependencies there, as you mentioned. Then some tools even allow you to specify different virtual environments and what they should look like.

30:24 You can define scripts that you want to run. And then, for example, if you always run your pytest tests, you can have a command like hatch run test or poetry run test.

30:37 And it would run the tests for you and maybe also do the coverage report and so on.

30:43 This portion of Talk Python to Me is brought to you by Sentry. You've heard me sing the praises of Sentry for their error tracking and performance monitoring plenty of times on the show.

30:52 But this time is different. They have a special live event, like a mini online conference where you can connect with the team and take a deep dive into different products and services every day for a week.

31:03 Join them for Launch Week, new product releases, exclusive demos and discussions with experts from their community on the latest with Sentry.

31:11 You'll see how Sentry's latest product updates can make your work life easier.

31:15 New announcements will be released every day on YouTube at exactly 9 a.m. Pacific time.

31:21 On Monday, November 13th, performance. Tuesday, user feedback and session replay. Wednesday, data resiliency and platform. Thursday, frameworks, community and integrations.

31:32 And finally, Friday, November 17th, open source sustainability. I'm already signed up.

31:37 Join me at talkpython.fm/sentry-launch-week. The link is in your podcast player show notes. I'll see you there.

31:46 Thank you to Sentry for their continued support of Talk Python to Me.

31:50 In your article, you link over to the pandas pyproject.toml and that thing has a lot of stuff going on as you would imagine from such a project as pandas, right?

32:00 But you can even specify like project URLs, entry points for just running a command on the terminal.

32:09 Things like if you want to use AWS, you can pip install with the aws extra in brackets, and that actually brings in a whole list of potential dependencies. Or gcp.

32:18 Pretty comprehensive. Way more than just the list of dependencies.

32:21 If you stay there, that is also an important point that in the pyproject.toml file, you define which build backend you use.

32:28 Yeah.

32:28 This would then be where you, for example, maybe have poetry or hatchling or setuptools and so on.

32:36 So which tool you want to use to really do the building step in the, like behind the curtain.

32:43 How do you say that? Yeah. You know what I mean? Yeah.

32:45 Yeah. Here's the build system that it's got here. Build backend is mesonpy for this particular one. Interesting.

32:53 It's less important if you have a pure Python thing, right? Although it's still potentially relevant for building the wheel.

33:01 But if you've got a really complex, like a rust integration or a C++ integration, then how that all happens when you say build, you want to have a lot of control over that. Right?

33:11 And also if you use a tool like poetry, it will set the build backend to poetry, I guess, automatically, which most of the tools do.

33:20 Most of the tools decide which build backend to use. Only PDM is a tool which allows the user to choose the build backends freely.

33:27 I think hatch uses hatchling, poetry uses poetry, I guess. I don't know about the others, actually.

33:33 Sure. We've been talking for a little while. Now we finally come to the thing the person wants to do. pip install a package, right?

33:41 Exactly. So that is package management.

33:43 There are several tools that allow you to download and install libraries and their dependencies.

33:48 And the major one, which everyone knows, I guess, is pip. But there's also pipx, or you could use conda to install packages, but also poetry, for example, like one of these packaging tools.

33:59 And yeah, it will download the library for you and install all the dependencies automatically.

34:04 Yeah, I guess that's the most important thing to know about it.

34:08 Yeah, so a lot of these will make the virtual environment for you. And then instead of pip install, you run their name plus install, right? Like poetry install, or sometimes they have add, right? Something along those lines.

34:21 But then they'll figure out where their virtual environment is and install the thing the way you've asked.

34:25 And also, if you use one of the packaging tools, it will do the dependency resolutions for you, which sometimes works, sometimes not so well.

34:34 But I guess this is something that pip is not doing. I think it just tells you about conflict.

34:40 Another important thing that it helps you that many of these tools help you with that pip will not help you with is dependency recording or accounting, I guess is the right way to maybe think about that, as well as restricting it to a particular version.

34:55 So if you add one of these things, it might put the dependency into the pyproject.toml and then also create a lock file, right?

35:02 Yeah, should we shortly say what lock files are about?

35:05 Yeah, tell people about it.

35:06 Okay, so this is the second recap I had in the talk, or which is also in the blog post. First one was pyproject.toml. And the second one is a lock file.

35:16 So in the pyproject.toml file, you would usually have your dependencies, but in an abstract fashion, so you would not pin them to exact versions. So you would not say, I need pandas 2.0.3, but you would set a range or not give restriction at all.

35:34 And then you have the lock file, which really records the exact versions of all the dependencies that you have installed for a project. And if you commit or have that lock file within your repository, it allows to really reproduce the exactly same setup that you have on your machine.

35:53 So you can reproduce it on multiple platforms. And I also linked one, I think the one from Poetry. So if you look at a lock file, it can become huge, since it really has the exact versions of all dependencies and sub-dependencies and so on recorded.
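The difference between abstract dependencies and a lock file can be sketched in a few lines. This is only an illustration: the regular expression is a simplified check for exact pins rather than a full requirement parser, and the version numbers in the locked list are example values, not taken from any real lock file.

```python
import re

# Simplified pattern for a requirement pinned to one exact version, e.g. "pandas==2.0.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[^=*,;\s]+$")


def is_pinned(requirement: str) -> bool:
    """Return True if the requirement names exactly one version."""
    return PINNED.match(requirement.strip()) is not None


# What you would typically write in pyproject.toml: ranges or no restriction at all.
abstract = ["pandas>=2.0,<3", "requests"]

# What a lock file records: exact versions, including sub-dependencies.
locked = ["pandas==2.0.3", "requests==2.31.0", "urllib3==2.0.4"]

print([is_pinned(r) for r in abstract])
print([is_pinned(r) for r in locked])
```

The abstract list leaves the installer free to pick versions; the locked list leaves it no choice, which is what makes the setup reproducible.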

36:08 Yeah, here we go. This one from Poetry. Yeah, that is, let's see, that's 1,685 lines. That is a big lock file.

36:17 It is.

36:17 It also though, it does follow some of the best practices, right? So it says the package mypy in the version of it, rather than just mypy == 1.5.1, it'll have things like, and here's the hash of that, which is, you know, a recommendation to store that.

36:34 But in addition to just saying, here's the hash for my particular install, it has it, here it is for macOS, here it is for this other version of Mac, here it is for Linux, here it is for the ARM version of Windows, or the AMD version of Windows, right? And on and on.
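The hash recording described here can be sketched with the standard hashlib module. The wheel file names and contents below are stand-ins invented for the example; a real lock file would store hashes of the actual distribution files, one per platform-specific build.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return a hash string in the 'sha256:<hex digest>' style."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Stand-ins for the per-platform wheel files of one locked package version.
wheels = {
    "demo-1.0-cp311-macosx_arm64.whl": b"fake macOS wheel contents",
    "demo-1.0-cp311-manylinux_x86_64.whl": b"fake Linux wheel contents",
}

# What the lock file would record: every acceptable hash for this exact version.
locked_hashes = {sha256_of(content) for content in wheels.values()}


def verify(downloaded: bytes) -> bool:
    """Accept a download only if its hash is one of the recorded ones."""
    return sha256_of(downloaded) in locked_hashes


print(verify(wheels["demo-1.0-cp311-manylinux_x86_64.whl"]))
print(verify(b"tampered contents"))
```

Recording one hash per platform is why the lock file grows so long: the same pinned version has a different file, and so a different hash, on each platform.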

36:48 So it does record a lot of information there. But the main goal of just saying, in a year, if I go pip install or poetry update, or what, I don't remember the poetry command. There's so many, I've read all of the different CLIs for all of them.

37:04 So equivalent of pip install, it'll look at that and go exactly the same thing. Because I see one of the guys in the audience here, I was just speaking with him, one of my courses, I didn't pin the dependencies, and it uses SQLAlchemy, and SQLAlchemy 2 is now out, which is awesome.

37:22 But SQLAlchemy 2 has a breaking change from SQLAlchemy 1. So some code sample wouldn't run. It's like, oh, what's going on? Like, oh, no, just for now, pin the dependency yourself and I'll fix it later today. But it's not a theoretical problem. I like literally ran into it yesterday, today, through by way of one of the students.

37:40 I agree. It can be very useful. Also, if you work on a project with several people, then having the same setup everywhere can, yeah, keep you from having a headache.

37:50 Yeah. How much isolation do you do personally for your work? Do you do like Docker containers, or is it enough to just have a lock file and agreed upon version of Python?

37:59 We often use Docker containers since I work a lot with production environments. But for personal projects, I usually only use the lock file.

38:08 Same here. I don't really use Docker all that much. I find that it's enough with just a lock file.

38:14 Lock file is super important. Maybe, you know, instead of going through all of these, maybe just give your thoughts on some of the, with regard to package management, just some of the things in here.

38:24 For me, especially one important point was there is Conda also, which you can use for lots of things. But the post and also the talk does not go into detail on Conda since it's this huge, like, huge own environment or universe with Conda.

38:41 Also, packaging works a little differently, and the resulting packages will be on the, not on PyPI, but they have their own index. Yeah, there's not a lot of detail there.

38:51 And there's also pipenv. I've never used pipenv myself, actually. I never really had the use case for it, but it has been around for a long time.

39:00 And what I found interesting is that pipenv also uses the lock file functionality and also introduces a TOML file, but it's the Pipfile, which shows that some tools used something similar even before the standard was introduced.

39:21 But for me now, I'm not using pipenv since I like having only the pyproject.toml file. Having this additional Pipfile just confuses me.

39:31 Yeah, because a lot of the packaging stuff you can already do through pyproject.toml. So why have more files?

39:36 Yeah, and I often also have to do package building. So to get a wheel file or just be able to install the packages on different environments, and then it's nice to use one of the other tools since pipenv can do package management and environment management, but not the packaging itself, like building a wheel file or publishing the package.

39:58 What are you using these days for building packages or publishing them?

40:02 I'm currently using Hatch. I like it a lot since it allows you to declare your environments within the pyproject.toml file, and I like to have everything organized.

40:13 So having a single place where you can also say, okay, this is my environment for creating the documentation, and I only need mkdocs-material for this, or having one environment for all the style issues like running black and isort and the type checkers.

40:33 I like that a lot, but a lot of people from my company are now using Rye. So I have to check that out, I guess, very soon for some proper project.

40:43 Yeah. I want to save Rye until the end because it's a very different philosophy, not putting a judgment on it, but just it really lives in a different style and philosophy than many of these other tools.

40:53 So you hinted towards this with the packaging panel discussion I had with some folks there, and I think we're going to see stuff going that way. Maybe not exactly with Rye, but in that general vibe.

41:05 This idea of having multiple environments for certain different parts of your program or of your project is really interesting because for small projects, it doesn't matter.

41:16 But as they get bigger and bigger, I was just talking to Brian Okken yesterday on Python Bytes about this, and he brought this up.

41:23 On Talk Python Training, I have maybe where the courses are, I have maybe 48 dependencies that I list in the main top level. These are the things I'm using.

41:33 But there's 250 different packages if you pip install, you know, like build out that whole thing, right? The transitive dependencies.

41:41 Most of the time, it will not install everything. I can get the stuff to run the site all the time, but also the data science analysis stuff and the notebook tools and other things like this mkdocs stuff.

41:54 One of those has a restriction that is something less than X and another part has something greater than X and they just can't go together.

42:03 And they don't necessarily need to live together, but in order just without having a separation of where the dev tools go and where the counting tools go and where the runtime tools go, they get too mixed together, you know?
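The "less than X" versus "greater than X" situation can be shown with a toy resolver. The constraints, the candidate versions, and the library they would apply to are all invented; real resolvers handle far more cases than the two operators used here.

```python
import operator

# Only the two comparison operators needed for this toy example.
OPS = {"<": operator.lt, ">=": operator.ge}

# Hypothetical constraints on one shared library, as (operator, version tuple).
site_needs = ("<", (2, 0))       # the runtime code wants something older than 2.0
notebook_needs = (">=", (2, 0))  # the analysis tooling wants 2.0 or newer

# Hypothetical versions available on the package index.
candidates = [(1, 4), (1, 9), (2, 0), (2, 1)]


def satisfies(version, constraint):
    op, bound = constraint
    return OPS[op](version, bound)


def resolve(constraints):
    """Return every candidate version that meets all constraints."""
    return [v for v in candidates if all(satisfies(v, c) for c in constraints)]


print(resolve([site_needs, notebook_needs]))  # one shared environment: []
print(resolve([site_needs]))                  # runtime only: [(1, 4), (1, 9)]
print(resolve([notebook_needs]))              # analysis only: [(2, 0), (2, 1)]
```

In a single environment nothing satisfies both groups, while each group resolves fine on its own, which is the argument for separate environments.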

42:17 Actually, not sure if any of the other tools do the same already, since it all changes so quickly. I haven't checked the other tools in the past four weeks.

42:27 So I just got started with Hatch. And that's also what I mentioned in the beginning. Sometimes when you got used to using a tool and it works well for you, you do not get weird errors when you install it or do things and you find when you have a problem, you find the error messages useful and how it works.

42:43 And I also liked the podcast episode you had with the author. It was very...

42:48 Yeah, with Ofek.

42:49 Yeah, I really liked listening to it and that he's working on it. And I find it really impressive what he's doing with just... He's not even able to type, right?

42:59 I believe he can. I think it's just limited. So yeah, it's really impressive what he's doing. I think it's great. I think Hatch is cool.

43:06 I have to say, I also used Poetry in the past, which also worked well for me. I have nothing against Poetry.

43:12 Just your teammate does. Awesome. Okay, let's see. We talked about Conda. I think PDM is maybe interesting.

43:18 Yes.

43:19 It's one of the newer ones. So people might know less about it. Do you want to tell people about PDM?

43:23 Also, I sometimes forget about it since I've never used it, but I know several people which like it a lot.

43:28 PDM for me was quite new since most of the tools are based on using virtual environments. And PDM is one of the only tools, I think, that implements a PEP, which is PEP 582 on local packages, which is an alternative way of implementing environment management.

43:47 And the PEP was recently rejected. So beforehand, it was open of whether that might be the new way to do environment management. But I think it's still an interesting approach.

43:59 And yeah, PDM is also, I guess, used by many people. It can do everything except for managing Python versions.

44:07 But it allows you to choose your build backend freely. So you have quite a lot of flexibility. And it's also developed very quickly.

44:17 So like new features are added and it's a very active project.

44:22 Earlier, Mike Fiedler said, pour one out for easy install. Indeed, I would say, I would add to that, that, you know, pour one out for PEP 582, because I really like that idea.

44:34 You know, it was a little bit like the way that node_modules and the package.json stuff works for Node, where it's just like, if you try to do something, it's just going to go up in the directories until it finds the directory that contains the thing, you know, like find where the virtual environment is at the top and just use that without you having to activate it and do all sorts of stuff.

44:54 And I thought that was a cool idea, but it's not a thing, unfortunately. So I guess it's still a thing for PDM, right?
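The "go up in the directories until it finds it" idea can be sketched as follows. Note the assumptions: __pypackages__ is the directory name from PEP 582, but the upward walk mirrors the Node-style lookup described in the conversation rather than the exact lookup rules of the PEP or of PDM, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_upwards(start: Path, marker: str = "__pypackages__") -> Optional[Path]:
    """Return the nearest directory named `marker` in `start` or any of its parents."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / marker
        if candidate.is_dir():
            return candidate
    return None


# Small demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "__pypackages__").mkdir()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)

    found = find_upwards(nested)
    expected = root / "__pypackages__"
    print(found == expected)
```

No activation step is involved: the lookup depends only on where the code runs from, which is what made the approach appealing.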

45:00 Yes, it is. Also, I didn't read about the rejection. So I have no idea why it was rejected. But I know that they always put a lot of thought on or into the rejection. So why they do that, I have to check that out.

45:15 You know, you look at other ecosystems, other programming languages, they've got like five ways to do one thing. And they're just, it seems like it's just constantly being changed to just chase trends. And over the years, that becomes a real messy language and way to do things.

45:30 So I really appreciate that Python says no often, but I will miss this feature.

45:35 Yeah, I think it's also impressive how much work is put into these peps, how much work they do on like formulating their ideas and discussing it very thoroughly to get to a good result.

45:47 Yeah.

45:48 I'm very happy that they do that for Python.

45:50 Me too. And I think another really interesting aspect is just so many people use it, right? There's so many edge cases or scenarios that don't necessarily, maybe this breaks that I don't know about. Let me pose a question from the audience here from demystifying dev says, newbie question, if pip freeze outputs a perfectly usable requirements.txt file, if you want, can't that be used? What's lacking? Why these other tools? I think honestly, the reason I bring this up is this kind of like, is it almost like why do all these tools exist, right?

46:18 It really is at the heart of your whole article or talk, especially now, this is just one of the use cases, right? Many of the tools do lots of other things as well. I think that's why a lot of them exist, especially on packaging, such that you do not have to use several different tools for all the different steps. I still know people who use requirements.txt files. So I do think it can be useful, especially if you do not work with packaging.

46:45 So if you don't want to create a package and pyproject.toml puts or adds another level of complexity, which you don't use, then I don't think you have to use it. I guess that's really specific for your use case.

47:02 But if you have a package and you have all this other information that you need to publish with it, then it's nice to have this one single file with everything, also with the dependencies and not have many different files for different things.

47:16 I agree. I think another aspect of this is this totally works well, but a lot of it's manual, right? So I could pip install a thing and it works, but then if I forget to go and put it into requirements.txt, well, that was a manual step that I needed to remember. Or I put it in there, but I forgot to pin it. You run into the problem that I ran into earlier, right? With SQLAlchemy changing.

47:39 And with the other tools, that's their flow, right? You say poetry add whatever. It puts it into the requirements file. It puts it into the lock file, right? It installs it. All of those things are kind of taken care of for you. So I think part of it is that the tools kind of do the recommended workflow for you rather than you having to remember to do it.
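For the pip freeze question, here is a rough approximation of that output using the standard importlib.metadata module. It is a sketch, not pip's actual implementation, and the function name is hypothetical. It also makes one limitation visible: every installed package comes out pinned, with nothing to distinguish the packages you asked for from the ones that were pulled in as dependencies.

```python
from importlib.metadata import distributions


def freeze() -> list[str]:
    """Return 'name==version' lines for every installed distribution, sorted by name."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Show the first few entries of the current environment.
for line in freeze()[:5]:
    print(line)
```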

48:02 That's a very good point. And it also allows you to make less mistakes with your project.

48:07 Yeah. You don't even have to really be aware of that as a newbie. You just say, well, I know I say, hatch, add a thing and then it works, right? I don't have to know. Well, here's why you use the hash and here's why you pin the dependency. It just, it just does.

48:21 You do give a mention to pip-tools here at the end. If you go the requirements.txt file way, which actually, honestly, the thing I'm using these days is pip-tools. It lets you create a file and then it creates this requirements.txt, kind of like pip freeze, but it also lets you evolve that over time.

48:40 Like you can say, I want to upgrade my thing. So if I'm using FastAPI and I say pip install --upgrade fastapi, it'll upgrade FastAPI, but not the things like Starlette that FastAPI uses. Right. And that's another reason to not do that more manual process that we were just talking about.

48:58 Because how do you make sure you update all of the things in a coherent way? Right. That's very, very tricky. So you can still do it, but even if you do that, I think you've got to use something like pip-tools or some other higher order thing there.
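Why a coherent update is tricky becomes clearer with a toy dependency graph. The package names below are real, but the edges are simplified for illustration and are not a complete picture of what these packages require.

```python
# Simplified, illustrative dependency graph: package -> direct dependencies.
GRAPH = {
    "fastapi": ["starlette", "pydantic"],
    "starlette": ["anyio"],
    "pydantic": ["typing-extensions"],
    "anyio": [],
    "typing-extensions": [],
}


def transitive(package, graph):
    """Return every package reachable from `package`, excluding the package itself."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Upgrading only the top-level name leaves all of these at their old versions.
print(sorted(transitive("fastapi", GRAPH)))
# → ['anyio', 'pydantic', 'starlette', 'typing-extensions']
```

One top-level name fans out into several packages, and a tool that regenerates the whole pinned set together keeps them consistent with each other.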

49:11 Yeah. I can also understand that it's sometimes it can be frustrating in the beginning. If you have to look into another tool to do what you want to do, if there's the simple hacky way to do it. But if you think in the long run, and also if you work on bigger projects, it's always a good idea to get used to these tools in the beginning, since they save you a lot of work and also save you from doing mistakes that you then have to debug, which is annoying.

49:39 When you're working by yourself, you know, YOLO, you get to do whatever you want, but like you working in a team using something like hatch means everybody does the same thing. And that's actually really important too.

49:50 Rye uses pip-tools as well.

49:52 Let's close out our conversation here with Rye because it's different in its philosophy on how it works for package management. Right?

50:01 If you want to understand Rye, you have to know about Rust, which is a very popular programming language at the moment. And Rust has a very nice setup of how packaging works, since you have two tools, namely Rustup and Cargo, which do everything. You do not have these different tools for different steps, where everyone can contribute their own tool and it gets really messy and hard to understand, as it is for Python.

50:27 But it is very simple and easy to use. And the author of Rye wrote Rye completely in Rust and was inspired by Rustup and Cargo. And Rye is also a tool that can do everything. It also is doing Python version management for you, which I guess is easier since it is not written in Python.

50:46 And yeah, it's a tool that can do it all. It was started as a personal project, but there are new versions released, I guess, weekly. Like when I last checked it, it was moving really, really fast. And the author is also the creator of Flask. So he's, I guess, very well known. And that's also why people are adopting Rye very quickly. Yeah. So I think it's a very nice or interesting addition to the whole mix.

51:14 I think it is as well. The most unusual thing, the reason I said it's unlike all of the others: the way that I use pipx is I somehow say pip install pipx and then I can use pipx. Or I somehow say python -m pip install hatch and then I can use hatch for more Python stuff.

51:37 But all of those things start with Python, some version of Python, and then I can do more Python things with them. Whereas Rustup in that world and Rye in the Python world, it says you have nothing. You don't even have Python. You ask for a version of Python and then you ask for environments, then you ask for dependencies. And so it has all the flexibility it wants to do whatever it needs because it doesn't actually depend on you even having Python, much less the right version of Python. Yeah.

52:06 Exactly.

52:07 You think that's going to be a trend? Do you think we're headed that way?

52:09 Wow, that's hard to answer. I'm actually not sure. I think it would be nice. I would really like having a tool that can do everything and get rid of this clutter. Also, since I like everything to be organized and it can be really confusing. And I know that most people are just complaining about packaging in Python. But I know that also it's just difficult to get to the state where you have this one tool.

52:34 I remember that discussion from your podcast with the packaging panel. It's not that people do not want to have this tool. There are reasons that it's so hard to do it.

52:45 It's hard to get everybody to agree, switch over to this thing. Whereas I think Rust was more built from scratch or designed from scratch to have it. And that's an advantage Rust has, you know, getting created when it did, more recently. Like Python came out when we had Usenet maybe, right? Like certainly the ubiquity of the internet wasn't there. And just downloading stuff off the internet everywhere, just on your command prompt or whatever it was at the time, was just not a thing. So it's timing.

53:14 And it's good that newer languages learn from the mistakes of previous ones.

53:19 We probably will end up with something like Rye, but maybe people got to agree on it. That's tough. I guess one really quick thing to close out this whole section, main topic is Tony out in the audience asks, I'm working on a large Python mono repo. So we have all kinds of dependency conflicts and resolutions we have to deal with.

53:37 Maybe just worth pointing out that the multiple environments that Hatch has might address that. What do you think?

53:44 Also in my project right now, we have a huge mono repo, but the different folders also correspond often to different packages. So they have their own pyproject.toml file. So you can keep the dependencies like organized. But if you have lots of dependencies that are for specific things, and they are not necessarily related to what you're doing in a different step, then that can be very useful.

54:12 The Hatch functionality where you can define virtual environments with only the dependencies that you need for the specific task, like creating the documentation or checking style things.

54:24 Excellent. All right. Well, you can see there are many more tabs on my web browser of things I would like to bring up and chat with you about on this topic. But at the same time, we are out of time and it's getting late for there in Germany.

54:36 So maybe we'll call it a wrap here. But before we get out of here, how about a recommendation for a Python package or project that you think is cool?

54:46 I thought about this since there are lots of people who always suggest so many nice packages. But what I'm really using a lot is mkdocs-material for building documentation without a lot of work. I just did that today, especially if you work on a project which is difficult to explain to other people and you want to have one place and not use Confluence or other tools for documentation.

55:13 This is a very nice tool to use.

55:16 Yeah, it looks great. And more than a static site, it says. It's got search and all kinds of cool things for it.

55:22 And it's very easy to use. I think even for beginners, that is a very easy way to set up a nice documentation for your package, which you can build now with one of the tools we discussed.

55:33 Yeah, absolutely. It is. Okay, excellent presentation. I really like the way you put this all together. It's going to be super helpful for folks. So yeah, thanks. Final call to action. People are interested in this, they want to learn more. Maybe they should check out your article, which we'll link to, and the two conference talks you gave. What else do you tell them?

55:52 Actually, that's the best way to go. The other projects that we discussed on previous podcasts are also on my blog. And definitely check out my GitHub profile. I think I have, for example, a repository on machine learning with machine learning tutorials, which is really popular. So if you like machine learning, that might be useful as well. We can link the GitHub repo, I guess.

56:13 Yeah, absolutely. I'll put it in the show notes. Annalena, thank you for being here. It's always nice to have you on the show.

56:18 Thank you. Have a good day.

56:21 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show.

56:30 When it comes to artificial intelligence, AI, what's good for trillion dollar companies isn't necessarily good for people. That's the theme of season seven of IRL, Mozilla's multi award winning podcast hosted by Bridget Todd. Season seven is all about putting people over profit in AI. Check them out and listen to an episode at talkpython.fm/IRL.

56:52 Sentry. They have a special live event like a mini online conference where you can connect with the team and take a deep dive into different products and services every day for a week. Join them for launch week, new product releases, exclusive demos and discussions with experts from their community on the latest with Sentry. You'll see how Sentry's latest product updates can make your work life easier.

57:14 Visit talkpython.fm/sentry-launch-week to register for free.

57:19 Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.

57:37 Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, the Google Play feed at /play and the direct RSS feed at /RSS on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/YouTube.

58:04 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
