Transcript for Episode #197:
Modern Python Standard Library Cookbook
0:00 Michael Kennedy: A recent Twitter poll hit around the web and it asked, "What percentage of the Python Standard Library do you think you know?" Someone copied me on it, maybe expecting some really high percentage answer like 80, 90%. In reality what I did answer, and my rough estimate still is, it's probably around 50%. This episode with Alessandro Molina definitely helped confirm that estimate for me. He just published a book entitled "Modern Python Standard Library Cookbook". And it's full of these great little corners of the Standard Library that you might not have bumped into, but you'll be super glad to hear about them on this episode. It's Talk Python to Me, Episode 197. Recorded January 10th, 2019. Welcome to Talk Python to Me, a weekly podcast on Python. The language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @TalkPython. This episode is sponsored by Linode and Rollbar. Please check out what they're offering during their segments, it really helps support the show. Alessandro, welcome back to Talk Python.
1:18 Alessandro Molina: Hi, Michael, thank you.
1:20 Michael Kennedy: Yeah, it's great to have you on the show again. It's been a long time since, back on Episode 35 when we talked about TurboGears. And you're back with a new project that I think is really, really cool. A look at the Standard Library through a modern Python lens, which I'm excited to dig into with you. But first you know, maybe just tell us what have you been up to the last couple of years since we heard from you on the show?
1:45 Alessandro Molina: Yeah, actually I mostly do Python development, as usual. That's my main interest, along with open source development. I started a bunch of open source projects over the past two years, but I'm still moving TurboGears forward, and we actually just released version 2.4, which was a major revision of the framework. That's great, because we are trying to keep it modern, and I think we're doing a great job at that.
2:17 Michael Kennedy: Yeah, that's really cool. What are some of the other projects that you've been working on, the other open source ones you've released?
3:06 Michael Kennedy: That's pretty cool, so it's kind of like, what people are using NPM for on Python web apps to like manage Bootstrap and Angular JS and stuff, along those lines?
4:36 Michael Kennedy: I really like it, that sounds awesome. Cool, so, let's maybe talk about this idea of the Modern Standard Library. So, you know, there's a whole range of people listening who have different levels of experience with Python. Let's just start with what is the Standard Library?
4:52 Alessandro Molina: The Standard Library is practically everything that ships with Python itself. I mean, it's known that Python ships with batteries included, and the Standard Library is those batteries.
5:06 Michael Kennedy: Yeah, exactly they're the first, they're the batteries that come in the box, when you get Python. You get other batteries, but these are the built-in batteries, or the ones that come included, right?
5:17 Alessandro Molina: Yeah, I will say that they are far more than batteries, because there are tons of things that it can do for you, there is a lot inside the Standard Library. I think it's also something that is not super frequently covered in blog posts and things like that. We always get caught up in the most recent, new, cool project, or things like that, which is outside of the Standard Library, of course, because things get included in the Standard Library only after they have been around for years. But that's actually good, 'cause it means it's something you can really rely on.
5:53 Michael Kennedy: Absolutely, it's important that, that has to stay there, and there's actually a huge bar for bringing things in to the Standard Library. I recall, a year or two ago, there was a debate about whether requests should be brought in to the Standard Library, to more or less, supersede the built in HTTP client capabilities, and they decided no, not because they felt requests wasn't good enough, but because requests was changing more quickly, than the Standard Library could really facilitate, right? 'Cause it's released with new features, really, every 18 months, and things like that.
6:25 Alessandro Molina: Yeah, absolutely, that's one of the problems of the Standard Library, but also one of the reasons why it's very good. Because the things you have there, you can rely on them for years to come. That was especially clear to me in the last few years, when Python changed very quickly and there were really many versions of Python in a very short time: even if you are on Python 3.2 and you upgrade to Python 3.7, 99% of what you use from the Standard Library is still there and works basically like before, and you will be fine. Of course, between Python 2 and Python 3 there were major changes, but that was expected.
7:11 Michael Kennedy: That was intentional, right? That was like, okay, we finally have to just bite the bullet and make these changes, but other than that, you're right, it's really stable. There was some blog post or something about, somebody wrote how they hated Python, and one of their reasons, they claimed was, well, if you have 3.5 and you have to upgrade to 3.6, you know, it might not work. I'm like, thinking, no that's actually, exactly how it works, and I'm really impressed with the stability of Python as it changes. I haven't seen any problems. The only problems I've encountered, is I've used features too new, on my dev machine, and then I've pushed it to production, where I didn't realize, oh yeah, that feature is not yet on my server, so that's my own fault.
7:52 Alessandro Molina: Yeah, that happens to me from time to time too, of course.
7:57 Michael Kennedy: Yeah, not often, but unfortunate when it does. So, I think actually, when people talk about how amazing Python is, or when they judge any programming language, and they compare it against another programming language, they might compare the syntax, they might say, well look how much easier, like a for in loop is in this language versus that language or this exception handling block is cleaner than that exception handling block, but while that matters, I think actually, what most people have in mind, when they think about how they feel about a language, is the Standard Library of the two languages, and maybe the broader ecosystem, as well. When I think of like, why is Python awesome, I don't think, well because it's the way the language works with numbers is great. I think, well I can import all these things, and solve all these problems right away, and that's really the Standard Library, not the language, right?
8:49 Alessandro Molina: There are many things in Python that are part of the language itself, like whenever you use a dictionary or a list, or things like that. But every time you do an import, it means you are going to the Standard Library, unless it's a package you installed explicitly, of course.
9:08 Michael Kennedy: Of course, so let's talk about your book, and your book is called the "Modern Python Standard Library Cookbook", right? And I like these cookbook ideas 'cause you're like, well, I'm trying to solve some networking problem, oh here's two little recipes that I can use to solve that problem. So, let's just start with, what do you mean by modern, in the Standard Library cookbook, here. And then, we've selected a handful of specific recipes that we'll talk about, that are pretty fun.
9:40 Alessandro Molina: Yeah, the modern part was the hardest part for me, finding where to draw the line while writing the book, I would say. It's of course modern because it covers Python 3, and most of the things in it are Python 3 specific, but I didn't want to go for things that work only with the latest Python version. At the time when I started writing the book, 3.6 had just been released; we are now at 3.7, and 3.8 is probably going to happen pretty soon. So I didn't want to follow the most recent thing, because I know that in the real world, on your job, you're probably not going to be allowed to upgrade Python every single time a new release happens, so you are probably going to need recipes that you can apply in your daily life on versions of Python which are modern but not the most recent one. So I decided, at that time, to stick to things that work on Python 3.5, and I think that's actually clear from the recipes, because there are different ways to do some of those things in more recent versions like Python 3.6 or 3.7, but I still went for the way that works with Python 3.5 and the subsequent versions. So I tried to balance between being modern and covering as many users as possible, you know, that's always...
11:17 Michael Kennedy: Yeah, it's a tough balance to strike, but I do think the choice of targeting Python 3.5 is pretty good, like, really if you're using Python 3, most people are on 3.5 or higher, at this point, and so. I mean, you do give up a few cool things, you give up data classes and you give up f-strings, and a couple other things that would be really nice to mix in here, but at the same time, it's a little more timeless, a little more broad this way. Alright well, let's just talk about some of the recipes you have in here. So we pulled out some of the more interesting ones, and you know what I liked about going through these, in your book, was a lot of times I'm like, oh, I didn't know that class or that function existed, and did this. And, whenever I'm surprised like that, I'm like, oh, you can send E-mails out of the logging framework, that's pretty awesome, I didn't know that, right, so I think these will be pretty interesting to folks, and let's just start with that, like, reporting errors in production. So, I guess we should, maybe frame this a little, right? There's lots of ways to report errors in production or do the other things we're going to talk about, if you depend on some external library, right? But the goal of your book is, how much awesome stuff can you do, without installing or depending on other libraries, unless you absolutely have to, right?
12:33 Alessandro Molina: Yeah, absolutely. I think what you said is one of the reasons why I wanted to write this book: there are a lot of things that people don't know are available in the Standard Library, or they don't know that they can easily be changed to work in a different way, or things like that. So, for example, you mentioned the logging module, and we probably all know that it exists, and many of us are using it in our daily work to log messages and things like that. But the interesting thing is that it has many different handlers, so you can send the output of what you are logging to many different places. And one of those handlers actually sends the output to an SMTP server, so you can send your logging messages by mail.
13:28 Michael Kennedy: Right, so you've got a logging.handlers.SMTPHandler class, right?
13:32 Alessandro Molina: Yeah.
13:32 Michael Kennedy: You can just plug that in, yeah, it's built in.
13:34 Alessandro Molina: Exactly. Of course, sending all your log messages as e-mail doesn't make much sense, because it will make your life very hard. But for some specific messages it might make sense, and you can configure the logging module to filter only some messages. In this case we are talking about logging errors, so logging exceptions: you can use the logger to report only exceptions by mail, so every time an exception happens in your code, you get notified and you are aware of it. You don't have to wait for the user to call and complain that it's not working. You can answer, I already know, I've already fixed it.
14:14 Michael Kennedy: Yeah, that's really cool. And in your example, you do this a lot actually, and I like this, is you create decorators that you can use to decorate a method and say if there's an error here, you know, exception here, E-mail it to us, and things like that, right?
14:32 Alessandro Molina: Yes, exactly. The recipe explains how to write that decorator. Usually you would apply the decorator to the main function of your program, so that every exception that happens is reported to you, but if you are, for example, writing a web application, you might want to decorate the WSGI main callable, because the main program is actually the application server, not your own code. But the core idea is that if you apply that decorator at the entry point of your code base, then everything that crashes within your code base will be reported to you, and you'll know when the program failed.
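Editor's sketch: the idea described above could look roughly like the following. This is not the recipe from the book; the names `report_crashes` and `make_mail_logger`, the logger name, the mail host, and the addresses are all placeholders assumed for illustration. Only `logging.handlers.SMTPHandler` and `logger.exception()` come from the Standard Library.

```python
import functools
import logging
import logging.handlers


def make_mail_logger(name="crash_reporter"):
    """Build a logger that mails ERROR records (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    # Placeholder host and addresses: replace them with a real SMTP setup.
    logger.addHandler(logging.handlers.SMTPHandler(
        mailhost=("localhost", 25),
        fromaddr="app@example.com",
        toaddrs=["dev-team@example.com"],
        subject="Crash report",
    ))
    return logger


def report_crashes(logger):
    """Decorator factory: log any uncaught exception, then re-raise it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception() logs at ERROR level and attaches
                # the current traceback to the record.
                logger.exception("Unhandled exception in %s", func.__name__)
                raise
        return wrapper
    return decorator


# Typical use (not executed here, because it would try to send mail):
#
#     @report_crashes(make_mail_logger())
#     def main():
#         ...
```

Passing the logger in, rather than hard-coding the SMTP handler inside the decorator, keeps the decorator testable: any handler can be attached to the logger without touching the wrapped code.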
15:15 Michael Kennedy: This portion of "Talk Python to Me" is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore, and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E, plans start at just $5 a month, for a dedicated server with a gig of RAM. They have ten data centers across the globe, so no matter where you are, or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private git server, or just a file server, you'll get native SSDs on all the machines, and newly upgraded 200 gigabit network, 24/7 friendly support, even on holidays, and a seven day money back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations, and more. Do you want a dedicated server for free, for the next four months? Just visit talkpython.fm/linode. It looks like you could probably take this idea and extend it, like if you wrote a web service of some sort, that you could call and report your errors, maybe log it in your own database,
16:18 Alessandro Molina: Yeah.
16:18 Michael Kennedy: You could create your own custom handler, stick it into the logging framework, and do the same thing. Instead of sending mail, it logs it remotely to your service, which goes in your database for reporting and then whatnot, right?
16:28 Alessandro Molina: Yeah, absolutely. You can actually write your own custom handlers and log the messages and the exceptions wherever you prefer. But I think that, as usual, most of the recipes and modules available in the Standard Library are very good when your needs have an average, medium level of complexity, because when you start needing too many customizations, at that point it probably makes sense to start using an outside library or service. For this specific case, there are many that work very well, which I have used in past years. So the solution based on the Standard Library is a great way to report your errors whenever you start a new project, and maybe it's small, so you don't want to set up dedicated infrastructure for reporting errors, or you don't want to pay for an external service, or things like that, but at a certain point you might want to switch to an actual external service that is dedicated to that.
17:32 Michael Kennedy: Sure, but maybe you have some level of data protection, or you don't want to share your tracebacks and all that kind of stuff, right? So you might want to keep it custom for, you know, privacy or I.P. reasons. Alright, so that's really cool, and just to give people a sense, this is like, three or four pages in the book, it's not a huge, long chapter on it, right? These are really quick, little things out of many, many recipes, so, it's pretty cool. Another one has to do with temporary objects. So, what people often do, is if they've got to load up a bunch of arbitrary things, they'll stuff them in a dictionary, put the values in there, and pass them around, but those don't behave like objects. I can't say, container.value, I have to say container[value] or container.get(value) things like that, right, so it kind of breaks this idea of these truly flexible objects that you can just pass around in Python, right?
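Editor's sketch: a custom handler only needs to subclass `logging.Handler` and implement `emit()`. The example below stores records in a SQLite table as a stand-in for the "your own database" idea mentioned above; a handler that posts to a web service would have the same shape. The class name `SQLiteHandler` and the table layout are assumptions made for illustration and do not come from the book.

```python
import logging
import sqlite3


class SQLiteHandler(logging.Handler):
    """Hypothetical handler that stores each log record in a SQLite table."""

    def __init__(self, path=":memory:"):
        super().__init__()
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS log "
            "(created REAL, level TEXT, logger TEXT, message TEXT)"
        )

    def emit(self, record):
        try:
            # self.format() appends the traceback when the record has one.
            self.connection.execute(
                "INSERT INTO log VALUES (?, ?, ?, ?)",
                (record.created, record.levelname, record.name,
                 self.format(record)),
            )
            self.connection.commit()
        except Exception:
            # Standard logging convention: never let a logging failure
            # crash the application.
            self.handleError(record)

    def close(self):
        self.connection.close()
        super().close()
```

Once attached with `logger.addHandler(SQLiteHandler("errors.db"))`, a call such as `logger.exception("something failed")` ends up as a row in the `log` table (the file name here is also just a placeholder).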
18:55 Michael Kennedy: Yeah, a C++, and a C#, and a Java developer wouldn't have this complaint, right?
19:00 Alessandro Molina: No. Yeah, absolutely, and actually in Python that's not hard at all, you can do the same exact thing, and there is a fast way to implement an object that allows you to do it. You can implement your own bunch objects, which are able to store anything in a dictionary, because in the end that is really how standard Python objects work: they store attributes in a dictionary, but you can access them using the dot notation, so by saying object dot whatever attribute you want. One really interesting thing is that, if you don't have a problem with using a module that is made for doing totally different things, you can even use the argparse.Namespace class, which does exactly that. It might not be really obvious to other people, who will of course ask what you are doing and why you are importing the namespace for parsing arguments, but it allows you to access the properties using the dot notation.
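Editor's sketch: the `argparse.Namespace` shortcut mentioned above, in a few lines. The attribute names and values are made up for illustration.

```python
from argparse import Namespace

# Attributes can be given up front as keyword arguments...
user = Namespace(name="Alessandro", language="Python")
# ...or added later with plain dot notation.
user.project = "TurboGears"

print(user.name)
# vars() exposes the underlying attribute dictionary.
print(vars(user))
```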
20:07 Michael Kennedy: Yeah, so in here, you define a class, you called it a "bunch", which is like, just an object that can be extended, with arbitrary values and the trick is, to have it derive from dictionary, have it implement __getattribute__ and __setattr__, and then it just reads from, and writes to, the internal dictionary, and converts key errors to attribute errors and you have these little anonymous objects you can just create.
20:32 Alessandro Molina: Yes, exactly, and one interesting benefit is that, since it inherits from a dictionary, you can set all the attributes at the beginning in a single step, if you want, because you can just provide them as arguments to the dictionary.
20:45 Michael Kennedy: Right, it basically has a built-in keyword argument initializer you could just use, right?
20:52 Alessandro Molina: Yeah, exactly.
20:52 Michael Kennedy: Which is cool. The one trick that I thought was nice: I had seen the __getattribute__ and __setattr__ trick before, but one of the problems is, if you go and ask what type it is, you know, type of, or type parentheses the thing, it'll always say bunch, right? But you showed a way to extend this, just a little bit, so it'll actually report whatever name you want it to, like in a traceback, or in a repr, things like that.
21:17 Alessandro Molina: Yes, you can extend a bunch to report the name you need, so that when you receive a traceback or something like that, as you pointed out, you already know it's actually a user, for example, and not just a bunch where you never know what it contains. And you can actually extend that even further to pass type checks: if you are, for example, trying to use them to emulate some other object, you can extend it to pass the type checks for that other object, so it can be pretty flexible and convenient.
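Editor's sketch: a minimal bunch along the lines discussed above, including a way to give each bunch its own class name. It is an illustration, not the book's code. In particular it uses `__getattr__` instead of the `__getattribute__` mentioned in the conversation, so that dictionary methods such as `.keys()` stay reachable, and the `__repr__` and the `_classname` parameter are assumptions made for this example.

```python
class BunchBase(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            # Convert the KeyError into the error attribute access expects.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __repr__(self):
        return "{}({})".format(type(self).__name__, dict.__repr__(self))


def Bunch(_classname="Bunch", **attrs):
    """Create a bunch whose class is called `_classname`."""
    # type() builds a new subclass on the fly, so type(obj).__name__
    # and the repr report the chosen name.
    return type(_classname, (BunchBase,), {})(**attrs)


user = Bunch("User", name="Alessandro")
user.language = "Python"
print(user.name, user["language"])
print(type(user).__name__)
print(user)
```

Because `BunchBase` inherits from `dict`, the keyword arguments are simply forwarded to the dictionary constructor, which is the "built-in keyword argument initializer" mentioned in the conversation.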
21:49 Michael Kennedy: Yeah, I really like it. So what are some of the use cases, where you might choose to use that over a dictionary or over a custom, more fixed structure class or something like that?
21:59 Alessandro Molina: Well, for example, the bunch is something that I use frequently when experimenting. When writing prototypes or things like that, I don't want to go and declare the whole hierarchy of classes before I even have a clear idea of what I'm trying to do. So I frequently end up relying on a bunch for that purpose. And also, for example, in TurboGears itself, they are used to carry around some user-provided values, where you don't know what the user is going to store there, you don't know what the user is going to need to keep around, and things like that. And you want to allow the user free access to those attributes, without having to look them up in a dictionary, or things like that. So it's generally very convenient when you really don't know yet what you are going to store in that object.
22:47 Michael Kennedy: Right, or it's determined at runtime, in bizarre ways, right? Like, I think of a CSV file, and maybe that's some common stuff, but other things you don't know if they're there, and you just want to put it all, you know, load it up, right?
23:00 Alessandro Molina: You just gave me a pretty interesting idea, actually. It would be easy to extend anything that loads a dictionary, like, for example, a JSON parser, so you can access the properties in a normal way, instead of having to look them up in a dictionary.
23:18 Michael Kennedy: I like it, and you know, we'll talk, you know, at least in your book you cover, like, default values and avoiding all the tests, you know, is this key in this dictionary, and you could probably combine those to come up with a pretty clean API for interacting with data exchange of all sorts.
23:34 Alessandro Molina: Yeah.
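Editor's sketch: one way the JSON idea floated above might be tried is the `object_hook` parameter of `json.loads`, which is called with every decoded JSON object. The `AttrDict` class and the sample document are hypothetical and written only for this example.

```python
import json


class AttrDict(dict):
    """Hypothetical minimal bunch: dictionary keys readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


document = '{"show": {"title": "Talk Python to Me", "episode": 197}}'

# object_hook receives each decoded dict; its return value replaces the dict.
data = json.loads(document, object_hook=AttrDict)

print(data.show.title)
print(data.show.episode)
# Still a dict, so .get() gives default values for missing keys.
print(data.show.get("host", "unknown"))
```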
23:34 Michael Kennedy: Cool. So, one of the next ones that you talked about is templating. Now, Python has lots of templating outside the Standard Library: we've got Jinja2, we've got Chameleon templates, you can, like, write your own, if you really want to go crazy. But all those are external packages, with many dependencies, and often just used on the web, and things like that. What I was surprised to see is you can actually extend what string.format means with your own implementation and, sort of, create your own mini-templating language, like Jinja2 or something.
24:09 Alessandro Molina: Yes, absolutely, that's actually one of the recipes that I have used for many years, and I was not sure about which implementation to propose, because there are new ways you can achieve the same result in Python 3.6. For example, taken to the extreme, you can actually write code in an f-string itself, and it will be evaluated. But for the previous versions of Python that was not available, and also the kind of code that you can put in an f-string is limited to expressions that can be evaluated. With what I suggest in the book, by subclassing the string formatter and customizing it, you can actually make it run any kind of code that you provide in the template, like even define a function and call it from your template, and things like that. The trick is interesting: whenever you write something between the brackets, the formatter calls a method to look up that value, so you can write anything within the brackets, and it will always ask that method what the value is of the thing the user wrote within the brackets. At that point, by overriding that method, you can make anything happen: you can even take code that was written within the brackets, run it, and put the result of the code execution back in place of the brackets, and things like that.
25:48 Michael Kennedy: Yeah, that's a really great trick, so. You basically have a little miniature templating language just in a string, and you just say, format here are the values, and boom, it comes out just like you would, and you even have, what is effectively like, a loop, right, you give it a list of messages, and it can basically, iterate over them, using str.join.
26:10 Alessandro Molina: Yeah, so the specific example that I give in the book for that recipe tries to showcase that you can even achieve loops, because you can have a list comprehension or a join, depending on the kind of output that you need. You can run a list comprehension within the template, or you can run a join within the template, and so output the results for multiple entries that depend on the container that you provided.
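Editor's sketch: the subclassing trick described above can be illustrated by overriding `string.Formatter.get_field`, the method the formatter calls for every `{...}` placeholder. The `$` prefix convention and the class name are assumptions made for this example and may differ from the book. Note that field names cannot contain `:` or `!`, which the format syntax reserves, and that `eval()` runs arbitrary code, so this is only reasonable for templates you wrote yourself.

```python
import string


class TemplateFormatter(string.Formatter):
    """Formatter that evaluates fields starting with '$' as expressions."""

    def get_field(self, field_name, args, kwargs):
        if field_name.startswith("$"):
            # The keyword arguments become the names the expression can see.
            # WARNING: eval() executes arbitrary code; trusted templates only.
            value = eval(field_name[1:], dict(kwargs))
            return value, field_name
        # Plain fields such as {name} keep the standard behaviour.
        return super().get_field(field_name, args, kwargs)


template = (
    "Hello {name}, you have {$len(messages)} new messages:\n"
    "{$'\\n'.join('- ' + m for m in messages)}"
)

text = TemplateFormatter().format(
    template, name="Alessandro", messages=["first", "second"]
)
print(text)
```

The second placeholder is the "loop": a generator expression inside `str.join` produces one line per message, so the template mirrors the shape of the final output.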
26:43 Michael Kennedy: This is really cool, I think this is a great example, and so often you see people, sort of, imperatively building up strings in code, and it's just, it doesn't seem like a great way to do it, it can't be great for performance, it's not clear, really what the output is supposed to be sometimes, and this way, I think is a lot nicer, so, quite cool.
27:01 Alessandro Molina: Yeah, of course, because you can have a better idea of the whole output that you are going to generate, since you see it all in one place, instead of having to go through pieces of if-this-case branches and string concatenation, and things like that.
27:16 Michael Kennedy: You could have, basically, a multi-line string, with more or less, the exact shape of what's going to be outputted.
27:21 Alessandro Molina: Yeah.
27:21 Michael Kennedy: Because the place is going to be filled in, it's really nice, much like chameleon template, or something like that, right?
27:27 Alessandro Molina: A simplified version. It's not as powerful, yeah, but it can already satisfy more needs than you might expect.
27:34 Michael Kennedy: Yeah, and you know, this is one of those cases where I'm like, I had no idea you could do this, this is quite cool. There was a conversation on Twitter recently, where somebody sent out a poll and said, "What percentage of the Standard Library do you think you know?" And you know, zero to 20, 20 to 40%, 40 to 60%, and so on. And they copied me thinking, I don't know, maybe they were thinking I would check, like oh yeah, like 90%, but it's because of this stuff, I'm thinking like 40% is the right answer, because I know how to do the really common stuff super well, but there's all these little extra, amazing things that are just like, I didn't even know that existed. And so, for example...
28:10 Alessandro Molina: Yeah, I totally agree with you.
28:12 Michael Kennedy: Maybe I'll ask you: what do you think, what percentage of the Standard Library do you think you know, and after writing this book, maybe that number went up a lot.
28:18 Alessandro Molina: Oh yeah, absolutely. I think that's a really hard question, because I believe I probably don't go over 70%, you know, and I wrote a book about it. There are so many things already inside it that probably not even many of the people who wrote the Standard Library know every way you can use it, you know?
28:40 Michael Kennedy: Yeah, it's pretty incredible, so. Anyway, okay, back to the topics. This next one, also, falls into that, oh that is so cool, I didn't realize this was around for us. So, I know about working with memory, I know about working with files, but you have this cool example of, I would like to basically cache something in memory, or load it into memory, but if it gets too big, I need to switch what I'm doing, maybe save it to a file and start reading it from there, because, well, if it's 20 gigs, it's probably not going to work out super well, most of the time, right, so tell us about this.
29:14 Alessandro Molina: Absolutely, that's one of my favorite tools in the Standard Library. When I need to keep around some temporary data, maybe a file that the user uploaded to me, or something I'm generating, like, think of image resizing or things like that, it makes sense to do that in memory, because it's the fastest option, so it's usually the best way to go. But you really don't know the size of the input that you are going to receive: if you are resizing a 300-pixel image, it can fit in the memory of every computer, but if you are resizing a 5-gigabyte JPEG, it can start to be a big problem. And that's the point where having something that automatically switches from memory to disk at a specified threshold can be very convenient, and that's exactly what the tempfile.SpooledTemporaryFile class does. You create one, and at the beginning everything is in memory, up to the point where it grows so big that it goes over a threshold that you choose, and when that threshold is surpassed, everything switches to disk, and you don't consume memory anymore.
30:29 Michael Kennedy: That is so cool. So, this is the spooled temporary file, and you just give it a maximum size, and you work with it like a regular file: you can write to it, seek on it, read from it, and either it stays in memory, or, if it turns out that it got to be too big, then it just, you know, writes itself to disk and streams off the disk. That's cool.
30:51 Alessandro Molina: Yeah, absolutely, because I've seen this sort of pattern implemented by hand in many projects, like, keep the bytes in memory as long as the size is small, and switch to a temporary file if it's big. But it can actually be done for you, without having to write any code, if you use this spooled temporary file.
31:09 Michael Kennedy: This is cool, to me, it feels like that's something you see a lot in Python, when people come from another language, or some other technology, where they're like, Oh, I need to do this thing, so I'm going to implement it, from scratch, when it could just be, tempfile.SpooledTemporaryFile, you don't need to implement it, it's done and it's already tested, and it's fast, right? I think that's part of knowing the Standard Library well, right?
31:33 Alessandro Molina: Yes, and it can not only make your life easier, because it's of course probably better tested, having been around for years, more robust, and things like that, but it can usually also be faster, because many parts of the Standard Library are implemented in C. So they can be far faster than the code that you wrote yourself in Python. So it's usually a very good idea to look into the Standard Library before trying to write something from scratch.
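Editor's sketch: a small demonstration of the behaviour described above. The threshold of 1024 bytes and the function name `spool_demo` are arbitrary choices for this example. The `_rolled` attribute is an undocumented CPython implementation detail, read here only to make the switch from memory to disk visible; real code should not rely on it.

```python
import tempfile


def spool_demo(max_size=1024):
    """Write past the threshold and report where the data lived."""
    with tempfile.SpooledTemporaryFile(max_size=max_size) as spool:
        spool.write(b"x" * 100)           # below max_size: kept in memory
        in_memory_first = not spool._rolled

        spool.write(b"y" * 2000)          # now above max_size: moved to disk
        on_disk_after = spool._rolled

        # The file API is the same either way.
        spool.seek(0)
        data = spool.read()
    return in_memory_first, on_disk_after, len(data)


print(spool_demo())
```

The calling code never changes between the two phases; the only decision left to the caller is the value of `max_size`.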
32:03 Michael Kennedy: That touches on this idea of, is Python fast or slow, and I feel often that the answer is both, or either, something like this, right. You can write code that just runs in pure Python, and it could be really inefficient, but soon as you work with something that, just hands off something down to a C layer, either that's in the Standard Library, that's in CPython, or if it's in, say NumPy, or something like that, like all of a sudden, that whole conversation changes, and this is a little bit of, like, you may pick up that advantage automatically, by just using the Standard Library stuff better.
32:38 Alessandro Molina: Yeah, I think that it's one of the few languages, if not the only one, where the usual rule about performance is reversed. The further away you stay from the machine, the faster it goes, you know?
32:49 Michael Kennedy: Definitely.
32:52 Alessandro Molina: The more you try to work with a low level data, there's no where it will be. The more you try to work with higher level data, and functional, the faster it will go.
33:00 Michael Kennedy: That's a really good perspective, I like it. This portion of "Talk Python to Me" is brought to you by Rollbar. Got a question for you, have you been outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know, there's two problems with that. You can't discover all the bugs this way. And some users don't bother reporting bugs at all, they just leave, sometimes forever. The best software teams practice proactive error monitoring. They detect all the errors in their production apps and services, in real time, and debug important errors in minutes or hours, sometimes before users even notice. Team from companies, like Twilio, InstaCart and CircleCI use Rollbar to do this. With Rollbar, you get a realtime feed, of all the errors, so you know exactly what's broken in production, and Rollbar automatically collects all the relevant data and metadata, you need to debug the errors, so you don't have to sift through logs. If you aren't using Rollbar yet, they have a special offer for you, and it's really awesome. Sign up and install Rollbar at talkpython.fm/rollbar and Rollbar will send you a $100 gift card to use at the open collective, where you can donate to any of the 900 plus projects, listed under the opensource collective, or to a Women Who Code organization. Get notified of errors in real time, and make a difference in opensource. Visit talkpython.fm/rollbar today. The next pattern that you talk about is with displaying progress bar, and for for all the Python developers, you know, I think when you type pip install a thing, and you see the little downloading progress bar going across in the terminal or command prompt, like, those kinds of progress bars, right, on the terminal.
34:38 Alessandro Molina: That's a very common need. And actually, I came up with this recipe because I wanted to showcase a more general need that you have when writing terminal, text-based software, which is, you really don't know the environment where you are going to run, you know? Maybe my user has a terminal which is very small, or maybe it's full screen, or things like that. So when you try to provide your output, it's always hard to find the right balance of how much I should provide, how much I should write on a single line, when I should go to a new line, and things like that. The progress bar concept is simple for that, because it should always consume as much space as is available, and display the progress on top of that, you know? So it was a perfect case to showcase that in the Standard Library, you have tools that allow you to inspect the terminal and understand what its size is. I don't remember the specific length of the recipe, but I think we are talking about less than ten lines of code, and you can provide a fully functional progress bar that fits and adapts to the size of the screen, and things like that.
35:51 Michael Kennedy: Yeah, it's a great little example. And of course, there are things like TQDM, if you want to go outside, right? Which is a cool progress bar, but that's not the same as, I have no dependencies and I still have a progress bar, which is pretty cool.
36:04 Alessandro Molina: Yeah, exactly. If you want to cover all the cases, libraries like TQDM are very cool. When I used it in some projects, I think it worked great. But one thing is having five lines of functional code, and the other is bringing in a whole library.
36:22 Michael Kennedy: For sure. One of the things I liked about this recipe is that you used a lot of cool parts of Python. So, you have a nice decorator that you put onto a function, and that function, as long as it is a generator that returns numbers, can drive the progress bar. So, basically, you do your work, and you're looping through it, and the function can just, periodically, yield out a number, which is 1 to 100, which is the progress, right? That's a great pattern.
36:52 Alessandro Molina: Yeah, absolutely, I think it's very convenient compared to the most common solution that I saw around when implementing progress bars, which is actually to go for a generator that you wrap around your original data source, and at that point it computes the size itself, but then it requires that you already have all the data available. Like, I can apply that pattern to a list because I know that the list has five elements, so I can compute how much progress I should show for each element. But I cannot apply that to a generator, because I might not know how much data the generator is going to generate. And with this pattern, it's the best balance, in my opinion, because you just yield your progress yourself. It doesn't matter what you are generating: it can be network input, it can be a generator, it can be a list, it doesn't really matter. You just compute your own progress at the right time and yield the value to report.
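A minimal sketch of the pattern described here, not the recipe from the book: a decorator wraps a generator that yields its own progress as a percentage, and `shutil.get_terminal_size()` from the Standard Library sizes the bar to the terminal. The names `render_bar`, `with_progress_bar`, and `slow_job` are made up for illustration.

```python
import functools
import shutil
import sys
import time


def render_bar(percent, columns):
    """Return a text bar for `percent` (0-100) that fits in `columns` characters."""
    percent = max(0, min(100, int(percent)))
    # Reserve 8 columns: "[", "] ", a 4-character "100%" and one spare column
    # so the cursor never wraps onto the next line.
    width = max(columns - 8, 1)
    filled = width * percent // 100
    return "[{}{}] {:3d}%".format("#" * filled, " " * (width - filled), percent)


def with_progress_bar(func):
    """Decorator: draw a bar from the percentages yielded by a generator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for percent in func(*args, **kwargs):
            # Ask for the terminal size on every update so resizes are honored.
            columns = shutil.get_terminal_size().columns
            sys.stdout.write("\r" + render_bar(percent, columns))
            sys.stdout.flush()
        sys.stdout.write("\n")
    return wrapper


@with_progress_bar
def slow_job(steps=20, delay=0.05):
    """Hypothetical workload that reports its own progress."""
    for step in range(1, steps + 1):
        time.sleep(delay)  # stand-in for real work
        yield step * 100 // steps
```

Calling `slow_job()` in a terminal redraws a single line until it reaches 100%. Because the function yields its own progress, the same decorator works whether the work iterates over a list, a generator, or network input.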
37:52 Michael Kennedy: I really like it, it's quite cool. So, the next one has to do with the overall safety or consistency of a block of code. Now, a lot of times people think, if they have a try and they have an except catching block, they've handled the exception correctly, and everything is fine. That's sometimes true, but a lot of times, there's an iterative, or multi-step process, where a bunch of changes have to be made, and either all the changes should be applied, or none of the changes should be applied, you know? Think of a database transaction, right? That's very common, either you get to the end and you commit, or you roll it back. But that same thing applies in other persistent things. Even in memory, actually, if you're changing different parts in the memory of your app, you should consider this, but this next recipe has to do with files. So, if I'm going to be making multiple changes to a file, or I want to make sure I write all of the file or none of the file, how do I do that?
38:52 Alessandro Molina: That's actually a very good question, because whenever you handle exceptions, it's obvious that you are handling the exception in your own code, but it's less obvious that there might be side effects from that exception. Like, in the case where you were writing a file and you fail at a certain point, maybe it's disk space, or something like that. What you've written so far is already there, it was already written, up to the point where it was flushed to the drive, and so on. So you are not really handling the exception.
39:26 Michael Kennedy: Yeah, and even if it's not, like, you run out of disk space, it could be, I need to write 20 things to a file, and on the tenth one, something is None, and I didn't expect it to be, and it crashed, right, so it just bailed out halfway through, right? Which is, how do you know that's going to happen? How do you recover from it?
39:43 Alessandro Molina: The way you recover is actually not super complex, and it can be made into a very small tool. In the book there is a recipe which is called "Safe Open", and it allows you to open the file, in a safe way, for writing. And whenever you safe open, all the writes that you do to the file happen all or nothing, so if everything succeeded, all the writes happen. If something fails, like you have an exception in the middle, between three lines of code that write two pieces, then nothing was written, because everything is recovered to the previous state. And that's actually done based on...
40:26 Michael Kennedy: I was going to say, I love it. That is such a great thing, like, you write exactly the same code to write to the file, you safe open instead of open, and then either the file doesn't even exist, if there's an error, or it's completely consistent at the end. And just to be clear, this safe open is a thing that you created in your recipe, but it's pretty straightforward, right?
40:46 Alessandro Molina: Yeah, it's like four lines of code, or something like that, and it actually works like a database transaction. You just start the transaction, and you can roll back or commit, and in the middle nothing has happened.
40:59 Michael Kennedy: It's super cool, so basically, the way it works is it, writes to a temporary file, and then it uses, it is itself a context manager, so you put it in a with block, if you exit the with block without error, then it will just rename the file to the real one, what you were targeting, if it exits with an error, just remove the temporary file, which is perfect, it's so clean and nice.
41:23 Alessandro Molina: And that works because the rename operation is guaranteed to be atomic, all or nothing. You replace the old file with the new one or you didn't, it's not like when writing, where you could have written only part of the output.
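A minimal sketch of the behavior just described, assuming a context manager that writes to a temporary file in the same directory and only renames it over the target on success. The name `safe_open` comes from the conversation; the body below is an illustration and not necessarily the book's code. It relies on `os.replace`, which is atomic on POSIX systems when both paths are on the same filesystem.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def safe_open(path, mode="w"):
    """Open `path` for writing so that the writes happen all or nothing."""
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    tmp = tempfile.NamedTemporaryFile(mode=mode, dir=directory, delete=False)
    try:
        with tmp:  # closes (and flushes) the temporary file on exit
            yield tmp
    except BaseException:
        # Something failed inside the with block: discard the partial output.
        os.unlink(tmp.name)
        raise
    else:
        # Everything succeeded: swap the new content into place in one step.
        os.replace(tmp.name, path)
```

Used as `with safe_open("report.txt") as f: f.write(...)`, the target file either keeps its previous state or holds the complete new content; an exception inside the block leaves no partial file behind.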
41:37 Michael Kennedy: Yeah, it's super. Now, this other one, is an interesting problem and maybe a more complicated area, but still really good, so, in Python we have, now in modern Python, you have basically three ways to do concurrency, right?
41:54 Alessandro Molina: Yeah.
41:54 Michael Kennedy: You've got asyncio, which actually uses only one thread and basically releases that thread to do other work while you're waiting on like a network or database or something, so that's asyncio and its async and await keywords. Then we have threads, and we have multi-processing. And the multi-processing really exists to get around the limitation of the GIL, the Global Interpreter Lock, for computational type stuff, right, mostly?
42:22 Alessandro Molina: Yeah, absolutely. That's the only way you can currently work around the GIL in Python, because all the other solutions that you mentioned suffer from the problem that you can actually run only one operation at a time. Which is obvious in the case of async, because that's the intended behavior of the feature itself, but it's less obvious to new users for threads, because they expect them to be really parallel. And that's why, frequently, when talking about parallel operations in Python, multi-processing is suggested, because that way you can actually go concurrent for real.
43:01 Michael Kennedy: So, the way you get around the GIL, the way multi-processing works, is in your code you say, I want to run these functions with these ten different values in parallel, on different, basically, processes, and Python will create tens of processes, or pool it up or whatever, but the point is, there are multiple other Python sub-processes doing the work, and then you get the answer back. But the challenge can be, if I'm doing threading, or asyncio, I can just have an object and share it, and change it with all the different parts of my code. In multi-processing, that memory's not shared by default. So, then how do you interact with that? Like, if different parts of my multi-processing thing are generating data and they need access to it, what do I do?
43:46 Alessandro Molina: Yeah, there are many tools in Python for doing that, but I think that the most powerful one is the Multi-Processing Manager. And there are very few people that use it every day, and even among those who know it exists, even fewer people know that it has some very cool features. Like, you can actually replace a whole database system with the Multi-Processing Manager, because not only does it allow you to share the values that you want across processes that were all forked from the same parent, like when using multi-processing to fork multiple children to do some job, but it also allows you to share data across processes that have nothing in common. They were started at totally different times, in different places, and on different machines, even. Because the way the Multi-Processing Manager works is that it actually allows you to listen on a port, so it works on a plain TCP protocol, and you fetch and store values in the Multi-Processing Manager by pickling these objects and then sending them across the network. So if you think about that, it's practically like having Redis or any other key value store in Python itself, it's already there.
45:11 Michael Kennedy: That's a great analogy, that's exactly what I was thinking. I'm like, this is like a little baby Redis that your process creates, right?
45:18 Alessandro Molina: Yeah, exactly.
45:18 Michael Kennedy: Yeah, it's pretty interesting, and the big benefit is the shared data can be changed and created over time. So for example, with the other things like queues and whatnot that you can use for multi-processing, you have to create these values up front, and they can get more tricky, right? So this lets you start and stop processes and the values sort of persist across it, and they can be created after, by the sub-processes. It's like basically a little sub-database, or a key value store, that's just live, right?
45:49 Alessandro Molina: It's like having a dictionary that is shared by all your processes, and they can insert values there, or update values there. And I think that it's very cool that it can be easily adapted to work across the network, so you can actually build real multi-processing tools, and even distribute them across machines and continue to work using the same Multi-Processing Manager.
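A minimal sketch of the "dictionary shared over a port" idea, assuming `multiprocessing.managers.BaseManager` as the Multi-Processing Manager being discussed. The names `StoreManager`, `get_store`, `start_server`, and `connect` are hypothetical. To keep the sketch self-contained, the server runs in a daemon thread of the same process; in real use it would be a separate process or machine, and clients would connect with the same address and `authkey`.

```python
import threading
from multiprocessing.managers import BaseManager

# The data the server side owns. In real use this lives in the server process.
_store = {}


class StoreManager(BaseManager):
    """Manager exposing one shared dictionary (hypothetical name)."""


# Clients receive a proxy; calls on it are pickled and sent over the socket.
StoreManager.register("get_store", callable=lambda: _store)


def start_server(authkey=b"change-me"):
    """Serve the store on a free local TCP port and return (host, port)."""
    manager = StoreManager(address=("127.0.0.1", 0), authkey=authkey)
    server = manager.get_server()
    # A daemon thread stands in for what would normally be a separate process.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address, authkey=b"change-me"):
    """Connect to a running server and return a proxy to the shared dict."""
    client = StoreManager(address=address, authkey=authkey)
    client.connect()
    return client.get_store()
```

After `address = start_server()`, any number of clients can call `connect(address)` and use the public `dict` methods on the returned proxy, for example `store.update({"answer": 42})` in one client and `store.get("answer")` in another. Values travel as pickles, so this should only be exposed to trusted peers.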
46:12 Michael Kennedy: Yeah, it's pretty interesting. So, the next one that we're going to cover is a little bit in this realm, like, I want this thing, and like, I know I could go create a real server for, but I just want to keep it nice and simple, right, we could go get real Redis, but now you have a big infrastructure thing, instead of a single Python file, right? And similarly here, obviously, you know, you have TurboGears, we've got Flask, Pyramid, Django, all these different frameworks that we could go and they're big, huge dependencies, right? They have dependencies, of dependencies, of dependencies to run. But this next example shows you can actually create a non-trivial, but somewhat basic, WSGI HTTP server right in Python.
46:54 Alessandro Molina: Yes, absolutely. That's actually, in my opinion, one of the reasons why Python has so many good frameworks. You cited like four or five, but there are like, I don't know, 100 web frameworks, or something like that.
47:07 Michael Kennedy: They do have a lot.
47:09 Alessandro Molina: Yeah, exactly, and the reason, in my opinion, is that we have so many building blocks within the Python Standard Library that it takes no more than a weekend to create your own web framework, you know? And when you start adding more complex features, of course, it gets far more complex than that, but for running a plain web framework that routes requests to functions or classes and sends the responses back, it takes no more than a few hours of work. And actually, if you already know how to do that, it's very few lines of code, like 10, 20 lines of code, and you can achieve everything you need, including your routing, request handling, returning responses, and everything.
47:53 Michael Kennedy: Yeah, that's interesting, because when I thought of what built-in HTTP servers there are in the Standard Library, I was thinking, okay, so I can create a socket and just listen for an HTTP request, that's pretty easy with sockets and stuff. But the whole routing, and all that kind of response stuff, I didn't realize was that easy to add on, right, like much of it is already built in and things like that.
48:20 Alessandro Molina: Yeah, absolutely. And not only that, you don't even need to go as far as handling the requests yourself, because actually there is a fully working application server in the Standard Library itself. By default it is single threaded, so you will not be able to serve more than one request at a time. But there is actually a mixin in the library that you can apply to the class to make it multi-threaded. So, you get a fully functional multi-threaded application server with just one single line of code, and only using what's available in the Standard Library.
48:56 Michael Kennedy: That's really cool.
48:57 Alessandro Molina: At that point, the only part you need to have on top of that is the routing, which is very easy to roll out, using regular expressions.
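A minimal sketch of those pieces, assuming the application server being referred to is `wsgiref.simple_server` and the mixin is `socketserver.ThreadingMixIn`. The `App` class, its `route` decorator, and the `/hello/<name>` route are hypothetical names for illustration, not the book's recipe.

```python
import re
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """The one-line change: handle each request in its own thread."""
    daemon_threads = True


class App:
    """Tiny WSGI application that routes paths with regular expressions."""

    def __init__(self):
        self.routes = []

    def route(self, pattern):
        def decorator(func):
            self.routes.append((re.compile(pattern), func))
            return func
        return decorator

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        for regex, handler in self.routes:
            match = regex.fullmatch(path)
            if match:
                # Named groups in the pattern become keyword arguments.
                status, body = "200 OK", handler(**match.groupdict())
                break
        else:
            status, body = "404 Not Found", "Not Found"
        payload = body.encode("utf-8")
        start_response(status, [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]


app = App()


@app.route(r"/hello/(?P<name>\w+)")
def hello(name):
    return "Hello, {}!".format(name)


def serve(host="127.0.0.1", port=8000):
    """Run the app until interrupted (blocks the calling thread)."""
    with make_server(host, port, app, server_class=ThreadingWSGIServer) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and visiting `/hello/World` returns a plain-text greeting. `wsgiref` is intended for development and small tools, so a production deployment would normally sit behind a dedicated WSGI server.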
49:06 Michael Kennedy: Absolutely, and you cover that in this simple, little one. I guess one other thing that it doesn't do, is it doesn't serve static files, necessarily, right? In this first example, but your very next recipe answers how you serve static files.
49:17 Alessandro Molina: Yeah, absolutely. That's very easy, it can actually be extended in just a handful of lines of code.
50:04 Alessandro Molina: That was one of the most boring parts of writing web applications in the past, that you had to manually go and escape everything yourself. I can still remember those huge files of code, where like every single string was wrapped in an escape call. And that's horrible, because you have to do this for every single thing, and if you forget even one, that's a big security issue in your code.
50:32 Michael Kennedy: It is, and it's not obvious, right? Because, like, you could have regular text, or you could have, like, unicode characters that mean the text, but they don't look like the text. There's just like, all sorts of weird ways that people could try to sneak through. So you don't want to try to do that yourself, that's for sure.
50:48 Alessandro Molina: You don't want to have to care. I mean, that's something I always thought, there should be some easy way that does that for me always, unless I specifically want to write some HTML in the output. And that's actually what that recipe can do, because, again, the formatter can be extended in many different ways. And one of the ways you can modify it is actually by doing the escaping of everything you provide to the formatter. So whenever you are rendering your string, all the variables that you're injecting into that string can be escaped for you, so you don't have to care. And so, when you're writing your webpage output, or your e-mail output, or whatever you're trying to send out to the whole world, you don't have to care about properly escaping everything yourself, the formatter will do that for you.
51:39 Michael Kennedy: Yeah, I really like it. You basically, just use standard string format, and wherever the input, the variable values go, they either get escaped, or not escaped based on your pattern, you can say, you can either mark them as, like, safe HTML, 'cause you want to dynamically generate them, but you need to stick it in there 'cause it's your code, or you're taking user input, and that definitely needs to be escaped.
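A minimal sketch of an escaping formatter, assuming a subclass of `string.Formatter` that passes every substituted value through `html.escape`. The class name `HTMLFormatter` and the `safe` format spec used to opt out of escaping are assumptions for illustration; the book's recipe may mark trusted values differently.

```python
import html
import string


class HTMLFormatter(string.Formatter):
    """str.format-style formatter that HTML-escapes substituted values."""

    def format_field(self, value, format_spec):
        # Hypothetical opt-out: "{name:safe}" inserts trusted markup unchanged.
        if format_spec == "safe":
            return str(value)
        # Everything else is formatted normally, then escaped.
        return html.escape(super().format_field(value, format_spec))


formatter = HTMLFormatter()
page = formatter.format(
    "<p>{comment}</p>{footer:safe}",
    comment="<script>alert(1)</script>",  # untrusted user input
    footer="<hr>",                        # markup generated by our own code
)
```

Here `page` keeps the template's own tags and the trusted footer, while the user-supplied comment is rendered as text. Only the injected values are escaped; the template string itself is treated as trusted.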
52:03 Alessandro Molina: Yeah.
52:03 Michael Kennedy: It's a cool pattern, yeah, yeah, cool. Alright, the very last one that we're going to talk about is tracing code. So, understanding what code is executed, and you had an interesting comment about how it's not just useful for debugging, but it's also really interesting to just understand what a new library does. Like, if you want to say, I'm going to run this function, what does it do, if it could actually show you the sequential Python that had executed, that would be kind of neat, right?
52:32 Alessandro Molina: Yes, exactly, that's the way I most frequently end up using this recipe, actually. When I want to see what's going on in a new piece of code, the pdb debugger is not always the best way. It's very good when you have to pinpoint a specific point and understand what's happening right there, at that moment. But when you want to get the general view of what that whole package or library or certain function is doing and how, where things go when you do something, so you want to follow the flow of the code itself, it's not as easy. You end up, like, spending hours just typing next in pdb to see where it goes next, and things like that. And what this recipe can do for you is leverage what is available in the Standard Library to trace the code that Python executed, tell you which lines of code were actually run, and show you the source code that was executed, so you can understand, hey, I called this method, and by calling that method, I also ended up calling all these other methods, and I executed these branches, and so that's why I got that answer. Now I see, now I understand how I ended up at that response.
53:51 Michael Kennedy: Yeah, I think it's great, and you could even see all of this library is actually calling into this other library.
53:58 Alessandro Molina: Yeah.
53:58 Michael Kennedy: And then, maybe even, why, like why is this a dependency? Oh I see what it's doing, it's using it here.
54:02 Alessandro Molina: Yes, exactly. You end up discovering a lot. It's like having something that goes through the source code for you. You know, whenever you want to understand a new library, you usually end up going to GitHub, or something like that, to open the source code and start reading, you know? And you'll see that here it calls that, so I go looking through the source code for where that function is implemented, and so on. You do all this work yourself, but the tracing module can actually do that for you, and so it generates a single flow of everything that happened.
54:36 Michael Kennedy: I think that's really cool, a cool way to think of it, because when you open up somebody else's code that you've never seen before, you're like, well, alright, what is important, what isn't? I'm going to have to sort of sift through this, and figure out, okay, it looks like this is where the action is, and I'm going to pay attention to this, and it's kind of a little bit of a detective job. Whereas this will only show you what executed, so you can, kind of, ignore all the other stuff, and just see the part that it actually used, that's pretty cool.
55:03 Alessandro Molina: Yes, and you just have to decorate the function that you want to trace, and you will get the output printed. So, it's very easy to apply.
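A minimal sketch of such a decorator, assuming the Standard Library `trace` module is the tracing facility being referred to. The decorator name `traced` and the example function `clamp` are hypothetical.

```python
import functools
import trace


def traced(func):
    """Decorator: print each line executed while `func` runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracer = trace.Trace(
            count=False,  # do not collect per-line execution counts
            trace=True,   # print lines to stdout as they execute
        )
        # runfunc calls func under the tracer and returns its result.
        return tracer.runfunc(func, *args, **kwargs)
    return wrapper


@traced
def clamp(value, low, high):
    """Example function with branches, so the trace shows the path taken."""
    if value < low:
        return low
    if value > high:
        return high
    return value
```

When this lives in a regular `.py` file, calling `clamp(5, 0, 3)` prints the module and function being entered followed by each source line that ran, including calls into other modules. The `ignoredirs` and `ignoremods` arguments of `trace.Trace` can cut the output down when a library calls deep into code you are not interested in.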
55:12 Michael Kennedy: That's cool, so I definitely, I'm going to try playing with this, as well. Alright, well those are the ten recipes that we chose to talk about, because we thought they were pretty cool, but there's a bunch of other ones, many, many more. How many are in the book, do you know?
55:25 Alessandro Molina: I have to admit, that I don't remember the exact number, but there are around, like, from ten to 15 recipes for each chapter, and there are 15 chapters, so. We are more than 100, for sure.
55:38 Michael Kennedy: Definitely more than 100. So, there's a lot of these types of little things in here, and I think this is a really great book, and I'm happy to highlight it, because, like I said, I learned a lot just going through this here, and I'm sure, everyone who checks it out, will learn even more, because they'll go through all of them, not just the ten.
55:55 Alessandro Molina: Yeah, I really hope that, at least a single person, by reading the book, will say, wow, I didn't know this. That was my whole purpose, for the whole time I was writing the book.
56:04 Michael Kennedy: That's cool, I think after going through all of this book, I'm going to change my answer, from 40% to 50%, of how much of the Standard Library I know. It's cool, it's really great, I appreciate the topic. So, let's leave it there for that, but I do have the two final questions for you. And I'm going to change it up, just a little bit, so. If you're going to write some Python code, what editor do you use?
56:25 Alessandro Molina: I usually use PyCharm for editing most of the big projects, and for small hacking around, I started using Visual Studio Code.
56:36 Michael Kennedy: Yeah, nice, that's exactly what I do.
56:38 Alessandro Molina: Yeah, that sounds strange, when you tell people, they always look at you, Visual Studio? No, it's a pretty good editor, the Code version.
56:46 Michael Kennedy: They are doing a really good job, and they're putting so much energy into the Python space these days, so, yeah, I think it's a great answer. Alright, that's the editor, and like I said, that's basically the same way that I am and use mine. Now, I would normally ask you a notable PyPI package, but let's mix it up, and talk about a notable, standard library module package, that you want to just highlight.
57:06 Alessandro Molina: Okay, that's an interesting question. There are so many great modules within the Standard Library, that it's really hard to pick one, but if I really have to pick a single module, I would say that the logging module is one of the most fascinating ones, not because it can be the one that you really use, like most often, or it's more feature rich, but because there's so many ways it can be set up, configured, there are so many side effects on the things you do, that you can go on learning about logging for years, and you will never know everything that the logging module can do, how they interact, or the configuration format that it supports, and things like that.
57:48 Michael Kennedy: Yeah, I think that's a great answer. I totally agree with you on that, by the way. Just like you could learn it forever, you're never done learning it.
57:54 Alessandro Molina: Yeah.
57:54 Michael Kennedy: So, people are excited about the Standard Library, how do they learn more? Final call to action, what should they do?
58:01 Alessandro Molina: Just go and read the Python source code. That's the way that I learned most things, you know, you just open the Python repository on GitHub and start reading modules. Whenever you want to see what's going on, and things like that, that's really the way that worked best for me. Because that's the only way you can actually see the hidden side effects of those functions that you use regularly but never really know why they are working that way. And that's the way you also discover that something like the formatter can be subclassed to change its behavior in different ways, or things like that, because those more advanced uses are not really documented. They are more the hidden side, I would say, of the Standard Library. But once you know that they exist, they have been there for years, and it's pretty safe to leverage them.
58:55 Michael Kennedy: Yeah, that's great. I definitely feel like there's a whole bunch of stuff to explore, and even more so after talking with you about this. So, thanks for being on the show.
59:04 Alessandro Molina: Absolutely, thank you for having me.
59:05 Michael Kennedy: You bet, bye.
59:05 Alessandro Molina: Bye.
59:08 Michael Kennedy: This has been another episode of "Talk Python to Me". Our guest on this episode was Alessandro Molina, and it's been brought to you by Linode and Rollbar. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode, that's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed. Until your users complain, of course. Track a ridiculous number of errors for free, as "Talk Python to Me" listeners, at talkpython.fm/rollbar. Want to level up your Python? If you're just getting started, try my "Python Jumpstart by Building Ten Apps" course, or if you're looking for something more advanced, check out our new async course, that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our everything bundle, it's like a subscription that never expires. Be sure to subscribe to the show, open your favorite pod catcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.