
#323: Best practices for Docker in production Transcript

Recorded on Monday, Jun 14, 2021.

00:00 You've got your Python API or app running in a Docker container. Great. Are you ready to ship it to that hosted container service and head off to production? Not so fast. Have you considered how you'll manage evolving the dependencies and addressing security updates over time? Not just for the base OS, but for the installed packages and your pip-installed dependencies? Are you running as root? You don't know? The answer is yes. We'll discuss these and many more issues with Itamar Turner-Trauring on this episode. This is Talk Python To Me, episode 323, recorded June 14, 2021.

00:48 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Sentry and Linode, and the transcripts are brought to you by AssemblyAI. Please check out what they're offering during their segments. It really helps support the show.

01:16 I have a quick announcement before we dive into the interview. Over at Talk Python Training, we just released our latest course, Python-powered chat apps with Twilio and SendGrid. Have you ever wanted to create a chatbot using Python? In this course, we'll be building an ordering system for a tech-savvy bakery. Our customers can place orders over WhatsApp, and during the chat back and forth, we'll integrate data from our Flask API, answering questions such as: What's the menu? What does that cost? And so on. Then we'll integrate this into a suite back end that sends receipts as PDFs over email, and allows our bakers to see new orders, mark them as fulfilled, and notify the customers when they're done. In short, it's cool tech and a super fun project. Oh, and one more thing: this course is free for everyone. If this six-hour course sounds fun, just click the link in your podcast player show notes and jump into the course. Now, on to the Docker best practices. Itamar, welcome to Talk Python To Me. Welcome back to Talk Python To Me. It's been some time, feels like a year or so; I'm not sure exactly how long it's been. Last time you were on, we were talking about an entirely different topic. So you get two bits on the mind map, the connected sort of relationship of topics here. We talked about Fil and profiling data science. That was fun. Yeah, I sort of have found myself talking about a bunch of different subjects. Some people are interested in both, some people are interested in one or the other. And Docker is the other thing I've spent a lot of time researching and writing about. Yeah, I think the data science profiling one was really interesting, because profiling has all these challenges, and much of it is more focused around profiling running applications or profiling code that's all in Python. 
And so if you need to profile, say, Fortran code or other weird or mix-and-match libraries, then that was sort of that topic, right? Yeah. So Fil is a memory profiler for Python, specifically for batch processes like data science and scientific computing. And so if you're doing scientific computing, there'll be a bunch of code in Fortran and C++ and Rust, and so you want to profile memory across all the languages you're using. Because if you've got some big glob of C code, Python thinks it's just a pointer. A tiny pointer. But it turns out to be huge. Yeah, so people can check out that episode if they're interested. And yeah, just give us an update on what you've been doing since then. I've actually been trying to make an alternative version of Fil that you can run in production. Profilers often have performance overhead; Fil takes like 40% off performance. So I'm trying to make something that will run with like 1% or 2% overhead, so you can run it in production and always get reports on your memory usage for any job. So if it's like six hours in and it crashes, or just uses too much memory, you can go back and look. Oh, that would be fantastic. Yeah, we have that for profiling in terms of performance. On some systems, you can plug them in and they'll give you, in real time, how your app is doing: here's where it's spending its time, or it got slower. Maybe it's even just measuring request and response. But memory profiling has typically been pretty intensive, right? So that'd be cool if you could get it down to that level. Yeah. And Fil and this project are very good pandemic projects. It's really quite difficult to do, but it's something that is completely under my control. 
And we'll get to Docker, and with Docker there's this giant ecosystem, and they all have differing opinions about how to do things, and everything is subtly broken around the edges. Whereas here, it's like

05:00 I have a box. And it's a very complicated box, but it's under my control. And so I can use it. It's kind of a relaxing environment when the world is not under my control.

05:11 It's been a crazy time, hasn't it? Yeah. Yeah. I feel like we're getting used to it. It's odd. But you know, people just get used to whatever water they swim in eventually, I guess. Yeah. Let's talk about Docker a little bit. So it hasn't been that long since I had... Also, I want to give a quick shout-out: episode 274 is when we talked about Fil, if people want to go back and check that out. But I had Peter McKee from Docker come on and talk about what Docker is, give us an update on Docker the company, and sort of set the stage for Python developers to get going on the dev side and just start using Docker. So that was episode 308, and that was fun. But then recently, you gave a talk at PyCon called 'Zero to production-ready: a best-practices process for Docker packaging'. And I thought that was really interesting and wanted to have you on the show so we could dive into Docker best practices for Python. But your focus is really on production, not necessarily development, right? Yeah. Maybe we start there. What does Docker look like for software development, as in 'I just need to make my stuff run so I can code it and test it out', versus zero-downtime Kubernetes or whatever it is you're trying to do? What do those two worlds look like? And maybe tell folks when they should care and what the advantages are. Yeah, so what Docker gives you is a sort of package that contains all the files you need. The file system contains Python, it contains all the system libraries you need to run your Python extensions, it contains all your Python dependencies, it contains all your code, and it contains a script to launch your code. 
And so as a starting point, this is useful for development, because if you're, say, on macOS or Windows and you're deploying to Linux, you can run something locally that is the same across different computers. Even if you are on Linux: I have one machine that's Fedora 33, I have another machine that's Ubuntu, and they're different in a bunch of subtle and not-so-subtle ways. And so with a Docker container, when I'm developing, I can have a completely consistent environment. And then when I want to take that code and run it in production, that environment will be exactly the same there. Right. I had somebody reach out to me a little while ago and ask something to the effect of: I've got a bunch of different developers on my team, and I want to make sure that they all have the same version of Python and the same packages. And that's a legitimate thing that you might want to do; you might want to make sure that those are exactly the same. I think, in general, there's probably more concern about that than an actual problem there. A lot of times, either these things are going to basically work or they're going to utterly fail. 
I think one of the scenarios where it maybe matters more is data science, where slight changes in algorithms might lead to different ways you train the model, which might lead to different results. But in, say, web apps or UI apps or something like that, it's either just going to work or it's going to completely break. That said, this situation you're talking about with Docker for development kind of solves that, but to a much bigger degree, right? Because you can specify: in this image, we have exactly this version of Python compiled in this way, we have these libraries installed with this version, we have these environment variables set, and this subsystem of Linux installed as well, but not that other one. You can control it way more completely than just 'I want the same version of Python', right? And you can then go further with something like Docker Compose to start up a little network of containers. And then it's very easy to say, OK, I want to spin up Postgres, or I want to spin up Redis. Whereas traditionally this would be a pain in the ass, with Docker and Docker Compose you can spin up all your dependency servers really easily. Even if you're not using Docker for your own code, you can use Docker for the services you depend on to really easily spin them up. Right: I need Redis running in this way, and Postgres in that way, and I just need them all configured and able to talk, so docker-compose up, something like that. Yeah, yeah. Docker Compose is a way to run a network of services, and it makes that really easy. Yeah. Another big advantage, before we get off the development side of things, is onboarding new people and new hardware, right? If you've got something really complicated like that and you get somebody new on the team, instead of spending a lot of time trying to get their system put together in the right way. 
You just say: install Docker, do this. And yeah, there are open source projects that will also provide a Dockerfile for the development environment, just to let you run some tests easily. Because you're not an ongoing contributor, you're only submitting one patch, and you want to run some tests

10:00 on it, you don't want to set up the whole thing. So it's very nice when they provide a way to run the code in Docker. Yeah, absolutely. That said, I don't generally do my development in Docker; I just have virtual environments and roll with that. So it's not always required. A couple of thoughts from folks out in the livestream. Kim Van Wyk, hey Kim, says Docker Compose is an excellent way to make sure all the developers are using the same tools and versions, and it's just much easier to pass around a YAML file. Yeah, Compose. I remember when Compose first came out and was called Fig, I think it was. It took Docker from something really neat to something really useful. Yeah. Right. The promise of Docker is that I can have all these different pieces. If I want to run, like we just described, Redis, maybe a Celery back end, and Postgres, and then my dev code is going to run and talk to all that, well, keeping those all up to date, making sure they all build and they all run, and maybe that they run in the right order, all of a sudden isn't fun anymore. But if you can create a Compose file and just say, here's the set of containers that need to work together, bring them all up in the right order, and make sure they're all up to date with their most recent build, and so on, that's a whole other level of the promise of containers. Yeah. Also, I don't know if you know anything about this, so I'll maybe take a wild guess here, but Daniel Chen out in the livestream asks: on Windows, is there any difference between WSL 2 for the Docker back end compared to Hyper-V, or is Hyper-V more for backwards compatibility and legacy support? I, in general, don't use Windows that much. But basically, there's two. Since you're running Linux containers and Windows is not Linux, you need to have some way of running Linux. In the past, the way you'd do that is you would run a virtual machine. 
And that's what it does on macOS too. I believe Windows Subsystem for Linux is a way to transparently run Linux applications on Windows, and Docker supports it these days. I suspect it would be faster, but that's just a guess, so I don't have a good idea. That's my thought as well, that it would be a little more integrated. Probably you could more easily do things like mount Windows file system folders from your Docker container. Maybe you still can with the other, probably you can, but I'd expect it to be faster at least. Yeah, yeah. It seems like if you want to run Linux, you're probably closer. It's definitely more lightweight. Hyper-V would be running a full-on Linux VM and then hosting Docker in that, I'm pretty sure. Yeah. Cool. All right. Well, hopefully our guesses are helpful for you, Daniel. All right, let's talk about the talk that you gave at PyCon. I mean, giving talks at conferences today, like we said at the start of the show, it's a weird world, right? I'm both giving a talk and doing a live stream podcast tomorrow, at the Manning developer productivity conference. How can I do that? We record it, we publish it, and then we have a live Q&A afterwards. So the presentation of my recording will be during the Python Bytes recording tomorrow, but the live interactive bit will actually be after. So yeah, that's the conference world we live in. And so PyCon this year was virtual, and you put together a really nice presentation, sort of in this format. And like I said, I got a lot out of it, and I liked what you covered there. Yeah. And so the starting point is you have the service, and now you want to run it in production. And this is a very dramatic departure from running things locally, because locally, the thing you're prioritizing is basically your development feedback loop. 
Like, if you're a web developer, you save your code and reload the page to have the new stuff running. For other sorts of applications the feedback loop is a little different, but your goal is just to interact with your code as quickly as possible. When you're in production, you have to worry about a whole bunch of other issues, because you actually have users who are going to be interacting with the software, or the data it's emitting will be used in the real world. It's no longer just something you're working on. The output actually has some weight and meaning, some importance, so you have to approach it in a different way. So some of the things that come to mind here would be downtime. In a perfect world, zero downtime. In a reasonable world, a couple of seconds of downtime. In the world of some bizarre web companies that I literally cannot understand, eight hours of downtime, because 'we're deploying the new version of the site, so Sunday it'll be down'. Like, what? I literally got this a couple of months ago for something I was using: there's going to be hours of downtime for a site, just because they said they needed to upgrade to the new version of the site. That should be a button, folks. That should not take long. Anyway, one of the things is downtime, right? You want to focus on that. And you don't care about that at all with development. I mean, you want it to be somewhat responsive, but it doesn't matter if it's down for a moment.

14:52 This portion of Talk Python To Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users might be having

15:00 difficulties or encountering errors in your app right now? Would you even know it until they send that support email? How much better would it be to have the error and performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in that report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got their support email. That was a great email to write back: we saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users today. Create your Sentry account at talkpython.fm/sentry. And if you sign up with the code talk python 2021, it's good for two months of Sentry's Team plan, which will give you up to 20 times as many monthly events as well as other features. So just use that code, talk python 2021, as your promo code when you sign up.

15:59 Another one that you made a big deal out of, and that matters, is security. You don't want to be on the front page of the newspaper or the news website for the largest data breach ever or something like that, right? Yeah, that's embarrassing. Basically, once you're packaging for production, you're at the intersection of a whole bunch of processes. This is where it starts getting complicated. You have your coding, and then you have this image, and then you might want to run some tests with it, maybe integration tests, and you're going to deploy it. And then when you deploy it, you might be upgrading an existing server, or for a batch process, things are a little different. So there's deployment, and then things might go wrong in production. And then you might have some sort of feedback mechanism, and maybe you're going to try to reproduce the bug locally. So all of these different technological and organizational processes interact in some way with your packaging. And so it basically makes it a lot more complicated. And then you add into it all the different technologies that are intersecting in packaging. There are a lot of details to get right, and it gets complicated very quickly. Yeah. Another area that you want to get into has to do with making sure that you're running the latest version, but you're not necessarily, every deployment, just grabbing the latest version. So you need some way to inject stability, and you need some way that that stability doesn't lock in vulnerabilities or any of those kinds of issues, and also allows it to keep growing, right? Yeah. And this is one of the more significant examples of the bigger picture, which is packaging as a process. It's not just about writing some configuration files. It's going to interact with the way you write code, and parts of it are going to continue over time. 
So when you're packaging for production, you're not just writing a few config files and calling it a day. You actually need to think about and set up these ongoing processes for things like security updates and dependency updates. Right. It's one thing to get it running on a container cluster; it's another to say, here's how we're going to keep the software healthy and running over time. Right? Yeah, you need to think through the implications of what you're doing. It's not just a one-off thing, it's an ongoing thing. Yeah, absolutely. All right, let's dive into some of the details. So I discovered today, as I was pulling up your website, that you've actually written a whole bunch of stuff about production-ready Docker packaging, and that you're actually working on a handbook. I end up doing this a lot as well. I'll spend, you know, a month doing tons of research and examples and thinking about a course, and then it's like, oh, there are a couple of nice presentations or conference talks I could pull out of here. And yeah, it's a good way to do it. So you've been thinking a lot about this, not just for this talk, but beyond, right? Yeah, I've spent about two years on it so far. There are like three different products up there, I've done training, there are a lot of articles. These days it just adds up. And I've spent a lot of time looking into this, because it turns out... I should say, I don't like Docker packaging. OK? This isn't a thing I'm doing because it's fun. It's not actually fun. It's kind of a pain. It's just very useful, and it's very easy to get it wrong or to miss things. And so what I've been trying to do is sit down and say, here's this really useful thing, here are the details you need to get right. And now that I've written them down, you don't have to waste your time trying to figure this out, because much of it is not... It's really useful. 
But you don't feel like you're a better person for having figured this out. It's an obstacle, and I'm trying to get people past those obstacles so they can use this useful technology. Yeah. Well, there's a lot of stuff you talk about that is not necessarily front of mind, like security, like how to manage the versioning over time, and so on. But I think it would also be

20:00 quite satisfying to take something that's janky, that maybe does have that 'on Sunday, we're going to be down from three to four, upgrading our site' one-hour banner, and be able to remove that and say: no, we just deploy a couple of times a week now, and we don't think about it, because it's git push prod, wait 30 seconds, and then prod is the new one, right? Yeah, I think that's a very good feeling. Yeah. And there's a bunch more you need to do for that. What I'm talking about is one piece. There's also the deployment process, and there's having good observability and logging. But even just in the packaging part of it, there are a lot of details to get right that can make it a lot easier. Yeah. So the way you started the presentation, your first thought was that packaging your app in Docker for production is an iterative process. Yeah. And maybe also layered, right? So you said you don't necessarily start with 'we want zero downtime'; you start with 'can we make it run in Docker?' Yeah. And you don't have to do it as an iterative process. If you can manage to keep all this in your head, which honestly I can't, there are too many details, go ahead. And if I were to start Dockerizing something, I would probably do a bunch of it in one go, because I remember some of it. But if you're doing this at your job, and you probably are, there's going to be an emergency, or someone's going to pull you over, or there's going to be a bug that you have to fix. So you're getting pulled away at some point. And so what you'd like to do is build your packaging in a way where, if you're interrupted and you have to put it on hold, you can put it aside and know you're at a good stopping point. And you want to prioritize the most important things. 
You may just run out of time budget: you have limited time, so you want to do the highest-priority things first, and then if you ever have more time, you can... Well, it sounds like what you're describing is a little bit of what happened on the software development side of things when it went from waterfall to agile, or waterfall to something better, right? So many projects used to say, what we're going to do is work on it for six months, and it's not going to be usable in any meaningful way until that six-month point. And then maybe it drags on and goes over budget and gets canceled. You get no user feedback; there are all these kinds of problems with trying to build software that way. So it doesn't surprise me that there would be lots of advantages to applying that same sort of iterative thinking: let's make sure that at each step along the way, we have something that's useful, and more useful than it was before. Yeah. Also, even if you know you have the time budget to do all of it, having a good understanding of the priorities means you can focus on the really important things. Large images, for example, are very visible. You can look at an image and say, why is this two gigabytes? This is ridiculous. And then you can go down this rabbit hole of trying to make your image smaller. And that's a fine thing to do. But if you deploy an image that's nice and small and insecure, that's not the best trade-off. I would venture to say that your organization would not praise you for your efforts to make it small; rather, they would be pretty upset about the security problems, right? Yeah. And so security is a high priority, automation is a high priority. And the order you actually do it in might be slightly different depending on the particular domain you're working in. Right. 
But you want to think about it as a bit of a priority stack, right? What is most important to me, or what is most foundational to this whole packaging and Docker process? Yeah. So you actually put up six points, or sort of stages, in your talk. First, get something working: can you use Docker at all? Obviously, if it doesn't work, it's not going to be useful. And then number two, even before continuous integration, security. I can see that people would overlook that, but it's not trivial, right? Yeah. And security is sort of a never-ending thing, because you have to deal with security updates. But you don't need complete automation to spin up a server somewhere and test something. If you're using Heroku, you can push a Docker image to Heroku and spin up a server; you can manually do a git push. Yeah, right, push stuff and then try it out. But if it's talking to a production database and it's not secure, that's a problem. And packaging is only one small part of security, and most of it is going to be application security, but it still comes into it, and it's still important. Right, right. Little Bobby Tables is still a problem, even if it's running in Docker. Yes.
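The "Little Bobby Tables" remark refers to SQL injection, which is an application-level problem that no amount of container packaging removes. As a minimal sketch of the usual defense, here is a parameterized query using Python's standard-library sqlite3 module. The students table, the find_student helper, and the use of an in-memory SQLite database are all assumptions made for illustration; nothing here comes from the talk itself.

```python
import sqlite3


def find_student(conn: sqlite3.Connection, name: str) -> list:
    """Look up students by name using a bound parameter, not string formatting."""
    # The "?" placeholder makes sqlite3 treat `name` purely as data,
    # so quotes or SQL keywords inside it are never executed.
    cursor = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cursor.fetchall()


def demo() -> tuple:
    """Run a hostile lookup against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES (?)", ("Alice",))
    conn.commit()

    hostile = "Robert'); DROP TABLE students;--"
    rows = find_student(conn, hostile)  # no student has this literal name
    remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
    conn.close()
    return rows, remaining


if __name__ == "__main__":
    print(demo())
```

The hostile string is stored and compared as an ordinary value, so the table survives the lookup. Other database drivers use different placeholder styles, so treat the "?" syntax as specific to sqlite3.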

24:31 That's right. OK, so number two is security, and then number three is continuous integration: making sure that when we check in code, it's tested in Docker. The tests are run on that system, something of a docker-compose up type of thing happens, and it's all good. Then correctness and debuggability. Correctness is obvious, right? It needs to work, it needs to have fresh data, it can't have stale caches and weird things like that. But debuggability is interesting.

25:00 Maybe we want to focus on that for just a sec. Sure. So the idea is that once you have automated builds, you might start actually running things in production for real. Or even if not, if you're building an image for every pull request, now you have a bunch of images. And so someone files a bug. How do you know what version of your code it was? Do they know which Docker image they were using, and what version of the code it matches? If something crashes, are you going to get logs that someone can report, or not? If you don't go to the effort of exporting the volume where the logs get written, every new deploy gets a new, fresh set of logs. Yeah. So logging to standard out or standard error is the other way you deal with logs in Docker, but you need to put some minimal thought into where your logs are going to go. Yeah. Another thing that gets really tricky around that kind of stuff, I feel, has to do with the fact that there are just so many moving parts. A lot of times, you've got your Celery Docker container, you've got your Redis Docker container, you've got your Postgres Docker container, you've got your app Docker container, and if you're doing microservices, who knows how many. And then all of those things have logs. Do you use any of the services that try to bring all those logs into one place? I don't have a huge amount of experience with any particular one. I've built a system like that, which these days I wouldn't recommend using; it's more targeted at scientific computing. But that was before log aggregation as a service was a thing. Yeah. I've worked on an airline reservation system with one of these, and it was an eye-opener for me, being able to see logs going between five services over multiple different protocols. It made debugging vastly simpler. 
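To make the debuggability points above concrete, here is a minimal sketch of logging to standard out with a build identifier stamped on every line, so a bug report's logs can be matched to the code version that produced them. It uses only Python's standard logging module. The GIT_COMMIT environment variable, the "app" logger name, and the helper names are hypothetical; the assumption is that the image build sets some such variable.

```python
import logging
import os
import sys


class VersionFilter(logging.Filter):
    """Attach a build identifier to every log record."""

    def __init__(self, version: str) -> None:
        super().__init__()
        self.version = version

    def filter(self, record: logging.LogRecord) -> bool:
        record.version = self.version
        return True  # never drop records, only annotate them


def configure_logging(stream=sys.stdout) -> logging.Logger:
    """Send logs to a stream (stdout by default) rather than a file in the container."""
    # GIT_COMMIT is a hypothetical name; assume the image build sets it.
    version = os.environ.get("GIT_COMMIT", "unknown")

    handler = logging.StreamHandler(stream)
    handler.addFilter(VersionFilter(version))
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s version=%(version)s %(name)s: %(message)s"
    ))

    logger = logging.getLogger("app")
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    configure_logging().info("service started")
```

Writing to standard out leaves collection to the container runtime, which is why the logs survive a redeploy without a mounted volume. Whether to stamp a commit hash, an image tag, or both is a choice this sketch does not settle.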
Yeah. So I really like anything that lets you trace across services. Having a tracing ID will make your life much easier once you have more than one service. However, my recommendation is to avoid microservices unless your company has 500 people or something like that. Yeah. It's an amazing architectural design pattern when there needs to be autonomy for different parts of the application: this team works on the front-end bits here, this team works on the user authentication and identity APIs. But, you know, looking at the most recent PSF and JetBrains survey, that is not the number of people on a team for most Python developers. It's a handful of folks. Yep. All working on a lot of it. Right? Yeah. And if you look at the companies that are doing microservices successfully at scale, they will have a team of, you know, three to five people working on one service. So if you have a team of five people and 20 services, you are doing like 100 times more services per developer than the big companies. Yeah. And that's a lot of complexity you've just added to your life, and it is often unnecessary. Yeah. Let's see, I'll pull up the survey. So it's the Python Developers Survey 2020 results. If you search in that for 'team' (not TeamCity), there's working in a team versus working independently, and about half the people work on a team. But if you look at the team size, 75% are two to seven. Yeah, that should be one microservice. Yeah, really. Microservices are actually distributed systems, and there are applications where it actually makes sense to do more. But any time you make something more distributed, you are adding a vast amount of complexity. And so if you can avoid it, avoid it. Yeah. Well, here's the way that I think about it. I think microservices have tons of value, and they move where the complexity of your application lives. 
So what you end up with is relatively simple, small, easy bits of code. But what you also end up with is a much more complex deployment, DevOps, and coordination story. So when I think about microservices, the farther you go towards microservices, the more you're taking the code complexity of a large app, its architectural patterns and separation, and saying, well, we don't need any of that, let's make it real simple. And we move that complexity to coordinating a bunch of services that are always up, that are debuggable across services, and versioned, and all those things. And when I think about that, for me, I'm way better at software complexity than I am at deployment complexity. So I'm more successful not putting it where I don't have the experience or skill set. You know, it's actually software complexity too, because if you call a function, you're going to call a function. If you send a message to another service, it may never arrive, or it may be delayed arbitrarily. Yeah. And so the communication and its reliability change: switching from calling a function within the same process to calling a remote service is a huge increase in unpredictability and sources of error. That's a good point, because of the many things that could go wrong with, say, calling some function or a class-level method, not having it get called is not one of the

30:00 things you have to worry about. Yeah, you might crash because the file system isn't accessible or the database isn't there, but it's not that you couldn't even call it, right? That's gonna happen. Yeah. Before we move on, Kim says Filebeat, Logstash, Prometheus, and Portainer (oh, Portainer, I've never heard of that one) can all help with logs from Docker in various ways. Awesome. Good resources to check out. Okay. Last one on your quick hit list before we dive into some of the details is faster builds and smaller images. I accidentally skipped reproducible builds. Who needs reproducible? Yeah, let's go with reproducibility. Yeah. And so this is the static versus dynamic change of dependencies you talked about, where, on the one hand, you really don't want to get the latest dependencies every time you reinstall the application, because when a new version of Django comes out, you don't want your code to suddenly start running on it just because it came out. Right, and maybe you're not aware, right? Because you have the older version of Django, as you suggest, and you're working on it, you git push, it goes to CI, and then that, ideally, is going to some sort of continuous delivery. And it grabs just the latest, which is not what you had, and then it runs with it. That could be bad news. Yeah, yeah. This happens a lot with dev tools, and then it's like, I'm gonna go check if there's been a new release today. And that matters less. But when it's your actual production software, it's not "Oh, I wasted 20 minutes figuring out a failed build," it's "my code is acting weird in production." Yeah. On the other hand, if you just freeze all your dependencies and never change them, then at some point you're going to be running on a version of software from two or three or five years ago. The extreme cases, actually, are organizations still running on Python 2.
And this becomes very problematic, because upgrades become more and more terrifying the more you put them off, because it's not just Django. You have to upgrade Python and Django and three other major libraries you depend on. And it's a project that's not features or bug fixes or anything, it's just risk. And it gets worse the more you put it off, so you need a process that's ongoing. So in the short term you need to make sure your builds are identical and reproducible, or mostly identical. And in the long term, you need a process to continuously update, so that upgrades are not this terrifying thing. They're just a standard part of your development process. Yeah, I had Carlton and Will on from Django Chat a while ago talking about deployment. And we came up with the idea that there are basically two types of applications. There are ones that you're going to continue to add features to and that you care about, and you have maybe a team dedicated to it. And for those, you never want to be that far from the latest thing. Just like you described, the farther you get, the more frightening it is and the more potential problems you have when you take on the latest, right? Because if you're on Django 2 and Django 4 or 5 is out in some future world, you're like, well, we finally need to move to that because the other one has gone fully unsupported. Well, that's, like you said, a project. If you're always kind of just rolling onto the later one in dev and then deciding to roll that out, that's a much smaller challenge, so those should absolutely stay current. Also, we talked about a type of app that falls into the "please don't touch it" category. And if you do touch it and break it, it is now your baby. It's some horrible legacy code, and the person who created it probably doesn't work at the company anymore. Nobody really likes it. It's not important, but it needs to be there.
Like it's some internal app or something, right? Maybe those you just freeze in time. They're very likely not public facing or something. But certainly if you care about continuing to work on this thing and adding features to it, and it matters, then keep it not too far behind. That's the tension. You don't want to just constantly ship the latest thing, as maybe that's a major release of some library, but at the same time, you don't want to freeze it.
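
One way to make the "identical in the short term" half of that concrete is to check that every dependency is pinned to an exact version. The following is a minimal sketch, not a full requirements parser: the function name `unpinned_requirements` and the sample requirements text are illustrative assumptions, and the pattern deliberately ignores extras, environment markers, and hashes.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "django==3.2.4".
# Simplified on purpose: no extras, markers, hashes, or wildcards.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.!+]+$")


def unpinned_requirements(requirements_text):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file contents, for illustration only.
example = """\
django==3.2.4
requests>=2.0   # floating: a rebuild may pull in a newer release
numpy
"""
print(unpinned_requirements(example))
```

A check like this only covers the reproducibility side. The long-term side still needs a regular process that bumps the pinned versions.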

33:54 This portion of Talk Python To Me is sponsored by Linode. Visit talkpython.fm/linode to see why Linode has been voted the top infrastructure as a service provider by both G2 and TrustRadius. From their award-winning support, which is offered 24/7/365 to every level of user, to the ease of use and setup, it's clear why developers have been trusting Linode for projects both big and small since 2003. Deploy your entire application stack with Linode's One-Click App Marketplace, or build it all from scratch and manage everything yourself with supported centralized tools like Terraform. Linode offers the best price to performance value for all compute instances, including GPUs, as well as block storage, Kubernetes, and their upcoming Bare Metal release. Linode makes cloud computing fast, simple, and affordable, allowing you to focus on your projects, not your infrastructure. Visit talkpython.fm/linode and sign up with your Google account, your GitHub account, or your email address, and you'll get $100 in credit. That's talkpython.fm/linode, or just click the link in your podcast player show

35:00 notes. And thank them for supporting talk Python.

35:05 Quick comment back to them. Monolith Holdings asks, can you speak to microservices versus monolith, in particular for ML applications? I think that's a little bit different. I haven't really thought of it from an ML perspective, but a decent rule of thumb is: are you working on a web application where there are hundreds of developers working on that application? If the answer is yes, then someone in the organization is going to, like, write microservices. Anything smaller than that, just don't think about it. Once you're small enough, I tend to see Kubernetes the same way. There are a lot of technologies for a company with 5000 people, or 500 people, or 50 people, or five people, or one person. At each organizational size you're gonna want different technologies and different architectures, because whether an application has 500 developers or five developers, your ability to specialize and your ability to build infrastructure are different. And so if it's a thing for an organization that has thousands of developers working on it, like, are you building Pinterest? Probably not. Then the technology choices Pinterest makes may not be relevant to you. Yeah, I'm thinking maybe another consideration is how much of that functionality is shared. Are you building an API that has some models that make some prediction that a whole bunch of your company's different apps and websites and such might need? Then, you know, maybe that's its own thing. But if it's only being used in one place, maybe not. There's an interesting article from 2019 that might be worth people checking out, called Give Me Back My Monolith, from Craig Kerstiens. Anyway, I'm not gonna go into it here, but it's kind of an interesting read. People can check that out if they want. All right.
We talked about faster builds and smaller images. Tell us about that. For people who are new to Docker, who haven't done a lot with Docker, there are a lot of things you can do to end up with a smaller image size, right? A physically smaller file on disk. It's very easy to get a giant image in Docker, because the Docker image format is in many ways like Git history. Every time you make a change, it's not overwriting, it's adding. So there's a history there. The history is always there. So if you delete a file, it doesn't make that image any smaller if the file was added in a previous layer, right? If it was added in a different layer. That's right. Yeah, if you structure things right, there's a bunch you can do to make your images smaller. And similarly, Docker has a bunch of features that let you not have to run pip install every single time you rebuild your image when the dependencies haven't changed. So it can just cache those files for you. But you have to set it up right. Do that and you can go from a half-hour build to a one-minute build, depending on how you build your Docker image. Alpine Linux is my current favorite example of that, though maybe that's going to get fixed over the next year. One of the things that's super interesting about that is that the ordering, as well as the grouping, of those commands can really matter. So for example, say the first thing that you do in your Dockerfile is copy over your source code, the next thing is an apt update and apt upgrade, the next thing is installing all the dependencies, and so on and so on, right? Then every time any file that you're working on changes, even an unrelated CSS file, everything below that has to be rebuilt, right? Yeah. And if you make a point to say, well, let's reorder that so the very last thing we do is copy our files over, then as you make changes, like you want, those other layers will just be up to date.
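
The effect of step ordering on the cache can be sketched with a toy model. This is not Docker's actual implementation, only a simplified illustration of the rule described above: once one step's inputs change, that step and everything after it are rebuilt. The step names, the `rebuilt_steps` function, and the separate requirements-file copy in the reordered variant are assumptions made for the example (the install step needs something to install from).

```python
def rebuilt_steps(steps, changed_inputs):
    """Return the steps that would be re-run in a simplified layer-cache model.

    `steps` is an ordered list of (name, inputs) pairs, where `inputs` is the
    set of files a step copies in. Once one step's inputs have changed, that
    step and every later step lose their cached layer.
    """
    rebuilt = []
    cache_valid = True
    for name, inputs in steps:
        if cache_valid and inputs & changed_inputs:
            cache_valid = False
        if not cache_valid:
            rebuilt.append(name)
    return rebuilt


# Source code copied first: any source edit invalidates everything after it.
copy_first = [
    ("COPY . /app", {"source", "requirements"}),
    ("RUN apt-get upgrade", set()),
    ("RUN pip install", set()),
]

# Source code copied last: the expensive steps stay cached.
copy_last = [
    ("RUN apt-get upgrade", set()),
    ("COPY requirements.txt", {"requirements"}),
    ("RUN pip install", set()),
    ("COPY . /app", {"source", "requirements"}),
]

# Editing one CSS file only changes the "source" input.
print(rebuilt_steps(copy_first, {"source"}))
print(rebuilt_steps(copy_last, {"source"}))
```

In this model the first ordering re-runs all three steps after a source edit, while the second re-runs only the final copy.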
And one cool trick that I've seen that can make that even better is, somewhere in that intermediate bit, you can copy just your requirements or pyproject.toml file over and install from it, and then copy the rest of your code over. It looks like a stupid duplication. Like, why are you copying this one file when it gets copied again in the next step? But that way you can cache that pip install or pip-compile step and make it faster. Yeah, basically, if you understand how Docker caching works, then you can structure your Dockerfile in the right way, and then you just get caching. You need extra steps to get it working in CI, but you can get much faster builds that way. The fast thing is super clear. Tell us about smaller, though. What are you really focused on for smaller? So some of it is just things where various packaging tools are optimized for development by default. So if you do pip install of something like NumPy, this can be pretty big. Say, let's go with the big packages, things like TensorFlow. These packages are hundreds of megabytes. They're just huge. So you download the package, and then it unpacks it and installs all the files. And then by default, pip will keep a copy of that downloaded file in a directory. Probably the intermediate build output, the wheel file you downloaded, potentially all that stuff. Yeah, let's keep a copy of that.

40:00 Wheel file. The idea is, you might be creating another virtualenv in another two hours. And so when you do pip install this time, it doesn't have to download it, it can just use the cached version. And right, for development, that's great. But for the Docker image, you are never going to call pip install again. So keeping this file is just like an extra 400 megabytes of disk space. And so there's a command line option for pip install, --no-cache-dir, and then it doesn't keep a copy. And now we've freed up a bunch of space. That could be fairly considerable, especially for data science tools, added up across all the dependencies and their transitive dependencies and so on, right? Yeah, that's just not storing files you don't need. One of the mind shifts you've got to get into to work with this stuff is that you will never, ever change the Docker image in place, right? It's not like, oh, there are updates to Linux, so I'm gonna go in and apt update it, or there's an update to my requirements, so I'm going to reinstall the requirements. You create a new Docker image and you throw away the old Docker image, right? So a lot of the things that are there to make that next step, pip installing again, work well are just liabilities and negative effects on your Docker image, right? Yeah. Yeah, Docker images are designed to be treated as immutable artifacts, which is sort of great. But you're also dealing with a whole bunch of tools that don't really have that assumption. And so you have to figure out ways to make those two conflicting goals work together. Yeah. Another interesting thing that Peter McKee brought up, in the other episode we did not too long ago, was intermediate frameworks and all sorts of stuff. It doesn't apply super well to the Python space, but maybe there are certain aspects, especially on the data science side, that might.
So for example, if you're going to install the development setup for Python, not just the ability to run, but to do pip installs and all sorts of things. The example he gave was, if you're going to have something that runs in Go, well, one of the steps you might do is, we're gonna install the Go compiler and all that business, and we're gonna compile the artifact and then run it. Instead, you create a separate container that will take the code and compile it and give you the binary, and you just copy the binary, without the compiler, back in there. So maybe there are some techniques like that. I don't know. I don't see it working quite as well in Python, because we can't package it up as reliably. But yeah, so this happens a bunch if you're compiling your own custom C extensions. Yeah. So one way you can do that is, you can have a thing that generates wheels, and then when you build your Docker image, it downloads the compiled wheel. But if you want to do it in your Docker image, you're gonna have to install a compiler, and then, you know, that compiler package is going to be in your final image. So it just makes your image bigger. You don't need GCC at runtime. And so you can use a multi-stage build, which is probably what he was describing. Yeah. So one easy way to do that is you create a virtualenv and install all your code into it, and then you copy just the virtualenv into a new Docker image, and then the new Docker image just has the resulting self-contained virtualenv and doesn't have any of the compilers needed to build it. Right, no matter what else might have been over there. Yeah, maybe you could even use something like PEX, if you really cared, to bundle that into a zip and then run that directly. I'm not sure. But possibly. Tim out there in the live stream says, --no-cache-dir has made my evening, thanks.
I never thought about the intermediate files from pip, only from the apt path. And also, the Docker ONBUILD option can help a bit with that scenario.

43:36 Okay. I think they deprecated ONBUILD, but I could be misremembering. That would be too bad, because I just learned about it.

43:44 Okay, those are the six things that you talk about in this iterative process, or this layered process. Step one, your first deliverable or your first "package this as Docker" sprint, would be to get something working in either a single container or a suite of containers from Docker Compose. Step two is to make sure they're secure. Step three, getting them running in CI. Step four, make sure that they're correct and debuggable. Number five is reproducibility, with that balance of not exactly the latest but not super old and stale. And then finally, fast builds and small images. Yeah. And along the way, there's a whole bunch of different things you can do, depending on what tools you're using and what your priorities are. And yeah, I can maybe give some examples if we have time. Yeah, absolutely. We've got a little bit of time. I thought that'd be fun. So we could dive into just "get something working", which is like, I have

44:36 a couple of lines in Docker. Like, see, Docker's simple. Yeah, let's face it. Yeah, you choose a base image, copy in your code, run pip install, and say, this is what I want you to run when you start up. For many applications, that'll do the trick. So I'm always wondering what is a good container base to start from. Right, so you have this python:3.9-slim-buster version as the base. Yeah, there's a bunch

45:00 of different options, right? What do you think? So the first thing is, these are all typically based on Linux distributions. And so you want a Linux distribution that has some sort of long-term support, where they are guaranteeing backwards compatibility, both in terms of binary APIs and in terms of features, but they're also doing security backports. So Debian stable, Ubuntu long-term support, Red Hat Enterprise Linux, they are all going to give you this stability guarantee. They'll say, we'll give you a stable operating system with security updates to it. And so you probably want something based on one of those. And then you are typically going to want an up-to-date Python. These distributions will sometimes backport new versions of Python, and so you can use that. So you can say, I'm going to use Ubuntu long-term support. The Ubuntu long-term support release from 2020 has Python 3.8, and maybe they just added 3.9, I'm not sure. Something to that effect. Or else Docker maintains these things that are the, in quotes, "official" Docker images for Python. And basically, they take Debian stable, and then they compile all the different versions of Python for it. So you can get 3.7 or 3.8 or 3.9, and 3.10 when it comes out, regardless of what's in Debian stable. So it's Debian stable plus an extra Python. So python:3.9 is Debian stable plus 3.9. And they have two variants, one has a bunch of extra packages and one has fewer. The one with fewer packages is the -slim. And then the -buster is which version of Debian you're using. You don't have to specify that, but at the end of the year, or maybe early next year, there's going to be a new version of Debian stable. And so you probably don't want to go overnight from Debian 10 to Debian 11 as your base image. You would probably want to at least do that consciously, right?
And so saying -buster means I want to stick to Debian 10, Buster. And for those who don't know, Debian Linux releases are named after Toy Story characters, so that's one of them, right? Yeah. And I don't remember the next one. There's Debian unstable, which is named Sid. There's always Debian unstable, and it's always Sid. They never release it. That's cool. All right, so in this example, the Dockerfile says FROM python:3.9-slim-buster, which means all the stuff that you described there, and then you copy your files over, you run pip install to install the dependencies, and then you just basically start your app as the entry point. And that is it, we've got something working. This is probably an oversimplification, there might be a database that also needs to start up and run its bits and so on. But yeah, that's pretty much it, right? It's typically pretty simple to just get everything working, because it's just going to install some packages, and then you tell it to run this script when you run a container. Yeah, it's pretty much whatever you need to do to get a new machine set up to run this, do that in this file, and you're good to go. Yeah. And I guess I see a comment in the chat, and I should add that, as far as I can tell, Docker ONBUILD is not deprecated. But I'm not sure. So, all right. All right. I'd love to look at it.
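
The naming scheme described above (Python version, optional slim variant, optional Debian release) can be spelled out in a few lines. This is a rough sketch that assumes only Debian-based tags of the form discussed here. The function name `parse_python_image` and the returned keys are made up for illustration, and it does not attempt to handle other tag styles.

```python
def parse_python_image(reference):
    """Split a reference such as 'python:3.9-slim-buster' into its parts.

    Assumes the tag looks like '<version>[-slim][-<debian release>]'.
    """
    repository, _, tag = reference.partition(":")
    parts = tag.split("-") if tag else []
    suffixes = parts[1:]
    return {
        "repository": repository,
        "python_version": parts[0] if parts else None,
        "variant": "slim" if "slim" in suffixes else "full",
        # None means the Debian release is not pinned, so it can change
        # when a new Debian stable comes out.
        "debian_release": next((p for p in suffixes if p != "slim"), None),
    }


print(parse_python_image("python:3.9-slim-buster"))
print(parse_python_image("python:3.9"))
```

The second call shows the case the conversation warns about: with no release name in the tag, nothing ties the image to Debian 10.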

48:12 Sounds good. So getting it working is super straightforward. Yep. But getting something secure is interesting. But let me go back, I think we might be skipping around a bit. You're talking about having that version specified there, slim-buster. I know how I'll get new dependencies for the Python code up there. If there's some kind of security problem, what will probably happen is Dependabot on GitHub will send me a PR that says, warning, warning, your version of this web framework has such-and-such CVE, we've created a PR. You accept it, you push it back to the right branch, that kicks off the whole process, and everything goes again, right? That keeps the flow of somewhat fresh code and dependencies going through your system. However, how do I keep that same thing happening for Linux? Right? Suppose Linux has some security vulnerability in the version that I've got, or I've got nginx running and it has something like that I need to update. What is the trigger, what is the process that helps me know, oh, even if this is a somewhat stable, stale project that we haven't touched for a month, you need to somehow go give it a kick to force it to get the latest and do that again? Because there's no auto apt upgrade running there. Yeah. So one thing to notice: some people assume that the official Python, or even official Debian, or whatever official base images from Docker get security updates every time they come out. They don't. Some of them get updated fairly frequently. Some of them, like the CentOS one, which I guess people are busy switching away from, but for a while a lot of people were probably using it, well, the CentOS base image would not be updated for months at a time. And the Debian ones, well, I've seen them lag on security updates

50:00 by two weeks. So Debian has released a new security update, but the Docker image hasn't been updated, which is not ideal, because you're telling all the hackers, well, here's the problem that you can just go look for in systems that lag on getting their patches. Yeah. And so, as someone who is creating a Docker image, you cannot rely on the base images to be up to date. You need to install security updates when you build your Docker image. What's that look like? Step two is apt update, apt upgrade -y or something like that? Yeah, apt-get update and apt-get -y upgrade. You can add a few more command line options to make your image smaller, but yeah, it's basically do an upgrade. But there's a problem. Docker, as we talked about, has this caching thing, where if you rebuild an image and nothing's changed, it'll just use a cached layer. So when you rebuild your image, if you are using caching to speed up builds, Docker will look at the apt-get update, apt-get upgrade line and say, oh, this is unchanged, same command, and so it'll just use the cached layer. Yeah, and you very likely will be doing that, because it's like five minutes versus three seconds to rebuild, restart, and test your app. So everyone is going to be using the caching, maybe not in CI/CD, but everywhere else. I mean, you probably do a little in CI/CD too. Yeah. And so the result is that if you've set up caching to speed things up, that caching will ensure you don't get security updates. And so basically, what you have to do is have this process where once a day, or once a week, or in response to CVEs coming out, you rebuild your image from scratch without caching. So you can just say, every night at 3am when no one's working, we are going to rebuild our image from scratch without caching. And so our image will always have the latest security updates.
And then if you're in a system that has continuous deployment, you can then automatically deploy that. Wait, how do you make the little banner that says you're going to be down Sunday from three to five? Can you do that part? Just kidding. Yeah. So this is easier if you have a process that you trust enough to do automatic deploys any time you want. But you basically have to rebuild your image from scratch without caching, either whenever a security update comes out or just on a regular basis, and redeploy, if it's a server. Because you have these immutable artifacts. If you're running a VM, you can just have a cron job that installs security updates nightly, via the unattended-upgrades package on Debian. For Docker images, you can't do that. And so you have to rebuild from scratch with security updates and then redeploy, as an ongoing process. Oh, yeah, I'm glad you pointed out the caching, because it's not enough to go out and say, oh, well, once a day or once a week we'll just do a docker build. Oh, it's up to date? Actually, we're good. No, not so much.
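
A scheduled rebuild without caching could be driven by a small script along these lines. This is a sketch under assumptions: the image tag `yourimage:main` is a placeholder, the function names are invented for the example, and the scheduling itself (cron, a CI job) is left out. The `--pull` and `--no-cache` options are standard `docker build` flags: the first re-fetches the base image, and the second forces every step, including the security upgrade step, to run again.

```python
import subprocess


def fresh_build_command(image_tag, context_dir="."):
    """Return the argument list for a docker build that ignores cached layers."""
    return ["docker", "build", "--pull", "--no-cache", "-t", image_tag, context_dir]


def rebuild_from_scratch(image_tag, context_dir="."):
    """Run the uncached build; raises CalledProcessError if the build fails."""
    subprocess.run(fresh_build_command(image_tag, context_dir), check=True)


if __name__ == "__main__":
    # Only print the command here; a nightly job would call
    # rebuild_from_scratch("yourimage:main") instead.
    print(" ".join(fresh_build_command("yourimage:main")))
```

Day-to-day builds can keep using the cache for speed. Only the scheduled job needs to pay the cost of the full rebuild.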

52:47 Yeah, yeah. That's why I brought this up, because I think it's tricky. There's a natural flow that kicks that refresh cycle off for code, but not for the infrastructure itself, unless you think about it, though. Yeah. So you need to explicitly think about and set up this process. Either some way to get notified of CVEs, or a bunch of registries have security scanners, and they'll scan your images for security problems, and so you can run those on a schedule. Maybe, honestly, the easiest though is probably to just do a forced rebuild at 5am or something. Then the next time every developer comes in and runs docker compose up, it's gonna do a docker build, it's gonna see that things are out of date, and it'll just trigger: let's get the fresh one. Yeah, yeah. And it turns out security scanners also have some bad defaults. There are a lot of security problems that are not really problems: the upstream maintainer has closed it as won't-fix, or it's not going to get fixed in Debian stable until the next release of Debian stable, and the Debian maintainers have basically decided that it's not worth fixing. And so there's nothing you can do. Most security scanners will flag those. And so you'll run a scanner on an updated image, and it'll say you have 60 security vulnerabilities. But if you turn on the flag that says only tell me about security vulnerabilities that I can actually fix, that actually have updates from Debian, and then you run that, it'll say you're fine. Right. And that is probably a much more realistic assessment of your risk, because there are bugs that are never going to be fixed, because the glibc maintainers have said, no, we won't fix this, it's not our problem, it's not a real bug. Yeah, I suspect you could also get notified about things that are not observable, really. So, oh, there's a problem in this system.
But we actually have a firewall blocking that port, and we have no interaction with it, right? How much do you worry about those kinds of things? You may as well upgrade and redeploy, because maybe one day your firewall will fail, but there's a whole bunch of just utter noise if you don't configure your security scanner correctly. All right, wrapping up this bit of the topic, Kim says a forced rebuild is great for your own images based on Debian or whatever. Otherwise, you probably still need some kind of scanning. Yeah, if you're not able to build it yourself. Yep. Makes sense. All right. We've got a little bit of time to touch on a couple of things. One of the areas, stage

55:00 two, was security. You always want different layers. I talked about a firewall, we're talking about security updates and patches. But there are layers of security. And one of the very straightforward ones is you probably don't want to run this as root. Certain systems will even warn you about this. If I try to sudo brew something on my Mac, it'll complain: you should never, ever run brew as root. What are you doing? Are you crazy? Stop doing that thing. uWSGI might warn that you're running as root, if you look at the logs when you start it up. So when I run Docker, and I just get that simple getting-started one, what does that run as? So by default, Docker runs as root. Oh, okay, that kind of makes sense. Because all these system packages are designed to be installed as root. And so if you're going to install system packages or install security upgrades, you have to be root by default. But as soon as you've switched to installing your Python code, you should stop being root, create a new user, and switch to that user, because otherwise your application will be running as root. And root inside a container is more restricted, but it's still not as restricted as a normal user. And different runtime systems might take more aggressive steps to restrict what you can do. And so sometimes it might be okay, but it's just a good best practice. You don't know where your things are going to run, things might change around, just don't run as root. What you're saying is basically, if you run as root in a Docker container and somebody takes over your container, well, the worst thing they can do is crash around inside of the container. It's not like they now have full access to the machine. But, you know, maybe those rights are propagated onward. Maybe they can do something else, I don't know, decrypt something that then gets them further into the network. There are challenges that could happen, right?
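
An application can also check for itself whether it ended up running as root, similar to the startup warnings mentioned above. This is a minimal sketch, not a substitute for switching users in the image: the function names and the warning text are assumptions for illustration, and `os.geteuid` is only available on Unix-like systems such as the Linux containers discussed here.

```python
import os
import sys


def running_as_root(euid=None):
    """Return True if the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # Unix-only call
    return euid == 0


def warn_if_root(euid=None, stream=sys.stderr):
    """Print a warning when running as root; return whether it was printed."""
    if running_as_root(euid):
        print("warning: running as root; switch to a normal user", file=stream)
        return True
    return False


if __name__ == "__main__":
    warn_if_root()
```

Passing `euid` explicitly is only there to make the check easy to test. In normal use both functions are called with no arguments.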
So it is much easier to escape a container and get onto the host if you're running as root, because in Linux, security access is granted by these things called capabilities. And if you're root, you have a few more capabilities, which gives you a larger attack surface on the Linux kernel. And so if there's a bug in the Linux kernel, it's easier to take it over if you're root. There are other things you can do to restrict the capabilities of containers, even if you're running as a normal user. So if you're running the ping utility, for example, it often gets an extra capability so it can do a ping. And then if there's a bug in the ping command, and you can insert code into it somehow, then it'll execute it with elevated privileges, and you can do more stuff. And so yeah, you don't want that. Yeah. And so you want to run as a normal user, which will restrict the attack surface on the Linux kernel, and removing all capabilities will restrict the attack surface even more. You do these things, and for many applications it won't really matter too much. But it's not a lot of work, and it's a little bit more assurance that if someone does somehow take over, there's limited damage they can cause, because they'll be restricted to this container. Yeah. Okay. Good advice. Use the useradd command and the USER instruction in your Dockerfile. Very cool. And then let's see, what was the next one here? We talked about the security updates, that's a challenge. So what do I need to think about for continuous integration and automated builds, specifically with regard to Docker? Is there anything special? So first, it's just doing the actual work of automating it. It's really nice if every time you push to your Git repository, every time you open a pull request, it builds a Docker image for you.
Because then you can test it. Maybe you can write additional tests that actually use the Docker image to do integration tests, that sort of thing. Yeah, for example, there was a really cool framework or library, I can't remember exactly what it is, that we talked about on Python Bytes. Instead of trying to mock out, say, your database (it's mostly databases), there's a testing library you can use that will bring up a Docker container running Mongo or Postgres or something, and then fill it with test data. And you just connect those things and say, yeah, you can talk to the database, with the test data in it. Well, yeah, testing with a real database is so much easier these days, it should just be the default. If you deploy with Postgres, you shouldn't test with SQLite, because they're different enough that there'll be bugs that you're gonna miss. Yeah. And so once you have that automation, and you're building for every pull request, you start having this issue where you don't want the image you built for the feature-123 branch to overwrite your production image. That would be awkward. But you would still like continuous integration to do its job and say, you checked in this thing, it was okay or not okay. Yeah, it's useful to have images uploaded for every pull request that you can download and play around with, but you don't want feature branch images to interfere with your production image. One easy way to do that is to name your Docker images based on the Git branch. So you can just use the Git branch as the part after the colon, the tag. So it can be yourimage:main if it's the main branch, or yourimage:feature-123 if it's the feature-123 branch. Yeah, that works really well with Git flow, feature branch style

01:00:00 programming as well. I create an issue, then I create a branch named something along those lines, and I create a PR along those lines. And guess what? Here's the container that goes with that thing, right? Yeah. Yeah. You can also do things like name your Docker image based on the Git commit, so you can go from a Git commit to the corresponding Docker image really easily. Yeah. Yeah, that's a really cool idea. I did find that package, by the way, in case people are interested. It's called testcontainers-python. Anyway, the idea is you just say, like, with MySQL container, do your test. And it literally creates a Docker container with your test data and all that stuff. So people can check that out. That's kind of cool. All right. Well, I'm getting a little bit short on time here. What else do you want to throw out for people who are thinking about a lot of these best practices? We touched on a lot of them, but I know there's plenty more to go. Like, for example, faster builds. You talk about, say, precompiling the .pyc files. That actually makes for slower builds, but it'll give you a faster startup. That's what I mean. Yeah, sorry. Since this comes up a bunch: Alpine Linux is often not a thing you want to use for your Docker base image. And the reason is, Alpine Linux is highly recommended if you want small images. It's kind of nice because installing the Alpine packages, somehow, I don't know what they do, but it's vastly faster than installing Debian packages, and you get small images. And it's kind of nice. The problem is Alpine Linux uses a different standard C library than most Linux distributions. Most Linux distributions use glibc. Alpine uses musl, "muscle" or "mussel", I don't know how to pronounce it. And binary wheels are compiled by default on Linux for glibc.
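As a rough way to see which C library a given image's Python is linked against, the standard library offers `platform.libc_ver()`. The sketch below wraps it in two helpers whose names, `uses_glibc` and `report_libc`, are assumptions for illustration. On glibc-based distributions the reported name is typically `glibc`; on other systems it may be empty or something else, so a negative result is a hint to investigate, not proof.

```python
import platform


def uses_glibc(libc_name: str) -> bool:
    """Return True when the reported C library name is glibc."""
    return libc_name.lower() == "glibc"


def report_libc() -> str:
    """Describe the C library reported for the running interpreter."""
    # platform.libc_ver() returns a (name, version) tuple; both may be empty strings.
    name, version = platform.libc_ver()
    if uses_glibc(name):
        return f"glibc {version}: glibc-targeted binary wheels should be usable"
    return "not detected as glibc: pip may need to compile packages from source"
```

Running `print(report_libc())` inside a candidate base image gives a quick indication of whether the glibc-built wheels mentioned here are likely to apply.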
And so if you install Python packages on Alpine Linux, you will not get binary wheels, you're gonna have to compile them from scratch. And so what happens is people say, oh, I'm gonna use Alpine Linux, it's gonna make my images smaller. And they try to install, like, a Postgres package, which is normally precompiled, and it doesn't work. They're like, okay, I'm gonna install a compiler and install the Postgres headers. Now you have this image that has a compiler and Postgres headers in it, and you have to compile stuff. And when you get to data science or scientific computing, you're compiling these massive packages that take a really long time to compile, and now your builds are super slow. Then you can do a whole bunch of work to use multi-stage builds so that your image is small, and then you can use caching so that the builds are fast. Or you can just use a different Linux distribution, then use binary wheels, and not cause yourself the challenges due to Alpine. But there is a PEP that I believe was accepted to start the process of building wheels for Alpine. And I've started seeing some packaging tools start adding support. And so it may be that in a year or two, just like everyone builds binary wheels for manylinux, which is the glibc one, people might start building binary wheels for Alpine. And when that happens, it'll be much less of an issue. Until then, avoid using Alpine Linux as your base image. Do you want to close it out with the .pyc thing? Sure. So when you start up a Python program, it loads your Python source files and then compiles them. And compilation here is not really the same as compiling a C extension. It's basically a one-to-one translation that compiles into bytecode and writes .pyc files for the modules, and then the next time you start, it can just load the .pyc file, and that makes your application start up more quickly. And so for many applications, it doesn't matter.
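As a rough sketch of what precompiling could look like, Python's standard-library `compileall` module can write the .pyc files ahead of time, for example as a step during the image build. The function name `precompile` and the `/app` path in the note below are assumptions for illustration, not something named in the episode.

```python
import compileall


def precompile(directory: str) -> bool:
    """Compile every .py file under `directory` to .pyc; return True if all succeeded."""
    # quiet=1 prints only errors; force=True recompiles even if a .pyc looks current.
    return bool(compileall.compile_dir(directory, quiet=1, force=True))
```

In a build step this might be invoked as `precompile("/app")`, where `/app` is a hypothetical location for the application code; the module can also be run from the command line as `python -m compileall` followed by a directory.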
If you're doing like a serverless kind of thing where you want things to start really quickly, having to compile the .pyc files is gonna add some startup time. I guess any time the container lifetime, the life cycle, is short, right? Let's say with a web app, you would start it and it would run for hours. And so it doesn't matter, right? Yeah. So it took you another 20 milliseconds to start up, but it's gonna be around for three days. But if you're doing a serverless thing, 20 milliseconds might be a significant chunk of the latency of your service. So when you build your Docker image, you can precompile your .pyc files, and then they'll be in the image and your startup will be way faster. And the reason you have to actually think about this is that Docker images are immutable. So your container starts up and compiles and writes the .pyc files, but those .pyc files never make it to the original image. Every time you start, the image is the same immutable artifact, unlike your local home directory. And so if you really want the fastest startup, you can make your image a bit larger and compile the .pyc files. And basically, that becomes a step in your Dockerfile to compile the .pyc files ahead of time. Yeah. Okay. Awesome. Great advice, many, many tips. I think we're gonna have to leave it there, we're basically running out of time. But yeah, really nice talk. I'll link to your talk that you did at PyCon, and thanks for coming here and sharing the audio version with us. Thanks for inviting me. Of course. Before you get out of here, though, there's the final two questions. If you're going to write some Python code, what editor do you use? I use Spacemacs, which is kind of like they took Emacs and they configured it for, like, 20 years. That's good, jumping 20 years into the future. Like, it's Emacs, but with all

01:05:00 the things you need preconfigured to actually have a nice development environment. And it has vi bindings and Emacs bindings. I use the Emacs bindings, but if you like Vim, you can use the vi bindings. Yeah, cool. Their subtitle and sub-subtitle is: a community-driven Emacs distribution. The best editor is neither Emacs nor Vim, it's Emacs and Vim. I honestly don't use the Vim bindings at all. I'm using it because it does all the IDE stuff you want out of the box. So it's a much more modern experience. Okay, really cool. And then a notable PyPI package? PyO3, which is a way to create Python extensions in Rust. I've used it to create, well, sort of, Fil, my memory profiler. So I wrapped some Rust library with it. It's a really, really nice way to create fast, safe extensions for Python. And there's a packaging tool called Maturin, which was probably the nicest Python packaging experience I've ever had. You add a pyproject.toml file, which is like three lines, you add a tiny bit of metadata, and now you can build wheels, and you can pip install, and it just works. And it's just an amazingly smooth development experience. That's fantastic. Yeah. So basically, if you're going to write C extensions, maybe consider writing them in Rust instead and use this stuff. Yeah, yeah, Rust gives you the same performance that you would get from C or C++, but it's much safer. And as someone who used to write C++ long ago, I learned Rust over the past couple of years, and it is the language I always wanted C++ to be. Yeah, I hear you. I did a lot of C++ as well, and was always bringing in these things like smart pointers and other stuff. It's like, why does it have to be so hard to just make this better? It's not a simple language.
Because if you want performance, you need to do work, and it has a very different paradigm, but it's a really lovely language, and you'll write much safer code. And PyO3 makes it really nice to write Python extensions. Yeah. Cool. Cool. And then there's an interesting comment. Is that like an isotope? I think it's a reference to oxidization, like O3, but yeah. Yeah. 'Cause Rust, and there's a lot of oxidizing happening? Yeah. Around PyOxidizer. Yeah. There's that project. Yeah. Come to think of it, that's a completely different project. But yeah, it's another fun Rust packaging thing. Yeah, exactly. Very, very cool. All right, final call to action. People are interested in this. They want to go deeper. You've got various things they can find on your website, 'pythonspeed.com/docker'. Yep. Where do they go? What do you tell them? Yeah, so if you go to 'pythonspeed.com/docker', there's a whole bunch of free articles about various best practices. If you're specifically interested in the process we covered today, there's the PyCon talk. But also, if you go to 'pythonspeed.com/docker', the process is also linked on that page. It's an introduction to Dockerizing for production. It's basically a little mini book I wrote, that's about 10 pages. But it goes over this process that we talked about today, in prose, and talks about the decisions you have to make and how it integrates with your organizational processes. Let's recall, on that site, the /docker page, it has a bunch of articles, and it has a very small scroll bar and a lot of stuff below it. So yeah, there's a lot of things going on that people can go check out for more resources there, right? Yeah. And I have a bunch of paid products if anyone's interested in Docker packaging, from an intro to a much more detailed one. If you use the code talkpython, you can get a 15% discount. Oh, fantastic. Awesome. Yeah. So be sure to do that.
Thank you so much for being on the show and sharing a lot of your hard-earned Docker experience. Yeah. Thanks for inviting me. You bet. Great to talk to you. You too. Bye.

01:08:45 This has been another episode of Talk Python To Me. My guest in this episode was Itamar Turner-Trauring. It has been brought to you by Sentry, Linode, and AssemblyAI. Take some stress out of your life. Get notified immediately about errors in your web applications with Sentry. Just visit talkpython.fm/sentry and get started for free, and use the promo code talkpython 2021 when you sign up. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy, and scale your modern applications faster and easier. Visit 'talkpython.fm/linode' and click the Create free account button to get started. Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit 'talkpython.fm/assemblyAI'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed

01:10:00 at /iTunes, the Google Play feed at /play, and the direct RSS feed at /RSS on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
