Best practices for Docker in production

You've got your Python API or app running in a Docker container. Great! Are you ready to ship it to that hosted cluster service and head off to production? Not so fast. Have you considered how you'll manage evolving dependencies and address security updates over time? Not just for the base OS but for installed packages? How about your pip installed dependencies? Are you running as root? If you don't know, the answer is yes.

We'll discuss these issues and many more with Itamar Turner-Trauring on this episode.

Episode Deep Dive

Guests introduction and background

Itamar Turner-Trauring is a seasoned Python developer and author who focuses on production-ready Docker packaging and Python performance optimizations. He created the Fil memory profiler for Python and has written extensively on Docker packaging best practices at pythonspeed.com. In this episode, Itamar shares a wealth of knowledge on deploying Python code in Docker—from ensuring better security, to managing dependencies, to making builds both smaller and faster.

What to Know If You're New to Python

If you’re new to Python, here are some basics to help you follow along with this episode:

Familiarity with Python’s virtual environments and pip install concepts is helpful when discussing Docker’s build stages.
Basic understanding of how Python packages (like Django or NumPy) can be installed will help clarify why pinned dependencies matter.
Some awareness of the command line and Linux file systems can be useful, as Docker images often rely on Debian-based or Alpine-based distributions.

Key points and takeaways

Docker for consistent deployment
Docker containers let you bundle up Python, system libraries, and your own code into a single artifact to run anywhere. This consistency reduces the "it works on my machine" issues by giving all developers—and production—the exact same environment.
- Links and tools:
  - Docker (Official Documentation)
  - Python Official Images
Security updates and forced rebuilds
Simply rebuilding the same Dockerfile doesn’t guarantee new OS-level patches get applied. Because Docker uses layer caching, a layer that installed system packages weeks ago is reused as-is, so you must periodically force a fresh build (for example, docker build --pull --no-cache) to pull in security fixes and OS updates. Doing a scheduled rebuild (e.g., daily or weekly) helps keep images secure and up to date.
- Links and tools:
  - apt-get (Debian/Ubuntu Docs)
Don’t run your container as root
By default, Docker containers run as root, but that opens unnecessary security risks if the container is compromised. It’s straightforward to switch to a non-root user (e.g., RUN adduser + USER instructions in the Dockerfile), which reduces the damage an attacker can do.
- Links and tools:
  - Dockerfile USER instruction
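
As a complement to the USER instruction, an application can verify at startup that it is not running as root. The following is a minimal sketch, not something described in the episode: the function names and the policy of refusing to start are assumptions chosen for illustration. It relies on os.geteuid, which is available on Unix-like systems such as the Linux containers discussed here.

```python
import os
import sys


def running_as_root(euid=None):
    """Return True if the effective user ID is 0 (root).

    `euid` can be passed explicitly for testing; by default it is read
    from the operating system via os.geteuid (Unix-like systems only).
    """
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def refuse_root(euid=None):
    """Exit with an error message when started as root (hypothetical policy)."""
    if running_as_root(euid):
        sys.exit(
            "Refusing to start: running as root. "
            "Add a USER instruction to the Dockerfile."
        )
```

Calling refuse_root() at the top of the service's entry point turns a forgotten USER instruction into an immediate, visible failure instead of a silent risk.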
Container layering and caching
Docker builds are split into layers, and each build step can be cached. While caching dramatically speeds up incremental builds, it can also inadvertently prevent installing new patches or updates. Carefully ordering your Dockerfile (for example, copying requirements before code) helps you cache what you really need.
- Links and tools:
  - no-cache-dir option for pip
Alpine vs. Debian-based images
Alpine Linux is popular for being extremely small. However, it uses a different C library (musl) which means Python wheels often have to be recompiled. For many Python projects (especially data science ones), this leads to slow, large builds. Debian-based slim images (e.g., python:3.9-slim-buster) are often more practical.
- Links and tools:
  - Python slim-buster image
Iterative approach to Dockerizing
Itamar stresses treating Docker packaging as a process. Start with something that works, then layer on security best practices, continuous integration, correct version pinning, and eventually performance optimizations (like multi-stage builds or caching). That way, you can stop at any point and still have a working foundation.
- Links and tools:
  - Itamar’s article on Docker Process
  - Docker multi-stage builds
CI/CD builds for each commit or pull request
Automating Docker builds in your CI/CD pipeline ensures each pull request or feature branch has its own Docker image. This approach preserves the stability of your main image tag and makes it easy to test or even deploy ephemeral versions of your app.
- Links and tools:
  - Git branch-based tagging in Docker
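
Branch names such as feature/add-login are not valid Docker tags, so a CI job that tags images per branch needs to sanitize them first. The sketch below is one possible approach, assuming Docker's documented tag rules (letters, digits, underscores, periods, and hyphens; no leading period or hyphen; at most 128 characters). The function name and the "unnamed" fallback are hypothetical.

```python
import re

MAX_TAG_LENGTH = 128  # Docker's documented maximum length for a tag


def branch_to_tag(branch):
    """Convert a Git branch name into a string usable as a Docker image tag."""
    # Replace every character outside the allowed set with a hyphen.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag may not start with a period or a hyphen.
    tag = tag.lstrip(".-")
    # Truncate to the length limit; fall back to a placeholder if empty.
    return tag[:MAX_TAG_LENGTH] or "unnamed"
```

A CI script could then build with something like docker build -t myapp:&lt;tag&gt; using the returned value, where myapp is a placeholder image name.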
Version pinning and dependency updates
There’s a balance between locking dependencies too tightly vs. grabbing the latest version automatically. Locking them prevents random breakages, but you must have an ongoing strategy (like Dependabot or scheduled checks) for security updates and new releases.
- Tools:
  - Dependabot
  - Pip tools (for pinning)
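
To illustrate what "pinned" means in practice, here is a small, deliberately simplified check that flags requirement lines lacking an exact == version. It is a sketch under stated assumptions rather than a replacement for pip-tools: the function name is hypothetical, and it only handles the common cases of blank lines, comments, pip options, and environment markers.

```python
def unpinned_requirements(lines):
    """Return requirement lines that are not pinned with '=='.

    Simplified on purpose: skips blank lines, comments, and pip options
    (lines starting with '-'), and strips environment markers.
    """
    unpinned = []
    for raw in lines:
        # Drop inline comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Keep only the part before any environment marker.
        requirement = line.split(";", 1)[0].strip()
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned
```

Run against the lines of a requirements file in CI, a non-empty result signals that a build could silently pick up a new dependency version.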
Precompiling .pyc for faster startup
By default, .pyc files (compiled Python bytecode) generated at runtime live only in a container’s writable layer, so every new container started from the image has to compile them again. Precompiling them during the Docker build can speed up container startup times, particularly for short-lived or serverless workloads.
- Tools:
  - Python’s compileall module
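
A minimal sketch of precompiling with the standard library's compileall module follows. The wrapper function name is an assumption for illustration; compile_dir and its quiet and force parameters are part of the standard library.

```python
import compileall


def precompile(source_dir):
    """Compile every .py file under source_dir to .pyc bytecode.

    Returns True if all files compiled successfully.
    """
    # quiet=1 prints only errors; force=True recompiles even when
    # up-to-date .pyc files already exist.
    return bool(compileall.compile_dir(source_dir, quiet=1, force=True))
```

In a Dockerfile the same effect is usually achieved with a build step that runs python -m compileall on the application directory after the code has been copied in, so the bytecode ends up baked into the image.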
Docker Compose for local development
While the episode focuses on production concerns, Docker Compose is invaluable for spinning up dev environments that mirror production. It allows you to run Postgres, Redis, or other services as separate containers with minimal fuss.
- Links and tools:
  - Docker Compose

Interesting quotes and stories

"I don’t like Docker packaging. It’s not a thing I’m doing because it’s fun, it’s just extremely useful." — Itamar Turner-Trauring

"If you’re running as root in your container, the answer is ‘yes, you’re running as root.’ And that’s probably not what you want in production." — Michael Kennedy

"You have to set up these ongoing processes for things like security updates and dependency updates. It’s not just a one-off thing." — Itamar Turner-Trauring

Key definitions and terms

Layer Caching: A feature in Docker builds that reuses intermediate steps if they haven’t changed, speeding up repeated builds.
Multi-stage Builds: A Docker technique for building your app in one stage (with compilers, etc.) and copying only the final artifacts into a smaller runtime image.
Musl vs. Glibc: Musl is a lightweight C standard library used in Alpine Linux; Glibc is the more common library in Debian/Ubuntu-based images. Many precompiled Python wheels assume Glibc, causing complications on Alpine.
Immutable Artifacts: The practice of treating container images as read-only snapshots. Once an image is built, you don’t edit it in place; you rebuild a new one.

Learning resources

Here are some resources to learn more about Python and Docker:

Python for Absolute Beginners (Talk Python Course): Ideal if you want a thorough introduction to programming in Python.
Docker Documentation: Official docs with tutorials on containers, Compose, and best practices.
pythonspeed.com: Itamar’s site featuring detailed articles on Docker packaging, Python performance, and more.

Overall takeaway

Docker provides a robust, repeatable way to package and deploy Python apps to production. However, it requires a thoughtful process—from selecting the right base image to pinning dependencies to staying on top of security updates. The effort pays off with consistent environments, reproducible builds, and cleaner deployments. As you adopt these best practices, you’ll gain efficiency, stability, and confidence that your Python Docker containers are truly production-ready.

Links from the show

PyCon Talk: youtube.com
Docker packaging articles (code TALKPYTHON to get 15% off): pythonspeed.com
PSF+JetBrains 2020 Survey: jetbrains.com
Give me back my monolith article: craigkerstiens.com
TestContainers: github.com
SpaceMacs: spacemacs.org
Rust bindings for Python: github.com
PyOxidizer: pyoxidizer.readthedocs.io
ahocorasick_rs: Quickly search for multiple substrings at once: github.com
FIL Profiler: pythonspeed.com
Free ebook covering this process: pythonspeed.com

Talk Python Twilio + Flask course: talkpython.fm/twilio

Watch this episode on YouTube: youtube.com
Episode #323 deep-dive: talkpython.fm/323
Episode transcripts: talkpython.fm

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy

Episode Transcript

00:00 You've got your Python API or app running in a Docker container.

00:03 Great.

00:03 Are you ready to ship it to that hosted container service and head off to production?

00:07 Not so fast.

00:08 Have you considered how you'll manage evolving the dependencies and addressing security updates

00:13 over time?

00:14 Not just for the base OS, but for the installed packages?

00:17 How about your pip installed dependencies?

00:19 Are you running as root?

00:21 If you don't know, the answer is yes.

00:23 We'll discuss these and many more issues with Itamar Turner-Trauring on this episode.

00:28 It's Talk Python To Me.

00:29 Episode 323, recorded June 14th, 2021.

00:34 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:53 and the personalities.

00:54 This is your host, Michael Kennedy.

00:55 Follow me on Twitter where I'm @mkennedy.

00:58 And keep up with the show and listen to past episodes at talkpython.fm.

01:01 And follow the show on Twitter via @talkpython.

01:04 This episode is brought to you by Sentry and Linode.

01:08 And the transcripts are brought to you by Assembly AI.

01:10 Please check out what they're offering during their segments.

01:13 It really helps support the show.

01:16 Hi, all.

01:16 I have a quick announcement before we dive into the interview.

01:19 Over at Talk Python Training, we just released our latest course, Python-powered chat apps with Twilio and SendGrid.

01:25 Have you ever wanted to create a chatbot using Python?

01:29 In this course, we'll be building an ordering system for a tech-savvy bakery.

01:33 Our customers can place orders over WhatsApp.

01:35 And during the chat back and forth, we'll integrate data from our Flask API, answering questions such as, what's the menu?

01:42 What does that cost?

01:44 And so on.

01:44 Then we'll integrate this into a sweet backend that sends receipts as PDFs over emails and allows our bakers to see new orders and then mark them as fulfilled and notify the customers when they're done.

01:56 In short, it's cool tech and a super fun project.

01:59 Oh, and one more thing.

02:01 This course is free for everyone.

02:03 So if this six-hour course sounds fun, just click the link in your podcast player's show notes and jump into the course.

02:08 Now, on to the Docker best practices.

02:10 Itamar, welcome to Talk Python To Me.

02:13 Welcome back to Talk Python To Me.

02:14 It's been some time.

02:16 Feels like a year or so.

02:17 I'm not sure exactly how long it's been.

02:19 Last time we were on, we were talking about an entirely different topic.

02:24 So you get two bits on the mind map connective sort of relationship of topics here.

02:30 We talked about Fil and profiling data science.

02:34 That was fun.

02:34 Yeah.

02:35 I sort of have found myself talking about a bunch of different subjects and like some people are interested in both.

02:42 Some people are interested in the other.

02:43 And Docker is the other thing I've spent a lot of time sort of researching and writing about.

02:48 Yeah, it's I think the data science profiling one was really interesting because profiling has all these challenges and much of it is more focused around profiling running applications or profiling code that's all in Python.

03:00 And so if you need to profile like, say, Fortran code or other weird sort of mix and match libraries, then that was sort of that topic, right?

03:09 Yeah, so Fil is a memory profiler for Python, specifically for batch processes like data science, scientific computing.

03:17 And so if you're doing scientific computing, there'll be a bunch of code in Fortran and C++ and Rust.

03:22 And so you want to access that memory, like sort of profile memory across all the languages you're using.

03:30 Yeah, because if you've got some big glob of C code, Python thinks it's just a pointer, that little tiny pointer to some, but it turns out to be huge.

03:37 Yeah, so people can check out that episode.

03:39 If they're interested.

03:40 And yeah, just give us an update on what you've been doing since then.

03:43 I've actually been trying to turn Fil into, sort of make an alternative version of it that you can run in production.

03:48 Profilers often have performance overhead. Fil takes like 40% performance off the top.

03:54 Trying to make something that will run with like 1%, 2% overhead.

03:58 So you can run on production and just always get reports about your memory usage for any job.

04:03 So if it's like six hours in and it crashes, or just uses too much memory, you can go back and look at that.

04:10 Oh, that would be fantastic.

04:11 We have that for like profiling in terms of performance.

04:14 On some systems, you can plug them in and they'll kind of give you real time.

04:19 How is my app doing in terms of, you know, here's where it's spending its time or it got slower.

04:25 Maybe it's even just measuring like request response.

04:28 But memory profiling has typically been pretty intensive, right?

04:33 So that'd be cool if you could get it down to that level.

04:35 Yeah, and this is a very good...

04:39 Both Fil and this project are very good

04:41 pandemic projects.

04:42 It's like really, it's quite difficult to do.

04:45 And it's like, but it's something that is sort of completely under my control.

04:48 Like we'll get to Docker and Docker is like, there's this giant ecosystem and they all have

04:53 differing opinions about how you do things.

04:54 And like everything's so broken around the edges.

04:57 Whereas here it's like, I have a box and it's a very complicated box, but it's under my control.

05:04 And so I can do so.

05:04 It's kind of relaxing in an environment where the world is not under my control.

05:10 It's been a crazy time, hasn't it?

05:13 Yeah.

05:13 I feel like we're getting used to it.

05:15 It's odd, but you know, people just get used to whatever water they swim in eventually, I guess.

05:20 Yeah.

05:21 Yeah.

05:21 Let's talk about Docker a little bit.

05:24 So it hasn't been that long since I had, that's, I also want to shout out quick episode 274 is when

05:31 we talked about Fil, if people want to go back and check that out.

05:33 But I had Peter McKee from Docker over there to come and talk about sort of what is Docker, you know, give us an update on Docker, the company and like sort of set the stage for Python developers, right?

05:46 To kind of get going on the dev side and just start using Docker.

05:49 So that was episode 308 and that was fun.

05:51 But then recently you gave a talk at PyCon called Zero to Production Ready, a best practices process for Docker packaging.

06:00 And so I thought that was really interesting and wanted to have you on the show so we could dive into Docker best practices for Python, but also your focus is really on production, not necessarily development, right?

06:13 Yeah.

06:13 Maybe we start there.

06:14 Like what's Docker look like for software development as a, I just need to make my stuff run so I can code it and test it out versus, I don't know, zero downtime Kubernetes or whatever it is you're trying to do type of thing.

06:26 What do those two worlds look like and maybe tell folks about when they should care about what are the advantages or whatever.

06:32 Yeah.

06:32 So what Docker gives you is a sort of package that contains all the files you need for the file system, contains Python.

06:41 It contains all the system libraries you need to run your Python extensions.

06:45 It contains all your Python dependencies, contains all your code and contains a script to launch your code.

06:52 And so as a starting point, this is useful for development because if you're say on macOS or in Windows and you're deploying to Linux, you can run something locally that is the same on across different computers.

07:05 Even if you are on Linux, like I have one machine that's Fedora 33, I have another machine that's Ubuntu, like they're different in a bunch of subtle and lots of subtle ways.

07:17 And so by having a Docker container when I'm developing, I can have a completely consistent environment.

07:24 And then know that that environment when I, when I then take that code and run in production, it'll be exactly the same there.

07:29 Right.

07:29 I had a, somebody reach out to me a little while ago and ask something to the effect of, I've got a bunch of different developers on my team and I want to make sure that they all have the same version of Python in the same packages.

07:42 Right.

07:42 And that's a legitimate thing that you might want to do.

07:45 You might want to make sure that those are exactly the same.

07:47 I think maybe in general, there's probably more of a concern about that than an actual problem there.

07:54 You know, a lot of times either these things are going to basically work or they're going to utterly fail.

07:58 I think one of the scenarios maybe where it matters more is data science, where there's slight changes in algorithms, which might lead to different, you know, ways.

08:06 You train the model, which might lead to different, like it could, those kinds of changes, but say in like web apps or UI apps or something like that, it's, it's either just going to work or it's going to completely break.

08:16 That said, you know, this situation you're talking about with Docker for development, like kind of solves that, but to a much bigger degree.

08:23 Right.

08:24 Cause you could specify in this image, we have exactly this version of Python compiled in this way.

08:30 We have these libraries installed with this version.

08:34 We have these environment variables set and this subsystem of Linux installed as well, but not that other way.

08:41 You can completely control it way more than just, I want the same version of Python, right?

08:45 And you can then go further with something like Docker Compose. Compose lets you start up a little network of containers.

08:52 And then it's very easy to say, okay, I want to spin up Postgres or I want to spin up Redis.

08:57 Whereas traditionally this would be a pain in the ass. With Docker and Docker Compose,

09:01 you can spin up all your dependency servers really easily.

09:04 So then even if you're not using Docker for your own code, you can use Docker for the services

09:09 you depend on to really easily spin them up.

09:11 Right.

09:11 I need Redis running in this way and I need Postgres in that way.

09:15 And I just need them all configured and to be able to talk.

09:18 So Docker compose up, right?

09:20 Something like that.

09:21 Yeah.

09:22 Docker Compose is a way to sort of run a little network of services, makes that really easy.

09:26 Yeah.

09:27 Another big advantage before we get off of the development side of things is onboarding new

09:32 people and new hardware, right?

09:34 If you've got something really complicated like that and you get somebody on the team, instead

09:38 of spending a lot of time trying to get their system put together in the right way, you just

09:43 go, install Docker, do this.

09:45 Yeah.

09:45 There's, there's open source projects where like you can set up the development environment

09:49 but they'll also provide a Dockerfile to just let you run some tests easily, just because

09:55 you're not an ongoing contributor, you're only submitting one patch.

09:58 You want to run some, like some tests on it.

10:00 You don't want to go the whole thing.

10:02 So it's very nice when they provide a way to run the code in Docker.

10:05 Yeah, absolutely.

10:06 That said, I don't generally do my development in Docker.

10:09 I just, just have virtual environments and roll with that.

10:13 So it's not always required.

10:14 A couple of thoughts from folks out in the live stream.

10:16 Kim Van Wick.

10:17 Hey Kim.

10:18 Says Docker Compose is an excellent way to make sure all the developers are using the same

10:22 tools and versions.

10:23 And it's just much easier to pass around a YAML file.

10:25 Yeah.

10:26 Compose is like, I remember when Compose first came out and it was called Fig, I think.

10:31 And it was, it took Docker from something really neat to something really useful.

10:36 Yeah.

10:36 Right.

10:37 The promise of Docker is that I can have all these different, if I want to run, like we just

10:42 described, I want to run Redis.

10:43 I want to run maybe a Celery backend.

10:45 I want to run Postgres.

10:47 And then my dev code is going to run and talk to all that.

10:50 Well, keeping those all up to date, you know, start making sure they all build those files

10:55 and they all run and maybe they run in the right order.

10:57 Well, that all of a sudden isn't fun anymore.

10:59 But if you can create a Compose file and just say, here's the set of containers that needs

11:03 to work together, bring them all up in the right order and make sure they're all up to

11:07 date, you know, got their recent build and so on.

11:09 Like that's a whole nother level of the promise of containers.

11:12 Yeah.

11:13 Also, I don't know if you know anything about this.

11:15 I'll maybe take a wild guess here.

11:17 But Daniel Chen out in the live stream also says, question, is there in Windows, is there

11:22 any difference between WSL 2 for the Docker backend compared to Hyper-V or is Hyper-V more

11:28 for backwards compatibility legacy support?

11:30 I, in general, don't use Windows that much.

11:33 But basically, there's two, like, since you're running Linux and Windows is not Linux, you

11:38 need to have some way of running Linux.

11:40 In the past, the way you do that, you would run a virtual machine.

11:43 This is what it does on macOS too, I believe.

11:44 Windows subsystem for Linux is a way to transparently run Linux applications on Windows and Docker supports

11:52 it these days.

11:53 I suspect they would be faster, but that's just a guess.

11:56 So I don't have a good...

11:58 Yeah, that's my thought as well, that they would be a little more integrated.

12:00 Probably you could more easily do things like mount Windows file system folders from your

12:07 Docker container.

12:08 I don't know.

12:08 Maybe you still can't with the others.

12:09 Probably you can.

12:10 But...

12:10 I'd expect to be faster at least, yeah.

12:12 Yeah.

12:12 It seems like if you want to run on Linux, you're probably...

12:16 I think you're probably closer.

12:16 It's definitely more lightweight.

12:18 Hyper-V would be running a full-on Linux VM and then hosting Docker in that, I'm pretty

12:23 sure.

12:23 Yeah.

12:24 Yeah.

12:24 Cool.

12:24 All right.

12:25 Well, hopefully our guesses there, Daniel, are helpful.

12:27 All right.

12:28 Well, let's talk about your talk that you gave at PyCon.

12:32 I mean, giving talks today at conferences, like we started the show off, like it's a weird

12:36 world, right?

12:37 I'm giving a talk and both doing a live stream podcast on a conference tomorrow at the Manning

12:44 Developer Productivity Conference.

12:46 How can I do that?

12:47 We record it.

12:48 We publish it.

12:49 And then we have a live Q&A afterwards.

12:51 So the presentation of my recording will be during the Python Bytes recording tomorrow,

12:56 but then the live interactive bit will be actually after.

12:59 So yeah, that's the conference world we live in.

13:01 And so PyCon this year was virtual.

13:05 You put together a really nice presentation sort of in this format.

13:09 And yeah, like I said, I got a lot out of it and I liked what you covered there.

13:14 Yeah.

13:14 And so the starting point is you have the service, now you want to run it in production.

13:18 And this is a very dramatic departure from running things locally, because locally, the

13:25 thing you're prioritizing is basically your feedback loop, like your development feedback

13:31 loop.

13:31 Like if you're a web developer, you edit and you save your code, like can you reload

13:36 the page to have the new stuff running?

13:37 Other applications of feedback loops are a little different.

13:40 But like your goal is just as quickly as possible to interact with your code.

13:45 When you're in production, you have to worry about a whole bunch of other issues because

13:48 it's actually, you have users who are going to be interacting with the software or the data

13:54 it's emitting will be used in the real world.

13:56 It's no longer just something you're working on.

13:59 It's a thing that actually has some, the output actually has some weights and meaning, some

14:04 importance.

14:04 You have to approach it in a different way.

14:06 So some of the things that come to mind here would be downtime, you know, in a perfect

14:11 world, zero downtime in a reasonable world, a couple of seconds of downtime in the world

14:17 of some bizarre web companies that I cannot, I literally cannot understand eight hours of

14:23 downtime because we're deploying the new version of the site.

14:25 So Sunday it'll be down.

14:26 Like what?

14:27 I just literally, I got this a while, you know, a couple of months ago for something

14:32 I was using.

14:32 There's going to be hours of downtime for a site.

14:35 That's just, do, do, as you said, do you upgrade to the new version of the site?

14:39 That should be a button folks.

14:40 That should be not long.

14:41 Anyway, one of the things is downtime, right?

14:43 You want to focus on that and you don't care about that at all with development.

14:47 I mean, you want it to be somewhat responsive, but it doesn't matter if it's down for a moment.

14:53 This portion of Talk Python To Me is brought to you by Sentry.

14:55 How would you like to remove a little stress from your life?

14:58 Do you worry that users might be having difficulties or are encountering errors in your app right

15:03 now?

15:03 Would you even know it until they send that support email?

15:06 How much better would it be to have the error and performance details immediately sent to

15:11 you, including the call stack and values of local variables and the active user recorded

15:16 in that report?

15:17 With Sentry, this is not only possible, it's simple.

15:20 In fact, we use Sentry on all the Talk Python web properties.

15:24 We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we

15:30 got their support email.

15:31 That was a great email to write back.

15:32 We saw your error and have already rolled out the fix.

15:35 Imagine their surprise.

15:36 Surprise and delight your users today.

15:38 Create your Sentry account at talkpython.fm/sentry.

15:42 And if you sign up with the code talkpython2021, it's good for two months of Sentry's team plan,

15:48 which will give you up to 20 times as many monthly events as well as other features.

15:53 So just use that code talkpython2021 as your promo code when you sign up.

15:58 Another one that you made a big deal out of that can matter is security.

16:03 You don't want to be in the newspaper or the news website front pages for leaking the largest

16:10 data breach ever or something like that, right?

16:12 Yeah, that's embarrassing.

16:13 Basically, once you're packaging for production, you're at the intersection of a whole bunch

16:18 of processes.

16:18 This is where it starts getting complicated.

16:20 You're coding, and then you have this image, and then you might want to run some tests with

16:24 it, maybe integration tests, and you're going to deploy it.

16:27 And then when you deploy it, you might be upgrading an existing package, existing server.

16:32 It's a server.

16:32 It's a batch process.

16:33 Things are a little different.

16:35 So there's deployment, and then things might go wrong in production, and then you might

16:39 have some sort of feedback mechanism, and maybe you're going to try to reproduce the

16:42 bug locally.

16:43 So all of these different technological and organizational processes interact in some way with

16:50 your packaging.

16:50 And so it basically makes it a lot more complicated.

16:54 And then you add into it all the different technologies that are intersecting in packaging.

16:58 There's a lot of details to get right, and it gets complicated very quickly.

17:01 Another area that you want to get into has to do with making sure that you're running

17:07 the latest version, but you're not necessarily every deployment just grabbing the latest version.

17:14 So you need some way to inject stability, and you need some way that that stability doesn't

17:19 lock in computer vulnerabilities or any of those kinds of issues.

17:24 It also allows it to keep growing, right?

17:26 Yeah.

17:26 And this is sort of one of the more significant examples, but an example of the bigger picture,

17:33 which is packaging is a process.

17:35 And so it's not just about writing some configuration files.

17:39 It's going to interact with the way you write code, and it's a thing that parts of it are going

17:44 to continue over time.

17:45 So when you're packaging for production, you're not just writing a few config files and calling

17:50 it a day.

17:50 You actually need to think about, need to set up these ongoing processes for things

17:55 like security updates and for things like dependency updates.

17:58 Right.

17:58 It's one thing to get it running on a cluster, a container cluster.

18:01 It's another to say, and here's how we're going to keep this software healthy and running

18:06 over time, right?

18:08 Yeah.

18:08 You need to sort of think through the implications of what you're doing, and it's not just a

18:13 one-off thing.

18:13 It's an ongoing thing.

18:14 Yeah, absolutely.

18:15 All right.

18:16 Let's dive into some of the details.

18:18 So it turns out I discovered today, as I was pulling up your website, that you've actually

18:23 written a whole bunch of stuff about production-ready Docker packaging, and that you're actually

18:29 working on a handbook.

18:30 I end up doing this a lot as well.

18:32 I end up, I'll spend a month doing tons of research and examples and thinking about a course,

18:38 and I'm like, oh, there's a couple of nice presentations or conference talks I could pull

18:42 out of here.

18:42 And yeah, it's a good way to do it, right?

18:45 So you've been thinking a lot about this, not just for this talk, but beyond, right?

18:48 Yeah.

18:49 I've sort of spent two years on it so far.

18:52 Like, I have three different products up there.

18:56 I've done training.

18:56 There's like a lot of articles these days.

19:00 It just adds up.

19:01 And yeah, I've spent a lot of time looking into this because it turns out it's, I should say,

19:07 I don't like Docker packaging.

19:10 Okay.

19:11 This isn't a thing I'm doing because this is fun.

19:14 It's not actually fun.

19:16 It's kind of a pain.

19:17 It's just, it's very useful and it's very easy to get it wrong or to miss things.

19:23 And so what I've been trying to do is sit and say, here's this really useful thing.

19:28 Here are the details you need to get right.

19:30 And now that I've written it down, you don't have to waste your time trying to figure this

19:34 out because much of it is not, it's really useful, but it's not like, you don't feel like

19:39 you're a better person for having figured this out.

19:42 It's just, it's the getting, it's an obstacle.

19:45 And I'm trying to get people past those obstacles so they can use this useful technology.

19:48 Yeah.

19:49 Well, there's a lot of stuff that you talk about that is not necessarily something that

19:52 would be front of mind, like security, like how to manage the versioning over time and

19:57 so on.

19:58 But I think also it would be quite satisfying to have, you know, take something that's janky

20:03 and maybe it does have that like one hour banner on Sunday, we're going to be down from three

20:07 to four upgrading our site and, and be able to remove that and say, no, we just deploy a

20:12 couple of times a week now.

20:13 We don't think about it because it's git push prod and then wait 30 seconds.

20:19 And then prod is now the new one.

20:20 Right.

20:21 Yeah.

20:21 I think that's a very good feeling.

20:22 Yeah.

20:23 And there's a bunch more you need to do for that than what I talk about.

20:26 I'm talking about one piece.

20:27 There's also the deployment process and that there's having good observability and

20:31 logging, even just the packaging part of it.

20:33 Like there's a lot of details to get right that can make it a lot easier.

20:36 Yeah.

20:36 So the way you started the presentation, your first thought for this was that packaging,

20:42 this whole Docker in production, packaging your app in Docker for production is an iterative

20:48 process and maybe also layers, right?

20:51 Like, so you sit, you don't necessarily start with the whole, well, we want zero downtime.

20:56 You start with, can we make it run in Docker?

20:58 Yeah.

20:58 And you don't have to do it as an iterative process.

21:01 Like if you can manage to keep all this in your head, which honestly, I can't, but there's

21:05 too many details.

21:06 Or if I were to start Dockerizing something, I would probably do a bunch of it in one go

21:11 because I remember some of it.

21:12 But if you're doing this at your job and you probably are like someone's, there's going

21:17 to be an emergency or someone's going to pull you over or there's going to be a bug that

21:21 you have to fix.

21:21 And so you're going to get pulled away at some point.

21:23 And so what you'd like to do is build your packaging in a way where if you're interrupted,

21:29 you have to stop, you have to put it on hold.

21:31 You can put it aside and know you're at a good stopping point and you want to sort of prioritize

21:35 the most important ways.

21:37 So if you run, like you may just run out of time budget, like you just have limited time

21:42 and you want to do the highest priority things first.

21:44 And if you ever have more time, you can-

21:46 Well, it sounds like what you're describing is a little bit of what happened to the software

21:50 development side of things when it went from waterfall to agile or waterfall to something

21:55 better, right?

21:56 So many projects used to say, well, what we're going to do is we're going to build it until

21:59 we're going to work on it for six months.

22:01 And it's not going to be actually usable in any meaningful way until that six month point.

22:06 And then maybe it drags on and it goes over budget and gets canceled.

22:09 And there's just all sorts of, you know, you get no user feedback.

22:12 There's all these kinds of problems trying to build software.

22:14 So it doesn't surprise me that there would be lots of advantages to trying to apply that

22:19 same sort of iterative thinking.

22:20 Like, let's make sure that each step along the way, we have something that's useful and

22:25 more useful than it was before.

22:26 Yeah.

22:27 And also, even if you know you have the time budget to do all of it, having a good understanding

22:32 of the priorities means you can focus on the really important things.

22:36 Yeah.

22:37 Like large images are very visible.

22:39 Like it's very easy.

22:40 Like you can look at your image and say, why is this two gigabytes?

22:42 Like this is ridiculous.

22:44 And then you can go into this rabbit hole of like trying to make your image smaller.

22:48 And that's a fine thing to do.

22:50 But if you deploy an image that's insecure and nice and small, that's not the best trade-off.

22:57 I would venture to say that your organization would not praise you for your efforts to make

23:03 it small.

23:04 But rather, they would be pretty upset about the security problems, right?

23:07 Yeah.

23:07 And so like security seems like a high priority, like automation's high priority.

23:13 And the order you might actually do it, it might be slightly different depending on your

23:17 particular domain you're working on.

23:19 Right.

23:19 But yeah, you want to think about.

23:21 It's a bit of a priority stack, right?

23:23 Like what is most important to me or what is most foundational to the whole, this whole

23:27 packaging and Docker process.

23:29 Yeah.

23:29 So you actually put up some six points that you thought were sort of stages in your talk.

23:35 Yeah.

23:35 Get something working.

23:36 That's just, can you use Docker at all?

23:38 Obviously, if it doesn't work, it's not going to be useful.

23:40 And then number two, even before continuous integration, security.

23:43 I can see that people would overlook that, but that's not trivial, right?

23:47 Yeah.

23:48 And like security is sort of a never ending thing, like, because you have to deal with security

23:54 updates.

23:54 But if you want to, like, you don't need complete automation to like spin up a server somewhere

24:01 and test something.

24:02 Like if you're using Heroku, you can like push a Docker image to Heroku and spin up a server.

24:06 Yeah.

24:06 You could manually do a git push.

24:08 Yeah.

24:09 You can push stuff.

24:10 But, and then try it out.

24:12 But if it's talking to like a production database and it's not secure, that's the problem.

24:17 And packaging is only one small part of the security and most of it's going to be application

24:22 security, but it still comes into it and it's still important.

24:25 Right.

24:25 Little Bobby Tables is still a problem, even if it's running in Docker.

24:29 Yes.

24:30 That's right.

24:31 Okay.

24:32 So number two is security.

24:33 And then number three is continuous integration.

24:36 So making sure that like when we check in code that it's tested in Docker, I guess the tests

24:43 run, the system, something of a docker compose up type of thing happens and it's all good.

24:48 Then correctness and debuggability.

24:51 So correctness is obvious, right?

24:53 It needs to work.

24:54 It needs to have fresh data.

24:55 It can't have like stale caches and weird things like that.

24:58 But debuggability is interesting.

25:00 Maybe you want to focus on that for just a sec?

25:02 Sure.

25:03 So the idea is that like once you have automated builds, you might start actually running things

25:08 in production for real.

25:09 Or even if not, you're going to have, if you're building an image for every pull request, now

25:14 you have a bunch of images.

25:15 And so someone files a bug.

25:17 How do you know what version of your code, like which Docker image, do they know which Docker

25:20 image they were using?

25:21 What version of the code it matches to?

25:22 If something crashes, like, are you going to get actual, are you going to get logs that someone

25:27 can report or not?

25:28 If you don't go to the effort of exporting the volume where the logs get written,

25:33 every new deploy gets a new fresh set of logs.

25:36 Yeah.

25:36 So yeah, like logging to standard out or standard error is the other way you deal with logs in

25:42 Docker.

25:42 But you need to like put some minimal thought into like, where are my logs going to go?
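
[Editor's sketch] The "log to standard out" approach mentioned here can be set up with Python's standard `logging` module. This is a minimal sketch, not something shown in the episode; the function name `configure_logging` and the log format are assumptions made for illustration. The container runtime then collects whatever the process writes to stdout.

```python
import logging
import sys


def configure_logging(stream=sys.stdout, level=logging.INFO):
    """Send all log records to a stream (stdout by default) instead of a file."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers = [handler]  # replace any existing handlers on the root logger
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("app").info("service started")
```

With this in place nothing is written inside the container's file system, so logs are not lost when a new image is deployed.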

25:47 Yeah.

25:47 Another thing that gets really tricky around that kind of stuff, I feel, has to do with

25:52 the fact that there's just so many moving parts a lot of times.

25:54 You know, you've got your Celery Docker container.

25:57 You've got your Redis Docker container.

25:59 You've got your Postgres Docker container.

26:01 You've got your app Docker container.

26:03 You're doing microservices, like who knows how many.

26:06 And then all of those things have logs.

26:08 Do you use anything, any of the services that try to bring all those logs into one place?

26:13 Not a huge amount of experience on any particular one.

26:16 I have, or I've built a system like that, which these days I wouldn't recommend using, but

26:22 it's more for this point only for scientific computing.

26:24 Yeah.

26:24 But before the log as a service, log aggregation as a service was a thing.

26:30 Yeah.

26:30 I've worked on like an airline reservation system.

26:32 I had one of these.

26:33 And this was a like eye opener for me being able to see logs going between five services

26:38 in like multiple different protocols.

26:41 So you could make debugging vastly simpler.

26:44 Yeah.

26:44 So you really like anything that lets you trace across services with a tracing ID will

26:51 make your, like once you have more than one service, it'll make your life much easier.
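
[Editor's sketch] One way to picture the tracing ID idea: each service reuses the ID it received, stamps it on every log line, and forwards it on outgoing calls. The sketch below uses only the standard library. The header name `X-Trace-Id` and the helper names are hypothetical, chosen for illustration; a real tracing system will define its own.

```python
import contextvars
import logging
import uuid

# Hypothetical header name; use whatever your tracing system expects.
TRACE_HEADER = "X-Trace-Id"

_trace_id = contextvars.ContextVar("trace_id", default="-")


def start_request(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


def outgoing_headers():
    """Headers to attach when this service calls another service."""
    return {TRACE_HEADER: _trace_id.get()}


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record):
        record.trace_id = _trace_id.get()
        return True
```

Adding `TraceIdFilter()` to a log handler and including `%(trace_id)s` in its format string makes it possible to search the logs of several services for one request.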

26:54 However, I, my recommendation is avoid microservices unless your company has 500 people or something

27:02 like that.

27:02 Yeah.

27:03 It's an amazing architectural design pattern when there, there needs to be autonomy for

27:09 different parts of the application.

27:10 Like this team works on the front end little bits here.

27:13 This team works on the user authentication and identity part APIs.

27:18 And, but you know, looking at the most recent PSF JetBrains survey, that is not the number

27:25 of employees, number of people on a team type of description for most Python developers.

27:30 It's like a handful of folks all working on a lot of it.

27:34 Right.

27:34 Yeah.

27:34 And if you look at the companies that are doing microservices successfully at scale,

27:38 they will have a team of, you know, three to five people working on one service.

27:42 So if you have a team of five people, 20 services, you are doing like a hundred times more services

27:48 per developer than the big companies.

27:50 Yeah.

27:50 And that's a lot of complexity you've just added to your life and it is often unnecessary.

27:56 Yeah.

27:56 Here, let's, I'll pull up the survey.

27:59 So Python developer survey 2020 results.

28:02 If you search in that for team, not team city, employment work, working in a team versus

28:07 working independently, about half the people work on a team.

28:10 But if you look at the team size, 75% are two to seven.

28:13 Yeah.

28:13 That should be one microservice.

28:15 Yeah.

28:15 It's really microservices.

28:17 And there are applications that are actually, there are distributed systems where you actually,

28:21 it makes sense to do more, but like any time you make something more distributed, you

28:26 are adding a vast amount of complexity.

28:28 And so if you can avoid it, avoid it.

28:31 Yeah.

28:31 Well, here's the way that I think about it.

28:32 I think about microservices.

28:35 They have tons of value and they move where the complexity of your application lives.

28:41 So what you end up with is very simple, relatively simple, small, easy bits of code.

28:47 But what you also end up with is a much more complex deployment DevOps coordination story.

28:53 So when I think about microservices, the farther you go towards microservices, the more you're

28:57 taking the code complexity of a large app and architectural patterns and separation.

29:02 And you're saying, well, we don't need any of that.

29:03 Let's make it real simple.

29:04 And we move that complexity to coordinating a bunch of services that are always up, that are

29:09 debuggable across services and versioned and all those things.

29:13 And when I think about that for me, I'm way better at software complexity than I am at deployment

29:18 complexity.

29:18 So I'm more successful not putting it where I don't have my experience or skillset.

29:23 You know?

29:23 It's actually software complexity too, because if you call a function, you're going to call

29:28 a function.

29:29 If you send a message to another service, it may never arrive.

29:32 It may be delayed arbitrarily.

29:34 Yeah.

29:35 And so the communication becomes and the reliability of a thing you're calling, the switch from a

29:42 function within the same process to remote service is a sort of huge increase in unpredictability

29:48 and sources of error.

29:49 That's a good point.

29:50 Because for as many things as could go wrong with, say, calling some function or a

29:55 sort of class level method, not having it get called is not one of the things you'd have

30:00 to worry about.

30:01 Yeah.

30:02 Right?

30:02 You might crash because the file system isn't there, isn't accessible.

30:06 The database isn't there.

30:07 But it's not that you couldn't even call it, right?

30:09 That's going to happen.

30:10 Yeah.

30:10 Before we move on, Kim says, Filebeat, Logstash, Prometheus, and Portainer.

30:16 Oh, Portainer.

30:17 I've never heard of that one.

30:18 Can all help with logs from Docker in various ways.

30:20 Awesome.

30:20 Good resources to check out.

30:22 Okay.

30:22 Last one on your quick hit list before we dive into some of the details is faster builds

30:27 and smaller images.

30:28 I think we skipped reproducible builds.

30:31 Who needs reproducibility?

30:32 Yeah.

30:32 Let's go with reproducibility.

30:33 Yeah.

30:34 And so this is the static versus dynamic change of dependencies that you talked about,

30:39 where on the one hand, you really don't want every time you reinstall your application to

30:44 get the latest dependencies because a new version of Django comes out.

30:48 You don't want your code to suddenly start running on it just because it came out.

30:52 Right.

30:52 And maybe you're not aware, right?

30:53 Because you have the older version of Django, as you suggest, and you're working on it.

30:58 You git push.

30:59 It goes to CI.

31:00 And then that ideally is going to like some sort of continuous delivery and it's grabbed

31:05 just the latest, which is not what you had.

31:07 And then it runs with it.

31:08 Like that could be bad news.

31:09 Yeah.

31:09 This happens a lot with like dev tools.

31:11 And then like, oh, I'm going to go check if there's been a new release today.

31:15 And there it matters less.

31:16 But when it's your actual production software, that's not, oh, I wasted 20 minutes figuring

31:21 out a failed build.

31:22 It's my code is acting weird in production.

31:25 On the other hand, if you just freeze all your dependencies and never change them, then

31:29 at some point you're going to be running on a version of software from two or three or

31:34 five years ago.

31:35 Like I've, the extreme cases, like there are organizations still running on Python 2.

31:40 And this is, becomes very problematic because upgrades become more and more terrifying the more

31:48 you put them off.

31:49 Because it's not just like Django.

31:51 It's like, you have to upgrade Python and Django and three other major libraries you depend

31:56 on.

31:57 And it's, it's like this project.

32:00 And it's a project that's not features or bug fixes or anything.

32:04 It's just risk.

32:05 But if you put it off more, then it's more risk.

32:07 And so you need a process that's ongoing.

32:09 So you need both in the short term to make sure your builds are identical, reproducible,

32:14 or mostly identical.

32:16 And then in the long term you need the process to continuously update so that upgrades are

32:21 not this terrifying thing.

32:22 They're just a standard part of your development process.
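
[Editor's sketch] The short-term half of this, identical builds, usually starts with pinning every dependency to an exact version. The helper below is a hypothetical check written for illustration, not part of pip or any tool named in the episode. It flags requirement lines that are not pinned with `==`, and it is deliberately simplified: it ignores edge cases such as URL requirements.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned to an exact version with ==."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


example = """\
django==3.2.4
requests>=2.0
# a comment
numpy
"""
print(unpinned_requirements(example))  # prints ['requests>=2.0', 'numpy']
```

A check like this could run in CI so that an unpinned dependency fails the build. The long-term half, updating the pins regularly, still needs its own recurring process.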

32:24 Yeah.

32:24 I had Carlton and Will on from Django chat a while ago talking about deployment.

32:29 And we came up with the idea that there are basically two types of applications.

32:33 There's ones that you're going to continue to add features to, and you're going to care

32:37 about, and you have a, maybe a team dedicated to it.

32:40 And for those, you never want to be that far from the latest thing.

32:43 Just like you described, the farther you get, the more frightening and the more potential

32:47 problems you have if we take on the latest, right?

32:50 Because if you're on Django 2 and it's Django 4.5 is out in some future world, you're like,

32:56 well, we finally need to move to get that because the other one's gone fully unsupported.

32:59 Well, that's like you said, as a project.

33:01 Because if you're always kind of just in dev, sort of rolling into later one and then deciding

33:05 to roll that out, like that's a much smaller challenge.

33:08 So those should absolutely stay there.

33:10 Also, we talked about a set of apps, a type of app that falls into the, please don't

33:16 touch it.

33:16 And if you do touch it and break it, it's now your baby.

33:19 It's some horrible legacy code.

33:22 The person who created it probably doesn't work at the company anymore.

33:25 Nobody really likes it.

33:26 It's not important, but it needs to be there.

33:29 Like it's some internal app or something, right?

33:30 Like maybe those, you just freeze those in time.

33:34 They're very likely not public facing or something.

33:37 But certainly if you care about continuing to work on this thing and adding features to

33:41 it and it matters, then keep it not too far.

33:44 That's the tension.

33:45 You don't want to just constantly ship the latest thing because maybe that's a major release of

33:50 some library, but at the same time, you don't want to freeze it.

33:54 This portion of Talk Python To Me is sponsored by Linode.

33:57 Visit talkpython.fm/Linode to see why Linode has been voted the top infrastructure

34:02 as a service provider by both G2 and TrustRadius.

34:05 From their award-winning support, which is offered 24/7/365 to every level of user,

34:11 to the ease of use and setup, it's clear why developers have been trusting Linode for

34:15 projects both big and small since 2003.

34:18 Deploy your entire application stack with Linode's one-click app marketplace or build

34:24 it all from scratch and manage everything yourself with supported centralized tools like Terraform.

34:28 Linode offers the best price-to-performance value for all compute instances, including GPUs

34:34 as well as block storage, Kubernetes, and their upcoming bare metal release.

34:39 Linode makes cloud computing fast, simple, and affordable, allowing you to focus on your

34:44 projects, not your infrastructure.

34:46 Visit talkpython.fm/Linode and sign up with your Google account, your GitHub account,

34:52 or your email address, and you'll get $100 in credit.

34:55 That's talkpython.fm/Linode, or just click the link in your podcast player's show notes.

35:00 And thank them for supporting Talk Python.

35:02 Quick comment back to the monolith.

35:07 Tolines says, can you speak to microservices versus monolith, in particular for ML applications?

35:14 I think that's a little bit different.

35:16 I haven't really thought of it from an ML perspective.

35:17 You got thoughts?

35:18 A decent rule of thumb is, are you working on a web application where there's

35:23 hundreds of developers working on that application?

35:25 If the answer is yes,

35:27 then someone in the organization is going to bring up microservices.

35:32 Anything smaller than that, just don't think about it.

35:37 Once you're small enough, I tend to feel the same way about Kubernetes.

35:42 There's a lot of technologies for a company with 5,000 people or 500 people or 50 people

35:48 or five people or one person.

35:50 Each organizational size, you're going to want different technologies because different

35:54 architectures because, when I say applications, we'll have 500 developers or five developers

35:58 because your ability to specialize, your ability to build infrastructure are different.

36:03 And so if it's a thing that an organization that has thousands of developers working, like,

36:08 are you building Pinterest?

36:10 Probably not.

36:11 Then the technology choices Pinterest makes may not be relevant to you.

36:15 Yeah.

36:15 I think maybe another consideration is how much is part of that functionality shared?

36:19 Are you building an API that has some models that make some prediction that a whole bunch

36:24 of your company and different apps and websites and such might need?

36:28 And, you know, maybe that's its own thing.

36:30 But if it's only being shared in one place, maybe not.

36:34 There's an interesting article that was from 2019.

36:37 Might be worth people checking out.

36:39 It's called Give Me Back My Monolith from Craig Kerstiens.

36:44 Anyway, I'm not going to go into it here, but it's kind of an interesting read.

36:48 People can check that out if they want.

36:49 All right.

36:50 We talked about faster builds and small images now.

36:52 We're there.

36:53 Tell us about that.

36:54 People who are new to Docker, haven't done a lot with Docker.

36:56 There's a lot of things you can do to result in a smaller, physically a smaller image size,

37:04 right?

37:04 A smaller file on disk.

37:06 Yeah.

37:06 It's very easy to get a giant image in Docker because the Docker image format is basically

37:12 in many ways like a Git history.

37:14 So every time you make a change, it's not overriding.

37:17 It's adding.

37:18 So there's a history there.

37:19 The history is always there.

37:20 So if you delete a file, it doesn't make the Docker image any smaller if it was added in

37:25 a previous layer.

37:25 Right.

37:26 If it was added in a different layer, that's right.

37:27 Yeah.

37:28 If you structure things right, there's a bunch you can do to make your images smaller.

37:31 And similarly, Docker has a bunch of features to allow you to not have to run pip install

37:39 every single time you rebuild your image because the dependencies haven't changed.

37:42 So it can just cache those files for you, but you have to set it up right.

37:46 So it does that.

37:47 And you can go from a half an hour build to a one minute build depending on how you built

37:54 your Docker image.

37:55 So Alpine Linux being my current favorite example, although maybe that's going to get fixed over

37:59 the next year.

38:00 One of the things that's super interesting about that as well is the ordering as well as the

38:04 grouping of those commands can really matter.

38:08 So for example, if the first thing that you do in your Dockerfile is to copy over your source

38:14 code, then the next thing to do is to do an apt update, apt upgrade.

38:19 Then the next thing to do is to install the dependencies and so on and so on.

38:24 Right.

38:24 Every time any file that you're working on, even like an unrelated CSS file changes, everything

38:30 below that has to be rebuilt.

38:31 Right.

38:31 And if you make a point to say, well, let's reorder that.

38:34 So the very last thing we do is copy our files over.

38:37 Then as you make changes, like you won't, those other layers will just be up to date.

38:42 And one cool trick that I've seen that you can make that even better is somewhere in that

38:46 intermediate bit.

38:47 You could even copy your requirements or pyproject.toml file over and then install those and then

38:53 copy the rest of your code over, which looks like a stupid duplication.

38:57 Like, why are you just copying this one file?

38:58 You already copied it in the next one, but you can cache that pip install, pip compile step,

39:04 make it faster.
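
[Editor's sketch] The ordering effect described here can be shown with a toy model of the build cache. This is not Docker's actual implementation; it only encodes the rule discussed above, that once one step's inputs change, that step and every later step are rebuilt. The function name and the file names are made up for illustration.

```python
def layers_to_rebuild(steps, changed_inputs):
    """Toy model of Docker's build cache.

    steps: list of (instruction, inputs) tuples in Dockerfile order, where
    inputs is the set of files that instruction copies into the image.
    changed_inputs: set of files modified since the last build.
    Returns the instructions that must run again: the first step whose
    inputs changed, plus every step after it.
    """
    for index, (_, inputs) in enumerate(steps):
        if inputs & changed_inputs:
            return [instruction for instruction, _ in steps[index:]]
    return []


SOURCE_FILES = {"app.py", "static/site.css", "requirements.txt"}

# Code copied first: any source change invalidates the pip install layer.
code_first = [
    ("COPY . .", SOURCE_FILES),
    ("RUN pip install -r requirements.txt", set()),
]

# Requirements copied and installed first, the rest of the code copied last.
requirements_first = [
    ("COPY requirements.txt .", {"requirements.txt"}),
    ("RUN pip install -r requirements.txt", set()),
    ("COPY . .", SOURCE_FILES),
]

changed = {"static/site.css"}
print(layers_to_rebuild(code_first, changed))
# prints ['COPY . .', 'RUN pip install -r requirements.txt']
print(layers_to_rebuild(requirements_first, changed))
# prints ['COPY . .']
```

In the second ordering an unrelated CSS edit only repeats the final copy, which is why copying the requirements file on its own is worth the apparent duplication.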

39:04 Yeah.

39:05 But basically, if you understand how Docker caching works, then you can sort of structure

39:11 your Dockerfile in the right way.

39:12 And then you get caching.

39:14 Then you need extra steps to get it working in CI, but you can get a much faster build.

39:18 So it'll work.

39:19 The fast seems super clear.

39:20 Tell us about smaller though.

39:21 What do you really focus on for smaller?

39:23 So some of it is just things where various packaging tools are optimized for development

39:29 by default.

39:30 So if you do pip install and you do something like NumPy, this can be, or if you're, let's

39:35 say, let's go, the big packages, like things like TensorFlow, like these packages, some of

39:41 these packages are hundreds of megabytes.

39:42 Like they're just huge.

39:44 So you download the package and then it unpacks it and installs all the files.

39:49 And then by default, pip will keep a copy of that downloaded file in a directory.

39:53 Probably the intermediate build output.

39:55 The wheel file you downloaded.

39:57 Potentially all that stuff.

39:58 Yeah.

39:58 It'll keep a copy of the wheel file.

40:00 The idea is like you might be doing another virtual env in another two hours.

40:07 And so when you do pip install, this time it can just, it doesn't have to download it.

40:10 It can just use the cache version.

40:11 And for development, that's great.

40:13 But for a Docker image, you are never going to call pip install again.

40:18 So keeping this file is just like an extra 400 megabytes of disk space.

40:22 And so there's a command line option for pip install.

40:25 It's --no-cache-dir.

40:27 Yeah.

40:27 And then it doesn't keep a copy.

40:28 And now you've freed up a bunch of space by adding that, which can be

40:34 fairly considerable if you're doing certain, especially for data science tools.

40:38 Right.

40:39 You add it up across all the dependencies and their transitive dependencies and so on.

40:43 Right.

40:43 Yeah.

40:43 That's like just not storing files you don't need.
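
[Editor's sketch] In a Dockerfile this would typically be a `RUN pip install --no-cache-dir -r requirements.txt` step. The same flag can be passed when a build script drives pip from Python. The helper name below is hypothetical; `--no-cache-dir` itself is a real pip option.

```python
import sys


def pip_install_command(requirements_file="requirements.txt"):
    """Build a pip install command that does not keep downloaded wheels around."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",  # real pip flag: do not store downloads in pip's cache
        "-r", requirements_file,
    ]


if __name__ == "__main__":
    # Pass this list to subprocess.run(..., check=True) to run the install.
    print(" ".join(pip_install_command()))
```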

40:46 One of the mind shifts you got to get into to work with this stuff is you will never, ever

40:52 change the Docker configuration.

40:54 Right.

40:55 It's not like, oh, there's updates to Linux.

40:57 So I'm going to go in and like apt update it, or there's an update to my requirements.

41:00 So I'm going to reinstall the requirements.

41:02 You recreate a new Docker image and you throw away the old Docker image.

41:05 Right.

41:06 So there's a lot of the things that are there to make that next step.

41:09 Right.

41:10 pip installing, again, work well are just liabilities and negative effects on your Docker image.

41:16 Right.

41:16 Yeah.

41:17 Because Docker images are sort of designed to be treated as immutable artifacts, which is

41:23 sort of great.

41:24 But also like you're dealing with a whole bunch of tools that don't really have that assumption.

41:28 And so you have to figure out ways to make those two conflicting goals work together.

41:33 Yeah.

41:33 Another interesting thing that Peter brought up, Peter McKee in the other episode we did

41:39 not too long ago, was intermediate frameworks and all sorts of stuff.

41:43 It doesn't apply super well to the Python space, but maybe there's certain aspects, especially

41:49 in the data science side that might.

41:51 So for example, if you're going to install like the development setup for Python, not

41:55 just the ability to run, but to do pip installs and do all sorts of things.

41:59 The example he gave was if you're going to have something that runs in Go, well, one of the

42:04 steps you might install might do is like, well, we're going to install the Go compiler and

42:08 all that business.

42:08 And we're going to compile the artifact.

42:10 And then you would try to run it.

42:11 It's like, you create a separate container that will take the code and compile it and

42:16 give you the binary, just copy the binary without the compiler back in there.

42:20 So maybe there's some techniques like that.

42:22 I don't know.

42:22 I mean, I don't see it quite as well in Python because we can't fully package it up as reliably.

42:27 But yeah.

42:28 So a common, so this happens a bunch if you're compiling your own custom C extensions.

42:33 Yeah.

42:34 So one way you can deal with that is like, you can have a thing that generates wheels and

42:38 then like you build your Docker image, you just download the compiled wheel.

42:41 But if you want to do it in your Docker image, you're going to have to install a compiler,

42:45 but then you don't, that compiler package is going to be in your final image.

42:49 So it just makes your image bigger.

42:50 You don't need GCC.

42:51 And so you can use a multi-stage build, which is probably what he was describing.

42:56 Yeah.

42:56 So the one easy way to do that is you create a virtual env, install all your code, and then

43:02 you copy just the virtual env into a new Docker image.

43:06 And then the new Docker image just has the resulting self-contained virtual env, doesn't

43:10 have any of the compilers that you needed to build it.

43:11 Right.

43:11 No matter what else might've been over there.

43:14 Yeah.

43:14 Maybe you could even use something like PEX if you really cared to like compile that or

43:18 to bundle that into a zip and then run that directly.

43:20 I'm not sure, but possibly.

43:22 Yeah.

43:22 Tim out there in the live stream says, --no-cache-dir has made my evening.

43:27 Thanks.

43:27 I never thought about it.

43:28 The intermediate files from pip only from apt.

43:30 Yeah.

43:31 Then also Docker ONBUILD option can help a bit with that scenario.

43:35 Yeah.

43:36 But not including the dev tools.

43:36 Okay.

43:37 That's cool.

43:37 They may have deprecated ONBUILD, but I could be misremembering.

43:41 That would be too bad because I just learned about it.

43:43 Okay.

43:45 Those are the six things that you talk about in this iterative process or this layered process.

43:49 It's like step one, your first deliverable or your first package of this as Docker sprint

43:55 would be get something working in either the single container or the suite of containers

44:00 from Docker Compose.

44:01 Step two is make sure they're secure.

44:03 Step three, getting them running in CI.

44:04 Step four, make sure that they're correct and debuggable.

44:08 Number five is reproducibility with that balance of not exactly the latest, but not super old

44:14 and stale.

44:14 And then finally fast builds and small images.

44:17 Yeah.

44:17 And then along the way, there's a whole bunch of different things you can do depending what

44:22 tools you're using and like what your priorities are.

44:25 And yeah.

44:26 Can maybe give some examples if we have time.

44:28 Yeah, absolutely.

44:29 We got a little bit of time.

44:30 I thought that'd be fun.

44:31 So we could dive into just get something working, which is like a handful, a couple of lines

44:38 in a Dockerfile, like a simple Dockerfile.

44:39 Yeah.

44:40 Yeah.

44:40 You choose a base image, copy your code and run pip install and say, this is what I want

44:45 you to run when you start up.

44:46 And for many applications, that'll do the trick.

44:48 So I'm always wondering what is a good container base to start from, right?

44:53 So you have this Python 3.9 Slim Buster version as the base.

44:59 Yeah.

44:59 There's a bunch of different options, right?

45:00 What do you think?

45:01 So the first thing is you want a, these are all based on Linux distributions typically.

45:06 So you want a Linux distribution that's some sort of long-term support where like they are

45:10 both guaranteeing backwards compatibility in terms of like binary ABIs, but also in terms

45:16 of features, but they're also doing security backports.

45:19 So like Debian stable, Ubuntu long-term support, Red Hat Enterprise Linux, they all are going

45:24 to give you this stability guarantee.

45:27 They'll say, we'll give you a stable operating system with security updates to it.

45:31 So you want something that's based on one of those probably.

45:33 And then you typically are going to want an up-to-date Python.

45:37 And these distributions will sometimes backport new versions of Python.

45:43 And so you can use that.

45:44 So you can say, I'm going to use like Ubuntu long-term support from 2020.

45:50 And like it has Python 3.8 and maybe they just added 3.9.

45:54 I'm not sure.

45:54 I saw something to that effect.

45:56 And then you can go with that.

45:57 Or Docker maintains these things for the official, in quotes, Docker images for Python.

46:04 And basically what they do is they take Debian stable and then they compile all the different

46:09 versions of Python for it.

46:10 So you can get 3.7 or 3.8 or 3.9 and 3.10 when it comes out, regardless of what's in Debian

46:14 stable.

46:15 So it's Debian stable plus an extra Python.

46:17 So python:3.9 is Debian stable plus 3.9.

46:22 Then they have two variants.

46:24 One has a bunch of extra packages and one has fewer.

46:27 The one with fewer packages is -slim.

46:29 And then the -buster is which version of Debian you're using.

46:33 And the reason you don't have to specify that, but like maybe like at the end of the year,

46:38 maybe early next year, there's going to be a new version of Debian stable.

46:41 And so you don't want overnight to go from Debian 10 to Debian 11 as your base image.

46:46 You would probably want to just at least do that consciously, right?

46:51 And so saying -buster means I want to stick to Debian 10 Buster.

46:56 And for those who don't know, Debian Linux releases are based on Toy Story characters.

47:00 Nice.

47:01 So Buster is one of them, right?

47:03 Yeah.

47:03 And I don't remember what the next one is.

47:07 There's Debian unstable, which is named Sid.

47:09 It's always Debian unstable and it's always Sid.

47:12 They never release it.

47:13 That's cool.

47:14 All right.

47:14 So in this example, the Dockerfile says FROM python:3.9-slim-buster,

47:21 which means all the stuff that you described there.

47:24 And then you copy your files over, you run pip install to install the dependencies.

47:29 And then you just basically start your app as the entry point.

47:32 And that is, hey, we got something working.

47:34 This is probably an oversimplification.

47:36 There might be a database thing that also needs to start up and run its bits and so on.

47:41 But yeah, that's pretty much it, right?

47:43 It's pretty simple.

47:44 Typically pretty simple to just get everything's working because it's just install some packages

47:49 and then tell it to run the scripts when you run a container.

47:52 Yeah.

47:52 It's pretty much whatever you need to do to get a new machine set up to run this,

47:57 do that in this file and you're good to go.

47:59 Yeah.

47:59 And I guess seeing a comment on the chat and I should add that as far as I can tell,

48:04 Docker ONBUILD is not deprecated, but I'm not sure.

48:08 So maybe.

48:09 All right.

48:10 All right.

48:10 I'll have to look into it.

48:11 Sounds good.

48:12 So getting it working is super straightforward.

48:15 Yep.

48:15 But getting something secure is interesting.

48:18 Let me go back.

48:19 I think we might be skipping around a bit, but you're talking about having that version specified there of Slim Buster.

48:25 I know how we'll get new dependencies for the Python code up there.

48:30 And if there's some kind of security problem, what will probably happen is Dependabot on GitHub will send me a PR that says,

48:37 warning, warning, your version of web framework has such and such CVE.

48:41 We've created a PR.

48:42 You accept it.

48:43 You push it back to the right branch.

48:45 That kicks off the whole process and everything goes again, right?

48:49 That keeps like the flow of the somewhat fresh code and dependencies going through your system.

48:55 However, how do I keep that same thing happening for Linux, right?

49:00 Suppose Linux has some security vulnerability in the version that I've got,

49:04 or I've got Nginx running and it has something like that I need to update.

49:08 Like, what is the trigger that helps me know?

49:12 Like, what is the process that helps me know?

49:14 Oh, you need to, even if this is a somewhat stable, stale project that we haven't touched for a month,

49:19 you need to somehow go give it a kick to like force it to get the latest and do that again.

49:23 Because there's no automatic apt upgrade running there.

49:27 Yeah.

49:27 So one thing to note is some people assume that the Debian, like the official Python or even official Debian or whatever,

49:34 the official base images from Docker get security updates every time they come out.

49:38 They don't.

49:39 Some of them get updated fairly frequently.

49:42 Some of them, like the CentOS one, which I guess people are probably switching away from,

49:46 but for a while it was a lot of people probably using it.

49:49 The CentOS base image will not be updated for months at a time.

49:53 And so they are relying on, and the Debian ones, well, I have seen them like lag on security updates by two weeks.

50:00 So Debian has released a new security update, but the Docker image hasn't been updated.

50:04 Which is not ideal because you're telling all the hackers, here's the problem that you can just go look for in systems that lag on getting their patches.

50:11 Yeah.

50:12 And so you as someone who is creating a Docker image cannot rely on the base images to be up to date.

50:18 You need to install security updates when you build your Docker image.

50:22 But what's that look like?

50:23 Step two is apt update, apt upgrade -y, or something like that?

50:28 Yeah.

50:28 apt-get update, and then apt-get -y upgrade.

50:31 You can add a few more command line options to make your image smaller.

50:34 But yeah, it's basically you do an upgrade.

50:36 But there's a problem.

50:38 Docker has, as we talked about, has this caching thing, where if you rebuild an image and nothing's changed, it will just use a cache layer.

50:45 So when you rebuild your image, if you are using caching to speed up builds,

50:48 it'll look at the Dockerfile, look at the apt-get update, apt-get upgrade,

50:51 and say, well, this is unchanged.

50:53 Same command.

50:54 And so it'll just use a cache layer.

50:56 Yeah.

50:56 And you absolutely will very likely be doing that.

50:59 Because it's like five minutes versus three seconds to restart and build and test your app.

51:04 So everyone is going to be using the caching.

51:06 Maybe not CI/CD, but everywhere else.

51:09 I mean, you probably typically will want it in CI/CD too.

51:12 And so the result is that if you've set up caching to speed things up, that caching will ensure you don't get security updates.

51:19 And so basically what you have to do is just have this process where once a day or once a week,

51:26 or in response to CVEs coming out, you rebuild your image from scratch without caching.

51:31 So you can just say every night at 3 a.m. when no one's working, we are going to rebuild our image from scratch without caching.

51:40 And so our image will always have the latest security updates.
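As a rough sketch of what that scheduled job might run: the snippet below builds a `docker build` command that skips the layer cache and re-pulls the base image. The `--no-cache` and `--pull` flags are standard `docker build` options; the function names and the image name `yourimage:main` are placeholders for illustration, not something from the episode.

```python
import subprocess


def build_command(image: str, context: str = ".") -> list[str]:
    """Return a `docker build` command that ignores cached layers."""
    return [
        "docker", "build",
        "--pull",      # try to fetch a newer base image instead of the local copy
        "--no-cache",  # re-run every step, including the apt-get upgrade
        "-t", image,
        context,
    ]


def rebuild(image: str, context: str = ".") -> None:
    """Run the uncached build; raises CalledProcessError if the build fails."""
    subprocess.run(build_command(image, context), check=True)
```

Calling `rebuild("yourimage:main")` from a nightly cron entry or a scheduled CI job would give the "3 a.m. rebuild" described above, assuming Docker is installed where the job runs.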

51:45 And then if you're in a system that has continuous deployment, you can then just automatically deploy that.

51:50 Wait, how do you make the little banner that says you're going to be down Sunday from 3 to 5

51:53 when you do that part?

51:54 Just kidding.

51:55 Yeah.

51:56 So this is easier if you have a process that you trust enough to do automatic deploys like anytime you want.

52:02 But you basically have to rebuild your image from scratch without caching,

52:07 either whenever a security update comes out or just on a regular basis and redeploy if it's a server,

52:14 because you have these immutable artifacts that if you're running a VM, like you can just have like a cron job that installs security updates nightly.

52:24 Like there's the unattended-upgrades package in Debian.

52:26 For Docker images, you can't do that.

52:28 And so you have to rebuild from scratch with security updates and then redeploy.

52:32 And this is an ongoing process.

52:34 Yeah, I'm glad you pointed out the caching because it's not enough to go out and say,

52:38 oh, every once a day or once a week, we'll just do a Docker build.

52:42 Oh, it's up to date.

52:43 Actually, we're good.

52:43 Yeah, no.

52:44 Not so much.

52:46 Yeah.

52:47 Yeah.

52:47 And that's why I brought this up because I think it's tricky.

52:49 Like there's a natural flow that like kicks that refresh cycle off for code,

52:53 but not for the infrastructure itself, unless you think about it.

52:57 Yeah.

52:58 So you need to explicitly think about and set up these processes, either some way to get notified of CVEs

53:03 or, probably, a bunch of registries have security scanners.

53:09 They'll scan your images for security problems.

53:12 And so you can run those on a schedule, maybe.

53:14 Honestly, the easiest though is probably just do a forced rebuild 5 a.m. or something.

53:19 Yeah.

53:19 And that next time every developer that comes in and runs the command Docker compose up,

53:24 going to do a Docker build.

53:25 It's going to see the things out of date and it'll just trigger a let's get the fresh.

53:28 Yeah.

53:29 Yeah.

53:29 And it turns out security scanners also have some bad defaults.

53:34 So you'll get, there's a lot of security problems that are not really problems.

53:39 Like the upstream maintainer has closed it as won't fix, or it's not going to get fixed in Debian stable until the next release of Debian stable.

53:47 And so the Debian maintainers have basically decided that it is not worth fixing.

53:50 And so there's nothing you can do.

53:51 Most security scanners will flag those.

53:53 And so you'll run a scanner on an updated image and it'll say, you have 60 security vulnerabilities.

53:59 But you, if you turn on the flag that says only tell me about security vulnerabilities that I can actually fix,

54:04 that actually have updates from Debian.

54:05 And then you run that and it'll say you're fine.
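To make the "only tell me about what I can fix" idea concrete, here is a minimal sketch of that filter. The `Finding` shape is hypothetical and does not match any particular scanner's output format; real scanners typically expose this as a command-line option instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """One scanner result; this shape is hypothetical, not a real tool's format."""
    vuln_id: str
    package: str
    fixed_version: Optional[str]  # None when the distribution ships no fix


def actionable(findings: list[Finding]) -> list[Finding]:
    """Keep only the findings that a package upgrade could actually resolve."""
    return [f for f in findings if f.fixed_version is not None]
```

With this filter, a "won't fix" entry (no fixed version available) drops out of the report, and what remains is the list an `apt-get upgrade` plus rebuild could address.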

54:08 And that is probably a much more realistic assessment of your risk because it's like,

54:11 there are bugs that are never going to be fixed because the glibc maintainers have said,

54:16 no, won't fix.

54:18 This is not our problem.

54:19 It's like, it's not a real bug.

54:20 Yeah.

54:21 I suspect you could also get notified about things that like are not observable really.

54:25 So, oh, there's a problem in the system, but we actually have a firewall blocking that port and we have no interaction with it.

54:32 Right.

54:32 It's like, how much do you worry about those kinds of things?

54:34 You may as well upgrade and redeploy because maybe one day your firewall will fail,

54:38 but there's a whole bunch of just utter noise if you don't configure your security scanner

54:43 correctly.

54:44 Yeah.

54:44 All right.

54:45 Wrapping up this bit of the topic, Kim says forced rebuild is great for your own images based on Debian or other OS.

54:50 You probably still need some kind of scanning.

54:52 Yeah.

54:53 If you're not able to build it yourself.

54:54 Yeah.

54:54 Makes sense.

54:55 All right.

54:55 We got a little bit of time to touch on a couple of things.

54:57 One of the areas stage two was security.

55:01 You always want different layers.

55:03 I talked about a firewall.

55:04 We're talking about security updates and patches, but there's layers of security you want.

55:09 And one of the very straightforward ones is you probably don't want to run this as root.

55:13 And like certain systems will even warn you about this.

55:16 So if I try to sudo brew something on my Mac, it'll complain.

55:20 Like you should never, ever run brew as root.

55:21 What are you doing?

55:22 Are you crazy?

55:22 Stop doing that.

55:23 I think uWSGI might warn that you're running as root.

55:26 If you look at the logs when you started up.

55:28 So when I run Docker and I just get that simple, get started one, what does that run as?

55:33 Yeah.

55:33 So by default, Docker runs as root.

55:35 Oh, okay.

55:36 That kind of makes sense because all these system packages are designed to be installed as

55:40 root.

55:41 And so if you're going to install system packages or install security updates,

55:44 you have to be root by default.

55:45 But as soon as you've switched to installing your Python code,

55:49 you should stop being root: create a new user and switch to that user, because otherwise

55:54 your application will be running as root. Root inside a container is more restricted than root on the host,

55:58 but it's still not as restricted as a normal user.

56:02 And different runtime systems might take more aggressive steps to restrict what you can do.

56:08 And so sometimes it might be okay, but just a good best practice.

56:11 You're, you don't know where your things are going to run.

56:13 Things might change around.

56:14 Just don't run as root.
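One extra guard, not mentioned in the episode and offered here only as an assumption about what some teams might want, is for the application itself to refuse to start as root. A minimal sketch using the standard library (function names are illustrative):

```python
import os


def running_as_root(euid: int) -> bool:
    """On Linux, the root user always has effective user ID 0."""
    return euid == 0


def refuse_root() -> None:
    """Exit early if the current process was started as root."""
    if running_as_root(os.geteuid()):
        raise SystemExit("Refusing to run as root; switch to a non-root user.")
```

Calling `refuse_root()` at the top of the entry point turns a forgotten user switch in the image into an immediate, visible failure instead of a silent risk.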

56:15 What you're saying is basically, if you run as root in a Docker container and somebody takes over your container,

56:20 well, the worst thing they can do is like crash around inside of the container.

56:24 It's not like they now have full access to the machine, but, you know, maybe those rights are propagated onward.

56:30 Like maybe they can do something else to, I don't know, decrypt something that then gets them further in the network.

56:37 Like there's, there's challenges that could happen, right?

56:38 So it is much easier to escape a container and onto the host if you're running as root,

56:45 because you, like in Linux, security access is granted by these things called capabilities.

56:51 And if you're root, you have a little bit more capabilities.

56:54 It gives you a larger attack surface on the Linux kernel.

56:57 And so if there's a bug in the Linux kernel, it's easier to take it over if you're root.

57:01 There are other things you can do to like restrict all capabilities to containers,

57:05 even if you're running as a normal user.

57:06 So like if you're running the ping utility, for example, it gets a little bit of extra,

57:11 it gets an extra capability often.

57:13 So it can do a ping.

57:15 And then if there's a bug in the ping command, then you can sort of, and you can insert code into it somehow,

57:22 then it'll execute it with elevated privileges and you can do more stuff.

57:25 And so.

57:25 Yeah.

57:26 You don't want that.

57:27 Yeah.

57:28 And so you want to like run as a normal user that will restrict the attack surface on the Linux kernel,

57:33 like removing all capabilities or restrict the attack surface even more.

57:38 And you do these things.

57:39 And for many applications, it won't really matter too much, but it's very, it's not a lot of work.

57:45 And it's like a little bit more assurance that if someone does somehow take over,

57:49 you'll limit the damage they can cause because they're only restricted to this container.

57:53 Yeah.

57:53 Okay.

57:54 Good advice.

57:54 And that uses the useradd command together with the Dockerfile USER instruction.

57:56 Very cool.

57:57 And then let's see, what was the next one here?

58:00 We talked about the security updates.

58:02 Like that's, that's a challenge.

58:04 So what do I need to think about for continuous integration automated builds,

58:08 specifically with regard to Docker?

58:10 Is there anything special?

58:12 Like.

58:12 So first it's just doing the actual work of automating it.

58:15 So like you really, it's really nice if every time you push to your Git repository,

58:20 every time you open a pull request, it builds a Docker image for you.

58:23 Because then like you can test it.

58:26 Maybe you can write additional tests to actually use a Docker image, do integration tests, that sort of thing.

58:30 Yeah.

58:30 Like for example, there was a really cool framework or library.

58:34 I can't remember exactly what it is.

58:36 We talked about it on Python Bytes.

58:37 That instead of trying to mock out, say your database or, it was mostly databases.

58:43 There's a Docker, there's like a testing library you can use that will bring up a Docker container

58:48 running Mongo or Postgres or something, and then fill it with test data.

58:52 And you just connect those things and say, yeah, you can talk to the database.

58:55 You don't care.

58:56 It's test data.

58:57 Might as well.

58:57 Yeah.

58:58 Testing with a real database is so much easier these days.

59:00 It should just be the default.

59:01 Like if you're going to deploy with Postgres, you shouldn't test with SQLite because they're different enough

59:07 that there'll be bugs that you're going to miss.

59:09 Yeah.

59:10 And so once you have that automation of like building for every pull request,

59:14 you don't, you start having this issue where you don't want the image you built

59:19 for the feature-123 branch to overwrite your production image.

59:23 That would be awkward.

59:24 But you would still like continuous integration to do its job and say, you checked in this thing.

59:29 It was okay or not okay.

59:30 Yeah.

59:31 And it's useful to have like images uploaded for every pull request that you can download

59:35 and maybe play around with it.

59:36 But you don't want feature branches images to interfere with your production image.

59:40 One easy way to do that is to name your Docker images based on the Git branch.

59:46 So like you can just use the Git branch as the part after the colon, the tag.

59:50 So it can be like yourimage:main if it's the main branch, or yourimage:feature-123

59:55 if it's the feature-123 branch.

59:56 Yeah.

59:57 That works really well with like Git flow feature branch style programming as well.

01:00:01 I created an issue.

01:00:02 Then I create a branch named something along those lines.

01:00:05 Then I create a PR along those lines.

01:00:07 And oh, guess what?

01:00:08 Here's the container that goes with that thing, right?

01:00:10 Yeah.

01:00:10 And you can also do things like name your Docker image based on the Git commit.

01:00:14 So you can sort of go from Git commit to corresponding Docker image really easily.
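A small sketch of how a CI script might derive those tags. Docker tags may contain only letters, digits, underscores, periods, and hyphens, may not start with a period or hyphen, and are limited to 128 characters, so branch names such as `feature/123` need cleaning first. The function names and the repository name `yourimage` are placeholders.

```python
import re

MAX_TAG_LENGTH = 128  # Docker's limit on tag length


def tag_from_ref(ref: str) -> str:
    """Turn a Git branch name or commit hash into a valid Docker tag."""
    # Tags may contain only letters, digits, underscores, periods, and hyphens.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", ref)
    # A tag may not start with a period or a hyphen.
    tag = tag.lstrip(".-")
    if not tag:
        raise ValueError(f"cannot derive a Docker tag from {ref!r}")
    return tag[:MAX_TAG_LENGTH]


def image_name(repository: str, ref: str) -> str:
    """Combine a repository name and a Git ref as repository:tag."""
    return f"{repository}:{tag_from_ref(ref)}"
```

The same helper works for either scheme: pass the branch name to get a per-branch image, or pass the commit hash to get an image that maps back to one exact commit.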

01:00:18 Yeah.

01:00:19 That's a really cool idea.

01:00:19 I did find that package, by the way, in case people are interested.

01:00:22 It's called testcontainers-python.

01:00:25 Anyway, the idea is you just say like with MySqlContainer, do your tests.

01:00:30 And it like literally creates a Docker container with your test data and all that stuff.

01:00:34 So people can check that out.

01:00:35 That's kind of cool.

01:00:36 All right.

01:00:36 Well, we're getting a little bit short on time here.

01:00:39 What else do you want to throw out for people who are thinking about a lot of these best practices?

01:00:44 We touched on a lot of them, but I know there's plenty more to go.

01:00:47 Like for example, faster builds.

01:00:49 You talk about, say, pre-compiling the PYC files.

01:00:53 Yeah.

01:00:53 That actually gives you slower builds, but it'll give you a faster startup.

01:00:57 That's what I mean.

01:00:58 Yeah.

01:00:58 Sorry.

01:00:58 Since this comes up a bunch, Alpine Linux is not a thing you want, is often not a thing you

01:01:04 want to use for your Docker base image.

01:01:07 And the reason is Alpine Linux is often recommended if you want small images.

01:01:11 And it's kind of nice because you install, like installing the Alpine packages somehow, I don't

01:01:15 know what they do, but it's vastly faster than like installing Debian packages.

01:01:19 And you get small images and it's kind of nice.

01:01:22 Problem is Alpine Linux uses a different standard C library than most Linux distributions.

01:01:27 Most Linux distributions use glibc.

01:01:29 Alpine uses musl.

01:01:32 I don't know how to pronounce it.

01:01:33 And binary wheels are compiled by default on Linux for glibc.

01:01:39 And so if you install Python packages on Alpine Linux, you will not get binary wheels.

01:01:45 You're going to have to compile them from scratch.

01:01:46 And so what happens is people say, oh, I'm going to use Alpine Linux.

01:01:49 It's going to make my images smaller.

01:01:51 And they try to install like a Postgres package, which is pre-compiled and doesn't work.

01:01:55 They're like, okay, so now I'll install a compiler and I'll install the Postgres headers.

01:02:00 Now you have this image that has compiler and Postgres headers in it, and you have to compile

01:02:05 stuff.

01:02:05 And like when you get to like data science or scientific computing, you're like compiling

01:02:10 these massive packages that take a really long time to compile and all your builds are super

01:02:14 slow.

01:02:15 And you can do a whole bunch of work to then use a multi-stage build so your image is small.

01:02:21 And then you can use caching so the builds are fast.

01:02:23 And then, but, or you can just use a different Linux distribution and use binary wheels.

01:02:28 Don't cause yourself the challenge.

01:02:30 Just use something else, huh?

01:02:31 But there is a PEP that I believe was accepted to start the process of building wheels for Alpine.

01:02:38 And I've started seeing some packaging tools that have started adding support.

01:02:42 And so it may be that in a year or two, it'll be just like how everyone builds binary wheels for

01:02:48 manylinux, which is just glibc.

01:02:50 People might start building wheels, binary wheels for Alpine.

01:02:53 And when that happens, there'll be much less of an issue.

01:02:56 But until then, avoid using Alpine Linux as your base image.
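To see which C library a given wheel targets, you can look at the platform tag at the end of its filename. The sketch below assumes the PEP mentioned here is PEP 656, which defines the `musllinux` tag for musl-based distributions such as Alpine; `manylinux` tags indicate glibc. The function name and the wheel filenames in the usage note are made up for illustration.

```python
def wheel_libc(filename: str) -> str:
    """Classify a wheel by the C library its platform tag targets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The platform tag is the last dash-separated field before ".whl";
    # several platform tags may be joined with periods.
    platform_tags = filename[: -len(".whl")].split("-")[-1].split(".")
    if any(tag.startswith("manylinux") for tag in platform_tags):
        return "glibc"
    if any(tag.startswith("musllinux") for tag in platform_tags):
        return "musl"
    return "other"
```

For example, `wheel_libc("example-1.0-cp39-cp39-musllinux_1_1_x86_64.whl")` reports a musl wheel, the kind that pip on Alpine could install without compiling. If a package publishes only `manylinux` wheels, pip on Alpine falls back to building from source.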

01:02:59 Yeah.

01:03:00 You want to close it out with a PYC thing?

01:03:02 Sure.

01:03:03 So when you start up a Python program, it loads in your Python source files and then compiles them.

01:03:09 And compilation here is not really the same as compiling a C extension.

01:03:12 It's basically a one-to-one translation.

01:03:14 It compiles them to bytecode and writes them out as PYC files.

01:03:17 And then the next time you start, it can just load the PYC file and that will make your application

01:03:22 start up quickly.

01:03:23 And so if you're doing some sort of like, many applications, it doesn't matter.

01:03:27 If you're doing like a serverless kind of thing where like you want things to start up

01:03:32 really quickly, like having to compile the PYC is, it's going to add some startup time.

01:03:37 I guess any times where the container lifetime, the life cycle is short, right?

01:03:41 Let's say with a web app, you would start it and it would run for hours.

01:03:44 And so it doesn't matter, right?

01:03:45 Yeah.

01:03:46 So it's like it took you another 20 milliseconds to start off.

01:03:49 It's going to be run for three days.

01:03:51 But if you're doing like a serverless thing where like 20 milliseconds might be a significant

01:03:56 chunk of the latency of your service.

01:03:58 So when you build your Docker image, you can pre-compile all your PYC files and then it'll

01:04:03 be in the image and you won't have, then your startup will be faster.

01:04:05 And the reason you have to actually think about this is that Docker images are immutable.

01:04:09 So like your container starts up, it compiles and writes the PYCs, but those PYCs never make

01:04:14 it to the original image.

01:04:15 Every time you start the image, it has the same immutable artifact, unlike your local home directory.

01:04:20 And so if you really want the fastest startup, you can make your image a bit larger and compile

01:04:25 the PYCs.

01:04:26 Basically that becomes a step in your Docker build files to compile the PYCs ahead of time.
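A minimal sketch of that build step using the standard library's `compileall` module. The function name is illustrative; running `python -m compileall` on the same directory does the same thing from the command line.

```python
import compileall


def precompile(directory: str) -> bool:
    """Compile every .py file under `directory`; True if all of them compiled."""
    # quiet=1 prints only errors, which keeps the image build log short.
    return bool(compileall.compile_dir(directory, quiet=1))
```

Run during the image build, after the code and dependencies are installed, so the `.pyc` files land in `__pycache__` directories inside the image and containers started from it can skip the compile step.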

01:04:31 Yeah.

01:04:31 Okay.

01:04:31 Awesome.

01:04:32 Great advice.

01:04:33 Many, many tips.

01:04:33 I think we're going to have to leave it there.

01:04:36 We're getting basically running out of time, but yeah, really nice talk.

01:04:39 I'll link to your talk that you did at PyCon and thanks for coming here and sharing the audio

01:04:43 version with us.

01:04:44 Thanks for inviting me.

01:04:45 Of course.

01:04:46 Before you get out of here though, there's the final two questions.

01:04:48 If you're going to write some Python code, what editor do you use?

01:04:50 I use Spacemacs, which is kind of like they took Emacs and they configured it like 20 years.

01:04:56 That's like you're jumping 20 years into the future.

01:04:58 It's like it's Emacs, but with all the things you need pre-configured to actually have a

01:05:03 nice development environment.

01:05:04 And it has VI bindings and Emacs bindings.

01:05:07 I use the Emacs bindings.

01:05:08 But if you like Vim, you can use the VI bindings.

01:05:10 Yeah.

01:05:11 Cool.

01:05:11 Their subtitle and sub-subtitle is a community-driven Emacs distribution.

01:05:16 The best editor is neither Emacs nor Vim.

01:05:18 It's Emacs and Vim.

01:05:19 I honestly don't use the Vim bindings at all.

01:05:22 I'm using it for like, it does all the IDE stuff you want out of the box.

01:05:27 So it's just, it's a much more modern experience.

01:05:30 Okay.

01:05:30 Really cool.

01:05:31 And then notable PyPI package?

01:05:33 PyO3, which is a way to create Python extensions in Rust.

01:05:38 I've used it both for Fil, my memory profiler, and also I wrapped some Rust library

01:05:45 with it.

01:05:45 It's really, really nice way to create fast, safe extensions for Python.

01:05:51 And it comes, there's a packaging tool called Maturin, M-A-T-U-R-I-N, which was probably the

01:05:57 nicest Python packaging experience I've ever had.

01:05:59 Like you add like three lines, like you add a pyproject.toml file, which is like three

01:06:05 lines.

01:06:05 You add like a tiny bit of metadata and now you can build wheels and you can pip install

01:06:09 and it just works.

01:06:11 And it's just amazingly smooth development experience.

01:06:15 That's fantastic.

01:06:16 Yeah.

01:06:16 So basically if you're going to write C extensions, maybe reconsider that and write

01:06:21 them in Rust and use this if possible.

01:06:23 Yeah.

01:06:24 It's like Rust is like, gives you the same performance that you would get from C or C++, but it's

01:06:31 much safer.

01:06:31 And as someone who used to write C++ long ago, like I've learned it over the past

01:06:38 couple of years.

01:06:38 And it's like, it is the language I always wanted C++ to be.

01:06:42 Yeah.

01:06:42 I hear you.

01:06:43 I did a lot of C++ as well.

01:06:44 I was always bringing in these things like smart pointers and other stuff.

01:06:47 It's like, why does it have to be hard?

01:06:49 Can't we just like make this better?

01:06:50 It's not a simple language because if you want performance, you need to do work.

01:06:53 And it has a very different paradigm, but it's a really lovely language.

01:06:57 You'll write much safer code and PyO3 makes it really nice to write Python extensions.

01:07:02 Yeah.

01:07:02 Cool.

01:07:03 Cool.

01:07:03 And then Talden's out there has an interesting comment.

01:07:05 Is that like an isotope?

01:07:07 I think it's a reference to like oxidization, like O3, but.

01:07:12 Yeah.

01:07:13 Because it rusts.

01:07:14 And there's a lot of oxidizing happening.

01:07:15 Yeah.

01:07:16 Around.

01:07:17 There's also PyOxidizer.

01:07:18 Yeah.

01:07:18 There's another project.

01:07:19 There's another project.

01:07:19 Yeah.

01:07:20 That's like a completely different project, but yeah, it's another Rust pun.

01:07:25 Packaging.

01:07:26 Yeah, exactly.

01:07:26 Very, very cool.

01:07:27 All right.

01:07:28 Final call to action.

01:07:29 People are interested in this.

01:07:31 They want to go deeper.

01:07:31 You've got some various things that people can find on your website, pythonspeed.com/docker.

01:07:37 Yep.

01:07:37 Where do they go?

01:07:38 What do you tell them?

01:07:38 Yeah.

01:07:38 So if you go to pythonspeed.com/docker, there's a whole bunch of free articles about

01:07:43 various best practices.

01:07:44 If you're specifically interested in the process that we covered today, there's a PyCon talk,

01:07:49 but also if you go to pythonspeed.com slash Docker process, it's also linked on that page.

01:07:54 It's like an introduction to Dockerizing for production.

01:07:57 It's basically like a little mini ebook I wrote that's about 10 pages, but it goes over

01:08:02 this process that we talked about today in sort of prose, and talks about sort of the

01:08:06 decisions you have to make and how it integrates with your organizational processes.

01:08:09 That's super cool.

01:08:10 Yeah.

01:08:11 And on that site, the slash Docker, it has a bunch of articles and it has a very small

01:08:15 scroll bar and a lot of stuff below it.

01:08:17 So yeah, there's a lot of things going on.

01:08:19 People can go check out for more resources there, right?

01:08:21 Yeah.

01:08:22 And I have a bunch of paid products if anyone's interested about Docker packaging from intro

01:08:26 to much more detailed one.

01:08:28 And if you use the code talkpython, you can get a 15% discount.

01:08:32 Oh, fantastic.

01:08:33 Awesome.

01:08:33 Yeah.

01:08:34 So be sure to do that.

01:08:35 Thank you so much for being on the show and sharing a lot of your hard-earned Docker experience.

01:08:40 Yeah.

01:08:40 Thanks for inviting me.

01:08:41 You bet.

01:08:42 Great to talk to you.

01:08:42 You too.

01:08:43 Bye.

01:08:45 This has been another episode of Talk Python To Me.

01:08:47 Our guest in this episode was Itamar Turner-Trauring.

01:08:50 It has been brought to you by Sentry, Linode, and Assembly AI.

01:08:54 Take some stress out of your life.

01:08:56 Get notified immediately about errors in your web applications with Sentry.

01:09:00 Just visit talkpython.fm/sentry and get started for free.

01:09:04 And use the promo code talkpython2021 when you sign up.

01:09:09 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

01:09:15 Develop, deploy, and scale your modern applications faster and easier.

01:09:18 Visit talkpython.fm/Linode and click the Create Free Account button to get started.

01:09:23 Transcripts for this and all of our episodes are brought to you by Assembly AI.

01:09:27 Do you need a great automatic speech-to-text API?

01:09:30 Get human-level accuracy in just a few lines of code.

01:09:32 Visit talkpython.fm/assemblyai.

01:09:35 Want to level up your Python?

01:09:37 We have one of the largest catalogs of Python video courses over at Talk Python.

01:09:41 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:09:46 And best of all, there's not a subscription in sight.

01:09:49 Check it out for yourself at training.talkpython.fm.

01:09:52 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.

01:09:56 We should be right at the top.

01:09:58 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:10:03 and the direct RSS feed at /rss on talkpython.fm.

01:10:08 We're live streaming most of our recordings these days.

01:10:10 If you want to be part of the show and have your comments featured on the air,

01:10:14 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:18 This is your host, Michael Kennedy.

01:10:20 Thanks so much for listening.

01:10:21 I really appreciate it.

01:10:22 Now get out there and write some Python code.

01:10:24 I'll see you next time.

01:10:45 Thank you.