Chapter 5: Docker, Docker, Docker

Multiple times now, I've spoken about the power of moving to a single larger server. We saw the benefit: our apps gain the headroom to spike to 4x-8x their prior levels. Moreover, the idea of leaving the cloud, as DHH has laid out over his 6 or 7 blog posts, has serious merit. All the complexity, expense, and not-quite-fulfilled promises of cloud-native do not draw me in. Yet my business doesn't need $600,000 in hardware to run it. Leaving the cloud by purchasing a bunch of hardware like Basecamp did would surely be misguided.

When the discussion of cloud-native vs. simple single-server architectures comes up, it's usually the darkest of blacks contrasted with the brightest of whites. Black and white. One side claims you're crazy to run your own server. You'll clearly also need your own data center at your office location, complete with your own $1M generator and multiple fiber internet connections, a team of hardware specialists to keep it running, and all of that madness. They'll say how dare you consider "lifting and shifting" into the cloud like a mindless Luddite. No, you must design for auto-scaling Kubernetes backed by managed databases and 100 different serverless functions, all of which used to be a single FastAPI app with a coherent API.

I say "no thanks" to these premises. I want you to think about this one big server idea as a practical version of Basecamp's leaving the cloud. You get a single piece of (virtual) hardware that you do whatever you want with as if it were a massive server you physically owned within your location, but minus the team of hardware specialists, generators, etc., because it's in a simple and cost-effective cloud. For the cloud, think DigitalOcean, Linode, and Hetzner rather than AWS.

After all, what's the difference between the co-located, third-party-tended hardware Basecamp buys and a big VM dedicated to just you and your team? Very little, I think.

Does this single-server architecture apply to you? If you run your app entirely in some free tier, it probably doesn't apply (yet). If you have a $1M/yr cloud bill, our ideas here might be too simple for you (maybe). But if you run anything from a few small servers up to 20-30 containers, it's probably a good fit.

If we jump over to Hetzner, you'll see some of the VM options, many of them fully dedicated hardware. So again, how different is it from buying it yourself, Basecamp style? On the low end, you'll find a 2 CPU / 2 GB RAM system for $5/mo. On the high end, you can get a fully dedicated 48 CPU / 192 GB RAM system for $311/mo. All of these options come with 20 Gb/sec networking and $90/mo-$740/mo worth of bandwidth (at AWS prices) included for free.

If your workload fits somewhere between those two levels, then the setup we're about to discuss will work well for you.

Now it's time to talk about how to use this big server in a useful and safe way, all the while spending exceedingly little time on the ops side.

Just Docker Compose

The title might have given away where we are going. It's all about Docker and containers. Start with our one big server (say, 8 CPU and 16 GB RAM). Whether you choose a $25/mo, $50/mo, or even $300/mo server, you're likely to have many apps and services running there (such as Postgres or MongoDB). We want to partition our apps and their dependencies into isolated blocks we can manage separately. Docker is a perfect fit for this. As discussed in the previous chapter, apps that run in Docker effectively run directly on the host machine with clever isolation. There is basically zero performance overhead when using Docker. This is not the case for virtual machines or multiple small cloud servers.
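
To give a feel for what one of those isolated blocks can look like, here's a minimal, illustrative docker-compose.yml for a single app and its database. The service names, image tags, ports, and volume names are placeholders, not anything from my actual setup.

```yaml
# One isolated "block": a web app plus its database, managed as a unit.
# Names, images, and ports below are illustrative placeholders.
services:
  web:
    build: .                 # this app's own Dockerfile
    ports:
      - "8001:8000"          # only this app's port is exposed on the host
    depends_on:
      - db

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me       # placeholder; use real secrets handling
    volumes:
      - db-data:/var/lib/postgresql/data # data survives container rebuilds

volumes:
  db-data:
```

Each app gets its own little file like this in its own directory, so docker compose up -d and docker compose down only ever touch that one block, never its neighbors.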

Containers can get very complicated very fast. The good news is that you don't have to go down that route. How much availability do you really need? Could your web app or API handle being down for 19 sec/month? If it can, you can host your containers on a single machine without load balancers and other complexity. This limited level of disruption still equates to five 9's of uptime (99.999% up).

How did I come up with this number? Let's say you do ten deploys per week, roughly 43 per month. At 200ms per deployment (we'll see how to accomplish this later), that's roughly 9 seconds of aggregate downtime from restarting your containers. Another 10 seconds comes from the monthly Linux kernel security patches that require a reboot of the host machine. So it's 19 seconds out of the 2,592,000 seconds in a 30-day month = 0.0007330247% downtime or, seen in the positive light, 99.9992669753% uptime.

Pushing for 0% downtime dramatically increases the complexity and cost. The team behind Zig (the programming language) recently wrote a very interesting and germane post entitled "Migrating from AWS to Self-Hosting." They talk a lot about cost, but there's also an undertone of complexity running through their discussion:

"It's not really an emergency if [ziglang.org] goes down, and 99% uptime is perfectly fine for this use case. The funny thing about that last 1% of uptime is that it represents 99% of the costs."

For us it's the last 0.0007% rather than the last 1%, but the point is the same.

One more quote while we're on this topic. I've been discussing these ideas on social media as part of writing this book, and one person had an interesting reply after researching a bunch of self-hosting platforms and orchestration engines:

I really went down a rabbit hole... I'm starting to see the value of "just docker-compose." 😁

Exactly. There is nothing inherently wrong with Kubernetes or self-hosted Heroku equivalents. But my sense is that once people start down the path of these opaque hosting options, it leads to adopting more cloud-specific APIs and services, which then lead to even more. For example, a Kubernetes cluster soon needs log aggregation. Log aggregation needs log monitoring. And so on and on.

Each of our full-stack setups has its own Docker Compose configuration. We're running MongoDB for the database and Granian as the Python web server, scaled out to 4 workers per app. Docker Compose is set to auto-start on Linux boot, and the Docker apps (e.g., Granian hosting Talk Python Training) are set to restart "unless stopped."
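
To make that concrete, here's a sketch of what one of these per-app compose files might look like. It assumes a Granian-served ASGI app; the image, module path, ports, and volume names are illustrative, not our exact production configuration.

```yaml
# Sketch of one app's compose file: Granian serving a Python web app, plus MongoDB.
# The module path app.main:app and the ports/volumes are hypothetical.
services:
  web:
    build: .
    # Granian as the ASGI server with 4 worker processes
    command: granian --interface asgi --workers 4 --host 0.0.0.0 --port 8000 app.main:app
    ports:
      - "8000:8000"
    depends_on:
      - mongo
    restart: unless-stopped   # come back after crashes and host reboots

  mongo:
    image: mongo:7
    volumes:
      - mongo-data:/data/db
    restart: unless-stopped

volumes:
  mongo-data:
```

The auto-start-on-boot half comes from the host itself: with the Docker service enabled at boot (e.g., sudo systemctl enable docker on a systemd distro), the daemon starts with Linux and brings every unless-stopped container back up, no extra tooling required.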

As we dive deeper into our workflows and setups throughout this book, I'll give you concrete tools and techniques for building highly optimized Python web apps and APIs running in Docker containers. We'll see many of the benefits I laid out above and some surprising ones that really level up our infrastructure and privacy game.
