Python Apps that Scale to Billions of Users
While Python can be fast at some operations and slow at others, scaling is generally less about raw language performance than about building an architecture suited to that scale. That's why it's great to have Julien Danjou on the show today. We'll dive into his book "The Hacker's Guide to Scaling Python" as well as the performance work he's doing over at Datadog.
Episode Deep Dive
Guest Introduction and Background
Julien Danjou is a seasoned Python developer and performance engineer with extensive experience on large-scale Python projects such as OpenStack. He has also contributed to performance and profiling tools at Datadog. Julien is the author of The Hacker’s Guide to Scaling Python, which focuses on pragmatic ways to build Python applications that can scale to billions of requests while balancing real-world trade-offs.
What to Know If You're New to Python
If you are still early on your Python journey, here are a few essentials to set the context for this discussion:
- You don’t need to be an expert in concurrency or microservices before you begin. A clear understanding of how Python handles multiple processes and threads will be enough to follow the conversation.
- Some libraries and tools mentioned (e.g., cProfile, async/await features, or queue systems like Celery) may be new to you. Just know they all help manage performance or concurrency in Python.
- Having a basic handle on Python packaging (installing via pip, working with virtual environments) will be helpful when you hear about profiling or caching packages.
Key Points and Takeaways
- It’s Not (Always) About Speed—It’s About Architecture
Properly scaling Python to billions of requests is less about micro-optimizations in the language and more about the overall application design. Handling massive traffic often requires using multiple processes, caching, asynchronous I/O, and smart architecture decisions that let you meet ever-increasing user demands without dramatically rewriting code.
- Links and Tools:
- scaling-python.com (Julien’s book: The Hacker’s Guide to Scaling Python)
- Datadog – Offers continuous profiling and observability tools
- Demystifying Python’s Global Interpreter Lock (GIL)
Many developers encounter the GIL and conclude Python “cannot scale.” In truth, the GIL only prevents more than one thread from executing Python bytecode at a time, and you can work around it with multiprocessing, native libraries that release the GIL, or asynchronous approaches. Understanding where the GIL does and does not matter is crucial to writing high-throughput applications in Python (see the sketch after this list).
- Links and Tools:
- Python docs on GIL – Official explanation
- C extensions like NumPy – They often release the GIL for heavy numeric work
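To make the GIL concrete, here is a minimal sketch (ours, not from the episode) that runs the same CPU-bound function with four threads and then four processes. The thread version is effectively serialized by the GIL, while the process version can use multiple cores:

```python
# Compare a CPU-bound workload under threads (GIL-serialized) vs processes.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic holds the GIL for the duration.
    return sum(i * i for i in range(n))

def timed(executor_cls, label: str) -> None:
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(cpu_bound, [5_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "threads")     # roughly serial due to the GIL
    timed(ProcessPoolExecutor, "processes")  # runs across multiple cores
```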
- Leveraging Multiple Processes Over Threads
Rather than threads, many Python services scale by spawning multiple processes. Tools such as gunicorn or uWSGI can start multiple worker processes and thus bypass the GIL restriction for CPU-bound code. This approach can utilize multi-core systems effectively and is especially common for Python web apps (see the sketch after this list).
- Links and Tools:
- gunicorn.org – Python WSGI HTTP Server
- uWSGI – Another popular server for Python
- Asyncio and Non-Blocking I/O
Python’s async and await keywords, introduced in Python 3.5, let you scale I/O-intensive tasks (like network requests) without using multiple threads or processes. This is especially handy for web servers or services that spend a lot of time waiting on I/O, such as reading from APIs or databases (see the sketch after this list).
- Links and Tools:
- Asyncio documentation
- Starlette and FastAPI – Async-friendly web frameworks
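Here is a minimal, self-contained sketch of the pattern; asyncio.sleep stands in for real network calls so the example runs without extra packages:

```python
# Three "requests" wait concurrently, so the total time is ~1s, not ~2.25s.
import asyncio

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulate waiting on an API or database
    return f"{name} done after {delay}s"

async def main() -> None:
    results = await asyncio.gather(
        fetch("api", 1.0),
        fetch("db", 0.75),
        fetch("cache", 0.5),
    )
    for line in results:
        print(line)

if __name__ == "__main__":
    asyncio.run(main())
```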
- Design for Failure
As you scale, the likelihood of network hiccups, hardware failures, or unpredictable slowdowns increases. Building resilience means using retries with backoff and, if necessary, queuing or caching to handle intermittent outages. It’s often worth writing explicit exception handling for connection loss and planning how your system recovers (see the sketch below).
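As a sketch of retries with backoff, the tenacity package (which appears in the show links below) makes this nearly declarative; the function, URL, and retry settings here are illustrative assumptions:

```python
# Retry a flaky network call with exponential backoff using tenacity.
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),                     # give up after 5 tries
    wait=wait_exponential(multiplier=0.5, max=10),  # 0.5s, 1s, 2s, ... capped at 10s
)
def fetch_profile(user_id: int) -> dict:
    # Hypothetical endpoint; any transient failure triggers a retry.
    resp = requests.get(f"https://api.example.com/users/{user_id}", timeout=5)
    resp.raise_for_status()  # treat HTTP errors as retryable failures
    return resp.json()
```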
- Caching as the Secret Weapon
Caching can be a simple yet powerful way to speed up applications. Whether you cache locally in a Python dictionary or externally in tools like Redis or memcached, caching helps absorb large traffic spikes and reduce repetitive heavy operations. Striking a balance between fresh data and cached data is vital; so is having a good strategy for cache invalidation (see the sketch after this list).
- Links and Tools:
- Redis.io – In-memory data store
- memcached.org – Popular caching system
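Here is a minimal sketch of both flavors: in-process memoization with functools.lru_cache, and a shared Redis cache with a TTL. The Redis server on localhost and the 5-minute TTL are assumptions:

```python
import functools
import json

import redis  # assumes `pip install redis` and a server on localhost:6379

r = redis.Redis()

@functools.lru_cache(maxsize=1024)
def expensive_local(n: int) -> int:
    # In-process cache: repeated calls with the same n return instantly.
    return sum(i * i for i in range(n))

def get_user_cached(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    user = {"id": user_id, "name": "..."}  # stands in for a database query
    r.setex(key, 300, json.dumps(user))    # expire after 5 minutes
    return user
```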
- Continuous Profiling and Performance Analysis
Understanding why your code is slow is the first step to improving it. Profilers like cProfile or sampling profilers in Datadog can highlight CPU or I/O hot spots. Continuous profiling in production with a low-overhead, sampling-based profiler lets you see real-world loads and spot inefficiencies that might be missed in synthetic tests (see the sketch after this list).
- Links and Tools:
- cProfile docs – Built-in Python profiler
- dd-trace-py – Datadog’s Python tracing and profiling library
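For local investigation, the built-in cProfile is a reasonable starting point. This minimal sketch profiles a deliberately slow function and prints the top entries by cumulative time:

```python
import cProfile
import pstats

def slow_path() -> list:
    # Deliberately wasteful work to give the profiler something to show.
    return sorted(str(i) for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
slow_path()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time

# Or, from the command line against a whole script:
#   python -m cProfile -s cumulative myscript.py
```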
- Queues for Decoupling and Scalability
Offloading long-running or resource-intensive tasks to background workers can dramatically increase responsiveness and reliability. If a web endpoint triggers something requiring 30 seconds of compute time, it’s far better to enqueue that job, return quickly, and have workers process the queue in the background (see the sketch after this list).
- Links and Tools:
- Celery – A popular Python task queue
- RabbitMQ – Commonly used broker for Celery
- Redis Queue (RQ) – Another popular queue library for Python
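A minimal Celery sketch of that pattern might look like the following; the Redis broker URL and the task body are assumptions:

```python
# tasks.py: define a background task that web handlers can enqueue.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def render_report(report_id: int) -> str:
    # Imagine ~30 seconds of compute here; the web endpoint never waits on it.
    return f"report {report_id} rendered"

# From the web handler, enqueue and return immediately:
#   render_report.delay(42)
# Run a worker with:
#   celery -A tasks worker --loglevel=info
```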
- Load Testing With Realistic Traffic
Before optimizing, measure your system’s performance with realistic or near-realistic workloads. Tools like Locust let you simulate user behavior, incorporate random delays, and measure overall throughput. This ensures you spend time fixing genuine bottlenecks, not just assumptions (see the sketch after this list).
- Links and Tools:
- Locust.io – Python-based load testing
- pytest-benchmark – Useful for smaller-scale performance checks
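A minimal locustfile sketch (the endpoints and task weights are hypothetical) could look like this:

```python
# locustfile.py: each simulated user hits two endpoints with a 1-3s pause.
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)  # random think time between requests

    @task(3)  # weighted: browsing happens 3x as often as checkout
    def browse(self):
        self.client.get("/products")

    @task(1)
    def checkout(self):
        self.client.post("/cart/checkout", json={"items": [1, 2]})

# Run with:
#   locust -f locustfile.py --host https://staging.example.com
```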
- Embrace Functional and Stateless Patterns
Writing your code in a functional style, with pure functions that have no side effects, can make distributing or parallelizing your work across processes simpler. The more each function depends solely on its inputs, the easier it becomes to run those functions on separate nodes or across cloud services (see the sketch below).
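As a small sketch of the payoff, a pure function can be fanned out with multiprocessing with no shared state to coordinate; the scoring function and data here are made up for illustration:

```python
from multiprocessing import Pool

def score(record: dict) -> float:
    # Pure: no globals, no I/O; the same input always yields the same output,
    # so it is safe to run on any worker process (or, by extension, any node).
    return record["hits"] / max(record["requests"], 1)

if __name__ == "__main__":
    records = [{"hits": h, "requests": 100} for h in range(0, 101, 10)]
    with Pool(processes=4) as pool:
        print(pool.map(score, records))
```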
Interesting Quotes and Stories
“Premature optimization is the root of all evil.” – Quoting Donald Knuth, emphasizing the importance of measuring performance before spending time optimizing.
“Python is not fast or slow. It’s about how you design your system.” – Julien, comparing the language’s performance trade-offs to the architecture you choose to implement.
“If you ever want to know where you’re spending 80% of your CPU time, measure it. Otherwise, you’re just guessing.” – Underscoring the value of profiling.
Key Definitions and Terms
- GIL (Global Interpreter Lock): A mechanism in CPython that allows only one thread to execute Python bytecode at a time.
- Sampling Profiler: A tool that periodically checks which functions are running to estimate where time is spent, with low overhead.
- Stateless Architecture: A design principle where no local data is preserved from request to request, making horizontal scaling and failover more straightforward.
Learning Resources
If you want to deepen your Python journey or brush up on core skills as you explore high-scale architectures, consider these:
- Python for Absolute Beginners (Talk Python Training): An excellent place to level up if you’re still new to the Python language basics and coding concepts.
- Building Data-Driven Web Apps with Flask and SQLAlchemy (Talk Python Training): Learn to build robust web applications and see how scaling fits into a typical Flask+Database setup.
- Async Techniques and Examples in Python (Talk Python Training): Dive deeper into asynchronous and parallel programming, including concurrency options for CPU and I/O-bound scenarios.
Overall Takeaway
Scaling Python apps to billions of requests involves building an architecture tailored to real workloads rather than fixating on raw language speed. Julian’s insights show how multiprocessing, async/await, profilers, caching, and queues each play a role in letting Python applications flourish under high concurrency. By carefully measuring performance, designing for failure, and relying on proven patterns, you can harness Python’s ease of development and still handle massive traffic.
Links from the show
Scaling Python Book: scaling-python.com
DD Trace production profiling code: github.com
Futurist package: pypi.org
Tenacity package: tenacity.readthedocs.io
Cotyledon package: cotyledon.readthedocs.io
Locust.io Load Testing: locust.io
Datadog: datadoghq.com
daiquiri package: daiquiri.readthedocs.io
YouTube Live Stream Video: youtube.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy