
Python Apps that Scale to Billions of Users

Episode #312, published Sun, Apr 18, 2021, recorded Thu, Apr 8, 2021

How do you build Python applications that can handle literally billions of requests? It has certainly been done to great success at places like YouTube (handling 1M requests/sec) and Instagram, as well as for internal pricing APIs at PayPal and other banks.

While Python can be fast at some operations and slow at others, scaling is generally less about raw language performance than about building an architecture for this scale. That's why it's great to have Julien Danjou on the show today. We'll dive into his book "The Hacker's Guide to Scaling Python" as well as some of the performance work he's doing over at Datadog.


Episode Deep Dive

Guest Introduction and Background

Julien Danjou is a seasoned Python developer and performance engineer with extensive experience on large-scale Python projects such as OpenStack. He has also contributed to performance and profiling tools at Datadog. Julien is the author of The Hacker’s Guide to Scaling Python, which focuses on pragmatic ways to build Python applications that scale to billions of requests while balancing real-world trade-offs.

What to Know If You're New to Python

If you are still early on your Python journey, here are a few essentials to set the context for this discussion:

  • You don’t need to be an expert in concurrency or microservices before you begin. A clear understanding of how Python handles multiple processes and threads will be enough to follow the conversation.
  • Some libraries and tools mentioned (e.g., cProfile, async/await features, or queue systems like Celery) may be new to you. Just know they all help manage performance or concurrency in Python.
  • Having a basic handle on Python packaging (installing via pip, working with virtual environments) will be helpful when you hear about profiling or caching packages.

Key Points and Takeaways

  1. It’s Not (Always) About Speed—It’s About Architecture Properly scaling Python to billions of requests is less about micro-optimizations in the language and more about the overall application design. Handling massive traffic often requires using multiple processes, caching, asynchronous I/O, and smart architecture decisions that let you meet ever-increasing user demands without dramatically rewriting code.
    • Links and Tools:
      • scaling-python.com (Julien’s book: The Hacker’s Guide to Scaling Python)
      • Datadog – Offers continuous profiling and observability tools
  2. Demystifying Python’s Global Interpreter Lock (GIL) Many developers encounter the GIL and conclude that Python “cannot scale.” In truth, the GIL only prevents more than one thread at a time from executing Python bytecode, and you can work around it with multiprocessing, native libraries that release the GIL, or asynchronous approaches. Understanding where the GIL does and does not matter is crucial to writing high-throughput applications in Python (see the first sketch after this list).
  3. Leveraging Multiple Processes Over Threads Rather than threads, many Python services scale by spawning multiple processes. Tools such as gunicorn or uWSGI can start multiple worker processes and thus bypass the GIL restriction for CPU-bound code. This approach uses multi-core systems effectively and is especially common for Python web apps (the first sketch after this list contrasts the two approaches).
    • Links and Tools:
      • gunicorn.org – Python WSGI HTTP Server
      • uWSGI – Another popular server for Python
  4. Asyncio and Non-Blocking I/O Python’s async and await keywords, introduced in Python 3.5, let you scale I/O-intensive tasks (like network requests) without using multiple threads or processes. This is especially handy for web servers or services that spend much of their time waiting on I/O, such as reading from APIs or databases (see the asyncio sketch after this list).
  5. Design for Failure As you scale, the likelihood of network hiccups, hardware failures, or unpredictable slowdowns increases. Building resilience means using retries with backoff and, if necessary, queuing or caching to handle intermittent outages. It’s often worth writing explicit exception handling for connection loss and planning how your system recovers (see the retry sketch after this list).
    • Links and Tools:
      • Tenacity – A retry library authored by Julien
      • Celery – Task queue system for asynchronous workloads
  6. Caching as the Secret Weapon Caching can be a simple yet powerful way to speed up applications. Whether you cache locally in a Python dictionary or externally in tools like Redis or memcached, caching helps absorb large traffic spikes and reduces repetitive heavy operations. Striking a balance between fresh data and cached data is vital; so is having a good strategy for cache invalidation (a small caching sketch follows this list).
  7. Continuous Profiling and Performance Analysis Understanding why your code is slow is the first step to improving it. Profilers like cProfile or the sampling profiler in Datadog can highlight CPU or I/O hotspots. Continuous profiling in production with a low-overhead, sampling-based profiler lets you see real-world loads and spot inefficiencies that might be missed in synthetic tests (see the profiling sketch after this list).
  8. Queues for Decoupling and Scalability Offloading long-running or resource-intensive tasks to background workers can dramatically increase responsiveness and reliability. If a web endpoint triggers something requiring 30 seconds of compute time, it’s far better to enqueue that job, return quickly, and have workers process the queue in the background (see the Celery sketch after this list).
  9. Load Testing With Realistic Traffic Before optimizing, measure your system’s performance with realistic or near-realistic workloads. Tools like Locust let you simulate user behavior, incorporate random delays, and measure overall throughput. This ensures you spend time fixing genuine bottlenecks, not just assumptions (see the Locust sketch after this list).
  10. Embrace Functional and Stateless Patterns Writing your code in a functional style—pure functions with no side effects—can make distributing or parallelizing your work across processes simpler. The more each function depends solely on its inputs, the easier it becomes to run those functions on separate nodes or across cloud services (see the final sketch after this list).
    • Links and Tools:
      • Futurist – Concurrency library from OpenStack
      • Daiquiri – Julien’s library simplifying logging configuration
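
To make these takeaways concrete, here are a few minimal sketches. First, key points 2 and 3: a standard-library comparison of threads versus processes on a CPU-bound task. The count_primes workload is invented for illustration; on a multi-core machine the process pool should finish several times faster, because each worker process gets its own interpreter and its own GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(count_primes, [50_000] * 4))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads all contend for the one GIL, so CPU-bound work barely speeds up...
    print("threads:   %.2fs" % timed(ThreadPoolExecutor))
    # ...while each process runs its own interpreter with its own GIL.
    print("processes: %.2fs" % timed(ProcessPoolExecutor))
```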
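
For key point 4, a minimal asyncio sketch. Here asyncio.sleep stands in for a slow network call (the service names and delays are invented); the three awaits overlap on a single thread, so the total time is roughly the longest delay rather than the sum.

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for a real network call (e.g., made with aiohttp or httpx).
    await asyncio.sleep(delay)  # while we wait, the event loop runs other tasks
    return f"{name}: done"

async def main():
    results = await asyncio.gather(
        fetch("users-api", 1.0),
        fetch("pricing-api", 1.5),
        fetch("db-query", 0.5),
    )
    print(results)  # finishes in ~1.5s total, not 3.0s

asyncio.run(main())
```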
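
Key point 5 in code: a sketch of retries with exponential backoff using Julien's Tenacity library. The flaky fetch_price function is hypothetical; Tenacity re-runs it on each exception until it succeeds or the stop condition is reached (after which it raises a RetryError).

```python
import random

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),                     # give up after 5 attempts
    wait=wait_exponential(multiplier=0.5, max=10),  # 0.5s, 1s, 2s... capped at 10s
)
def fetch_price(symbol):
    """Hypothetical call to a flaky upstream pricing service."""
    if random.random() < 0.7:  # simulate intermittent failures
        raise ConnectionError("upstream timed out")
    return {"symbol": symbol, "price": 42.0}

print(fetch_price("PYPL"))
```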
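
For key point 6, the simplest cache is in-process memoization from the standard library; Redis or memcached apply the same idea across machines. The expensive_lookup function is invented. Note that lru_cache never expires entries, which is exactly the cache-invalidation trade-off mentioned above; external caches add TTLs for that.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)  # keep up to 1024 most recently used results in memory
def expensive_lookup(product_id):
    """Hypothetical stand-in for a slow database query or API call."""
    time.sleep(1.0)
    return {"id": product_id, "price": 9.99}

start = time.perf_counter()
expensive_lookup(42)   # miss: pays the full one-second cost
expensive_lookup(42)   # hit: returns instantly from the cache
print("took %.2fs" % (time.perf_counter() - start))
print(expensive_lookup.cache_info())  # CacheInfo(hits=1, misses=1, ...)
```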
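
Key point 7's "measure first" step, shown with the standard library's cProfile (the naive string-building hotspot is invented; Datadog's continuous profiler applies the same idea via sampling in production):

```python
import cProfile
import pstats

def slow_serialization(rows):
    # Hypothetical hotspot: repeated string concatenation in a loop.
    out = ""
    for row in rows:
        out += str(row) + "\n"
    return out

def handle_request():
    rows = [{"id": i, "value": i * i} for i in range(20_000)]
    return slow_serialization(rows)

cProfile.run("handle_request()", "request.prof")  # profile and save the stats
stats = pstats.Stats("request.prof")
stats.sort_stats("cumulative").print_stats(5)     # top 5 by cumulative time
```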
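
For key point 8, a minimal Celery sketch. The broker URL and the generate_report task are assumptions for illustration; in production you would point at your own Redis or RabbitMQ instance and start workers with `celery -A tasks worker`.

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def generate_report(user_id):
    """Hypothetical 30-second job that should never block a web request."""
    ...  # crunch numbers, render a PDF, send an email

# In the web endpoint: enqueue the job and return to the user immediately.
# generate_report.delay(42)
```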
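
Key point 9 with Locust: a small locustfile sketch (the endpoints and task weights are hypothetical). Run it with `locust -f locustfile.py`, then drive and observe the test from Locust's web UI.

```python
# locustfile.py
from locust import HttpUser, task, between

class BrowsingUser(HttpUser):
    wait_time = between(1, 3)  # random 1-3s pause between tasks, like a real user

    @task(3)  # weighted: browsing happens three times as often as checkout
    def view_products(self):
        self.client.get("/products")

    @task(1)
    def checkout(self):
        self.client.post("/cart/checkout", json={"item_id": 42})
```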
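
Finally, key point 10: once a function is pure, distributing it is just a map. A standard-library sketch with an invented pricing rule; because price_quote carries no shared state, the same function could be handed to Celery workers or serverless functions unchanged.

```python
from concurrent.futures import ProcessPoolExecutor

def price_quote(order):
    """Pure function: the output depends only on the input, no side effects."""
    return {"order_id": order["id"], "total": order["qty"] * order["unit_price"]}

orders = [{"id": i, "qty": i + 1, "unit_price": 9.99} for i in range(1_000)]

if __name__ == "__main__":
    # No shared state means fanning out across processes (or machines) is trivial.
    with ProcessPoolExecutor() as pool:
        quotes = list(pool.map(price_quote, orders))
    print(quotes[0])
```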

Interesting Quotes and Stories

“Premature optimization is the root of all evil.” – Attributed to Donald Knuth, emphasizing the importance of measuring performance before spending time optimizing.

“Python is not fast or slow. It’s about how you design your system.” – Julien, pointing out that performance comes more from the architecture you choose than from the language itself.

“If you ever want to know where you’re spending 80% of your CPU time, measure it. Otherwise, you’re just guessing.” – Underscoring the value of profiling.

Key Definitions and Terms

  • GIL (Global Interpreter Lock): A mechanism in CPython that allows only one thread to execute Python bytecode at a time.
  • Sampling Profiler: A tool that periodically checks which functions are running to estimate where a program spends its time, with low overhead.
  • Stateless Architecture: A design principle where no local data is preserved from request to request, making horizontal scaling and failover more straightforward.

Learning Resources

If you want to deepen your Python journey or brush up on core skills as you explore high-scale architectures, Julien’s book The Hacker’s Guide to Scaling Python and the links below are good starting points.

Overall Takeaway

Scaling Python apps to billions of requests involves building an architecture tailored to real workloads rather than fixating on raw language speed. Julien’s insights show how multiprocessing, async/await, profilers, caching, and queues each play a role in letting Python applications flourish under high concurrency. By carefully measuring performance, designing for failure, and relying on proven patterns, you can harness Python’s ease of development and still handle massive traffic.

Links from the show

Julien on Twitter: @juldanjou
Scaling Python Book: scaling-python.com

DD Trace production profiling code: github.com
Futurist package: pypi.org
Tenacity package: tenacity.readthedocs.io
Cotyledon package: cotyledon.readthedocs.io
Locust.io Load Testing: locust.io
Datadog: datadoghq.com
daiquiri package: daiquiri.readthedocs.io

YouTube Live Stream Video: youtube.com
Episode transcripts: talkpython.fm
