Python at Netflix

Episode #421, published Sun, Jul 2, 2023, recorded Thu, Jun 8, 2023

When you think of Netflix (as a technology company), you probably imagine them as cloud innovators. They were one of the first companies to go all-in on cloud computing at massive scale, and they famously let that pesky Chaos Monkey loose on their servers. But they have also become a hive of amazing Python activity. From their CDN, demand predictions, and failover to security, machine learning, executable notebooks, and lots more, the Python at play is super interesting. On this episode, we have Zoran Simic and Amjith Ramanujam on the show to give us this rare inside look.

Episode Deep Dive

Guests Introduction and Background

Zoran Simic and Amjith Ramanujam are both seasoned Python developers working at Netflix. Zoran has a background in software engineering spanning multiple languages, including a lengthy stint with Eiffel, before moving on to Python at LinkedIn and then Netflix. He focuses on empowering Python developers internally at Netflix by building and supporting Python tooling and infrastructure. Amjith is similarly passionate about Python, coming from a background that included C/C++ and Haskell, eventually landing on Python for its practicality and rich ecosystem. He spent several years on the Netflix team responsible for failover and demand engineering (among other things) and is now part of a central team enabling Netflix developers to use Python effectively. Both guests have deep roots in the Python community and in open source: Zoran through projects like Pickley, and Amjith through powerful command-line tools such as PGCLI and MYCLI.

What to Know If You're New to Python

If you're just getting started with Python, it’s helpful to grasp the basic idea of Python’s ecosystem and how it’s used at scale. Netflix heavily relies on common frameworks (e.g., Flask and FastAPI), version management, and the REPL experience. The conversation also highlights how different teams (data science, infrastructure, etc.) each have unique but overlapping Python workflows.

Key Points and Takeaways

  • 1) Netflix’s Python Ecosystem and Tooling: Netflix embraces a “freedom and responsibility” culture where teams independently choose their tools, but Python has become a major player across the company. A newly formed central Python team supports Netflix engineers with shared tooling, best practices, and infrastructure to make teams more productive.
  • 2) Open Connect CDN and Python’s Role: Open Connect is Netflix’s custom Content Delivery Network (CDN), which physically ships servers to internet exchange points. While the video streaming components are highly optimized (often in lower-level languages), Python drives the orchestration, management, and data analysis behind the scenes. This includes managing network devices, loading the right video encodes into CDNs, and forecasting content demand.
  • 3) Large-Scale Failover and Demand Engineering: Netflix runs in multiple AWS regions, and the Demand Engineering team (previously including Amjith) automates massive failovers when a region experiences problems. Python-based tooling orchestrates traffic shifts that can happen in as little as five to seven minutes. This failover approach combines Python scripts, region-specific tooling, and SRE best practices to minimize downtime and user impact.
  • 4) Video Encoding and Automated Quality Control: Zoran worked on Netflix’s encoding pipeline, which uses Python for scheduling and spinning up encoding workloads. Netflix often re-encodes its entire catalog to improve compression and quality. Tools like VMAF (a machine learning–based metric developed in Python) automate the detection of video quality, ensuring encodes meet Netflix’s high standards.
  • 5) Machine Learning, Personalization, and Metaflow: Many personalization and ML tasks at Netflix rely heavily on Python and popular libraries like XGBoost, pandas, and TensorFlow. Metaflow (now open source) streamlines the end-to-end machine learning process, from data ingestion to production deployment, so data scientists can stay focused on modeling rather than infrastructure.
  • 6) Python Version Upgrades and Performance Gains: Teams at Netflix have the freedom to upgrade to newer Python releases when it suits them. Many teams are moving to Python 3.10 or 3.11, capitalizing on better performance, faster startup times, and modern language features. The transition can yield dramatic improvements in certain workflows, notably data-intensive or compute-heavy tasks.
  • 7) Central Python Tooling: Portable Python: Zoran leads an internal system called “Portable Python,” allowing Netflix teams to grab pre-built, minimal Python binaries for their projects, with no local compiling required. It also ensures a consistent version of Python across laptops and servers, keeping Python footprints small (tens of MB, rather than hundreds).
  • 8) Interactive REPL Enhancements (Bpython, PTpython, PDB++): During high-pressure fixes or ad-hoc data exploration, Netflix engineers often rely on enhanced REPLs that offer auto-completion and syntax highlighting. Bpython and PTpython are community favorites. For debugging, PDB++ extends Python’s built-in debugger with colorized output and autocompletion.
  • 9) Command-Line Tools for Databases: Amjith created PGCLI and MYCLI, interactive command-line clients that bring auto-completion and syntax highlighting for PostgreSQL and MySQL. Inspired by advanced REPLs, these tools reduce friction for engineers who spend much of their time exploring or debugging database schemas.
  • 10) Security and Resilience (Security Monkey & Observability): Netflix enforces security best practices across its AWS infrastructure, historically including internal Python-based tools like Security Monkey. Observability services, powered by Python-based alerting, logging, and remediation scripts, monitor performance, automatically shut down unhealthy instances, and keep Netflix’s large-scale platform stable.
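
To make the failover idea from point 3 a bit more concrete, here is a minimal sketch of one way a traffic shift could be computed. It is not Netflix's tooling: the function name `shift_traffic`, the proportional-split rule, and the request rates are all assumptions made for illustration, and the AWS region names are just sample labels.

```python
from typing import Dict


def shift_traffic(load: Dict[str, float], failed: str) -> Dict[str, float]:
    """Spread the failed region's traffic over the remaining regions.

    Hypothetical rule: each healthy region absorbs a share of the
    displaced traffic proportional to the load it already carries.
    """
    if failed not in load:
        raise KeyError(f"unknown region: {failed}")

    healthy = {region: value for region, value in load.items() if region != failed}
    if not healthy:
        raise ValueError("no healthy region left to absorb traffic")

    total_healthy = sum(healthy.values())
    displaced = load[failed]

    shifted = {}
    for region, current in healthy.items():
        # Fall back to an equal split if the healthy regions carry no traffic yet.
        share = current / total_healthy if total_healthy else 1 / len(healthy)
        shifted[region] = current + displaced * share

    # The failed region is drained completely.
    shifted[failed] = 0.0
    return shifted


# Illustrative numbers only (requests per second per region).
before = {"us-east-1": 500.0, "us-west-2": 300.0, "eu-west-1": 200.0}
after = shift_traffic(before, failed="us-east-1")
print(after)
```

A real failover also has to scale up capacity in the receiving regions before traffic arrives and shift it gradually; this sketch only covers the arithmetic of where the traffic ends up.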

Interesting Quotes and Stories

  • Zoran’s Teacher Story: Zoran recalled a high school math teacher warning him that programming was a “dead end” with few job prospects. He ignored that advice, eventually finding his passion in Python and technology at companies like LinkedIn and Netflix.
  • Amjith’s Haskell Revelation: Amjith began exploring functional programming in Haskell, fell in love with its REPL and list comprehensions, but realized job opportunities were limited. He landed on Python, drawn by its familiar features and immediate practicality.
  • “AI That Watches Netflix All Day”: Zoran mentioned how Netflix uses VMAF as an AI to literally watch video encodes all the time, rating perceived quality just as a human might.

Key Definitions and Terms

  • CDN (Content Delivery Network): A distributed network of servers that deliver web content (like streaming video) to users from the nearest location to reduce latency.
  • Chaos Monkey: A Netflix tool that randomly terminates instances in production to ensure services are fault-tolerant and resilient to failure.
  • Failover: Automatically rerouting traffic from one data center or AWS region to another in case of major outages or issues.
  • Portable Python: Netflix’s internal packaging of Python into a precompiled, lightweight, and platform-specific distribution for easy deployment on laptops and servers.
  • REPL (Read-Eval-Print Loop): An interactive programming environment that evaluates user-entered expressions and displays the results in real time.
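
The REPL definition above can be illustrated with a toy read-eval-print step built on Python's own `compile`, `eval`, and `exec` built-ins. This is a simplified sketch, not how bpython or ptpython are implemented; the helper names `repl_step` and `run_session` are made up for this example.

```python
def repl_step(source: str, namespace: dict) -> str:
    """Evaluate one line of input and return what a REPL would echo back."""
    try:
        # Expressions are compiled in "eval" mode so their value can be shown.
        code = compile(source, "<repl>", "eval")
    except SyntaxError:
        # Statements (assignments, imports, ...) run in "exec" mode and echo nothing.
        # Genuinely invalid input raises SyntaxError again here and propagates.
        exec(compile(source, "<repl>", "exec"), namespace)
        return ""
    result = eval(code, namespace)
    # Like the standard REPL, stay silent when the value is None.
    return "" if result is None else repr(result)


def run_session(lines):
    """Feed several lines through the same namespace, as one session would."""
    namespace = {}
    return [repl_step(line, namespace) for line in lines]


outputs = run_session(["x = 6", "x * 7", "[n * n for n in range(4)]"])
for line in outputs:
    print(line)
```

The shared `namespace` dictionary is what lets a later line see `x` from an earlier one. Enhanced REPLs add completion and highlighting on top of this same loop.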

Overall Takeaway

Netflix’s story shows that Python can flourish in nearly every corner of a large-scale tech operation—from controlling CDNs to orchestrating global failovers, from video encoding to deep learning for personalization. By continually refining shared tooling, fostering a resilient culture, and harnessing Python’s expressive power, Netflix keeps innovating and delivering new features quickly. This episode reminds us how a supportive engineering environment, combined with a flexible language like Python, can tackle massive real-world problems in surprisingly elegant ways.

Links from the show

Zoran on Twitter: @zsimic
Amjith on Mastodon: @amjith@fosstodon.org

Python at Netflix blog post: netflixtechblog.com
pdb++: github.com
Pickley: github.com
Pickley vs. pipx: github.com
DB CLI: dbcli.com
Learn you a Haskell: learnyouahaskell.com
How Much of the Internet's Bandwidth Does Netflix Use?: makeuseof.com
PtPython: github.com
BPython: bpython-interpreter.org
Flask REST-Plus: readthedocs.io
RustUp: rustup.rs
Rye: github.com
PEP 711 - Distributing Python Binaries episode: talkpython.fm
Portable Python: github.com
Python Build Standalone: github.com
How Netflix does failovers in 7 minutes flat: opensource.com
Security Monkey: github.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
