
Pydantic Performance Tips

Episode #466, published Fri, Jun 14, 2024, recorded Thu, Jun 13, 2024

You're using Pydantic and it seems pretty straightforward, right? But could you adopt some simple changes to your code that would make it a lot faster and more efficient? Chances are, you'll find a couple of the tips from Sydney Runkle that will do just that. Join us to talk about Pydantic performance tips here on Talk Python.

Episode Deep Dive

  1. Guest Introduction and Role on the Pydantic Team

    • Sydney Runkle recently graduated from the University of Wisconsin and now works full-time at the company behind Pydantic. She first joined as an intern, contributing to both open-source and commercial development. Sydney is involved with the ongoing evolution of Pydantic, including performance enhancements and community-driven features.
    • Sydney described her path from student to full-time contributor: PyCon 2023 was her first big conference, and moving to full-time work at Pydantic let her focus on open source, Rust integrations, and the library's future direction.
  2. Pydantic Overview

    • Library Purpose: Pydantic is a data validation and settings management library that uses Python type hints for input validation.
    • Core Use Cases: Commonly used in frameworks like FastAPI to validate request and response data.
    • Massive Adoption: Over 400,000 projects depend on Pydantic, with over 200 million downloads in some months.
  3. Pydantic v1 vs. v2

    • Rust Core: In version 2, the core validation engine was rewritten in Rust for major performance gains.
    • Python Wrapper and Rust Engine: V2 still presents a Python API but delegates the heavy lifting of validation and serialization to its Rust core (pydantic-core).
    • Backward Compatibility: Although it was a big architectural change, the team sought to minimize breaking changes for users upgrading from v1.
  4. Performance Tips (One-Liners and Small Changes)

    • Use model_validate_json Instead of model_validate + json.loads
      Calling json.loads first builds intermediate Python dicts and lists that Pydantic then has to walk again. model_validate_json skips that step by parsing and validating the JSON directly in Rust.
    • Initialize TypeAdapter Once
      If you are validating the same type repeatedly, build the TypeAdapter once (rather than in a loop) to avoid repeatedly constructing schemas.
    • Prefer Specific Type Hints Over General Ones
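      For example, use list[int] instead of Sequence[int]. A general type like Sequence forces extra isinstance checks and attempts against several concrete sequence types, while list goes straight to the efficient validation path.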
    • Defer Schema Building (If Startup Matters More)
      If import time is critical, consider setting defer_build=True in the model config, which postpones building the model's validator and serializer until they are first needed.
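The tips above can be sketched in a few lines. This is an illustrative example rather than code from the episode: the User and LazyConfig models, the raw payload, and the parse_batches helper are all hypothetical names, and it assumes Pydantic v2 on Python 3.9 or later.

```python
import json

from pydantic import BaseModel, ConfigDict, TypeAdapter


class User(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str
    tags: list[str] = []  # specific type hint (list) rather than Sequence[str]


raw = '{"id": 1, "name": "Ada", "tags": ["admin"]}'

# Two-step path: json.loads builds Python dicts/lists, then Pydantic validates them.
user_two_step = User.model_validate(json.loads(raw))

# One-step path: the JSON is parsed and validated together in the Rust core.
user_one_step = User.model_validate_json(raw)

# Build the TypeAdapter once at module level, not inside the loop or function.
INT_LIST_ADAPTER = TypeAdapter(list[int])


def parse_batches(batches):
    """Validate many payloads while reusing the same adapter."""
    return [INT_LIST_ADAPTER.validate_python(batch) for batch in batches]


parsed = parse_batches([[1, "2", 3], [4, 5]])


class LazyConfig(BaseModel):  # hypothetical model, for illustration only
    # Postpone building the validator/serializer until the model is first used.
    model_config = ConfigDict(defer_build=True)
    retries: int = 3


lazy = LazyConfig(retries=5)
```

Whether each change pays off depends on the workload, so it is worth measuring before and after on real data.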
  5. Discriminated (Tagged) Unions

    • Concept: A union of multiple models distinguished by a specific field or a callable that identifies which model to apply.
    • Example: A field that can be either a Cat model or a Dog model, each marked by a pet_type field.
    • Why It Helps: Pydantic reads the discriminator first and validates against only the matching union member, instead of trying each member in turn. This boosts performance on large or nested models with many union members.
    • Callable Discriminators: Instead of a simple literal field, you can use a function to decide which model applies, especially if multiple attributes determine the branch.
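A minimal sketch of both discriminator styles, using the Cat and Dog example from above. The field names other than pet_type, the Owner and Shelter models, and the pick_pet function are hypothetical. It assumes Pydantic 2.5 or later, where Discriminator and Tag are available.

```python
from typing import Annotated, Any, Literal, Optional, Union

from pydantic import BaseModel, Discriminator, Field, Tag


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives_left: int  # hypothetical field


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool  # hypothetical field


class Owner(BaseModel):
    # Field discriminator: Pydantic reads pet_type and validates only that branch.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate({"pet": {"pet_type": "dog", "good_boy": True}})


def pick_pet(value: Any) -> Optional[str]:
    """Hypothetical callable discriminator: choose a tag from the keys present."""
    data = value if isinstance(value, dict) else getattr(value, "__dict__", {})
    if "lives_left" in data:
        return "cat"
    if "good_boy" in data:
        return "dog"
    return None  # no matching tag, so validation fails


class Shelter(BaseModel):
    # Callable discriminator: each union member carries a Tag that pick_pet returns.
    pet: Annotated[
        Union[Annotated[Cat, Tag("cat")], Annotated[Dog, Tag("dog")]],
        Discriminator(pick_pet),
    ]


shelter = Shelter.model_validate({"pet": {"pet_type": "cat", "lives_left": 9}})
```

The callable form costs one Python function call per value, so the plain field discriminator is usually the simpler choice when a single literal field is enough.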
  6. Future and Ongoing Performance Work

    • Avoiding Materializing Data: Further enhancements will reduce the need to fully parse or transform data before validation.
    • SIMD (Single Instruction, Multiple Data): Work in Jiter, Pydantic's custom iterable JSON parser, aims to use vectorized CPU instructions that process several values at once.
    • FastModel Concept (Not Yet Released): A potential feature allowing attributes to remain in Rust until explicitly accessed, boosting performance by lazily converting them into Python objects only when needed.
  7. Tools for Measuring and Improving Performance

    • Codspeed: Integrated with CI to compare performance on main vs. a PR branch using specialized benchmarks.
    • CodeFlash: An LLM-based tool that suggests more performant code, tests the suggestions, and can open PRs with explanations.
  8. JSON-to-Pydantic Generation

    • jsontopydantic.com: A useful website for quickly creating Pydantic models from raw JSON. This can save time in setting up complex nested models.
    • Command-line or LLM Approaches: Some developers also use local code-generation libraries (e.g. datamodel-code-generator) or LLMs to generate models from JSON, then refine as needed.
  9. Pydantic the Company and Commercial Offering

    • Logfire Observability Platform: A new open beta service from the same team, built around an opinionated approach to OpenTelemetry. Provides nested logs, profiling, and a dashboard that integrates well with Python code.
    • FastUI and Other Projects: The team is also working on developer productivity tools (like FastUI) that tightly integrate with Pydantic.

Overall Takeaway

Pydantic remains a top choice for data validation in Python, especially with its large v2 performance improvements via Rust. Small tweaks (using model_validate_json, initializing TypeAdapter once, specifying precise type hints, adopting discriminated unions) let developers optimize their code further. Add the ecosystem of tools (Codspeed, CodeFlash, Logfire) and a vibrant, rapidly evolving open-source community, and it's clear that Pydantic continues to push forward both ease of use and performance for Python data modeling.

Links from the show

Sydney Runkle: linkedin.com
Pydantic: pydantic.dev
Performance docs: docs.pydantic.dev
Union tips: docs.pydantic.dev
Sydney's presentation slides: docs.google.com
JSON to Pydantic: jsontopydantic.com
Samuel talking FastUI: talkpython.fm

CodeFlash: codeflash.ai
Codspeed: codspeed.io
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
