Pydantic Performance Tips
Episode Deep Dive
Guest Introduction and Role on the Pydantic Team
- Sydney Runkle recently graduated from the University of Wisconsin and now works full-time at the company behind Pydantic. She first joined as an intern, contributing to both open-source and commercial development. Sydney is involved with the ongoing evolution of Pydantic, including performance enhancements and community-driven features.
- Sydney explained her path from student to full-time contributor on the Pydantic project. She shared how PyCon 2023 was her first big conference experience, and how transitioning into full-time work at Pydantic allowed her to focus on open source, Rust integrations, and helping shape the future of the library.
Pydantic Overview
- Library Purpose: Pydantic is a data validation and settings management library that uses Python type hints for input validation.
- Core Use Cases: Commonly used in frameworks like FastAPI to validate request and response data.
- Massive Adoption: Over 400,000 projects depend on Pydantic, with over 200 million downloads in some months.
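As a quick illustration of what "validation driven by type hints" means in practice, here is a minimal sketch. The User model and its fields are hypothetical and were not discussed in the episode; the calls shown (model_validate, ValidationError) are part of the Pydantic v2 API.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Hypothetical model: the type hints double as validation rules.
    id: int
    name: str
    tags: list[str] = []


# In the default (lax) mode, the string "42" is coerced to the int 42.
user = User.model_validate({"id": "42", "name": "Ada"})
print(user.id, type(user.id).__name__)

# Input that cannot satisfy the hints raises a ValidationError.
try:
    User.model_validate({"id": "not-a-number", "name": "Ada"})
except ValidationError as exc:
    error_count = exc.error_count()
    print(f"rejected with {error_count} error(s)")
```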
Pydantic v1 vs. v2
- Rust Core: Version 2 rewrote its core validation engine in Rust for major performance gains.
- Python Wrapper and Rust Engine: V2 still presents a Python API but delegates the heavy validation and serialization work to Rust.
- Backward Compatibility: Although it was a big architectural change, the team sought to minimize breaking changes for users upgrading from v1.
Performance Tips (One-Liners and Small Changes)
- Use model_validate_json Instead of model_validate + json.loads: Skip materializing Python objects unnecessarily by using model_validate_json to parse and validate JSON directly in Rust.
- Initialize TypeAdapter Once: If you are validating the same type repeatedly, build the TypeAdapter once (rather than in a loop) to avoid repeatedly constructing schemas.
- Prefer Specific Type Hints Over General Ones: For example, use list[int] instead of Sequence[int] to leverage more efficient validation paths.
- Defer Schema Building (If Startup Matters More): Consider deferring model schema builds using the defer_build flag (or similar config options) if import time is critical.
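The tips above can be sketched together in one small script. This is an illustration under stated assumptions, not code from the episode: the Reading and LazyReading models, the INT_LIST adapter, and the parse_batches helper are all hypothetical names. The Pydantic calls themselves (model_validate_json, TypeAdapter, ConfigDict(defer_build=True)) exist in Pydantic v2.

```python
import json

from pydantic import BaseModel, ConfigDict, TypeAdapter


class Reading(BaseModel):
    # Hypothetical model. Note the concrete list[int] hint
    # rather than the more general Sequence[int].
    sensor: str
    values: list[int]


raw = '{"sensor": "s1", "values": [1, 2, 3]}'

# Two-step route: json.loads builds Python objects first, then validates them.
two_step = Reading.model_validate(json.loads(raw))

# One-step route: the JSON string is parsed and validated in the Rust core.
one_step = Reading.model_validate_json(raw)

# Build the TypeAdapter once at module level, then reuse it for every call.
INT_LIST = TypeAdapter(list[int])


def parse_batches(batches):
    # Reuses the adapter instead of constructing a new one per iteration.
    return [INT_LIST.validate_python(batch) for batch in batches]


parsed = parse_batches([[1, "2"], [3]])


class LazyReading(BaseModel):
    # defer_build postpones building the validator until it is first needed,
    # trading a slower first validation for a faster import.
    model_config = ConfigDict(defer_build=True)

    sensor: str
    values: list[int]


lazy = LazyReading(sensor="s2", values=[4])
```

Both routes produce the same model; the difference is only in how much intermediate Python data gets created along the way.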
Use Discriminated (Tagged) Unions
- Concept: A union of multiple models distinguished by a specific field or a callable that identifies which model to apply.
- Example: A field that can be either a Cat model or a Dog model, each marked by a pet_type field.
- Why It Helps: Lets Pydantic skip validating irrelevant fields when it knows which union branch is correct. This boosts performance on large or nested models with many union members.
- Callable Discriminators: Instead of a simple literal field, you can use a function to decide which model applies, especially if multiple attributes determine the branch.
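Here is a minimal sketch of both flavors, following the Cat/Dog example above. The field names other than pet_type (lives_left, good_boy), the Owner model, the Kitten and Puppy models, and the pick_branch function are assumptions made for illustration. Field(discriminator=...), Discriminator, and Tag are Pydantic v2 APIs; the latter two require Pydantic 2.5 or newer.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Discriminator, Field, Tag, TypeAdapter


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives_left: int  # hypothetical field


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool  # hypothetical field


class Owner(BaseModel):
    name: str
    # The pet_type value tells Pydantic which branch to validate against.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate(
    {"name": "Sam", "pet": {"pet_type": "dog", "good_boy": True}}
)


# Callable discriminator: these models carry no tag field,
# so a function inspects the input and returns the tag to use.
class Kitten(BaseModel):
    lives_left: int


class Puppy(BaseModel):
    good_boy: bool


def pick_branch(value: Any) -> str:
    # Handles both raw dicts and already-built model instances.
    if isinstance(value, dict):
        return "kitten" if "lives_left" in value else "puppy"
    return "kitten" if hasattr(value, "lives_left") else "puppy"


YoungPet = Annotated[
    Union[Annotated[Kitten, Tag("kitten")], Annotated[Puppy, Tag("puppy")]],
    Discriminator(pick_branch),
]

young_pet = TypeAdapter(YoungPet).validate_python({"good_boy": False})
```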
Future and Ongoing Performance Work
- Avoiding Materializing Data: Further enhancements will reduce the need to fully parse or transform data before validation.
- SIMD (Single Instruction, Multiple Data): Efforts in the custom JSON parser (JSON-iterable parser / Jiter) aim to do parallelized or vectorized operations.
- FastModel Concept (Not Yet Released): A potential feature allowing attributes to remain in Rust until explicitly accessed, boosting performance by lazily converting them into Python objects only when needed.
Tools for Measuring and Improving Performance
- Codspeed: Integrated with CI to compare performance on main vs. a PR branch using specialized benchmarks.
- CodeFlash: An LLM-based tool that suggests more performant code, tests the suggestions, and can open PRs with explanations.
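For a quick local check before reaching for CI tooling, a rough micro-benchmark with the standard library's timeit can compare two code paths. This is only a stand-in sketch and does not use Codspeed or CodeFlash; the Point model and the function names are hypothetical, and absolute timings will vary by machine.

```python
import json
import timeit

from pydantic import BaseModel


class Point(BaseModel):
    # Hypothetical model used only for timing.
    x: int
    y: int


raw = json.dumps({"x": 1, "y": 2})


def via_python_objects():
    # json.loads first, then validate the resulting dict.
    return Point.model_validate(json.loads(raw))


def via_json_directly():
    # Parse and validate the JSON string in one call.
    return Point.model_validate_json(raw)


# Sanity check: both paths must produce the same model before timing them.
assert via_python_objects() == via_json_directly()

runs = 10_000
t_loads = timeit.timeit(via_python_objects, number=runs)
t_json = timeit.timeit(via_json_directly, number=runs)

print(f"json.loads + model_validate: {t_loads:.4f}s for {runs} runs")
print(f"model_validate_json:         {t_json:.4f}s for {runs} runs")
```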
JSON-to-Pydantic Generation
- jsontopydantic.com: A useful website for quickly creating Pydantic models from raw JSON. This can save time in setting up complex nested models.
- Command-line or LLM Approaches: Some developers also use local code-generation libraries (e.g. datamodel-code-generator) or LLMs to generate models from JSON, then refine as needed.
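To make the idea concrete, here is a hand-written sketch of the kind of models such a generator might produce for a small nested JSON document. The sample payload and the Owner and Item model names are invented for illustration; actual tool output may differ in naming and typing details.

```python
from pydantic import BaseModel

# Hypothetical input JSON with one nested object and one list.
sample_json = (
    '{"id": 7, "title": "Widget", '
    '"owner": {"name": "Ada", "email": "ada@example.com"}, '
    '"tags": ["a", "b"]}'
)


class Owner(BaseModel):
    # Nested objects become their own model.
    name: str
    email: str


class Item(BaseModel):
    id: int
    title: str
    owner: Owner
    tags: list[str]


# The generated models can then validate the original payload directly.
item = Item.model_validate_json(sample_json)
print(item.owner.name, item.tags)
```

Generated models are usually a starting point: tightening types (for example, constraining the email field) is typically done by hand afterward.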
Pydantic the Company and Commercial Offering
- Logfire Observability Platform: A new open beta service from the same team, built around an opinionated approach to OpenTelemetry. Provides nested logs, profiling, and a dashboard that integrates well with Python code.
- FastUI and Other Projects: The team is also working on developer productivity tools (like FastUI) that tightly integrate with Pydantic.
Overall Takeaway
Pydantic remains a top choice for data validation in Python, especially with its massive v2 performance improvements via Rust. By applying small tweaks (like using model_validate_json, initializing TypeAdapter once, specifying precise type hints, or adopting discriminated unions), developers can further optimize their code. Add to that the ecosystem of tools (Codspeed, CodeFlash, Logfire) and a vibrant, rapidly evolving open-source community, and it’s clear that Pydantic continues to push forward both ease of use and performance for Python data modeling.
Links from the show
Pydantic: pydantic.dev
Performance docs: docs.pydantic.dev
Union tips: docs.pydantic.dev
Sydney's presentation slides: docs.google.com
JSON to Pydantic: jsontopydantic.com
Samuel talking FastUI: talkpython.fm
CodeFlash: codeflash.ai
Codspeed: codspeed.io
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm