Pydantic v2 - The Plan

Episode #376, published Thu, Aug 4, 2022, recorded Thu, Aug 4, 2022

Pydantic has become a core building block for many Python projects. After 5 years, it's time for a remake. With version 2, the plan is to rebuild the internals (with benchmarks already showing a 17x performance improvement) and clean up the API. Sounds great, but what does that mean for us? Samuel Colvin, the creator of Pydantic, is here to share his plan for Pydantic v2.

Episode Deep Dive

Guest Background

Samuel Colvin is the creator and lead maintainer of Pydantic, a popular Python data validation library. He has been working on Pydantic for over five years and recently began a major rewrite of its internals to improve performance and design. Samuel is an active Python developer who came to Rust as a way to optimize low-level, compute-heavy parts of the ecosystem, especially the core of Pydantic. He has also developed related tools such as watchfiles and rtoml using Rust bindings for Python.

What to Know If You're New to Python

Before diving into advanced data validation topics, it helps to understand a few Python fundamentals:

  • Familiarity with Python classes: Pydantic leverages classes for structuring data.
  • Basic knowledge of type hints (e.g., int, str, list) and Python 3.7+ features: Pydantic ties deeply into type annotations.
  • Some exposure to JSON data exchange and web-related usage will help you follow why performance and validation matter.

Key Points and Takeaways

  1. Pydantic v2’s Core Rewrite in Rust
    Pydantic v2 introduces an internal engine called pydantic-core, written in Rust with PyO3 bindings to expose a Python-friendly API. This rewrite targets major performance boosts and a cleaner design. Rust offers safe, low-level control over data handling and error-checking, which is especially beneficial for repeatedly validating large volumes of data.

  2. Performance Gains and Environmental Impact
    Early benchmarks show 4x to 50x speed improvements (commonly around 17x) for validation tasks. This can significantly reduce CPU usage across large-scale systems—many of which rely on Pydantic to validate millions of requests daily. Reduced compute often translates to lower operational costs and even environmental benefits due to decreased energy consumption.

  3. Strict Mode vs. Coercion
    Pydantic has always allowed “loose” validation, automatically converting compatible data (like "123" to int). Pydantic v2 formalizes a strict mode so that, when enabled, fields refuse to coerce data types (e.g., a string passed to an int field raises an error). This serves use cases where data integrity demands no unexpected conversions.

  4. Built-in JSON Parsing
    Previously, JSON parsing was done in Python before passing data to Pydantic. With v2, you can parse JSON bytes/strings directly through Rust-based logic. This not only increases speed but also smoothly handles strict-mode scenarios (e.g., ISO date strings remain valid for date fields when coming from JSON).

  5. Validation Without a Python Class
    Pydantic’s v1 approach often created hidden “model classes” behind the scenes. In v2, pydantic-core allows direct schema definitions (e.g., validating a TypedDict or individual fields) without defining a Python BaseModel. This opens up more flexible, micro-validation patterns for advanced or lower-level usage.

  6. Aliases and Deep Flattening
    The new alias system lets you pull data from nested locations via a path-like notation. For instance, you could flatten foo["bar"]["baz"] onto a top-level field. This is extremely helpful when dealing with large or inconsistent JSON structures, letting you unify how data is accessed without extra pre-processing steps.

  7. Improved Error Messages and Documentation Links
    Pydantic v2 aims to provide more thorough error messages, including references to online docs for further clarification. Borrowing inspiration from Rust’s error-handling approach, each validation error is meant to carry a targeted help link. This should help users track down where and why validation fails.

  8. “From Attributes” Replaces “From ORM”
    Pydantic v1 had a method called from_orm, mainly for ORMs like SQLAlchemy. It’s being replaced with “from attributes,” a generalized approach to read Python objects’ attributes (including properties) for validation. You can validate any class instance, not just database models, making the feature far more flexible.

  9. Wrap Validators / Middleware-Style Logic
    A new “wrap validator” approach mimics the onion/middleware pattern used in web frameworks. Developers can write before-and-after logic around core field validation. This allows skipping redundant checks for already-valid data or gracefully catching specific errors in a layered, composable way.

  10. WebAssembly and Browser Testing
    With help from Pyodide, all of Pydantic’s tests run directly in the browser as WebAssembly, which checks that the library behaves consistently on that platform too. The demonstration also points to the potential of Python and Rust code running together in the browser.

  11. Namespace and Method Cleanup
    Several methods will be renamed or reorganized to make Pydantic’s API clearer (model_validate, model_validate_json, etc.). Deprecated methods will likely raise warnings for a while, but silent changes in behavior (like how sets are or aren’t coerced) can break code if not addressed.

  12. Licensing and Documentation Considerations
    Samuel discussed how the MIT license for Pydantic remains intact, but the docs licensing might shift. The goal is to prevent out-of-date or duplicated documentation from floating around under the same terms, so that official references stay authoritative and accurate.
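
To make points 3 and 4 concrete, here is a minimal, standard-library-only sketch of the difference between lax and strict validation, and of why input that came from JSON can still accept an ISO date string in strict mode. This is not Pydantic's implementation or API; the function names `validate_int` and `validate_date` and the `from_json` flag are hypothetical and exist only for illustration.

```python
import json
from datetime import date


def validate_int(value, *, strict=False):
    """Hypothetical helper: return an int, coercing strings unless strict."""
    # bool is a subclass of int in Python; reject it so True is not read as 1.
    if isinstance(value, bool):
        raise TypeError("expected int, got bool")
    if isinstance(value, int):
        return value
    if strict:
        raise TypeError(f"expected int, got {type(value).__name__}")
    if isinstance(value, str):
        # Lax mode: "123" becomes 123; int() raises ValueError for "abc".
        return int(value)
    raise TypeError(f"expected int, got {type(value).__name__}")


def validate_date(value, *, strict=False, from_json=False):
    """Hypothetical helper: return a date, with a JSON-aware strict mode."""
    if isinstance(value, date):
        return value
    # JSON has no date type, so an ISO string is the only way a date can
    # arrive from JSON. Accept it even when strict is set.
    if isinstance(value, str) and (from_json or not strict):
        return date.fromisoformat(value)
    raise TypeError(f"expected date, got {type(value).__name__}")


payload = json.loads('{"count": "123", "day": "2022-08-04"}')
lax_count = validate_int(payload["count"])  # 123
json_day = validate_date(payload["day"], strict=True, from_json=True)  # date(2022, 8, 4)
```

Calling `validate_int("123", strict=True)` raises `TypeError`, which is the behavior strict mode is meant to guarantee. The same ISO string passed to `validate_date` with `strict=True` and without `from_json=True` is rejected as well.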

Interesting Quotes and Stories

  • Samuel on building Pydantic initially: “I literally built Pydantic for me and put it on PyPI just to see what would happen.”
  • On environment and performance: “If we reduce Pydantic’s CPU usage by 10x, that might actually have an environmental impact given how often it’s called across big companies.”
  • Regarding strict type checks: “For me, it was obvious that a string '123' should become an int. But I also see the value in sometimes saying, ‘No, that’s not an int if it’s a string.’”

Key Definitions and Terms

  • Strict Mode: A configuration that disallows automatic data type coercion (e.g., no conversion of "5" to an integer).
  • Alias Flattening: A feature letting you specify how deeply nested data paths map onto a top-level field name.
  • PyO3: A library enabling Rust and Python interoperability, allowing Rust code to be compiled as Python modules.
  • Wrap Validator: A new validation approach that wraps the validation chain, letting you add or skip logic before and after the core validator runs.
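
Two of the terms above, alias flattening and wrap validators, are easier to picture with a small example. The sketch below uses only plain Python and is not Pydantic's API: `get_by_path`, `flatten`, and `wrap_validator` are hypothetical names, and the fallback value of 0 is an arbitrary choice for the demonstration.

```python
def get_by_path(data, path):
    """Follow a sequence of keys or indexes into nested dicts and lists."""
    current = data
    for key in path:
        current = current[key]
    return current


def flatten(data, aliases):
    """Build a flat dict where each field is read from its alias path."""
    return {field: get_by_path(data, path) for field, path in aliases.items()}


def wrap_validator(inner):
    """Wrap an inner validator with logic that runs around it."""

    def validate(value):
        # Before: skip the inner validator when the value is already valid.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        try:
            # Delegate to the core validation step.
            return inner(value)
        except ValueError:
            # After: recover from one specific error with a fallback.
            return 0

    return validate


raw = {"foo": {"bar": {"baz": "42"}}}
flat = flatten(raw, {"baz": ("foo", "bar", "baz")})  # {"baz": "42"}

to_int = wrap_validator(int)
parsed = to_int(flat["baz"])  # 42
fallback = to_int("oops")  # 0
```

The wrapper decides whether the inner validator runs at all and what happens to its errors, which is the onion or middleware shape described in the episode. Errors the wrapper does not name, such as the `TypeError` from `int(None)`, still propagate to the caller.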

Overall Takeaway

Pydantic v2 heralds a significant leap forward for Python data validation. By moving its core to Rust, it achieves astonishing performance gains while enhancing clarity around strict typing, JSON parsing, and custom validation. Teams can look forward to cleaner, faster, and more reliable validation pipelines—potentially with broad benefits from both a productivity and environmental standpoint.

Links from the show

Samuel on Twitter: @samuel_colvin
Pydantic v2 plan: pydantic-docs.helpmanual.io
PyO3: pyo3.rs
FastAPI: fastapi.tiangolo.com
Beanie: github.com
SQLModel: sqlmodel.tiangolo.com
Speedate: docs.rs
Pydantic's pytest suite running in the browser: githubproxy.samuelcolvin.workers.dev
JSON to Pydantic tool: jsontopydantic.com
Pyscript: pyscript.net
Michael's Pyscript + WebAssembly: Python Web Apps video: youtube.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy