Automate your data exchange with Pydantic

Episode #313, published Thu, Apr 22, 2021, recorded Wed, Apr 14, 2021

Data validation and conversion is one of the truly tricky parts of getting external data into your app. That data might come from a REST API, a file on disk, or somewhere else. The work includes checking for required fields and correct data types, converting from compatible types (for example, strings to numbers), and much more. Pydantic is one of the best ways to do this in modern Python, using dataclass-like constructs and type annotations to make it all seamless and automatic.
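To make that concrete, here is a hand-written, standard-library-only sketch of the work without a library: it checks two required fields, accepts a numeric string for an integer field, and reports every problem at once. The parse_user function and its name/age fields are hypothetical examples, not part of Pydantic.

```python
from typing import Any


def parse_user(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical hand-rolled validator: raise ValueError listing every problem."""
    errors: list[str] = []
    result: dict[str, Any] = {}

    # Required field that must already be a string.
    if "name" not in data:
        errors.append("name: field required")
    elif not isinstance(data["name"], str):
        errors.append("name: must be a string")
    else:
        result["name"] = data["name"]

    # Required integer field; a numeric string such as "42" is converted.
    if "age" not in data:
        errors.append("age: field required")
    else:
        try:
            result["age"] = int(data["age"])
        except (TypeError, ValueError):
            errors.append("age: must be an integer")

    if errors:
        raise ValueError("; ".join(errors))
    return result


print(parse_user({"name": "Samuel", "age": "42"}))  # {'name': 'Samuel', 'age': 42}
```

Every extra field means another block like this, which is the kind of boilerplate a declarative, type-annotated model class removes.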

We welcome Samuel Colvin, creator of Pydantic, to the show. We'll dive into the history of Pydantic and its many uses and benefits.

Episode Deep Dive

Guest introduction and background

Samuel Colvin is the creator and maintainer of Pydantic, a highly popular Python library for data validation and settings management. He has a background in engineering and spent time working on oil rigs in Indonesia, where he discovered his passion for programming. Over the years, Samuel has grown Pydantic from a small project into a widely used tool adopted by top tech companies and organizations such as Microsoft, Uber, and financial institutions. His journey from drilling operations to open-source development shapes his pragmatic approach to building tools that emphasize both performance and simplicity.

What to Know If You're New to Python

If you’re just getting started with Python and want to better understand how Pydantic integrates with Python’s type hints, here are a few points:

  • Make sure you’re comfortable with Python classes, dictionaries, and functions, since Pydantic heavily leverages those.
  • Familiarize yourself with type hints (e.g., int, str, float) because Pydantic automatically interprets them.
  • Having a basic sense of JSON data and Python’s dict usage will help you follow the discussion on serialization.
  • If you need a gentle introduction to the language, consider Python for Absolute Beginners to build your foundational skills.
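To illustrate the point about type hints: Python stores annotations where code can read them at runtime, but the interpreter itself never enforces them. That gap is what a runtime validation library fills. The greet function below is a hypothetical example.

```python
import typing


def greet(name: str, times: int) -> str:
    # The annotations document intent; Python does not check them at call time.
    return ", ".join([f"Hello {name}"] * times)


# The hints are stored on the function and can be read back at runtime,
# which is what validation libraries such as Pydantic build on.
print(typing.get_type_hints(greet))
# {'name': <class 'str'>, 'times': <class 'int'>, 'return': <class 'str'>}

# Passing an int where a str is annotated is not an error for the interpreter.
print(greet(42, 2))  # Hello 42, Hello 42
```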

Key points and takeaways

  1. Pydantic’s Core Mission: Data Validation and Conversion
     Pydantic aims to simplify data intake by automatically converting and validating data types. For example, if you define a field as an int, Pydantic will coerce "123" (a string) into 123 (an integer) and reject a value such as "abc" that cannot be converted. It also supports richer checks, such as whether a value is a valid email address or file path. This makes it an excellent fit for API development, and for any situation where external or user-supplied data must be processed safely.
  2. Why Type Hints Matter
     One of Pydantic’s defining features is that it uses type annotations found in modern Python for validation logic. Instead of learning a new schema language, you rely on Python's typing style, which also means better IDE support, static analysis with tools like mypy, and consistent code. This synergy between Python’s built-in type hints and runtime validation is a core reason for Pydantic’s rapid adoption.
  3. Simple, Declarative Usage with Model Classes
     Pydantic centers on its BaseModel class. Developers define attributes with type hints, and Pydantic enforces them automatically when an instance is created. Whether you’re pulling data from a REST API, a web form, or a local file, instantiating your model with User(**data) triggers type coercion and validation. This minimal-boilerplate approach helps prevent hard-to-track bugs caused by unexpected data types.
  4. Leniency vs. Strictness in Data Parsing
     A critical design choice in Pydantic is its generally “optimistic” style of conversion. It will do its best to turn a string into an integer, parse a timestamp, or interpret a float as an int where that makes sense. Users can tighten this default behavior by configuring stricter rules (for example, strict types such as StrictInt) or by writing custom validators. The lenient default offers a friendlier developer experience, especially for web apps and pipelines that handle user-generated data, while stricter settings are available where tighter guarantees are required.
  5. Serialization and JSON Support
     Pydantic doesn’t just validate; it also simplifies exporting data. The .dict() and .json() methods convert a model instance back into a dictionary or JSON string (Pydantic V2 renamed them to .model_dump() and .model_dump_json()), making it easy to pass clean data to other systems or to log valid states of your application. This is particularly useful for teams building APIs or microservices, where consistent data interchange is essential.
  6. Comparing Pydantic to Other Validation Libraries
     Many developers have historically used tools like Marshmallow or plain data classes. However, Marshmallow doesn’t integrate tightly with Python 3 type hints, and plain data classes perform no runtime validation. Pydantic combines type hints with an engine for runtime type coercion and enforcement, typically with better performance than Marshmallow and more straightforward usage than rolling your own solution.
  7. Performance and Cython Optimization
     Pydantic can be compiled with Cython, which speeds up its validation paths significantly. Compiled wheels are published on PyPI for common platforms, so most users get this speedup from a normal pip install, with pure Python as the fallback elsewhere. This has helped Pydantic stand out in production-grade, high-performance web services.
  8. Broad Usage Beyond FastAPI
     Although closely associated with FastAPI for building modern Python APIs, Pydantic is in no way limited to that ecosystem. It can handle command-line input, form submissions in Flask, advanced data pipelines, or even function-level validation with the @validate_arguments decorator (renamed @validate_call in V2). If you have inbound data you do not fully trust, there’s probably a place for Pydantic.
  9. Future of Pydantic: V2 and Python Typing Changes
     Samuel discussed challenges posed by planned changes to Python typing, notably PEP 563, which at the time was set to make stringified annotations the default in Python 3.10 (that switch was later postponed). The related PEP 649 discussion is about ensuring that libraries like Pydantic can still access the actual type objects at runtime. For V2, the team is also planning more customizable serialization, an improved plugin system, and further performance gains.
  10. Data Model Code Generator and JSON Schema
      JSON Schema support is built into Pydantic, making it easy to generate schemas from your models. Tools like the Data Model Code Generator can even produce Pydantic models directly from an existing OpenAPI or JSON Schema document, saving developers significant time. This tight integration with standard schemas streamlines creating and documenting an API whose structure is shared with front-end and other service teams.
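Points 1 to 5 describe one mechanism: read the type hints, coerce incoming values (leniently by default, strictly on request), and export the result again. The standard-library sketch below imitates that shape so the moving parts are visible. Model, coerce, ValidationError, to_dict, to_json, and the class-level strict flag are hypothetical names for this illustration, not Pydantic's API or implementation; with Pydantic you would subclass BaseModel instead, and it handles far more types and error details.

```python
import json
import typing
from dataclasses import asdict, dataclass, fields


class ValidationError(ValueError):
    """Hypothetical error type for this sketch (not Pydantic's ValidationError)."""


def coerce(name: str, value: typing.Any, expected: type, strict: bool) -> typing.Any:
    # Only plain classes such as int, str, float are handled here; generic
    # hints like Optional[int] would need extra logic.
    if isinstance(value, expected):
        return value
    if strict:
        raise ValidationError(
            f"{name}: expected {expected.__name__}, got {type(value).__name__}"
        )
    try:
        # Lenient mode: let the target type's constructor attempt the
        # conversion, e.g. int("123") -> 123.
        return expected(value)
    except (TypeError, ValueError) as exc:
        raise ValidationError(
            f"{name}: cannot convert {value!r} to {expected.__name__}"
        ) from exc


class Model:
    """BaseModel-like base for @dataclass subclasses (illustrative only)."""

    strict: typing.ClassVar[bool] = False

    def __post_init__(self) -> None:
        # Runs after the dataclass-generated __init__ and validates every
        # field against its type hint.
        hints = typing.get_type_hints(type(self))
        for f in fields(self):
            value = getattr(self, f.name)
            setattr(self, f.name, coerce(f.name, value, hints[f.name], self.strict))

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class User(Model):
    id: int
    name: str
    score: float = 0.0


@dataclass
class StrictUser(User):
    strict: typing.ClassVar[bool] = True  # opt out of lenient conversion


user = User(**{"id": "123", "name": "Samuel"})
print(user)            # User(id=123, name='Samuel', score=0.0)
print(user.to_json())  # {"id": 123, "name": "Samuel", "score": 0.0}
```

The design mirrors the trade-off in point 4: the lenient path hands conversion to the target type's constructor, while a subclass that sets strict = True rejects anything that is not already the declared type.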
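Point 8 mentions function-level validation with @validate_arguments. A decorator of that kind can be sketched with inspect.signature: bind the call's arguments, then convert each one to its annotated type before the function runs. validate_args and repeat are hypothetical names for this illustration; Pydantic's decorator supports its full range of field types and reports errors differently.

```python
import functools
import inspect
import typing


def validate_args(func):
    """Hypothetical decorator: convert each argument to its annotated type."""
    sig = inspect.signature(func)
    hints = typing.get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)  # raises TypeError for bad call shapes
        bound.apply_defaults()
        for name, value in list(bound.arguments.items()):
            expected = hints.get(name)
            # Only plain classes (int, str, ...) are handled; parameters without
            # a hint, or with hints like Optional[int], pass through unchanged.
            if isinstance(expected, type) and not isinstance(value, expected):
                try:
                    bound.arguments[name] = expected(value)
                except (TypeError, ValueError) as exc:
                    raise ValueError(
                        f"{name}: cannot convert {value!r} to {expected.__name__}"
                    ) from exc
        return func(*bound.args, **bound.kwargs)

    return wrapper


@validate_args
def repeat(text: str, count: int) -> str:
    return text * count


print(repeat("ab", "3"))  # ababab
```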

Interesting quotes and stories

  • Samuel’s Oil Rig Anecdote: One of the more notable stories is how Samuel started coding “seriously” while stationed on remote oil rigs. He spoke about the peculiar combination of stressful, 14-hour night shifts and downtime that allowed him to explore programming on the job.
  • Analogy to Aerospace: Samuel drew an interesting comparison between engineering constraints in drilling versus aerospace: “It's kind of like aerospace, but no one minds crashing. So you can innovate more quickly.”
  • Community Contributions: Samuel highlighted the “tragedy of the commons” problem in open source, where feature requests pile up while the backlog of PRs and issues moves slowly without more help from the community.

Key definitions and terms

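  • Type Hints: Python’s optional annotations for variables, function parameters, and return values, such as name: str and -> str in def greet(name: str) -> str. Python itself does not enforce them at runtime; tools like mypy check them statically, and Pydantic uses them for runtime validation.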
  • JSON Schema: A vocabulary that allows you to annotate and validate JSON documents, crucial for describing and enforcing data shapes in APIs.
  • Validators: Functions or decorators in Pydantic that run custom checks and transformations for individual fields.
  • BaseModel: The core class in Pydantic from which user-defined models inherit, enabling validation, JSON conversion, etc.
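  • PEP 649: A proposal to evaluate annotations lazily, when they are first accessed, instead of storing them as strings as PEP 563 does. It matters for Pydantic because Pydantic needs the real type objects at runtime. PEP 649 was later accepted and ships in Python 3.14.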
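The PEP 563 / PEP 649 concern is easier to see with a small experiment. Quoting an annotation by hand produces the same string form that PEP 563's from __future__ import annotations would give every annotation in a module; typing.get_type_hints() can turn those strings back into types, but only while the names are still resolvable. Settings, make_model, Item, and Holder are hypothetical names for this illustration.

```python
import typing


class Settings:
    # Quoted annotations are stored as plain strings. PEP 563
    # ("from __future__ import annotations") does this for every annotation.
    retries: "int"
    timeout: "float"


# The raw annotations are strings, which a validator cannot use directly.
print(Settings.__annotations__)  # {'retries': 'int', 'timeout': 'float'}

# typing.get_type_hints() evaluates the strings back into type objects,
# looking names up in the namespace where the class was defined.
print(typing.get_type_hints(Settings))  # {'retries': <class 'int'>, 'timeout': <class 'float'>}


def make_model():
    class Item:  # exists only inside this function's local scope
        pass

    class Holder:
        item: "Item"

    return Holder


Holder = make_model()
try:
    typing.get_type_hints(Holder)
    resolved = True
except NameError as exc:
    # After make_model() has returned, the string "Item" cannot be resolved.
    print("unresolvable:", exc)
    resolved = False
```

Types defined inside a function are a common casualty: once the function returns, its local names are gone, so a runtime library reading the class later has nothing to evaluate the string against.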
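To show how type hints can map onto JSON Schema keywords, here is a deliberately small standard-library sketch for flat dataclasses with int, float, str, and bool fields. json_schema and JSON_TYPES are hypothetical names; Pydantic generates richer schemas itself (via Model.schema() in V1 and Model.model_json_schema() in V2).

```python
import json
import typing
from dataclasses import MISSING, dataclass, fields

# Hypothetical mapping from a few Python types to JSON Schema "type" keywords.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def json_schema(cls) -> dict:
    """Build a minimal JSON Schema for a flat dataclass (illustrative only)."""
    hints = typing.get_type_hints(cls)
    properties = {}
    required = []
    for f in fields(cls):
        properties[f.name] = {"type": JSON_TYPES[hints[f.name]]}
        # A field with neither a default nor a default_factory must be supplied.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


@dataclass
class User:
    id: int
    name: str
    active: bool = True


print(json.dumps(json_schema(User), indent=2))
```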

Overall takeaway

Pydantic has emerged as a cornerstone of modern Python projects through its seamless integration of type hints, rich validation features, and excellent performance. Whether you’re validating complex data in microservices, parsing CLI input, or converting user-generated data for machine learning pipelines, it can bring safety, clarity, and speed to your code. Thanks to Samuel’s work and the community’s contributions, it continues to evolve and has become a standard for data validation in Python. If you work with external or untrusted data at any scale, Pydantic is a powerful ally to have in your Python toolkit.

Links from the show

Samuel on Twitter: @samuel_colvin
pydantic: pydantic-docs.helpmanual.io
Contributing / help wanted @ pydantic: github.com
python-devtools package: python-devtools.helpmanual.io

IMPORTANT: PEP 563, PEP 649 and the future of pydantic #2678
GitHub issue on Typing: github.com

YouTube live stream video: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
