Pydantic Performance Tips

Episode #466, published Fri, Jun 14, 2024, recorded Thu, Jun 13, 2024


You're using Pydantic and it seems pretty straightforward, right? But could you adopt some simple changes to your code that would make it a lot faster and more efficient? Chances are, you'll find a couple of the tips from Sydney Runkle that will do just that. Join us to talk about Pydantic performance tips here on Talk Python.


Episode Deep Dive

Guest Introduction and Role on the Pydantic Team
- Sydney Runkle recently graduated from the University of Wisconsin and now works full-time at the company behind Pydantic. She first joined as an intern, contributing to both open-source and commercial development. Sydney is involved with the ongoing evolution of Pydantic, including performance enhancements and community-driven features.
- Sydney explained her path from student to full-time contributor on the Pydantic project. She shared how PyCon US 2024 in Pittsburgh was her first PyCon, and how transitioning into full-time work at Pydantic allowed her to focus on open source, Rust integrations, and helping shape the future of the library.
Pydantic Overview
- Library Purpose: Pydantic is a data validation and settings management library that uses Python type hints for input validation.
- Core Use Cases: Commonly used in frameworks like FastAPI to validate request and response data.
- Massive Adoption: Over 400,000 projects depend on Pydantic, with over 200 million downloads in some months.
Pydantic v1 vs. v2
- Rust Core: Version 2 rewrote its core validation engine in Rust for major performance gains.
- Python Wrapper and Rust Engine: V2 still presents a Python API but delegates intense validation and serialization to Rust.
- Backward Compatibility: Although it was a big architectural change, the team sought to minimize breaking changes for users upgrading from v1.
Performance Tips (One-Liners and Small Changes)
- Use model_validate_json Instead of model_validate + json.loads
  Skip materializing Python objects unnecessarily by using model_validate_json to parse and validate JSON directly in Rust.
- Initialize TypeAdapter Once
  If you are validating the same type repeatedly, build the TypeAdapter once (rather than in a loop) to avoid repeatedly constructing schemas.
- Prefer Specific Type Hints Over General Ones
  For example, use list[int] instead of Sequence[int] to leverage more efficient validation paths.
- Defer Schema Building (If Startup Matters More)
  Consider deferring model schema builds using the defer_build flag (or similar config options) if import time is critical.
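As a rough illustration of the first and last tips above, here is a minimal sketch. The `User` model and the sample payload are hypothetical and not from the episode; `model_validate_json`, `model_validate`, and `ConfigDict(defer_build=True)` are Pydantic v2 APIs.

```python
import json

from pydantic import BaseModel, ConfigDict


class User(BaseModel):  # hypothetical model, not from the episode
    # defer_build postpones building the validator until first use,
    # trading faster imports for a slower first validation.
    model_config = ConfigDict(defer_build=True)

    id: int
    name: str


raw = '{"id": 1, "name": "Ada"}'

# Two steps: json.loads materializes a Python dict, then Pydantic validates it.
via_dict = User.model_validate(json.loads(raw))

# One step: the JSON string is handed to the Rust core for parsing and validation.
direct = User.model_validate_json(raw)
```

Both calls produce equal models; the difference is only in how much work happens in Python before validation starts.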
Discriminated (Tagged) Unions
- Concept: A union of multiple models distinguished by a specific field or a callable that identifies which model to apply.
- Example: A field that can be either a Cat model or a Dog model, each marked by a pet_type field.
- Why It Helps: Lets Pydantic skip validating irrelevant fields when it knows which union branch is correct. This boosts performance on large or nested models with many union members.
- Callable Discriminators: Instead of a simple literal field, you can use a function to decide which model applies, especially if multiple attributes determine the branch.
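The Cat and Dog example above could look something like the following sketch. The `Owner` wrapper model and the `meows` and `barks` fields are assumptions added for illustration; `Field(discriminator=...)` is the Pydantic v2 way to tag a union by a literal field.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int  # illustrative field


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float  # illustrative field


class Owner(BaseModel):  # hypothetical wrapper model
    # Pydantic reads pet_type first and validates only the matching model,
    # instead of trying each union member in turn.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate({"pet": {"pet_type": "dog", "barks": 3.5}})
```

Here `owner.pet` is a `Dog` instance, and the `Cat` branch is never attempted.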
Future and Ongoing Performance Work
- Avoiding Materializing Data: Further enhancements will reduce the need to fully parse or transform data before validation.
- SIMD (Single Instruction, Multiple Data): Efforts in Pydantic's custom JSON parser, jiter (an iterable JSON parser), aim to do parallelized or vectorized operations.
- FastModel Concept (Not Yet Released): A potential feature allowing attributes to remain in Rust until explicitly accessed, boosting performance by lazily converting them into Python objects only when needed.
Tools for Measuring and Improving Performance
- Codspeed: Integrated with CI to compare performance on main vs. a PR branch using specialized benchmarks.
- CodeFlash: An LLM-based tool that suggests more performant code, tests the suggestions, and can open PRs with explanations.
JSON-to-Pydantic Generation
- jsontopydantic.com: A useful website for quickly creating Pydantic models from raw JSON. This can save time in setting up complex nested models.
- Command-line or LLM Approaches: Some developers also use local code-generation libraries (e.g. datamodel-code-generator) or LLMs to generate models from JSON, then refine as needed.
Pydantic the Company and Commercial Offering
- Logfire Observability Platform: A new open beta service from the same team, built around an opinionated approach to OpenTelemetry. Provides nested logs, profiling, and a dashboard that integrates well with Python code.
- FastUI and Other Projects: The team is also working on developer productivity tools (like FastUI) that tightly integrate with Pydantic.

Overall Takeaway

Pydantic remains a top choice for data validation in Python, especially with its massive v2 performance improvements via Rust. By applying small tweaks—like using model_validate_json, initializing TypeAdapter once, specifying precise type hints, or adopting discriminated unions—developers can further optimize their code. Add to that the ecosystem of tools (Codspeed, CodeFlash, Logfire) and a vibrant, rapidly evolving open-source community, and it’s clear that Pydantic continues to push forward both ease of use and performance for Python data modeling.

Links from the show

Sydney Runkle: linkedin.com
Pydantic: pydantic.dev
Performance docs: docs.pydantic.dev
Union tips: docs.pydantic.dev
Sydney's presentation slides: docs.google.com
JSON to Pydantic: jsontopydantic.com
Samuel talking FastUI: talkpython.fm

CodeFlash: codeflash.ai
Codspeed: codspeed.io
Watch this episode on YouTube: youtube.com
Episode #466 deep-dive: talkpython.fm/466
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript


00:00 You're using Pydantic and it seems pretty straightforward, right?

00:02 But could you adopt some simple changes to your code that would make it a lot faster and more

00:07 efficient? Chances are you'll find a couple of the tips from Sydney Runkle that will do just that.

00:12 Join us to talk about Pydantic performance tips here on Talk Python, episode 466, recorded June 13th,

00:20 2024. Are you ready for your host, here he is! You're listening to Michael Kennedy on Talk Python

00:27 To Me. Live from Portland, Oregon, and this segment was made with Python.

00:31 Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy.

00:39 Follow me on Mastodon, where I'm @mkennedy and follow the podcast using @talkpython,

00:44 both on fosstodon.org. Keep up with the show and listen to over seven years of past episodes at

00:51 talkpython.fm. We've started streaming most of our episodes live on YouTube. Subscribe to our

00:57 YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows

01:02 and be part of that episode. This episode is brought to you by Sentry. Don't let those errors

01:07 go unnoticed. Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry.

01:13 And it's brought to you by Code Comments, an original podcast from Red Hat. This podcast covers

01:18 stories from technologists who've been through tough tech transitions and share how their teams

01:24 survive the journey. Episodes are available everywhere you listen to your podcasts and

01:29 at talkpython.fm/code-comments. Hey folks, I got something pretty excellent for you.

01:35 PyCharm Professional for six months for free. Over at Talk Python, we partnered with the JetBrains

01:41 team to get all of our registered users free access to PyCharm Pro for six months. All you have to do is

01:48 take one of our courses. That's it. However, do note that this is not valid for renewals over at

01:53 JetBrains. Only new users there. And if you're not currently a registered user at Talk Python, well,

01:59 no problem. This offer comes with all of our courses. So even if you just sign up for one of our free

02:05 courses at talkpython.fm, click on courses in the menu, you're in. So how do you redeem it? Once you have

02:11 an account over at Talk Python, then it's super easy. Just visit your account page on Talk Python

02:16 training. And in the details tab, you'll have a code and a link to redeem your six months of PyCharm

02:21 Pro. So why not take a course, even a free one and get six months free of PyCharm.

02:26 Sydney, welcome back to Talk Python To Me. It's awesome to have you here.

02:31 Thank you. Super excited to be here. And yeah, I'm excited for our chat.

02:34 I am too. We're going to talk about Pydantic, one of my very favorite libraries that

02:39 just makes working with Python data, data exchange so, so easy, which is awesome. And it's really

02:46 cool that you're on the Pydantic team these days. Before then, I guess, you know, let's jump back

02:51 just a little bit. A few weeks ago, got to meet up a little bit in Pittsburgh at PyCon. How was PyCon

02:58 for you?

02:58 It was great. So it was my first PyCon experience ever. It was a very, very large conference. So it

03:04 was a cool kind of first introductory conference experience. I had just graduated not even a week

03:09 before. So it was a fun way to kind of roll into full time work and get exposed really to the Python

03:15 community. And it was, it was great to just kind of have a mix of getting to give a talk, getting to

03:20 attend lots of awesome presentations, and then most of all, just like meeting a bunch of really awesome

03:25 people in the community.

03:26 Yeah, I always love how many people you get to meet from so many different places and perspectives

03:33 and it's, it just reminds you the world is really big, but also really small, you know,

03:38 get to meet your friends and new people from all over the place.

03:41 Definitely. I was impressed by the number of like international attendees. I didn't really expect

03:46 that. It was great.

03:46 Yeah, same, same here. All right. Well, maybe a quick introduction for yourself, for those who

03:53 didn't hear your previous episode, and then we'll, we'll talk a bit about this Pydantic library.

03:58 Yeah, sure. Sounds great. So my name is Sydney. I just graduated from the University of Wisconsin.

04:03 Last time I chatted with you, I was still pursuing my degree in computer science and working part time as an

04:10 intern at the company Pydantic, which is kind of founded around the same ideas that inspired the open source tool.

04:17 And now we're building commercial tools and now I've rolled over into full-time work with them, primarily on the open source side.

04:23 So yeah, very excited to kind of be contributing to the open source community, but also getting to help with our commercial tools and development there.

04:32 Yeah. Yeah. Awesome. We'll talk a bit about that later. Super cool to be able to work on open source as a job, as a proper job, right?

04:40 Yeah. It's, it's awesome. It's really unique. I've kind of encouraged lots of people to contribute to open source as kind of a jumpstart into their software development careers, especially like young folks who are looking to get started with things and maybe don't have an internship or that sort of thing set up yet. I think it's a really awesome pipeline for like getting exposed to good code and collaborating with others and that sort of thing. But it's definitely special to get to do and get paid as well.

05:06 Indeed. So it's, it's a little bit unbelievable to me, but I'm sure that it is true that there are folks out there listening to the podcast that are like, Pydantic, maybe you've heard of that. What is, what is this Pydantic thing?

05:18 Yeah. What is Pydantic?

05:21 So Pydantic is the leading data validation library for Python. And so Pydantic uses type hints, which are optional in Python, but kind of generally more and more encouraged to enforce constraints on data and kind of validate data structures, etc.

05:37 So we're kind of looking at a very simple example together right now, where we're importing things like datetime and the Tuple type from typing. And then kind of the core of Pydantic is you define these classes that inherit from this class called BaseModel that's in Pydantic.

05:56 And that inheritance is what ends up helping you use methods to validate data, build JSON schema, things like that.

06:05 And so in our case, we have this Delivery class that has a timestamp, which is of type datetime, and then a dimensions tuple, which has two int parts.

06:15 And so then when you pass data into this delivery class to create an instance, Pydantic handles validating that data to make sure that it conforms to those constraints we've specified.

06:26 And so it's really a kind of intermediate tool that you can use for deserialization or loading data and then serialization, dumping data.
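For readers following along without the screen, the example being described could be sketched roughly like this. The class and field names follow the description above; the sample values are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Tuple

from pydantic import BaseModel, ValidationError


class Delivery(BaseModel):
    timestamp: datetime
    dimensions: Tuple[int, int]


# Valid input: the ISO string becomes a datetime, and the numeric strings
# are coerced to ints under Pydantic's default (lax) mode.
delivery = Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=["10", "20"])

# Invalid input: a three-item tuple does not match Tuple[int, int].
error_count = 0
try:
    Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=[10, 20, 30])
except ValidationError as exc:
    error_count = exc.error_count()
```

The second call raises a `ValidationError` rather than silently accepting the extra item.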

06:34 Yeah, it's a thing of beauty. I really love the way that it works. If you've got JSON data, nested JSON data, right?

06:41 If you go to Pydantic.dev slash open source, there's an example here of what we're talking about. And it's got a tuple, but the tuple contains integers, two of them.

06:51 And so if there's a tuple of three things, it'll give you an error. If it's a tuple of a datetime and an int, it'll give you an error.

06:58 Like it reaches all the way inside. And, you know, things I guess it compares against. It's a little bit like data classes. Have you done much with data classes and compare them?

07:07 Yeah, that's a great question. So we actually offer support for like Pydantic data classes.

07:13 So I think data classes kind of took the first step of, you know, really supporting using type hints for model fields and things like that.

07:21 And then Pydantic sort of takes the next jump in terms of like validation and schema support.

07:26 And so I think one like very common use case is if you're defining like API request and response models, you can imagine like the JSON schema capabilities come in handy there.

07:36 And just ensuring like the integrity of your API and the data you're dealing with. Very helpful on the validation front.

07:43 Yeah, yeah. Very cool. Okay. Well, I guess one more thing for people who are not super familiar that Pydantic is, I think it's used every now and then.

07:54 Let's check it out on GitHub here. I'm just trying to think of, you know, like some of the main places people have heard of it.

08:00 Obviously, FastAPI, I think is the thing that really launched its popularity in the early days, if I had to guess.

08:06 But if we go over to GitHub, GitHub says that for the open source things that Pydantic is a foundational dependency for 412,644 different projects.

08:20 Yeah.

08:20 That's unbelievable.

08:21 Yeah, it's very exciting. We just got our May download numbers and heard that we have over 200 million downloads in May.

08:30 So that's both version one and version two, but definitely exciting to see how kind of critical of a tool it's become for so many different use cases in Python, which is awesome.

08:40 Yeah, absolutely. It's really, really critical.

08:43 And I think we should probably talk a little bit about Pydantic v1, v2 as a way to get into the architecture conversation, right?

08:52 That was a big thing. I talked to Samuel Colvin maybe a year ago or so, I would imagine.

08:57 I think around PyCon, I think we did actually at PyCon last year as well.

09:01 Yeah, for sure. So a lot of the benefit of using Pydantic is we promise some great performance.

09:08 And a lot of those performance gains came during our jump from v1 to v2.

09:14 So v1 was written solely in Python. We had some compiled options, but really it was mostly Pythonic data validation.

09:23 Or I say Pythonic, it's always Pythonic, but data validation done solely in Python.

09:28 And the big difference with v2 is that we rewrote kind of the core of our code in Rust.

09:34 And so Rust is much faster.

09:36 And so depending on what kind of code you're running, v2 can be anywhere from two to 20 times faster in certain cases.

09:45 So right now we still have this Python wrapper around everything in v2.

09:50 But then, and that's kind of used to define schemas for models and that sort of thing.

09:55 And then the actual validation and serialization logic occurs in Pydantic core in Rust.

10:02 Right. So I think the team did a really good job to make this major change, this major rewrite,

10:08 and split the whole monolithic thing into Pydantic core and Pydantic itself, which is Python-based,

10:15 in a way that didn't break too many projects, right?

10:18 Yeah, that was the goal.

10:20 You know, every now and then there are breaking changes that I think are generally a good thing for the library moving forward, right?

10:27 Like hopefully whenever we make a breaking change, it's because it's leading to a significant improvement.

10:32 But we definitely do our best to avoid breaking changes.

10:35 And certainly someday we'll launch a v3 and hopefully that'll be an even more seamless transition for v2 users to v3 users.

10:44 Yeah, I would imagine that the switch to Rust probably, that big rewrite,

10:48 it probably caused a lot of thoughts of reconsidering, how are we doing this?

10:54 Or now that it's over in Rust, maybe it doesn't make sense this way or whatever.

10:58 Yeah. And I think just kind of, you know, we got a lot of feedback and usage of Pydantic v1,

11:03 so tried to do our best to incorporate all that feedback into a better v2 version in terms of both APIs and performance and that sort of thing.

11:10 Sure, sure.

11:11 John out in the audience asks, how do the team approach thread safety with this?

11:15 So Rust can be multiple threads, easy.

11:18 Python, not so much really, although maybe soon with free-threaded Python.

11:24 Yeah, that's a good question.

11:25 So our kind of Rust guru on the team is David Hewitt.

11:29 And he's very in the know about all of the multi-threading and things happening on the Rust side of things.

11:35 I myself have some more to learn about that, certainly.

11:37 But I think in general, kind of our approach is that Rust is quite type-safe, both performant and type-safe,

11:44 which is great and memory-safe as well.

11:48 And I think most of our...

11:50 I'll talk a little bit later about some, like, parallelization and vectorization that we're looking at for performance improvements.

11:56 But in terms of safety, I think if you have any questions, feel free to open an issue on the Pydantic core repo and get a conversation going with David Hewitt.

12:05 I would imagine it's not...

12:06 You guys haven't had to do too much with it.

12:08 just that Python currently, but soon, but currently doesn't really let you do much true multi-threading because of the GIL.

12:17 But the whole...

12:19 I think, you know...

12:20 Yeah, I think Python 3.13 is going to be crazy with free-threaded Python, and it's going to be interesting to see how that evolves.

12:27 Yep.

12:27 Anyway.

12:28 Yeah, I know we definitely do some jumping through hoops and just, you know, having to be really conscious of stuff with the GIL in Pydantic core and PyO3.

12:36 And PyO3 is kind of the library that bridges Python and Rust, and so it's heavily used in Pydantic core, as you can imagine.

12:43 So I'm excited to see what changes might look like there.

12:45 Yeah, same.

12:46 All right, well, let's jump into the performance because you're here to tell us all about Pydantic performance tips.

12:52 And you got a whole bunch of these.

12:53 Did you give this talk at PyCon?

12:55 I did, partially.

12:56 It's a little bit different, but some of the tips are the same.

12:58 I don't think the videos are out yet, are they?

13:00 As of the time of recording on June 13th.

13:03 Yeah.

13:04 No, I actually checked a couple of minutes ago.

13:06 I was like, I said one thing during my talk that I wanted to double check, but the videos are not out yet.

13:10 No, I'm really excited.

13:12 There's going to be a bunch.

13:13 There was actually a bunch of good talks, including yours and some others.

13:16 I want to watch, but they're not out yet.

13:18 All right.

13:18 Let's jump into Pydantic performance.

13:21 Where should we start?

13:22 I can start on the slideshow if we want.

13:25 Yeah, let's do that.

13:26 Awesome.

13:27 So, yeah, I think kind of the categories of performance tips that we're going to talk about here kind of have some like fast one-liner type performance tips that you can implement in your own code.

13:37 And then kind of the meat of the, like, how do I improve performance in my, you know, application that uses Pydantic, we're going to talk a bit about discriminated unions, also called tagged unions, and then kind of finally talk about on our end of the development, how are we continuously improving performance, you know, Pydantic internals wise, etc.

13:58 Sure. Do you have the equivalent of unit tests for performance?

14:03 Yeah, we do.

14:04 Okay.

14:05 We use a library called codspeed that I'm excited to touch on a bit more later.

14:10 Yeah, all right. Let's talk about that later. Perfect.

14:12 Yeah, sure thing. So I have this slide up right now, just kind of talking about why people use Pydantic. We've already covered some of these, but just kind of as a general recap, it's powered by type hints. And one of our biggest promises is speed.

14:26 We also have these other great features like JSON schema compatibility and documentation comes in particularly handy when we talk about APIs, you know, support for custom validation and serialization logic.

14:37 And then as we saw with the GitHub repository observations, a very robust ecosystem of libraries and other tools that use and depend on Pydantic that leads to this kind of extensive and large community, which is really great.

14:52 But this all kind of lies on the foundation of like Pydantic is easy to use and it's very fast.

14:57 Yeah.

14:58 So let's talk some more.

14:58 And this, yeah, well, the speed is really interesting in the multiplier that you all have for basically a huge swath of the Python ecosystem, right?

15:08 We just saw the 412,000 things that depend on Pydantic.

15:11 Well, a lot of those, their performance depends on Pydantic's performance as well, right?

15:17 Yeah, certainly.

15:18 Yeah, it's nice to have such a large ecosystem of folks to also, you know, contribute to the library as well, right?

15:25 Like, you know, because other people are dependent on our performance, the community definitely becomes invested in it as well, which is great.

15:31 This portion of Talk Python is brought to you by OpenTelemetry support at Sentry.

15:37 In the previous two episodes, you heard how we use Sentry's error monitoring at Talk Python and how distributed tracing connects errors, performance and slowdowns and more across services and tiers.

15:50 But you may be thinking, our company uses OpenTelemetry, so it doesn't make sense for us to switch to Sentry.

15:56 After all, OpenTelemetry is a standard and you've already adopted it, right?

16:01 Well, did you know, with just a couple of lines of code, you can connect OpenTelemetry's monitoring and reporting to Sentry's backend.

16:08 OpenTelemetry does not come with a backend to store your data, analytics on top of that data, a UI or error monitoring.

16:16 And that's exactly what you get when you integrate Sentry with your OpenTelemetry setup.

16:21 Don't fly blind.

16:23 Fix and monitor code faster with Sentry.

16:26 Integrate your OpenTelemetry systems with Sentry and see what you've been missing.

16:30 Create your Sentry account at talkpython.fm/sentry-telemetry.

16:35 And when you sign up, use the code TALKPYTHON, all caps, no spaces.

16:39 It's good for two free months of Sentry's business plan, which will give you 20 times as many monthly events as well as other features.

16:46 My thanks to Sentry for supporting Talk Python.

16:49 But yeah, so kind of as that first category, we can chat about some basic performance tips.

16:55 And I'll do my best here to kind of describe this generally for listeners who maybe aren't able to see the screen.

17:01 So when you are validating...

17:04 Can we share your slideshow later with the audience?

17:06 Can we put it in the show notes?

17:07 Yeah, yeah, absolutely.

17:08 Okay, so people want to go back and check it out.

17:10 But yeah, we'll describe it for everyone.

17:11 Go ahead.

17:12 So when you're validating data in Pydantic, you can either validate Python objects or like dictionary type data, or you can validate JSON formatted data.

17:24 And so one of these kind of like one-liner tips that we have is to use our built-in model_validate_json method instead of calling our model_validate method and then separately loading the JSON data with the standard library json package.

17:40 And the reason that we recommend that is one of the like crux of the general performance patterns that we try to follow is not materializing things in Python when we don't have to.

17:50 So we've already mentioned that our core is written in Rust, which is much faster than Python.

17:55 And so with our built-in model_validate_json method, whenever you pass in that string, we send it right to Rust.

18:01 Whereas if you do the JSON loading by yourself, you're going to like materialize a Python object and then have to send it over.

18:07 Right.

18:08 And so you're going to be using the built-in json.loads, or load or whatever, and then it'll pull that in, turn it into a Python dictionary.

18:17 Then you take it and try to convert that back to a Rust data structure and then validate it in Rust.

18:23 That's where all the validation lives anyway.

18:25 So just get out of the way, right?

18:27 Exactly.

18:28 Yep.

18:28 It's like skip the Python step if you can.

18:30 Right.

18:31 And I will note there is one exception here, which is I mentioned we support custom validation.

18:36 If you're using what we call like before and wrap validators that do something in Python and then call our internal validation logic and then maybe even do something after.

18:47 It's OK.

18:47 You can use model_validate and the built-in json.loads because you're already kind of guaranteed to be materializing Python objects in that case.

18:55 But for the vast majority of cases, it's great to just go with the built-in model_validate_json.

19:00 Yeah, that's really good advice.

19:01 And they seem kind of equivalent.

19:03 But once you know the internals, right, then it's well, maybe it's not exactly.

19:07 Yeah.

19:07 And I think implementing some of these tips is helpful in that if you understand some of the kind of like pydantic architectural context, it can also just help you think more about like, how can I write my pydantic code better?

19:19 Absolutely.

19:20 So the next tip I have here, very easy one liner fix, which is when you're using a type adapter, which is this structure you can use to basically validate one type.

19:32 So we have base models, which we've chatted about before, which is like if you have a model with lots of fields, that's kind of the structure you use to define it.

19:39 Well, type adapter is great if you're like, I just want to validate that this data is a list of integers, for example, as we're seeing on the screen.

19:45 Right.

19:46 Because let me give people an idea.

19:48 Like if you accept if you've got a JSON, well, just JSON data from wherever.

19:53 But, you know, a lot of times it's coming over an API or it's provided to you as a file and it's not your data you control, right?

19:59 You're trying to validate it.

20:00 You could get a dictionary JSON object that's got curly braces with a bunch of stuff, in which case that's easy to map to a class.

20:08 But if you just have JSON, which is bracket thing, thing, thing, thing, close bracket.

20:12 Well, how do you have a class that represents a list?

20:15 Like it gets really tricky, right, to be able to understand.

20:19 You can't model that with classes.

20:21 And so you all have this type adapter thing, right?

20:23 That's the role it plays generally.

20:25 Is that right?

20:26 Yeah.

20:26 And I think it's also really helpful in a testing context.

20:29 Like, you know, when we want to check that our validation behavior is right for one type, there's no reason to go like build an entire model.

20:38 If you're really just validating against one type or structure, type adapter is great.

20:43 And so kind of the advice here is you only want to initialize your type adapter object once.

20:50 And the reason behind that is we build a core schema in Python and then attach that to a class or type adapter, etc.

20:50 And so if you can, you know, not build that type adapter within your loop, but instead do it right before, or not build it, you know, in your function, but instead outside of it, then you can avoid building the core schema over and over again.
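A minimal sketch of that advice, assuming a hypothetical `parse_batches` helper and sample data. `TypeAdapter`, `validate_python`, and `validate_json` are Pydantic v2 APIs.

```python
from pydantic import TypeAdapter

# Built once at module level, so the core schema is constructed a single time.
INT_LIST_ADAPTER = TypeAdapter(list[int])


def parse_batches(batches):  # hypothetical helper for illustration
    # Reuses the shared adapter; building TypeAdapter(list[int]) inside
    # this loop would rebuild the schema on every iteration.
    return [INT_LIST_ADAPTER.validate_python(batch) for batch in batches]


result = parse_batches([["1", 2], [3, "4"]])

# The same adapter also handles a bare JSON array, which has no natural model class.
from_json = INT_LIST_ADAPTER.validate_json("[1, 2, 3]")
```

Keeping the adapter as a module-level constant is one way to get the "create it once" behavior; any scope that outlives the loop works equally well.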

21:11 Yeah.

21:12 So basically what you're saying is that the type adapter that you create might as well be a singleton because it's stateless, right?

21:19 Like it doesn't store any data.

21:21 It's kind of slightly expensive to create relatively.

21:25 And so if you had a function that was called over and over again and that function had a loop and inside the loop, you're creating the type adapter, that'd be like worst case scenario almost, right?

21:33 Yeah, exactly.

21:34 And I think this kind of goes along with like general best programming tips, right?

21:37 Which is like, if you only need to create something once, do that once.

21:41 Exactly.

21:42 You know, a parallel that maybe goes way, way back in time could be like a compiled regular expression.

21:49 You know, right.

21:50 You wouldn't do that over and over in a loop.

21:52 You would just create a regular, the compiled regular expression and you use it throughout your program, right?

21:57 Because it's kind of expensive to do that, but it's fast once it's created.

22:00 Yeah, exactly.

22:01 And funny that you mentioned that.

22:02 I actually fixed a bug last week where we were compiling regular expressions twice when folks like specified that as a constraint on a field.

22:11 So definitely just something to keep in mind and easy to fix or implement with type adapters here.

22:16 Yeah.

22:16 Awesome.

22:17 Okay.

22:17 I like this one.

22:18 That's a good one.

22:19 Yeah.

22:21 So this next tip also kind of goes along with like general best practices, but the more specific you can be with your type hints, the better.

22:28 And so specifically, if you know that you have a list of integers, it's better and more efficient to specify a type hint as a list of integers instead of a sequence of integers, for example.

22:40 Or if you know you have a dictionary that maps strings to integers, specify that type hint as a dictionary, not a mapping.

22:48 Interesting.

22:48 Yeah.

22:49 So you could import a sequence from the typing module, which is the generic way.

22:53 But I guess you probably have specific code that runs that can validate lists more efficiently than a general iterable type of thing, right?

23:01 Yeah, exactly.

23:02 So in the case of like a sequence versus a list, it's the like square and rectangle thing, right?

23:07 Like a list is a sequence, but there are lots of other types of sequences.

23:10 And so you can imagine for a sequence, we like have to check lots of other things.

23:15 Whereas if you know with certainty, this is going to be a list or it should be a list, then you can have things be more efficient with specificity there.
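
One way to see the list-versus-sequence point for yourself is a small timing sketch like the one below. It assumes Pydantic v2; the variable names are illustrative, and no particular speedup is claimed because results depend on the machine and the Pydantic version.

```python
from collections.abc import Sequence
from timeit import timeit

from pydantic import TypeAdapter

specific = TypeAdapter(list[int])     # exact container type
general = TypeAdapter(Sequence[int])  # any sequence-like input

data = list(range(1_000))

# Both adapters accept a plain list and validate every element.
assert specific.validate_python(data) == data
assert list(general.validate_python(data)) == data

# Compare the timings locally rather than trusting a quoted number.
print("list[int]:    ", timeit(lambda: specific.validate_python(data), number=200))
print("Sequence[int]:", timeit(lambda: general.validate_python(data), number=200))
```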

23:23 Does it make any difference at all whether you use the more modern type specifications?

23:29 Like traditionally, people would say from typing import capital L list, but now you can just say lowercase L list with the built in and no import statement.

23:38 Are those equivalent or is there some minor difference there?

23:42 Do you know?

23:43 Yeah, that's a good question.

23:44 I wouldn't be surprised if there was a minor difference that was more a consequence of like Python version, right?

23:50 Because there's like, I mean, I suppose you could import the old capital L list in a newer Python version.

23:56 But I think the difference is like more related to specificity of a type hint rather than kind of like versioning.

24:02 Yeah.

24:02 If the use of that capital L list made you write an import statement, I mean, it would cause the program to start ever so slightly slower.

24:13 Cause there's another import.

24:14 It's got to run, whereas with the built-in,

24:15 it already knows,

24:16 it's already imported,

24:16 what list is.

24:17 You wouldn't believe how many times I get messages on YouTube videos I've done, or even from courses saying, Michael, I don't know what you're doing, but your code is just wrong.

24:28 I wrote lowercase L list bracket something.

24:31 And it said list is not subscriptable or something like that.

24:35 And look, you've just done it wrong.

24:37 You're going to need to fix this.

24:38 Or, or you're on Python 3.7 or something super old before these new features were added.

24:44 But there's just somewhere in the community.

24:47 We haven't communicated this well.

24:50 I don't know.

24:50 Yeah, for sure.

24:52 I was writing some code earlier today in a meeting and I used the, like, from typing import union and then union X and Y types.

24:59 And my coworker was like, Sydney, use the pipe.

25:02 Like, what are you doing?

25:03 Use the, exactly.

25:05 But here's the thing that was introduced in 3.10, I believe.

25:08 And if people are in 3.9, that code doesn't run.

25:11 Or if they're not familiar with the changes, it's, so there's all these trade-offs.

25:15 I almost feel like it would be amazing to go back for any time there's a security release that releases, say, another 3.7 or something.

25:22 And change the error message to say, this feature only works in the future version of Python rather than some arbitrary error if you're doing it wrong.

25:31 You know, that would be great.

25:32 Yeah, definitely.

25:33 Yeah, some of those errors can be pretty cryptic with the syntax stuff.

25:36 And they can.

25:37 All right.

25:37 So be specific.

25:39 List or tuple, not sequence, if you know it's a list or a tuple or whatever.

25:43 Yeah.

25:43 And then kind of my last minor tip, which great that you brought up import statements and kind of adding general time to a program, is I don't have a slide for this one.

25:54 But if we go back to the type adapter slide, we talked about the fact that initializing this type adapter builds a core schema and attaches it to that class.

26:03 And that's kind of done at build time, at import time.

26:07 So that's, like, already done.

26:10 And if you really don't want to have that import or, like, build time take a long time, you can use the defer_build flag.

26:20 And so what that does is defers the core schema build until the first validation call.

26:24 You can also set that on model config and things like that.

26:28 But basically, the idea here is, like, striving to be lazier, right?

26:31 I see.

26:32 Like, if we don't need to build this core schema right at import time because we want our program to start up quickly, that's great.

26:38 We might have a little bit of a delay on the first validation, but maybe startup time is more important.

26:43 So that's a little bit more of a preferential validation.

26:46 Sorry, preferential performance tip, but available for folks who need it.

26:50 Yeah.

26:51 Like, let me give you an example.

26:52 Let's give people an example where I think this might be useful.

26:54 So in the Talk Python training, the courses site, I think we've got 20,000 lines of Python code, which is probably more at this point.

27:02 I checked a long time ago, but a lot.

27:04 And it's a package.

27:05 And so when you import it, it goes and imports all the stuff to, like, run the whole web app.

27:10 But also little utilities like, oh, I just want to get a quick report.

27:14 I want to just access this model and then use it on something real quick.

27:19 It imports all that stuff so that app startup would be potentially slowed down by this.

27:23 Where if you know, like, only sometimes is that type adapter used, you don't want to necessarily have it completely created until that function gets called.

27:31 So then the first function call might be a little slow, but there'd be plenty of times where maybe it never gets called, right?

27:36 Yep, exactly.

27:37 Yeah, awesome.
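
A minimal sketch of the deferred-build idea on a model, assuming Pydantic v2 and its defer_build config setting. The Report model is hypothetical. TypeAdapter also accepts a config argument, but how defer_build applies there has varied between Pydantic releases, so check the docs for your version.

```python
from pydantic import BaseModel, ConfigDict


class Report(BaseModel):
    # Ask Pydantic to postpone building the validator until it is needed.
    model_config = ConfigDict(defer_build=True)

    title: str
    row_count: int


# Defining the class stays cheap; the schema is built on first use, here.
report = Report.model_validate({"title": "Q2", "row_count": "42"})
print(report.row_count)
```

The trade-off is the one described above: a faster import in exchange for a slightly slower first validation.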

27:38 Okay.

27:38 All right.

27:39 So kind of a more complex performance optimization is using tagged unions.

27:45 They're still pretty simple.

27:46 It's just like a little bit more than a one line change.

27:48 So kind of talking about tagged unions, we can go through a basic example why we're using tagged unions in the first place and then some more advanced examples.

29:13 Let's start with what are tagged unions because I honestly have no idea.

29:16 I know what unions are, but tagging them, I don't know.

29:19 Yeah, sure thing.

29:20 So tagged unions are a special type of union.

29:23 We also call them discriminated unions.

29:25 And they help you specify a member of a model that you can use for discrimination in your validation.

29:34 So what that means is if you have two models that are pretty similar and your field can be either one of those types of models, model X or model Y.

29:43 But you know that there's one tag or discriminator field that differs.

29:48 You can specifically validate against that field and skip some of the other validation, right?

29:54 So I'll move on to an example here in a second.

29:57 But basically, it helps you validate more efficiently because you get to skip validation of some fields.

30:03 So it's really helpful if you have models that have like 100 fields, but one of them is really indicative of what type it might be.

30:09 I see. So instead of trying to figure out like, is it all of this stuff, once you know it has this aspect or that aspect,

30:16 then you can sort of branch on a path and just treat it as one of the elements of the union.

30:20 Is that right?

30:20 Yes, exactly.

30:21 And so one other note about discriminated unions is you specify this discriminator and it can either be a string, like literal type or callable type.

30:31 And we'll look at some examples of both.

30:33 So here's kind of a more concrete example so we can really better understand this.

30:37 So let's say we have a, this is the classic example, right?

30:41 A cat model and a dog model.

30:43 And they both have...

30:45 Cat people, dog people, you're going to start a debate here.

30:47 Exactly, exactly.

30:48 They both have this pet type field.

30:51 And for the cat model, it's a literal that is just the string cat.

30:56 And then for the dog model, it's the literal that's the string dog.

30:59 So it's just kind of a flag on a model to indicate what type it is.

31:03 And you can imagine, you know, in this basic case, we only have a couple of fields attached to each model.

31:08 But maybe this is like data in a, like vet database.

31:13 And so you can imagine like there's going to be tons of fields attached to this, right?

31:16 So it'd be pretty helpful to just be able to look at it and say, oh, the pet type is dog.

31:20 Let's make sure this data is valid for a dog type.

31:24 Also note, we have a lizard in here.

31:25 So what this looks like in terms of validation with Pydantic then is that when we specify this pet field, we just add one extra setting, which says that the discriminator is that pet type field.

31:40 And so then when we pass in data that corresponds to a dog model, Pydantic is smart enough to say, oh, this is a discriminated union field.

31:48 Let me go look for the pet type field on the model and just see what that is.

31:53 And then use that to inform my decision for what type I should validate against.
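
The cat, dog, and lizard example can be sketched roughly as follows, assuming Pydantic v2. The field names besides pet_type (meows, barks, scales, n) are illustrative and may not match the slide exactly.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Lizard(BaseModel):
    pet_type: Literal["reptile", "lizard"]
    scales: bool


class Model(BaseModel):
    # The discriminator names the field that selects the union member.
    pet: Union[Cat, Dog, Lizard] = Field(discriminator="pet_type")
    n: int


m = Model(pet={"pet_type": "dog", "barks": 3.14}, n=1)
print(type(m.pet).__name__)
```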

31:58 Okay, that's awesome.

31:59 So if we don't set the discriminator keyword value in the field for the union, it'll still work, right?

32:07 It just has to be more exhaustive and slow.

32:10 Yeah, exactly.

32:11 So it'll still validate and it'll say, hey, let's take this input data and try to validate it against the cat model.

32:18 And then Pydantic will come back and say, oh, that's not a valid cat.

32:20 Like, let's try the next one.

32:22 Whereas with this discriminated pattern, we can skip right to the dog, which you can imagine helps us skip some of the unnecessary steps.

32:30 Absolutely.

32:30 Okay, that's really cool.

32:31 I had no idea about this.

32:32 Yeah, yeah.

32:33 It's a cool, I'd say like moderate level feature.

32:36 Like I think if you're just starting to use Pydantic, you probably haven't touched discriminated unions much.

32:41 But we hope that it's simple enough to implement that most folks can use it if they're using unions.

32:46 Yeah, that's cool.

32:47 I don't use unions very often, which is probably why.

32:49 Other than, you know, something pipe none, which is, you know, like optional.

32:53 But yeah.

32:54 Yeah.

32:54 If I did, I'll definitely remember this.

32:57 Yeah.

32:57 Alrighty.

32:58 So as I've mentioned, this helps for more efficient validation.

33:02 And then where this really comes and has a lot of value is when you are dealing with lots of nested models or models that have tons of fields.

33:10 So let's say you have a union with like 10 members and each member of the union has 100 fields.

33:15 If you can just do validation against 100 fields instead of 1,000, that would be great in terms of a performance gain.

33:22 And then once again, with nested models, you know, if you can skip lots of those union member validations, also going to boost your performance.

33:29 Yeah, for sure.

33:30 You know, an example where this seems very likely would be using it with Beanie or some other document database where the modeling structure is very hierarchical.

33:40 You end up with a lot of nested sub-Pydantic models in there.

33:45 Yeah.

33:45 Yeah, very much so.

33:46 Cool.

33:47 So as a little bit of an added benefit, we can talk about kind of this improved error handling, which is a great way to kind of visualize why the discriminated union pattern is more efficient.

33:58 So right now we're looking at an example of validation against a model that doesn't use a discriminated union.

34:04 And the errors are not very nice to look at.

34:07 You basically see the errors for every single permutation of the different values.

34:13 And we're using nested models.

34:14 So it's very hard to interpret.

34:16 So we don't have to look at this for too long.

34:19 It's not very nice.

34:20 But if we look at...

34:21 But basically the error message says, look, there's something wrong with the union.

34:26 If it was a string, it is missing these things.

34:28 If it was this kind of thing, it misses those things.

34:31 Like if it was a dog, it misses this.

34:33 If it's a pet, a cat, it misses that.

34:35 Right.

34:35 It doesn't specifically tell you.

34:37 Exactly.

34:38 It's a dog.

34:39 So it's missing like the collar size or whatever, right?

34:42 Right.

34:43 Exactly.

34:44 But then, and I'll go back and kind of explain the discriminated model for this case in a second.

34:49 But if you look at this is the model with the discriminated union instead, we have one very nice error that says, okay, you're trying to, you know, validate this X field and it's the wrong type.

35:01 Right.
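
The difference in error output can be reproduced with a small sketch like this one, assuming Pydantic v2. The models and the error_count helper are hypothetical and simpler than the nested example on the slide.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class PlainOwner(BaseModel):
    pet: Union[Cat, Dog]  # every member may be tried


class TaggedOwner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


def error_count(model, data):
    # Returns how many individual errors validation reports for this input.
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return len(exc.errors())
    return 0


bad_dog = {"pet": {"pet_type": "dog", "barks": "loud"}}
plain_errors = error_count(PlainOwner, bad_dog)
tagged_errors = error_count(TaggedOwner, bad_dog)
print(plain_errors, tagged_errors)
```

The plain union reports problems for each member it tried, while the discriminated union reports only the problem with the dog data.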

35:03 So, yeah, the first example that we were looking at was using string type discriminators.

35:08 So we just had this pet type thing that said, oh, this is a cat or this is a dog, that sort of thing.

35:14 We also offer some more customization in terms of we also allow callable discriminators.

35:21 So in this case, this field can be either a string or this instance of discriminated model.

35:29 So it's kind of a recursive pattern, right?

35:32 And that's where you can imagine the nested structures becoming very complex very easily.

35:37 And we use this kind of callable to differentiate between, you know, which model we should validate against.

35:44 And then we tag each of the cases.

35:47 So a little bit more of a complex application here.

35:50 But once again, when you kind of see the benefit in terms of errors and interpreting things and performance, I think it's generally a worthwhile investment.

35:57 That's cool.

35:58 So if you wanted something like a composite key equivalent of a discriminator, right?

36:04 Like if it has this field and its nested model is of this type, it's one thing versus another.

36:10 Like a free user versus a paying user.

36:13 You might have to look and see their total lifetime value plus that they're a registered user.

36:18 I don't know.

36:18 Something like you could write code that would pull that information out and then discriminate which thing to validate against, right?

36:24 Yeah, exactly.

36:25 Yeah, it definitely comes in handy when you have like, you're like, okay, well, I still want the performance benefits of a discriminated union.

36:32 But I kind of have three fields on each model that are indicative of which one I should validate against, right?

36:37 Yeah.

36:37 And it's like, well, you know, taking the time to look at those three fields over the hundred is definitely worth it.

36:43 Just a little bit of complexity for the developer.

36:46 Yeah, cool.
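
A callable discriminator might look roughly like this, assuming Pydantic 2.5 or later, where Discriminator and Tag are available. The Details and Item models and the pick_tag function are hypothetical, and this version is not recursive like the one on the slide.

```python
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, Discriminator, Tag


class Details(BaseModel):
    summary: str


def pick_tag(value: Any) -> Optional[str]:
    # Returns the tag of the union member to validate against.
    if isinstance(value, str):
        return "str"
    if isinstance(value, (dict, BaseModel)):
        return "model"
    return None  # no tag matches, so validation fails


class Item(BaseModel):
    x: Annotated[
        Union[
            Annotated[str, Tag("str")],
            Annotated[Details, Tag("model")],
        ],
        Discriminator(pick_tag),
    ]


print(Item(x="plain text").x)
print(Item(x={"summary": "nested"}).x)
```

The same function could inspect several fields at once, which is the "composite key" idea raised above.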

36:47 One other note here is that discriminated unions are-

36:50 Can we go back really quick on the previous one?

36:52 So I got a quick question.

36:53 So for this, you write a function.

36:55 It's given the value that comes in, which could be a string.

37:01 It could be a dictionary, et cetera.

37:02 Could you do a little bit further performance improvements and add like a functools lru_cache to cache the output?

37:11 So every time it sees the same thing, if there's repeated data through your validation, it goes, I already know what it is.

37:16 What do you think?

37:17 Yeah.

37:17 I do think that would be possible.

37:19 That's definitely an optimization we should try out and put in our docs for like the advanced, advanced performance tips.

37:25 Yeah.

37:25 Because if you've got a thousand strings and then, you know, that were like, it's maybe male, female, male, female, male, male, female, like that kind of where the data is repeated a bunch.

37:36 Yeah.

37:37 Then it could just go, yep, we already know that answer.

37:40 Yeah.

37:41 Potentially.

37:42 I don't know.

37:43 Yeah.

37:43 No, definitely.

37:44 And I will say, I don't know if it takes effect.

37:47 I don't think it takes effect with discriminated unions because this logic is kind of in Python.

37:52 But I will say we recently added a like string caching setting because we have kind of our own JSON parsing logic that we use in Pydantic Core.

38:01 And so we added a string caching setting so that you don't have to rebuild the exact same strings every time.

38:07 So that's a nice performance piece.

38:09 Yeah, nice.

38:09 Caching's awesome.

38:10 Until it's not.

38:11 Yeah, exactly.
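
The caching idea floated above was left as an untested suggestion in the conversation, so the following is only a standard-library sketch of how functools.lru_cache behaves on repeated string inputs. The classify function is a hypothetical stand-in for a discriminator callable. Note that lru_cache requires hashable arguments, so it would not work as-is when the input is a dictionary.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def classify(value: str) -> str:
    # Stand-in for a more expensive tag computation.
    return "short" if len(value) < 5 else "long"


values = ["male", "female", "male", "female", "male"]
tags = [classify(v) for v in values]
print(tags)
print(classify.cache_info())
```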

38:12 So one quick note here is just that discriminated unions are still JSON schema compatible, which is awesome for the case where you're once again defining like API request and responses.

38:24 You want to still have valid JSON schema coming out of your models.

38:27 Yeah, very cool.

38:28 And that might show up in things like OpenAPI documentation and stuff like that, right?

38:33 Yep, exactly.
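
To check the JSON Schema claim, a sketch along these lines prints the discriminator that Pydantic v2 emits for the field. The Cat, Dog, and Owner models are illustrative.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]


class Dog(BaseModel):
    pet_type: Literal["dog"]


class Owner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


schema = Owner.model_json_schema()
# The discriminator is carried into the generated schema for the field.
print(schema["properties"]["pet"]["discriminator"]["propertyName"])
```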

38:35 So I'll kind of skip over this.

38:37 We already touched on the callable discriminators.

38:39 And then I'll leave these slides up here as a reference.

38:43 Again, I don't think this is worth touching in too much detail.

38:46 But just kind of another comment about if you've got nested models, that still works well with discriminated unions.

38:53 So we're still on the pet example, but let's say this time you have a white cat and a black cat model.

38:59 And then you also have your existing dog model.

39:03 You can still create a union of, you know, your cat union is a union of black cat and white cat.

39:10 And then you can union that with the dogs and it still works.

39:13 And once again, you can kind of imagine the exponential blow up that would occur if you didn't use some sort of discriminator here in terms of errors.

39:21 Yeah. Very interesting.

39:22 Okay, cool.
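
The nested case with black cats, white cats, and dogs can be sketched as below, assuming Pydantic v2. The field names color, black_name, white_name, and name are assumptions made for this example.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class BlackCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["black"]
    black_name: str


class WhiteCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["white"]
    white_name: str


class Dog(BaseModel):
    pet_type: Literal["dog"]
    name: str


# Inner union: cats are told apart by color.
Cat = Annotated[Union[BlackCat, WhiteCat], Field(discriminator="color")]
# Outer union: cats and dogs are told apart by pet_type.
Pet = Annotated[Union[Cat, Dog], Field(discriminator="pet_type")]


class Model(BaseModel):
    pet: Pet
    n: int


m = Model(pet={"pet_type": "cat", "color": "black", "black_name": "felix"}, n=1)
print(type(m.pet).__name__)
```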

39:23 Yeah.

39:23 So that's kind of all in terms of my recommendations for discriminated union application.

39:30 I would encourage folks who are interested in this to check out our documentation.

39:34 It's pretty thorough in that regard.

39:35 And I think we also have those links attached to the podcast.

39:38 Yeah, definitely.

39:39 And then performance improvements in the pipeline.

39:42 Is this something that we can control from the outside?

39:44 Is this something that you all are just adding for us?

39:47 Yeah.

39:47 Good question.

39:48 This is hopefully maybe not all in the next version, but just kind of things we're keeping our eyes on in terms of requested performance improvements and ideas that we have.

39:58 I'll go a little bit out of order here.

39:59 We've been talking a bunch about core schema and kind of maybe deferring the build of that or just trying to optimize that.

40:07 And that actually happens in Python.

40:09 So one of the biggest things that we're trying to do is effectively speed up the core schema building process so that import times are faster and just Pydantic is more performant in general.

40:20 Well, so one thing that I'd like to ask about, kind of back on the Python side a little bit, suppose I've got some really large document, right?

40:32 Really nested document.

40:33 Maybe I've converted some terrible XML thing into JSON or I don't know, something.

40:38 And there's a little bit of structured schema that I care about.

40:42 And then there's a whole bunch of other stuff that I could potentially create nested models to go to, but I don't really care about validating them.

40:50 It's just whatever it is, it is.

40:51 Yep.

40:51 What if you just said that was a dictionary?

40:53 Would that short circuit a whole bunch of validation and stuff that would make it faster potentially?

40:58 Yeah.

40:59 Can it turn off the validation for a subset of the model if it's really big and deep and you don't really care for that part?

41:06 Yeah, good question.

41:07 So we offer an annotation called SkipValidation that you can apply to certain types.

41:12 So that's kind of one approach.

41:14 I think in the future, it could be nice to offer kind of a config setting so that you can more easily list features that you want to skip validation for instead of applying those on a field-by-field basis.

41:25 And then the other thing is, if you only define your model in terms of the fields that you really care about from that very gigantic amount of data,

41:34 we will just ignore the extra data that you pass in and pull out the relevant information.

41:39 Right.

41:40 Okay.

41:40 Yeah, good.
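
Both approaches mentioned above can be sketched together, assuming Pydantic v2: SkipValidation on a field you do not want checked, and leaving unwanted keys out of the model so they are ignored by default. The Envelope model and its field names are hypothetical.

```python
from typing import Annotated

from pydantic import BaseModel, SkipValidation


class Envelope(BaseModel):
    id: int
    # Stored as-is: Pydantic does not check or coerce this value.
    payload: Annotated[dict[str, int], SkipValidation]


doc = {"id": "7", "payload": {"a": "not an int"}, "unused": [1, 2, 3]}
env = Envelope.model_validate(doc)

print(env.id, env.payload)
# Keys that are not declared on the model are dropped by default.
print(hasattr(env, "unused"))
```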

41:41 Back to the pipeline.

41:43 Yeah, back to the pipeline.

41:45 So another improvement, we talked a little bit about potential parallelization of things or vectorization.

41:52 One thing that I'm excited to learn more about in the future and that we've started working on is this thing called SIMD in jiter.

41:58 And that's our JSON iterable parser library that I was talking about.

42:02 And so SIMD stands for single instruction, multiple data.

42:06 Basically means that you can do operations faster.

42:10 And that's with this kind of vectorization approach.

42:12 I certainly don't claim to be an expert in SIMD, but I know that it's improving our validation speeds in the department of JSON parsing.

42:22 So that's something that we're hoping to support for a broader set of architectures going forward.

42:27 Yeah, that's really cool.

42:29 Almost like what Pandas does for Python.

42:32 Instead of looping over and doing something to each piece, you just go, this whole column, multiply it by two.

42:37 Yep, yep, exactly.

42:39 I'm sure not implemented the same, but conceptually the same, just to be clear.

42:43 Yep, very much so.

42:44 And then the other two things in the pipeline that I'm going to mention are kind of related once again to the avoiding materializing things in Python if we can.

42:53 And we're even kind of extending that to avoiding materializing things in Rust if we don't have to.

42:59 So the first thing is when we're parsing JSON in Rust, can we just do the validation as we kind of chomp through the JSON instead of like materializing the JSON as a Rust object and then doing all the validation?

43:10 It's like, can we just do it in one pass?

43:13 Okay, is that almost like generators and iterables rather than loading all into memory at once and then processing it one at a time?

43:21 Yeah, exactly.

43:22 And it's kind of like, do you build the tree and then walk it three times or do you just do your operations every time you add something to the tree?

43:31 Yeah.

43:32 And then the last performance improvement in the pipeline that I'll mention is this thing called FastModel.

43:37 Has not been released yet.

43:38 Hasn't really even been significantly developed.

43:41 But this is cool in that it's really approaching that kind of laziness concept again.

43:45 So attributes would remain in Rust after validation until they're requested.

43:50 So this is kind of along the lines of the defer build logic that we were talking about in terms of like, we're not going to send you the data or perform the necessary operations until they're requested.

44:01 Right.

44:01 Okay.

44:02 Yeah, if you don't ever access the field, then why process all that stuff, right?

44:05 And convert it into Python objects.

44:07 Yeah, exactly.

44:08 But yeah, we're kind of just excited in general to be looking at lots of performance improvements on our end, even after the big V2 speedup.

44:16 Still have lots of other things to work on and improve.

44:19 Yeah, it sure seems like it.

44:21 And if this free-threaded Python thing takes off, who knows?

44:26 Maybe there's even more craziness with parallel processing of different branches of the model alongside each other.

44:34 Yeah.

44:35 So I think this kind of dovetails nicely into like you asked earlier, like, is there a way that we kind of monitor the performance improvements that we're making?

44:44 And we're currently using and getting started with two tools that are really helpful.

44:51 And I can share some PRs if that's helpful and send links after.

44:55 Yeah, sure.

44:56 But one of them is CodSpeed, which integrates super nicely with CI and GitHub.

45:02 And it basically runs tests tagged with this, like, benchmark tag.

45:08 And then it'll, you know, run them on main compared to on your branch.

45:11 And then you can see, like, oh, this made my code, you know, 30% slower.

45:15 Like, maybe let's not merge that right away.

45:17 Or conversely, if, you know, there's a 30% improvement on some of your benchmarks, it's really nice to kind of track and see that.

45:25 I see.

45:25 So it looks like it sets up, this is codspeed.io, right?

45:30 Yeah.

45:31 And then it sets up as, say, a GitHub action as part of your CI, CD, and, you know, probably automatically runs when a PR is open and things along those lines, right?

45:40 Yep, exactly.

45:41 All right.

45:42 I've never heard of this.

45:43 But, yeah, if it just does the performance testing for yourself automatically, why not, right?

45:48 Let it do that.

45:49 Yeah.
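
A test tagged for benchmarking might look roughly like this. It is a sketch that assumes pytest with a benchmark plugin such as pytest-codspeed, which recognizes the benchmark marker; the test name and data are made up, and this is not taken from the Pydantic repository.

```python
import pytest
from pydantic import TypeAdapter

INT_LIST = TypeAdapter(list[int])


@pytest.mark.benchmark
def test_validate_int_list():
    # With a benchmark plugin installed, this test body is what gets timed.
    result = INT_LIST.validate_python(["1", "2", "3"])
    assert result == [1, 2, 3]
```

Without such a plugin the marker has no effect and the test simply runs as a normal test.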

45:50 And then, I guess, another tool that I'll mention while talking about kind of our, you know, continuous optimization is one word for it, is this tool kind of similarly named called CodeFlash.

46:04 So CodeFlash is a new tool that uses LLMs to kind of read your code and then develop potentially more performant versions, kind of analyze those in terms of, you know, is this new code passing existing tests?

46:21 Is it passing additional tests that we write?

46:23 And then another great thing that it does is open PRs for you with those improvements and then explain the improvements.

46:30 So I think it's a really pioneering tool in the space, and we're excited to kind of experiment with it more on our PRs and in our repository.

46:40 Okay.

46:41 I love it.

46:42 Just tell me, why is this?

46:43 Why did this slow down?

46:45 Well, here's why.

46:46 Yeah, exactly.

46:47 Yeah.

46:47 And they offer both, like, local runs of the tool and also built-in CI support.

46:54 So those are just kind of two tools that we've started to use and are increasingly using to help us kind of check our performance as we continue to develop and really inspire us to, you know, get those green check marks with the, like, performance improved on lots of PRs.

47:09 Yeah.

47:09 The more you can have it where if it passes the automated build, it's just ready to go and you don't have to worry a little bit and keep testing things and then have uncertainty.

47:20 You know that.

47:20 That's nice, right?

47:21 Yeah.

47:22 Because it lets you rest and sleep at night.

47:25 Yeah.

47:25 Most certainly.

47:26 I mean, I said it before, but the number of people who are impacted by Pydantic, I don't know what that number is, but it has to be tremendous.

47:35 Because if there's 400,000 projects that use it, like, think of the users of those projects, right?

47:40 Like, that multiple has got to be big for, you know, I'm sure there's some really popular ones.

47:44 For example, FastAPI.

47:45 Yeah.

47:46 Right?

47:46 Yeah.

47:47 And it's just nice to, you know, know that there are other companies and tools out there that can help us to, you know, really boost the performance benefits for all those users, which is great.

47:58 All right.

47:58 Yeah.

47:58 That is really cool.

47:59 I think, you know, let's talk about one more performance benefit for people and not so much in how fast your code runs, but in how fast you go from raw data to Pydantic models.

48:11 So, one thing, you probably have seen, we may have even spoken about this before.

48:16 Are you familiar with JSON to Pydantic, the website?

48:19 Yeah, it's a really cool tool.

48:20 Yeah.

48:21 It's such a cool tool.

48:21 And if you've got some really complicated data, like, let's see, I'll pull up some weather data that's in JSON format or something, right?

48:29 Like, if you just take this and you throw it in here, just don't even have to pretty print it.

48:33 It'll just go, okay, well, it looks like what we've got is, you know, this really complicated nested model here.

48:40 And it took, you know, we did this while I was talking, it took 10 seconds from me clicking the API to get a response to having, like, a pretty decent representation here.

48:49 Yeah.

48:50 It's great in terms of, like, developer agility, especially, right?

48:53 It's like, oh, I've, you know, heard of this tool called Pydantic.

48:55 I've seen it in places.

48:56 Like, I don't really know if I want to manually go build all these models for my super complicated JSON data.

49:02 It's like, boom, three seconds, done for you, basically.

49:04 Exactly.

49:06 Like, is it really worth it?

49:07 Because I don't want to have to figure this thing out and figure out all the types.

49:11 And like, no, just paste it in there and see what you get.

49:13 You're going to be, it won't be perfect, right?

49:15 Some things, if they're null in your data, but they could be something that would make them an optional element.

49:20 Like, they could be an integer or they could be null.

49:22 It won't know that it's going to be an integer, right?

49:25 Right.

49:26 So you kind of got to patch it up a tiny bit.

49:29 But in general, I think this is really good.
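
The "patch it up" step for null values might look like this. Both models are hypothetical: GeneratedWeather imitates what a generator could plausibly produce when the sample value was null, and PatchedWeather is the hand-corrected version, assuming Pydantic v2.

```python
from typing import Any, Optional

from pydantic import BaseModel


class GeneratedWeather(BaseModel):
    # The sample value was null, so the generator could not infer a type.
    temperature: float
    wind_gust: Any


class PatchedWeather(BaseModel):
    # Hand-patched: an integer when present, otherwise None.
    temperature: float
    wind_gust: Optional[int] = None


sample = {"temperature": 21.5, "wind_gust": None}
print(PatchedWeather.model_validate(sample))
print(PatchedWeather.model_validate({"temperature": 21.5, "wind_gust": "12"}).wind_gust)
```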

49:30 And then also, you know, just drop in with your favorite LLM, you know.

49:35 I've been using LM Studio, which is awesome.

49:38 Nice.

49:39 I heard you talk about that on one of the most recent podcasts, right?

49:42 Yeah, yeah.

49:43 It's super cool.

49:43 You can just download Llama 3 and run it locally with like a, I think my computer can only handle 7 billion parameter models.

49:51 But you know that you get pretty good answers.

49:53 And if you give it a piece of JSON data and you say, convert that to Pydantic, you'll get really good results.

49:59 You have a little more control than what you just get with this tool.

50:03 But I think those two things, while not about runtime performance, you know, going from I have data till I'm working with Pydantic, that's pretty awesome.

50:12 Yeah, definitely.

50:13 And if any, you know, passionate open source contributors are listening and want to create like a CLI tool for doing this locally, I'm sure that would be very much appreciated.

50:23 I think this is based on something that I don't use, but I think it's based on this datamodel-code-generator, which I think might be a CLI tool or a library.

50:34 Let's see.

50:35 Yes.

50:35 Oh, yeah.

50:36 Very nice.

50:37 But here's the problem that, you know, you go and define like a YAML file.

50:41 Like it's, it's just not as easy as like there's a text field.

50:44 I paste in my stuff, but it does technically, technically work, I suppose.

50:48 Yeah.

50:49 But no, definitely the LLM approach or just the basic website approaches are very quick, which is nice.

50:55 Yeah.

50:56 And speaking of LLMs, just really quick, like I feel, you know, you get some of the Python newsletters and other places like, here's the cool new packages.

51:03 A lot of them are like nine out of 10 of them are about LLMs these days.

51:07 I was like, that feels a little over the top to me.

51:09 But I know there's other things going on in the world.

51:11 But, you know, just put your thoughts on LLMs and coding these days.

51:15 I know you write a lot of code and think about it a lot and probably use LLMs somewhere in there.

51:19 Yeah.

51:20 No, for sure.

51:22 I'm pretty optimistic and excited about it.

51:24 I think there's a lot of good that can be done and a lot of productivity boosting to be had from integrating with these tools, both in your like local development environment and also just in general.

51:36 I think sometimes, you know, it's also great in the performance department, right?

51:39 Like we can see with CodeFlash, using LLMs to help you write more performant code can also be really useful.

51:47 And it's been exciting to see some libraries really leverage Pydantic as well in that space in terms of like validating LLM outputs or even using LLM calls in Pydantic validators to validate, you know, data along constraints that are more like language model friendly.

52:04 So, yeah, I'm optimistic about it.

52:06 I still have a lot to learn, but it's cool to see the variety of applications and kind of where you can plug in Pydantic in that process for fun.
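
Validating LLM output with Pydantic, as mentioned above, might look like the sketch below. It makes several assumptions: the `Review` model and its constraints are hypothetical, and the two JSON strings are hard-coded stand-ins for text returned by a model rather than real LLM calls.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Review(BaseModel):
    # Hypothetical schema the LLM was asked to follow.
    sentiment: Literal["positive", "neutral", "negative"]
    rating: int = Field(ge=1, le=5)


# Stand-ins for raw LLM responses (no model is actually called here).
good_output = '{"sentiment": "positive", "rating": 4}'
bad_output = '{"sentiment": "positive", "rating": 11}'

review = Review.model_validate_json(good_output)
print(review)

try:
    Review.model_validate_json(bad_output)
    error_count = 0
except ValidationError as exc:
    # The errors could be fed back to the LLM as part of a retry prompt.
    error_count = exc.error_count()
    print(exc)
```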

52:13 Yeah, I totally agree.

52:15 Right now, the context window, like how much you can give it as information then to start asking questions is still a little bit small.

52:22 Like you can't give it some huge program and say, you know, find me the bugs where this function is called or, you know, whatever.

52:28 And it's like, it doesn't quite understand enough all at once, but that thing keeps growing.

52:33 So eventually someday we'll all see.

52:35 Yep.

52:36 All right.

52:36 Well, let's talk just for a minute, maybe real quick about what you all are doing at Pydantic, the company, rather than Pydantic, the open source library.

52:47 Like what do you all got going on there?

52:48 Yeah, sure.

52:49 Sure.

52:50 So Pydantic has, the company has released our first commercial tool.

52:55 It's called Logfire and it's in open beta.

52:58 So it's an observability platform.

53:01 And we would really encourage anyone interested to try it out.

53:04 It's super easy to get started with, you know, just the basic like pip install of the SDK and then start using it in your code base.

53:12 And then we have the kind of Logfire dashboard where you're going to see the observability and results.

53:19 And so we kind of adopt this like needle in the haystack philosophy where we want this to be a very easy to use observability platform that offers very like Python centric insights.

53:30 And it's this kind of opinionated wrapper around OpenTelemetry, if folks are familiar with that.

53:37 But in kind of the context of performance, one of the great things about this tool is that it offers this like nested logging and profiling structure for code.

53:47 And so it can be really helpful in kind of looking at your code and being like, we don't know where this, you know, performance slowdown is occurring.

53:54 But if we integrate with Logfire, we can see that like very easily in the dashboard.

53:59 Yeah, you have some interesting approaches like specifically targeting popular frameworks like instrument FastAPI or something like that, right?

54:09 Yeah, definitely.

54:10 Trying to kind of build integrations that work very well with FastAPI, other tools like that.

54:16 And even also offering kind of like custom features in the dashboard, right?

54:20 Like if you're looking at, you know, if you're using an observability tool, you're probably advanced enough to want to add some extra things to your dashboard.

54:27 And we're working on supporting that with FastUI, which I know you've chatted with Samuel about as well.

54:31 Yeah, absolutely.

54:32 I got a chance to talk to Samuel about Logfire and some of the behind the scenes infrastructure was really interesting.

54:39 But also speaking of FastUI, you know, I did speak to him.

54:43 When was that?

54:44 Back in February.

54:45 So this is a really popular project.

54:48 And even on the, I was like, quite a few people decided that they were interested in even watching the video on that one, which, yeah.

54:57 Anything with FastUI?

54:59 Sorry, did you say anything with FastUI?

55:01 Yeah.

55:02 Yeah.

55:02 Are you doing anything on the FastUI side or are you on the Pydantic side of things?

55:07 Yeah, good question.

55:08 I've been working mostly on Pydantic, just, you know, larger user base, more feature requests.

55:14 But I've done a little bit on the FastUI side and I'm excited to kind of brush up on my TypeScript and build that out as a more robust and supported tool.

55:22 I think, especially as we grow as a company and have more open source support in general, that'll be a priority for us, which is exciting.

55:29 Yeah.

55:30 It's an interesting project.

55:32 Basically, a cool way to do JavaScript front ends and React and then plug those back into Python APIs, like FastAPI and those types of things, right?

55:42 So, yeah.

55:43 Yeah.

55:44 And kind of a similarity with FastUI and Logfire, the new tool, is that there's pretty seamless integration with Pydantic, which is definitely going to be one of the kind of core tenets of any products or open source things that we're producing in the future.

55:56 Yeah, I can imagine that's something you want to pay special attention to is like, how well do these things fit together as a whole rather than just, here's something interesting, here's something interesting.

56:05 Yeah.

56:06 Awesome.

56:06 All right.

56:07 Well, I think that pretty much wraps it up for the time that we have to talk today.

56:12 Let's close it out for us with maybe final call to action for people who are already using Pydantic and they want it to go faster or maybe they could adopt some of these tips.

56:24 What do you tell them?

56:24 Yeah.

56:25 I would say, you know, inform yourself just a little bit about kind of the Pydantic architecture just in terms of like, what is core schema and why are we using Rust for validation and serialization?

56:37 And then that can kind of take you to the next steps of when do I want to build my core schemas based on kind of the nature of my application?

56:43 Is it okay if imports take a little bit longer or do I want to delay that?

56:47 And then take a look at discriminated unions.

56:50 And then maybe if you're really interested in improving performance across your application that supports Pydantic and other things, trying out Logfire and just seeing what sort of benefits you can get there.
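
Two of the tips in this call to action can be sketched together. The models below (`Cat`, `Dog`, `Owner`) are hypothetical. The `discriminator` argument tells Pydantic which field selects the union member, so it does not have to try each member in turn, and `defer_build=True` postpones building the model's core schema until first use, on the assumption that a faster import matters more to this application than a fast first validation.

```python
from typing import Literal, Union

from pydantic import BaseModel, ConfigDict, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives_left: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool


class Owner(BaseModel):
    # Assumption: this app prefers quicker imports, so schema building is deferred.
    model_config = ConfigDict(defer_build=True)

    name: str
    # The pet_type tag picks Cat or Dog directly.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate(
    {"name": "Sam", "pet": {"pet_type": "dog", "good_boy": True}}
)
print(type(owner.pet).__name__)
```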

57:00 Yeah.

57:01 See where you're spending your time is one of the very, you know, not just focused on Pydantic, but in general, our intuition is often pretty bad for where is your code slow and where is it not slow?

57:12 You're like, that looks really complicated.

57:14 That must be slow.

57:15 Like, nope.

57:15 It's that one call to like some sub module that you didn't realize was terrible.

57:19 Yeah.

57:20 Yeah.

57:21 And I guess that kind of circles back to the, like, LLM tools and, you know, integrated performance analysis with CodSpeed and CodeFlash and even just other LLM tools, which is like, use the tools you have at hand.

57:32 And yeah, sometimes they're better at performance improvements than you might be, or it can at least give you good tips that give you, you know, a launching point, which is great.

57:40 Yeah, for sure.

57:41 Or even good old cProfile built right in, right?

57:44 If you really, if you want to do it that way.
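
A minimal sketch of the built-in approach, using only the standard library's `cProfile` and `pstats` modules. The functions being profiled are hypothetical stand-ins: one that looks complicated but is cheap, and one plain-looking helper that does most of the work.

```python
import cProfile
import io
import pstats


def slow_helper(n):
    # Looks innocent, but this is where the time goes.
    return sum(i * i for i in range(n))


def looks_complicated(values):
    # Looks busy, but is cheap for small inputs.
    return sorted(set(values))[:3]


def main():
    looks_complicated(list(range(1000)))
    return slow_helper(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and show the top few entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The same report is available without code changes by running `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder for your own entry point.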

57:46 Awesome.

57:47 Yeah.

57:47 All right.

57:47 Well, Sydney, thank you for being back on the show and sharing all these tips and congratulations on all the work you and the team are doing.

57:55 You know, what a success Pydantic is.

57:58 Yeah.

57:58 Thank you so much for having me.

57:59 It was wonderful to get to have this discussion with you and excited that I got to meet you in person at PyCon recently.

58:04 Yeah, that was really great.

58:05 Really great.

58:06 Until next PyCon.

58:08 See you later.

58:08 This has been another episode of Talk Python To Me.

58:12 Thank you to our sponsors.

58:14 Be sure to check out what they're offering.

58:15 It really helps support the show.

58:17 Take some stress out of your life.

58:19 Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

58:25 Just visit talkpython.fm/sentry and get started for free.

58:30 And be sure to use the promo code talkpython, all one word.

58:33 Code Comments, an original podcast from Red Hat.

58:36 This podcast covers stories from technologists who've been through tough tech transitions and share how their teams survived the journey.

58:46 Episodes are available everywhere you listen to your podcasts and at talkpython.fm/code-comments.

58:51 Want to level up your Python?

58:53 We have one of the largest catalogs of Python video courses over at Talk Python.

58:57 Our content ranges from true beginners to deeply advanced topics like memory and async.

59:02 And best of all, there's not a subscription in sight.

59:05 Check it out for yourself at training.talkpython.fm.

59:08 Be sure to subscribe to the show.

59:10 Open your favorite podcast app and search for Python.

59:12 We should be right at the top.

59:14 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

59:23 We're live streaming most of our recordings these days.

59:26 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

59:35 This is your host, Michael Kennedy.

59:36 Thanks so much for listening.

59:37 I really appreciate it.

59:38 Now get out there and write some Python code.

59:40 I'll see you next time.