00:00 You're using Pydantic and it seems pretty straightforward, right?
00:02 But could you adopt some simple changes to your code that would make it a lot faster and more efficient?
00:08 Chances are you'll find a couple of the tips from Sydney Runkle that will do just that.
00:12 Join us to talk about Pydantic performance tips here on Talk Python.
00:16 Episode 466 recorded June 13th, 2024.
00:21 [music]
00:34 Welcome to Talk Python to Me, a weekly podcast on Python.
00:38 This is your host, Michael Kennedy.
00:40 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,
00:45 both on fosstodon.org.
00:47 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.
00:52 We've started streaming most of our episodes live on YouTube.
00:56 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.
01:04 This episode is brought to you by Sentry.
01:06 Don't let those errors go unnoticed.
01:07 Use Sentry like we do here at Talk Python.
01:10 Sign up at talkpython.fm/sentry.
01:13 And it's brought to you by Code Comments, an original podcast from Red Hat.
01:17 This podcast covers stories from technologists who've been through tough tech transitions
01:22 and share how their teams survived the journey.
01:26 Episodes are available everywhere you listen to your podcasts and at talkpython.fm/code-comments.
01:32 Hey, folks, I got something pretty excellent for you.
01:35 PyCharm Professional for six months for free.
01:38 Over at Talk Python, we partnered with the JetBrains team to get all of our registered users
01:44 free access to PyCharm Pro for six months.
01:47 All you have to do is take one of our courses.
01:49 That's it.
01:50 However, do note that this is not valid for renewals over at JetBrains.
01:53 Only new users there.
01:55 And if you're not currently a registered user at Talk Python, well, no problem.
02:00 This offer comes with all of our courses.
02:02 So even if you just sign up for one of our free courses at talkpython.fm,
02:07 click on Courses in the menu, you're in.
02:09 So how do you redeem it?
02:10 Once you have an account over at Talk Python, then it's super easy.
02:14 Just visit your account page on Talk Python Training.
02:17 And in the Details tab, you'll have a code and a link to redeem your six months of PyCharm Pro.
02:22 So why not take a course, even a free one, and get six months free of PyCharm?
02:28 Sydney, welcome back to Talk Python.
02:30 It's awesome to have you here.
02:31 Thank you.
02:32 I'm super excited to be here.
02:33 And yeah, I'm excited for our chat.
02:34 I am too.
02:35 We're going to talk about Pydantic, one of my very favorite libraries that just makes working with Python data,
02:41 data exchange so, so easy, which is awesome.
02:45 And it's really cool that you're on the Pydantic team these days.
02:49 More than I guess, you know, let's jump back just a little bit.
02:52 A few weeks ago, got to meet up a little bit in Pittsburgh at PyCon.
02:57 How was PyCon for you?
02:58 It was great.
02:59 So it was my first PyCon experience ever.
03:02 It was a very, very large conference.
03:03 So it was a cool kind of first introductory conference experience.
03:08 I had just graduated not even a week before.
03:10 So it was a fun way to kind of roll into full-time work and get exposed really to the Python community.
03:16 And it was great to just kind of have a mix of getting to give a talk, getting to attend lots of awesome presentations,
03:22 and then most of all, just like meeting a bunch of really awesome people in the community.
03:26 Yeah.
03:27 I always love how many people you get to meet from so many different places and perspectives.
03:33 And it just reminds you, the world is really big, but also really small.
03:38 You know, get to meet your friends and new people from all over the place.
03:41 Definitely.
03:42 I was impressed by the number of international attendees.
03:45 I didn't really expect that.
03:46 It was great.
03:46 Yeah.
03:47 Same here.
03:48 All right.
03:48 Well, maybe a quick introduction for yourself for those who didn't hear your previous episode,
03:54 and then we'll talk a bit about this Pydantic library.
03:57 Yeah, sure.
03:58 Sounds great.
03:59 So my name is Sydney.
04:00 I just graduated from the University of Wisconsin.
04:03 Last time I chatted with you, I was still pursuing my degree in computer science.
04:08 And working part-time as an intern at the company Pydantic, which is kind of founded around the same ideas that inspired the open source tool,
04:17 and now we're building commercial tools.
04:18 And now I've rolled over into full-time work with them, primarily on the open source side.
04:23 So yeah, very excited to kind of be contributing to the open source community,
04:27 but also getting to help with our commercial tools and development there.
04:32 Yeah, yeah.
04:33 Awesome.
04:33 We'll talk a bit about that later.
04:35 Super cool to be able to work on open source as a job, as a proper job, right?
04:40 Yeah, it's awesome.
04:41 It's really unique.
04:43 I've kind of encouraged lots of people to contribute to open source as kind of a jumpstart
04:47 into their software development careers, especially like young folks who are looking to get started
04:53 with things and maybe don't have an internship or that sort of thing set up yet.
04:56 I think it's a really awesome pipeline for like getting exposed to good code and collaborating
05:01 with others and that sort of thing.
05:02 But it's definitely special to get to do and get paid as well.
05:06 Indeed.
05:07 So it's a little bit unbelievable to me, but I'm sure that it is true that there are folks
05:12 out there listening to the podcast that are like, "Pydantic, maybe I've heard of that.
05:16 What is this Pydantic thing?" Yeah, great question.
05:20 What is Pydantic?
05:22 So Pydantic is the leading data validation library for Python.
05:26 And so Pydantic uses type hints, which are optional in Python, but kind of generally more
05:31 and more encouraged to enforce constraints on data and kind of validate data structures,
05:37 et cetera.
05:38 So we're kind of looking at a very simple example together right now where we're importing
05:43 things like datetime, and the Tuple type from typing.
05:46 And then kind of the core of Pydantic is you define these classes that inherit from this
05:54 class called BaseModel that's in Pydantic.
05:57 And that inheritance is what ends up helping you use methods to validate data, build JSON
06:03 schema, things like that.
06:05 And so in our case, we have this Delivery class that has a timestamp, which is of type
06:09 datetime, and then a dimensions tuple, which has two int parts.
06:16 And so then when you pass data into this delivery class to create an instance, Pydantic handles
06:22 validating that data to make sure that it conforms to those constraints we've specified.
06:27 And so it's really a kind of intermediate tool that you can use for deserialization
06:31 or loading data and then serialization, dumping data.
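A minimal sketch of the model described above, assuming Pydantic v2 is installed. The input values are illustrative and not taken from the conversation.

```python
from datetime import datetime
from typing import Tuple

from pydantic import BaseModel, ValidationError


class Delivery(BaseModel):
    timestamp: datetime
    dimensions: Tuple[int, int]


# In Pydantic's default (lax) mode, strings are coerced to the annotated types.
delivery = Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=["10", "20"])
print(delivery.timestamp.year, delivery.dimensions)

# A three-item value violates the "two ints" constraint and is rejected.
error_count = 0
try:
    Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=[10, 20, 30])
except ValidationError as exc:
    error_count = len(exc.errors())
    print(f"rejected with {error_count} error(s)")
```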
06:34 Yeah, it's a thing of beauty.
06:37 I really love the way that it works.
06:38 If you've got JSON data, nested JSON data, right?
06:41 If you go to pydantic.dev/opensource, there's an example there of what we're talking about.
06:47 It's got a tuple, but the tuple contains integers, two of them.
06:51 And so if there's a tuple of three things, it'll give you an error.
06:54 If it's a tuple of a date time and an int, it'll give you an error.
06:58 Like it reaches all the way inside and, you know, things I guess it compares against.
07:03 It's a little bit like dataclasses.
07:05 Have you done much with dataclasses and compared them?
07:07 Yeah, that's a great question.
07:09 So we actually offer support for like Pydantic dataclasses.
07:13 So I think dataclasses kind of took the first step of, you know, really supporting using
07:18 type hints for model fields and things like that.
07:21 And then Pydantic sort of takes the next jump in terms of like validation and schema support.
07:27 And so I think one very common use case is if you're defining like API request and response
07:32 models, you can imagine like the JSON schema capabilities come in handy there.
07:36 And just ensuring like the integrity of your API and the data you're dealing with.
07:41 Very helpful in the validation front.
07:43 Yeah, yeah.
07:44 Very cool.
07:45 Okay.
07:46 So I guess one more thing for people who are not super familiar that Pydantic is, I think
07:52 it's used every now and then.
07:55 Let's check it out on GitHub here.
07:56 I'm just trying to think of, you know, like some of the main places people have heard
07:59 of it.
08:00 Obviously, FastAPI, I think is the thing that really launched its popularity in the early
08:05 days if I had to guess.
08:06 But if we go over to GitHub, GitHub says that for the open source things that Pydantic is
08:13 a foundational dependency for 412,644 different projects.
08:20 Yeah.
08:21 That's unbelievable.
08:22 Yeah, it's very exciting.
08:23 We just got our May download numbers and heard that we have over 200 million downloads in
08:30 May.
08:31 So that's both version one and version two.
08:33 But definitely exciting to see how kind of critical of a tool it's become for so many
08:38 different use cases in Python, which is awesome.
08:40 Yeah, absolutely.
08:42 It's really, really critical.
08:43 And I think we should probably talk a little bit about Pydantic v1, v2 as a way to get
08:49 into the architecture conversation.
08:52 Right.
08:53 That was a big thing I talked to Samuel Colvin maybe a year ago or so, I would imagine.
08:57 I think around PyCon.
08:59 I think we actually did that at PyCon last year as well.
09:01 Yeah, for sure.
09:02 So a lot of the benefit of using Pydantic is we promise some great performance.
09:10 And a lot of those performance gains came during our jump from v1 to v2.
09:14 So v1 was written solely in Python.
09:17 We had some compiled options, but really it was mostly Pythonic data validation.
09:23 Or I say Pythonic.
09:24 It's always Pythonic, but data validation done solely in Python.
09:29 And the big difference with v2 is that we rewrote kind of the core of our code in Rust.
09:35 And so Rust is much faster.
09:36 And so depending on what kind of code you're running, v2 can be anywhere from two to 20
09:42 times faster in certain cases.
09:45 So right now we still have this Python wrapper around everything in v2.
09:50 But then, and that's kind of used to define schemas for models and that sort of thing.
09:55 And then the actual validation and serialization logic occurs in Pydantic core in Rust.
10:02 Right.
10:03 So I think the team did a really good job to make this major change, this major rewrite,
10:09 and split the whole monolithic thing into a Pydantic core, Pydantic itself, which is
10:14 Python based in a way that didn't break too many projects, right?
10:19 Yeah, that was the goal.
10:20 You know, every now and then there are breaking changes that I think are generally a good
10:25 thing for the library moving forward, right?
10:27 Like hopefully whenever we make a breaking change, it's because it's leading to a significant
10:31 improvement.
10:32 So we definitely do our best to avoid breaking changes and certainly someday we'll launch
10:37 a v3 and hopefully that'll be an even more seamless transition for v2 users to v3 users.
10:44 Yeah, I would imagine that the switch to Rust, that big rewrite, probably caused
10:50 a lot of reconsidering: how are we doing this?
10:54 Or now that it's over in Rust, maybe it doesn't make sense this way or whatever.
10:58 Yeah.
10:59 So we kind of, you know, we got a lot of feedback and usage of Pydantic v1, so tried to do our
11:03 best to incorporate all that feedback into a better v2 version in terms of both APIs
11:08 and performance and that sort of thing.
11:10 Sure, sure.
11:11 John out in the audience asks, how did the team approach thread safety with this?
11:16 So Rust can do multiple threads easily.
11:19 Python, not so much really, although maybe soon with free threaded Python.
11:24 Yeah, that's a good question.
11:26 So our kind of Rust guru on the team is David Hewitt, and he's very in the know about all
11:31 of the multi-threading and things happening on the Rust side of things.
11:35 I myself have some more to learn about that, certainly.
11:38 But I think in general, kind of our approach is that Rust is quite type safe, both performant
11:43 and type safe, which is great and memory safe as well.
11:48 And I think most of our, I'll talk a little bit later about some like parallelization
11:54 and vectorization that we're looking at for performance improvements.
11:57 But in terms of safety, I think if you have any questions, feel free to open an issue
12:01 on the Pydantic core repo and get a conversation going with David Hewitt.
12:05 I would imagine it's not, you guys haven't had to do too much with it, just that Python
12:09 currently, but soon, but currently doesn't really let you do much true multi-threading
12:16 because of the GIL.
12:17 But the whole, I think, you know, yeah, I think Python 3.13 is going to be crazy with
12:23 free threaded Python and it's going to be interesting to see how that evolves.
12:27 Yep.
12:28 Yeah.
12:29 I know we definitely do some jumping through hoops and just, you know, having to be really
12:32 conscious of stuff with the GIL in Pydantic core and PyO3.
12:36 And PyO3 is kind of the library that bridges Python and Rust.
12:41 And so it's heavily used in Pydantic core, as you can imagine.
12:44 So I'm excited to see what changes might look like there.
12:46 Yeah, same.
12:47 All right.
12:48 Well, let's jump into the performance because you're here to tell us all about Pydantic
12:51 performance tips.
12:52 And you've got a whole bunch of these.
12:54 Did you give this talk at PyCon?
12:55 I did partially.
12:56 It's a little bit different, but some of the tips are the same.
12:59 I don't think the videos are out yet, are they?
13:01 As of the time of recording on June 13th.
13:03 Yeah.
13:04 No, I actually checked a couple of minutes ago.
13:06 I was like, I said one thing during my talk that I wanted to double check, but the videos
13:10 are not out yet.
13:11 So.
13:12 No, I'm really excited.
13:13 There's going to be a bunch.
13:14 There was actually a bunch of good talks, including yours and some others I want to
13:16 watch, but they're not out yet.
13:18 All right.
13:19 Let's jump into Pydantic performance.
13:21 Where should we start?
13:23 I can start on the slideshow if we want.
13:25 Yeah, let's do that.
13:26 Awesome.
13:27 So, yeah, I think kind of the categories of performance tips that we're going to talk
13:30 about here kind of have some like fast one liner type performance tips that you can implement
13:37 in your own code and then kind of the meat of the like, how do I improve performance
13:41 in my application that uses Pydantic?
13:44 We're going to talk a bit about discriminated unions, also called tagged unions, and then
13:49 kind of finally talk about on our end of the development, how are we continuously improving
13:55 performance?
13:56 You know, Pydantic internals, lies, et cetera.
13:58 Sure.
13:59 Do you have the equivalent of unit tests for performance?
14:03 Yeah, we do.
14:05 We use a library called CodSpeed that I'm excited to touch on a bit more later.
14:10 Yeah.
14:11 All right.
14:12 Let's talk about that later.
14:13 Perfect.
14:14 Yeah, sure thing.
14:15 So I have this slide up right now just kind of talking about why people use Pydantic.
14:18 We've already covered some of these, but just kind of as a general recap, it's powered by
14:22 type hints.
14:23 And one of our biggest promises is speed.
14:26 We also have these other great features like JSON schema compatibility and documentation
14:31 comes in particularly handy when we talk about APIs, you know, support for custom validation
14:36 and serialization logic.
14:38 And then as we saw with the GitHub repository observations, a very robust ecosystem of libraries
14:44 and other tools that use and depend on Pydantic that leads to this kind of extensive and large
14:49 community, which is really great.
14:52 But this all kind of lies on the foundation of like Pydantic is easy to use and it's very
14:56 fast.
14:57 So let's talk some more about that.
14:58 And this, yeah, well, the speed is really interesting in the multiplier that you all
15:03 have for basically a huge swath of the Python ecosystem, right?
15:08 We just saw the 412,000 things that depend on Pydantic.
15:12 Well, a lot of those, their performance depends on Pydantic's performance as well.
15:17 Right?
15:18 Yeah, certainly.
15:19 Yeah, it's nice to have such a large ecosystem of folks to also, you know, contribute to
15:24 the library as well, right?
15:25 Like, you know, because other people are dependent on our performance, the community definitely
15:28 becomes invested in it as well, which is great.
15:33 This portion of Talk Python to Me is brought to you by OpenTelemetry support at Sentry.
15:38 In the previous two episodes, you heard how we use Sentry's error monitoring at Talk Python,
15:43 and how distributed tracing connects errors, performance and slowdowns and more across
15:48 services and tiers.
15:50 But you may be thinking our company uses OpenTelemetry.
15:54 So it doesn't make sense for us to switch to Sentry.
15:56 After all, OpenTelemetry is a standard, and you've already adopted it, right?
16:01 Well, did you know, with just a couple of lines of code, you can connect OpenTelemetry's
16:06 monitoring and reporting to Sentry's backend.
16:10 OpenTelemetry does not come with a backend to store your data, analytics on top of that
16:13 data, a UI or error monitoring.
16:17 And that's exactly what you get when you integrate Sentry with your OpenTelemetry setup.
16:22 Don't fly blind, fix and monitor code faster with Sentry.
16:26 Integrate your OpenTelemetry systems with Sentry and see what you've been missing.
16:30 Create your Sentry account at talkpython.fm/sentry-telemetry.
16:35 And when you sign up, use the code TALKPYTHON, all caps, no spaces.
16:39 It's good for two free months of Sentry's business plan, which will give you 20 times
16:43 as many monthly events as well as other features.
16:47 My thanks to Sentry for supporting Talk Python to Me.
16:50 But yeah, so kind of as that first category, we can chat about some basic performance tips.
16:55 And I'll do my best here to kind of describe this generally for listeners who maybe aren't
16:59 able to see the screen.
17:01 So when you are...
17:03 Can we share your slideshow later with the audience?
17:06 Can we put it in the show notes?
17:07 Yeah, yeah, absolutely.
17:08 Okay, so people want to go back and check it out.
17:10 But yeah, we'll just, we'll describe it for everyone.
17:12 Go ahead.
17:13 Yeah.
17:14 So when you're validating data in Pydantic, you can either validate Python objects or
17:19 like dictionary type data, or you can validate JSON formatted data.
17:24 And so one of these kind of like one-liner tips that we have is to use our built-in
17:31 model_validate_json method, instead of calling our model_validate method and then separately
17:37 loading the JSON data with the standard library json package.
17:41 And the reason that we recommend that is one of the like crux of the general performance
17:46 patterns that we try to follow is not materializing things in Python when we don't have to.
17:51 So we've already mentioned that our core is written in Rust, which is much faster than
17:54 Python.
17:55 And so with our built-in model_validate_json method, whenever you pass in that string,
18:00 we send it right to Rust.
18:01 Whereas if you do the JSON loading by yourself, you're going to like materialize a Python object
18:06 and then have to send it over.
18:07 Right.
18:08 And so you're going to be using the built-in json.loads, or load or
18:14 whatever, which will then pull that in, turn it into a Python dictionary, then you take it
18:18 and try to convert that back to a Rust data structure, and then validate it in Rust.
18:23 That's where all the validation lives anyway.
18:25 So just get out of the way, right?
18:28 Exactly.
18:29 Yep.
18:30 It's like skip the Python step if you can, right.
18:31 And I will note there is one exception here, which is I mentioned we support custom validation.
18:36 If you're using what we call like before and wrap validators that do something in Python,
18:41 and then call our internal validation logic, and then maybe even do something after, it's
18:47 okay, you can use model_validate and the built-in json.loads because you're already
18:52 kind of guaranteed to be materializing Python objects in that case.
18:55 But for the vast majority of cases, it's great to just go with the built-in
18:59 model_validate_json.
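The two approaches can be sketched side by side, assuming Pydantic v2. The User model and the raw payload are hypothetical names made up for illustration; both calls produce the same result, but the first avoids building an intermediate Python dictionary.

```python
import json

from pydantic import BaseModel


class User(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str


raw = '{"id": 1, "name": "Sydney"}'

# Recommended: the JSON string is handed straight to the Rust core.
user_direct = User.model_validate_json(raw)

# Works, but materializes a Python dict first and then validates it.
user_two_step = User.model_validate(json.loads(raw))

print(user_direct == user_two_step)
```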
19:00 Yeah, that's really good advice.
19:02 And they seem kind of equivalent.
19:03 But once you know the internals, right, then it's well, maybe it's not exactly.
19:07 Yeah.
19:08 And I think implementing some of these tips is helpful in that if you understand some
19:11 of the kind of like pydantic architectural context, it can also just help you think more
19:16 about like, how can I write my pydantic code better?
19:19 Absolutely.
19:20 So, the next tip I have here, very easy one-liner fix, which is when you're using
19:26 TypeAdapter, which is this structure you can use to basically validate one type.
19:32 So we have BaseModels, which we've chatted about before, which is like if you have a
19:36 model with lots of fields, that's kind of the structure you use to define it.
19:39 Well, TypeAdapter is great if you're like, I just want to validate that this data is
19:43 a list of integers, for example, as we're seeing on the screen.
19:46 Right.
19:47 Because let me give people an idea.
19:48 Like if you accept, if you've got a JSON, well, just JSON data from wherever, but you
19:53 know, a lot of times it's coming over an API or it's provided to you as a file and it's
19:57 not your data you control, right?
19:59 You're trying to validate it.
20:00 You could get a dictionary, a JSON object that's got curly braces with a bunch of stuff,
20:06 in which case that's easy to map to a class.
20:08 But if you just have JSON, which is bracket, thing, thing, thing, thing, close bracket.
20:12 Well, how do you have a class that represents the list?
20:15 Like it gets really tricky, right?
20:17 To be able to understand, you can't model that with classes.
20:21 And so you all have this TypeAdapter thing, right?
20:24 That's the role this plays, generally.
20:25 Is that right?
20:26 Yeah.
20:27 And I think it's also really helpful in a testing context.
20:30 Like you know, when we want to check that our validation behavior is right for one type,
20:35 there's no reason to go like build an entire model if you're really just validating against
20:40 one type or structure, TypeAdapter is great.
20:44 And so kind of the advice here is you only want to initialize your TypeAdapter object
20:50 once.
20:51 And the reason behind that is we build a core schema in Python and then attach that to a
20:56 class or TypeAdapter, et cetera.
20:59 And so if you can, you know, not build that TypeAdapter within your loop, but instead
21:03 do it right before, or not build it, you know, in your function, but instead
21:08 outside of it, then you can avoid building the core schema over and over again.
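A minimal sketch of that advice, assuming Pydantic v2. The function names and sample data are hypothetical; the point is only where the TypeAdapter gets constructed.

```python
from typing import List

from pydantic import TypeAdapter

# Built once at module level, so the core schema is constructed a single time.
INT_LIST_ADAPTER = TypeAdapter(List[int])


def parse_batches(payloads):
    """Reuse the shared adapter for every payload."""
    return [INT_LIST_ADAPTER.validate_python(p) for p in payloads]


def parse_batches_slow(payloads):
    """Same result, but the adapter (and its schema) is rebuilt on every pass."""
    results = []
    for p in payloads:
        adapter = TypeAdapter(List[int])  # avoid: repeated schema build
        results.append(adapter.validate_python(p))
    return results


batches = [[1, "2", 3], ["4", 5]]
parsed = parse_batches(batches)
parsed_slow = parse_batches_slow(batches)
print(parsed)
```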
21:11 Yeah.
21:12 So basically what you're saying is that the TypeAdapter that you create might as well
21:16 be a singleton because it's stateless, right?
21:19 Like it doesn't store any data.
21:21 It's kind of slightly expensive to create relatively.
21:24 And so if you had a function that was called over and over again, and that function had
21:28 a loop and inside the loop, you're creating the TypeAdapter, that'd be like worst case
21:31 scenario almost, right?
21:33 Yeah, exactly.
21:34 And I think this kind of goes along with like general best programming tips, right?
21:37 Which is like, if you only need to create something once, do that once.
21:42 Exactly.
21:43 You know, a parallel that maybe goes way, way back in time could be like a compiled
21:48 regular expression.
21:49 You know, you wouldn't do that over and over in a loop.
21:52 You would just create the compiled regular expression and then use it throughout
21:55 your program, right?
21:57 Because it's kind of expensive to do that, but it's fast once it's created.
22:00 Yeah, exactly.
22:01 And funny that you mentioned that.
22:02 I actually fixed a bug last week where we were compiling regular expressions twice when
22:08 folks like specified that as a constraint on a field.
22:11 So definitely just something to keep in mind and easy to fix or implement with type adapters
22:16 here.
22:17 Yeah, awesome.
22:18 Okay.
22:19 I like this one.
22:20 That's a good one.
22:21 Yeah.
22:22 So this next tip also kind of goes along with like general best practices, but the more
22:25 specific you can be with your type hints, the better.
22:28 And so specifically, if you know that you have a list of integers, it's better and more
22:34 efficient to specify a type hint as a list of integers instead of a sequence of integers,
22:39 for example.
22:40 Or if you know you have a dictionary that maps strings to integers, specify that type
22:46 hint as a dictionary, not a mapping.
22:48 Interesting.
22:49 Yeah.
22:50 So you could import Sequence from the typing module, just the generic way, but I guess
22:54 you probably have specific code that runs that can validate lists more efficiently than
22:59 a general iterable type of thing, right?
23:01 Yeah, exactly.
23:02 So in the case of like a sequence versus a list, it's the like square and rectangle thing,
23:07 right?
23:08 Like a list is a sequence, but there are lots of other types of sequences.
23:11 And so you can imagine for a sequence, we like have to check lots of other things.
23:15 Whereas if you know with certainty, this is going to be a list or it should be a list,
23:20 then you can have things be more efficient with specificity there.
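As a rough illustration, assuming Pydantic v2, here are the specific and the general annotations next to each other. The model names and data are hypothetical. Both accept the same input; the claim from the conversation is only that the specific version gives the validator less to check.

```python
from typing import Dict, List, Mapping, Sequence

from pydantic import BaseModel


class SpecificHints(BaseModel):  # hypothetical: preferred when the shape is known
    values: List[int]
    counts: Dict[str, int]


class GeneralHints(BaseModel):  # hypothetical: accepts more input types, more checks
    values: Sequence[int]
    counts: Mapping[str, int]


data = {"values": [1, 2, 3], "counts": {"a": 1}}
specific = SpecificHints.model_validate(data)
general = GeneralHints.model_validate(data)
print(specific.values, specific.counts)
```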
23:23 Does it make any difference at all whether you use the more modern type specifications?
23:29 Like traditionally people would say from typing import capital L list, but now you can just
23:35 say lowercase L list with the built-in and no import statement.
23:39 Are those equivalent or is there some minor difference there?
23:42 Do you know?
23:43 Yeah, that's a good question.
23:44 I wouldn't be surprised if there was a minor difference that was more a consequence of
23:48 like Python version, right?
23:50 Because there's like, I mean, I suppose you could import the old capital L list in a newer
23:55 Python version, but I think the difference is like more related to specificity of a type
23:59 hint rather than kind of like versioning.
24:03 If the use of that capital L list made you write an import statement, I mean, it would
24:09 cause the program to start ever so slightly slower because there's another import.
24:14 It's got to run, whereas it already knows, it's already imported, what list is.
24:18 You wouldn't believe how many times I get messages on YouTube videos I've done or even
24:23 from courses saying, Michael, I don't know what you're doing, but your code is just wrong.
24:28 I wrote lowercase L list bracket something and it said list is not subscriptable or
24:35 something like that.
24:36 And look, you've just done it wrong.
24:37 You're going to need to fix this.
24:38 Or you're on Python 3.7 or something super old before these new features were added.
24:45 But there's just somewhere in the community we haven't communicated this well.
24:49 I don't know.
24:50 Yeah, for sure.
24:52 I was writing some code earlier today in a meeting and I used the like from typing import
24:56 union and then union X and Y type.
25:00 My coworker was like, Sydney, use the pipe.
25:02 Like what are you doing?
25:04 Use the pipe, exactly.
25:05 But here's the thing that was introduced in 3.10, I believe.
25:08 And if people are in 3.9, that code doesn't run or if they're not familiar with the changes.
25:13 So there's all these tradeoffs.
25:15 I almost feel like it would be amazing to go back for any time there's a security release
25:20 that releases, say another 3.7 or something and change the error message to say this feature
25:25 only works in the future version of Python rather than some arbitrary error of you're
25:30 doing it wrong.
25:31 You know, that would be great.
25:32 Yeah, definitely.
25:33 Yeah.
25:34 Some of those errors can be pretty cryptic with the syntax stuff.
25:36 They can.
25:37 All right.
25:38 So be specific: list or tuple, not sequence, if you know it's a list or tuple or whatever.
25:43 Yeah.
25:44 And then kind of my last minor tip, which great that you brought up import statements
25:49 and kind of adding general time to a program is I don't have a slide for this one, but
25:54 if we go back to the TypeAdapter slide, we talked about the fact that initializing this
25:59 TypeAdapter builds a core schema and attaches it to that class.
26:04 And that's kind of done at build time, at import time.
26:08 So that's like already done.
26:10 And if you really don't want to have that import or like build time take a long time,
26:17 you can use the defer_build flag.
26:20 And so what that does is defers the core schema build until the first validation call.
26:25 You can also set that on model config and things like that.
26:28 But basically the idea here is like striving to be lazier, right?
26:32 Like I see if we don't need to build this core schema right at import time because we
26:36 want our program to start up quickly.
26:38 That's great.
26:39 There's a little bit of a delay on the first validation, but maybe startup time is more
26:43 important.
26:44 So that's a little bit more of a preferential validation, sorry, preferential performance
26:48 tip, but available for folks who need it.
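A small sketch of the lazy-build option on a model, assuming Pydantic v2 and its ConfigDict(defer_build=True) setting. The Report model is hypothetical. How the flag applies to TypeAdapter has varied between Pydantic releases, so only the model configuration is shown here.

```python
from pydantic import BaseModel, ConfigDict


class Report(BaseModel):  # hypothetical model, for illustration only
    # Skip building the validator at class-definition (import) time;
    # it is built on first use instead.
    model_config = ConfigDict(defer_build=True)

    title: str
    rows: int


# The first validation call pays the one-time schema build cost.
report = Report(title="weekly", rows="42")
print(report.rows)
```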
26:50 Yeah.
26:51 Like, let me give you an example.
26:52 I'll give people an example where I think this might be useful.
26:55 So in the Talk Python Training, the courses site, I think we've got 20,000 lines of Python
27:00 code, which is probably more at this point.
27:02 I checked a long time ago, but a lot.
27:04 And it's a package.
27:05 And so when you import it, it goes and imports all the stuff to like run the whole web app,
27:10 but also little utilities like, oh, I just want to get a quick report.
27:14 I want to just access this model and then use it on something real quick.
27:19 It imports all that stuff.
27:20 So that app startup would be potentially slowed down by this.
27:23 Where if you know, like only sometimes is that TypeAdapter used, you don't want to
27:28 necessarily have it completely created until that function gets called.
27:31 So then the first function call might be a little slow, but there'd be plenty of times
27:34 where maybe it never gets called.
27:36 Right?
27:37 Yep, exactly.
27:38 Awesome.
27:39 Okay.
27:40 All right.
27:41 So kind of a more complex performance optimization is using tagged unions.
27:45 They're still pretty simple.
27:46 It's just like a little bit more than a one line change.
27:49 So kind of talking about tagged unions, we can go through a basic example, why we're
27:54 using tagged unions in the first place, and then some more advanced examples.
27:59 This portion of Talk Python to Me is brought to you by Code Comments, an original podcast
28:03 from Red Hat.
28:05 You know, when you're working on a project and you leave behind a small comment in the
28:08 code, maybe you're hoping to help others learn what isn't clear at first.
28:14 Sometimes that code comment tells a story of a challenging journey to the current state
28:18 of the project.
28:19 Code Comments, the podcast, features technologists who've been through tough tech transitions,
28:25 and they share how their teams survived that journey.
28:28 The host, Jamie Parker, is a Red Hatter and an experienced engineer.
28:32 In each episode, Jamie recounts the stories of technologists from across the industry
28:37 who've been on a journey implementing new technologies.
28:40 I recently listened to an episode about DevOps from the folks at World Wide Technology.
28:45 The hardest challenge turned out to be getting buy-in on the new tech stack rather than using
28:50 that tech stack directly.
28:52 It's a message that we can all relate to.
28:54 And I'm sure you can take some hard-won lessons back to your own team.
28:58 Give Code Comments a listen.
29:00 Search for Code Comments in your podcast player or just use our link, talkpython.fm/code-comments.
29:07 The link is in your podcast player's show notes.
29:09 Thank you to Code Comments and Red Hat for supporting Talk Python to Me.
29:14 Let's start with what are tagged unions because I honestly have no idea.
29:16 I know what unions are, but tagging them, I don't know.
29:19 Yeah, sure thing.
29:20 So tagged unions are a special type of union.
29:24 We also call them discriminated unions.
29:26 And they help you specify a member of a model that you can use for discrimination in your
29:33 validation.
29:34 So what that means is if you have two models that are pretty similar, and your field can
29:40 be either one of those types of models, model X or model Y, but you know that there's one
29:46 tag or like discriminator field that differs, you can specifically validate against that
29:52 field and skip some of the other validation, right?
29:54 So like, I'll move on to an example here in a second, but basically it helps you validate
30:00 more efficiently because you get to skip validation of some fields.
30:03 So it's really helpful if you have models that have like 100 fields, but one of them
30:06 is really indicative of what type it might be.
30:09 I see.
30:10 So instead of trying to figure out like, is it all of this stuff, once you know it has
30:14 this aspect or that aspect, then you can sort of branch it on a path and just treat it as
30:18 one of the elements of the union.
30:20 Is that right?
30:21 Yes, exactly.
30:22 Okay.
30:23 So one other note about discriminated unions is you specify this discriminator and it can
30:28 either be a string, naming a Literal-typed field, or a callable.
30:31 And we'll look at some examples of both.
30:33 So here's kind of a more concrete example so we can really better understand this.
30:38 So let's say we have a, this is the classic example, right?
30:41 A cat model and a dog model.
30:43 Yeah.
30:44 Cat people, dog people.
30:45 You're going to start a debate here.
30:47 Exactly, exactly.
30:48 They both have this pet_type field.
30:52 So for the cat model, it's a literal that is just the string cat.
30:56 And then for the dog model, it's the literal that's the string dog.
30:59 So it's just kind of a flag on a model to indicate what type it is.
31:03 And you can imagine, you know, in this basic case, we only have a couple of fields attached
31:07 to each model, but maybe this is like data in a like vet database.
31:13 And so you can imagine like there's going to be tons of fields attached to this, right?
31:16 So it'd be pretty helpful to just be able to look at it and say, oh, the pet type is
31:20 dog.
31:21 So this data is valid for a dog type.
31:23 I'll also note we have a lizard in here.
31:26 So what this looks like in terms of validation with Pydantic then is that when we specify
31:33 this pet field, we just add one extra setting, which says that the discriminator is that
31:39 pet_type field.
31:40 And so then when we pass in data that corresponds to a dog model, Pydantic is smart enough to
31:46 say, oh, this is a discriminated union field.
31:48 Let me go look for the pet_type field on the model and just see what that is and then use
31:54 that to inform my decision for what type I should validate against.
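A minimal sketch of the cat, dog, and lizard example described here, assuming Pydantic v2. Only the `pet_type` field and the `discriminator` setting come from the conversation; the other field names (`meows`, `barks`, `scales`, `n`) are illustrative.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Lizard(BaseModel):
    pet_type: Literal["reptile", "lizard"]
    scales: bool


class Model(BaseModel):
    # The one extra setting: tell Pydantic which field picks the union member.
    pet: Union[Cat, Dog, Lizard] = Field(discriminator="pet_type")
    n: int


# Pydantic reads pet_type == "dog" and validates against Dog directly.
m = Model.model_validate({"pet": {"pet_type": "dog", "barks": 3.14}, "n": 1})
print(type(m.pet).__name__)
```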
31:58 OK, that's awesome.
32:00 So if we don't set the discriminator keyword value in the field for the union, it'll still
32:07 work, right?
32:08 It just has to be more exhaustive and slow.
32:10 Yeah, exactly.
32:11 So it'll still validate and it'll say, hey, let's take this input data and try to validate
32:16 it against the cat model.
32:18 And then Pydantic will come back and say, oh, that's not a valid cat.
32:20 Like let's try the next one.
32:22 Whereas with this discriminated pattern, we can skip right to the dog, which you can imagine
32:27 helps us skip some of the unnecessary steps.
32:29 Yeah, absolutely.
32:30 OK, that's really cool.
32:31 I had no idea about this.
32:32 Yeah, yeah.
32:33 It's a cool, I'd say like moderate level feature.
32:36 Like I think if you're just starting to use Pydantic, you probably haven't touched discriminated
32:40 unions much, but we hope that it's simple enough to implement that most folks can use
32:44 it if they're using unions.
32:46 Yeah, that's cool.
32:47 I don't use unions very often, which is probably why, other than, you know, some type pipe None,
32:52 which is, you know, like Optional.
32:53 But yeah, if I did, I'll definitely remember this.
32:57 Yeah.
32:58 All righty.
32:59 So as I've mentioned, this helps for more efficient validation.
33:03 And then where this really comes in and has a lot of value is when you are dealing with
33:07 lots of nested models or models that have tons of fields.
33:10 So let's say you have a union with like 10 members and each member of the union has 100
33:15 fields.
33:16 If you could just do validation against 100 fields instead of a thousand, that would be
33:19 great in terms of a performance gain.
33:22 And then once again, with nested models, you know, if you can skip lots of those union
33:27 member validations, also going to boost your performance.
33:29 Yeah, for sure.
33:30 You know, an example where this seems very likely would be using it with Beanie or some
33:35 other document database where the modeling structure is very hierarchical.
33:40 You end up with a lot of nested sub-Pydantic models in there.
33:45 Yeah, very much so.
33:48 So as a little bit of an added benefit, we can talk about kind of this improved error
33:52 handling, which is a great way to kind of visualize why the discriminated union pattern
33:57 is more efficient.
33:58 So right now we're looking at an example of validation against a model that doesn't use
34:03 a discriminated union.
34:05 And the errors are not very nice to look at.
34:07 You basically see the errors for every single permutation of the different values.
34:13 And we're using nested models, so it's very hard to interpret.
34:17 So we don't have to look at this for too long.
34:19 It's not very nice.
34:20 But if we look at...
34:22 But basically the error message says, look, there's something wrong with the union.
34:26 If it was a string, it is missing these things.
34:28 If it was this kind of thing, it misses those things.
34:31 Like if it was a dog, it misses this.
34:33 If it's a cat, it misses that.
34:35 And it doesn't specifically tell you.
34:38 Exactly.
34:39 It's a dog, so it's missing like the collar size or whatever, right?
34:43 Exactly.
34:44 But then...
34:45 And I'll go back and kind of explain the discriminated model for this case in a second.
34:49 But if you look at...
34:50 This is the model with the discriminated union instead.
34:54 We have one very nice error that says, okay, you're trying to validate this X field and
35:00 it's the wrong type, right?
35:03 So yeah.
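To make the error-handling difference concrete, here is a small sketch that validates the same bad input against a plain union and a discriminated union and compares how many errors come back. The model and field names are illustrative, and the helper `error_locations` is made up for this example.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class PlainOwner(BaseModel):
    pet: Union[Cat, Dog]  # no discriminator: every member is attempted


class TaggedOwner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


def error_locations(model, data):
    # Hypothetical helper: collect the location of each reported error.
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


bad_dog = {"pet": {"pet_type": "dog"}}  # a dog that is missing "barks"
plain_errors = error_locations(PlainOwner, bad_dog)
tagged_errors = error_locations(TaggedOwner, bad_dog)
print(len(plain_errors), "errors without a discriminator:", plain_errors)
print(len(tagged_errors), "error(s) with a discriminator:", tagged_errors)
```

The plain union reports what went wrong for each member it tried, while the tagged version only reports the problem with the dog.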
35:05 The first example that we were looking at was using string type discriminators.
35:09 So we just had this pet type thing that said, oh, this is a cat or this is a dog, that sort
35:13 of thing.
35:14 We also offer some more customization in terms of...
35:19 We also allow callable discriminators.
35:21 So in this case, this field can be either a string or an instance of this
35:29 DiscriminatedModel.
35:30 So it's kind of a recursive pattern, right?
35:32 And that's where you can imagine the nested structures becoming very complex very easily.
35:38 And we use this kind of callable to differentiate between which model we should validate against.
35:44 And then we tag each of the cases.
35:47 So a little bit more of a complex application here.
35:50 But once again, when you kind of see the benefit in terms of errors and interpreting things
35:54 and performance, I think it's generally a worthwhile investment.
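A sketch of the callable-discriminator pattern being described, assuming Pydantic v2.5 or later, where `Discriminator` and `Tag` are available. The function name `x_discriminator` and the tag strings are illustrative choices.

```python
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, Discriminator, Tag


def x_discriminator(value: Any) -> Optional[str]:
    # Return the tag of the union member that should validate this value.
    if isinstance(value, str):
        return "str"
    if isinstance(value, (dict, BaseModel)):
        return "model"
    return None  # no tag matched, so Pydantic reports a validation error


class DiscriminatedModel(BaseModel):
    # The field is either a plain string or another DiscriminatedModel,
    # which is the recursive shape mentioned in the conversation.
    x: Annotated[
        Union[
            Annotated[str, Tag("str")],
            Annotated["DiscriminatedModel", Tag("model")],
        ],
        Discriminator(x_discriminator),
    ]


nested = DiscriminatedModel.model_validate({"x": {"x": "leaf"}})
print(nested)
```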
35:57 That's cool.
35:58 So if you wanted something like a composite key equivalent of a discriminator, right?
36:04 Like if it has this field and its nested model is of this type, it's one thing versus another.
36:10 Like a free user versus a paying user, you might have to look and see their total lifetime
36:15 value plus that they're a registered user.
36:18 I don't know, something like that.
36:19 You could write code that would pull that information out and then discriminate which
36:22 thing to validate against, right?
36:24 Yeah, exactly.
36:25 Yeah, definitely comes in handy when you have like...
36:28 You're like, okay, well, I still want the performance benefits of a discriminated union,
36:32 but I kind of have three fields on each model that are indicative of which one I should
36:36 validate against, right?
36:37 And it's like, well, you know, taking the time to look at those three fields over the
36:41 hundred is definitely worth it.
36:44 Just a little bit of complexity for the developer.
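The "composite key" idea could look something like the sketch below, where the callable inspects two fields before choosing a member. The models `FreeUser`, `PayingUser`, and `Account`, and the rule used in `user_kind`, are hypothetical and only meant to mirror the free versus paying user example.

```python
from typing import Annotated, Any, Union

from pydantic import BaseModel, Discriminator, Tag


class FreeUser(BaseModel):
    registered: bool
    lifetime_value: float


class PayingUser(BaseModel):
    registered: bool
    lifetime_value: float
    plan: str


def user_kind(value: Any) -> str:
    # Accept raw dicts as well as already-built model instances.
    if isinstance(value, dict):
        registered = value.get("registered", False)
        spent = value.get("lifetime_value", 0)
    else:
        registered = getattr(value, "registered", False)
        spent = getattr(value, "lifetime_value", 0)
    # Illustrative rule: a registered user who has spent money is "paying".
    is_paying = registered is True and isinstance(spent, (int, float)) and spent > 0
    return "paying" if is_paying else "free"


class Account(BaseModel):
    user: Annotated[
        Union[
            Annotated[FreeUser, Tag("free")],
            Annotated[PayingUser, Tag("paying")],
        ],
        Discriminator(user_kind),
    ]


paying = Account.model_validate(
    {"user": {"registered": True, "lifetime_value": 120.0, "plan": "pro"}}
)
free = Account.model_validate({"user": {"registered": False, "lifetime_value": 0}})
print(type(paying.user).__name__, type(free.user).__name__)
```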
36:48 One other note here is that discriminated unions are...
36:50 Can we go back really quick on the previous one?
36:52 So I got a quick question.
36:53 So for this, you write a function.
36:56 It's given the value that comes in, which could be a string, it could be a dictionary.
37:02 Could you do a little bit further performance improvement and add like a functools lru_cache
37:09 to cache the output?
37:11 So every time it sees the same thing, if there's a repeated data through your validation, it
37:15 goes, I already know what it is.
37:16 What do you think?
37:17 Yeah, I do think that would be possible.
37:19 That's definitely an optimization we should try out and put in our docs for like the advanced,
37:24 advanced performance tips.
37:25 Yeah, because if you've got a thousand strings and then you, you know, that were like it's
37:31 going to be male, female, male, female, male, male, female, like that kind of where the
37:35 data is repeated a bunch.
37:37 Then it could just go, yep, we already know that answer.
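A standard-library sketch of the caching idea raised here. One caveat worth stating: `functools.lru_cache` needs hashable arguments, so a dict input cannot be passed to a cached function directly. The sketch therefore caches only the string branch. The function names and the classification rule are invented for illustration, and nothing here has been benchmarked.

```python
from functools import lru_cache
from typing import Any, Optional


@lru_cache(maxsize=1024)
def tag_for_label(label: str) -> str:
    # Stand-in for a more expensive classification of a repeated string.
    return "short" if len(label.strip()) <= 4 else "long"


def discriminator(value: Any) -> Optional[str]:
    if isinstance(value, str):
        return tag_for_label(value)  # strings are hashable, so safe to cache
    if isinstance(value, dict):
        return "model"  # dicts are unhashable, so this branch skips the cache
    return None


labels = ["male", "female", "male", "female", "male", "male", "female"]
tags = [discriminator(label) for label in labels]
info = tag_for_label.cache_info()
print(tags, info)
```

With repeated values like these, only the first occurrence of each string runs the classification; the rest are cache hits.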
37:40 Yeah.
37:41 That'd be potentially, I don't know.
37:43 Yeah, no, definitely.
37:44 And I will say, I don't know if it takes effect.
37:47 I don't think it takes effect with discriminated unions because this logic is kind of in Python,
37:53 but I will say we recently added a like string caching setting because we have kind of our
37:57 own JSON parsing logic that we use in Pydantic Core.
38:01 And so we added a string caching setting so that you don't have to rebuild the exact same
38:05 strings every time.
38:07 So that's a nice performance piece.
38:09 Yeah, nice.
38:10 Caching's awesome.
38:11 Until it's not.
38:12 Yeah, exactly.
38:14 So one quick note here is just that discriminated unions are still JSON schema compatible, which
38:19 is awesome for the case where you're once again, defining like API requests and responses.
38:24 You want to still have valid JSON schema coming out of your models.
38:27 Yeah, very cool.
38:28 And that might show up in things like OpenAPI documentation and stuff like that, right?
38:34 Yep, exactly.
38:35 So I'll kind of skip over this.
38:37 We already touched on the callable discriminators.
38:39 And then I'll leave these slides up here as a reference.
38:43 Again, I don't think this is worth touching in too much detail, but just kind of another
38:48 comment about if you've got nested models, that still works well with discriminated unions.
38:53 So we're still on the pet example, but let's say this time you have a white cat and a black
38:58 cat model.
39:00 And then you also have your existing dog model.
39:03 You can still create a union of, you know, your cat union is a union of black cat and
39:09 white cat.
39:10 And then you can union that with the dogs and it still works.
39:13 And once again, you can kind of imagine the exponential blow up that would occur if you
39:18 didn't use some sort of discriminator here in terms of errors.
39:21 Yeah, very interesting.
39:22 OK, cool.
39:23 Yeah.
39:24 So that's kind of all in terms of my recommendations for discriminated union application.
39:30 I would encourage folks who are interested in this to check out our documentation.
39:34 It's pretty thorough in that regard.
39:35 And I think we also have those links attached to the podcast.
39:38 Yeah, definitely.
39:39 And then performance improvements in the pipeline.
39:42 Is this something that we can control from the outside?
39:44 Is this something that you all are just adding for us?
39:47 Yeah, good question.
39:48 This is hopefully maybe not all in the next version, but just kind of things we're keeping
39:53 our eyes on in terms of requested performance improvements and ideas that we have.
39:57 I'll go a little bit out of order here.
40:00 We've been talking a bunch about core schema and kind of, you know, maybe deferring the
40:04 build of that or, you know, just trying to optimize that.
40:07 And that actually happens in Python.
40:09 So one of the biggest things that we're trying to do is effectively speed up the core schema
40:14 building process so that import times are faster and just, you know, Pydantic is more
40:19 performant in general.
40:21 So one thing that I'd like to ask about, kind of back on the Python side a little bit, suppose
40:28 I've got some really large document, right?
40:32 Really nested document.
40:33 Maybe I've converted some terrible XML thing into JSON or I don't know, something.
40:38 And there's a little bit of structured schema that I care about.
40:42 And then there's a whole bunch of other stuff that I could potentially create nested models
40:47 to go to, but I don't really care about validating them.
40:49 It's just whatever it is, it is.
40:52 What if you just said that was a dictionary?
40:54 Would that short circuit a whole bunch of validation and stuff that would make it faster
40:58 potentially?
40:59 Yeah.
41:00 Turn off the validation for a subset of the model if it's really big and deep and you
41:05 don't really care for that part?
41:06 Yeah.
41:07 Good question.
41:08 So there's an annotation called SkipValidation that you can apply to certain types.
41:13 So that's kind of one approach.
41:15 I think in the future, it could be nice to offer kind of a config setting so that you
41:18 can more easily list features that you want to skip validation for instead of applying
41:23 those on a field by field basis.
41:26 And then the other thing is if you only define your model in terms of the fields that you
41:30 really care about from that very gigantic amount of data, we will just ignore the extra
41:36 data that you pass in and pull out the relevant information.
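Both approaches mentioned here can be sketched in one small model, assuming Pydantic v2 defaults (extra keys are ignored unless configured otherwise). The `Envelope` model and the input keys are made up for illustration.

```python
from typing import Any

from pydantic import BaseModel, SkipValidation


class Envelope(BaseModel):
    id: int
    # Keep the big nested blob, but do not validate its contents.
    payload: SkipValidation[dict[str, Any]]


raw = {
    "id": "7",
    "payload": {"deep": [1, {"anything": "goes"}]},
    "unmodeled_noise": [1, 2, 3],  # not declared on the model, so it is dropped
}
envelope = Envelope.model_validate(raw)
print(envelope)
```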
41:39 Right.
41:40 Okay.
41:41 Yeah.
41:42 Good.
41:43 Back to the pipeline.
41:44 Yeah.
41:45 Back to the pipeline.
41:46 So another improvement, we talked a little bit about potential parallelization of things
41:51 or vectorization.
41:52 One thing that I'm excited to learn more about in the future and that we've started working
41:56 on is this thing called SIMD in jiter.
41:59 And that's our JSON iterable parser library that I was talking about.
42:02 And so SIMD stands for single instruction, multiple data.
42:07 Basically means that you can do operations faster.
42:10 And that's with this kind of vectorization approach.
42:12 I certainly don't claim to be an expert in SIMD, but I know that it's improving our validation
42:19 speeds in the department of JSON parsing.
42:22 So that's something that we're hoping to support for a broader set of architectures going forward.
42:28 Yeah, that's really cool.
42:30 Just like what Pandas does for Python, instead of looping over and doing something
42:34 to each piece, you just go, this whole column, multiply it by two.
42:38 Yep.
42:40 Exactly.
42:41 I'm sure it's not implemented the same, but like conceptually the same.
42:42 Yep.
42:44 Very much so.
42:45 And then the other two things in the pipeline that I'm going to mention are kind of related
42:49 once again to avoiding materializing things in Python if we can.
42:53 And we're even kind of extending that to avoiding materializing things in Rust if we don't have
42:58 to.
42:59 So the first thing is when we're parsing JSON in Rust, can we just do the validation as
43:03 we kind of chomp through the JSON instead of like materializing the JSON as a Rust object
43:09 and then doing all the validation?
43:10 It's like, can we just do it in one pass?
43:12 Okay.
43:13 Is that almost like generators and iterables rather than loading all into memory at once
43:19 and then processing it one at a time?
43:21 Yeah, exactly.
43:22 And it's kind of like, do you build the tree and then walk it three times or do you just
43:28 do your operations every time you add something to the tree?
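The generator analogy can be shown in plain Python. This is only an analogy for the one-pass idea; it says nothing about how the Rust parser is actually written, and all function names are invented.

```python
from typing import Iterable, Iterator


def parse_tokens(text: str) -> Iterator[str]:
    # Toy "parser": yields one token at a time instead of building a list.
    for token in text.split(","):
        yield token.strip()


def materialize_then_validate(text: str) -> list[int]:
    tokens = list(parse_tokens(text))  # first build everything in memory
    return [int(token) for token in tokens]  # then walk it again to validate


def validate_in_one_pass(tokens: Iterable[str]) -> Iterator[int]:
    # Validate each token as it arrives; nothing intermediate is stored.
    for token in tokens:
        yield int(token)


data = "1, 2, 3, 4"
two_step = materialize_then_validate(data)
one_pass = list(validate_in_one_pass(parse_tokens(data)))
print(two_step, one_pass)
```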
43:32 And then the last performance improvement in the pipeline that I'll mention is this
43:35 thing called fast model.
43:37 Has not been released yet.
43:39 Hasn't really even been significantly developed, but this is cool in that it's really approaching
43:44 that kind of laziness concept again.
43:46 So attributes would remain in Rust after validation until they're requested.
43:51 So this is kind of along the lines of the defer build logic that we were talking about
43:55 in terms of like, we're not going to send you the data or perform the necessary operations
43:59 until they're requested.
44:00 Right.
44:01 Okay.
44:02 Yeah.
44:03 If you don't ever access the field, then why process all that stuff, right?
44:05 And convert it into Python objects.
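Since the feature described here is unreleased, the following is only a plain-Python analogy for the laziness idea: field values stay in their raw form until something asks for them. `LazyRecord` is a hypothetical class and does not reflect how Pydantic would implement this.

```python
import json
from typing import Any


class LazyRecord:
    def __init__(self, raw_fields: dict[str, str]) -> None:
        self._raw = raw_fields  # undecoded JSON text per field
        self._decoded: dict[str, Any] = {}

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for field names.
        raw = self.__dict__.get("_raw", {})
        if name not in raw:
            raise AttributeError(name)
        decoded = self.__dict__["_decoded"]
        if name not in decoded:
            decoded[name] = json.loads(raw[name])  # decode on first access
        return decoded[name]


record = LazyRecord({"id": "42", "tags": '["a", "b"]'})
first_id = record.id  # decodes "id" only; "tags" is still raw text
print(first_id, list(record._decoded))
```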
44:07 Yeah, exactly.
44:08 But yeah, we're kind of just excited in general to be looking at lots of performance improvements
44:14 on our end, even after the big V2 speed up, still have lots of other things to work on
44:18 and improve.
44:19 Yeah, it sure seems like it.
44:21 And if this free threaded Python thing takes off, who knows, maybe there's even more craziness
44:27 with parallel processing of different branches of the model at different, you know, alongside
44:34 each other.
44:35 Yeah.
44:36 So I think this kind of dovetails nicely into, like you asked earlier, like, is there a way
44:41 that we kind of monitor the performance improvements that we're making?
44:45 And we're currently using and getting started with two tools that are really helpful.
44:51 And I can share some PRs if that's helpful, send links after.
44:55 But one of them is CodSpeed, which integrates super nicely with CI and GitHub.
45:02 And it basically runs tests tagged with this like benchmark tag.
45:08 And then it'll, you know, run them on main compared to on your branch.
45:11 And then you can see like, oh, this made my code, you know, 30% slower, like maybe let's
45:16 not merge that right away.
45:18 Or conversely, if you know there's a 30% improvement on some of your benchmarks, it's really nice
45:23 to kind of track and see that.
45:25 I see.
45:26 So it looks like it sets up.
45:27 So this is codspeed.io, right?
45:31 And then it sets up as a GitHub action as part of your CI/CD.
45:36 And you know, probably automatically runs when a PR is open and things along those lines,
45:40 right?
45:41 Yep, exactly.
45:42 All right.
45:43 I've never heard of this.
45:44 But yeah, if it just does the performance testing for yourself automatically, why not?
45:48 Right?
45:49 Let it do that.
45:50 Yeah.
45:51 And then I guess another tool that I'll mention while talking about kind of our, you know,
45:57 continuous optimization, is one word for it, is this tool, kind of similarly named, called
46:03 CodeFlash.
46:05 So CodeFlash is a new tool that uses LLMs to kind of read your code and then develop
46:12 potentially more performant versions.
46:15 It kind of analyzes those in terms of, you know, is this new code passing existing
46:20 tests?
46:21 Is it passing additional tests that we write?
46:24 And then another great thing that it does is open PRs for you with those improvements
46:28 and then explain the improvements.
46:31 So I think it's a really pioneering tool in the space.
46:34 And we're excited to kind of experiment with it more on our PRs and in our repository.
46:40 Okay.
46:41 I love it.
46:42 Just tell me why is this?
46:44 Why did this slow down?
46:45 Well, here's why.
46:46 Yeah, exactly.
46:47 Yeah.
46:48 And they offer both like local runs of the tool and also built in CI support.
46:54 So those are just kind of two tools that we use and are increasingly using to help
46:59 us kind of check our performance as we continue to develop and really inspire us to, you know,
47:05 get those green check marks with the like performance improved on lots of PRs.
47:10 The more you can have it where if it passes the automated build, it's just ready to go
47:15 and you don't have to worry a little bit and keep testing things and then have uncertainty.
47:20 You know that.
47:21 It's nice, right?
47:22 Because you're allowed to rest and sleep at night.
47:25 Yeah, most certainly.
47:27 I mean, I said it before, but the number of people who are impacted by Pydantic, I don't
47:33 know what that number is, but it has to be tremendous because if there's 400,000 projects
47:37 that use it, like think of the users of those projects, right?
47:40 Like that, that multiple has got to be big for, you know, I'm sure there's some really
47:43 popular ones, for example, FastAPI.
47:45 Yeah.
47:48 And it's just nice to know that there are other companies and tools out there that can
47:52 help us to really boost the performance benefits for all those users, which is great.
47:57 All right.
47:58 Yeah, that is really cool.
47:59 I think, you know, let's talk about one more performance benefit for people and not so
48:04 much in how fast your code runs, but in how fast you go from raw data to Pydantic models.
48:11 So one thing you probably have seen, we may have even spoken about this before, are you
48:16 familiar with JSON to Pydantic?
48:18 The website?
48:19 Yeah, it's a really cool tool.
48:20 Yeah, it's such a cool tool.
48:21 And if you've got some really complicated data, like let's see, I'll pull up some weather
48:26 data that's in JSON format or something, right?
48:29 Like if you just take this and you throw it in here, just don't even have to pretty print
48:33 it, it'll just go, okay, well, it looks like what we've got is, you know, this really complicated
48:38 nested model here.
48:40 And it took, you know, we did this while I was talking, it took 10 seconds for me clicking
48:44 the API to get a response to having like a pretty decent representation here.
48:49 Yeah, it's great in terms of like developer agility, especially, right?
48:53 It's like, oh, I've, you know, heard of this tool called Pydantic.
48:55 I've seen it in places like, I don't really know if I want to manually go build all these
48:59 models for my super complicated JSON data.
49:02 It's like, boom, three seconds done for you, basically.
49:05 Exactly.
49:06 Like, is it really worth it?
49:07 Because I don't want to have to figure this thing out and figure out all the types and
49:11 like, no, just paste it in there and see what you get.
49:13 You're going to be, it won't be perfect, right?
49:15 Some things, if they're null in your data, but they could be something that would make
49:19 them an optional element, like they could be an integer or they could be null.
49:22 It won't know that it's going to be an integer, right?
49:25 Right.
49:26 So you kind of got to patch it up a tiny bit, but in general, I think this is really good.
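The patch-up step could look like the sketch below. `GeneratedReading` is an assumption about roughly what a generator might emit when a value is null in the sample, not the tool's exact output, and the field names are invented.

```python
from typing import Any, Optional

from pydantic import BaseModel


class GeneratedReading(BaseModel):
    # Assumed generator output: the sample value was null, so the type is unknown.
    station: str
    wind_gust: Any


class PatchedReading(BaseModel):
    # Hand-edited version: we know the field is an integer when present.
    station: str
    wind_gust: Optional[int] = None


calm = PatchedReading.model_validate({"station": "KPDX", "wind_gust": None})
gusty = PatchedReading.model_validate({"station": "KPDX", "wind_gust": "25"})
print(calm.wind_gust, gusty.wind_gust)
```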
49:30 And then also, you know, just drop in with your, your favorite LLM, you know, I've been
49:36 using LM Studio, which is awesome.
49:37 Nice.
49:38 I heard you talk about that on one of the most recent podcasts, right?
49:42 Yeah.
49:44 It's super cool.
49:45 You can download, just download Llama 3 and run it locally with like a, I think my, my
49:49 computer can only handle 7 billion parameter models, but you know that you get pretty good
49:53 answers and if you give it a JSON, a piece of JSON data and you say, convert that to
49:57 Pydantic, you'll get really good results.
50:00 You have a little more control than what you just get with this tool.
50:03 But I think those two things, while not about runtime performance, you know, going from
50:09 I have data till I'm working with Pydantic, that's pretty awesome.
50:12 Yeah, definitely.
50:13 And if any, you know, passionate open source contributors are listening and want to create
50:18 like a CLI tool for doing this locally, I'm sure that would be.
50:23 I think this is based on something that I don't use, but I think it's based on this
50:29 datamodel-code-generator, which I think might be a CLI tool or a library.
50:34 Let's see.
50:35 Yes.
50:36 Oh yeah.
50:37 Very nice.
50:38 So here, here's the problem that you, you know, you go and define like a YAML file.
50:40 You know, like it's, it's just not as easy as like there's a text field I pasted my stuff,
50:45 but it does technically, technically work, I suppose.
50:49 Yeah.
50:50 I know.
50:51 Definitely the LLM approach or just the basic website approach is very quick, which is nice.
50:55 Yeah.
50:56 Let's talk about LLMs just really quick.
50:57 Like I feel, you know, you get some of the Python newsletters and other places that like,
51:02 here's the cool new packages.
51:03 A lot of them are like nine out of 10 of them are about LLMs these days.
51:07 I was like, that feels a little over the top to me, but I know there's other things going
51:11 on in the world, but you know, just what are your thoughts on LLMs and coding these days?
51:15 I know you, you write a lot of code and think about it a lot and probably use LLMs somewhere
51:19 in there.
51:20 Yeah, no, for sure.
51:21 I, I'm pretty optimistic and excited about it.
51:24 I think there's a lot of good that can be done and a lot of productivity boosting to
51:29 be had from integrating with these tools, both in your like local development environment
51:34 and also just in general.
51:36 I think sometimes, you know, it's also great in the performance department, right?
51:40 Like we can see with CodeFlash, using LLMs to help you write more performant code can
51:46 also be really useful.
51:47 And it's been exciting to see some libraries really leverage Pydantic as well in that space
51:52 in terms of like validating LLM outputs or even using LLM calls in Pydantic validators
51:58 to validate, you know, data along constraints that are more like language model friendly.
52:05 So yeah, I'm optimistic about it.
52:06 I still have a lot to learn, but it's cool to see the variety of applications and kind
52:10 of where you can plug in Pydantic in that process for fun.
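One simple form of validating LLM output is to parse the model's reply straight into a Pydantic model and treat a validation failure as a signal to retry. The `Summary` model, the helper `parse_llm_reply`, and the hard-coded reply strings are stand-ins; no real LLM call is made here.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    title: str
    score: int


def parse_llm_reply(reply_text: str) -> Optional[Summary]:
    # Returns None when the reply is not valid JSON or does not fit the schema.
    try:
        return Summary.model_validate_json(reply_text)
    except ValidationError:
        return None


good = parse_llm_reply('{"title": "Pydantic tips", "score": 9}')
bad = parse_llm_reply('{"title": "Pydantic tips", "score": "very high"}')
print(good, bad)
```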
52:13 Yeah, I totally agree.
52:15 Right now, the context window, like how much you can give it as information then to start
52:20 asking questions is still a little bit small.
52:23 Like you can't give it some huge program and say, you know, find me the bugs where this
52:27 function is called or you know, whatever.
52:28 And it like it doesn't quite understand enough all at once.
52:31 But that thing keeps growing.
52:33 So eventually, someday we'll all see.
52:36 Yep.
52:37 All right.
52:38 Well, let's talk just for a minute, maybe real quick about what you all are doing at
52:42 Pydantic the company rather than Pydantic the open source library.
52:47 Like what do you all got going on there?
52:48 Yeah, sure.
52:49 So Pydantic, as the company, has released our first commercial tool.
52:55 It's called Logfire and it's in open beta.
52:58 So it's an observability platform.
53:01 And we'd really encourage anyone interested to try it out.
53:04 It's super easy to get started with, you know, just the basic like pip install of the SDK
53:10 and then start using it in your code base.
53:12 And then we have the kind of Logfire dashboard where you're going to see the observability
53:18 and results.
53:20 And so we kind of adopt this like needle in the haystack philosophy where we want this
53:24 to be a very easy to use observability platform that offers very like Python centric insights.
53:31 And it's this kind of opinionated wrapper around OpenTelemetry, if folks are familiar
53:37 with that.
53:38 But in kind of the context of performance, one of the great things about this tool is
53:42 that it offers this like nested logging and profiling structure for code.
53:47 And so it can be really helpful in kind of looking at your code and being like, we don't
53:51 know where this performance slowdown is occurring.
53:54 But if we integrate with Logfire, we can see that like very easily in the dashboard.
53:59 Yeah, you have some interesting approaches, like specifically targeting popular frameworks
54:06 like instrument FastAPI or something like that, right?
54:09 Yeah, definitely.
54:10 Trying to kind of build integrations that work very well with FastAPI, other tools
54:15 like that, and even also offering kind of like custom features in the dashboard, right?
54:20 Like if you're looking at, you know, if you're using an observability tool, you're probably
54:24 advanced enough to want to add some extra things to your dashboard.
54:27 And we're working on supporting that with FastUI, which I know you've chatted with
54:30 Samuel about as well.
54:32 Yeah, absolutely.
54:33 I got a chance to talk to Samuel about Logfire and some of the behind the scenes infrastructure
54:38 was really interesting.
54:39 But also speaking of FastUI, you know, I did speak to him.
54:43 When was that?
54:44 Back in February.
54:45 So this is a really popular project.
54:48 And even on the, I was like, quite a few people decided that they were interested in even
54:54 watching the video on that one, which, yeah.
54:58 Anything with FastUI?
54:59 Sorry, did you say anything with FastUI?
55:01 Yeah, yeah.
55:03 Are you doing anything on the FastUI side?
55:04 Or are you on the Pydantic side of things?
55:07 Yeah, good question.
55:08 I've been working mostly on Pydantic, just, you know, larger user base, more feature requests,
55:14 but excited to, I've done a little bit on the FastUI side and excited to kind of brush
55:18 up on my TypeScript and build that out as a more robust and supported tool.
55:22 I think, especially as we grow as a company and have more open source support in general,
55:27 that'll be a priority for us, which is exciting.
55:30 Yeah.
55:31 It's an interesting project.
55:33 Definitely a cool way to do JavaScript front ends and React and then plug those back into
55:39 Python APIs, like FastAPI and those types of things.
55:42 Right.
55:43 So, yeah.
55:44 Yeah.
55:45 And kind of a similarity with FastUI and Logfire, the new tool, is that there's pretty
55:48 seamless integration with Pydantic, which is definitely going to be one of the kind
55:52 of core tenets of any products or open source things that we're producing in the future.
55:56 Yeah.
55:57 I can imagine that's something you want to pay special attention to is like, how well
56:00 do these things fit together as a whole, rather than just, here's something interesting, here's
56:04 something interesting.
56:05 Yeah.
56:06 Awesome.
56:07 All right.
56:08 Well, I think that pretty much wraps it up for the time that we have to talk today.
56:13 Let's close it out.
56:14 Close it out for us with maybe a final call to action for people who are already using
56:19 Pydantic and they want it to go faster, or maybe they could adopt some of these tips.
56:24 What do you tell them?
56:25 Yeah.
56:26 So, you know, inform yourself just a little bit about kind of the Pydantic architecture,
56:31 just in terms of like, what is core schema and why are we using Rust for validation and
56:35 serialization?
56:36 And then that can kind of take you to the next steps of, when do I want to build my
56:41 core schemas based on kind of the nature of my application?
56:44 Is it okay if imports take a little bit longer or do I want to delay that?
56:48 And then take a look at discriminated unions.
56:51 And then maybe if you're really interested in improving performance across your application
56:54 that supports Pydantic and other things, trying out Logfire and just seeing what sort of benefits
57:00 you can get there.
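As a rough illustration of two of the tips recapped here, the sketch below defers core schema construction on one model and uses a discriminated union so validation can pick the right variant from a tag field rather than trying each one in turn. It assumes Pydantic v2; the `Cat`, `Dog`, and `Owner` models and the `pet_type` field are hypothetical names invented for this example, not something discussed in the episode.

```python
from typing import Literal, Union

from pydantic import BaseModel, ConfigDict, Field


class Cat(BaseModel):
    # The Literal tag is what the discriminator inspects.
    pet_type: Literal["cat"]
    lives: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool


class Owner(BaseModel):
    # defer_build postpones building the core schema until the model is
    # first used, trading a faster import for a slower first validation.
    model_config = ConfigDict(defer_build=True)

    name: str
    # The discriminator tells Pydantic which union member to validate
    # against, based on the value of "pet_type" in the input.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate(
    {"name": "Sam", "pet": {"pet_type": "dog", "good_boy": True}}
)
print(type(owner.pet).__name__)
```

Whether deferring the build is worthwhile depends on the application: it tends to suit cases where import time matters more than the latency of the first validation.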
57:01 Yeah.
57:02 See where you're spending your time is one of the very, you know, not just focused on
57:05 Pydantic, but in general, our intuition is often pretty bad for where is your code slow
57:11 and where is it not slow?
57:12 You're like, that looks really complicated.
57:14 That must be slow.
57:15 Like, nope.
57:16 It's that one call to like some sub module that you didn't realize was terrible.
57:19 Yeah.
57:21 And I guess that kind of circles back to the like LLM tools and, you know, integrated performance
57:27 analysis with CodSpeed and CodeFlash and even just other LLM tools, which is like, use the
57:31 tools you have at hand.
57:32 And yeah, sometimes they're better at performance improvements than you might be, or it can
57:37 at least give you good tips that give you, you know, a launching point, which is great.
57:40 Yeah, for sure.
57:41 Or even good old cProfile built right in, right?
57:44 If you really, if you want to do it that way.
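For readers who want to try the built-in route, here is a minimal sketch of profiling a call with the standard library's `cProfile` and `pstats` modules. The `handle_request` and `slow_helper` functions are hypothetical stand-ins for application code; the point is only that the report shows where time actually goes rather than where intuition says it goes.

```python
import cProfile
import io
import pstats


def slow_helper(n):
    # Hypothetical hot spot: the call that actually dominates runtime.
    return sum(i * i for i in range(n))


def handle_request():
    # Hypothetical entry point that mixes cheap and expensive work.
    return slow_helper(200_000) + len("cheap work")


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The same data can be gathered without touching the code by running a script under `python -m cProfile -s cumulative`.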
57:46 Awesome.
57:47 All right.
57:49 Sydney, thank you for being back on the show and sharing all these tips and congratulations
57:53 on all the work you and the team are doing.
57:56 You know, what a, what a success Pydantic is.
57:58 Yeah.
57:59 Thank you so much for having me.
58:00 It was wonderful to get to have this discussion with you and excited that I got to meet you
58:03 in person at PyCon recently.
58:04 Yeah, that was really great.
58:05 Really great.
58:06 Until, until next PyCon.
58:08 See you later.
58:10 This has been another episode of Talk Python to Me.
58:13 Thank you to our sponsors.
58:14 Be sure to check out what they're offering.
58:15 It really helps support the show.
58:18 Take some stress out of your life.
58:19 Get notified immediately about errors and performance issues in your web or mobile applications
58:24 with Sentry.
58:25 Just visit talkpython.fm/sentry and get started for free.
58:30 And be sure to use the promo code talkpython, all one word.
58:34 Code Comments, an original podcast from Red Hat.
58:37 This podcast covers stories from technologists who've been through tough tech transitions
58:42 and share how their teams survived the journey.
58:46 Those are available everywhere you listen to your podcasts and at talkpython.fm/code-comments.
58:49 Want to level up your Python?
58:53 We have one of the largest catalogs of Python video courses over at Talk Python.
58:57 Our content ranges from true beginners to deeply advanced topics like memory and async.
59:02 And best of all, there's not a subscription in sight.
59:05 Check it out for yourself at training.talkpython.fm.
59:08 Be sure to subscribe to the show.
59:10 Open your favorite podcast app and search for Python.
59:13 We should be right at the top.
59:14 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct
59:20 RSS feed at /rss on talkpython.fm.
59:24 We're live streaming most of our recordings these days.
59:26 If you want to be part of the show and have your comments featured on the air, be sure
59:30 to subscribe to our YouTube channel at talkpython.fm/youtube.
59:35 This is your host, Michael Kennedy.
59:36 Thanks so much for listening.
59:37 I really appreciate it.
59:39 Now get out there and write some Python code.
59:41 [MUSIC PLAYING]
59:43 [MUSIC ENDS]