#466: Pydantic Performance Tips Transcript

Recorded on Thursday, Jun 13, 2024.

00:00 You're using Pydantic and it seems pretty straightforward, right?

00:02 But could you adopt some simple changes to your code that would make it a lot faster and more efficient?

00:08 Chances are you'll find a couple of the tips from Sydney Runkle that will do just that.

00:12 Join us to talk about Pydantic performance tips here on Talk Python.

00:16 Episode 466 recorded June 13th, 2024.

00:21 [music]

00:34 Welcome to Talk Python to Me, a weekly podcast on Python.

00:38 This is your host, Michael Kennedy.

00:40 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:45 both on fosstodon.org.

00:47 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

00:52 We've started streaming most of our episodes live on YouTube.

00:56 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:04 This episode is brought to you by Sentry.

01:06 Don't let those errors go unnoticed.

01:07 Use Sentry like we do here at Talk Python.

01:10 Sign up at talkpython.fm/sentry.

01:13 And it's brought to you by Code Comments, an original podcast from Red Hat.

01:17 This podcast covers stories from technologists who've been through tough tech transitions

01:22 and share how their teams survived the journey.

01:26 Episodes are available everywhere you listen to your podcasts and at talkpython.fm/code-comments.

01:32 Hey, folks, I got something pretty excellent for you.

01:35 PyCharm Professional for six months for free.

01:38 Over at Talk Python, we partnered with the JetBrains team to get all of our registered users

01:44 free access to PyCharm Pro for six months.

01:47 All you have to do is take one of our courses.

01:49 That's it.

01:50 However, do note that this is not valid for renewals over at JetBrains.

01:53 Only new users there.

01:55 And if you're not currently a registered user at Talk Python, well, no problem.

02:00 This offer comes with all of our courses.

02:02 So even if you just sign up for one of our free courses at talkpython.fm,

02:07 click on Courses in the menu, you're in.

02:09 So how do you redeem it?

02:10 Once you have an account over at Talk Python, then it's super easy.

02:14 Just visit your account page on Talk Python Training.

02:17 And in the Details tab, you'll have a code and a link to redeem your six months of PyCharm Pro.

02:22 So why not take a course, even a free one, and get six months free of PyCharm?

02:28 Sydney, welcome back to Talk Python.

02:30 It's awesome to have you here.

02:31 Thank you.

02:32 I'm super excited to be here.

02:33 And yeah, I'm excited for our chat.

02:34 I am too.

02:35 We're going to talk about Pydantic, one of my very favorite libraries that just makes working with Python data,

02:41 data exchange so, so easy, which is awesome.

02:45 And it's really cool that you're on the Pydantic team these days.

02:49 More than I guess, you know, let's jump back just a little bit.

02:52 A few weeks ago, got to meet up a little bit in Pittsburgh at PyCon.

02:57 How was PyCon for you?

02:58 It was great.

02:59 So it was my first PyCon experience ever.

03:02 It was a very, very large conference.

03:03 So it was a cool kind of first introductory conference experience.

03:08 I had just graduated not even a week before.

03:10 So it was a fun way to kind of roll into full-time work and get exposed really to the Python community.

03:16 And it was great to just kind of have a mix of getting to give a talk, getting to attend lots of awesome presentations,

03:22 and then most of all, just like meeting a bunch of really awesome people in the community.

03:26 Yeah.

03:27 I always love how many people you get to meet from so many different places and perspectives.

03:33 And it just reminds you, the world is really big, but also really small.

03:38 You know, get to meet your friends and new people from all over the place.

03:41 Definitely.

03:42 I was impressed by the number of international attendees.

03:45 I didn't really expect that.

03:46 It was great.

03:46 Yeah.

03:47 Same here.

03:48 All right.

03:48 Well, maybe a quick introduction for yourself for those who didn't hear your previous episode,

03:54 and then we'll talk a bit about this Pydantic library.

03:57 Yeah, sure.

03:58 Sounds great.

03:59 So my name is Sydney.

04:00 I just graduated from the University of Wisconsin.

04:03 Last time I chatted with you, I was still pursuing my degree in computer science.

04:08 And working part-time as an intern at the company Pydantic, which is kind of founded around the same ideas that inspired the open source tool,

04:17 and now we're building commercial tools.

04:18 And now I've rolled over into full-time work with them, primarily on the open source side.

04:23 So yeah, very excited to kind of be contributing to the open source community,

04:27 but also getting to help with our commercial tools and development there.

04:32 Yeah, yeah.

04:33 Awesome.

04:33 We'll talk a bit about that later.

04:35 Super cool to be able to work on open source as a job, as a proper job, right?

04:40 Yeah, it's awesome.

04:41 It's really unique.

04:43 I've kind of encouraged lots of people to contribute to open source as kind of a jumpstart

04:47 into their software development careers, especially like young folks who are looking to get started

04:53 with things and maybe don't have an internship or that sort of thing set up yet.

04:56 I think it's a really awesome pipeline for like getting exposed to good code and collaborating

05:01 with others and that sort of thing.

05:02 But it's definitely special to get to do and get paid as well.

05:06 Indeed.

05:07 So it's a little bit unbelievable to me, but I'm sure that it is true that there are folks

05:12 out there listening to the podcast that are like, "Pydantic, maybe I've heard of that.

05:16 What is this Pydantic thing?" Yeah, great question.

05:20 What is Pydantic?

05:22 So Pydantic is the leading data validation library for Python.

05:26 And so Pydantic uses type hints, which are optional in Python, but kind of generally more

05:31 and more encouraged to enforce constraints on data and kind of validate data structures,

05:37 et cetera.

05:38 So we're kind of looking at a very simple example together right now where we're importing

05:43 things like datetime and tuple types from typing.

05:46 And then kind of the core of Pydantic is you define these classes that inherit from this

05:54 class called BaseModel that's in Pydantic.

05:57 And that inheritance is what ends up helping you use methods to validate data, build JSON

06:03 schema, things like that.

06:05 And so in our case, we have this Delivery class that has a timestamp, which is of type

06:09 datetime, and then a dimensions tuple, which has two int parts.

06:16 And so then when you pass data into this Delivery class to create an instance, Pydantic handles

06:22 validating that data to make sure that it conforms to those constraints we've specified.

06:27 And so it's really a kind of intermediate tool that you can use for deserialization

06:31 or loading data and then serialization, dumping data.
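
A minimal sketch of the kind of model being described here, assuming Pydantic v2 on Python 3.9 or newer. The field names follow the conversation (a timestamp and a two-int dimensions tuple); the input values are illustrative and not taken from the slide.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Delivery(BaseModel):
    timestamp: datetime
    dimensions: tuple[int, int]


# Loading: the ISO string becomes a datetime and the numeric strings become ints.
delivery = Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=["10", "20"])
print(delivery.dimensions)  # (10, 20)

# A tuple with three items does not match tuple[int, int], so validation fails.
rejected = False
try:
    Delivery(timestamp="2020-01-02T03:04:05Z", dimensions=[10, 20, 30])
except ValidationError:
    rejected = True

# Dumping: serialize the validated instance back to a JSON string.
as_json = delivery.model_dump_json()
```

The same class handles both directions: constructing an instance validates incoming data, and `model_dump_json` serializes it again.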

06:34 Yeah, it's a thing of beauty.

06:37 I really love the way that it works.

06:38 If you've got JSON data, nested JSON data, right?

06:41 If you go to pydantic.dev/opensource, there's an example here of what we're talking about.

06:47 It's got a tuple, but the tuple contains integers, two of them.

06:51 And so if there's a tuple of three things, it'll give you an error.

06:54 If it's a tuple of a date time and an int, it'll give you an error.

06:58 Like it reaches all the way inside and, you know, things I guess it compares against.

07:03 It's a little bit like data classes.

07:05 Have you done much with data classes and compared them?

07:07 Yeah, that's a great question.

07:09 So we actually offer support for like Pydantic data classes.

07:13 So I think data classes kind of took the first step of, you know, really supporting using

07:18 type hints for model fields and things like that.

07:21 And then Pydantic sort of takes the next jump in terms of like validation and schema support.

07:27 And so I think one very common use case is if you're defining like API request and response

07:32 models, you can imagine like the JSON schema capabilities come in handy there.

07:36 And just ensuring like the integrity of your API and the data you're dealing with.

07:41 Very helpful in the validation front.

07:43 Yeah, yeah.

07:44 Very cool.

07:45 Okay.

07:46 So I guess one more thing for people who are not super familiar that Pydantic is, I think

07:52 it's used every now and then.

07:55 Let's check it out on GitHub here.

07:56 I'm just trying to think of, you know, like some of the main places people have heard

07:59 of it.

08:00 Obviously, FastAPI, I think is the thing that really launched its popularity in the early

08:05 days if I had to guess.

08:06 But if we go over to GitHub, GitHub says that for the open source things that Pydantic is

08:13 a foundational dependency for 412,644 different projects.

08:20 Yeah.

08:21 That's unbelievable.

08:22 Yeah, it's very exciting.

08:23 We just got our May download numbers and heard that we have over 200 million downloads in

08:30 May.

08:31 So that's both version one and version two.

08:33 But definitely exciting to see how kind of critical of a tool it's become for so many

08:38 different use cases in Python, which is awesome.

08:40 Yeah, absolutely.

08:42 It's really, really critical.

08:43 And I think we should probably talk a little bit about Pydantic v1, v2 as a way to get

08:49 into the architecture conversation.

08:52 Right.

08:53 That was a big thing I talked to Samuel Colvin maybe a year ago or so, I would imagine.

08:57 Think around PyCon.

08:59 I think we did that actually at PyCon last year as well.

09:01 Yeah, for sure.

09:02 So a lot of the benefit of using Pydantic is we promise some great performance.

09:10 And a lot of those performance gains came during our jump from v1 to v2.

09:14 So v1 was written solely in Python.

09:17 We had some compiled options, but really it was mostly Pythonic data validation.

09:23 Or I say Pythonic.

09:24 It's always Pythonic, but data validation done solely in Python.

09:29 And the big difference with v2 is that we rewrote kind of the core of our code in Rust.

09:35 And so Rust is much faster.

09:36 And so depending on what kind of code you're running, v2 can be anywhere from two to 20

09:42 times faster in certain cases.

09:45 So right now we still have this Python wrapper around everything in v2.

09:50 But then, and that's kind of used to define schemas for models and that sort of thing.

09:55 And then the actual validation and serialization logic occurs in Pydantic core in Rust.

10:02 Right.

10:03 So I think the team did a really good job to make this major change, this major rewrite,

10:09 and split the whole monolithic thing into a Pydantic core, Pydantic itself, which is

10:14 Python based in a way that didn't break too many projects, right?

10:19 Yeah, that was the goal.

10:20 You know, every now and then there are breaking changes that I think are generally a good

10:25 thing for the library moving forward, right?

10:27 Like hopefully whenever we make a breaking change, it's because it's leading to a significant

10:31 improvement.

10:32 So we definitely do our best to avoid breaking changes and certainly someday we'll launch

10:37 a v3 and hopefully that'll be an even more seamless transition for v2 users to v3 users.

10:44 Yeah, I would imagine that the switch to Rust probably, that big rewrite, it probably caused

10:50 a lot of ways, thoughts of reconsidering, how are we doing this?

10:54 Or now that it's over in Rust, maybe it doesn't make sense this way or whatever.

10:58 Yeah.

10:59 So we kind of, you know, we got a lot of feedback and usage of Pydantic v1, so tried to do our

11:03 best to incorporate all that feedback into a better v2 version in terms of both APIs

11:08 and performance and that sort of thing.

11:10 Sure, sure.

11:11 John out in the audience asks, how did the team approach thread safety with this?

11:16 So Rust can be multiple threads easy.

11:19 Python, not so much really, although maybe soon with free threaded Python.

11:24 Yeah, that's a good question.

11:26 So our kind of Rust guru on the team is David Hewitt, and he's very in the know about all

11:31 of the multi-threading and things happening on the Rust side of things.

11:35 I myself have some more to learn about that, certainly.

11:38 But I think in general, kind of our approach is that Rust is quite type safe, both performant

11:43 and type safe, which is great and memory safe as well.

11:48 And I think most of our, I'll talk a little bit later about some like parallelization

11:54 and vectorization that we're looking at for performance improvements.

11:57 But in terms of safety, I think if you have any questions, feel free to open an issue

12:01 on the Pydantic core repo and get a conversation going with David Hewitt.

12:05 I would imagine it's not, you guys haven't had to do too much with it, just that Python

12:09 currently, but soon, but currently doesn't really let you do much true multi-threading

12:16 because of the GIL.

12:17 But the whole, I think, you know, yeah, I think Python 3.13 is going to be crazy with

12:23 free threaded Python and it's going to be interesting to see how that evolves.

12:27 Yep.

12:28 Yeah.

12:29 I know we definitely do some jumping through hoops and just, you know, having to be really

12:32 conscious of stuff with the GIL in Pydantic core and PyO3.

12:36 And PyO3 is kind of the library that bridges Python and Rust.

12:41 And so it's heavily used in Pydantic core, as you can imagine.

12:44 So I'm excited to see what changes might look like there.

12:46 Yeah, same.

12:47 All right.

12:48 Well, let's jump into the performance because you're here to tell us all about Pydantic

12:51 performance tips.

12:52 And you've got a whole bunch of these.

12:54 Did you give this talk at PyCon?

12:55 I did partially.

12:56 It's a little bit different, but some of the tips are the same.

12:59 I don't think the videos are out yet, are they?

13:01 As of the time of recording on June 13th.

13:03 Yeah.

13:04 No, I actually checked a couple of minutes ago.

13:06 I was like, I said one thing during my talk that I wanted to double check, but the videos

13:10 are not out yet.

13:11 So.

13:12 No, I'm really excited.

13:13 There's going to be a bunch.

13:14 There was actually a bunch of good talks, including yours and some others I want to

13:16 watch, but they're not out yet.

13:18 All right.

13:19 Let's jump into Pydantic performance.

13:21 Where should we start?

13:23 I can start on the slideshow if we want.

13:25 Yeah, let's do that.

13:26 Awesome.

13:27 So, yeah, I think kind of the categories of performance tips that we're going to talk

13:30 about here kind of have some like fast one liner type performance tips that you can implement

13:37 in your own code and then kind of the meat of the like, how do I improve performance

13:41 in my application that uses Pydantic?

13:44 We're going to talk a bit about discriminated unions, also called tagged unions, and then

13:49 kind of finally talk about on our end of the development, how are we continuously improving

13:55 performance?

13:56 You know, Pydantic internals, lies, et cetera.

13:58 Sure.

13:59 Do you have the equivalent of unit test for performance?

14:03 Yeah, we do.

14:05 We use a library called CodSpeed that I'm excited to touch on a bit more later.

14:10 Yeah.

14:11 All right.

14:12 Let's talk about that later.

14:13 Perfect.

14:14 Yeah, sure thing.

14:15 So I have this slide up right now just kind of talking about why people use Pydantic.

14:18 We've already covered some of these, but just kind of as a general recap, it's powered by

14:22 type hints.

14:23 And one of our biggest promises is speed.

14:26 We also have these other great features like JSON schema compatibility and documentation

14:31 comes in particularly handy when we talk about APIs, you know, support for custom validation

14:36 and serialization logic.

14:38 And then as we saw with the GitHub repository observations, a very robust ecosystem of libraries

14:44 and other tools that use and depend on Pydantic that leads to this kind of extensive and large

14:49 community, which is really great.

14:52 But this all kind of lies on the foundation of like Pydantic is easy to use and it's very

14:56 fast.

14:57 So let's talk some more about that.

14:58 And this, yeah, well, the speed is really interesting in the multiplier that you all

15:03 have for basically a huge swath of the Python ecosystem, right?

15:08 We just saw the 412,000 things that depend on Pydantic.

15:12 Well, a lot of those, their performance depends on Pydantic's performance as well.

15:17 Right?

15:18 Yeah, certainly.

15:19 Yeah, it's nice to have such a large ecosystem of folks to also, you know, contribute to

15:24 the library as well, right?

15:25 Like, you know, because other people are dependent on our performance, the community definitely

15:28 becomes invested in it as well, which is great.

15:33 This portion of Talk Python to Me is brought to you by OpenTelemetry support at Sentry.

15:38 In the previous two episodes, you heard how we use Sentry's error monitoring at Talk Python,

15:43 and how distributed tracing connects errors, performance and slowdowns and more across

15:48 services and tiers.

15:50 But you may be thinking our company uses OpenTelemetry.

15:54 So it doesn't make sense for us to switch to Sentry.

15:56 After all, OpenTelemetry is a standard, and you've already adopted it, right?

16:01 Well, did you know, with just a couple of lines of code, you can connect OpenTelemetry's

16:06 monitoring and reporting to Sentry's backend.

16:10 OpenTelemetry does not come with a backend to store your data, analytics on top of that

16:13 data, a UI or error monitoring.

16:17 And that's exactly what you get when you integrate Sentry with your OpenTelemetry setup.

16:22 Don't fly blind, fix and monitor code faster with Sentry.

16:26 Integrate your OpenTelemetry systems with Sentry and see what you've been missing.

16:30 Create your Sentry account at talkpython.fm/sentry-telemetry.

16:35 And when you sign up, use the code TALKPYTHON, all caps, no spaces.

16:39 It's good for two free months of Sentry's business plan, which will give you 20 times

16:43 as many monthly events as well as other features.

16:47 My thanks to Sentry for supporting Talk Python to Me.

16:50 But yeah, so kind of as that first category, we can chat about some basic performance tips.

16:55 And I'll do my best here to kind of describe this generally for listeners who maybe aren't

16:59 able to see the screen.

17:01 So when you are...

17:03 Can we share your slideshow later with the audience?

17:06 Can we put it in the show notes?

17:07 Yeah, yeah, absolutely.

17:08 Okay, so people want to go back and check it out.

17:10 But yeah, we'll just, we'll describe it for everyone.

17:12 Go ahead.

17:13 Yeah.

17:14 So when you're validating data in Pydantic, you can either validate Python objects or

17:19 like dictionary type data, or you can validate JSON formatted data.

17:24 And so one of these kind of like one liner tips that we have is to use our built in

17:31 model_validate_json method, instead of calling our model_validate method, and then separately

17:37 loading the JSON data with the standard lib json package.

17:41 And the reason that we recommend that is one of the like crux of the general performance

17:46 patterns that we try to follow is not materializing things in Python when we don't have to.

17:51 So we've already mentioned that our core is written in Rust, which is much faster than

17:54 Python.

17:55 And so with our model_validate_json built in method, whenever you pass in that string,

18:00 we send it right to Rust.

18:01 Whereas if you do the JSON loading by yourself, you're going to like materialize a Python object

18:06 and then have to send it over.

18:07 Right.

18:08 And so you're going to be using the built in json.loads, or load or

18:14 whatever, and then it'll pull that in, turn it into a Python dictionary, then you take it

18:18 and try to convert that back to a Rust data structure, and then validate it in Rust.

18:23 That's where all the validation lives anyway.

18:25 So just get out of the way, right?

18:28 Exactly.

18:29 Yep.

18:30 It's like skip the Python step if you can, right.

18:31 And I will note there is one exception here, which is I mentioned we support custom validation.

18:36 If you're using what we call like before and wrap validators that do something in Python,

18:41 and then call our internal validation logic, and then maybe even do something after, it's

18:47 okay, you can use model_validate and the built in json.loads because you're already

18:52 kind of guaranteed to be materializing Python objects in that case.

18:55 But for the vast majority of cases, it's great to just go with the built in model validate

18:59 JSON.
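
A small sketch of the two paths being compared, assuming Pydantic v2. The `User` model and the payload are hypothetical and only there to make the example runnable.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


raw = '{"id": 1, "name": "Ada"}'

# Two-step path: json.loads materializes a Python dict first,
# and that dict is then handed over for validation.
via_python = User.model_validate(json.loads(raw))

# One-step path: the JSON string goes straight to the validation core,
# so no intermediate Python dict is built.
via_json = User.model_validate_json(raw)
```

Both calls produce an equal model; the difference is only in how much work happens in Python before validation starts.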

19:00 Yeah, that's really good advice.

19:02 And they seem kind of equivalent.

19:03 But once you know the internals, right, then it's well, maybe it's not exactly.

19:07 Yeah.

19:08 And I think implementing some of these tips is helpful in that if you understand some

19:11 of the kind of like pydantic architectural context, it can also just help you think more

19:16 about like, how can I write my pydantic code better?

19:19 Absolutely.

19:20 So, the next tip I have here, very easy one liner fix, which is when you're using type

19:26 adapter, which is this structure you can use to basically validate one type.

19:32 So we have base models, which we've chatted about before, which is like if you have a

19:36 model with lots of fields, that's kind of the structure you use to define it.

19:39 Well, type adapter is great if you're like, I just want to validate that this data is

19:43 a list of integers, for example, as we're seeing on the screen.

19:46 Right.

19:47 Because let me give people an idea.

19:48 Like if you accept, if you've got a JSON, well, just JSON data from wherever, but you

19:53 know, a lot of times it's coming over an API or it's provided to you as a file and it's

19:57 not your data you control, right?

19:59 You're trying to validate it.

20:00 You could get a dictionary, a JSON object that's got curly braces with a bunch of stuff,

20:06 in which case that's easy to map to a class.

20:08 But if you just have JSON, which is bracket, thing, thing, thing, thing, close bracket.

20:12 Well, how do you have a class that represents the list?

20:15 Like it gets really tricky, right?

20:17 To be able to understand, you can't model that with classes.

20:21 And so you all have this type adapter thing, right?

20:24 That's the role this plays generally.

20:25 Is that right?

20:26 Yeah.

20:27 And I think it's also really helpful in a testing context.

20:30 Like you know, when we want to check that our validation behavior is right for one type,

20:35 there's no reason to go like build an entire model if you're really just validating against

20:40 one type or structure, type adapter is great.

20:44 And so kind of the advice here is you only want to initialize your type adapter object

20:50 once.

20:51 And the reason behind that is we build a core schema in Python and then attach that to a

20:56 class or type adapter, et cetera.

20:59 And so if you can, you know, not build that type adapter within your loop, but instead

21:03 do it right before or not build it, you know, in your function, but instead

21:08 outside of it, then you can avoid building the core schema over and over again.
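
A sketch of this tip, assuming Pydantic v2. The function names and sample rows are hypothetical; the point is only where the `TypeAdapter` gets constructed.

```python
from pydantic import TypeAdapter

# Built once at module level, so the core schema is built once.
INT_LIST_ADAPTER = TypeAdapter(list[int])


def parse_rows_rebuilding(rows):
    # Discouraged: a new adapter (and core schema) is built on every iteration.
    return [TypeAdapter(list[int]).validate_python(row) for row in rows]


def parse_rows(rows):
    # Preferred: every iteration reuses the same adapter.
    return [INT_LIST_ADAPTER.validate_python(row) for row in rows]


rows = [["1", "2"], [3, 4]]
parsed = parse_rows(rows)
print(parsed)  # [[1, 2], [3, 4]]
```

Both functions return the same result; the second one simply avoids repeating the schema construction.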

21:11 Yeah.

21:12 So basically what you're saying is that the type adapter that you create might as well

21:16 be a singleton because it's stateless, right?

21:19 Like it doesn't store any data.

21:21 It's kind of slightly expensive to create relatively.

21:24 And so if you had a function that was called over and over again, and that function had

21:28 a loop and inside the loop, you're creating the type adapter, that'd be like worst case

21:31 scenario almost, right?

21:33 Yeah, exactly.

21:34 And I think this kind of goes along with like general best programming tips, right?

21:37 Which is like, if you only need to create something once, do that once.

21:42 Exactly.

21:43 You know, a parallel that maybe goes way, way back in time could be like a compiled

21:48 regular expression.

21:49 You know, you wouldn't do that over and over in a loop.

21:52 You would just create a regular, the compiled regular expression and then use it throughout

21:55 your program, right?

21:57 Because it's kind of expensive to do that, but it's fast once it's created.

22:00 Yeah, exactly.

22:01 And funny that you mentioned that.

22:02 I actually fixed a bug last week where we were compiling regular expressions twice when

22:08 folks like specified that as a constraint on a field.

22:11 So definitely just something to keep in mind and easy to fix or implement with type adapters

22:16 here.

22:17 Yeah, awesome.

22:18 Okay.

22:19 I like this one.

22:20 That's a good one.

22:21 Yeah.

22:22 So this next tip also kind of goes along with like general best practices, but the more

22:25 specific you can be with your type hints, the better.

22:28 And so specifically, if you know that you have a list of integers, it's better and more

22:34 efficient to specify a type hint as a list of integers instead of a sequence of integers,

22:39 for example.

22:40 Or if you know you have a dictionary that maps strings to integers, specify that type

22:46 hint as a dictionary, not a mapping.

22:48 Interesting.

22:49 Yeah.

22:50 So you could import a sequence from the typing module, just the generic way, but I guess

22:54 you probably have specific code that runs that can validate lists more efficiently than

22:59 a general iterable type of thing, right?

23:01 Yeah, exactly.

23:02 So in the case of like a sequence versus a list, it's the like square and rectangle thing,

23:07 right?

23:08 Like a list is a sequence, but there are lots of other types of sequences.

23:11 And so you can imagine for a sequence, we like have to check lots of other things.

23:15 Whereas if you know with certainty, this is going to be a list or it should be a list,

23:20 then you can have things be more efficient with specificity there.
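
To make the contrast concrete, here is a sketch assuming Pydantic v2 on Python 3.9 or newer. The model names are hypothetical, and no timing is measured here; the example only shows the two ways of annotating the same data.

```python
from collections.abc import Mapping, Sequence

from pydantic import BaseModel


class Loose(BaseModel):
    # General annotations: many kinds of sequence and mapping are acceptable.
    values: Sequence[int]
    counts: Mapping[str, int]


class Specific(BaseModel):
    # Specific annotations: the validator knows exactly which container to expect.
    values: list[int]
    counts: dict[str, int]


data = {"values": [1, 2, 3], "counts": {"a": 1}}

loose = Loose.model_validate(data)
specific = Specific.model_validate(data)
```

If the data really is always a list or a dict, the `Specific` style is the one recommended in the conversation.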

23:23 Does it make any difference at all whether you use the more modern type specifications?

23:29 Like traditionally people would say from typing import capital L list, but now you can just

23:35 say lowercase L list with the built-in and no import statement.

23:39 Are those equivalent or is there some minor difference there?

23:42 Do you know?

23:43 Yeah, that's a good question.

23:44 I wouldn't be surprised if there was a minor difference that was more a consequence of

23:48 like Python version, right?

23:50 Because there's like, I mean, I suppose you could import the old capital L list in a newer

23:55 Python version, but I think the difference is like more related to specificity of a type

23:59 hint rather than kind of like versioning.

24:03 If the use of that capital L list made you write an import statement, I mean, it would

24:09 cause the program to start ever so slightly slower because there's another import.

24:14 It's got to run, whereas it already knows, it's already imported, what list is.

24:18 You wouldn't believe how many times I get messages on YouTube videos I've done or even

24:23 from courses saying, Michael, I don't know what you're doing, but your code is just wrong.

24:28 I wrote lowercase L list bracket something and it said list is not a sub indexable or

24:35 something like that.

24:36 And look, you've just done it wrong.

24:37 You're going to need to fix this.

24:38 Or you're on Python 3.7 or something super old before these new features are added.

24:45 But there's just somewhere in the community we haven't communicated this well.

24:49 I don't know.

24:50 Yeah, for sure.

24:52 I was writing some code earlier today in a meeting and I used the like from typing import

24:56 union and then union X and Y type.

25:00 My coworker was like, Sydney, use the pipe.

25:02 Like what are you doing?

25:04 Use the pipe, exactly.

25:05 But here's the thing that was introduced in 3.10, I believe.

25:08 And if people are in 3.9, that code doesn't run or if they're not familiar with the changes.

25:13 So there's all these tradeoffs.

25:15 I almost feel like it would be amazing to go back for any time there's a security release

25:20 that releases, say another 3.7 or something and change the error message to say this feature

25:25 only works in the future version of Python rather than some arbitrary error of you're

25:30 doing it wrong.

25:31 You know, that would be great.

25:32 Yeah, definitely.

25:33 Yeah.

25:34 Some of those errors can be pretty cryptic with the syntax stuff.

25:36 They can.

25:37 All right.

25:38 So be specific: list, tuple, not sequence, if you know it's a list or tuple or whatever.

25:43 Yeah.

25:44 And then kind of my last minor tip, which great that you brought up import statements

25:49 and kind of adding general time to a program is I don't have a slide for this one, but

25:54 if we go back to the type adapter slide, we talked about the fact that initializing this

25:59 type adapter builds a core schema and attaches it to that class.

26:04 And that's kind of done at build time at import time.

26:08 So that's like already done.

26:10 And if you really don't want to have that import or like build time, take a long time,

26:17 you can use the defer_build flag.

26:20 And so what that does is defers the core schema build until the first validation call.

26:25 You can also set that on model config and things like that.

26:28 But basically the idea here is like striving to be lazier, right?

26:32 Like I see if we don't need to build this core schema right at import time because we

26:36 want our program to start up quickly.

26:38 That's great.

26:39 There's a little bit of a delay on the first validation, but maybe startup time is more

26:43 important.

26:44 So that's a little bit more of a preferential validation, sorry, preferential performance

26:48 tip, but available for folks who need it.
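
A sketch of the model config form of this flag, assuming Pydantic v2. The `Report` model is hypothetical. How `defer_build` applies to `TypeAdapter` has depended on the Pydantic release, so this example sticks to the model config and leaves the adapter case to the docs for the version in use.

```python
from pydantic import BaseModel, ConfigDict


class Report(BaseModel):
    # Assumption for illustration: startup time matters more than
    # the latency of the very first validation.
    model_config = ConfigDict(defer_build=True)

    title: str
    rows: list[int]


# Defining the class above is cheap; the validator is built
# when the model is first used, which happens here.
report = Report(title="weekly", rows=["1", 2])
print(report.rows)  # [1, 2]
```

The trade-off is the one described above: a faster import in exchange for a slightly slower first validation.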

26:50 Yeah.

26:51 Like, let me give you an example.

26:52 I'll give people an example where I think this might be useful.

26:55 So in the Talk Python Training, the courses site, I think we've got 20,000 lines of Python

27:00 code, which is probably more at this point.

27:02 I checked a long time ago, but a lot.

27:04 And it's a package.

27:05 And so when you import it, it goes and imports all the stuff to like run the whole web app,

27:10 but also little utilities like, oh, I just want to get a quick report.

27:14 I want to just access this model and then use it on something real quick.

27:19 It imports all that stuff.

27:20 So that app startup would be potentially slowed down by this.

27:23 Where if you know, like only sometimes is that type adapter used, you don't want to

27:28 necessarily have it completely created until that function gets called.

27:31 So then the first function call might be a little slow, but there'd be plenty of times

27:34 where maybe it never gets called.

27:36 Right?

27:37 Yep, exactly.

27:38 Awesome.

27:39 Okay.

27:40 All right.

27:41 So kind of a more complex performance optimization is using tagged unions.

27:45 They're still pretty simple.

27:46 It's just like a little bit more than a one line change.

27:49 So kind of talking about tagged unions, we can go through a basic example, why we're

27:54 using tagged unions in the first place, and then some more advanced examples.

27:59 This portion of Talk Python to Me is brought to you by Code Comments, an original podcast

28:03 from Red Hat.

28:05 You know, when you're working on a project and you leave behind a small comment in the

28:08 code, maybe you're hoping to help others learn what isn't clear at first.

28:14 Sometimes that code comment tells a story of a challenging journey to the current state

28:18 of the project.

28:19 Code Comments, the podcast, features technologists who've been through tough tech transitions,

28:25 and they share how their teams survived that journey.

28:28 The host Jamie Parker is a Red Hatter and an experienced engineer.

28:32 In each episode, Jamie recounts the stories of technologists from across the industry

28:37 who've been on a journey implementing new technologies.

28:40 I recently listened to an episode about DevOps from the folks at Worldwide Technology.

28:45 The hardest challenge turned out to be getting buy-in on the new tech stack rather than using

28:50 that tech stack directly.

28:52 It's a message that we can all relate to.

28:54 And I'm sure you can take some hard-won lessons back to your own team.

28:58 Give Code Comments a listen.

29:00 Search for Code Comments in your podcast player or just use our link, talkpython.fm/code-comments.

29:07 The link is in your podcast player's show notes.

29:09 Thank you to Code Comments and Red Hat for supporting Talk Python to Me.

29:14 Let's start with what are tagged unions because I honestly have no idea.

29:16 I know what unions are, but tagging them, I don't know.

29:19 Yeah, sure thing.

29:20 So tagged unions are a special type of union.

29:24 We also call them discriminated unions.

29:26 And they help you specify a member of a model that you can use for discrimination in your

29:33 validation.

29:34 So what that means is if you have two models that are pretty similar, and your field can

29:40 be either one of those types of models, model X or model Y, but you know that there's one

29:46 tag or like discriminator field that differs, you can specifically validate against that

29:52 field and skip some of the other validation, right?

29:54 So like, I'll move on to an example here in a second, but basically it helps you validate

30:00 more efficiently because you get to skip validation of some fields.

30:03 So it's really helpful if you have models that have like 100 fields, but one of them

30:06 is really indicative of what type it might be.

30:09 I see.

30:10 So instead of trying to figure out like, is it all of this stuff, once you know it has

30:14 this aspect or that aspect, then you can sort of branch it on a path and just treat it as

30:18 one of the elements of the union.

30:20 Is that right?

30:21 Yes, exactly.

30:22 Okay.

30:23 So one other note about discriminated unions is you specify this discriminator and it can

30:28 either be a string, like literal type or callable type.

30:31 And we'll look at some examples of both.

30:33 So here's kind of a more concrete example so we can really better understand this.

30:38 So let's say we have a, this is the classic example, right?

30:41 A cat model and a dog model.

30:43 Yeah.

30:44 Cat people, dog people.

30:45 You're going to start a debate here.

30:47 Exactly, exactly.

30:48 They both have this pet type field.

30:52 So for the cat model, it's a literal that is just the string cat.

30:56 And then for the dog model, it's the literal that's the string dog.

30:59 So it's just kind of a flag on a model to indicate what type it is.

31:03 And you can imagine, you know, in this basic case, we only have a couple of fields attached

31:07 to each model, but maybe this is like data in a like vet database.

31:13 And so you can imagine like there's going to be tons of fields attached to this, right?

31:16 So it'd be pretty helpful to just be able to look at it and say, oh, the pet type is

31:20 dog.

31:21 So this data is valid for a dog type.

31:23 I'll also note we have a lizard in here.

31:26 So what this looks like in terms of validation with Pydantic then is that when we specify

31:33 this pet field, we just add one extra setting, which says that the discriminator is that

31:39 pet type field.

31:40 And so then when we pass in data that corresponds to a dog model, Pydantic is smart enough to

31:46 say, oh, this is a discriminated union field.

31:48 Let me go look for the pet type field on the model and just see what that is and then use

31:54 that to inform my decision for what type I should validate against.
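
A sketch of the cat, dog, and lizard example as described. The slide itself is not in the transcript, so the field names other than `pet_type` (`meows`, `barks`, `scales`) and the `Owner` wrapper model are assumptions.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Lizard(BaseModel):
    pet_type: Literal["reptile", "lizard"]
    scales: bool


class Owner(BaseModel):
    # The one extra setting: tell Pydantic which field tags the union member.
    pet: Union[Cat, Dog, Lizard] = Field(discriminator="pet_type")
    n: int


owner = Owner.model_validate({"pet": {"pet_type": "dog", "barks": 3.14}, "n": 1})
print(type(owner.pet).__name__)
```

With the discriminator set, Pydantic reads `pet_type` first and validates only against the matching model.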

31:58 OK, that's awesome.

32:00 So if we don't set the discriminator keyword value in the field for the union, it'll still

32:07 work, right?

32:08 It just has to be more exhaustive and slow.

32:10 Yeah, exactly.

32:11 So it'll still validate and it'll say, hey, let's take this input data and try to validate

32:16 it against the cat model.

32:18 And then Pydantic will come back and say, oh, that's not a valid cat.

32:20 Like let's try the next one.

32:22 Whereas with this discriminated pattern, we can skip right to the dog, which you can imagine

32:27 helps us skip some of the unnecessary steps.

32:29 Yeah, absolutely.

32:30 OK, that's really cool.

32:31 I had no idea about this.

32:32 Yeah, yeah.

32:33 It's a cool, I'd say like moderate level feature.

32:36 Like I think if you're just starting to use Pydantic, you probably haven't touched discriminated

32:40 unions much, but we hope that it's simple enough to implement that most folks can use

32:44 it if they're using unions.

32:46 Yeah, that's cool.

32:47 I don't use unions very often, which is probably why, other than, you know, something pipe None,

32:52 which is, you know, like optional.

32:53 But yeah, if I did, I'll definitely remember this.

32:57 Yeah.

32:58 All righty.

32:59 So as I've mentioned, this helps for more efficient validation.

33:03 And then where this really comes and has a lot of value is when you are dealing with

33:07 lots of nested models or models that have tons of fields.

33:10 So let's say you have a union with like 10 members and each member of the union has 100

33:15 fields.

33:16 If you could just do validation against 100 fields instead of a thousand, that would be

33:19 great in terms of a performance gain.

33:22 And then once again, with nested models, you know, if you can skip lots of those union

33:27 member validations, also going to boost your performance.

33:29 Yeah, for sure.

33:30 You know, an example where this seems very likely would be using it with Beanie or some

33:35 other document database where the modeling structure is very hierarchical.

33:40 You end up with a lot of nested sub-Pydantic models in there.

33:45 Yeah, very much so.

33:48 So as a little bit of an added benefit, we can talk about kind of this improved error

33:52 handling, which is a great way to kind of visualize why the discriminated union pattern

33:57 is more efficient.

33:58 So right now we're looking at an example of validation against a model that doesn't use

34:03 a discriminated union.

34:05 And the errors are not very nice to look at.

34:07 You basically see the errors for every single permutation of the different values.

34:13 And we're using nested models, so it's very hard to interpret.

34:17 So we don't have to look at this for too long.

34:19 It's not very nice.

34:20 But if we look at...

34:22 But basically the error message says, look, there's something wrong with the union.

34:26 If it was a string, it is missing these things.

34:28 If it was this kind of thing, it misses those things.

34:31 Like if it was a dog, it misses this.

34:33 If it's a cat, it misses that.

34:35 And it doesn't specifically tell you.

34:38 Exactly.

34:39 It's a dog, so it's missing like the collar size or whatever, right?

34:43 Exactly.

34:44 But then...

34:45 And I'll go back and kind of explain the discriminated model for this case in a second.

34:49 But if you look at...

34:50 This is the model with the discriminated union instead.

34:54 We have one very nice error that says, okay, you're trying to validate this X field and

35:00 it's the wrong type, right?
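
One way to see the error difference is to count the reported errors for the same bad input with and without a discriminator. This is a simplified sketch, not the nested example from the slides. The models and the `error_count` helper are hypothetical.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class PlainOwner(BaseModel):
    pet: Union[Cat, Dog]  # no discriminator: every member is tried


class TaggedOwner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


def error_count(model, data):
    """Return how many individual errors validation reports (0 if valid)."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return len(exc.errors())
    return 0


# A dog whose 'barks' value cannot be parsed as a float.
bad_dog = {"pet": {"pet_type": "dog", "barks": "loud"}}

plain_errors = error_count(PlainOwner, bad_dog)
tagged_errors = error_count(TaggedOwner, bad_dog)
print(plain_errors, tagged_errors)
```

The plain union reports problems for each member it tried, while the tagged union reports only the problem with the dog data.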

35:03 So yeah.

35:05 The first example that we were looking at was using string type discriminators.

35:09 So we just had this pet type thing that said, oh, this is a cat or this is a dog, that sort

35:13 of thing.

35:14 We also offer some more customization in terms of...

35:19 We also allow callable discriminators.

35:21 So in this case, this field can be either a string or this instance of discriminated

35:29 model.

35:30 So it's kind of a recursive pattern, right?

35:32 And that's where you can imagine the nested structures becoming very complex very easily.

35:38 And we use this kind of callable to differentiate between which model we should validate against.

35:44 And then we tag each of the cases.

35:47 So a little bit more of a complex application here.

35:50 But once again, when you kind of see the benefit in terms of errors and interpreting things

35:54 and performance, I think it's generally a worthwhile investment.
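
A sketch of a callable discriminator for the recursive case described here, where the field is either a string or another instance of the same model. It assumes a Pydantic version that provides `Discriminator` and `Tag`. The function name `pick_tag` and the tag strings are illustrative choices.

```python
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, Discriminator, Tag


def pick_tag(value: Any) -> Optional[str]:
    """Decide which union member to validate against by inspecting the input."""
    if isinstance(value, str):
        return "str"
    if isinstance(value, (dict, BaseModel)):
        return "model"
    return None  # no matching tag, so validation reports an error


class DiscriminatedModel(BaseModel):
    x: Annotated[
        Union[
            Annotated[str, Tag("str")],
            Annotated["DiscriminatedModel", Tag("model")],
        ],
        Discriminator(pick_tag),
    ]


nested = DiscriminatedModel.model_validate({"x": {"x": "leaf"}})
print(nested.x.x)
```

Each union member carries a `Tag`, and the callable returns the tag of the member to use.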

35:57 That's cool.

35:58 So if you wanted something like a composite key equivalent of a discriminator, right?

36:04 Like if it has this field and its nested model is of this type, it's one thing versus another.

36:10 Like a free user versus a paying user, you might have to look and see their total lifetime

36:15 value plus that they're a registered user.

36:18 I don't know, something like that.

36:19 You could write code that would pull that information out and then discriminate which

36:22 thing to validate against, right?

36:24 Yeah, exactly.

36:25 Yeah, definitely comes in handy when you have like...

36:28 You're like, okay, well, I still want the performance benefits of a discriminated union,

36:32 but I kind of have three fields on each model that are indicative of which one I should

36:36 validate against, right?

36:37 And it's like, well, you know, taking the time to look at those three fields over the

36:41 hundred is definitely worth it.

36:44 Just a little bit of complexity for the developer.

36:48 One other note here is that discriminated unions are...

36:50 Can we go back really quick on the previous one?

36:52 So I got a quick question.

36:53 So for this, you write a function.

36:56 It's given the value that comes in, which could be a string, it could be a dictionary.

37:02 Could you do a little bit further performance improvements and add like a functools LRU

37:09 cache to cache the output?

37:11 So every time it sees the same thing, if there's a repeated data through your validation, it

37:15 goes, I already know what it is.

37:16 What do you think?

37:17 Yeah, I do think that would be possible.

37:19 That's definitely an optimization we should try out and put in our docs for like the advanced,

37:24 advanced performance tips.

37:25 Yeah, because if you've got a thousand strings and then you, you know, that were like it's

37:31 going to be male, female, male, female, male, male, female, like that kind of where the

37:35 data is repeated a bunch.

37:37 Then it could just go, yep, we already know that answer.
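
A rough sketch of the caching idea, using only the standard library. Two assumptions to flag: the speakers had not tried this, so any speedup is unverified, and `functools.lru_cache` needs hashable arguments, so a dict input cannot be cached directly. The sketch caches a lookup on a hashable key pulled out of the input instead. The `code` field and both function names are hypothetical.

```python
from functools import lru_cache
from typing import Any, Optional


@lru_cache(maxsize=1024)
def tag_for_code(code: str) -> str:
    # Stand-in for a more expensive classification step.
    return "cat" if code.lower().startswith("c") else "dog"


def pet_discriminator(value: Any) -> Optional[str]:
    """Callable-discriminator-style function that reuses cached answers."""
    if isinstance(value, dict):
        code = value.get("code")
    else:
        code = getattr(value, "code", None)
    # Only the hashable string reaches the cached function.
    if isinstance(code, str):
        return tag_for_code(code)
    return None


tags = [pet_discriminator({"code": c}) for c in ["C1", "D7", "C1", "C1", "D7"]]
print(tags)
print(tag_for_code.cache_info())
```

A function like `pet_discriminator` could in principle be handed to a callable discriminator. Caching only pays off if the cached step is more expensive than the cache lookup itself.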

37:40 Yeah.

37:41 That'd be potentially, I don't know.

37:43 Yeah, no, definitely.

37:44 And I will say, I don't know if it takes effect.

37:47 I don't think it takes effect with discriminated unions because this logic is kind of in Python,

37:53 but I will say we recently added a like string caching setting because we have kind of our

37:57 own JSON parsing logic that we use in Pydantic Core.

38:01 And so we added a string caching setting so that you don't have to rebuild the exact same

38:05 strings every time.

38:07 So that's a nice performance piece.
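
The setting is not named in the conversation. The sketch below assumes it is the `cache_strings` config option available in recent Pydantic releases, and the `Event` model is made up for illustration.

```python
from typing import Dict

from pydantic import BaseModel, ConfigDict


class Event(BaseModel):
    # Assumed option: 'keys' caches only dictionary keys during parsing;
    # other documented values include 'all' and 'none'.
    model_config = ConfigDict(cache_strings="keys")

    name: str
    tags: Dict[str, str]


event = Event.model_validate_json('{"name": "deploy", "tags": {"env": "prod"}}')
print(event.tags)
```

This affects how strings are built during parsing, not the validation result.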

38:09 Yeah, nice.

38:10 Caching's awesome.

38:11 Until it's not.

38:12 Yeah, exactly.

38:14 So one quick note here is just that discriminated unions are still JSON schema compatible, which

38:19 is awesome for the case where you're once again, defining like API requests and responses.

38:24 You want to still have valid JSON schema coming out of your models.

38:27 Yeah, very cool.

38:28 And that might show up in things like OpenAPI documentation and stuff like that, right?

38:34 Yep, exactly.
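
A small sketch of what the generated JSON schema looks like for a discriminated union field. The models are illustrative stand-ins for the pet example.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Owner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


schema = Owner.model_json_schema()
pet_schema = schema["properties"]["pet"]

# The discriminator is carried into the schema alongside the union members.
print(pet_schema["discriminator"]["propertyName"])
print(sorted(pet_schema["discriminator"]["mapping"]))
```

Tools that consume the schema, such as API documentation generators, can use that discriminator entry to show which member applies.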

38:35 So I'll kind of skip over this.

38:37 We already touched on the callable discriminators.

38:39 And then I'll leave these slides up here as a reference.

38:43 Again, I don't think this is worth touching in too much detail, but just kind of another

38:48 comment about if you've got nested models, that still works well with discriminated unions.

38:53 So we're still on the pet example, but let's say this time you have a white cat and a black

38:58 cat model.

39:00 And then you also have your existing dog model.

39:03 You can still create a union of, you know, your cat union is a union of black cat and

39:09 white cat.

39:10 And then you can union that with the dogs and it still works.

39:13 And once again, you can kind of imagine the exponential blow up that would occur if you

39:18 didn't use some sort of discriminator here in terms of errors.
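
A sketch of the nested case: black and white cats are discriminated on a `color` field, and that cat union is then combined with dogs on `pet_type`. The fields `info` and `name` and the `Household` model are assumptions.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class BlackCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["black"]
    info: str


class WhiteCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["white"]
    info: str


# Inner union: pick the cat model by color.
Cat = Annotated[Union[BlackCat, WhiteCat], Field(discriminator="color")]


class Dog(BaseModel):
    pet_type: Literal["dog"]
    name: str


# Outer union: pick cat vs. dog by pet_type.
Pet = Annotated[Union[Cat, Dog], Field(discriminator="pet_type")]


class Household(BaseModel):
    pet: Pet
    n: int


home = Household.model_validate(
    {"pet": {"pet_type": "cat", "color": "white", "info": "fluffy"}, "n": 1}
)
print(type(home.pet).__name__)
```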

39:21 Yeah, very interesting.

39:22 OK, cool.

39:23 Yeah.

39:24 So that's kind of all in terms of my recommendations for discriminated union application.

39:30 I would encourage folks who are interested in this to check out our documentation.

39:34 It's pretty thorough in that regard.

39:35 And I think we also have those links attached to the podcast.

39:38 Yeah, definitely.

39:39 And then performance improvements in the pipeline.

39:42 Is this something that we can control from the outside?

39:44 Is this something that you all are just adding for us?

39:47 Yeah, good question.

39:48 This is hopefully maybe not all in the next version, but just kind of things we're keeping

39:53 our eyes on in terms of requested performance improvements and ideas that we have.

39:57 I'll go a little bit out of order here.

40:00 We've been talking a bunch about core schema and kind of, you know, maybe deferring the

40:04 build of that or, you know, just trying to optimize that.

40:07 And that actually happens in Python.

40:09 So one of the biggest things that we're trying to do is effectively speed up the core schema

40:14 building process so that import times are faster and just, you know, Pydantic is more

40:19 performant in general.

40:21 So one thing that I'd like to ask about, kind of back on the Python side a little bit, suppose

40:28 I've got some really large document, right?

40:32 Really nested document.

40:33 Maybe I've converted some terrible XML thing into JSON or I don't know, something.

40:38 And there's a little bit of structured schema that I care about.

40:42 And then there's a whole bunch of other stuff that I could potentially create nested models

40:47 to go to, but I don't really care about validating them.

40:49 It's just whatever it is, it is.

40:52 What if you just said that was a dictionary?

40:54 Would that short circuit a whole bunch of validation and stuff that would make it faster

40:58 potentially?

40:59 Yeah.

41:00 Turn off the validation for a subset of the model if it's really big and deep and you

41:05 don't really care for that part?

41:06 Yeah.

41:07 Good question.

41:08 So there's an annotation called skip validation that you can apply to certain types.

41:13 So that's kind of one approach.

41:15 I think in the future, it could be nice to offer kind of a config setting so that you

41:18 can more easily list features that you want to skip validation for instead of applying

41:23 those on a field by field basis.

41:26 And then the other thing is if you only define your model in terms of the fields that you

41:30 really care about from that very gigantic amount of data, we will just ignore the extra

41:36 data that you pass in and pull out the relevant information.
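
Both approaches can be sketched together: `SkipValidation` on a field whose contents should be accepted as-is, and a model that simply leaves out keys it does not care about. The `Envelope` model and its data are hypothetical.

```python
from typing import Any, Dict

from pydantic import BaseModel, SkipValidation


class Envelope(BaseModel):
    id: int
    # SkipValidation: accept whatever is passed here without checking it.
    raw: SkipValidation[Dict[str, Any]]


data = {
    "id": "7",
    "raw": {"deep": {"nested": [1, 2, 3]}},
    "debug": "not part of the model",  # extra key, ignored by default
}

envelope = Envelope.model_validate(data)
print(envelope.model_dump())
```

Only `id` is actually validated here; `raw` is passed through, and `debug` is dropped because the model does not declare it.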

41:39 Right.

41:40 Okay.

41:41 Yeah.

41:42 Good.

41:43 Back to the pipeline.

41:44 Yeah.

41:45 Back to the pipeline.

41:46 So another improvement, we talked a little bit about potential parallelization of things

41:51 or vectorization.

41:52 One thing that I'm excited to learn more about in the future and that we've started working

41:56 on is this thing called SIMD in jiter.

41:59 And that's our JSON iterable parser library that I was talking about.

42:02 And so SIMD stands for single instruction, multiple data.

42:07 Basically means that you can do operations faster.

42:10 And that's with this kind of vectorization approach.

42:12 I certainly don't claim to be an expert in SIMD, but I know that it's improving our validation

42:19 speeds in the department of JSON parsing.

42:22 So that's something that we're hoping to support for a broader set of architectures going forward.

42:28 Yeah, that's really cool.

42:30 Just like what Pandas does for Python, instead of looping over and validating and doing something

42:34 to each piece, you just go this whole column, multiply it by two.

42:38 Yep.

42:40 Exactly.

42:41 I'm sure it's not implemented the same, but like conceptually the same.

42:42 Yep.

42:44 Very much so.

42:45 And then the other two things in the pipeline that I'm going to mention are kind of related

42:49 once again to the avoiding materializing things in Python if we can.

42:53 And we're even kind of extending that to avoiding materializing things in Rust if we don't have

42:58 to.

42:59 So the first thing is when we're parsing JSON in Rust, can we just do the validation as

43:03 we kind of chomp through the JSON instead of like materializing the JSON as a Rust object

43:09 and then doing all the validation?

43:10 It's like, can we just do it in one pass?

43:12 Okay.

43:13 Is that almost like generators and iterables rather than loading all into memory at once

43:19 and then processing it one at a time?

43:21 Yeah, exactly.

43:22 And it's kind of like, do you build the tree and then walk it three times or do you just

43:28 do your operations every time you add something to the tree?
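
The generator analogy can be sketched in plain Python. This is only an illustration of the one-pass idea, not how pydantic-core or jiter is implemented, and both function names are made up.

```python
from typing import Iterable, Iterator, List


def validate_two_pass(raw: Iterable[str]) -> List[int]:
    """Materialize everything first, then walk it again to validate."""
    materialized = list(raw)  # pass 1: build the whole structure in memory
    return [int(item) for item in materialized]  # pass 2: validate each item


def validate_one_pass(raw: Iterable[str]) -> Iterator[int]:
    """Validate each item as it is read, without an intermediate copy."""
    for item in raw:
        yield int(item)


tokens = ["1", "2", "3"]
print(validate_two_pass(tokens))
print(list(validate_one_pass(iter(tokens))))
```

Both produce the same result; the one-pass version never holds the unvalidated copy.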

43:32 And then the last performance improvement in the pipeline that I'll mention is this

43:35 thing called fast model.

43:37 Has not been released yet.

43:39 Hasn't really even been significantly developed, but this is cool in that it's really approaching

43:44 that kind of laziness concept again.

43:46 So attributes would remain in Rust after validation until they're requested.

43:51 So this is kind of along the lines of the defer build logic that we were talking about

43:55 in terms of like, we're not going to send you the data or perform the necessary operations

43:59 until they're requested.

44:00 Right.

44:01 Okay.

44:02 Yeah.

44:03 If you don't ever access the field, then why process all that stuff, right?

44:05 And convert it into Python objects.

44:07 Yeah, exactly.

44:08 But yeah, we're kind of just excited in general to be looking at lots of performance improvements

44:14 on our end, even after the big V2 speed up, still have lots of other things to work on

44:18 and improve.

44:19 Yeah, it sure seems like it.

44:21 And if this free threaded Python thing takes off, who knows, maybe there's even more craziness

44:27 with parallel processing of different branches of the model at different, you know, alongside

44:34 each other.

44:35 Yeah.

44:36 So I think this kind of dovetails nicely into like you asked earlier, like, is there a way

44:41 that we kind of monitor the performance improvements that we're making?

44:45 And we're currently using and getting started with two tools that are really helpful.

44:51 And I can share some PRs if that's helpful, send links after.

44:55 But one of them is CodSpeed, which integrates super nicely with CI and GitHub.

45:02 And it basically runs tests tagged with this like benchmark tag.

45:08 And then it'll, you know, run them on main compared to on your branch.

45:11 And then you can see like, oh, this made my code, you know, 30% slower, like maybe let's

45:16 not merge that right away.

45:18 Or conversely, if you know there's a 30% improvement on some of your benchmarks, it's really nice

45:23 to kind of track and see that.
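
A sketch of what a test tagged for benchmarking might look like. It assumes the `pytest-codspeed` plugin, which is what gives the `benchmark` marker its meaning in CI; without the plugin this runs as an ordinary test. The `Dog` model and the test name are hypothetical.

```python
import pytest
from pydantic import BaseModel


class Dog(BaseModel):
    name: str
    barks: float


@pytest.mark.benchmark  # marker assumed to be picked up by pytest-codspeed
def test_validate_dog():
    # The body is the code whose run time would be tracked across branches.
    dog = Dog.model_validate({"name": "Rex", "barks": "3.5"})
    assert dog.barks == 3.5
```

The CI integration then compares timings for tests like this between the main branch and a pull request.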

45:25 I see.

45:26 So it looks like it sets up.

45:27 So this is codspeed.io, right?

45:31 And then it sets up as a GitHub action as part of your CI/CD.

45:36 And you know, probably automatically runs when a PR is open and things along those lines,

45:40 right?

45:41 Yep, exactly.

45:42 All right.

45:43 I've never heard of this.

45:44 But yeah, if it just does the performance testing for yourself automatically, why not?

45:48 Right?

45:49 Let it do that.

45:50 Yeah.

45:51 And then I guess another tool that I'll mention while talking about kind of our, you know,

45:57 continuous optimization, as one word for it, is this tool kind of similarly named called

46:03 CodeFlash.

46:05 So CodeFlash is a new tool that uses LLMs to kind of read your code and then develop

46:12 potentially more performant versions.

46:15 Kind of analyze those in terms of, you know, is this new code passing existing

46:20 tests?

46:21 Is it passing additional tests that we write?

46:24 And then another great thing that it does is open PRs for you with those improvements

46:28 and then explain the improvements.

46:31 So I think it's a really pioneering tool in the space.

46:34 And we're excited to kind of experiment with it more on our PRs and in our repository.

46:40 Okay.

46:41 I love it.

46:42 Just tell me why is this?

46:44 Why did this slow down?

46:45 Well, here's why.

46:46 Yeah, exactly.

46:47 Yeah.

46:48 And they offer both like local runs of the tool and also built in CI support.

46:54 So those are just kind of two tools that we use and are increasingly using to help

46:59 us kind of check our performance as we continue to develop and really inspire us to, you know,

47:05 get those green check marks with the like performance improved on lots of PRs.

47:10 The more you can have it where if it passes the automated build, it's just ready to go

47:15 and you don't have to worry a little bit and keep testing things and then have uncertainty.

47:20 You know that.

47:21 It's nice, right?

47:22 Because you're allowed to rest and sleep at night.

47:25 Yeah, most certainly.

47:27 I mean, I said it before, but the number of people who are impacted by Pydantic, I don't

47:33 know what that number is, but it has to be tremendous because if there's 400,000 projects

47:37 that use it, like think of the users of those projects, right?

47:40 Like that, that multiple has got to be big for, you know, I'm sure there's some really

47:43 popular ones, for example, FastAPI.

47:45 Yeah.

47:48 And it's just nice to know that there are other companies and tools out there that can

47:52 help us to really boost the performance benefits for all those users, which is great.

47:57 All right.

47:58 Yeah, that is really cool.

47:59 I think, you know, let's talk about one more performance benefit for people and not so

48:04 much in how fast your code runs, but in how fast you go from raw data to Pydantic models.

48:11 So one thing you probably have seen, we may have even spoken about this before, are you

48:16 familiar with JSON to Pydantic?

48:18 The website?

48:19 Yeah, it's a really cool tool.

48:20 Yeah, it's such a cool tool.

48:21 And if you've got some really complicated data, like let's see, I'll pull up some weather

48:26 data that's in JSON format or something, right?

48:29 Like if you just take this and you throw it in here, just don't even have to pretty print

48:33 it, it'll just go, okay, well, it looks like what we've got is, you know, this really complicated

48:38 nested model here.

48:40 And it took, you know, we did this while I was talking, it took 10 seconds for me clicking

48:44 the API to get a response to having like a pretty decent representation here.

48:49 Yeah, it's great in terms of like developer agility, especially, right?

48:53 It's like, oh, I've, you know, heard of this tool called Pydantic.

48:55 I've seen it in places like, I don't really know if I want to manually go build all these

48:59 models for my super complicated JSON data.

49:02 It's like, boom, three seconds done for you, basically.

49:05 Exactly.

49:06 Like, is it really worth it?

49:07 Because I don't want to have to figure this thing out and figure out all the types and

49:11 like, no, just paste it in there and see what you get.

49:13 You're going to be, it won't be perfect, right?

49:15 Some things, if they're null in your data, but they could be something that would make

49:19 them an optional element, like they could be an integer or they could be null.

49:22 It won't know that it's going to be an integer, right?

49:25 Right.

49:26 So you kind of got to patch it up a tiny bit, but in general, I think this is really good.
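
The kind of patch-up meant here can be sketched as follows. The weather field names are made up, and the loosely typed field in the first model is an assumption about what a generator might produce from a null sample value, not verified output of the tool.

```python
from typing import Any, Optional

from pydantic import BaseModel


class GeneratedReading(BaseModel):
    temperature: float
    # Assumed generator output: the sample value was null, so no type is known.
    wind_gust: Any = None


class PatchedReading(BaseModel):
    temperature: float
    # Hand-edited: the value is an integer when present, otherwise null.
    wind_gust: Optional[int] = None


calm = PatchedReading.model_validate({"temperature": 20.5, "wind_gust": None})
gusty = PatchedReading.model_validate({"temperature": 18.0, "wind_gust": "15"})
print(calm.wind_gust, gusty.wind_gust)
```

After the edit, the field is validated as an integer whenever a value shows up.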

49:30 And then also, you know, just drop in with your, your favorite LLM, you know, I've been

49:30 using LM Studio, which is awesome.

49:37 Nice.

49:38 I heard you talk about that on one of the most recent podcasts, right?

49:42 Yeah.

49:44 It's super cool.

49:45 You can download, just download Llama 3 and run it locally with like a, I think my, my

49:49 computer can only handle 7 billion parameter models, but you know that you get pretty good

49:53 answers and if you give it a JSON, a piece of JSON data and you say, convert that to

49:57 Pydantic, you'll get really good results.

50:00 You have a little more control over than what you just get with this, this tool.

50:03 But I think those two things, while not about runtime performance, you know, going from

50:09 I have data till I'm working with Pydantic, that's pretty awesome.

50:12 Yeah, definitely.

50:13 And if any, you know, passionate open source contributors are listening and want to create

50:18 like a CLI tool for doing this locally, I'm sure that would be.

50:23 I think this is based on something that I don't use, but I think it's based on this

50:29 datamodel-code-generator, which I think might be a CLI tool or a library.

50:34 Let's see.

50:35 Yes.

50:36 Oh yeah.

50:37 Very nice.

50:38 So here, here's the problem that you, you know, you go and define like a YAML file.

50:40 You know, like it's, it's just not as easy as like there's a text field I pasted my stuff,

50:45 but it does technically, technically work, I suppose.

50:49 Yeah.

50:50 I know.

50:51 Definitely the LLM approach or just the basic website approach is very quick, which is nice.

50:55 Yeah.

50:56 Let's talk about LLMs just really quick.

50:57 Like I feel, you know, you get some of the Python newsletters and other places that like,

51:02 here's the cool new packages.

51:03 A lot of them are like nine out of 10 of them are about LLMs these days.

51:07 I was like, that feels a little over the top to me, but I know there's other things going

51:11 on in the world, but you know, just what are your thoughts on LLMs and coding these days?

51:15 I know you, you write a lot of code and think about it a lot and probably use LLMs somewhere

51:19 in there.

51:20 Yeah, no, for sure.

51:21 I, I'm pretty optimistic and excited about it.

51:24 I think there's a lot of good that can be done and a lot of productivity boosting to

51:29 be had from integrating with these tools, both in your like local development environment

51:34 and also just in general.

51:36 I think sometimes, you know, it's also great in the performance department, right?

51:40 Like we can see with CodeFlash using LLMs to help you write more performant code can

51:46 also be really useful.

51:47 And it's been exciting to see some libraries really leverage Pydantic as well in that space

51:52 in terms of like validating LLM outputs or even using LLM calls in Pydantic validators

51:58 to validate, you know, data along constraints that are more like language model friendly.

52:05 So yeah, I'm optimistic about it.

52:06 I still have a lot to learn, but it's cool to see the variety of applications and kind

52:10 of where you can plug in Pydantic in that process for fun.

52:13 Yeah, I totally agree.

52:15 Right now, the context window, like how much you can give it as information than to start

52:20 asking questions is still a little bit small.

52:23 Like you can't give it some huge program and say, you know, find me the bugs where this

52:27 function is called or you know, whatever.

52:28 And it like it doesn't quite understand enough all at once.

52:31 But that thing keeps growing.

52:33 So eventually, someday we'll all see.

52:36 Yep.

52:37 All right.

52:38 Well, let's talk just for a minute, maybe real quick about what you all are doing at

52:42 Pydantic the company rather than Pydantic the open source library.

52:47 Like what do you all got going on there?

52:48 Yeah, sure.

52:49 So Pydantic as the company has released our first commercial tool.

52:55 It's called Logfire and it's in open beta.

52:58 So it's an observability platform.

53:01 And we'd really encourage anyone interested to try it out.

53:04 It's super easy to get started with, you know, just the basic like pip install of the SDK

53:10 and then start using it in your code base.

53:12 And then we have the kind of Logfire dashboard where you're going to see the observability

53:18 and results.

53:20 And so we kind of adopt this like needle in the haystack philosophy where we want this

53:24 to be a very easy to use observability platform that offers very like Python centric insights.

53:31 And it's this kind of opinionated wrapper around OpenTelemetry, if folks are familiar

53:37 with that.

53:38 But in kind of the context of performance, one of the great things about this tool is

53:42 that it offers this like nested logging and profiling structure for code.

53:47 And so it can be really helpful in kind of looking at your code and being like, we don't

53:51 know where this performance slowdown is occurring.

53:54 But if we integrate with Logfire, we can see that like very easily in the dashboard.

53:59 Yeah, you have some interesting approaches, like specifically targeting popular frameworks

54:06 like instrument FastAPI or something like that, right?

54:09 Yeah, definitely.

54:10 Trying to kind of build integrations that work very well with FastAPI, other tools

54:15 like that, and even also offering kind of like custom features in the dashboard, right?

54:20 Like if you're looking at, you know, if you're using an observability tool, you're probably

54:24 advanced enough to want to add some extra things to your dashboard.

54:27 And we're working on supporting that with FastUI, which I know you've chatted with

54:30 Samuel about as well.

54:32 Yeah, absolutely.

54:33 I got a chance to talk to Samuel about Logfire and some of the behind the scenes infrastructure

54:38 was really interesting.

54:39 But also speaking of FastUI, you know, I did speak to him.

54:43 When was that?

54:44 Back in February.

54:45 So this is a really popular project.

54:48 And even on the, I was like, quite a few people decided that they were interested in even

54:54 watching the video on that one, which, yeah.

54:58 Anything with FastUI?

54:59 Sorry, did you say anything with FastUI?

55:01 Yeah, yeah.

55:03 Are you doing anything on the FastUI side?

55:04 Are you on the Pydantic side of things?

55:07 Yeah, good question.

55:08 I've been working mostly on Pydantic, just, you know, larger user base, more feature requests,

55:14 but I've done a little bit on the FastUI side, and I'm excited to kind of brush

55:18 up on my TypeScript and build that out as a more robust and supported tool.

55:22 I think, especially as we grow as a company and have more open source support in general,

55:27 that'll be a priority for us, which is exciting.

55:30 Yeah.

55:31 It's an interesting project.

55:33 Definitely a cool way to do JavaScript front ends and React and then plug those back into

55:39 Python APIs, like FastAPI and those types of things.

55:42 Right.

55:43 So, yeah.

55:44 Yeah.

55:45 And kind of a similarity with FastUI and Logfire, the new tool, is that there's pretty

55:48 seamless integration with Pydantic, which is definitely going to be one of the kind

55:52 of core tenets of any products or open source things that we're producing in the future.

55:56 Yeah.

55:57 I can imagine that's something you want to pay special attention to is like, how well

56:00 do these things fit together as a whole, rather than just, here's something interesting, here's

56:04 something interesting.

56:05 Yeah.

56:06 Awesome.

56:07 All right.

56:08 Well, I think that pretty much wraps it up for the time that we have to talk today.

56:13 Let's close it out.

56:14 Close it out for us with maybe a final call to action for people who are already using

56:19 Pydantic and they want it to go faster, or maybe they could adopt some of these tips.

56:24 What do you tell them?

56:25 Yeah.

56:26 So, you know, inform yourself just a little bit about kind of the Pydantic architecture,

56:31 just in terms of like, what is core schema and why are we using Rust for validation and

56:35 serialization?

56:36 And then that can kind of take you to the next steps of, when do I want to build my

56:41 core schemas based on kind of the nature of my application?

56:44 Is it okay if imports take a little bit longer or do I want to delay that?

56:48 And then take a look at discriminated unions.

56:51 And then maybe if you're really interested in improving performance across your application

56:54 that supports Pydantic and other things, trying out Logfire and just seeing what sort of benefits

57:00 you can get there.
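
Two of the tips in this recap, discriminated unions and delaying core schema builds, can be sketched together. This is a minimal illustration assuming Pydantic v2; the `Cat`, `Dog`, and `Owner` models and their fields are hypothetical and do not come from the episode.

```python
from typing import Literal, Union

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Owner(BaseModel):
    # defer_build postpones building the validator and serializer until the
    # model is first used, trading a faster import for a slower first call.
    model_config = ConfigDict(defer_build=True)

    # The discriminator tells Pydantic to look at pet_type and validate
    # against only the matching member instead of trying each one in turn.
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner.model_validate({"pet": {"pet_type": "dog", "barks": 3.5}})
print(type(owner.pet).__name__, owner.pet.barks)

# An unknown tag fails fast with an error about the discriminator value.
try:
    Owner.model_validate({"pet": {"pet_type": "fish"}})
except ValidationError as exc:
    print(exc)
```

Whether `defer_build` helps depends on the application: it suits programs where startup time matters more than the latency of the first validation.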

57:01 Yeah.

57:02 Seeing where you're spending your time is one of the very important things, not just for

57:05 Pydantic, but in general. Our intuition is often pretty bad about where your code is slow

57:11 and where it is not slow.

57:12 You're like, that looks really complicated.

57:14 That must be slow.

57:15 Like, nope.

57:16 It's that one call to like some sub module that you didn't realize was terrible.

57:19 Yeah.

57:21 And I guess that kind of circles back to the LLM tools and, you know, integrated performance

57:27 analysis with CodSpeed and Codeflash and even just other LLM tools, which is like, use the

57:31 tools you have at hand.

57:32 And yeah, sometimes they're better at performance improvements than you might be, or it can

57:37 at least give you good tips that give you, you know, a launching point, which is great.

57:40 Yeah, for sure.

57:41 Or even good old cProfile built right in, right?

57:44 If you really, if you want to do it that way.
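
For reference, a minimal sketch of profiling with the standard library's `cProfile` and `pstats` modules. The `slow_helper` and `handler` functions are hypothetical placeholders for your own code.

```python
import cProfile
import io
import pstats


def slow_helper(n):
    # Deliberately wasteful work so it shows up in the profile.
    return sum(i * i for i in range(n))


def handler():
    return [slow_helper(5_000) for _ in range(20)]


# Collect timing data only around the call we care about.
profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Write the five most expensive entries, by cumulative time, to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that account for the most total time, including their callees, at the top, which is usually the quickest way to check an intuition about where the code is slow.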

57:46 Awesome.

57:47 All right.

57:49 Sydney, thank you for being back on the show and sharing all these tips and congratulations

57:53 on all the work you and the team are doing.

57:56 You know, what a success Pydantic is.

57:58 Yeah.

57:59 Thank you so much for having me.

58:00 It was wonderful to get to have this discussion with you and excited that I got to meet you

58:03 in person at PyCon recently.

58:04 Yeah, that was really great.

58:05 Really great.

58:06 Until next PyCon.

58:08 See you later.

58:10 This has been another episode of Talk Python to Me.

58:13 Thank you to our sponsors.

58:14 Be sure to check out what they're offering.

58:15 It really helps support the show.

58:18 Take some stress out of your life.

58:19 Get notified immediately about errors and performance issues in your web or mobile applications

58:24 with Sentry.

58:25 Just visit talkpython.fm/sentry and get started for free.

58:30 And be sure to use the promo code talkpython, all one word.

58:34 Code Comments, an original podcast from Red Hat.

58:37 This podcast covers stories from technologists who've been through tough tech transitions

58:42 and share how their teams survived the journey.

58:46 Those are available everywhere you listen to your podcasts and at talkpython.fm/code-comments.

58:49 Want to level up your Python?

58:53 We have one of the largest catalogs of Python video courses over at Talk Python.

58:57 Our content ranges from true beginners to deeply advanced topics like memory and async.

59:02 And best of all, there's not a subscription in sight.

59:05 Check it out for yourself at training.talkpython.fm.

59:08 Be sure to subscribe to the show.

59:10 Open your favorite podcast app and search for Python.

59:13 We should be right at the top.

59:14 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the Direct

59:20 RSS feed at /rss on talkpython.fm.

59:24 We're live streaming most of our recordings these days.

59:26 If you want to be part of the show and have your comments featured on the air, be sure

59:30 to subscribe to our YouTube channel at talkpython.fm/youtube.

59:35 This is your host, Michael Kennedy.

59:36 Thanks so much for listening.

59:37 I really appreciate it.

59:39 Now get out there and write some Python code.

59:41 [MUSIC PLAYING]
