#487: Building Rust Extensions for Python Transcript

Recorded on Thursday, Nov 21, 2024.

00:00 There have been a lot of changes in the low-level Python space these days.

00:02 The biggest has to be how many projects have rewritten core performance-sensitive

00:07 sections in Rust, or even the wholesale adoption of Rust for newer projects such as UV and Ruff.

00:14 On this episode, we dive into the tools and workflow needed to build these portions of

00:19 Python apps in Rust with David Seddon and Samuel Colvin. This is Talk Python to Me,

00:24 episode 487, recorded November 21st, 2024.

00:29 Are you ready for your host?

00:30 Here he is.

00:31 You're listening to Michael Kennedy on Talk Python to Me.

00:35 Live from Portland, Oregon, and this segment was made with Python.

00:38 Welcome to Talk Python to Me, a weekly podcast on Python.

00:44 This is your host, Michael Kennedy.

00:47 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:52 both accounts over at fosstodon.org, and keep up with the show and listen to over nine years of

00:58 episodes at talkpython.fm. If you want to be part of our live episodes, you can find the live

01:03 streams over on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube and get

01:09 notified about upcoming shows.

01:11 This episode is sponsored by Posit Connect from the makers of Shiny. Publish, share, and deploy all of your data projects that you're creating using Python.

01:20 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:27 Posit Connect supports all of them.

01:29 Try Posit Connect for free by going to talkpython.fm/Posit.

01:34 P-O-S-I-T. And it's brought to you by the Data Citizens Dialogues podcast from Collibra.

01:39 If you're ready for a deeper dive into the latest hot topics in data, listen to an episode at talkpython.fm/citizens.

01:46 Hey everyone, it's the week of Thanksgiving in the United States. You know what that means.

01:51 Black Friday sales galore.

01:53 Of course, we're having Black Friday sales at Talk Python as well.

01:57 All of our courses are on sale, from 18 to 50% off, through our Everything Bundle.

02:02 Get huge discounts on the Everything Bundle by visiting talkpython.fm/Black Friday.

02:09 But be sure to hurry. This deal ends next week after Monday.

02:14 So act now if you've been looking at taking a couple of our courses.

02:17 The price is about two courses for the entire library.

02:20 So you probably want to jump on that.

02:22 And thank you to everyone who's taken some of our courses.

02:25 It really helps support the show.

02:26 Let's jump into that interview.

02:28 David, Samuel, welcome to talkpython.fm.

02:31 Thank you very much for having me.

02:32 Thank you for having me.

02:33 Yeah. Good to have you, David.

02:34 And Samuel, it's always good to have you back on the show.

02:37 So we're going to talk about Rust.

02:40 Some really cool experiences that David had building a linter that works on a ridiculous number of different projects and packages.

02:50 And Samuel, Rust is obviously core to Pydantic.

02:54 And I want to talk a bit about Pydantic and how you guys have used Rust as well.

02:57 Should be a good time.

02:58 Yeah. Looking forward to it.

02:59 And thanks for having me on.

03:00 I've been listening to this podcast for years.

03:02 So it's very nice finally to be talking.

03:05 It's a little like open source.

03:06 You get to help create it, you know?

03:07 Exactly.

03:09 All right.

03:09 Let's do quick introductions.

03:10 I know, Samuel, you almost don't need introductions.

03:13 Pydantic is so popular.

03:15 However, I'm sure there's a couple people out there.

03:18 Before I let you introduce yourself and stuff just a tiny, just quickly.

03:21 You have the honor, distinguished or maybe otherwise, of actually participating in the loudest Talk Python episode ever.

03:31 Do you remember that forklift that was driving around behind us?

03:35 It was dystopian, wasn't it?

03:36 It was the end of PyCon two years ago.

03:39 And we started doing the podcast.

03:40 And then they started taking down the PyCon around us.

03:43 And there was this forklift.

03:44 It felt like slight jeopardy, but it was good.

03:47 It certainly was a concentration test to have the forklift driving right behind us, beeping as loud as it could.

03:53 It was amazing.

03:54 All right.

03:54 Tell people about yourself and Pydantic and stuff.

03:57 So I started Pydantic back in 2017 as a side project.

04:01 And I tootled along.

04:02 And then 2021, it somehow, something happened.

04:05 And the rate of downloads just started to increase a lot.

04:08 Started working on it full time in 2022.

04:10 Decided to do this rebuild of the core in Rust.

04:13 Because while I was really proud of how much people were using Pydantic, I wasn't particularly proud of its internals.

04:18 I had done a bit of Rust, a bit of a couple of other projects that wrapped Rust to produce Python packages, but nothing on the scale of Pydantic.

04:26 And then eight months into this three-month project, I was halfway through.

04:30 And Sequoia very kindly got in touch and offered to invest.

04:34 And so I started a company around Pydantic.

04:36 And so, yeah, we released Pydantic V2, the rewrite, middle of last year.

04:41 And, yeah, adoption of Pydantic, as I'm pleased to say, continued to grow.

04:46 We had, I think, 307 million downloads in October.

04:49 And now we're obviously building commercial stuff, Logfire in particular.

04:52 But we also do a bit more Rust in Python.

04:55 So we have Jiter, which is our very fast Rust-based JSON parser, which is available both ways.

05:00 It's used in Pydantic and is also a separate package.

05:02 So, yeah, that's a kind of summary of my interaction with Rust and Python over the last kind of five, six years.

05:08 Amazing.

05:08 Yeah, and it was that move, that pending move or partial move to Rust that actually was the basis of that forklift episode.

05:17 David, hello.

05:20 Hi.

05:21 Yeah, so I am based in London, like Samuel.

05:25 I work on a product called Kraken that is not anything to do with cryptocurrency or gin.

05:30 I think there's a gin.

05:32 Or Git clients.

05:34 It's actually, it came out of a company called Octopus Energy.

05:37 And it's, Octopus Energy is a renewable energy company.

05:41 And Kraken is basically a big Django monolith that we used at Octopus Energy.

05:47 And it worked really well.

05:49 And Octopus Energy over the last eight years has grown to be the second biggest energy company in the UK.

05:55 And so what we've been doing is using Kraken throughout the world in lots of different countries and lots of different energy companies.

06:04 I think the interesting thing about it is it's absolutely massive.

06:07 It's, I just counted it today, eight and a half million lines of code for one Django package.

06:13 That is nuts.

06:13 I was going to have you elaborate because you said I work on a large Django monolith.

06:18 Like, it's really large.

06:20 That's amazing.

06:21 Sometimes I wonder, has there ever been a bigger one?

06:24 I don't know.

06:25 I should probably ask on Hacker News or something.

06:27 Yeah.

06:27 On the GitHub Notes.

06:28 It's big and it's also, there's quite a lot of people working on it at once.

06:32 There's, like I just looked at last week, there were over 400 authors of pull requests in one week, all merging stuff together onto the same Python package.

06:42 And you'd think it wouldn't work.

06:44 I would have said it wouldn't work.

06:45 It turns out, actually, you can run an energy company on that basis.

06:49 You're a huge energy company.

06:51 You must have a ton of repositories.

06:52 Yeah, we got one.

06:53 There are some others, to be honest, but mostly it's one big one.

06:57 Can I ask a question?

06:58 Is that continuously deployed or how is that managed in terms of deploys?

07:03 Actually, each energy company, there are nearly 30 energy companies.

07:07 Each energy company gets their own separate installation.

07:10 And every time anyone merges, pretty much, we push it out to all of the energy companies.

07:16 You know, it's on the basis that really, if we break something, then we can fix it quickly as well.

07:22 And also, in the domain of energy, how often do you actually log in to your energy account?

07:29 You know, are you going to be on the phone saying, where's my bill?

07:32 Do you know what I mean?

07:33 We can get away with that kind of thing, which maybe we couldn't do that if it was, I don't know, a payments gateway or something like that.

07:39 Or if you built a package that shipped over PyPI to hundreds of millions of people.

07:45 Yeah.

07:45 We released Pydantic 2.10 yesterday, and we've definitely got quite a few issues back.

07:51 So we definitely break stuff too.

07:53 Don't, for a second, think we don't break anything.

07:55 We don't mind the things that are unintentionally broken because we can go and fix them.

07:58 It's when we change an API.

08:00 That's the painful thing for us rather than actually, weirdly, bugs are in some sense, like, less severe than, like, getting the wrong API.

08:08 Yeah, that's got to be really tricky to say, you know, we need to add, we need to add a parameter to this function.

08:13 Or, you know, the whole one to two switch, you deprecated quite a few methods and stuff as well.

08:19 In some sense, deprecated methods and parameters are relatively easy because someone's, you know, it's relatively clear to see what's happened.

08:25 It's when it's, like, subtle changes in behavior.

08:27 Like, do you apply an, like, integer constraint before or after a function that wraps the integer validation?

08:35 Like, things like that, if you decide to change it because you're like, well, this is marginally more correct.

08:40 Is it worth making it marginally more correct for a lot of people spending a lot of time?

08:43 Very, very confused.
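To make the ordering question concrete, here is a minimal sketch in plain Python. It does not use Pydantic, and the names `parse_int`, `at_most_ten`, and `doubling_wrapper` are hypothetical helpers invented for illustration. It only shows that the same input can pass or fail depending on whether a constraint runs before or after a function that wraps integer validation.

```python
from typing import Callable


def parse_int(raw: str) -> int:
    """Base validation: turn the raw input into an int."""
    return int(raw)


def at_most_ten(value: int) -> int:
    """A simple integer constraint (hypothetical, for illustration)."""
    if value > 10:
        raise ValueError(f"{value} is greater than 10")
    return value


def doubling_wrapper(inner: Callable[[str], int]) -> Callable[[str], int]:
    """A function that wraps the integer validation and doubles its result."""
    def wrapped(raw: str) -> int:
        return inner(raw) * 2
    return wrapped


# Ordering A: the constraint is applied inside the wrapper, before doubling.
constraint_first = doubling_wrapper(lambda raw: at_most_ten(parse_int(raw)))


# Ordering B: the constraint is applied to the wrapper's output, after doubling.
def constraint_last(raw: str) -> int:
    return at_most_ten(doubling_wrapper(parse_int)(raw))
```

With the input `"6"`, `constraint_first` returns 12, while `constraint_last` raises a `ValueError` because the doubled value exceeds the limit. Neither ordering is obviously wrong, which is why changing it silently is so disruptive.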

08:44 That can definitely generate some open issues on the issue tracker.

08:49 Also, the parsing.

08:50 I think you all changed the rules about the parsing, right?

08:52 One of the pieces of magic of Pydantic is you had a list and you said it was a list of integers, but it happened to be strings that could be converted to integers.

08:59 It would just become a list of integers.

09:01 But I feel like you changed the strictness or looseness of that behavior at one point, right?

09:06 Yeah, we made it a little bit more strict in places.

09:08 We said things like you can't coerce an integer to a string because basically the original semantics of Pydantic was like, we'll just call integer on the thing and see if it's an int.

09:18 And that mostly worked.

09:19 We'll call list on something and see if it's a list.

09:21 Problem is with string, you can call string on everything and then you'll get back a string, right?

09:25 So it's no longer a valid test of is it a string to call string on it.

09:28 What time did this happen?

09:29 It happened at angle bracket class datetime dot datetime dot.
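The point about "calling the type and seeing what happens" can be sketched in a few lines of standard-library Python. This is an illustration of the idea only, not Pydantic's actual implementation, and `coerce_by_calling` is a hypothetical helper.

```python
import datetime


def coerce_by_calling(target: type, value: object):
    """Try to coerce by calling the target type; return None if that fails."""
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


# Calling int() is a meaningful test: it fails on things that are not integers.
ok_int = coerce_by_calling(int, "42")
bad_int = coerce_by_calling(int, "forty-two")

# Calling str() succeeds on everything, so it tells you nothing.
# Here the datetime class itself was passed by mistake instead of a value.
not_really_a_string = coerce_by_calling(str, datetime.datetime)
```

Here `ok_int` is 42, `bad_int` is `None`, and `not_really_a_string` is the text `<class 'datetime.datetime'>`, which is exactly the kind of value that ends up in a "What time did this happen?" field.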

09:34 And that's another thing because no one writes their unit tests for the weird edge cases of what's valid.

09:38 But then if you're a bank and one bank is sending you strings for a number or numbers for a string and you change it, like we change how it works.

09:45 That is that's problematic.

09:46 But luckily the banks don't pay us anything in sponsorship.

09:48 So I don't mind breaking it for banks as a rule.

09:50 These are the bootstrapped struggling small businesses.

09:54 They're not ready.

09:55 These banks to have enough money to, you know, support open source.

09:58 We're just trying to persuade them to use Logfire right now.

10:00 So I'll stop being rude about banks.

10:02 Yes.

10:02 No, of course.

10:03 It's well, it's a really tricky balance, right?

10:06 To get those folks to open their checkbooks and pay for it.

10:09 Right. They don't have charity usually in their bylaws or whatever they're supposed to be doing, right?

10:15 But with something like Logfire, they can say, well, here's a service that we can use.

10:20 And by using it, we might be able to support Pydantic and the other things that are important to us.

10:25 So I think that's great.

10:27 You know, tell people just real quickly about Logfire out there.

10:30 Yeah, absolutely.

10:31 So Logfire is an observability platform built, obviously, by the team behind Pydantic.

10:36 And if you go to Logfire, along the top, I suppose the two things that make it different from some of the stuff that's come before: the first is that Logfire is built on OpenTelemetry.

10:45 So it's built on open standards, which means that the rails on which the data is being transferred are an open standard.

10:50 And if you decide you don't want to use the Logfire platform anymore, you can send that data to anything else that supports OpenTelemetry.

10:56 But unlike lots of other companies in our space, instead of using OpenTelemetry as an excuse to abandon the SDK space and just say use the horrible OpenTelemetry SDK directly, we have the Logfire package, which tries to make that super nice and easy to use.

11:10 And then the other big change we have, I think there'll be an example maybe further on down the page.

11:15 Maybe there isn't one right there, but we allow you to write SQL directly against your data.

11:20 So instead of having to use kind of ClickOps to go around and do surveys.

11:25 So if you look here, right, you can go and like write SQL to query your data.

11:29 And so, although we're still early, there are things you can do in Logfire that you cannot, and never have been able to, do in one of the big incumbents like Datadog, because it's just SQL.

11:38 And it's obviously much easier to learn for you, much easier for LLMs to write.

11:43 So we have the plausible chance in the future that you could basically chat with Logfire and say, what's wrong with my app?

11:49 And the agent can go off and run SQL to investigate things in a way that is much harder if you have your own dialect.

11:56 But allowing people to write SQL against data of this scale is a monumentally difficult challenge.

12:01 And one of the things we struggle with a lot, but we think it's useful enough for developers that we like put the time in.

12:07 So you can write any SQL you want, except for please only do the ones that have indexes.

12:11 So our new database is DataFusion.

12:13 So there are no indexes.

12:14 But yeah, if you do, there are queries you can do now that just like eat memory and we have to find ways around them.

12:21 And that's actually the hardest bit is like whether intentionally or not, there are people, you know, there are SQL you can write, which is enormously like heavy to go and compute.

12:29 And so we have to be able to find ways to like process that without taking down other customers, I suppose, is the point.

12:35 You don't want to do a denial of service on yourself.

12:38 Right. I mean, the definition of DOS is that like the effort required to DOS is significantly lower than the effort required to process it.

12:45 Right. And writing SQL is the ultimate example of that.

12:47 Yeah, it absolutely is.

12:50 OK, well, that's a really interesting idea to have SQL there rather than like, let's just click around our UI until we get an answer and those sorts of things.

12:58 Or you've got some dashboards on that page.

12:59 But the idea is like, sure, we'll go and give you a nice dashboard for HTTP traffic and like response codes.

13:05 I think it's the next one down.

13:07 But if you want to go and edit that, like the point is that you can basically next one after that.

13:12 It's the point is you can go and customize that however you like by just editing the SQL if you so wish.

13:17 So it's the same with alerts.

13:19 So we have, I guess you would say, Sentry-style alerting.

13:21 But again, that's all configurable because in the end, it's just SQL that you can go and edit.

13:25 Congratulations. I know many people were curious about when they heard that, hey, Pydantic starting a company, they have funding.

13:32 It's like, what exactly are they going to do with that?

13:35 And maybe a lot of people were worried just, hey, it now costs money to just use the library.

13:40 Right.

13:40 I don't think anyone ever thought it would have been a good idea to literally have like Pydantic Pro where you had to have some API key to install Pydantic.

13:47 But it could definitely have been what most open source companies do is they have like open source project as a service.

13:54 And then they start, if not taking out features, then adding new features to the paid version.

13:58 And I was super keen that Pydantic continued to be successful as an open source project.

14:03 And so we did the slightly weird thing about building something which is in a different space from what we're known for.

14:09 And that definitely has its challenges.

14:11 But I think it's overall, I think it's the right decision.

14:13 Awesome.

14:14 Well, congratulations.

14:15 Beautiful looking project service here and so on.

14:17 And I think maybe one of the big pieces of news here, and that's not the one, although that is awesome.

14:22 That's where we're going.

14:23 Is this Jiter?

14:25 What is this Jiter?

14:25 David, are you familiar with this Jiter?

14:28 I've heard of it.

14:29 Okay.

14:29 Yeah.

14:30 Because parsing JSON is super important.

14:32 Tell us about this.

14:33 So parsing JSON is a big thing.

14:35 And when I first built Pydantic version 2 with the Rust core, we were using Serde, which is the kind of default parsing library in Rust.

14:44 The problem is that Serde wants to parse the entire JSON object before it returns you some representation of that.

14:51 And Jiter came out of this idea that we could do parsing as we do validation.

14:57 So if you have, imagine you have a list of integers, you, instead of what you would traditionally do is you would parse the JSON,

15:03 you would allocate some vector of all of the values in that list, each of which would have to be itself a like enum of here are the different types you might get in JSON,

15:10 because you don't know what it's going to be up front.

15:11 And then once you finish doing that, you can then go and do validation on that.

15:14 And Jiter, the idea is it's an iterative JSON parser, hence the name.

15:18 You can effectively get, if you think of it in Python parlance, as like an iterable that gives you back each individual element of the JSON as you go along.
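The "validate as you parse" idea can be sketched in pure Python. This is not how Jiter is implemented (Jiter is written in Rust). It is a minimal illustration that uses the standard library's `json.JSONDecoder.raw_decode` to pull one array element at a time, and `iter_array_items` and `validate_ints` are hypothetical names.

```python
import json
from typing import Any, Iterator

_WHITESPACE = " \t\r\n"


def _skip_ws(text: str, pos: int) -> int:
    """Advance pos past any JSON whitespace."""
    while pos < len(text) and text[pos] in _WHITESPACE:
        pos += 1
    return pos


def iter_array_items(text: str) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time."""
    decoder = json.JSONDecoder()
    pos = _skip_ws(text, 0)
    if pos >= len(text) or text[pos] != "[":
        raise ValueError("expected a JSON array")
    pos = _skip_ws(text, pos + 1)
    if pos < len(text) and text[pos] == "]":
        return
    while True:
        # Decode exactly one value starting at pos; the rest is left untouched.
        value, pos = decoder.raw_decode(text, pos)
        yield value
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = _skip_ws(text, pos + 1)
        elif pos < len(text) and text[pos] == "]":
            return
        else:
            raise ValueError(f"expected ',' or ']' at position {pos}")


def validate_ints(text: str) -> list[int]:
    """Validate each element as it is parsed, stopping at the first bad one."""
    result = []
    for index, item in enumerate(iter_array_items(text)):
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"item {index} is not an integer: {item!r}")
        result.append(item)
    return result
```

Because validation happens per element, `validate_ints('[1, "x", !!!')` fails on item 1 without ever reaching the malformed tail, whereas a parse-everything-first approach would have to process (and here, reject) the whole document before validation could start.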

15:27 The truth is right now inside Pydantic core, we're still using this JSON value, which is Jiter's variant of doing the whole parsing first.

15:35 There are a few optimizations that we get to get away with.

15:38 So there are some really neat things in JSON.

15:40 Like if your strings, for example, John Doe, you're showing on the screen there, does not include any escape sequences like backslash N or Unicode sequences,

15:50 then you can pass a pointer to that range of the underlying JSON object as a string instead of having to allocate that string.

15:58 So we do some like clever optimizations like that.
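A rough Python analogy of that optimization, assuming the parser already knows where a string's contents start and end in the input bytes: if there is no backslash in that range, hand back a `memoryview` slice of the original buffer (no copy), otherwise build a new unescaped string. The function `read_json_string` and its "borrowed"/"owned" labels are hypothetical and only mirror the idea; Jiter itself does this in Rust.

```python
import json


def read_json_string(data: bytes, start: int, end: int):
    """data[start:end] is the text between the quotes of a JSON string.

    Returns ("borrowed", memoryview) when the bytes can be used as-is,
    or ("owned", str) when escape sequences force a new string to be built.
    """
    # bytes.find with start/end searches in place without copying the range.
    if data.find(b"\\", start, end) == -1:
        # No escapes: a memoryview slice refers to the original buffer.
        return "borrowed", memoryview(data)[start:end]
    # Escapes present: decode into a freshly allocated string.
    return "owned", json.loads(b'"' + data[start:end] + b'"')


document = b'{"name": "John Doe", "note": "a\\nb"}'

name_start = document.index(b"John Doe")
name_kind, name_value = read_json_string(document, name_start, name_start + 8)

note_start = document.index(b"a\\nb")
note_kind, note_value = read_json_string(document, note_start, note_start + 4)
```

`name_kind` is `"borrowed"` and `name_value` is a view onto the original document, while `note_kind` is `"owned"` because the `\n` escape has to be turned into a real newline.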

16:00 But the plan in a future version of Pydantic, either as an opt-in feature in v2 or as v3, is to be able to do the iterative parsing.

16:09 What's crazy about Jiter is, well, one, once we started work on this, we've actually got to the point where Jiter is full on faster than Serde in any use case,

16:17 even if you're not doing the iterative thing.

16:18 But also this iterative JSON parsing thing is exactly what you want when you want to allow people to query JSON in a way like JSONB in Postgres.

16:27 And so we went and used Jiter to implement JSON querying inside DataFusion when we moved our database to DataFusion.

16:34 And it was just like very luckily, yeah, it happens to be exactly the right concept you need for like for querying JSON,

16:40 where you want to iterate over looking for the like string foo and then stop as soon as you find it.

16:45 So the code samples you got on the repo here, got a lot of semicolons in them.

16:49 Is there a, is it interoperable with Python as well?

16:52 Is it just a Rust level thing?

16:53 If you go up actually, it's a, it's a, this is a monorepo.

16:56 So if you go up into source, into crates, which is, so this is, you have a bunch of different crates in here.

17:02 But if you look at jiter-python, and I think if you go down, you'll see an example of calling Jiter directly from Python.

17:07 Awesome.

17:08 The reason we released this as a Python package was a large AI company who I don't know if they wanted me to name them,

17:13 basically were using Pydantic V1 so heavily, but they needed some of the functionality of Jiter.

17:18 And so they basically begged us to release this as a separate package so that they could,

17:22 they could use Jiter themselves before they upgraded to Pydantic V2.

17:26 In fact, the OpenAI SDK uses Jiter.

17:28 So I think that is public information who it might be.

17:33 This portion of Talk Python to Me is brought to you by Posit, the makers of Shiny, formerly RStudio, and especially Shiny for Python.

17:41 Let me ask you a question.

17:43 Are you building awesome things?

17:44 Of course you are.

17:45 You're a developer or a data scientist.

17:47 That's what we do.

17:48 And you should check out Posit Connect.

17:50 Posit Connect is a way for you to publish, share, and deploy all the data products that you're building using Python.

17:57 People ask me the same question all the time.

18:00 Michael, I have some cool data science project or notebook that I built.

18:03 How do I share it with my users, stakeholders, teammates?

18:06 Do I need to learn FastAPI or Flask or maybe Vue or React.js?

18:11 Hold on now.

18:12 Those are cool technologies, and I'm sure you'd benefit from them, but maybe stay focused on the data project?

18:17 Let Posit Connect handle that side of things.

18:20 With Posit Connect, you can rapidly and securely deploy the things you build in Python.

18:25 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

18:31 Posit Connect supports all of them.

18:33 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise requirements.

18:39 Make deployment the easiest step in your workflow with Posit Connect.

18:44 For a limited time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.

18:50 That's talkpython.fm/P-O-S-I-T.

18:53 The link is in your podcast player show notes.

18:55 Thank you to the team at Posit for supporting Talk Python.

19:00 The other interesting thing that came out, which, again, I'm ashamed to say I had no idea would be useful, this idea of iteratively parsing JSON until you stop effectively turns out to be incredibly helpful in LLMs, where, as you see on screen here, you can basically parse an incomplete JSON string.

19:18 And obviously, because LLMs stream you the response, you can use this to effectively do validation as you receive structured responses.

19:28 Work on a JSON stream instead of a JSON response.
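To illustrate what working on a JSON stream can look like, here is a small standard-library sketch that extracts the fully received elements from a truncated JSON array. It is a simplified stand-in rather than Jiter's partial mode: `complete_items` is a hypothetical helper, it only handles a top-level array, and it is deliberately lenient about commas.

```python
import json

_AFTER_NUMBER = ",] \t\r\n"


def complete_items(partial: str) -> list:
    """Return the elements of a possibly truncated JSON array that are complete."""
    decoder = json.JSONDecoder()
    start = partial.find("[")
    if start == -1:
        return []
    pos = start + 1
    items = []
    while True:
        # Skip separators between elements.
        while pos < len(partial) and partial[pos] in ", \t\r\n":
            pos += 1
        if pos >= len(partial) or partial[pos] == "]":
            return items
        try:
            value, end = decoder.raw_decode(partial, pos)
        except json.JSONDecodeError:
            # The next element has not fully arrived yet.
            return items
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and (end >= len(partial) or partial[end] not in _AFTER_NUMBER):
            # A number at the very end may still be missing digits (3 vs 35).
            return items
        items.append(value)
        pos = end
```

For example, `complete_items('[{"name": "John"}, {"name": "Ja')` returns only the first object, so a consumer could validate it while the rest of the response is still streaming in.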

19:32 I've been surprised by how, like, the legs that this has had, which I wasn't expecting when we first started it.

19:37 But yeah, the nice bit is it's all the input, the actual JSON parsing is Rust, but then we have the logic to, yeah, basically access that from Python, both in this package and in pydantic-core.

19:47 Well, you sent me over to this cargo section, or the crate section, rather.

19:52 And looking here, I see the Cargo.toml, pyproject.toml, some source.

19:58 I think this might be, in the source, we've got some Rust.

20:02 I think that this might be, David, a good way to start leading into working with Rust.

20:09 And, you know, this is kind of the destination of your whole presentation you gave at PyCon Italy, right?

20:13 We sort of alluded to it already.

20:16 But Python and Rust can interoperate, which is an amazing fact.

20:20 I think it's so easy to think, oh, well, within a particular application, you know, you're stuck with the programming language that you've picked.

20:27 But actually, Python was really designed originally, from what I understand, to be a kind of glue language between different C programs and things like that.

20:36 So actually, what you can do is you can use that design to use Rust to compile what are called extension modules, which are used in lots of other bits of Python, say, NumPy or something like that.

20:49 Or even maybe less obvious ones like SQLAlchemy and other places.

20:52 There's optional speedup C extension type things, right?

20:56 Yeah, absolutely.

20:57 And in the standard library, of course.

21:00 But you're not writing C or you're not writing, I think NumPy might even have Fortran.

21:04 You're writing Rust.

21:05 You're ending up with something that's the same as you might have done with a different lower level language.
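From the Python side, an extension module is imported like any other module; what differs is how it is loaded. The sketch below uses only `importlib.machinery` from the standard library to show which file suffixes the running interpreter accepts for compiled extensions and to classify a module by its loader. `describe_module` is a hypothetical helper, and the exact suffixes printed vary by platform and Python version.

```python
import importlib.machinery
import json
import sys
import types


def describe_module(module: types.ModuleType) -> str:
    """Classify a module by the loader that imported it."""
    loader = getattr(module.__spec__, "loader", None)
    if isinstance(loader, importlib.machinery.ExtensionFileLoader):
        return "extension module (compiled shared library)"
    if loader is importlib.machinery.BuiltinImporter:
        return "built-in module (compiled into the interpreter)"
    return "pure Python (or other) module"


# File suffixes this interpreter recognizes as compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES

sys_kind = describe_module(sys)
json_kind = describe_module(json)
```

A Rust-based package ends up as a file with one of those suffixes, which is why the importing code cannot tell whether the module was written in C, Fortran, or Rust.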

21:12 And that allows you to use Python for all of the nice stuff like domain modeling and that thing that we're all familiar with.

21:20 And the reason we like Python is we can be very productive with it.

21:23 But just occasionally, you hit something with Python where it's basically slow.

21:28 There's a high level language there and those abstractions are coming at a cost.

21:32 You know, we talk about the GIL as well.

21:35 You know, the GIL gets in the way of using all of the CPUs on our machine.

21:39 But that's not the case for extension modules.

21:42 They can leave the GIL behind if you want.

21:44 So you can actually, by writing these extension modules, get more out of Python than you might have thought.
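One place to see an extension module leaving the GIL behind, without writing any Rust, is the standard library's `hashlib`: its documentation states that the GIL is released while hashing data larger than 2047 bytes. The sketch below hashes several large buffers from a thread pool. It makes no timing claims, since the speedup depends on the machine; `sha256_hex` and `hash_in_threads` are illustrative names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data: bytes) -> str:
    """Hash one buffer; the compiled code releases the GIL for large inputs."""
    return hashlib.sha256(data).hexdigest()


def hash_in_threads(chunks: list[bytes], workers: int = 4) -> list[str]:
    """Hash the chunks from several threads, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, chunks))


# Eight 1 MB buffers, each well above the documented 2047-byte threshold.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]
digests = hash_in_threads(chunks)
```

The threads produce the same digests as a sequential loop; the difference is that the hashing work can overlap across cores because the compiled code is not holding the GIL while it runs.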

21:52 And Rust is a pretty new language that's learned a lot of lessons from older languages.

21:58 And it's, I think it's really lovely.

22:01 You feel like you're in the presence of greatness.

22:04 I want to ask both of you this question before we get too far down this path.

22:08 Python often is referred to as CPython because the runtime, the core is all built in C, not Python.

22:15 Right.

22:16 And the original interop story is around C extensions and CFFI and those kinds of things.

22:21 So why not just pick C for Pydantic or for your linter, David?

22:28 You know, everyone else is doing it.

22:30 They had been doing it.

22:31 Maybe sometimes it would be the right answer.

22:33 But Rust is offering something very different.

22:36 It's offering like a really nice developer ecosystem.

22:40 All the tooling is brilliant.

22:42 And also, I think security is a big thing.

22:46 So many security problems come from like mismanagement of pointers.

22:50 And Rust is designed so that the compiler won't let you make a certain kind of mistake that happens all over the place in C and C++.

23:01 I don't know if you can say more about that.

23:02 I'm putting it more simply than that.

23:03 I'm not clever enough to write C.

23:04 That's a void pointer pointer.

23:08 What are we doing here?

23:09 Good way of looking at it is that you can have the unsafe mode in Rust where you effectively lose those constraints.

23:15 And sometimes there are the occasional place where you need to use it.

23:18 When we do that inside Pydantic core, myself or David Hewitt, who's a much better Rust developer than I am, we agonize for genuinely minutes at a time over how exact, whether or not it is really safe to write that one line of code as unsafe.

23:32 And whether there was any possible case in which it could lead to problems.

23:35 If you write an app in C, every single line is unsafe.

23:39 And if you write a...

23:40 You're agonizing even over just a string, like a sprintf equivalent.

23:45 And it doesn't have to be 8.5 million lines of code for it to be incredibly hard to go through every single line of that.

23:51 So I forget there's like 40,000 lines of code in Pydantic core.

23:54 It is inconceivable that we could have written that in C and have anywhere near the confidence we have with it written in Rust that it was memory safe.

24:03 We have had one memory safety issue reported in Pydantic core.

24:08 And it was a side effect of something deep in PyO3, which in turn was a result of greenlet doing something crazy that technically the GIL didn't think you could do.

24:19 I mean, that's the level of complexity.

24:21 Another example is Jiter, which I showed you.

24:24 We released Jiter a year ago now.

24:27 It is the most downloaded JSON parser in Python other than the one in the standard library.

24:31 No one has ever reported a bug in it in terms of it parsing stuff incorrectly, except for in partial mode, I think.

24:37 Now, I said that someone will probably go and find one.

24:39 But like it has been...

24:40 That just wouldn't happen in Python.

24:41 You could not write...

24:42 That reliable...

24:44 One of the ways it achieves this is by like really stopping you doing all the stuff that you would expect to be able to do.

24:51 Whenever I sort of tell someone about Rust, I say, you know, if you have a string in Rust and you assign it to a variable, let's say X, and then you like store that X in Y.

25:03 So you do Y equals X.

25:04 Then you can't use X anymore because it's gone.

25:08 That seems very strange.

25:10 You know, and it seems strange maybe as Python developers because we don't actually hardly ever, maybe never need to think about memory safety.

25:18 You know, we've got that luxury.

25:19 But once you go down lower level, you are starting to.

25:22 And what's going on there with the kind of you can't have X and Y both pointing to that string.

25:29 I mean, you can if you copy that memory so that there's then separate bits of memory in X and Y.
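For contrast, this is what the same sequence of steps does in Python, where assignment neither moves nor copies. A mutable `bytearray` stands in for the string so the sharing is visible. This is only the Python side of the comparison; the Rust behavior is as described above.

```python
# In Python, y = x does not move or copy anything:
# both names now refer to the same object in memory.
x = bytearray(b"hello")
y = x
y.extend(b" world")

same_object = x is y       # True: one buffer, two names
x_after = bytes(x)         # x sees the change made through y

# An explicit copy gives y's data its own separate memory.
z = bytearray(x)
z.extend(b"!")
x_unchanged = bytes(x)     # the copy's mutation does not affect x
```

Python gets away with this sharing because the runtime tracks references and frees memory for you. Rust has no such runtime bookkeeping, so it makes you choose between handing the value over and copying it.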

25:34 But it's kind of making you think in detail about where is the memory for this program?

25:40 Ownership, right?

25:40 The ownership of who owns this.

25:42 I can give you another good practical example.

25:44 So I was saying earlier in the Jiter case, we have this point where some strings, many strings, you can, instead of having to copy the whole of the string out of the chunk of data within the original input of JSON,

25:58 you can literally refer to that slice of data as a pointer.

26:02 Literally say this is this is the string itself.

26:04 You don't have to copy.

26:05 But that means the type you get back for when you parse a string in Jiter is what is called a Cow in Rust.

26:11 And it is either an owned string, e.g. when you've copied it, or it's a reference to the underlying chunk of the array.

26:17 And so that gets quite difficult when you're actually writing code with that because you can't go and access that string after you've validated, after you've parsed the next token, because in theory that could have now changed.

26:28 If you were writing that in C, you'd have to basically manually keep track of am I accessing that Cow in the right place?

26:34 Whereas in Rust, well, you wouldn't have a Cow in C, but you could presumably have some similar construct.

26:39 Rust basically takes care of that.

26:41 As soon as you go and do another piece of validation, it immediately says, no, no, you can't even access that previous Cow because the lifetime of it has now been used when you've called into the parser to get the next token.

26:52 And so it's a very neat, it basically stops you from having to think about that.

26:56 And that means you can actually do not more unsafe, but more complex things, knowing that the borrow checker is going to save you in the end.

27:05 You can try and do something.

27:05 And if it doesn't work, the borrow checker will say it's not possible.

27:07 And if the borrow checker says it is possible, you're safe to access that reference to a string that you parsed like 50 lines of code higher up.
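
A rough Python stand-in for the borrowed-versus-owned distinction described above. This is not jiter's code: the buffer contents and variable names are made up for illustration. A memoryview plays the role of a borrowed slice and bytes() plays the role of an owned copy. Python happily lets the borrowed view go stale, which is the kind of mistake the borrow checker is described as rejecting at compile time.

```python
# Hypothetical input buffer, standing in for the raw JSON a parser reads.
data = bytearray(b'{"name": "Samuel"}')

# "Borrowed": a zero-copy window onto bytes 10..15 of the input buffer.
borrowed = memoryview(data)[10:16]

# "Owned": an independent copy of the same bytes.
owned = bytes(borrowed)

# Overwrite the underlying buffer in place, as a parser reusing its input might.
for i in range(10, 16):
    data[i] = ord("X")

print(bytes(borrowed))  # the view reflects the overwritten bytes
print(owned)            # the copy still holds the original text
```

Nothing in Python stops the stale read of `borrowed`; the safety comes only from remembering to copy first.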

27:14 It sounds like a paradigm shift, but also super valuable.

27:17 So I know primarily Python people are listening, so this word might not make any sense.

27:21 But is this a compile time check or is this a runtime check?

27:26 We don't really think about compiling much in Python.

27:28 We don't think about it, but you can think of import time when you do different stuff or you think about static typing when you go and run Pyright over your code.

27:37 And that is very similar in some ways to compile time.

27:40 It's also one of the big disadvantages of Rust.

27:42 I mean, people think Rust is faster than Python.

27:44 In many cases, if you write your script, it's a lot faster to run it in Python than it is in Rust.

27:48 Because it'll take two seconds, or a few hundred milliseconds, to run in Python and 10 seconds to compile in Rust.

27:54 So it is not universally the case that the development cycle is faster in Rust.

28:00 It depends how many times you run the code, right?

28:02 If you're going to run it once or twice and you've got to create it to run it, well, you know, maybe it took you 10 minutes versus 20 minutes.

28:09 And it speaks to another of the great powers of the Python ecosystem and of PyPI that we, the maintainers of Python packages built in Rust, can take care of compiling that Rust once, distribute all of the binaries.

28:23 And then when people go and install it, we don't have to compile it.

28:25 If you were using most other, if you were using cargo, you would have to take care.

28:30 I mean, putting the controversy over Serde to one side for a minute, you would have to take care of compiling that code yourself every time you wanted to go and use it.

28:38 Whereas actually PyPI does an amazing job of distributing compiled Rust for virtually every ecosystem, every architecture, and it just works when you install it.

28:46 Absolutely.

28:47 Something I just wanted to say about the memory management, like it is a compile time, but actually it also, you can also use reference counting and things like that.

28:57 It's just opt-in.

28:58 So there are lots of tools for allowing you to do runtime checking of things.

29:04 And sometimes that's what you have to do.

29:05 You're like, oh, I can't really do this.

29:08 It's just using stuff that's worked out before the program even runs.

29:12 I do need to do it at runtime, but it's fine.

29:14 I can do that for this particular data type.

29:16 And then there's an API to do it and it will come with other trade-offs.

29:20 It's giving you that control, but it's giving you control over things which, if you've only ever programmed in Python, which to be honest, pretty much, I mean, I used to do PHP a bit.

29:29 My mind is soaked up in Python.

29:31 And coming to Rust, it's like, oh, this is a completely different way of thinking.

29:36 It's like the red pill in the matrix.

29:38 You're like, what is this?

29:40 Yeah.

29:41 It's nice, though.

29:42 It's hard.

29:44 I'm not going to lie.

29:45 I feel very confused a lot of the time.

29:47 I suppose this is a time for the obligatory.

29:50 The White House recommends that future software should be written in memory-safe languages such as Rust and Python.

29:56 So, right, that's always nice to have around to think about.

29:59 Whether or not that's an argument in favor of Rust or not, depending on your take on a particular White House, is another question altogether.

30:05 But it's definitely a data point.

30:06 I will draw your attention to the date of this.

30:09 I'm interested to see what the new leadership recommends next year.

30:16 We'll see if they're like, assembly all the way.

30:19 We've decided.

30:20 No, okay.

30:20 Let's carry on with your story, David.

30:24 So, you wrote this thing called an import linter, which is kind of unique.

30:29 There's lots of linters in Python, right?

30:31 There's Pyflakes.

30:32 There's Ruff.

30:33 Another Rust success story, I suppose.

30:36 But this one checks more architecturally, right?

30:39 And you want to use this on your small little repo that you all are running over there.

30:43 Tell us about that story.

30:45 I wrote it for Kraken.

30:46 But actually, I had the idea for it in a previous company.

30:51 You know, it was still a complex monolith, but nowhere near as big.

30:55 And to be honest, I find even in quite small projects with just a handful of files, it's

31:00 still useful to think about the architecture of those files.

31:06 And I think, for me, architecture is ultimately about dependencies.

31:12 And something that Python lets you do, which I don't think is very good, is that you can

31:18 end up with lots of cross dependencies between different modules and packages within a particular

31:25 Python project.

31:26 So, for example, you know, I mean, we know that sometimes you get circular import errors,

31:30 but this is something a bit wider than that.

31:33 Like, possibly, if you've got two sibling sub packages, you might have one module which imports

31:40 from something in one, in the other sibling package.

31:44 And then there'd be, like, other modules that import the other way.

31:47 And it won't stop Python from loading up.

31:50 But there's a sort of, there's a circular dependency conceptually between those two packages.

31:55 They're tightly coupled.

31:56 And, you know, this is just one example of the kind of architectural rule you might want

32:01 to impose on a project to say, well, actually, I want there to be a dependency only one way.

32:07 I want one of those packages to be kind of lower level and one of them to be higher level.

32:11 Or maybe you'd have, like, five packages that are all siblings.

32:14 And, you know, you call this layer, really, but where you maybe, you know, say, yeah, this

32:19 can import all the ones below it, but nothing can import the ones above it.

32:23 And you sort of think of them in a sort of stack.

32:25 Here are the details.

32:26 Here's the public interface.

32:27 And then things use the public interface.

32:29 They shouldn't reach down below and vice versa.

32:32 Yeah.

32:32 That's certainly one thing you can do with it.

32:34 It's mostly focused on controlling dependencies between packages as a whole.

32:40 You can also do it for individual modules.

32:42 But so what it does is it uses imports as the proxy for a dependency.

32:47 So it will statically analyze all of the imports in your module.

32:53 And then it will build a directed graph.

32:55 So you can then think of your package as, like, every module is a node in a graph with arrows pointing between all of them.

33:05 And the arrows are the imports.

33:06 Sometimes you call them edges or whatever.

33:09 What import linter allows you to do is to write contracts that are just in YAML or TOML, which just say, all right, these are the rules I want you to check when you run the linter.

33:20 And then you can put it in CI and it will fail if someone adds an import the wrong way.
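
To make the idea concrete, here is a minimal sketch of "imports as a graph, plus a rule to check". It is not Import Linter's implementation and does not use its contract format: the `shop` package, its sources, and the `check_layers` helper are all hypothetical, and only direct imports are checked.

```python
import ast

# Hypothetical in-memory package: module name -> source code.
SOURCES = {
    "shop.api": "from shop import services\n",
    "shop.services": "import shop.db\n",
    "shop.db": "import shop.api\n",  # a low-level module reaching upwards
}

def imported_modules(source):
    """Return the names a piece of source code imports, found statically."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
            # "from shop import services" may mean the submodule shop.services.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found

def build_graph(sources):
    """Directed graph: importer -> set of modules in the package it imports."""
    return {
        name: {m for m in imported_modules(src) if m in sources and m != name}
        for name, src in sources.items()
    }

def check_layers(graph, layers):
    """Report direct imports where a lower layer imports a higher one."""
    rank = {module: i for i, module in enumerate(layers)}  # 0 is the top layer
    return sorted(
        (importer, imported)
        for importer, targets in graph.items()
        for imported in targets
        if rank[importer] > rank[imported]
    )

graph = build_graph(SOURCES)
violations = check_layers(graph, ["shop.api", "shop.services", "shop.db"])
print(violations)
```

In CI, a non-empty `violations` list would be the signal to fail the build.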

33:25 And it's fascinating to see, like, if you have an idea for an architecture and then you put an import linter contract on and then discover all the places where you're breaking it, they're almost always quite interesting things.

33:37 And you're like, yeah, actually, this is a bit of a problem.

33:40 Like, it needs a bit of thought to kind of untangle it.

33:43 It is genuinely, you know, some conceptual tangling that would simplify things if we didn't do it.

33:48 Yeah, it's kind of a higher order code smell.

33:51 People are familiar with this idea from refactoring a code smell.

33:54 Like, it's not wrong in that it won't run, but you kind of turn your nose up at it.

33:58 You go, oh, right.

34:00 And this is that, but not at a, oh, it takes too many parameters.

34:03 But, like, why is it all tied together like this?

34:05 Can we think about it better, right?

34:06 Yeah, exactly.

34:09 This portion of Talk Python to Me is brought to you by the Data Citizens Dialogues podcast.

34:14 If you're ready for a deeper dive into the latest hot topics in data, you need to listen to the Data Citizens Dialogues podcast, brought to you by Collibra, the leader in data intelligence.

34:25 In every episode of Data Citizens Dialogues, industry leaders unpack data's impact on the world.

34:31 From big picture questions like AI governance and data sharing to more nuanced questions like, how do we balance offense and defense in data management?

34:40 You'll hear firsthand insights about the data conversations affecting all kinds of industries.

34:46 With guests sharing unique stories from some of the world's largest companies, such as Adobe, Fidelity, Deloitte, Hewlett-Packard, McDonald's, and even the United States Coast Guard, you'll get an amazing look inside how these organizations handle their data.

34:58 My favorite episode is Solving Data Discovery with a Self-Service Approach.

35:03 It's an interesting look inside creating a single source of truth for data inside an online university.

35:09 Check them out and try an episode for yourself at talkpython.fm/citizens.

35:13 That's talkpython.fm/citizens.

35:16 The link is in your podcast player's show notes.

35:18 Thank you to Data Citizens Dialogues podcast for supporting the show.

35:22 You know, and you can do things like, say, all of these modules are independent, say, so they don't import anything from each other.

35:29 I mean, this is like just my whole mentality for architecting Python packages, and I find it works really well.

35:36 And I think the fact that we've got these rules in place is probably a big reason why we're still able to have a thousand PRs a week all happening in different time zones without that many faults.

35:46 Because we've got, we've architected it.

35:49 So say we know that there aren't any imports between, say, two energy companies or between two territories.

35:55 You've got to install your competitor's version as well to use yours.

36:00 It's allowing us to think about things as being more independent.

36:04 And so like someone can make a change in, say, Spain, and it's not going to break something in Australia because you're pretty confident because they aren't importing anything from there.

36:14 Does that mean, in theory, if you've got this, that you could use this within, for example, CI to only run unit tests for systems that have changed or direct descendants of that package?

36:25 Because I presume... We do do that because we spend a lot of money on CI.

36:30 And so we've had to invest a lot of money to like figure out how do we narrow down the tests.

36:34 So, yeah.

36:35 Yeah, that's awesome.

36:36 That's a great observation.

36:37 This is all based on an underlying library called Grimp, which is, I kind of broke the libraries up.

36:43 It's a little bit like what you did, Samuel, with Pydantic and Pydantic Core.

36:47 So import linter is pure Python, but it has a dependency, which is Grimp, which is, I mean, it's got Python and Rust in at the moment.

36:55 But that is just a Python API for building a graph.

37:01 And then you can explore the graph.

37:02 And you can do things like, it's really quite a useful tool if you're just interested to know something about a code base.

37:09 You know, you can just type in, build me a graph of this package, say, and then you could say, tell me all of the downstream modules of this module.

37:19 And then it will explore the whole graph and like give you a set of modules or tell me all the upstream modules of this.

37:24 So you can tell what dependencies are or you can like do, you know, what's the shortest path between this module and this module.
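
The queries described here can be sketched on a plain dictionary. This is not Grimp's API, just an illustration of the graph walks involved; the module names are hypothetical, and "downstream" is taken to mean modules that depend on the given one, "upstream" the modules it depends on.

```python
from collections import deque

# Hypothetical graph: importer -> modules it imports directly.
IMPORTS = {
    "app.views": {"app.services"},
    "app.services": {"app.models", "app.utils"},
    "app.models": {"app.utils"},
    "app.utils": set(),
}

def upstream(graph, module):
    """Everything `module` depends on, directly or indirectly."""
    seen, queue = set(), deque([module])
    while queue:
        for dep in graph[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

def downstream(graph, module):
    """Everything that depends on `module`, directly or indirectly."""
    reverse = {name: set() for name in graph}
    for importer, targets in graph.items():
        for target in targets:
            reverse[target].add(importer)
    return upstream(reverse, module)

def shortest_chain(graph, importer, imported):
    """Shortest chain of imports from importer to imported, or None."""
    queue, seen = deque([[importer]]), {importer}
    while queue:
        path = queue.popleft()
        if path[-1] == imported:
            return path
        for dep in sorted(graph[path[-1]]):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None

print(downstream(IMPORTS, "app.utils"))
print(shortest_chain(IMPORTS, "app.views", "app.utils"))
```

The same breadth-first walk answers all three questions; only the direction of the edges and the stopping condition change.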

37:31 Makes a lot of sense.

37:32 Yeah. Can you get it to do dependencies, the things you pip install?

37:37 Oh, I see what you mean.

37:37 Well, you can build the graph with multiple packages.

37:42 I don't even know if that's really, is that possible?

37:45 Can you have two installable PyPI packages that depend on each other?

37:50 I'm more wondering if I can use this to visualize, well, what parts of my app?

37:54 So I know that I have this dependency and I have to have it to run.

37:58 But what parts of my app are using that library?

38:01 So if I wanted to change it or whatever, you know what I mean?

38:04 You can build a tree from there's a package for it or UV will give you a tree of dependencies.

38:08 But that is based on what packages require what other packages to be installed.

38:12 It doesn't actually say what's being imported from where.

38:15 So if you have an issue with urllib3, what is the actual like graph of where urllib3 is being imported rather than what packages depend on it?

38:24 There's a flag when you build the graph, which is include external packages.

38:29 And if you pass it, then it will include them.

38:32 But it doesn't include them.

38:33 It just like includes the root name.

38:35 If you wanted the whole thing, then you'd say, build me the graph of, I mean, that would work as well.

38:39 But you could say, build me the graph of Django and Kraken, whatever.

38:45 Or you could say, build me the graph of Kraken with external packages.

38:49 Then you'd just see like Django would just be a node in the graph, but it would just be one node.
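
A small sketch of what "external packages become a single node" could look like. The import pairs, the `myapp` package name, and both helper functions are hypothetical; this only illustrates collapsing external imports to their root name and then asking which internal modules touch a given library.

```python
# Hypothetical (importer, imported) pairs, as a static scan might report them.
RAW_IMPORTS = [
    ("myapp.billing", "myapp.accounts"),
    ("myapp.billing", "django.db.models"),
    ("myapp.http", "urllib3.util.retry"),
    ("myapp.accounts", "django.contrib.auth"),
    ("myapp.accounts", "urllib3"),
]

def squash_external(pairs, root_package):
    """Keep internal modules as-is; collapse external ones to their root name."""
    graph = {}
    for importer, imported in pairs:
        internal = imported == root_package or imported.startswith(root_package + ".")
        if not internal:
            imported = imported.split(".")[0]
        graph.setdefault(importer, set()).add(imported)
    return graph

def importers_of(graph, package):
    """Internal modules that import the given package directly."""
    return sorted(m for m, targets in graph.items() if package in targets)

graph = squash_external(RAW_IMPORTS, "myapp")
print(importers_of(graph, "urllib3"))
```

This answers "where in my app is this library actually imported", which a tree of installed package requirements does not.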

38:54 Yeah, that'd be great.

38:55 Does that make sense?

38:56 The story is this was working, but slow.

38:59 As an example, one of our contracts, and we had lots of contracts, one contract was taking six minutes to check.

39:05 The reason for this is because actually it is quite algorithmically expensive to check whether or not certain rules are being followed.

39:14 Because it's not like just doing a kind of search.

39:17 It's actually looking for indirect imports.

39:20 So it kind of needs to explore the whole graph to see.

39:24 It's like a pathfinding algorithm to sort of see, oh, well, do you end up here via this other thing that you didn't ask about?

39:30 And yeah, so certain things were taking a really long time.

39:33 And that was costing us money because we run this, you know, hundreds, if not thousands of times a day.

39:40 And it all adds up.

39:41 And it also just slows down, you know, your pull request.

39:44 You want to get to green quicker.

39:46 So I had heard that Rust was interoperable with Python.

39:51 I didn't really know anything about it.

39:53 And I found this library called PyO3, which is a Rust library.

39:59 And it makes it, I'm not going to say easy, but it makes it.

40:04 Easier.

40:04 It makes it surprisingly easy to write Rust extension modules.

40:09 It gives you sort of all the tooling in place.

40:11 And I would say, you know, you're writing a lot less code than you would be if you're writing a C extension module.

40:16 Because it's sort of a nice, it's a really nice API for creating these interoperable compiled modules.

40:23 So like I started by just finding the function that was taking all the time and had millions and millions and millions of calls.

40:31 I was like, why don't I just write that in Rust?

40:34 And I kind of just sort of almost copied and pasted it over and like cobbled together some pretty rubbish Rust that did basically exactly the same algorithm.

40:45 And it was a thousand times slower.

40:47 Well done.

40:49 Well done.

40:50 There's a good reason for that, which is I just zoomed in too close.

40:54 So because that function was called a lot of times, there is actually a cost in crossing the boundary.

41:00 So I just had to step one level up and like wrap all of those calls for that function in Rust instead.

41:06 So there's only one call from Python to Rust.

41:09 And then suddenly, majorly quicker.

41:12 I mean, I think that that particular problematic contract went from about six minutes to one minute.

41:18 That is, to be honest, and that was brilliant.

41:21 But that is not.

41:22 That's just scratching the surface of how much quicker it could get.

41:25 But still, a six times speed up for relatively little work and basically just writing the same algorithm, but with some curly brackets and a bit of head scratching.

41:35 Like I was able to deliver something without having ever done anything in Rust.

41:40 And it really made a difference, you know, and saved us lots of money.
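
The "zoomed in too close" lesson can be illustrated without any Rust at all. In this Python-only stand-in, `extension_call` is a made-up wrapper that merely counts how often a hypothetical language boundary is crossed; it measures no real overhead. It shows why moving the whole loop to the other side turns many crossings into one.

```python
calls_across_boundary = 0

def extension_call(func, *args):
    """Stand-in for calling into a compiled extension; counts each crossing."""
    global calls_across_boundary
    calls_across_boundary += 1
    return func(*args)

def step(a, b):
    """A tiny unit of work, the kind that gets called millions of times."""
    return (a + b) % 7 == 0

def count_hits(pairs):
    """The whole loop, as it would look once moved to the other side."""
    return sum(1 for a, b in pairs if step(a, b))

pairs = [(i, i + 1) for i in range(10_000)]

# Fine-grained: one crossing per pair.
calls_across_boundary = 0
hits_fine = sum(1 for a, b in pairs if extension_call(step, a, b))
fine_crossings = calls_across_boundary

# Coarse-grained: a single crossing for the whole batch.
calls_across_boundary = 0
hits_coarse = extension_call(count_hits, pairs)
coarse_crossings = calls_across_boundary

print(fine_crossings, coarse_crossings)
```

Both versions compute the same answer; only the number of crossings differs.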

41:44 Yeah.

41:45 So that was a risk.

41:46 And now you're a Rust developer as well.

41:48 Now I'm a Rust developer and I read through it.

41:49 And I'm, you know, I want to move more of Grimp into Rust.

41:52 Another example, even less work, was we had a problem with translations.

41:57 So there's a kind of standard for doing translations called Fluent.

42:01 And some of the listeners may have come across it.

42:04 And there are libraries for it in all sorts of different languages, including Python and Rust.

42:09 And we realized that it was responsible for almost all of the bootstrap time of our application in production was loading and scanning these translation files.

42:22 You know, it was pretty problematic.

42:24 It was like really spiky as well.

42:26 You could see sometimes it would spike to like nine minutes to turn on the application.

42:30 And we knew it was all in this translations thing.

42:33 And someone pointed out, look, there's a Rust library for this.

42:36 So all we need to do is just link them up.

42:39 And so they, it wasn't actually me, but they just wrote this PyO3 crate (a crate is like a Rust library).

42:46 You know, probably only about 10, 15 lines of code just glued the two things together.

42:52 It was really like not very much work.

42:55 And compile that, it just completely sorted the problem out.

42:59 We went from eight minutes to 30 seconds.

43:02 And, you know, of that, like the, the, you know, that 30 seconds is other stuff.

43:06 You know, I think, I think it turned it from being something that took several minutes to being virtually instant for really hardly any work.

43:13 And I think that that's a really good example of where, as Python developers, we should be aware this is an option.

43:18 If something's slow, you might just, I mean, Rust has so many good libraries.

43:24 You might just be able to like just glue it up and job done.

43:27 One thing I would add, the overhead of calling into PyO3 has dropped a lot.

43:32 I don't know how long ago that was you, you were trying, but I think it's, yeah, reduced significantly.

43:37 So, I mean, I did this quite a lot of the same work in Pydantic Core to avoid the overhead of calling into and out of it lots of times.

43:43 And I think that's less of a thing now than it used to be.

43:45 That's good to hear.

43:46 One point I want to make is, I think that's really interesting, David, to say that, hey, there's this whole equivalent of PyPI called Cargo, or you can get these libraries just like we can, I mean, sorry, that you install with Cargo, that you can get all these pre-built, pre-tested libraries and maybe just put a wrapper on them and do some amazing stuff.

44:06 I just started using this web server at the Python layer called Granian.

44:11 And you're like, well, how many people are working on it?

44:14 It's got up to 3,000 stars now.

44:16 Is that enough to trust?

44:17 Yes, but if you really look at it, it's really using just Hyper, right?

44:22 And Hyper, Hyper is a library that, excuse me, that has got 14,000 GitHub stars, 400 contributors.

44:29 And like, oh, you know, it's kind of this cool application is something of a wrapper around this really popular and well-known thing, right?

44:38 I think we'll see more of this kind of stuff.

44:39 Giovanni, who worked on this, is also working on RLoop, basically an alternative to uvloop.

44:45 Yes.

44:46 I don't know if that's in here as well.

44:48 I maintain watchfiles, which is the file watching library used by uvicorn and some other things.

44:55 And that is, again, wrapping a Rust library for getting fast system notifications.

45:00 And also rtoml, which is the fastest TOML parser in Python, which is, again, just wrapping the Rust library.

45:06 So there are a number of places where you can get enormous performance improvements and indeed like fundamentally more reliable libraries because you're building in Rust,

45:15 when you're doing complex, particularly multi-threaded things, by relying on Rust.

45:20 These libraries seem to be very good quality and well thought through.

45:23 That's been my experience.

45:24 Somehow it attracts people that are very thorough because you can't really program in Rust unless you're thorough because you can't get it to compile.

45:31 I think it's hard to say Rust does have a bit of a problem with abandoned libraries.

45:35 I think that is, I think if we didn't call it out, I know David, you and I were speaking at an event where there was one particular chap who had a bee in his bonnet about that exact issue.

45:43 I think he was slightly overblown on it.

45:45 But there is definitely an issue with some abandoned libraries in Rust.

45:49 But like, I mean, same is true in Python in its own place.

45:52 But you're right.

45:53 There are also libraries that are abandoned and remain incredibly good quality.

45:57 So jiter uses a library for parsing floats.

46:00 I didn't know this, but the complexity of parsing floats from strings is an entire subject of academic interest.

46:06 There are eight different algorithms for doing it with different performance, depending on the structure of the float.

46:12 And there was a library that does this very well.

46:14 And the library was abandoned, but it worked perfectly.

46:17 And eventually the guy replied to my email and went and fixed it.

46:19 So, I mean, the quality, yeah, the quality is very high in my experience.

46:22 Yeah, I just make the point.

46:23 I think, you know, there's over half a million packages in PyPI as well.

46:27 And I'm sure there's a non-trivial amount of them that people are no longer maintaining as well.

46:31 Sometimes things are done and sometimes they're abandoned.

46:33 And it's hard to tell the difference, you know?

46:35 Like, no, I haven't updated it in two years because it's done.

46:38 But also, also.

46:40 It's hard to tell the difference until you're three weeks in and you don't want to back out of using that library and you realize that actually it's...

46:46 It's actually not done.

46:46 It's not done.

46:47 But nobody wants to worry.

46:49 All right, David, before we wrap this up, I would really like to have you talk us through sort of the tool chain workflow of...

46:56 I've got this idea.

46:58 I want to maybe, you know, rewrite it in Rust if you've spoken about it.

47:02 What are the building blocks?

47:03 What are the moving pieces?

47:04 What do people need to know?

47:05 Absolutely.

47:05 Okay.

47:06 So the things you need to know is, first of all, the thing you're going to create is an extension module, which is like a built compiled thing.

47:16 And that is going to end up being a Python wheel.

47:20 And what you want to do is when you release something on PyPI, you want to have wheels for all the different versions of Python and chip architectures and things.

47:28 So that's kind of like some of the complexity.

47:31 So that's kind of the end goal.

47:33 We want to end up with these built versions of Python that are going to contain some Python code, but also some compiled things for, you know, macOS under Python 3.12 or whatever like that.

47:44 So the way you start this, you need to install Rust on your computer.

47:48 And the tool for that is actually called rustup.

47:52 This kind of confused me to start with.

47:54 I didn't really understand what rustup is.

47:55 But rustup is kind of like one level up.

47:58 And it's maybe a bit like pyenv if people have used that.

48:02 But it sort of does a bit more than that.

48:04 Maybe also a bit like UV these days where you can UV install Python based on what you asked for or just create a virtual environment and it'll just grab the tools you need to make that happen.

48:13 I'd say that Cargo is closer to UV probably, which is the next thing I'll talk about.

48:17 But rustup is like one step up and it's for managing the versions of Rust that are on your computer and the versions of Cargo that are on your computer, and Cargo is the package manager for Rust.

48:30 What you'll do, your first step is to go to the Rust website and install rustup.

48:35 And then hopefully at the command line, you'll be able to type rustc and then you can compile a Rust file if you want to.

48:40 But you don't actually need to do that because that's what you use Cargo for.

48:43 So your main kind of thing that you're typing all the time when you're using Rust is cargo and then some kind of command.

48:50 So you've installed rustup and you've got Cargo.

48:53 Then what you'd want to do is create a project and you can create a project using Cargo.

48:59 You could do cargo new and then it'll give you a whole file system structure and you'll be able to run it and run tests and all that.

49:05 Very nice.

49:06 And that is how you would install the latest version of PyO3.

49:10 So you would, I probably would just go over onto crates.io, which is the PyPI equivalent,

49:17 and look at what the version is of PyO3 and then write it in a, I think there is a Cargo command probably for adding it.

49:24 But it's also worth saying that Maturin is a big part of the ecosystem.

49:29 I don't know, David, maybe you could introduce Maturin because I think that's about where that comes in.

49:33 Cargo is the thing that Rust people will be using to compile these things.

49:37 But you can't just use that on your own because you need the whole pip installing kind of side of things.

49:43 So you can use Cargo and just create an extension module, but then you need to like give it the right name and copy it into the right place.

49:49 So there's a bit of kind of gluing that needs to happen.

49:53 And that is what Maturin is for.

49:55 Why do we need two tools?

49:57 Well, Cargo is the Rust side of things, but Maturin is actually just a pip installable Python package.

50:03 So what you'll do is, in the Python project that you want to Rustify, you will activate your virtual environment or whatever, and then do pip install maturin.

50:15 And then you've got this tool, maturin.

50:18 And you could even do maturin new project or something like that.

50:21 I can't remember what the command is.

50:22 What I had to do was I had a pre-existing project, so I couldn't just do it like that.

50:27 But it might be quite a nice way to learn is to follow the Maturin docs to do a new project.

50:33 And what Maturin allows you to do is you might edit your Rust code and then you would type maturin develop.

50:41 And then it would compile your Rust code, but it would kind of install it locally so that you can work with it.

50:48 And then the final piece of this picture is how that gets onto PyPI.

50:52 Maturin also gives you some tools for that.

50:54 There's a command called generate-ci, I think, which just spits out YAML for whatever CI provider you want.

51:03 And that's really helpful as well.

51:04 I mean, that's a bit about verbally.

51:07 I've given a talk of people.

51:10 Yeah, well, I think one of the things you both have to deal with here is releasing a package, a wheel for this is not just, well, let's zip up the Python files and put them up there, right?

51:21 You're compiling native code and that means you need variations, right?

51:25 Yeah, I think Pydantic Core releases 60 something different wheels to cover all different possible combinations of Python version and architecture.
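
To see how the number of wheels multiplies, here is a small sketch that enumerates wheel filenames for a build matrix. The project name, version, and the lists of tags are illustrative assumptions, not Pydantic Core's real matrix. The filename layout (distribution, version, Python tag, ABI tag, platform tag) follows the standard wheel naming convention.

```python
from itertools import product

# Illustrative values only; a real project's matrix will differ.
PYTHON_TAGS = ["cp39", "cp310", "cp311", "cp312", "cp313"]
PLATFORM_TAGS = [
    "manylinux_2_17_x86_64",
    "manylinux_2_17_aarch64",
    "musllinux_1_1_x86_64",
    "macosx_10_12_x86_64",
    "macosx_11_0_arm64",
    "win_amd64",
]

def wheel_names(dist, version):
    """One filename per (Python version, platform) combination."""
    # Layout: {dist}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    return [
        f"{dist}-{version}-{py}-{py}-{plat}.whl"
        for py, plat in product(PYTHON_TAGS, PLATFORM_TAGS)
    ]

names = wheel_names("example_ext", "0.1.0")
print(len(names))
print(names[0])
```

Five Python versions across six platforms is already 30 files per release, so a count in the sixties is easy to reach once more interpreters and architectures are added.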

51:33 I would basically second what David was saying.

51:35 Maturin in particular does an amazing job of smoothing out the kind of rough edges between Rust and Python.

51:41 Rust has had a good, relatively good story on package management for some time.

51:45 Python is just coming, but often when you're trying to do these things, it's the, like, trying to get one to speak to the other and working out what the hell is going wrong and what does that file need to be or that, like, symlink between some DLL and some other place.

51:57 Maturin effectively gets rid of that whole challenge for you and just works and lets you work on it like you're, it's as easy as writing Python or Rust interacting with the two.

52:07 Yeah, because cargo and rustc will output Rust conventions, but Python wants C extension conventions, naming like .so versus dylib and things like that, right?

52:18 And getting the inputs right and yeah.

52:19 So once you've sort of set it all up, what your workflow looks like is, if you haven't changed any of the Rust, then, you know, maybe like you run tests, you just run pytest or whatever.

52:29 That will run tests against the compiled bit, assuming you're writing Python tests against that.

52:36 But if you make a change to the Rust, then you do have to have an extra step of building it.

52:40 So you type maturin develop, and then it would build it.

52:43 And then you could run pytest.

52:45 Or what I might recommend you do, actually, if you're working with Rust and you're not very experienced.

52:50 I tend to have a test suite for Python, but also some tests in Rust because you get a much quicker feedback loop because you can type cargo test and that will compile it and run the test really quite quickly.

53:03 One other thing I think is worth adding is that, obviously, if you have this binary that contains your Rust code, which is, you know, in the end, it becomes a module that you can import in Python.

53:11 But by default, that is obviously opaque to type checkers.

53:15 So things like IDEs and static type checking can't look inside there and see what your functions are.

53:19 So you want to write a .pyi file, which contains your definitions in like stub Python code, effectively.

53:26 The problem you then have is you end up with two separate definitions of what your functions have in them.

53:31 And so there's a really neat tool, which I didn't know about until quite recently, but we use it in Pydantic Core, where mypy can basically type check that the .pyi file stubs match the definitions inside the DLL.

53:44 And so you can have like guarantees that where you've written in a .pyi file, my function foobar takes A as a string and C as a list of bytes.
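
A toy version of the stub-versus-runtime check, for illustration only. mypy ships a tool called stubtest for this job; the sketch below is not how stubtest works internally. The `foobar` function stands in for something exported by a compiled module, the stub text is hypothetical, and only parameter names are compared.

```python
import ast
import inspect

# Stand-in for a function exported by a compiled extension module.
def foobar(a, c):
    return len(a) + len(c)

# Hypothetical contents of the matching .pyi stub file.
STUB = """
def foobar(a: str, c: list[bytes]) -> int: ...
"""

def stub_params(stub_source):
    """Map each function name in the stub to its declared parameter names."""
    tree = ast.parse(stub_source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def mismatches(stub_source, namespace):
    """List stub functions whose parameters differ from the runtime object."""
    problems = []
    for name, params in stub_params(stub_source).items():
        runtime = namespace.get(name)
        if runtime is None:
            problems.append(f"{name}: missing at runtime")
        elif list(inspect.signature(runtime).parameters) != params:
            problems.append(f"{name}: parameters differ")
    return problems

print(mismatches(STUB, {"foobar": foobar}))
```

Running a check like this in CI is what keeps the two definitions from drifting apart.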

53:54 That's really cool.

53:55 I didn't know about that.

53:55 Yeah, that's awesome.

53:56 Fixes all together.

53:57 Is it a mypy plugin?

53:58 If you have a look at Pydantic Core, we run it in CI or in maybe even pre-commit so you can find it there.

54:04 I couldn't remember.

54:05 I discovered it because like so often with so many of us, I discovered it because it had a bug.

54:12 And so it suddenly didn't do quite the right thing for me.

54:14 But I mean, in general, it's been perfect until now.

54:16 Yeah, it's plumbing until it doesn't work and then it's a flood.

54:18 Bug-based discovery.

54:19 Definitely a thing we've all practiced.

54:21 Bug-driven development.

54:23 Yeah, okay.

54:24 And if people want to check out these .pyi files, there's a project called typeshed on GitHub whose job is to basically provide these so-called stub files for a ridiculous number of projects that, you know, we're talking about things that are not going to get upgraded.

54:38 Maybe they're never going to get typing put on them, but you can go and import one of these or whatever and get.

54:44 But also lots of packages now have their own, either they have types in them or they have a PYI file that defines the types either in the Python code or in PYI.

54:52 That's really cool.

54:53 I didn't realize about the tool there, though, that does the integration.

54:56 That's awesome.

54:57 I think presumably you can use it to generate your first PYI file.

55:00 Then you go in and put your doc strings in, tweak things a bit, and then you run it in test mode.
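As a rough illustration of the "generate a first draft, then refine it" workflow, the sketch below builds untyped stub text from functions that already exist at runtime. It is an assumption-laden toy, not the generator discussed here. `add` and `draft_stub` are hypothetical names, and only plain positional-or-keyword parameters are handled.

```python
import inspect


def add(a, b=0):
    """Stand-in for a function exported by a compiled extension."""
    return a + b


def draft_stub(functions):
    """Build untyped stub text: one 'def name(params): ...' line per function.

    Assumes every parameter is positional-or-keyword; *args, **kwargs and
    keyword-only markers are not rendered correctly by this sketch.
    """
    lines = []
    for func in functions:
        params = []
        for param in inspect.signature(func).parameters.values():
            if param.default is inspect.Parameter.empty:
                params.append(param.name)
            else:
                # Stubs conventionally elide default values with "...".
                params.append(f"{param.name}=...")
        lines.append(f"def {func.__name__}({', '.join(params)}): ...")
    return "\n".join(lines) + "\n"
```

The output is only a starting point. Type annotations and docstrings would still be added by hand before the stub is checked against the compiled module.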

55:04 Oh, nice.

55:04 Yeah, because that way use the true version, the Rust version, and just say generate what I need to make this work.

55:11 That makes a lot of sense.

55:11 Yeah.

55:12 Awesome.

55:12 All right.

55:12 Well, gentlemen, we are just pretty much out of time.

55:16 I guess let me ask you one more thing here.

55:20 And I don't think I pulled it up.

55:21 So there's a new PEP called external wheel hosting.

55:24 Have you all heard of this?

55:25 I have not seen that particular.

55:27 Is that for Wasm?

55:28 I know that I've been talking to a fair bit about the challenges of uploading Wasm wheels.

55:35 I know it has to do with more than it is five.

55:38 Here we go.

55:39 Copy link.

55:40 It is PEP 759.

55:43 And the idea that if you could, the nomenclature here is ridiculous.

55:47 It is clever.

55:50 So the idea is that, you know, each build of Pydantic puts 60 binary artifacts onto PyPI.

55:57 And there's limitations on how large your projects can be and how large individual releases can be.

56:05 This is especially problematic for machine learning stuff, right?

56:09 And so the idea is, can we create a wheel stub?

56:11 And what is a wheel stub called?

56:13 It's like the wheel without the content.

56:15 It's called a rim because wheels go on, you know, and there's all sorts of stuff like that here.

56:20 But it contains a hash and then a location where the thing actually lives, right?
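The "hash plus location" idea can be sketched in a few lines of Python. This is purely illustrative: the `WheelPointer` record, its field names, and the example URL are hypothetical and do not reproduce the file format defined in the PEP. The sketch only shows why publishing a hash alongside an external location lets an installer trust bytes fetched from somewhere else.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WheelPointer:
    """Hypothetical record: where the real wheel lives and what it must hash to."""

    location: str
    sha256: str


def verify(pointer, payload):
    """Return True only if the downloaded bytes match the published hash."""
    return hashlib.sha256(payload).hexdigest() == pointer.sha256


# The publisher computes the hash once, when the pointer is created.
wheel_bytes = b"pretend these are the bytes of a very large wheel"
pointer = WheelPointer(
    location="https://example.invalid/big_package-1.0-py3-none-any.whl",
    sha256=hashlib.sha256(wheel_bytes).hexdigest(),
)
```

An installer would fetch the bytes from `pointer.location` and call `verify(pointer, downloaded_bytes)` before installing. Any tampering with the externally hosted file changes the digest and makes the check fail.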

56:26 So you mentioned OpenAI, for example.

56:29 Maybe they have some huge thing they want you to download eventually.

56:32 They could host it and you just publish the rim, not the wheel.

56:36 Do you know the story of why they're called wheels?

56:38 This entertains me a lot.

56:40 No, tell me.

56:40 So they're called wheels because you have wheels of cheese.

56:43 They're called wheels of cheese because the original PyPI was called the cheese shop,

56:48 which in turn was called the cheese shop because there is a Monty Python sketch called the cheese shop

56:54 where he goes in to buy cheese and none of the cheeses he asked for are available.

56:58 And it was called that as a way of taking the piss out of, I forget which other language,

57:04 it might've been Ruby or PHP, how their original package registry had no packages in it, like the cheese shop sketch in Monty Python.

57:11 And that has now persisted to rims in 2024.

57:16 That's incredible.

57:17 Yeah.

57:18 I did know the cheese shop equivalent there that that was part of it, but I didn't realize that it was the wheels of cheese.

57:25 Okay.

57:25 Incredible.

57:26 There's a lot of Monty Python.

57:28 You know, people that you look around everywhere.

57:30 The logo is the snake, right?

57:32 But the logo, the name is not Python, the snake.

57:35 It's Monty Python and that horrible bunny.

57:38 If I wrote a book about Python, I'd want to have the Knights Who Say Ni on the front.

57:42 Yes.

57:44 I want the bunny, the killer bunny.

57:47 It's definitely a certain era of humor.

57:49 I don't know.

57:49 You know, it definitely dates Python in its way.

57:51 And us, perhaps I fear.

57:52 Yeah, perhaps, perhaps.

57:55 I don't know what this gray hair is about, but we're going to go.

57:58 All right, guys, let's close this out.

58:00 I'll give each of you a chance to just give us sort of a parting thought on integrating Rust and maybe thoughts on just Rust and Python together.

58:08 I'll go, David.

58:09 Only really to say I am a better programmer for having learned to write Rust as well as Python.

58:13 And I fail to see the value in C anymore for new projects.

58:19 I get why it exists for lots of existing ones.

58:21 I'm not going to get into the issue of whether stuff should be rewritten in Rust.

58:24 But I think if you're starting from scratch and you're trying to write something high performance, the experience of doing it in Rust is completely different

58:30 from trying to do it in C or C++ or C# or Fortran or Julia or any of those other languages.

58:35 Rust is awesome.

58:36 So I'd really encourage people to give it a go.

58:39 And when you're giving it a go, you might also give Logfire a quick go.

58:43 We're releasing our Rust SDK fairly soon.

58:45 We have Python already.

58:45 So we're matching the Python, Rust, TypeScript ecosystem, which I think is the stack to build with today.

58:52 My parting thought would be like, if you give it a go, don't be surprised if you feel quite inadequate.

58:59 It's really quite hard to sort of get your head around what's going on.

59:03 And I feel very much still at the beginning of my journey.

59:06 But nevertheless, I have actually managed to like deliver some stuff which is valuable, even though, to be honest, I'm not like particularly proud of the code I'm writing.

59:17 It's like, you know, I'm a beginner again.

59:18 And I just want to say, like, don't let that put you off.

59:21 And if you're confused, you know, just keep on going and trying to get your head around it.

59:26 It will make you a better programmer.

59:28 The book that I found the best for learning Rust is The Rust Programming Language, which is a free book on the Rust website.

59:36 Right.

59:36 Just click the learn button in the top, right?

59:39 Yeah, exactly.

59:39 Learn menu item.

59:40 Yeah.

59:40 Awesome.

59:41 Yeah, I'll also just add one quick thought as well to just follow up to what you said, David, is whenever you're switching programming languages and you've been programming for a while, you just, you feel inadequate and you feel like I was so good.

59:54 I had it figured out.

59:55 I could just sit down and do stuff.

59:56 And now even how do I just read a file or just run a program?

01:00:00 I'm lost all over again.

01:00:02 But every time you do that, your prior experience still carries over way more than it initially feels like it does.

01:00:07 And you're not throwing everything away and starting over.

01:00:10 You're learning a new tool chain and then on you go.

01:00:13 Absolutely.

01:00:14 And particularly when it's so easy to call Python from Rust and vice versa.

01:00:17 And so you can build applications that are like hybrid of the two.

01:00:20 Absolutely.

01:00:21 Yeah.

01:00:21 Well, congratulations both on some awesome projects.

01:00:25 And yeah, thanks.

01:00:26 Thanks for being here as well.

01:00:27 Bye, y'all.

01:00:27 Thanks for having us.

01:00:28 Thank you so much.

01:00:30 This has been another episode of Talk Python to Me.

01:00:33 Thank you to our sponsors.

01:00:34 Be sure to check out what they're offering.

01:00:36 It really helps support the show.

01:00:37 This episode is sponsored by Posit Connect from the makers of Shiny.

01:00:42 Publish, share, and deploy all of your data projects that you're creating using Python.

01:00:47 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:00:53 Posit Connect supports all of them.

01:00:55 Try Posit Connect for free by going to talkpython.fm/posit.

01:01:00 P-O-S-I-T.

01:01:01 This episode is brought to you by the Data Citizens Dialogues podcast from Collibra.

01:01:06 If you're ready for a deeper dive into the latest hot topics and data, listen to an episode at talkpython.fm/citizens.

01:01:13 Want to level up your Python?

01:01:15 We have one of the largest catalogs of Python video courses over at Talk Python.

01:01:19 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:01:24 And best of all, there's not a subscription in sight.

01:01:26 Check it out for yourself at training.talkpython.fm.

01:01:29 Be sure to subscribe to the show.

01:01:31 Open your favorite podcast app and search for Python.

01:01:34 We should be right at the top.

01:01:36 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:01:45 We're live streaming most of our recordings these days.

01:01:48 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:01:56 This is your host, Michael Kennedy.

01:01:58 Thanks so much for listening.

01:01:59 I really appreciate it.

01:02:00 Now get out there and write some Python code.

01:02:02 I'll see you next time.
