
#347: Cinder - Specialized Python that Flies Transcript

Recorded on Monday, Nov 29, 2021.

00:00 The team at Instagram dropped the performance bomb on the Python world when they open-sourced

00:04 Cinder, their performance-oriented fork of CPython.

00:08 It contains a number of performance optimizations, including bytecode inline caching, eager evaluation

00:14 of coroutines, a method-at-a-time JIT, and an experimental bytecode compiler that uses

00:20 type annotations to emit type-specialized bytecode that performs better in the JIT.

00:25 While it's not a general-purpose runtime we can all pick up and use, it contains many

00:29 powerful features and optimizations that make their way back to mainline Python.

00:34 We welcome Dino Viehland to the show to dive into Cinder.

00:37 This is Talk Python to Me, episode 347, recorded November 29th, 2021.

00:57 Welcome to Talk Python to Me, a weekly podcast on Python.

01:00 This is your host, Michael Kennedy.

01:02 Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past

01:06 episodes at talkpython.fm.

01:08 And follow the show on Twitter via @talkpython.

01:11 We've started streaming most of our episodes live on YouTube.

01:14 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming

01:20 shows and be part of that episode.

01:23 This episode is brought to you by Sentry and TopTal.

01:26 Please check out what they're offering during their segments.

01:28 It really helps support the show.

01:30 Dino, welcome to Talk Python to Me.

01:33 Hi, Michael.

01:34 Thanks for having me.

01:34 I'm really excited to talk to you.

01:36 You've been involved in a lot of projects that I've wanted to talk to you about over the years

01:41 and haven't yet.

01:42 So we're going to touch on a couple of those.

01:44 But we've got some really big news around Cinder and some performance stuff that you all over at

01:51 Instagram are doing to try to make Python faster.

01:53 You did a really cool PyCon keynote, or not keynote, but talk on that.

01:56 So we're going to dive deep into this alternate reality runtime of CPython called Cinder that you

02:03 all have created.

02:04 That's going to be a lot of fun.

02:05 Yeah.

02:05 And it's a slightly alternate reality.

02:07 It's not that much of an alternate reality.

02:09 Just a little bit.

02:10 Before we do, though, let's just hear your story.

02:13 How'd you get into programming in Python?

02:15 I started programming when I was a teenager.

02:18 I got into computers initially really through BBSs.

02:22 Oh, yes.

02:23 Maybe.

02:24 Was this pre-internet?

02:25 This was pre- this is like dial up only.

02:28 You would dial into the BBS?

02:30 Oh, my gosh.

02:31 Yeah.

02:31 Like, you know, I had a modem.

02:33 Someone else had a modem sitting in their home waiting for people to call in.

02:37 You'd log in, send emails.

02:39 Post messages.

02:40 Take your turns on games.

02:43 Log out.

02:44 And someone else could log in and respond to your emails.

02:47 It was so amazing.

02:48 And send email meant wait for another BBS to dial in to connect to that one to like sync

02:54 its local batch of emails.

02:55 It was like peer-to-peer emails.

02:57 So weird.

02:58 Yeah.

02:58 I mean, or like there's a lot of local emails, right?

03:01 Where it's just on.

03:02 Right.

03:02 You're waiting for the other person to have a chance to log in.

03:05 But yeah, there's also that network, like a couple of different big networks.

03:10 It was such a different time.

03:12 It was such a different time.

03:13 I was not super into this as much.

03:16 My brother was really into it.

03:18 We had two phone lines so that we could do more of this.

03:21 Did you ever play Trade Wars or any of the games that were on there?

03:24 Trade Wars was awesome.

03:25 It was so good.

03:27 I think I would still enjoy Trade Wars.

03:28 It was so good.

03:29 I was still playing Trade Wars in college.

03:31 We like formed teams and like we're trying to take over some Trade Wars that was available

03:37 over the internet, actually, that you could like Telnet into and play.

03:41 And a lot of this BBS stuff had sort of found a home over Telnet for a while, hadn't it?

03:47 Yeah.

03:47 Yeah.

03:47 I think the main BBS software that I used, that was used in St. Louis where I grew up,

03:54 was WWIV. I think it's still around and like available for you to

04:01 like, if you really want to, host it on some internet server, but who's going to do that?

04:07 Incredible.

04:08 Okay.

04:09 So how's the BBS story fit into the programming side of things?

04:12 The BBS software that kind of was really popular, you could get a license to it for 50 bucks

04:19 and you got a source code for it along with it.

04:22 And there's a big active modding community.

04:25 And so, you know, I started off like taking people's mods and applying them and then trying

04:30 to make my own mods and like just ended up teaching myself C and initially very poorly

04:38 taught myself C, but then, you know, finally got good at this at some point.

04:43 How fun.

04:44 Yeah.

04:44 Did other people use your mods or were you running your own BBS or anything like that?

04:49 Where'd this surface?

04:50 I did a really bad job at running my own BBS.

04:53 I petitioned my parents for a second phone line, but I also wanted to use it for phone calls.

04:59 So to call my BBS, we had to like dial in.

05:01 And then like I had this device where you could punch four extra codes and it would connect you

05:06 to the modem.

05:07 So that was kind of annoying and didn't make it the world's most popular BBS.

05:11 And it was rather short-lived.

05:13 Heard some of the automation.

05:14 Yeah.

05:14 Yeah.

05:15 But I published my mods.

05:17 My friends ran BBSs.

05:19 They picked up some of the mods.

05:21 I don't know that I was the most popular modder out there.

05:24 I should go and see if I could find them.

05:26 That might be terrifying though.

05:27 Yeah, that might be terrifying, but it could also be amazing.

05:30 Let's wrap up the BBS side of things with putting some bookends on the timeframe here.

05:35 What was the beginning baud rate and end baud rate of your BBS time?

05:39 2,400 to 57.6K.

05:43 Yeah.

05:44 So you took it all the way to the end there, but 2,400 probably meant it didn't require

05:48 putting the phone onto a device like in WarGames.

05:53 Never that bad.

05:56 Fantastic.

05:57 All right.

05:58 How about Python?

05:59 How'd you get to that?

06:00 I got into Python in a very weird way because I started working on a Python implementation.

06:04 Having really never touched or used Python before.

06:09 Obviously, I'd heard about it and it's kind of like significant white space.

06:12 That sounds weird.

06:13 But I ended up really loving working on IronPython, really loving the language

06:21 and the way it was designed.

06:23 It's a very, it gave me a very weird outlook on Python, I think, just because, you know,

06:30 I knew all sorts of weird corner cases about Python and the language and all the details

06:36 there.

06:37 But then didn't really know much about libraries and things like that.

06:41 And to some extent that continues today, but I get to write a lot more Python code today

06:48 too.

06:48 Sure.

06:48 But like always having been on the implementation side is a little strange.

06:53 It is strange.

06:55 And it is, I guess it would be a weird way to get to know the language.

06:58 So I feel like one of the real big powers of Python is that you can be really effective

07:03 with it with a super partial understanding.

07:06 Like you could have literally no idea how to create a function and you could still do useful

07:10 things with Python.

07:11 Whereas if you're going to jump in and create IronPython, which we'll talk about in a second,

07:18 you have to start out with, what are these metaclasses and how do I best implement

07:23 dynamic objects and all this stuff.

07:26 That's like the opposite of starting with a partial understanding.

07:29 Well, how do imports work?

07:33 That was a big thing.

07:34 Yeah, yeah, yeah.

07:34 I remember when I learned about that, I'm like, wait, this is like running code.

07:38 It's not like an include file or a statically linked file or, you know, adding a reference

07:45 in .NET or something like that.

07:47 It's nope.

07:48 It just runs whatever's in the script.

07:50 And it happens to be most of the time it defines behaviors, but it doesn't have to.
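The point above, that importing a module executes its top-level code rather than pulling in declarations, can be seen in a small sketch. The module name `demo_mod` and its contents are hypothetical and exist only for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module source: its top level does more than define things.
MODULE_SOURCE = '''
executed = []                 # ordinary statement, runs at import time
executed.append("top level")  # so does this one

def greet():                  # "def" is itself a statement that binds a name
    return "hello"
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write the module to disk and make its directory importable.
    pathlib.Path(tmp, "demo_mod.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        demo_mod = importlib.import_module("demo_mod")
    finally:
        sys.path.remove(tmp)

# The list was filled while the import statement ran.
side_effects = demo_mod.executed
greeting = demo_mod.greet()
print(side_effects, greeting)
```

Most modules only define functions and classes at the top level, which is why the import can feel declarative, but nothing in the language requires that.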

07:53 Yeah.

07:53 And like, how do you pick what's going to get imported?

07:57 And yeah, the semantics there are so complicated.

08:03 Yeah.

08:03 There are some oddities of Python, but in general, it seems to be working well for people.

08:09 But I can see as implementing it, it could, you know, you could definitely be pulling some

08:13 hair out.

08:13 And I mean, so many things as implemented are just, they're super sane.

08:17 Like they make a lot of sense.

08:19 There's just some weird corner cases that you run into that are, it's like, what's going on

08:26 here?

08:26 Yeah.

08:26 When I worked on IronPython, we couldn't look at the source code of CPython,

08:30 which made things really interesting.

08:32 Okay.

08:33 Because this predates .NET being open source and all that kind of stuff, right?

08:38 Yeah.

08:38 And so you don't want to be poisoned, poisoned by the ideas.

08:42 Yeah.

08:43 Okay.

08:44 How interesting.

08:44 IronPython was open source, but this was when Microsoft was still very much figuring out how they wanted

08:51 to approach open source.

08:53 And they were still very cagey about it.

08:56 It was very interesting.

08:57 Yeah.

08:58 They've come a long way and many companies have, I would say.

09:01 It's still, there's some idiosyncrasies, I guess there, but certainly it's a different

09:06 time now than it was then.

09:07 This was like what, 2008, 2009 timeframe-ish or 2005 maybe?

09:14 Yeah.

09:14 2005, 2006.

09:16 I think that was around when IronPython 1 came out.

09:19 2006 sounds about right.

09:21 Yeah.

09:21 So that's a while ago.

09:22 Yes.

09:23 It doesn't sound that long ago to me, but honestly, it's a while ago.

09:26 Yeah.

09:27 It's like remembering the 90s is not 10 years ago.

09:29 It's true.

09:31 It's definitely true.

09:33 All right.

09:34 How about day to day?

09:35 What are you doing now?

09:36 You're at Instagram, right?

09:37 Basically, I work on our fork of CPython, which we call Cinder.

09:41 And my job is to make, my entire team's job is to make Instagram run more efficiently.

09:48 I mean, obviously, Instagram is a very large website that has a lot of traffic and it's a

09:53 very large Django app.

09:56 So we just spend our time trying to improve CPython and, you know, very specifically trying

10:04 to improve CPython for Instagram's workload.

10:07 We're very driven by kind of that as our sole direction.

10:12 And so it lets us make some interesting decisions and drive some interesting decisions.

10:18 But it's just really spending the day looking at what we can do to improve performance and

10:25 going off and implementing that and making Instagram a little bit faster.

10:29 So when we talk about Python and Django running Instagram, I put up a little post here of

10:34 something I did yesterday just to have some Instagram stuff to show.

10:38 Is that talking about the website?

10:40 Is that the APIs behind the scenes?

10:42 Like when you say Django runs Instagram, what are we talking about here?

10:47 So it's the website.

10:48 It's the APIs.

10:49 There's obviously some parts that aren't Django, but kind of everything that people's devices

10:55 are interacting with is going through the Django front end.

10:59 And there's also a bunch of like, you know, if we have asynchronous processes that need to

11:04 kick off and run in the background, that's kind of all handled by a Django tier as well.

11:09 So it's a good chunk of what's going on.

11:13 Yeah.

11:13 Nice.

11:14 This is probably one of the, if not the largest Django deployment there is, right?

11:18 This is a lot of, a lot of servers we're talking about, right?

11:20 I would assume so.

11:21 I don't know.

11:22 There might be something else pretty big out there.

11:25 Yeah.

11:25 I feel like the talk at the 2017 PyCon, remember that when we used to go to places where there

11:31 are other people and we would go and like be in the same room and stuff.

11:33 That was so nice.

11:34 I know.

11:35 It was so weird.

11:36 And there was a cool Instagram talk about, I believe that one was about disabling the

11:42 GC or something like that.

11:44 And I feel like they said in that talk, at least at that time, that was one of the largest,

11:48 if not the largest, Django deployment.

11:49 Yeah.

11:50 And we no longer disable the GC.

11:52 We fixed the memory leak.

11:53 So that's good.

11:54 Okay.

11:55 We're going to talk a lot about memory.

11:58 And honestly, this whole conversation is going to be a bit of a test, an assessment of my

12:05 CPython internals.

12:07 But I think that's okay because a lot of people out there don't know super in-depth

12:12 details about CPython.

12:13 And I can play the person who asks the questions for them.

12:17 Awesome.

12:17 Awesome.

12:17 I can try to answer questions.

12:19 Sure.

12:21 Well, we'll keep it focused on the part that you've been doing.

12:23 But during your talk, you mentioned a couple of things.

12:27 First, you said, okay, well, when we're running over on Django, we're running on, you say uWSGI.

12:34 I feel like this is a micro.

12:36 It used to be like a-

12:37 Micro Whiskey.

12:39 Yeah.

12:39 Micro Whiskey.

12:40 I don't know.

12:40 uWSGI, Micro Whiskey, whatever it is.

12:42 Yeah.

12:42 I feel like all these projects that have interesting names should have a "press here to hear" button for how

12:46 it should be pronounced.

12:47 Should it be WSGI or Whiskey?

12:49 How do you say it?

12:50 Anyway, this Micro Whiskey you guys are running on and understanding how it creates child processes

12:57 and forks out the work is really important for understanding some of the improvements that

13:03 you've made and some of the areas you've focused on.

13:06 So maybe we could start a little bit by talking about just the infrastructure and how actually

13:12 the execution of Python code happens over at Instagram.

13:16 So in addition to uWSGI, it's running on Linux, which is probably not surprising to you.

13:23 Literally zero people are surprised now.

13:26 Yeah.

13:27 I thought it was a Windows server.

13:28 Come on.

13:29 Or Solaris.

13:31 Yep.

13:32 Or a Raspberry Pi cluster.

13:34 Come on.

13:34 That'd be awesome.

13:36 So like one of the common things that people take advantage of on Linux is fork and exec, where

13:43 you start up a master process and then you fork off some child processes and they can share

13:51 all the memory of that master process.

13:53 So it's a relatively cheap operation to go off and spawn those child processes and you get

14:01 a lot of sharing between those processes, which reduces kind of the memory that you need

14:06 to use and all that good stuff.

14:09 And so the way uWSGI is working is that, you know, we are spawning our master process, going

14:17 off importing kind of all of the website.

14:21 Like we try to make sure that everything gets loaded initially and then spawn off a whole bunch

14:27 of worker processes, which are going to actually be serving the traffic.

14:31 And if something happens to one of those worker processes, then the master will come in and

14:38 spawn a new worker to replace it.
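The pre-fork model described above can be sketched in a few lines. This is a minimal illustration for Linux or another POSIX system using only `os.fork`; it is not how uWSGI itself is implemented, and `APP_STATE` and `spawn_workers` are hypothetical names standing in for "everything the master imported" and the master's spawning logic.

```python
import os

# Hypothetical stand-in for the whole website being imported in the master.
APP_STATE = {"routes": ["/feed", "/profile", "/explore"]}

def spawn_workers(count):
    """Fork `count` workers; each reports what it can see through a pipe."""
    readers = {}
    for _ in range(count):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: APP_STATE is already in memory, shared copy-on-write
            # with the master, so nothing has to be imported again.
            os.close(read_fd)
            os.write(write_fd, str(len(APP_STATE["routes"])).encode())
            os._exit(0)
        # Master: keep the read end, close the write end.
        os.close(write_fd)
        readers[pid] = read_fd

    results = {}
    for pid, read_fd in readers.items():
        with os.fdopen(read_fd, "rb") as pipe:
            results[pid] = int(pipe.read())
        os.waitpid(pid, 0)  # reap the worker
    return results

results = spawn_workers(3)
print(results)
```

A real master process would keep running and fork a replacement whenever a worker exits, instead of collecting the results once and stopping.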

14:40 Yeah.

14:40 That kind of goes on and on and on.

14:42 And it's also not just about durability.

14:45 It's also about scalability, right?

14:48 If one of the worker processes is busy working on a request, well, there might be nine others

14:54 and the supervisor process can look and say, OK, well, I got some requests got to be processed

15:00 here.

15:01 This one's not busy and sort of scale it out.

15:03 And that also helps a lot with Python's GIL and stuff.

15:08 You can just throw more of these worker processes at it to get more scalability.

15:11 And at some point that kind of hits the database limits anyway.

15:15 So it doesn't really matter that much, right?

15:16 Yeah.

15:16 And I think like uWSGI can auto-tune.

15:19 I don't know exactly all the details of our settings.

15:21 Yeah.

15:22 There's a lot of advanced settings in there.

15:24 Yeah.

15:24 Yeah.

15:25 Like it can, you know, tune for memory, for stalled workers.

15:30 It's pretty smart.

15:32 Yeah.

15:33 But like, yeah.

15:34 There's actually a really interesting, I don't know.

15:37 Have you, maybe you've seen this.

15:38 There's a really interesting post called Configuring uWSGI for Production Deployment over on

15:45 Bloomberg tech talking about all these knobs that they turn to make it work better and

15:50 do these different things.

15:51 And it's super interesting if these, like these tuning knobs are unfamiliar to Python people.

15:56 Yeah.

15:57 Yeah.

15:57 But the important takeaway here is when we're talking about running your code on a single

16:01 server, we're talking about five, 10, 20 copies of the same process running the same

16:06 code with the same interpreter.

16:08 Yeah, exactly.

16:10 You guys pay for bigger cloud instances.

16:12 No, I mean, you have your own data centers, right?

16:14 So you probably get bigger VMs.

16:16 This portion of Talk Python to Me is brought to you by Sentry.

16:21 How would you like to remove a little stress from your life?

16:24 Do you worry that users may be encountering errors, slowdowns, or crashes with your app right

16:30 now?

16:30 Would you even know it until they sent you that support email?

16:33 How much better would it be to have the error or performance details immediately sent to you,

16:38 including the call stack and values of local variables and the active user recorded in the

16:43 report?

16:43 With Sentry, this is not only possible, it's simple.

16:47 In fact, we use Sentry on all the Talk Python web properties.

16:50 We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got

16:56 the support email.

16:57 That was a great email to write back.

16:59 Hey, we already saw your error and have already rolled out the fix.

17:02 Imagine their surprise.

17:04 Surprise and delight your users.

17:06 Create your Sentry account at talkpython.fm/sentry.

17:10 And if you sign up with the code talkpython, all one word, it's good for two free months

17:15 of Sentry's business plan, which will give you up to 20 times as many monthly events as

17:21 well as other features.

17:22 Create better software, delight your users, and support the podcast.

17:26 Visit talkpython.fm/sentry and use the coupon code talkpython.

17:33 And so that impacts a lot of the decisions that we make.

17:37 We can talk about those more later.

17:40 I think another interesting thing about our uWSGI and our deployments in general is that we're

17:48 also redeploying every 10 minutes when developers are...

17:52 Yeah, I saw that and that blows my mind.

17:54 So tell me about this rapid redeployment.

17:57 It blows my mind too.

17:59 And when I started at Facebook, I guess now Meta, but it was Facebook back then, you

18:04 go through a process called Bootcamp where you spend your first several weeks just learning

18:09 about Facebook.

18:10 And one of the first things you learn is like, Facebook.com redeploys every three to four

18:14 hours.

18:14 I'm like, that's insanely fast.

18:16 And then I land on Instagram.

18:18 We redeploy every 10 minutes.

18:21 It's like, what?

18:22 Yeah, that's incredible.

18:23 Can you talk about why that is?

18:25 Is there just that many improvements in code changes going on?

18:30 Or is there some other like balancing reason that this happens?

18:34 Like a DevOps-y thing?

18:35 I don't know what all the original reasoning is.

18:39 It has a very nice...

18:41 So one of the nice things about deploying a lot is when something goes wrong, it's not

18:46 hard to figure out what caused things to go wrong.

18:49 So you're not looking at...

18:50 Right.

18:50 There's a bunch of small changes and each one gets deployed.

18:54 So you're not going back to the last six months or whatever, right?

18:56 Yeah, exactly.

18:58 Or I mean, even...

18:59 I mean, each of those deployments has a good number of changes in it.

19:04 And even if it was like four hours, it would be a huge number of changes that you have

19:10 to track things down through.

19:12 And it's also like, it's really satisfying from a developer standpoint in that, you know,

19:18 you land your change and it's rolling out in half an hour.

19:21 So I don't think anyone...

19:24 I don't know all the original reasoning, but I don't think anyone would really want to change

19:28 it just because it actually has some significant benefits.

19:32 It makes things interesting and challenging in some ways too.

19:34 But otherwise, I think it's really nice.

19:36 Yeah.

19:36 I...

19:38 It just never ceases to frustrate me or blow my mind how there's these companies just have

19:45 extended downtime.

19:46 Like, I'm not talking...

19:48 We pushed out a new version and in order to switch things in and out of the load balancer,

19:52 there's five seconds of downtime.

19:54 Or we got to run a database migration and it creates a new index and that's going to take,

19:58 you know, one or two minutes.

20:00 I'm talking...

20:00 We're going to be down for six hours on Sunday.

20:04 So please schedule your work around.

20:06 I'm just like, what is wrong with these companies?

20:09 It's like, how is it possible that it takes so long to deploy these things?

20:15 And if they had put in some mechanism to ship small amounts of code with automation, then

20:23 they would just not be in this situation, right?

20:26 Like, they would just...

20:27 They would get pushed somewhere and then something would happen and then they would have a new

20:32 version of the site, right?

20:34 It always baffles me when I end up at a website and it's like, we're currently down for service.

20:40 It's like, what?

20:41 Yes.

20:42 There's a website.

20:44 You're not supposed to do that.

20:45 The most insane thing...

20:47 I'll get off this thing, but it drives me crazy.

20:50 The most insane thing is I've seen websites that were closed on Sunday.

20:53 I'm like, what do you mean it's closed on Sunday?

20:55 Yeah.

20:55 Just go and turn it off when you go home?

20:58 Like, you know, it's open Monday to Friday sort of thing.

21:01 This is like, it was like a government website and I don't know why it had to be closed,

21:05 but apparently it had to be closed.

21:08 Yeah.

21:08 We have engineers standing by Monday through Friday to process your requests by hand.

21:12 Exactly.

21:14 We got to push the button.

21:15 No one's there to push the button.

21:16 Okay.

21:17 So I guess one more setting the stage stories here or things to know is that you run these servers quite close to their limits in terms of like CPU usage and stuff like that.

21:32 And then also you said one of the areas that you focus on is requests per second as your important metric.

21:39 Do you want to talk about those for a moment?

21:41 Sure.

21:41 So I don't know what the overall numbers under normal load are.

21:48 You know, I don't think the CPU load is necessarily super high.

21:52 But what we want to know at the end of the day is like how many requests can we serve under peak load?

21:59 And so what we can actually do is take traffic and route it to a set of servers and drive that traffic up to where the server is under peak load.

22:11 And we see how many requests per second a server is able to serve at that point, which gives us a pretty good idea of kind of what the overall level of efficiency is.

22:23 So when we make a change, we can basically run an A-B test where we take one set of servers that don't have the change, drive them up to peak load and compare it against another set of servers that have the change and drive those set of servers up to peak load.

22:41 And then compare between the two and see how many requests per second we end up getting and what the change is.

22:49 And we can do that to a decent amount of accuracy.

22:53 I think kind of like when we kick off a manual test, we try to strive for within 0.25%.

22:59 When we're doing releases of Cinder, I think we try to push it a little bit further by doing more runs.

23:05 So we get down to like 0.1% or something like that.
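The A/B comparison described above boils down to comparing peak requests per second between two groups of servers. A minimal sketch of that arithmetic follows; the readings are made-up numbers for illustration, not Instagram data, and `relative_change_percent` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical peak requests-per-second readings, one per server.
control_rps = [1000, 1004, 998, 1002]   # servers without the change
treated_rps = [1012, 1010, 1014, 1008]  # servers with the change

def relative_change_percent(baseline, candidate):
    """Percent change in mean throughput of candidate versus baseline."""
    base = mean(baseline)
    return (mean(candidate) - base) / base * 100

change = relative_change_percent(control_rps, treated_rps)
print(f"{change:+.2f}% requests per second at peak load")
```

Repeating the comparison over more runs, as mentioned for Cinder releases, narrows the noise band around a figure like this.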

23:09 So we have a pretty good idea of what the performance impact of what those changes are going to end up looking like.

23:15 I think that makes a ton of sense.

23:16 You could do profiling and obviously you do.

23:19 And we do that too.

23:20 Yeah.

23:21 But in the end of the day, there's a bunch of different things, right?

23:25 If I profile against some process and say, well, this went this much faster in terms of CPU, maybe it took more memory.

23:35 And at production scale, it turns into swap, which means it's dramatically, you know, there's a bunch of pushes and pulls in there.

23:43 And this pragmatic, just like, let's just see what it can take now is interesting.

23:48 You all are in this advantaged situation where you have more traffic than any given server can handle, I would imagine.

23:56 Yes.

23:56 So you can tune in.

23:59 We actually run on one server.

24:00 Exactly.

24:01 We have a backup server.

24:03 It hasn't been rebooted in seven years.

24:06 You have the ability to say, well, let's just tune some of our traffic over to this one particular server to sort of see this limit.

24:14 Whereas a lot of companies and products don't, right?

24:19 Like I use this thing called Locust.io, which is just a fantastic Python framework for doing load testing to actually know the upper bound of what my servers can handle.

24:32 Because we get a lot of traffic, but we don't get 30,000 requests a second, lots of traffic, right?

24:38 And so I think this is really neat that you can actually test in production sort of beyond integration tests, not test that it works right, but send real traffic and actually see how it responds.

24:49 Because really, that's the most important thing, right?

24:51 Does it do more or does it do less than before?

24:52 And, you know, you brought up profiling and we still have to use profiling sometimes to like, you know, 0.25%, 0.1%.

25:01 That's still a lot of noise.

25:03 And like, so if there's some little micro optimization, we can still be like, okay, well, what's this function using after the change, you know, kind of across some percentage of the entire fleet, which is kind of amazing because the profiling is just running on production traffic sampled.

25:22 So for smaller things that ends up becoming super important.

25:27 Right. And you're making a ton of changes as we're about to dive into, but they're additive or multiplicative or something like that, right?

25:34 So if you make this thing 1% faster, that 5% faster, this 3% faster, all of a sudden you could end up at 20 to 30% faster in production, right?

25:42 Yeah.

25:42 And how do you do that math?

25:44 Well, we just add up the percents.

25:45 Exactly.
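On the additive versus multiplicative question raised above: independent speedups compound rather than add, although for small percentages the two results are close. A quick check using the 1%, 5%, and 3% figures from the conversation:

```python
import math

speedups_percent = [1, 5, 3]  # the example figures from the conversation

# Simply adding the percentages.
additive = sum(speedups_percent)

# Compounding them: 1.01 * 1.05 * 1.03, expressed as a percent gain.
compounded = (math.prod(1 + p / 100 for p in speedups_percent) - 1) * 100

print(f"additive: {additive}%  compounded: {compounded:.4f}%")
```

The gap between the two grows as the individual gains get larger or more numerous.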

25:47 Where's Cinder? Here we go.

25:49 All right.

25:50 So when I saw this come out, when did you all make this public?

25:54 Shortly before PyCon?

25:56 Yeah, that's right.

25:57 Yeah.

25:58 Which would put it like February, March, something like that.

26:01 Yeah, something like that.

26:02 That sounds exactly right with this eight months ago.

26:04 So this is under the Facebook incubator.

26:08 You guys got to rename this.

26:10 It should be meta.

26:11 Yeah.

26:11 I wonder whose job that is.

26:13 I mean, permalinks.

26:14 Come on.

26:14 So it doesn't matter all that much.

26:18 It's Instagram, I guess.

26:19 But let me just read the first opening bit here.

26:23 I think there's a lot to take away just from the first sentence.

26:25 Cinder is Instagram's internal performance-oriented production version of CPython 3.8.

26:32 So performance-oriented.

26:34 We've been talking about performance.

26:35 We're going to get into a lot of the cool things you've done.

26:37 Production version.

26:38 So you guys are running on this on Cinder?

26:41 Mm-hmm.

26:41 Fantastic.

26:42 And we redeploy like once a week?

26:45 You redeploy the Cinder or CPython runtime, right?

26:48 Yeah.

26:49 Yeah, yeah, yeah.

26:50 So like the source that's up here is, yeah, if you go back and look at maybe a week ago

26:57 is what we're probably running in production at any given time.

26:59 Right.

26:59 Okay.

27:00 Fantastic.

27:00 And then CPython 3.8, because you've made a lot of changes to this that can't really move

27:08 forward.

27:08 So you picked the one, I'm guessing that was the most current when you first started,

27:13 most current and stable, and just started working on that, right?

27:16 So we do upgrade.

27:17 Okay.

27:17 It was previously built on CPython 3.7.

27:21 Oh, cool.

27:21 Okay.

27:22 There's hundreds or, I don't know if we're yet up into thousands of changes yet.

27:29 But there's a lot of diffs that we've applied.

27:33 It's been, I mean, we've been working on it for, I've been working on it for three years

27:39 now and it predates me.

27:41 So we've upgraded from 3.7.

27:43 We're going to upgrade to 3.10 next, which we're actually starting early next year.

27:50 So it's just a big and involved process.

27:52 And you've also contributed some stuff from Cinder to 3.10.

27:56 So that'll be, that'll be interesting as well.

27:58 That probably actually makes it harder to merge rather than easier.

28:01 We hope that makes it easier.

28:04 That is one of the things.

28:06 I guess you could drop that whole section, right?

28:08 You could just say, you know what?

28:09 We don't even need this whole enhancement because that's just part of Python now, right?

28:13 Okay.

28:13 Yeah.

28:14 That is the incentive for us.

28:15 One of the incentives for us to contribute.

28:17 Yeah.

28:19 Yeah.

28:19 Itamar out in the live stream in the audience says, we're close to 2,000 commits.

28:25 Oh my gosh.

28:25 Yeah.

28:26 That's awesome.

28:26 Itamar's going to be, he is now our kind of full-time dedicated resource to help us upstream

28:34 things.

28:34 Oh, fantastic.

28:35 That's, oh, to upstream it.

28:36 Itamar's job is to take the work you're doing here and then work on getting that into CPython

28:42 properly.

28:43 Okay.

28:43 We could have been doing such a better job.

28:46 I think we've upstreamed some little things, some slightly more significant things, but it's

28:52 something that we really need to be working on more.

28:54 Oh, that's fantastic.

28:55 We've got someone who's dedicated to it and obviously he's not just doing it in a vacuum.

29:00 We're going to help him.

29:02 But having someone drive that and make sure it actually happens is super important.

29:07 Yeah.

29:07 That's really cool.

29:08 I suspect that he and Łukasz Langa will become friends.

29:11 Yeah.

29:12 Łukasz will be on the receiving end of that a lot.

29:16 I bet.

29:16 Given that he's the Developer in Residence over at CPython.

29:20 Cool.

29:20 All right.

29:21 So I guess let's talk about this.

29:25 Is it supported?

29:26 So right now the story is you guys have put this out here as sort of a proof of concept.

29:32 And by the way, we're using it, but not that we expect other teams and companies to take this

29:38 and then like just run on it as well.

29:39 Right.

29:40 This is probably more to like work on the upstreaming side.

29:43 Is that the story?

29:44 Yeah.

29:45 And like let people know what we're doing.

29:47 If someone wants to pick it up and try it, that's great.

29:50 It's just mainly we're focused on our workload and making it faster and can't commit to helping

29:59 people out and making it work for them.

30:01 Right.

30:01 But as you just said, you are working on bringing these changes up to CPython and you already

30:07 have to some degree.

30:08 So that's, you know, that's pretty good.

30:10 I guess it also lets you all take a more specialized, focused view and say, you know what?

30:17 We want to make uWSGI, when it forks off child processes,

30:21 we want to make that happen better and use less memory.

30:25 And we're going to focus on that.

30:27 And if it makes sense to move that to main Python, good.

30:30 If not, then we're just going to keep those changes there.

30:33 Right.

30:33 Yeah.

30:33 And then, I mean, that's happened, I think with like, we've done some work around

30:38 immortalization of the GC heap, which is kind of a big improvement over not collecting.

30:44 We were talking about earlier.

30:46 And that didn't make sense for upstream CPython.

30:49 And so it's okay.

30:50 That's something that we just have to maintain.

30:51 Cool.

30:52 I was so excited when I saw this come out.

30:53 I'm like, wow, this is the biggest performance story I've seen around CPython for quite a

30:58 while.

30:58 And now there have been some other things as well.

31:00 We'll touch on at the end on how they come together.

31:03 But maybe walk us through what is Cinder?

31:07 What is this work?

31:08 And we can dive into some of the areas, maybe.

31:10 Sure.

31:11 You have immortal instances highlighted.

31:13 So we could start talking about that.

31:15 You want to start with JIT first?

31:16 I think.

31:16 Yeah.

31:17 Okay, sure.

31:17 If you think JIT makes sense.

31:18 Yeah, let's talk there.

31:19 So the JIT isn't what I live on day to day.

31:21 We have several other team members who are working on that full time.

31:26 But it's obviously a huge part of the performance story.

31:29 So the JIT right now is it's a method at a time JIT.

31:34 So it compiles each individual method.

31:36 It's, again, very tuned for our workload.

31:39 Kind of you can see here, like some of the descriptions of how to use this thing.

31:45 And it's mentioning this JIT list file.

31:47 So when we're using this in production, what happens is we compile all the functions ahead

31:54 of time inside of the master process before we fork off all these worker processes.

31:59 Because we want all that memory to be shared between the different processes.

32:05 So that's kind of an unusual mode for a JIT to work in.

32:09 Right, right.

32:11 They don't normally think about children processes and forking.

32:13 They just do their own thing, right?

32:15 Yeah.

32:16 It's just like, okay, I have this method.

32:17 It's gone hot.

32:18 It's time to JIT it.

32:19 So it's used in this weird way.

32:22 At some point, we need to, I think, add support for kind of normal jitting methods when they

32:28 get hot.

32:29 Like we're at the point where we're talking about using Cinder a little bit beyond Instagram

32:34 within Meta.

32:36 And so at that point, people are going to need something that isn't so heavily tuned to

32:41 uWSGI.

32:42 The JIT does, it's, you know, entirely, we kind of own the full stack.

32:48 So it uses, I think, is it asmjit?

32:51 Yeah.

32:52 It uses a library to do the x64 code generation.

32:56 Other than that, we go from a high level representation.

33:01 How close is the high level representation to just Python's bytecode?

33:06 There is a pretty good set of overlap.

33:11 There are also a lot of opcodes which kind of turn into multiple smaller things.

33:18 So like off the top of my head, I think like making a function involves setting several

33:24 different attributes on it at the end.

33:26 So there's something that says, make me this function, which is just a single opcode in CPython.

33:32 And there's several different opcodes which are setting those fields on it.

33:37 So it's pretty close, but maybe slightly lower level.

33:41 There's also a lot of opcodes in there for just kind of super low level operations.

33:47 So one of the things, the thing that I spend most of my time working on is static Python.

33:52 And so we added a bunch of things that support primitive math and simple loads and stores of

33:59 fields and lower level things like that.

34:01 So it's a mix.

34:02 Yeah, the static Python that we're going to talk about is super cool.

34:05 And is that possible because the JIT, like you can do whatever you want and then the

34:09 JIT will see that and then adapt correctly?

34:12 The JIT's really important to it because like it takes things that are usually tons of instructions

34:19 and turns them into a single instruction or a couple of instructions.

34:23 It's not 100% required.

34:25 Like we support it in the interpreter loop and kind of our goal is to do no harm and generally

34:32 like at least get the normal performance.

34:34 But the JIT being able to resolve things statically and turn them into simple loads is super important.

34:40 So from HIR, like we turn that into an SSA form and run a bunch of optimizations over it.

34:47 I think one really interesting optimization is ref count removal.

34:52 So we can see kind of these objects are either borrowed or just like that we'd have extra ref counts happening on them that we don't need to actually insert.

35:04 And we can just elide all of those, which is super awesome.

35:09 This portion of Talk Python to Me is brought to you by TopTal.

35:12 Are you looking to hire a developer to work on your latest project?

35:16 Do you need some help with rounding out that app you just can't seem to get finished?

35:21 Maybe you're even looking to do a little consulting work of your own.

35:24 You should give TopTal a try.

35:26 You may know that we have mobile apps for our courses over at Talk Python on iOS and Android.

35:32 I actually used TopTal to hire a solid developer at a fair rate to help create those mobile apps.

35:38 It was a great experience and I can totally recommend working with them.

35:42 I met with a specialist who helped figure out my goals and technical skills that were required for the project.

35:47 Then they did all the work to find just the right person.

35:50 I had short interviews with two folks.

35:53 I hired the second one and we released our apps just two months later.

35:57 If you'd like to do something similar, please visit talkpython.fm/TopTal and click that hire top talent button.

36:05 It really helps support the show.

36:09 There's a lot of interesting stuff happening around memory that you all are doing.

36:13 Yes.

36:14 But one of them is this ref count and you make assumptions that are reasonable.

36:20 Like when I'm in a method call of a class, I don't need to increment and then decrement the self object.

36:27 Because guess what?

36:28 The thing must be alive because it's doing stuff, right?

36:31 And then it sounds like also maybe with constants, like the number one doesn't need its ref count changed and stuff like that.

36:36 You notice that and go, you know what?

36:38 We're just going to skip that.

36:39 One of the things we've done is the immortalization of objects.

36:43 And so we can also like the number one is going to be an immortal instance.

36:48 And so in that case, we can be like, okay, yeah, we don't need to deal with ref counts on this.

36:53 Unless, of course, like that number one ends up going off to somewhere that, you know, maybe doesn't understand the ref counting semantics of the JIT.

37:03 In which case, maybe we do have to end up inserting them.

37:06 Right.

37:06 Or like if it's going through like an if else or something where one of the branches, we have to end up ref counting.

37:13 So it's smart.

37:14 And it's important because with immortal instances, our ref counts are a little bit more expensive than normal ref counts because we have to check to see if the object is immortal too.

37:24 Right.

37:24 So they're just doing an increment on a number.

37:26 Yeah.

37:27 Okay.

37:27 So this, this immortal instances, this comes back to that memory thing that comes back to the turning off the GC, which you stopped turning it off.

37:35 It sounds like immortal instances are a more nuanced way to solve that same problem.

37:39 This is really about that fork and exec model.

37:42 Yeah.

37:42 So when we fork off those worker processes, they're initially sharing all the memory with the master process unless they happen to go off and write to it.

37:53 And ref counts are a really big source of writing to that shared memory.

38:00 And so what this does is takes all the objects that are present inside of the master process and runs through, marks them all as immortal.

38:09 And then from then on out, the child process will be like, oh, this thing's immortal.

38:15 I'm not going to change the ref count.
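
A minimal sketch of the immortal-instance check just described, written as a toy Python model. Real refcounts live in the object's C header; the names `ToyObject`, `IMMORTAL_BIT`, `immortalize`, `incref` and `decref` are hypothetical and are not Cinder's API.

```python
# Toy model only: assumes an immortal flag stored in a high bit of the count.
IMMORTAL_BIT = 1 << 62  # hypothetical flag bit marking an object as immortal


class ToyObject:
    """Stand-in for an object header that carries a reference count."""

    def __init__(self):
        self.refcnt = 1


def immortalize(obj):
    """Mark an object immortal, as a pre-fork heap walk might do."""
    obj.refcnt |= IMMORTAL_BIT


def incref(obj):
    # Every refcount operation first tests for immortality, which is the
    # extra cost compared with a plain increment.
    if obj.refcnt & IMMORTAL_BIT:
        return  # no write happens, so a shared page can stay shared
    obj.refcnt += 1


def decref(obj):
    if obj.refcnt & IMMORTAL_BIT:
        return
    obj.refcnt -= 1


normal, frozen = ToyObject(), ToyObject()
immortalize(frozen)
frozen_before = frozen.refcnt

incref(normal)   # ordinary object: the count is written
incref(frozen)   # immortal object: the count is left untouched
```

The point of the guard is that the immortal path performs no store at all, which is what keeps the memory page unmodified in a forked child.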

38:17 Okay.

38:17 So this happens, you basically just scan the whole heap right before you do the fork and you're like everything, we're just going to clone this and it becomes unchangeable.

38:27 And then we'll just, at least with regard to its ref count and we'll go from there.

38:31 Yeah.

38:32 And then as long as ideally we also don't, we shouldn't have a lot of global mutable state.

38:38 I think people should be like, you know, if you think about what's in the master process, that's like classes and functions.

38:46 And people shouldn't be really going off and mutating those things inside of the worker processes.

38:52 Modules, yeah.

38:52 It seems like something strange is happening if that's going on.

38:56 Maybe let me ask you really quick or let you talk about really quickly this.

39:00 The real benefit here is on Linux, when you fork off these processes, if the memory itself hasn't been changed, that can be shared across the 40 or 60 processes.

39:12 But as soon as that memory changes, a local copy has to be dedicated to that one worker process.

39:19 So silly stuff, simple stuff like I want to pass this string around that happens to be global.

39:26 And then it says, well, it's passed.

39:28 So you've got to add ref to it, which means now you get 60 copies of it all of a sudden.

39:36 Those really simple things, you are able to get lots better memory sharing, which then leads to cache hits versus cache misses.

39:43 And there's like all these knock on effects, right?

39:45 Yeah.

39:45 And it's not just the string itself, right?

39:48 It's the entire page that the string lives on.

39:50 So, you know, you might have a 15 byte string with, you know, a 16 byte object header and you end up copying 4K of memory.

40:01 Because you changed a 6 reference number to a 7 or to a 5.
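
A small sketch of why merely passing an object around counts as a write, assuming stock CPython, where `sys.getrefcount` reports the count stored in the object's header. The list contents and the helper name are illustrative only.

```python
import sys


def refcount_delta_for_new_reference():
    """Return how far the refcount moves when one more reference is taken."""
    shared = ["stands", "in", "for", "pre-fork", "state"]
    before = sys.getrefcount(shared)
    holder = [shared]      # taking one more reference to the same object...
    after = sys.getrefcount(shared)
    del holder
    return after - before  # ...changes the number stored in its header


delta = refcount_delta_for_new_reference()
print("refcount moved by", delta)
```

In a forked worker, that header update is a store into a page inherited from the parent, so the operating system has to give the worker its own private copy of the whole page.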

40:06 Yeah.

40:07 Fascinating.

40:08 Okay.

40:09 Do you think that Python, CPython itself could adopt this?

40:12 Would it make sense?

40:13 We tried to upstream it and there was resistance to it.

40:17 I mean, it is touching something that's very core and it's going to be a bit of a maintenance burden.

40:22 There are other reasons, I think, that people are now talking about wanting to have immortal instances.

40:29 So Eric Snow has been working on sub-interpreters for a long time.

40:33 And I think he has been interested in them recently for sharing objects between interpreters.

40:40 And I think Sam Gross's work on nogil might have some form of immortal instances as well.

40:47 So maybe the core immortal instances support could land upstream at some point.

40:57 But, you know, maybe the code that actually is walking the heap and is freezing everything, maybe that's very Instagram specific.

41:05 And it doesn't have much value upstream.

41:08 It seems to me that there's probably a set of things that would be good immortal instances for almost any Python process that starts up, right?

41:18 Like before your code runs, everything there probably would be a good candidate for that.

41:23 And, you know, there's potential, like it's kind of scary because ref counts are so frequent.

41:29 And so adding extra code in the ref count process seems risky.

41:35 But if you can freeze enough stuff that was kind of there before the program started up, that's super core and happening a lot, then maybe it does actually end up making sense for other workloads too.

41:49 Yeah, perhaps.

41:50 Okay.

41:50 So these immortal instances are one of the things that you all have done that's pretty fascinating.

41:55 And also a huge win.

41:57 Something like 5%.

41:58 Yeah, yeah, that's right.

42:00 It says right here.

42:00 Big win in production, 5%.

42:03 And does that mean 5% request per second?

42:06 Is that when you say 5%, is that the metric you're talking about here?

42:09 Yeah.

42:10 Yeah.

42:10 Have you thought about or tested, I'm sure you've thought about, like if this lets you run more worker processes, to increment that spawn worker process number?

42:21 I think the developer who worked on this was doing, did look at that number and was looking at tweaking the number of worker processes.

42:30 If I'm recalling, he got a little bit of pushback from people who were nervous about increasing it.

42:37 Don't mess with this number.

42:38 We never mess with this number.

42:39 What are you doing?

42:41 Yeah, but I hear you.

42:42 I'm just thinking, you know, if it really does create more shared memory, maybe it creates more space on the same hardware for you to actually create more.

42:50 And then that would just possibly allow even a bigger gain in request per second because there's more parallelism.

42:57 I mean, given that it was such a big win, it could have just been that we were already under significant memory pressure and it got us out of significant memory pressure.

43:06 Maybe we had the right number.

43:08 Maybe we had too many hosts.

43:09 I don't know.

43:10 Yeah, yeah, perhaps.

43:11 Perhaps.

43:11 But still, 5% is, as one of the changes, is still a pretty big deal.

43:16 Mm-hmm.

43:17 All right.

43:17 The next one on deck is strict modules.

43:21 Let's talk about strict modules.

43:23 I mean, we've talked about a little bit of things that are kind of related to this.

43:27 You know what I'm saying?

43:28 Like, if you have things that are going off and mutating your things in the master process, it's like, what?

43:33 That's kind of crazy.

43:34 So, strict modules aren't about performance, actually.

43:39 There's a little bit of performance thought behind it, but now they're really, we're not considering them as a performance feature at all.

43:46 They're more about a reliability feature.

43:48 And so you brought up early on how, like, Python modules, just going off, executing some code.

43:55 Who knows what that code's going to do?

43:56 Right.

43:57 So strict modules is an attempt to tame that process.

44:04 And what we do is we run static analysis over the code.

44:09 I mean, we are basically interpreting the code in a safe interpreter.

44:13 And if the module has any external side effects or depends upon any external side effects, we don't allow it to be imported.

44:22 And so we know that all the modules are side effect free that are strict.

44:27 When you say they're side effect free, does that mean that the importing of them is side effect free or all of the functions are also side effect free?

44:35 The importing of them.

44:36 Their functions can do whatever they want.

44:38 Got it.

44:38 They can call functions from other modules.

44:41 They can call functions from themselves.

44:43 If they call those modules at the top level while doing the import, then those functions need to be side effect free.
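
A rough sketch of the import-time distinction drawn here, using the standard `ast` module. This is not Cinder's strict-module analyzer: it simply flags every call that would execute during import, while leaving function bodies alone. The class name, helper name, sample sources and file paths are all hypothetical, and the sources are only parsed, never run.

```python
import ast


class ImportTimeCalls(ast.NodeVisitor):
    """Collect line numbers of calls that would run while a module imports."""

    def __init__(self):
        self.lines = []

    def visit_FunctionDef(self, node):
        # A function body runs when the function is called, not at import.
        # Decorators and default values are evaluated at import, so visit those.
        for expr in node.decorator_list + node.args.defaults:
            self.visit(expr)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Lambda(self, node):
        pass  # a lambda body also runs later (its defaults are ignored here)

    def visit_Call(self, node):
        self.lines.append(node.lineno)
        self.generic_visit(node)


def import_time_calls(source):
    visitor = ImportTimeCalls()
    visitor.visit(ast.parse(source))
    return sorted(set(visitor.lines))


SIDE_EFFECT_FREE = '''
import os

LIMIT = 10

def connect():
    return open("/dev/null")
'''

HAS_SIDE_EFFECT = '''
import os

handle = open("/tmp/cache")

def connect():
    return handle
'''

print(import_time_calls(SIDE_EFFECT_FREE))
print(import_time_calls(HAS_SIDE_EFFECT))
```

A real analyzer has to be far more precise, since it needs to follow what each top-level call actually does rather than rejecting every call outright.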

44:50 So where does this lead you?

44:51 What do you get out of this?

44:52 We get additional reliability.

44:54 So like, you know, Instagram, as I think maybe we mentioned, this being a big monolithic application.

45:04 Maybe we didn't get to that.

45:05 Yeah, I don't think we talked about that.

45:07 But this is not a hundred microservices type of thing, is it?

45:10 No, it's one giant application.

45:12 The thing that gets redeployed every 10 minutes is that giant application.

45:16 That makes the redeployment even more impressive, by the way.

45:19 Right?

45:20 Yeah, I mean, maybe it's nice in that it's one giant application because you just have to redeploy one thing.

45:27 Yeah, exactly.

45:29 It's not a hundred different things you got to keep in sync all at the same time, right?

45:34 Yeah.

45:34 Our PEs make that happen.

45:36 And it just happens behind the scenes as far as I'm concerned.

45:39 So, you know, if you import one module and it depends on side effects from another module and then something changes

45:50 the import order, or whatever state things are depending upon, suddenly things blow up in production and your site doesn't work and everyone's really sad.

46:00 So this is like, we want to get to a world where our modules are completely safe.

46:06 We've used this.

46:07 We've experimented doing other things with this.

46:09 So like adding a hot reload capability.

46:12 We know the modules are completely side effect free.

46:15 Why not just patch the module in place and like let developers move on without restarting the website.

46:22 It has the potential to kind of really change the way we store modules.

46:27 So we haven't gone down this route yet, where instead of storing modules as a bunch of Python code that needs to run off and execute,

46:35 Could we store modules as like, here's a class definition.

46:38 Here's a function.

46:41 And can we lazily load portions of the modules out of there?

46:45 But we also have a really different take on lazy loading that's in Cinder now too.

46:51 Okay.

46:51 Yeah, that's pretty interesting.

46:53 Because normally you can't re-import something because maybe you've set some kind of static value on a class.

47:04 You've set some module level variable and that'll get wiped away, right?

47:08 I mean, you can call reload on a module.

47:11 But whether or not that's the safe thing to do, who knows?

47:16 Exactly.

47:17 Exactly.

47:18 All right, cool.

47:19 So I think one of the more interesting areas, probably the two that really stood out to me, are the JIT and Static Python with the immortal objects being right behind it.

47:29 But Static Python, this is your area, right?

47:31 What is this?

47:31 Yeah.

47:31 So this is an attempt to leverage the types that we already have throughout our entire code base.

47:40 So Instagram is 100% typed, although there are still some Any types flowing around.

47:47 But you can't add code that isn't typed.

47:50 So we know the types of things.

47:52 Right.

47:53 You're talking traditional just colon int, colon str, optional str, that type of typing.

47:59 Yeah.

48:00 Yeah.

48:00 So why not add a compilation step when we're compiling things to PYCs instead of just ignoring the types?

48:10 Why don't we pay attention to the types?

48:11 Yeah.

48:12 So we have a compiler that's written in Python.

48:15 There's actually this old compiler package that started in Python 2.

48:21 There's this developer on GitHub, pfalcon, who upgraded it to Python 3 at some point.

48:30 And we upgraded it to Python 3.8 and made it match CPython identically for bytecode generation.

48:39 So we have this great Python code base to work in, to write a compiler in.

48:44 And we analyze the type annotations.

48:47 And then we have runtime support and a set of new opcodes that can much more efficiently dispatch to things.

48:56 There's a great, my coworker Carl Meyer had this awesome slide of calling a function during a Python talk.

49:04 And it was just like pages, well, it was one page and a very, very tiny font of the assembly of what it takes for CPython to invoke a function.

49:13 And then we're able to just directly call a function using the x64 calling convention.

49:19 So shuffle a few registers around and emit a call instruction.

49:23 That's awesome.

49:23 It surprised me when I first got into Python, how expensive calling a function was.

49:29 Not regardless of what it does, just the act of calling it.

49:33 You know, coming from C# and C++, where things could get inlined by either the compiler or the JIT compiler and all sorts of interesting things.

49:40 You're like, wait, this is expensive.

49:42 I should consider whether or not I'm calling a function in a tight loop.

49:46 There's so many things it has to deal with.

49:49 The JIT has to deal with adding the default values then, and you don't know whether you're going to have to do that until you get to the function.

49:56 It's got to deal with taking keyword arguments and mapping those onto the correct keywords.

50:03 And that's one thing in static Python, we do that at compile time.

50:07 If you're calling with keyword arguments, they turn into positional arguments because we know what we're calling.

50:13 And we can just shuffle those around at compile time and just save a whole bunch of overhead.
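
As an illustration of the keyword-to-positional mapping described here, the standard `inspect` module can resolve a call's arguments into positional order once, up front. This only mimics the idea at runtime; Static Python does the equivalent inside its compiler. The helper `resolve_to_positional` and the sample function `send_message` are made up for the example.

```python
import inspect


def resolve_to_positional(func, *args, **kwargs):
    """Map a mixed positional/keyword call onto a purely positional tuple."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in any defaults the caller left out
    return bound.args


def send_message(user_id, text, urgent=False):
    return (user_id, text, urgent)


# The call site wrote send_message(text="hi", user_id=7); resolve it once...
positional = resolve_to_positional(send_message, text="hi", user_id=7)

# ...and every later call can pass plain positional arguments.
result = send_message(*positional)
print(positional, result)
```

When the callee is known ahead of time, this shuffling can be done once at compile time instead of on every call.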

50:19 Yeah, that's fantastic.

50:20 So the way people should think of this is maybe like mypyc or Cython, where it looks like regular Python, but then out the other side comes better stuff.

50:30 Except for the difference here is you guys do it at JIT, not some sort of ahead of time pre-deployment type of thing.

50:36 Yeah.

50:36 And so the first thing we did with it was actually we had 40 Cython modules that were inside of the Instagram code base.

50:45 And that was a big developer pain point in that those things had to be rebuilt.

50:49 The tooling for like editing them wasn't as good because you don't get syntax color highlighting.

50:56 And so we were able to just get rid of all of those.

50:58 And those were heavily tuned, like using a bunch of Cython features.

51:03 And so that really kind of proved things out that like, if we need to use low level features, we support things like primitive ints if you want to use them.

51:12 Instead of like having boxed variable-size ints.

51:16 So that was a good proving that it worked.

51:20 And now I think it's closer to mypyc at runtime as we've been going through and converting other modules to Static Python within the Instagram code base.

51:32 Yeah, fantastic.

51:33 You guys say that Static Python plus Cinder JIT achieves seven times performance improvements over CPython on the typed version of the Richards benchmark.

51:43 I mean, obviously you got to be specific, right?

51:45 But still, that's a huge difference.

51:47 Yep.

51:47 And some of that's like the ability to use primitive integers.

51:52 Some of that's the ability to use vtables for invoking functions instead of having to do the dynamic lookup, which is something that both mypyc and Cython support.

52:03 So lots of little things end up adding up a lot.

52:06 And so that's just the JIT.

52:07 Yeah, that's fantastic.

52:08 Talk Python to Me is partially supported by our training courses.

52:13 We have a new course over at Talk Python.

52:16 HTMX plus Flask.

52:18 Modern Python web apps hold the JavaScript.

52:20 HTMX is one of the hottest properties in web development today.

52:24 And for good reason.

52:25 You might even remember all the stuff we talked about with Carson Gross back on episode 321.

52:30 HTMX, along with the libraries and techniques we introduced in our new course, will have you writing the best Python web apps you've ever written.

52:38 Clean, fast, and interactive.

52:40 All without that front-end overhead.

52:41 If you're a Python web developer that has wanted to build more dynamic, interactive apps, but don't want to or can't write a significant portion of your app in rich front-end JavaScript frameworks, you'll absolutely love HTMX.

52:54 Check it out over at talkpython.fm/HTMX or just click the link in your podcast player show notes.

53:02 You've talked about using primitive integers.

53:06 And I've always thought that Python should support this idea somehow.

53:10 Like, if you're doing some operation, like computing the square root or something, you take two numbers, two integers, and do some math, you know, maybe multiply, you know, square them, and then subtract them or something like that.

53:23 And all of that stuff goes through a really high overhead version of what a number is, right?

53:30 Like, instead of being a four or eight byte thing on a register, it's 50 bytes or something like that.

53:39 As a PyObject long thing that gets ref counted, and then, like, somewhere in there is the number bit.

53:46 And that's awesome because it supports having huge numbers.

53:49 Like, you don't ever see negative 2.1 billion when you're adding.

53:53 You increment a number by one in Python, which is great.

53:55 But it also means that at certain times you're doing math is just so much slower because you can't use registers.

54:02 You've got to use, like, complex math, right?

54:05 It sounds like you're doing this, like, let's treat this number as a small number rather than a PyObject pointer-derived thing.

54:14 You know, JITs can handle this to some degree, right?

54:17 And they can recognize that things are small numbers and generate more efficient code.

54:24 I think you had Anthony Shaw on talking about Pyjion doing this.

54:27 Yeah.

54:28 You know, there's still some overhead there for dealing with the cases where you have to bail out and it's not that case.

54:35 It's nice just having the straight line code that's there.

54:39 You can also do tagged pointers, which, again, kind of handle that.

54:42 Tagged pointers are kind of difficult on CPython because things expect PyObject stars.

54:48 And if that PyObject star ever escapes to something that's not your CPython code, it's going to be very unhappy.

54:55 Yeah.

54:55 So this is, I mean, the nice thing is it's a relatively straightforward way to allow it.

55:00 It was actually a little bit controversial in that, like, is this really what Python developers are going to expect?

55:07 And are we going to have the right semantics there?

55:09 And I think we have a to-do item to actually make things raise overflow errors if they do overflow instead of flowing over to negative 2 billion.

55:19 That would be fantastic.

55:20 I would personally rather see an overflow error than have it, you know, wrap around to the negative side or go back to zero if it's unsigned or whatever terrible outcome you're going to get there.
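
A short sketch contrasting the two behaviors being discussed for fixed-width integers: silently wrapping around versus raising an overflow error. It models a signed 32-bit value with ordinary Python ints; the function names are illustrative and are not Static Python's primitives.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrapping_add_i32(a, b):
    """Add like a signed 32-bit machine integer: overflow wraps around."""
    return (a + b - INT32_MIN) % 2**32 + INT32_MIN


def checked_add_i32(a, b):
    """Add two values, but refuse to return a result that does not fit."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a signed 32-bit int")
    return result


wrapped = wrapping_add_i32(INT32_MAX, 1)   # wraps to the most negative value
unbounded = INT32_MAX + 1                  # a regular Python int just grows

try:
    checked_add_i32(INT32_MAX, 1)
    overflow_raised = False
except OverflowError:
    overflow_raised = True
```

The checked version costs one extra comparison per operation, in exchange for failing loudly where the wrapping version would quietly produce a negative number.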

55:32 Yeah.

55:33 It's a much more reasonable behavior.

55:34 We just, I guess we haven't been very motivated to actually go and fix that.

55:38 Well, you're probably not doing the type of processing that would lead to that, right?

55:44 You're probably not doing, like, scientific stuff where all of a sudden, you know, you took a factorial too big or you did some insane thing like that.

55:51 There's probably not a single factorial in the entire code base, I would guess.

55:55 Yeah.

55:55 There's not a lot of math.

55:57 There was, like, some, like, the only place where we've used primitive integers really was in the conversion of the existing Cython code where people had resorted to them.

56:08 Right.

56:08 Because it probably started as an int 32 or an int 64, right?

56:11 Yeah.

56:12 Yeah.

56:13 Like, they had that option available to them.

56:15 They used it.

56:16 It's not, like, something that we're going through and sprinkling in in our random Python code.

56:21 Because, like, yeah, we don't do much math.

56:22 It's very object-oriented.

56:24 Lots of function calls.

56:26 Lots of classes.

56:27 Yeah.

56:28 Absolutely.

56:29 All right.

56:30 There's a lot of other good things that you talked about that are not necessarily listed right here.

56:35 sort of, kind of, stuff with async and await.

56:38 It sounds like you guys use async and await a lot.

56:41 Is that right?

56:41 Yeah.

56:41 The entire code base is basically async.

56:43 There was a big conversion, a big push to convert it right as I was starting.

56:50 And now everything basically is async.

56:53 Unless, obviously, it's not as...

56:55 Wait a minute.

56:55 I heard that async and await is slow.

56:57 Why would you ever use that?

56:58 Because it allows additional parallelization.

57:00 Oh, yeah, yeah.

57:01 Because multiple requests can be served by the same worker.

57:04 Sure.

57:05 Well, you know, whenever I hear those, I see examples of, like, we're just calling something as fast as you can.

57:11 And it doesn't really provide...

57:13 There's not an actual waiting, right?

57:15 Like, the async and await is really good to scale the time.

57:18 When you're waiting, do something else.

57:20 And a lot of the examples say, well, this is slower.

57:22 There's, like, no waiting period.

57:23 But you know what is a really good slow thing?

57:25 An external API and a database.

57:27 And it sounds like you guys probably talk to those things.

57:29 And yes.

57:30 And the no waiting case is actually what this eager coroutine evaluation is all about.

57:37 Like, yeah, sometimes we're talking to a database.

57:39 But sometimes you have a function that's like, have I fetched this from the database?

57:44 Okay, here it is.

57:46 I don't have to wait for it.

57:47 Otherwise, I'll go off and fetch it from the database.

57:50 Right.

57:50 If there's an early return before the first await.

57:53 Exactly.

57:54 There's not a huge value to calling this, right?

57:56 Yeah.

57:57 So tell us about this eager coroutine evaluation, which deals with that, right?

58:00 Yeah.

58:01 So this lets us run the function up to the first await and only go off and kind of.

58:08 So normally what happens is you produce your coroutine object, schedule that on your event loop,

58:17 and then eventually it'll get called.

58:19 And now when you call the function, it's going to run.

58:21 It's going to immediately run up to the first await.

58:24 And if it doesn't hit that first await, it's just going to have the value that's produced.

58:28 And you're not going to have to go through this big churn of going through the event loop with this whole coroutine object.
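
A sketch of the "run up to the first await" idea in plain Python, driving a coroutine by hand with `send(None)`. This only imitates the behavior; Cinder implements it inside the runtime. The cache, the key names and the helper `run_eagerly` are hypothetical.

```python
import asyncio

_cache = {"user:1": "dino"}  # hypothetical in-memory cache


async def get_user(key):
    if key in _cache:
        return _cache[key]       # early return before any await
    await asyncio.sleep(0)       # stands in for a real database round trip
    return "fetched"


def run_eagerly(coro):
    """Run a coroutine until it finishes or first suspends.

    Returns (True, result) if it completed without suspending,
    otherwise (False, coro) with the coroutine paused at its first await.
    """
    try:
        coro.send(None)
    except StopIteration as stop:
        return True, stop.value  # finished without touching an event loop
    return False, coro


hit_done, hit_value = run_eagerly(get_user("user:1"))
miss_done, pending = run_eagerly(get_user("user:2"))
pending.close()  # a real runtime would hand this to the event loop instead
```

The cached lookup completes synchronously, so no task needs to be scheduled; only the call that actually suspends would go through the event loop.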

58:36 Yeah, that's fantastic.

58:36 Yeah.

58:37 Yeah.

58:37 It is slightly different semantics because now you could have some CPU-heavy thing, which is just not sharing the CPU with other workers, which isn't great.

58:52 And I think there can be some slight differences in how the scheduling happens, like where you could have observable differences, but we haven't had any issues with that.

59:05 So I think it might be a little bit controversial, but it's such a big win that it makes a lot of sense for us.

59:11 It certainly could change the order.

59:13 If you're doing, here's a whole bunch of coroutines and a bunch of awaits and stuff, and then you ran them in one mode, the sort of standard mode versus this, you would get a different order.

59:23 But, you know, I mean, it sounds like you're going to ultimately put the same amount of CPU load on.

59:28 I mean, async and await runs on one thread anyway, generally.

59:32 Yeah.

59:33 Unless you do something funky to like wrap some kind of thread or something, but in general, it still runs there.

59:39 I would hope that most people aren't super dependent upon the order.

59:43 If you're dependent upon the order and you're doing threading or something like that, you're doing it wrong.

59:49 Yeah.

59:50 The fairness issue might be a bigger issue.

59:53 Yeah, yeah, yeah.

59:54 Yeah.

59:54 For us, it makes a lot of sense.

59:56 Yeah, that's really cool.

59:57 All right.

59:58 Another one was shadow code or shadow bytecode.

01:00:01 Yeah.

01:00:02 So this is our inline caching implementation.

01:00:04 We've had this for a few years.

01:00:08 Python 3.11 is getting something very similar.

01:00:12 So we kind of expect that our version will be going away.

01:00:16 We'll have to see if there's any cases that aren't covered or if there's any performance differences.

01:00:22 But basically, it's nearly identical.

01:00:25 We have an extra copy of the bytecode, which is why it's called shadow bytecode, which we can mutate in the background and replace the normal opcodes with specialized ones.

01:00:38 So if we're doing a LOAD_ATTR and that LOAD_ATTR is on an instance of a specific type, we can just say, okay, well, we know that this LOAD_ATTR doesn't have a type descriptor associated with it,

01:00:52 like a get/set data descriptor.

01:00:57 We know that the instance has a split dictionary, which is the way CPython shares dictionary layout between instances of a class.

01:01:08 We know this attribute is at offset two within the split dictionary.

01:01:13 So we just do a simple type check and make sure that the type is still compatible and go off and look in the instance dictionary and pull the value out.

01:01:22 Instead of going through and looking up all those other things that I've just described, which is kind of what you have to do every single time on a normal LOAD_ATTR.
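
[Editor's sketch] The guard-then-fast-path pattern described above can be modeled in pure Python. This is a toy, not Cinder's shadow bytecode or CPython's specializing interpreter: `AttrLoadSite` is a hypothetical name, the "cache" is just a remembered type, and the sketch does not invalidate itself when a type is modified later, which a real implementation must do.

```python
class AttrLoadSite:
    """Toy model of one attribute-load site with an inline cache."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None  # the type this site is specialized for
        self.hits = 0
        self.misses = 0

    def load(self, obj):
        # Fast path: one cheap type check, then straight to the instance dict.
        if type(obj) is self.cached_type:
            try:
                value = obj.__dict__[self.name]
            except KeyError:
                pass  # this instance lacks the attribute; use the slow path
            else:
                self.hits += 1
                return value
        return self._slow_load(obj)

    def _slow_load(self, obj):
        # Slow path: the full, general attribute lookup.
        self.misses += 1
        value = getattr(obj, self.name)
        tp = type(obj)
        # Specialize only when nothing on the type (for example a data
        # descriptor) could intercept the lookup for this name.
        shadowed = any(self.name in vars(klass) for klass in tp.__mro__)
        inst_dict = getattr(obj, "__dict__", None)
        if not shadowed and inst_dict is not None and self.name in inst_dict:
            self.cached_type = tp
        return value


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


site = AttrLoadSite("x")
first = site.load(Point(1, 2))   # miss: general lookup, then specialize
second = site.load(Point(5, 6))  # hit: same type, fast path
third = site.load(Point(7, 8))   # hit
```

The first load pays for the general lookup and records the type; later loads on instances of the same type only pay for the type check and a dictionary access.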

01:01:31 Yeah, that's really cool.

01:01:32 Is this something that could come back to CPython?

01:01:35 I think the fact that they've gone off and built their own version in 3.11 means that's not going to happen.

01:01:40 But the idea lives there.

01:01:44 Yes.

01:01:44 Yeah.

01:01:45 Yeah.

01:01:45 Okay.

01:01:45 Awesome.

01:01:46 So we're getting short on time here, but maybe you could just highlight really quickly, stepping back one feature point on the asyncio stuff, the send/receive without StopIteration stuff that you did.

01:02:02 And then that getting upstreamed as well already.

01:02:05 Yeah.

01:02:05 Yeah.

01:02:05 So that was adding.

01:02:07 So I did work on this... developer Vladimir Matveev worked on this, and that was adding in, I think he added in, a new set of slots for actually achieving this at the end of the day.

01:02:22 And in Cinder,

01:02:23 we have a type flag that says this type has these additional slots.

01:02:27 And so we can call the send function and the receive function and get back and be done.

01:02:34 That's kind of did this thing return a result?

01:02:37 Did this thing throw an exception?

01:02:39 And here's the result.

01:02:41 Yeah.

01:02:41 So that instead of producing the StopIteration on every single result, we just return the result.

01:02:48 And that is obviously big with coroutines because coroutines are generators at the end of the day.
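
[Editor's sketch] In pure Python, a generator's return value only reaches the caller wrapped in a StopIteration exception. The helper below shows the shape of the "status plus result" interface being described; it is an illustration only, since this Python version still relies on the exception underneath, whereas the C-level slot avoids creating it. `send_step` and `averager` are hypothetical names for this example.

```python
def send_step(gen, value=None):
    """Send a value into gen and report (status, result) instead of raising.

    status is "next" when the generator yielded and "return" when it
    finished. Other exceptions propagate unchanged.
    """
    try:
        return "next", gen.send(value)
    except StopIteration as stop:
        # The return value travels on the exception object.
        return "return", stop.value


def averager(count):
    """Collect `count` sent numbers, then return their mean."""
    total = 0
    for _ in range(count):
        total += yield total
    return total / count


gen = averager(2)
steps = [
    send_step(gen),      # prime the generator
    send_step(gen, 10),
    send_step(gen, 20),  # generator finishes here
]
```

Every `await` on a coroutine that completes goes through this same return path, which is why avoiding the exception object on each result matters for async-heavy code.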

01:02:55 Yeah.

01:02:56 That's fantastic.

01:02:57 Everything can get more efficient by not allocating sort of hidden, behind-the-scenes exceptions, right?

01:03:03 Yeah.

01:03:03 All right.

01:03:04 Well, there's a bunch of cool stuff here.

01:03:06 And I'm really happy to hear that you and your team and Edomar out there are working on bringing this stuff over.

01:03:11 Because I was so excited when I saw it.

01:03:13 And then I saw, is it supported?

01:03:14 Like, not really.

01:03:15 You really shouldn't use this.

01:03:16 I'm like, oh, but it looks so good.

01:03:18 Like, I want so much of this stuff to be moved over.

01:03:20 So that's cool.

01:03:21 And I think some of it will be difficult to move over.

01:03:24 Like, in moving the entire JIT over, the JIT's written in C++.

01:03:28 Obviously, the CPython core developers were open to C++ for a JIT at one point in time with Unladen Swallow.

01:03:35 Whether or not that feeling has changed, who knows.

01:03:40 But it's a big piece of code to drop in.

01:03:42 So one thing that we really want to do going forward is actually get to the point where the big pieces of Cinder are actually just pip installable.

01:03:51 So we'll work on getting the hooks that we need upstreamed.

01:03:55 One thing that the JIT relies on a lot is dictionary watchers, so that we can do really super fast global loads.

01:04:02 And we have a bunch of hooks into, like, type modification and function modification that aren't super onerous by any means.
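
[Editor's sketch] One way to picture a dictionary watcher is a namespace that notifies interested parties when a key changes, so a cached global stays valid until it is told otherwise. The classes below are hypothetical and exist only for this illustration; they are not Cinder's hooks, and this toy only observes item assignment and deletion (methods such as `dict.update` bypass it).

```python
class WatchedDict(dict):
    """Toy namespace that calls registered callbacks when a key changes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        for callback in self._watchers:
            callback(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        for callback in self._watchers:
            callback(key)


class CachedGlobal:
    """Caches one global's value until the watcher reports a change."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.valid = False
        self.value = None
        self.lookups = 0  # how many real dictionary lookups were needed
        namespace.watch(self._invalidate)

    def _invalidate(self, key):
        if key == self.name:
            self.valid = False

    def load(self):
        if not self.valid:
            self.lookups += 1
            self.value = self.namespace[self.name]
            self.valid = True
        return self.value


module_globals = WatchedDict(limit=10)
cached = CachedGlobal(module_globals, "limit")
before = [cached.load(), cached.load()]  # one real lookup, one cached
module_globals["limit"] = 20             # watcher invalidates the cache
after = cached.load()                    # second real lookup
```

The common case, where the global never changes, costs a flag check instead of a dictionary lookup; the rare write pays the cost of notifying watchers.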

01:04:10 Yeah.

01:04:11 So if we can get those upstream, then we can make the JIT just be, here, pip install us.

01:04:15 And so hopefully we can get those upstreamed in 3.11 and have pip install cinder start working.

01:04:24 Yeah.

01:04:25 That'd be awesome.

01:04:26 Yeah.

01:04:26 So, yeah, really good work on these.

01:04:27 I guess let's wrap up our conversation here because we're definitely short on time.

01:04:31 But, you know, there's the other projects, which I'm going to start calling the Shannon Plan that Mark and Guido are working on.

01:04:39 They've been working on for a year.

01:04:41 And then there's Pyjion, which, by the way, Anthony Shaw has taken over, but you created Pyjion, right?

01:04:48 Yep.

01:04:48 That's awesome.

01:04:49 Well done on that.

01:04:51 On a whim at a playtime.

01:04:54 Exactly.

01:04:55 And Sam Gross's work on the nogil stuff.

01:04:59 All of this seems to be independent, but in the same area as those things.

01:05:04 Where do you see the synergies?

01:05:05 Do you see any chance for those to, like, come together?

01:05:07 Is that through some kind of PEP putting the right hooks in there and other people plugging in what they want?

01:05:12 Or what do you see there?

01:05:14 It'd be great if these could come together a little bit.

01:05:16 Yeah.

01:05:16 In a lot of places, we're working on independent things.

01:05:21 Obviously, Pyjion is a JIT and we're a JIT.

01:05:23 With different goals to some degree, right?

01:05:25 Yeah.

01:05:26 But, I mean, also very similar and overlapping goals.

01:05:30 I think there'll probably have to be discussion of, like, what the future of JITs look like in CPython.

01:05:37 Like, is that something that's part of the core?

01:05:39 Or is that something that should live on as being external?

01:05:43 Or is there going to be a grand competition and at one point one of the JITs will win?

01:05:48 Who knows?

01:05:49 It's a good discussion that should probably take place.

01:05:52 The hooks for JITs are there.

01:05:54 And between what Brett and I added for Pyjion and Mark Shannon's vectorcall work that happened several releases ago,

01:06:03 I think JITs have a pretty good foundation for hooking in and replacing code execution.

01:06:08 They probably need other hooks to, you know, get into other things like the dictionary watchers that I mentioned.

01:06:15 But, like, we can keep working on hooks.

01:06:18 Other things have less overlap.

01:06:20 So, hopefully we can all kind of work in our own streets and work to improve things and make those available to Python developers in the best way that's available.

01:06:32 And not be stomping on each other's shoes or duplicating work too much.

01:06:37 Yeah, absolutely.

01:06:37 Well, it's an exciting time.

01:06:40 I feel like a lot of stuff is sort of coming back to the forefront.

01:06:42 And it feels like...

01:06:44 So much performance work.

01:06:45 Yeah, for sure.

01:06:46 It feels like the core developers are open to hearing about it and taking on some of the, you know, the disruption and complexity that might come from it.

01:06:54 But still, it could be valuable, right?

01:06:56 Mm-hmm.

01:06:57 It's absolutely going to be valuable.

01:06:59 Yeah.

01:07:00 I feel like there's enough pressure from other languages like Go and Rust and stuff.

01:07:03 Oh, you should come over to our world and forget that Python stuff.

01:07:07 You're like, hold on, hold on, hold on.

01:07:08 We can just...

01:07:09 We can do that too.

01:07:10 But we've got to...

01:07:11 We can get faster.

01:07:12 Yeah.

01:07:12 Well, this is awesome work.

01:07:13 Thanks for coming on and sharing.

01:07:15 Thank you for having me.

01:07:16 Yeah, you and your team are doing...

01:07:18 Now, before you get out of here, got the final two questions.

01:07:20 Okay.

01:07:21 You're going to write some...

01:07:22 Let's do notable PyPI package first.

01:07:24 So is there some library or notable package out there that you come across?

01:07:28 Like, oh, this thing's awesome.

01:07:29 People should know about whatever.

01:07:31 So does it have to be PyPI?

01:07:33 No, any project.

01:07:34 So as I said, I have a very weird relationship with Python, right?

01:07:38 As in, using it mainly from the implementation side.

01:07:42 So I think my favorite package is the standard library.

01:07:45 Okay.

01:07:46 Right on.

01:07:47 And if I had to pick something out of the standard library, I think one of the coolest parts is

01:07:51 mock.

01:07:52 It's been an interesting integration with static Python, but like it, like seeing the way people

01:07:59 use it and drive their tests, it's kind of really kind of amazing.

01:08:03 Yeah, I agree.

01:08:04 It's definitely a very cool one people should certainly be using.

01:08:07 And now if you're going to write some Python code, you might also have special requirements

01:08:10 that shift you in one way or the other.

01:08:12 But what editor are you using?

01:08:13 Oh, I use VS Code pretty much.

01:08:15 Well, I use VS Code.

01:08:17 I use nano when I need to make a quick edit from the command prompt.

01:08:21 Yeah, cool.

01:08:21 I'm a fan of nano as well.

01:08:22 Like, let's just keep it simple.

01:08:23 It's just give me a nano.

01:08:25 Let me edit this thing over the shell.

01:08:26 It has syntax color highlighting and that.

01:08:29 It's so advanced.

01:08:31 It's awesome.

01:08:31 Cool.

01:08:32 No, no, I use it as well.

01:08:33 All right.

01:08:33 Well, Dino, thank you so much for being here.

01:08:35 Final call to action.

01:08:36 People are excited about these ideas.

01:08:38 Maybe they want to contribute back or try them out.

01:08:40 What do you say?

01:08:41 I mean, try out Cinder.

01:08:42 Yeah, it's unsupported.

01:08:43 But, you know, if you have thoughts on it, that's cool.

01:08:46 You do have instructions on how to build it right here.

01:08:48 So you could check it out.

01:08:49 There's a Docker container.

01:08:50 Yeah.

01:08:51 Okay.

01:08:51 Yeah.

01:08:52 So it's pretty easy to give it a shot.

01:08:55 You know, like, it might be harder to get it up and running in a perf-sensitive environment.

01:09:01 If you want to try out Static Python, that'd be cool.

01:09:04 Or Strict Modules.

01:09:05 And give us any feedback you have on those.

01:09:08 Fantastic.

01:09:09 All right.

01:09:09 Well, thanks for being on the show.

01:09:10 Great to chat with you.

01:09:11 Thank you, Michael.

01:09:12 Yeah, you bet.

01:09:13 Bye.

01:09:13 See ya.

01:09:14 See ya.

01:09:14 This has been another episode of Talk Python to Me.

01:09:18 Thank you to our sponsors.

01:09:20 Be sure to check out what they're offering.

01:09:21 It really helps support the show.

01:09:23 Take some stress out of your life.

01:09:25 Get notified immediately about errors and performance issues in your web or mobile applications with

01:09:30 Sentry.

01:09:30 Just visit talkpython.fm/sentry and get started for free.

01:09:35 And be sure to use the promo code TALKPYTHON, all one word.

01:09:39 With TopTal, you get quality talent without the whole hiring process.

01:09:44 Start 80% closer to success by working with TopTal.

01:09:47 Just visit talkpython.fm/TopTal to get started.

01:09:52 Want to level up your Python?

01:09:54 We have one of the largest catalogs of Python video courses over at Talk Python.

01:09:58 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:10:03 And best of all, there's not a subscription in sight.

01:10:06 Check it out for yourself at training.talkpython.fm.

01:10:09 Be sure to subscribe to the show.

01:10:11 Open your favorite podcast app and search for Python.

01:10:14 We should be right at the top.

01:10:15 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:10:20 and the direct RSS feed at /rss on talkpython.fm.

01:10:25 We're live streaming most of our recordings these days.

01:10:28 If you want to be part of the show and have your comments featured on the air,

01:10:31 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:36 This is your host, Michael Kennedy.

01:10:37 Thanks so much for listening.

01:10:39 I really appreciate it.

01:10:40 Now get out there and write some Python code.

01:10:42 I'll see you next time.
