#339: Making Python Faster with Guido and Mark Transcript
00:00 There has been a bunch of renewed interest in making Python faster.
00:03 While for some of us Python is already plenty fast, for others, such as those in data science, scientific computing, and even large tech companies, making Python even a little faster would be a big deal.
00:15 This episode is the first of several that dive into some of the active efforts to increase the speed of Python while maintaining compatibility with existing code and packages.
00:24 And who better to help kick this off than Guido van Rossum and Mark Shannon?
00:29 They both joined us to share their project to make Python faster. I'm sure you'll love hearing about what they're up to.
00:35 This is Talk Python to Me, episode 339, recorded November 1, 2021. Welcome to Talk Python, a weekly podcast on Python.
00:57 This is your host, Michael Kennedy.
00:59 Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
01:08 We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at 'talkpython.fm/youtube' to get notified about upcoming shows and be part of that episode.
01:20 This episode is brought to you by Shortcut and Linode, and the transcripts are sponsored by AssemblyAI. Mark, Guido,
01:29 welcome to Talk Python to Me.
01:31 Fantastic to have you here.
01:33 I'm so excited about all the things that are happening around Python performance.
01:37 I feel like there's just a bunch of new ideas springing up and people working on it, and it's exciting times, definitely.
01:44 You two are, of course, right at the center of it. But before we talk about the performance work that you are doing, as well as some of the other initiatives going along, maybe in parallel there,
01:55 let's just get started with a little bit of background on you.
01:58 Guido, you've been on the show before.
02:00 Creator of Python. You hardly need an introduction to most people out there, but you have recently made a couple of big changes in your life. I thought I'd just ask you how that's going.
02:09 You retired, and we were all super happy for you on that. And then you said, you know what, I kind of want to play with code some more. And now you're at Microsoft. What's the story there?
02:17 Oh, I just like the idea of retiring. So I try to see how many times in a lifetime I can retire.
02:24 To start with, my retirement as BDFL didn't stop me from staying super active in the community.
02:31 But when I retired from Dropbox a little over two years ago, I really thought that that was it. I believed it, and everybody else believed it too. Dropbox
02:41 certainly believed it.
02:42 They were very sad to see me go.
02:44 I was sad to go, but I thought it was time, and I had a few great months decompressing, going on bike rides with my wife and family.
02:56 Fun stuff.
02:57 And then the pandemic hit and a bunch of things got harder.
03:02 Fortunately, the bike rides eventually got restored, but other activities, like eating out, were a lot more stressful.
03:09 Basically, just life was a lot more stressful in general.
03:12 And human interaction was definitely shrunken down to a kernel.
03:17 And somehow I thought, Well, I want to have something to do.
03:22 I want to do more sort of software development in a team.
03:27 And the Python core development team didn't really cut it for me because it's sort of diffuse and volunteer based.
03:34 And sometimes you get stuck waiting for months for the Steering Council to sort of approve or reject a certain idea that you worked on.
03:45 So I asked around and I found that Microsoft was super interested in hiring me, and that was now exactly a month...
03:55 A year ago, I started at Microsoft.
03:59 In the beginning, I just had to find my way around at Microsoft.
04:04 Eventually I figured I should pick a project.
04:07 And after looking around and realizing I couldn't really sort of turn the world of machine learning upside down, I figured I'd stay closer to home and see if Microsoft was interested in funding a team working on speeding up CPython. And I was actually inspired by Mark's proposals that were going around at the time.
04:28 So I convinced Microsoft to sort of start a small team and get Mark on board.
04:35 Yeah, that's fantastic.
04:36 I also feel a little bit like machine learning is amazing, but I don't have a lot of experience with it. And whenever I work with it, I always kind of feel on the outside of it. But this core performance of Python, that helps everybody, including even Microsoft. It maybe saves them, absolutely,
04:54 energy in Azure when they're running Python workloads or whatever.
04:58 So are you enjoying your time? Are you happy there?
05:00 I'm very happy.
05:02 A lot of freedom to basically pursue what you want. Right.
05:04 And it's nice that the new Microsoft is very open source friendly, at least in many cases.
05:10 Obviously not everywhere.
05:12 But our department is very open source friendly. Things like Visual Studio Code are all open source.
05:18 And so there was great support with management for sort of the way I said I wanted to do this project, which is completely out in the open.
05:28 Everything we do is sort of just merged into main as soon as we can.
05:34 We work with the core developers.
05:36 We don't have, like, a private fork of Python where we do amazing stuff.
05:41 And then we knock on the Steering Council's door and say, hey, we'd like to merge this.
05:48 You're not going to drop six months of work just in one block, right.
05:52 It's there for everyone to see.
05:54 I think that's really positive.
05:56 And wow, what a change.
05:58 Not just for Microsoft, but so many companies to work that way compared to ten or 15 years ago.
06:05 Now, before I get to Mark, I just want to say a bunch of people are excited that you're here.
06:10 And Luis, out in the audience said, wow, it's Guido. I can't thank you enough for your amazing Python and all the community.
06:17 Great to hear. Mark, how about you? How did you get into this Python performance thing?
06:21 I know you did some stuff with HotPy back in the day.
06:24 Yeah, that was sort of my PhD work, so I guess I kind of got into the performance almost before the Python.
06:33 So I did compiler work for my Master's, and obviously you just need to write scripts and just get stuff done.
06:43 Python is just a language to get stuff done. And then, I think it's Armin Rigo, I think, one of his sort of credits in one of his papers or something:
06:52 thank you to Python for being such a great language to use and such a challenge to optimize.
06:57 So it's definitely good: it provides this great intellectual challenge when you're actually trying to optimize it.
07:04 And it's a really nice language to use as well. So it's doubly good.
07:07 It is definitely good. Before we move on, really quick: Paul Everitt says it's really impressive how the in-the-open work has been done.
07:15 I totally agree. Hi, Paul.
07:17 Yeah. Keep that going. Hey, Paul, happy to see you here.
07:19 We're going to talk about making Python faster, but I want to start this conversation.
07:24 It's a bit of a hypothetical question, but sort of set the stage and ask, how much does Python really need to be faster?
07:31 Because on one hand, sure, there's a lot more performance we can get. If you're going to say, well, we're going to solve the n-body problem using C++ or C# versus Python, it's going to be faster with the native value types and whatnot. On the other, people are building amazing software that runs really fast with Python already.
07:49 We've got the C optimizations for things like NumPy and SQLAlchemy's transformation layer, serialization layer, and so on.
07:58 So a lot of times that kind of brings it back to C performance. So how much do you think Python really needs to be optimized already?
08:06 Now, more is always better, faster is always better. But I just kind of want to set the stage and get your two thoughts on that.
08:11 I always think back to my experience at Dropbox, where there was a large server called the Meta Server, which did sort of all the server-side work. Like, anything that hits www.dropbox.com
08:25 hits that server, and that server was initially a small prototype written in Python.
08:33 The client was actually also a small prototype written in Python.
08:37 And to this day, both the server and the client at Dropbox, as far as I know, unless in the last two years they totally ripped it apart, but I don't think they did,
08:46 they tweaked it, but they're still very large Python applications.
08:51 And so Dropbox really sort of feels the speed of Python in its budget, because they have thousands,
09:02 I don't know how many thousands, of machines that all run this enormous Python application.
09:08 Right. If it was four times faster, that's not just a quarter of the machines, that's less DevOps, less admin, all sorts of stuff. Right?
09:17 Even if it was 4% faster, they would notice.
09:20 The other area where I think it's really relevant has to do with the multicore side of things.
09:26 I have a PC over there.
09:28 16 cores.
09:29 My new laptop has ten cores, although with Python, it's hard to take true advantage of that side of modern CPU performance if it's not IO bound, right?
09:39 I don't know how deep you want me to go into that, and Mark can stop me if I'm going too deep too. But there are existing patterns that work reasonably well.
09:50 If you have a server application that handles multiple, fairly independent requests, like if you're building a multicore web application, you can use multiprocessing or pre-forking or a variety of ways of running a Python interpreter on each core that you have, each independently handling requests.
10:13 And you can do that. If you have 64 cores, you run 64 Python processes.
10:18 Maybe that's just a number in a uWSGI config file. It's nothing.
10:22 It works for applications that are designed to sort of handle multiple independent requests in a scalable fashion.
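The one-process-per-core pattern Guido describes can be sketched with the standard library's `multiprocessing` module. This is only a minimal illustration, assuming CPU-bound requests that share no state; `handle_request` and `serve` are hypothetical names, and a real web deployment would normally leave the process management to its application server.

```python
import os
from multiprocessing import Pool


def handle_request(n):
    """Stand-in for an independent, CPU-bound request handler (hypothetical)."""
    return sum(i * i for i in range(n))


def serve(requests, workers=None):
    """Fan independent requests out to a pool with one worker process per core."""
    workers = workers or os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        # Each worker is a separate Python interpreter with its own GIL.
        return pool.map(handle_request, requests)


if __name__ == "__main__":
    print(serve([100_000, 200_000, 300_000, 400_000]))
```

Because every worker is its own interpreter process, the requests run in parallel across cores, at the cost of not sharing memory between them.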
10:31 There are other things, other algorithms that you would want to execute, where it's much more complicated to sort of employ all your cores efficiently.
10:43 That's still a nut that Python hasn't cracked.
10:46 And I'm assuming you're asking this question because Sam Gross, very smart developer at Facebook, claims that he has cracked it.
10:54 Perhaps he has.
10:55 It's an interesting idea.
10:56 We'll dive into that a little bit later.
10:58 I'm more asking it just because I see a lot of people say that Python is too slow.
11:03 And then I also see a lot of people being very successful with it, and it not being slow in practice or not being much slower than other things. And so I'm more or less at the stage of, like, the context matters.
11:16 In this Dropbox example you have, it really matters to them.
11:20 My course website where people take courses, the response time of the pages is 40 milliseconds. If it was 38, it doesn't matter.
11:27 It's really fast. It's fine.
11:29 But if I was trying to do computational biology in Python, I'd really want to be able to take advantage of the 16 cores.
11:35 So there's just such a variety of perspectives where it matters.
11:40 Mark, what are your thoughts on all this?
11:41 Well, it's just a case of saving energy, saving time.
11:45 It just makes the whole thing nice to use.
11:47 So there's a lot of iterative development in data science, and there's that responsiveness: the whole breaking your train of thought because things take too long, versus just keeping in the flow.
12:02 It's just nice to have something that's faster.
12:04 I mean, it's not just the big companies saving money as well.
12:07 It just keeps everyone's server budgets down if you just need a smaller virtual instance because you can serve the requests fast enough because Python is faster. So just generally it's a responsible thing to do.
12:21 So people expect technology to move forward.
12:24 There is this feeling of falling behind, or people wanting to move to other languages because of the perceived performance.
12:31 I do think that that's an issue.
12:32 I'm moving to Go because it has better async support, or rewriting this in Rust for whatever reason.
12:37 Sometimes that might make sense. But other times I feel like that's just a shame.
12:41 And it could be better.
12:42 A couple of questions from the audience just want to throw out there.
12:45 Let's see.
12:46 One was, especially, you must be really proud to hear about the Mars helicopter and the lander and Python in space.
12:55 How did you feel when you heard about the helicopter using Python and the lander using Python and Flask and things like that?
13:01 It wasn't really a surprise given how popular Python is amongst scientists.
13:07 So I didn't throw a party, but it made me feel good. I mean, it's definitely sort of one of those accomplishments for a piece of technology: if it's actually shot into space, you know you've made a difference.
13:22 I remember, like, 30 years ago or more, when I helped with some coding on a European project called Amoeba, which was like a little distributed operating system.
13:33 And one of the things that they always boasted was that our software runs on the European space station.
13:39 That was very important.
13:41 I totally get the feeling. And I hope that everyone who contributed to Python also sort of feels that their contribution has made that sense of awe.
13:51 You look up in the night sky, that little bright star that's actually Mars, and you think, yeah, it's up there.
13:57 All right.
13:58 Let's dive into some of the performance stuff that you all have been doing.
14:02 So maybe, Guido, start us out with the team.
14:05 You've got a group of folks working together.
14:07 It's not just you. And also now Mark Shannon is working with you as well. Right.
14:11 That's correct.
14:13 In March or so, the initial team was Eric Snow, Mark and myself.
14:18 And since, I think, early October, we've got a fourth team member, Brandt Bucher, who has also been a Python core dev for, I think, about a year and a half. He's a really smart guy.
14:31 So now we have four people.
14:33 Except you should really discount me as a team member, because I spend most of my time in meetings, either with the team or with other things going on at Microsoft.
14:45 In practice, how closely do you work with, say, the VS Code Python plugin team and other parts? Or is this more of a focused effort?
14:51 This is more focused.
14:52 I know those people. I've not met anyone in person.
14:56 Of course not.
14:57 I've not been to a Microsoft office since I started there.
15:01 Which is really crazy.
15:03 But what we're doing is really quite separate from other sort of Python related projects at Microsoft, but I do get called into meetings to give my opinion or sort of what I know about how the community is feeling or how the core dev team is feeling about various things that are interesting to Microsoft, or sometimes things that management is concerned about.
15:28 It'd be worth saying it's not just Microsoft that contributes as well. There are quite a few other core developers helping out.
15:36 It's a broader effort.
15:39 This portion of Talk Python to Me is brought to you by Shortcut, formerly known as Clubhouse.io.
15:44 Happy with your project management tool?
15:46 Most tools are either too simple for a growing engineering team to manage everything or way too complex for anyone to want to use them without constant prodding.
15:54 Shortcut is different, though, because it's worse.
15:57 Wait, no, I mean it's better.
15:59 Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible, powerful, and many other nice, positive adjectives.
16:07 Key features include team based workflows.
16:10 Individual teams can use default workflows or customize them to match the way they work.
16:15 Orgwide goals and roadmaps.
16:17 The work in these workflows is automatically tied into larger company goals.
16:21 It takes one click to move from a roadmap to a team's work to individual updates and back. Tight version control integration: whether you use GitHub, GitLab, or Bitbucket, Clubhouse ties directly into them so you can update progress from the command line. Keyboard-friendly interface: the rest of Shortcut is just as friendly as their power bar, allowing you to do virtually anything without touching your mouse.
16:44 Throw that thing in the trash. Iteration planning: set weekly priorities, and let Shortcut run the schedule for you with accompanying burndown charts and other reporting.
16:53 Give it a try over at 'talkpython.fm/shortcut'.
16:58 Again, that's 'talkpython.fm/shortcut'. Choose Shortcut because you shouldn't have to project manage your project management.
17:07 Mark, what's your role on the team?
17:09 I don't really have sort of an official role, but I guess I'm sort of doing a fair bit of technical, sort of architectural stuff, obviously, because it's just, like, my field, right.
17:21 Yeah, I guess so.
17:23 All right. Guido, you gave a talk at the Python Language Summit in May this year talking about faster Python, this team, and some of the work that you're doing.
17:31 So I thought that might be a good place to start the conversation.
17:35 Yeah, some of the content there is a little outdated.
17:38 Well, you just have to let me know when things have changed.
17:42 So one of the questions you ask is, can we make CPython specifically faster?
17:47 And I think that's also worth pointing out. Right? There are many runtimes.
17:51 Often they're called interpreters. I prefer the word runtime, because sometimes they compile and they don't interpret. Sometimes they're called virtual machines.
17:58 Yeah, there are many Python virtual machines: PyPy, CPython.
18:04 Traditionally there's been Jython and IronPython, although I don't know if they're doing anything. But your focus and your energy is about how do we make the Python people get if they just go to their terminal and type python, the main Python, faster. Because that's what people are using, right?
18:18 For the most part.
18:19 I don't have specific numbers or sources, but I believe that between 95% and 99% of people using Python are using some version of CPython. Hopefully not too many of them are still using Python 2.
18:32 Yes, I would totally agree with that. And I would think it would trend more towards the 99 and less towards the 95, for sure. Maybe a fork of CPython that they've done something weird to. But yeah, I would say CPython.
18:43 So you ask the question, can we speed up CPython? And Teddy, out in the live stream... let me catch this comment.
18:52 What will we lose in making Python faster, if anything? For example, what are the trade-offs? So you point out, well, can we make it two times faster, ten times faster, and then without breaking anybody's code?
19:02 Because I think we just went through a 2 to 3 type of thing that was way more drawn out than I feel like it should have been.
19:09 We don't want to reset that again, do we?
19:11 Well, obviously the numbers on this slide are just teasers.
19:15 Of course, I don't know how to do it.
19:17 I think Mark has a plan, but that doesn't necessarily mean he knows how to do it exactly.
19:23 The key thing is, and sort of to answer your audience question: without breaking anybody's code. So we're really trying to sort of not have there be any downsides to adopting this new version of Python, which is unusual, because definitely if you use PyPy, which is, I think, the only sort of competitor that competes on speed that is still alive and in some use, you pay in terms of how well it works with extension modules.
19:57 It does not work with all extension modules, and with some extension modules, it works, but it's slower.
20:04 There are various limitations, and that in particular, is something that has kept many similar attempts back.
20:13 If we just give this up, we can have X, Y, and Z, right. But those turn out to be pretty big compromises.
20:20 And quite often extension modules are the issue.
20:24 Sometimes there are also things where Python's runtime semantics are not fully specified.
20:31 It's not defined by the language when exactly objects are finalized, when they go out of scope.
20:38 In practice, there's a lot of code around that in very subtle ways depends on CPython's finalization semantics based on reference counting.
20:50 And this is also something that PyPy learned, and I think Pyston, which is definitely alive and open source.
20:58 You should talk to the Pyston guys if you haven't already.
21:00 But their first version, which they developed many years ago with Dropbox, suffered from sort of imprecise finalization semantics.
21:10 And they found with early tests on the Dropbox server code that there was too much behavior that didn't work right,
21:20 because objects weren't always finalized at the same time, or sometimes in the same order, as they were in standard CPython.
21:28 Oh, interesting.
21:29 So there's no promises about that, right? It just says, Well, when you're done with it, it goes away pretty much eventually.
21:35 If it's reference counted, it might go away quickly.
21:37 If it's a cycle, it might go away slower.
21:39 That's correct.
21:40 And unfortunately, this is one of those unspecified parts of the language that people in practice all depend on. Not everybody, obviously, but many large production code bases do end up depending on that, not sort of intentionally. It's not that a bunch of application architects got together and said, we're going to depend on precise finalization based on reference counting. It's more that those servers, like the 5 million lines of server code that Dropbox had when I left, were written by hundreds of different engineers, some of whom wrote only one function or two lines of code, some of whom sort of maintained several entire subsystems for years.
22:25 But collectively, it's a very large number of people who don't all have the same understanding of how Python works and which part is part of the promises of the language, and which is just sort of how the implementation happens to work.
22:41 And some of those are pretty obvious.
22:43 I mean, sometimes there are functions where the documentation says, Well, you can use this, but it's not guaranteed that this function exists or that it always behaves the same way.
22:54 But the sort of the finalization behavior is pretty implicit.
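The implicit dependence on finalization timing discussed here can be made visible with a small experiment. This is a sketch that assumes it runs on CPython; `Resource` and `use_resource` are hypothetical names, and other runtimes may legitimately finalize the first object later than CPython does.

```python
import gc

events = []  # records the order in which objects are finalized


class Resource:
    """Hypothetical resource whose cleanup lives in __del__."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(self.name)


def use_resource():
    r = Resource("local")
    # No explicit cleanup: this relies on r being finalized when the function returns.


use_resource()
# On CPython the reference count hits zero at return, so __del__ has already run.
finalized_on_return = "local" in events

gc.disable()  # keep the cycle collector from running on its own during the demo
cyclic = Resource("cycle")
cyclic.me = cyclic  # a reference cycle: the count never reaches zero by itself
del cyclic
finalized_before_collect = "cycle" in events
gc.collect()  # the cycle collector finds the garbage cycle and finalizes it
finalized_after_collect = "cycle" in events
gc.enable()
```

Code that needs deterministic cleanup on every runtime is usually better served by a `with` block or an explicit `close()` call than by relying on when `__del__` happens to run.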
22:57 Mark, what are your thoughts here?
22:58 People's expectations are derived from what they use, that's the
23:01 trouble. And documentation is like instructions: they don't always get read.
23:06 And also it's not just the finalization.
23:08 It's also reclaiming memory.
23:09 So anything that has a different memory management system might just need more memory. Reference counting is pretty good at reclaiming memory quickly, and will run near the limit of what you have available.
23:21 Whereas a sort of more tracing garbage collector, like PyPy's, doesn't always work so well like that.
23:26 One thing we are going to change is the performance characteristics. Now, that should generally be a good thing. But there may be people who rely on more consistent performance.
23:35 You may end up unearthing race conditions, potentially, that no one really knew were there.
23:41 But I would not blame you for making Python faster if people who write bad, poorly thread-safe code fall into some trap there. But I guess there are even those kinds of unintended consequences.
23:51 I guess that one sounds like pretty low risk.
23:53 To be honest.
23:55 Also, the warm-up time: we'll get a warm-up time now. What will happen is, of course, it's just getting faster, so it's no slower to start with, but there's still the perception that it now takes a while to get up to speed, whereas previously it used to get up to speed very quickly because it didn't really get up to speed.
24:12 It stays around.
24:14 It stayed at the same speed.
24:15 But these are subtle things, but they're detectable changes that people may notice.
24:21 Like with any optimizer,
24:22 there are certain situations where the optimization doesn't really work.
24:27 It's not necessarily a pessimization, but somehow it's not any faster than previous versions, while other similar code may run much faster.
24:37 And so you have this strange effect that you make a small tweak to your code, which you think should not affect performance at all, or you're not aware that suddenly you've made that part of your code 20% slower.
24:52 Yeah, it is one of our design goals not to have these surprising performance edges, but, yeah, there are cases where it might actually make a difference.
25:00 Things will get a bit slower.
25:01 Yeah, there are very subtle things that can have huge performance differences that I think people who are newer to Python run into. Like, oh, I see you can do this
25:11 comprehension, and I had square brackets, but I saw they had parentheses.
25:14 So that's the same thing, right?
25:16 Not so much. Not if it's a million lines of code or a million lines of data.
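The square-brackets-versus-parentheses difference mentioned here is the difference between a list comprehension and a generator expression. A minimal sketch of how they diverge at a million items (the variable names are just for illustration):

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # square brackets: builds all n values up front
squares_gen = (i * i for i in range(n))   # parentheses: a lazy generator, nothing computed yet

list_size = sys.getsizeof(squares_list)   # grows with n (the list's own pointer array)
gen_size = sys.getsizeof(squares_gen)     # small and independent of n

total_first_pass = sum(squares_gen)       # consumes the generator
total_second_pass = sum(squares_gen)      # generator is exhausted, so this sums nothing
total_from_list = sum(squares_list)       # a list can be iterated as often as needed
```

The generator saves memory but can only be consumed once, while the list pays for its memory up front and can be reused, which is why swapping one for the other is not a neutral change.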
25:22 All right.
25:23 So that's a great way to think about it. Not breaking a lot of code: I think, as much as it's exciting to think about completely reinventing it, it's super important that we just have a lot of consistency now that we've kind of just moved beyond the Python 2 versus 3 type of thing.
25:38 Also, it's worth mentioning, Guido, you gave a shout-out to Sam Gross's proposal.
25:43 The stuff you're doing is not Sam Gross's proposal. It's not even, from what I can see from the outside, that much about threading. It's more about how do I make just the fundamental stuff of Python go faster? Is that right?
25:55 That's right. These are, like completely different developments.
25:59 When we started this, we didn't actually know Sam or that there was anyone who was working on something like that.
26:06 But there have been previous attempts to remove the GIL, which is what Sam has done.
26:14 And the most recent one of those was by Larry Hastings, who came up with the great name, the Gilectomy.
26:19 That's a fantastic name.
26:21 Yeah, he put a lot of time in it, but in the end, he had to give up because the baseline performance was just significantly slower than the vanilla interpreter.
26:33 And I believe it also didn't scale all that well, although I don't remember whether it sort of stopped scaling at five or ten or 20 cores. But Sam claims that he's sort of got the baseline performance,
26:49 I think, within 10% or so of vanilla 3.9, which is what he's worked off.
26:55 And he also claims that he has a very scalable solution, and he obviously put much more effort in it, much more time in it than Larry ever had.
27:04 It sounds like Facebook is putting some effort into funding his work on that, which is great.
27:09 But it feels like a very sort of bottom up project.
27:12 It feels like Sam thought that this was an interesting challenge, and he sort of convinced himself that he could do it.
27:20 And he sort of gradually worked on all the different problems that he encountered on the way.
27:26 And he convinced his manager that this was a good use of his time.
27:31 It's my theory, because that's usually how these projects go.
27:35 But you almost never have management say, oh, we've got to find an engineer to make Python faster or make it multicore or whatever.
27:44 Find a good engineer.
27:45 It's probably like that Dropbox story you told we have all these servers. There's a lot to maintain.
27:51 Hey, what if we could have fewer of them?
27:53 What if we could do better?
27:55 That's something a manager could totally get behind.
27:57 All right.
27:58 So you all are adopting what I see is going as the Shannon plan, as in Mark Shannon, I guess, in the top left here.
28:05 That's fantastic.
28:05 I remember talking about this as well that you had posted this thing.
28:11 When was this back?
28:12 A little over a year ago. So interesting time in there.
28:16 You had talked about making Python faster over the next four releases by a factor of five, which is pretty awesome.
28:22 And you have a concrete plan to sort of make changes along each yearly release to add a little bit of performance, which, because of geometric growth, may get quite a bit faster over time.
28:33 Do you want to run through these?
28:35 Tell us about your plan.
28:36 You've got four stages.
28:37 Maybe we could talk through each stage and focus on some of the tech there.
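The geometric growth mentioned above is easy to check with a little arithmetic. This sketch assumes the same multiplicative gain in every release, which is a simplification rather than a figure quoted in the conversation; only the factor of five and the four releases come from the discussion.

```python
target_speedup = 5.0  # overall factor mentioned for the plan
releases = 4          # number of yearly releases it is spread over

# If each release multiplies speed by the same factor f, then f ** releases == target.
per_release = target_speedup ** (1 / releases)

# Cumulative speedup after each release under that assumption.
cumulative = [per_release ** k for k in range(1, releases + 1)]
```

Under that assumption each release needs to be roughly one and a half times as fast as the previous one, and the gains compound to the overall factor of five.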
28:40 The way we're implementing it is now kind of a bit of a jumble of stages one and two. But the basic idea is that, for dynamic languages, the key performance improvement is always based on specialization.
28:51 So obviously, most of the time the code does mostly the same thing as it did last time.
28:58 And even in, like, non-loopy code for a web server, there's still a big loop at, sort of, the request-response level.
29:07 So you're still hitting the same sort of code.
29:09 And that code is doing much the same sort of thing. And the idea is that you multiply the code so it works for those particular cases.
29:17 Specializing: the obvious sort of simple stuff is, like, binary arithmetic.
29:21 You have a special version for adding ints, a special version for floats.
29:25 Obviously, in Python there's much more to do: special versions for calling different things, different attributes, and all this sort of stuff. That's the key first stage. That's mixed in with the second stage, which is really much more to do with just doing lots and lots of little bits and tweaks to memory layout, so that's to do with better memory layout.
29:45 Modern CPUs are extremely efficient, but you still have speed-of-light issues with fetching stuff from memory.
29:52 So how things are laid out in memory is key to performance. And it's just those little bits and tweaks here and there, and just kind of writing the code as we would if it had been written for speed in the first place.
30:03 A lot of CPython is old and it's just sort of evolved, and a lot of it has this lost potential for just sort of rearranging data structures and rearranging the code and so on.
30:15 And these all add up: a few percent here, a few percent there. And it doesn't take many of those to get a decent speedup.
30:21 So that's the first two stages and those are the ones where we have some pretty concrete idea of what we're doing, right.
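The specialization idea from stage one can be illustrated with a toy model in plain Python. This is not how CPython implements it, and it will not run any faster; `AdaptiveAdd`, `WARMUP`, and the counters are hypothetical names used only to show the shape of the technique: observe the operand types, install a specialized fast path behind a guard, and fall back to the general path when the guard fails.

```python
class AdaptiveAdd:
    """Toy model of one 'add' site that specializes on the types it observes."""

    WARMUP = 2  # hypothetical number of observations before specializing

    def __init__(self):
        self.seen = 0
        self.specialized_for = None  # the type the fast path currently assumes
        self.fast_path = None
        self.hits = 0
        self.misses = 0

    def __call__(self, a, b):
        if self.specialized_for is not None:
            # Guard: the fast path is only valid for the exact types it was built for.
            if type(a) is self.specialized_for and type(b) is self.specialized_for:
                self.hits += 1
                return self.fast_path(a, b)
            # Guard failed: drop the specialization and use the general path.
            self.misses += 1
            self.specialized_for = None
            self.fast_path = None
            self.seen = 0
            return a + b
        # General path: do the work and watch what types flow through.
        self.seen += 1
        if self.seen >= self.WARMUP and type(a) is type(b) and type(a) in (int, float):
            self.specialized_for = type(a)
            self.fast_path = type(a).__add__  # stands in for an int-only or float-only add
        return a + b


add_site = AdaptiveAdd()
results = [add_site(1, 2), add_site(3, 4), add_site(5, 6), add_site("a", "b")]
```

In this run the first two calls warm the site up, the third goes through the int-specialized path, and the string call fails the guard and resets the site to the general path.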
30:26 And this is the kind of stuff that will benefit everybody.
30:29 We all use numbers, we all do comparisons, we all do addition call functions and so on.
30:34 Yeah. I mean, the way we're sort of trending with performance at the moment is that for sort of webby type code, web back-end sort of code,
30:42 you'd be looking at, kind of where we are now, a 25 to 30% speedup, whereas if it's machine learning sort of numerical code, it's more likely to be in the 10% region.
30:53 Obviously, we'd hope to push both up.
30:54 I don't think we're particularly focused on either.
31:00 It's just often a case of where the next sort of obvious, convenient speedup lies.
31:06 Although everyone talks about speed ups, I've been doing the same myself.
31:09 It's best to think really of the time something takes to execute, so it's often just shaving off 1% of the time rather than speeding up by 1%. And obviously, as the overall runtime shrinks, what were marginal improvements become more valuable.
31:24 Shaving off 0.2% might not be worth it now, but once you've sped something up by a factor of three or four, then that suddenly becomes a percent.
31:32 And it's worth the effort.
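The point about marginal savings growing in relative value can be worked through with made-up numbers. This sketch assumes the small saving stays the same in absolute terms while the rest of the program gets faster; the runtime units are arbitrary and chosen only for illustration.

```python
baseline_runtime = 100.0  # arbitrary time units for the original program
small_saving = 0.2        # an optimization that removes 0.2 units of work

# Relative to the original runtime the saving is tiny.
share_before = small_saving / baseline_runtime

# After the program as a whole has been made four times faster...
overall_speedup = 4.0
faster_runtime = baseline_runtime / overall_speedup

# ...the same absolute saving is a four times larger share of what remains.
share_after = small_saving / faster_runtime
```

Here the saving goes from 0.2% of the runtime to 0.8%, which is the "suddenly becomes a percent" effect described above.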
31:35 This portion of Talk Python to Me is sponsored by Linode.
31:39 Cut your cloud bills in half with Linode's Linux virtual machines.
31:42 Develop, deploy, and scale your modern applications faster and easier.
31:47 Whether you're developing a personal project or managing larger workloads, you deserve simple, affordable and accessible cloud computing solutions.
31:54 Get started on Linode today with $100 in free credit. For listeners of Talk Python, you can find all the details over at 'talkpython.fm/linode'. Linode has data centers around the world with the same simple and consistent pricing, regardless of location.
32:10 Choose the data center that's nearest to you.
32:13 You also receive 24/7, 365 human support with no tiers or handoffs, regardless of your plan size.
32:21 Imagine that real human support for everyone.
32:24 You can choose shared or dedicated Compute instances, or you can use $100 in credit on S3 compatible object storage, managed Kubernetes clusters, and more.
32:34 If it runs on Linux, it runs on Linode.
32:36 Visit 'talkpython.fm/linode' and click the Create Free Account button to get started.
32:41 You can also find the link right in your podcast player show notes.
32:44 Thank you to Linode for supporting Talk Python.
32:49 Yeah, which leads on to stages three and four. So just in time compilation is always hailed as a way to speed up interpreted languages.
32:56 Now, before you move on.
32:57 Let me just sort of list out what you have in stage two, for people who haven't dived into this, because I think people hear this in the abstract and they kind of want to know some of the concrete details.
33:07 Okay, well, what actually are some of the things you all are considering? So, improved performance for integers less than one machine word.
33:15 It's been a long time since I've done C++. Is a word two bytes? How big is a word?
33:20 It depends on the machine. So that'll be 64 bits for pretty much anything now, apart from, like, a little tiny embedded system, which is 32.
33:27 So that's a lot of numbers.
33:30 Many of the numbers you work with are less than 2 billion or whatever that is.
33:34 Yeah, basically, there are two types of integers. There are big ones that are used for cryptography and other such things, where it's a number in a sort of mathematical sense. But it's really some elaborate code.
33:46 And then there are numbers that actually represent the number of things or the number of times you're going to do something.
33:50 And those are all relatively tiny and they all fit. The long ones used for cryptography and so on are relatively rare and they're quite expensive.
33:56 So it's the other ones we want to optimize for, because when you see an integer, those are the integers you get. They aren't in the quadrillions.
34:04 They're in the thousands.
34:08 A loop index or an array index or something.
34:11 Some languages do this. One that I'm thinking of, which maybe is also kind of close to where Guido is right now, in the Microsoft space, is C#, which treats integers sometimes as value types and sometimes as reference types, so that when you're doing loops and other stuff, they operate more like C++ numbers and less like pointers to PyLong objects.
34:35 Have you considered any of that kind of stuff? Is that what you're thinking?
34:38 An obvious thing, and an old thing as well, is to have tagged integers. So basically, where we would normally have a pointer, we've got a whole bunch of zeros at the end.
34:48 On a 64-bit machine it's three, and then for alignment, it's often effectively four zeros at the end.
34:55 So we're using a 16th of the possible numbers that a pointer could hold for pointers, which leaves a bunch for integers and floating point numbers.
35:05 So there's a number of what are called tagging schemes.
35:07 For example, LuaJIT, which is a very fast implementation of Lua, uses what's called NaN-boxing, where everything is a floating point number.
35:15 But there are something like 2 to the 53, which is a huge number, of not-a-numbers in the floating point range. So you can use a lot of those for integers or pointers. Now that's a little problematic with 64-bit pointers, because obviously 64 is bigger than 53.
35:29 But there are other schemes. A simple scheme is that basically the least significant bit is one for pointers and zero for integers, or vice versa, and that basically just gives you full machine performance for integers, because basically anything up to 63 bits fits in a 64-bit integer, and that's basically all of your numbers.
35:51 Okay. Because it's shifted across, all the machine arithmetic works as normal,
35:56 and overflow checks are just a single machine instruction, and things like this.
36:00 That's again pretty standard in any sort of fast Lisp implementation, and older Smalltalk and other historical languages.
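The low-bit tagging scheme described here can be modelled in Python to make the bit layout concrete. This is only a sketch under stated assumptions: a 64-bit word, low bit 0 for integers and 1 for pointers (one of the two options mentioned), and 8-byte-aligned addresses. All names (`tag_int`, `tag_pointer`, `add_tagged`, and so on) are hypothetical; nothing here is CPython's actual representation, and a real VM would do this in C on raw machine words.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
INT_MIN, INT_MAX = -(1 << 62), (1 << 62) - 1  # 63-bit signed payload


def tag_int(value: int) -> int:
    """Pack a small integer into a word: payload shifted left, low bit 0."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError("does not fit in 63 bits; a real VM would box it")
    return (value << 1) & WORD_MASK


def tag_pointer(address: int) -> int:
    """Pack an aligned address: the spare low bit is set to 1."""
    if address % 8:
        raise ValueError("address must be 8-byte aligned")
    return address | 1


def is_tagged_int(word: int) -> bool:
    """A single bit test tells integers and pointers apart."""
    return (word & 1) == 0


def untag_int(word: int) -> int:
    """Recover the signed integer (two's complement, then shift right)."""
    if word >> (WORD_BITS - 1):
        word -= 1 << WORD_BITS
    return word >> 1


def add_tagged(a: int, b: int) -> int:
    """Add two tagged integers with plain word addition.

    Because both tags are 0, (x << 1) + (y << 1) == (x + y) << 1, so the
    sum is already correctly tagged. Hardware would detect overflow with a
    flag; this model checks the range explicitly instead.
    """
    total = untag_int(a) + untag_int(b)
    if not INT_MIN <= total <= INT_MAX:
        raise OverflowError("result needs more than 63 bits")
    return (a + b) & WORD_MASK


print(untag_int(add_tagged(tag_int(-3), tag_int(5))))
```

Choosing 0 as the integer tag is what lets addition and subtraction run directly on the tagged words; with the opposite choice the tag bit would have to be corrected after each operation.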
36:18 So another one that stands out here is zero overhead exception handling. That's making it into 3.11 already, right?
36:25 What we used to have is a little setup and teardown instruction for every time we wanted to control a block of code inside a try, so try/except, try/finally, but also with statements. We've just ditched those in favor of a table lookup. So if there's an exception now, it's just looked up in the table, which is what the JVM, the Java virtual machine, does.
36:45 Yeah. Excellent.
36:46 Zero overhead is a slightly optimistic term.
36:49 It's obviously not zero overhead, but it is less.
36:52 You have a harder time finding it in the profiler.
36:54 There's a little bit of memory that you didn't have before.
36:57 That's a lookup table, but it really is sort of zero overhead if no exceptions happen, right?
37:02 Not quite, not just because of the extra memory it uses, but also because of things like tracing guarantees.
37:13 Sometimes we have to insert a NOP where the try was.
37:16 So there's still some slight overhead. Then potentially in future, when we compile code, that should effectively become zero.
37:22 But it is definitely reduced.
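The table-lookup idea can be sketched in Python. This is an illustrative model, not CPython's actual table encoding: the entries, the offsets, and the names `TableEntry` and `find_handler` are all hypothetical. The point it shows is that code inside a try block executes no setup or teardown instructions; the table is consulted only when something is raised.

```python
from typing import NamedTuple, Optional


class TableEntry(NamedTuple):
    start: int    # first instruction offset covered (inclusive)
    end: int      # end of the covered range (exclusive)
    handler: int  # offset to jump to if an exception is raised in the range


# Hypothetical table for one function, innermost ranges listed first.
EXCEPTION_TABLE = [
    TableEntry(start=4, end=8, handler=20),   # inner try
    TableEntry(start=2, end=12, handler=30),  # outer try
]


def find_handler(offset: int, table=EXCEPTION_TABLE) -> Optional[int]:
    """Called only when an exception is raised at `offset`.

    Returns the handler offset of the first (innermost) covering range,
    or None if the exception should propagate to the caller.
    """
    for entry in table:
        if entry.start <= offset < entry.end:
            return entry.handler
    return None


print(find_handler(5), find_handler(9), find_handler(0))
```

The cost moves from every execution of the try block to the (rarer) moment an exception is actually raised, plus the memory for the table itself.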
37:23 Mark, Apple surprised the world when they took their phone chips and turned them into desktop chips.
37:29 And that seemed to actually work pretty well with the Arm stuff.
37:33 There's a switch from having basically just x86 and 64-bit stuff to think about. But now you also have this Arm stuff. Does that make life harder or easier?
37:42 Does it open up possibilities, or is it another thing to deal with?
37:45 It's not really harder, because we were never excluding those anyway, and we may want to look to the future with RISC-V. Currently, CPython is portable.
37:57 That's a key thing.
37:59 Its portability rather depends on testing.
38:04 It's all very well, saying it's perfectly portable, but if you have never tested on a platform, you may have surprises, but it's all written in C.
38:11 Portability is a sort of serious consideration.
38:14 So things like this tagging I was just talking about, that's technically not portable C, but basically a lot of things aren't technically portable C. In effect, technically it's impossible to write a memory allocator in C, because the specification says once you've called free, you can't access the memory, which makes it kind of difficult to write something that handles the memory.
38:35 But these are oddities.
38:37 In practice, if you write sensible C code, you should expect it to be portable.
38:43 So we are kind of basing around that.
38:51 Other interpreters are often written in assembler or some variant of it.
38:54 There's definitely a performance advantage in that, but I'm not convinced it's great enough to lose the portability and the maintenance overhead.
39:01 Yeah, and one of the things that you focused on, Guido, one of the constraints, is you said you want to keep the code maintainable, right?
39:11 This is important.
39:12 Why does that matter so much, rather than getting a 20% speedup if Mark refreshes his assembly language skills?
39:18 Well, it would leave most of the core development team behind.
39:23 And so suddenly Mark would be a very valuable contributor because he's the only one who understands that assembly code. That's just how it goes.
39:34 And I don't think that that would be healthy for the Python ecosystem if the technology we used was so hard to understand and so hard to learn, making it so hard to maintain.
39:47 Then, as an open source project, we lose velocity.
39:51 The only thing that it would sort of cause to happen in the core team might be that people decide to move more code to Python, because now the interpreter is faster anyway, so they don't have to write so much in C. But then, of course, likely it's actually going to be slower,
40:10 at least that particular bit of code.
40:12 That's an interesting tension to think about. If you could make the interpreter dramatically faster, you could actually do more Python and less C.
40:21 I don't know.
40:22 There would have to be some big number where that happens, right? It's not just 10%.
40:26 But maybe that could be in the distant future. Nevertheless, I wouldn't want the C code to be unreadable for most of the core developers.
40:35 Yeah, I agree.
40:36 That makes a lot of sense.
40:37 Being a C expert is not a requirement for being a core developer. In practice, quite a few of the core developers are really good C coders, and we support each other in that we take pride in it, and we help each other out.
40:53 Code reviews are incredibly important, and we will happily help newbies to sort of get up to speed with C. If we had a considerable portion that was written in assembler, then it would have to be written in multiple assemblers.
41:10 And there would also have to be a C version
41:13 for platforms where we don't have access to the assembler,
41:17 or where nobody has bothered to write that assembler code yet.
41:20 All these things make things even more complicated than they already are.
41:24 Right, and the portability, the approachability of it is certainly a huge benefit.
41:30 Two other constraints that you had here, maybe you could just elaborate on real quick is don't break stable ABI compatibility, and don't break limited API compatibility.
41:39 Yeah, so the stable ABI is the Application Binary Interface, and that guarantees that extension modules that use a limited set of C API functions don't have to be recompiled for each new Python version.
41:57 And so you can, in theory, have a wheel containing binary code, and that binary code will still be platform specific, but it won't be Python version specific.
42:07 Yes, that's very nice.
42:09 We don't want to break that. It is a terrible constraint because it means we can't move fields like the reference count or the type field around in any object, and many other things as well. But nevertheless, it is an important property because people depend on that.
42:26 Sure and the API compatibility. Well, that's pretty clear. You don't want people to have to rewrite it.
42:30 The limited API is sort of the compile-time version of the stable ABI. I think it's the same set of functions, except the stable ABI actually means that you don't have to recompile. The limited API offers the same,
42:46 and I think a slightly larger, set of API functions where, if you do recompile, you're guaranteed to get the same behavior.
42:55 And again, our API is pretty large, and a few things have snuck into the limited API and stable ABI that are actually difficult to support with changes that we want to make.
43:11 And so sometimes this holds us back. But at the same time, we don't want to break the promises that were made to the Python community about API compatibility.
43:22 We don't want to say, sorry, folks, we made everything 20% faster, but alas, you're going to have to use a new API, and for all your extensions, just recompiling isn't going to be good enough.
43:34 Some functions suddenly have three arguments instead of two, or no longer exist, or return memory that you own instead of returning a borrowed reference.
43:45 And we don't want to do any of those things because that just would break the entire ecosystem in a way that would be as bad as the Python 3 transition, right.
43:55 It's not worth it.
43:57 All right.
43:57 Let's go back to the Shannon plan. So we talked about stage one and stage two and Mark, I see here. This is Python 3.10 and Python 3.11.
44:05 Are those the numbers where they're actually going to make it in? Or do we just do, like, a plus plus or plus equals on them?
44:11 I think the plus one would be appropriate.
23:55 All right. Plus equals one.
44:15 Yeah. So maybe a bit faster, because obviously I envisioned this with basically me and one other developer plus, maybe, some sort of reasonable buy-in from the wider core dev team.
44:27 So I wasn't sort of doing the work entirely in isolation, but yeah, having extra hands will definitely help things.
44:36 So back when you were thinking about this, this was written in the 3.9 timeframe, right? And you're like, okay, well, the next version, maybe we can do this, and the version after that. And by the time it really got going, it's more like 3.11, 3.12 and so on, right?
44:47 Yeah, it was just around the time, I think, we switched from 3.9 to 3.10.
44:52 Okay. So stage three out of the four stages you have is, I guess, Python 3.13 now, which is a miniature JIT compiler.
45:03 Is that the right characterization?
45:04 I think that's not the compiler.
45:06 Well, I suppose it will be smaller.
45:08 Maybe the parts it applies to the parts that get.
45:10 Yeah, so I think the idea is that you want to compile all of the code where performance matters, that sort of hot code, but it makes life easier if you just compile little chunks of code and sort of stitch them together afterwards, because it's very easy to fall back into the interpreter and to jump into sort of compiled code.
45:33 You can sort of just hang these bits of compiled code off by individual byte codes where they sort of start from.
45:39 Obviously, that's not fantastic for performance, because you're having to fall back into the interpreter, which limits your ability to infer things about the state of things. So obviously, as I said earlier, with specialization, you have to do some type checks and other sort of checks.
45:55 If you've done a whole bunch of checks, if you then fall back into the interpreter, you have to throw away all the information.
46:00 If you compile a bigger region of code, which is the stage four, then you already know something about the code and you can apply those optimizations.
46:09 The problem with trying to do big regions up front is that if you choose poorly, you can make performance worse.
46:17 And this is the real issue for existing ones. I think we're going to talk about some of the other historical sort of compilers in the past, and this is a real issue for those that they just try to compile method at a time, regardless of whether that is a sensible unit to compile, right.
46:32 It's sometimes hard to do optimizations when it's too small, right.
46:36 Yeah, and also it's very expensive to do regions that are too big or just bounded in the wrong places.
46:43 Okay, yeah, that definitely sounds tricky.
46:44 There was a question earlier about the mypyc work and the mypy stuff, and you are really central to that, doing a lot of work there.
46:52 How do both of you, or either of you, feel about using type annotations as some sort of guide to this compiler? For example, in Cython, let's just say you have x: int as a parameter, and it will take that as meaning something.
47:07 When you compile with Cython, it seems like, as Mark is talking about, knowing the types and guessing them correctly matters in terms of what's faster. Is there any thought or appetite for using type annotations to mean more than static analysis?
47:21 It's a great idea.
47:22 And I think for smaller code bases, something like mypyc will prove to be viable, or for code bases where there is an incredible motivation to make this happen.
47:35 I could see that happen at Instagram, for example.
47:38 But in general, most people haven't annotated their code completely and correctly.
47:45 And so if you were to switch to using something like mypyc, you'd find that basically it wouldn't work in a large number of cases.
47:57 It's a different language and it has different semantics, and it has sort of different rules.
48:03 And so you have to write to that.
48:05 I can see there's a big challenge to say, hey, everybody, we can do this great stuff if you type annotate, when only 4% of people have properly annotated their code. And then there's also the possibility that it's incorrectly annotated, in which case it probably makes it worse in some way, with a crash or something.
48:24 mypyc will generally crash if a type is detected
48:28 that doesn't match the annotation.
48:30 Yeah, and if you annotate stuff with simple types, you can get quite a good speedup. So Numba is generally fine for numerical stuff. But again, it's simple types: integers, floats. Cython and Numba both do this. Numba does it dynamically, Cython statically.
48:45 And the Numba model, for example, is similar to the model that the Julia language uses.
48:50 Essentially, you compile method at a time, but you make as many specializations as you need for the particular types.
48:57 And that can give very good performance for that sort of numerical code.
49:01 But the problem is that saying something is a particular type doesn't tell you very much about it.
49:06 It doesn't tell you what attributes an instance of it may or may not have.
49:10 It depends.
49:12 It's not like Java or C++ where having a particular class means it has those instance attributes and they will exist, or at least they exist in a particular place, and they can be checked very efficiently. Because of dictionary lookups and so on, these things get a bit fuzzier.
49:27 72 bytes into the C object is where you find the name or something like that. Right? Yeah.
49:32 So basically because anything might not be, as the annotations say, effectively at the virtual machine level, we have to check everything.
49:41 And if we're going to check it anyway, we may as well just check it once up front, as we first do the compilation or the specialization, and then assume it's going to be like that, because if the annotations are correct, then that's just as efficient.
49:55 And if the annotations are wrong, we still get some performance benefit, and it's robust as well.
50:02 So really, the only advantage of annotations is for this sort of very loopy code where we can do things like loop transformations and so on, because we can infer enough of the types in the function from the arguments to do stuff. And that works great for numerical stuff, but for more general code it's problematic.
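The "check once, then assume, but stay robust" idea from the last few answers can be sketched in pure Python. This is a toy model, not how the CPython interpreter is implemented: the class `SpecializedAdd` and its attributes are hypothetical, and the real specialization happens on bytecode instructions in C. It shows the shape of the technique: observe the types at run time, install a fast path protected by a cheap guard, and fall back to the generic path when the guard fails.

```python
import operator


class SpecializedAdd:
    """An addition site that specializes itself on the types it first sees."""

    def __init__(self):
        self.expected_type = None  # type observed when specializing
        self.fast_path = None      # type-specific implementation
        self.guard_failures = 0    # times the assumption turned out wrong

    def __call__(self, a, b):
        if self.expected_type is None:
            self._try_specialize(a, b)
        elif type(a) is self.expected_type and type(b) is self.expected_type:
            # Guard passed: skip generic dispatch entirely.
            return self.fast_path(a, b)
        else:
            # Guard failed: still correct, just slower.
            self.guard_failures += 1
        return operator.add(a, b)  # generic path

    def _try_specialize(self, a, b):
        # Only specialize for simple cases where it clearly pays off.
        if type(a) is type(b) and type(a) in (int, float):
            self.expected_type = type(a)
            self.fast_path = type(a).__add__


add = SpecializedAdd()
print(add(1, 2), add(3, 4), add("a", "b"), add.guard_failures)
```

Nothing in this sketch reads an annotation: the guard is needed either way, so a correct annotation adds no speed and a wrong one cannot break anything, which is the argument made above.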
50:21 What about slots?
50:30 Slots are an interesting, not frequently used aspect of Python types that seem to change how things are laid out a little bit.
50:33 One of mypyc's main tricks is that it turns every class into a class with slots.
50:40 If you know how slots work, you will immediately see the limitation because it means there are no dynamic attributes at all.
50:49 These are what you get for your fields, and that's it. Yeah.
50:52 I mean, if you don't have dynamic attributes, though, it gives you pretty efficient memory use.
50:58 It's not too far off Java, and there's more predictability about what's there and what's not, which is why they came to mind.
51:04 Yeah, they definitely have their use.
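For listeners who have not used slots, here is a small illustration of the trade-off just discussed. The `Point` class is a made-up example; `__slots__` itself is standard Python.

```python
class Point:
    # Declaring __slots__ fixes the set of instance attributes up front,
    # so instances carry no per-object __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(hasattr(p, "__dict__"))  # no dynamic attribute dictionary

try:
    p.z = 3  # not declared in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
print(rejected)
```

The fixed layout is what gives the memory savings and predictability mentioned above, and the rejected assignment is the limitation: no dynamic attributes.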
51:05 All right, Mark, that was your four stage plan, hoping to make it 1.5 times as fast as before
51:12 each time, and if you do that over four releases, you end up with five times faster. Right. That's the Shannon plan.
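The "five times" figure is just compounding, which a quick check makes explicit. The 1.5x per release is the plan's stated target, not a measured result.

```python
per_release = 1.5  # target speedup factor for each release
releases = 4       # four stages, one per release

overall = per_release ** releases
print(overall)  # prints 5.0625, i.e. roughly five times faster
```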
51:19 Where are we on this?
51:20 How's it going for you and everyone on the team?
51:22 I'd say it's a bit of a jumble of stages one and two that we're implementing, largely because it's a larger and more diverse team than I was expecting. So it makes sense to just sort of spread things out.
51:34 You go work on operators, you go work on zero-overhead exception handling.
51:41 Yeah I would say from where we are now, I was probably a bit optimistic with stage one, but stage two seems to have a lot of potential.
51:50 Still, there's always little bits of the interpreter we can tweak and improve.
51:53 So between the two of them, I'm confident we'll get projected over twice the speed.
51:59 That's fantastic. So on the course you're on right now, let's just say stage one and two happen, and for some reason, the JIT stuff doesn't.
52:06 That's still a big contribution.
52:08 What do you think in terms of speed up for that?
52:10 Well, again, it's going to depend a lot.
52:13 I know it matters so much.
52:15 But I just want to because currently we have a sort of set of benchmarks that we're using.
52:22 More benchmarks are always better. So it's a broad set. Individually, some of the benchmarks are great, but collectively, it's a sort of useful set. But I mean, we see speedups from up to 60% down to zero, so it's definitely a spread. So trying it out would be the thing. I mean, you can download 3.11 alpha 1, and alpha 2 should be out in a few days' time now, presumably before this podcast is published.
52:49 Yeah. Fantastic.
52:50 So people can download it and play.
52:52 Yeah, that's fantastic.
52:53 Thank you for this. I think even 50 or 60%, if it stays there,
52:57 That's pretty incredible.
53:00 This language has been around for 30 years.
53:02 People have been trying to optimize it for a long time. It's incredible. Right. And then to do this sort of change now, that would be really significant.
53:09 Yeah. This is an area that we haven't spent much time on previously for various reasons.
53:16 People have spent a lot of time on sort of making the string objects fast, making dictionary operations fast, making the memory efficient, adding functionality. Python has generally, I think, had more of a focus on functionality than on speed.
53:35 And so for me, this is also a change in mindset.
53:38 I'm still learning a lot.
53:39 Mark actually teaches me a lot about how to think about this stuff, and I decided to buy this horrible book.
53:45 Well, it's a great book, but it weighs more than a 17-inch laptop.
53:55 Classic text, but not a light read.
53:57 Yeah, it goes down beyond the software layer into the hardware bits.
54:01 It makes me amazed that we have any performance at all, and that any performance is predictable because we're doing everything wrong from the perspective of giving the CPU something to work with.
54:15 All the algorithms described in there, branch prediction, speculative execution, caching of instructions, all that is aimed at small loops of numerical code, and we have none of that.
54:28 Yeah, exactly.
54:31 ceval is not a numerical loop.
54:32 Definitely not.
54:33 All right. Well, I think that might be it for the time we have.
54:35 I got a couple of questions from the audience out there. An army captain says: I'm interested in Guido's thoughts about the Microsoft-funded effort versus the developer in residence, particularly in terms of the major work of the language and the CPython runtime going forward.
54:51 I think these are both good things, both really good things.
54:55 They seem super different to me.
54:56 I think it's great that we have a developer in residence.
54:59 It's a very different role than what we're doing here.
55:02 The team at Microsoft is, or at least we're trying to be, super focused on performance to the exclusion of almost everything else except all those constraints I mentioned.
55:11 Of course, the developer in residence is focused on sort of the community: other core developers, but also contributors.
55:23 Łukasz is great.
55:24 He's the perfect guy for that role, and his work is completely orthogonal to what we're doing.
55:30 I hope that somehow the PSF finds funds for keeping the developer in residence role and maybe even expanding it for many years.
55:40 It seems to me like a really important role to smooth the edges of people contributing to CPython. The difference is that what Mark and you all are doing is heads down, focused on writing one type of code, whereas Łukasz is there to make it easier for everyone else to do whatever they were going to do.
56:00 And I think one is sort of a horizontal scaling of the CPython team, and the other is very focused, which is also needed.
56:07 It's actually amazing that we've been able to do all the work that we've been doing over the past 30 years on Python without a developer in residence.
56:18 I think in the early years, I was probably taking up that role, but the last decade or two, there just have been too many issues, too many PEPs for me to sort of get everything going.
56:33 I was always working part time on Python and part time working on my day job.
56:37 Right. Absolutely.
56:38 Łukasz is working full time on Python, and he has a somewhat specific mandate to sort of
56:47 help people,
56:48 make contributions go smoother,
56:50 make working with the issue tracker easier,
56:54 and that sort of thing. Developers and contributors must be encouraged and rewarded.
56:59 And currently, often, the way the python.org
57:03 experience is,
57:05 it's a very old web app, and it looks that way, and it's difficult to learn how to do various things with that thing.
57:13 And so Łukasz is really helping people.
57:14 Yeah, it's fantastic.
57:17 Of course.
57:18 There's also the somewhat separate project of switching from bugs.python.org
57:23 to a purely GitHub-based tracker.
57:26 Yeah, I was just thinking of that as you were speaking there. Do you think that'll help? I feel like people are more familiar with that workflow.
57:32 People are more familiar.
57:33 It's more integrated with the pull request flow that we already have on GitHub.
57:39 I think it'll be great. The expectation is that it'll actually be happening before the end of this year or very early next year.
57:45 That would be fantastic.
57:46 The code is already there. The work is already there.
57:49 Might as well have the conversations and the issues and whatnot there. I think we are definitely over time, but I really appreciate, first of all, the work that you're doing: Mark, on this project, and Guido, over the last 30 years. This is amazing. You can see in the comments how appreciative folks are for all the work you've done. So thank you for that.
58:08 Let's close with a final call to action.
58:11 You have a small team working on this. I'm sure the community can help in some way.
58:15 What do you want from people?
58:16 How can they help you either now or in the future?
58:19 I mean, it's just contribute to CPython, so I don't think it's specifically performance.
58:24 All contributions help. Improving code quality and reliability is still very important.
58:32 So there's only so much I think two or three people can do, but we do have an ideas repo, if people do have things they want to suggest or bounce ideas around.
58:44 Maybe they could test their workloads on alpha versions and things like that.
58:49 Yeah, that would be fantastic.
58:51 We don't really have a set place where people can put information, but just open an issue on the ideas repo and have some data.
58:57 That'd be fantastic.
58:58 We'd love it for people to try to use the new code and see how it works out for them.
59:03 Yeah, fantastic.
59:05 All right. Well, thank you both for being here. It's been great.
59:08 Our pleasure.
59:09 Thank you.
59:10 This has been another episode of Talk Python to Me.
59:13 Thank you to our sponsors.
59:15 Be sure to check out what they're offering. It really helps support the show.
59:18 Choose Shortcut, formerly Clubhouse.io, for tracking all of your project's work, because you shouldn't have to project manage your project management.
59:27 Visit 'talkpython.fm/Shortcut'. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.
59:33 Develop, deploy and scale your modern applications faster and easier.
59:37 Visit 'TalkPython.fm/linode' and click the Create Free Account button to get started.
59:43 Do you need a great automatic speech to text API?
59:46 Get human-level accuracy in just a few lines of code.
59:48 Visit 'talkpython.fm/assemblyAI'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python.
59:57 Our content ranges from true beginners to deeply advanced topics like memory and async, and best of all, there's not a subscription in sight.
01:00:04 Check it out for yourself at
01:00:05 'training.talkpython.fm'. Be sure to subscribe to the show.
01:00:09 Open your favorite podcast app and search for Python.
01:00:11 We should be right at the top.
01:00:13 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.
01:00:23 We're live streaming most of our recordings these days.
01:00:25 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at 'talkpython.fm/youtube'.
01:00:34 This is your host, Michael Kennedy.
01:00:35 Thanks so much for listening. I really appreciate it.
01:00:38 Now get out there and write some Python code.