Making Python Faster with Guido and Mark

Episode #339, published Thu, Nov 4, 2021, recorded Mon, Nov 1, 2021

There has been a bunch of renewed interest in making Python faster. For some of us, Python is already plenty fast. For others, such as those in data science, scientific computing, and even the large tech companies, making Python even a little faster would be a big deal.

This episode is the first of several that dive into some of the active efforts to increase the speed of Python while maintaining compatibility with existing code and packages.

Who better to help kick this off than Guido van Rossum and Mark Shannon? They both join us to share their project to make Python faster. I'm sure you'll love hearing what they are up to.

Episode Deep Dive

Guests Introduction and Background

Guido van Rossum is the creator of Python, often referred to as Python's "benevolent dictator for life" (BDFL) until 2018. After retiring from Dropbox and stepping away from some leadership roles, he joined Microsoft to form a small team dedicated to optimizing CPython's performance.

Mark Shannon is a Python core developer who has worked extensively on performance, compiler internals, and just-in-time (JIT) approaches for speeding up Python. His proposal, nicknamed the "Shannon Plan," is guiding much of the current CPython optimization effort.

What to Know If You're New to Python

If you're new to Python and want to fully appreciate the details in this episode, here are a few key points:

CPython vs. Other Runtimes: CPython is the "default" implementation of Python, the one you get when you download Python from python.org. This episode centers on making that version faster.
The GIL (Global Interpreter Lock): CPython uses a lock to ensure only one thread runs Python code at a time, which can limit CPU parallelism. You'll hear discussions around attempts to remove or work around it.
Reference Counting and Memory: Python primarily uses reference counting to manage memory. This is integral to how and when objects are cleaned up, important for performance.
Performance vs. Flexibility: Python's dynamic nature makes it incredibly flexible. However, that same flexibility introduces performance challenges that many contributors (including Guido and Mark) are now tackling head-on.
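To make the reference counting point above concrete, here is a minimal sketch that watches an object's reference count change as a second reference is created and then dropped. It assumes CPython (`sys.getrefcount` is an implementation detail, and the function name `refcount_delta` is purely illustrative), and it compares counts rather than printing absolute values, since those vary between versions.

```python
import sys


def refcount_delta():
    """Return how the refcount of an object changes while a list holds it."""
    obj = object()
    before = sys.getrefcount(obj)

    holder = [obj]                 # the list now holds one extra reference
    during = sys.getrefcount(obj)

    del holder                     # the list is freed, releasing its reference
    after = sys.getrefcount(obj)

    return during - before, after - before
```

On CPython the first number is the extra reference held by the list and the second shows the count returning to where it started as soon as the list is deleted, with no separate garbage collection pass involved.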

Key Points and Takeaways

Speeding Up CPython as a Primary Goal: Python's huge user base relies on the CPython implementation, so any speed improvement here helps nearly everyone. Rather than creating a specialized fork or a separate engine like PyPy, this approach keeps total compatibility with existing code and extension modules.
- Links and Tools:
  - CPython repository
  - PyPy
Microsoft's Dedicated Python Performance Team: Guido formed a small, focused team at Microsoft, initially him, Mark Shannon, and Eric Snow, with a later addition of Brandt Bucher, to work on CPython performance full-time. Their efforts are entirely open source, merging changes into mainline Python rather than maintaining a private fork.
- Links and Tools:
  - Microsoft's open-source projects
The Shannon Plan: Mark Shannon proposed a four-stage roadmap aiming to achieve roughly a 5x speed boost over several Python releases. While the stages have somewhat merged in practice, they include specialized "hot paths," better memory layout, some level of inline or tiered JIT, and continuing refinements.
- Links and Tools:
  - Discussion on Python-Dev mailing list (search for "Shannon Plan")
Keeping Python's Codebase Maintainable: One critical constraint is that any speedup must not make CPython's C codebase so complex or specialized that few can contribute. Writing raw assembly or removing the GIL outright in a naïve way could hurt long-term sustainability. This maintainability ethic also ensures minimal disruption for the broader ecosystem.
Stable ABI and Extension Module Compatibility: CPython has a stable ABI (Application Binary Interface) that lets extension modules continue working without recompilation across minor versions, provided they use a "limited API." The team does not want to break this or impose major rewrites on maintainers, preserving Python's strong community library support.
- Links and Tools:
  - PEP 384 – Defining a Stable ABI
Comparisons with Other Approaches: Projects like PyPy offer speed benefits but have partial support for certain C extensions, which is a deal-breaker for many. Meanwhile, Sam Gross's no-GIL fork shows promise for multi-core parallelism but still remains separate from mainline CPython. Guido and Mark's path is incremental, focusing on broad compatibility.
Zero-Overhead Exception Handling: Python 3.11 introduced a more efficient exception handling mechanism that skips certain extra instructions if no exception is raised. This can lead to noticeable performance boosts in real-world code with many try blocks or with statements.
JIT Compilation Plans: Later phases of the Shannon Plan mention a modest or miniature JIT. The idea is to compile small "hot" segments of code as they run, with quick fallbacks to the interpreter. This incremental approach is less risky than a full method-at-a-time JIT that might degrade performance unpredictably.
The Role of the Developer in Residence: Although not directly about performance, the Python Software Foundation (PSF) now funds a developer in residence (Łukasz Langa) to help triage issues, review PRs, and keep contributions flowing smoothly. This organizational support complements the Microsoft effort and ensures the language stays healthy overall.
- Links and Tools:
  - [PSF Developer in Residence announcement](https://pyfound.blogspot.com/search/label/Developer in Residence)
Impact on the Community and Ecosystem: Beyond speed, these changes save energy, reduce hardware budgets for large deployments (like Dropbox), and help Python remain competitive for both large-scale services and iterative data science workflows. The team encourages testing alpha builds and sharing real-world benchmarks.
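For the zero-overhead exception handling point above, one rough way to see the effect on your own machine is to time the same loop with and without a `try` block that never raises. This is only a sketch: the function names, the workload, and the default sizes are made up for illustration, and the measured gap depends on the Python version and hardware.

```python
import timeit


def plain_sum(values):
    """Sum the values with no exception handling in the loop."""
    total = 0
    for v in values:
        total += v
    return total


def guarded_sum(values):
    """Same loop, but each addition sits inside a try block that never fires."""
    total = 0
    for v in values:
        try:
            total += v
        except TypeError:
            pass
    return total


def compare(n=1000, repeat=200):
    """Return (plain_seconds, guarded_seconds) for the two loops."""
    data = list(range(n))
    t_plain = timeit.timeit(lambda: plain_sum(data), number=repeat)
    t_guarded = timeit.timeit(lambda: guarded_sum(data), number=repeat)
    return t_plain, t_guarded
```

Running `compare()` under an older interpreter and under 3.11 or later should show the two timings moving closer together on the newer version, though a microbenchmark like this is noisy and says little about real applications.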

Interesting Quotes and Stories

Guido on "Retiring": "I just like the idea of retiring. So I try to see how many times in a lifetime I can retire… but then the pandemic hit, and I found myself wanting to code in a team again."
Mark Shannon on Python as a Challenge: Referencing Armin Rigo, "Python is such a great language to use and such a challenge to optimize," underlining how dynamic features make for unique performance hurdles.
Guido on Team Culture: "We don't have a private fork. We work fully in the open, merging changes in as soon as we can so everyone benefits."

Key Definitions and Terms

CPython: The default and most widely used implementation of the Python language, written in C.
GIL (Global Interpreter Lock): A mechanism in CPython that allows only one thread to execute Python bytecode at a time.
JIT (Just-in-Time Compilation): A technique where code is compiled on the fly at runtime for performance.
Stable ABI: A set of C-level interfaces in Python guaranteed not to break across certain versions, allowing compiled extension modules to keep working without recompilation.

Learning Resources

Here are a few curated courses from Talk Python Training to help you go deeper into Python, its performance, and internals:

Python for Absolute Beginners: If you're new to Python, this course gives you a thorough foundation in the language.
Python Memory Management and Tips: Learn about reference counting, garbage collection, and performance considerations in Python's memory model.
Python 3.11: A Guided Tour Through Code: Dive into Python 3.11's new features, including its performance improvements and refined exception handling.

Overall Takeaway

Python is evolving to meet the rising demands of performance-critical applications. Thanks to the work of Guido, Mark, and other core developers, CPython is seeing meaningful speedups without sacrificing the language's hallmark compatibility and simplicity. This renewed emphasis on performance, combined with the community's commitment to maintainability, ensures that Python will remain not just easy to learn, but also fast enough to keep up with cutting-edge needs.

Links from the show

Guido van Rossum: @gvanrossum
Mark Shannon: linkedin.com
Faster Python Plan: github.com/faster-cpython
The “Shannon Plan”: github.com/markshannon
Sam Gross's nogil work: docs.google.com
Watch this episode on YouTube: youtube.com
Episode #339 deep-dive: talkpython.fm/339
Episode transcripts: talkpython.fm

Episode Transcript

00:00 There's been a bunch of renewed interest in making Python faster. While for some of us,

00:04 Python is already plenty fast, for others, such as those in data science, scientific computing,

00:09 and even large tech companies, making Python even a little faster would be a big deal.

00:13 This episode is the first of several that dive into some of the active efforts to increase the

00:20 speed of Python while maintaining compatibility with existing code and packages. And who better

00:25 to help kick this off than Guido van Rossum and Mark Shannon. They both joined us to share their

00:30 project to make Python faster. I'm sure you'll love hearing about what they're up to.

00:34 This is Talk Python To Me, episode 339, recorded November 1st, 2021.

00:54 Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy.

00:59 Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past

01:03 episodes at talkpython.fm. And follow the show on Twitter via @talkpython. We've started streaming

01:09 most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm slash

01:15 YouTube to get notified about upcoming shows and be part of that episode. This episode is brought to

01:21 you by Shortcut and Linode, and the transcripts are sponsored by Assembly AI.

01:26 Mark, Guido, welcome to Talk Python To Me. Fantastic to have you here. I'm so excited about all the

01:35 things that are happening around Python performance. I feel like there's just a bunch of new ideas

01:40 springing up and people working on it, and it's exciting times.

01:43 Definitely. You two are, of course, right at the center of it. But before we talk

01:48 about the performance work that you are all doing, as well as some of the other initiatives going along,

01:53 maybe in parallel there, let's just get started with a little bit of background on you. Guido,

01:58 you've been on the show before, creator of Python. You hardly need an introduction to most people out

02:03 there. But you have recently made a couple of big changes in your life. I thought I'd just ask you how

02:08 that's going. You retired, and we were all super happy for you on that. And then you said, you know

02:13 what, I kind of want to play with code some more. And now you're at Microsoft. What's the story there?

02:17 Oh, I just like the idea of retiring. So I try to see how many times in a lifetime I can retire.

02:23 And starting with my retirement from BDFL didn't stop me from staying super active in the community.

02:30 But when I retired from Dropbox a little over two years ago, I really thought that that was it,

02:36 that I believed it. And everybody else believed it too. Dropbox certainly believed it. They were very

02:43 sad to see me go. I was sad to go, but I thought there was time. And I had a few great months decompressing,

02:52 going on bike rides with my wife and family, fun stuff. And then the pandemic hit.

02:59 Yeah.

02:59 And a bunch of things got harder. Fortunately, the bike rides eventually got restored. But

03:05 other activities like eating out was a lot more stressful. Basically, just life was a lot more

03:11 stressful in general.

03:12 Right. And the human interaction was definitely shrunken down to a kernel.

03:16 Yeah. And somehow I thought, well, I want to have something to do. I want to do more sort of

03:25 software development in a team. And the Python core development team didn't really cut it for me,

03:30 because it's sort of diffuse and volunteer based. And sometimes you get stuck waiting for months for

03:37 the steering council to sort of approve of or reject a certain idea that you've worked on.

03:45 So I asked around and I found that Microsoft was super interested in hiring me. And that was now,

03:52 well, tomorrow, exactly a month, a year, tomorrow, a year ago, I started at Microsoft officially.

03:58 Yeah.

03:59 In the beginning, I just had to find my way around at Microsoft. Eventually, I figured I should pick a

04:06 project. And after looking around and realizing I couldn't really sort of turn the world of machine

04:13 learning upside down, I figured I'd stay closer to home and see if Microsoft was interested in funding

04:19 a team working on speeding up CPython. And I was actually inspired by Mark's proposals that were

04:27 going around at the time. So I convinced people at Microsoft to sort of start a small team and get Mark on board.

04:35 Yeah, that's fantastic. I also feel a little bit like machine learning is amazing, but I don't have a lot of

04:41 experience with it. And whenever I work with it, I always kind of feel on the outside of it. But this core

04:47 performance of Python, that helps everybody, right? Including even Microsoft, right? It maybe saves them

04:53 Oh, absolutely.

04:54 energy on Azure when they're running Python workloads or whatever. So you're enjoying your time? You're happy you're there?

05:00 I'm very happy.

05:00 Yeah, yeah, a lot of freedom to basically pursue what you are, right?

05:05 Yeah, it's nice that the new Microsoft is very open source friendly, at least in many cases, obviously,

05:10 not everywhere. But our department is very open source friendly. Things like Visual Studio Code are all open source. And so there was great support with management for sort of the way I said, I wanted to do this project, which is completely out in the open. Everything we do is sort of just merged into main as soon as we can.

05:33 Yeah, we work with the core developers. We don't have like a private fork of Python, where we do amazing stuff. And then we knock on the steering council door and say, Hey, we'd like to merge this.

05:47 Yeah, you're not going to drop six months of work just in one block, right? It's there for everyone to see.

05:53 Exactly.

05:54 I think that's really, really positive. And wow, what a change, not just for Microsoft, but so many companies to work that way compared to 10, 15 years ago.

06:03 Yeah, absolutely. Now, before I get to Mark, I just want to, you know, some bunch of people are excited that you're here. And Luis out in the audience said, Wow, it's Guido. I can't thank you enough for your amazing Python and all the community.

06:17 Great to hear.

06:17 Mark, how about you? How'd you get into this Python performance thing? I know you did some stuff with HotPy back in the day.

06:24 Yeah, that was sort of my PhD work. So I guess I kind of got into the performance almost before the Python. So I was doing sort of compiler work, masters. And obviously, just, you know, you need to write scripts and just get stuff done. And, you know, just Python is just a language to get stuff done. And then it's that, I think Armin Rigo, sort of, I think one of his sort of credits in one of his papers or something says,

06:52 Thank you for Python for being such a great language to use and such a challenge to optimize. So it's doubly good if you're coming at it from a sort of it. So it provides this great intellectual challenge when you're actually trying to optimize it. And it's a really nice language to use as well. So it's doubly good.

07:07 It is doubly good. It's doubly good. Yeah. And before we move on really quick, Paul Everett says, It's really impressive how the in the open work has been done. Yeah, totally agree.

07:16 Hi, Paul.

07:17 Yeah, keep that going. Hey, Paul, happy to see you here.

07:19 We're going to talk about making Python faster. But I want to start this conversation, a bit of a hypothetical question, but sort of set the stage and ask, how much does Python really need to be faster?

07:31 Because on one hand, sure, there's a lot more performance we can do if you're going to say, well, we're going to solve the n-body problem using C++ or C# versus Python. It's going to be faster with the native value types and whatnot.

07:44 On the other, people are building amazing software that runs really fast with Python already. We've got the C optimizations for things like NumPy and SQLAlchemy's transformation layer, serialization layer, and so on. So a lot of times that kind of brings it back to C performance. So how much do you think Python really needs to be optimized already? Not that more is always better, faster is always better. But I just kind of want to set the stage and get your two thoughts on that.

08:11 I always think back to my experience at Dropbox, where there was a large server called the Meta

08:19 Server, which did sort of all the server side work, like anything that hits www.dropbox.com

08:27 hits that server. And that server was initially a small prototype written in Python, the client was

08:34 actually also a small prototype written in Python. And to this day, both the server and the client

08:40 at Dropbox, as far as I know, and unless in the last two years, they totally ripped it apart,

08:45 but I don't think they did. They tweaked it, but it's still all now very large Python applications.

08:51 And so Dropbox really sort of feels the speed of Python in its budget, because they have thousands,

09:02 I don't know how many thousands of machines that all run this enormous Python application.

09:08 Right. And if it was four times faster, that's not just for, you know, a quarter of the machines, that's less DevOps, less admin,

09:15 all sorts of stuff, right?

09:16 Oh, even if it was 4% faster, they would notice.

09:19 Yeah. The other area where I think it's really relevant has to do with the multi-core side of

09:26 things. I have a PC over there, 16 cores. My new laptop has 10 cores. Although with Python,

09:32 it's hard to take true advantage of that side of modern CPU performance if it's not IO bound,

09:38 right? Yeah. I don't know how deep you want me to go into that and Mark can stop me if I'm going too deep

09:44 too, but there are existing patterns that work reasonably well. If you have a server application

09:52 that handles multiple fairly independent requests. Like if you're building a multi-core web application,

10:00 you can use multi-processing or pre-forking or a variety of ways of running a Python interpreter on

10:08 each core that you have, each independently handling requests. And you can do that if you have 64 cores,

10:15 you run 64 Python processes.

10:18 Right. That's just a number in a uWSGI config file. It's nothing.

10:21 Yeah. It works for applications that are designed to sort of handle multiple independent requests in a scalable fashion.
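The one-interpreter-per-core pattern Guido describes can be sketched with the standard library alone. This is an illustration under assumptions, not how any particular web server does it: `handle_request` is a hypothetical stand-in for CPU-bound request handling, and `serve_all` is an invented helper name.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def handle_request(n):
    """Hypothetical CPU-bound work standing in for one independent request."""
    return sum(i * i for i in range(n))


def serve_all(requests, workers=None):
    """Run independent requests across worker processes, one per core by default.

    Each worker is a separate Python interpreter with its own GIL, so
    CPU-bound requests can run in parallel on different cores.
    """
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() hands each request to whichever worker is free
        return list(pool.map(handle_request, requests))
```

A script would call something like `serve_all([10_000, 20_000, 30_000])` from under an `if __name__ == "__main__":` guard, which process pools need on platforms that start workers by re-importing the main module. The approach works because the requests share nothing; algorithms that need shared state across cores are the harder case mentioned next.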

10:31 There are other things that the other algorithms that you would want to execute where it's much more complicated to,

10:38 to sort of employ all your cores efficiently.

10:42 Yeah, absolutely.

10:43 That's still a nut that Python hasn't cracked. And I'm assuming you're asking this question because Sam Gross,

10:50 a very smart developer at Facebook, claims that he has cracked it.

10:54 Perhaps he has. It's an interesting idea. We'll dive into that a little bit later.

10:58 I'm more asking it just because I see a lot of people say that Python is too slow. And then I also see

11:04 a lot of people being very successful with it and it not being slow in practice or not being much slower

11:11 than other things. And so I'm more or less at the stage of like the context matters, right? This Dropbox example

11:17 you have, it really matters to them. You know, my course website where people take courses,

11:22 the response time of the pages is 40 milliseconds. If it was 38, it doesn't matter. It's really fast.

11:28 It's fine. So I think, but if I was trying to do computational biology in Python, we really want

11:33 to be able to take advantage of those 16 cores, right? So there's just such a variety of perspectives

11:39 where it matters. Mark, what are your thoughts on all this?

11:41 Well, it's just a case of saving energy, saving time. It just makes the whole thing nicer to use.

11:47 So, I mean, there's a lot of, you know, just iterative development in data science

11:52 and it's that responsiveness, the whole, you know, just breaking your train of thought

11:57 because things take too long versus just keeping in the flow and all that sort of stuff.

12:01 It's just nice to have something that's faster. I mean, it's not just the big companies saving

12:06 money as well. I mean, it's just, you know, just keeps everyone's server budgets down. I mean,

12:09 if you just need a smaller virtual instance, because you can serve the requests up fast enough

12:14 because Python's faster. So I think it's just generally a sort of responsible thing to do.

12:20 I mean, it's also just, you know, people expect technology to move forwards and there's this

12:25 feeling of, you know, falling behind or, you know, people wanting to move other languages because of

12:30 the perceived performance.

12:31 I do think that that's an issue. You know, I'm moving to Go because it has better async support,

12:35 rewriting this in Rust for whatever reason. Sometimes that might make sense, but other times

12:39 I feel like that's just a shame and it could be used better. A couple of questions from the audience

12:43 just want to throw out there. Let's see. One was Guido, especially, you must be really proud to hear

12:50 about the Mars helicopter and the lander and Python in space. You know, how did you feel when you heard

12:56 about the helicopter using Python and the lander using Python and Flask and things like that?

13:01 It wasn't really a surprise given how popular Python is amongst scientists. So I didn't throw a

13:09 party, but it made me feel good. I mean, it's definitely sort of one of those accomplishments

13:14 for a piece of technology. If it's actually shot into space, you know, you've made a difference.

13:21 Yeah. I remember like 30 years ago or more when I helped with some coding on a European project called

13:29 Amoeba, which was like a little distributed operating system. And one of the things that

13:34 they always boasted was that our software runs on the European space station. And that was very important.

13:40 Yeah. So yeah, I totally get the feeling. And then I hope that everyone who contributed to Python

13:45 also sort of feels that their contribution has made it.

13:50 Yeah. And that sense of awe, if you look up in the night sky, it's that little,

13:53 that bright star that's actually Mars. And you think, yeah, it's up there. Yeah. Fantastic. All right.

13:58 Let's dive into some of the performance stuff that you all have been doing. So maybe Guido starts out

14:04 with the team. So you've, you've got a group of folks working together. It's not just you. And also now

14:09 Mark Shannon is working with you as well, right? That's correct. In March or so, the initial team

14:15 was Eric Snow, Mark and myself. And since I think since early October, we've got a fourth team member,

14:23 Brandt Bucher, who is also a Python core dev since, I think since about a year and a half. He's a really

14:30 smart guy. So now we have four people, except you should really discount me as a team member because I

14:36 spend most of my time in meetings, either with a team or with other things going on at Microsoft in practice.

14:44 Sure. How closely do you work with, say, the VS Code Python plugin team and other parts? Or is this more

14:50 a focused effort?

14:51 This is more focused. I know those people. I've not met anyone in person, of course. I've not met,

14:57 I've not been to a Microsoft office since I started there, which is really crazy. But what we're doing

15:04 is really quite separate from other sort of Python related projects at Microsoft. But I sort of,

15:11 I do get called into meetings to give my opinion or sort of what I know about how the community is

15:17 feeling or how the core dev team is feeling about various things that are interesting to Microsoft or

15:24 sometimes things that management is concerned about.

15:27 Yeah. Excellent.

15:28 It'd be worth saying this is not just Microsoft as well. We've had contributions from,

15:32 quite a few other core developers who are helping out. So it's a broader effort.

15:38 This portion of Talk Python To Me is brought to you by Shortcut, formerly known as clubhouse.io. Happy

15:44 with your project management tool? Most tools are either too simple for a growing engineering team

15:49 to manage everything, or way too complex for anyone to want to use them without constant prodding.

15:54 Shortcut is different though, because it's worse. No, wait, no, I mean, it's better.

15:58 Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible,

16:04 powerful, and many other nice positive adjectives. Key features include team-based workflows.

16:10 Individual teams can use default workflows or customize them to match the way they work.

16:15 Org-wide goals and roadmaps. The work in these workflows is automatically tied into larger company

16:20 goals. It takes one click to move from a roadmap to a team's work to individual updates and back.

16:26 Tight version control integration. Whether you use GitHub, GitLab, or Bitbucket, clubhouse ties

16:31 directly into them so you can update progress from the command line.

16:35 Keyboard-friendly interface. The rest of Shortcut is just as friendly as their power bar,

16:40 allowing you to do virtually anything without touching your mouse. Throw that thing in the trash.

16:45 Iteration planning. Set weekly priorities and let Shortcut run the schedule for you with

16:50 accompanying burndown charts and other reporting. Give it a try over at talkpython.fm/shortcut. Again,

16:58 that's talkpython.fm/shortcut. Choose shortcut because you shouldn't have to project manage your

17:05 project management. Mark, what's your role on the team?

17:09 I know we already have sort of official roles, but I guess I'm sort of doing a fair bit of

17:15 sort of technical, sort of architectural side of stuff, obviously, because this is like my field.

17:19 So right. Optimizer in chief. Yeah, I guess so. All right. Guido, you gave a talk at the Python

17:26 Language Summit in May this year, talking about faster Python, this team, some of the work that

17:31 you're doing. So I thought that might be a good place to start the conversation.

17:35 Yeah. Some of the content there is a little outdated, but...

17:37 Well, you just have to let me know when things have changed. So one of the questions you ask is,

17:44 can we make CPython specifically faster? And I think that's also worth pointing out, right? There's many

17:49 runtimes. Often they're called interpreters. I prefer the word runtime because sometimes they compile and

17:55 they don't interpret. So...

17:56 Sometimes they're called virtual machines.

17:58 Yeah. There's many Python virtual machines, PyPy, CPython. Traditionally, there's been Jython and

18:06 IronPython, although I don't know if they're doing anything. But your focus and your energy is about

18:11 how do we make the Python people get if they just go to their terminal and type Python, the main Python

18:16 faster? Because that's what people are using, right? For the most part.

18:19 I don't have specific numbers or sources, but I believe that like between 95 and 99% of people using

18:27 Python are using some version of CPython. Hopefully not too many of them are still using Python 2.

18:32 Yeah. I would totally agree with that. And I would think it would trend more towards the 99 and less

18:37 towards the 95 for sure. Maybe a fork of CPython that they've done something weird to. But yeah,

18:42 I would say CPython. So you asked the question, can we speed up CPython? And Teddy out in the live

18:48 stream, I don't know if I'll be able to catch his comment exactly how there is. He says, you know,

18:52 what will we lose in making Python faster if anything? For example, what are the trade-offs?

18:56 So you point out, well, can we make it two times faster, 10 times faster, and then without breaking

19:01 anybody's code, right? Because I think we just went through a two to three type of thing that was way

19:06 more drawn out than I feel like it should have been. We don't want to reset that again, do we?

19:11 No. Well, obviously the numbers on this slide are just teasers.

19:15 Of course. I don't know how to do it. I think Mark has a plan, but that doesn't necessarily mean he knows

19:21 how to do it exactly either. The key thing is, and sort of to answer your audience question without

19:28 breaking anybody's code. So we're really trying to sort of not have there be any downsides to adopting

19:36 this new version of Python, which is unusual because definitely if you use PyPy, which is,

19:45 I think, the only sort of competitor that competes on speed that is still alive, and in some use,

19:52 you pay in terms of how well does it work with extension modules. It doesn't work with all extension modules.

20:00 And with some extension modules, it works, but it's slower. There are various limitations. And that in particular

20:08 is something that has kept many similar attempts back.

20:13 If we just give this up, we can have X, Y, and Z, right? But that those turn out to be pretty big compromises.

20:19 Absolutely. And sometimes, I mean, quite often extension modules are the issue. Sometimes there are also

20:25 things where Python's runtime semantics are not fully specified. Like, it's not defined by the language

20:33 when exactly objects are finalized when they go out of scope. In practice, there's a lot of code around

20:41 there that in very subtle ways depends on CPython's finalization semantics based on reference counting.

20:48 And so anything, and this is also something that PyPy learned, and I think,

20:53 oh, Pyston, which is definitely alive and open source. You should talk to the Pyston guys if you

20:59 haven't already. But their first version, which they developed many years ago at Dropbox, suffered from

21:06 sort of imprecise finalization semantics. And they found with sort of early tests on the Dropbox server code

21:15 that there was too much behavior that didn't work right because objects weren't always finalized at the

21:23 same time or sometimes in the same order as they were in standard CPython.

21:28 Oh, interesting. So there's no promises about that, right? It just says, well, when you're done with it,

21:33 it goes away pretty much eventually. If it's a reference count, it might go away quickly. If it's a cycle,

21:38 it might go away slower.

21:39 That's correct. And unfortunately, this is one of those unspecified parts of the language where people

21:46 in practice all depend on, not everybody, obviously, but many large production code bases do end up

21:54 depending on that. Not sort of intentionally. It's not that a bunch of application architects got together

22:01 and said, we're going to depend on precise finalization based on reference counting. It's more that those

22:08 servers, like the 5 million lines of server code that Dropbox had when I left, were written by hundreds of

22:15 different engineers, some of whom wrote only one function or two lines of code, some of whom sort of maintained

22:23 several entire subsystems for years. But collectively, it's a very large number of people who don't

22:29 all have the same understanding of how Python works and which part is part of the sort of the promises

22:35 of the language and which is just sort of how the implementation happens to work. And some of those

22:42 are pretty obvious. I mean, sometimes there are functions where the documentation says,

22:47 well, you can use this, but it's not guaranteed that this function exists or that it always behaves the

22:53 same way. But the sort of the finalization behavior is pretty implicit.

22:57 Yeah, Mark, what are your thoughts here?

22:58 People's expectations are derived from what they use. The problem with documentation is that it's like

23:03 instructions. They don't always get read. And also, it's not just finalization. It's also reclaiming

23:08 memory. So anything that has a different memory management system might just need more memory.

23:15 Reference counting is pretty good at reclaiming memory quickly and will run near the limit of what you

23:20 have available. Whereas a sort of more tracing garbage collector like PyPy doesn't always work so well like

23:25 that. I mean, one thing we are going to change is the performance characteristics. Now, that should

23:29 generally be a good thing, but there may be people who rely on more consistent performance.

23:35 You may end up unearthing race conditions, potentially that no one really knew was there. I mean,

23:40 but I would not blame you for making Python faster and people who write bad, poorly threaded code

23:46 fall into some trap there. But I guess that there's even those kinds of unintended consequences, I guess.

23:51 That one sounds like pretty low risk, to be honest. Yeah.

23:54 Yeah. Also, this sort of the warm up time, we'll get a warm up time. Now, what will happen is,

23:59 of course, it's just getting faster. So it's no slower to start with. But it still has the perception

24:04 that that now takes a while to get up to speed, whereas previously, it used to get up to speed very

24:08 quickly, because it didn't really get up to speed. It just started. It stays around. It stayed at the same

24:13 speeds. But these are subtle things, but they're detectable changes that people may notice.

24:18 Yeah. Also, like any optimizer, there are certain situations where the optimization doesn't really

24:25 work. It's not necessarily a pessimization, but somehow it's not any faster than previous versions.

24:32 Whereas other similar code may run much faster. And so you have this strange effect that you make a small

24:40 tweak to your code, which you think should not affect performance at all. Or you're not aware that

24:47 suddenly you've made that part of your code 20% slower.

24:50 Yeah. It is one of our design goals not to have these surprising sort of performance edges. But

24:56 yeah, there are little cases where it might definitely make a difference. Things will get a

24:59 bit slower. Yeah. There are very subtle things that can have huge performance differences that

25:05 I think people who are newer to Python run into like, oh, I see you can do this comprehension. And I had

25:11 square brackets, but I saw they had parentheses. So that's the same thing, right? Well,

25:16 not so much, not so much. Not if it's a million lines of code or a million lines of data. All right.
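The square brackets versus parentheses point can be made concrete with a short sketch. The variable names are illustrative, and the byte counts vary by Python version and platform, so none are quoted here.

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # square brackets: builds all n results up front
squares_gen = (i * i for i in range(n))   # parentheses: a lazy generator, computed on demand

# getsizeof reports the container itself: the list holds n pointers,
# while the generator is a small fixed-size object regardless of n.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)   # consumes the generator
second_pass = sum(squares_gen)      # generator is already exhausted

print(list_bytes, gen_bytes)
print(total_from_list == total_from_gen, second_pass)
```

The two forms give the same total on the first pass, but the list can be iterated again while the generator cannot, and their memory profiles differ by orders of magnitude at this size.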

25:22 So that's a great way to think about it. Not making it break a lot of code is I think as much as it's

25:28 exciting to think about completely reinventing it, it's super important that we just have a lot of

25:32 consistency now that we've kind of just moved beyond the Python two versus three type of thing.

25:37 I think also it's worth mentioning, Guido, you gave a shout out to Sam Gross's proposal. The stuff you're

25:43 doing is not Sam Gross's proposal. It's not about even from what I can see from the outside that much

25:48 about threading. It's more about how do I make just the fundamental stuff of Python go faster. Is that right?

25:54 That's right. These are like completely different developments. When we started this, we didn't

25:59 actually know Sam or that there was anyone who was working on something like that. But there had

26:06 been previous attempts to remove the GIL, which is what Sam has done. And like the most recent one of

26:13 those was by Larry Hastings, who came up with the great name, the Gilectomy. That's a fantastic name.

26:19 Yeah. He put a lot of time in it, but in the end he had to give up because the sort of the baseline

26:27 performance was just significantly slower than vanilla interpreter. And I believe it also didn't scale all

26:35 that well. Although I don't remember whether it sort of stopped scaling at five or 10 or 20 cores,

26:43 but right. Yeah.

26:43 Sam claims that he's sort of got the baseline performance, I think within 10% or so of vanilla

26:51 3.9, which is what he's worked off. Right.

26:54 And he also claims that he has a very scalable solution and he obviously put much more effort in it,

27:00 much more time in it than Larry ever had.

27:03 Yeah. And it sounds like Facebook is putting some effort into funding his work on that, which is great.

27:08 Yeah. But it feels like a very sort of bottom up project. It feels like Yeah.

27:13 Sam thought that that this was an interesting challenge and he sort of convinced himself that he could do it.

27:19 And he sort of gradually worked on all the different problems that he encountered on the way.

27:25 And he convinced his manager that this was a good use of his time. It's my theory, because that's usually how these projects go. But you almost never have management say, Oh, we got to fund an engineer to make Python faster or make it multi-core or whatever.

27:43 Find a good engineer.

27:48 Yeah. So you all are adopting what I've seen going around as the Shannon Plan. As in Mark Shannon, the guest in the top left here. That's fantastic. I remember talking about this as well, that you had hosted this thing. When was this? A little over a year ago. So interesting time in there, right? You had talked about making Python faster.

28:16 You had talked about making Python faster over the next four releases by a factor of five, which is pretty awesome. And you have a concrete plan to make changes along each yearly release to add a little bit of performance, and because of the geometric growth it gets quite a bit faster over time.

28:31 Yeah. Do you want me to run through these?

28:33 Yeah. Yeah. Tell us about your plan. You've got four stages and maybe we could talk through each stage and focus in on some of the tech there.

28:39 The way we're implementing it is now kind of a bit of a jumble of stage one and two, but the basic idea is that, for dynamic languages, the key performance improvement is always based on specialization. So obviously, you know, most of the time the code does mostly the same thing as it did last time.

28:57 Yeah. And even in non-loopy code, you know, in a web server, there's still a big loop at the request/response level. So you're still hitting the same sort of code. And that code is doing much the same sort of thing. And the idea is that you transform, you know, modify the code so it works for those particular cases.

29:15 You specialize it. So the obvious sort of simple stuff is, you know, binary arithmetic: have a special version for adding integers, a special version for floats. Obviously in Python, it's much more to do with special versions for different calls, different things and different attributes and all this sort of stuff.
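The specialization idea described here can be sketched as a toy "add" site that remembers the operand types it last saw, guards on them, and falls back to the generic path when the guard fails. This is an illustration only: the `AddSite` class is hypothetical, it is not how CPython implements this internally, and in pure Python it will not actually run faster.

```python
import operator


class AddSite:
    """One 'a + b' location in a program, with a type-specialized fast path."""

    def __init__(self):
        self.specialized_for = None  # the exact operand type we are specialized for, if any
        self.fast_hits = 0
        self.slow_hits = 0

    def execute(self, a, b):
        kind = self.specialized_for
        # Guard: the fast path is only valid if both operands have exactly the expected type.
        if kind is not None and type(a) is kind and type(b) is kind:
            self.fast_hits += 1
            return kind.__add__(a, b)

        # Slow path: fully generic dispatch, works for any operands.
        self.slow_hits += 1
        result = operator.add(a, b)

        # Re-specialize based on what we just saw (or give up for unsupported types).
        if type(a) is type(b) and type(a) in (int, float):
            self.specialized_for = type(a)
        else:
            self.specialized_for = None
        return result


site = AddSite()
int_results = [site.execute(i, 1) for i in range(5)]  # first call is slow, the rest hit the guard
mixed_result = site.execute("a", "b")                 # guard fails, generic path still works

print(int_results, mixed_result, site.fast_hits, site.slow_hits)
```

The point of the sketch is the shape: a cheap check followed by a narrow fast path, with correctness preserved by always being able to fall back.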

29:30 That's sort of the key first stage. I mean, that's mixed in with the second stage, which is really much more to do with just doing lots and lots of little bits and tweaks, such as better memory layout. You know, modern CPUs are extremely efficient, but there are still speed-of-light issues with fetching stuff from memory. So how things are laid out in memory is key to performance.

29:54 And it's sort of just those sort of little bits and tweaks here and just kind of writing the code as we would if it had been written for speed in the first place. So a lot of, you know, CPython is old and it's just sort of evolved. And a lot of it has, there's lots of potential for just sort of rearranging data structures and rearranging the code and so on. And these all add up, you know, a few percent here, a few percent there. And it doesn't take many of those to get a decent speed up.

30:20 So that's the sort of first two stages. And those are the ones where we have some pretty concrete idea what we're doing.

30:25 Right. And this is the kind of stuff that will benefit everybody, right? We all use numbers. We all do comparisons. We all do addition call functions and so on.

30:33 Yeah. I mean, the way we're sort of trending with performance in the moment is that sort of, you know, sort of webby type code, web backend sort of code.

30:41 You'd be looking at kind of where we are now, I don't know, it's a 25, 30% speed up.

30:45 Whereas if it's a machine learning, a sort of numerical code, it's more likely to be sort of 10% region.

30:51 Obviously we'd hope to push both up and by more, I don't think we're particularly focused on either.

30:59 It's just often the case where, you know, the next sort of obvious sort of convenient speed up lies.

31:04 And although everyone talks about speed ups and I've been doing the same myself, I mean, it's best to think really of the time something takes to execute.

31:11 So it's often just shaving off 1% of the time rather than speeding up by 1%.

31:16 And because, you know, obviously as the overall runtime shrinks, what were marginal improvements become more valuable, you know, shaving off 0.2% might be not worth it now.

31:26 But once you've sped something up by a factor of three or four, then that suddenly becomes, you know, a percent and it's worth the effort.
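The arithmetic behind this can be worked through with made-up numbers. The 100-second baseline below is purely an assumption for illustration; the 0.2% figure and the factor of four come from the discussion.

```python
baseline_seconds = 100.0  # hypothetical run time before any optimization
saving_seconds = 0.2      # a fixed saving worth 0.2% of the baseline

share_before = saving_seconds / baseline_seconds

# After the program has been made four times faster overall,
# the same absolute saving is a much larger share of the remaining time.
faster_seconds = baseline_seconds / 4
share_after = saving_seconds / faster_seconds

# "Shaving 1% off the time" and "a 1% speedup" are close but not identical:
time_after_1pct_speedup = 1 / 1.01  # fraction of the original time that remains

print(f"{share_before:.1%} -> {share_after:.1%}")
print(f"{1 - time_after_1pct_speedup:.3%} of the time saved by a 1% speedup")
```

With these numbers the saving grows from 0.2% to 0.8% of the run time, which is roughly the "percent" mentioned above.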

31:32 This portion of Talk Python To Me is sponsored by Linode.

31:38 Cut your cloud bills in half with Linode's Linux virtual machines.

31:41 Develop, deploy, and scale your modern applications faster and easier.

31:45 Whether you're developing a personal project or managing larger workloads, you deserve simple, affordable, and accessible cloud computing solutions.

31:53 Get started on Linode today with $100 in free credit for listeners of Talk Python.

31:58 You can find all the details over at talkpython.fm/Linode.

32:03 Linode has data centers around the world with the same simple and consistent pricing regardless of location.

32:09 Choose the data center that's nearest to you.

32:12 You also receive 24/7/365 human support with no tiers or handoffs regardless of your plan size.

32:20 Imagine that, real human support for everyone.

32:22 You can choose shared or dedicated compute instances, or you can use your $100 in credit on S3 compatible object storage, managed Kubernetes clusters, and more.

32:33 If it runs on Linux, it runs on Linode.

32:35 Visit talkpython.fm and click the create free account button to get started.

32:40 You can also find the link right in your podcast player show notes.

32:43 Thank you to Linode for supporting Talk Python.

32:48 Yeah, which leads on to stages three and four.

32:50 So, you know, just-in-time compilation is always hailed as the sort of the way to speed up interpreted languages.

32:55 Now, before you move on, let me just like sort of list out what you have on stage two for people who haven't dove into this.

33:01 Because I think some of the concrete details, you know, people hear this in the abstract.

33:05 They kind of want to know, like, okay, well, what actually are some of the things you all are considering?

33:09 So, improved performance for integers less than one machine word.

33:13 It's been a long time since I've done C++.

33:16 Is a word two bytes?

33:17 How big is a word?

33:17 Well, a word is how big depends on the machine.

33:19 So, that would be 64 bits for pretty much anything now.

33:22 Apart from like a little tiny embedded systems, which is 32 still.

33:26 So, that's a lot of numbers, right?

33:28 That's many of the numbers you work with are less than 2 billion or whatever that is.

33:33 Yeah.

33:33 I mean, basically, there are two types of integers.

33:35 There's big ones that are used for cryptography and other such things where, you know, it's a number in a sort of mathematical sense, but it's really sort of some elaborate code.

33:44 And then there's numbers that actually represent the number of things or the number of times you're going to do something.

33:49 And those are all relatively tiny and they'll all fit.

33:52 So, the long ones used for cryptography and so on are relatively rare and they're quite expensive.

33:56 So, it's the other ones we want to optimize for because when you see an integer, that's the integers you get.

34:02 You know, they aren't in the quadrillion range.

34:04 They're in the thousands.

34:05 Right, right.

34:06 Exactly.

34:06 A loop index or an array index or something.

34:10 Some languages, one that I'm thinking of that also maybe is kind of close to where Guido is right now, also in Microsoft space, is C#,

34:19 which treats integers sometimes as value types and sometimes as reference types.

34:24 So that when you're doing loops and other stuff, they operate more like C++ numbers and less like, you know, pointers to PyLong objects.

34:34 Have you considered any of that kind of stuff?

34:36 Is that what you're thinking?

34:37 An obvious thing, and an old thing as well, is to have tagged integers.

34:42 So, basically, you know, where we would normally have a pointer, we've got a whole bunch of zeros at the end.

34:47 On a 64-bit

34:48 machine it's three.

34:50 And then for alignment, there's effectively four zeros at the end.

34:54 So, we're using a sixteenth of the possible numbers that a pointer could hold for pointers, which leaves a bunch for integers and floating point numbers.

35:03 So, there's a number of what's called tagging schemes.

35:06 For example, LuaJIT, which is a very fast implementation of Lua, uses what's called NaN boxing, which is everything's a floating point number, but there is something like two to the 53, which is a huge number, of not-a-number values in the floating point range.

35:19 So, you could use a lot of those for integers or pointers.

35:22 Now, that's a little problematic with 64-bit pointers because, obviously, 64 bits is bigger than 53.
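A rough sketch of the NaN boxing idea, simulated on 64-bit patterns held in Python integers. The helper names and the choice of a 51-bit payload are assumptions for illustration; real implementations such as LuaJIT or JavaScript engines each use their own layouts.

```python
import math
import struct

EXPONENT_ALL_ONES = 0x7FF0_0000_0000_0000  # exponent field of every NaN and infinity
QUIET_BIT = 0x0008_0000_0000_0000          # top mantissa bit, set for quiet NaNs
BOX_MASK = EXPONENT_ALL_ONES | QUIET_BIT
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF       # the 51 remaining mantissa bits

# Every pattern with an all-ones exponent and a non-zero mantissa is a NaN,
# for either sign bit, which is where the "two to the 53" figure comes from.
nan_patterns = 2 * (2**52 - 1)


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box(payload):
    """Hide a small non-negative integer (hypothetical encoding) inside a quiet NaN."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    return BOX_MASK | payload


def is_boxed(bits):
    return (bits & BOX_MASK) == BOX_MASK


def unbox(bits):
    return bits & PAYLOAD_MASK


boxed = box(12345)
print(math.isnan(bits_to_float(boxed)), unbox(boxed))
print(is_boxed(float_to_bits(1.5)), is_boxed(float_to_bits(math.inf)))
```

The sketch also shows the limitation raised above: only 51 payload bits are available here, so a full 64-bit pointer does not fit without extra assumptions about the address space.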

35:27 But there are other schemes where you...

35:30 So, again, a simple scheme is that, basically, the least significant bit is one for pointers and zero for integers or vice versa.

35:38 And basically, it just gives you full machine performance for integers because, basically, anything up to 63 bits fits in a 64-bit integer, and that covers basically all of your numbers.

35:49 Interesting.

35:50 Okay.

35:50 Because it's shifted across, all the machine arithmetic works as normal, overflows included.

35:55 Your overflow check is a single machine instruction and things like this.
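The least-significant-bit scheme can be simulated with plain Python integers standing in for 64-bit machine words. This is a sketch under assumptions: integers get tag bit 0, pointers get tag bit 1, pointers are 16-byte aligned, and all helper names are made up. A real VM would do this in C on actual machine words.

```python
WORD_MIN = -(1 << 63)       # range of a signed 64-bit machine word
WORD_MAX = (1 << 63) - 1
MAX_SMALL_INT = (1 << 62) - 1  # largest value that fits in the 63-bit payload


def tag_int(n):
    """Shift the value left by one, leaving 0 in the low bit to mark 'integer'."""
    word = n << 1
    if not WORD_MIN <= word <= WORD_MAX:
        raise OverflowError("needs more than 63 bits; a real VM would box it on the heap")
    return word


def tag_pointer(address):
    """Aligned addresses have spare low bits, so set the lowest to mark 'pointer'."""
    if address % 16 != 0:
        raise ValueError("expected a 16-byte aligned address")
    return address | 1


def is_small_int(word):
    return (word & 1) == 0


def untag_int(word):
    return word >> 1  # arithmetic shift, so negative values survive


def add_tagged(a, b):
    """Add two tagged integers directly: (x << 1) + (y << 1) == (x + y) << 1.

    Returns None on overflow, standing in for the hardware overflow flag.
    """
    total = a + b
    if WORD_MIN <= total <= WORD_MAX:
        return total
    return None


print(untag_int(add_tagged(tag_int(20), tag_int(22))))
print(untag_int(add_tagged(tag_int(-5), tag_int(3))))
print(add_tagged(tag_int(MAX_SMALL_INT), tag_int(1)))
print(is_small_int(tag_pointer(0x1000)))
```

Because the tag bit for integers is zero, addition needs no untagging at all, which is why the arithmetic "works as normal" in this scheme.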

36:00 And that's, again, pretty standard.

36:01 And you see it in fast Lisp implementations and older Smalltalk and other sort of historical languages.

36:10 JavaScript tends to use things like this NaN boxing I was talking about because all of the numbers are floating point numbers.

36:17 So, another one that stands out to me here is zero overhead exception handling.

36:21 Guido, that's making it into 3.11 already, right?

36:24 Basically, what we used to have is a little setup and tear down instruction for every time we wanted to control the block of code inside a try.

36:34 That's try/finally, but also with statements.

36:37 But we've just ditched those in favor of just a table lookup.

36:40 So, if there's an exception now, it's just looked up in a table, which is what the JVM (Java virtual machine) does.
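The table lookup can be sketched as follows. The table layout, offsets, and names here are invented for illustration and are not CPython's actual encoding; the idea shown is only that nothing runs on entry to a `try` block, and the table is consulted only once an exception has actually been raised.

```python
from typing import NamedTuple, Optional


class TableEntry(NamedTuple):
    start: int    # first instruction offset covered by the try block (inclusive)
    end: int      # end of the covered range (exclusive)
    handler: int  # offset to jump to if an exception is raised in the range


# Hypothetical table for one function, with inner blocks listed before outer ones
# so that the first match is the innermost enclosing handler.
EXCEPTION_TABLE = [
    TableEntry(start=4, end=10, handler=20),  # inner try
    TableEntry(start=2, end=14, handler=30),  # outer try
]


def find_handler(table, offset) -> Optional[int]:
    """Return the handler offset for an exception raised at `offset`, if any."""
    for entry in table:
        if entry.start <= offset < entry.end:
            return entry.handler
    return None  # no handler in this function: unwind to the caller


print(find_handler(EXCEPTION_TABLE, 5))   # inside both blocks
print(find_handler(EXCEPTION_TABLE, 12))  # inside the outer block only
print(find_handler(EXCEPTION_TABLE, 16))  # outside any try block
```

The cost moves from every execution of the `try` to the comparatively rare moment an exception occurs, at the price of the memory for the table.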

36:44 Yeah, excellent.

36:45 Zero overhead is a slightly optimistic term.

36:48 It's obviously not zero overhead, but it is less.

36:51 You'll have a harder time finding it in the profiler.

36:53 There's a little bit of memory that you didn't have before.

36:56 That's a lookup table, but sort of it really is zero overhead if no exceptions happen, right?

37:02 Not quite.

37:03 Just because there is extra memory it uses.

37:06 But also, you know, because of tracing guarantees, sometimes we have to insert a NOP where the try was.

37:15 So, there's still some slight overhead.

37:17 And then potentially in future when we compile code, that should effectively become zero.

37:21 But it is definitely reduced.

37:23 Mark, Apple surprised the world and they took their phone chips and turned them into desktop chips.

37:28 And that seemed to actually work pretty well with their ARM stuff.

37:32 That's a switch from having basically just x86 and 64-bit stuff to think about, and now you also have this ARM stuff.

37:39 Does that make life harder or easier?

37:41 Does it open up possibilities or is it another thing to deal with?

37:44 It's just harder because it's a bit harder.

37:49 And we may want to look to the future of RISC-V.

37:52 So, currently, CPython is portable.

37:56 That's a key thing.

37:58 Its portability is, yeah, it rather depends on testing.

38:02 You know, it's all very well saying it's perfectly portable.

38:04 But if you have never tested on a platform, you may have surprises.

38:08 But it's all written in C.

38:09 And portability is a sort of serious consideration.

38:12 So, I mean, things like the tagging I was just talking about, that's technically not portable C.

38:18 But it's certainly, I mean, a lot of things aren't technically portable C, but in effect are.

38:23 I mean, technically, it's impossible to write a memory allocator in C because the specification says once you've called free, you can't access the memory, which makes it kind of difficult to write something that handles the memory.

38:34 But, you know, these are oddities.

38:36 But in practice, you know, if you write sensible C code, you should expect to be portable.

38:42 So, we are kind of basing around that.

38:45 I mean, with some other virtual machines, you know, particularly JavaScript ones,

38:50 their interpreters are often written in assembler or some variant of it.

38:53 There's definitely a performance advantage in that, but I'm not convinced it's great enough to lose the portability and the maintenance overhead.

39:01 Yeah.

39:01 And one of the things that you focused on, Guido, was that you wanted this to be, to keep, one of the constraints is you said you want to keep the code maintainable, right?

39:10 This is important.

39:10 Absolutely.

39:11 Why does that matter so much rather than if we can get 20% speed up if Mark refreshes his assembly language skills?

39:17 Well, it would leave most of the core development team behind.

39:22 And so, suddenly, Mark would be a very, very valuable contributor because he's the only one who understands that assembly code.

39:31 That's just how it goes.

39:33 Yeah.

39:33 And I don't think that that would be healthy for the Python ecosystem.

39:37 If the technology we used was so hard to understand and so hard to learn, making it so hard to maintain, then as an open source project, we'd lose velocity.

39:50 Right.

39:50 The only thing that would sort of cause to happen in the core team might be people decide to move more code to Python code because now the interpreter is faster anyway.

40:02 So they don't have to write so much in C code.

40:06 But then, of course, likely it's actually going to be slower, at least that particular bit of code.

40:12 That's an interesting tension to think about.

40:13 If you could make the interpreter dramatically faster, you could actually do more Python and less C.

40:20 I don't know.

40:20 It would have to be.

40:21 There's some big number where that happens, right?

40:23 It's not just a 10%, but maybe.

40:25 That could be in the distant future.

40:27 But nevertheless, I wouldn't want the C code to be unreadable for most of the core developers.

40:34 Yeah, I agree.

40:35 That makes a lot of sense.

40:36 Being a C expert is not a requirement for being a core developer.

40:39 In practice, quite a few of the core developers are really good C coders.

40:44 And we support each other in that.

40:47 We take pride in it and we help each other out.

40:52 I mean, code reviews are incredibly important.

40:55 And we will happily help newbies to sort of get up to speed with C.

41:00 If we had a considerable portion that was written in assembler.

41:04 Yeah.

41:04 And then it would have to be written in sort of multiple assemblers.

41:09 Or there would also have to be a C version for platforms where we don't have access to the assembler.

41:16 Nobody has bothered to write that assembler code yet.

41:19 All these things make things even more complicated than they already are.

41:24 Right.

41:24 And the portability and the approachability of it is certainly a huge benefit.

41:29 Two other constraints that you had here, maybe you could just elaborate on real quick, is don't break stable ABI compatibility and don't break limited API compatibility.

41:38 Yeah. So the ABI is the application binary interface.

41:43 And that guarantees that extension modules that use a limited set of C API functions don't have to be recompiled for each new Python version.

41:56 And so you can, in theory, you can have a wheel containing binary code.

42:00 And that binary code will still be platform specific, but it won't be Python version specific.

42:06 Yeah, that's very nice.

42:07 We don't want to break that.

42:10 It is a terrible constraint because it means we can't move fields like the reference count or the type field around in the object.

42:18 Many other things as well.

42:19 But nevertheless, it is an important property because people depend on that.

42:24 Sure.

42:25 And the API compatibility, well, that's pretty clear.

42:27 You don't want people to have to rewrite code.

42:28 The limited API is sort of the compile time version of the stable ABI.

42:33 I think it's the same set of functions, except the stable ABI actually means that you don't have to recompile.

42:42 The limited API offers the same and I think a slightly larger set of API functions where if you do recompile, you're guaranteed to get the same behavior.

42:54 And again, our API is pretty large, and a few things have snuck into the limited API and the stable ABI that are actually difficult to support with changes that we want to make.

43:10 And so sometimes this holds us back.

43:13 But at the same time, we don't want to break the promises that were made to the Python community about API compatibility.

43:20 We don't want to say, oh, sorry, folks, we made everything 20% faster.

43:25 But alas, you're going to have to use a new API and all your extensions.

43:30 Just recompiling isn't going to be good enough.

43:33 Some functions suddenly have three arguments instead of two or no longer exists.

43:39 Or return memory that you own instead of returning a borrowed reference.

43:44 And we don't want to do any of those things because that just would break the entire ecosystem in a way that would be as bad as the Python 3 transition.

43:53 Right.

43:53 And it's, yeah, just not worth it.

43:55 All right.

43:56 Let's go back to the Shannon plan.

43:58 So we talked about stage one and stage two.

44:01 And Mark, I see here this is Python 3.10 and Python 3.11.

44:05 Are those the numbers where they're actually going to make it in?

44:07 Or is it, do we have to do like a plus plus or plus equals on them?

44:10 I think a plus one would be appropriate.

44:13 All right.

44:13 Plus equals one.

44:14 Yeah.

44:14 So maybe we're a bit faster because obviously I envisioned this was basically me and one other developer.

44:21 Plus maybe sort of some sort of reasonable buy-in from the wider core development team.

44:27 So I wasn't doing the work entirely in isolation, but yeah, having extra hands will definitely help things.

44:35 Sure.

44:35 Yeah.

44:36 So back when you were thinking you were, this was written at 3.9 timeframe, right?

44:39 And you're like, okay, well the next version, maybe we can do this, the version after that.

44:42 And by the time it really got going, it's more like 3.11, 3.12 and so on, right?

44:46 Yeah.

44:46 It's just around the time.

44:47 I think we switched from 3.9 to 3.10 development.

44:49 I think I was sort of thinking.

44:51 Okay.

44:51 So stage three out of the four stages you have is, I guess, Python 3.13 now, which is a miniature JIT compiler.

45:01 Is that a right characterization?

45:03 I think that's not the compiler.

45:05 Well, I suppose it would be smaller.

45:07 Maybe the parts it applies to, the parts that get compiled.

45:09 Yeah.

45:10 So I think the idea is that you want to compile all of the code where performance matters, the sort of hot code.

45:19 But it makes life easier if you just compile little chunks of code and sort of stitch them together afterwards.

45:27 Because it's very easy to fall back into the interpreter and for the interpreter to jump into sort of compiled code.

45:32 And you can sort of just hang these bits of compiled code off the individual bytecodes where they start from.

45:38 Obviously, that's not fantastic for performance because you're having to fall back into the interpreter,

45:43 which limits your ability to infer things about the state of things.

45:48 So obviously, as I said earlier, with specialization, you have to do some type checks and other sort of checks.

45:54 If you've done a whole bunch of checks, if you then fall back into the interpreter, you have to throw away all that information.

45:59 If you compile a bigger region of code, which is stage four, then you already know something about the code and you can sort of apply those optimizations.

46:08 The problem with trying to do big regions upfront is that if you choose poorly, you can make performance worse.

46:16 And this is a real issue for, well, the exact existing ones.

46:20 I think we're going to talk about some of the other historical sort of compilers in the past.

46:24 And this is a real issue for those that they're just trying to compile a method at a time, regardless of whether that is a sensible unit to compile.

46:31 Right. It's sometimes hard to do optimizations when it's too small, right?

46:35 Yeah. And also it's very expensive to do regions that are too big or just bounded in the wrong places.

46:41 Okay. Yeah. That definitely sounds tricky.

46:43 Guido, there was a question earlier about, you know, mypyc work and the mypy stuff.

46:47 And you are really central to that, right? Doing a lot of work there.

46:51 How do you, both of you, either of you, feel about using type annotations as some sort of guide to this compiler?

46:59 For example, Cython lets you say, you know, x colon int as a parameter, and it will take that as meaning something when you compile it with Cython.

47:07 It seems like, you know, Mark is talking about knowing the types and guessing them correctly matters in terms of what's fast here.

47:14 Is there any thought or appetite for using type annotations to mean more than static analysis?

47:20 It's a great idea. And I think for smaller code bases, something like mypyc will prove to be viable.

47:27 Or for code bases where there is an incredible motivation to make this happen.

47:34 I could see that happen at Instagram, for example.

47:37 But in general, most people haven't annotated their code completely and correctly.

47:44 And so if you were to switch to using something like mypyc, you'd find that basically it wouldn't work

47:52 in a large number of cases.

47:54 And it would basically sort of, it's a different language, and it has different semantics, and it has sort of different rules.

48:02 And so you have to write to that.

48:04 I can see there's a big challenge to say, hey, everybody, we can do this great stuff if you type annotate it.

48:09 And only 4% of people have properly annotated their code.

48:14 And then there's also the possibility that it's incorrectly annotated, in which case it probably makes it worse in some way, a crash or something.

48:23 mypyc will generally crash if a type is detected that doesn't match the annotation.

48:29 And if you annotate stuff with simple types, you can get quite good speedup.

48:35 So Numba is generally designed for numerical stuff.

48:38 But again, it's the simple types, integers, floats.

48:40 Cython, obviously, will do this.

48:41 Numba does it dynamically, Cython statically.

48:44 And the Numba model, for example, is similar to the model that the Julia language uses.

48:49 Essentially, you compile a method at a time, but you make as many specializations as you need for the particular types.

48:56 And that can give very good performance for that sort of numerical code.

48:59 But the problem is that saying something is a particular type doesn't tell you very much about it.

49:05 It doesn't tell you what attributes an instance of it may or may not have.

49:09 It depends, you know, because you can, it's not like Java or C++ where having a particular class means it has those instance attributes and they will exist or at least there exist in a particular place and they can be checked very efficiently.

49:22 Because of dictionary lookup and so on, these things get a bit fuzzy.

49:26 72 bytes into this C object is where you find the name or something like that, right?

49:30 Yeah.

49:31 So because we basically, because anything might not be as the annotations say effectively at the virtual machine level, we have to check everything.

49:40 And if we're going to check it anyway, we may as well just check it once up ahead as we first do the compilation, whatever specialization, and then assume it's going to be like that.

49:50 Because if the annotations are correct, then that's just as efficient.

49:54 And if the annotations are wrong, we still get some performance benefit and it's robust as well.
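A short sketch of why annotations alone tell the virtual machine so little: they are not enforced at run time, and knowing an object's class does not guarantee which attributes an instance currently has. The `add` function and `Point` class are made up for illustration.

```python
def add(x: int, y: int) -> int:
    return x + y


# Annotations are only metadata: nothing stops a caller from passing strings.
mismatched = add("a", "b")


class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


p = Point(1, 2)
p.label = "extra"   # instances can gain attributes the class never mentioned
del p.x             # and lose ones that __init__ set
has_x = hasattr(p, "x")

print(mismatched, add.__annotations__)
print(has_x, sorted(vars(p)))
```

Since the interpreter has to check what it actually sees anyway, checking once and specializing on the observed types gives the same benefit when annotations are right and stays correct when they are wrong.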

49:59 So there's really no, the only advantage of the annotations is for this sort of like very sort of loopy code where we can do things like, you know, loop transformations and so on, because we can infer the types from the arguments of enough of the function to do that stuff.

50:16 And that works great for numerical stuff.

50:17 But for more general code is problematic.

50:20 What about slots?

50:21 Slots are an interesting, not frequently used aspect of Python types that seem to change how things are laid out a little bit.

50:29 Yeah.

50:29 Is that?

50:30 Well, mypyc actually, one of mypyc's main tricks is that it turns every class into a class with slots.

50:39 Okay.

50:39 If you know how slots work, you will immediately see the limitation because it means there are no dynamic attributes at all.

50:47 Yeah.

50:48 These are what you get for your fields and that's it.
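The limitation is easy to see in a small sketch comparing an ordinary class with one that declares `__slots__`. The class names are illustrative.

```python
class Dynamic:
    """Ordinary class: each instance carries a __dict__ and accepts new attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Slotted:
    """Class with slots: a fixed set of fields and no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


d = Dynamic(1, 2)
d.z = 3  # fine: stored in the instance dictionary

s = Slotted(1, 2)
try:
    s.z = 3
    slotted_accepts_new_attr = True
except AttributeError:
    slotted_accepts_new_attr = False

slotted_has_dict = hasattr(s, "__dict__")

print(d.z, slotted_accepts_new_attr, slotted_has_dict)
```

The fixed layout is what makes the memory use compact and the attribute locations predictable, and it is also exactly what rules out dynamic attributes.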

50:50 Yeah.

50:51 I mean, if you don't have dynamic attributes, though, it gives you pretty efficient memory use.

50:55 I mean, it's a little too far up Java.

50:57 So it's...

50:58 And more predictability about what's there and what's not, which is why it came to mind.

51:02 Yeah.

51:02 I mean, they definitely have their use.
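
For readers who have not used slots, the limitation mentioned above can be seen in a few lines. This is a minimal sketch; the class names `PointDict` and `PointSlots` are made up for illustration and do not come from the episode or from mypyc.

```python
class PointDict:
    """Ordinary class: instance attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Class with __slots__: a fixed set of attributes, no instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1, 2)
p.z = 3                                   # dynamic attribute: allowed
dict_has_dict = hasattr(p, "__dict__")

q = PointSlots(1, 2)
slots_has_dict = hasattr(q, "__dict__")
try:
    q.z = 3                               # "z" is not listed in __slots__
    dynamic_allowed = True
except AttributeError:
    dynamic_allowed = False
```

The slotted instance has no `__dict__` and rejects the unknown attribute, which is where both the memory savings and the loss of dynamic attributes come from.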

51:04 Yeah.

51:04 All right.

51:05 Mark, that was your four-stage plan, hoping to make it 1.5 times as fast as before each time, and if you do that over four releases, you end up with five times faster, right?

51:17 That's the standard plan.
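
The arithmetic behind the "five times" figure is simple compounding: each release multiplies the previous speed, so four steps of 1.5x give 1.5 to the fourth power, roughly 5.06x, and two steps give 2.25x. A small sketch, where the variable names are just for illustration:

```python
per_release = 1.5

# Speedups compound multiplicatively across releases.
cumulative = [per_release ** n for n in range(1, 5)]

for n, factor in enumerate(cumulative, start=1):
    print(f"after {n} release(s): {factor:.4f}x")
```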

51:18 Where are we on this?

51:19 How's it going for you and everyone on the team?

51:22 I say it's a bit of a jumble of stages one and two that we're implementing, largely because it's a larger, more diverse team than I was expecting.

51:30 So it makes sense to just sort of spread things.

51:33 Yeah.

51:34 Yeah.

51:34 You'll work on operators, you go work on zero overhead exception handling and so on.

51:38 Yeah.

51:38 So I would say from where we are now, I was probably a bit optimistic with stage one, but stage two seems to have a lot of potential still.

51:49 There's always little bits of the interpreter we can tweak and improve.

51:52 So between the two of them, I'm confident we'll get the projected speedup of more than twice the speed.

51:58 That's fantastic.

51:59 So the course you're on right now, if let's just say stage one and two happen, and for some reason the JIT stuff doesn't, that's still a big contribution.

52:06 What do you think in terms of speed up for that?

52:09 Well, again, it's going to depend a lot.

52:11 I know it matters so much, but like, you know.

52:13 I mean, I just want to, because like currently we have a sort of set of benchmarks that we're using.

52:19 I mean, not possibly the, I mean, the more benchmarks is always better.

52:23 So it's a broad set.

52:24 Individually, some of the benchmarks aren't great, but collectively they form a sort of useful data set.

52:30 But I mean, we see speedups from, like, up to 60% down to zero.

52:33 So it's definitely a spread.

52:36 So it can, you know, try it out would be the thing.

52:39 I mean, you can download 3.11 Alpha 1, and Alpha 2 should be out in a few days, any time now.

52:45 It's presumably before you publish the podcast.

52:48 Yeah.

52:48 Fantastic.

52:49 So people can download it, play with it.

52:51 Yeah, that's fantastic.

52:52 You know, thank you for this.

52:53 I think even 50, 60%, if it stayed there, that's pretty incredible.

52:58 I mean, this language has been around for 30 years.

53:01 People have been trying to optimize it for a long time.

53:03 It's incredible, right?

53:04 And then, you know, to do this sort of change now, that would be really significant.

53:08 Yeah.

53:08 This is an area that we haven't spent much time on previously for various reasons.

53:15 I mean, people have spent a lot of time on sort of making the string objects fast,

53:21 making dictionary operations fast, making the memory efficient, adding functionality that

53:27 the sort of Python has generally, I think, had more of a focus on functionality than on speed.

53:34 And so for me, this is also a change in mindset.

53:36 I'm still learning a lot.

53:38 Mark actually teaches me a lot about how to think about this stuff.

53:41 And I decided to buy this horrible book.

53:44 Well, it's a great book.

53:46 Computer architecture.

53:47 But it's also like it weighs more than a 17-inch laptop.

53:52 Wow.

53:53 Classic text, but not a light read.

53:56 Yeah.

53:56 Down into beyond the software layer, into the hardware bits.

54:00 It makes me amazed that we have any performance at all and that any performance is predictable

54:06 because we're doing everything wrong from the perspective of giving the CPU something to work with.

54:13 I mean, all the algorithms described in there, branch prediction, speculative execution,

54:18 caching of instructions, all that is aimed at small loops of numerical code.

54:25 And we have none of that.

54:26 Yeah.

54:27 Exactly.

54:28 ceval.c is not a numerical loop.

54:30 Definitely not.

54:31 All right.

54:31 Well, I think that might be it for the time we have.

54:34 I got a couple of questions from the audience out there.

54:36 Toon Army Captain says, I'm interested in Guido's thoughts about the Microsoft funded effort

54:44 versus the developer in residence, particularly in terms of the major work of the language and

54:49 the CPython runtime going forward.

54:50 I think these are both good things, both really good things.

54:53 They seem super different to me.

54:55 I think it's great that we have a developer in residence.

54:57 It's a very different role than what we're doing here.

55:01 The team at Microsoft is at least we're trying to be super focused on performance to the exclusion

55:07 of almost everything else, except all those constraints I mentioned, of course.

55:11 The developer in residence is focused on sort of the community, other core developers, but

55:20 also contributors.

55:21 Lukasz is great.

55:23 He's the perfect guy for that role.

55:25 And his work is completely orthogonal to what we're doing.

55:29 I hope that somehow the PSF finds funds for keeping the developer in residence role and

55:36 maybe even expanding it for many years.

55:39 It seems to me like a really important role to smooth the edges of people contributing to

55:45 CPython.

55:46 And the difference of what Mark and you all are doing is heads down, focused on writing

55:52 one type of code, whereas Lukasz is there to make it easier for everyone else to do whatever

55:57 they were going to do.

55:58 Right.

55:59 And I think, you know, one is sort of a horizontal scaling of the CPython team and the other is very

56:05 focused, which is also needed.

56:06 It's actually amazing that we've been able to do all the work that we've been doing over

56:13 the past 30 years on Python without a developer in residence.

56:17 I think in the early years, I was probably taking up that role.

56:21 But the last decade or two, there just have been too many issues, too many PEPs for me to

56:28 sort of get everything going and sort of having...

56:31 I was always working part time on Python and part time working on my day job.

56:36 Right.

56:36 Absolutely.

56:37 Lukasz is working full time on Python.

56:39 And he has a somewhat specific mandate to sort of help people, help sort of contributions

56:48 go smoother, make working with the issue tracker easier.

56:53 So and that sort of developer contributors must be encouraged and rewarded.

56:58 And currently, often the way the bugs.python.org experience is it's a very old web app and it

57:06 looks that way.

57:07 And it's difficult to learn how to do various things with that thing.

57:12 And so Lukasz is really helping people.

57:15 Yeah, it's fantastic.

57:16 With the edges.

57:16 Of course, there's also the somewhat separate project of switching from bugs.python.org to

57:22 a purely GitHub-based tracker.

57:25 Yeah, I was just thinking of that as you were speaking there.

57:27 Do you think that'll help?

57:28 I feel like people are more familiar with that workflow.

57:31 People are more familiar.

57:32 It's more integrated with the pull request flow that we already have on GitHub.

57:37 I think it will be great.

57:39 The expectation is that, I think, it will actually be happening before the end of this year or very

57:44 early next year.

57:45 That'd be fantastic.

57:46 The code's already there.

57:47 The work's already there.

57:48 Might as well have the conversations and the issues and whatnot there.

57:52 All right, guys.

57:52 I think we are definitely over time, but I really appreciate, first of all, the work

57:57 that you're doing, Mark, on this project and Guido on the last 30 years.

58:02 This is amazing.

58:02 You can see out in the comments how appreciative folks are for all the work you've done.

58:06 So thank you for that.

58:07 Let's close with a final call to action.

58:10 You have the small team working on it.

58:12 I'm sure the community can help in some way.

58:14 What do you want from people?

58:15 How can they help you either now or in the future?

58:18 I mean, it's just contribute to CPython.

58:19 So, I mean, I don't think it's specifically performance.

58:23 I mean, like all the contributions help improve, you know, code quality and reliability are still

58:30 very important.

58:31 So I don't think there's anything in particular people can do.

58:35 But we do have a sort of ideas repo if people do have sort of things they want to suggest

58:41 or bounce ideas around, whatever.

58:43 Maybe they could test their workloads on alpha versions or things like that.

58:48 Yeah, I mean, that would be fantastic.

58:49 I mean, we don't really have a set place where people can put that information.

58:53 But if they just open an issue on the ideas thing and post some data, it'd be fantastic.

58:57 We'd love it for people to try to use the new code and see how it works out for them.
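
One simple way to act on this suggestion is to time a representative piece of your own code under each interpreter and compare the numbers. The sketch below uses the standard library's `timeit`; the `workload` function is a hypothetical stand-in that you would replace with real application code, and the repeat counts are arbitrary choices.

```python
import sys
import timeit


def workload():
    # Hypothetical stand-in for real application code:
    # pure-Python integer arithmetic in a loop.
    total = 0
    for i in range(10_000):
        total += i * i % 7
    return total


def measure(repeat=5, number=20):
    # Use the minimum of several runs to reduce noise from other processes.
    timings = timeit.repeat(workload, repeat=repeat, number=number)
    return min(timings) / number


seconds = measure()
print(f"Python {sys.version.split()[0]}: {seconds * 1e6:.1f} us per call")
```

Running the same script with, say, a 3.10 interpreter and a 3.11 alpha gives two figures to include when reporting results.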

59:03 Yeah, fantastic.

59:04 All right.

59:04 Well, thank you both for being here.

59:06 It's been great.

59:07 Our pleasure.

59:07 Thank you.

59:08 This has been another episode of Talk Python To Me.

59:12 Thank you to our sponsors.

59:13 Be sure to check out what they're offering.

59:15 It really helps support the show.

59:16 Choose Shortcut, formerly Clubhouse.io, for tracking all of your project's work.

59:21 Because you shouldn't have to project manage your project management.

59:25 Visit talkpython.fm/shortcut.

59:28 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

59:33 Develop, deploy, and scale your modern applications faster and easier.

59:36 Visit talkpython.fm/linode and click the Create Free Account button to get started.

59:42 Do you need a great automatic speech-to-text API?

59:45 Get human-level accuracy in just a few lines of code.

59:47 Visit talkpython.fm/assemblyai.

59:50 Want to level up your Python?

59:52 We have one of the largest catalogs of Python video courses over at Talk Python.

59:56 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:00:01 And best of all, there's not a subscription in sight.

01:00:04 Check it out for yourself at training.talkpython.fm.

01:00:06 Be sure to subscribe to the show.

01:00:08 Open your favorite podcast app and search for Python.

01:00:11 We should be right at the top.

01:00:12 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct

01:00:19 RSS feed at /rss on talkpython.fm.

01:00:23 We're live streaming most of our recordings these days.

01:00:25 If you want to be part of the show and have your comments featured on the air, be sure to subscribe

01:00:30 to our YouTube channel at talkpython.fm/youtube.

01:00:33 This is your host, Michael Kennedy.

01:00:35 Thanks so much for listening.

01:00:36 I really appreciate it.

01:00:38 Now get out there and write some Python code.

01:00:39 I'll see you next time.
