
#381: Python Perf: Specializing, Adaptive Interpreter Transcript

Recorded on Thursday, Sep 15, 2022.

00:00 We're on the edge of a major jump in Python performance. With the work done by the Faster CPython team and Python 3.11 due out in around a month, your existing Python code might see an increase of well over 25% in speed with no changes to your code. One of the main reasons is its new specializing, adaptive interpreter.

00:19 This episode is about that new feature.

00:22 And a great tool called Specialist, which lets you visualize how Python is speeding up your code and where it can't unless you make minor changes.

00:29 Its creator, Brandt Bucher, is here to tell us all about it. This is Talk Python to me.

00:33 Episode 381 recorded September 15, 2022.

00:50 Welcome.

00:50 To Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode. This episode is sponsored by Microsoft for Startups Founders

01:18 Hub.

01:18 Check them out at talkpython.fm/foundershub to get early support for your startup. And it's brought to you by Compiler from Red Hat. Listen to an episode of their podcast as they demystify the tech industry over at talkpython.fm/compiler.

01:35 Brandt, welcome to Talk Python to Me.

01:37 Thank you for having me. I'm excited to talk about some of this stuff.

01:39 I am absolutely excited about it as well. I feel there's a huge renaissance coming, happening right now, or has been happening for a little while now, around Python performance.

01:49 It's exciting to see, especially in just the last couple of years, that you definitely see these different focuses, whether it's improving single threaded Python performance, multi threaded Python performance, python in the browser. There's a lot of really cool stuff happening.

02:04 Yeah.

02:04 We could talk WebAssembly and PyScript and all that; it's a very exciting thing. There's probably some performance side around it, maybe something we could touch on, but that's not the main topic for today. We're going to talk about just the core CPython and how it works, which is going to be awesome. Some work you've done with the team there at Microsoft and your contributions there. Before we get into all that stuff though, let's start with your story. How did you get into programming and Python?

02:27 Yeah, so I originally studied computer engineering, so like hardware stuff, at UC Irvine. Around my third or fourth year, so 2017, I encountered Python really for the first time in a project setting. Basically it was a senior design project. We get to kind of make whatever we want. We made this cool system that basically points cameras at a blackjack table and detects card counters. And if you want to do something like that in four months or whatever, Python is kind of the way to go.

02:57 Yeah.

02:58 CV stuff there is really good, right?

02:59 Yeah.

03:00 So it was NumPy and OpenCV. And that was kind of my first exposure to this stuff. And so I learned that I like developing software a lot more than developing hardware. So I kind of never looked back and just went all in about a year later. So, like 2018, I opened my first PR to the CPython repo and it was merged. What led up to that?

03:24 There's this standard library module called modulefinder. Basically, it's one of those ones that's kind of just a historical oddity, and it's still in there. Basically, you can run it over any Python script, it detects all the imports, and so you can use it to build an import graph. And I forget exactly what I was using it for. I think it was for work. And I ran into some bugs that had been just kind of lingering there for years, and so I submitted patches for a bunch of them. And so that was kind of my first experience contributing to open source. In general, that PR was actually open for a while.

03:57 It was open for a month or two or something like that. So in the meantime, I contributed other things to mypy and typeshed. But that still counts as my first open source contribution, if you go by the date it was opened.

04:10 Yeah. Does it count from the beginning or the end of the PR?

04:14 Right? Yeah, exactly.

04:15 Because those can be very different things.

04:17 Sometimes they can be very different. We now have, in CPython, I guess, the developer in residence with Ɓukasz Langa. He's there to sort of facilitate making that a lot faster. And I feel like people's experience there might be picking up speed and improving.

04:32 Yeah, no, I think that's a great thing. And in general, just seeing this kind of shift towards full-time Python core development being funded by these big companies is really exciting to see. I think it improves the end contributor experience, the user experience, and just gets things done, which is nice. I don't have any numbers, but I imagine there are fewer of those kind of lingering months-old PRs than there were back when I first started. Way long ago. Four years ago.

05:01 So far back in the past.

05:02 So long. So Long.

05:03 Yeah.

05:04 So that was like, 2018. 2019, I became a member of the Triage team, which is basically a team for people who are kind of more involved in Python than just your average drive-by contributor. So while they're not full core developers, they can basically do anything a core developer can except vote and actually press the merge button. So that was really nice because it made doing things like reviewing PRs easier. Tagging issues, closing issues, that sort of thing. A year later I became...

05:36 How do you get the experience for that?

05:38 It's one thing to say, well, I've focused on this module, and here's the fix. And it's another to say, give me anything from CPython and I'll assess it.

05:45 Well, it wasn't give me anything from CPython. It was, I made it part of my morning routine to wake up, go to a coffee shop, order a coffee, and just for a half hour, look at newly opened issues and PRs.

05:59 Yeah, well, I focused mostly on PR review for new contributors. Basically, I had a filter set up that said, you know, show me all the PRs from the last 48 hours from a new contributor. I thought, okay, my first contribution experience wasn't that great. It would be great if these people who have never opened a PR to CPython before could get a response within 48 hours, whether that's telling them to sign the contributor license agreement, or formatting fixes, or pinging the relevant core dev, or whatever. So that was kind of how I dove into that.

06:27 All this terminology that you're using, PRs and stuff. No, this is great. What I was going to say is this is new to Python, right? It wasn't that long ago that Python was on Mercurial, or before that, Subversion.

06:41 Coming over to GitHub is kind of a big deal, and I feel like it's really opened the door for more people to be willing to contribute. What do you think?

06:48 Oh, it absolutely lowered the barrier to entry for people like me. Using the old bugs.python.org was tough at first, but I eventually kind of got used to it, just in time for it to be replaced with GitHub issues, which I much prefer. But I have a hard time seeing myself emailing patches around. I have a lot of respect for people for whom that was the development flow not that many years ago. So I became a core developer in 2020. And almost exactly a year ago, I joined the Python performance team at Microsoft.

07:21 Were you at Microsoft before then?

07:23 No, I was not. I worked for a company called Research Affiliates in Newport Beach. And I think you actually had my old manager, Chris Ariza, on the show.

07:32 I have had Chris on the show. That's fantastic. Small world.

07:36 Yes.

07:37 All python world.

07:38 Yeah.

07:38 Small Python world. Just a couple of million of us. That's great. So this brings us to our main topic. And let's start at the top. I guess there's the Faster Python team. I guess. I don't know. How do you all refer to yourselves?

07:55 Yeah.

07:55 We refer to ourselves as the Faster CPython team. I think internally, our distribution list is the CPython Performance Engineering team, which sounds a little gnarly, but there's a lot of work going on here.

08:07 It's a cool title to have on a resume, isn't it?

08:10 Yeah, I think I'll use that one.

08:11 There you go. There's been a ton of progress. Some of this work was done in 3.10, but there's this article here that I've got pulled up on the screen. It says Python 3.11 performance benchmarks are looking fantastic. It's going to make you feel good.

08:27 I mean, they were looking fantastic back in, what, June?

08:31 Small bit more fantastic now.

08:33 Yeah, exactly. This article is from June, so it's only better from there.

08:38 Right.

08:38 It's really exciting. And like I said, this is a performance jump that, at least since I've been involved in Python, we haven't seen in a point release, where we're seeing 25, 30%, sometimes more, improvements for pure Python code, rather than the kind of 5, 10, 15% range that might be more typical. And again, I think that's kind of a product of this very conscious effort, whether it's my team or a lot of people that we interact with. So, for example, Pablo, release manager and Steering Council member at Bloomberg, has been working a lot on this stuff with us. Two outside contributors that come to mind are Dennis Sweeney and Ken Jin. There's definitely been a focus on this and it's paying off, which is really exciting, like you said.

09:24 Yeah.

09:24 And maybe a little bit more in parallel instead of a cooperative effort. But there's also the Cinder folks over at Meta Facebook.

09:32 That's absolutely a cooperative effort. Even though we're not merging all of Cinder back into CPython, several of the changes are being upstreamed into CPython. And in fact, just earlier this week on Tuesday, we had, I think, like a two hour meeting where the Cinder team walked our team through how their JIT works. Even though, yeah, it can be seen as a parallel effort, I think it's very collaborative.

09:57 I think also, like you, I'm very much blown away at the step size of the performance improvements. It's just super surprising to me that a 30 year old language and a 30 year old runtime can get that much better in that short of a time.

10:12 Yeah.

10:12 And I feel like I'm going to keep coming back to this. But having full-time people dedicated to this, having teams of people dedicated to this, I think that's a big part of sort of unlocking this, because some of the things that are required for those big jumps are kind of larger architectural changes that a single volunteer who's doing this in their free time probably wouldn't have been able to do without coordinating with others and without throwing a significant amount of effort at it.

10:41 Yeah.

10:41 I mean, there are people out there, core devs, who have done amazing stuff. Shout out to Victor Stinner. I feel like a lot of the performance stuff that I've seen in the last couple of years, he is somehow associated with some major change. But the changes that are being undertaken here, they're much more holistic and far-reaching and it really does take a team, I think, to make it reasonable.

11:03 Right.

11:03 What's cool about the 3.11 effort is it's a combination of kind of both sorts of changes. So we have a bunch of one-off, very targeted changes, probably five or six or ten of those, and then we have one or two of these kind of larger changes that we can build upon in the future and they're never really done, which, you know, that's exciting because it means we get to keep making Python faster.

11:28 It is exciting. I think another area to just highlight real quick before we get into too much detail is correct me if I'm wrong, but none of this is particularly focused on multicore parallelism, right?

11:41 No. So one member of our team, Eric Snow, he's basically the subinterpreter guy.

11:46 Exactly.

11:47 Yeah.

11:49 He is focusing most of his time on subinterpreters and a per-interpreter GIL and all that sort of stuff. But I mean, all the numbers that you're looking at here and all of our benchmarks are all single threaded, single process. If you're running Python code, it will get faster without you changing your code, which is awesome.

12:07 Yeah, it's super awesome. I want to highlight that because if Eric manages to unlock multicore performance without much contention or trade-offs, if you've got an eight-core machine and you get 7x performance by using all the cores, that would be amazing. But all this work you're doing applies to everyone, even if they're not doing that stuff.

12:28 Right.

12:28 So even if somehow we get this multicore stuff, the computational multicore stuff working super well. The work that you're doing and the team is doing is going to make it faster on every one of those cores.

12:38 Right.

12:38 So they're kind of multiplicative initiatives. So if we could get a big improvement in the parallel stuff, it's only going to just multiply how much better this aspect of it is going to be for people who use that.

12:49 Right.

12:49 It's great to be kind of pursuing all these different avenues because they pay off in different time frames. Right. Some of these are longer efforts; in Eric's case, it's been a very long effort, and he's done a great job sticking with it and seeing it through. And some of these we're seeing in a point release, and so they absolutely build on each other. Like you said, if you can get a 7x increase from subinterpreters, just to throw out a number, but Python as a whole got 25 or 30% faster, then you're seeing much more than a 7x.

13:19 Right?

13:19 Absolutely. So very exciting times. Two things. Before we get into the details of the interpreter and stuff, tell me a bit about this team. I interviewed Guido and Mark Shannon a while ago, about a year ago, I suppose, about this plan when they were kicking it off, and we didn't have these numbers or anything, but we did talk about what they were doing, and he talked about the team that he's working with there. So certainly it's more than just the two of them. Tell us about the team.

13:45 Yeah, so I think there are seven people. So there's Guido and Mark Shannon, Eric and me, as we mentioned, another core developer, Irit, one other engineer, and a manager for the team who also does some engineering effort as well and is a member of the Triage team, Mike.

14:04 Nice.

14:05 Yeah. He worked on Pyodide.

14:08 Pyodide, that's right.

14:10 Live.

14:14 Something.

14:14 Or something.

14:15 Exactly.

14:18 That's right. That's the foundation for PY Script, actually, which is quite cool.

14:22 Python Three.

14:25 Yeah. The other thing that I want to ask you about is we have these numbers and visibility into Python 3.11. That's got a lot of conversations going. People are saying they're looking amazing and fantastic and other nice adjectives, but this is in beta, maybe soon to be RC. I'm not sure. What is Python 3.11's status these days?

14:46 311. We just released our last release candidate, I think, this week.

14:52 Right.

14:52 So basically the idea is the final release candidate and the actual 3.11.0 release should be the exact same thing, unless we find any serious bugs that merit fixing before the release. The release candidate is going to be 3.11. This is as close to stable as it gets.

15:12 Almost there.

15:14 Right. Okay, cool.

15:15 So the reason I ask that is a lot of the work you've been doing recently is probably on 312, right?

15:20 Yes.

15:21 So beta freeze, which is basically when we go from alphas into betas, and there are basically no more new features allowed at that point. That happens every May, and so everything that we've been working on since May goes into 3.12.

15:35 Are you excited about the progress you've made?

15:36 Yes.

15:37 Very excited.

15:39 Yes.

15:40 And it's nice, we still have a lot of time before the next freeze.

15:43 Let's talk really quickly about the faster CPython plan put together by Mark Shannon. Guido called it the Shannon plan. And the idea is, how can we make Python five times faster? If we could take Python and make it five times faster, that would be a really huge deal. And again, none of that is multicore. If you could somehow unlock all the cores and you've got eight, that's 35 or 40x or something like that. This is an ambitious plan. It started out with a little bit of work in 3.10. Is that when the adaptive specializing interpreter first appeared? Or did it actually wait until 3.11 to show up?

16:19 No, I don't believe so. We don't have it in...

16:21 Yes, I didn't think so either.

16:22 It might be missing something that's in 3.11. The rest of it looks accurate, though.

16:27 Yeah.

16:27 So then basically that was stage one. Stage two is 3.11, where there's a bunch of things, including quickening the interpreter. We're going to talk about the specializing interpreter. A bunch of small changes here. And then stage three for 3.12 is a JIT. Have you guys done any work on any of the JIT stuff right now?

16:46 It's not looking like 3.12 will ship with a JIT. We think there are other changes that we can make that don't require a JIT that will still pay off. We're probably planning on at least experimenting with what a JIT would look like. As I said, we've already gotten kind of a guided tour of Cinder's JIT, and so we're talking kind of high-level architecture and prototyping and that sort of thing. But I would be surprised if 3.12 shipped with one. It's a long effort. So the research is being done?

17:18 Yes.

17:19 Active?

17:19 Yeah.

17:20 Cool. All right, well this was put together quite a while ago, back in 2020, as here's our plan, and of course plans are meant to evolve, right? But it still looks like this plan is working out quite well because of the changes in performance that we saw already in the 3.11 betas, and it's pretty fantastic. There are a bunch of changes here, about things like zero-overhead exception handling. I believe there used to be overhead for entering the try block.

17:48 Every time you went in or out of a try/except block. So even if I did try/pass, there was overhead associated, right? So basically we would push a block that says, oh, if an exception happens, jump to this location. Now what we do is we realize, oh, we actually have that information ahead of time when we're actually compiling the byte code. So since the uncommon case is that an exception is raised, we can store that data in a separate table and say, oh, if an exception is raised at this instruction, then jump here, without actually having to do any of that management at run time. It's a little more expensive, I believe, if an exception is raised. But in the case where an exception is not raised, I think it is basically as close to truly zero cost as possible.
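
To make that exception-table idea concrete, here's a minimal sketch (not from the episode): in Python 3.11, the dis module prints an ExceptionTable section for code objects that have handlers, and nothing is pushed or popped on the happy path when no exception is raised.

```python
import dis

def parse(value):
    try:
        return int(value)
    except ValueError:
        return None

# On Python 3.11+, the disassembly ends with an "ExceptionTable:" section that
# maps instruction ranges to handler targets. The try body itself has no
# block-push instruction, so the non-exceptional path is essentially free.
dis.dis(parse)
```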

18:36 This portion of Talk Python to Me is brought to you by Microsoft for Startups Founders Hub. Starting a business is hard. By some estimates, over 90% of startups will go out of business in just their first year. With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges. Microsoft for Startups Founders Hub was born. Founders Hub provides all founders at any stage with free resources to solve their startup challenges. The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more. Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor backed or third party validated to participate. Founders Hub is truly open to all. So what do you get if you join them? You speed up your development with free access to GitHub and Microsoft cloud computing resources and the ability to unlock more credits over time. To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts through Microsoft for Startups Founders Hub. Becoming a founder is no longer about who you know. You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points. You'll be able to book a one on one meeting with the mentors, many of whom are former founders themselves. Make your idea a reality today with the critical support you'll get from Founders Hub. To join the program, just visit talkpython.fm/foundershub. All one word. The link is in your show notes. Thank you to Microsoft for supporting the show.

20:22 I think that's the way it should be. Exceptions, as the word implies, are not the main thing that happens. It's some unusual case that has happened, right?

20:32 Not always, but often means something has gone wrong. So if something goes wrong, well, you kind of put performance to the side anyway, right?

20:39 Yes.

20:39 And a lot of the sort of optimizations that you want to see, especially in, for example, JIT-ed code or whatever, exceptions are the sort of thing that mess that up, where if an exception is raised, you get out of the JIT code and go back to slow land where we know what's going on and have better control of context and everything. But yeah, it's really exciting to see. It's a really cool design.

21:01 Yeah.

21:02 Mark Shannon did this.

21:05 There's a bunch of improvements coming along. But what I want to really focus on here is PEP 659, the specializing adaptive interpreter. And in addition to being on the team, you've created a really cool project, which we're going to talk about as soon as we cover this one, about how you actually visualize this and maybe even change your code to make it faster, understanding how opportunities might be missed for your code to be specialized or adapted.

21:33 PEP 659, the concepts are not too difficult, but the implementation is really hairy. So I think it definitely deserves to be gone over. It's really cool when you get down to how it actually works and what it's doing.

21:47 Is this the biggest reason for performance boosts?

21:51 I think it's the most important reason for performance. I mean, any performance boost kind of depends on what you're doing, right?

21:58 For example, Pablo and Mark work together on making Python to Python calls way faster and way more efficient. So if you're doing lots of recursive stuff, that's going to be the game changer.

22:07 Or if you spend all your time writing loops that just do try/except, try/except, that one's better.

22:15 Yeah, if you've got try/except in tight loops, or if you do lots of regular expressions, then our improvements in those areas will probably matter more than this. But this is cool because we can build upon it to kind of unlock additional performance improvements. Sure. We can kind of get into that one, to have a better idea of how, exactly.

22:36 When I look at PEPs, usually it'll say what its status is and what version of Python it applies to, and I see this Pep, it doesn't say which version it applies to, and it's status as draft. What's the story here?

22:48 I'm actually not sure what the story is behind the PEP itself. I think it's a good informational document that explains the changes that we did get into Python 311, but I don't think that Pep was ever formally accepted.

23:03 As you can see, it's just an informational PEP. So it's kind of more the design of what exactly we're doing and how we plan to do it.

23:09 Right. Because it's not really talking about the implementation so much as, like, here's the roadmap and here's what we plan to do and stuff.

23:17 Right. Yeah.

23:18 There's a lot of prose in there that says, here's how we might do it, but no promises. We could change this literally anytime, and we have.

23:26 The design has changed fairly significantly since this PEP was first published, but we've updated that.

23:32 Yeah.

23:33 Cool.

23:33 Okay, so the background is, when we're running code, it's not compiled, and it doesn't have static types because it's Python, it's dynamic. And the types could change. They could change because it just uses duck typing, and we might just happen to pass different things over. We do have type hints, but as the word implies, it's a hint, not a requirement like in C or C# or something.

23:58 You have to have the CPython Runtime be as general as possible for many of its operations.

24:05 Right? Yeah, absolutely.

24:06 And beyond just types, things like I could create a global variable at any time, I could delete a global variable at any time.

24:13 I could add or remove arbitrary attributes, all these sorts of very Pythonic things about Python, or un-Pythonic, depending on how you're looking at it. These are all things that would never fly in a compiled language.

24:25 Yeah.

24:25 And they mean you can't be overly specific about how you work with operations. So, for example, if you say, I want to call this function and pass it the value of X. Well, where did X come from? Is X a builtin? Is it a global variable? Is it at the module level? Is it a parameter? Is it a local variable? All these things have to be discovered at runtime, right? For the most part, yeah, for the most part. Part of this adaptive interpreter is it will run the code, and it says, every single time they said load this variable called X, it came out of the builtins, not out of a module. And so we're going to replace that code, specialize it or quicken it. I've heard of quickening, so I'll have to work on the nomenclature.

25:11 Yeah, we can clear up the terms a bit.

25:13 Yeah.

25:14 But you're going to take this code and you're going to replace it. And instead of saying just load an attribute or load a value from somewhere and go look at all the places, you're like, no, go look in the builtins and just get it from there. And that skips a lot of steps, right?

25:27 Yes.

25:28 Or maybe I'm doing math here and every single time it's been an integer. So let's just do integer math and not the arbitrary addition operator.

25:37 With the huge asterisk that if it does become a global variable, or if it does start throwing floats at your addition operation, we don't segfault or even produce an incorrect result.

25:49 Right.

25:49 Because you could say use X, but before that you might say if some value is true, x equals this thing, but it goes from being a builtin to a local variable or some weird thing like that.

25:59 Right.

26:00 And if it were overly specialized and couldn't fall back, well, bad things would happen, right?

26:05 Yeah.

26:05 It would be surprising if you were getting the len function from the builtins over and over, and then you for some reason define len as a global. Python, the language specification, says it's going to start loading the global now, regardless of what it used to be.

26:21 And that's the same for attribute accesses. If you used to be getting an attribute of the class and then you shadow it on the instance, you need to start getting it from the instance; you can't keep getting it from the class. Correct.
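
To make that concrete, here's a minimal sketch (not from the episode) of the kind of shadowing the interpreter has to guard against:

```python
class Point:
    x = 0  # class attribute

p = Point()
print(p.x)   # 0, found on the class; a specialized LOAD_ATTR could cache that

p.x = 42     # now shadowed on the instance
print(p.x)   # 42, must come from the instance from here on

print(len([1, 2, 3]))  # 3, the builtin len
len = lambda obj: -1   # shadowing at module scope: the global now wins
print(len([1, 2, 3]))  # -1, as the language spec requires
```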

26:32 Right.

26:32 This is one of those problems that arises from this being a dynamic language that can be changed as the code runs. Because if this were compiled, wherever those things came from and what they were, they couldn't change. Their type was set, their location was set, and so then the compiler can say, well, it's better if we inline this, or it's better if we do this machine operation that works on integers, or some special thing like that.

26:59 Right.

26:59 It doesn't have to worry about it falling back. And I feel like that ability to adapt and change and just be super dynamic is what's kind of held it back a lot, right?

27:07 Yes.

27:08 And I like that word they use, adapt, because that's kind of a big part of how the new interpreter works. You can change your code and the interpreter adapts to it. If x starts being an attribute on the instance rather than from the class, well, the interpreter will learn that sometime later while running the program, and it starts doing the best path for instance access rather than class access.

27:29 Let's start there. How does it know right? It doesn't compile, so it has to figure this stuff out.

27:36 Right.

27:36 I run my code. How does it know that it can now treat these things x and y as integers versus strings?

27:42 Yeah.

27:42 So stepping back a little bit, like how does this new interpreter work? So the big change, kind of the headline, is that the bytecode instructions can change themselves while they're running. Something that used to be a generic add operation can become something that is specialized, which is the specialized instruction terminology we use, for adding two integers or adding two floats. And so this happens sort of in a couple of different phases after you've run your code for some amount of time, basically after we've determined it's worth optimizing, because optimization isn't free. So if something's only run once, like module-level code or a class body, there's no reason to optimize that at all. For code that runs more, we have heuristics for, okay, this code is warm, and that's a term that you hear in JIT land.

28:35 Basically, after we've determined that a block of code is warm, we quicken it, which is the term that you used earlier. And this basically means walking over the bytecode instructions and replacing many of them with what we call adaptive variants. And you can see an example in the PEP, but to kind of walk through that example, if you have an attribute load, once the code is quickened, after we've determined it's warm, we walk over all the instructions. All of the LOAD_ATTR bytecode instructions become LOAD_ATTR_ADAPTIVE. And what those adaptive instructions do is, when we hit them, when we're actually doing the attribute load, in addition to actually loading the attribute, they will kind of do some fact finding. They'll gather some information, say okay, I loaded the attribute. Did it come from the class? Did it come from the instance? Did it come from a dict? Did it come from a module? There's a bunch of different possibilities.

29:30 So using that information, the adaptive instruction can turn itself into one specialized instruction. So the example you have here on the screen can either be LOAD_ATTR_INSTANCE_VALUE or LOAD_ATTR_MODULE or LOAD_ATTR_SLOT. And what the specialized instructions do is really cool. Basically they start with a couple of checks just to make sure the assumptions are holding true. So for a LOAD_ATTR_INSTANCE_VALUE we check and make sure that, for example, the class of the object is as we expect, that our attribute isn't shadowed by a descriptor or something like that, or that we're not now getting it off of a class object or whatever. And then it has a very optimized form of getting the actual attribute. There's a lot of expensive work that you can skip if you know that you have an attribute that is coming

30:18 Directly off the instance. Another one is LOAD_ATTR_SLOT. Slots are really interesting for Python performance, and they kind of capture, more than a lot of this stuff, the difference between the possible and the common. And by that I mean it's possible that every time you access a field off of a class, it was dynamically added and it came from somewhere else and it's totally new and it could be any type. What's more likely, though, is that the customer always has a name and the name is always a string. Right. And with slots you can sort of say, I don't want this particular class to be dynamic. And because of that, you can say, well, it doesn't need to have a dictionary that can evolve dynamically, which improves the access speed and the storage and all of that. And here you are pointing out that, well, we could actually have a special opcode that knows whenever I access x, x is a slot thing. Skip all the other checks you might have to do before you get there.

31:12 Well, yeah, and even accessing the slot is going to be fast. I'm not sure how versed you are in how a slot works, but slots are implemented using descriptors. So to get the slot from your instance, you still need to go to the class, look up the attribute in the class dict, find the descriptor, verify it's a descriptor, call

31:34 The descriptor, and all of that. Yeah, exactly.

31:37 We can do it even faster than that. So even if you do have slots, this happens really fast without even any dictionary lookups. We say, has the class changed?

31:46 No.

31:47 Okay, cool, we'll get to this later. But we remember what offset the slot was at last time, and we just reach directly into the object and grab it. There are no dictionary lookups, we're not calling any descriptors or doing anything like that. It's about as close to a compiled language as a dynamic language gets.
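
As a rough illustration of the slots idea (a minimal sketch, not Brandt's example):

```python
class Customer:
    __slots__ = ("name", "balance")  # fixed layout: no per-instance __dict__

    def __init__(self, name: str, balance: float) -> None:
        self.name = name
        self.balance = balance

c = Customer("Ada", 100.0)

# Attribute access goes through a slot descriptor on the class. In 3.11,
# once this code is warm, the interpreter can specialize the load (to
# LOAD_ATTR_SLOT) and read the value from a fixed offset in the object.
print(c.name)

# print(c.__dict__)  # AttributeError: slotted instances have no __dict__
```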

32:09 Yeah, just verify nothing has changed, then reach in where you remember it was. I tried to kind of line this up, saying the possible and the common. Most likely your code is not changing. And when it's well written, it's probably using the same type. It's not like, well, sometimes it's a string, the quoted '7', and sometimes it's the actual integer 7. It should probably always be one. You just don't express that in code unless you're using type hints. Right, and they're not enforced.

32:31 Yeah.

32:32 And getting back to the adaptive nature and making sure that we are correct. Say we had something that was a slotted instance coming through a bunch of times, and then suddenly you throw in a module, or something with an instance dictionary, or something else, or maybe the attribute doesn't exist. Those quick checks that I mentioned happen before we actually do the fast implementation of loading a slot. If any of those checks fail, then we basically fall back on the generic implementation. We say, oh, our assumptions don't hold, go back. And if that happens enough times, then we go back to the adaptive form and the whole cycle repeats. So if I'm throwing a bunch of integers into an add instruction and then later I stop and start throwing a bunch of strings into that add instruction, we'll do the generic version of that for a little bit while we're kind of switching over, but then the interpreter will kind of get the hint and start specializing for string addition there. And it's really cool to see that.

33:28 This portion of Talk Python to Me is brought to you by the Compiler podcast from Red Hat. Just like you, I'm a big fan of podcasts, and I'm happy to share a new one from a highly respected and open source company: Compiler, an original podcast from Red Hat. With more and more of us working from home, it's important to keep our human connection with technology. With Compiler, you'll do just that. The Compiler podcast unravels industry topics, trends and things you've always wanted to know about tech through interviews with people who know it best. These conversations include answering big questions like what is technical debt? What are hiring managers actually looking for? And do you have to know how to code to get started in open source? I was a guest on Red Hat's previous podcast, Command Line Heroes, and Compiler follows along in that excellent and polished style we came to expect from that show. I just listened to episode twelve of Compiler, How Should We Handle Failure? I really valued their conversation about making space for developers to fail so that they can learn and grow without fear of making mistakes or taking down the production website. It's a conversation we can all relate to, I'm sure. Listen to an episode of Compiler by visiting talkpython.fm/compiler. The link is in your podcast player's show notes. You can listen to Compiler on Apple Podcasts, Overcast, Spotify, or anywhere you listen to your podcasts. And yes, of course you could subscribe by just searching for it in your podcast player, but do so by following talkpython.fm/compiler so that they know that you came from Talk Python to Me. My thanks to the Compiler podcast for keeping this podcast going strong.

35:01 Right, so we, being Python developers who write code that just executes without thinking very much about what that means, we should not have to worry or maybe even know that this is happening, right? If everything goes as it's supposed to, it should be completely transparent to us.

35:18 Yes.

35:19 The only way that you should know that anything is different about 3.11 is your CPU cycles or the cloud hosting bill at the end of the month. You should be able to upgrade, and if behavior changes, that's a bug.

35:32 Tell us about that, going from this adaptive version. The adaptive instruction sounds slightly more expensive than just the old-school LOAD_ATTR, for example, because it has to keep track and it does a little bit of inspecting to see what's going on. But then the new ones, once it gets adapted and quickened, it should be much faster. Is there a chance that it gets into some weird loop where, just about the time the adaptive one has decided to specialize, it hits a case where it falls back, and it ends up being slower rather than faster than before?

36:03 Well, yeah, the worst case scenario would be you send the same type through any number of times, and right when it's going to try to specialize, you send a different type through.

36:13 Exactly. Whatever the limit. Like if it goes through N plus one times and then trips it back.

36:20 That would be sort of the worst case scenario. But we have kind of ways of trying to avoid that if at all possible. So for example, if we fail one of those checks, we don't immediately turn back into the adaptive form. We will do it after that check has failed a certain number of times. And just as a concrete example, the number of times that we have hard coded is a prime number. So it's less likely that you'll fall into these sorts of patterns where it's like, oh, I send three ints through, then a string, then three ints.

36:50 It'd be hard to get that worst case behavior without doing it intentionally.

36:55 By the way, we got to keep in mind these are extremely small steps in our code.

37:01 Right.

37:01 We might have multiple ones of these happening on a single line. What looks like, oh, there's one line of code, there's five or however many bytecode instructions happening, and some of them may be specialized and some of them not.

37:15 Right?

37:15 Yeah, exactly. And so the overall effect definitely smooths out, where sure, you may have worst case behavior on one or two or three individual bytecode instructions, but your typical function, even a smallish function, is going to have 20 or 30 instructions doing real work.

37:32 Yeah. If you care about its performance, it will be doing a lot there.

37:35 Exactly.

37:36 And so some will specialize, some won't, but in general your code will get 25% faster.

37:42 Is there a way you could know? Is there a way that you might be able to know if it specializes or not? We'll get to that.

37:48 But it looks like if I go and use the dis module, dis not for disrespect but for disassembling.

37:57 So you can say import dis and then, maybe from dis import dis, you can call dis and give it a function or something, and it will write out the byte code of what's happening.

38:07 Does all of this mean that if I do this in 3.11, I might see additional bytecodes than before? Instead of LOAD_ATTR, maybe I see LOAD_ATTR_INSTANCE_VALUE. Will I actually be able to observe these specializations?

38:20 If you pass a keyword argument to your dis utilities, then yes, you will be able to.

38:26 Okay, but if I don't, for compatibility reasons, LOAD_ATTR_ADAPTIVE and LOAD_ATTR_INSTANCE_VALUE, those are just going to show up as LOAD_ATTR.

38:34 LOAD_ATTR, exactly.

38:35 Okay.

38:36 Yeah. So the idea is anyone that's currently consuming the byte code, they shouldn't have to worry about specialization, because the idea is they're totally transparent, so they should only see what the original byte code was. But if you want to get at it, then yeah, any of the dis utilities accept adaptive equals

38:53 True.

38:53 And that will show you the adaptive instructions. And again, you only see them if it actually gets quickened, meaning if you run it a number of times or so.

39:02 Okay, so if I wrote, say, a function. So often what you're doing if you're playing with dis is you write the function and then you immediately print out the dis output. Maybe you've never called it, right? And so even if you said yes to the specialization output, that still might not give you anything interesting.

39:22 It won't, yeah, it'll just give you the same as if you hadn't passed it. Because again, this all happens at runtime. So if the code isn't being run, nothing happens. We don't specialize code that is never run.

39:33 What counts as warm? How many times do I got to call it?

39:37 The official answer is that's an implementation detail of the interpreter, subject to change at any time. The actual answer is either eight calls or eight times through a loop in a function. So if your function has a loop that executes more than eight times, or if you call it more than eight times. So just calling your function eight times should be enough to quicken it, right?
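
As a rough sketch of how you might observe this (assuming Python 3.11, where dis grew an adaptive keyword argument):

```python
import dis

def add_all(numbers):
    total = 0
    for n in numbers:
        total += n   # after enough iterations, this add can specialize for ints
    return total

# Warm the function up so the interpreter has something to adapt to.
for _ in range(100):
    add_all(range(50))

# With adaptive=True, the specialized/adaptive instruction names are shown;
# without it, you see the original, generic bytecode names.
dis.dis(add_all, adaptive=True)
```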

39:56 Well, maybe that number changes in the future, but just having a sense, is it ten or is it 10000? You know what I mean?

40:04 Where is the scale up before I see something?

40:07 Yeah, if you want to make sure it's quickening but you don't want to take up too much time, I'd say just run it a hundred times as shorthand. When we're writing tests and stuff, we do like 100 or 1,000, because that also gives it time to actually adapt to the actual data that you're sending, because it's not enough to just quicken it. Then you'll just have a bunch of adaptive instructions. They actually have to see something.

40:24 Well, now we're paying attention. It's like, great, now it's paying attention too, right?

40:29 Yeah, and you'll see that too, because if you have any sort of logic flow inside of your function, when you're looking at the bytecode, any paths that are hit will be specialized, but any paths that aren't obviously won't, because we don't specialize that code.

40:43 So it has a bit of a code coverage aspect.

40:46 If you think about it, when you look at it, you might see part of your code that's just unmodified because whatever you're doing to it never hit this else case.

40:55 Yeah, and that's what's really exciting about this too.

40:59 If you run your code a bunch of times and then you call dis on it, you see a lot of information that would potentially be useful if you are, for example, JIT compiling that function. You see at every addition site whether you're adding floats or ints, you see at every attribute load site whether it's a slot or not. You see where the dead code is, you see the hot code. All that stuff is sort of getting back to what I said, how this sort of enables us to build upon it in the future. Not only can we add more specializations or specialize more opcodes or do that more efficiently, we can also use that information to kind of infer additional properties about the code that are useful for other, faster, lower-level representations.

41:41 Yes, I can certainly see how that might inform some kind of JIT engine in the future.

41:46 I think the PEP is interesting, PEP 659, people can check it out. But like you said, it's informational, so it's not really the implementation, exactly.

41:55 So let's talk a bit about your project that you did in addition to being on the team, the personal project that you did that basically takes all of these ideas we've been talking about and says, well, two things.

42:08 One, can I take code and look at it and get that answer? Again, kind of back to my code coverage example, like coloring code lines to mean stuff. And then what's interesting about this, and we'll talk through this example that you put up, is there's actionable stuff you could do to make your code faster. If you were in a super tight loop section, I feel like applying this visualization could help you allow Python to specialize more rather than less.

42:33 Yeah.

42:34 In general, this tool is really useful for us as maintainers of the specialization stuff, because we get to see where we're failing to specialize, because ideally, if we've done our job well, you should get faster Python without changing your code at all. First and foremost, this is a tool for us in our work so that we can see, okay, what can we still work on here? But a sort of cool knock-on effect is that if you do run it on your code, you also know where it's not specializing. And if you know enough about how specialization works, you may be able to fix that. But I would say a word of caution against getting too in the weeds and trying to get every single bytecode to do what you want, because there are much better ways of making it faster. Right.

43:18 There are places where you're like, these two functions. I know we've got 200 lines of Python, but these two, which are like 20 lines, they are so important and they happen so frequently, anything you can do to make them faster matters. People rewrite that kind of stuff in Rust or in Cython. Before you go that far, maybe adding a zero on the end of a number is all it takes. Something like that.

43:41 Right.

43:41 That's kind of what I'm getting at. Not to go crazy or think you should mess with the whole program, but there are these times where, like, five lines matter a lot.

43:49 Yeah.

43:50 Okay.

43:50 Well, tell us about your project, Specialist.

43:53 One really cool thing about our specializing adaptive interpreter is we've worked really hard to basically make it easy for us to get information about how well specializations are working. So you can actually compile Python, it will run a lot slower, but you can compile Python with this pystats option, and that basically dumps all of the specialization stats.

44:15 Yes.

44:15 You actually pointed out that in the faster CPython ideas section, it lists those out.

44:22 There are tons of counters. So you can see that when we run our benchmarks, LOAD_ATTR_INSTANCE_VALUE is run, what, 2 billion times? Almost 3 billion times. And it misses its assumptions on 1% of those.

44:35 And so there's tons of these counters.

44:36 That you can dump.

44:37 And that's really cool because we can run our performance benchmarks and see how this changed. And even cooler than that, it allows us, separately, to, for example... there's been at least one case where we've worked with a large company that has a big private internal app, and they can run it using Python 3.11, and we can get these stats without actually looking at the source code, which is really cool.

44:58 And so we want to help you be faster, and we're working on the runtime, but you don't want to...

45:02 You don't want to show us your code? Fine, we're not going to look at it. And so those stats are really useful for kind of knowing, okay, 90% of my attribute accesses were able to be specialized. What about the remaining 10%? Where are they?

45:14 An additional question, like, why couldn't they be specialized?

45:18 And that's something that's a lot easier to tell when you're looking at the code. And so my original intention for writing this tool was, once we get beyond seeing stats for benchmarks and we run something on it, to make it easy to tell at a glance where we're failing and how.

45:36 Right.

45:36 It's like they have 96% code coverage versus these two lines are the ones that are not getting covered.

45:42 Exactly. You get a lot more information from actually getting those line numbers than from the 96%. And so basically the way it works, we already talked about how in the dis module you can see which instructions are adaptive or specialized and all that. This tool just visualizes that. Literally, the implementation of this tool is just a for loop over dis, where we kind of categorize the different instructions and then map those to colors, which is all sorts of great.

46:12 Yeah.

46:12 And for people listening, imagine some code. And here you've got a for loop with tuple unpacking. So you've got for i, t in enumerate(text), and it does some stuff, and it has the i and the t turned green, but then some dictionary access turned yellow. And it talks about, is it not at all specialized?

46:34 Did it try but fail to get specialized? And things like that, right?

46:38 Yes.

46:39 So green indicates successful specializations. Red are those adaptive instructions that are slightly slower and represent missed specializations. Yellow and orange are kind of in between. As we talked about, one line of Python code can easily be ten or 25 bytecode instructions.

46:56 Right.

46:57 It's kind of a way of compressing that information.

47:00 And one thing I want to point out about this too is, you'll notice you actually see characters highlighted within a line. And this is something that's really cool, because this is piggybacking on a feature that's completely unrelated to specialization.

47:14 Originally, when I was writing this, it highlighted each line. So you can see this line was kind of green or this line was kind of yellow. But then I remembered maybe you're familiar.

47:25 In Python 3.11, we have really informative tracebacks, where it will actually underline the part of the code that has a syntax error, or that has an exception that was raised, or something. And so that's the PEP that's linked first in the description there. It's called fine-grained error locations in tracebacks. And so what that means is that previously we just had line number information in the byte code, but now we have line number and end line number and start column and end column information, which means that due to this completely unrelated feature, we can now also map directly which characters in the file correspond to individual bytecode instructions.

48:04 That's super cool.

48:05 Yeah.

48:05 So for example, we've got a string and you're accessing it by element by passing an index. And you were able to highlight the square brackets as a separate classification on that array indexing or that string indexing.

48:20 Exactly.

48:21 So it wouldn't have been as helpful to just see that line was kind of yellow orange.

48:26 We see that the fast variable store was really quick. We see that the modulo operation and the indexing of the string by it weren't that fast, we weren't able to specialize them, but we were able to specialize the lookup of the name, weren't able to specialize the function call, that sort of information. That granularity is really, really cool.

48:51 Yeah, it is super cool. And I think a good way to go through this... you've got some more background that you write about here, but we've already talked a lot about specialization.

48:58 This is just summarizing.

48:59 Yeah, exactly. People can check it out there if they want the too-long-didn't-read version. But you've got this really nice example of what may be in the first few weeks of some kind of Python programming class.

49:12 Write a program that converts Fahrenheit to Celsius and Celsius to Fahrenheit, and then just tests that round-tripping numbers gives you basically the same answer back within floating point variations, right?

49:24 Yeah.

49:24 I really like this example because as we'll show, it's kind of presented through the lens of performance optimization, but it also does a good job of showing you kind of how the interpreter works and how those little tweaks kind of cascade.

49:38 Absolutely. And it highlights certain things you can take advantage of: if you just slightly change the order, it actually has different runtime behavior, which we don't often think about in Python. If we're doing C++, we would maybe debate, do you dereference that pointer first or can you do it in the loop, but not so much here. So let's go through. I guess just to remind people, Fahrenheit to Celsius, you would say x equals f minus 32, and then you multiply the result, once you've shifted the zero, by five divided by nine. And the reverse is you multiply the Celsius by nine divided by five and then you add the 32 to shift the zero again. And basically that's all there is to it. And then you go through and say, well, let's run Specialist on it to get its output. And maybe talk about this first variation that we get here.
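
For readers following along, here's a minimal sketch of what such a conversion example could look like (the names and test values here are illustrative, not necessarily the ones in Brandt's write-up):

```python
import math

def f_to_c(f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    x = f - 32
    return x * 5 / 9

def c_to_f(c: float) -> float:
    """Convert Celsius to Fahrenheit."""
    x = c * 9 / 5
    return x + 32

def test_round_trip() -> None:
    for t in range(-100, 101):
        assert math.isclose(t, f_to_c(c_to_f(t)), abs_tol=1e-9)
        assert math.isclose(t, c_to_f(f_to_c(t)), abs_tol=1e-9)

# Call it enough times for the interpreter to consider the code warm.
for _ in range(10):
    test_round_trip()
```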

50:24 Yeah, so as we were talking about, the red indicates adaptive instructions and the green indicates the specialized instructions. So we can see here that some things were specialized very well. For instance, look at the round-trip assertion, that's bright green. Because it's in that hot loop, we got enough information about it to say, okay, the round-trip function is being loaded from the global namespace and it's a Python function, so we can do that cool Python-to-Python call optimization. And that's basically as fast as any function call in Python is going to be. But some things aren't specialized. So the things that jump out, the things that we may want to actually take a closer look at, would be the math, which is yellow and red. So, for instance, the loads of the local variable f in that first function and the load of the constant 32, those are yellow because the math they're involved in, the subtraction operation, wasn't specialized, but the individual loads were specialized.

51:26 I see.

51:26 So half yes, half no for what they were involved in.

51:29 Yeah, green plus red equals yellow.

51:33 So that subtraction wasn't specialized. And the reason is, in 3.11, just based on heuristics that we gathered, we determined it was worth it, at least for the time being, to optimize int plus int and float plus float, but not int plus float. What we're doing here is we're subtracting an int from a float, and so we're not able to specialize that. But if we were to somehow change that so it's two ints or two floats, then it would be specializable.

52:01 Right, because when I look at it, it looks like it absolutely should have been specialized. I have a float minus an int. The int is even a constant. Like, okay, well, this is standard math and it's always a float and it's always an int and it's always subtraction, right? It seems like the math should be obvious and fast, but as you pointed out, there's this peculiarity, or maybe an implementation detail for the most part.

52:26 Absolutely an implementation detail.

52:27 Yeah, if it's an int and a float, well, right now that problem is not solved. Maybe it will be in the future, right? It seems like pretty low-hanging fruit, but, well...

52:38 Yeah, specializations aren't free. So, for instance, when we're running those adaptive instructions, we need to check for all of the different possible specializations. So every time we add a new specialization, that has some cost. And basically we determined, at least for the time being, like I said, that we've tried int plus float and float plus int, and at least based on the benchmarks that we have and the code that we've seen, it just isn't worth it.

53:02 Okay.

53:03 And it's something like, int plus int is very easy to do quickly, float plus float is very quick. Int plus float, there's some coercion that needs to happen there anyway, a conversion. So it's something that costs some time to check, but we don't have a significantly faster way of doing it.

53:23 Which does happen on a register on the CPU or something.

53:26 Yeah, exactly.

53:27 Okay, so then you say, well, look, the problem is it's a float and an int, where we have f minus 32, which seems, again, completely straightforward. What if they were both floats? Well, you know it's going to end up a float in the end anyway. How about making it 32.0 instead of 32? And then bam, that whole line turns green, right?

53:47 Yes.

53:47 So now you're basically doing that entire line using fast local variables and native floating point operations. You're adding literally two C doubles together.
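
In code, the tweak being discussed is just this (a sketch, reusing the hypothetical f_to_c function from the earlier example):

```python
def f_to_c(f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    x = f - 32.0   # 32.0 instead of 32: float minus float can specialize in 3.11
    return x * 5 / 9
```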

53:59 Yeah, and this is what I was talking about, where you could look at that line and go, oh, well, I just wrote the integer 32, but I'm doing floating point math, not integer math. So if you just write it as a constant with a .0 on it, that's a pretty low-effort change. And here you go, Python can help you more and go faster.

54:18 Right? Yeah.

54:19 And that's not a transformation that we can do for you, because if f is an instance of some user class that defines dunder sub, that would be a visible change if it started receiving a float instead of an int as the right-hand argument. So those are things that we can't do while keeping the language's semantics the same.
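To illustrate why the interpreter can't just rewrite that constant for you, here is a hedged sketch with a made-up user class that defines dunder sub; the point is that the method can observe exactly what it was handed:

    class Temperature:
        def __init__(self, value):
            self.value = value

        def __sub__(self, other):
            # This method sees the actual right-hand operand. If CPython silently
            # turned 32 into 32.0, `other` would change from an int to a float,
            # which is a visible behavior change for user-defined types.
            print(type(other))
            return Temperature(self.value - other)

    t = Temperature(98.6)
    t - 32    # prints <class 'int'>, and must keep doing so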

54:35 Right.

54:35 But your Specialist tool can show you. And again, figure out where your code is slow and then consider, like, well, we've only got 100,000 lines, who's assigned to specializing today?

54:47 Yeah, and this is a simpler example, but it does require a somewhat deep knowledge of how the specializations work, because for things other than binary operations, it's not going to be clear what the fix is. You just see kind of where there's, and I hesitate to even call it a problem, but you see where there's the potential for improvement, not necessarily how to improve it.

55:08 Yeah.

55:08 And then you have another one in here that's, I think, really interesting, because so often when we're talking about math, at least commutative operations, it doesn't matter which order you do them in. Like five times seven times three: whether you do the first two and then the result, or you multiply the last two first. And unless there's some weird floating point edge case where the IEEE representation goes haywire, it doesn't matter, right? For example, here you've got the part that finishes off the Fahrenheit to Celsius conversion. It's the x times five divided by nine. And that one is busted too, for the same reason.

55:42 Right?

55:43 Because of the five.

55:43 Yeah.

55:45 So this is kind of bad for two reasons. This line isn't as good as it could be in an ideal situation. So x is a floating point number, five is an integer. We have the same problem: we specialize multiplication for int times int and float times float, but not for int times float. And the division is a different problem. We don't try to specialize division at all, for the reason that it's kind of problematic, because the right-hand side can be zero, and then you have to check for that, and all sorts of other things you need to check for make it not as much of a payoff. So here we have both an operation that we could specialize but isn't being specialized, and then another one that we're not even trying to specialize.

56:24 Then back to the commutativity thinking, you're like, well, what if we did the division first?

56:29 Right? What if we did the division, like parentheses, five divided by nine, and then x times that, right?

56:35 Yes.

56:37 Wading through more of these implementation details: Python's compiler, we have a bytecode compiler. It's not compiling to machine code, but it can perform simple optimizations. So by changing the order of operations, the bytecode compiler sees, oh, five divided by nine, I can do that once at compile time rather than at run time literally every time. That's never going to change.
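You can see that fold for yourself with the dis module; this is just a rough sketch, and the function names here are placeholders:

    import dis

    def convert(x):
        return x * 5 / 9        # a multiply and then a divide, every single call

    def convert_folded(x):
        return x * (5 / 9)      # 5 / 9 is computed once, when the bytecode is compiled

    dis.dis(convert_folded)     # shows LOAD_CONST 0.5555555555555556 and a single multiply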

56:58 And so by changing the... yeah, sorry, regardless of specialization, that's better anyway.

57:04 Right?

57:04 Because that happens when the PYC file is generated or when the equivalent thing in memory is generated. And then it's just known as a constant, right?

57:13 Yeah.

57:13 And you're doing no division at runtime anymore. So you turn this from two operations, one of which is pretty expensive, into just one operation.

57:20 All you're doing is a multiply by a constant.

57:23 And so you can see that once we apply that transformation to our code, everything's bright green, as specialized as it can be.

57:32 Right.

57:32 Because in Python 3, five divided by nine is a floating point number.

57:36 Right.

57:36 It doesn't modulo it out or whatever.

57:38 Yeah.

57:39 So then it becomes float times float, which can be specialized. And that first division part is something that is done when it first runs, but only once, basically at parse time, which is fantastic. So, yeah, this function, or these functions, this code is much better, as I understand it.

57:58 Yeah.

57:58 And this transformation isn't something that Python can do for you, because it changes the semantics of the language. Again, if x is some user object, then it can observe the types that are being passed to it.

58:08 Yeah.

58:08 If it implements multiply, it expected to receive the five; it didn't expect to receive 0.5555..., or whatever the heck that is.

58:16 Right? Yes.

58:17 Cool. All right, well, this is a really cool tool. I definitely encourage people, if they're listening, to just come over and look at the pictures of code in color, just scroll quickly through to see what we're talking about. And I find it super valuable because it highlights with color right on the code that you wrote. It doesn't spit out the bytecode and say, here's the bytecode improvements; it highlights your code and says the code you wrote is being improved by Python, or not being improved by Python, here.

58:45 Right.

58:45 And just understand that it might not matter, or it might matter a lot to you. It depends.

58:49 Yeah.

58:49 And another thing to highlight, too, that's kind of different about this tool from maybe most tools that you would use: this isn't static analysis like mypy or Pylint, where it's running over your code just as a file. You actually need to run your code under this tool for it to be able to do anything. Because, again, all this happens at runtime. So it's only after running the code that Specialist can walk over it and see what happened.

59:13 And running the code enough, right?

59:15 Yes. So, for example, if I just had this function and I didn't actually call test_conversions at the bottom there under if name equals main, everything would just be white because nothing actually ran.
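A rough sketch of that workflow, for anyone trying it at home; the file name, test values, and command line here are assumptions rather than the exact example from the show:

    # conversions.py
    def fahrenheit_to_celsius(f):
        return (f - 32.0) * (5 / 9)

    def test_conversions():
        # Loop over enough representative values that the bytecode runs "hot"
        # and the adaptive instructions get a chance to specialize.
        for f in [-40.0, 0.0, 32.0, 72.0, 98.6, 212.0, 451.0, 500.0, 1000.0]:
            fahrenheit_to_celsius(f)

    if __name__ == "__main__":
        test_conversions()

    # Then, on Python 311:
    #   pip install specialist
    #   specialist conversions.py
    # which runs the script and opens the color-coded view of your source.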

59:27 Right.

59:27 So in this example here, you've got, let's see, two, four, six, eight... nine. You have nine test values that you're passing in, and you're looping over all those values and testing it. If you're going to apply this to your code, it's super important that you come up with a scenario of representative data. For now it's eight, or greater than eight, who knows?

59:48 Something that's loopy, that runs loops eight times or something. Basically, if the same bytecode instructions are being executed a bunch of times, that's how we tell it's hot, whether that's in a loop or from repeated calls. I can see it's pretty.

01:00:02 Easy to forget that and people might run it and go, it didn't do anything, it did nothing, it's just all white.

01:00:06 Yeah.

01:00:08 I mean, I'm not trying to issue an audio PR or anything, but maybe it should have some kind of warning. Like if there's zero color at all, like a warning, are you sure you ran it? Because we don't think you did anything.

01:00:20 I actually really like that request. I'll do that.

01:00:24 Yeah. Because you would know, right? You'd know if it colored nothing, in any color whatsoever.

01:00:29 And they'd look at it like, oh, I did something wrong. Or someone else did, like Brandt did something wrong.

01:00:33 Yes, exactly.

01:00:34 Got to protect my reputation. Well, and.

01:00:37 Just like limit the issues being raised. Like, how many times do you want to say, did you remember to call it enough times? Yeah.

01:00:46 Cool.

01:00:46 I definitely think people should check this out if they're interested in seeing how the specializing adaptive interpreter from Python 311 is applied to their code. I guess also, another caveat: it's really not super handy if you try to do this with 310. You've got to have 311.

01:01:02 It refuses to run under 310.

01:01:05 I accidentally made that mistake enough times, where I just had a Python environment active with 310, and the argument doesn't exist. Yeah, all that stuff.

01:01:15 Yeah.

01:01:16 No.

01:01:17 You need to be running 311, or 312.

01:01:22 But as you showed earlier, you can download it at python.org. My favorite way of installing Python versions, pyenv, has had 311 dev for a while now. They also have 312 dev if you're crazy and you want to try it out. But yeah, you do need 311.

01:01:40 Fantastic. Well, really great work. I think it's quite a contribution that really highlights all the work that's being done in 311. So well done. All right, now, before you get out of here, I've got to ask you the final two questions. If you're going to write some Python code, if you're going to work on Specialist, what editor do you use?

01:01:56 I use VS Code.

01:01:57 Okay.

01:01:57 Right on. And then notable PyPI package. Something you came across, like, oh, this thing is cool. Maybe not the most popular, but something.

01:02:05 Yes.

01:02:06 I thought about this. Can I say two? Or is it limited to one?

01:02:09 Two is fine. Okay.

01:02:10 I really like creative packages that kind of blur the line between what is Python and what is not.

01:02:16 Okay.

01:02:17 So the first one is called PyMTL3, P-Y-M-T-L three. This is so cool. It allows you to design hardware using Python, and you can basically design everything from just a small set of logic gates to a full chip, then export it to Verilog and run it on an FPGA. And this is kind of my hardware background coming through, right? Yeah. My brother is actually studying at Cal Poly San Luis Obispo right now, and he's on a research team that's designing an entire processor in Python.

01:02:49 Basically, the processor itself is designed in Python, and you can test it with Python. So they're testing it all in Python, and it's a really cool, creative way of doing this sort of stuff.

01:03:01 You don't need any special hardware to make it happen either, right?

01:03:04 Yeah, exactly.

01:03:05 You can just run it on your local machine, and now you've got a RISC-V chip running.

01:03:10 Nice.

01:03:10 Yeah.

01:03:11 The other one is another kind of cool, weird, old hardware package. I don't know if it can be pip installed; it's called PeachPy. I don't know how well maintained it is, but you have to do that thing where you tell pip to install from, like, a GitHub link.

01:03:25 All right, well, you can pip install from a GitHub link.

01:03:28 Yeah.

01:03:29 You've got to give it a really weird URL instead of just the name. Okay. Got it.

01:03:33 Sure.

01:03:33 And so this is super cool. It's an x86-64 assembler in Python.

01:03:39 Okay.

01:03:39 So with this, you can basically implement a compiler, or if you like, a just-in-time compiler, for x86 hardware. What this allows you to do, in your Python code, is it takes care of doing things like allocating hardware registers and labeling jumps and all that sort of stuff, calling conventions. And so you can specify exactly what assembly instructions you want to execute, assemble them, and then it will package them up in a Python function object. And you can call your assembly code from Python, which I think is wow.

01:04:14 And it actually executes as assembly instructions.

01:04:17 Yes.

01:04:17 So it can segfault and everything.

01:04:20 Yeah, of course. It's the most common thing it does.

01:04:22 Yeah, like a simple example: you can pass in a PyObject pointer and then add 8 or 16 or whatever to get the type of it, and then return that.

01:04:33 Yeah.

01:04:33 Very cool. So you can see the code example on here for PeachPy. I'll put the link in the show notes, but you do things like create an argument, and then you create a general purpose register and you load the argument into the register. And you might call an SSE4 instruction or whatever. Pretty cool. Two really good ones.
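For the curious, a sketch along the lines of PeachPy's own example looks roughly like this; the project may have changed since this was recorded, so treat the exact API as an assumption:

    from peachpy import *
    from peachpy.x86_64 import *

    x = Argument(int32_t)
    y = Argument(int32_t)

    with Function("Add", (x, y), int32_t) as asm_function:
        reg_x = GeneralPurposeRegister32()
        LOAD.ARGUMENT(reg_x, x)      # move the first argument into a register

        reg_y = GeneralPurposeRegister32()
        LOAD.ARGUMENT(reg_y, y)      # move the second argument into a register

        ADD(reg_x, reg_y)            # a real x86-64 ADD instruction
        RETURN(reg_x)

    # Assemble it and load it as a callable Python function object.
    add = asm_function.finalize(abi.detect()).encode().load()
    print(add(2, 4))                 # 6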

01:04:51 All right.

01:04:51 Final call to action. People are interested in specialist and exploring the specializing adaptive interpreter. What do you tell them?

01:04:58 I think the most important thing that you can do is download, or, if you're feeling like it, build Python 311, and try running it for yourself. See if your code gets faster. It probably will. If it doesn't, then Specialist could help show you where it's not. And if that is surprising to you, then you could report it to us. If, like, oh, my code got slower for some reason.

01:05:20 And it looks like this specific pattern is what's causing it. That's something that we care about.

01:05:25 Yes.

01:05:25 I suspect this interpreter work is something that's never done.

It's clear.

01:05:29 You could always add more cases. You could make it decide sooner, or easier, or more accurately, when and how to specialize, and add more. There's, like, a lot of stuff you can do, right? As opposed to, oh yeah, now you can read a CSV file, that part is done.

01:05:44 And again, if you're feeling up to it and you've got a huge, pure Python app, you can even compile with the stats turned on and dump them out, take a peek at it. I think you showed our repository earlier where we have our issue tracker, where we kind of just spitball ideas and keep track of work in progress. This one, if you go to the.

01:06:00 Issue tracker here, sorry, the ideas one. So you've got faster-cpython/ideas on GitHub.

01:06:07 Yeah.

01:06:07 So if you go to issues there, that's all our work in progress. And if you have experience optimizing dynamic languages, or if you see a cool research paper or something that you want us to know about, opening an issue here is where things get done.

01:06:21 Yeah.

01:06:21 Fantastic. Well, thank you for making Python faster. I think it's really important.

01:06:27 I'm always a little bit conflicted, because I have some pretty complicated web apps that get a decent amount of traffic and they've been fine, like really a handful of milliseconds response time, and they're doing all sorts of madness with databases and HTML and all kinds of stuff. So on one hand, I don't know if I need Python to be faster. But on the other, people are deciding which language they're going to choose and where they can do their work, and sometimes, for either perceived or real reasons, people think Python is not fast enough.

01:06:56 Right.

01:06:56 And so this is important work that will really help some people and will help the community be stronger. So thank you.

01:07:02 We love Python programmers.

01:07:04 Right on. Cool. Well, thank you so much for being here. It's great to chat with you.

01:07:09 Thanks again.

01:07:09 Yeah, you bet.

01:07:10 Bye.

01:07:12 This has been another episode of Talk Python to me. Thank you to our sponsors. Be sure to check out what they're offering.

01:07:17 It really helps support the show.

01:07:19 Starting a business is hard. Microsoft for Startups Founders Hub provides all founders at any stage with free resources and connections to solve startup challenges. Apply for free today at Talkpython.FM/Foundershub. Listen to an episode of Compiler, an original podcast from Red Hat. Compiler unravels industry topics, trends, and things you've always wanted to know about tech through interviews with the people who know it best. Subscribe today by following talkpython.Fm/compiler. Want to level up your Python? We have one of the largest catalogs.

01:07:51 Of Python video courses over at Talk Python.

01:07:53 Our content ranges from true beginners to deeply advanced topics like memory and Async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.FM.

01:08:04 Be sure to subscribe to the show.

01:08:05 Open your favorite podcast app and search for Python. We should be right at the top.

01:08:09 You can also find the itunes feed.

01:08:11 At /itunes, the Google Play feed at /Play and the Direct rss feed at rss on talkpython.FM. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.

01:08:32 I really appreciate it. Get out there and write some Python code.
