00:00 The debate about whether Python is fast or slow is never-ending. It depends on what you're optimizing for: CPU, server consumption, developer time, maintainability. There are many factors. But if we keep our eye on pure computational speed in the Python layer, then yes, Python is slow. In this episode, we invite Anthony Shaw back on the show. He's here to dig into the reasons that Python is computationally slower than many of its peer languages and technologies, such as C++ and JavaScript. This is Talk Python To Me, Episode 265, recorded May 19, 2020.
00:48 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is sponsored by Brilliant.org and Sentry; please check out their offerings during their segments. It really helps support the show. Anthony, welcome back to Talk Python.
01:14 Hey, Mike, it's great to be back.
01:16 Yeah, it's great to have you back. You've been on the show a bunch of times, and you've been over on Python Bytes when you're featured there. People may know you were on episode 168, "10 Python Security Holes and How to Plug Them," which was super fun with one of your colleagues. And then 214, "Dive into the CPython 3.8 Source Code," or just what was new in 3.8. And then a guided tour of the CPython source code, which I think at the time was also 3.8. And now we're going to look at the internals of Python again. I feel like you're becoming the Python internals guy.
01:48 Yeah. Well, I don't know. There's lots of people who know a lot more about it than I do. But I've been working on this book over the last year on CPython internals, which has been focused on 3.9. So yeah, we've got some stuff to talk about.
02:03 Yeah, that's awesome. And your book started out as a Real Python article, which I'm trying to find a term to describe. When I think of an article, I think of a three-to-four-page thing; maybe it's in depth and it's 10 pages. This was like 109 pages or something as an article, right? It was insane. But it was really awesome, and really in depth. And so you were partway towards a book, and you figured, well, what the heck, I'll just finish it as a book.
02:29 Yeah, I figured I'd pretty much written a book, so I might as well put it between two covers. It was actually a lot of work to get it from that stage to where it is now. I think the whole thing's pretty much been rewritten. There's a way that you explain things in an article that people expect, which is very different to the style of a book, and there's also the stuff that I kind of skimmed over in the article. So I think it's actually about three times longer than the original article. And it's a lot more practical. Rather than being like a tourist guide to the source code, it's more about CPython internals and optimizations and practical tools, as well as the more advanced techniques you can learn if you use CPython a lot for your day job, to either make things more performant, or to optimize them, or to make them more stable, and stuff like that.
03:21 Yeah. It's really interesting, because if you want to understand how Python works, and you're, say, the world's best Python developer, your Python knowledge is going to help you a little bit, but not a ton, for understanding CPython, because that's mostly, well, C code, right? And so I think having this guided tour, this book that talks about that, is really helpful, especially for taking people who know and love Python but actually want to get a little deeper and understand the internals, or maybe even become a core developer.
03:49 Yeah, definitely. And if you look at some of the stuff we'll talk about this episode, hopefully, like Cython and mypyc and stuff like that, then knowing C, or knowing how C and Python work together, is also really important.
04:02 Yeah, absolutely. All right, looking forward to talking about that. But really quickly, you know, give people a sense of what you work on day to day, when you're not building extensions for IDEs, writing books, and otherwise doing more writing.
04:15 Yes, I work at NTT and run, sort of, learning and development and training for the organization. So I'm involved in, I guess, what skills we teach our technical people and our salespeople, and all of our employees, really.
04:30 Yeah, that's really cool. That sounds like a fun place to be. That's a great job. Awesome. All right. Well, the reason I reached out to you about having you on the show for this specific topic, and I always like to have you on the show, we always have fun conversations, but I saw that you had this PyCon talk. Did you have multiple accepted, or just this one?
04:55 I had two accepted, but I was supposed to pick one. And then PyCon got canceled.
04:58 Yeah, so I was like, well, let's see, we can talk after PyCon, after you give your talk. It'd be really fun to cover this. And then, you know, we were supposed to share a beer in Pittsburgh, and we're like half a world away. Didn't happen, did it?
05:13 Yeah. Maybe next year.
05:14 Yeah. Hopefully next year, hopefully things are back up and running, because, I don't know, to me PyCon is kind of like my geek holiday that I get to go on. I love it. All right. Well, I guess, for people listening, you did end up doing that talk, in an altered sense, right? They can technically go watch it soon, at least maybe by the time this is out.
05:34 Yeah, definitely. It'll be out tonight; it's going to be on the PyCon 2020 YouTube channel. The organizers reached out to all the speakers and said, if you want to record your talk and submit it from home, then you can still do that, and we'll put them all up on YouTube, I think.
05:51 That's great. You know, and there's also a little bit more over at PyCon online. One thing I think is really valuable for people right now is they have the job fair, kind of, right? There's a lot of job listings for folks who are looking to get jobs. Have you seen the PSF/JetBrains survey that came out, the 2019 one? That came out just a few days ago. Really interesting stuff, right? A lot of cool things in there. Yeah, definitely. Yeah. I love that one, and the Stack Overflow developer survey; those are the two that I really think have the pulse correctly taken. One of the things in there I thought was interesting: more than any other category of people... they asked, how long have you been coding? I don't know if it was in Python, or just how long have you been coding, but it was broken down, you know, one to three years, three to five, five to 10, 10 to 15, and then people like me, forever, a long time, you know, like 20-plus or something. The biggest bar of all those categories, the biggest group, was the one-to-three years, right? Like, 29% of the people said, I've only been coding three years or fewer. And I think that's really interesting. So I think things like that job board and stuff are probably super valuable for folks just getting into things. Definitely. Yeah, so it's really good that they're putting that up, and people will be able to check out your talk. I'll put a link to it in the show notes, of course, but they could just go to the PyCon 2020 YouTube channel and check it out there.
07:13 Yeah. And check out the other talks as well. There are some really good ones up already. The nice thing about this year's virtual PyCon is you can watch talks from your couch.
07:22 That's right. You don't even have to get dressed to go to the talks.
07:26 That's right. So much more comfortable than the conference chairs.
07:31 That's true. That's for sure. Yeah, very cool. I'm definitely looking forward to checking out more of the talks as well; I've already watched a few. I wanted to set the stage for our conversation here by defining slow, because I think slow is in the eye of the beholder, just like beauty, right? Sometimes slow doesn't matter. Sometimes computational speed might be slow, but some other factor might be quick. So I'll let you take a shot at that, and I'll throw in my two cents as well. Like, what do you mean when you say, why is Python slow?
08:04 So when I say, why is Python slow, the question is, why is it slower than other languages at doing exactly the same thing?
08:14 Compared head to head, right? So if I had an algorithm that I implemented, say, in C, in JavaScript on top of Node, and in Python, it might be much slower in Python? In wall time, like execution time?
08:25 Yeah, execution time might be much slower in Python than it is in other languages.
08:29 And that matters sometimes, and sometimes it doesn't matter as much. It depends what you're doing, right? If you're doing a DevOps-y thing, and you're trying to orchestrate calling into Linux, well, who cares how fast Python goes? Probably the startup time is the most important of all of them. If you're modeling stuff, and you're trying to do the mathematical bits, anything computational, and you're doing that in Python, then it really might matter to you.
08:54 Yeah, so it was kind of like a question: if we can find out the answer, maybe there's a solution to it. Because, you know, you hear this thrown around; people say Python's too slow, and, I use this other language because it's faster. And so I just wanted to understand, what is the actual reason why Python is slower at doing certain things than other languages? And is there a reason that can be resolved? Or is that just how it is, part of the design?
09:21 Is it fundamentally going to be that way? Yeah, I don't think it is. No, I don't think it fundamentally has to be that way.
09:30 I agree with you. I think the research uncovered that as well: it doesn't fundamentally have to be that way. And in lots of cases, it isn't that way, either. There are ways to get around the slowdowns, like the causes of the slowdowns. And if you understand in what situations Python can be slow, then you can kind of bypass those, right?
09:52 So let me tell a really interesting story that comes from Mike Driscoll's book, Python Interviews. Over there, he interviewed, I think it was Alex Martelli, yeah. And they talked about the history of YouTube, right? YouTube is built on Python, and why is that the case? Originally, there was Google Video, which had hundreds of engineers implementing what was going to be basically YouTube. But YouTube was also a startup around the same time, right? And they were kind of competing for features and users and whatnot. And YouTube only had like 20 employees at the time, or something like that, whereas Google had hundreds of super smart engineers. And Google kept falling farther and farther behind, not being able to implement the features that people wanted nearly as quickly as YouTube. And the reason was, they were all doing it in C++, and it took a long time to get that written. And YouTube just ran circles around them with, you know, less than a fifth of the number of people working on it. So in some sense, that's a testament to Python's speed, right? But it's not its execution speed; it's the larger view of speed, which is why I really wanted to pin down what computational speed is. Another sense where it may or may not matter is, are you doing stuff that waits, right? Somewhere where asyncio would be a really good option: I'm talking to Redis, I'm talking to this database, I'm calling this API. If 95% of your time is waiting on a network response, it probably doesn't matter, as long as you're using some sort of async or something. But then there's that other part where it's like, on my computer, I've got six hyperthreaded cores; why can I only use one-twelfth of my computational power unless I write C code, right? So there are these other places where it super matters. And, like you said, there's this great example that we're going to talk about, the n-body problem: modeling, like, planets and how they interact with each other. And I would just like to set the stage: what was the number for C versus Python in terms of computation time, to give people a sense of why we care? Like, why is this a big enough deal to worry about? Is it, what, 30% slower?
12:00 It's a little bit slower, yeah. So for this algorithm, it's called the n-body problem, and it's to do with calculating the orbits of some of the planets in the solar system. And you just do a lot of really simple mathematical operations, just adding numbers, but again and again and again. So millions of times: lots of loops, lots of math, lots of math, lots of looping. And in C, this implementation takes seven seconds to complete; in Python, it's 14 minutes. That might be a difference that you're needing to optimize away; that could be too much. Right. Yeah. I mean, everyone is calculating the orbits of the planets as part of their day jobs, so, you know,
12:39 I honestly haven't really done that for at least two weeks. No, but I mean, fundamentally, thinking about this, I think this uncovers one of the real Achilles' heels of Python, in that doing math in tight loops is really not super great in pure Python, right? Whether that's planets, whether that's financial calculations, or something else. Python numbers are very flexible, but that makes them inefficient, right? Python is interpreted, which has a lot of benefits, but can also make it much slower as well, right?
13:15 Yeah. So I think we're looking at this particular problem because I thought it would be a good example; it shines a bit of a spotlight on one of CPython's weaknesses when it comes to performance. But in terms of the loop, about the only times you would be doing a small loop, doing the same thing over and over again, are if you're doing math work, doing number crunching, or if you're doing benchmarks; that's one of the other reasons. The way that a lot of benchmarks are designed, computational benchmarks anyway, is to do the same operation again and again. So if there is an overhead or a slowdown, then it's magnified to the point where you can see it a lot more clearly.
13:55 Yeah, for sure. I guess one thing to put out there: people run code, it doesn't go as fast as they'd hoped, so they say that Python is slow, right? Assuming the code they originally ran is Python, that would be a requirement, I guess. You probably should profile it; you should understand what your code is doing and where it's slow. For example, if you're doing lookups, but your data structure is a list instead of a dictionary, right? You could make that 100 times faster just by switching the data structure, because you're just using the wrong type of data structure, the wrong algorithm. It could be just that you're doing it wrong, right? So I guess before people worry about, is it executing too slowly, maybe they should make sure that it's executing the right thing.
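To make that lookup example concrete, here's a minimal sketch (illustrative, not from the episode) comparing a membership test against a list, which scans linearly, with the same test against a set, which hashes straight to the answer:

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)

# A value near the end of the list makes the linear scan cost obvious.
needle = 99_999

list_time = timeit.timeit(lambda: needle in items_list, number=1_000)
set_time = timeit.timeit(lambda: needle in items_set, number=1_000)

print(f"list lookup: {list_time:.4f}s  set lookup: {set_time:.4f}s")
```

On a typical machine the set version wins by orders of magnitude; same Python, different data structure.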
14:40 Yeah, it's unlikely that your application is running a very small operation, which is what this benchmark does, again and again, millions of times in a loop. And if you are doing that, there are probably other tools you could use, and there are other implementations you could do in Python.
14:59 This portion of Talk Python To Me is brought to you by Brilliant.org. Brilliant's mission is to help people achieve their learning goals. So whether you're a student, a professional brushing up or learning cutting-edge topics, or someone who just wants to understand the world better, you should check out Brilliant. Set a goal to improve yourself a little bit every day. Brilliant makes it easy with interactive explorations and a mobile app that you can use on the go. If you're naturally curious, want to build your problem-solving skills, or need to develop confidence in your analytical abilities, then get Brilliant Premium to learn something new every day. Brilliant's thought-provoking math, science, and computer science content helps guide you to mastery by taking complex concepts and breaking them into bite-sized, understandable chunks. So get started at talkpython.fm/brilliant, or just click the link in your show notes.
15:50 Another benchmark I covered in the talk was the regular expression benchmark, which Python is actually really good at. So this is like the opposite of this particular benchmark. Just saying that Python is slow isn't really a fair statement, because, and we'll kind of talk about this in a minute, for other benchmarks, Python does really, really well. Its string implementation is really performant, and when you're working with text-based data, Python is such a great platform to use, a great language to use; CPython is pretty efficient at dealing with text data. And if you're working on web applications or data processing, chances are you're dealing with text data.
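As a small illustration of that strength (our own sketch, not the official benchmark), a regex scan over a large string spends almost all of its time inside the C-implemented re engine:

```python
import re
import timeit

# CPython's str and re machinery are C implementations, so text-heavy
# workloads like this tend to benchmark well for Python.
pattern = re.compile(r"\b\w+@\w+\.\w+\b")
text = "contact alice@example.com or bob@example.org " * 10_000

elapsed = timeit.timeit(lambda: pattern.findall(text), number=100)
print(f"100 scans of ~{len(text):,} chars: {elapsed:.2f}s")
```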
16:32 Yeah, that's a good example. The websites that I have, the Talk Python Training site and the various podcast sites and stuff, they're all in Python, with no special incredible optimizations other than databases with indexes and stuff like that. And, you know, the response times are like 10-30 milliseconds; there's no problem. It's fantastic, really, really good. But there are those situations, like this n-body problem, or other ones, where it matters. I don't know if it's fair or not to compare it against C, right? C is really, really low level, at least from today's perspective; it used to be a high-level language, but now I see it as a low-level language. If you do a malloc and free, and you know the address of this thing, right, that feels pretty low level to me. So maybe it's unfair. I mean, you could probably get something pretty fast in assembly, but I would never choose to use assembly code these days, because I want to get stuff done, and maintain it, and have other people understand what I'm doing. But, you know, kind of a reasonable comparison, I think, would be Node.js and JavaScript. And you made some really interesting compare-and-contrasts between those two environments. Because they seem alike; well, Python at least has some C in it, and with JavaScript, who knows what's going on in that thing, right? So, you know, what's the story between those two?
17:52 Yeah, you make a fair point, which is, comparing C and Python isn't really fair: one is a statically typed, compiled language; the other is a dynamically typed, interpreted language. And they handle memory differently. In C, you have to statically or dynamically allocate memory; in CPython, it's done automatically, like, it has a garbage collector. There are so many differences between the two platforms. And so I think Node.js is probably a closer comparison to Python. Node.js isn't a language; it's kind of like a stack that sits on top of JavaScript, that allows you to write JavaScript which interoperates with things that run in the operating system. Similar to CPython: CPython has extensions that are written in C that allow you to do things like connect to the network, or, you know, connect to physical hardware, or talk to the operating system in some way. If you just wrote pure Python and there was no C, you couldn't do that, because the operating system APIs are C headers in most cases.
18:59 Right, almost all of them end in C somewhere. Yeah,
19:01 yeah. And with JavaScript, it's the same thing. If you want to talk to the operating system, or do anything other than working with stuff that's in the browser, you need something that plugs into the OS, and Node.js kind of provides that stack. So when I wanted to compare Python with something, I thought Node was a better comparison, because JavaScript and Python, in terms of syntax, are very different, but in terms of their capabilities, they're quite similar. You know, they both have classes and functions, and you can use them interchangeably; they're both kind of dynamically typed. The scoping is different, and the language is different, but in terms of the threading as well, they're quite similar, right? They do feel much more similar. But there's a huge difference between how they run, at least when run on Google's V8 engine, which is basically the thing behind Node and whatnot, versus CPython: CPython is interpreted, and V8 is JIT compiled, just-in-time compiled. Yeah, so that's probably one of the biggest differences. And when I was comparing the two, I wanted to see, okay, which one is faster? If you gave them the same task, the n-body problem, then Node.js is a couple of multiples faster; I think it was two or three times faster doing the same algorithm. And for a dynamically typed language, you know, that means that they must have some optimizations which make it fast. I mean, if you're running on the same hardware, then what is the overhead? And kind of digging into it in a bit more detail: there are actually multiple JavaScript engines, but the one that Node.js uses is Google's V8 engine, quite cleverly named. Though it could have been a V12, you know,
20:55 or an inline six? I think that's a better option.
20:57 Yeah, there you go.
21:00 So Google's V8 JavaScript engine is written in C++, so maybe that's a fair comparison. Its optimizing compiler is called TurboFan, and it's a JIT optimizing compiler, a just-in-time compiler, whereas CPython has an ahead-of-time, or AOT, compiler. V8's JIT optimizer has some really clever algorithms and logic that it uses to optimize the performance of the application, of what actually runs. And these can make a significant difference: some of the small optimizations alone can make a 30-40% increase in speed. And if you compare even just V8 to other JavaScript engines, you can see what all this engineering can do to make the language faster. And that's how I got to the two-to-three-multiple performance increases: that was the optimizing JIT. It understands how people write JavaScript code and the way that it compiles the code down into operations, and then it can basically reassemble those operations so they're more performant for the CPU, so that when it actually executes them, it does it in the most efficient way possible, right? The difference between a JIT and an AOT is that the JIT compiler kind of makes decisions about the compilation based on the application and based on the environment, whereas an AOT compiler will compile the application the same way, and it does it all ahead of time, right? So you probably have a much more coarsely grained set of optimizations for an ahead-of-time compiler, like C++ or something, right? Like, I have compiled against an x86 Intel CPU, with, like, the multimedia extensions or whatever, right, the scientific computing extensions. But other than that, I make no assumptions: whether it's multi-core, highly multi-core, what its caches are, none of that stuff, right? It's just, we're going to kind of target modern Intel on macOS, and do it on Windows, and compile that. Yep, so modern CPU architectures and modern OSes can really benefit if you've optimized the instructions that you're giving them, to benefit from, like, the caches that they have, or the cycles that they've set up. And the TurboFan optimizer for the V8 engine takes a lot of advantage of those things. Yeah,
23:25 that seems really powerful. I guess we should step back and talk a little bit about how CPython runs. Being an interpreter, it can only optimize so much. It's got all of its bytecodes, and it's going to go through its bytecodes and execute them. But saying, well, these five bytecodes we could actually turn into an inline thing over here, or, I see this actually has no effect on what's loaded on the stack, so we're not going to push it; I mean, it seems like it doesn't optimize across lots of bytecodes as it's thinking about it.
24:03 Yeah. So, what CPython will do when it compiles your code... it's also worth pointing out that when you run your code for the first time, it will compile it, but when you run it again, it will use the cached version.
24:17 Right, if you ever see the __pycache__ folder with a .pyc file in it, that's like three of the four steps of getting your code ready to run, saved and done, never to be done again.
24:26 Yeah, so that's the compiled version. So even if Python is slow to compile code, it doesn't really matter, unless your code is somehow changing every time it gets run, which I'd be worried about.
24:38 Yeah, bigger problems.
24:38 Yeah, exactly. So the benefit, I guess, of an AOT compiler is that you compile things ahead of time, and when they execute, they should be efficient. So CPython's compiler will take your code, which is like a text file, typically. It will look at the syntax; it will parse that into an abstract syntax tree, which is a sort of representation of functions and classes and statements and variables and operations and all that kind of stuff. Your code, your file, your module basically becomes a tree. And then what it does is it compiles that tree by walking through each of the branches and understanding what the nodes are. And then there is a compilation: basically, in the CPython compiler, there's a function for each type of thing in Python. So there's a compile-binary-operation, or there's a compile-class function, and compile-class will take a node from the AST which has got your class in it, and it will then go through and say, okay, what properties, what methods does it have? And it will then go and compile the methods, and then inside a method it will go and compile statements. So once you break the compiler down into smaller pieces, it's not that complicated. And what the compiler will do is spit out compiled basic frame blocks, as they're called, and then they get assembled into bytecode. So after the compiler stage, there is an assembler stage, which basically figures out in which sequence the code should be executed: basically, what will the control flow be between the different parts of the code, the different frames. In reality, they get executed in different orders, because they depend on input, whether or not you call this particular function. But still, if you've got a for loop, then it's still: go inside the for loop, and then back to the top again. That logic is hard-coded into the for loop, right?
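You can peek at those stages from Python itself. Here's a minimal sketch (our own example) using the standard ast and dis modules; note the indent argument to ast.dump needs Python 3.9 or later:

```python
import ast
import dis

source = "def double(x):\n    return x * 2\n"

# Stage 1: parse the text into an abstract syntax tree,
# the structure the compiler walks node by node.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

# Later stages: compile the tree to a code object, then disassemble it
# into the bytecode the evaluation loop will actually execute.
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)
```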
26:35 You know, as you're talking, I'm wondering if minor extensions to the language might let you do higher-level optimizations, like in C: like having a frozen class, where you're saying, I'm not going to add any fields to this; or an inline hint on a function; or making a function internal to a class, in which case it could be inlined, potentially, because you know no one's going to be able to look at it from outside of this code and stuff. What do you think?
27:03 There is an optimizer in the compiler called the peephole optimizer. And when it's compiling, I think essentially after the compilation stage, it goes through and looks at the code that's been compiled, and if it can make some decisions about, say, dead code that can be removed, or branches which can be simplified, then it can basically optimize that. And that will make some improvement; it will optimize your code slightly, right? But then, once it's done, your Python application has basically been compiled down into this assembly-like language called bytecode, which is the actual individual operations, executed in sequence. They are split up into small pieces, they're split up into frames, but they're executed in sequence,
27:50 right? And if you look at the C source code, dive in there, there's a ceval.c file, and it has the world's largest while loop with a switch statement in it, right?
28:02 Yes, this is kind of the brain of CPython, or maybe it's not the brain, but it's the bit that goes through each of the operations and says, okay, if it's this operation, do this thing; if it's that one, do this other thing. This is all compiled C, so it's fairly fast, but it will basically sit and run the loop. So when you actually run your code, it takes the assembled bytecode, and then for each bytecode operation, it will do something. So, for example, there's a bytecode for adding an item to a list, so it knows that it will take a value off the stack and put that into the list. Or this one, which calls a function: if the bytecode is CALL_FUNCTION, then it knows how to go and call that function.
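To make that dispatch loop concrete, here's a toy sketch in Python (ours, not CPython's actual code; the real ceval.c handles well over a hundred opcodes, plus frames, exceptions, and the GIL):

```python
# A miniature stack machine in the spirit of ceval.c's dispatch loop.
def run(instructions, constants):
    stack = []
    for opname, arg in instructions:      # the "world's largest loop"
        if opname == "LOAD_CONST":        # push a constant onto the stack
            stack.append(constants[arg])
        elif opname == "BINARY_ADD":      # pop two values, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":    # return the top of the stack
            return stack.pop()

# The rough equivalent of `return 1 + 2`:
program = [("LOAD_CONST", 0), ("LOAD_CONST", 1),
           ("BINARY_ADD", None), ("RETURN_VALUE", None)]
print(run(program, constants=[1, 2]))  # prints 3
```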
28:46 Right, maybe it's loaded a few things on the stack, it's going to call it, and those just get passed along, something like that. And so I guess one of the interesting things, and you had an interesting analogy about this, is when Python can be slow versus a little bit less slow. It's the overhead of going through that loop, figuring out what to do, preparing the step before you call the CPython thing, right? Like list.sort: it could be super fast, even for a huge list, because it's just going to this underlying C object and saying, in C, go do your sort. But if you're doing a bunch of small steps, the overhead of the next step can be a lot
29:25 higher. In the n-body problem, the step that it has to do, the operation, would be add number A and number B, which on a decent CPU is like nanoseconds in terms of the time it takes to execute. So if the operation that it's doing is really tiny, then after doing that operation, it's going to go all the way back up to the top of the loop again, look at the next bytecode operation, and then go and run this part, you know, call this thing which runs the operation, which again takes nanoseconds to finish, and then go around the loop again. So I guess the analogy I was trying to think of for the n-body problem is: if you were a plumber, and you got called out to do a load of jobs in a week, and every single job was, can you change this one washer on a tap for me, which takes you like two minutes to finish, but you get 100 of those jobs in a day, you're going to spend most of your day just driving around and not actually doing any plumbing. You're going to be driving from house to house, doing these two-minute jobs, and then driving on to the next job. So the n-body problem is kind of an example of that. The evaluation loop can't make decisions; it can't say, oh, if I'm going to do the same operation again and again and again, instead of going around the loop each time, maybe I should just call that operation the number of times that I need to. Those are the kinds of optimizations that a JIT would do, because it kind of changes the compilation order and sequence. So that's, I guess, where we could talk about the JITs that are available for Python; CPython doesn't use one yet. But for things like the n-body problem, instead of, you know, the plumber driving to every house and doing this two-minute job, why can't everyone just send their tap to the factory, and you sit in the factory all day replacing the washers? Like a Netflix
31:26 of taps or something? Yeah, back when they sent out DVDs.
31:31 Maybe I was stretching the analogy a bit. But, you know, basically you can make optimizations if you know you're going to do the same job again and again and again. Or maybe you just bring all the washers with you, instead of driving back to the warehouse each time. So yeah, there are optimizations you can make if you know what's coming. But because the CPython application was compiled ahead of time, it doesn't know what's coming. There are some opcodes that are coupled together, but there are only a few; I couldn't say which ones off the top of my head. There are only a couple, and it doesn't really add a huge performance increase.
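A quick sketch of that batching effect (our own example; absolute numbers will vary by machine): handing a whole loop to a single C-implemented call, like the built-in sum(), skips the per-iteration trip around the evaluation loop:

```python
import timeit

# Pure-Python loop: the eval loop dispatches every add individually.
def py_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

loop_time = timeit.timeit(lambda: py_sum(1_000_000), number=10)

# sum() pushes the whole loop down into C: one trip to the "factory".
builtin_time = timeit.timeit(lambda: sum(range(1_000_000)), number=10)

print(f"python loop: {loop_time:.3f}s  built-in sum: {builtin_time:.3f}s")
```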
32:05 Yeah, there have been some improvements around, like, bound method execution time, and methods without keyword arguments, or something along those lines, that got quite a bit faster. But it's still just, how can we make this operation faster? Not, how can we say, you know what, we don't need a function call here; it's called in one place, once, so let's just inline it, right? Things like that. This portion of Talk Python To Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users may be having difficulties or encountering errors with your app right now? Would you even know it until they send that support email? How much better would it be to have the error details immediately sent to you, including the call stack and values of local variables, as well as the active user, stored in the report? With Sentry, this is not only possible, it's simple and free. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email. That was a great email to write back: we saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users today: create your free account at talkpython.fm/sentry and track up to 5,000 errors a month across multiple projects for free. So, you did say there were some: there was Pyjion, there's PyPy, there's Unladen Swallow, there's Pyston; there are some other options as well, but those are the JITs that come to mind. All of those were attempts, and I've not heard anything about any of them for a year, so that's probably not a super sign for their adoption.
33:42 Yeah, so the ones I kind of picked, because I think they've got a lot of promise and show a big performance improvement: one is PyPy, which shouldn't be news; I mean, it's a popular project. But PyPy, P-Y-P-Y,
33:55 P-Y-P-Y, because some people hear Python Package Index, which people also call PyPI, but that's a totally different thing. Yes.
34:04 PyPy kind of helped make the argument of my talk catchy, because if Python is slow, then a Python compiler written in Python should be really, really slow. But actually PyPy, which is a Python compiler written in Python, on problems like the n-body problem, where you're doing the same thing again and again, is actually really good. It's significantly faster, 700-and-something percent faster than CPython at doing the same algorithm. If you copy and paste the same code and run it in PyPy versus CPython, it runs over seven times faster in PyPy. And PyPy is written in Python. So it's an alternative Python interpreter, written purely in Python, but it has a JIT compiler; that's probably the main difference.
34:53 As far as I understand it, PyPy is kind of like a half JIT compiler; it's not a full JIT compiler like, I'd say, C# or Java, in that it will run your code interpreted and then decide to JIT compile the stuff that's run a lot,
35:08 I believe that's the case. PyPy is a pure JIT compiler. And then Numba is one where you can basically choose to JIT certain parts of your code. So with Numba, you use a decorator, actually: you can stick an @jit on a function, literally, and it will just compile that function for you. So if there's a piece of your code which would work better if it were JIT compiled, like it would be faster, then you can just stick a JIT decorator on it using the Numba package.
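For a sense of what that looks like, here's a minimal sketch (our own toy kernel, not from the talk), assuming Numba is installed via pip install numba:

```python
from numba import jit

@jit(nopython=True)  # nopython mode: compile fully or raise, no object fallback
def orbit_step(x, v, dt, n):
    # A deliberately loop-heavy numeric kernel, the kind of code Numba
    # compiles to machine code on the first call.
    for _ in range(n):
        x = x + v * dt
        v = v - x * dt
    return x, v

print(orbit_step(1.0, 0.0, 0.001, 1_000_000))
```

The first call pays a compilation cost; subsequent calls run the compiled machine code.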
35:41 Yeah, that's really cool. How do you run it? Say I've got some function within a larger Python program, and I put an @jit on it. How do I make it actually JIT that and execute? Can I still type python-space-my-file, or what happens now? Yeah, I'm just wondering. It's probably the library: as it pulls the function in, what it gives you back, you know, the wrapper, the decorated function, probably does the JIT. So interesting. I think that's a really good option of all the options. Honestly, I haven't done anything with Numba, but it looks like probably the best option. It sounds a little bit similar to Cython, but Cython is kind of an upfront style, right? Like, we're going to pre-compile this Python code to C, whereas Numba sounds a little more runtime.
36:26 Yes. Cython is not really a JIT or a JIT optimizer. It's a way of decorating your Python code with type annotations, and using a slightly different syntax to say, this variable is this type. And then Cython will actually compile that into a C module, and then you run it from CPython. So it basically compiles your Python into C, and then loads it as a C extension module, which can make a massive performance improvement.
37:00 Yeah. So you've got to run a setup.py build command to generate the libraries, the .so files or whatever the platform generates, and then those get loaded in. Even if you change the Python code that was the source, you've got to recompile them, or it's still the same old compiled stuff, the same old binaries. Yeah,
37:21 you can automate that, so you don't have to type it by hand. But yeah, I think Cython is a really good solution for speeding things up. Though, as I kind of pointed out in my talk, it doesn't answer the question of why Python is slow; it says Python can be faster if you use C instead.
37:37 Yeah. One thing I do like about Cython these days is they've adopted the type hints, the type annotation format. So if you have, what is it, Python 3.4-or-later type annotations, you've got to be explicit on everything, but if you have those, that's all you have to do to turn it into official Cython, which is nice. Because it used to be you'd have to have, like, a cdef or a cython.int type declaration, rather than, you know, a colon int, or something funky like that.
38:08 Yeah, it's nice they put the two things together. So, Cython had type annotations before the language did, I think; they had their own
38:15 their own special way? They had their own special little sub-language that was Python-esque, but not quite. So I was looking at this n-body problem, and I thought, all right, well, I probably should have played with Numba, but I have a little more experience with Cython, so let me just see. The code is not that hard, in terms of how much code there is or whatever; sure, the math is hard, but the actual execution of it isn't. I'll link to the actual Python source code for the n-body problem. And I ran it; it has some defaults that are much smaller than the one you're talking about. So if you just hit run, on my machine it ran for 213 milliseconds, just in pure CPython. So I said, what if I just grab that code and plunk it into a .pyx file, unchanged? I didn't change anything; I just moved it over. I got it to go to 90 milliseconds, which is like 2.34 times faster. And then I did the type hints that I told you about. Because if you don't put the type hints in, it'll still run, but it will work at the PyObject level, so your numbers are PyObject numbers, not, you know, ints and floats, so you may only get a little bit faster. But I was only able to get it four times faster, down to 50 milliseconds. Either I was doing it wrong, or that's just about as much faster as I can get it. I could have been missing some types, and it was still doing a little more CPython interop stuff. But yeah, I don't know; it's an interesting challenge. I guess the last thing to talk about on this little bit right here is mypyc.
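As a rough sketch of what that annotated Cython looks like (our own toy module, not the benchmark code; built with something like cythonize -i energy.pyx):

```python
# energy.pyx - illustrative only; cython.double etc. map to C types.
import cython

def kinetic_energy(masses: list, velocities: list) -> float:
    total: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(len(masses)):
        v: cython.double = velocities[i]
        # 0.5 * m * v**2, computed on C doubles once the annotations apply
        total += 0.5 * masses[i] * v * v
    return total
```

Without the annotations the same file still compiles and runs, but the arithmetic stays at the PyObject level, which is why the untyped version only bought about a 2x speedup.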
39:44 Yeah, I don't know much about it.
39:46 I don't know a lot about it either. So, mypy is a type-checking and verification library for the type annotations, right? If you put the type annotations in there, they don't do anything at runtime; they're just there to tell you stuff. But certain editors can partially check them, or mypy can follow the entire chain and say, this code hangs together type-wise, or, five levels deep we pass an integer and you expect a string, so it's broken, right? It can check that. Then they added this thing called mypyc, which can take stuff that is annotated in a way that mypy works with, which is basically type annotations but more, and it can compile that to C as well. They applied it to mypy itself and got like a four-times speedup, not on the n-body problem, but on mypy. So, I don't know, there are a lot of options. But as you pointed out, they're all a little bit dodgy in Python. The Numba stuff is cool, because I think you don't really write different code, do you?
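A minimal sketch of the mypyc workflow (our own hypothetical fib.py; the tool ships with mypy):

```python
# fib.py - a fully annotated module, the kind mypyc can compile.
# Roughly: pip install mypy, then `mypyc fib.py` builds a C extension;
# after that, `import fib` picks up the compiled version automatically.

def fib(n: int) -> int:
    # mypyc uses the annotations to work with C-level ints where it can.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```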
40:46 Yeah, it's more natural. And, like you're saying, you kind of got a two-to-four-times improvement by moving things to Cython, and it took
40:55 a decent amount of work, right? Because every little variable had to be declared somewhere else, because you can't put the type annotation inside the loop declaration, right? It wasn't just, put a colon in; I had to do a decent amount of work to drag out the types.
41:08 Yeah. Whereas PyPy would be a seven-times improvement in speed for that problem, and there's no C compilation.
41:15 Yes. That's really nice. So, we talked about JITs, and JITs are pretty interesting. To me, I feel like JITs often go together with garbage collection, in the fully automatic, sort of non-deterministic sense of garbage collection, right? Not reference counting, but the mark-and-sweep style. Maybe we could talk about the GC in Python first, and then whether there's any way to change that, or advantages or disadvantages. From the Instagram story, they saw a performance improvement when they turned off the GC. Yeah, like, we're going to solve the memory problem by just letting it leak; literally, we're going to disable garbage collection. I think they got like a 12% improvement or something; it was significant when they turned it off. And then they just restarted the worker processes every 12 hours or something like that, and it wasn't that bad.
42:07 The GC itself, like you said... there's another problem I studied, which was the binary trees problem. This particular benchmark will show you the impact of the garbage collector on performance; in this particular algorithm, it will show you how much your GC slows down the program. And again, I wanted to compare Node with Python, because they both have both reference counting and garbage collection. The garbage collector in Node is a bit different in terms of its design, but both of them are stop-everything garbage collectors. So, you know, CPython has a main thread, basically, and the garbage collector will run on the main thread, and it will run every so many operations. I think the default is something like 700: every 700 allocations in the first generation, where objects have been allocated or deallocated, it'll run the garbage collector, which goes and inspects every list, every dictionary, every one of your custom objects, and sees if they have any circular references,
43:17 right. And the reason we need the GC, which does this, is because it's not even the main memory management system. Because if it were, Instagram would not at all be able to get away with that trick, right? This is like a final net, to catch the stuff where reference counting doesn't work. Normally, if there are some references to an object, once things stop pointing at it, when the last one goes, it just, poof, disappears. But the challenge of reference-counting garbage collection is if you've got some kind of relationship where one thing points at the other, and that thing also points back, right? Like a couple of objects, say a person object with a spouse pointer or something like that, right? When you're married, you're gonna leak. Yeah, absolutely. So this is the thing you're talking about; it's those types of things it's addressing.
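Here's a tiny sketch of that situation (our own example): a two-object cycle that reference counting alone can never free, but the cyclic collector can:

```python
import gc

class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None  # may end up pointing back at another Person

# Build a reference cycle: each object keeps the other's refcount above zero.
a, b = Person("A"), Person("B")
a.spouse, b.spouse = b, a

# Drop our references; the refcounts never hit zero because of the cycle.
del a, b

# The cyclic GC is the "final net" that finds and reclaims them.
print(gc.collect(), "unreachable objects found")

# Instagram's trick was, roughly, gc.disable() plus recycling the worker
# processes, so leaked cycles never piled up for long.
```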
44:02 And it's kind of designed on the assumption that most objects in CPython have very short lifespans. You know, they get created, and then they get destroyed shortly afterwards. Like local variables inside functions, or local variables inside list comprehensions, for example; those can be destroyed pretty much straight away. But the garbage collector will stop everything running on the main thread while it's running, because it has to: you know, if it's deleting stuff, and there's something else running at the same time that's expecting that thing to exist, that's going to cause all sorts of problems. So yeah, the GC will kind of slow down your application if it gets hit a lot. And the binary trees problem will basically construct a series of trees, then loop through them and delete the nodes and the branches, which triggers the GC to run a lot, and then you compare the performance of the garbage collectors. So one thing I noted in the design is that, with stop-everything, if the time it takes to run the garbage collector is as short as possible, then the performance hit of running it is going to be smaller. And something Node does is it runs a multi-threaded mark process. So when it actually goes and looks for circular references, it starts looking before it stops the main thread, on different helper threads. It starts separate threads and starts the mark process, and then it still stops everything on the main process, but it's kind of prepared all its homework,
45:32 already figured out what is garbage before it stopped stuff. And it's like, now we just have to stop while we throw it away and update the pointers, and then you can carry on, right? Because it's got to, you know, rebalance memory and stop allocation or whatnot.
45:45 Yeah, so I think technically that's possible in CPython. I don't think it has anything to do with the GIL, either, like any reason why that couldn't be done. You could still do it. It seems like it
45:55 totally could be done. Yeah,
45:56 yeah. Because the marking and finding circular references could be done outside of the GIL, because it's a C-level call; it's not an opcode. But like I say in the talk, you know, all this stuff that I've listed so far is a lot of work, and it's a lot of engineering work that needs to go into it. And if you actually look at the CPython compiler, like ceval, and look at the number of people who've worked on or contributed to it, it's less than 10, like, to the core component. I wouldn't
46:27 want to touch it. I would not want to get in there and be responsible for that part of it. No way.
46:33 Yeah, at this stage, they're minor optimizations rather than big overhauls, because there just aren't the people to do it.
46:41 Yeah, you made a point in your PyCon talk that, you know, the reason that V8 got to be so optimized, so fast, is because it's got tens of millions of dollars of engineering put against it yearly, right? I mean, it's kind of part of the browser wars, the new browser wars, a
46:59 bit, yeah. From what I could work out, there are at least 35 permanent developers working on it, just looking at the GitHub project. If you just look at the commit histories: nine to five, Monday to Friday, 35 advanced C++ developers hacking away at it,
47:16 right? If we had that many people continuously working on CPython's internals and garbage collection and stuff, we'd have more optimizations, or bigger projects that people would try to take on, right?
47:27 Yeah, absolutely. And the people who work on it at the moment all have day jobs, and this is typically not their day job; they've convinced their employer to let them do it in their spare time, or, you know, one or two days a week, for example, and they're finding the time to do it in this community-run project. It's an open source project. And, kind of going back to the places where Python could be faster: these kinds of optimizations, in terms of engineering, are expensive optimizations. They cost a lot of money, because they need a lot of engineering expertise and a lot of engineering time. And I think as a project, at the moment, we don't really have that luxury. So it's not very fair of me to complain about it if I'm not contributing to the solution. Yeah, but you have a day job as
48:13 well. Right.
48:14 But I have a day job, and this is not it. So yeah, I think, for what we use Python for, most of the time it's definitely fast enough. And in the places where it could have optimizations, like the ones we talked about, those optimizations have drawbacks. You know, adding a JIT, for example, means that it uses a lot more memory. The Node.js example on the n-body problem: sure, it finishes faster, but it uses about five times more RAM to do it, right? And PyPy uses more memory, like a JIT compiler does, and also the startup time of the process is typically a lot longer. If anyone's ever tried to boot a Java JVM cold, you know the startup time for the JVM is pretty slow. .NET is the same; the initial boot time for it to actually get started and warmed up is time-consuming. So you wouldn't use it for, like, a command-line tool, a simple script that you'd expect to finish in, you know, under 100 milliseconds.
49:13 I think that kind of highlights one of the challenges, right? If you knew your process was just going to start and be a web server or a desktop application, two seconds of startup time is fine, or whatever that number is. But if it's solving this general problem, yeah, it could be running Flask as a microservice, or it could be, you know, replacing Bash, right? These are very different constraints and interests, right?
49:38 Yeah. And there aren't really many other languages where there is one sort of language definition and there are multiple mature implementations of it. With Python, you know, you've got Cython, you've got PyPy, you've got Numba, you've got IronPython; I mean, there's a whole list, yeah, Jython, you know, different implementations of the language. And people can kind of pick which one is best for the problem they're trying to solve, but use the same language across them. Whereas you don't really have that luxury with others. You know, if you're writing Java, then you're using the JVM; there are two main implementations, the free one and the licensed one, and that's pretty much as far as it goes. It has
50:23 exactly the same trade-off, yeah; it's optimizing for money, not necessarily optimizing for performance or whatever. So, one thing that I feel like comes around again and again in this discussion, and I'm thinking mostly of PyPy and some of these other attempts people have made to add JIT compilation to the language, or other changes: it always seems to come back to, well, it would be great to have these features. Oh yeah, but there's this thing called the C API. And so, no, we can't change the GIL; no, we can't change memory allocation; no, we can't change any of these other things, because of the C API. And so we're stuck.
51:02 Yeah, that's true. I
51:04 mean, I'm not saying I'm asking you for a solution here. It just feels like that is both the real value of Python, and the reason that we can still do insanely computational stuff with Python: because a lot of these libraries, where they have these tight loops or these little bits of code, deserialization or matrix multiplication or whatever, they've written that in C and then shipped it as a wheel. And so now, all of a sudden, our code is not slow; doing math with Python is as fast as doing math with C.
51:37 Yeah. I mean, if you look at NumPy, for example: if you're doing a lot of math, you could be using the NumPy library, which is largely compiled C code. You import it from Python and you run it from Python, but the actual implementation is a C extension. That wouldn't be possible if CPython wasn't built the way it is, which is as an ahead-of-time extension loader that you can run from Python code.
52:04 Yeah, one project I do want to give a shout-out to, I don't know if it's going to go anywhere; it's got a decent amount of work on it, but it's only got 185 GitHub stars, so take that for what it's worth: this thing called HPy. Guido van Rossum called this out on Python Bytes 179, when he was a guest co-host there. And it's an attempt to make a new replacement for the C API for Python, where instead of passing around pointers to objects, you pass basically pointers to pointers, which means that things that move stuff around, like compacting garbage collectors or other implementations like JITs, have a much better chance to change things without directly breaking the C API. Right, you can change the value that the handle points to without, you know, having to reassign things down at that layer. They specifically call out that the current C API makes it hard for things like PyPy and GraalPython and Jython, and the goals are to make it easier to experiment with these ideas, to be more friendly for other implementations, reference counting, for example, and so on. So, anyway, I don't know if it's going anywhere, or how much traction it has, but it's an interesting idea.
53:20 Yeah, I like the idea. The C API has come a long way, but it's got its quirks. I don't know; there have been a lot of discussions, and there are a lot of draft PEPs as well, you know, proposing different designs for the C API.
53:34 Yeah. So, we're getting kind of short on time; we've discussed a bunch of stuff. I guess there are two other things I'd like to cover really quickly. One: we've talked about a lot of stuff in terms of computational things, but understanding memory is also pretty important. We did just talk about the GC. It's pretty easy in Python to just run cProfile and ask what my computational time is; it's less obvious how to understand memory allocation and stuff. Was it you that recommended Austin to me? Yeah, yeah. So Austin is a super cool profiler that does CPU profiling, but also memory allocation profiling, and it traces Python. Tell people about Austin real quick?
54:14 Yeah, so Austin is a new profiler for Python code; it's a sampling profiler. So, unlike other profilers, it won't slow your code down significantly. It basically sits on the side, just asking your app, you know, what it's doing, as a sample, and then it will give you a whole bunch of visuals to let you see, like flame graphs, for example: what's being called, what's taking a long time, which functions are chewing up your CPU, which ones are causing the bottlenecks, and then which ones are consuming a lot of memory. So if you've got a piece of code that is slow, the first thing you should probably do is stick it through a profiler and see if there is a reason why, if there is something that you could optimize, or, you know, you've accidentally done a nested loop or something. And Austin would help you do that.
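If you've never profiled before, the standard library's cProfile is the zero-install starting point (a hedged sketch of our own; Austin itself is a separate sampling tool you run alongside your process):

```python
import cProfile
import pstats

def slow_lookup(data, keys):
    # Deliberately quadratic: a list membership test inside a loop.
    return [k for k in keys if k in data]

profiler = cProfile.Profile()
profiler.enable()
slow_lookup(list(range(20_000)), range(2_000))
profiler.disable()

# Sort by cumulative time to see which calls dominate.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```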
55:06 One of the things I thought was super cool about this: a challenge I have so often with profilers is that the startup of whatever I'm trying to do just overwhelms the little thing I'm trying to test. You know, I'm starting up a web app and initializing database connections, and I just want to request a little bit of something, and it's not that slow. But I'm seeing all this other stuff around it, and I'm like, I just want to focus on this one part. And they've got all these different user interfaces: a web user interface, and a terminal user interface, a TUI they call it, which is cool. And it gives you, kind of like top, or Glances, or one of these things, it tells you, right now, here's what the profile for the last five seconds looks like. And it gives you the call stack and breakdown of your code right now, for that five-second segment, updating in real time. That's super cool.
55:56 Yeah. So if you want to run something and then just see what it's doing, or you want to
56:00 replay it: why is it using a lot of CPU right now? Yeah, yeah. I really like that; that's super cool. All right. Also, you know, concurrency is something Python has gotten a bad rap for in terms of slowness. I think with async and await and asyncio, if you're waiting on an external thing, Python can be ultra fast now, right? With async and await, waiting on things like database calls and web calls with the right drivers is super fast. But when it comes down to computational stuff, there's still the GIL, and there's really not a great fix for that. I mean, there's multiprocessing, but that's got a lot of overhead, so it's got to make sense, kind of like your plumber analogy, right? You can't do one-line function calls or one-line computations in multiprocessing. But the work that Eric Snow is doing with sub-interpreters looks pretty promising to unlock another layer.
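A quick sketch of that I/O-bound case, using only the standard library, with sleeps standing in for database or web calls:

    import asyncio
    import time

    async def fake_io(n):
        await asyncio.sleep(1)    # stands in for a database or HTTP call
        return n

    async def main():
        start = time.perf_counter()
        # Ten one-second waits overlap, so this takes about one second total.
        results = await asyncio.gather(*(fake_io(i) for i in range(10)))
        print(results, f"elapsed: {time.perf_counter() - start:.1f}s")

    asyncio.run(main())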
56:50 Yeah. So it's out in the 3.9 alpha, if you've played with that yet; it's still experimental. Sub-interpreters sit somewhere in between multiprocessing and threading in terms of the implementation. If you use multiprocessing, that's basically just saying, let's hire another plumber, and we'll get them to talk to each other at the beginning of the day and split up the tasks. Whereas with sub-interpreters, maybe they're sharing the same van; I'm not sure where this analogy is going. But sub-interpreters share the same Python process: it doesn't spawn an entirely new process, and it doesn't have to load all the modules again. The sub-interpreters can also talk to each other; they can use shared memory to communicate. But because they're separate interpreters, technically they can each have their own lock. The lock that gets locked when you run any opcode is the interpreter lock, so this basically means you can have two interpreters running in a single process, each with its own lock, running different operations at the same time. And, right, they would run on separate threads, so you're basically doing multithreading, and it can also use multiple CPUs, which would be great. Yeah, fundamentally, the GIL is not a threading thing per se; it's about serializing memory access, allocation, and deallocation. And with the sub-interpreters idea, you don't directly share pointers between sub-interpreters; there's a channel type of communication between them. So you don't have to take a lock on one when it's working with objects versus another; they're entirely different sets of objects. They're still in the same process space, but they're not actually sharing pointers, so they don't need to protect each other, right? You just have to protect within any one sub-interpreter, which has the possibility of letting me use all six of my cores. Yeah, absolutely. You can't read and write the same variables the way you can in threading, for that reason, but sub-interpreters are kind of halfway between threading and running a separate process.
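For anyone who wants to poke at it, the 3.9-era experiment lives in a private module. This sketch assumes the _xxsubinterpreters module as it shipped around then; it is not a public API, and the names may have changed since.

    # Experimental: _xxsubinterpreters is a private CPython module backing
    # the sub-interpreters work around 3.8/3.9. Subject to change or removal.
    import _xxsubinterpreters as interpreters

    interp = interpreters.create()    # a second interpreter, same process
    interpreters.run_string(interp, "print('hello from a sub-interpreter')")
    interpreters.destroy(interp)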
59:00 Yeah, it probably formalizes some of the multithreading communication styles that are going to keep things safer anyway. Hmm, definitely. Yeah. All right, let's talk about one really quick thing before we wrap it up: just one interesting project you've been working on. I mentioned earlier on about some security issues, right? Yeah. I want you to tell people about the PyCharm extension you've been working on.
59:19 Yeah. So I've been working on a PyCharm extension called Python Security; it's very creatively named. It's available. At least it's straightforward. Yeah, exactly. So it's basically like a code checker, but it runs inside PyCharm, and it will look for security vulnerabilities you may have written into your code and underline them for you, and in some cases fix them for you as well. So it will say, the thing you've done here is really bad, because it could let somebody hack into your application, and you can just press the quick-fix button and it will fix it for you. It's actually got over 100 different inspections now. And also, you could...
59:58 Like, using exec, is that good? No.
01:00:03 I think that was like the first check I wrote; it was the most obvious one. Yeah. You can run it across the whole project, so you can do a code inspection across your project, like a code audit. It also uses PyCharm's package manager, so it will go in and look at all the packages you have installed in your project, and it will check them against Snyk, which is a big database of vulnerable Python packages at snyk.io; it uses their API. So it checks against that, or you can check against your own list. And also, it's available as a GitHub Action. I managed to figure out how to run PyCharm inside Docker, so you can run PyCharm from GitHub Actions. So, wow, yeah, you can write a CI/CD script in GitHub to just say, inspect my code, and it will run right inside GitHub; you don't need PyCharm installed to do it. It will run the inspection tool against your code repository. It just requires that the repository is open source to be able to do that.
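As an illustration of the kind of issue a checker like this flags, here's a hypothetical snippet; it's not taken from the plugin, just the classic SQL-injection pattern and its usual quick fix.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")

    user_id = "1 OR 1=1"    # attacker-controlled input

    # Flaggable: string formatting bakes the input into the SQL text, so the
    # WHERE clause becomes "id = 1 OR 1=1" and every row comes back.
    print(conn.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchall())

    # The usual fix: a parameterized query treats the input as a plain value.
    print(conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall())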
01:01:03 Okay, that's super cool. All right, well, we're definitely out of time, so I have to leave it there. Two quick questions: favorite editor, notable package. What do you got? PyCharm. And I don't know about a
01:01:13 notable package.
01:01:14 Oh, yeah. You've been too deep in the C code.
01:01:16 Yeah. Like, what are packages?
01:01:19 I think there's something that installs those, but they don't work down in C. Yeah, no, that's cool. All right. So, people are interested in this; they want to maybe understand how CPython works better, or how and why it might be slow so they can avoid that, or maybe they even want to contribute. What do you say?
01:01:35 Wait for my book to come out and read the book, or read the Real Python article, which is free and online and talks through a lot of these concepts. Yeah.
01:01:44 Well, Anthony, thanks for being back on the show. Great, as always, to dig into the turtles. Thanks, Michael. Yeah, you bet. Bye. This has been another episode of Talk Python To Me. Our guest on this episode was Anthony Shaw, and it's been brought to you by brilliant.org and Sentry. Brilliant.org encourages you to level up your analytical skills and knowledge: visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day. Take some stress out of your life: get notified immediately about errors in your web applications with Sentry. Just visit talkpython.fm/sentry and get started for free. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle; it's like a subscription that never expires. Be sure to subscribe to the show: open your favorite podcatcher and search for Python; we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening; I really appreciate it. Get out there and write some Python code!