#265: Why is Python slow? Transcript
00:00 The debate about whether Python is fast or slow is never-ending.
00:03 It depends on what you're optimizing for.
00:05 CPU server consumption?
00:07 Developer time?
00:08 Maintainability?
00:09 There are many factors.
00:11 But if we keep our eye on the pure computational speed in the Python layer,
00:15 then yes, Python is slow.
00:17 In this episode, we invite Anthony Shaw back on the show.
00:21 He's here to dig into the reasons that Python is computationally slower than many of its peer languages and technologies, such as C++ and JavaScript.
00:29 This is Talk Python to Me, episode 265, recorded May 19, 2020.
00:34 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:54 This is your host, Michael Kennedy.
00:56 Follow me on Twitter, where I'm @mkennedy.
00:58 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
01:04 This episode is sponsored by Brilliant.org and Sentry.
01:08 Please check out their offerings during their segments.
01:10 It really helps support the show.
01:12 Anthony, welcome back to Talk Python.
01:15 Hey, Mike.
01:15 It's great to be back.
01:16 Yeah, it's great to have you back.
01:17 You've been on the show a bunch of times.
01:19 You've been over on Python Bytes when you were featured there.
01:22 But, you know, people may know you were on episode 168, 10 Python security holes and how to plug them.
01:29 That was super fun with one of your colleagues.
01:31 And then 214, dive into the CPython 3.8 source code.
01:36 Or just what was new in 3.8.
01:38 And then a guided tour of the CPython source code, which I think at the time was also 3.8.
01:42 And now we're going to look at the internals of Python again.
01:45 I feel like you're becoming the Python internals guy.
01:48 Yeah.
01:48 Well, I don't know.
01:50 There's lots of people who know a lot more about it than I do.
01:53 But I've been working on this book over the last year on CPython internals, which has been focused on 3.9.
02:00 So, yeah, we've got some stuff to talk about.
02:03 Yeah, that's awesome.
02:04 And your book started out as a realpython.com article, which I'm trying to define a term that describes what some of these look like.
02:13 When I think of article, I think of a three to four page thing.
02:17 Maybe it's in depth and it's 10 pages.
02:19 This is like 109 pages or something as an article, right?
02:22 It was like insane.
02:23 But it was really awesome and really in depth.
02:24 And so you were partway towards a book and you figured like, well, what the heck?
02:28 I'll just finish up this work.
02:29 Yeah, I figured I'd pretty much written a book.
02:32 So I might as well put it between two covers.
02:34 It was actually a lot.
02:36 It was actually a lot of work to get it from that stage to where it is now.
02:41 So I think the whole thing's pretty much been rewritten.
02:43 There's a way that you explain things in an article that people expect, which is very different to the style of a book.
02:49 And also there's stuff that I kind of skimmed over in the article.
02:53 I think it's actually about three times longer than the original article.
02:56 And it's a lot more practical.
02:59 So rather than being like a tourist guide to the source code, it's more about like CPython internals and optimizations and practical tools you can learn as more sort of like advanced techniques.
03:12 If you use CPython a lot for your day job to either make it more performant or to optimize things or to make it more stable and stuff like that.
03:21 Yeah.
03:21 It's really interesting because if you want to understand how Python works and you're, say, the world's best Python developer, your Python knowledge is going to help you a little bit.
03:31 But not a ton for understanding CPython because that's mostly, well, C code, right?
03:36 And so I think this having this guided tour, this book that talks about that is really helpful, especially for taking people who know and love Python, but actually want to get a little deeper and understand the internals or maybe even become a core developer.
03:49 Yeah, definitely.
03:49 And if you look at some of the stuff we'll talk about this episode, hopefully, like Cython and mypyc and stuff like that, then knowing C or knowing how C and Python work together is also really important.
04:02 Yeah, absolutely.
04:02 All right.
04:03 So looking forward to talking about that.
04:05 But just really quickly, you know, give people a sense of what you work on day to day when you're not building extensions for IDEs, writing books and otherwise doing more writing.
04:15 Yeah, so I work at NTT and run sort of learning and development and training for the organization.
04:22 So I'm involved in, I guess, like what skills we teach our technical people and our sales people and all of our employees, really.
04:30 Yeah, that's really cool.
04:31 That sounds like a fun place to be.
04:32 Yeah, that's a great job.
04:33 Yeah, awesome.
04:33 All right.
04:34 Well, the reason I reached out to you about having you on the show for this specific topic, I always like to have you on the show.
04:42 We always have fun conversations, but I saw that you were doing, were you doing multiple or just this PyCon talk?
04:50 Just one.
04:51 I was accepted for two, but I was supposed to pick one.
04:55 I see.
04:55 That's right.
04:55 That's right.
04:56 And then PyCon got canceled.
04:57 Yeah.
04:59 So I was like, well, let's, you know, talk.
05:00 We can talk after PyCon after you give your talk.
05:02 It'll be really fun to cover this.
05:04 And then, you know, we were supposed to share a beer in Pittsburgh and we're like half a world away.
05:12 Didn't happen, did it?
05:13 Yeah.
05:13 Maybe next year.
05:14 Yeah.
05:15 Hopefully next year.
05:15 Hopefully things are back to up and running because I don't know.
05:18 To me, PyCon is kind of like my geek holiday that I get to go on.
05:22 I love it.
05:22 Yeah.
05:23 All right.
05:23 Well, so just, I guess, for people listening, you did end up doing that talk in an altered sense,
05:30 right?
05:30 And they can technically go watch it soon, at least maybe by the time this is out.
05:34 Yeah, definitely.
05:35 It'll be out tonight.
05:36 It's going to be on the YouTube channel, the PyCon 2020 YouTube channel.
05:41 The organizers reached out to all the speakers and said, if you want to record your talk and
05:46 submit it from home, then you can still do that and put them all up on YouTube.
05:50 I think that's great.
05:51 You know, and there's also a little bit more over PyCon online.
05:54 One thing I think is really valuable for people right now is they have the job fair, kind of,
06:00 right?
06:01 There's a lot of job listings for folks who are looking to get in jobs.
06:05 Have you seen the PSF JetBrains survey that came out?
06:08 Yes.
06:09 It's the 2019 one; it came out just like a few days ago.
06:11 Really interesting stuff, right?
06:13 Like a lot of cool things in there.
06:14 Yeah, definitely.
06:15 Yeah.
06:15 I love that.
06:16 That and the Stack Overflow developer survey.
06:18 Those are the two that really, I think, have the pulse correctly taken.
06:22 One of the things that was in there I thought was interesting is more than any other category
06:27 of people, they said, how long have you been coding?
06:30 I don't know if it was in Python or just how long have you been coding, but it was different,
06:36 you know, one to three years, three to five, five to 10, 10 to 15.
06:41 And then people like me forever, long time, you know, like 20 plus or something.
06:45 The biggest bar of all those categories, the biggest group was the one to three years.
06:51 Yeah.
06:52 Right.
06:52 Like, 29% of the people said, I've only been coding three years or fewer.
06:56 And I think that that's really interesting.
06:58 So I think things like that job board and stuff are probably super valuable for folks just getting
07:02 into things.
07:03 Definitely.
07:03 Yeah.
07:03 So really good that they're putting that up and people will be able to check out your
07:07 talk.
07:07 I'll put a link to it in the show notes, of course, but they can just go to the PyCon 2020
07:11 YouTube channel and check it out there.
07:13 Yeah.
07:13 And check out the other talks as well.
07:15 There's some really good ones up already.
07:16 The nice thing about this year's virtual PyCon is you can watch talks from your couch.
07:20 That's right.
07:22 You don't even have to get dressed to go to PyCon.
07:25 Just do it in your PJs.
07:26 That's right.
07:27 It's so much more comfortable than the conference chairs.
07:31 That's true.
07:31 That's for sure.
07:32 Yeah.
07:33 Very cool.
07:33 I'm definitely looking forward to checking out more of the talks as well.
07:35 I've already watched a few.
07:36 I wanted to set the stage for our conversation here by defining slow because I think slow is
07:44 in the eye of the beholder, just like beauty, right?
07:46 Like sometimes slow doesn't matter.
07:50 Sometimes computational speed might be slow, but some other factor might be quick.
07:57 So I'll let you take a shot at it, then I'll throw in my two cents as well.
08:00 Like let's like, what do you mean when you say, why is Python slow?
08:04 So when I say, why is Python slow?
08:06 The question is, why is it slower than other languages doing exactly the same thing? And I've
08:14 picked a narrow definition there.
08:15 Right.
08:15 So if I had an algorithm that I implemented, say in C, in JavaScript on top of Node, and in Python,
08:20 it might be much slower in Python.
08:23 Wall time, like execution time.
08:25 Yeah.
08:26 Execution time might be much slower in Python than it is in other languages.
08:29 And that matters sometimes.
08:31 And sometimes it doesn't matter as much.
08:34 It depends what you're doing, right?
08:35 If you're doing like a DevOps-y thing and you're trying to orchestrate calling into Linux, well,
08:40 who cares how fast Python goes?
08:42 Probably like the startup time is the most important of all of them.
08:45 If you're modeling stuff and you're trying to do the mathematical bits, anything computational,
08:51 and you're doing that in Python, then it really might matter to you.
08:54 Yeah.
08:55 So it was kind of like a question, if we can find out the answer, maybe there's a solution
09:00 to it.
09:00 Yeah.
09:01 Because, you know, you hear this thrown around.
09:02 People say Python's too slow and I use this other language because it's faster.
09:06 And so I just wanted to understand, like, what is the actual reason why Python is slower
09:12 at doing certain things than other languages?
09:14 And is there a reason that can be resolved?
09:18 Or is it just that's just how it is as part of the design?
09:22 Fundamentally, it's going to be that way.
09:23 Yeah.
09:24 I don't think it is.
09:25 I think...
09:26 You don't think it's slow?
09:27 No, I don't think it fundamentally has to be that way.
09:30 I agree with you.
09:31 I think in the research as well, it uncovered it doesn't fundamentally have to be that way.
09:36 And in lots of cases, it isn't that way either.
09:38 Like there's ways to get around the slowdown, like the causes of slowdown.
09:44 And if you understand in what situations Python can be slow, then you can kind of like bypass
09:51 those.
09:51 Right.
09:52 So let me tell a really interesting story that comes from Michael Driscoll's book, Python
09:57 Interviews.
09:58 So over there, he interviewed, I think it was Alex.
10:02 Yeah, Alex Martelli.
10:03 And they talked about the history of YouTube, right?
10:07 YouTube's built on Python.
10:09 And why is that the case?
10:11 Originally, there was Google Video, which had hundreds of engineers writing, implementing
10:18 Google Video, which is going to be basically YouTube.
10:21 But YouTube was also a startup around the same time, right?
10:24 And they were kind of competing for features and users and whatnot.
10:26 And YouTube only had like 20 employees at the time or something like that, whereas Google
10:31 had hundreds of super smart engineers.
10:34 And Google kept falling behind farther and farther, not being able to implement the features that
10:39 people wanted nearly as quickly as YouTube.
10:41 And the reason was they were all doing it in C++ and it took a long time to get that written.
10:47 And YouTube just ran circles around them with, you know, less than a fifth of the
10:52 number of people working on it.
10:53 So in some sense, like that's a testament of Python speed, right?
10:58 But it's not its execution speed.
11:00 It's like the larger view of speed, which is why I really wanted to find like what computational
11:04 speed is.
11:05 Another sense where it may or may not matter is like where you're doing stuff that waits,
11:10 right?
11:10 Somewhere where asyncio would be a really good option, right?
11:13 I'm talking to Redis.
11:14 I'm talking to this database.
11:15 I'm calling this API.
11:16 Like if 95% of your time is waiting on a network response, it probably doesn't matter, right?
11:21 As long as you're using some sort of async or something.
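To make that I/O-bound case concrete, here's a minimal asyncio sketch (the fetch function and the service names are just illustrative, not from the episode). The three waits overlap, so total wall time is roughly one round trip, and interpreter speed barely shows up:

```python
import asyncio

async def fetch(name: str) -> str:
    # Stand-in for a network round trip: Redis, a database, an API...
    await asyncio.sleep(0.1)
    return f"{name}: ok"

async def main() -> None:
    # All three "requests" wait concurrently: ~0.1s total, not ~0.3s.
    results = await asyncio.gather(fetch("redis"), fetch("database"), fetch("api"))
    print(results)

asyncio.run(main())
```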
11:24 But then there's that other part where it's like I have on my computer, I've got six hyperthreaded
11:30 cores.
11:30 Why can I only use one twelfth of my computational power on my computer unless I write C code,
11:36 right?
11:37 So there's these other places where it super matters.
11:39 Or I just, like you said, there's this great example that we're going to talk about the
11:43 n-body problem, modeling, like, planets and how they interact with each other.
11:48 And I mean, just like to set the stage, what was the number for C versus Python in terms
11:53 of time, computation time?
11:54 To give people a sense, like why did we care?
11:56 Like why is this a big enough deal to worry about?
11:58 Is it, what is it like 30% slower?
12:00 It's a little bit slower.
12:01 Yeah.
12:01 It's a, for this algorithm, so this is called the n-body problem, and it's to do with calculating
12:07 the orbits of some of the planets in the solar system.
12:10 And you just do a lot of really simple arithmetic operations.
12:15 So just adding numbers, but again and again and again.
12:17 So millions of times.
12:18 Lots of loops, lots of math.
12:20 Lots of math, lots of looping.
12:22 And in C, this implementation is seven seconds to complete.
12:27 And in Python, it's 14 minutes.
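To give a feel for the workload, here's a simplified sketch in the spirit of the benchmark (not its actual source): a pairwise update loop of tiny float operations, each one paying the full interpreter overhead, repeated millions of times:

```python
import math

def advance(bodies, dt, steps):
    # Each pass applies pairwise gravitational nudges, then moves
    # every body. Every -, *, and sqrt is a separate tiny operation.
    for _ in range(steps):
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                b1, b2 = bodies[i], bodies[j]
                dx = b1["x"] - b2["x"]
                dy = b1["y"] - b2["y"]
                mag = dt / math.sqrt(dx * dx + dy * dy) ** 3
                b1["vx"] -= dx * b2["mass"] * mag
                b1["vy"] -= dy * b2["mass"] * mag
                b2["vx"] += dx * b1["mass"] * mag
                b2["vy"] += dy * b1["mass"] * mag
        for b in bodies:
            b["x"] += dt * b["vx"]
            b["y"] += dt * b["vy"]

sun = {"x": 0.0, "y": 0.0, "vx": 0.0, "vy": 0.0, "mass": 1.0}
planet = {"x": 1.0, "y": 0.0, "vx": 0.0, "vy": 1.0, "mass": 1e-3}
advance([sun, planet], dt=0.01, steps=100_000)
```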
12:29 That might be a difference that you're needing to optimize away.
12:32 That could be too much, right?
12:34 Yeah.
12:34 I mean, everyone is calculating the orbits of planets as part of their day job.
12:38 So yeah.
12:39 You know, I honestly, I haven't really done that for at least two weeks.
12:44 No, but I mean, fundamentally, like, I'm thinking about, like, I think this
12:48 uncovers one of the real Achilles' heels of Python, in that doing math in tight loops is really not super great in pure Python.
13:00 Right.
13:01 Whether that's planets, whether that's financial calculations or something else.
13:05 Right.
13:05 Python numbers are very flexible, but that makes them inefficient.
13:08 Right.
13:09 Python is interpreted, which has a lot of benefits, but also can make it much slower as well.
13:15 Right.
13:15 Yeah.
13:16 So I think looking at this particular problem, because I thought it would be a good example,
13:20 it shines a bit of a spotlight on one of CPython's weaknesses when it comes to performance.
13:26 But in terms of like the loop, the only times you would be doing like a small loop and doing
13:31 the same thing over and over again is if you're doing like math work, doing like number crunching,
13:37 or if you're doing benchmarks, that's like one of the other reasons.
13:41 So, like, the way that a lot of benchmarks are designed, computational benchmarks anyway,
13:47 is to do the same operation again and again.
13:49 So if there is an overhead or a slowdown, then it's magnified to the point where you can see
13:55 it a lot bigger.
13:55 Yeah, for sure.
13:56 I guess one thing to put out there: people run code, it doesn't go as fast as they'd hoped.
14:04 So they say that Python is slow, right?
14:07 Assuming the code they originally ran was Pythonic to begin with.
14:09 That would be a requirement, I guess, is you probably should profile it.
14:13 You should understand what your code is doing and where it's slow.
14:17 Like, for example, if you're doing lookups, but your data structure is a list instead of
14:21 a dictionary, right?
14:23 You could make that a hundred times faster just by switching that, because you're just using
14:26 the wrong type of data structure, the wrong algorithm.
14:29 It could be just that you're doing it wrong, right?
14:32 So I guess before people worry about like, is it executing too slowly?
14:37 Maybe you should make sure that it's executing the right thing.
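A quick, self-contained illustration of that kind of mistake (illustrative numbers, not from the episode): the same membership test against a list scans every element, while a set does a single hash lookup. Profiling with something like python -m cProfile your_script.py is usually how you'd spot it:

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Worst-case lookup: the list is scanned element by element (O(n)),
# while the set resolves it with one hash probe (O(1)).
print("list:", timeit.timeit(lambda: 99_999 in haystack_list, number=1_000))
print("set: ", timeit.timeit(lambda: 99_999 in haystack_set, number=1_000))
```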
14:40 Yeah, and it's unlikely that your application is running a very small operation, which is
14:47 this benchmark again and again, like millions of times in a loop.
14:50 And if you are doing that, there's probably other tools you could use and there's other
14:55 implementations you can do in Python.
14:59 This portion of Talk Python to Me is brought to you by Brilliant.org.
15:03 Brilliant's mission is to help people achieve their learning goals.
15:06 So whether you're a student, a professional brushing up or learning cutting edge topics,
15:10 or someone who just wants to understand the world better, you should check out Brilliant.
15:14 Set a goal to improve yourself a little bit every day.
15:17 Brilliant makes it easy with interactive explorations and a mobile app that you can use on the go.
15:22 If you're naturally curious, want to build your problem solving skills, or need to develop
15:26 confidence in your analytical abilities, then get Brilliant Premium to learn something new
15:31 every day.
15:32 Brilliant's thought-provoking math, science, and computer science content helps guide you
15:37 to mastery by taking complex concepts and breaking them into bite-sized, understandable chunks.
15:42 So get started at talkpython.fm/brilliant, or just click the link in your show notes.
15:50 Another benchmark I covered in the talk was the regular expression benchmark,
15:54 which Python is actually really good at.
15:57 So this is like the opposite to this particular benchmark.
16:01 So just saying that Python is slow isn't really a fair statement, because, and we'll kind of talk about this in a minute,
16:07 but like for other benchmarks, Python does really well.
16:11 So its string implementation is really performant.
16:14 And when you're working with text-based data, Python's actually a great platform to use, a great language to use.
16:20 CPython is pretty efficient at dealing with text data.
16:25 And if you're working on web applications or data processing, chances are you're dealing with text data.
16:32 Yeah, that's a good example.
16:33 Like the websites that I have, like the Talk Python training site, and the various podcast sites and stuff,
16:39 they're all in Python with no special, incredible optimizations, other than like databases with indexes and stuff like that.
16:46 And, you know, the response times are like 10, 30 milliseconds.
16:51 There's no problem.
16:52 Like it's fantastic.
16:53 It's really, really good.
16:54 But there are those situations like this in-body problem or other ones where it matters.
17:01 I don't know if it's fair or not to compare it against C, right?
17:04 C is really, really low level, at least from today's perspective.
17:10 It used to be a high level language, but now I see it as a low level language.
17:13 If you do a malloc and free and, you know, the address of this thing, right,
17:17 that feels pretty low level to me.
17:19 So maybe it's unfair.
17:21 I mean, you could probably get something pretty fast in assembly, but I would never choose to use assembly code these days
17:27 because it's just like I want to get stuff done and maintain it and be able to have other people understand what I'm doing.
17:31 But, you know, kind of a reasonable comparison, I think, would be Node.js and JavaScript.
17:38 And you made some really interesting compare and contrast between those two environments
17:43 because they seem like, well, like, okay, Python, at least it has some C in their JavaScript.
17:48 Who knows what's going on with that thing, right?
17:50 Like, you know, what's the story between those two?
17:52 Yeah, you make a fair point, which is, I mean, comparing C and Python isn't really fair.
17:56 One is like a strongly typed compiled language.
17:59 The other is a dynamically typed interpreted language and they handle memory differently.
18:06 Like in C, you have to statically or dynamically allocate memory and CPython is done automatically.
18:12 Like it has a garbage collector.
18:14 There's so many differences between the two platforms.
18:16 And so I think Node.js, which is, so Node.js is probably a closer comparison to Python.
18:24 Node.js isn't a language.
18:25 It's a kind of like a stack that sits on top of JavaScript that allows you to write JavaScript,
18:32 which operates with things that run in the operating system.
18:36 So similar to CPython, like CPython has extensions that are written in C that allow you to do things
18:43 like connect to the network or, you know, connect to like physical hardware
18:49 or talk to the operating system in some way.
18:51 Like if you just wrote pure Python and there was no C, you couldn't do that because the operating system APIs
18:56 are C headers in most cases.
18:59 Right.
18:59 Almost all of them are in C somewhere.
19:01 Yeah.
19:01 Yeah.
19:02 And with JavaScript, it's the same thing.
19:03 Like if you want to talk to the operating system or do anything other than like working with stuff
19:09 that's in the browser, you need something that plugs into the OS.
19:12 And Node.js kind of provides that stack.
19:15 So when I wanted to compare Python with something, I thought Node was a better comparison
19:21 because like JavaScript and Python, in terms of the syntax, they're very different.
19:25 But in terms of their capabilities, they're quite similar.
19:29 You know, they both have classes and functions and you can use them interchangeably.
19:33 They're both kind of like dynamically typed.
19:35 The scoping is different and the language is different.
19:37 But like in terms of the threading as well, they're quite similar.
19:42 Right.
19:42 They do feel much more similar.
19:44 But there's a huge difference between how they run, at least when run on Google's V8 engine,
19:51 which basically is the thing behind Node and whatnot, versus CPython is,
19:56 CPython is interpreted and V8 is JIT compiled, just in time compiled.
20:02 Yeah, so that's probably one of the biggest differences.
20:04 And when I was comparing the two, so I wanted to see, okay, which one is faster?
20:10 Like if you gave it the same task and if you gave it the n-body problem,
20:13 then Node.js is a couple of multiples faster.
20:18 I think it was two or three times faster to do the same algorithm.
20:23 And for a dynamically typed language, you know, that means that they must have some optimizations,
20:28 which make it faster.
20:30 I mean, if you're running on the same hardware, then, you know, what is the overhead?
20:34 And kind of digging into it, I guess, in a bit more detail.
20:39 So JavaScript has this, actually there's multiple JavaScript engines, but kind of the one that Node.js uses
20:45 is Google's V8 engine.
20:47 So quite cleverly named, which is all written in...
20:52 Only would it be better if it were a V12, you know?
20:54 Or an inline six.
20:56 I think that's a better option.
20:57 Yeah, there you go.
21:01 So Google's V8 JavaScript engine is written in C++, so maybe that's a fair comparison.
21:07 But the optimizing compiler is called TurboFan, and it's a JIT optimizing compiler.
21:14 So it's a just-in-time compiler, whereas CPython is an ahead-of-time, or AOT, compiler.
21:20 And its JIT optimizer has got some really clever, basically sort of algorithms and logic
21:27 that it uses to optimize the performance of the application when it actually runs.
21:31 And these can make a significant difference.
21:33 Like some of the small optimizations alone can make 30%, 40% increase in speed.
21:39 And if you compare even just V8 compared to other JavaScript engines, you can see, like,
21:45 what all this engineering can do to make the language faster.
21:49 And that's how it got two, three multiples performance increases, was to optimize the JIT
21:54 and to understand, like, how people write JavaScript code and the way that it compiles the code
22:01 down into operations.
22:03 Then basically, like, it can reassemble those operations that are more performant for the CPU
22:08 so that when it actually executes them, does it in the most efficient way possible.
22:12 Right.
22:13 The difference between a JIT and an AOT is that the JIT compiler kind of makes decisions
22:18 about the compilation based on the application and based on the environment,
22:22 whereas an AOT compiler will compile the application the same and it does it all ahead of time.
22:29 Right.
22:29 So you probably have a much more coarsely-grained set of optimizations and stuff for an ahead-of-time compiler,
22:36 like C++ or something, right?
22:38 Like, I've compiled against x86 Intel CPU with, like, the multimedia extensions
22:47 or whatever, right?
22:48 The scientific computing extensions.
22:49 But other than that, I make no assumptions, whether it's multi-core, highly multi-core,
22:54 what its L2 cache is, none of that stuff, right?
22:57 It's just, we're going to kind of target modern Intel on macOS and do it on Windows
23:04 and compile that.
23:05 Yeah.
23:05 So modern CPU architectures and modern OSes can really benefit if you've optimized
23:12 the instructions that you're giving them to benefit, like, the caches that they have
23:17 or the cycles that they've set up, and the TurboFan optimizer
23:22 for the V8 engine takes a lot of advantage of those things.
23:25 Yeah.
23:25 That seems really powerful.
23:27 I guess we should step back and talk a little bit about how CPython runs,
23:32 but being an interpreter, it can only optimize so much.
23:37 It's got all of its byte codes and it's going to go through its byte codes
23:41 and execute them, but saying, like, well, these five byte codes, we could actually turn that
23:45 into an inline thing over here and I see this actually has no effect on what's loaded on the stack,
23:51 so we're not going to, like, push the item.
23:53 I mean, it seems like, tell me if I'm wrong,
23:57 it doesn't optimize, like, across lots of bytecodes as it's thinking about it.
24:04 Yeah, so what CPython will do when it compiles your code, and it's also worth pointing out
24:08 that when you run your code for the first time, it will compile it, but when you run it again,
24:14 it will use the cached version, so...
24:16 Right, if you ever see the dunder pycache with .pyc files, that's, like,
24:21 three of the four steps of getting your code ready to run saved and done
24:25 and never done again.
24:26 Yeah, so that's, like, the compiled version.
24:28 So it's not...
24:29 If Python is slow to compile code, it doesn't really matter unless your code
24:33 is somehow changing every time it gets run, which I'd be worried about.
24:37 You have bigger problems.
24:38 Yeah, exactly.
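As a small aside, you can poke at that caching directly from the standard library; my_module.py here is just a hypothetical file name:

```python
import importlib.util
import py_compile

# Compile a source file to bytecode explicitly; normally the
# interpreter does this on first import and caches the result.
py_compile.compile("my_module.py")  # hypothetical source file

# Where the cached .pyc for that source lives:
print(importlib.util.cache_from_source("my_module.py"))
# e.g. __pycache__/my_module.cpython-38.pyc
```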
24:39 So the benefits, I guess, of an AOT compiler is that you compile things ahead of time
24:45 and then when they execute, they should be efficient.
24:47 So CPython's compiler will kind of take your code, which is, like, a text file,
24:53 typically.
24:53 It'll look at the syntax.
24:55 It will parse that into an abstract syntax tree, which is a sort of representation of functions
25:02 and classes and statements and variables and operations and all that kind of stuff.
25:08 Your code, your file, your module, basically, becomes like a tree, and then what it does
25:13 is it then compiles that tree by walking through each of the branches and walking through
25:19 and understanding what the nodes are and then there is a compilation.
25:23 Basically, like, in the CPython compiler, there's a function for each type of thing
25:28 in Python.
25:28 so there's a compile binary operation or there's a compile class function
25:34 and a compile class will take a node from the AST, which has got your class in it
25:39 and it will then go through and say, okay, what properties, what methods does it have
25:44 and it will then go and compile the methods and then inside a method it will go and compile the statements.
25:48 So, like, once you break down the compiler into smaller pieces, it's not that complicated
25:53 and what the compiler will do is it will spit out compiled basic frame blocks,
25:59 as they're called, and then they get assembled into bytecode.
26:03 So, after the compiler stage, there is an assembler stage which basically figures out
26:08 in which sequence should the code be executed, you know, which basically,
26:13 like, what will the control flow be between the different parts of code,
26:17 the different frames.
26:18 In reality, like, they get executed in different orders because they depend on input
26:23 whether or not you call this particular function but still, like, if you've got a for loop,
26:27 then it's still got to go inside the for loop and then back to the top again.
26:31 Like, that logic is, like, hard-coded into the for loop.
26:34 Right.
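Those stages are easy to inspect from Python itself. A minimal sketch using the standard ast, compile, and dis tools to walk one statement through parse, compile, and disassembly:

```python
import ast
import dis

source = "answer = 40 + 2"

# 1. Parse the text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# 2. Compile the tree into a code object (bytecode).
code = compile(tree, filename="<example>", mode="exec")

# 3. Disassemble to see the opcodes the evaluation loop will run.
dis.dis(code)
```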
26:35 You know, as you're talking, I'm wondering if, you know, minor extensions
26:39 to the language might let you do higher-level optimizations.
26:43 Like, say, like, having a frozen class that you're saying I'm not going to add any fields to
26:49 or, like, an inline on a function, like, I only, or make it a function internal
26:54 to a class in which it could be inlined, potentially, because, you know,
26:58 no one's going to be able to, like, look at it from the outside of this code and stuff.
27:02 What do you think?
27:03 There is an optimizer in the compiler called the peephole optimizer.
27:07 And when it's compiling, I think it's actually it's after the compilation stage,
27:12 I think, it goes through and it looks at the code that's been compiled and if it can make some
27:18 decisions about either, like, dead code that can be removed or branches which can be simplified,
27:25 then it can basically optimize that.
27:27 And that will make some improvement, like, it will optimize your code slightly.
27:31 Right.
27:32 But then once it's done, basically, your Python application has been compiled down
27:36 into this, like, assembly language called bytecode, which is the, like, the actual individual operations
27:42 and then they're executed in sequence.
27:45 They're split up into small pieces, they're split up into frames, but they're executed
27:50 in sequence.
27:50 Right.
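You can see that optimizer's handiwork with the standard dis module; constant folding is the classic example:

```python
import dis

def seconds_per_day():
    return 24 * 60 * 60  # a constant expression

# The compiler folds the arithmetic ahead of time: the disassembly
# shows a single LOAD_CONST 86400 rather than two multiplications.
dis.dis(seconds_per_day)
```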
27:51 And if you look at the C source code, dive into there, there's a C eval.c file
27:56 and it has, like, the world's largest while loop with a switch statement
28:01 in it, right?
28:02 Yeah.
28:02 So this is, like, the kind of the brain of CPython.
28:06 Oh, maybe it's not the brain, but it's the bit that, like, goes through each
28:10 of the operations and says, okay, if it's this operation, do this thing,
28:13 if it's that one, do this thing.
28:14 This is all compiled in C, so it's fairly fast, but it will basically sit and run the loop.
28:20 So when you actually run your code, it takes the assembled bytecode and then for each
28:26 bytecode operation, it will then do something.
28:29 So, for example, there's a bytecode for add an item to a list.
28:33 So it knows that it will make a value off the stack and it will put that
28:37 into the list or this one which calls a function.
28:40 So, if the bytecode is call function, then it knows to figure out how to
28:44 call that function in C.
28:46 Right.
28:46 Maybe it's loaded a few things on the stack, it's going to call it, do it just get sucked along,
28:51 something like that.
28:51 And so I guess one of the interesting things, and you were talking about an interesting
28:56 analogy about this, sort of when Python can be slow versus a little bit less slow,
29:02 it's the overhead of like going through that loop, figuring out what to do,
29:06 like preparing stuff before you call the CPython's thing, right?
29:10 Like list.sort, it could be super fast even for a huge list because it's just going
29:15 to this underlying C object and say, in C, go do your sort.
29:18 But if you're doing a bunch of small steps, like the overhead of the next step
29:24 can be a lot higher.
29:26 In the n-body problem, the step that it has to do, the operation it has to do,
29:30 will be add number A to number B, which on a decent CPU, I mean, this is like nanoseconds
29:36 in terms of time it takes to execute.
29:39 So if it's basically, if the operation that it's doing is really tiny, then after doing
29:46 that operation, it's got to go all the way back up to the top of the loop again,
29:49 look at the next bytecode operation, and then go and run this, you know, call this thing,
29:56 which runs the operation, which takes again like nanoseconds to finish, and then it goes
30:00 all the way back around again.
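A rough way to feel that overhead yourself, using sum() as a stand-in for the list.sort example above: one built-in call that stays in C versus a loop that goes around the evaluation loop once per element:

```python
import timeit

data = list(range(1_000_000))

def manual_sum(values):
    # Each iteration returns to the top of the evaluation loop:
    # fetch the next opcode, dispatch, do one tiny add, repeat.
    total = 0
    for v in values:
        total += v
    return total

# One trip into C: the whole job runs inside a single built-in.
print("sum():", timeit.timeit(lambda: sum(data), number=10))
# Millions of bytecode dispatches for the same arithmetic.
print("loop: ", timeit.timeit(lambda: manual_sum(data), number=10))
```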
30:01 So I guess the analogy I was trying to think of with the n-body problem is,
30:05 you know, if you were a plumber and you got called out to do a load of jobs
30:10 in a week, but every single job was, can you change this one washer on a tap for me,
30:16 which takes you like two minutes to finish, but you get a hundred of those jobs
30:21 in a day, you're going to spend most of your day just driving around and not actually doing
30:25 any plumbing.
30:26 You're going to be driving from house to house and then doing these like two
30:30 minute jobs and then driving on to the next job.
30:35 So I think the n-body problem, that's kind of an example of that, is that the evaluation
30:39 loop can't make decisions, like it can't say, oh, if I'm going to do the same operation
30:43 again and again and again, instead of going around the loop each time, maybe I should just
30:49 call that operation the number of times that I need to.
30:53 And those are the kind of optimizations that a JIT would do, because it kind of
30:56 changes the compilation order in sequence.
30:59 So that's, I guess like we could talk about there are JITs available for
31:03 Python.
31:04 Yes.
31:05 CPython doesn't have, CPython doesn't use a JIT, but for things like the
31:10 nbody problem, instead of the, you know, the plumber driving to every house and doing
31:15 this two minute job, why can't somebody actually just go and, why can't everyone
31:20 just send their tap to like the factory and he just sits in the factory all day
31:24 replacing the washers.
31:26 Like Netflix of taps or something, yeah.
31:28 Back when they sent out DVDs.
31:31 Maybe I was stretching the analogy a bit, but, you know, basically like you can
31:35 make optimizations if you know you're going to do the same job again and again
31:39 and again, or maybe like he just brings all the washers with him instead of driving
31:44 back to the warehouse each time.
31:45 So, like there's optimizations you can make if you know what's coming.
31:49 But because the CPython application was compiled ahead of time, it doesn't know
31:54 what's coming.
31:55 There are some opcodes that are coupled together. I can't remember
32:00 which ones they are off the top of my head, but there's only a couple, and it doesn't
32:04 really add a huge performance increase.
32:05 Yeah, there have been some improvements around like bound method execution time and
32:10 methods without keyword arguments or some something along those lines that got quite a
32:14 bit faster.
32:14 But that's still just like how can we make this operation faster?
32:18 Not how can we say like, you know what, we don't need a function, let's inline that.
32:21 It's called in one place once, just inline it, right?
32:23 Things like that.
32:24 This portion of Talk Python to me is brought to you by Sentry.
32:29 How would you like to remove a little stress from your life?
32:32 Do you worry that users may be having difficulties or are encountering errors
32:36 with your app right now?
32:37 Would you even know it until they send that support email?
32:40 How much better would it be to have the error details immediately sent to you,
32:44 including the call stack and values of local variables, as well as the active user stored in
32:50 the report?
32:51 With Sentry, this is not only possible, it's simple and free.
32:54 In fact, we use Sentry on all the Talk Python web properties.
32:58 We've actually fixed a bug triggered by a user and had the upgrade ready to roll
33:03 out as we got the support email.
33:04 That was a great email to write back.
33:06 We saw your error and have already rolled out the fix.
33:09 Imagine their surprise.
33:10 Surprise and delight your users today.
33:12 Create your free account at talkpython.fm/sentry and track up to 5,000
33:18 errors a month across multiple projects for free.
33:20 So you did say there were some.
33:23 There was Pyjion, there's PyPy, there's Unladen Swallow, there's some other options as
33:33 well, but those are the JITs that are coming to mind.
33:35 Piston, all of those were attempts and I have not heard anything about any of them for a
33:39 year, so that's probably not a super sign for their adoption.
33:43 Yeah, so the ones I kind of picked on because I think they've got a lot of promise
33:46 and kind of show a big performance improvement is PyPy, which shouldn't be new.
33:52 I mean, it's a popular project, but PyPy uses a...
33:55 P-Y-P-Y. Because some people say, like, the Python Package Index, they also call it PyPI, but
34:00 that's a totally different thing.
34:01 Yeah, so PyPy...
34:02 Just for listeners who aren't sure.
34:03 PyPy kind of helped solve the argument for my talk actually, because if Python is slow, then
34:10 writing a Python compiler in Python should be like really, really slow.
34:14 But actually, PyPy, which is a Python compiler written in Python, in problems like the n-body
34:21 problem, where you're doing the same thing again and again, it's actually really good at
34:26 it.
34:26 Like, it's significantly...
34:28 It's 700-something percent faster than CPython at doing the same algorithm.
34:33 Like, if you copy and paste the same code and run it in PyPy versus CPython, yeah, it will
34:40 run over seven times faster in PyPy, and PyPy is written in Python.
34:44 So it's an alternative Python interpreter that's written purely in Python.
34:49 But it has a JIT compiler.
34:51 That's probably the big difference.
34:52 Yeah.
34:52 As far as I understand it, PyPy is kind of like a half JIT compiler.
34:57 It's not like a full JIT compiler like, say, C# or Java, in that it
35:02 will, like, run interpreted and then, like, decide to JIT compile the stuff that's run a lot.
35:08 I feel like that's the case.
35:09 PyPy is a pure JIT compiler, and then Numba is one where you can basically choose to JIT
35:16 certain parts of your code.
35:17 So with Numba, you can use, actually, a decorator, and you can stick it on.
35:22 An @jit.
35:23 Yeah, it literally is that.
35:25 You can do an @jit on a function, and it will JIT compile that function for
35:30 you.
35:30 So if there's a piece of your code which would work better if it were JITed, like it would be
35:35 faster, then you can just stick a JIT decorator on that using the Numba
35:40 package.
35:41 Yeah, that's really cool.
35:42 Do you have to, how do you run it?
35:44 I've got some function within a larger Python program, and I put an at JIT on it.
35:48 Like, how do I make it actually JIT that and, like, execute?
35:52 Can I still type Python space, I think, or what happens?
35:56 I don't know.
35:56 Do you know?
35:57 Yeah, I'm just wondering, like, it probably is the library that, as it pulls
36:02 in what it's going to give you back, you know, the wrapper, the decorator, the
36:05 function, it probably does JIT.
36:07 So interesting.
36:07 I think that's a really good option.
36:09 Of all the options, honestly, I haven't done anything with Numba, but it looks like probably the
36:13 best option.
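For anyone curious, a minimal sketch of what that looks like, assuming the numba package is installed; you still run the script with the plain python command, and the compilation happens on the first call:

```python
from numba import jit

@jit(nopython=True)  # compiled to machine code the first time it's called
def accumulate(n):
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

print(accumulate(10_000_000))  # first call pays the compile cost
print(accumulate(10_000_000))  # subsequent calls run the compiled code
```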
36:13 It sounds a little bit similar to Cython, but Cython's kind of the upfront style, right?
36:20 Like, we're going to pre-compile this Python code to C, whereas Numba, it sounds more, a
36:25 little more runtime.
36:26 Yeah, so Cython is not really a JIT or a JIT optimizer.
36:31 It's a way of decorating your Python code with type annotations and using, like, a sort of
36:40 slightly different syntax to say, oh, this variable is this type, and then Cython will
36:47 actually compile that into a C extension module, and then you run it from CPython.
36:51 So it basically, like, compiles your Python into C and then loads it as a
36:57 C extension module, which can make a massive performance improvement.
37:01 Yeah, so you've got to run, like, a setup.py build command to generate the libraries, the
37:07 .so files, or whatever the platform generates, and then those get loaded in.
37:13 Even if you change the Python code that was their source, you've got to recompile them, or it's
37:18 just still the same old compiled stuff, same old binaries, yeah.
37:21 You can automate that so you don't have to type it by hand, but I think Cython is a really good
37:26 solution for speeding it up.
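The usual way to automate it is a small setup script; a minimal sketch, assuming Cython and setuptools are installed and a hypothetical nbody.pyx module:

```python
# setup.py
from setuptools import setup
from Cython.Build import cythonize

# Translates nbody.pyx to C and builds it as an extension module;
# run with: python setup.py build_ext --inplace
setup(ext_modules=cythonize("nbody.pyx"))
```

After that, import nbody picks up the compiled extension like any other module.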
37:28 But as I kind of pointed out in my talk, it doesn't answer the question of why Python is
37:32 slow.
37:33 It says, well, Python can be faster if you do C instead.
37:37 Yeah.
37:37 One thing I do like about Cython these days is they've adopted the type hints,
37:42 type annotation format.
37:44 So if you have, what is that, Python 3.4 or later type annotations, you
37:51 got to be explicit on everything.
37:53 But if you have those, that's all you have to do to turn it into like official Cython, which is
38:00 nice because it used to be you'd have to have, like, a cdef or cython.int type
38:04 rather than, you know, a colon int or something funky like that.
38:08 Yeah.
38:08 And it's nice that they brought the two things together.
38:10 Cython like had type annotations before the language did, I think.
38:14 Right.
38:14 Yeah.
38:15 So they had their own special way.
38:16 They had their own special little sub language that was Python-esque, but not
38:20 quite.
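In Cython's pure Python mode, that looks something like this minimal sketch; uncompiled, it still runs as ordinary Python, and compiled, the annotations become C types:

```python
import cython

def inv_cube_distance(dx: cython.double, dy: cython.double) -> cython.double:
    # Compiled with Cython, dist becomes a C double; under plain
    # CPython the annotation is simply ignored at runtime.
    dist: cython.double = (dx * dx + dy * dy) ** 0.5
    return 1.0 / (dist * dist * dist)
```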
38:20 So I was looking at this n-body problem and I thought, all right, well, I
38:24 probably should have played with Numba, but I have a little more experience with
38:27 Cython.
38:27 So let me just see, like the code is not that hard and I'm going in terms of
38:32 like how much code is there or whatever.
38:34 Sure.
38:34 The math is hard, but the actual execution of it isn't.
38:41 So I'll link to the actual Python source code for the n-body problem.
38:41 And I ran it.
38:42 It has some defaults that are much smaller than the one you're talking about.
38:45 So if you run it, just hit run.
38:46 It'll run for, like, a couple hundred milliseconds. On my machine,
38:48 it ran for 213 milliseconds just in pure CPython.
38:52 So I said, all right, well, what if I just grab that code and I just plunk it
38:56 into a .pyx file unchanged.
38:59 I didn't change anything.
39:00 I just moved it over.
39:01 I got it to go down to 90 milliseconds, which is like 2.34 times faster.
39:06 And then I did the type hints that I told you about.
39:09 Because if you don't put the type hints, it'll still run, but it will work at the
39:14 PyObject level.
39:16 Like, so your numbers are PyObject numbers, not, you know, ints and floats down at the C level.
39:22 So you make it a little bit faster.
39:23 So, but I was only able to get it four times faster down to 50 milliseconds.
39:26 Either I was doing it wrong, or that's just about as fast as I
39:30 can get it.
39:31 I could have been missing some types and it was still doing a little more
39:33 CPython interop stuff.
39:36 But yeah, I don't know.
39:37 It's, it's an interesting challenge.
39:39 I guess the last thing to talk about, like on this little bit right here is
39:42 the, is mypyc.
39:44 Yeah.
39:44 I didn't know much about mypyc.
39:46 I don't know a lot about it either.
39:47 So mypy is a type checking and verification library for the type annotations.
39:54 Right.
39:54 So if you put these type annotations in there, they don't do anything at runtime.
39:57 They're just like there to tell you stuff.
39:59 Right.
40:00 But things like certain editors can partially check them, or mypy can, like,
40:05 follow the entire chain and say this code looks like it's typewise hanging
40:09 together.
40:10 Or, like, five levels down, you
40:12 pass an integer and it expects a string.
40:14 So it's broken.
40:15 Right.
40:15 It can check that.
40:16 So they added this thing called mypyc, which can take stuff that is annotated in a way that
40:22 mypy works with, which is basically type annotations, but more.
40:25 And they can compile that to C as well, and they interestingly got
40:30 like a four times speedup, not on the n-body problem, but on mypy itself.
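A minimal sketch of that workflow, with a hypothetical module name; mypyc ships alongside mypy:

```python
# arithmetic.py -- a hypothetical, fully annotated module.
# Type-check it with:   mypy arithmetic.py
# Compile it to a C extension with:   mypyc arithmetic.py
from typing import List

def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]
```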
40:34 So I don't know.
40:34 It's, there's a lot of options, but as you point out, they are a little bit dodging Python.
40:41 The Numba stuff is cool because I think you don't really write different code.
40:45 Do you?
40:46 Yeah, it seems more natural.
40:47 And I think PyPy, like, you're saying you kind of got two to four times improvement by moving
40:54 things to Cython.
40:55 And it took a decent amount of work, right?
40:56 Because every loop variable had to be declared somewhere else because you can't set the
41:00 type or the type annotation inside the loop declaration, right?
41:03 Like it wasn't just put a colon in.
41:05 I had to do like a decent amount of work to drag out the types.
41:08 Yeah.
41:08 Whereas PyPy was a seven times improvement in speed for that problem.
41:13 Yeah.
41:14 And there's no C compilation.
41:15 Yes.
41:16 That's really nice.
41:17 That's really nice.
41:18 So we talked about JITs and JITs are pretty interesting.
41:21 To me, I feel like JITs often go together with garbage collection in the entirely
41:28 unmanaged sort of non-deterministic sense of garbage collection, right?
41:33 Not reference counting, but sort of the mark and sweep style.
41:37 So Python, I mean, maybe we could talk about GC at Python first and then
41:41 if there's any way to like change that or advantages there, disadvantages.
41:46 Like the Instagram story, where they saw a performance improvement when they turned off the GC.
41:52 Yeah, like we're going to solve the memory problem by just letting it leak.
41:55 Like literally, we're going to disable garbage collection.
41:58 Yeah, I think they got like a 12% improvement or something.
42:01 It was significant.
42:02 They turned it off and then they just restarted the worker processes every 12 hours
42:06 or something like that.
42:06 And it wasn't that bad.
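For reference, the standard gc module exposes both the thresholds and the off switch being discussed; a minimal sketch of the Instagram-style move:

```python
import gc

# The collection thresholds: the first number is how many allocations
# minus deallocations trigger a generation-0 collection.
print(gc.get_threshold())

# Rely on reference counting alone and accept that cycles leak until
# the worker process is recycled. Risky, but it removes the GC pauses.
gc.disable()
```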
42:07 The GC itself, like, to your point, there's another problem that I studied,
42:13 which was the binary tree problem.
42:15 And this particular problem will show you the impact of the garbage collector
42:22 performance on, like in this particular algorithm, this benchmark, it will show you
42:27 how much your GC slows down the program.
42:30 And again, I wanted to compare Node with Python because they both have both reference
42:35 counting and garbage collection.
42:36 So the garbage collector with Node is a bit different in terms of its design,
42:42 but both of them are a stop everything garbage collector.
42:45 So, you know, CPython has a main thread, basically, and the garbage collector will run
42:52 on the main thread and it will run every number of operations.
42:55 So, I think the, I can't remember what the default is, it's like 3,000 or
42:59 something.
42:59 Every 3,000 operations in the first generation where an object has been assigned
43:05 or deassigned, then it will run the garbage collector, which goes and inspects every,
43:09 every list, every dictionary, every, what other types, like custom objects,
43:14 and sees if they have any circular references.
43:17 Right, and the reason we need the GC, which does this, is because it's not
43:21 even the main memory management system, because if it was, Instagram would not
43:26 at all be able to get away with that trick.
43:27 Right, this is like a, a final net to catch the stuff that reference counting doesn't
43:33 work.
43:33 Normally, like if there's some references to an object, once things stop
43:37 pointing at it, the last one that goes, it just poof, it disappears.
43:41 But the challenge of reference counting garbage collection is if you've got like
43:46 some kind of relationship where one thing points at the other, but that thing
43:49 also points back to itself, right, like a couple object, right, a person object
43:54 with a spouse pointer or something like that, right?
43:57 If you're married, you're going to leak.
43:58 Yeah, absolutely.
43:59 So this is the thing you're talking about, those are the types of things it's addressing.
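A minimal sketch of exactly that spouse-pointer cycle, and the cycle detector catching what reference counting can't:

```python
import gc

class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None

a = Person("A")
b = Person("B")
a.spouse, b.spouse = b, a  # a reference cycle

del a, b
# Reference counting alone can't free the pair: each object still
# holds a reference to the other. The cycle detector can.
print(gc.collect())  # number of unreachable objects it found (> 0 here)
```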
44:02 And it's kind of designed on the assumption that most objects in CPython
44:06 have very short lifespans.
44:09 So, you know, they get created and then they get destroyed shortly afterwards.
44:13 So like local variables inside functions or, you know, like local variables
44:17 inside list comprehensions, for example, like those can be destroyed pretty much
44:22 straight away.
44:22 But the garbage collector will stop everything running on the main thread
44:26 while it's running because it has to because you can't, you know, if it's deleting stuff
44:30 and there's something else running at the same time that's expecting that thing
44:34 to exist, it's going to cause all sorts of problems.
44:36 So yeah, the GC will kind of slow down your application if it gets hit a lot.
44:41 And the binary tree problem will basically construct a series of trees and then loop
44:47 through them and then delete the nodes and the branches, which kind of triggers
44:51 the GC to run a lot.
44:53 And then you can compare the performance of the garbage collectors.
44:56 So one thing I kind of noted in the design is that they stop everything.
45:02 If the time it takes to run the garbage collector could be as short as possible,
45:06 then the performance hit of running it is going to be smaller.
45:08 And something that Node does is it runs a multi-threaded mark process.
45:13 So when it actually goes and looks for circular references, it actually starts
45:18 looking before it stops the main thread on different helper threads.
45:23 So it starts separate threads and starts the mark process.
45:26 And then it still stops everything on the main process, but it's kind of prepared all its
45:31 homework ahead of time.
45:32 It's already figured out what is garbage before it stops stuff.
45:36 And it's like, now we just have to stop what we throw it away and update the pointers and
45:41 then you can carry on, right?
45:42 Because it's got to, you know, balance the memory and stop allocation and whatnot.
45:45 Yeah.
45:46 So I think technically that's possible in CPython.
45:49 I don't think it has anything to do with the GIL either, like why that couldn't be
45:53 done.
45:53 You could still do...
45:55 Right.
45:55 It seems like it totally could be done.
45:56 Yeah.
45:56 Yeah.
45:57 Because marking and finding circular references could be done outside of the GIL
46:01 because it's a C-level call.
46:02 It's not an opcode.
46:04 But like I say in the talk, you know, all this stuff that I've listed so far is a lot of
46:11 work and it's a lot of engineering work that needs to go into it.
46:14 And if you actually look at the CPython compiler, like the ceval loop, and look
46:20 at the number of people who've worked on or contributed to it, it's less than 10
46:24 like to the core component.
46:27 I wouldn't want to touch it.
46:28 I would not want to get in there and be responsible for that part of it.
46:31 No way.
46:31 Yeah.
46:33 And at this stage, they're minor optimizations.
46:35 They're not sort of big overhauls because there just isn't the people to do it.
46:41 Yeah.
46:41 You made a point in your PyCon talk that, you know, the reason that V8 got to be so
46:47 optimized so fast is because it's got, you know, tens of millions of dollars of
46:51 engineering put against it yearly.
46:54 Right?
46:55 I mean, it's kind of part of the browser wars.
46:58 The new browser wars a bit.
47:00 Yeah.
47:00 From what I could work out, there's at least 35 permanent developers working on
47:05 it.
47:05 Just looking at the GitHub project, like if you just see the commit histories, like
47:10 nine to five, Monday to Friday, 35 advanced C++ developers hacking away at it.
47:16 Right.
47:16 If we had that many people continuously working on CPython's like internals and
47:22 garbage collection and stuff, we'd have more optimizations or bigger projects that people
47:26 will try to take on probably.
47:27 Yeah, absolutely.
47:27 And the people who work on it at the moment, all of them have day jobs and this
47:32 is not typically their day job.
47:33 Like they managed, they've convinced their employer to let them do it in their spare time
47:38 or, you know, one or two days a week, for example, and they're finding the time to do
47:42 it.
47:42 And it's a community run project.
47:44 It's an open source project.
47:45 But I think kind of going back to places where Python could be faster, like these kind
47:51 of optimizations in terms of engineering, they're expensive optimizations.
47:56 They cost a lot of money because they need a lot of engineering expertise and a lot of
48:01 engineering time.
48:02 And I think as a project at the moment, we don't really have that luxury.
48:06 So it's not really fair of me to complain about it if I'm not contributing to the
48:12 solution.
48:12 Yeah, but you have a day job as well, right?
48:14 But I have a day job and this is not day job.
48:16 So yeah, I think there's, I think for what we use Python for most of the
48:21 time, it's definitely fast enough.
48:23 And in places where it could have optimizations like the ones that we talked about, those
48:28 optimizations have drawbacks because, you know, adding a JIT, for example, means that it
48:34 uses a lot more memory.
48:35 like the Node.js example, the n-body problem, sure, it finishes it faster, but
48:40 uses about five times more RAM to do it.
48:42 Right.
48:43 And PyPy uses more memory, like the JIT compiler, and also the startup time of the
48:48 process is typically a lot longer.
48:50 If anyone's ever tried to boot Java JVM cold, you know, like the startup time for
48:57 JVM is pretty slow.
48:58 .NET's the same, like the initial boot time for it to actually get started and warm up
49:03 is time consuming.
49:05 So you wouldn't use it as a, like a command line tool to write a simple script
49:10 that you'd expect to finish in, you know, under 100 milliseconds.
49:13 I think that that kind of highlights one of the challenges, right?
49:16 It's if you thought your process was just going to start and be a web server or a desktop
49:21 application, two seconds start of time is fine, or whatever that number is.
49:26 But if it's solving this general problem, yeah, it could be running Flask as
49:30 a microservice, or it could be, you know, replacing Bash, right?
49:35 Like these are very different constraints and interests, right?
49:38 Yeah.
49:38 And there aren't really many other languages where there is one sort of language definition and
49:44 there are multiple mature implementations of it.
49:47 So, you know, with Python, you know, you've got Cython, you've got PyPy, you've got
49:55 Numba, you've got IronPython.
49:55 I mean, there's like a whole list of, you know, different, Jython, like different implementations
50:02 of the language.
50:02 And people can choose the, I guess, kind of pick which one is best for the problem that they're
50:08 trying to solve, but use the same language across them.
50:10 Whereas you don't really have that luxury with others.
50:12 You know, if you're writing Java, then you're using the JVM.
50:15 I mean, there are two implementations,
50:17 the free one and the licensed one, but that's pretty much as far as it goes.
50:22 That's not exactly the same trade-off.
50:25 Yeah.
50:25 It's optimizing for money.
50:26 That's not optimizing for performance or whatever, necessarily.
50:30 So one thing that I feel like comes around and around again in this discussion, and I'm
50:36 thinking mostly of like PyPy and some of these other attempts people have made to add like
50:41 JIT compilation to the language or other changes.
50:43 It always seems to come back to: well, it would be great to have these
50:49 features.
50:49 Oh yeah, but there's this thing called the C API.
50:52 And so no, we can't change the GIL.
50:54 No, we can't change memory allocation.
50:56 No, we can't change any of these other things because of the C API.
51:01 And so we're stuck.
51:02 Yeah.
51:03 I mean, I'm not saying I'm asking you for a solution here.
51:07 Like, it just feels like that is also the real value of Python, in that some of the
51:15 reason we can still do insanely computational stuff with Python is
51:20 because a lot of these libraries, where they have these tight loops or these little bits of code
51:24 for deserialization or matrix multiplication or whatever, they've written that in C
51:29 and then shipped it as a wheel.
51:31 And so now all of a sudden our code is not as slow as doing math in Python,
51:35 it's as fast as doing math in C.
51:37 Yeah.
51:37 I mean, if you look at NumPy, for example: if you're doing a lot of
51:41 math, then you could be using the NumPy library, which is
51:45 largely compiled C code.
51:47 You import it from Python and you run it from Python, but the
51:51 actual implementation is a C extension.
51:54 And that wouldn't be possible if CPython wasn't built the way it is, which is as an
52:00 ahead-of-time extension loader that you can drive from Python code.
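To make that concrete, here's a minimal sketch of the contrast: summing squares in a pure-Python loop versus letting NumPy's compiled C code run the loop. It assumes NumPy is installed, and the exact timings will vary by machine.

```python
# A minimal sketch of the pure-Python vs. NumPy contrast.
# Assumes NumPy is installed (pip install numpy); timings vary by machine.
import timeit

import numpy as np

N = 1_000_000

def pure_python():
    # Every iteration runs through the Python bytecode evaluation loop.
    return sum(i * i for i in range(N))

def with_numpy():
    # The loop happens inside NumPy's compiled C code.
    a = np.arange(N, dtype=np.int64)
    return int((a * a).sum())

print("pure Python:", timeit.timeit(pure_python, number=10))
print("NumPy:      ", timeit.timeit(with_numpy, number=10))
```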
52:04 Yeah.
52:05 One project I do want to give a shout out to, I don't know if it's going to
52:08 go anywhere.
52:08 It's got a decent amount of work on it, but it's only got 185 GitHub stars.
52:13 So take that for what it's worth.
52:14 This thing called HPy, H-P-Y.
52:17 Guido van Rossum called this out on Python Bytes 179 when he was a guest co-host there.
52:24 And it's an attempt to make a new replacement for the C API for Python, where instead of
52:33 passing around pointers to objects, you basically pass pointers to pointers, which
52:37 means that things that move stuff around like compacting garbage collectors or other
52:44 implementations like JITs have a much better chance to change things without directly breaking
52:49 the C API.
52:50 You can change what the pointer points to without, you know, having to reassign anything
52:55 down at that layer.
52:56 So they specifically call out that, you know, the current C API makes it hard for things like
53:02 PyPy and GraalPython and Jython.
53:04 And the goals are to make it easier to experiment with these ideas, more friendly for other
53:10 implementations and their reference counting, for example, and so on.
53:14 So anyway, I don't know if that's going anywhere or how much traction it has, but it's an interesting
53:19 idea.
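The pointer-to-pointer idea is easier to see with a toy model. To be clear, this is not HPy's actual API, just a rough Python analogy: extensions hold an opaque handle into a table rather than a raw object reference, so a moving garbage collector could relocate the object and update the table without breaking anything on the extension's side.

```python
# A toy Python analogy for HPy-style handles; this is NOT HPy's real API.
# Extensions hold opaque handles (indices) rather than raw references,
# so the runtime can move objects and just update the table entry.
class HandleTable:
    def __init__(self):
        self._slots = []

    def open(self, obj):
        """Hand out an opaque handle instead of the object itself."""
        self._slots.append(obj)
        return len(self._slots) - 1  # the handle is just an index

    def deref(self, handle):
        return self._slots[handle]

    def relocate(self, handle, new_obj):
        # A compacting GC could "move" the object; existing handles
        # stay valid because only the table entry changes.
        self._slots[handle] = new_obj

table = HandleTable()
h = table.open([1, 2, 3])
table.relocate(h, [1, 2, 3])   # object "moved"; handle h still works
print(table.deref(h))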
53:20 Yeah, no, I like the idea.
53:21 And the C API, like, has come a long way, but it's got its quirks.
53:26 I don't know, there's been a lot of discussion, and there are a lot of draft PEPs as well, you know,
53:31 proposing different designs for the C API.
53:34 Yeah.
53:35 So we're getting kind of short on time.
53:36 We've discussed a bunch of stuff.
53:38 I guess two other things I'd like to cover real quickly.
53:41 One, we've talked about a lot of stuff in terms of computational things, but understanding memory
53:48 is also pretty important.
53:49 And we did just talk about the GC.
53:50 It's pretty easy in Python to just run cProfile and ask what my computational time is.
53:57 It's less obvious how to understand memory allocation and stuff.
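For reference, the computational side really is only a couple of lines with the standard library. A minimal sketch, where slow_function is just a stand-in for your own code:

```python
# Minimal cProfile usage from the standard library.
# slow_function is a stand-in for whatever you want to measure.
import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(1_000_000))

cProfile.run("slow_function()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
```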
54:00 And was it you that recommended Austin to me?
54:03 Yeah.
54:04 Yeah, so Austin is a super cool profiler that does both CPU profiling and memory
54:09 allocation profiling and tracing in Python.
54:13 Do you want to tell people about Austin real quick?
54:14 Yeah, so Austin is a new profiler written for Python code.
54:18 It's a sampling profiler, so unlike other profilers, it won't slow your code down
54:23 significantly.
54:23 It basically sits on the side, just asking your app, you know, what it's doing, and taking a
54:30 sample.
54:30 And then it will give you a whole bunch of visuals to let you see, like flame graphs, for example,
54:36 like what's being called, what's taking a long time, which functions are chewing up your CPU, like
54:42 which ones are causing the bottlenecks and then which ones are consuming a lot of memory.
54:46 So if you've got, you know, a piece of code that is slow, the first thing you should probably
54:52 do is just stick it through a profiler and see if there is a reason why, like if there is
54:57 something that you could optimize, or, you know, you've accidentally done like a nested
55:02 loop or something, and Austin would help you find that.
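Austin samples from outside the process; for a quick in-process look at the memory side, the standard library's tracemalloc is another option. A minimal sketch:

```python
# A quick in-process look at the memory side: the standard library's
# tracemalloc snapshots allocations grouped by source line.
import tracemalloc

tracemalloc.start()

data = [list(range(1000)) for _ in range(1000)]  # stand-in workload

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)  # top 5 allocation sites by size
```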
55:06 One of the things I thought was super cool about this, like the challenge I have so often with
55:10 profilers is the startup of whatever I'm trying to do, it just overwhelms like the little thing I'm
55:16 trying to test.
55:18 You know, I'm starting up a web app and initializing database connections, and I just want to
55:22 request a little bit of some page, and it's not that slow, but you know, I'm seeing all
55:28 this other stuff around it, and I just want to focus on this one part of it. And they've got all
55:33 these different user interfaces, like a web user interface and a terminal
55:37 user interface.
55:38 They call it a TUI, which is cool.
55:39 And it gives you kind of like top or glances, one of these things that tells you, right now,
55:45 here's what the profile for the last five seconds looks like.
55:48 And it gives you the call stack and a breakdown of your code right now for
55:53 that five-second segment, updating in real time.
55:56 That's super cool.
55:56 Yeah.
55:57 So if you want to run something and then just see what it's doing or you
56:00 want to replay it.
56:01 Why is it using a lot of CPU now?
56:02 Yeah.
56:03 Yeah.
56:03 Yeah.
56:03 That's, I really like that.
56:04 That's super cool.
56:05 All right.
56:06 Also, you know, concurrency is something that Python has gotten a bad rap for in terms of slowness.
56:11 I think with async and await and asyncio, if you're waiting on an external thing, Python can be ultra
56:17 fast now, right?
56:18 Like async and await, waiting on database calls and web calls with the right drivers, is super fast.
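As a minimal sketch of that IO-bound case: while one request is waiting on the network, the event loop services the others. This assumes the aiohttp package is installed, and the URL is just a placeholder.

```python
# A minimal sketch of the IO-bound case: while one request waits on the
# network, the event loop runs the others. Assumes aiohttp is installed;
# the URLs are placeholders.
import asyncio

import aiohttp

URLS = ["https://example.com/"] * 10  # placeholder endpoints

async def fetch(session, url):
    async with session.get(url) as resp:
        return resp.status

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
        print(results)

asyncio.run(main())
```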
56:26 But when it comes down to the computational stuff, there's still the GIL and there's really not a
56:30 great fix for that.
56:32 I mean, there's multiprocessing, but that's got a lot of overhead.
56:35 So it's got to make sense, right?
56:36 Kind of like your plumber analogy, right?
56:38 You can't do like one line function calls in multiprocessing or, you know, like one line computations.
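Here's a minimal sketch of that "keep it chunky" rule with multiprocessing: each worker gets a big slice of work so the pickling and process overhead is amortized. The numbers are arbitrary.

```python
# A sketch of the "make it chunky" rule: give each worker process a big
# slice of work so the IPC and pickling overhead is amortized.
from multiprocessing import Pool

def sum_of_squares(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n = 8_000_000
    step = n // 4
    chunks = [(i, i + step) for i in range(0, n, step)]  # 4 chunky tasks
    with Pool(processes=4) as pool:
        print(sum(pool.map(sum_of_squares, chunks)))
```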
56:45 But the work that Eric Snow's doing with subinterpreters looks pretty promising to unlock another layer.
56:50 Yeah.
56:50 So it's out in the 3.9 alpha.
56:53 If you want to play with that, it's still experimental.
56:56 So subinterpreters are somewhere in between multiprocessing and threading in terms of the
57:03 implementation.
57:06 So if you use multiprocessing, I mean, that's basically just saying let's hire another plumber and we'll get
57:13 them to talk to each other at the beginning of the day and split up the tasks.
57:17 Whereas subinterpreters, actually, maybe they're sharing the same van.
57:20 I'm not sure where this analogy is going, but, you know, they use the same process.
57:25 The subinterpreters share the same Python process.
57:27 It doesn't spawn up an entirely new process.
57:29 It doesn't have to load all the modules again.
57:33 And the subinterpreters can also talk to each other.
57:36 They can use shared memory to communicate with each other as well.
57:40 But because they're separate interpreters, then technically they can have their own locks.
57:47 So the lock that, you know, gets locked whenever you run any opcode is the
57:52 interpreter lock.
57:52 And this basically means that you can have two interpreters running in a
57:57 single process, each with its own lock.
57:59 So it can be running different operations at the same time.
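For the curious, the 3.9-era alphas expose this behind a private, experimental module, so the eventual public API from PEP 554 may look quite different. A minimal sketch, assuming a build where _xxsubinterpreters is present:

```python
# A sketch using the *private, experimental* _xxsubinterpreters module
# from the CPython 3.8/3.9 alphas; PEP 554's eventual public API may differ.
import _xxsubinterpreters as interpreters

interp = interpreters.create()            # a second interpreter, same process
interpreters.run_string(interp, "print('hello from a subinterpreter')")
interpreters.destroy(interp)
```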
58:03 Right.
58:04 They would automatically run on separate threads.
58:06 So you're basically running multi-threaded, and it can also use multiple CPUs.
58:10 That'd be great.
58:11 Fundamentally, the GIL is not about threading per se.
58:16 It's about serializing memory access, allocation, and deallocation.
58:21 And so with the subinterpreters, the idea is you don't directly share pointers
58:26 between subinterpreters.
58:27 There's like a channel type of communication between them.
58:30 So you don't have to take a lock on one when it's working with objects versus another; they're entirely
58:36 different sets of objects.
58:37 They're still in the same process space, but they're not actually sharing
58:40 pointers.
58:40 So they don't need to be protected from each other.
58:42 Right.
58:42 You just have to protect within each subinterpreter, which has the potential to let me use all six of
58:47 my cores.
58:48 Yeah, absolutely.
58:49 You can't read and write from the same local variables for that reason, which you can do in threading.
58:54 But with subinterpreters, it's kind of like halfway between threading and just
58:58 running a separate process.
58:59 Yeah.
58:59 It probably formalizes some of the multi-threading communication styles that are going to keep things safer
59:05 anyway.
59:06 Definitely.
59:07 Yeah.
59:07 All right.
59:08 Let's talk about one really quick thing before we wrap it up.
59:10 Just one interesting project that you've been working on.
59:13 I mentioned that you were on before about some security issues, right?
59:16 Yeah.
59:16 I want to tell people about your PyCharm extension that you've been working on.
59:19 Yeah.
59:19 So I've been working on a PyCharm extension called Python Security.
59:23 It's very creatively named.
59:25 It's available.
59:28 Hey, I'll take straightforward.
59:28 Yeah, exactly.
59:29 So it's basically like a code checker, but it runs inside PyCharm and it will
59:34 look for security vulnerabilities that you may have written in your code and
59:39 underline them for you and in some cases fix them for you as well.
59:42 So it will say the thing you've done here is really bad because it can cause
59:46 someone to be able to hack into your code and you can just press the quick
59:50 fix button and it could fix it for you.
59:52 So it's got actually over a hundred different inspections now.
59:56 And also you can run it across...
59:58 Should I use yaml.load still?
01:00:00 Is that good?
01:00:00 No.
01:00:01 I think that was like the first checker, right?
01:00:05 Actually, it was the yaml.load.
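For anyone wondering why that was the first checker: yaml.load with the default loader can construct arbitrary Python objects from a document, while yaml.safe_load only builds plain data types. A quick sketch, assuming PyYAML is installed:

```python
# yaml.safe_load only constructs plain data types; bare yaml.load can be
# coaxed into building arbitrary Python objects. Assumes PyYAML is installed.
import yaml

doc = "answer: 42"
print(yaml.safe_load(doc))   # {'answer': 42}, the safe default
# yaml.load(doc) without an explicit safe Loader is the dangerous pattern
```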
01:00:07 Yeah, you can run it across the whole project.
01:00:09 So you can do a code inspection across your whole project, like a code audit.
01:00:12 And also it uses PyCharm's package manager.
01:00:15 So it will go in and look at all the packages you have installed in your project and it will
01:00:21 check them against Snyk, which is a big database of vulnerable Python packages.
01:00:28 That's Snyk.io; it uses their API.
01:00:31 So it checks against that, or you can check it against like your own list.
01:00:36 And also it's available as a GitHub action.
01:00:39 So I managed to figure out how to run PyCharm inside Docker so that you can run PyCharm from a
01:00:46 GitHub action.
01:00:47 Wow.
01:00:48 Yeah, you can write a CI/CD script in GitHub to just say, inspect my code, and it will just run
01:00:55 inside GitHub.
01:00:56 You don't need PyCharm to do it, but it will run the inspection tool against your code repository.
01:01:01 It just requires that it's open source to be able to do that.
01:01:03 Okay, that's super cool.
01:01:04 All right, well, we're definitely out of time, so I have to leave it there.
01:01:07 Two quick questions.
01:01:08 Favorite editor, notable package?
01:01:10 What do you got?
01:01:10 PyCharm and I don't know about the notable package.
01:01:14 I don't know.
01:01:15 Yeah, you've been too far in the C code.
01:01:16 Yeah, I know.
01:01:17 I'm like, what are packages?
01:01:18 I think there's something that does install those, but they don't work down in C.
01:01:22 Yeah, no, that's cool.
01:01:23 All right, so people are interested in this.
01:01:26 They want to maybe understand how CPython works better, and where and why
01:01:31 it might be slow so they can avoid that.
01:01:33 Or maybe they even want to contribute.
01:01:35 What do you say?
01:01:35 Wait for my book to come out and read the book, or read the Real Python article, which is free
01:01:40 and online.
01:01:40 And it talks through a lot of these concepts.
01:01:43 Yeah, right on.
01:01:43 Well, Anthony, thanks for being back on the show.
01:01:46 Great, as always, to dig into the internals.
01:01:48 Thanks, Michael.
01:01:48 Yeah, you bet.
01:01:49 Bye.
01:01:49 Bye.
01:01:51 This has been another episode of Talk Python to Me.
01:01:54 Our guest on this episode was Anthony Shaw, and it's been brought to you by Brilliant.org
01:01:59 and Sentry.
01:01:59 Brilliant.org encourages you to level up your analytical skills and knowledge.
01:02:04 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every
01:02:10 day.
01:02:10 Take some stress out of your life.
01:02:12 Get notified immediately about errors in your web applications with Sentry.
01:02:17 Just visit talkpython.fm/sentry and get started for free.
01:02:21 Want to level up your Python?
01:02:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.
01:02:28 Or if you're looking for something more advanced, check out our new Async course that digs into
01:02:33 all the different types of async programming you can do in Python.
01:02:36 And of course, if you're interested in more than one of these, be sure to check out our
01:02:40 Everything Bundle.
01:02:41 It's like a subscription that never expires.
01:02:43 Be sure to subscribe to the show.
01:02:45 Open your favorite podcatcher and search for Python.
01:02:47 We should be right at the top.
01:02:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
01:02:53 and the direct RSS feed at /rss on talkpython.fm.
01:02:57 This is your host, Michael Kennedy.
01:02:59 Thanks so much for listening.
01:03:01 I really appreciate it.
01:03:02 Now get out there and write some Python code.
01:03:04 I'll see you next time.