
#265: Why is Python slow? Transcript

Recorded on Tuesday, May 12, 2020.

00:00 The debate about whether Python is fast or slow is never-ending.

00:03 It depends on what you're optimizing for.

00:05 CPU server consumption?

00:07 Developer time?

00:08 Maintainability?

00:09 There are many factors.

00:11 But if we keep our eye on the pure computational speed in the Python layer,

00:15 then yes, Python is slow.

00:17 In this episode, we invite Anthony Shaw back on the show.

00:21 He's here to dig into the reasons that Python is computationally slower than many of its peer languages and technologies, such as C++ and JavaScript.

00:29 This is Talk Python to Me, episode 265, recorded May 19, 2020.

00:34 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:54 This is your host, Michael Kennedy.

00:56 Follow me on Twitter, where I'm @mkennedy.

00:58 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:04 This episode is sponsored by Brilliant.org and Sentry.

01:08 Please check out their offerings during their segments.

01:10 It really helps support the show.

01:12 Anthony, welcome back to Talk Python.

01:15 Hey, Mike.

01:15 It's great to be back.

01:16 Yeah, it's great to have you back.

01:17 You've been on the show a bunch of times.

01:19 You've been over on Python Bytes when you were featured there.

01:22 But, you know, people may know you were on episode 168, 10 Python security holes and how to plug them.

01:29 That was super fun with one of your colleagues.

01:31 And then 214, dive into the CPython 3.8 source code.

01:36 Or just what was new in 3.8.

01:38 And then a guided tour of the CPython source code, which I think at the time was also 3.8.

01:42 And now we're going to look at the internals of Python again.

01:45 I feel like you're becoming the Python internals guy.

01:48 Yeah.

01:48 Well, I don't know.

01:50 There's lots of people who know a lot more about it than I do.

01:53 But I've been working on this book over the last year on CPython internals, which has been focused on 3.9.

02:00 So, yeah, we've got some stuff to talk about.

02:03 Yeah, that's awesome.

02:04 And your book started out as a realpython.com article, and I'm trying to find a term that describes what some of these look like.

02:13 When I think of article, I think of a three to four page thing.

02:17 Maybe it's in depth and it's 10 pages.

02:19 This is like 109 pages or something as an article, right?

02:22 It was like insane.

02:23 But it was really awesome and really in depth.

02:24 And so you were partway towards a book and you figured like, well, what the heck?

02:28 I'll just finish up this work.

02:29 Yeah, I figured I'd pretty much written a book.

02:32 So I might as well put it between two covers.

02:34 It was actually a lot.

02:36 It was actually a lot of work to get it from that stage to where it is now.

02:41 So I think the whole thing's pretty much been rewritten.

02:43 There's a way that you explain things in an article that people expect, which is very different to the style of a book.

02:49 And also there's stuff that I kind of skimmed over in the article.

02:53 I think it's actually about three times longer than the original article.

02:56 And it's a lot more practical.

02:59 So rather than being like a tourist guide to the source code, it's more about like CPython internals and optimizations and practical tools you can learn as more sort of like advanced techniques.

03:12 If you use CPython a lot for your day job to either make it more performant or to optimize things or to make it more stable and stuff like that.

03:21 Yeah.

03:21 It's really interesting because if you want to understand how Python works and you're, say, the world's best Python developer, your Python knowledge is going to help you a little bit.

03:31 But not a ton for understanding CPython because that's mostly, well, C code, right?

03:36 And so I think this having this guided tour, this book that talks about that is really helpful, especially for taking people who know and love Python, but actually want to get a little deeper and understand the internals or maybe even become a core developer.

03:49 Yeah, definitely.

03:49 And if you look at some of the stuff we'll talk about this episode, hopefully, like Cython and mypyc and stuff like that, then knowing C or knowing how C and Python work together is also really important.

04:02 Yeah, absolutely.

04:02 All right.

04:03 So looking forward to talking about that.

04:05 But just really quickly, you know, give people a sense of what you work on day to day when you're not building extensions for IDEs, writing books and otherwise doing more writing.

04:15 Yeah, so I work at NTT and run sort of learning and development and training for the organization.

04:22 So I'm involved in, I guess, like what skills we teach our technical people and our sales people and all of our employees, really.

04:30 Yeah, that's really cool.

04:31 That sounds like a fun place to be.

04:32 Yeah, that's a great job.

04:33 Yeah, awesome.

04:33 All right.

04:34 Well, the reason I reached out to you about having you on the show for this specific topic, I always like to have you on the show.

04:42 We always have fun conversations, but I saw that you were doing, were you doing multiple or just this PyCon talk?

04:50 Just one.

04:51 I was accepted for two, but I was supposed to pick one.

04:55 I see.

04:55 That's right.

04:55 That's right.

04:56 And then PyCon got canceled.

04:57 Yeah.

04:59 So I was like, well, let's, you know, talk.

05:00 We can talk after PyCon after you give your talk.

05:02 It'll be really fun to cover this.

05:04 And then, you know, we were supposed to share a beer in Pittsburgh and we're like half a world away.

05:12 Didn't happen, did it?

05:13 Yeah.

05:13 Maybe next year.

05:14 Yeah.

05:15 Hopefully next year.

05:15 Hopefully things are back to up and running because I don't know.

05:18 To me, PyCon is kind of like my geek holiday that I get to go on.

05:22 I love it.

05:22 Yeah.

05:23 All right.

05:23 Well, so just, I guess, for people listening, you did end up doing that talk in an altered sense,

05:30 right?

05:30 And they can technically go watch it soon, at least maybe by the time this is out.

05:34 Yeah, definitely.

05:35 It'll be out tonight.

05:36 It's going to be on the YouTube channel, the PyCon 2020 YouTube channel.

05:41 The organizers reached out to all the speakers and said, if you want to record your talk and

05:46 submit it from home, then you can still do that and put them all up on YouTube.

05:50 I think that's great.

05:51 You know, and there's also a little bit more over PyCon online.

05:54 One thing I think is really valuable for people right now is they have the job fair, kind of,

06:00 right?

06:01 There's a lot of job listings for folks who are looking to get in jobs.

06:05 Have you seen the PSF JetBrains survey that came out?

06:08 Yes.

06:09 The 2019 one, it came out just like a few days ago.

06:11 Really interesting stuff, right?

06:13 Like a lot of cool things in there.

06:14 Yeah, definitely.

06:15 Yeah.

06:15 I love that.

06:16 That and the Stack Overflow developer survey.

06:18 Those are the two that really, I think, have the pulse correctly taken.

06:22 One of the things that was in there I thought was interesting is more than any other category

06:27 of people, they said, how long have you been coding?

06:30 I don't know if it was in Python or just how long have you been coding, but there were different buckets,

06:36 you know, one to three years, three to five, five to 10, 10 to 15.

06:41 And then people like me forever, long time, you know, like 20 plus or something.

06:45 The biggest bar of all those categories, the biggest group was the one to three years.

06:51 Yeah.

06:52 Right.

06:52 Like, 29% of the people said, I've only been coding three years or fewer.

06:56 And I think that that's really interesting.

06:58 So I think things like that job board and stuff are probably super valuable for folks just getting

07:02 into things.

07:03 Definitely.

07:03 Yeah.

07:03 So really good that they're putting that up and people will be able to check out your

07:07 talk.

07:07 I'll put a link to it in the show notes, of course, but they can just go to the PyCon 2020

07:11 YouTube channel and check it out there.

07:13 Yeah.

07:13 And check out the other talks as well.

07:15 There's some really good ones up already.

07:16 The nice thing about this year's virtual PyCon is you can watch talks from your couch.

07:20 That's right.

07:22 You don't even have to get dressed to go to PyCon.

07:25 Just do it in your PJs.

07:26 That's right.

07:27 It's so much more comfortable than the conference chairs.

07:31 That's true.

07:31 That's for sure.

07:32 Yeah.

07:33 Very cool.

07:33 I'm definitely looking forward to checking out more of the talks as well.

07:35 I've already watched a few.

07:36 I wanted to set the stage for our conversation here by defining slow because I think slow is

07:44 in the eye of the beholder, just like beauty, right?

07:46 Like sometimes slow doesn't matter.

07:50 Sometimes computational speed might be slow, but some other factor might be quick.

07:57 So I'll let you take a shot at it, then I'll throw in my two cents as well.

08:00 Like let's like, what do you mean when you say, why is Python slow?

08:04 So when I say, why is Python slow?

08:06 The question is, why is it slower than other languages doing exactly the same thing? And I've

08:14 picked on one area to look at.

08:15 Right.

08:15 So if I had an algorithm that I implemented, say in C, in JavaScript on top of Node, and in Python,

08:20 it might be much slower in Python.

08:23 Wall time, like execution time.

08:25 Yeah.

08:26 Execution time might be much slower in Python than it is in other languages.

08:29 And that matters sometimes.

08:31 And sometimes it doesn't matter as much.

08:34 It depends what you're doing, right?

08:35 If you're doing like a DevOps-y thing and you're trying to orchestrate calling into Linux, well,

08:40 who cares how fast Python goes?

08:42 Probably like the startup time is the most important of all of them.

08:45 If you're modeling stuff and you're trying to do the mathematical bits, anything computational,

08:51 and you're doing that in Python, then it really might matter to you.

08:54 Yeah.

08:55 So it was kind of like a question, if we can find out the answer, maybe there's a solution

09:00 to it.

09:00 Yeah.

09:01 Because, you know, you hear this thrown around.

09:02 People say Python's too slow and I use this other language because it's faster.

09:06 And so I just wanted to understand, like, what is the actual reason why Python is slower

09:12 at doing certain things than other languages?

09:14 And is there a reason that can be resolved?

09:18 Or is it just that's just how it is as part of the design?

09:22 Fundamentally, it's going to be that way.

09:23 Yeah.

09:24 I don't think it is.

09:25 I think...

09:26 You don't think it's slow?

09:27 No, I don't think it fundamentally has to be that way.

09:30 I agree with you.

09:31 I think in the research as well, it uncovered that it doesn't fundamentally have to be that way.

09:36 And in lots of cases, it isn't that way either.

09:38 Like there's ways to get around the slowdown, like the causes of slowdown.

09:44 And if you understand in what situations Python can be slow, then you can kind of like bypass

09:51 those.

09:51 Right.

09:52 So let me tell a really interesting story that comes from Michael Driscoll's book, Python

09:57 Interviews.

09:58 So over there, he interviewed, I think it was Alex.

10:02 Yeah, Alex Martelli.

10:03 And they talked about the history of YouTube, right?

10:07 YouTube's built on Python.

10:09 And why is that the case?

10:11 Originally, there was Google Video, which had hundreds of engineers implementing

10:18 Google Video, which was going to be basically YouTube.

10:21 But YouTube was also a startup around the same time, right?

10:24 And they were kind of competing for features and users and whatnot.

10:26 And YouTube only had like 20 employees at the time or something like that, whereas Google

10:31 had hundreds of super smart engineers.

10:34 And Google kept falling behind farther and farther, not being able to implement the features that

10:39 people wanted nearly as quickly as

10:41 YouTube.

10:41 And the reason was they were all doing it in C++ and it took a long time to get that written.

10:47 And YouTube just ran circles around them with, you know, less than a fifth of the

10:52 number of people working on it.

10:53 So in some sense, like that's a testament to Python's speed, right?

10:58 But it's not its execution speed.

11:00 It's like the larger view of speed, which is why I really wanted to define, like, what computational

11:04 speed is.

11:05 Another sense where it may or may not matter is like where you're doing stuff that waits,

11:10 right?

11:10 Somewhere where asyncio would be a really good option, right?

11:13 I'm talking to Redis.

11:14 I'm talking to this database.

11:15 I'm calling this API.

11:16 Like if 95% of your time is waiting on a network response, it probably doesn't matter, right?

11:21 As long as you're using some sort of async or something.
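To make that concrete, here's a minimal sketch of the I/O-bound pattern, with asyncio.sleep standing in for a hypothetical Redis, database, or API call:

    import asyncio

    async def fetch_record(record_id):
        # Stand-in for a network call; the event loop runs other tasks
        # while this one is waiting on I/O.
        await asyncio.sleep(0.1)
        return {"id": record_id}

    async def main():
        # All 100 waits overlap, so this takes ~0.1s instead of ~10s.
        records = await asyncio.gather(*(fetch_record(i) for i in range(100)))
        print(len(records))

    asyncio.run(main())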

11:24 But then there's that other part where it's like I have on my computer, I've got six hyperthreaded

11:30 cores.

11:30 Why can I only use one twelfth of my computational power on my computer unless I drop down to C code,

11:36 right?

11:37 So there's these other places where it super matters.
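One standard way around that, sketched loosely here, is the multiprocessing module, which sidesteps the GIL by using one process per core:

    import math
    from multiprocessing import Pool

    def crunch(n):
        # CPU-bound work that a single CPython thread can't spread across cores.
        return sum(math.sqrt(i) for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:  # one worker process per CPU core by default
            results = pool.map(crunch, [2_000_000] * 8)
        print(sum(results))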

11:39 Or I just, like you said, there's this great example that we're going to talk about the

11:43 n-body problem, modeling like planets and how they interact with each other.

11:48 And I mean, just like to set the stage, what was the number for C versus Python in terms

11:53 of time, computation time?

11:54 To give people a sense, like why did we care?

11:56 Like why is this a big enough deal to worry about?

11:58 Is it, what is it like 30% slower?

12:00 It's a little bit slower.

12:01 Yeah.

12:01 It's, for this algorithm, so this is called the n-body problem, and it's to do with calculating

12:07 the orbits of some of the planets in the solar system.

12:10 And you just do a lot of really simple arithmetic operations.

12:15 So just adding numbers, but again and again and again.

12:17 So millions of times.

12:18 Lots of loops, lots of math.

12:20 Lots of math, lots of looping.
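The hot path has roughly this shape (a simplified sketch, not the actual benchmark code):

    import itertools

    def advance(positions, velocities, dt, steps):
        # Millions of tiny float operations; the interpreter's per-bytecode
        # overhead dominates the actual arithmetic here.
        for _ in range(steps):
            for i, j in itertools.combinations(range(len(positions)), 2):
                dx = positions[i][0] - positions[j][0]
                dy = positions[i][1] - positions[j][1]
                dist2 = dx * dx + dy * dy
                # (The real benchmark applies gravitational forces here.)
            for i, (vx, vy) in enumerate(velocities):
                positions[i] = (positions[i][0] + vx * dt,
                                positions[i][1] + vy * dt)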

12:22 And in C, this implementation is seven seconds to complete.

12:27 And in Python, it's 14 minutes.

12:29 That might be a difference that you're needing to optimize away.

12:32 That could be too much, right?

12:34 Yeah.

12:34 I mean, everyone is calculating the orbits of planets as part of their day job.

12:38 So yeah.

12:39 You know, I honestly, I haven't really done that for at least two weeks.

12:44 No, but I mean, it's, it's fundamentally like I'm thinking about like, this is, I think this

12:48 uncovers one of the real Achilles heels of Python, in that doing math in tight loops is really not super great in pure Python.

13:00 Right.

13:01 Whether that's planets, whether that's financial calculations or something else.

13:05 Right.

13:05 Numbers in Python are very flexible, but that makes them inefficient.

13:08 Right.

13:09 Python is interpreted, which has a lot of benefits, but also can make it much slower as well.

13:15 Right.

13:15 Yeah.

13:16 So I think looking at this particular problem, because I thought it would be a good example,

13:20 it shines a bit of a spotlight on one of CPython's weaknesses when it comes to performance.

13:26 But in terms of like the loop, the only times you would be doing like a small loop and doing

13:31 the same thing over and over again is if you're doing like math work, doing like number crunching,

13:37 or if you're doing benchmarks, that's like one of the other reasons.

13:41 So, like, the way that a lot of benchmarks are designed, computational benchmarks anyway,

13:47 is to do the same operation again and again.

13:49 So if there is an overhead or a slowdown, then it's magnified to the point where you can see

13:55 it a lot bigger.
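The standard library's timeit module does exactly that; a quick example:

    import timeit

    # Repeat a tiny operation a million times so the per-operation
    # interpreter overhead becomes visible.
    elapsed = timeit.timeit("a + b", setup="a, b = 1.5, 2.5", number=1_000_000)
    print(f"{elapsed:.3f}s for 1,000,000 additions")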

13:55 Yeah, for sure.

13:56 I guess one thing to put out there: before people run code and it doesn't go as fast as they'd hoped,

14:04 and they say that Python is slow, right?

14:07 Assuming the code they originally ran was actually Python,

14:09 like, that would be a requirement, I guess, you probably should profile it.

14:13 You should understand what your code is doing and where it's slow.

14:17 Like, for example, if you're doing lookups, but your data structure is a list instead of

14:21 a dictionary, right?

14:23 You could make that a hundred times faster just by switching, because you're just using

14:26 the wrong type of data structure, the wrong algorithm.

14:29 It could be just that you're doing it wrong, right?

14:32 So I guess before people worry about like, is it executing too slowly?

14:37 Maybe you should make sure that it's executing the right thing.
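A minimal sketch of both points, profiling first, then the data-structure fix:

    import cProfile
    import timeit

    def lookups():
        data = list(range(100_000))
        return sum(1 for i in range(1_000) if i in data)  # O(n) scan each time

    # Profile before blaming the interpreter; the hot spot shows up here.
    cProfile.run("lookups()")

    # Membership tests: O(n) on a list versus O(1) on a set.
    print(timeit.timeit("99_999 in d", setup="d = list(range(100_000))", number=1_000))
    print(timeit.timeit("99_999 in d", setup="d = set(range(100_000))", number=1_000))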

14:40 Yeah, and it's unlikely that your application is running a very small operation, which is

14:47 this benchmark again and again, like millions of times in a loop.

14:50 And if you are doing that, there's probably other tools you could use and there's other

14:55 implementations you can do in Python.

14:59 This portion of Talk Python to Me is brought to you by Brilliant.org.

15:03 Brilliant's mission is to help people achieve their learning goals.

15:06 So whether you're a student, a professional brushing up or learning cutting edge topics,

15:10 or someone who just wants to understand the world better, you should check out Brilliant.

15:14 Set a goal to improve yourself a little bit every day.

15:17 Brilliant makes it easy with interactive explorations and a mobile app that you can use on the go.

15:22 If you're naturally curious, want to build your problem solving skills, or need to develop

15:26 confidence in your analytical abilities, then get Brilliant Premium to learn something new

15:31 every day.

15:32 Brilliant's thought-provoking math, science, and computer science content helps guide you

15:37 to mastery by taking complex concepts and breaking them into bite-sized, understandable chunks.

15:42 So get started at talkpython.fm/brilliant, or just click the link in your show notes.

15:50 Another benchmark I covered in the talk was the regular expression benchmark,

15:54 which Python is actually really good at.

15:57 So this is like the opposite to this particular benchmark.

16:01 So just saying that Python is slow isn't really a fair statement, because, and we'll kind of talk about this in a minute,

16:07 but like for other benchmarks, Python does really well.

16:11 So its string implementation is really performant.

16:14 And when you're working with text-based data, Python's actually a great platform to use, a great language to use.

16:20 CPython is pretty efficient at dealing with text data.

16:25 And if you're working on web applications or data processing, chances are you're dealing with text data.
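For instance, the re module's matching engine is implemented in C, so a scan over a big string never touches the bytecode loop:

    import re

    # Compile once; the actual matching runs in the C-level regex engine.
    words = re.compile(r"\b\w+\b")
    text = "the quick brown fox jumps over the lazy dog " * 100_000
    print(len(words.findall(text)))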

16:32 Yeah, that's a good example.

16:33 Like the websites that I have, like the Talk Python training site, and the various podcast sites and stuff,

16:39 they're all in Python with no special, incredible optimizations, other than like databases with indexes and stuff like that.

16:46 And, you know, the response times are like 10, 30 milliseconds.

16:51 There's no problem.

16:52 Like it's fantastic.

16:53 It's really, really good.

16:54 But there are those situations, like this n-body problem or other ones, where it matters.

17:01 I don't know if it's fair or not to compare it against C, right?

17:04 C is really, really low level, at least from today's perspective.

17:10 It used to be a high level language, but now I see it as a low level language.

17:13 If you do a malloc and free and, you know, the address of this thing, right,

17:17 that feels pretty low level to me.

17:19 So maybe it's unfair.

17:21 I mean, you could probably get something pretty fast in assembly, but I would never choose to use assembly code these days

17:27 because it's just like I want to get stuff done and maintain it and be able to have other people understand what I'm doing.

17:31 But, you know, kind of a reasonable comparison, I think, would be Node.js and JavaScript.

17:38 And you made some really interesting compare and contrast between those two environments

17:43 because they seem like, well, like, okay, Python at least has some C in there.

17:48 JavaScript, who knows what's going on with that thing, right?

17:50 Like, you know, what's the story between those two?

17:52 Yeah, you make a fair point, which is, I mean, comparing C and Python isn't really fair.

17:56 One is like a strongly typed compiled language.

17:59 The other is a dynamically typed interpreted language and they handle memory differently.

18:06 Like in C, you have to statically or dynamically allocate memory and CPython is done automatically.

18:12 Like it has a garbage collector.

18:14 There's so many differences between the two platforms.

18:16 And so I think Node.js, which is, so Node.js is probably a closer comparison to Python.

18:24 Node.js isn't a language.

18:25 It's a kind of like a stack that sits on top of JavaScript that allows you to write JavaScript,

18:32 which operates with things that run in the operating system.

18:36 So similar to CPython, like CPython has extensions that are written in C that allow you to do things

18:43 like connect to the network or, you know, connect to like physical hardware

18:49 or talk to the operating system in some way.

18:51 Like if you just wrote pure Python and there was no C, you couldn't do that because the operating system APIs

18:56 are C headers in most cases.

18:59 Right.

18:59 Almost all of them are in C somewhere.

19:01 Yeah.

19:01 Yeah.

19:02 And with JavaScript, it's the same thing.

19:03 Like if you want to talk to the operating system or do anything other than like working with stuff

19:09 that's in the browser, you need something that plugs into the OS.

19:12 And Node.js kind of provides that stack.

19:15 So when I wanted to compare Python with something, I thought Node was a better comparison

19:21 because like JavaScript and Python, in terms of the syntax, they're very different.

19:25 But in terms of their capabilities, they're quite similar.

19:29 You know, they both have classes and functions and you can use them interchangeably.

19:33 They're both kind of like dynamically typed.

19:35 The scoping is different and the language is different.

19:37 But like in terms of the threading as well, they're quite similar.

19:42 Right.

19:42 They do feel much more similar.

19:44 But there's a huge difference between how they run, at least when run on Google's V8 engine,

19:51 which basically is the thing behind Node and whatnot, versus CPython:

19:56 CPython is interpreted and V8 is JIT compiled, just-in-time compiled.

20:02 Yeah, so that's probably one of the biggest differences.

20:04 And when I was comparing the two, so I wanted to see, okay, which one is faster?

20:10 Like if you gave it the same task and if you gave it the end body problem,

20:13 then Node.js is a couple of multiples faster.

20:18 I think it was two or three times faster to do the same algorithm.

20:23 And for a dynamically typed language, you know, that means that they must have some optimizations,

20:28 which make it faster.

20:30 I mean, if you're running on the same hardware, then, you know, what is the overhead?

20:34 And kind of digging into it, I guess, in a bit more detail.

20:39 So JavaScript has this, actually there's multiple JavaScript engines, but kind of the one that Node.js uses

20:45 is Google's V8 engine.

20:47 So quite cleverly named, which is all written in...

20:52 Only would it be better if it were a V12, you know?

20:54 Or an inline six.

20:56 I think that's a better option.

20:57 Yeah, there you go.

21:01 So Google's V8 JavaScript engine is written in C++, so maybe that's a fair comparison.

21:07 But the optimizing compiler is called TurboFan, and it's a JIT optimizing compiler.

21:14 So it's a just-in-time compiler, whereas CPython is an ahead-of-time or an AOT compiler.

21:20 And its JIT optimizer has got some really clever, basically sort of algorithms and logic

21:27 that it uses to optimize the performance of the application when it actually runs.

21:31 And these can make a significant difference.

21:33 Like some of the small optimizations alone can make 30%, 40% increase in speed.

21:39 And if you compare even just V8 compared to other JavaScript engines, you can see, like,

21:45 what all this engineering can do to make the language faster.

21:49 And that's how it got two, three multiples performance increases, was to optimize the JIT

21:54 and to understand, like, how people write JavaScript code and the way that it compiles the code

22:01 down into operations.

22:03 Then basically, like, it can reassemble those operations that are more performant for the CPU

22:08 so that when it actually executes them, does it in the most efficient way possible.

22:12 Right.

22:13 The difference between a JIT and an AOT is that the JIT compiler kind of makes decisions

22:18 about the compilation based on the application and based on the environment,

22:22 whereas an AOT compiler will compile the application the same and it does it all ahead of time.

22:29 Right.

22:29 So you probably have a much more coarsely-grained set of optimizations and stuff for an ahead-of-time compiler,

22:36 like C++ or something, right?

22:38 Like, I've compiled against x86 Intel CPU with, like, the multimedia extensions

22:47 or whatever, right?

22:48 The scientific computing extensions.

22:49 But other than that, I make no assumptions, whether it's multi-core, highly multi-core,

22:54 what its L2 cache is, none of that stuff, right?

22:57 It's just, we're going to kind of target modern Intel on macOS and do it on Windows

23:04 and compile that.

23:05 Yeah.

23:05 So modern CPU architectures and modern OSes can really benefit if you've optimized

23:12 the instructions that you're giving them to benefit, like, the caches that they have

23:17 or the cycles that they've set up, and the TurboFan optimizer

23:22 for the V8 engine takes a lot of advantage of those things.

23:25 Yeah.

23:25 That seems really powerful.

23:27 I guess we should step back and talk a little bit about how CPython runs,

23:32 but being an interpreter, it can only optimize so much.

23:37 It's got all of its bytecodes and it's going to go through its bytecodes

23:41 and execute them, but saying, like, well, these five bytecodes, we could actually turn that

23:45 into an inline thing over here and I see this actually has no effect on what's loaded on the stack,

23:51 so we're not going to, like, push the item.

23:53 I mean, it seems like it doesn't, tell me if I'm wrong,

23:57 it doesn't optimize, like, across lots of bytecodes as it's thinking about it.

24:04 Yeah, so what CPython will do when it compiles your code, and it's also worth pointing out

24:08 that when you run your code for the first time, it will compile it, but when you run it again,

24:14 it will use the cached version, so...

24:16 Right, if you ever see the dunder pycache with .pyc files, that's, like,

24:21 three of the four steps of getting your code ready to run saved and done

24:25 and never done again.

24:26 Yeah, so that's, like, the compiled version.

24:28 So it's not...

24:29 If Python is slow to compile code, it doesn't really matter unless your code

24:33 is somehow changing every time it gets run, which I'd be worried about.

24:37 You have bigger problems.

24:38 Yeah, exactly.

24:39 So the benefits, I guess, of an AOT compiler is that you compile things ahead of time

24:45 and then when they execute, they should be efficient.

24:47 So CPython's compiler will kind of take your code, which is, like, a text file,

24:53 typically.

24:53 It'll look at the syntax.

24:55 It will parse that into an abstract syntax tree, which is a sort of representation of functions

25:02 and classes and statements and variables and operations and all that kind of stuff.

25:08 Your code, your file, your module, basically, becomes like a tree, and then what it does

25:13 is it then compiles that tree by walking through each of the branches and walking through

25:19 and understanding what the nodes are and then there is a compilation.

25:23 Basically, like, in the CPython compiler, there's a function for each type of thing

25:28 in Python.

25:28 So there's a compile binary operation, or there's a compile class function

25:34 and a compile class will take a node from the AST, which has got your class in it

25:39 and it will then go through and say, okay, what properties, what methods does it have

25:44 and it will then go and compile the methods and then inside a method it will go and compile the statements.

25:48 So, like, once you break down the compiler into smaller pieces, it's not that complicated

25:53 and what the compiler will do is it will spit out compiled basic frame blocks, as

25:59 they're called, and then they get assembled into bytecode.

26:03 So, after the compiler stage, there is an assembler stage which basically figures out

26:08 in which sequence should the code be executed, you know, which basically,

26:13 like, what will the control flow be between the different parts of code,

26:17 the different frames.

26:18 In reality, like, they get executed in different orders because they depend on input

26:23 whether or not you call this particular function but still, like, if you've got a for loop,

26:27 then it's still got to go inside the for loop and then back to the top again.

26:31 Like, that logic is, like, hard-coded into the for loop.

26:34 Right.
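You can poke at each of those stages from Python itself; a small sketch:

    import ast
    import dis

    source = "for i in range(3):\n    print(i * 2)"

    # Stage 1: parse the text into an abstract syntax tree.
    tree = ast.parse(source)
    print(ast.dump(tree))

    # Stages 2-3: compile the tree to a code object, then disassemble it.
    # The for loop's control flow (the jump back to the top) is baked in here.
    code = compile(tree, "<example>", "exec")
    dis.dis(code)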

26:35 You know, as you're talking, I'm wondering if, you know, minor extensions

26:39 to the language might let you do higher-level optimizations.

26:43 Like, say, like, having a frozen class that you're saying I'm not going to add any fields to

26:49 or, like, an inline on a function, like, I only, or make it a function internal

26:54 to a class in which it could be inlined, potentially, because, you know,

26:58 no one's going to be able to, like, look at it from the outside of this code and stuff.

27:02 What do you think?

27:03 There is an optimizer in the compiler called the peephole optimizer.

27:07 And when it's compiling, I think it's actually after the compilation stage,

27:12 I think, it goes through and it looks at the code that's been compiled and if it can make some

27:18 decisions about either, like, dead code that can be removed or branches which can be simplified,

27:25 then it can basically optimize that.

27:27 And that will make some improvement, like, it will optimize your code slightly.

27:31 Right.
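Constant folding is the easiest of those to see for yourself:

    import dis

    def seconds_per_day():
        return 24 * 60 * 60

    # The peephole optimizer folds the arithmetic at compile time, so the
    # bytecode is a single LOAD_CONST 86400 instead of two multiplications.
    dis.dis(seconds_per_day)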

27:32 But then once it's done, basically, your Python application has been compiled down

27:36 into this, like, assembly language called bytecode, which is the, like, the actual individual operations

27:42 and then they're executed in sequence.

27:45 They're split up into small pieces, they're split up into frames, but they're executed

27:50 in sequence.

27:50 Right.

27:51 And if you look at the C source code, dive into there, there's a C eval.c file

27:56 and it has, like, the world's largest while loop with a switch statement

28:01 in it, right?

28:02 Yeah.

28:02 So this is, like, the kind of the brain of CPython.

28:06 Oh, maybe it's not the brain, but it's the bit that, like, goes through each

28:10 of the operations and says, okay, if it's this operation, do this thing,

28:13 if it's that one, do this thing.

28:14 This is all compiled in C, so it's fairly fast, but it will basically sit and run the loop.

28:20 So when you actually run your code, it takes the assembled bytecode and then for each

28:26 bytecode operation, it will then do something.

28:29 So, for example, there's a bytecode for add an item to a list.

28:33 So it knows that it will take a value off the stack and it will put that

28:37 into the list or this one which calls a function.

28:40 So, if the bytecode is call function, then it knows to figure out how to

28:44 call that function in C.

28:46 Right.

28:46 Maybe it's loaded a few things on the stack, it's going to call it, the arguments just get sucked along,

28:51 something like that.
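Disassembling a tiny function shows the individual operations that the loop has to dispatch one at a time:

    import dis

    def add(a, b):
        return a + b

    # Each line of output is one trip around the ceval loop; in Python 3.8/3.9
    # this is LOAD_FAST a, LOAD_FAST b, BINARY_ADD, RETURN_VALUE.
    dis.dis(add)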

28:51 And so I guess one of the interesting things, and you were talking about an interesting

28:56 analogy about this, sort of when Python can be slow versus a little bit less slow,

29:02 it's the overhead of like going through that loop, figuring out what to do,

29:06 like preparing stuff before you call the CPython thing, right?

29:10 Like list.sort, it could be super fast even for a huge list because it's just going

29:15 to this underlying C object and say, in C, go do your sort.

29:18 But if you're doing a bunch of small steps, like the overhead of the next step

29:24 can be a lot higher.

29:26 In the n-body problem, the step that it has to do, the operation it has to do,

29:30 will be add number A to number B, which on a decent CPU, I mean, this is like nanoseconds

29:36 in terms of time it takes to execute.

29:39 So if it's basically, if the operation that it's doing is really tiny, then after doing

29:46 that operation, it's got to go all the way back up to the top of the loop again,

29:49 look at the next bytecode operation, and then go and run this, you know, call this thing,

29:56 which runs the operation, which takes again like nanoseconds to finish, and then it goes

30:00 all the way back around again.

30:01 So I guess the analogy I was trying to think of with the n-body problem is,

30:05 you know, if you were a plumber and you got called out to do a load of jobs

30:10 in a week, but every single job was, can you change this one washer on a tap for me,

30:16 which takes you like two minutes to finish, but you get a hundred of those jobs

30:21 in a day, you're going to spend most of your day just driving around and not actually doing

30:25 any plumbing.

30:26 You're going to be driving from house to house and then doing these like two

30:30 minute jobs and then driving on to the next job.

30:33 So I think the n-body problem, that's kind of an example of that, is that the evaluation

30:39 loop can't make decisions, like it can't say, oh, if I'm going to do the same operation

30:43 again and again and again, instead of going around the loop each time, maybe I should just

30:49 call that operation the number of times that I need to.

30:53 And those are the kind of optimizations that a JIT would do, because it kind of

30:56 changes the compilation order in sequence.

30:59 So that's, I guess like we could talk about there are JITs available for

31:03 Python.

31:04 Yes.

31:05 CPython doesn't have, CPython doesn't use a JIT, but for things like the

31:10 n-body problem, instead of the, you know, the plumber driving to every house and doing

31:15 this two minute job, why can't somebody actually just go and, why can't everyone

31:20 just send their tap to like the factory and he just sits in the factory all day

31:24 replacing the washers.

31:26 Like Netflix of taps or something, yeah.

31:28 Back when they sent out DVDs.

31:31 Maybe I was stretching the analogy a bit, but, you know, basically like you can

31:35 make optimizations if you know you're going to do the same job again and again

31:39 and again, or maybe like he just brings all the washers with him instead of driving

31:44 back to the warehouse each time.

31:45 So, like there's optimizations you can make if you know what's coming.

31:49 But because the CPython application was compiled ahead of time, it doesn't know

31:54 what's coming.

31:55 There are some opcodes that are coupled together, but there's only a few. I can't remember

32:00 which ones they are off the top of my head, but there's only a couple, and it doesn't

32:04 really add a huge performance increase.

32:05 Yeah, there have been some improvements around like bound method execution time and

32:10 methods without keyword arguments, or something along those lines, that got quite a

32:14 bit faster.

32:14 But that's still just like how can we make this operation faster?

32:18 Not how can we say like, you know what, we don't need a function, let's inline that.

32:21 It's called in one place once, just inline it, right?

32:23 Things like that.

32:24 This portion of Talk Python to me is brought to you by Sentry.

32:29 How would you like to remove a little stress from your life?

32:32 Do you worry that users may be having difficulties or are encountering errors

32:36 with your app right now?

32:37 Would you even know it until they send that support email?

32:40 How much better would it be to have the error details immediately sent to you,

32:44 including the call stack and values of local variables, as well as the active user stored in

32:50 the report?

32:51 With Sentry, this is not only possible, it's simple and free.

32:54 In fact, we use Sentry on all the Talk Python web properties.

32:58 We've actually fixed a bug triggered by a user and had the upgrade ready to roll

33:03 out as we got the support email.

33:04 That was a great email to write back.

33:06 We saw your error and have already rolled out the fix.

33:09 Imagine their surprise.

33:10 Surprise and delight your users today.

33:12 Create your free account at talkpython.fm/sentry and track up to 5,000

33:18 errors a month across multiple projects for free.

33:20 So you did say there were some.

33:23 There was Pyjion, there's PyPy, there's Unladen Swallow, there's some other options as

33:33 well, but those are the JITs that are coming to mind.

33:35 Piston, all of those were attempts and I have not heard anything about any of them for a

33:39 year, so that's probably not a super sign for their adoption.

33:43 Yeah, so the ones I kind of picked on because I think they've got a lot of promise

33:46 and kind of show a big performance improvement is PyPy, which shouldn't be new.

33:52 I mean, it's a popular project, but PyPy uses a...

33:55 P-Y-P-Y, because some people say, like, the Python Package Index, they also call that PyPI, but

34:00 that's a totally different thing.

34:01 Yeah, so PyPy...

34:02 Just for listeners who aren't sure.

34:03 PyPy kind of helped solve the argument for my talk actually, because if Python is slow, then

34:10 writing a Python compiler in Python should be like really, really slow.

34:14 But actually, PyPy, which is a Python compiler written in Python, in problems like the n-body

34:21 problem, where you're doing the same thing again and again, it's actually really good at

34:26 it.

34:26 Like, it's significantly...

34:28 It's 700-something percent faster than CPython at doing the same algorithm.

34:33 Like, if you copy and paste the same code and run it in PyPy versus CPython, yeah, it will

34:40 run over seven times faster in PyPy, and PyPy is written in Python.

34:44 So it's an alternative Python interpreter that's written purely in Python.

34:49 But it has a JIT compiler.

34:51 That's probably the big difference.

34:52 Yeah.

34:52 As far as I understand it, PyPy is kind of like a half JIT compiler.

34:57 It's not like a full JIT compiler like, say, C# or Java, in that it

35:02 will, like, run on CPython and then, like, decide to JIT compile the stuff that's run a lot.

35:08 I feel like that's the case.

35:09 PyPy is a pure JIT compiler, and then Numba is, you can basically choose to JIT

35:16 certain parts of your code.

35:17 So with Numba, you can use, actually, a decorator, and you can stick it on.

35:22 An @jit.

35:23 Yeah, it literally is that.

35:25 You can do an @jit on a function, and it will JIT compile that function for

35:30 you.

35:30 So if there's a piece of your code which would work better if it were JITed, like it would be

35:35 faster, then you can just stick a JIT decorator on that using the Numba

35:40 package.
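A minimal sketch of what that looks like with Numba:

    from numba import jit

    @jit(nopython=True)  # compiled to machine code on the first call
    def total(n):
        acc = 0.0
        for i in range(n):
            acc += i * 0.5
        return acc

    # The first call pays the compilation cost; subsequent calls run the
    # JIT-compiled version at close to native speed.
    print(total(10_000_000))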

35:41 Yeah, that's really cool.

35:42 Do you have to, how do you run it?

35:44 I've got some function within a larger Python program, and I put an @jit on it.

35:48 Like, how do I make it actually JIT that and, like, execute?

35:52 Can I still type python space and run it, or what happens?

35:56 I don't know.

35:56 Do you know?

35:57 Yeah, I'm just wondering, like, it probably is the library that, as it pulls

36:02 in what it's going to give you back, you know, the wrapper, the decorator, the

36:05 function, it probably does JIT.

36:07 So interesting.

36:07 I think that's a really good option.

36:09 Of all the options, honestly, I haven't done anything with Numba, but it looks like probably the

36:13 best option.

36:13 It sounds a little bit similar to Cython, but Cython's kind of the upfront style, right?

36:20 Like, we're going to pre-compile this Python code to C, whereas Numba, it sounds more, a

36:25 little more runtime.

36:26 Yeah, so Cython is not really a JIT or a JIT optimizer.

36:31 It's a way of decorating your Python code with type annotations and using, like, a sort of

36:40 slightly different syntax to say, oh, this variable is this type, and then Cython will

36:47 actually compile that into a C extension module, and then you run it from CPython.

36:51 So it basically, like, compiles your Python into C and then loads it as a

36:57 C extension module, which can make a massive performance improvement.

37:01 Yeah, so you've got to run, like, a setup.py build command to generate the libraries, the

37:07 .so files, or whatever the platform generates, and then those get loaded in.

37:13 Even if you change the Python code that was their source, you've got to recompile them, or it's

37:18 just still the same old compiled stuff, same old binaries, yeah.
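The build step is typically a small setup.py like this (a sketch, assuming the Cython code lives in a hypothetical nbody.pyx):

    # setup.py -- build with: python setup.py build_ext --inplace
    from setuptools import setup
    from Cython.Build import cythonize

    # Generates C from nbody.pyx, compiles it, and drops the extension
    # module next to your source so CPython can import it.
    setup(ext_modules=cythonize("nbody.pyx"))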

37:21 You can automate that so you don't have to type it by hand, but I think Cython is a really good

37:26 solution for speeding it up.

37:28 But as I kind of pointed out in my talk, it doesn't answer the question of why Python is

37:32 slow.

37:33 It says, well, Python can be faster if you do C instead.

37:37 Yeah.

37:37 One thing I do like about Cython these days is they've adopted the type hints,

37:42 type annotation format.

37:44 So if you have, what is that, Python 3.4 or later, type annotations, you

37:51 got to be explicit on everything.

37:53 But if you have those, that's all you have to do to turn it into like official Cython, which is

38:00 nice because it used to be you'd have to have, like, a ctype or cython.int

38:04 rather than a, you know, colon int or something funky like that.

38:08 Yeah.

38:08 And it's nice that they brought the two things together.

38:10 Cython like had type annotations before the language did, I think.

38:14 Right.

38:14 Yeah.

38:15 So they had their own special way.

38:16 They had their own special little sub language that was Python-esque, but not

38:20 quite.
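In Cython's pure-Python mode, that now looks like ordinary annotated Python; a small sketch (it also runs unchanged under plain CPython, as long as the cython package is installed):

    import cython

    def harmonic(n: cython.int) -> cython.double:
        # Loop variables are declared up front, since you can't annotate
        # the loop target itself.
        i: cython.int
        total: cython.double = 0.0
        for i in range(1, n + 1):
            total += 1.0 / i
        return total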

38:20 So I was looking at this n-body problem and I thought, all right, well, I

38:24 probably should have played with Numba, but I have a little more experience with

38:27 Cython.

38:27 So let me just see, like, the code is not that hard, in terms of,

38:32 like, how much code is there or whatever.

38:34 Sure.

38:34 The math is hard, but the actual execution of it isn't.

38:37 So I'll link to the actual Python source code for the n-body problem.

38:41 And I ran it.

38:42 It has some defaults that are much smaller than the one you're talking about.

38:45 So if you run it, just hit run.

38:46 It'll run for, like, on my machine,

38:48 it ran for 213 milliseconds just in pure CPython.

38:52 So I said, all right, well, what if I just grab that code and I just plunk it

38:56 into a .pyx file unchanged.

38:59 I didn't change anything.

39:00 I just moved it over.

39:01 I got it to go into 90 milliseconds, which is like 2.34 times faster.

39:06 And then I did the type hints that I told you about.

39:09 Because if you don't put the type hints, it'll still run, but it will work at the

39:14 PyObject level.

39:16 Like, so your numbers are PyObject numbers, not, you know, ints and floats down at the C level.

39:22 So you make it a little bit faster.

39:23 So, but I was only able to get it four times faster down to 50 milliseconds.

39:26 Either I was doing it wrong, or that's just about as fast as I

39:30 can get it.

39:31 I could have been missing some types and it was still doing a little more

39:33 CPython interop stuff.

39:36 But yeah, I don't know.

39:37 It's, it's an interesting challenge.

39:39 I guess the last thing to talk about, like on this little bit right here is

39:42 the, is mypyc.

39:44 Yeah.

39:44 I didn't know much about mypyc.

39:46 I don't know a lot about it either.

39:47 So mypy is a type checking and verification library for the type annotations.

39:54 Right.

39:54 So if you put these type annotations in there, they don't do anything at runtime.

39:57 They're just like there to tell you stuff.

39:59 Right.

40:00 But things like certain editors can partially check them, or mypy can, like,

40:05 follow the entire chain and say this code looks like it's type-wise hanging

40:09 together.

40:10 Or, like, five levels down, you

40:12 pass an integer where it expects a string.

40:14 So it's broken.

40:15 Right.

40:15 It can check that.

40:16 So they added this thing called mypyc, which can take stuff that is annotated in a way that

40:22 mypy works with, which is basically type annotations, but more.

40:25 And it can compile that to C as well, which they also interestingly got

40:30 like a four times speed up, not on the n-body problem, but on mypy itself.

40:34 So I don't know.

40:34 It's, there's a lot of options, but as you point out, they are a little bit dodging Python.
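The workflow, roughly sketched: write ordinary, fully annotated Python, then compile it with the mypyc command that ships alongside mypy:

    # fib.py -- compile with: mypyc fib.py
    # afterwards, `import fib` picks up the compiled extension instead.
    def fib(n: int) -> int:
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a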

40:41 The Numba stuff is cool because I think you don't really write different code.

40:45 Do you?

40:46 Yeah, it's been more natural.

40:47 And I think PyPy, like you're saying, you kind of got two to four times improvement by moving

40:54 things to Cython.

40:55 And it took a decent amount of work, right?

40:56 Because every loop variable had to be declared somewhere else because you can't set the

41:00 type or the type annotation inside the loop declaration, right?

41:03 Like it wasn't just put a colon in.

41:05 I had to do like a decent amount of work to drag out the types.

41:08 Yeah.

41:08 And whereas PyPy will be a seven times improvement in speed for that problem.

41:13 Yeah.

41:14 And there's no C compilation.

41:15 Yes.

41:16 That's really nice.

41:17 That's really nice.

41:18 So we talked about JITs and JITs are pretty interesting.

41:21 To me, I feel like JITs often go together with garbage collection in the entirely

41:28 unmanaged sort of non-deterministic sense of garbage collection, right?

41:33 Not reference counting, but sort of the mark and sweep style.

41:37 So Python, I mean, maybe we could talk about GC in Python first and then

41:41 if there's any way to like change that or advantages there, disadvantages.

41:46 There's the Instagram story, where they saw a performance improvement when they turned off GC.

41:52 Yeah, like we're going to solve the memory problem by just letting it leak.

41:55 Like literally, we're going to disable garbage collection.

41:58 Yeah, I think they got like a 12% improvement or something.

42:01 It was significant.

42:02 They turned it off and then they just restarted the worker processes every 12 hours

42:06 or something like that.

42:06 And it wasn't that bad.

42:07 The GC itself, like, to your point, there's another problem that I studied,

42:13 which was the binary tree problem.

42:15 And this particular problem will show you the impact of the garbage collector

42:22 performance on, like in this particular algorithm, this benchmark, it will show you

42:27 how much your GC slows down the program.

42:30 And again, I wanted to compare Node with Python because they both have both reference

42:35 counting and garbage collection.

42:36 So the garbage collector with Node is a bit different in terms of its design,

42:42 but both of them are a stop everything garbage collector.

42:45 So, you know, CPython has a main thread, basically, and the garbage collector will run

42:52 on the main thread and it will run every number of operations.

42:55 So, I think the, I can't remember what the default is, it's like 3,000 or

42:59 something.

42:59 Every 3,000 operations in the first generation where an object has been assigned

43:05 or deassigned, then it will run the garbage collector, which goes and inspects every,

43:09 every list, every dictionary, every, what other types, like custom objects,

43:14 and sees if they have any circular references.
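Those knobs are all exposed through the gc module, for reference:

    import gc

    print(gc.get_threshold())        # defaults to (700, 10, 10) in CPython
    gc.set_threshold(5_000, 10, 10)  # run generation-0 collection less often
    gc.freeze()    # Python 3.7+: park current objects in a permanent generation
    gc.disable()   # the Instagram move; reference counting still runs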

43:17 Right, and the reason we need the GC, which does this, is because it's not

43:21 even the main memory management system, because if it was, Instagram would not

43:26 at all be able to get away with that trick.

43:27 Right, this is like a, a final net to catch the stuff where reference counting doesn't

43:33 work.

43:33 Normally, like if there's some references to an object, once things stop

43:37 pointing at it, the last one that goes, it just poof, it disappears.

43:41 But the challenge of reference counting garbage collection is if you've got like

43:46 some kind of relationship where one thing points at the other, but that thing

43:49 also points back at the first, right, like a couple object, right, a person object

43:54 with a spouse pointer or something like that, right?

43:57 If you're married, you're going to leak.

43:58 Yeah, absolutely.

43:59 So this is the thing you're talking about, those types of things that it's addressing.
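A tiny demonstration of exactly that couple-object cycle:

    import gc

    class Person:
        def __init__(self, name):
            self.name = name
            self.spouse = None

    a, b = Person("A"), Person("B")
    a.spouse, b.spouse = b, a   # a reference cycle: A -> B -> A

    del a, b             # refcounts never reach zero, so this alone frees nothing
    print(gc.collect())  # the cycle detector finds them; prints a small count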

44:02 And it's kind of designed on the assumption that most objects in CPython

44:06 have very short lifespans.

44:09 So, you know, they get created and then they get destroyed shortly afterwards.

44:13 So like local variables inside functions or, you know, like local variables

44:17 inside list comprehensions, for example, like those can be destroyed pretty much

44:22 straight away.

44:22 But the garbage collector will stop everything running on the main thread

44:26 while it's running because it has to because you can't, you know, if it's deleting stuff

44:30 and there's something else running at the same time that's expecting that thing

44:34 to exist, it's going to cause all sorts of problems.

44:36 So yeah, the GC will kind of slow down your application if it gets hit a lot.

44:41 And the binary tree problem will basically construct a series of trees and then loop

44:47 through them and then delete the nodes and the branches, which kind of triggers

44:51 the GC to run a lot.

44:53 And then you can compare the performance of the garbage collectors.

44:56 So one thing I kind of noted in the design is that they stop everything.

45:02 If the time it takes to run the garbage collector could be as short as possible,

45:06 then the performance hit of running it is going to be smaller.

45:08 And something that Node does is it runs a multi-threaded mark process.

45:13 So when it actually goes and looks for circular references, it actually starts

45:18 looking before it stops the main thread on different helper threads.

45:23 So it starts separate threads and starts the mark process.

45:26 And then it still stops everything on the main process, but it's kind of prepared all its

45:31 homework ahead of time.

45:32 It's already figured out what is garbage before it stops stuff.

45:36 And it's like, now we just have to stop while we throw it away and update the pointers, and

45:41 then you can carry on, right?

45:42 Because it's got to, you know, balance the memory and stop allocation and whatnot.

45:45 Yeah.

45:46 So I think technically that's possible in CPython.

45:49 I don't think it has anything to do with the GIL either, like why that couldn't be

45:53 done.

45:53 You could still do...

45:55 Right.

45:55 It seems like it totally could be done.

45:56 Yeah.

45:56 Yeah.

45:57 Because marking and finding circular references could be done outside of the GIL

46:01 because it's a C-level call.

46:02 It's not an opcode.

46:04 But like I say in the talk, you know, all this stuff that I've listed so far is a lot of

46:11 work and it's a lot of engineering work that needs to go into it.

46:14 And if you actually look at the CPython internals, like the ceval loop, and look

46:20 at the number of people who've worked on or contributed to it, it's less than 10

46:24 like to the core component.

46:27 I wouldn't want to touch it.

46:28 I would not want to get in there and be responsible for that part of it.

46:31 No way.

46:31 Yeah.

46:33 And at this stage, they're minor optimizations.

46:35 They're not sort of big overhauls because there just aren't the people to do it.

46:41 Yeah.

46:41 You made a point in your PyCon talk that, you know, the reason that V8 got to be so

46:47 optimized so fast is because it's got, you know, tens of millions of dollars of

46:51 engineering put against it yearly.

46:54 Right?

46:55 I mean, it's kind of part of the browser wars.

46:58 The new browser wars a bit.

47:00 Yeah.

47:00 From what I could work out, there's at least 35 permanent developers working on

47:05 it.

47:05 Just looking at the GitHub project, like if you just see the commit histories, like

47:10 nine to five, Monday to Friday, 35 advanced C++ developers hacking away at it.

47:16 Right.

47:16 If we had that many people continuously working on CPython's like internals and

47:22 garbage collection and stuff, we'd have more optimizations or bigger projects that people

47:26 will try to take on probably.

47:27 Yeah, absolutely.

47:27 And the people who work on it at the moment, all of them have day jobs and this

47:32 is not typically their day job.

47:33 Like they managed, they've convinced their employer to let them do it in their spare time

47:38 or, you know, one or two days a week, for example, and they're finding the time to do

47:42 it.

47:42 And it's a community run project.

47:45 It's an open source project.

47:45 But I think kind of going back to places where Python could be faster, like these kind

47:51 of optimizations in terms of engineering, they're expensive optimizations.

47:56 They cost a lot of money because they need a lot of engineering expertise and a lot of

48:01 engineering time.

48:02 And I think as a project at the moment, we don't really have that luxury.

48:06 So it's not really fair of me to complain about it if I'm not contributing to the

48:12 solution.

48:12 Yeah, but you have a day job as well, right?

48:14 But I have a day job, and this is not my day job.

48:16 So yeah, I think there's, I think for what we use Python for most of the

48:21 time, it's definitely fast enough.

48:23 And in places where it could have optimizations like the ones that we talked about, those

48:28 optimizations have drawbacks because, you know, adding a JIT, for example, means that it

48:34 uses a lot more memory.

48:35 Like the Node.js example, the n-body problem, sure, it finishes it faster, but

48:40 uses about five times more RAM to do it.

48:42 Right.

48:43 And PyPy uses more memory, like the JIT compiler, and also the startup time of the

48:48 process is typically a lot longer.

48:50 If anyone's ever tried to boot Java JVM cold, you know, like the startup time for

48:57 JVM is pretty slow.

48:58 .NET's the same, like the initial boot time for it to actually get started and warm up

49:03 is time consuming.

49:05 So you wouldn't use it as a, like a command line tool to write a simple script

49:10 that you'd expect to finish in, you know, under 100 milliseconds.

49:13 I think that that kind of highlights one of the challenges, right?

49:16 It's if you thought your process was just going to start and be a web server or a desktop

49:21 application, two seconds start of time is fine, or whatever that number is.

49:26 But if it's solving this general problem, yeah, it could be running Flask as

49:30 a microservice, or it could be, you know, replacing Bash, right?

49:35 Like these are very different constraints and interests, right?

49:38 Yeah.

49:38 And there aren't really many other languages where there is one sort of language definition and

49:44 there are multiple mature implementations of it.

49:47 So, you know, with Python, you know, you've got Cython, you've got PyPy, you've got

49:53 Numba, you've got IronPython.

49:55 I mean, there's like a whole list of, you know, different ones, Jython, like different implementations

50:02 of the language.

50:02 And people can choose the, I guess, kind of pick which one is best for the problem that they're

50:08 trying to solve, but use the same language across them.

50:10 Whereas you don't really have that luxury with others.

50:12 You know, if you're writing Java, then you're using JVM.

50:15 There are, I mean, there's two implementations.

50:17 It's the free one and the licensed one, but like that's pretty much as far as it goes.

50:22 That's not exactly the same trade-off.

50:25 Yeah.

50:25 It's optimizing for money.

50:26 That's not optimizing for performance or whatever, necessarily.

50:30 So one thing that I feel like comes around and around again in this discussion, and I'm

50:36 thinking mostly of like PyPy and some of these other attempts people have made to add like

50:41 JIT compilation to the language or other changes.

50:43 It's always come back, it seems like, to well, it would be great to have these

50:49 features.

50:49 Oh yeah, but there's this thing called the C API.

50:52 And so no, we can't change the GIL.

50:54 No, we can't change memory allocation.

50:56 No, we can't change any of these other things because of the C API.

51:01 And so we're stuck.

51:02 Yeah.

51:03 I mean, I'm not asking you for a solution here.

51:07 Like, I just, it feels like that is both the real value of Python, in that some of the

51:15 reasons that we can still do insanely computational stuff with Python is

51:20 because a lot of these libraries where they have these tight loops or these little bits of code

51:24 deserialization or matrix multiplication or whatever, they've written that in C

51:29 and then ship that as a wheel.

51:31 And so now all of a sudden our code is not as slow as doing math with Python,

51:35 it's as fast as doing math with C.

51:37 Yeah.

51:37 I mean, so if you look at NumPy, for example, if you're doing a lot of

51:41 math, then you, you know, you could be using the NumPy library, which is

51:45 largely compiled C code.

51:47 You import it from Python and you run it from Python, but the

51:51 actual implementation is a C extension.

51:54 And that wouldn't be possible if CPython wasn't built in the way it is, which is that it is an

52:00 ahead-of-time extension loader that you can run from Python code.

52:04 Yeah.
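To make that concrete, here's a minimal sketch, not from the episode, of the same sum-of-squares computed in the Python layer and then with NumPy, where the loop runs in compiled C. On a typical machine, the NumPy version wins by an order of magnitude or more:

import numpy as np

values = list(range(1_000_000))
array = np.arange(1_000_000, dtype=np.int64)

# Pure Python: the loop and every integer object live in the Python layer.
total_py = sum(v * v for v in values)

# NumPy: the multiply and the sum both run inside compiled C code.
total_np = int((array * array).sum())

assert total_py == total_np

Timing both (say, with the timeit module) shows exactly the escape hatch being described: same import-and-call ergonomics, C-level speed.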

52:05 One project I do want to give a shout out to, I don't know if it's going to

52:08 go anywhere.

52:08 It's got a decent amount of work on it, but it's only got 185 GitHub stars.

52:13 So take that for what it's worth.

52:14 This thing called HPY, H-P-Y.

52:17 Guido van Rossum called this out on Python Bytes 179 when he was a guest co-host there.

52:24 And it's an attempt to make a new replacement of the C API for Python, where instead of

52:33 passing around pointers to objects, you basically pass pointers to pointers, which

52:37 means that things that move stuff around like compacting garbage collectors or other

52:44 implementations like JITs have a much better chance to change things without directly breaking

52:49 the C API.

52:50 You can change the value behind the pointer-to-pointer without, you know, having to reassign things

52:55 down at that layer.

52:56 So they specifically call out it's, you know, the current C API makes it hard for things like

53:02 PyPy and GraalPython and Jython.

53:04 And the goals are to make it easier to experiment with these ideas, be more friendly for other

53:10 implementations, hide details like reference counting, for example, and so on.

53:14 So anyway, I don't know if that's going anywhere or how much traction it has, but it's an interesting

53:19 idea.

53:20 Yeah, no, I like the idea.

53:21 And the C API, like, has come a long way, but it's got its quirks.

53:26 I don't know, there's been a lot of discussions, and there's a lot of draft peps as well, you know,

53:31 proposing kind of different designs to the C API.

53:34 Yeah.
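As a toy illustration of that pointer-to-pointer idea, here's a pure-Python sketch, an analogy rather than HPy's actual C API: callers hold a handle into a table instead of the object's address, so something that moves objects, like a compacting garbage collector, can relocate them and only update the table:

handles = []  # handle table: handle -> current object location

def new_handle(obj):
    handles.append(obj)
    return len(handles) - 1  # callers keep this index, not the object

def deref(handle):
    return handles[handle]  # one extra hop on every access

h = new_handle({"value": 42})
# A compacting collector could "move" the object and fix up the table:
handles[h] = dict(handles[h])
print(deref(h)["value"])  # callers are unaffected: still 42

The cost is an extra indirection on every access; the payoff is that nothing outside the table ever depended on the object's address.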

53:35 So we're getting kind of short on time.

53:36 We've discussed a bunch of stuff.

53:38 I guess two other things I'd like to cover real quickly.

53:41 One, we've talked about a lot of stuff in terms of computational things, but understanding memory

53:48 is also pretty important.

53:49 And we did just talk about the GC.

53:50 It's pretty easy in Python to just run cProfile and ask what my computational time is.

53:57 It's less obvious how to understand memory allocation and stuff.
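For reference, both sides are in the standard library. Here's a rough sketch, where build_report is a made-up stand-in for your own code, using cProfile for CPU time and tracemalloc for allocations:

import cProfile
import tracemalloc

def build_report():
    # A hypothetical workload standing in for your own code.
    return [str(i) * 10 for i in range(100_000)]

# CPU side: where is the time going?
cProfile.run("build_report()", sort="cumulative")

# Memory side: which lines allocated the most?
tracemalloc.start()
data = build_report()
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)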

54:00 And was it you that recommended Austin to me?

54:03 Yeah.

54:04 Yeah, so Austin is a super cool profiler that does both CPU profiling and memory

54:09 allocation profiling and tracing in Python.

54:13 Do you want to tell people about Austin real quick?

54:14 Yeah, so Austin is a new profiler written for Python code.

54:18 It's a sampling profiler, so unlike other profilers, it won't slow your code down

54:23 significantly.

54:23 It kind of basically sits on the side, just asking your app, you know, what it's doing as a

54:30 sample.

54:30 And then it will give you a whole bunch of visuals to let you see, like flame graphs, for example,

54:36 like what's being called, what's taking a long time, which functions are chewing up your CPU, like

54:42 which ones are causing the bottlenecks and then which ones are consuming a lot of memory.

54:46 So if you've got a, you know, a piece of code that is slow, the first thing you should probably

54:52 do is just stick it through a profiler and see if there is a reason why, like if there is

54:57 something that you could either optimize or, you know, you've accidentally done like a nested

55:02 loop or something and Austin would help you do that.

55:06 One of the things I thought was super cool about this, like the challenge I have so often with

55:10 profilers is the startup of whatever I'm trying to do, it just overwhelms like the little thing I'm

55:16 trying to test.

55:18 You know, I'm like starting up a web app and initializing database connections and I just want to

55:22 request a little bit of some page and it's not that slow, but you know, it's just, I'm seeing all

55:28 this other stuff around and I'm just like, I just want to focus on this one part of it and they've got all

55:33 these different user interfaces, like a web user interface and a terminal

55:37 user interface.

55:38 They call it a TUI, which is cool.

55:39 And it gives you like a, like kind of like top or glances or one of these things that tells you right now,

55:45 here's what the profile for the last five seconds looks like.

55:48 And it gives you the call stack and breakdown of your code right now for

55:53 like that five second segment, like updating in real time.

55:56 That's super cool.

55:56 Yeah.

55:57 So if you want to run something and then just see what it's doing or you

56:00 want to replay it.

56:01 Why is it using a lot of CPU now?

56:02 Yeah.


56:03 That's, I really like that.

56:04 That's super cool.

56:05 All right.

56:06 Also, you know, concurrency is something that Python has gotten a bad rap for in terms of slowness.

56:11 I think with async and await and asyncio, if you're waiting on an external thing, Python can be ultra

56:17 fast now, right?

56:18 Like it's async and await waiting on like database calls, web calls with the right drivers, super fast.
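A minimal sketch of that I/O-bound win, with fetch_user and fetch_orders as made-up stand-ins for real database or web calls:

import asyncio

async def fetch_user(user_id):
    await asyncio.sleep(0.5)  # stand-in for a network round trip
    return {"id": user_id}

async def fetch_orders(user_id):
    await asyncio.sleep(0.5)  # another stand-in for I/O
    return [{"user": user_id, "item": "book"}]

async def main():
    # The two waits overlap, so this takes about 0.5s instead of 1s.
    user, orders = await asyncio.gather(fetch_user(1), fetch_orders(1))
    print(user, orders)

asyncio.run(main())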

56:26 But when it comes down to the computational stuff, there's still the GIL and there's really not a

56:30 great fix for that.

56:32 I mean, there's multiprocessing, but that's got a lot of overhead.

56:35 So it's got to make sense, right?

56:36 Kind of like your plumber analogy, right?

56:38 You can't do like one line function calls in multiprocessing or, you know, like one line computations.
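In other words, make each task meaty enough to amortize the overhead. A rough sketch of that chunking idea with the standard library's multiprocessing.Pool:

from multiprocessing import Pool

def sum_of_squares(chunk):
    # A CPU-bound unit of work big enough to be worth a task.
    return sum(n * n for n in chunk)

if __name__ == "__main__":
    # Four big chunks, so process startup and pickling costs are
    # amortized over real work instead of one-line calls.
    chunks = [range(i, i + 1_000_000) for i in range(0, 4_000_000, 1_000_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(sum_of_squares, chunks))
    print(total)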

56:45 But the work that Eric Snow's doing with subinterpreters looks pretty promising to unlock another layer.

56:50 Yeah.

56:50 So it's out in the 3.9 alpha.

56:53 If you haven't played with that yet, it's still experimental.

56:56 So subinterpreters is somewhere in between multiprocessing and threading in terms of, like, the

57:03 implementation.

57:04 So it doesn't spawn a whole new process.

57:06 So if you use multiprocessing, I mean, that's basically just saying let's hire another plumber and we'll get

57:13 them to talk to each other at the beginning of the day and split up the tasks.

57:17 Whereas subinterpreters, actually, maybe they're sharing the same van.

57:20 I'm not sure where this analogy is going, but, you know, they use the same process.

57:25 The subinterpreters share the same Python process.

57:27 It doesn't spawn up an entirely new process.

57:29 It doesn't have to load all the modules again.

57:33 And the subinterpreters can also talk to each other.

57:36 They can use shared memory to communicate with each other as well.

57:40 But because they're separate interpreters, then technically they can have their own locks.

57:47 So the lock that, you know, gets locked whenever you run any opcode is the

57:52 interpreter lock.

57:52 And this basically means that you can have two interpreters running in a

57:57 single process, each with its own lock.

57:59 So it can be running different operations at the same time.

58:03 Right.

58:04 They would automatically run on separate threads.

58:06 So you're basically running multi-threading and it can also use multi-CPU.

58:10 That'd be great.

58:11 Fundamentally, the GIL is not about a threading thing per se.

58:16 It's about serializing memory access, allocation, and deallocation.

58:21 And so with the subinterpreters, the idea is you don't directly share pointers

58:26 between subinterpreters.

58:27 There's like a channel type of communication between them.

58:30 So you don't have to take a lock on one when it's working with objects versus another; they're entirely

58:36 different sets of objects.

58:37 They're still in the same process space, but they're not actually sharing

58:40 pointers.

58:40 So they don't need to protect each other.

58:42 Right.

58:42 You just have to protect within each subinterpreter, which has the possibility to let me use all six of

58:47 my cores.

58:48 Yeah, absolutely.

58:49 You can't read and write from the same local variables for that reason, which you can do in threading.

58:54 But with subinterpreters, it's kind of like halfway between threading and just

58:58 running a separate process.

58:59 Yeah.

58:59 It probably formalizes some of the multi-threading communication styles that are going to keep things safer

59:05 anyway.

59:06 Definitely.

59:07 Yeah.
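For the curious, here's a rough sketch against the private, experimental module that shipped in the 3.9 alphas; the underscore name signals the API may change or disappear:

import _xxsubinterpreters as interpreters

interp_id = interpreters.create()
try:
    # This code runs with its own interpreter state, and its own lock,
    # inside the same process.
    interpreters.run_string(interp_id, "print('hello from a subinterpreter')")
finally:
    interpreters.destroy(interp_id)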

59:07 All right.

59:08 Let's talk about one really quick thing before we wrap it up.

59:10 Just one interesting project that you've been working on.

59:13 I mentioned that you were on before about some security issues, right?

59:16 Yeah.

59:16 I want to tell people about your PyCharm extension that you've been working on.

59:19 Yeah.

59:19 So I've been working on a PyCharm extension called Python Security.

59:23 It's very creatively named.

59:25 It's available.

59:27 Hey, it's straightforward.

59:28 Yeah, exactly.

59:29 So it's basically like a code checker, but it runs inside PyCharm and it will

59:34 look for security vulnerabilities that you may have written in your code and

59:39 underline them for you and in some cases fix them for you as well.

59:42 So it will say the thing you've done here is really bad because it can cause

59:46 someone to be able to hack into your code and you can just press the quick

59:50 fix button and it could fix it for you.

59:52 So it's got actually over a hundred different inspections now.

59:56 And also you can run it across...

59:58 Should I use yaml.load still?

01:00:00 Is that good?

01:00:00 No.

01:00:01 I think that was like the first checker, right?

01:00:05 Actually, it was the yaml.load one.
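For anyone wondering why that one gets flagged: with older PyYAML defaults, yaml.load can construct arbitrary Python objects from untrusted input, while yaml.safe_load only builds plain types. A quick sketch:

import yaml

untrusted = "name: example"

# Risky on untrusted data with older PyYAML defaults:
# config = yaml.load(untrusted)

# Safer: only builds plain types like dicts, lists, and strings.
config = yaml.safe_load(untrusted)
print(config)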

01:00:07 Yeah, you can run it across the whole project.

01:00:09 So you can do a code inspection across your project, like a code audit.

01:00:12 And also it uses PyCharm's package manager.

01:00:15 So it will go in and look at all the packages you have installed in your project and it will

01:00:21 check them against either Snyk, which is a big database of vulnerable Python packages.

01:00:28 That's snyk.io; it uses their API.

01:00:31 So it checks it against that or you can check it against like your own list.

01:00:36 And also it's available as a GitHub action.

01:00:39 So I managed to figure out how to run PyCharm inside Docker so that you can run PyCharm from

01:00:46 a GitHub Action.

01:00:47 Wow.

01:00:48 Yeah, you can write a CI/CD script in GitHub to just say inspect my code and it will just

01:00:55 run inside GitHub.

01:00:56 You don't need PyCharm to do it, but it will run the inspection tool against your code repository.

01:01:01 It just requires that it's open source to be able to do that.

01:01:03 Okay, that's super cool.

01:01:04 All right, well, we're definitely out of time, so I have to leave it there.

01:01:07 Two quick questions.

01:01:08 Favorite editor, notable package?

01:01:10 What do you got?

01:01:10 PyCharm and I don't know about the notable package.

01:01:14 I don't know.

01:01:15 Yeah, you've been too far in the C code.

01:01:16 Yeah, I know.

01:01:17 I'm like, what are packages?

01:01:18 I think there's something that does install those, but they don't work down in C.

01:01:22 Yeah, no, that's cool.

01:01:23 All right, so people are interested in this.

01:01:26 They want to maybe understand how CPython works better or how that works and where and why

01:01:31 it might be slow so they can avoid that.

01:01:33 Or maybe they even want to contribute.

01:01:35 What do you say?

01:01:35 Wait for my book to come out and read the book, or read the Real Python article, which is free

01:01:40 and online.

01:01:40 And it talks through a lot of these concepts.

01:01:43 Yeah, right on.

01:01:43 Well, Anthony, thanks for being back on the show.

01:01:46 Great, as always, to dig into the internals.

01:01:48 Thanks, Michael.

01:01:48 Yeah, you bet.

01:01:49 Bye.

01:01:49 Bye.


01:01:51 This has been another episode of Talk Python to Me.

01:01:54 Our guest on this episode was Anthony Shaw, and it's been brought to you by Brilliant.org

01:01:59 and Sentry.

01:01:59 Brilliant.org encourages you to level up your analytical skills and knowledge.

01:02:04 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every

01:02:10 day.

01:02:10 Take some stress out of your life.

01:02:12 Get notified immediately about errors in your web applications with Sentry.

01:02:17 Just visit talkpython.fm/sentry and get started for free.

01:02:21 Want to level up your Python?

01:02:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:02:28 Or if you're looking for something more advanced, check out our new Async course that digs into

01:02:33 all the different types of async programming you can do in Python.

01:02:36 And of course, if you're interested in more than one of these, be sure to check out our

01:02:40 Everything Bundle.

01:02:41 It's like a subscription that never expires.

01:02:43 Be sure to subscribe to the show.

01:02:45 Open your favorite podcatcher and search for Python.

01:02:47 We should be right at the top.

01:02:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:02:53 and the direct RSS feed at /rss on talkpython.fm.

01:02:57 This is your host, Michael Kennedy.

01:02:59 Thanks so much for listening.

01:03:01 I really appreciate it.

01:03:02 Now get out there and write some Python code.

01:03:04 I'll see you next time.
