Why is Python slow?

Episode #265, published Tue, May 19, 2020, recorded Tue, May 12, 2020

The debate about whether Python is fast or slow is never-ending. It depends on what you're optimizing for: Server CPU consumption? Developer time? Maintainability? There are many factors. But if we keep our eye on pure computational speed in the Python layer, then yes, Python is slow.

In this episode, we invite Anthony Shaw back on the show. He's here to dig into the reasons Python is computationally slower than many of its peer languages and technologies such as C++ and JavaScript.

Episode Deep Dive

Guest introduction and background

Anthony Shaw is a seasoned Python developer and active member of the Python community. He works at NTT, leads learning and development initiatives, and has also contributed significantly to Python’s ecosystem, from authoring articles on CPython internals at Real Python to writing an in-depth book on the subject. Anthony regularly dives into CPython source code, examining performance, security, and optimization. He also created a PyCharm plugin called “Python Security” to help developers identify and fix common security flaws in their Python code.

What to Know If You're New to Python

Below are some tips if you’re just starting out and want to grasp the main ideas from this episode:

Python's Nature: Python is easy to learn, but remember it’s also an interpreted language with dynamic typing, which affects how code executes.
Focus on Big Picture: The episode talks a lot about where Python is “slow,” but also why it’s fast enough in many real-world situations. Don’t be discouraged, just keep an eye on how and where performance might matter.
Stay Curious: Tools like PyPy (an alternative interpreter) or Cython can make Python code run faster. Even if you’re new, knowing these exist helps you expand later.
Understand CPython Basics: Key discussions revolve around the Python interpreter itself and how it works internally (GC, GIL, etc.). Getting a broad sense of these concepts will help you appreciate the deeper topics.

Key points and takeaways

Why Python Can Appear “Slow”: Python is often labeled as “slow” in purely computational scenarios, especially tight loops doing millions of math operations. However, this perspective typically ignores development speed, maintainability, and the fact that many tasks are not CPU-bound.
- Tools / Links:
  - Python official docs
  - Anthony’s CPython Internals Article on Real Python
When Python Is Actually “Fast Enough”: Many real-world Python applications spend much of their time waiting on external resources (e.g., databases, networks). In these cases, Python’s “speed” is sufficient, especially with frameworks that leverage async/await to handle I/O efficiently.
- Tools / Links:
  - FastAPI
  - Asyncio Documentation
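
As a rough illustration of the I/O-bound case, the sketch below uses `asyncio.sleep` as a stand-in for a database or network call. The function name `fake_request` and the delays are hypothetical and chosen only for the example; a real service would await an actual client library instead.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a database or HTTP response.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Run three "requests" concurrently. The total wait is roughly the
    # longest delay, not the sum, because the event loop switches to
    # another task whenever one is waiting.
    return await asyncio.gather(
        fake_request("db", 0.02),
        fake_request("cache", 0.01),
        fake_request("api", 0.03),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Almost all of the elapsed time here is spent waiting rather than executing Python bytecode, which is why interpreter speed matters little for this kind of workload.
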
The N-Body Problem Example (7 seconds in C vs. 14 minutes in Python): A striking benchmark is the N-Body orbital simulation. Implemented in C, it might run in seven seconds; in pure CPython, it could take 14 minutes. This highlights Python’s overhead for tight arithmetic loops.
- Tools / Links:
  - N-body Problem Reference (Rosetta Code)
  - Cython
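
To give a feel for the kind of code involved, here is a minimal sketch of a tight arithmetic loop. It is not the benchmark discussed in the episode: the three body positions and the `pairwise_inverse_distance` helper are assumptions made for illustration. Each pass performs only a handful of multiplications and additions, so most of the time goes to interpreter overhead rather than the math itself.

```python
import time

# Hypothetical positions (x, y, z) for three bodies; not the benchmark's data.
BODIES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]


def pairwise_inverse_distance(bodies):
    # Sum 1/r over every pair of bodies: a few multiplies and adds per pair.
    total = 0.0
    for i in range(len(bodies)):
        x1, y1, z1 = bodies[i]
        for j in range(i + 1, len(bodies)):
            x2, y2, z2 = bodies[j]
            dx, dy, dz = x1 - x2, y1 - y2, z1 - z2
            total += 1.0 / (dx * dx + dy * dy + dz * dz) ** 0.5
    return total


def run(steps):
    # Repeat the cheap calculation many times, as a benchmark loop would.
    result = 0.0
    for _ in range(steps):
        result = pairwise_inverse_distance(BODIES)
    return result


if __name__ == "__main__":
    start = time.perf_counter()
    run(200_000)
    print(f"{time.perf_counter() - start:.2f}s for 200,000 steps")
```

Timing the same script under CPython and under an alternative such as PyPy is one simple way to see how much of the cost is interpreter overhead.
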
Just-in-Time (JIT) Compilers (PyPy and Beyond): Unlike CPython, interpreters like PyPy bring JIT compilation to Python. For heavily repeated operations, JITs can yield massive speedups (e.g., 7× improvement on the N-Body problem). The trade-off is higher memory usage and longer startup times.
- Tools / Links:
  - PyPy
  - Numba (decorator-based JIT)
Cython and mypyc for Targeted Speedups: Instead of switching interpreters entirely, developers often choose Cython or mypyc to compile performance-critical Python sections to native code. This can dramatically reduce CPU time by turning Python code into compiled extensions, though it requires extra tooling and, typically, adding type hints.
- Tools / Links:
  - Cython
  - mypy and mypyc
Reference Counting and Garbage Collection: Python primarily uses reference counting plus a garbage collector for circular references. This approach can pause execution to reclaim memory, which might affect performance. Projects like Instagram have even turned off GC to speed up certain workloads (restarting processes instead to manage memory).
- Tools / Links:
  - Python Garbage Collection Docs
  - Austin Python Profiler
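
The two mechanisms can be observed from the standard library. This is a small sketch using `sys.getrefcount` and the `gc` module; the `Node` class and `make_cycle` helper are hypothetical names introduced only for the demonstration.

```python
import gc
import sys


class Node:
    def __init__(self):
        self.other = None


# Reference counting: binding a second name adds one reference.
data = []
before = sys.getrefcount(data)
alias = data
after = sys.getrefcount(data)
print("extra references:", after - before)


def make_cycle():
    # a and b point at each other, so their reference counts never reach
    # zero on their own, even after this function returns.
    a, b = Node(), Node()
    a.other, b.other = b, a


gc.disable()           # switch off automatic cycle collection
gc.collect()           # clear any garbage that already exists
make_cycle()
found = gc.collect()   # a manual pass still finds the unreachable cycle
gc.enable()
print("unreachable objects found:", found)
```

Disabling the collector, as in the Instagram anecdote, only stops the automatic cycle-detection passes; plain reference counting keeps freeing objects that are not part of a cycle.
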
Concurrency, the GIL, and Subinterpreters: Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. Solutions include multiprocessing, using alternative interpreters, or soon, subinterpreters (per-interpreter locks) under development for CPython. This may open the door to more true parallelism in a single process.
- Tools / Links:
  - PEP 554: Multiple Interpreters in the Stdlib
  - Eric Snow’s Subinterpreter Work
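
One way to see the GIL's effect is to time the same CPU-bound function run twice in sequence and then on two threads. The sketch below assumes a standard CPython build with the GIL; `count_down`, `timed`, and the value of `N` are hypothetical choices for the example, and the actual timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    # Pure-Python CPU-bound work: the running thread holds the GIL.
    while n > 0:
        n -= 1
    return n


def timed(label, fn):
    # Run fn once, print how long it took, and pass its result through.
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


N = 2_000_000

if __name__ == "__main__":
    # Two calls back to back on one thread.
    timed("sequential", lambda: [count_down(N), count_down(N)])

    # The same two calls on two threads. With the GIL this usually takes
    # about as long, since only one thread executes bytecode at a time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        timed("two threads", lambda: list(pool.map(count_down, [N, N])))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` from the same module is the usual multiprocessing workaround, at the cost of starting extra processes and pickling arguments between them.
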
The Power of Native Extensions and “The C API”: Python’s C API is central to libraries like NumPy, Pandas, and much of the data science stack. It’s also the reason the language can’t drastically change memory models or fully remove the GIL without breaking these essential libraries. Projects like HPy attempt to modernize this API.
- Tools / Links:
  - HPy Project
  - NumPy
Profiling and Optimization Techniques: Before declaring Python “too slow,” it’s crucial to profile. Tools like cProfile and Austin can pinpoint bottlenecks. Sometimes, algorithmic changes or data structure improvements are enough to fix performance issues.
- Tools / Links:
  - Austin Profiler
  - cProfile Module Docs
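
A minimal sketch of that workflow with the standard library's `cProfile` and `pstats` follows. The workload is hypothetical: it performs the same membership checks against a list and against a set, so the profile report shows how much a data structure change alone can matter.

```python
import cProfile
import io
import pstats


def lookups_in_list(items, queries):
    # Each `in` scans the list, so cost grows with len(items).
    return sum(1 for q in queries if q in items)


def lookups_in_set(items, queries):
    # A set (like a dict) hashes the key, so each lookup is roughly constant time.
    table = set(items)
    return sum(1 for q in queries if q in table)


def workload():
    items = list(range(5_000))
    queries = list(range(0, 10_000, 5))
    return lookups_in_list(items, queries), lookups_in_set(items, queries)


def profile_workload():
    # Collect timings for workload() and return the report as text,
    # sorted by cumulative time so the slowest call paths come first.
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    print(profile_workload())
```

In the report, the list-based function should account for most of the cumulative time even though both functions return the same count.
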
Anthony’s Python Security Plugin and CPython Explorations: Beyond performance, Anthony is heavily involved in security tooling for Python. His free PyCharm plugin, “Python Security,” helps detect vulnerabilities in code. He also emphasizes that diving into CPython’s source (written in C) can expand understanding of Python’s design.
- Tools / Links:
  - Python Security PyCharm Plugin
  - CPython Source Code on GitHub

Interesting quotes and stories

“If I gave it the same algorithm, C++ or JavaScript on top of Node or Python, then yeah, it might be slower in Python. But that's not always the story. Sometimes Python’s faster for text processing or dev speed.” - Anthony Shaw

“It's basically if you were a plumber, and you had a hundred jobs in a day, but each job is just, ‘change a washer on a tap.’ You’d spend more time driving between houses than actually doing plumbing.” - Anthony Shaw, on the tight loop overhead

“YouTube, built in Python, was outpacing Google Video, built in C++, with a fraction of the engineers. That’s speed, just not the CPU kind.” - Michael Kennedy, referencing a historical anecdote

Key definitions and terms

Interpreter: A program that directly executes instructions written in a programming language without requiring them to be compiled into machine code.
JIT (Just-in-Time Compilation): A technique where code is compiled during execution, allowing runtime optimizations that can significantly speed up repeated operations.
Reference Counting: Python’s memory management system where objects are deallocated once no references to them remain.
GIL (Global Interpreter Lock): A mechanism in CPython that ensures only one thread runs Python code at once, simplifying memory management but limiting true parallelism.
Subinterpreters: Proposed feature allowing multiple interpreters in a single Python process, each with its own GIL, to achieve better concurrency.
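
To make the "Interpreter" definition concrete for CPython: source code is first compiled to bytecode, and the interpreter loop then executes those instructions one at a time. The sketch below uses the standard `dis` module to list them for a tiny function. The `add` function is just an example, and the exact opcode names differ between CPython versions.

```python
import dis


def add(a, b):
    return a + b


# Each line of output is one bytecode instruction that the interpreter
# loop dispatches; opcode names vary between CPython versions.
dis.dis(add)

# The same information as a list of opcode names, for inspection in code.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Even a single `+` becomes several instructions (loading both operands, dispatching on their types, returning the result), which is the per-operation overhead that tight loops magnify.
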

Learning resources

Here are some courses that can help deepen your knowledge:

Python for Absolute Beginners: Perfect if you’re new to Python, covering core concepts in a fun, hands-on way.
Python Memory Management and Tips: Dive deeper into how Python handles memory, reference counting, and garbage collection.
Async Techniques and Examples in Python: Learn to leverage concurrency in Python, including threading and multiprocessing approaches.

Overall takeaway

Despite the perennial debate over Python’s performance, the real question is whether Python is fast enough for your needs. Purely CPU-bound code may benefit from specialized tools like PyPy, Cython, or micro-optimizations, but most day-to-day tasks are bottlenecked by I/O or benefit from Python’s clarity and vast ecosystem. By understanding how CPython manages memory, concurrency, and compilation, you can make informed decisions to harness Python’s strengths and avoid its pitfalls.

Links from the show

Anthony's CPython Source Book: realpython.com/products
Anthony's PyCon Talk: youtube.com
N-body problem example: github.com
HPy project: github.com
Austin profiler: github.com

Prior episodes:
#240: A guided tour of the CPython source: talkpython.fm
#214: Dive into CPython 3.8: talkpython.fm
#168: 10 Python security holes: talkpython.fm
Episode #265 deep-dive: talkpython.fm/265
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 The debate about whether Python is fast or slow is never-ending.

00:03 It depends on what you're optimizing for.

00:05 CPU server consumption?

00:07 Developer time?

00:08 Maintainability?

00:09 There are many factors.

00:11 But if we keep our eye on the pure computational speed in the Python layer,

00:15 then yes, Python is slow.

00:17 In this episode, we invite Anthony Shaw back on the show.

00:21 He's here to dig into the reasons that Python is computationally slower than many of its peer languages and technologies, such as C++ and JavaScript.

00:29 This is Talk Python To Me, episode 265, recorded May 19, 2020.

00:34 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:54 This is your host, Michael Kennedy.

00:56 Follow me on Twitter, where I'm @mkennedy.

00:58 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:04 This episode is sponsored by Brilliant.org and Sentry.

01:08 Please check out their offerings during their segments.

01:10 It really helps support the show.

01:12 Anthony, welcome back to Talk Python.

01:15 Hey, Mike.

01:15 It's great to be back.

01:16 Yeah, it's great to have you back.

01:17 You've been on the show a bunch of times.

01:19 You've been over on Python Bytes when you're not featured there.

01:22 But, you know, people may know you were on episode 168, 10 Python security holes and how to plug them.

01:29 That was super fun with one of your colleagues.

01:31 And then 214, dive into the CPython 3.8 source code.

01:36 Or just what was new in 3.8.

01:38 And then a guided tour of the CPython source code, which I think at the time was also 3.8.

01:42 And now we're going to look at the internals of Python again.

01:45 I feel like you're becoming the Python internals guy.

01:48 Yeah.

01:48 Well, I don't know.

01:50 There's lots of people who know a lot more about it than I do.

01:53 But I've been working on this book over the last year on CPython internals, which has been focused on 3.9.

02:00 So, yeah, we've got some stuff to talk about.

02:03 Yeah, that's awesome.

02:04 And your book started out as a realpython.com article, which I'm trying to define a term that describes what some of these look like.

02:13 When I think of article, I think of a three to four page thing.

02:17 Maybe it's in depth and it's 10 pages.

02:19 This is like 109 pages or something as an article, right?

02:22 It was like insane.

02:23 But it was really awesome and really in depth.

02:24 And so you were partway towards a book and you figured like, well, what the heck?

02:28 I'll just finish up this work.

02:29 Yeah, I figured I'd pretty much written a book.

02:32 So I might as well put it between two covers.

02:34 It was actually a lot.

02:36 It was actually a lot of work to get it from that stage to where it is now.

02:41 So I think the whole thing's pretty much been rewritten.

02:43 There's a way that you explain things in an article that people expect, which is very different to the style of a book.

02:49 And also there's stuff that I kind of skimmed over in the article.

02:53 I think it's actually about three times longer than the original article.

02:56 And it's a lot more practical.

02:59 So rather than being like a tourist guide to the source code, it's more about like CPython internals and optimizations and practical tools you can learn as more sort of like advanced techniques.

03:12 If you use CPython a lot for your day job to either make it more performant or to optimize things or to make it more stable and stuff like that.

03:21 Yeah.

03:21 It's really interesting because if you want to understand how Python works and you're, say, the world's best Python developer, your Python knowledge is going to help you a little bit.

03:31 But not a ton for understanding CPython because that's mostly, well, C code, right?

03:36 And so I think this having this guided tour, this book that talks about that is really helpful, especially for taking people who know and love Python, but actually want to get a little deeper and understand the internals or maybe even become a core developer.

03:49 Yeah, definitely.

03:49 And if you look at some of the stuff we'll talk about this episode, hopefully, like Cython and mypyc and stuff like that, then knowing C or knowing how C and Python work together is also really important.

04:02 Yeah, absolutely.

04:02 All right.

04:03 So looking forward to talking about that.

04:05 But just really quickly, you know, give people a sense of what you work on day to day when you're not building extensions for IDEs, writing books and otherwise doing more writing.

04:15 Yeah, so I work at NTT and run sort of learning and development and training for the organization.

04:22 So I'm involved in, I guess, like what skills we teach our technical people and our sales people and all of our employees, really.

04:30 Yeah, that's really cool.

04:31 That sounds like a fun place to be.

04:32 Yeah, that's a great job.

04:33 Yeah, awesome.

04:33 All right.

04:34 Well, the reason I reached out to you about having you on the show for this specific topic, I always like to have you on the show.

04:42 We always have fun conversations, but I saw that you were doing, were you doing multiple or just this PyCon talk?

04:50 Just one.

04:51 I was accepted for two, but I was supposed to pick one.

04:55 I see.

04:55 That's right.

04:56 And then PyCon got canceled.

04:57 Yeah.

04:59 So I was like, well, let's, you know, talk.

05:00 We can talk after PyCon after you give your talk.

05:02 It'll be really fun to cover this.

05:04 And then, you know, we were supposed to share a beer in Pittsburgh and we're like half a world away.

05:12 Didn't happen, did it?

05:13 Yeah.

05:13 Maybe next year.

05:14 Yeah.

05:15 Hopefully next year.

05:15 Hopefully things are back to up and running because I don't know.

05:18 To me, PyCon is kind of like my geek holiday that I get to go on.

05:22 I love it.

05:22 Yeah.

05:23 All right.

05:23 Well, so just, I guess, for people listening, you did end up doing that talk in an altered sense,

05:30 right?

05:30 And they can technically go watch it soon, at least maybe by the time this is out.

05:34 Yeah, definitely.

05:35 It'll be out tonight.

05:36 It's going to be on the YouTube channel, the PyCon 2020 YouTube channel.

05:41 The organizers reached out to all the speakers and said, if you want to record your talk and

05:46 submit it from home, then you can still do that and put them all up on YouTube.

05:50 I think that's great.

05:51 You know, and there's also a little bit more over PyCon online.

05:54 One thing I think is really valuable for people right now is they have the job fair, kind of,

06:00 right?

06:01 There's a lot of job listings for folks who are looking to get in jobs.

06:05 Have you seen the PSF JetBrains survey that came out?

06:08 Yes.

06:09 In the 2019, it came out just like a few days ago.

06:11 Really interesting stuff, right?

06:13 Like a lot of cool things in there.

06:14 Yeah, definitely.

06:15 Yeah.

06:15 I love that.

06:16 That and the Stack Overflow developer survey.

06:18 Those are the two that really, I think, have the pulse correctly taken.

06:22 One of the things that was in there I thought was interesting is more than any other category

06:27 of people, they said, how long have you been coding?

06:30 I don't know if it was in Python or just how long have you been coding, but it was different,

06:36 you know, one to three years, three to five, five to 10, 10 to 15.

06:41 And then people like me forever, long time, you know, like 20 plus or something.

06:45 The biggest bar of all those categories, the biggest group was the one to three years.

06:51 Yeah.

06:52 Right.

06:52 Like 29% of the people said, I've only been coding three years or fewer.

06:56 And I think that that's really interesting.

06:58 So I think things like that job board and stuff are probably super valuable for folks just getting

07:02 into things.

07:03 Definitely.

07:03 Yeah.

07:03 So really good that they're putting that up and people will be able to check out your

07:07 talk.

07:07 I'll put a link to it in the show notes, of course, but they can just go to the PyCon 2020

07:11 YouTube channel and check it out there.

07:13 Yeah.

07:13 And check out the other talks as well.

07:15 There's some really good ones up already.

07:16 The nice thing about this year's virtual PyCon is you can watch talks from your couch.

07:20 That's right.

07:22 You don't even have to get dressed to go to PyCon.

07:25 Just do it in your PJs.

07:26 That's right.

07:27 It's so much more comfortable than the conference chairs.

07:31 That's true.

07:31 That's for sure.

07:32 Yeah.

07:33 Very cool.

07:33 I'm definitely looking forward to checking out more of the talks as well.

07:35 I've already watched a few.

07:36 I wanted to set the stage for our conversation here by defining slow because I think slow is

07:44 in the eye of the beholder, just like beauty, right?

07:46 Like sometimes slow doesn't matter.

07:50 Sometimes computational speed might be slow, but some other factor might be quick.

07:57 So I'll let you take a shot at it, then I'll throw in my two cents as well.

08:00 Like let's like, what do you mean when you say, why is Python slow?

08:04 So when I say, why is Python slow?

08:06 The question is, why is it slower than other languages doing exactly the same thing? And I've

08:14 picked on an area.

08:15 Right.

08:15 So if I had an algorithm that I implemented, say in C, in JavaScript on top of Node, and in Python,

08:20 it might be much slower in Python.

08:23 Wall time, like execution time.

08:25 Yeah.

08:26 Execution time might be much slower in Python than it is in other languages.

08:29 And that matters sometimes.

08:31 And sometimes it doesn't matter as much.

08:34 It depends what you're doing, right?

08:35 If you're doing like a DevOps-y thing and you're trying to orchestrate calling into Linux, well,

08:40 who cares how fast Python goes?

08:42 Probably like the startup time is the most important of all of them.

08:45 If you're modeling stuff and you're trying to do the mathematical bits, anything computational,

08:51 and you're doing that in Python, then it really might matter to you.

08:54 Yeah.

08:55 So it was kind of like a question, if we can find out the answer, maybe there's a solution

09:00 to it.

09:00 Yeah.

09:01 Because, you know, you hear this thrown around.

09:02 People say Python's too slow and I use this other language because it's faster.

09:06 And so I just wanted to understand, like, what is the actual reason why Python is slower

09:12 at doing certain things than other languages?

09:14 And is there a reason that can be resolved?

09:18 Or is it just that's just how it is as part of the design?

09:22 Fundamentally, it's going to be that way.

09:23 Yeah.

09:24 I don't think it is.

09:25 I think...

09:26 You don't think it's slow?

09:27 No, I don't think it fundamentally has to be that way.

09:30 I agree with you.

09:31 I think in the research as well, it uncovered it doesn't fundamentally have to be that way.

09:36 And in lots of cases, it isn't that way either.

09:38 Like there's ways to get around the slowdown, like the causes of slowdown.

09:44 And if you understand in what situations Python can be slow, then you can kind of like bypass

09:51 those.

09:51 Right.

09:52 So let me tell a really interesting story that comes from Michael Driscoll's book, Python

09:57 Interviews.

09:58 So over there, he interviewed, I think it was Alex.

10:02 Yeah, Alex Martelli.

10:03 And they talked about the history of YouTube, right?

10:07 YouTube's built on Python.

10:09 And why is that the case?

10:11 Originally, there was Google Video, which had hundreds of engineers writing, implementing

10:18 Google Video, which is going to be basically YouTube.

10:21 But YouTube was also a startup around the same time, right?

10:24 And they were kind of competing for features and users and whatnot.

10:26 And YouTube only had like 20 employees at the time or something like that, whereas Google

10:31 had hundreds of super smart engineers.

10:34 And Google kept falling behind farther and farther, not being able to implement the features that

10:39 people wanted nearly as quickly as YouTube.

10:41 And the reason was they were all doing it in C++ and it took a long time to get that written.

10:47 And YouTube just ran circles around them with, you know, less than a fifth of the

10:52 number of people working on it.

10:53 So in some sense, like that's a testament of Python speed, right?

10:58 But it's not its execution speed.

11:00 It's like the larger view of speed, which is why I really wanted to define what computational

11:04 speed is.

11:05 Another sense where it may or may not matter is like where you're doing stuff that waits,

11:10 right?

11:10 Somewhere where asyncio would be a really good option, right?

11:13 I'm talking to Redis.

11:14 I'm talking to this database.

11:15 I'm calling this API.

11:16 Like if 95% of your time is waiting on a network response, it probably doesn't matter, right?

11:21 As long as you're using some sort of async or something.

11:24 But then there's that other part where it's like I have on my computer, I've got six hyperthreaded

11:30 cores.

11:30 Why can I only use one twelfth of my computational power on my computer unless I write C code,

11:36 right?

11:37 So there's these other places where it super matters.

11:39 Or I just, like you said, there's this great example that we're going to talk about the

11:43 n-body problem, modeling like planets and how they interact with each other.

11:48 And I mean, just like to set the stage, what was the number for C versus Python in terms

11:53 of time, computation time?

11:54 To give people a sense, like why did we care?

11:56 Like why is this a big enough deal to worry about?

11:58 Is it, what is it like 30% slower?

12:00 It's a little bit slower.

12:01 Yeah.

12:01 It's a, for this algorithm, so this is called the n-body problem and it's to do with calculating

12:07 the orbits of some of the planets in the solar system.

12:10 And you just do a lot of really simple arithmetic operations.

12:15 So just adding numbers, but again and again and again.

12:17 So millions of times.

12:18 Lots of loops, lots of math.

12:20 Lots of math, lots of looping.

12:22 And in C, this implementation is seven seconds to complete.

12:27 And in Python, it's 14 minutes.

12:29 That might be a difference that you're needing to optimize away.

12:32 That could be too much, right?

12:34 Yeah.

12:34 I mean, everyone is calculating the orbits of planets as part of their day job.

12:38 So yeah.

12:39 You know, I honestly, I haven't really done that for at least two weeks.

12:44 No, but I mean, it's, it's fundamentally like I'm thinking about like, this is, I think this

12:48 uncovers one of the real Achilles heels of Python in that doing math in tight loops is really not super great in pure Python.

13:00 Right.

13:01 Whether that's planets, whether that's financial calculations or something else.

13:05 Right.

13:05 Python's numbers are very flexible, but that makes them inefficient.

13:08 Right.

13:09 Python is interpreted, which has a lot of benefits, but also can make it much slower as well.

13:15 Right.

13:15 Yeah.

13:16 So I think looking at this particular problem, because I thought it would be a good example,

13:20 it shines a bit of a spotlight on one of CPython's weaknesses when it comes to performance.

13:26 But in terms of like the loop, the only times you would be doing like a small loop and doing

13:31 the same thing over and over again is if you're doing like math work, doing like number crunching,

13:37 or if you're doing benchmarks, that's like one of the other reasons.

13:41 So like the way that a lot of benchmarks are designed to do like computational benchmarks anyway,

13:47 is to do the same operation again and again.

13:49 So if there is an overhead or a slowdown, then it's magnified to the point where you can see

13:55 it a lot bigger.

13:55 Yeah, for sure.

13:56 I guess one thing to put out there before people run code, it doesn't go as fast as they'd hoped.

14:04 So they say that Python is slow, right?

14:07 Assuming the code they originally ran was Python like that.

14:09 That would be a requirement, I guess, is you probably should profile it.

14:13 You should understand what your code is doing and where it's slow.

14:17 Like, for example, if you're doing lookups, but your data structure is a list instead of

14:21 a dictionary, right?

14:23 You could make that a hundred times faster just by switching the data structure, because you're just using

14:26 the wrong type of data structure, the wrong algorithm.

14:29 It could be just that you're doing it wrong, right?

14:32 So I guess before people worry about like, is it executing too slowly?

14:37 Maybe you should make sure that it's executing the right thing.

14:40 Yeah, and it's unlikely that your application is running a very small operation, which is

14:47 this benchmark again and again, like millions of times in a loop.

14:50 And if you are doing that, there's probably other tools you could use and there's other

14:55 implementations you can do in Python.

14:59 This portion of Talk Python To Me is brought to you by Brilliant.org.

15:03 Brilliant's mission is to help people achieve their learning goals.

15:06 So whether you're a student, a professional brushing up or learning cutting edge topics,

15:10 or someone who just wants to understand the world better, you should check out Brilliant.

15:14 Set a goal to improve yourself a little bit every day.

15:17 Brilliant makes it easy with interactive explorations and a mobile app that you can use on the go.

15:22 If you're naturally curious, want to build your problem solving skills, or need to develop

15:26 confidence in your analytical abilities, then get Brilliant Premium to learn something new

15:31 every day.

15:32 Brilliant's thought-provoking math, science, and computer science content helps guide you

15:37 to mastery by taking complex concepts and breaking them into bite-sized, understandable chunks.

15:42 So get started at talkpython.fm/brilliant, or just click the link in your show notes.

15:50 Another benchmark I covered in the talk was the regular expression benchmark,

15:54 which Python is actually really good at.

15:57 So this is like the opposite to this particular benchmark.

16:01 So just saying that Python is slow isn't really a fair statement, because, and we'll kind of talk about this in a minute,

16:07 but like for other benchmarks, Python does really well.

16:11 So its string implementation is really performant.

16:14 And when you're working with text-based data, Python's actually a great platform to use, a great language to use.

16:20 The CPython compiler is pretty efficient at dealing with text data.

16:25 And if you're working on web applications or data processing, chances are you're dealing with text data.

16:32 Yeah, that's a good example.

16:33 Like the websites that I have, like the Talk Python training site, and the various podcast sites and stuff,

16:39 they're all in Python with no special, incredible optimizations, other than like databases with indexes and stuff like that.

16:46 And, you know, the response times are like 10, 30 milliseconds.

16:51 There's no problem.

16:52 Like it's fantastic.

16:53 It's really, really good.

16:54 But there are those situations like this n-body problem or other ones where it matters.

17:01 I don't know if it's fair or not to compare it against C, right?

17:04 C is really, really low level, at least from today's perspective.

17:10 It used to be a high level language, but now I see it as a low level language.

17:13 If you do a malloc and free and, you know, the address of this thing, right,

17:17 that feels pretty low level to me.

17:19 So maybe it's unfair.

17:21 I mean, you could probably get something pretty fast in assembly, but I would never choose to use assembly code these days

17:27 because it's just like I want to get stuff done and maintain it and be able to have other people understand what I'm doing.

17:31 But, you know, kind of a reasonable comparison, I think, would be Node.js and JavaScript.

17:38 And you made some really interesting compare and contrast between those two environments

17:43 because they seem like, well, like, okay, Python, at least it has some C in there. JavaScript,

17:48 who knows what's going on with that thing, right?

17:50 Like, you know, what's the story between those two?

17:52 Yeah, you make a fair point, which is, I mean, comparing C and Python isn't really fair.

17:56 One is like a strongly typed compiled language.

17:59 The other is a dynamically typed interpreted language and they handle memory differently.

18:06 Like in C, you have to statically or dynamically allocate memory, and in CPython it's done automatically.

18:12 Like it has a garbage collector.

18:14 There's so many differences between the two platforms.

18:16 And so I think Node.js, which is, so Node.js is probably a closer comparison to Python.

18:24 Node.js isn't a language.

18:25 It's a kind of like a stack that sits on top of JavaScript that allows you to write JavaScript,

18:32 which operates with things that run in the operating system.

18:36 So similar to CPython, like CPython has extensions that are written in C that allow you to do things

18:43 like connect to the network or, you know, connect to like physical hardware

18:49 or talk to the operating system in some way.

18:51 Like if you just wrote pure Python and there was no C, you couldn't do that because the operating system APIs

18:56 are C headers in most cases.

18:59 Right.

18:59 Almost all of them are in C somewhere.

19:01 Yeah.

19:02 And with JavaScript, it's the same thing.

19:03 Like if you want to talk to the operating system or do anything other than like working with stuff

19:09 that's in the browser, you need something that plugs into the OS.

19:12 And Node.js kind of provides that stack.

19:15 So when I wanted to compare Python with something, I thought Node was a better comparison

19:21 because like JavaScript and Python, in terms of the syntax, they're very different.

19:25 But in terms of their capabilities, they're quite similar.

19:29 You know, they both have classes and functions and you can use them interchangeably.

19:33 They're both kind of like dynamically typed.

19:35 The scoping is different and the language is different.

19:37 But like in terms of the threading as well, they're quite similar.

19:42 Right.

19:42 They do feel much more similar.

19:44 But there's a huge difference between how they run, at least when run on Google's V8 engine,

19:51 which basically is the thing behind Node and whatnot, versus CPython is,

19:56 CPython is interpreted and V8 is JIT compiled, just in time compiled.

20:02 Yeah, so that's probably one of the biggest differences.

20:04 And when I was comparing the two, so I wanted to see, okay, which one is faster?

20:10 Like if you gave it the same task and if you gave it the n-body problem,

20:13 then Node.js is a couple of multiples faster.

20:18 I think it was two or three times faster to do the same algorithm.

20:23 And for a dynamically typed language, you know, that means that they must have some optimizations,

20:28 which make it faster.

20:30 I mean, if you're running on the same hardware, then, you know, what is the overhead?

20:34 And kind of digging into it, I guess, in a bit more detail.

20:39 So JavaScript has this, actually there's multiple JavaScript engines, but kind of the one that Node.js uses

20:45 is Google's V8 engine.

20:47 So quite cleverly named, which is all written in...

20:52 Only would it be better if it were a V12, you know?

20:54 Or an inline six.

20:56 I think that's a better option.

20:57 Yeah, there you go.

21:01 So Google's V8 JavaScript engine is written in C++, so maybe that's a fair comparison.

21:07 But the optimizing compiler is called TurboFan, and it's a JIT optimizing compiler.

21:14 So it's a just-in-time compiler, whereas CPython is an ahead-of-time or an AOT compiler.

21:20 And its JIT optimizer has got some really clever, basically sort of algorithms and logic

21:27 that it uses to optimize the performance of the application when it actually runs.

21:31 And these can make a significant difference.

21:33 Like some of the small optimizations alone can make 30%, 40% increase in speed.

21:39 And if you compare even just V8 compared to other JavaScript engines, you can see, like,

21:45 what all this engineering can do to make the language faster.

21:49 And that's how it got two, three multiples performance increases, was to optimize the JIT

21:54 and to understand, like, how people write JavaScript code and the way that it compiles the code

22:01 down into operations.

22:03 Then basically, like, it can reassemble those operations into ones that are more performant for the CPU

22:08 so that when it actually executes them, it does it in the most efficient way possible.

22:12 Right.

22:13 The difference between a JIT and an AOT is that the JIT compiler kind of makes decisions

22:18 about the compilation based on the application and based on the environment,

22:22 whereas an AOT compiler will compile the application the same and it does it all ahead of time.

22:29 Right.

22:29 So you probably have a much more coarsely-grained set of optimizations and stuff for an ahead-of-time compiler,

22:36 like C++ or something, right?

22:38 Like, I've compiled against x86 Intel CPU with, like, the multimedia extensions

22:47 or whatever, right?

22:48 The scientific computing extensions.

22:49 But other than that, I make no assumptions, whether it's multi-core, highly multi-core,

22:54 what its L2 cache is, none of that stuff, right?

22:57 It's just, we're going to kind of target modern Intel on macOS and do it on Windows

23:04 and compile that.

23:05 Yeah.

23:05 So modern CPU architectures and modern OSes can really benefit if you've optimized

23:12 the instructions that you're giving them to benefit, like, the caches that they have

23:17 or the cycles that they've set up and the sort of the TurboFan optimizer

23:22 for the V8 engine takes a lot of advantage of those things.

23:25 Yeah.

23:25 That seems really powerful.

23:27 I guess we should step back and talk a little bit about how CPython runs,

23:32 but being an interpreter, it can only optimize so much.

23:37 It's got all of its bytecodes and it's going to go through its bytecodes

23:41 and execute them, but saying, like, well, these five bytecodes, we could actually turn that

23:45 into an inline thing over here and I see this actually has no effect on what's loaded on the stack,

23:51 so we're not going to, like, push the item.

23:53 I mean, it seems like it doesn't, tell me if I'm wrong,

23:57 it doesn't optimize, like, across lots of bytecodes as it's thinking about it.

24:04 Yeah, so what CPython will do when it compiles your code, and it's also worth pointing out

24:08 that when you run your code for the first time, it will compile it, but when you run it again,

24:14 it will use the cached version, so...

24:16 Right, if you ever see the dunder pycache with .pyc files, that's, like,

24:21 three of the four steps of getting your code ready to run saved and done

24:25 and never done again.
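
The caching described here can be inspected from Python itself. A minimal sketch, assuming a standard CPython install; `example.py` is a hypothetical file name used only for illustration:

```python
import importlib.util

# Ask the import system where the cached bytecode for a source file would be
# stored. "example.py" is a made-up name; the file does not need to exist.
cache_path = importlib.util.cache_from_source("example.py")

# Typically a path inside a __pycache__ directory, tagged with the
# interpreter version and ending in .pyc.
print(cache_path)
```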

24:26 Yeah, so that's, like, the compiled version.

24:28 So it's not...

24:29 If Python is slow to compile code, it doesn't really matter unless your code

24:33 is somehow changing every time it gets run, which I'd be worried about.

24:37 You have bigger problems.

24:38 Yeah, exactly.

24:39 So the benefits, I guess, of an AOT compiler is that you compile things ahead of time

24:45 and then when they execute, they should be efficient.

24:47 So CPython's compiler will kind of take your code, which is, like, a text file,

24:53 typically.

24:53 It'll look at the syntax.

24:55 It will parse that into an abstract syntax tree, which is a sort of representation of functions

25:02 and classes and statements and variables and operations and all that kind of stuff.

25:08 Your code, your file, your module, basically, becomes like a tree and then what it does

25:13 is it then compiles that tree by walking through each of the branches and walking through

25:19 and understanding what the nodes are and then there is a compilation.
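
The parse step described here can be tried with the standard library's `ast` module. A small sketch; the source string and variable names are illustrative assumptions, not taken from the episode:

```python
import ast

# A one-line "module" to parse; the names are arbitrary.
source = "total = price * quantity + 1"

# Parse the text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Walk every node in the tree and collect the node type names,
# roughly what a compiler does when it visits each branch.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The printed list should include node types such as `Module`, `Assign`, and `BinOp`, which is the tree structure the compiler then walks.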

25:23 Basically, like, in the CPython compiler, there's a function for each type of thing

25:28 in Python.

25:28 So there's a compile binary operation or there's a compile class function

25:34 and a compile class will take a node from the AST, which has got your class in it

25:39 and it will then go through and say, okay, what properties, what methods does it have

25:44 and it will then go and compile the methods and then inside a method it will go and compile the statements.

25:48 So, like, once you break down the compiler into smaller pieces, it's not that complicated

25:53 and what a compiler will do is it will spit out compiled basic frame blocks,

25:59 as they're called, and then they get assembled into bytecode.

26:03 So, after the compiler stage, there is an assembler stage which basically figures out

26:08 in which sequence should the code be executed, you know, which basically,

26:13 like, what will the control flow be between the different parts of code,

26:17 the different frames.

26:18 In reality, like, they get executed in different orders because they depend on input

26:23 whether or not you call this particular function but still, like, if you've got a for loop,

26:27 then it's still got to go inside the for loop and then back to the top again.

26:31 Like, that logic is, like, hard-coded into the for loop.

26:34 Right.

26:35 You know, as you're talking, I'm wondering if, you know, minor extensions

26:39 to the language might let you do higher-level optimizations.

26:43 Like, say, like, having a frozen class that you're saying I'm not going to add any fields to

26:49 or, like, an inline on a function, like, I only, or make it a function internal

26:54 to a class in which it could be inlined, potentially, because, you know,

26:58 no one's going to be able to, like, look at it from the outside of this code and stuff.

27:02 What do you think?

27:03 There is an optimizer in the compiler called the peephole optimizer.

27:07 And when it's compiling, I think it's actually it's after the compilation stage,

27:12 I think, it goes through and it looks at the code that's been compiled and if it can make some

27:18 decisions about either, like, dead code that can be removed or branches which can be simplified,

27:25 then it can basically optimize that.

27:27 And that will make some improvement, like, it will optimize your code slightly.
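
One optimization of this kind that is easy to observe is constant folding. A minimal sketch; which stage performs the folding (the peephole pass, the AST optimizer, or a later pass) has varied between CPython versions, and `seconds_per_day` is just an illustrative name:

```python
def seconds_per_day():
    # Written as an expression in the source code.
    return 24 * 60 * 60

# The compiled code object stores constants the bytecode refers to.
# If the compiler folded the arithmetic, the result is stored directly.
constants = seconds_per_day.__code__.co_consts
print(constants)
```

If the multiplication was folded at compile time, `86400` appears among the constants and no multiplication happens when the function runs.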

27:31 Right.

27:32 But then once it's done, basically, your Python application has been compiled down

27:36 into this, like, assembly language called bytecode, which is the, like, the actual individual operations

27:42 and then they're executed in sequence.

27:45 They're split up into small pieces, they're split up into frames, but they're executed

27:50 in sequence.
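
The bytecode being described can be listed with the standard library's `dis` module. A short sketch; the exact opcode names differ between CPython versions, and `add` is an illustrative function:

```python
import dis

def add(a, b):
    return a + b

# Collect the name of each bytecode instruction in the compiled function.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# Human-readable listing of the same instructions.
dis.dis(add)
```

Even a one-line function turns into several separate operations: load the arguments, perform the binary operation, return the value.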

27:50 Right.

27:51 And if you look at the C source code, dive into there, there's a ceval.c file

27:56 and it has, like, the world's largest while loop with a switch statement

28:01 in it, right?

28:02 Yeah.

28:02 So this is, like, the kind of the brain of CPython.

28:06 Oh, maybe it's not the brain, but it's the bit that, like, goes through each

28:10 of the operations and says, okay, if it's this operation, do this thing,

28:13 if it's that one, do this thing.

28:14 This is all compiled in C, so it's fairly fast, but it will basically sit and run the loop.

28:20 So when you actually run your code, it takes the assembled bytecode and then for each

28:26 bytecode operation, it will then do something.

28:29 So, for example, there's a bytecode for add an item to a list.

28:33 So it knows that it will take a value off the stack and it will put that

28:37 into the list or this one which calls a function.

28:40 So, if the bytecode is call function, then it knows to figure out how to

28:44 call that function in C.

28:46 Right.

28:46 Maybe it's loaded a few things on the stack, it's going to call it, and they just get sucked along,

28:51 something like that.

28:51 And so I guess one of the interesting things, and you were talking about an interesting

28:56 analogy about this, sort of when Python can be slow versus a little bit less slow,

29:02 it's the overhead of like going through that loop, figuring out what to do,

29:06 like preparing stuff before you call the CPython's thing, right?

29:10 Like list.sort, it could be super fast even for a huge list because it's just going

29:15 to this underlying C object and say, in C, go do your sort.

29:18 But if you're doing a bunch of small steps, like the overhead of the next step

29:24 can be a lot higher.
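
The per-step overhead can be felt with a rough timing comparison. This is only a sketch: the list size and repeat count are arbitrary assumptions, and the absolute numbers depend on the machine and Python version:

```python
import timeit

def python_loop_total(values):
    # Every iteration goes back through the interpreter's evaluation loop.
    total = 0
    for value in values:
        total += value
    return total

values = list(range(100_000))

# Same work, once as many small bytecode steps and once as a single
# call into C (the built-in sum).
loop_time = timeit.timeit(lambda: python_loop_total(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)

print(f"pure Python loop: {loop_time:.4f}s")
print(f"built-in sum:     {builtin_time:.4f}s")
```

Both versions compute the same total; the difference in time is mostly the cost of dispatching many tiny operations one at a time.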

29:26 In the n-body problem, the step that it has to do, the operation it has to do,

29:30 will be add number A to number B, which on a decent CPU, I mean, this is like nanoseconds

29:36 in terms of time it takes to execute.

29:39 So if it's basically, if the operation that it's doing is really tiny, then after doing

29:46 that operation, it's got to go all the way back up to the top of the loop again,

29:49 look at the next bytecode operation, and then go and run this, you know, call this thing,

29:56 which runs the operation, which takes again like nanoseconds to finish, and then it goes

30:00 all the way back around again.

30:01 So I guess the analogy I was trying to think of with the n-body problem is,

30:05 you know, if you were a plumber and you got called out to do a load of jobs

30:10 in a week, but every single job was, can you change this one washer on a tap for me,

30:16 which takes you like two minutes to finish, but you get a hundred of those jobs

30:21 in a day, you're going to spend most of your day just driving around and not actually doing

30:25 any plumbing.

30:26 You're going to be driving from house to house and then doing these like two

30:30 minute jobs and then driving on to the next job.

30:33 So I think the n-body problem, that's kind of an example of that is that the evaluation

30:39 loop can't make decisions, like it can't say, oh, if I'm going to do the same operation

30:43 again and again and again, instead of going around the loop each time, maybe I should just

30:49 call that operation the number of times that I need to.

30:53 And those are the kind of optimizations that a JIT would do because it kind of

30:56 changes the compilation order and sequence.

30:59 So that's, I guess like we could talk about there are JITs available for

31:03 Python.

31:04 Yes.

31:05 CPython doesn't have, CPython doesn't use a JIT, but for things like the

31:10 n-body problem, instead of the, you know, the plumber driving to every house and doing

31:15 this two minute job, why can't somebody actually just go and, why can't everyone

31:20 just send their tap to like the factory and he just sits in the factory all day

31:24 replacing the washers.

31:26 Like Netflix of taps or something, yeah.

31:28 Back when they sent out DVDs.

31:31 Maybe I was stretching the analogy a bit, but, you know, basically like you can

31:35 make optimizations if you know you're going to do the same job again and again

31:39 and again, or maybe like he just brings all the washers with him instead of driving

31:44 back to the warehouse each time.

31:45 So, like there's optimizations you can make if you know what's coming.

31:49 But because the CPython application was compiled ahead of time, it doesn't know

31:54 what's coming.

31:55 There are some opcodes that are coupled together, but there's only a few.

32:00 I can't remember which ones they are off the top of my head, but there's only a couple and it doesn't

32:04 really add a huge performance increase.

32:05 Yeah, there have been some improvements around like bound method execution time and

32:10 methods without keyword arguments or some something along those lines that got quite a

32:14 bit faster.

32:14 But that's still just like how can we make this operation faster?

32:18 Not how can we say like, you know what, we don't need a function, let's inline that.

32:21 It's called in one place once, just inline it, right?

32:23 Things like that.

33:20 So you did say there were some.

33:23 There was Pyjion, there's PyPy, there's Unladen Swallow, there's some other options as

33:33 well, but those are the JITs that are coming to mind.

33:35 Pyston, all of those were attempts and I have not heard anything about any of them for a

33:39 year, so that's probably not a super sign for their adoption.

33:43 Yeah, so the ones I kind of picked on because I think they've got a lot of promise

33:46 and kind of show a big performance improvement is PyPy, which shouldn't be new.

33:52 I mean, it's a popular project, but PyPy uses a...

33:55 P-Y-P-Y, because some people say, like, Python Package Index, they also call it PyPy, but

34:00 that's a totally different thing.

34:01 Yeah, so PyPy...

34:02 Just for listeners who aren't sure.

34:03 PyPy kind of helped solve the argument for my talk actually, because if Python is slow, then

34:10 writing a Python compiler in Python should be like really, really slow.

34:14 But actually, PyPy, which is a Python compiler written in Python, in problems like the n-body

34:21 problem, where you're doing the same thing again and again, it's actually really good at

34:26 it.

34:26 Like, it's significantly...

34:28 It's 700-something percent faster than CPython at doing the same algorithm.

34:33 Like, if you copy and paste the same code and run it in PyPy versus CPython, yeah, it will

34:40 run over seven times faster in PyPy, and PyPy is written in Python.

34:44 So it's an alternative Python interpreter that's written purely in Python.

34:49 But it has a JIT compiler.

34:51 That's probably the big difference.

34:52 Yeah.

34:52 As far as I understand it, PyPy is kind of like a half JIT compiler.

34:57 It's not like a full JIT compiler like, say, C# or Java, in that it

35:02 will, like, run on CPython and then, like, decide to JIT compile the stuff that's run a lot.

35:08 I feel like that's the case.

35:09 PyPy is a pure JIT compiler, and then Numba is a, you can basically choose to JIT

35:16 certain parts of your code.

35:17 So with Numba, you can use a, actually, a decorator, and you can stick it on.

35:22 An @jit.

35:23 Yeah, it literally is that.

35:25 You can do an @jit on a function, and it will JIT compile that function for

35:30 you.

35:30 So if there's a piece of your code which would work better if it were JITed, like it would be

35:35 faster, then you can just stick a JIT decorator on that using the Numba

35:40 package.
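
A minimal sketch of the decorator approach, assuming the third-party Numba package is installed. `njit` is Numba's shorthand for `jit(nopython=True)`; the no-op fallback and the `sum_of_squares` function are assumptions added here so the snippet still runs (without any speedup) when Numba is missing:

```python
try:
    from numba import njit  # third-party package: pip install numba
except ImportError:
    # Fallback assumption for this sketch only: behave like plain Python.
    def njit(func):
        return func

@njit
def sum_of_squares(n):
    # A tight numeric loop, the kind of code a JIT tends to help with.
    total = 0.0
    for i in range(n):
        total += i * i
    return total

# With Numba installed, the first call triggers compilation of the function.
print(sum_of_squares(10))
```

The script is still started with the regular `python` command; with Numba, compilation happens inside the library the first time the decorated function is called.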

35:41 Yeah, that's really cool.

35:42 Do you have to, how do you run it?

35:44 I've got some function within a larger Python program, and I put an @jit on it.

35:48 Like, how do I make it actually JIT that and, like, execute?

35:52 Can I still type Python space, I think, or what happens?

35:56 I don't know.

35:56 Do you know?

35:57 Yeah, I'm just wondering, like, it probably is the library that, as it pulls

36:02 in what it's going to give you back, you know, the wrapper, the decorator, the

36:05 function, it probably does JIT.

36:07 So interesting.

36:07 I think that's a really good option.

36:09 Of all the options, honestly, I haven't done anything with Numba, but it looks like probably the

36:13 best option.

36:13 It sounds a little bit similar to Cython, but Cython's kind of the upfront style, right?

36:20 Like, we're going to pre-compile this Python code to C, whereas Numba, it sounds more, a

36:25 little more runtime.

36:26 Yeah, so Cython is not really a JIT or a JIT optimizer.

36:31 It's a way of decorating your Python code with type annotations and using, like, a sort of

36:40 slightly different syntax to say, oh, this variable is this type, and then Cython will

36:47 actually compile that into a C extension module, and then you run it from CPython.

36:51 So it basically, like, compiles your Python into C and then loads it as a

36:57 C extension module, which can make a massive performance improvement.

37:01 Yeah, so you've got to run a, like, a setup.py build command to generate the libraries, the

37:07 .o files, or whatever the platform generates, and then those get loaded in.

37:13 Even if you change the Python code that was their source, you've got to recompile them, or it's

37:18 just still the same old compiled stuff, same old binaries, yeah.

37:21 You can automate that so you don't have to type it by hand, but I think Cython is a really good

37:26 solution for speeding it up.

37:28 But as I kind of pointed out in my talk, it doesn't answer the question of why Python is

37:32 slow.

37:33 It says, well, Python can be faster if you do C instead.

37:37 Yeah.

37:37 One thing I do like about Cython these days is they've adopted the type hints,

37:42 type annotation format.

37:44 So if you have, what is that, Python 3.4 or later type annotations, you

37:51 got to be explicit on everything.

37:53 But if you have those, that's all you have to do to turn it into like official Cython, which is

38:00 nice because it used to be you'd have to have like a C type or Cython type dot

38:04 int rather than a, you know, colon int or something funky like that.

38:08 Yeah.

38:08 And it's nice that they brought the two things together.

38:10 Cython like had type annotations before the language did, I think.

38:14 Right.

38:14 Yeah.

38:15 So they had their own special way.

38:16 They had their own special little sub language that was Python-esque, but not

38:20 quite.

38:20 So I was looking at this n-body problem and I thought, all right, well, I

38:24 probably should have played with Numba, but I have a little more experience with

38:27 Cython.

38:27 So let me just see, like the code is not that hard and I'm going in terms of

38:32 like how much code is there or whatever.

38:34 Sure.

38:34 The math is hard, but the actual execution of it isn't.

38:37 So I'll link to the actual Python source code for the n-body problem.

38:41 And I ran it.

38:42 It has some defaults that are much smaller than the one you're talking about.

38:45 So if you run it, just hit run.

38:46 It'll run for like, well, on my machine,

38:48 it ran for 213 milliseconds just in pure CPython.

38:52 So I said, all right, well, what if I just grab that code and I just plunk it

38:56 into a .pyx file unchanged.

38:59 I didn't change anything.

39:00 I just moved it over.

39:01 I got it to go into 90 milliseconds, which is like 2.34 times faster.

39:06 And then I did the type hints that I told you about.

39:09 Because if you don't put the type hints, it'll still run, but it will work at the

39:14 PyObject level.

39:16 Like, so your numbers are PyObject numbers, not, you know, ints and floats down.

39:22 So you make it a little bit faster.

39:23 So, but I was only able to get it four times faster down to 50 milliseconds.

39:26 Either I was doing it wrong or that's just about as much faster as I

39:30 can get it.

39:31 I could have been missing some types and it was still doing a little more

39:33 CPython interop stuff.

39:36 But yeah, I don't know.

39:37 It's, it's an interesting challenge.

39:39 I guess the last thing to talk about, like on this little bit right here is

39:42 the, is mypyc.

39:44 Yeah.

39:44 I didn't know much about mypyc.

39:46 I don't know a lot about it either.

39:47 So mypy is a type checking library and verification library for the type annotations.

39:54 Right.

39:54 So if you put these type annotations in there, they don't do anything at runtime.

39:57 They're just like there to tell you stuff.
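
That annotations are not enforced at runtime is easy to demonstrate. A small sketch; `double` is an illustrative function, not something from the episode:

```python
def double(value: int) -> int:
    # The annotations say int, but nothing checks that when the code runs.
    return value * 2

# A static type checker would flag this call; the interpreter happily
# runs it and repeats the string instead.
result = double("ab")
print(result)

# The annotations are simply stored as metadata on the function.
print(double.__annotations__)
```

The call with a string succeeds and returns `"abab"`, which is exactly the kind of mismatch a separate checker is meant to catch before the code runs.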

39:59 Right.

40:00 But things like certain editors can partially check them or mypy can like

40:05 follow the entire chain and say this code looks like it's typewise hanging

40:09 together.

40:10 Not like, five levels down,

40:12 you pass an integer and you expect a string.

40:14 So it's broken.

40:15 Right.

40:15 It can check that.

40:16 So they added this thing called mypyc, which can take stuff that is annotated in a way that

40:22 mypy works with, which is basically type annotations, but more.

40:25 And they can compile that to C as well, which they also interestingly got

40:30 like a four times speed up with stuff, not in the n-body problem, but on mypy.

40:34 So I don't know.

40:34 It's, there's a lot of options, but as you point out, they are a little bit dodging Python.

40:41 The Numba stuff is cool because I think you don't really write different code.

40:45 Do you?

40:46 Yeah, it's been more natural.

40:47 And I think PyPy, like you're saying you kind of got two to four times improvement by moving

40:54 things to Cython.

40:55 And it took a decent amount of work, right?

40:56 Because every loop variable had to be declared somewhere else because you can't set the

41:00 type or the type annotation inside the loop declaration, right?

41:03 Like it wasn't just put a colon in.

41:05 I had to do like a decent amount of work to drag out the types.

41:08 Yeah.

41:08 And whereas PyPy will be a seven times improvement in speed, for that problem.

41:13 Yeah.

41:14 And there's no C compilation.

41:15 Yes.

41:16 That's really nice.

41:17 That's really nice.

41:18 So we talked about JITs and JITs are pretty interesting.

41:21 To me, I feel like JITs often go together with garbage collection in the entirely

41:28 unmanaged sort of non-deterministic sense of garbage collection, right?

41:33 Not reference counting, but sort of the mark and sweep style.

41:37 So Python, I mean, maybe we could talk about GC at Python first and then

41:41 if there's any way to like change that or advantages there, disadvantages.

41:46 Well, there's the Instagram story, that they saw a performance improvement when they turned off GC.

41:52 Yeah, like we're going to solve the memory problem by just letting it leak.

41:55 Like literally, we're going to disable garbage collection.

41:58 Yeah, I think they got like a 12% improvement or something.

42:01 It was significant.

42:02 They turned it off and then they just restarted the worker processes every 12 hours

42:06 or something like that.

42:06 And it wasn't that bad.

42:07 The GC itself, like to your, I said there's another problem that I studied

42:13 which was the binary tree problem.

42:15 And this particular problem will show you the impact of the garbage collector

42:22 performance on, like in this particular algorithm, this benchmark, it will show you

42:27 how much your GC slows down the program.

42:30 And again, I wanted to compare Node with Python because they both have reference

42:35 counting and garbage collection.

42:36 So the garbage collector with Node is a bit different in terms of its design,

42:42 but both of them are a stop everything garbage collector.

42:45 So, you know, CPython has a main thread, basically, and the garbage collector will run

42:52 on the main thread and it will run every number of operations.

42:55 So, I think the, I can't remember what the default is, it's like 3,000 or

42:59 something.

42:59 Every 3,000 operations in the first generation where an object has been assigned

43:05 or deassigned, then it will run the garbage collector, which goes and inspects every,

43:09 every list, every dictionary, every, what other types, like custom objects,

43:14 and sees if they have any circular references.
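
The actual trigger values can be checked with the standard library's `gc` module rather than from memory. A minimal sketch; the defaults it prints differ between CPython versions, so no particular numbers are assumed here:

```python
import gc

# Collection thresholds for the generations the collector tracks.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# Current allocation counts, compared against those thresholds.
counts = gc.get_count()
print("current counts:", counts)

# Whether the cyclic collector is currently switched on.
print("enabled:", gc.isenabled())
```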

43:17 Right, and the reason we need the GC, which does this, is because it's not

43:21 even the main memory management system, because if it was, Instagram would not

43:26 at all be able to get away with that trick.

43:27 Right, this is like a, a final net to catch the stuff that reference counting doesn't

43:33 work for.

43:33 Normally, like if there's some references to an object, once things stop

43:37 pointing at it, the last one that goes, it just poof, it disappears.

43:41 But the challenge of reference counting garbage collection is if you've got like

43:46 some kind of relationship where one thing points at the other, but that thing

43:49 also points back to itself, right, like a couple object, right, a person object

43:54 with a spouse pointer or something like that, right?

43:57 If you're married, you're going to leak.

43:58 Yeah, absolutely.

43:59 So this is the thing you're talking about, those types of things that it's addressing.
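
The spouse example can be reproduced directly. A sketch that assumes CPython's reference counting plus cyclic collector; the `Person` class and names are illustrative:

```python
import gc
import weakref

class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None

gc.disable()  # keep the cyclic collector out of the way for the demo
try:
    alice = Person("Alice")
    bob = Person("Bob")
    alice.spouse = bob   # alice points at bob...
    bob.spouse = alice   # ...and bob points back, forming a cycle

    probe = weakref.ref(alice)  # lets us see whether the object still exists
    del alice, bob

    # Reference counts never reach zero, so the pair is still in memory.
    alive_before = probe() is not None

    gc.collect()  # the cyclic collector finds and frees the unreachable pair
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

With the collector disabled the cycle survives `del`; only the explicit `gc.collect()` reclaims it, which is the leak that reference counting alone cannot handle.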

44:02 And it's kind of designed on the assumption that most objects in CPython

44:06 have very short lifespans.

44:09 So, you know, they get created and then they get destroyed shortly afterwards.

44:13 So like local variables inside functions or, you know, like local variables

44:17 inside list comprehensions, for example, like those can be destroyed pretty much

44:22 straight away.

44:22 But the garbage collector will stop everything running on the main thread

44:26 while it's running because it has to because you can't, you know, if it's deleting stuff

44:30 and there's something else running at the same time that's expecting that thing

44:34 to exist, it's going to cause all sorts of problems.

44:36 So yeah, the GC will kind of slow down your application if it gets hit a lot.

44:41 And the binary tree problem will basically construct a series of trees and then loop

44:47 through them and then delete the nodes and the branches, which kind of triggers

44:51 the GC to run a lot.

44:53 And then you can compare the performance of the garbage collectors.

44:56 So one thing I kind of noted in the design is that they stop everything.

45:02 If the time it takes to run the garbage collector could be as short as possible,

45:06 then the performance hit of running it is going to be smaller.

45:08 And something that Node does is it runs a multi-threaded mark process.

45:13 So when it actually goes and looks for circular references, it actually starts

45:18 looking before it stops the main thread on different helper threads.

45:23 So it starts separate threads and starts the mark process.

45:26 And then it still stops everything on the main process, but it's kind of prepared all its

45:31 homework ahead of time.

45:32 It's already figured out what is garbage before it stops stuff.

45:36 And it's like, now we just have to stop while we throw it away and update the pointers and

45:41 then you can carry on, right?

45:42 Because it's got to, you know, balance the memory and stop allocation and whatnot.

45:45 Yeah.

45:46 So I think technically that's possible in CPython.

45:49 I don't think it has anything to do with the GIL either, like why that couldn't be

45:53 done.

45:53 You could still do...

45:55 Right.

45:55 It seems like it totally could be done.

45:56 Yeah.

45:57 Because marking and finding circular references could be done outside of the GIL

46:01 because it's a C-level call.

46:02 It's not an opcode.

46:04 But like I say in the talk, you know, all this stuff that I've listed so far is a lot of

46:11 work and it's a lot of engineering work that needs to go into it.

46:14 And if you actually look at the CPython compiler, like ceval, and look

46:20 at the number of people who've worked on or contributed to it, it's less than 10

46:24 like to the core component.

46:27 I wouldn't want to touch it.

46:28 I would not want to get in there and be responsible for that part of it.

46:31 No way.

46:31 Yeah.

46:33 And at this stage, they're minor optimizations.

46:35 They're not sort of big overhauls because there just aren't the people to do it.

46:41 Yeah.

46:41 You made a point in your PyCon talk that, you know, the reason that V8 got to be so

46:47 optimized so fast is because it's got, you know, tens of millions of dollars of

46:51 engineering put against it yearly.

46:54 Right?

46:55 I mean, it's kind of part of the browser wars.

46:58 The new browser wars a bit.

47:00 Yeah.

47:00 From what I could work out, there's at least 35 permanent developers working on

47:05 it.

47:05 Just looking at the GitHub project, like if you just see the commit histories, like

47:10 nine to five, Monday to Friday, 35 advanced C++ developers hacking away at it.

47:16 Right.

47:16 If we had that many people continuously working on CPython's like internals and

47:22 garbage collection and stuff, we'd have more optimizations or bigger projects that people

47:26 will try to take on probably.

47:27 Yeah, absolutely.

47:27 And the people who work on it at the moment, all of them have day jobs and this

47:32 is not typically their day job.

47:33 Like they managed, they've convinced their employer to let them do it in their spare time

47:38 or, you know, one or two days a week, for example, and they're finding the time to do

47:42 it.

47:42 And it's a community run project.

47:44 it's an open source project.

47:45 But I think kind of going back to places where Python could be faster, like these kind

47:51 of optimizations in terms of engineering, they're expensive optimizations.

47:56 They cost a lot of money because they need a lot of engineering expertise and a lot of

48:01 engineering time.

48:02 And I think as a project at the moment, we don't really have that luxury.

48:06 So it's not really fair of me to complain about it if I'm not contributing to the

48:12 solution.

48:12 Yeah, but you have a day job as well, right?

48:14 But I have a day job and this is not my day job.

48:16 So yeah, I think there's, I think for what we use Python for most of the

48:21 time, it's definitely fast enough.

48:23 And in places where it could have optimizations like the ones that we talked about, those

48:28 optimizations have drawbacks because, you know, adding a JIT, for example, means that it

48:34 uses a lot more memory.

48:35 Like the Node.js example, the n-body problem, sure, it finishes it faster, but

48:40 uses about five times more RAM to do it.

48:42 Right.

48:43 And PyPy uses more memory, like the JIT compiler, and also the startup time of the

48:48 process is typically a lot longer.

48:50 If anyone's ever tried to boot Java JVM cold, you know, like the startup time for

48:57 JVM is pretty slow.

48:58 .NET's the same, like the initial boot time for it to actually get started and warm up

49:03 is time consuming.

49:05 So you wouldn't use it as a, like a command line tool to write a simple script

49:10 that you'd expect to finish in, you know, under 100 milliseconds.
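As a rough way to check the startup cost discussed here, the sketch below times how long it takes to launch an interpreter that does nothing and exits. It is only a minimal measurement under assumptions: `measure_startup` is a hypothetical helper name, it times whichever Python is running the script (`sys.executable`), and the numbers depend entirely on the machine.

```python
import subprocess
import sys
import time


def measure_startup(runs: int = 5) -> float:
    """Return the best wall-clock time (seconds) to start and exit an interpreter."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" starts the interpreter, runs an empty program, and exits.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


best = measure_startup()
print(f"best interpreter start-and-exit time: {best * 1000:.1f} ms")
```

Swapping `sys.executable` for the path of another runtime would allow a like-for-like comparison on the same machine.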

49:13 I think that that kind of highlights one of the challenges, right?

49:16 It's if you thought your process was just going to start and be a web server or a desktop

49:21 application, two seconds of startup time is fine, or whatever that number is.

49:26 But if it's solving this general problem, yeah, it could be running Flask as

49:30 a microservice, or it could be, you know, replacing Bash, right?

49:35 Like these are very different constraints and interests, right?

49:38 Yeah.

49:38 And there aren't really many other languages where there is one sort of language definition and

49:44 there are multiple mature implementations of it.

49:47 So, you know, with Python, you know, you've got Cython, you've got PyPy, you've got

49:53 Numba, you've got IronPython.

49:55 I mean, there's like a whole list of, you know, different, Jython, like different implementations

50:02 of the language.

50:02 And people can choose the, I guess, kind of pick which one is best for the problem that they're

50:08 trying to solve, but use the same language across them.

50:10 Whereas you don't really have that luxury with others.

50:12 You know, if you're writing Java, then you're using JVM.

50:15 There are, I mean, there's two implementations.

50:17 It's the free one and the licensed one, but like that's pretty much as far as it goes.

50:22 That's not exactly the same trade-off.

50:25 Yeah.

50:25 It's optimizing for money.

50:26 That's not optimizing for performance or whatever necessarily.

50:30 So one thing that I feel like comes around and around again in this discussion, and I'm

50:36 thinking mostly of like PyPy and some of these other attempts people have made to add like

50:41 JIT compilation to the language or other changes.

50:43 It's always come back, it seems like, to well, it would be great to have these

50:49 features.

50:49 Oh yeah, but there's this thing called the C API.

50:52 And so no, we can't change the GIL.

50:54 No, we can't change memory allocation.

50:56 No, we can't change any of these other things because of the C API.

51:01 And so we're stuck.

51:02 Yeah.

51:03 I mean, I'm not saying I'm asking you for a solution here.

51:07 Like, I just, it feels like that is both the real value of Python in that like some of the

51:15 reasons that we can still do insanely computational stuff with Python is

51:20 because a lot of these libraries where they have these tight loops or these little bits of code

51:24 deserialization or matrix multiplication or whatever, they've written that in C

51:29 and then ship that as a wheel.

51:31 And so now all of a sudden our code is not as slow as doing math with Python,

51:35 it's as fast as doing math with C.

51:37 Yeah.

51:37 I mean, so if you look at NumPy, for example, if you're doing a lot of

51:41 math, then you, you know, you could be using the NumPy library, which is

51:45 largely compiled C code.

51:47 It's like, you import it from Python and you run it from Python, but the

51:51 actual implementation is a C extension.

51:54 And that wouldn't be possible if CPython wasn't built in the way it is, which is that it is an

52:00 ahead of time extension loader that you can run from Python code.
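To make the point about compiled code concrete, here is a small sketch comparing a loop written in Python with the same work done inside a C-implemented function. NumPy is not assumed to be installed, so the built-in `sum` (implemented in C in CPython) stands in for a C extension; `python_sum` is a hypothetical helper written for this illustration, and the timings will differ from machine to machine.

```python
import timeit


def python_sum(values):
    """Add numbers one at a time; every iteration runs through the bytecode loop."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(1_000_000))

# Both produce the same answer; only where the loop runs differs.
assert python_sum(data) == sum(data)

loop_time = timeit.timeit(lambda: python_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)
print(f"Python loop: {loop_time:.3f}s  built-in sum (C): {builtin_time:.3f}s")
```

The same pattern is what libraries like NumPy rely on: the Python code makes one call, and the tight loop runs in compiled code.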

52:04 Yeah.

52:05 One project I do want to give a shout out to, I don't know if it's going to

52:08 go anywhere.

52:08 It's got a decent amount of work on it, but it's only got 185 GitHub stars.

52:13 So take that for what it's worth.

52:14 This thing called HPy, H-P-Y.

52:17 Guido van Rossum called this out on Python Bytes 179 when he was a guest co-host there.

52:24 And it's an attempt to make a new replacement of the C API for Python, where instead of

52:33 passing around pointers to objects, you pass basically pointers to pointers, which

52:37 means that things that move stuff around like compacting garbage collectors or other

52:44 implementations like JITs have a much better chance to change things without directly breaking

52:49 the C API.

52:50 You can change the value of the pointer pointer without, you know, having to reassign that

52:55 down at that layer.
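The "pointers to pointers" idea can be sketched in plain Python. This is not HPy's actual API; `HandleTable` and its methods are hypothetical names invented for this illustration. The extension side only ever holds an integer handle, so the runtime is free to move the underlying object (here, during a toy compaction) without invalidating anything the extension holds.

```python
class HandleTable:
    """Hypothetical sketch: extensions hold integer handles, not object addresses."""

    def __init__(self):
        self._heap = []    # stand-in for the runtime's object storage
        self._slots = {}   # handle -> current index into _heap
        self._next = 0

    def new(self, obj) -> int:
        """Store obj and return an opaque handle to it."""
        self._heap.append(obj)
        handle = self._next
        self._next += 1
        self._slots[handle] = len(self._heap) - 1
        return handle

    def deref(self, handle: int):
        """Look up the object a handle currently refers to."""
        return self._heap[self._slots[handle]]

    def close(self, handle: int) -> None:
        """Release a handle, leaving a hole for compact() to squeeze out."""
        index = self._slots.pop(handle)
        self._heap[index] = None

    def compact(self) -> None:
        """Move live objects together and repoint the handles at them."""
        new_heap = []
        new_slots = {}
        for handle, index in self._slots.items():
            new_heap.append(self._heap[index])
            new_slots[handle] = len(new_heap) - 1
        self._heap = new_heap
        self._slots = new_slots


table = HandleTable()
a = table.new("first")
b = table.new("second")
c = table.new("third")
table.close(b)

before = table._slots[c]   # where "third" lives before compaction
table.compact()
after = table._slots[c]    # it has moved, but handle c is unchanged
print(before, after, table.deref(c))
```

With raw pointers, the move inside `compact` would have left the extension holding a stale address; with the extra level of indirection, only the table needs updating.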

52:56 So they specifically call out it's, you know, the current C API makes it hard for things like

53:02 PyPy and GraalPython and Jython.

53:04 And the goals are to make it easier to experiment with these ideas, more friendly for other

53:10 implementations, reference counting, for example, and so on.

53:14 So anyway, I don't know if that's going anywhere, how much traction it has, but it's an interesting

53:19 idea.

53:20 Yeah, no, I like the idea.

53:21 And the C API, like, has come a long way, but it's got its quirks.

53:26 I don't know, there's been a lot of discussions, and there's a lot of draft PEPs as well, you know,

53:31 proposing kind of different designs to the C API.

53:34 Yeah.

53:35 So we're getting kind of short on time.

53:36 We've discussed a bunch of stuff.

53:38 I guess two other things I'd like to cover real quickly.

53:41 One, we've talked about a lot of stuff in terms of computational things, but understanding memory

53:48 is also pretty important.

53:49 And we did just talk about the GC.

53:50 It's pretty easy in Python to just run cProfile and ask what my computational time is.

53:57 It's less obvious how to understand memory allocation and stuff.
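For reference, both sides of that comparison can be done with the standard library alone. The sketch below runs `cProfile` for CPU time and `tracemalloc` for memory allocation on the same function; `build_squares` is a hypothetical workload made up for this example, and the figures printed will vary by machine and Python version.

```python
import cProfile
import io
import pstats
import tracemalloc


def build_squares(n: int) -> list:
    """Hypothetical workload: allocates a list of n integers."""
    return [i * i for i in range(n)]


# CPU side: cProfile records how long each function call took.
profiler = cProfile.Profile()
profiler.enable()
build_squares(200_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# Memory side: tracemalloc tracks allocations made while it is running.
tracemalloc.start()
squares = build_squares(200_000)
current, peak = tracemalloc.get_traced_memory()
top = tracemalloc.take_snapshot().statistics("lineno")[:3]
tracemalloc.stop()

print(f"current={current / 1024:.0f} KiB  peak={peak / 1024:.0f} KiB")
for stat in top:
    print(stat)
```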

54:00 And was it you that recommended Austin to me?

54:03 Yeah.

54:04 Yeah, so Austin is a super cool profiler that does both CPU profiling and also memory

54:09 allocation profiling and tracing in Python.

54:13 Do you want to tell people about Austin real quick?

54:14 Yeah, so Austin is a new profiler written for Python code.

54:18 It's a sampling profiler, so unlike other profilers, it won't slow your code down

54:23 significantly.

54:23 It kind of basically sits on the side, just asking your app, you know, what it's doing as a

54:30 sample.

54:30 And then it will give you a whole bunch of visuals to let you see, like flame graphs, for example,

54:36 like what's being called, what's taking a long time, which functions are chewing up your CPU, like

54:42 which ones are causing the bottlenecks and then which ones are consuming a lot of memory.

54:46 So if you've got a, you know, a piece of code that is, it is slow, the first thing you should probably

54:52 do is just stick it through a profiler and see if there is a reason why, like if there is

54:57 something that you could either optimize or, you know, you've accidentally done like a nested

55:02 loop or something and Austin would help you do that.
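The sampling idea itself can be sketched in a few lines. This is not how Austin is implemented (Austin observes the process from the outside); it is a simplified in-process illustration that uses a background thread and `sys._current_frames()` to ask, every few milliseconds, which function another thread is in. The names `sample_thread` and `busy` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_thread(target_id, counts, stop, interval=0.005):
    """Periodically record which function the target thread is executing."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy(seconds):
    """Hypothetical hot spot: spin for roughly `seconds`."""
    deadline = time.perf_counter() + seconds
    x = 0
    while time.perf_counter() < deadline:
        x += 1
    return x


counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_thread, args=(threading.get_ident(), counts, stop)
)
sampler.start()
busy(0.3)
stop.set()
sampler.join()

# Functions that show up in more samples are where the time is going.
print(counts.most_common(3))
```

Because the sampler only looks at the program occasionally rather than hooking every call, the profiled code runs at close to its normal speed, which is the property described above.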

55:06 One of the things I thought was super cool about this, like the challenge I have so often with

55:10 profilers is the startup of whatever I'm trying to do, it just overwhelms like the little thing I'm

55:16 trying to test.

55:18 You know, I'm like starting up a web app and initializing database connections and I just want to

55:22 request a little bit of some page and it's not that slow, but you know, it's just, I'm seeing all

55:28 this other stuff around and I'm just like, I just want to focus on this one part of it and they've got all

55:33 these different user interfaces, like a web user interface and a terminal

55:37 user interface.

55:38 They call it a TUI, which is cool.

55:39 And it gives you like a, like kind of like top or glances or one of these things that tells you right now,

55:45 here's what the profile for the last five seconds looks like.

55:48 And it gives you the call stack and breakdown of your code right now for

55:53 like that five second segment, like updating in real time.

55:56 That's super cool.

55:56 Yeah.

55:57 So if you want to run something and then just see what it's doing or you

56:00 want to replay it.

56:01 Why is it using a lot of CPU now?

56:02 Yeah.

56:03 Yeah.

56:03 That's, I really like that.

56:04 That's super cool.

56:05 All right.

56:06 Also, you know, concurrency is something that Python has gotten a bad rap for in terms of slowness.

56:11 I think with async and await and asyncio, if you're waiting on an external thing, Python can be ultra

56:17 fast now, right?

56:18 Like it's async and await, waiting on like database calls, web calls with the right drivers, super fast.
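A minimal sketch of that point, using only `asyncio` from the standard library: three simulated I/O waits run concurrently, so the total time is close to the longest single wait and not the sum. `fake_query` is a hypothetical stand-in for a real database or web call, with `asyncio.sleep` modeling the time spent waiting.

```python
import asyncio
import time


async def fake_query(name: str, delay: float) -> str:
    """Hypothetical stand-in for a database or web call; sleep models the wait."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # All three waits overlap, so the total is close to the longest one.
    return await asyncio.gather(
        fake_query("users", 0.2),
        fake_query("orders", 0.2),
        fake_query("stock", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

This only helps when the program is waiting on something external; a CPU-bound loop inside a coroutine would still block the event loop.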

56:26 But when it comes down to the computational stuff, there's still the GIL and there's really not a

56:30 great fix for that.

56:32 I mean, there's multiprocessing, but that's got a lot of overhead.

56:35 So it's got to make sense, right?

56:36 Kind of like your plumber analogy, right?

56:38 You can't do like one line function calls in multiprocessing or, you know, like one line computations.

56:45 But the work that Eric Snow's doing with subinterpreters looks pretty promising to unlock another layer.

56:50 Yeah.

56:50 So it's out in the 3.9 alpha.

56:53 Have you played with that yet? It's still experimental.

56:56 So subinterpreters is somewhere in between multiprocessing and threading in terms of the like the

57:03 implementation.

57:04 So it doesn't spawn.

57:06 So if you use multiprocessing, I mean, that's basically just saying let's hire another plumber and we'll get

57:13 them to talk to each other at the beginning of the day and split up the tasks.

57:17 Whereas subinterpreters, actually, maybe they're sharing the same van.

57:20 I'm not sure where this analogy is going, but, you know, they use the same process.

57:25 The subinterpreters share the same Python process.

57:27 It doesn't spawn up an entirely new process.

57:29 It doesn't have to load all the modules again.

57:33 And the subinterpreters can also talk to each other.

57:36 They can use shared memory to communicate with each other as well.

57:40 But because they're separate interpreters, then technically they can have their own locks.

57:47 So the lock that, you know, gets locked whenever you run any opcode is the

57:52 interpreter lock.

57:52 And this basically means that you can have two interpreters running in a

57:57 single process, each with its own lock.

57:59 So it can be running different operations at the same time.

58:03 Right.

58:04 They would automatically run on separate threads.

58:06 So you're basically running multi-threading and it can also use multi-CPU.

58:10 That'd be great.

58:11 Fundamentally, the GIL is not about a threading thing per se.

58:16 It's about serializing memory access allocation and deallocation.

58:21 And so with the subinterpreters, the idea is you don't directly share pointers

58:26 between subinterpreters.

58:27 There's like a channel type of communication between them.

58:30 So you don't have to take a lock on one when it's working with objects versus another, like they're entirely

58:36 different set of objects.

58:37 They're still in the same process space, but they're not actually sharing

58:40 pointers.

58:40 So you don't need to protect each other.
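The channel style of communication can be illustrated without subinterpreters at all. The sketch below uses ordinary threads and `queue.Queue` from the standard library, so it still runs under a single GIL and does not demonstrate the parallelism being discussed; it only shows the shape of the pattern, where workers exchange messages over channels and never reach into each other's objects. The `worker` function and queue names are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive numbers over one channel, send squares back over another."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            break
        outbox.put(item * item)


to_worker = queue.Queue()
from_worker = queue.Queue()
t = threading.Thread(target=worker, args=(to_worker, from_worker))
t.start()

for n in (1, 2, 3):
    to_worker.put(n)
to_worker.put(None)
t.join()

squares = [from_worker.get() for _ in range(3)]
print(squares)
```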

58:42 Right.

58:42 You just have to protect within each subinterpreter, which has a possibility to let me use all six of

58:47 my cores.

58:48 Yeah, absolutely.

58:49 You can't read and write from the same local variables for that reason, which you can do in threading.

58:54 But with subinterpreters, it's kind of like a halfway between just

58:58 running a separate process.

58:59 Yeah.

58:59 It probably formalizes some of the multi-threading communication styles that are going to keep things safer

59:05 anyway.

59:06 Definitely.

59:07 Yeah.

59:07 All right.

59:08 Let's talk about one really quick thing before we wrap it up.

59:10 Just one interesting project that you've been working on.

59:13 I mentioned that you were on before about some security issues, right?

59:16 Yeah.

59:16 I want to tell people about your PyCharm extension that you've been working on.

59:19 Yeah.

59:19 So I've been working on a PyCharm extension called Python Security.

59:23 It's very creatively named.

59:25 It's available.

59:27 Take the straightforward.

59:28 Yeah, exactly.

59:29 So it's basically like a code checker, but it runs inside PyCharm and it will

59:34 look for security vulnerabilities that you may have written in your code and

59:39 underline them for you and in some cases fix them for you as well.

59:42 So it will say the thing you've done here is really bad because it can cause

59:46 someone to be able to hack into your code and you can just press the quick

59:50 fix button and it could fix it for you.

59:52 So it's got actually over a hundred different inspections now.

59:56 And also you can run it across...

59:58 Should I use yaml.load still?

01:00:00 Is that good?

01:00:00 No.

01:00:01 I think that was like the first checker, right?

01:00:05 Actually, it was the yaml.load.
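For context on why `yaml.load` gets flagged: with an unsafe loader it can construct arbitrary Python objects from the input, while `yaml.safe_load` only builds plain types such as dicts, lists, strings, and numbers. The sketch below assumes the third-party PyYAML package is installed and degrades gracefully if it is not; `load_config` and `UNTRUSTED` are hypothetical names for this illustration, and this is not a description of how the plugin's quick fix works.

```python
try:
    import yaml  # PyYAML, a third-party package; assumed installed for this sketch
except ImportError:
    yaml = None

UNTRUSTED = "name: demo\nretries: 3\n"


def load_config(text: str):
    """Parse YAML with safe_load, which only builds plain Python types."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    return yaml.safe_load(text)


if yaml is not None:
    print(load_config(UNTRUSTED))
```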

01:00:07 Yeah, you can run it across the whole project.

01:00:09 So you can do a code inspection across your project, like a code audit.

01:00:12 And also it uses PyCharm's package manager.

01:00:15 So it will go in and look at all the packages you have installed in your project and it will

01:00:21 check them against either Snyk, which is a big database of vulnerable Python packages.

01:00:28 It uses the Snyk.io API.

01:00:31 So it checks it against that or you can check it against like your own list.

01:00:36 And also it's available as a GitHub action.

01:00:39 So I managed to figure out how to run PyCharm inside Docker so that you can run PyCharm from

01:00:46 a GitHub Action.

01:00:47 Wow.

01:00:48 Yeah, you can write a CI/CD script in GitHub to just say inspect my code and it will just

01:00:55 run inside GitHub.

01:00:56 You don't need PyCharm to do it, but it will run the inspection tool against your code repository.

01:01:01 It just requires that it's open source to be able to do that.

01:01:03 Okay, that's super cool.

01:01:04 All right, well, we're definitely out of time, so I have to leave it there.

01:01:07 Two quick questions.

01:01:08 Favorite editor, notable package?

01:01:10 What do you got?

01:01:10 PyCharm and I don't know about the notable package.

01:01:14 I don't know.

01:01:15 Yeah, you've been too far in the C code.

01:01:16 Yeah, I know.

01:01:17 I'm like, what are packages?

01:01:18 I think there's something that does install those, but they don't work down in C.

01:01:22 Yeah, no, that's cool.

01:01:23 All right, so people are interested in this.

01:01:26 They want to maybe understand how CPython works better or how that works and where and why

01:01:31 it might be slow so they can avoid that.

01:01:33 Or maybe they even want to contribute.

01:01:35 What do you say?

01:01:35 Wait for my book to come out and read the book or read the Real Python article, which is free

01:01:40 and online.

01:01:40 And it talks through a lot of these concepts.

01:01:43 Yeah, right on.

01:01:43 Well, Anthony, thanks for being back on the show.

01:01:46 Great, as always, to dig into the internals.

01:01:48 Thanks, Michael.

01:01:48 Yeah, you bet.

01:01:49 Bye.

01:01:50 Bye.

01:01:51 This has been another episode of Talk Python To Me.

01:01:54 Our guest on this episode was Anthony Shaw, and it's been brought to you by Brilliant.org

01:01:59 and Sentry.

01:01:59 Brilliant.org encourages you to level up your analytical skills and knowledge.

01:02:04 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every

01:02:10 day.

01:02:10 Take some stress out of your life.

01:02:12 Get notified immediately about errors in your web applications with Sentry.

01:02:17 Just visit talkpython.fm/sentry and get started for free.

01:02:21 Want to level up your Python?

01:02:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:02:28 Or if you're looking for something more advanced, check out our new Async course that digs into

01:02:33 all the different types of async programming you can do in Python.

01:02:36 And of course, if you're interested in more than one of these, be sure to check out our

01:02:40 Everything Bundle.

01:02:41 It's like a subscription that never expires.

01:02:43 Be sure to subscribe to the show.

01:02:45 Open your favorite podcatcher and search for Python.

01:02:47 We should be right at the top.

01:02:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:02:53 and the direct RSS feed at /rss on talkpython.fm.

01:02:57 This is your host, Michael Kennedy.

01:02:59 Thanks so much for listening.

01:03:01 I really appreciate it.

01:03:02 Now get out there and write some Python code.

01:03:04 I'll see you next time.

01:03:24 Thank you.