
#447: Parallel Python Apps with Sub Interpreters Transcript

Recorded on Tuesday, Dec 5, 2023.

00:00 It's an exciting time for the capabilities of Python.

00:02 We have the faster CPython initiative going strong, the recent async work, the adoption of typing, and on this episode, we discuss a new isolation and parallelization capability coming to Python through sub-interpreters.

00:16 We have Eric Snow who spearheaded the work to get them added to Python 3.12 and is working on the Python API for them in 3.13, along with Anthony Shaw, who's been pushing the boundaries of what you can already do with sub-interpreters.

00:30 This is Talk Python to Me, episode 447, recorded December 5th, 2023.

00:36 [MUSIC]

00:51 Welcome to Talk Python to Me, a weekly podcast on Python.

00:54 This is your host, Michael Kennedy.

00:56 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org.

01:03 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:08 We've started streaming most of our episodes live on YouTube.

01:12 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:20 This episode is sponsored by PyBytes Developer Mindset Program.

01:24 PyBytes' core mission is to help you break the vicious cycle of tutorial paralysis through developing real-world applications.

01:31 The PyBytes Developer Mindset Program will help you build the confidence you need to become a highly effective developer.

01:37 It's brought to you by Sentry.

01:39 Don't let those errors go unnoticed.

01:42 Use Sentry. Get started at talkpython.fm/sentry.

01:46 Anthony, Eric, hello, and welcome to Talk Python to Me.

01:49 >> Great.

01:50 >> Hey guys. It's really good to have you both here.

01:52 You both have been on the show before, which is awesome.

01:55 Eric, we've talked about sub-interpreters before, but they were a dream almost at the time.

02:02 >> That's right.

02:02 >> Now, they feel pretty real.

02:04 >> That's right. It's been a long time coming.

02:07 I think the last time we talked, I've always been hopeful, but it seems like it was getting closer.

02:12 With 3.12, we were able to land the per-interpreter GIL, which was the last piece, the foundational part I wanted to do.

02:20 A lot of cleanup, a lot of work that had to get done, but that last piece got in for 3.12.

02:25 >> Excellent. So good.

02:27 Maybe let's just do a quick check-in with you all.

02:30 It's been a while.

02:32 Anthony, start with you, I guess.

02:34 Quick intro for people who don't know you, although I don't know how that's possible, and then just what you've been up to.

02:39 >> Yeah. I'm Anthony Shaw.

02:41 I work at Microsoft.

02:42 I lead the Python advocacy team, and I do lots of Python stuff open source.

02:49 Testing things, building tools, blogging, building projects, sharing things.

02:55 >> You have a book, something about the inside of Python?

02:58 >> Yeah. I forgot about that.

03:00 Yeah. There's a book called CPython Internals, which is a book all about the Python compiler and how it works.

03:06 >> You suppressed the memory of writing it?

03:07 Like it was too traumatic, it's down there.

03:09 >> Yeah. I keep forgetting.

03:12 Yeah. That book was for 3.9, and people keep asking me if I'm going to update it for 3.13, maybe because things keep changing.

03:22 >> Yeah.

03:23 >> Things have been changing at a more rapid pace than they were a few years ago as well, so that maybe makes it more challenging.

03:30 Yeah. Recently, I've been doing some more research as well.

03:35 So I just finished my master's a few months ago, and I started my PhD, and I'm looking at parallelism in Python as one of the topics.

03:43 So I've been quite involved in sub-interpreters, and the free threading project, and some other stuff as well.

03:50 >> Awesome. Congratulations on the master's degree.

03:52 That's really great.

03:53 >> Thanks.

03:54 >> I didn't realize you're going further. So Eric.

03:57 >> Eric Snow. So I've been working on Python as a core developer for over 10 years now, but I've been participating for even longer than that, and it's been good.

04:09 I've worked on a variety of things, a lot of stuff down in the core runtime of CPython.

04:16 I've been working on this, trying to find a solution for multi-core Python really since 2014.

04:23 >> Yeah.

04:23 >> So I've been ever so slowly working towards that goal, and we've made it with 3.12, and there's more work to do.

04:32 But that's a lot of the stuff that I've been working on.

04:34 I'm at Microsoft, but don't work with Anthony a whole lot.

04:39 I work on the Python performance team with Guido, and Brandt Bucher, and Mark Shannon here at Microsoft, and we're just working generally to make Python faster.

04:50 So my part of that has involved subinterpreters.

04:53 >> Awesome.

04:53 >> Interestingly enough, it's only really this year that I've been able to work on all the subinterpreter stuff full-time.

05:02 Before that, I was working mostly on other stuff.

05:04 So this year has been a good year for me.

05:07 >> Yeah, I would say that must be really exciting to get the, "You know what, why don't you just do that?

05:12 That'd be awesome for us." Yeah, it's been awesome.

05:15 >> Well, maybe since you're on the team, let's do a quick check-in on Faster CPython.

05:20 It's made a mega difference over the last couple of releases.

05:23 >> Yeah, it's interesting.

05:25 Mark Shannon definitely has a vision.

05:28 He developed a plan years ago, but we finally were able to put him in a position where he could do something about it, and we've all been pitching in.

05:40 A lot of it has to do with just applying some of the general ideas that are out there regarding dynamic languages and optimization.

05:48 Things have been applied to other things like HHVM or various of the JavaScript runtimes.

05:55 A lot of adaptive specialization, a few other techniques.

06:04 A lot of that stuff we're able to get in for 3.11.

06:09 In 3.12, there wasn't quite as much impactful stuff.

06:13 We were gearing up to effectively add a JIT into CPython, and that's required a lot of behind-the-scenes work to get things in the right places.

06:25 We're somewhat targeting 3.13 for that.

06:30 Right now, I think where things are at, we're break-even performance-wise, but there's a lot of stuff that we can do.

06:40 A lot of optimization work that really hasn't even been done yet that will take that performance improvement up pretty drastically.

06:48 It's hard to say where we're going to be, but for 3.13, it's looking pretty good for at least some performance improvement because of the JITing and optimization work.

07:00 >> That's exciting. We have no real JIT at the moment, right?

07:04 >> Not in CPython.

07:05 >> Yeah. I mean, I know there's Numba and different things.

07:08 >> Yeah, it is best.

07:09 >> I know. Well, that's actually super exciting because I feel like that could be another big boost potentially.

07:16 With the JIT, you come to all sorts of things like inlining of small methods and optimization based on type information.

07:23 >> Yeah. All that stuff.

07:25 One of the most exciting parts for me is that a lot of this work, not long after I joined the team, so two and a half years ago, somewhere in there, pretty early on, we started reaching out to other folks, other projects that were interested in performance and performance of Python code.

07:46 We've worked pretty hard to collaborate with them.

07:50 Like the team over at Meta, they have a lot of interest in making sure Python is very efficient.

07:58 We've actually worked pretty closely with them, and they're able to take advantage of a lot of the work that we've done, which is great.

08:05 >> Yeah. There seems to be some synergy between the Cinder team and the faster CPython team.

08:10 Awesome. But let's focus on a part that is there, but not really utilized very much yet, which is the sub-interpreter.

08:20 Back on, when is this, 2019?

08:24 Eric, I had you on and we talked about, can sub-interpreters free us from Python's GIL?

08:29 Then since then, this has been accepted, but it's Anthony's fault that we're here.

08:36 Because Anthony posted over on Mastodon, "Hey, here's a new blog post, me running Python parallel applications with sub-interpreters.

08:43 How about we use Flask and FastAPI and sub-interpreters and make that go fast?" That sounded more available at the Python level than I realized the sub-interpreter stuff was.

08:56 That's super exciting, both of you.

08:58 >> Yeah. It's been fun to play with it and try and build applications on it and stuff like that.

09:03 Working with Eric probably over the last couple of months on things that we've discovered in that process.

09:10 Especially with C extensions.

>> Datetime?

09:14 >> Yeah, that's one.

09:16 With C extensions, and I think that some of those challenges are going to be the same with free threading as well.

09:24 It's how C extensions have state where they put it, whether that's thread safe.

09:31 As soon as you open up the possibility of having multiple GILs in one process, then what challenges does that create?

09:40 >> Absolutely. Well, I guess maybe some nomenclature first: no-GIL Python, sub-interpreters, free-threaded?

09:48 Is that what we're calling it? What's the name?

09:50 How do we speak about this?

09:52 >> It's not quite settled, but I think a lot of people have taken to referring to it as free-threaded.

09:59 >> I can go with that. It sounds pretty good.

10:00 >> People still talk about no-GIL, but free-threaded is probably the best bet.

10:04 >> Are you describing what it does and why you care, or are you describing the implementation?

10:09 The implementation is it has no GIL, so it can be free-threaded, or it has sub-interpreters, so it can be free-threaded.

10:15 But really what you want is the free-threaded part.

10:17 You don't actually care about the GIL too much.

10:20 >> It's interesting with sub-interpreters, it really isn't necessarily a free-threaded model.

10:26 It's free-threaded only in the part at which you're moving between interpreters.

10:34 You only have to care about it when you're interacting between interpreters.

10:37 The rest of the time, you don't have to worry about it.

10:39 >> I think that with the no-GIL, what we think of as the free-threading work, everything is unsafe.

10:48 >> For people who don't know, the no-GIL stuff is what's coming out of the Cinder team and from Sam Gross.

10:52 That was also approved, but with the biggest caveat I've ever seen on an approved PEP.

10:56 >> Yeah.

10:57 >> We approve this, we also reserve the right to completely undo it and not approve it anymore.

11:02 But it's also a compiler flag that is an optional off by default situation.

11:07 Should be interesting.

11:08 >> Yeah, we can maybe compare and contrast them a bit later as well.

11:12 >> Yeah, absolutely. Well, let's start with what is an interpreter?

11:16 Then how do we get to sub-interpreters and then what work did you have to do?

11:20 I heard there was a few global variables that are being shared.

11:23 >> Yeah.

11:24 >> Maybe let's give people a quick rundown of what is this and how is it, this new feature in 3.12 changing things.

11:31 >> Yeah. Sub-interpreters in a Python process, when you run Python, all of everything that happens, all the machinery that's running your Python code is running with a certain amount of global state.

11:45 Historically, you can think of it as across the whole process.

11:50 You've got a bunch of global state.

11:52 If you look at all the stuff like in the sys module, sys.modules or sys.whatever, all those things are shared across the whole runtime.

12:03 If you have different threads, for instance, running it, they all share that stuff even though you're going to have different code running in each thread.

12:10 All of that runtime state is everything that Python needs in order to run.

12:16 But what's interesting is that the vast majority of it, you can think of as the actual interpreter.

12:23 That state, if we treat it as isolated and we're very careful about it, then we can have multiple of them.

12:30 That means that when your Python code runs, that it can run with a different set of this global state, different modules imported, different things going on, different threads that are unrelated and really don't affect each other at all.

12:45 Then with that in mind, you can take it one step farther and say, well, let's completely isolate those and not even have them share a GIL.

12:55 Then at that point, that's where the magic happens.

12:59 That's my first goal in this whole project was to get to that point.

13:05 Because once you get there, then it opens up a lot of possibilities when it comes to concurrency and parallelism.

13:12 >> Then Anthony can start running with his blog posts and showing off things.

13:16 >> Yeah.

13:17 >> Absolutely. This portion of Talk Python to Me is brought to you by the PyBytes Python Developer Mindset Program.

13:27 It's run by my two friends and frequent guests, Bob Belderbos and Julian Sequeira.

13:32 Instead of me telling you about it, let's hear them describe their program.

13:36 >> In a world where AI, machine learning, and large language models are revolutionizing how we live and work, Python stands at the forefront.

13:46 Don't get left behind in this technological evolution.

13:50 >> Tutorial paralysis, that's a thing of the past.

13:54 With PyBytes coaching, you move beyond endless tutorials to become an efficient, skilled Python developer.

14:00 We focus on practical real-world skills that prepare you for the future of tech.

14:05 >> Join us at PyBytes and step into a world where Python isn't just a language, but a key to unlocking endless possibilities in the tech landscape.

14:15 Check out our 12-week PDM program and embark on a journey to Python mastery.

14:20 The future is Python, and with PyBytes, you're one step ahead.

14:25 >> Apply for the Python Developer Mindset today.

14:28 It's quick and free to apply.

14:31 The link is in your podcast player show notes.

14:33 Thanks to PyBytes for sponsoring the show.

14:36 One thing I don't know the answer to, but might be interesting is Python has a memory management story in front of the operating system, virtual memory that's assigned to the process with pools, arenas, blocks, those kinds of things.

14:51 What's that look like with regard to sub-interpreters?

14:54 So each sub-interpreter have its own chunk or set of those for the memory it allocates, or is it still a shared one thing per process?

15:03 >> It's per interpreter.

15:05 This is something that was very global.

15:08 Like you pointed out earlier, this whole project was all about taking all sorts of global state that was actually stored in C global variables all over the place, and pulling those in together into one place and moving those down from the process global state down into each interpreter.

15:28 So one of those things was all of the allocator state that we have for objects.

15:34 Python has this idea of different levels of allocators.

15:39 The object allocator is what's used heavily for Python objects, of course, but some other state as well.

15:45 The object allocator is the part that has all the arenas and everything like you're saying.

15:52 >> Yeah.

15:52 >> So part of what I did before we could make the GIL per interpreter, we had to make the allocator state per interpreter.

16:00 >> Well, the reason I think that it's interesting asking about it, it's one because of the GIL obviously, but the other one is, it seems to me like these sub-interpreters could be used for a little bit of stability or isolation to run some code.

16:13 When that line exits, I want the memory free, I want modules unloaded, I want it to go back to the way it was.

16:20 You know what I mean? Whereas normally in Python, even if the memory becomes free, it's still got some of that like, well, we allocated this stuff, now we're holding onto it to reuse it, and then you don't un-import modules.

16:31 But modules can be pretty intense actually, if they start allocating a bunch of stuff themselves and so on.

16:38 What do you guys think about this as an idea, as an aspect of it?

16:41 >> Yeah, there's one example I've been coming across recently, and this is a pattern.

16:46 I think it's a bit of an anti-pattern actually, but some Python packages, they store some state information in the module level.

16:56 So an example is a SDK that I've been working with, which has just been rewritten to stop it from doing this.

17:03 But you would put the API key of the SDK, you would import it, so you'd import X, and then do like X.APIKey equals.

17:13 So it basically stores the API key in the module object, which is fine if you've imported the module once and you're using it once.

17:24 But what you see is that if you put that in a web application, it just assumes that everyone uses the same key.

17:32 So you can't import that module, and then connect to it with different API keys, like you'd have different users or something.
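
To make that concrete, here's a minimal sketch of the module-level-state anti-pattern being described; `some_sdk` and its `api_key` attribute are hypothetical stand-ins, not a real package.

```python
# Hypothetical SDK that keeps configuration at module level (the anti-pattern).
import some_sdk  # placeholder name, not a real package

some_sdk.api_key = "alice-key"  # module-global: every caller sees this

# Elsewhere in the same process, a second tenant overwrites the shared state:
some_sdk.api_key = "bob-key"    # Alice's requests now silently use Bob's key
```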

17:40 >> So you've got some kind of multi-tenancy, where they would say, enter their ChatGPT, open AI key, and then they could work on behalf of that.

17:51 That potentially something like that, right?

17:52 >> Yeah, exactly. So that's like an API example, but there are other examples where, let's say you're loading data or something, and it stores some temporary information somewhere in a class attribute, or even like a module attribute like that.

18:06 Then if you've got one piece of code loading data, and then in another thread in a web app, or just in another thread generally, you're reading another piece of data and they're sharing state somehow, and you've got no isolation.

18:20 Some of that is due to the way that people have written the Python code, or the extension code, has been built around, oh, we'll just put this information here, and they haven't really thought about the isolation.

18:34 Sometimes it's because on the C level especially, that because the GIL was always there, they've never had to worry about it.

18:42 So you could just have a counter for example, or there's an object which is a dictionary that is a cache of something.

18:51 You just put that as a static variable, and you just read and write from it.

18:56 You've never had to worry about thread safety because the GIL was there to protect you.

19:00 You probably shouldn't have built it that way, but it didn't really matter because it worked.

19:05 >> What about this, Anthony?

19:06 What if we can write it on one line, it'll probably be safe.

19:10 If we can fit it just one line of Python code, it'll be okay?

19:13 >> Yeah.

19:15 >> Dictionary.add, what's wrong there?

19:17 Dictionary.get, it's fine.

19:19 >> Yeah. So yeah, what we're saying, within sub-interpreters, I think what's the concept that people will need to understand is where the isolation is, because there are different models for running parallel code.

19:33 At the moment, we've got coroutines, which is asynchronous, so it can run concurrently.

19:40 So that's if you do async and await, or if you use the old coroutine decorator.

19:45 You've also got things like generators, which are a concurrency pattern.

19:50 You've got threads that you can create.

19:53 All of those live within the same interpreter, and they share the same information.

19:59 So if you create a thread, inside that thread, you can read a variable from outside of that thread, and it doesn't complain.

20:08 You don't need to create a lock at the moment, although in some situations, you probably should.

20:14 You don't need to re-import modules and stuff like that, which can be fine.

20:20 Then at the other extreme, you've got multiprocessing, which is a module in the standard library that allows you to create extra Python processes, and then gives you an API to talk to them and share information between them.

20:33 That's the other extreme, which is the ultimate level of isolation.

20:40 You've got a whole separate Python process.

20:42 But instead of interacting with it via the command line, you've got this nicer API where you can almost treat it like it's in the same process as the one you're running from.

20:53 >> It's kind of magical actually that you get a return value from a process, for example.

20:59 >> But the thing is, if you pull back the covers a little bit, then how it sends information to the other Python process involves a lot of pickles, and it's not particularly efficient.

21:11 Also, a Python process has a lot of extra stuff that you maybe necessarily didn't even need.

21:18 You get all this isolation from having it, but you have to import all the modules again, you have to create the arenas again, all the memory allocation, you have to do all the startup process again, which takes a lot of time, like at least 200 milliseconds.

21:31 >> Parsing the Python code again, at least the PYC.

21:34 >> Yeah, exactly. You basically created a whole separate Python.

21:39 If you do that just to run a small chunk of code, then it's not probably the best model at all.

21:45 >> Yeah. You have a nice graph as well that shows the rate as you add more work and you need more parallelism.

21:52 We'll get to that, I'm sure.

21:54 One thing that struck me coming to Python from other languages like C, C++, and C#: there are very few locks and events, thread-coordination constructs, in Python.

22:06 I think that there's probably a ton of Python code that actually is not actually thread safe, but people get away with it because the context switching is so coarse grained.

22:17 You say, well, the GIL's there, so you only run one instruction at a time, but there's this temporary invalid state you enter into as part of your code running: took money out of this account and then I'm going to put it into that account.

22:29 Those are multiple Python lines and there's nothing saying they couldn't get interrupted between one to the other and then things are busted.

22:36 I feel there's some concern about adding this concurrency, like, "Oh, we're going to have to worry about it." You probably should be worrying about it now.

22:43 Not as much necessarily, but I feel like people are getting away with it because it's so rare, but it's a non-zero possibility. What do you guys think?
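
A minimal sketch of the kind of race being described here; the account names are illustrative. Each line is fine on its own, but a thread switch between them exposes a broken in-between state:

```python
import threading

accounts = {"checking": 100, "savings": 0}

def transfer(amount):
    # Two separate statements: the GIL can hand control to another thread
    # between them, exposing a state where the money is in neither account.
    accounts["checking"] -= amount
    accounts["savings"] += amount

# The conventional fix is an explicit lock around the critical section:
lock = threading.Lock()

def safe_transfer(amount):
    with lock:
        accounts["checking"] -= amount
        accounts["savings"] += amount
```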

22:52 >> Yeah. Those are real concerns.

22:55 There's been lots of discussion with the no-GIL work about really what matters, what we need to care about, really what impact it's going to have.

23:06 It's probably going to have some impact on people with Python code, but it'll especially have impact on people that maintain extension modules.

23:16 But it really is all the pain that comes with free threading.

23:22 That's what it introduces with the benefits as well, of course.

23:28 But what's interesting, I'd like to think of sub-interpreters provide the same facility, but they force you to be explicit about what gets shared, and they force you to do it in a thread-safe way.

23:43 You can't do it without thread safety, and so it's not an issue.

23:48 It doesn't hurt that people really haven't used sub-interpreters extensively up till now, whereas threads are something that's been around for quite a while.

23:58 >> Yeah, it has been.

23:59 Well, sub-interpreters have traditionally just been a thing you can do from C extensions, or the C API, which really limits them from being used in just a standard, I'm working on my web app, so let's just throw in a couple of sub-interpreters.

24:14 3.13, is that when we're looking at having a Python-level API for creating and interacting with them?

24:20 >> Yeah, I've been working on a PEP for that, PEP 554; recently created a new PEP to replace that one, which is PEP 734.

24:31 That's the one. That's the one that I'm targeting for 3.13.

24:36 It's pretty straightforward, create interpreters and look at them and with an interpreter, run some code, pretty basic stuff.

24:47 Then also, because sub-interpreters aren't quite so useful if you can't cooperate between them, there's also a queue type that you push stuff on and you pop stuff off, just pretty basic.

25:00 >> So you could write something like await q.pop() or something like that. Excellent.

25:06 >> Yeah.
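
As a rough sketch of the PEP 734 surface being described (the exact names were still in flux at recording time, so treat this as illustrative, not final):

```python
# Sketch of the PEP 734 draft interface; the method names (exec_sync,
# run, create_queue) were still being settled, so this is illustrative.
import interpreters  # the stdlib module PEP 734 proposes to add

interp = interpreters.create()              # fresh interpreter, its own GIL
interp.exec_sync("print('hello from a sub-interpreter')")

queue = interpreters.create_queue()         # cross-interpreter queue
queue.put((1, 2.0, "three"))                # shareable immutable objects
print(queue.get())
```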

25:06 >> Yeah, this is really cool.

25:08 The other thing that I wanted to talk about here, looks like you already have it in the pep, which is excellent.

25:13 Somehow I missed that before, is that we have thread pool executors, we have multi-processing pool executors, and this would be an interpreter pool executor.

25:23 What's the thinking there?

25:25 >> People are already familiar with using concurrent futures.

25:28 So if we can present the same API for sub-interpreters, it makes it really easy because you can set it up with multi-processing or threads and switch it over to one of the other pool types without a lot of fuss.

25:41 >> Right. Basically, with a clever import statement, you could take it right from whatever import, like multi-processing pool executor as pool executor or interpreter pool executor as pool executor and then the rest of the code could stay potentially.

25:54 >> Yeah.
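
Concretely, the swap might look something like this; InterpreterPoolExecutor is the proposed name and didn't exist yet when this was recorded:

```python
from concurrent.futures import ThreadPoolExecutor as PoolExecutor
# If/when the proposed pool lands, the only line that changes would be:
# from concurrent.futures import InterpreterPoolExecutor as PoolExecutor

def work(n):
    return n * n

with PoolExecutor(max_workers=4) as pool:
    print(list(pool.map(work, range(10))))  # rest of the code stays the same
```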

25:55 >> What about the communication?

25:57 It's got to be a basic situation because there are assumptions.

26:01 >> Yeah. It should work mostly the same way that you already use it with threads and multi-processing.

26:09 But we'll see. There's some limitations with sub-interpreters currently that I'm sure we'll work on solving as we can.

26:17 So we'll see.

26:19 It may not be quite as efficient as I'd like at first with the interpreter pool executor, because we'll probably end up doing some pickling stuff like multi-processing does.

26:28 Although I expect it'll be a little more efficient.

26:31 >> This portion of Talk Python to me is brought to you by Sentry.

26:35 You know Sentry for their error monitoring service, the one that we use right here at Talk Python.

26:39 But this time, I want to tell you about a new and free workshop.

26:43 Taming the Kraken, managing a Python monorepo with Sentry.

26:47 Join Salma Alam-Naylor, Senior Developer Advocate at Sentry, and David Winterbottom, Head of Engineering at Kraken Technologies, for an inside look into how he and his team develop, deploy, and maintain a rapidly evolving Python monorepo with over 4 million lines of code that powers the Kraken utility platform.

27:08 In this workshop, David will share how his department of 500 developers, who deploy around 200 times a day, use Sentry to reduce noise, prioritize issues, and maintain code quality without relying on a dedicated QA team.

27:21 You'll learn how to find and fix root causes of crashes, ways to prioritize the most urgent crashes and errors, and tips to streamline your workflow.

27:30 Join them for free on Tuesday, February 27th, 2024 at 2 a.m. Pacific time.

27:35 Just visit talkpython.fm/sentry-monorepo.

27:40 That link is in your podcast player show notes.

27:42 2 a.m. might be a little early here in the U.S., but go ahead and sign up anyway if you're a U.S. listener, 'cause I'm sure they'll email you about a follow-up recording as well.

27:52 Thank you to Sentry for supporting this episode.

27:54 I was gonna save this for later, but I think maybe it's worth talking about now.

27:59 So first of all, Anthony, you wrote a lot about, and have actually had some recent influence on, what you can pass across between, say, the starting code and the running sub-interpreter that's doing the extra work.

28:10 Wanna talk about what data exchange there is?

28:13 - Yeah, so when you're using any of these models, multiprocessing, sub-interpreters, or threading, I guess you've got three things to worry about.

28:23 One is how do you create it in the first place?

28:26 So how do you create a process?

28:28 How do you create an interpreter?

28:29 How do you create a thread?

28:31 The second thing is how do you send data to it?

28:33 'Cause normally, the reason you've created them is 'cause you need it to do some work.

28:38 So you've got the code, which is when you spawn it, when you create it.

28:43 The code that you want it to run, but that code needs some sort of input, and that's probably gonna be Python objects.

28:49 It might be reading files, for example, or listening to a network socket, so it might be getting its input from somewhere else.

28:57 But typically, you need to give it parameters.

29:00 Now, the way that works in multiprocessing is mostly reliant on pickle.

29:05 So if you start a process and you give it some data, either as a parameter or you create a queue, and you send data down the queue or the pipe, for example, it pickles the data.

29:20 So you can put a Python object in.

29:22 It uses the pickle module.

29:24 It converts that into a byte string, and then it basically converts the byte string on the other end back into objects.

29:30 That's got its limitations because not everything can be pickled.

29:34 And also, some objects, especially if you've got an object which has got objects in it, and it's deeply nested, or you've got a big, complicated dictionary or something that's got all these strange types in it which can't necessarily be rehydrated just from a byte string.

29:52 An alternative, actually, I do want to point out, 'cause for people who come across this issue quite a lot, there's another package called dill on PyPI.

30:02 So if you think of pickle, think of dill.

30:06 Dill is very similar to pickle.

30:08 It has the same interface, but it can pickle slightly more exotic objects than pickle can.

30:14 So often, if you find that you've tried to pickle something, you try to share it with a process or a subinterpreter, and it comes back and says, "This can't be pickled," you can try dill and see if that works.
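
A hedged sketch of that try-pickle-then-dill fallback (dill is the third-party package mentioned above):

```python
import pickle

def serialize(obj):
    # Try the stdlib pickler first; fall back to dill (pip install dill),
    # which has the same dumps/loads interface but handles more objects.
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        import dill
        return dill.dumps(obj)
```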

30:27 So yeah, that's the typical way of doing it, is that you would pickle an object, and then on the other end, you would basically unpickle it back into another object.

30:38 The downside of that is that it's pretty slow.

30:40 It's equivalent, like if you use the JSON module in Python, it's kind of similar, I guess, to converting something into JSON and then converting it from JSON back into a dictionary on the other end.

30:52 Like, it's not a super efficient way of doing it.

30:56 So subinterpreters have another mechanism, and I haven't read PEP 734 yet. (laughs)

31:03 So I don't know how much of this is in the new PEP, Eric, or if it's in the queue, but there's a--

31:10 - It's much the same.

31:11 - Okay, it's much the same.

31:12 So there's another mechanism with subinterpreters, because they share the same process, whereas multiprocessing doesn't, they're separate processes.

31:22 Because they share the same process, you can basically put some data in a memory space, which can be read from a separate interpreter.

31:30 Now you need to be, well, Python needs to be really careful.

31:33 You don't need to worry too much about it, 'cause that complexity's done for you.

31:38 But there are certain types of objects that you can put in as parameters.

31:41 You can send either as startup variables for your subinterpreter, or you can send via a pipe, basically, backwards and forwards between the interpreters.

31:52 And these are essentially all the immutable types in Python, which are unicode strings, byte strings, bool, None, int, float, and tuples.

32:05 And you can do tuples of tuples as well.

32:07 - And it seems like the tuple part had something that you added recently, right?

32:13 It says, "I implemented tuple sharing just last week." - Yeah, that's in now.

32:18 I really wanted to use it, and I kept complaining that it wasn't there, so I thought instead of complaining, I might as well talk to Eric and work out how to implement it.

32:29 - Yeah, that's awesome.

32:30 - But yeah, you can't share dictionaries, that's one thing.

32:31 - Yeah, exactly.

32:32 So one thing that I thought might be awesome, are you familiar with msgspec?

32:36 You guys seen msgspec?

32:37 It's like Pydantic in the sense that you create a class with types, but the parsing performance is much, much faster: 80 times faster than Pydantic, 10 times faster than mashumaro and cattrs and so on, and faster still even than, say, the json module or ujson.

32:58 So maybe it makes sense to use this, turn it into its serialization format bytes, send the bytes over and then pull it back, I don't know.

33:04 Might give you a nice structured way.

33:05 - Yeah, you can share byte strings.

33:07 So instead of pickle, you can use something like msgspec to serialize something into a byte string and then receive it on the other end and rehydrate it.

33:18 - Or even Pydantic, like Pydantic is awesome as well.

33:20 Just, this is meant to be super fast with a little bit of less behavior, right?
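
As a sketch of the idea, serializing a structured object to bytes with msgspec so it can cross the interpreter boundary (the queue/channel send itself is omitted):

```python
import msgspec

class Job(msgspec.Struct):
    url: str
    retries: int

# Encode to a compact byte string; bytes are a shareable type, so the
# payload can cross the interpreter (or process) boundary cheaply.
payload = msgspec.msgpack.encode(Job(url="https://example.com", retries=3))
job = msgspec.msgpack.decode(payload, type=Job)  # rehydrate on the other side
```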

33:24 - Yeah, so this is a kind of a design thing.

33:27 I think people need to consider when they're like, great, I can run everything in parallel now.

33:32 But you have to kind of unwind and think about how you've designed your application.

33:36 Like at which point do you fork off the work?

33:41 And how do you split the data?

33:44 You can't just kind of go into it assuming, oh, we'll just have a pool of workers and we've kind of got this shared area of data that everybody just reads from.

33:53 - Yeah, I'll pass it a pointer to a million entry list and I'll just run with it.

33:57 - Yeah, 'cause I mean, in any language, you're gonna get issues if you do that.

34:02 Even if you've got shared memory and it's easier to read and write to different spaces, you're gonna get issues with locking.

34:08 And I think it's also important with free threading.

34:10 If you read the spec or kind of follow what's happening with free threading, it's not like the GILs disappeared.

34:18 The GILs been replaced with other locks.

34:20 So there are still going to be locks.

34:24 You can't just have no locks.

34:26 If you've got things running in parallel.

34:27 - Especially cross threads, right?

34:27 Like it moves some of the reference counting stuff into, well, it's fast on the owning thread, the same thread, but if it goes to another, it has to kick in another, more thread-safe case that potentially is slower and so on.

34:39 - Yeah.

34:40 So yeah, the really important thing with sub-interpreters is that they have their own, well, have their own GIL.

34:45 So each one has its own lock.

34:48 So they can run fully in parallel just as they could with multi-processing.

34:52 So I feel like a closer comparison with sub-interpreters is multi-processing.

34:57 - Yeah, absolutely.

34:58 - 'Cause they basically run fully in parallel.

35:01 If you start four of them and you have four cores, each core is gonna be busy doing work.

35:05 You start them, you give them data, you can interact with them whilst they're running.

35:11 And then when they're finished, they can close and they can be destroyed and cleaned up.

35:17 So it's much closer to multi-processing, but the big, kind of the big difference is that the overhead both on the memory and CPU side of things is much smaller.

35:27 Separate processes with multi-processing are pretty heavyweight, they're big workers.

35:33 And then the other thing that's pretty significant is the time it takes to start one.

35:38 So starting a process with multi-processing takes quite a lot of time and it's significantly, I think it's like 20 or 30 times faster to start a sub-interpreter.

35:48 - You have a bunch of graphs for it somewhere.

35:51 There we go.

35:52 - Yeah.

35:52 - So I scrolled past it, there we go.

35:54 It's not exactly the same, but kind of captures a lot of it there.

35:59 So one thing that I think is exciting, Eric, is the interpreter pool, sub-interpreter pool, because a lot of the difference between the threading and the sub-interpreter performance is that startup of the new arenas and like importing the standard library, all that kind of stuff that still is gonna happen.

36:17 But once those things are loaded up in the process, they could be handed work easily, right?

36:21 And so if you've got a pool of, you know, like say that you have 10 cores, you've got 10 of them just chilling or however many, you know, you've sort of done enough work to like do in parallel, then you could have them laying around and just send like, okay, now I want you to run this function.

36:34 And now I want you to run this.

36:35 And that one means go call that API and then process it.

36:38 And I think you could get the difference between threading and sub-interpreters a lot lower by having them kind of reused basically.

36:45 - Yep, absolutely.

36:46 - Yeah.

36:47 - There's some of the, the key difference I think is mostly that when you have mutable data, whereas with threads, you can share it.

36:57 So threads can kind of talk to each other through the data that they share with each other.

37:02 Whereas with sub-interpreters, there are a lot of restrictions and I expect we'll work on that to an extent, but it's also part of the programming model.

37:11 And like Anthony was saying, if you really want to take advantage of parallelism, you need to think about it.

37:17 You need to actually be careful about your data and how you're splitting up your work.

37:22 - I think there's gonna be design patterns that we come to know or conventions we come to know, like, let's suppose I need some calculation and I'm gonna use it in a for loop.

37:31 You don't run the calculation if it's the same over and over every time through the loop, you run it and then you use the result, right?

37:37 So in this, a similar thing here would be like, well, if you're gonna process a bunch of data and the data comes from a say a database, don't do the query and hand it all the records, just tell it, go get that data from the database.

37:49 That way it's already serialized in the right process and there's not this cross serialization through either pickling or whatever mechanism you come up with, right?

37:58 But like, try to think about when you get the data, can you delay it until it's in the sub process and our sub interpreter rather and so on, right?

38:06 - Yeah, definitely.

38:07 One interesting thing is that PEP 734, I've included memory view as one of the types that's supported.

38:16 So basically you can take a memory view of any kind of object that implements the buffer protocol.

38:23 So like NumPy arrays and stuff like that and pass that memory view through to another interpreter and you can use it and it doesn't make a copy or anything.

38:32 It actually uses the same underlying data.

38:35 They actually get shared.
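
A hedged sketch of what that enables; the queue call is the proposed PEP 734 usage, shown commented out since the API wasn't final:

```python
import array

data = array.array("d", [0.0] * 1_000_000)  # any buffer-protocol object
view = memoryview(data)

# Proposed PEP 734 usage (commented out, since the API wasn't final):
# the receiving interpreter gets a view onto the *same* underlying buffer,
# so no copy and no pickling takes place.
# queue.put(view)
```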

38:36 - Oh, that's interesting.

38:37 - Yeah, so there's, and I think there's even more room for that with other types, but we're starting small.

38:45 But the key thing there is that, like you're saying, I mean, with coming up with different models and patterns and libraries, I'm sure they'll come up as people feel out really what's the easiest way to take advantage of these features.

39:03 And that's the sort of thing that will apply not just to general free-threaded like no-gil, but also sub-interpreters.

39:11 - Definitely, it's gonna be exciting.

39:12 So I guess I wanna move on and talk about working with this in Python and the stuff that you've done, Anthony, but first a quick comment from the audience: Jazzy asked, "Is this built on top of a queue, which is built on top of linked lists?" Because I'm building this and my research led me to these data structures.

39:29 I guess that's the communication across sub-interpreter, cross-interpreter communication.

39:34 - Yeah, with sub-interpreters, like in PEP 734, the queue implements the same interface as the queue from the queue module.

39:42 But there's no reason why people couldn't implement whatever data structure they want for communicating between sub-interpreters.

39:50 And then that data structure is in charge of preserving thread safety and so forth.

39:55 - Yep, excellent.

39:56 Yeah, it's not a standard queue.

39:57 It's like a concurrent queue or something along those lines.

40:00 - Yeah. - Yeah.

40:01 All right, so all of this we've been talking about here is we're looking at this cool interpreter pool executor stuff that's in draft format, Anthony, for 3.13.

40:12 And somehow I'm looking at this running Python parallel applications and sub-interpreters that you wrote.

40:18 (Anthony laughs)

40:19 What's going on here?

40:20 How do you do this magic?

40:21 - You need to know the secret password.

40:24 So in Python-- - Right there, yeah.

40:27 - In Python 3.12, the C API for creating sub-interpreters was included.

40:36 And a lot of the mechanism for creating sub-interpreters was included.

40:42 So there's also a, in CPython, there's a standard library, which I think everybody kind of knows.

40:50 And then there are some hidden modules, which are mostly used for testing.

40:56 So not all of them get bundled, I think, in the distribution.

41:00 I think a lot of the test modules get taken out.

41:03 But there are some hidden modules you can use for testing, 'cause a lot of the test suite for CPython has to test C APIs, and nobody really wants to write unit tests in C.

41:14 So they write the tests in Python, and then they kind of create these modules that basically just call the C functions.

41:21 And so you can get the test coverage and do the testing from Python code.

41:25 So I guess, what was it, PEP 554? I can't remember, I look at too many PEPs, Eric will probably know.

41:34 What is now PEP 734?

41:38 - But the Python interface to create subinterpreters, a version of that was included in 3.12.

41:45 So you can import this module called _xxsubinterpreters.

41:49 And it's called _xx 'cause it kind of indicates that it's experimental and it's underscore 'cause you probably shouldn't be using it.

41:58 - It's not safe for work to me.

41:59 I mean, I don't know.

42:01 - But it provides a good way of people actually testing this stuff and seeing what happens if I import my C extension from a subinterpreter.
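
On Python 3.12, that minimal experiment looks roughly like this; the module is private and experimental, so these names may change or disappear:

```python
# Python 3.12 only: a private, experimental module; names may change.
import _xxsubinterpreters as interpreters

interp = interpreters.create()      # new interpreter in this process
interpreters.run_string(interp, "print('hello from a sub-interpreter')")
interpreters.destroy(interp)        # tear it down again
```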

42:13 So that's kind of some of what I've been doing is looking at, okay, what can we try and do in parallel?

42:20 And this blog post, I wanted to try a WSGI or an ASGI web app.

42:28 And the typical pattern that you have at the moment, and I guess how a lot of people would be using parallel code without really realizing it, is when you deploy a web app for Django, Flask, or FastAPI, you can't have just one GIL per web server, because if you've got one GIL for the whole server, you can only have one user per website, which is not great.

42:54 So the way that most web servers implement this is that they have a pool of workers.

42:59 G-Unicorn does that by spawning Python processes and then using the multiprocessing module.

43:07 So it basically creates multiple Python processes all listening to the same socket.

43:12 And then when a web request comes in, one of them takes that request.

43:17 It also then inside that has a thread pool.

43:19 So basically a thread pool is better for concurrent code.

43:25 So Gunicorn normally is used in a multi-worker, multi-thread model.

43:30 That's how we kind of talk about it.

43:32 So you'd have the number of workers that you have CPU cores, and then inside that you'd have multiple threads.

43:40 So it kind of means you can handle more requests at a time.

43:43 If you've got eight cores, you can handle at least eight requests at a time.

43:48 However, because most web code can be concurrent on the backend, like you're making a database query or you're reading some stuff from a file like that, that doesn't necessarily need to hold the GIL.

44:00 So you can run it concurrently, which is why you have multiple threads.

44:05 So even if you've only got eight CPU cores, you can actually handle 16 or 32 web requests at once because some of them will be waiting for the database server to finish running a SQL query or the API that it called to actually reply.

44:20 So what I wanted to do with this experiment was to look at the multi-worker, multi-thread model for web apps and say, okay, could the worker be a sub-interpreter and what difference would that make?

44:35 So instead of using multi-processing for the workers, could I use sub-interpreters for the workers?

44:41 So even though the Python interface in 3.12 was experimental, I basically wanted to adapt Hypercorn, which is a web server for ASGI and WSGI apps in Python, and basically start Hypercorn workers from a sub-interpreter pool and then see if I can run Django, Flask, and FastAPI in a sub-interpreter.

45:04 So a single process, a single Python process, but running across multiple cores and listening to web requests and basically running and serving web requests with multiple GILs.

45:15 So that was the task.

45:16 - So in the article, you said you had started with Gunicorn and they just made too many assumptions about the web workers being truly sub-processes, but Hypercorn was a better fit, you said.

45:28 - Yeah, it was easier to implement this experiment in Hypercorn.

45:33 It had like a single entry point because when you start an interpreter, when you start a sub-interpreter, you need to import the modules that you want to use.

45:42 You can't just say, run this function over here.

45:46 You can, but if that function relies on something else that you've imported, you need to import that from the new sub-interpreter.

45:53 So what I did with this experiment was basically start a sub-interpreter that imports Hypercorn, listens to the sockets, and then is ready to serve web requests.

46:04 - Interesting, okay.

46:05 And at a minimum, you got it working, right?

46:07 - Yeah, it did a hello world.

46:09 So we got that working.

46:13 So I was, yeah, pleased with that.

46:15 And then kind of started doing some more testing of it.

46:19 So, you know, how many concurrent requests can I make at once?

46:22 How does it handle that?

46:23 What does my CPU core load look like?

46:26 Is it distributing it well?

46:27 And then kind of some of the questions are, you know, how do you share data between the sub-interpreters?

46:35 So the minimum I had to do was each sub-interpreter needs to know which socket it should be listening to.

46:42 So like which network socket, once I've started, what port is it running on?

46:46 And is it running on multiple ports?

46:49 And which one should I listen to?

46:50 So yeah, that's the first thing I had to do.

46:52 - Nice.

46:53 Can we just tell people real quick about just like, what are the commands like at the Python level that you look at in order to create an interpreter, run some code on it and so on?

47:02 What's this weird world look like?

47:04 - Eric, do you wanna cover that?

47:06 - Yeah, there is a whole lot.

47:08 I mean, if we talk about PEP 734, you have an interpreters module with a create function in it that returns you an interpreter object.

47:18 And then once you have the interpreter object, it has a method called run, and the interpreter object also has a method called exec.

47:28 I'm trying to remember what it is.

47:30 Exec_sync, because it's synchronous with the current thread.

47:34 Whereas run will create a new thread for you and run things in there.

47:40 So there's kind of different use cases.

47:42 But it's basically the same thing.

47:43 You have some code; currently it supports you giving it a string with all your code in it, like you loaded it from a file or something.

47:53 Basically, it's a script.

47:55 It's gonna run in that subinterpreter.

47:57 Alternately, you can give it a function.

48:00 And as long as that function isn't a closure, doesn't have any arguments and stuff like that.

48:05 So it's just like really basic, basically a script.

48:09 If you got something like that, you can also pass that through, and then it runs it.

48:14 And that's just about it.

48:16 If you wanna get some results back, you're gonna have to manually pass them back kind of like you do with threads.

48:21 But that's something people already understand pretty well.

48:24 - Right, and create one of those channels, and then you just wait for it to exit and then read from the channel, something like that.

48:29 Yeah, and so there's a way to say things like just run.

48:33 And there's also a way to say, create an interpreter, and then you could use the interpreter to do things.

48:38 And that lets you only pay the process startup cost once, right?

48:44 - Yeah, yeah, and you can also call run multiple times.

48:49 And each time it kind of adds on to what ran before.

48:52 So if you run some code that modifies things or import some modules and that sort of thing, those will still be there the next time you run some code in that interpreter, which is nice 'cause then if you got some startup stuff that you need to do one time, you can do that ahead of time right after you create the interpreter.

49:10 But then in kind of your loop in your worker, then you run again and all that stuff is ready to go.
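
Sketching that with the draft API from earlier (illustrative, assuming PEP 734's shape): state set up by one call is still there for the next.

```python
import interpreters  # the proposed PEP 734 module, as sketched earlier

interp = interpreters.create()
interp.exec_sync("import json; cache = {}")  # one-time startup work

# Later, in the worker loop: the import and the cache are still in place.
interp.exec_sync("cache['n'] = cache.get('n', 0) + 1; print(cache['n'])")
```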

49:16 - Oh, that's interesting.

49:17 'Cause when I think about say my web apps, a lot of them talk to MongoDB and use Beanie and you go to Beanie and you tell it to like create a connection or a MongoDB client pool and it does all that stuff and then you just ambiently talk to it.

49:31 Like go to that, you know, kind of like Django or whatever.

49:33 Go to that class and do a query on it.

49:35 You could run that startup code like once potentially and have that pool just hanging around for subsequent work.

49:41 Nice.

49:42 All right, let's see some more stuff.

49:44 So you said you got it working pretty well, Anthony.

49:48 And you said one of the challenges was trying to get it to shut down, right?

49:51 - Mm.

49:52 - Yeah.

49:53 - Yeah, so in Python, when you start a Python process, you can press Control + C to quit, which is a keyboard interrupt.

50:01 That kind of sends the interrupt in that process.

50:06 All of these web servers have got like a mechanism for cleanly shutting down.

50:12 'Cause you don't wanna just, if you press Control + C, you don't wanna just terminate the processes.

50:17 'Cause when you write an ASGI app in particular, you can have like events that you can do.

50:22 So people who've done FastAPI probably know the on-event decorators that you can put in and say, when my app starts up, create a database connection pool, and when it shuts down, then go and clean up all this stuff.

50:35 So if the web servers decided to shut down for whatever reason, whether you've pressed Control + C or it just decided to close for whatever reason, it needs to tell all the workers to shut down cleanly.
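
For readers who haven't seen them, the FastAPI event hooks being referenced look like this; `make_pool` is a placeholder for real connection-pool setup:

```python
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    app.state.db_pool = await make_pool()  # make_pool: placeholder setup

@app.on_event("shutdown")
async def shutdown():
    await app.state.db_pool.close()        # clean up before the worker exits
```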

50:48 So signals, like the signal module, don't work between sub-interpreters because it sits in the interpreter state, from what I understand.

51:01 So what I did was basically use a channel so that the main worker, like the coordinator, when that had a shutdown request, it would send a message to all of the sub-interpreters to say, okay, can you stop now?

51:15 And then it would kick off a job, basically tell Hypercorn in this case, to shut down cleanly, call any shutdown functions that you might have and then log a message to say that it's shutting down as well because the other thing is with web servers, if it just terminated immediately and then you looked at your logs and you were like, okay, why did the website suddenly stop working?

51:38 And there was no log entries.

51:39 It just went from I'm handling requests to just absolute silence.

51:45 That also wouldn't be very helpful.

51:47 So it needs to write log messages, it needs to call like shutdown functions and stuff.

51:51 So what I did was, and this is, I guess, where it's kind of turtles all the way down, but inside the sub-interpreter, I start another thread, because if you have a poller which listens for a signal on a channel, that's a blocking operation.

52:09 So at the bottom of my sub-interpreter code, I've got, okay, run Hypercorn.

52:15 So it's gonna run, it's gonna listen to sockets for web requests, but I need to also be able to run concurrently in the sub-interpreter a loop which listens to the communication channel and sees if a shutdown request has been sent.

52:30 So this is kind of maybe an implementation detail of how interpreters work in Python, but interpreters have threads as well.

52:39 So you can start threads inside interpreters.

52:42 So similar to what I said with Gunicorn and Hypercorn, how you got multi-worker, multi-thread, like each worker has its own threads.

52:50 In Python, interpreters have the threads.

52:53 So you can start a sub-interpreter and then inside that sub-interpreter, you can also start multiple threads and you can do coroutines and all that kind of stuff as well.

53:04 So basically what I did is to start a sub-interpreter which also starts a thread and that thread listens to the communication channel and then waits for a shutdown request.
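
A sketch of that worker structure; `channel` and `server` stand in for the experimental cross-interpreter channel and the Hypercorn server object:

```python
import threading

def watch_for_shutdown(channel, server):
    # Blocking receive loop; runs in its own thread inside the sub-interpreter.
    while True:
        if channel.recv() == "shutdown":  # hypothetical channel API
            server.shutdown()             # ask Hypercorn to stop cleanly
            break

# `channel` and `server` are created during worker startup (not shown).
threading.Thread(target=watch_for_shutdown, args=(channel, server),
                 daemon=True).start()
server.run()  # blocks here, serving requests until shutdown
```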

53:12 - Right, tells Hypercorn, all right, you're done.

53:16 We're out of here.

53:17 Yeah, okay, interesting.

53:18 Here's an interesting question from the audience, from Chris as well.

53:22 It says, "We talked about the global kind of startup, like if you run that once, it'll already be set.

53:27 And does that make code somewhat non-deterministic in the sub-interpreter?" I mean, if you explicitly work with it, no.

53:34 But if you're doing the pool, like which one do you get?

53:36 Is it initialized or not?

53:38 Eric, do you have an idea of a startup function that runs in the interpreter pool executor type thing or is it just they get doled out and they run what they run?

53:48 - With concurrent.futures, it's already kind of a pattern.

53:54 You have an initializer function that you can call that'll do the right thing, and then you have your task that the worker's actually running.

54:04 I don't know.

54:09 I wouldn't say it's non-deterministic unless you have no control over it.

54:14 I mean, if you wanna make sure that state progresses in an expected way, then you're gonna run your own sub-interpreters, right?

54:23 But if you have no control over the sub-interpreters, you're just like handing off to some library that's using sub-interpreters.

54:28 I would think it'd be somewhat not quite so important about whether it's deterministic or not.

54:35 I mean, each time it runs, there are a variety of things.

54:41 The whole thing could be kind of reset, or you could make sure that any part of your code that runs is careful to keep its state self-contained, and therefore you preserve deterministic behavior that way.

54:58 - What I do a lot is I'll write code that'll say, if this is already initialized, don't do it again.

55:05 So I talked about the database connection thing.

55:07 If somebody were to call it twice, it'll say, well, looks like the connection's already not none, so we're good.

55:13 You could just always run the startup code with one of these short circuit things that says, hey, it looks like on this interpreter, this is already done, we're good.

55:21 But that would probably handle a good chunk of it right there.
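
That short-circuit guard is simple to write; a minimal sketch, with `make_client` as a placeholder:

```python
_client = None

def get_client():
    # Idempotent startup: safe to call on every task; only the first call
    # in a given interpreter actually creates the connection.
    global _client
    if _client is None:
        _client = make_client()  # make_client: placeholder for real setup
    return _client
```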

55:26 But we're back to this thing that Anthony said, right?

55:28 Like we're gonna learn some new programming patterns potentially, yeah, quite interesting.

55:33 So we talked at the beginning about how sub-interpreters have their own memory and their own module loads and all those kinds of things.

55:40 And that might be potentially interesting for isolation.

55:42 Also kind of tying back to Chris's comment here, this isolation is pretty interesting for testing, right, Anthony? Like my tests?

55:51 So another thing you've been up to is working with trying to run pytest sessions in sub-interpreters.

55:56 Tell people about that.

55:57 - Yeah, so I started off with a web worker.

56:01 One of the things I hit with a web worker was that I couldn't start Django applications, and realized the reason was the datetime module.

56:13 So the C, the Python standard library, some of the modules are implemented in Python, some of them are implemented in C, but some of them are a combination of both.

56:23 So some modules you import in the standard library have like a C part that's been implemented in C for performance reasons typically, or 'cause it needs some special operating system API that you can't access from Python.

56:36 And then the front end is Python.

56:38 So there is a list basically of standard library modules that are written in C that had some sort of global state.

56:48 And then the core developers have been going down that list and fixing them up so that they can be imported from a sub-interpreter, or just marking them as not compatible with sub-interpreters.

57:01 One such example was the Readline module that Eric and I were kind of working on last week and the week before.

57:09 Readline is used for, I guess, listening to user input.

57:13 So if you run the input built-in, Readline is one of the utilities it uses to listen to keyboard input.

57:21 If you start, let's say you started five sub-interpreters at the same time and all of them did a Readline listen for input, like what would you expect the behavior to be?

57:30 Which, when you type on the keyboard, which... - Yeah, exactly.

57:33 - Yeah, where would you expect the letters to come out?

57:35 So it kind of poses an interesting question.

57:38 So Readline is not compatible with sub-interpreters, but we discovered like it was actually sharing a global state.

57:46 So when it initialized, it would install like a callback.

57:50 And what that meant was that even though it said it's not compatible, if you started multiple sub-interpreters that imported readline, it would crash Python itself.

58:00 The datetime module is another one that needs fixing.

58:05 It installs a bunch of global state.

58:10 So what I wanted to do is to try and test some other C extensions that I had.

58:16 And just basically write a pytest extension, or pytest plugin, I guess, where you've got an existing pytest suite but you want to run all of that in a sub-interpreter.

58:28 And the goal of this is really that you're developing a C extension, you've written a test suite already for pytest, and you want to run that inside a sub-interpreter.
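The core of that idea might look something like this sketch, again using the provisional 3.12 module; the test path is just a placeholder:

```python
import _xxsubinterpreters as interpreters

interp = interpreters.create()
try:
    # Hand the whole existing suite to the sub-interpreter, unchanged.
    interpreters.run_string(interp, """
import pytest
pytest.main(["tests/"])  # placeholder path to an existing suite
""")
finally:
    interpreters.destroy(interp)
```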

58:41 So I'm looking at this from a couple of different angles, but I want to really try and use sub-interpreters in other ways, import some C extensions that have never even considered the idea of sub-interpreters and just see how they respond to it.

58:54 Like Readline was a good example.

58:57 Like, I think it was a case of "this won't work," but the fact that it crashed is bad.

59:02 - How is it going to crash, right?

59:03 Like what's happening there?

59:05 - Yeah, so it should have kind of just said, this is not compatible.

59:09 And then I kind of uncovered a bug. And this is all super experimental as well.

59:14 So like, this is not on by default, you know, you've had to import the underscore-xx module to even try this.

59:21 So yeah, there's Readline, DateTime was another one.

59:26 And so I put this sort of pytest extension together so that I could run some existing test suites inside sub-interpreters.

59:33 And then the next thing that I looked at doing was, CPython has a huge test suite.

59:42 So basically how all of Python itself is tested, the parser, the compiler, the evaluation loop, all of the standard library modules have got pretty good test coverage.

59:55 So like when you compile Python from source or you make changes on GitHub, like it runs the test suite to make sure that your changes didn't break anything.

01:00:04 Now, the next thing I kind of wanted to look at was, okay, can we try and get ahead of the curve on sub-interpreter adoption?

01:00:15 So in 3.13, when PEP 734 lands, can we try and test all of the standard library inside a sub-interpreter and see if it has any other weird behaviors?

01:00:26 And this test will probably apply to free-threading as well, to be honest, because I think anything that you're doing like this, you're importing these C extensions, which always assumed that there was a big GIL in place.

01:00:41 If you take away that assumption, then you get these strange behaviors.

01:00:44 So yeah, the next thing I've been working on is basically running the CPython test suite inside sub-interpreters and then seeing what kind of weird behaviors pop up.

01:00:55 - I think it's a great idea 'cause obviously CPython is gonna need to run code in a sub-interpreter, run our code, right?

01:01:00 So at a minimum, the framework, interpreter, all the runtime bits, that should all hang together, right?

01:01:06 - Yeah, there are some modules that it doesn't make sense to run in sub-interpreters; readline was an example.

01:01:11 - Something like Tkinter, maybe.

01:01:14 - Yeah, yeah, possibly.

01:01:15 - Maybe not actually, I don't know.

01:01:18 - Yeah, if you think about it, when you're doing GUI programming, right, you're gonna have kind of your core stuff running in the main thread, right?

01:01:27 And then you hand off, you may have sub-threads doing some other work, but the core of the application, think of it as running in the main thread.

01:01:36 - Yeah, absolutely.

01:01:37 - I think of applications in that way.

01:01:38 And there are certain things that you do in Python, standard library modules that really only make sense with that main thread.

01:01:46 So supporting those in sub-interpreters isn't quite as meaningful.

01:01:51 - Yeah, I can't remember all the details, but I feel like there are some parts of Windows itself, some UI frameworks there, that require that you access them on the main program thread, not on some background thread, 'cause it'd freak things out.

01:02:05 So that kind of main-thread requirement seems not unusual.

01:02:07 - Yeah, same is true.

01:02:08 Like the signal module, atexit, a few others, I remember.

01:02:12 - Excellent.

01:02:13 All right, well, I guess let's, we're getting short on time.

01:02:15 Let's wrap it up with this.

01:02:17 So the big thing to keep an eye on really here is PEP 734, because that's when this would land; you're no longer using the underscore-xx sub-interpreters module, and you're just working with the interpreters module.

01:02:32 - Yeah, 3.13.

01:02:33 - Yeah, so right now it's in draft.

01:02:35 Like what's it looking like?

01:02:42 If it'll be in 3.13, it'll be in 3.13 alpha something, or beta something.

01:02:42 Like when is this gonna start looking like a thing that is ready for people to play with?

01:02:46 - So, yeah, this PEP, I went through and did a massive cleanup of PEP 554, which is why I made a new PEP for it, and simplified a lot of things, clarified a lot of points, had lots of good feedback from people, and ended up with what I think is a good API, but it was a little different in some ways.
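For a flavor of what that API looks like, here's a sketch based on the PEP 734 draft at the time of recording, so the module and method names may well change before anything lands:

```python
import interpreters  # proposed stdlib module per the PEP 734 draft; not yet importable

interp = interpreters.create()
interp.exec_sync("print('hello from a sub-interpreter')")

# The draft also proposes simple cross-interpreter queues:
queue = interpreters.create_queue()
queue.put("some data")
print(queue.get())
```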

01:03:07 So I've had the implementation for PEP 554 mostly done and ready to go for years.

01:03:13 And so it's been a matter, now that I have this updated PEP up, of going back to the implementation, tweaking it to match, and then making sure everything still feels right.

01:03:26 Try and use it in a few cases, and if everything looks good, then go ahead and I'll start a discussion on that.

01:03:33 I'm hoping within the next week or two to start up a round of discussion about this PEP, and hopefully we won't have a whole lot of back and forth, so I can get this over to the steering council in the near future.

01:03:46 - Well, the hard work has been done already, right?

01:03:48 The C layer is there and it's accepted and it's in there.

01:03:53 Now it's just a matter of what's the right way to look at it from Python, right?

01:03:56 - And one thing to keep in mind is that I'm planning on backporting the module to Python 3.12, just because we have a per-interpreter GIL in 3.12, so it'd be nice if people could really take advantage of it.

01:04:10 - I see, so for that one, we'd have to pip install it or would it be added as?

01:04:14 - Yeah, pip install.

01:04:15 - Okay.

01:04:16 - I probably won't support versions before 3.12.

01:04:19 Sub-interpreters have been around for decades, but only through the C API.

01:04:23 But that said, I doubt I'll backport this module to anything earlier than 3.12.

01:04:29 So just 3.12 and up.

01:04:30 - And that's more than I expected anyway, so that's pretty cool.

01:04:33 All right, final thoughts, you guys, what do you wanna tell people about this stuff?

01:04:38 - Personally, I'm excited for where everything's going.

01:04:41 It's taken a while, but I think we're getting to a good place.

01:04:46 It's interesting with all the discussion about no-gil, it's easy to think, oh, then why do we need subinterpreters?

01:04:51 Or if we have subinterpreters, why do we need no-gil?

01:04:54 But they're meeting kind of different needs.

01:04:57 The most interesting thing for me is that what's good for no-gil is good for subinterpreters and vice versa.

01:05:04 That no-gil probably really wouldn't be possible without a lot of the work that we've done to make a per-interpreter gil possible.

01:05:13 So I think that's one of the neat things, that the future's looking bright for Python multi-core.

01:05:20 And I'm excited to see where people go with all these things that we're adding.

01:05:24 - Anthony, when's the subinterpreter programming design patterns book coming out?

01:05:29 - Yeah, my thoughts are, well, sub-interpreters are mentioned in my book actually, back when it was Python 3.9, I think.

01:05:40 'Cause it was possible then, but it's changed quite a lot since.

01:05:45 I guess some thoughts to leave people with, I think if you're a maintainer of a Python package or a C extension module in a Python package, there's gonna be a lot more exotic scenarios for you to test coming in the next year or so.

01:06:03 And some of those uncover things where you might've just kind of relied on the GIL with global state, where that's not really desirable anymore and you're gonna get bugs down the line.

01:06:18 So I think with any of that stuff as a package maintainer, you want to test as many scenarios as you can so that you can catch bugs and fix them before your users find them.

01:06:27 So if you are a package maintainer, there are definitely some things that you can start to look at now to test, in what's available in 3.12; 3.13 alpha 2 is probably the latest one I've tried, to be honest.

01:06:40 And if you're a developer, not necessarily a maintainer, then I think this is a good time to start reading up on like parallel programming and how you need to design parallel programs.

01:06:54 And those kinds of concepts are the same across all languages, and Python would be no different.

01:07:01 We just have different mechanisms for starting parallel work and joining it back together.

01:07:06 But if you're interested in this and you want to run more code in parallel, there's definitely some stuff to read and some stuff to learn about: signals, pipes, queues, sharing data, how you have locks and where you should put them, how deadlocks can occur, things like that.
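For instance, the classic lock-around-shared-state pattern Anthony alludes to, which carries over directly to any of these parallel models:

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        with lock:  # without this, concurrent increments can be lost
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000, thanks to the lock
```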

01:07:27 So all of that stuff is the same in Python as anywhere else.

01:07:29 We just have different mechanisms for doing it.

01:07:32 - All right, well, people have some research work and I guess a really, really quick final question, Eric, and then we'll wrap this up.

01:07:39 Following up on what Anthony said, like test your stuff, make sure it works in a sub-interpreter.

01:07:43 If for some reason you're like, my code will not work in a sub-interpreter and I'm not ready yet, is there a way to determine that your code is being run in a sub-interpreter rather than regularly from your Python code?

01:07:54 - Yeah, if you have an extension module that supports sub-interpreters, then you will have updated your module to use what's called multi-phase init.

01:08:05 And that's something that shouldn't be too hard to look up.

01:08:09 I think I talked about it in the pep.

01:08:11 If you implement multi-phase init, then you've already done most of the work to support a sub-interpreter.

01:08:17 If you haven't, then your module can't be imported in a sub-interpreter.

01:08:24 It'll actually fail with an ImportError if you try and import it in a sub-interpreter, or at least a sub-interpreter that has its own GIL.

01:08:31 There are ways to create sub-interpreters that still share a GIL and that sort of thing.

01:08:35 But you just won't be able to import it at all.

01:08:39 So like the readline module can't be imported in sub-interpreters.

01:08:44 The issue that Anthony ran into is kind of a subtle side effect of the check that we're doing.

01:08:52 But really, it boils down to: if you don't implement multi-phase init, then you won't be able to import the module.

01:09:00 You'll just get an ImportError.
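So from pure Python, checking a module is as simple as attempting the import in a scratch sub-interpreter. A small sketch, again assuming the provisional 3.12 module:

```python
import _xxsubinterpreters as interpreters

interp = interpreters.create()
try:
    interpreters.run_string(interp, "import readline")
    print("readline imported fine")
except interpreters.RunFailedError as exc:
    # Single-phase-init extensions fail here; the underlying cause
    # is an ImportError rather than a crash.
    print("not importable:", exc)
finally:
    interpreters.destroy(interp)
```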

01:09:02 So that, I mean, makes it kind of straightforward.

01:09:05 - Yeah, sounds good.

01:09:06 More opt-in than opt-out.

01:09:08 - Yep. - Right on.

01:09:08 All right, guys, thank you both for coming back on the show and awesome work.

01:09:13 This is looking close to the finish line and exciting.

01:09:16 - Thanks, Michael. - Yep, see y'all.

01:09:18 - This has been another episode of "Talk Python to Me." Thank you to our sponsors.

01:09:23 Be sure to check out what they're offering.

01:09:24 It really helps support the show.

01:09:26 Are you ready to level up your Python career?

01:09:29 And could you use a little bit of personal and individualized guidance to do so?

01:09:34 Check out the PyBytes Python Developer Mindset program at talkpython.fm/pdm.

01:09:42 Take some stress out of your life.

01:09:43 Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

01:09:49 Just visit talkpython.fm/sentry and get started for free.

01:09:54 And be sure to use the promo code talkpython, all one word.

01:09:58 Want to level up your Python?

01:10:00 We have one of the largest catalogs of Python video courses over at Talk Python.

01:10:04 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:10:09 And best of all, there's not a subscription in sight.

01:10:11 Check it out for yourself at training.talkpython.fm.

01:10:15 Be sure to subscribe to the show.

01:10:16 Open your favorite podcast app and search for Python.

01:10:19 We should be right at the top.

01:10:21 You can also find the iTunes feed at /iTunes, the Google Play feed at /play, and the Direct RSS feed at /rss on talkpython.fm.

01:10:30 We're live streaming most of our recordings these days.

01:10:33 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:41 This is your host, Michael Kennedy.

01:10:43 Thanks so much for listening.

01:10:44 I really appreciate it.

01:10:45 Now get out there and write some Python code.

01:10:48 (upbeat music)

01:11:04 [MUSIC]
