
#225: Can subinterpreters free us from Python's GIL? Transcript

Recorded on Friday, Aug 2, 2019.

00:00 Have you heard that Python is not good for writing concurrent asynchronous code?

00:03 This is generally a misconception, but there is one class of parallel computing that Python is

00:08 not good at, CPU-bound work running in the Python layer. What's the main problem? It's Python's

00:14 GIL or global interpreter lock, of course. Yet the fix for this restriction might have been hiding

00:18 inside Python for 20 years, sub-interpreters. Join me to talk about PEP 554 with core developer

00:25 Eric Snow. This is Talk Python to Me, episode 225, recorded August 2nd, 2019.

00:31 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:49 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm

00:54 @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the

00:59 show on Twitter via @talkpython. This episode is supported by Linode and TopTal. Please check out

01:06 what they're offering during their segments. It really helps support the show. Eric, welcome to

01:10 Talk Python to Me. Hi, how's it going? It's going really well. It's an honor to have you on the show.

01:14 We met up at PyCascades and talked a little bit, but this latest work you're doing to address

01:20 concurrency and parallelism in Python is super interesting. So I'm looking forward to talking

01:25 to you about that. Well, it's super interesting to me too. Yeah, I can imagine. I'm glad you're

01:30 interested. This kind of stuff is, I don't know, there's just something that draws me in and I

01:33 really enjoy exploring it. But before we do, let's start with your story. How'd you get into programming

01:38 Python? Oh boy. I had all sorts of ideas on what I wanted to do growing up and computers was not

01:46 really one of them. But then I ended up at school and somehow ended up signed up for computer stuff,

01:55 ended up getting a CS degree. And then it's funny because I actually, while I was in school, I was

02:01 working for a web hosting company doing technical support. Once I graduated, I moved over to a development

02:06 team. And the guy I replaced is, you may know him, he's named Van Lindberg.

02:11 Okay. Yeah.

02:12 So it's kind of funny. So I ended up working on this project that Van had been running and ultimately

02:20 ended up kind of being the tech lead on that project. It was all written in Python. And so I

02:26 have Van to thank for my introduction to Python.

02:29 That's really cool. So you went from talking to the customers and helping them with the problems

02:34 the developers created to creating the problems for the person who took the job.

02:38 Just kidding.

02:40 Kind of.

02:41 Yeah. That's a really great progression, right? Like you sort of get your foot in the tech space

02:45 and then, you know, you make your way over to kind of running the team. That's great.

02:49 It was a good experience. One neat thing is that it was pretty flexible as far as the job goes.

02:55 There were only a handful of us on the team and we're doing a pretty big job, but we had taken an

03:01 approach of highly automating stuff. So it was mostly just a matter of making the effort to address

03:07 automation stuff, which meant that otherwise we had a little more time to kind of delve into issues and

03:15 solve problems in a better way. And as part of that, whenever I'd run into Python stuff, like I couldn't

03:22 figure out what was going on or I wanted to understand how something worked, I had the time

03:26 to go and explore it. And, you know, within a year or something, I discovered the mailing lists.

03:32 And then, you know, before long, I was actually active in email threads and, you know, I started

03:39 getting involved with the import system. And by 2012, so this is over the course of a few years,

03:45 I got commit rights and was pretty heavily active in core development.

03:50 That's so cool. I think there's something special for working for a small company. You get to touch a

03:55 lot of different things. You get this freedom that you're talking about to kind of solve the problems

03:59 the way you see they should be solved and then go and, you know, kind of explore. Right. I work for a

04:05 small company and I think I really attribute like being in that company in the early days to like a lot

04:10 of the success of my career because it gave me a chance to like round out the corners that I didn't

04:14 really know. I wasn't like just pigeonholed into like some super narrow role. Right. You work on what

04:19 this button does. That's your job. Like, right. Right. Right. Exactly. That reminds me what you

04:25 just said. That's been my experience with CPython: as I've gotten involved in the mailing list and

04:30 the bug tracker and everything, I feel like it's really rounded me out a lot because I'm exposed to

04:37 so many things, you know, the breadth of programming. And just 90% of it, I probably never really would have

04:44 been introduced to because it probably isn't that interesting to me, but because, you know, there are

04:49 email threads and whatever, you know, I learned about it. And that's, it's really made a huge

04:56 difference for me, I think. Yeah. I can imagine. It's just, it's an amazing group of people working

05:00 on there, and then you get down into the technical details. Oh yeah. So you started out in this

05:05 web hosting company and now you work for a really big web hosting company, right? Yeah.

05:09 With Azure? Oh yeah. No, not exactly. But definitely, definitely I got some web hosting

05:14 going on. What do you do day to day over at Microsoft? So I work with Brett Cannon on the

05:19 Python extension for VS Code. I joined the team a little over a year and a half ago.

05:23 Nice. That's got to be a fun team to work on. You know, the excitement around VS Code is massive,

05:28 right? It's, you know, I always ask this question, what's your favorite editor? What editor do you use?

05:32 Things like that in the show. And yeah, VS Code is definitely tracking as a major thing. And

05:38 it used to sometimes be Sublime or Atom or something. It's, it seems like certainly for like that type

05:46 of interface, what would you call it? So what would you call it? I mean, it's, it's not an IDE

05:50 really. It's not like a terminal app. What category of editor are Sublime, Atom, VS Code?

05:56 What's the name? What should I be calling these things?

05:58 I don't know. Like a full-featured editor?

06:02 Yeah, exactly.

06:04 You can't call it an IDE because that's, that's verboten.

06:07 Yeah. And there's not enough buttons. It needs more buttons and windows, right?

06:11 It needs more menus and more stuff so you can get lost in there, right? Right now.

06:15 Yes, exactly.

06:16 It's too easy not to get lost.

06:17 Yeah. There's not enough floating windows. Congratulations. I'm sure that's a super

06:20 exciting thing to be working on and it's, it's really growing quickly.

06:23 No, it's funny. This is a team that I first talked with them about getting on the team

06:28 in 2014 and it almost happened. And then there were some complications because I was only going

06:34 to work remote. At the time I was working for Canonical who makes Ubuntu. So I ended up just

06:40 kind of waiting and it took like what, two, three years or something like that. But it worked

06:46 out in the end. But that's, it's kind of a story of my life. I just kind of find a good

06:51 thing and then wait for it to work out. I'm not, I'm never really in a big hurry, which,

06:56 which I suppose we'll talk about relative to the stuff I've been working on.

07:00 Yeah, absolutely. So, well, that's, that's really good. And that's, that's a great project

07:06 to be working on day to day. And you said that Microsoft actually gives you a decent amount

07:10 of time to focus on CPython as well.

07:12 Yep.

07:13 As they do with Brett and some other folks. And that's, that's really quite, quite cool.

07:17 Yeah. I basically get my Fridays. I work exclusively on Python. So that's been

07:25 a big boost to, to what I've been able to get done.

07:28 That's awesome. So you're saying the fact that we scheduled this on Friday is actually

07:31 cutting in your time to make Python better for everyone.

07:34 You know, it's actually part of why I just, yeah, no, it's cool.

07:37 That's the cost.

07:38 Yeah. But I think awareness of what you're doing is really good because I think it can make a

07:42 big difference. So let's just talk about parallelism and concurrency and asynchronous

07:46 programming and stuff kind of in general and in the, in the Python space. I feel like there's a lot

07:52 of people who look at what Python does with async. They see things like the GIL and they say, well,

07:58 this just doesn't work for parallelism. I'm switching to Go or some, you know, some,

08:03 something like that. And I feel like it's, you know, there may be situations where you've got to

08:09 switch to C, you've got to switch to Go, but they're like 1% of the situations where people

08:13 actually do that. Right. Like most of the time, I think that's just not taking advantage of what's

08:18 out there. So maybe like, let's just set the stage with talking about like concurrency in general.

08:23 Yeah, you bet. If you look at it, the history of computing, other than really large systems,

08:29 most computers have been single processor, single core until relatively recently.

08:34 Yeah. Like 2005 or so, it just used to be the clock speed went up and up and up.

08:40 And that was how computers got faster.

08:42 So it's kind of funny because threading, sure, you can kind of logically program for different threads,

08:48 but ultimately it was just a single thread getting switched by the OS. And that's kind of the,

08:56 what you had to deal with, but it's a different story now.

08:59 Yeah. And back in the early days, you didn't even have preemptive multi-threading.

09:03 Oh yeah. Cooperative multi-threading, like you had to like give it up. Right. It was like in

09:08 Windows 3.1 in the early days, there was some weird stuff where like you had to be a good citizen on

09:13 the operating system to even allow that. We're kind of full circle here with async.

09:16 Yeah. Async and await is exactly the same thing.

09:18 Yeah. So it's kind of a conceptually the same. So it's really interesting, but now not only do we

09:24 have concurrency where you have to deal with matters of who's running at a given time, but now we also

09:32 have parallelism, which gives us performance boosts. But of course with Python, it's an issue with the

09:39 GIL, which everyone likes to complain about.

09:41 Right. Exactly. So within a single process, you can't really, unless you are doing certain operations

09:48 that release the GIL, you can't really run more than one interpreter instruction at the same time.

09:53 Right. Right. It's really CPU-bound code that suffers.

09:56 Right. Yeah, exactly. So if you're talking to like databases or you're waiting on web services,

10:01 all that stuff's fine, right? Like the CPython interpreter, once it opens a network socket down at

10:06 the C level, like while it's waiting, it'll release the GIL. And you can do those kinds of things in

10:11 parallel with threads already. Right. Yeah.
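To make that point concrete: blocking waits release the GIL, so threads that are waiting on I/O genuinely overlap, even though pure-Python CPU work can't. A minimal sketch, using time.sleep as a stand-in for a C-level network wait:

```python
import threading
import time

def io_task():
    # time.sleep releases the GIL, just like a socket wait at the C level
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2s waits overlap instead of taking 0.8s back to back
print(f"elapsed: {elapsed:.2f}s")
```

Swap the sleep for a tight Python loop and the threads serialize on the GIL instead.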

10:13 Not computationally. Yeah. It's kind of funny because if you look at it and I think Guido's

10:17 mantra has always been, well, you aren't really hurt by the GIL as much as you think you are,

10:22 because a lot of code that we write really isn't CPU bound. Very often it's not. And especially

10:30 for some of the CPU bound stuff, you know, a lot of the critical stuff, people have moved into C

10:35 extensions anyway. There's still a set of problems that are affected by the GIL. And people have had

10:43 to work around that with a number of solutions. You know, asyncio is kind of one thing,

10:47 but you also have multiprocessing and you have, you know, all sorts of distributed frameworks.

10:52 Right. Like Dask and other types of things. Yeah.

10:54 So all that stuff is in part, well, for distributed, it's a little different, but

11:00 part of the motivation there has just been to leverage parallelism better. So that's one of the

11:07 biggest complaints that people have with Python. It has been for a while, just parallelism,

11:13 multicore. And it's a bigger problem now that multiple cores are essentially ubiquitous.

11:18 Right. Even here on my MacBook, if I go and ask it how many processors it has, how many cores rather,

11:25 it says it has six and each of those are hyper-threaded. So as far as the OS is concerned,

11:29 I effectively have like 12.
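You can ask Python the same question about your machine; os.cpu_count() reports the logical core count the OS sees, so hyper-threaded cores count double:

```python
import os

# Logical CPUs visible to the OS (hyper-threads included);
# may be None if the count can't be determined on this platform
logical = os.cpu_count()
print(f"logical cores: {logical}")
```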

11:31 Yeah.

11:31 And yet it's super difficult to take advantage of those.

11:35 Yeah. Yeah.

11:35 In Python.

11:36 Yeah. Yeah. It's just, it's really interesting. So it's funny the way things have gone and it's,

11:41 it's going to go even more, more this way. I mean, I expect that the way people program

11:46 will be different as we think about multiple cores more, but maybe not. I mean, because how often are

11:52 we writing, you know, CPU bound code?

11:55 I feel like there's just a couple of situations where it really matters and there are already to

12:00 some degree, some escape hatches, right? So the most obvious place where in Python,

12:06 it really matters for computational parallelism is in data science, right? Like I've got a billion of

12:11 these things. I want to run some crazy algorithm on it and like machine learning training or whatever.

12:16 But a lot of the libraries that the machine learning folks already have, have some capability

12:22 for, or for data science folks have, have some capability for parallelism at their lower C levels

12:29 anyway, right?

12:30 Yep. That's exactly right. I mean, a lot of these libraries have C extensions where they need them.

12:35 Exactly. The other place where I feel like you really could get a lot better support is on the web.

12:43 Yeah.

12:43 Right. Like we have some of the newer frameworks, we have Molten and Japronto and Starlette and all these things,

12:50 Responder that let us write async def, some web method and truly leverage the asyncio components there.

12:59 But, you know, the main ones, Flask, Django, others, Pyramid, whatever, they don't, right?

13:06 They're all WSGI based and it's, you just can run into issues, right?

13:09 I mean, I know the web servers themselves have some capability just to parallelize it out, but it's still, it's, it would be much easier if you did.

13:16 So I don't think it's that big of a problem.

13:18 Like there's these two areas, the data science space, and I think the sort of like high-end web serving space that could be handled a little bit better.

13:26 Yeah.

13:26 We're already seeing some stuff with async and await on the web, which is, I think, where it's appropriate.

13:30 I think there's one important caveat too, and it's something that we, we don't really bring up a whole lot in the community, which is that there are a lot of enterprise users of Python that we never hear about how they're using it.

13:43 In part because of, you know, competitive advantage and that sort of thing, but we don't really hear about it.

13:49 Yeah.

13:49 Or they just, they just don't go to the conferences and they don't like spend all their time on Twitter.

13:53 They just see it as a job.

13:55 They do their work.

13:56 They go home.

13:56 Like they don't, not also their hobby necessarily.

13:58 Yeah.

13:58 Yeah.

13:59 So in a lot of those cases, performance matters and not just performance, of course, efficiency and, and that sort of thing.

14:07 I mean, it really adds up.

14:09 So I'm sure there are a lot of folks that we don't even think about who would benefit from better multi-core support in CPython.

14:18 But, you know, we just, we don't hear about those folks.

14:22 Well, maybe that's not even them directly.

14:24 Right.

14:24 Yeah.

14:25 Maybe they, they pip install a thing and that thing now works better and they don't even know that it's using multi-core support.

14:30 Right.

14:31 But somebody who's really clever found a way to make something they were doing much, much better using that.

14:37 Right.

15:07 This work that you're approaching basically tries to deal with this limitation in the

15:37 implementation of Python's GIL, the global interpreter lock, which basically has the effect of what I said before, that only a single interpreter instruction can run at a time.

15:46 Maybe some low level C stuff can also happen, but like the stuff that you write runs, you know, only one like bytecode instruction at a time, basically.

15:54 Yeah.

15:54 So maybe just tell people like, that sounds bad.

15:59 But it's here for a reason, right?

16:02 It solves a lot of problems for us, right?

16:04 Oh yeah.

16:05 It hides a lot of problems that you face when dealing with threads that you don't have to worry about in Python.

16:16 But not only that, it's also when you're writing C extensions.

16:19 In C, you have to do a lot of stuff yourself.

16:23 And when you're dealing with threads, you have to deal with all that.

16:27 So when you're using Python and you're holding the GIL, you don't have to worry about other Python threads.

16:34 You don't have to manage your own locks for those threads, which, I think, makes threading at the C level in the C API easier.

16:43 But also, there's a lot of implementation details in CPython that depend on the fact that the GIL protects them.

16:52 We deal with re-entrancy a lot.

16:54 But other than that, we don't really have to worry about race conditions on any of the C types, the built-in types or any of those, because they're protected by the GIL.

17:05 Yeah, which is great.

17:06 And the GIL is largely a memory management thing.

17:09 It's not just a threading thing.

17:11 I mean, it is for threading, but it's mostly to protect the memory management and make that thread-safe, right?

17:16 In large part, it's to protect the runtime state, especially memory management.

17:21 Yeah.

17:21 Yeah.

17:21 So it serves this important role.

17:24 I mean, we still do have RLock and things like that, because we might write algorithms where a whole bunch of different steps can't be interrupted, with temporarily invalid state or whatever.

17:34 So we might have to think.

17:36 But it's very rare, actually, that you end up doing locks and stuff.

17:39 In other languages like C++ or C# or something, it's common to do locking all over the place, right?

17:46 For all kinds of funky things.

17:48 So it's nice that it's not there.
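For that rare case, the stdlib's threading.RLock is the re-entrant lock being referred to: the same thread can acquire it again while it already holds it, which matters when one locked method calls another. A small sketch:

```python
import threading

class Counter:
    def __init__(self):
        self._lock = threading.RLock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def increment_twice(self):
        # Re-acquiring the same RLock from the same thread is fine;
        # a plain threading.Lock would deadlock here.
        with self._lock:
            self.increment()
            self.increment()

c = Counter()
c.increment_twice()
print(c._value)  # 2
```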

17:50 And there have been several attempts to take it out to switch to other types of memory management, other things that let us avoid it.

18:00 But it's always had these problems of making the C extensions not working well or breaking them, of actually making the single threaded case slower.

18:11 It's one thing to say, okay, we could switch to some other system that's not using the GIL, but now your code's 1.5 times slower.

18:18 Unless you spend like six cores on it, then now it's faster, sort of, sometimes.

18:23 Like, that's not a great solution either, is it?

18:24 One of the key things that we protect with the GIL is ref counts.

18:29 Because we use ref counting for our, essentially for memory management, then we have to keep those ref counts safe from race conditions.

18:39 So we would have to do locking around all ref count operations, and that would get really expensive real fast.

18:45 Right, exactly.
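You can actually watch those ref counts from Python with sys.getrefcount, a CPython-specific detail (note that the count it reports includes the temporary reference created by the call itself, so the values below are typical rather than guaranteed):

```python
import sys

data = [1, 2, 3]
# One reference from `data`, plus the temporary reference
# created by passing it into getrefcount
print(sys.getrefcount(data))  # typically 2

alias = data  # a second name for the same object bumps the count
print(sys.getrefcount(data))  # typically 3
```

Every one of those increments and decrements is what the GIL keeps safe from races today.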

18:45 There have been other projects in the past.

18:47 Several people have tried to get rid of the GIL, including most recently Larry Hastings with the Gilectomy.

18:55 And each time it comes down to having to add a lot of locks or similar mechanisms to protect those global resources.

19:04 And those things kind of fall apart and cause performance issues that ultimately kind of kill the goals of the project.

19:13 Right, or break the C APIs.

19:15 Yeah, yeah.

19:16 If you're looking for performance and you're like, well, we made the Python bit 1.5 times faster, but the C parts don't work.

19:23 Like, all of a sudden, it's much slower, right?

19:25 Like, that's a problem.

19:27 If we were able to just break the C API, or even get rid of it and use something else, then we'd be able to solve this problem, I think, without a lot of trouble.

19:37 But because people in C extensions rely on a lot of these details, we just can't get rid of them that easily.

19:44 There has been recognition of this kind of in the core team in the last few years,

19:50 and a recognition that we really got to figure out how to solve this.

19:55 So I'm hopeful that we're going to figure this out.

19:59 There have been a lot of smart people thinking about this and a lot of good ideas over the last year or two.

20:04 There are some things that will have to break, but I think we'll be able to sort it out.

20:08 That's good.

20:08 Let's talk about the proposal that you've been working on, PEP 554, which has this concept of a sub-interpreter.

20:17 And when I heard about this, I thought, wow, okay, this is like some creation that's going to be like this new thing that allows for isolation so you can effectively mimic what you're doing with sub-processing or multi-processing,

20:34 but without actually the overhead of processes and the inter-process communication.

20:38 I'm like, okay, this is great.

20:39 But then as I looked into it, this is not a new idea at the very core of it, right?

20:45 No.

20:45 But it's just not something anybody's leveraged.

20:47 Tell us about it.

20:47 It's interesting.

20:48 It really is.

20:49 Nick Coghlan kind of expressed it as the isolation of processes with the efficiency of threads.

20:58 And it's not a pure explanation, but it's pretty close.

21:03 Sub-interpreters have been around as part of Python.

21:06 Originally, CPython was just implemented as kind of a blob of state, and there was an effort to kind of bring a little sanity to that

21:14 and isolate all of the state related to Python threads in one C struct and interpreters, which can have multiple threads, in another C struct.

21:27 And then there's runtime state still all over the place.

21:30 That's just global.

21:32 So at that point, that was, I don't know, 20, 21, 22 years ago, something like that.

21:39 And at that time, C API was added for creating and destroying sub-interpreters.

21:46 And the threading API is built around sub-interpreters to an extent.

21:53 But it's funny because, like you said, it's not a new thing.

21:57 And yet, a lot of core developers didn't even know about sub-interpreters.

22:01 Very few users knew about it.

22:04 I knew of only one project that was actually using sub-interpreters meaningfully up until four or five years ago.

22:12 And that was mod_wsgi, by Graham Dumpleton.

22:14 And it's funny because sub-interpreters now, there's more awareness and people are starting to use them more, including some big projects.

22:22 And at the same time, a lot of old users, so Graham, and I've since heard from a few people that use sub-interpreters internally for a long time.

22:32 Now that we're fixing all the problems with them, they're actually moving off of sub-interpreters because they gave up.

22:38 It's like, no, just wait another year.

22:40 We'll probably have a lot of this stuff.

22:42 Yeah.

22:43 And you can benefit from performance improvements that we're doing.

22:46 So it's really funny.

22:47 A lot of people just didn't know about it.

22:49 And the people who did didn't really think about it all that much.

22:52 But it's funny, as CPython progressed, things would get added in, and they would affect sub-interpreters, but nobody would realize it.

23:01 There weren't good tests of sub-interpreters.

23:02 There weren't many users, so nobody would report problems.

23:05 Poor Graham, he'd report things, and nobody would really pick up the bugs and work on them.

23:11 Well, this guy's crazy.

23:12 What's he talking about, this weird sub-interpreter?

23:14 Yeah.

23:14 Is that even a thing?

23:15 Exactly.

23:16 There are a number of problems.

23:17 In my opinion, it never really was quite finished, because they're not as isolated as they probably should be.

23:25 And there are a number of other rough corners, bugs, and stuff.

23:30 So what's interesting is the stuff I'm doing, one consequence is that those things have to get fixed.

23:37 Yeah.

23:38 So the idea is to lift this concept of a sub-interpreter up out of the C layer, create a standard library module called interpreters, that allows you to program against this concept of the sub-interpreter.

23:51 Correct.
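For a feel of what's being proposed, here is a rough, non-runnable sketch of the interpreters module as drafted in PEP 554 around the time of this episode. The exact names and signatures were still under discussion, so treat every identifier here as provisional:

```
import interpreters  # proposed stdlib module, per the PEP 554 draft

interp = interpreters.create()  # a new, isolated sub-interpreter

# Code is submitted as a string and runs inside that interpreter
interp.run("print('hello from a sub-interpreter')")

# The draft also proposed channels for passing simple, shareable
# objects between interpreters
recv, send = interpreters.create_channel()
send.send(b"some bytes")
```

Again, this mirrors the draft proposal being discussed, not a shipped API.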

23:52 So it's definitely, I'm doing this with isolation in mind.

23:56 You know, at first, the proposal was just wrap the C-API in Python in a C extension and call it good.

24:04 Because it's there, right?

24:06 And somebody early, early on pointed out, well, if you can't share stuff between sub-interpreters, all you can do is just start one up.

24:14 It's not really nearly as useful.

24:17 In C, you just do the C thing, you know, pass stuff around however you want and shoot yourself in the foot if you want.

24:24 Here's a bunch of pointers.

24:25 Uh-huh.

24:25 You can talk to them all you want.

24:26 Exactly.

24:27 Make sure you don't talk to them at the same time.

24:28 Don't hurt yourself.

24:30 Yeah, exactly.

24:31 But in Python, you know, we don't have the opportunity, which is, I think, a good thing here.

24:35 So they're like, yeah, well, it's not nearly as useful as it would be if you had just at least some basic way of sharing data between them.

24:43 So I was like, oh, yeah, that's a good point.

24:45 And so that really got me thinking about sub-interpreters more than just as a tool to achieve other goals, which I expect we'll talk about, but also as actually a vehicle to a concurrency model that I think fits the human brain better, at least in my opinion.

25:03 I'm not a big fan of async.

25:05 I'm sure it's great.

25:07 Some people really get it.

25:09 For me, it's just, it's, I don't like it.

25:12 Yeah.

25:12 But, you know, that's fine.

25:13 I think there are other ways of thinking about concurrency that work a lot better.

25:17 Things have been studied since the 60s.

25:20 Right.

25:20 Message passing and some of these types of concepts where you're more explicitly like, I'm going to send this over to the thread and the thread's going to pick it up and work on it or things like this, right?

25:30 Yeah, yeah.

25:30 Before I moved to Microsoft, I was at Canonical for three years working on various projects written in Go.

25:37 And Go has a concurrency model that's, I would say, loosely based on CSP, which is kind of a concurrency model that was researched and developed since the 60s, especially by a guy named Tony Hoare from over in the UK.

25:53 Really powerful stuff.

25:55 And, you know, it has a lot of similar roots with like the actor model.

25:59 Yeah, exactly.

26:00 Go is one of these languages that very explicitly controls how concurrency works.

26:06 And it's part of the language that this data sharing and whatnot happens, right?

26:11 I don't think it's great what they did because they took CSP and then they broke some of the fundamental ideas behind it, like isolation in these processes, right?

26:21 I mean, CSP is communicating sequential processes.

26:25 So the idea is that you have a process that is just, it's like a single threaded program, right?

26:31 You could break it down into just a linear flow of code, no matter what, deterministically.

26:37 And then you have a mechanism by which these processes can communicate.

26:42 Basically, just send messages back and forth.

26:44 And they block at those points.

26:46 I'm going to send a message and wait for the other process to pick it up.

26:51 And then at that point, both processes will move on.
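That send-and-wait rendezvous can be approximated in today's Python with threads and queue.Queue. This is just an illustration of the CSP idea, not anything from PEP 554, and a Queue(maxsize=1) only roughly imitates a true unbuffered CSP channel:

```python
import queue
import threading

channel = queue.Queue(maxsize=1)  # a tiny buffer approximates a rendezvous

def producer():
    for item in ("a", "b", "c"):
        channel.put(item)  # blocks while the buffer is full
    channel.put(None)      # sentinel: no more messages

def consumer(results):
    while True:
        item = channel.get()  # blocks until a message arrives
        if item is None:
            break
        results.append(item)

results = []
threads = [
    threading.Thread(target=producer),
    threading.Thread(target=consumer, args=(results,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # ['a', 'b', 'c']
```

Each side only ever touches the data it received, which is the isolation-plus-messaging flavor being described.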

26:54 So I spent a while trying to figure out really what would be the best way to set up rudimentary communication between subinterpreters.

27:07 And my experience with Go came about, so I don't know if I just said this, but goroutines, which are kind of the idea of these processes in Go, they're not isolated.

27:19 So you can share data between them.

27:20 So that basically invalidates a lot of the ideas behind CSP.

27:24 I mean, it's interesting.

27:26 So I want to take advantage of the isolation between subinterpreters.

27:30 And so essentially, you end up with kind of opt-in data sharing or opt-in concurrency.

27:38 You don't have to worry about races and stuff like that.

27:40 It's very much kind of like what the multiprocessing communication flow is, right?

27:46 I'm giving this data over to this other process, and then they can just have it and they own it and don't have to worry about it.

27:52 Or they get a copy of it or something like that.

27:54 So I looked for a lot of prior art that kind of followed this model.

27:57 And the stuff in multiprocessing was one.

28:00 The queue module has a lot of stuff that's kind of a similar idea.

28:05 And there are a few other things out there.

28:07 And of course, in other languages, I really stuck with this idea of following the model of CSP as much as I could.

28:15 And really, while the proposal isn't like some CSP implementation, the whole thing is kind of with CSP in mind.

28:24 Okay, could I build like a nice CSP library on top of this?

28:28 Right.

28:28 Because like you said, without the communication, like you said, it's like, it's kind of interesting, but it's just like a task spawning type of thing.

28:36 Right.

28:36 Like it's not really any sort of cooperation.

28:42 This portion of Talk Python to me is brought to you by TopTal.

28:45 Are you looking to hire a developer to work on your latest project?

28:48 Do you need some help rounding out that app you just can't seem to get finished?

28:51 Maybe you're even looking to do a little consulting work yourself.

28:54 You should give TopTal a try.

28:56 Maybe you've heard we launched some mobile apps for our courses over on iOS and Android.

29:00 I use TopTal to hire a solid developer at a fair rate to help create those mobile apps.

29:05 It was a great experience and I can totally recommend working with them.

29:07 I met with a specialist who helped figure out my goals and technical skills required.

29:12 Then they did all the work to find the right person.

29:14 I had a short interview with two folks and hired the second one.

29:18 Then we released the apps just two months later.

29:20 I think what we ended up with, which is channels, really basic, but I wanted to keep the PEP as

29:36 minimal as possible.

29:37 And I think it really, I came up with a good solution for this.

29:41 So one of the tricks though, is that because of the isolation, you can't just share objects

29:47 between sub interpreters.

29:48 I mean, currently you can at the C layer.

29:51 But in the Python, I didn't want to give anybody ever the opportunity to share objects, at least

29:56 not at first.

29:57 Maybe we can come up with some clever solutions for that.

29:59 But currently you can't.

30:00 So, I mean, there's really a limit to what can be shared between sub interpreters as proposed.

30:06 And I want to keep it as minimal as possible so we can build from there.

30:10 Yeah, absolutely.

30:11 Well, one of the really exciting parts of that is one thing that is not shared between sub

30:16 interpreters is the global interpreter lock, right?

30:18 Well, currently it is.

30:20 That's the problem.

30:22 So right now, sub interpreters do share the GIL.

30:26 So one of the things I'm working on is kind of the bigger problem.

30:30 And really, I'm trying to tackle this problem of supporting multi-core parallelism in CPython

30:36 using sub interpreters.

30:38 So kind of PEP 554 is just a vehicle to make sub interpreters accessible to Python users.

30:45 But really, the actual goal is to fix sub interpreters, including to stop sharing the GIL between sub interpreters, which is kind of crazy.

30:53 That is crazy.

30:54 But at that point, then you can say start five threads.

30:58 Each thread starts a sub interpreter as its startup process.

31:02 And then all of a sudden, the GIL is no longer a problem.

31:05 Precisely.

31:05 Potentially.

31:06 And because you're not sharing the objects, right?

31:08 You don't have to worry about that.

31:09 And now you have these channels where you can pass data back and forth in a thread-safe way.

31:14 That's super cool.

31:15 It sounds like sub interpreters as they exist don't do that.

31:19 But that's kind of the ultimate goal.

31:20 Yeah, yeah.

31:21 Create this exposure of the API to actually create sub interpreters.

31:24 Move the GIL down so it's one per sub interpreter.

31:28 And then a way to communicate between them.

31:30 So that's the hairier problem, right?

31:32 The GIL.

31:33 What we have is, like I said earlier, we have a whole bunch of runtime state all over the place.

31:39 So one of the things I did a couple of years ago for this project, there are a bunch of things that we've done.

31:44 Big things that probably nobody even notices because they're all internal.

31:48 But one of the things I did was I took all the global state I could find and I pulled it all into one single C struct.

31:55 So what's neat about this project, the ultimate goal of not sharing the gil between sub interpreters

32:00 is that it requires just a ton of other things.

32:04 I think I list out 80 different tasks that are probably not even super fine grained that have to get done in order to make this work.

32:13 And probably 70, 75 of those are things that are a good idea regardless of the outcome of my ultimate goal.

32:22 Right?

32:23 Right.

32:23 CPython is going to be cleaner.

32:24 Uh-huh.

32:25 If we get to those 75 things and then we're like, oh, it's not going to work.

32:29 Well, that's okay because, dang, you know, we got some good stuff done anyway.

32:34 Stuff that we wouldn't have done because we weren't really that motivated.

32:37 I mean, this is open source.

32:39 So I have a motivation and other people share some of the motivations.

32:43 It's really neat.

32:44 There's a lot of collaboration going on now because not just for the whole sub interpreter thing.

32:50 Some of the stuff that I need is stuff that other people need for different reasons.

32:55 And it's working out pretty well.

32:57 But the whole thing is I took all this state and smashed it into a single struct.

33:02 And as kind of a side effect, I just want to make sure I didn't hurt performance by doing that.

33:08 So I ran Python's performance suite and it turned out that I was getting a 5% improvement in performance.

33:16 Interesting.

33:16 Do you know why?

33:17 It's crazy.

33:17 Well, I expect it's because of cache locality of that struct.

33:20 That was my first guess as well, right?

33:22 As you load one element of that struct, everything drags along onto L2 cache or the local cache.

33:29 And it's just a little quicker, right?

33:31 You just accidentally do fewer deep memory lookups.

33:35 Somebody pointed out to me that it probably doesn't have quite the same effect on performance for a PGO build.

33:41 Where the compiler can optimize the layout of memory and various other things relative to what's hottest in the code, right?

33:51 Yeah.

33:51 So it kind of runs the thing through a workload and determines what's the hottest chunks of memory and pushes those together.

33:59 So you get those same cache locality benefits.

34:01 So ultimately under a PGO build, probably not the same performance benefits.

34:06 But I only bring that up because...

34:09 PGO, profile-guided optimization.

34:10 Yes.

34:11 Thank you.

34:11 For everyone out there.

34:12 Yeah.

34:12 Yeah.

34:12 Acronyms.

34:13 Yeah.

34:14 Yeah.

34:15 So maybe it's not as big necessarily.

34:17 Is that theoretical or is that something that is actually done on CPython builds?

34:21 Yeah, yeah, yeah.

34:22 People do it.

34:23 If the one that I brew install, is that one?

34:25 PGO?

34:25 I don't know.

34:26 Optimize?

34:27 Okay.

34:28 Yeah.

34:28 Yeah.

34:28 Yeah.

34:29 You get some real benefits from a PGO build.
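[Editor's note] For listeners who want the optimized build being discussed: when building CPython from source, PGO is enabled with a standard configure flag. This is the general CPython build procedure, not anything specific to the subinterpreter work:

```shell
# Build CPython with profile-guided optimization. The build first runs a
# training workload to collect a profile, then recompiles using it.
# (--with-lto can additionally enable link-time optimization.)
./configure --enable-optimizations
make -j"$(nproc)"
```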

34:31 There are lots of these little things.

34:33 One of the things I needed to happen was there was some work that Nick Coghlan started like four or five years ago to clean up runtime startup.

34:43 And I needed that because otherwise there were certain things I just couldn't do.

34:48 So I was blocked on that.

34:50 So finally, I got around to taking the branch that he had, which was in Mercurial, and moving it over to Git at the same revision.

35:02 And then I had to rebase it against master and fix all the conflicts and finally got that merge like two years ago.

35:09 And that was a big thing.

35:10 And then because of that, we're able to do a lot of really good things with startup that we weren't able to before.

35:16 So that's a side effect.

35:17 And there are all these things that are just good.

35:20 That was one of the larger goals of Python 3 as well, to try to fix the cold startup times, right?

35:27 No.

35:27 No.

35:27 One of the goals that we have is to fix startup times so they're at least on par with Python 2, which they weren't.

35:34 Yeah, that's what I was thinking.

35:35 Weren't nearly at first.

35:36 That's what I was thinking.

35:36 Yeah.

35:37 So we're mostly on par now.

35:39 The biggest problem was all the codecs and Unicode stuff.

35:42 That really hits startup performance.

35:45 So if you think about subinterpreters, if you start up a new subinterpreter, it has to build all this state.

35:50 It has to load all these modules.

35:51 There's a ton of stuff that has to happen, right?

35:54 So you're going to incur that cost for each subinterpreter.

35:57 As a consequence of what I'm working on, I want to make sure that startup time is as small as possible.

36:02 So it's definitely one of the things, maybe not one of the immediate concerns, but kind of one of the relatively low-hanging fruit for this project once I finish this first phase is to go in and do things like make interpreter startup more efficient, whether it's sharing or whatever.
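[Editor's note] As a rough illustration of the startup cost in question, you can time how long a bare CPython process takes to launch and exit. This is just a quick measurement sketch; numbers vary widely by machine and build:

```python
import subprocess
import sys
import time

# Measure roughly how long a fresh CPython process takes to start and exit.
# Each subinterpreter would pay some analogous (ideally much smaller) cost
# to build up its own runtime state and import its modules.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed = time.perf_counter() - start
print(f"cold start: {elapsed * 1000:.1f} ms")
```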

36:19 And those are things that are a good idea regardless of use of subinterpreters.

36:24 I mean, just for the main interpreter, getting startup faster is a good idea.

36:27 And that's something that I want for subinterpreters.

36:29 And I'm motivated to do it.

36:31 And I think other people, once subinterpreters get in widespread use, people are going to be like, oh, yeah, this is great.

36:37 And people are going to be motivated to fix some of these deficiencies in subinterpreters.

36:42 Yeah, absolutely.

36:42 And it's definitely a little bit of a catch-22, right?

36:46 You said it wasn't ever really quite finished because here's this idea.

36:50 But at the same time, if no one's really using it, why do you care about fixing this thing?

36:55 And if no one's fixing it, I'm not going to use it because it doesn't quite work.

36:58 And then here you are in this lock, right?

37:01 So it reaches a point where, well, for me, it was in 2014.

37:04 I was having a conversation with somebody at work.

37:08 And they were saying, yeah, Python's going to die because it doesn't do multicore because of the GIL.

37:14 And it was just, I don't know, one of those moments, you know, when something hits you so deep, you know, they dig a little too hard.

37:23 And you're like, okay, fine.

37:24 Forget you.

37:25 I'm going to fix this.

37:26 And I'm never going to hear anybody complain about the GIL again.

37:29 Yeah.

37:29 Yeah, absolutely.

37:30 That's what this project is.

37:32 That's what I've been working on for several years now.

37:34 It's basically just to get people to stop complaining about the GIL.

37:40 I definitely think, you know, rightly or wrongly, it's one of the perceived deep limitations of Python.

37:46 Yeah.

37:46 I think wrongly, but I do think that it is perceived to be that like Python is not nearly, is barely appropriate for like parallel computing.

37:55 Yeah, yeah, yeah.

37:55 I don't think that's right, but I think that's the perception.

37:58 Yeah.

37:58 Outside of a lot of Python, or maybe within it.

38:00 I think it's a fair perception for a class of users.

38:04 And as a community, we like to be inclusive.

38:07 We don't want to leave anybody out.

38:08 We want to make sure that things work out for folks.

38:11 It's just being open source.

38:13 It's nothing's going to happen until somebody cares enough to do something about it.

38:18 And that happened for me.

38:19 That's awesome.

38:20 Yeah.

38:20 And so here we are.

38:21 Yeah.

38:22 So it sounds like around September 2017, you introduced PEP 554 to address this.

38:28 Probably you've been working prior to that.

38:31 2018, you talked about it at PyCon US at the Language Summit and whatnot.

38:36 And then also, again, in 2019.

38:38 And like those, also, it sounds like those experiences were a little bit different.

38:41 Do you want to maybe recount those for us and tell us how this is being perceived over time?

38:46 You bet.

38:47 I mean, I've gotten support from the core team.

38:49 I think my first post about all of this to the Python dev mailing list was probably 2016, early 2016, I think.

38:59 And, you know, and there was a lot of discussion about it.

39:04 And there were really only a handful of people that had any sort of opposition to it, any major concerns, which I took as kind of a valid litmus test on if it was worth pursuing.

39:18 Yeah.

39:18 And when you initially presented it, what was the scope?

39:21 Was the scope like the final end goal where there's like you're trying to use it for concurrency and all that?

39:27 That was like from the start.

39:28 Talking about using subinterpreters and not sharing the GIL to achieve these goals of multi-core parallelism.

39:36 So, you know, at some level to the details, pursuing kind of a CSP model, a standard library module.

39:43 And, you know, and the response was pretty good.

39:45 There was several long threads and I incorporate all the feedback into the PEP ultimately.

39:51 But, yeah, the feedback from the PEP was great.

39:54 And then come PyCon 2018, I basically asked everybody I talked to, explained subinterpreters and what I was working on and asked them what they thought of it, how they would use it.

40:06 And it seemed like everybody had a different response.

40:09 Everybody was excited about it, almost universally.

40:12 And everybody had a different response on how they would use it.

40:17 Most people, I didn't even have to ask them.

40:20 They'd go like, wow, I have the perfect use case for that.

40:23 This, this, this, this.

40:24 And I even at one point asked, oh, his name escapes me, the maintainer of Dask.

40:30 Matthew Rocklin.

40:31 Matthew Rocklin.

40:32 You're welcome.

40:32 I asked him about this.

40:33 He's like, wow, subinterpreters sound neat.

40:36 But I doubt I would incorporate, I would make use of them for Dask, except Dask internally has all these control threads and all this machinery built out.

40:50 For managing all of the distributed programming.

40:54 Just so people know, Dask is a way to run your Python code potentially on a bunch of different systems.

40:59 It's kind of like a pandas DataFrame style of programming.

41:05 But you say run this computation, but like all over the place.

41:08 That's the problem that Dask solves.

41:10 And then.

41:10 So it's really interesting.

41:11 He said, but yeah, I would totally use that for my internal stuff.

41:16 I mean, for these control threads.

41:18 I mean, I would totally make use of that because it was, it's perfect.

41:21 Or, you know, talking to web folks and they're like a bunch of different use cases for how this apply to web frameworks.

41:27 Or basically everybody had ideas.

41:30 Yeah.

41:31 There's a ton of great ways.

41:32 Yeah.

41:32 It was really neat.

41:33 So I got excited and then come the sprints that year.

41:37 Oh, and one of the people that was really supportive was Davin Potts, who's one of the maintainers of multiprocessing.

41:44 Which is, you know, I thought that was a pretty good sign that I was in the right direction.

41:49 It is absolutely a good sign.

41:50 I mean, this is like the next gen multiprocessing in my mind, kind of.

41:54 I still wonder if once we have sub interpreters, do we even need multiprocessing?

41:58 Subprocesses make sense.

42:00 But does multiprocessing make sense?

42:02 I'm not sure.

42:02 I mean, it's kind of like the big hammer to solve the GIL problem and this isolation problem by just going fine, the operating system will do it.

42:11 But if CPython itself does it, then, I mean, maybe there's a memory benefit.

42:14 But it's interesting to think about.

42:16 Yeah.

42:16 What's funny is now after I'm charged up, I'm excited.

42:20 I've got all sorts of notes.

42:21 I'm like, wow, you know, there's all this stuff.

42:24 Several people have said that they'd like to help out.

42:28 You know, and then I get to the sprints and you can imagine that Guido's a busy guy at PyCon.

42:33 You know, everybody wants to talk to Guido, taking pictures.

42:36 He can't even get a moment's peace.

42:38 I know it's, yeah, it's definitely got people chasing him around.

42:41 It wears him out.

42:42 And there's always stuff going on.

42:44 There are people that he needs to talk to about different proposals and whatever.

42:48 And this is 2018.

42:50 He's still BDFL.

42:51 And there's a lot going on.

42:55 And what happens?

42:56 He actually comes and finds me, sits me down.

42:59 And for 45 minutes, he basically tells me that he thinks it's a bad idea.

43:06 And I can tell you, I walked to, so I understood where he's coming from.

43:10 And I think in part, he had misunderstood what I was trying to do.

43:14 Yeah.

43:14 It's like that telephone game where one person tells a person who tells a person something and

43:19 it's not the same on the other side.

43:21 And in the conversation, you know, I tried to clarify a few points, but it really wasn't

43:24 a great opportunity to try and explain really why this was a good idea.

43:29 I mean, the PEP to an extent does, but I think there's kind of a gap in the justification that

43:38 really Guido was just, I hadn't communicated well to him.

43:42 So 45 minutes.

43:43 And basically, you know, I conceded some of the points that he made and tried to explain

43:49 the others.

43:50 And ultimately, you know, it's not like he said, stop.

43:54 He basically said he thought it was a waste of my time that I should work on something

43:58 that's going to benefit people more.

44:01 Also, he was coming from thinking about the problem in a different way than I was and a

44:08 different understanding of exactly what I was trying to solve and what I was trying, what

44:12 the proposal was, what the solution was.

44:14 Well, he probably also has a lot of Gil fatigue hearing how Gil is ruining Python and all that,

44:19 right?

44:19 Yeah.

44:19 And I think in part, he was just worried that I was going to get people excited about something

44:24 that wasn't going to actually end up happening.

44:26 Okay.

44:27 So it was kind of a bummer.

44:28 I was bummed out probably the rest of the day.

44:31 Did you walk away less inspired or are you still excited after all the other input you

44:35 got?

44:36 I was still determined.

44:37 Yeah.

44:37 I probably, my excitement level was lower only because it'd been suppressed a little, but

44:42 you know, that wears off.

44:44 And talking to more people about it, same level of excitement, the same excitement about how

44:49 they would use it.

44:50 And so, you know, I didn't worry about it, but I was worried that if I couldn't convince

44:55 Guido, then A, of course, I didn't think it would happen.

44:59 And B, maybe it really wasn't a good idea.

45:01 Because Guido is smart and he's been doing this a long time.

45:04 And I have absolute trust in that uncanny ability he has to understand whether something

45:10 is good for Python or not.

45:12 Yeah.

45:12 I mean, he's amazing.

45:13 So it did make me wonder, well, what if he's right?

45:17 Maybe I'm not understanding.

45:19 That's probably more likely.

45:20 So, but I kept at it.

45:22 I was determined.

45:23 You know, like I said, I waited three years for the job I have now.

45:28 So I was like, you know, I'll just keep going.

45:31 And if nothing else comes of it, I was convinced that 80 or 90% of stuff that I was doing was

45:37 a good idea regardless.

45:38 So I was like, I'll just keep going.

45:40 And if it ends up that it's not going to work out, I won't feel too bad about it.

45:45 I'll have made a difference, I think.

45:47 So kept going.

45:48 But then 2019 rolls around and Guido pulls me aside again and says, oh yeah, that's

45:55 a good idea.

45:55 Been thinking about it.

45:57 Because he got it.

45:58 Well, the point that we were fixing, he saw over the course of the year, he saw that I was

46:02 working on all these things that I needed for the goal, but they were a good idea regardless.

46:08 And he's like, oh yeah, you're working on all this stuff.

46:11 And also, he probably heard my explanation a few more times and it clicked on how I was trying

46:18 to solve this problem.

46:19 And he said, yeah, that could work.

46:21 And so I was floating around for a while.

46:23 It was exciting.

46:24 That's super cool.

46:25 One of the challenges a lot of peps and projects have had recently, let's say since July 2018,

46:32 maybe, is we, more like you guys, have not really had a way to decide to make decisions

46:41 after Guido said, I'm stepping down, I'm just stepping back to standard core developer or

46:47 steering council now, but stepping back saying, you guys have to figure out a new way to like

46:51 make decisions and sort of govern yourself.

46:54 Right?

46:54 So that, your PEP spanned that gap.

46:57 So I'm sure that didn't.

46:58 Oh man, it was brutal.

46:59 Was it?

46:59 It literally killed a lot of the momentum I had coming out of PyCon 2018 because that happened

47:06 just a couple months after.

47:09 And basically, I kept working on stuff, but there was all these discussions about governance

47:14 and governance and governance, and it just dominated a lot of what we were working on.

47:19 So there wasn't a lot of collaboration going on with this project.

47:22 And there was a lot of just cognitive effort to stay on top of this stuff because it's important.

47:28 So really, until all this was solved, nothing could move, even though PEP 554 was ready.

47:35 Basically, right after PyCon 2018, I'd worked up kind of a separate list of arguments

47:43 to make to Guido on why this was a good idea and try and kind of fill that gap that I had

47:48 perceived.

47:48 And then on top of that, I had updated the PEP to kind of iron out some of the small things.

47:54 I felt like it was ready.

47:55 And literally, right before I was going to ask for pronouncement on the PEP, then, or no,

48:02 I think I was going to wait until the core sprints in September so that I could talk to

48:06 Guido in person and try and make the case and then ask for pronouncement.

48:09 So, you know, it was brutal because then no PEPs got decided.

48:16 And the core sprint was mostly spent talking about governance stuff, which that's fine.

48:21 It was productive, but I wasn't able to get a lot of progress.

48:24 So it just kind of slowed things down so much.

48:28 And then when we finally got governance ironed out, you know, there's transition.

48:32 So this whole time I was honestly aiming to solve it, to get PEP 554 landed for 3.8,

48:39 and then even get the stop-sharing-the-GIL stuff done for 3.8.

48:44 Neither one happened in large part because of the whole governance issue.

48:48 It's probably good in the long term that this transition happened, but in the short term,

48:52 it definitely threw a bunch of molasses in. You know, you're a little disappointed every release you miss on something.

48:59 A little part of you hurts, but.

49:01 I can imagine.

49:02 Well, and the releases are long, like the gaps are wide between them, right?

49:05 18 months is a long time in technology.

49:08 It's not like, well, maybe next month it'll come out.

49:10 We're actually talking about reducing the release cycle to a lot smaller, six or 12 months.

49:16 Depending.

49:17 I think that's interesting.

49:18 What's the trade-off there?

49:21 So our current release manager, Łukasz, said for 3.9, I want it to be shorter.

49:26 So he basically said there are a variety of reasons.

49:30 The main opposition to having shorter release cycles was that it's more of a burden on the release team,

49:38 but that's less of an issue.

49:40 Now there's a lot more automation.

49:41 And so this is coming from the release manager.

49:45 So he was in a position to determine what made sense.

49:49 So that's, that's kind of how that's played out.

49:52 He's, he's like, let's do this.

49:54 And so there was some discussion for a stretch on what would be the best time.

49:57 And if it made sense at all, of course, but if we went with it, you know,

50:01 what kind of release interval we'd have and how that would work logistically and how that would play into other factors of core development.

50:10 So I, I don't remember where that's gotten to.

50:12 I think there was some consensus that it would make sense to look at it further, but I think like most long discussions do, it kind of tailed off without a good conclusion quite yet.

50:24 I don't know.

50:25 I don't remember.

50:26 It's, I don't remember what the PEP number is.

50:28 It's the release PEP for 3.9 where he started this discussion.

50:32 So there's a number of threads related to that PEP.

50:35 You know, to me, it sounds generally positive, right?

50:39 Like smaller releases that you can understand a little bit more completely rather than just like, here's a huge dump of 18 months of work.

50:48 But I definitely do understand it.

50:50 I mean, you've got all the places, all the Linux distributions, all the other places that are shipping it.

50:57 They have to now think probably about that more frequently.

50:59 That was definitely one of the concerns.

51:01 But now that the Linux distributions are moving away from exposing their system Python, that it's less of a concern.

51:11 Right.

51:12 So one interesting thing in this discussion was just the idea of moving to Calver for versioning Python.

51:19 I think that was something that Brett had talked about.

51:22 So, you know, there are a number of different ideas.

51:24 Like actually having the version number be like 2019.6 for June or something like that.

51:32 Yeah.

51:32 So then you'd end up with 2019.6.0.1, you know, for bug fixes and all that.

51:38 Definitely.

51:38 I like the calendar versioning for like packages and stuff.

51:42 But for the actual core, like that's pretty interesting.

51:45 I don't know.

51:45 It may not make sense.

51:47 There are a lot of things that people talked about.

51:50 We talked about the possibility of LTS releases or some variation on that.

51:55 And so that we'd be maintaining multiple.

51:57 But, you know, I think a lot of people are kind of burnt out on having maintained 2.7 and Python 3.

52:05 At this point, like, have we just about gotten out of this?

52:07 Most people don't bother with 2.7 at this point, core developers.

52:12 So it's really interesting.

52:14 I don't know.

52:15 There are lots of ideas.

52:16 I think ultimately we'll settle on the right thing.

52:19 Something that'll work well for us.

52:21 Even if it's a status quo, if we figure out that's the best way forward.

52:24 But already since 3.6, I think it was, we started doing a shorter release cycle, more like 14 months.

52:32 Because we used to do the release cycle from release to release,

52:37 or from final to final.

52:40 Now, if you think about it, it's more like final to beta 1.

52:44 Right.

52:44 Which we're already like way past 3.8 beta 1.

52:47 The final release date for the next version is basically 18 months from beta 1 now instead of final.

52:54 That's the way we've been doing the last few releases.

52:57 So it breaks it, shortens it to like 14 months.

53:00 So 12 months really wouldn't be that different.

53:02 Yeah, that's true.

53:03 We'll see what happens there.

53:05 But, you know, interesting topic.

53:06 For sure.

53:07 So the final takeaway is you're targeting Python 3.9, which will be basically where the work is going into now, right?

53:14 Like you're already in beta of 3.8.

53:16 It's kind of frozen and whatnot.

53:17 So it's going to be probably the next version of Python.

53:20 Maybe that will be shorter.

53:22 Maybe not.

53:22 A little undetermined at this point.

53:24 Might be 12 months from now or who knows.

53:27 I expect, regardless of when it is, that we're close enough that we'll be able to get all of this sub-interpreter stuff done for that.

53:33 Assuming PEP 554 gets accepted, which I expect.

53:37 I hope it does.

53:38 I expect it will.

53:39 I don't see a reason why it wouldn't.

53:41 Yeah, it seems like the excitement is there for it.

53:43 To me, it clearly solves the problem, assuming like the startup time of the sub-interpreters is not just equal to multiprocessing and things like that.

53:53 It seems like it's going to be really great.

53:56 Yeah, and what's nice is I've done this in a way that will start really minimal.

54:00 Like you'll only be able to pass bytes or strings or other basic immutable types between sub-interpreters.

54:08 But with this foundation, then there's like a whole list of really neat projects that people can work on to improve things for sub-interpreters.

54:16 Like I talked about earlier, improving startup time, but also things like one neat idea for memory allocators in CPython.

54:25 Right now, we use one memory allocator throughout the whole lifetime of the runtime.

54:30 Memory allocator is in charge of, of course, allocating and deallocating memory.

54:34 So what if you could use a different memory allocator per interpreter?

54:39 Well, what if you could at any arbitrary time swap out an allocator so that objects are allocated using different allocators?

54:47 Then you could manage relative to the allocators for those objects and you get some neat things.

54:54 Like what if you had an allocator that was page size, right?

54:58 And so then you actually can, in Python, have a class that kind of wraps that allocator so that you can create objects relative to that class.

55:10 Or create an object that represents the allocator, and then any attribute that you create on the object is in that allocator, or whatever.

55:20 So now you have this self-contained memory page that then you could mark, let's say, read-only.

55:27 Suddenly, all that memory is read-only and you have truly read-only objects in Python.

55:33 What if you take that read-only and now you can pass that whole memory page over to another interpreter and you don't have to worry about any race conditions relative to that memory page?

55:46 One of the best ways to get parallelism is to have immutability.
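[Editor's note] A small taste of that immutability idea already exists in current Python: `memoryview.toreadonly()` (added in 3.8) exposes a buffer that readers can share without locks, because writes through the view are refused. This is just an illustration of the principle, not the proposed per-interpreter read-only memory pages:

```python
# memoryview.toreadonly() (Python 3.8+) yields a view that cannot mutate
# the underlying buffer -- concurrent readers can share it without locks.
data = bytearray(b"shared state")
view = memoryview(data).toreadonly()

assert view.readonly
try:
    view[0] = 0  # any write through the read-only view is refused
except TypeError:
    blocked = True
print(bytes(view))  # b'shared state'
```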

55:49 I think there are lots of...

55:51 And so there's a...

55:52 I have a project open for this and a number of other resources where I've basically written all this stuff down.

55:59 Like, here's a whole list of awesome things that we can do once we have this foundation set.

56:05 Would you get things like maybe less memory fragmentation for long-running processes?

56:09 If you could start up these sub-interpreters, like give them a block of memory, let them throw that away.

56:14 And things like this, other benefits, like containing memory leaks from badly written code within a sub-interpreter.

56:19 There are a number of things there.

56:22 One is that I have a list of...

56:26 Like I said, there's all this global state all over.

56:29 This is kind of the main blocker for me right now.

56:31 And so we have all these static globals in the C code all over the place, thousands of them.

56:36 Most of them can't be global.

56:40 So I can't even pull them into the runtime state struct.

56:44 I have to pull them down into the interpreter state, which means I have to collect them out of static globals and kind of migrate them into this pi interpreter state struct.

56:54 And it's just a lot of work.

56:56 And then I have to make sure that nobody adds any static globals that they shouldn't in the future.

57:01 Or else, same problem all over again.

57:04 So, I mean, this is probably the main problem right now.

57:08 Aside from all those globals, there are some parts of the pi runtime state, which is this struct where I pulled in a lot of globals earlier, a couple years ago.

57:19 There are key items of that struct that I've identified that need to move over into the interpreter state.

57:28 The GIL will be the last one of those.

57:30 But right before that is memory allocators.

57:33 So I'm pretty sure that we'll be able to do this just fine.

57:37 But I need to see how it affects performance.

57:40 But moving the memory allocators to per interpreter.

57:45 So I think one of the side effects, I mean, it really could be reducing memory fragmentation down, you know, isolating it to per interpreter.

57:53 Which, if you're using multiple interpreters, that's a good thing.

57:56 Yeah, it's really interesting.

57:57 And certainly the pressure that hardware is putting on top of programming languages and runtimes is not getting less, right?

58:05 Like, we're only going to have more cores, not fewer going forward.

58:09 So it's only going to be a problem that stands out more starkly if Python only reasonably runs on like one core at a time.

58:17 When you have 16, 32, 64 cores, whatever it is in like five years, right?

58:22 Yep.

58:23 So it's definitely a good project.

58:24 I'm really excited about it still, after first being motivated to work on this five years ago.

58:30 You know, I'm still motivated.

58:31 Almost gave up at one point.

58:33 But plugging away.

58:35 And now a lot of people are excited.

58:36 Looks like it's really going to happen for 3.9.

58:38 Are some of the other core developers helping you?

58:40 Somewhat.

58:41 Everybody's got different goals in mind.

58:45 Victor Stinner, he's been really helpful for some of this stuff, especially relative to the C API.

58:50 I've had offers of help from others.

58:54 Before Emily Morehouse became a committer, I was helping to mentor her.

59:00 And one of the things that we did, we met basically weekly.

59:04 And for the most part, we paired up on working on sub-interpreters.

59:09 And that was a big help.

59:10 Now she's...

59:11 Yeah, that's cool.

59:11 She's all important now.

59:13 No, no.

59:15 Emily's great.

59:15 But she's so busy.

59:17 Yeah, she's great.

59:18 She's running a successful company.

59:19 It's really busy.

59:21 And on top of that, she's the chair for PyCon 2020 and 2021.

59:27 Well, I'm guessing 2021.

59:29 Anyway, at least next year.

59:31 And then she's got a lot of this stuff going on.

59:33 And she did the assignment expression implementation.

59:36 And all sorts of stuff.

59:38 But during that time when she was helping out with this stuff, it was a really big help.

59:42 So lots of help.

59:43 I had help from a number of folks out in enterprise.

59:48 Talked to folks at Facebook and Instagram and some other companies.

59:53 I've had offers to help from other individuals.

59:56 Help from small companies.

59:59 People coming up and saying, hey, I want to get my whole team working on this.

01:00:04 It hasn't really gone anywhere.

01:00:05 I don't get my hopes up too high.

01:00:07 Yeah, it's such a big problem, right?

01:00:09 It's like so wide spanning.

01:00:10 It sounds like with all the globals and whatnot.

01:00:13 You got to really...

01:00:14 It's not very focused.

01:00:15 So it's hard to work on, I suspect.

01:00:16 One thing I made sure to do was break this problem down into a zillion tasks.

01:00:20 As granular as I could.

01:00:22 So I think I gave you the link there to the multi-core Python project that I have.

01:00:27 If you look me up on GitHub, you'll find that repo.

01:00:31 And that repo is basically just a wiki and GitHub projects breaking down all this work into discrete chunks.

01:00:40 I'll certainly link to all those things in the show notes.

01:00:42 So we'll just click on it.

01:00:43 But yeah, that's great.

01:00:44 Link to your...

01:00:45 You gave a talk at PyCon 2019.

01:00:47 I don't think we mentioned that yet.

01:00:49 Yeah.

01:00:49 So it was a talk.

01:00:50 I actually proposed two talks.

01:00:52 One of them was specifically about subinterpreters, both PEP 554 and the whole effort to move the GIL to per-interpreter.

01:01:02 That got rejected.

01:01:03 That was the one I wanted to give.

01:01:05 I gave another one that's broader.

01:01:08 It was kind of a superset.

01:01:10 It included the stuff from the other talk, but it also talked all about the GIL in general.

01:01:16 The history of the GIL, you know, the technical ideas behind the GIL, really race conditions and parallelism, concurrency and all that stuff.

01:01:28 And then also talked about what we need to do to kind of solve that problem, including some of the past efforts and also current efforts to make fixes in the C API, changes in the C API so that we can move past the GIL.

01:01:46 And then I focus a lot of the talk on the stuff with subinterpreters.
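For context, the draft of PEP 554 at the time proposed roughly the following standard-library API. This is a sketch of the proposal only, not a released interface; the module does not exist in shipping Python, and the names and signatures could change before anything lands:

```python
# Sketch of the API proposed in the PEP 554 draft (not runnable on a
# current release; everything here is subject to change).
import interpreters

interp = interpreters.create()   # a new subinterpreter with isolated state
interp.run("x = 1 + 1")          # run source code in that interpreter

# The draft also proposed channels for passing data between interpreters,
# initially limited to simple objects like bytes:
recv, send = interpreters.create_channel()
send.send(b"hello")
print(recv.recv())
```

The key point of the design is that each interpreter has its own module state and (eventually) its own GIL, so `run()` on different interpreters could execute truly in parallel.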

01:01:51 Cool.

01:01:51 Yeah, that sounds really interesting.

01:01:52 We'll definitely link to that.

01:01:53 All right, Eric, I think we're just about out of time.

01:01:57 We'll definitely cover this.

01:01:58 And I'm really excited for this project.

01:02:00 So if you need any more positive vibes and feedback, I think this definitely has a chance to, like, really unlock this multi-core stuff in a general way.

01:02:10 I think there's interesting APIs you can put on top of it to even make it, like, almost transparent to folks.

01:02:17 You know, I'm a big fan of the unsync library, which has a cool unifying view on top of threading, multiprocessing, and async.

01:02:24 And this would dovetail, like, right into that, like, just a little decorator.

01:02:29 Oh, yeah.

01:02:29 And boom, it's subinterpreter execution and all sorts of stuff.

01:02:32 That would be great.

01:02:33 Yeah, it's really awesome.

01:02:34 Excellent work.

01:02:35 I'm looking forward to using it in 3.9 beta 1.

01:02:38 Now, before you get out of here, though, I do have two final questions for you.

01:02:41 I think we may have spoiled the first response.

01:02:44 If you're going to write some Python code, what editor are you going to use?

01:02:47 I think people may be able to guess what you're going to say here.

01:02:50 You know, it's funny.

01:02:51 First, I'll say that the Python extension for VS Code is written not in Python, but in TypeScript.

01:02:59 Because it's an Electron JS app, yeah.

01:03:01 That's a whole other topic.

01:03:02 An interesting one.

01:03:04 So, for the most part, I've been using Vim forever.

01:03:08 As long as I've used an editor that wasn't on Windows, I've been using Vim.

01:03:13 And so, you know, naturally, after years, you kind of build up muscle memory, and you build up a whole set of configurations and all that stuff.

01:03:23 And so, changing editors is hard.

01:03:25 But given that I work on an extension for VS Code, it's pretty meaningful to actually use VS Code, right?

01:03:34 Right.

01:03:34 Just to experience the thing, right?

01:03:36 It just makes it all that better.

01:03:38 I really appreciate VS Code.

01:03:39 I'm not really a big use my mouse while I'm working sort of guy.

01:03:43 So, VS Code is definitely out of the box oriented towards using your mouse.

01:03:49 I mean, Windows is.

01:03:50 So, kind of there's that mentality.

01:03:53 And that's fine.

01:03:54 It's definitely, that's a target.

01:03:56 So, it's not really how I operate all that much.

01:03:59 There are ways, however, there's like a Vim extension, which basically makes VS Code work like Vim.

01:04:06 So, I tried it.

01:04:07 And it was nice.

01:04:09 There were only a couple problems.

01:04:11 And they're kind of blockers for me.

01:04:13 Okay.

01:04:13 I use VS Code for Python stuff sometimes, but most of the time not.

01:04:18 Once you know and love an editor, it's tough.

01:04:21 I think that they're solvable problems.

01:04:23 And I've kind of pushed the feedback upstream.

01:04:25 So, who knows?

01:04:26 I mean, maybe I'll move away from Vim at some point.

01:04:29 Makes it hard when I'm in a terminal and I need to edit stuff.

01:04:32 I can't really pop up VS Code.

01:04:34 Yeah, yeah.

01:04:35 But you do have that cool, like, remote editing stuff that's coming in VS Code.

01:04:39 That was one of the blockers.

01:04:40 And now that there's that, it's less of an issue for me.

01:04:43 So, there are only really a couple things left that are kind of blocking me from using VS Code.

01:04:49 Otherwise, I like it.

01:04:50 There are a lot of things that I just haven't bothered with, but you get them out of the box with VS Code.

01:04:57 And it's nice.

01:04:57 Cool.

01:04:58 All right.

01:04:59 And then notable PyPI package.

01:05:01 Maybe not the most popular, but something you're like, wow, people should really know about this and maybe they haven't heard of it.

01:05:06 That's a great question.

01:05:07 There's a few out there.

01:05:09 I'm a big fan of projects that really have been able to stay on top of their growth.

01:05:14 That's a really hard problem when you're working on a project and it gets popular.

01:05:18 Trying to keep up.

01:05:19 Most of the time it's just volunteers, spare time.

01:05:22 Things often grow pretty organically.

01:05:24 I think for the most part, most programmers are pretty pragmatic.

01:05:28 So, they aim for immediate fixes.

01:05:30 So, it's really hard over time to keep a project under control, especially when it gets big.

01:05:36 So, I'm a big fan of projects that kind of keep that under control.

01:05:41 There's some projects I think that have aimed for simplicity and really focused on that.

01:05:47 See, I'm setting myself up for failure here though, because I want to give a good example of this.

01:05:52 And not having looked at any projects too closely in a while, I may be kind of invalidating my whole point.

01:06:00 There's some neat ones out there that people find useful.

01:06:02 Of course, attrs. But with data classes now, attrs still has a place, I suppose, but not quite as much as it did.

01:06:11 Yeah, it definitely seemed to have.

01:06:12 Did it directly inspire data classes?

01:06:14 Yeah.

01:06:15 It's kind of achieved its goal like in a meta way.

01:06:17 Anyway, yeah.
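To make that overlap concrete, here is a minimal sketch of a data class; attrs generates the same kind of boilerplate (plus extras like validators and converters), which is why the two now cover much of the same ground:

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    x: int
    y: int
    tags: list = field(default_factory=list)

# __init__, __repr__, and __eq__ are generated for us, much as attrs does.
p = Point(1, 2)
print(p)                  # Point(x=1, y=2, tags=[])
print(p == Point(1, 2))   # True
```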

01:06:18 I have one on there, importlib2, which is a backport of Python 3's importlib to Python 2.

01:06:24 But I haven't really kept up with it, so it probably doesn't work anymore.

01:06:29 Those are good.

01:06:30 Yeah, attrs is definitely a good one.

01:06:31 Backport ones are kind of useful sometimes.

01:06:34 But there are also some that make it easier to use some of the trickier functionality of Python.

01:06:41 So, things that deal with descriptors, for instance.

01:06:44 There's some decorator packages out there.

01:06:47 I think Graham Dumpleton has wrapt.

01:06:49 That's an interesting one.

01:06:50 One that I think people don't think about a whole lot is psutil, which actually is really neat because it has some good cross-platform abstractions for a lot of the things that you do system-side.

01:07:04 Like monitoring processes, getting system information, killing processes, or whatever.

01:07:10 But it also stays pretty focused.

01:07:12 I think that's a good one.

01:07:13 Yeah, psutil, that's definitely a good one, yeah.
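As a quick illustration of the cross-platform abstraction being described, a minimal psutil sketch (psutil is a third-party package, so `pip install psutil` first):

```python
import os
import psutil  # third-party: pip install psutil

proc = psutil.Process(os.getpid())   # a handle on our own process
print(proc.name())                   # process name
print(proc.memory_info().rss)        # resident memory, in bytes
print(psutil.cpu_count())            # logical CPU count
# proc.terminate() / proc.kill() would end a process -- same API on every OS
```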

01:07:14 pycparser is one I've looked at recently that does some neat things.

01:07:19 It allows you to parse C code.

01:07:20 Pure Python, though.

01:07:21 Oh, interesting.

01:07:22 Okay.

01:07:23 There's some limitations to it, but otherwise, I think it's actually pretty cool.
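A minimal sketch of what that looks like; note that pycparser expects already-preprocessed C (no `#include` directives), which is one of the limitations mentioned:

```python
from pycparser import c_parser  # third-party, pure Python: pip install pycparser

parser = c_parser.CParser()
ast = parser.parse("int add(int x, int y) { return x + y; }")

func = ast.ext[0]        # first top-level node: the function definition
print(func.decl.name)    # the parsed function's name: add
```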

01:07:27 Awesome.

01:07:28 Very cool.

01:07:28 Those are definitely some good ones.

01:07:30 All right.

01:07:30 Final call to action.

01:07:31 People are excited about this.

01:07:32 Maybe they want to help out.

01:07:34 Maybe they want to try or see some of the changes.

01:07:36 Is there something they can do?

01:07:38 Is there like this list you talked about?

01:07:40 Can they find this list to see if they can take one of them for you?

01:07:43 First of all, if anybody's interested, they can just get in touch with me immediately.

01:07:47 I'll get right back to you.

01:07:49 We'll talk about all about the project, how they can help, what their interests are, how that lines up.

01:07:55 That project I talked about, the link that you'll have, has a lot of the tasks broken down as issues organized on the project board.

01:08:05 So you can take a look at those.

01:08:07 Also, the wiki is basically where I've dumped pretty much all of my notes on this stuff.

01:08:12 Read through there.

01:08:13 There's lots of stuff.

01:08:15 You can see how it applies.

01:08:16 Give feedback on the pep.

01:08:19 And there may be other ways that it could work that you've thought of that nobody else did that are worth talking about.

01:08:27 But again, just get in touch with me.

01:08:29 It wouldn't take a lot of effort.

01:08:30 And I can get you working on something right away, something that will interest you and will make a real difference here.

01:08:38 I think this is a feature that people, until they think about it, don't realize how important it is.

01:08:45 I really do think that it's going to make a big difference for people.

01:08:47 That's awesome.

01:08:48 Great bunch of ways for people to get involved.

01:08:49 I totally agree with you.

01:08:50 I've certainly put this in my top five most important projects for Python.

01:08:55 So very good work.

01:08:56 I love this deep dive.

01:08:57 Thanks for taking the time, Eric.

01:08:58 Yeah, thank you.

01:08:59 Thanks for having me, Michael.

01:09:00 Yeah, you bet.

01:09:01 Bye.

01:09:01 This has been another episode of Talk Python to Me.

01:09:05 Our guest on this episode was Eric Snow.

01:09:07 It's been brought to you by Linode and TopTal.

01:09:09 Linode is your go-to hosting for whatever you're building with Python.

01:09:14 Get four months free at talkpython.fm/Linode.

01:09:17 That's L-I-N-O-D-E.

01:09:19 With TopTal, you get quality talent without the whole hiring process.

01:09:24 Start 80% closer to success by working with TopTal.

01:09:28 Just visit talkpython.fm/TopTal to get started.

01:09:32 That's T-O-P-T-A-L.

01:09:34 Want to level up your Python?

01:09:36 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:09:41 Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.

01:09:49 And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

01:09:53 It's like a subscription that never expires.

01:09:55 Be sure to subscribe to the show.

01:09:58 Open your favorite podcatcher and search for Python.

01:10:00 We should be right at the top.

01:10:01 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:10:11 This is your host, Michael Kennedy.

01:10:12 Thanks so much for listening.

01:10:14 I really appreciate it.

01:10:15 Now get out there and write some Python code.

01:10:17 I'll see you next time.
