#107: Python concurrency with Curio Transcript
00:00 Michael Kennedy: You've heard me go on and on about how Python 3.5's async and await features changed the game for asynchronous programming in Python, but what exactly does that mean? How does that look in the APIs? How does it work internally? Today I'm here with David Beazley (@dabeaz) who's been deeply exploring this space with his project Curio. And that's what this episode of Talk Python To Me is all about. It's episode 107 recorded April 14th 2017. Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Rollbar and Hired. Thank them both for supporting the show, check them out @rollbar and @Hired_HQ on Twitter and tell them thank you. David, welcome to Talk Python.
01:25 David Beazley: Hi, how are you doing?
01:26 Michael Kennedy: I'm doing great, it's great to have you back. It's been going on two years since you were on one of my first episodes, episode number 12, talking about packaging and modules and diving deep into understanding those. I think we're gonna be diving deep into another topic, another area of Python today, but this time in concurrency.
01:47 David Beazley: Yeah, it should be fun. Something I've talked about in the past.
01:50 Michael Kennedy: Yeah, yeah, you've definitely been talking a lot about it lately in amazing presentations which we'll get to. I know many people know you, it's been two years since I last asked you this question, so maybe just briefly you could tell us how you got into Python programming, that sort of thing.
02:07 David Beazley: All right, well, not to get into too much detail, I guess I first found Python in 1996. I was doing some scientific computing, parallel computing kinds of things, and just found it for doing, basically found it for scripting scientific calculations and it kind of grew from there. In the more modern era, I'm known for writing a couple of Python books so that's where people know me.
02:30 Michael Kennedy: Yeah, absolutely. What books are the most famous ones you've written, the most well known?
02:35 David Beazley: Yeah, the Python Essential Reference. That's been around for a while. And then the, did the third edition of the Python Cookbook with O'Reilly.
02:42 Michael Kennedy: Okay, excellent, yeah, those are both great. And these days, what are you doing in terms of work and programming with Python and other things?
02:50 David Beazley: Most of my work is training, actually. I do a lot of teaching of Python classes. That's what's mainly paying the bills. And then also funding, sort of hacking on various open source projects in the other time when I'm not doing that.
03:06 Michael Kennedy: That's great, I've done a lot of training, a lot of in person training previously, and I thought it was just, I think it's a really great career. I think it's a perfect balance or a great balance, let's say, where you get to teach things to people, see the reaction, see how they take it. They kind of test your understanding of it, and that's part of your job, and the other part is just to research and learn and stay on top of whatever it is you're teaching. It's really nice, I think.
03:35 David Beazley: Yeah, I'm always trying stuff out with teaching. I mean, I find it informs things like the talks I give at conferences, and it also informs books and things. A lot of what people would see at a conference is probably something that I've tested out in the context of teaching, or I've tried to do it different ways and just kind of seen people's reactions and confused looks.
03:56 Michael Kennedy: This isn't working, we've got to try this. Of course, the iteration is so much faster, right? You could teach two or three classes in a month. How many conference talks do you give in a month, right? Or how many books do you write in a month? Not nearly as many.
04:09 David Beazley: Right, right. I mean, I'm supposed to be working on a book update right now. And it's, it's going slowly, but I'm thinking a lot about these topics, like how to present material, how to think about it.
04:20 Michael Kennedy: Yeah, that's great. So let's go ahead and start talking about our main topic here, which is concurrency. And maybe we could start by just kind of talking about the concurrent options in Python, in general a little bit, and how do you feel that we're doing in 2017 with Python 3.5, 3.6, compared to say, five years ago?
04:41 David Beazley: Oh, okay, that's an interesting question. The gist of the problem with concurrency is doing more than one thing at a time. I mean, that's the basic problem. And it comes up a lot in network programming especially, so that's where a lot of people are very interested in it. Python has certainly been involved with concurrency for a long time, I mean, threads have been part of Python since 1992, I think, so that goes way back. And there's certainly been the option of launching separate Python interpreters, you could have multiple interpreters, might be a way of doing that. So these have been kind of classic approaches that have been around for a while. Alongside that, you have a lot of people messing around with things like callback functions, event loops, packages based on that. So things like the Twisted framework sort of emerges out of that. So a lot of this has been going on for quite some time. I mean, maybe over just throughout Python's history. This question about where it goes in Python 3, I mean, that's kind of a, I'm trying to think how to chew on that.
05:47 Michael Kennedy: It's a big question, huh?
05:48 David Beazley: No, it is a big question, because you've got the... I mean, obviously the big development is the asyncio library. You get that added to Python. And then that is tying together a lot of ideas from different places. A lot of concepts about event loops and generator functions and coroutines and all these things are kind of coming together in that library. There's a lot of excitement around that library, but it's also a really difficult concept, that's a very difficult library to wrap your brain around.
06:24 Michael Kennedy: Yeah, it takes a bunch of things that are individually pretty conceptually hard and then puts them all together.
06:31 David Beazley: Yeah. I actually realized that, in hindsight, I never really quite understood that library the first time I heard about it. I watched Guido give a keynote talk about it at PyCon, and I'm trying to think which one that was. Might've been 2012 maybe, 2013, and my takeaway from the talk is that, oh, this is gonna be cool, we're gonna do things like coroutines for async, and we'll probably talk about that later. I went and rewatched that talk recently, maybe three months ago 'cause I was like, what did Guido say in that talk, exactly? And it was not at all what I remembered. You rewatch the talk, he's talking more about trying to have a common event loop in Python to have some kind of interoperability with some of these libraries like Twisted and Tornado, maybe gevent or something. And the focus on coroutines was more incidental.
07:33 Michael Kennedy: That's really interesting. So if, for example, you're using Twisted, it has some kind of event loop that's doing the callbacks for you as your things complete or whatever. And if you're using something else that also has an event loop, those things might not know anything about each other, right?
07:51 David Beazley: Right, right, right.
07:52 Michael Kennedy: Yeah, that can definitely be a challenge. So that takes us up to, what, 3.3, 3.4?
07:58 David Beazley: 3.3, yeah.
07:59 Michael Kennedy: 3.3, right. And then in 3.5, we got async and await, which was really taking these ideas and making them more accessible to people, I think.
08:12 David Beazley: I think, yeah, trying to put like a better face on it. It's like putting a different, I don't know how to put it, it's almost like a different API on top of that machinery to present it in a more coherent way. It's certainly not a, that approach is not a Python invention. I don't know whether they directly cited it, but it had been done in C# before.
08:36 Michael Kennedy: Yeah, and what's interesting, the history in C# kind of follows the same path, they didn't come up with that initially either. They came up with just this idea of a task framework, and it was all callback driven and whatnot, and then somebody said, "Oh, look, this callback way of writing, this is super not nice." It works, but it's not the same as writing serial code; if we put this async and await on it, it will be. It has exactly the same benefit for Python: it takes code that would otherwise have to look special and it kind of makes it look serial, right.
09:09 David Beazley: Yeah, that really, that is kind of the whole focus of it, writing code with callbacks. You see that in every talk about async, there's usually a slide that's like, oh, callback hell or something. Everybody kind of moans.
09:25 Michael Kennedy: Yeah, exactly.
09:26 David Beazley: I've seen callback hell, and then there's the different approaches for how to untangle yourself from that. It's interesting, where it tends to push everyone is more into just serial code; people want something that looks a lot like maybe thread programming or just kind of straightforward code, but then there's this question of how do you get there. So that's where all the fooling around with generators and tasklets and green threads comes in. All these things that you see in these libraries are all kind of focused on that general problem.
10:05 Michael Kennedy: Yeah, so that's a really interesting thing to understand from the ground up. One of the subjects I wanted to cover while we were talking today is some of the ideas that you brought up in the talk that you gave at PyCon 2015 called Python Concurrency From the Ground Up: Live. And before we get into that, I just want to say, that was such a masterful talk. You did a fine job on that talk. For those of you who haven't seen it, David basically pulls up the editor and says, "We're gonna write a web server," or a TCP server, "and we're gonna explore all the ways that you might approach concurrency with it and the ways we might invent something similar to the asyncio that's built into Python," and it was really well done.
10:58 David Beazley: Okay, thank you.
10:59 Michael Kennedy: Yeah, yeah. I'll link to that in the show notes, and people should definitely check it out. But maybe we could just kind of talk through some of the ideas of, if we start with a serial server and a serial client, how do we, what are the ways in which we can build up to that? Like, we can use threading, we could use coroutines, there's lots of things, right?
11:20 David Beazley: Right, right.
11:21 Michael Kennedy: So obviously, you start with serial code, it's fine as long as you don't have too many people requesting from the server, but as soon as you have some long request, everything is blocked, right?
11:29 David Beazley: Right. The whole big picture of that talk, I'm gonna try to distill it down from a high level view here. It's all about scheduling, basically task scheduling. If you have normal serial code, you're executing statements, you're going line by line through the code. If you hit an operation like receive, like I wanna receive data on a socket or something, that code is gonna block. If there's nothing available, it's going to block and you're gonna have to wait. And that really is the gist of the whole problem, which is, what happens when you block? And if you do nothing, then your whole program just freezes, everything stops and then nothing can happen. Some ways around that, one approach, is to use threads in the operating system. Essentially with threads, you're running multiple serial tasks at once. And if one of them decides to block, it needs to receive, well then the others are still allowed to run. So you're essentially allowing the operating system to deal with it; it takes care of scheduling the threads and making sure that things work. The other approach, and this is something that the talk gets into at the end, is to do it yourself. Don't have the operating system do it, take care of that blocking on your own. And one of the tricks that's used for that is to use a Python generator function. What's used there is actually just the behavior of that yield statement. If you haven't written a generator function before, I think most people kind of know them in the context of iteration and the for loop: you can write this function where you use the yield statement to emit values out of a for loop. And the thing that's really cool about that yield statement is that it causes the function to just suspend itself right there at the yield; it's like it emits a value and then it suspends. And that's exactly the kind of thing you need to do this concurrency thing. You can say, well, if there's no data to receive, I can suspend myself. You can actually take over the role that an operating system would normally play at that point.
13:52 Michael Kennedy: Yeah, and I think that's a really, really interesting insight, that you can say, "I'm gonna take this thing that really doesn't generate anything and make it a generator anyway." And so you had some interesting examples of like, we're going to basically simulate parallelism with generators, and one of the reasons you might care about that is you can switch things over to threads, but that actually slows things down quite a bit, and especially if there's computational stuff, you might have to push that out to multiprocessing and then it really slows down to 10 times what it might normally be. And so you had this great example that maybe we could talk about just a little bit where you say, "Let's just come up with generators that can count down," like, you give it a number like 10 and it'll count down to nine, eight, down to one and then it's done. And you generate multiple of these with different numbers and whatnot and put them all into a task list of things that have to be run. And then you, one at a time, sort of round robin work with those generators. And I think that really highlights how this event loop can work, right? We can actually process these in a semi-fair way across all these different generators.
15:05 David Beazley: Right, you can cycle, you can kinda cycle between them.
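To make the countdown example concrete, here is a minimal sketch of the kind of generator-based round-robin scheduler being described; the names `countdown` and `tasks` are just illustrative, not taken from the talk's code verbatim.

```python
from collections import deque

def countdown(n):
    # A "task" that does a little work each time it is resumed.
    while n > 0:
        print('T-minus', n)
        yield              # suspend here; the scheduler decides who runs next
        n -= 1

# Several independent tasks, all waiting to be run.
tasks = deque([countdown(10), countdown(5), countdown(20)])

# Round-robin scheduler: resume each task in turn until they all finish.
while tasks:
    task = tasks.popleft()
    try:
        next(task)             # run the task up to its next yield
        tasks.append(task)     # it suspended, so it goes to the back of the line
    except StopIteration:
        pass                   # the task ran to completion
```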
15:08 Michael Kennedy: Yes, which is really, doesn't make a lot of sense when you just have a countdown little thing. But then you say, okay, well, now let's apply the same idea to functions, like a while true loop that is going while true come over here and wait to receive from a socket, then process the response, while true wait. And if you put yield statements throughout right before all the blocking places, you can kind of accomplish the same thing with the same technique, right?
15:40 David Beazley: Yeah, yeah.
15:41 Michael Kennedy: Okay.
15:42 David Beazley: I kind of describe it, I don't know, this might be sort of silly, but the whole approach in that talk, I sort of view it as analogous to maybe the game of hockey or something, you've got a task and it's out on the ice and it's doing its thing, but if for some reason it's gotta receive and it can't proceed, it gets thrown into the penalty box.
16:03 Michael Kennedy: All right, one of the rules is you can't block, and any time it fails that--
16:06 David Beazley: Yeah, you take a blocking penalty. It's like, okay, you blocked, you're going to the penalty box and you're gonna sit in the penalty box for as long as it takes until some kind of data comes in. And then once some data has arrived, and it's like, okay, you get to go back out on the ice. It's very much that model, it's like task it to run as long as there's things to do, but once there's nothing to do, you go sit in the penalty box.
16:33 Michael Kennedy: I think that's a really interesting analogy, it definitely makes, it's a good way to think about it. The challenge though, and I think this is pretty obvious: I'm going through, and if I could pull out any task, I could ask it, "Hey, do you have work to do? Then I'm gonna let you do it. Otherwise go back to waiting or go to the penalty box, and when you decide that you have work to do, you come back out." It's interesting, but then how do you know when it actually has work to do? Like on the socket, how do I know, or if I'm doing something computational, how do I know that that task is ready to run?
17:06 David Beazley: Oh yeah, well, to get that, you need the help of the operating system. So there are some system calls related to like polling of sockets. Like the select call is one, there's things with like the poll function, there's low level event APIs in the operating system where you can present it with a whole collection of sockets and you can say, "Okay, I have these 1,000 sockets, why don't you watch these and then, if anything happens, tell me about it."
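Putting the two ideas together, here is a rough reconstruction (not the talk's exact code) of the "penalty box" shape of such a scheduler: each task is a generator that yields a request right before it would block, and select() decides when it may come back onto the ice.

```python
import select
from collections import deque

ready = deque()          # tasks that can run right now
waiting_recv = {}        # sock -> task parked until the socket is readable (penalty box)
waiting_send = {}        # sock -> task parked until the socket is writable

def run(task):
    ready.append(task)

def event_loop():
    while ready or waiting_recv or waiting_send:
        while not ready:
            # Nothing can run: ask the OS which sockets became ready.
            can_recv, can_send, _ = select.select(waiting_recv, waiting_send, [])
            for s in can_recv:
                ready.append(waiting_recv.pop(s))
            for s in can_send:
                ready.append(waiting_send.pop(s))
        task = ready.popleft()
        try:
            why, sock = next(task)        # run until the task says it would block
            if why == 'recv':
                waiting_recv[sock] = task  # blocking penalty: off to the box
            elif why == 'send':
                waiting_send[sock] = task
        except StopIteration:
            pass                           # task finished

# A task is just a generator that yields right before each blocking call.
def echo_client(sock):
    while True:
        yield 'recv', sock
        data = sock.recv(1024)
        if not data:
            break
        yield 'send', sock
        sock.send(data)
    sock.close()
```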
17:37 Michael Kennedy: Yeah, and that works really well for sockets, but what if I gave it like a piece, just a function, a Python function that was computationally expensive?
17:45 David Beazley: Yeah, if it's computationally expensive, it's just gonna run. It's actually a problem with this event loop thing, if you have something that runs, I don't know, it's gonna go mine a bitcoin or something, it's just gonna run. And there's no way to get it back until it finishes.
18:01 Michael Kennedy: Yeah, it's true. They're all really, the event loop really is running on the same thread, right?
18:05 David Beazley: Yeah.
18:06 Michael Kennedy: I guess there are some things you can do, like you can say, well, this part is computational so we're going to kick that off in some multiprocessing way or something like that, that's possible. You had some kind of socket trick where, even when you weren't using sockets for the work itself, you were using one to signal with.
18:21 David Beazley: If you're gonna do CPU work somewhere else, in some sense, you turn that back into an IO problem. You might have some work that gets carried out somewhere else, and then when it's done, it gets signaled on a socket saying, "Hey, that thing was done."
18:38 Michael Kennedy: Yeah, okay, it was a very interesting technique and in the end, down in the internals, you kind of had to deal with some of the callbacks, but the way it got consumed, it was pretty straightforward. So that was your talk, and like I said, people should absolutely go watch it, it's really quite amazing. And then the other thing that I kinda see as the frameworkification of these ideas, I'm not sure what the origin is, I'll ask you which one came first, but there's this project that you have called Curio. Do you want to tell people what Curio is?
19:14 David Beazley: Yeah, okay, so Curio is a library for doing concurrency in Python. It exploits a lot of this async and await stuff. It ultimately sits in kind of the same spot as asyncio, although it has a very different flavor to it.
19:30 Michael Kennedy: One of the things that I think is really interesting with Curio, it's like there's been, as we talked at the beginning, there's been these ideas for a long time of doing some sort of asynchronicity through callbacks and things like that, like with Twisted. We've got asyncio built into earlier versions of Python. But in Python 3.5, we have async and await, and onward of course, and you kind of took all of those ideas and said, "Let's rethink them." How would this API look if we actually had this version of Python concurrency, not what we had before? That was how I was reading it when I was going through it.
20:09 David Beazley: Yeah, that's part of it. So there's a little bit of a complicated background on this. So let me back up. In a past life, going back a ways, I was a professor in computer science. And the main course that I taught was operating systems. And that was a typical course where you'd make students cry on some huge project; it was just a bloodbath of a course where we'd make people write an operating system kernel in C. And that kernel had to do all of this stuff, it had to do IO and it had to do like multitasking and task switching and all these things, and it turns out that all the stuff in that project was exactly the same kind of thing that people have to do in these asyncio libraries. I mean, that's what they're doing, they're doing IO and they're coordinating tasks and switching stuff, so the problem is essentially the same, just in a different environment. Instead of down at the low level of C and interrupt handlers and device drivers, you're up in Python and it's much higher level, but it's a similar topic. Having done that, I've always kinda had an interest in systems topics. So I gave a well known PyCon presentation on the GIL. That would've been maybe 2010 or something, something like that, and I also have done some tutorials at PyCon about coroutines, sort of exploring this idea of using generators and coroutines for concurrency. It's been a topic that I've personally been kind of exploring for a long time. But one of the things that has kind of bothered me over those years is that all of my presentations on that have been completely out of line with what is actually going on in Python. I mean, if you look at that concurrency talk, that is not at all how asyncio approaches concurrency, approaches that problem. If you look at a lot of the presentations and things, what they talk about is, oh, we have an event loop, and then we put callbacks on top of the event loop, and then usually there's like a transition into a discussion of futures or promises. And a whole approach based on futures and promises and tasks, and it starts getting a lot of moving gears, and frankly I just have not been able to wrap my head around that stuff. I look at that approach and it is completely different than anything that would've been taught in an operating systems class. I've never seen an operating system kernel built on top of futures, for instance. I actually went and got my old operating system books not too long ago, and I was like, god, did any of those books talk about futures or promises? And they're nowhere; in none of the operating system texts do you see that. So I've been struggling with this kind of mismatch in a way, where it's like, hmm, there's this whole approach with futures and there's this thing that I did in the talk which is a completely different thing, and I've been fascinated with that mismatch. Why is that, or what is going on there? And in some sense, the Curio project is maybe kind of like a green field project. Just trying to do asyncio, but more in the operating system kind of model, where I'm thinking more in terms of task scheduling and the structure of how I would do it in like a kernel project, not in the framework of the futures and promises and callbacks and all of that stuff. That's a big part of that project, and so it's just kind of a re-envisioning of how this might work.
24:12 Michael Kennedy: Right, okay. Do you feel like it provides a cleaner model by not making you think about all of the callbacks and futures and stuff like that?
24:22 David Beazley: I think it does. It is really wild at first glance because it pretty much kills off everything that you're used to like, in fact there are no callbacks in Curio at all, there's no callbacks, there's no futures. There's almost none of that machinery that you see in the asyncio library. One way of describing it, it's almost like I took the async and await feature that got added in 3.5, and then just used it as a starting point for some completely different approach.
24:54 Michael Kennedy: Yeah, absolutely, and so it's really involving a lot of async and await, and coroutines, basically, right? So a lot of the starting points, a lot of things you want to do, you're like, provide this class with some kind of async method, right, an async coroutine. And then it runs from there, right?
25:15 David Beazley: Right, right, right.
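For readers who haven't seen Curio code, a minimal program looks something like the sketch below, based on my reading of the Curio docs; the `child` coroutine and its arguments are purely illustrative.

```python
import curio

async def child(name, seconds):
    print(name, 'starting')
    await curio.sleep(seconds)        # non-blocking sleep; other tasks keep running
    print(name, 'done')

async def main():
    # Spawn a couple of concurrent tasks and wait for them to finish.
    t1 = await curio.spawn(child, 'task1', 2)
    t2 = await curio.spawn(child, 'task2', 1)
    await t1.join()
    await t2.join()

if __name__ == '__main__':
    curio.run(main)
```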
25:16 Michael Kennedy: Yeah, so there... I don't know, people have looked at it or not, but there are a lot of building blocks, a lot of really nice parts to this library. When I first thought about it, when I first checked it out, I kind of thought, okay, this would be, maybe it's got like a coroutine loop and it's doing a few things differently, but there's a lot of building blocks here, you can build some really interesting things with it.
25:38 David Beazley: Yeah, there's some really odd stuff going on, I don't know how we want to get into that. One thing about, actually getting back to the operating system model on that, these async libraries, let's see if this makes sense, a lot of these async libraries are kind of like an all in proposition where you're either coding in the asynchronous world or you're not, and it tends to be kind of a separation between those two worlds, even if you're working with callbacks, it's kind of like you have to program in kind of the callback style or you're out of luck. And I think Curio embraces that as well. One thing from operating systems is there's usually a really strict separation between what is the operating system kernel, and then what is like a user space program. Even at the level like a protection kind of thing, a user program isn't even allowed to see the kernel in any meaningful way. There's very strict separation. That is also something that's going on in this Curio project, there's kind of the world of async, you have all these async functions and await and all of that, and then there's the kernel, and those two worlds are really separated from each other in the Curio project. That's another unusual thing about it.
27:04 Michael Kennedy: Yeah, that is really interesting. Now I can see the operating system analogies, and there is a kernel, I think you actually call the kernel in Curio, right?
27:13 David Beazley: Yeah.
27:14 Michael Kennedy: Yeah. This portion of Talk Python To Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors, or relying on users to report errors, digging through log files trying to debug issues, or a million alerts just flooding your inbox and ruining your day. With Rollbar's full-stack error monitoring, you'll get the context, insights and control that you need to find and fix bugs faster. It's easy to install. You can start tracking production errors and deployments in eight minutes or even less. Rollbar works with all the major languages and frameworks, including the Python ones such as Django, Flask, Pyramid, as well as Ruby, JavaScript, Node, iOS and Android. You can integrate Rollbar into your existing workflow. Send error alerts to Slack or HipChat. Or even automatically create issues in JIRA, Pivotal Tracker and a whole bunch more. Rollbar has put together a special offer for Talk Python To Me listeners. Visit rollbar.com/talkpythontome, sign up and get the bootstrap plan free for 90 days. That's 300,000 errors tracked all for free. But hey, just between you and me, I really hope you don't encounter that many errors. Loved by developers and awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Give Rollbar a try today. Go to rollbar.com/talkpythontome. Basically you use the constructs and you pass these async coroutines into it, and that's that. Okay. So if you're doing, like you said in the all in part, if you write some sort of coroutine and it comes down to a point where you're doing something computational or blocking and you block there, you kind of take everyone out, right?
29:09 David Beazley: What do you mean, take everyone out?
29:13 Michael Kennedy: You can clog up the event loop.
29:15 David Beazley: Oh yeah, yeah, definitely.
29:16 Michael Kennedy: Yeah, I mean, you can like throw a wrench into the whole, hey, we're just gonna keep going and letting that, you do your work and when you're done, come back and we'll pick up where you left off. Yeah, so how do you deal with that? If I've got some async coroutine I want to run in Curio, and it's got something computational, how do I make that work? Sometimes you just have to do something that's gonna take a while, right?
29:40 David Beazley: Yeah. I mean, if that's the concern, you have to put it out either to a thread or another process.
29:45 Michael Kennedy: Yeah, okay, like with multiprocessing or something.
29:47 David Beazley: It's kind of the standard technique for all of these async things, it's like we got something computational and it's gonna block, you gotta put it out somewhere.
29:56 Michael Kennedy: And then do you have a way, like a construct, to like await a thread or await some kind of multiprocessing call?
30:02 David Beazley: There's a function in there you can ask to run something in a thread, or it can run something in a process. It will take care of it and wait for the result to come back, but it will not block the internal loop.
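As I understand the Curio API, the helpers being described here are `run_in_thread()` and `run_in_process()`: they hand the blocking or CPU-bound call off and let you await the result without stalling the kernel loop. The `fib` function below is just a stand-in for expensive work.

```python
import time
import curio

def fib(n):
    # Deliberately slow, CPU-bound work.
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)

async def main():
    # Push the CPU-heavy call into another process; the event loop keeps running.
    result = await curio.run_in_process(fib, 32)
    print('fib(32) =', result)

    # Blocking (but not CPU-heavy) calls can go to a thread instead.
    await curio.run_in_thread(time.sleep, 1)

if __name__ == '__main__':
    curio.run(main)
```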
30:16 Michael Kennedy: Okay, yeah, very nice. If we have some time, we could talk at some of the individual building blocks. What can you build with Curio, do you think? It looks like it's a little bit below a web framework, but it's close to a framework for building asynchronous programs on its own. Where do you think this fits in?
30:38 David Beazley: Okay, yeah, it is definitely not a web framework. I don't even think there's any HTTP support in it right now. I think it is more a framework for concurrency, actually setting up tasks, communicating between tasks, coordinating things. It's the kind of thing that you might start building libraries on top of, maybe libraries to interact with Redis or interact with databases or even to do HTTP. But it's definitely a lower level thing. So it's a lot of coordinating tasks and things of that nature.
31:15 Michael Kennedy: I see, so if I wanted to create some framework that was backed with Redis, I could use Curio to make a really nice async and await framework and somehow do the network IO internally and people might not even know that it's Curio. They might just know my framework, and it talks to Redis, and internal part of that could be Curio.
31:34 David Beazley: Yeah, maybe, yeah, yeah. I mean, I see it personally, it's something I might implement a lot of microservice code with. Like little web services, things like that. Done a little bit of playing around trying to implement a game server with it.
31:48 Michael Kennedy: Sure, like a socket based game server.
31:50 David Beazley: Yeah, socket based game server. I think there's a lot of uses with testing. That is, there are a lot of people who do network programming that's not necessarily web programming, and so I think it kind of fits into that.
32:03 Michael Kennedy: Yeah, and it has really good support for TCP and UDP type stuff, right?
32:09 David Beazley: Right, right.
32:10 Michael Kennedy: Okay. If I wanted to take a framework that maybe I'm already using, let's say Django, Flask, Pyramid, something like that, that doesn't have any support for this idea of concurrency or async await, could I somehow use Curio in my own code if I'm willing to do some kind of callback mechanism or notification mechanism in my web app for asynchronous stuff, or would those things just not make any sense together?
32:41 David Beazley: I think it would be tough. If you've got code that was written originally for the synchronous world. Getting that into any async framework, even asyncio or anything like that, it can be kind of a tough proposition just because there's, just the programming model can be very different and you have to instrument a lot of code with like these async and await calls. It's unclear.
33:06 Michael Kennedy: Yeah, yeah. I'm thinking like, if you have web sockets or some kind of call and then poll JavaScript thing. Maybe those parts of it could somehow use Curio?
33:18 David Beazley: Maybe.
33:19 Michael Kennedy: Yeah, maybe, I don't know.
33:20 David Beazley: Maybe. I mean, you actually kinda, this opens up an avenue of discussion which is actually, where does this thing, where does it fit in the grand scheme of things? And I think, one thing with these async frameworks is just stepping back for a moment and thinking like, okay, what is the use case for these things or what are they really good at? One of the things that they're really good at is handling a gigantic number of connections. A high degree of concurrency where you might have like, let's say I had 100,000 clients connected to some server, and I've gotta maintain some pool of 10,000, 100,000 socket connections. That is where these async things tend to shine, because you can't just spin up like 10,000 threads.
34:12 Michael Kennedy: Yeah, you can't. Just the memory required for the stack space would be problematic, right?
34:18 David Beazley: Right, right. So they're really good at that. So you have a high degree of concurrency, but at the same time, even when you have a lot of clients, it doesn't mean that those clients are all doing things at the exact same time either. You might have like a server that has like 100,000 connections open, but maybe it's doing push notifications or low traffic stuff. It's not like you're gonna have 100,000 connections open just completely hammering your machine all the time.
34:51 Michael Kennedy: Right, something like Slack or something where everybody's got it open, but the amount of traffic is actually quite low. But you want it to be basically instant, right?
34:58 David Beazley: Right, so the kind of stuff I'm thinking about is like, okay, so maybe let's say you have 100,000 connections open. Could you still use something like Flask or Django or something, could you still use that in some capacity? Now, you can't, you're not gonna be able to spin up 100,000 threads running Django or whatever, but could you have some coordination between these tasks and something like Curio or asyncio and coordinate that with maybe a smaller number of threads or processes or whatever that are running a more traditional framework?
35:35 Michael Kennedy: Yeah, exactly. That was what I was thinking, I'm not sure if it's possible though.
35:38 David Beazley: I don't know. One thing in Curio that I think it's actually one of the more interesting parts of the project is I'm trying to do a lot of coordination between async and traditional thread programming. As an example of that, one thing that Curio has is it has this universal queue object. This is probably one of the most insane things in the whole library, but a standard way to communicate between threads is to use a queue.
36:09 Michael Kennedy: Right, because shared data is problematic, you've got a lock on it and all sorts of stuff.
36:14 David Beazley: Right, so you have a queue and you share between threads. So there's this thing in Curio that lets, that basically allows queuing to work between async tasks in Curio and thread programs in a really seamless way. Essentially, the thread part of it just thinks it's working with a normal queue and everything works normal and like the Curio side works with an async queue and it thinks that everything is kind of normal. And you get this queuing going back and forth between the two worlds. It's sort of seamless in a really disturbing way. It's like, maybe you could have 100,000 tasks that are managing sockets, but then talking to some pool of threads through queuing, and it all kind of works. This is an area that is not, at least as far as I know, not being explored more traditionally in asyncio, for instance. They have a queue there, but it's not compatible with threads.
37:13 Michael Kennedy: Right, so this is a really interesting idea, this universal queue, it's like a dual facade. The different worlds can see it as part of theirs, right?
37:24 David Beazley: Yeah, yeah. Somebody contributed a feature to Curio to allow it to submit work to asyncio. So you could have Curio and you could have threads and you could have asyncio, and this queuing object in Curio actually works in all three of those worlds. I've done some tests on that, so you could have a queue where one end of the queue is an asyncio task and the other end is a thread, or you could have a thread and Curio tasks putting things on a queue that's being read by asyncio and other things. That's a really kind of wild, crazy thing to be playing with.
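A sketch of the universal queue idea, assuming the `UniversalQueue` name from Curio's docs: the thread side uses the queue with plain blocking calls, while the Curio side awaits it, and neither needs to know about the other.

```python
import threading
import curio

q = curio.UniversalQueue()

def thread_producer():
    # Ordinary synchronous thread code: no async, no await.
    for i in range(5):
        q.put(i)
    q.put(None)            # sentinel to stop the consumer

async def curio_consumer():
    # An async Curio task consuming from the very same queue.
    while True:
        item = await q.get()
        if item is None:
            break
        print('got', item)

async def main():
    threading.Thread(target=thread_producer).start()
    await curio_consumer()

curio.run(main)
```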
38:01 Michael Kennedy: Yeah, that's really interesting. I didn't know about universal queue, but the library's full of these really amazing little data structures and functions and stuff. It's quite neat.
38:12 David Beazley: The other thing that's wild and, kind of interesting in Curio too is just the way that the task model works. It has a lot of support for things like cancellation of tasks, and that turns out to be a really tricky problem. It's like, okay, you set up a whole bunch of work and then you wanna cancel it. Can you do that? And that's something that you can do in Curio, and it's very interesting because it's something that you can't do with threads traditionally.
38:39 Michael Kennedy: Right, if you just kill a thread, maybe it's holding onto some kernel level thing and you've forced it to just leak it or whatever, right? It's gonna be bad.
38:47 David Beazley: You don't even have a way to kill a thread. I mean, there's no API for killing a thread, and then I think some people have done it going through ctypes, but that just makes my skin crawl. Killing threads by going through ctypes seems like a really good way to just not have your program work.
39:08 Michael Kennedy: Yeah, exactly. If it's holding onto something important, let's imagine it's holding onto the GIL and you kill it. What happens then? That might not be great, yeah, okay. So this cancellation thing, you're right that that is not simple. How does it work, is it like basically every time one of these async coroutines yields or if it's not in a running state, you can just say, okay, this is getting canceled?
39:32 David Beazley: Pretty much, that's it. Yeah, since every operation requires the support of this kernel, if somebody wants to cancel something, if something wants to cancel something, it's either blocked in there already, or you can just wait for it to finish. Yeah, you can essentially just blow it away. You raise an exception at the yield statement saying, okay, you're done.
39:55 Michael Kennedy: Yeah, okay, that's what my next question is. You can't just take it out of the running task list and throw it away because it might've been in the middle of something that needs to be unwound, like create a file that needs to close the handle or something. So you basically just raise a task cancellation exception or something? Okay.
40:13 David Beazley: Yeah, it gets a cancel there and then it can choose to clean up if it wants, but it's sort of a graceful shutdown from that.
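In code, the cancellation model being described looks roughly like this; hedged in that the exception name `curio.CancelledError` is from my reading of the docs. The cancel request is delivered as an exception at the task's current await point, so the task can clean up before it goes away.

```python
import curio

async def worker():
    try:
        while True:
            await curio.sleep(1)          # the task parks here in the kernel
            print('still working')
    except curio.CancelledError:
        print('cleaning up before exiting')
        raise                             # let the cancellation complete

async def main():
    task = await curio.spawn(worker)
    await curio.sleep(3)
    await task.cancel()                   # raises the exception inside worker at its await

curio.run(main)
```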
40:22 Michael Kennedy: Yeah, that's a really nice feature. So let me ask you about integrating with some other things. I have databases on my mind right now for some reason. There's a bunch of nice ORMs or ODMs if you're doing NoSQL in Python, SQLAlchemy, Mongo engine, Peewee, Pony and so on. The one ORM that I've seen that seems to integrate really nicely with async and await is Peewee ORM. You can basically await on the queries that you're getting back from it, which is super cool. Would Curio integrate pretty seamlessly with a framework like that?
40:59 David Beazley: I don't know. Do you know how they're doing that under the cover? I looked at Peewee before we met here 'cause you mentioned it, and I didn't see that feature off the top of my head, but--
41:11 Michael Kennedy: To basically the extent that I know is I've seen reference to it where it basically supports async and await on the queries, the things that are going to talk to the database, but I don't know what's happening internally there.
41:24 David Beazley: Okay, yeah, I don't know. I would have to take a look at it. My initial guess is it probably would not work, just because Curio is so out in the left field right now. If they've written that specifically to work on top of asyncio--
41:40 Michael Kennedy: I see, yeah, so it might not, right?
41:42 David Beazley: Hit or miss on that, yeah.
41:43 Michael Kennedy: Yeah, yeah, I mean, the API is you just have an async method and you just await either object dot create or objects dot query and so on, but I don't know what's internal. It's probably asyncio, I would guess.
41:55 David Beazley: Yeah.
41:56 Michael Kennedy: This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to thousands of companies who will compete to work with you. Take it from one of Hired's users who recently got a job and said, "I had my first offer within four days and I ended up getting eight offers in total. I've worked with recruiters in the past, but they were pretty hit and miss. I tried LinkedIn but I found Hired to be the best." "I really like knowing the salary upfront, and privacy was also a huge seller for me." Well, that sounds pretty awesome, doesn't it? But wait until you hear about the signing bonus. Everyone who accepts a job from Hired gets a 300 dollar signing bonus, and as Talk Python listeners, it gets even sweeter. Use the link talkpython.fm/hired, and Hired will double the signing bonus to 600 dollars. Opportunity is knocking. Visit talkpython.fm/hired and answer the door.
42:48 David Beazley: I've been thinking about this. I mean, the same problem, I'm not familiar with the Peewee ORM, but I am familiar with SQLAlchemy.
42:55 Michael Kennedy: Yeah, that was my next question, what about things like SQLAlchemy or Mongo engine that have no concept of this at all? Could we somehow shoehorn them into working with Curio or these types of things?
43:05 David Beazley: Yeah, maybe. Keep in mind it's highly experimental. What I'm about to talk about may not work. But one thing that I've been playing with in Curio, there's a concept in there known as an async thread. At first glance, it's like, oh god, this is insane.
43:24 Michael Kennedy: I thought that was wonderful, it's really cool.
43:26 David Beazley: Oh no, async threads are nuts. Let me see if I can explain. Okay, so in threads, with thread programming, you have threads, and then you have all these synchronization primitives, you have locks and queues and semaphores and all this stuff, so in thread programming, you have all this stuff that you normally use to write programs. It turns out that almost all of that functionality is replicated in these async libraries. If you look at asyncio, it has events and semaphores and locks and queues and stuff, and Curio has events and locks and queues and all that stuff, but the limitation of that, of the async libraries, is that those things don't work with threads. If you read the docs, it has this huge warning on it, it's like, this is not thread safe. If you use a thread with this, you're gonna die.
44:19 Michael Kennedy: You'll be sorry.
44:20 David Beazley: Yeah, so you have all these things that you would normally use with threads in these async libraries, but you can't use 'em in thread code. I got this idea where, I wonder if you could flip the whole programming model around, you could create an actual real life thread, but then have the thread, sitting behind the thread, you could have a little task that interacts with the event loop, interacts with the async world.
44:53 Michael Kennedy: So instead of having a bunch of processes and the event loop is running on one of them, the event loop controls all the threads in a sense, right?
45:02 David Beazley: Yeah, so what you have is you have one event loop. But then you have a real thread. Keep in mind, this would be like a POSIX thread, a real life fully realized thread, but sitting right next to that thread, out of view, out of sight, would be a little tiny asynchronous task, a little task on the event loop that is watching for the thread to make certain kinds of requests, and I was thinking like, what if you took, in the thread, you took all these requests for all these synchronization primitives and you just kinda handed it over to this little helper on the side? And then you let it interact with the event loop. And the thing that's really wild about that, so Curio supports this, this concept. It turns out you get all of these features with tasks and stuff in Curio showing up in threads. You can cancel threads. You can do all the synchronizations with threads and all these other things. I've been thinking about that in the context of some of this database stuff. Let's say I did want to interact with something like SQLAlchemy. Maybe I could have a pool of threads or something that would take care of the SQLAlchemy side of it. But then kind of coordinate it with tasks on the event loop through this async thread mechanism.
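The async-thread feature is experimental, and the exact names below (`spawn_thread`, `AWAIT`) are taken from my reading of Curio's thread support and may differ in current versions, but the shape is roughly this: a real thread runs ordinary synchronous code, and whenever it needs an async operation it hands the request to its hidden helper task via AWAIT.

```python
import curio
from curio.thread import AWAIT, spawn_thread

q = curio.Queue()     # an ordinary async-only Curio queue

def thread_worker():
    # A real, synchronous thread function: no async or await keywords.
    # AWAIT() hands each async operation to the helper task on the event loop.
    while True:
        item = AWAIT(q.get())
        if item is None:
            break
        print('thread got', item)

async def main():
    t = await spawn_thread(thread_worker)
    for i in range(3):
        await q.put(i)
    await q.put(None)
    await t.join()

curio.run(main)
```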
46:22 Michael Kennedy: That's very interesting, like some kind of adapter that looks like SQLAlchemy but really routes over to another thread, or maybe it creates the session and it does all the filtering or the query stuff, and then brings it back over when it returns or something like that?
46:40 David Beazley: Yeah, maybe. I'm not even sure it would be an adapter. I'm kind of thinking of this, the model in my mind is that, okay, if you're using async, let's say you did have a server and you got 10,000 connections sitting there. It's extremely unlikely that I'm gonna have 10,000, or that I would want to make 10,000 concurrent requests on the database. Most of these connections are probably sitting idle most of the time or doing other things, so I'm thinking like, maybe I could have 100 threads, you could have a pool of threads that are responsible for doing the database side of it, and then you could coordinate that with these 10,000 asynchronous tasks in some way. So it's gonna be kind of a hybrid model where some of the work takes place in threads and other work takes place in these async tasks, but it's done in a more seamless way.
47:36 Michael Kennedy: Well, it sounds really cool. I'd love to see it and try it out, but I don't know if it'll work either.
47:41 David Beazley: Yeah, I don't know. Finding the time sometimes to explore that is a--
47:45 Michael Kennedy: It's a challenge for sure. Yeah, but if you could take these really popular, really nice frameworks like SQLAlchemy, and somehow click them into this world without actually rewriting them from the ground up, that would be really cool.
48:02 David Beazley: Yeah, I agree, I think that'd be fun, yes. It's something I wanna try, and I've got this web service project that I did a couple years ago, and right now it's sitting on Python 2.7 with the SQLAlchemy and a bunch of stuff, and I look at that, it's like, yeah I really wanna rewrite this thing in Python 3.6 with Curio and try this thing. There's only finite resources in the day, so it's still on the to-do list to sort of get to that.
48:37 Michael Kennedy: Sure, but some sort of framework that bridged that divide I think would be generally applicable in a lot of places, and a lot of people would be excited about it. Very cool. So you talked about the asyncio module a lot. Curio is not built on it, right? It's something different.
48:52 David Beazley: It's a completely different universe. Doesn't use asyncio, doing its own things.
48:59 Michael Kennedy: That was mostly because you said you wanted this to be like a green field, like how would the ideal world of this space look like, not like let's pick up the old model and see what we can do with it?
49:08 David Beazley: Yeah, partly that, yeah. And partly, actually, Curio was kind of a project where I'm trying to learn about this stuff myself. Just trying to learn, what is async and await all about in Python? What can you do with it or how can you abuse it? What kind of horrible insane things are possible with it? In some sense, Curio is a project exploring a lot of that and exploring a lot of ideas about APIs and really kind of even the programming environment itself. It's not built on asyncio and it's, I don't even think it's really meant to clone asyncio, it's kind of its own thing right now.
49:47 Michael Kennedy: Right, sure, okay. So where are you going with this in the future? What are your plans?
49:52 David Beazley: Well, part of the plan is figuring out how to write about this in books. I mentioned I'm supposed to be updating my Python book. So a big part of it is, I'm thinking about just how to approach async and await in the context of writing and teaching. And so there's that element of it. If you do want a little rant on that--
50:12 Michael Kennedy: Yeah, go for it.
50:13 David Beazley: I have convinced myself that the approach of teaching async needs to be flipped in some way. And let me describe what I mean by that. If you see a typical tutorial on a lot of this async programming, it ends up being this very kind of bottom up approach, where it's like, okay, you have sockets and then you have the event loop, and then you start building all this stuff on the event loop, you have callbacks and then there's like, oh yeah, callbacks and you have futures, and then you start layering and layering and layering and layering, and then at some point, you reach this like, oh and we have async and await.
50:47 Michael Kennedy: Yeah, finally.
50:49 David Beazley: Yeah, finally, oh it's awesome. We have async and await. The problem with this approach is that I just have never been able to teach it. I've tried this in classes, kinda doing the bottom up approach to this async stuff, and every single time it seems like you get about halfway through it, and then you're just looking at a room with like deer in the headlights.
51:12 Michael Kennedy: Yeah, you've gone through the strainer and you've stripped everybody's interest out by the time you get to the interesting part.
51:18 David Beazley: Oh, it's horrible. You're just looking at all this deer in the headlight look, and it's like oh, oh god. You think about it, it's like okay, wait a minute. Let's say I had to describe file IO to somebody. Like you open a file in Python and you read from it. Is my description of that going to start with, well, okay, you have CPU registers, and what you do is you load the CPU registers with a system call number, and then a buffer address, and then you execute a trap, and then the trap goes in the operating system, and it's gonna do some, finds like a file inode, and it'll probably check the buffer cache, and then it will go do some stuff with the disk scheduler and bring some stuff in, and then there's like copying and then... Is that the description of how I'm gonna describe file IO to somebody?
52:11 Michael Kennedy: Right, people are like, I just want to read JSON, what's going on?
52:14 David Beazley: Exactly, I'm like, I'm thinking about... So this is my thinking on async too. It's like, does anybody actually care how this stuff works? Seriously, do you care that there's an event loop or a future or a task or whatever it is in there, and I'm not sure that you do. I'm almost wondering whether the approach to teaching this async stuff is to do this total top down thing where you're just like, you basically say, hey, you have async functions and you have await. And you just start using it. You don't even say, don't even mention generators or coroutines or the yield statement. Yes, it's built on that. But do you care?
53:01 Michael Kennedy: Yeah, I think you are totally right. I mean, you probably care in six months once you've been using it a while, you might wanna look inside. But when you don't even know what async and await is, you're right, you absolutely don't care. It seems like, let's write a program, show that it's blocked, show that we unblock it with async and await, awesome, right? That could be the way to get started.
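In the spirit of that top-down approach, the first example a student sees might be nothing more than this: two async functions, an await, and a runner, with no mention of event loops, futures, or generators. This is a sketch using Curio, but any async library's runner would make the same point; the `fetch` name and delays are just illustrative.

```python
import curio

# The "blocked" version for comparison: each step waits for the previous one.
#
#   import time
#   def fetch(name, delay):
#       time.sleep(delay)          # everything stops here
#       print(name, 'done')
#   fetch('a', 2); fetch('b', 2)   # takes about 4 seconds
#
# The async version has the same shape, but the waits overlap.

async def fetch(name, delay):
    await curio.sleep(delay)       # suspends this task only
    print(name, 'done')

async def main():
    a = await curio.spawn(fetch, 'a', 2)
    b = await curio.spawn(fetch, 'b', 2)
    await a.join()
    await b.join()                 # about 2 seconds total

curio.run(main)
```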
53:20 David Beazley: Yeah. I've been thinking about that a lot, the context of the book writing. It's like, hmm, how am I gonna, how do I bring async and await into this book project? Am I gonna go the top down approach? Where it's gonna require kind of a leap of faith where it's like, yeah, okay, you just do it, it's like a file, you don't care. In my whole programming life, have I ever cared how a system call works? The answer's no. Other than teaching the operating system class, I've never once cared how a system call worked. And I kind of feel maybe the same way about async and await, it's like, if you approach it in the right way--
54:03 Michael Kennedy: Yeah, it doesn't have to be so daunting, right?
54:05 David Beazley: Yeah, it doesn't have to be so daunting. And actually, in that context, I'm almost wondering whether something like the asyncio module, it's like assembly code for async.
54:15 Michael Kennedy: Yeah, a little bit, yeah yeah yeah, a little bit, yeah.
54:17 David Beazley: Very overwhelming, right? You'd get in there, it's like, oh, coroutines wrapped by futures and tasks, and blah blah blah blah blah, you're like, aah, head is exploding.
54:28 Michael Kennedy: It doesn't have to explode, right?
54:30 David Beazley: Yeah, maybe you don't even need to know that stuff.
54:32 Michael Kennedy: Yeah, that's awesome. So what book is this that these ideas are gonna land in?
54:36 David Beazley: Ultimately, it'll be the Python Essential Reference.
54:39 Michael Kennedy: Nice, awesome.
54:40 David Beazley: I'm trying to figure out how to put async and await into the first chapter, which is like a tutorial introduction.
54:46 Michael Kennedy: Yeah.
54:47 David Beazley: I got this idea, it's like, I'm just gonna drop it in the tutorial right away, and just see what I can get away with.
54:54 Michael Kennedy: Yeah, that'd be cool, just set the tone like, no, this is kind of normal now, we're doing this now.
54:58 David Beazley: Yeah, it's just, it's normal, yeah. Just what else would you use?
55:02 Michael Kennedy: Yeah, sure.
55:04 David Beazley: No, I don't know whether I can get away with that or not.
55:06 Michael Kennedy: Well, if you want a vote of one for your flipping this presentation style for how you present it to people, I think that's the right way to do it. So it's unanimous.
55:17 David Beazley: Yeah, okay. And in some sense, the Curio project is kind of experimenting with that too. I don't know, focusing more on the async and await side of the equation as opposed to the low level mechanics.
55:31 Michael Kennedy: It's a really cool project. I like where you're going with it, it's definitely worth checking out to understand this whole async world better. So David, I think we're getting pretty much out of time, don't wanna use up your whole morning. So let me ask you a couple of questions real quick as I always do. I think I can guess this from your presentations, but if you're gonna write some Python code, what editor do you open up?
55:54 David Beazley: It's gotta be Emacs.
55:55 Michael Kennedy: Emacs, right on. And favorite PyPI package, in addition to Curio, of course. Is Curio on PyPI or is it just installable from GitHub?
56:05 David Beazley: It's on there. You should probably use the GitHub version if you're gonna do anything interesting with it though.
56:10 Michael Kennedy: It's moving pretty fast.
56:11 David Beazley: It's moving along, yeah. I don't always update it.
56:14 Michael Kennedy: Think of it more like this. If people maybe don't know about a package that you recently found, you're like, this is really cool, you guys should try this out, more than just the popularity contest.
56:22 David Beazley: One of the goals of Curio is not so much Curio itself, but to basically change a lot of the thinking around async and await. Really, it's kind of an exploratory project where it's like, hmm, let's see what we can do with async and await that's maybe outside the context of asyncio. And this Trio project is something that has been kind of inspired by Curio, if you will. It's taking things in a slightly different direction. So I would recommend people look at that.
56:53 Michael Kennedy: All right.
56:54 David Beazley: But if you're really interested in concurrency and some of this async stuff, it will give you yet a third spin on the whole universe, so that's also an experimental project, but maybe I would advise that.
57:06 Michael Kennedy: All right, yeah, very very cool. And speaking of packages, I think the Peewee async thing that I was talking about might be a separate package, I'm not sure if it's built-in. Just to be aware of that. All right, final call of action, are you looking for people to contribute to this project? What can people do now that they know about Curio?
57:24 David Beazley: Oh, it's definitely something where I'm looking for contributors. I think the place where a lot of contributions could be made on the package is more in supporting some of these other networking protocols. So getting it to hook up with things like Postgres, MySQL, Redis, ZeroMQ, things like that, there's a whole space of things that could be done there. A really big project that would be interesting would be support for HTTP.
57:51 Michael Kennedy: Yeah. Some sort of a WSGI integration?
57:56 David Beazley: Yeah, that might be a whole separate podcast, because there is a whole interest in HTTP and HTTP/2 right now where people are implementing the protocols independently of the actual IO layer. This would be like Cory Benfield's work and I think Nathaniel Smith is also working on this. So it's like HTTP/2.
58:21 Michael Kennedy: Right.
58:22 David Beazley: Where he's implemented the protocol as its own library, but then the protocol can be used from threads or used from async or used from Twisted or used from different places, and that's a really interesting avenue of work.
58:37 Michael Kennedy: Yeah, thanks for recommending that. That's cool, so maybe people could use that to build on top of or something for the HTTP layer.
58:44 David Beazley: Right, right.
58:45 Michael Kennedy: For their framework.
58:46 David Beazley: Yeah, there's been some work with that in Curio already. As you've been saying, you can use those libraries from Curio, but it has not been packaged into what I would call a nice framework. It's kind of operating at a lower level right now, and turning it into more of a framework is a whole different question, really.
59:10 Michael Kennedy: Yeah, well, it's definitely a really great start, and if it turns into one of those frameworks, I would love to play even more with it. So very nice work on Curio, David, thank you for coming on the show to share all this async stuff with us.
59:23 David Beazley: All right, thank you very much.