#306: Scaling Python and Jupyter with ZeroMQ Transcript

Recorded on Thursday, Feb 11, 2021.

00:00 When we talk about scaling software, threading and async get all the buzz. While they are powerful, asynchronous queues can often be much more effective. You might think this means creating a Celery server and maybe running RabbitMQ or Redis as well. What if you wanted this async ability and many more message exchange patterns like pub/sub, but you wanted to do zero of that server work? None of it. Then you should check out ZeroMQ. ZeroMQ is to queueing what Flask is to web apps: a powerful and simple framework for you to build just what you need. You're almost certain to learn some new networking patterns and capabilities in this episode with our guest, Min Ragan-Kelley. He's here to discuss ZeroMQ for Python as well as how ZeroMQ is central to the internals of Jupyter notebooks. This is Talk Python To Me, episode 306, recorded February 11, 2021.

01:03 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:21 This episode is brought to you by Linode and Mito. If you want to host some Python in the cloud, check out Linode and use our code to get $100 credit. If you've ever wanted to work with Python data science tools like Jupyter notebooks and pandas, but you wanted to do it the way you work with Excel, with a spreadsheet and a visual designer, check out Mito. One quick announcement before we get to the interview: we'll be giving away five tickets to attend PyCon US 2021. This conference is one of the primary sources of funding for the PSF, and it's going to be held May 14th to 15th, online. Because it's online this year, it's open to anyone around the world. So we decided to run a contest to help people, especially those who have never been part of PyCon before, attend this year. Just visit talkpython.fm/pycon2021 and enter your email address, and you'll be in the running for an individual PyCon ticket, compliments of Talk Python. These normally sell for about $100 each. If you certainly want to go, I encourage you to visit the PyCon website and get a ticket; that money will go to support the PSF and the Python community. If you want to be in this drawing, just visit talkpython.fm/pycon2021 and enter your email address, and you'll be in the running to win a ticket. Now let's get on to that interview.

02:36 Min, welcome to Talk Python To Me. Thanks. Thanks for having me. Yeah, it's really good to have you here. I'm excited to talk about building applications with ZeroMQ. It's definitely one of those topics that lives in this realm of asynchronous programming in ways that I think a lot of people don't initially think of. You think of async programming like, okay, well, I'm gonna do threads, and if threads don't work, maybe I'll do multiprocessing, and those are my options, right? But queues and other types of intermediaries are really interesting for creating powerful design patterns that let you scale your apps and do all sorts of interesting things, right? Yeah, ZeroMQ has definitely given me a new way of thinking about how the different components of a distributed application can talk to each other. Absolutely. So we're gonna dive all into that, which is going to be super fun. But before we do, maybe just tell us a bit about yourself. How did you get started in programming and Python? I got started mostly during college, where I was studying physics in what essentially amounted to a computational physics degree. And that's where I met Professor Brian Granger, who's one of the heads of what is now the Jupyter project. I started working with him as an undergrad back in 2006, working on what would ultimately become the first IPython Parallel, the interactive parallel computing toolkit in IPython. Oh, that's really cool. What university was that? That was at Santa Clara University. Uh huh. Okay, nice. Yeah, it's really cool. I mean, you were working on physics, but you were also at the heart of Jupyter, and I guess IPython at the time, right? IPython notebooks is what they were called. Yeah, yeah. It's coming up on 15 years that I've been working on IPython and Jupyter. Nice. And so did you do programming before you got into physics?
Or did you just learn that to get going with your degree? So I did it for fun in college. You know, I played around with calculators and stuff in high school, but nothing more complicated than printing out a Fibonacci sequence.

04:24 So were you a TI person or an HP person? What kind of world were you in? Yeah, TI. TI, oh boy. Yeah, I mean, obviously, that reverse Polish notation, that was just wrong. Yeah. Although in college we did implement reverse Polish calculators, because it's easier. Yeah, it's easy to write a parser for that. Right, it's easier on the creator and not on the user. Yeah, exactly. I feel like computers used to be way more that way, right? Like, oh, it's easier for us to make it that way. Well, if you put in more work, it's easier for the millions of people that use it. Maybe not millions in the early, early days, but yeah, very interesting. So what kind of stuff were you doing with IPython in the

05:00 early days? What kind of stuff were you studying with your physics? In grad school, I was studying computational plasma physics simulation. So it was particle simulations of plasmas, which are, you know, what's going on in stars. I wasn't studying stars, but that's one of the main plasmas that people know about. And so I was doing particle simulations, just studying a system and seeing what happens. And we were working on an interactive scale simulation code that was written in C++. It was a fairly nice and welcoming C++ physics simulation, as far as academic C++ physics simulations go. But essentially, in my first year of grad school, I wrote a paper where I had to do a simulation, run it for five days, look at the results, then change some input parameters and run it again for five days, and do that for a few dozen iterations, and then write a paper about that. And then ultimately, my PhD thesis was about wrapping the same code in Python, so that I could have programmatic self-steering, actually by this time driven in a notebook. Okay. So I was doing automatically what had been manual: change it, run it again. Just by wrapping the C++ code in a Python API, I could do it much more efficiently, and take something that took multiple iterations of five days and get the same result in a little over half a day, I think. Wow, that's really cool. So you would basically run a little small simulation with C, and then you would get the inputs into Python? And okay, well, now how do we adjust it, how do we change that, and go from there? It was the same simulation, but I had to tune the inputs a little bit differently. There were some details of the physics where basically I had to turn some knobs way up, because I wasn't constantly watching it. Yeah, I see.
And then it would say, oh, that number was too big, turn it down, do it again, and then look and see if it was too big again. Whereas once I had a knob that could be turned while the simulation was running, which is the feature that was missing, I could say, ramp up the current. In this case, I was studying the max limit of current that you can get through a diode. So ramp up the current, and then once I see evidence that it's too high, ramp it back down. And then I could lower the slope, so that I would slowly approach an equilibrium and find the limit. Whereas when I had to wait until the run was done to change the input, I had to dramatically over-inject the current, which made the simulations extra slow, because there were way too many particles. Yeah. And the spatial resolution had to be really fine in order to resolve something called a virtual cathode. And I didn't have to do nearly as much of that, basically, because I could do this live feedback. The simulations themselves could be quicker, because I didn't have to provide enough information that I could get the right answer out a long time later. That's super cool. And obviously, it makes a huge difference. You know, I feel like we have a similar background in how we got into programming. I was doing a math degree, and my math research led me into doing C++ on Silicon Graphics mainframe computers and stuff like that. You talk about letting these things run overnight. Everyone would just kick off these things, and then they would come back in the morning to see what happened. And, you know, an interesting sort of random sidebar: we came in one morning, and we all tried to log into our little workstations or whatever, and the system just wouldn't respond. There was just something wrong with it. And you're like, what the heck is going on?
You know, this is like a quarter million dollar computer, it should let us log in, what is wrong with it, right? And it was in there running really loud. It's this huge, loud machine. And it turned out that one of the grad students, not me, had started a job where they were trying to figure out what was going on, and they were logging a lot, because they were having problems with their code. And it was in this tight loop that ran all night. And because it was a somewhat small group, they had no disk limits or permission restrictions for the most part. So it literally used up every byte on the server, and then it just wouldn't do anything at all. And the reason I bring this up is, that's the challenge of these things where you start it and let it run for days, and then figure out what's happening. You're like, oh, we just used up the entire hard drive of this giant machine. Whoops, it broke it for everyone. Right? Yeah. And so this live aspect, or more self-guided approach, is a really great idea. It's a big step ahead, even more so than people initially hear it as, right? It's a big deal. Yeah. And I think that's part of what got me really excited about the programmatic side. When you have access to something in Python, and you have an environment like IPython or Jupyter or whatever, you're able to interact with what you might usually think of as an offline physics simulation that runs for hours or days. If you have a Python API to it, you can steer it while it's going. You can turn the knobs on your experiment while it's running.
Sometimes there are reasons why that's not actually a good idea, but, right, it's really powerful. One of the costs, you know, that I had in my experiment was that in order to change the inputs, I had to start over. Physically, I didn't have to: if it were a physical machine where I were measuring inputs and outputs, I could just, you

10:00 know, turn the current down and see the results. But the code didn't support that. The frustrating thing about it was that the way the physics was written did support that, and the C++ code did support that. We just hadn't written an interface that allowed me as the user to do it. And so the Python wrapping work was surprisingly little. It was, you know, expose the C++ APIs to Python with Cython. Yeah, that's cool. And then I could turn all the knobs that were already there in the code; I just was now allowed to turn them. It sounds, thinking back on the timeline there, like that must have been pretty early days in the data science, computational science world of Python, right? Like NumPy was probably just about out around then. The time I was doing that work would have been around 2012. Okay, so yeah, it was more established at that point. I was thinking back to, like, 2005 or something. Yeah, not quite. That was when I was still an undergrad, and I was able to submit patches to make building NumPy on Mac a little easier. That's cool. That was before it became completely polished and so widely used. I suspect it's really nerve-racking to contribute to those kinds of things now. It's a nice community. And it was exciting to be able to say, I have a problem, there's probably other people who have a similar problem, let me see if I can fix it and submit a patch. Yeah, that's fantastic. Cool. So are you carrying on with your scientific computation work these days? What do you do now? Now my job is actually working on Jupyter and IPython related stuff. I am a senior research engineer at Simula Research Lab in Oslo, Norway, where I've been since 2015. And I'm the head of the Department of Scientific Computing and Numerical Analysis.
So I'm in a department where people are doing physical simulations of the brain, and studying PDEs and things like that. But what I do is I work on JupyterHub. Yeah, so we made JupyterHub, while I was still at Berkeley, as a tool for people who want to deploy Jupyter notebooks. We say it's for if you're a person that has computers, and you have some humans, and you want to help those humans use those computers with Jupyter. That's what JupyterHub was for. Initially it was scoped to be an extremely small project, because we didn't have any maintenance burden to spare. The target was research groups like mine in grad school, with a few people, or maybe small classes, who have one server in their office and just want to make it easier for people to log in and use Jupyter there. Right. This is when people had all their SSH tunnels: SSH into my server and open a reverse tunnel, so I can point at localhost and it's actually over there. So you were trying to make life easier for those folks. They were running the servers locally, and they're like, how do I get access to my computation over there, but on my machine, that kind of stuff, right? Yes. If I'm a person who's already comfortable with Jupyter, but now I have access to a computer that's over there, how do I run my notebook stuff over there? There were great solutions to that involving SSH tunnels or whatever, and JupyterHub was aimed at those folks. But it turned out that's not the user community that's really been the most excited about it.
With things like Zero to JupyterHub, the Jupyter on Kubernetes project started by Yuvi Panda and now run by a whole bunch of wonderful folks, we have shifted into a larger scale than initially designed for. It definitely seems like JupyterLab and the predecessors to it have really exploded beyond the scale that people initially expected. I mean, it's just the de facto way of doing things if you're doing computation these days, it seems. Certain folks definitely seem to like it. We try to respond to feedback and, yeah, just build things that people are going to use. Nice. So if you're running JupyterLab for such a big group, I mean, there's got to be crazy amounts of computation and, you know, supercomputer type stuff there. What does your whole setup look like? What are you running for your folks? At the research lab, I run JupyterHub instances for things like a summer school. They'll run a workshop and they'll spin up a cluster for running JupyterHub for doing some physics simulations. Usually it's for teaching. Okay. So I don't operate hubs for research; I usually operate them for teaching. I see. Yeah. So maybe the computation there is not nearly as high as when we're trying to model the universe and let that go. Yeah. For those folks, the computations are handed off somewhere, right? Yeah. The folks at NERSC in Berkeley are doing some really exciting stuff with running JupyterHub on a supercomputer. Yeah. Cool. Are you doing Kubernetes or Docker or setting up VMs? How do you handle that side? Thanks to the work of all the folks contributing to Zero to JupyterHub, deploying Jupyter with Kubernetes, I think, is the easiest to deploy and maintain, as long as you have access to a managed Kubernetes. I still wouldn't recommend deploying Kubernetes itself to anybody.
It seems fairly complicated to run and maintain that side of things. Yeah. But if you have a turnkey solution (the one I use the most is GKE, the Google managed Kubernetes), then Zero to JupyterHub lets you just step through and say, give me a cluster and run JupyterHub on it. And then all I really need to do for the workshop is build the user images, right?

15:00 Cool. We also do some stuff with Binder. I help operate Binder. So mybinder.org is a service built on top of JupyterHub. It takes JupyterHub as a service for running notebooks, in Binder's case running on Kubernetes. And Binder ties in another Jupyter project called repo2docker that says, look at a repo and build a Docker image with the contents of that, which hopefully can run everything. So, like, if it finds a requirements.txt? Yeah, exactly. It's got something that specifies its dependencies or something like that. Yeah. And there's a bunch of things that we support. The idea is to automate existing best practices, right? So find anything that people are actually already using to specify environments, install those, and then build an image. And then what Binder does is say, I built an image, now send that over to JupyterHub to launch a notebook server with that image, right? Yeah, you can see right here on the main site: just put in a GitHub repo and maybe a branch or a tag or something, and then click Go. And then it literally spins up a Jupyter notebook that you can play with.

15:58 This portion of Talk Python To Me is sponsored by Linode. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy, and scale your modern applications faster and easier. Whether you're developing a personal project or managing large workloads, you deserve simple, affordable, and accessible cloud computing solutions. As listeners of Talk Python To Me, you'll get a $100 free credit. You can find all the details at talkpython.fm/linode. Linode has data centers around the world with the same simple and consistent pricing regardless of location. Just choose the data center that's nearest to your users. You'll also receive 24/7/365 human support with no tiers or handoffs, regardless of your plan size. You can choose shared and dedicated compute instances, or you can use your $100 in credit on S3-compatible object storage, managed Kubernetes clusters, and more. If it runs on Linux, it runs on Linode. Visit talkpython.fm/linode or click the link in your show notes, then click that create free account button to get started.

17:03 You know, when people go to GitHub, they can oftentimes see the Jupyter notebook. And when I first saw them, I was like, how in the world is GitHub computing this stuff for me to see, right? Maybe that is super computationally expensive. How do they know the data and the dependencies? And the reality is, they've just taken what's stored in the notebook, right? That is the last run, and it's there. But if you want to play with it, you can't do that on GitHub. You can on mybinder.org, right? It turns that into an interactive notebook. It adds interactivity to the sharing that you already have with the viewer on GitHub. Yeah, very cool project. I didn't know that much about it, but I had Tim Head on the show a while ago, on episode 256, to talk about that, and I learned a lot. So yeah, quite neat. Now, let's jump into ZeroMQ. So ZeroMQ is not a Python thing, but it is very good for Python people, right? It has support for many different languages, and your work primarily has been to make this nice and easy from Python, right? Yeah. So ZeroMQ is a library written in C++ with a C API, which makes using it a little easier. And it's a very small API, which is part of why it's usable from so many languages: writing bindings for it is relatively easy. The idea of ZeroMQ is that it's a messaging library. The naming is a little funky, because the people who created it come from the world of message brokers and message queues, and ZeroMQ is a bit tongue in cheek in that it's not actually a message queue at all. It's a messaging library that adds a little bit of a layer of abstraction on the networking. You have some distributed application where things need to talk to each other, and ZeroMQ is a tool for building that. And it's this library.
And then you can write the bindings for that library so that you can use it from any of a variety of languages. And so Brian Granger and I worked on the Python bindings to use ZeroMQ from Python, which are called PyZMQ. Yeah, when I first thought of it, I imagined it as a server, something like Redis or Celery that you start up, and then you create queues or something on it, and then different things can talk to it. But I think maybe a better conceptualization of it might be Flask. Flask is not a server; Flask is a framework that you can put into your Python app and then run it, and it is the server itself, right? Yeah. And so I'd say the best way to describe ZeroMQ is that it's a fancy socket library. Yeah. So you create sockets and you send messages, and the sockets talk to other sockets. You send messages, you receive messages. And ZeroMQ is all about what abstractions and guarantees it gives around those sockets and messages. There seems to be a lot of culture and Zen about ZeroMQ. There's a lot of interesting nomenclature in the way that they talk about stuff over there.
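To make the "fancy socket library" description concrete, here is a minimal sketch using the Python bindings (the `pyzmq` package, imported as `zmq`). It is an illustration under assumptions, not code from the episode: the endpoint name `inproc://demo` and the helper `echo_once` are made up, and the `PAIR` socket type with the in-process transport is chosen only so the example runs in a single process without opening a network port.

```python
import zmq


def echo_once(message: bytes) -> bytes:
    """Send one message between two sockets and return the upper-cased reply.

    Hypothetical helper for illustration only.
    """
    ctx = zmq.Context()
    endpoint = "inproc://demo"  # assumed name; inproc stays inside this process

    server = ctx.socket(zmq.PAIR)
    client = ctx.socket(zmq.PAIR)
    try:
        server.bind(endpoint)      # one side binds...
        client.connect(endpoint)   # ...the other side connects

        client.send(message)       # hand the message to ZeroMQ
        received = server.recv()   # receive it on the peer socket

        server.send(received.upper())
        return client.recv()
    finally:
        # linger=0 discards anything unsent so shutdown cannot block
        client.close(linger=0)
        server.close(linger=0)
        ctx.term()


if __name__ == "__main__":
    print(echo_once(b"hello"))
```

No broker or server process is started anywhere: the two sockets find each other through the endpoint string, which is the point of the Flask comparison above.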

20:00 So they talk about the zero in ZeroMQ, and the philosophy starts with zero. The zero is for zero broker (it's brokerless), zero latency, zero cost (it's free), and zero admin, since you don't have a server type of thing. But it also refers to a culture of minimalism that permeates the project: adding power by removing complexity rather than exposing new functionality. Do you want to speak to that just a little bit, your experience with that? Well, I can speak to that as someone who doesn't work on libzmq that much, and more as somebody who writes bindings for it. It's nice that libzmq doesn't change that much. Yeah. So libzmq has all these features, and we can talk a little bit about them in a second, but it's structured so that there are sockets, and there are different kinds of sockets that have different behaviors, for building these different kinds of distributed applications. But in terms of the API, there's just one: a socket has an API, and all sockets have the same API. So from the standpoint of writing bindings to a library, I just need to know how to wrap the socket APIs. And then as they add new types, those are just constants that I need to handle. So I don't need to say, oh, there's a new kind of socket, I need to implement a new Python class for that new kind of socket. I just need to wrap socket. Yeah. So as new features are developed in libzmq, from my perspective as a binding developer, or binding maintainer, that's really nice. But that also extends to the application layer. Once you have a socket, you understand sockets. Changing the type of socket changes your message pattern; it doesn't change anything about the APIs you need to use and things like that. Yeah. So maybe we could talk a little bit about the application model.
We don't necessarily need to get into programming yet, but the application model, right? So we've got contexts, we've got sockets, and we've got messages, and those are the basic building blocks of working with this. If I wanted to create something that other applications could talk to and exchange data with, one of my options might be to create a RESTful API that exchanges JSON, right? Yeah, sure. There's a lot of challenges with that. One, it's sort of send-requests only, right? I send over my JSON, and then it gives me a response. But I can't subscribe to future changes, right? I've got to do something like WebSockets in that world if I want something like that, which I guess might be closer to this. And then also, it's doing the text conversion, so it's probably a little bit slower. You've got to do maybe extra work for async. So there's a lot of things that are maybe similar, but with extra patterns, right? Instead of just request/response, you might have pub/sub, you might have multicast, where something comes in and everyone gets notified about it. Can you talk about some of those differences, maybe compared to other more common APIs people might know about? Yeah. So the main thing that distinguishes ZeroMQ... A context is kind of an implementation detail that you shouldn't need to care about, but you still need to create one. Sockets are the main thing that you deal with. So you create a socket, and every socket has a type, and that type determines the messaging pattern. That's where we're getting at these kinds of protocols and messaging patterns. With a web server, that's usually a request/reply pattern: clients connect, send a request, and then they get a reply. A pub/sub system might be, you know, some totally different thing.
Maybe in web server land it's server-sent events, you know, an event stream connection. And with ZeroMQ, the difference between those is the socket type. So if you're creating a publish/subscribe relationship, you create a publish socket on one side and a subscribe socket on the other side. If you're doing request/reply, you use sockets called DEALER and ROUTER. There is a request and a reply socket in ZeroMQ, but nobody should ever use them; they're just a special case of ROUTER and DEALER. And then there's another pattern called ventilator and sink. That's what you'd use in a work queue, for instance, where you've got a source of work, and it sends a message to one destination, but you don't necessarily care which one. I see. So maybe you're trying to do scaled-out computing. Yeah, exactly. You've got 10 machines that could all do the work, and you want to somehow evenly distribute that work, right? So you're like, all right, well, we're just gonna throw it at ZeroMQ. All the things that are available to do work can subscribe. They could even drop out after doing some work and then not receive any more, potentially, something like that. One of the main things that ZeroMQ does is take control over things like multiple peers and connection events and stuff like that. Take, for example, publish/subscribe. There are two key things to think about with ZeroMQ: what happens when you've got more than one peer connected, and what happens when you've got nobody to send to. So in the publish/subscribe model, what it

25:00 does is, when you send a message on a socket, it will immediately send that message to everybody who's connected and ready. So if somebody is not able to keep up, right, there's a queue that's building up and it's gotten full, it'll just drop messages to that peer until they catch up. If there are no peers, then it's really fast, because it just doesn't send anything and just deletes the memory. Yeah, right. So when you're thinking of ZeroMQ from Python, the other thing about sending messages is that it's asynchronous. Send doesn't return when the message is on the TCP buffer or whatever. Send returns when you have handed control of the message to the ZeroMQ IO thread. And this is why ZeroMQ has this concept of contexts. Contexts are what own the IO threads that actually do all the real work of talking over the network. So when you're sending in PyZMQ, you're really just passing ownership of the memory to ZeroMQ, and it returns immediately. And so you don't actually know when that message is finally sent, and you shouldn't care. Sometimes you do care, and it can get complicated. Yeah. But ZeroMQ tries to make you not care. Interesting. So basically, you set up the relationships between the clients and the server through these different models. And then you drop off the messages to ZeroMQ and it deals with them from there, right? It figures out who gets what and when. And so your application, as soon as it gets the message passed off to the ZeroMQ layer, can go about doing other stuff, right? Exactly. ZeroMQ is, you know, a C++ library, so it's not going to grab the GIL or anything. So even if you're using Python, it's a true multithreaded application, even if you're only using one Python thread.
Yeah, you handed that memory off to C++, which is running in the background. You can do some GIL-holding, intense operation, and ZeroMQ will be happily dealing with all the network stuff. Right, right. It's got its own C++ thread, which has nothing to do with the GIL (Global Interpreter Lock), and it can go do its own thing, right? Yeah. There is something that comes up in PyZMQ, where it can come back and try to grab the GIL from the IO thread. Or it used to; it doesn't anymore. Because Python does actually need to know when it let go of that memory in order to avoid, say... Yeah, but that's an implementation detail. Cool. So is there anything about reliable messaging here, where you can say, I want to make sure that this message gets delivered to every client? You said, for example, in this pub/sub, if some of them fall behind, it can just drop the messages. Is there a way to say, you know, pile that up and then send it along when it catches up, or whatever? The ZeroMQ perspective is that that's an application-level problem. ZeroMQ helps you build the messaging layer so that it doesn't crash, and that's part of why it drops messages. And then basically it's up to you. Say you're sending messages and there's a generation counter, for instance. Then you notice, I got message five, and then I got message eight. It's up to your application to keep, you know, a recent history buffer, so that folks can come back with a different pattern, a request/reply pattern, to say, give me a batch of recent messages that I missed so that I can resume. But ZeroMQ doesn't help you with that. It handles all the networking stuff and your connections; you don't have to worry about that. You just have to worry about how to set up a way to ask again, or have the application ask once more. Yeah, nice.
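The generation-counter idea described above can be sketched in plain Python. This is an illustration under assumptions, not code from the episode or from PyZMQ: the class names `HistoryPublisher` and `GapAwareSubscriber` are hypothetical, and the `replay` method is called directly here, whereas in a real application it would sit behind a separate request/reply socket.

```python
from collections import deque


class HistoryPublisher:
    """Numbers each message and keeps a bounded buffer of recent ones."""

    def __init__(self, history_size: int = 100):
        self.counter = 0
        self.history = deque(maxlen=history_size)

    def publish(self, payload: str) -> tuple:
        self.counter += 1
        message = (self.counter, payload)
        self.history.append(message)
        return message

    def replay(self, first: int, last: int) -> list:
        """Return buffered messages with first <= sequence number <= last."""
        return [m for m in self.history if first <= m[0] <= last]


class GapAwareSubscriber:
    """Detects skipped sequence numbers and asks the publisher for them."""

    def __init__(self, publisher: HistoryPublisher):
        self.publisher = publisher
        self.last_seen = 0
        self.received = []

    def on_message(self, message: tuple) -> None:
        seq, _payload = message
        if seq <= self.last_seen:
            return  # duplicate or stale message, ignore it
        if seq > self.last_seen + 1:
            # Gap detected (for example 5 followed by 8): fetch what was missed.
            self.received.extend(self.publisher.replay(self.last_seen + 1, seq - 1))
        self.received.append(message)
        self.last_seen = seq


def demo() -> list:
    pub = HistoryPublisher()
    sub = GapAwareSubscriber(pub)
    for n in range(1, 9):
        message = pub.publish(f"update {n}")
        if n in (6, 7):
            continue  # simulate messages dropped for a slow subscriber
        sub.on_message(message)
    return [seq for seq, _ in sub.received]


if __name__ == "__main__":
    print(demo())
```

If the gap is older than the history buffer, `replay` returns only what is still buffered, so a real application would also need a policy for that case, such as a full resynchronization.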
David, out there in the live stream, asks what alternatives to 'ZeroMQ' could have been used to build the 'Jupyter' protocol. So, thanks for the question. Before we get to that, though, maybe let's just talk about, like, oh wait, 'ZeroMQ' is used to build the 'Jupyter' protocol? Yeah, so the 'Jupyter' protocol essentially started out in 'IPython' parallel. So we had this interactive parallel computing networking framework that eventually evolved into: wait, we've got this network protocol for remote computation, we can basically build a 'REPL' protocol, an interactive shell protocol. And that ultimately became the 'Jupyter' protocol, which was built with this kind of 'ZeroMQ' mindset of, I want to be able to have multiple front ends at the same time. So in 2010, I think, Fernando and Brian had a working prototype of real-time collaboration on a terminal using this protocol. So you've got two people with a terminal, you're typing, you can run code, and you can see each other's output. So we built the 'Jupyter' protocol with multiple connections in mind. So that means there's this request-reply socket. So the front end sends a request: please run this code. And the backend sends a reply: I ran that, here's the result. And there's another channel called 'IOPub' where we publish output. So when you do a print statement, or you display a 'Matplotlib' figure, that's a message that goes on a 'pub/sub' channel, which means that every connected front end, right, so you could have multiple 'JupyterLab' instances, they'll all receive the same message. Right? I mean, that sounds like the perfect example of 'pub/sub'. Somebody making a change. Well, somebody has triggered the server to

30:00 make a change, but it doesn't matter who started that change; everybody looking at it wants to see the output, right?
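To make the 'pub/sub' fan-out just described concrete, here is a minimal 'pyzmq' sketch with one publisher and two subscribers standing in for front ends. It is not the actual 'Jupyter' implementation: the 'inproc://iopub-demo' endpoint and the message text are made up for illustration, and the short sleep is a crude way to let the subscriptions reach the publisher, since a 'PUB' socket drops anything sent before a subscriber is ready.

```python
import time

import zmq

ctx = zmq.Context()

# The publisher binds; it never learns how many subscribers exist.
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://iopub-demo")  # hypothetical endpoint name

# Two "front ends" connect and subscribe to everything.
frontends = []
for _ in range(2):
    sub = ctx.socket(zmq.SUB)
    sub.connect("inproc://iopub-demo")
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix matches every message
    frontends.append(sub)

# Messages published before a subscription is registered are dropped,
# so give the subscriptions a moment to arrive.
time.sleep(0.2)

pub.send(b"stdout: hello")

# Every connected subscriber gets its own copy of the same message.
outputs = []
for sub in frontends:
    if sub.poll(timeout=1000):  # wait up to one second instead of blocking forever
        outputs.append(sub.recv())

for sub in frontends:
    sub.close(linger=0)
pub.close(linger=0)
ctx.term()
```

The publisher code is the same whether zero, two, or a thousand subscribers are connected, which is the property the conversation is pointing at.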

30:06 This portion of Talk 'Python' To Me is brought to you by 'Mito'. Do you feel like you're stumbling around trying to work with 'Pandas' within your Jupyter notebooks? What if you could work with your data frames visually, like they were Excel spreadsheets, but have it write the 'Python' code for you? With 'Mito' you can. 'Mito' is a visual front end inside Jupyter notebooks that automatically generates the equivalent 'Python' code within your notebook cells. 'Mito' lets you generate production-ready 'Python' just by editing a spreadsheet, all right inside 'Jupyter'. You're sure to learn some interesting 'Python' 'Pandas' tricks just by using the visual aspects of that spreadsheet. You can merge, pivot, filter, sort, clean, and create graphs all in the front end and get the equivalent 'Python' code written right in your notebook. So stop spending your time googling all that syntax and try 'Mito' today. Just visit 'talkpython.fm/mito' to get early access. That's 'talkpython.fm/mito', or just click the link in the show notes.

31:04 So to answer the question, and this becomes particularly important in the design of 'IPython' parallel, but getting to the question of what alternatives to 'ZeroMQ' could have been used for the 'Jupyter' protocol. So when we were designing it, we thought this protocol of talking directly to kernels was going to be the main thing. It turns out that the main way, you know, in 2020, or 2021 I guess we're in now, is that most kernels talk one to one with a notebook web server. And that web server is the one that's actually fanning out to multiple clients. Right. With that being the case, if we had required the notebook server to be the place where we do all this multiplexing and everything, we could have actually built the lower-level 'Jupyter' protocol on something much simpler. Just an HTTP REST event stream probably could have worked just fine. Right? Maybe just use WebSockets or something on the web server side. Yeah, if WebSockets had existed at the time. Yeah.

31:58 Those were also years away. Yeah. That stuff's so much easier now, and the browsers support it, and so on. So yeah, super interesting. Now, one thing that makes me think of is, you know, how close are we to some sort of 'Google Docs' for 'JupyterLab' type of thing, right? I mean, we're not there, right? I know there's 'SageMath', and there's 'Google Colab'. There are other systems where this does exist, right? Where we can all type on the notebook, same cell, same time type of thing. Is there anything like that in 'JupyterLab' that I just missed? There have been, I think, three prototypes at this point that have been developed and ultimately not finished, for various reasons. There's another one that's picking up again and going strong using 'Yjs', working with 'QuantStack', I believe. Okay. Hopefully soon. That's not an area of the project where I've done a lot of work. I helped a little bit with the last one. Well, I mean, to me, it sounds like that's just all JavaScript front-end craziness, and not a whole lot of other stuff, right? Like, yet the state probably lives on the server. So you need to have a server, whether it's running a 'CRDT' or whatever, to synchronize the state, and the 'pub/sub' as well, for the changes. Yeah. Going down this rabbit hole for a minute also. Now I asked the interesting question about 'ZeroMQ': what's the story with 'ZeroMQ' and microservices? Right, like, I think with microservices, people often set up a whole bunch of little small 'Flask' or 'FastAPI' things that talk 'JSON' request-response. But, man, the performance and the multiplexing, all those types of things sound like it actually could be a really awesome, non-HTTP-based microservice. Yeah, I think 'ZeroMQ' is a really good fit for microservice-based distributed applications. 
Because one of the things you do when you're designing with microservices is you're defining the communication relationships and your scaling axes. And a nice thing with 'ZeroMQ' is that your application doesn't change when you've got a bunch of peers. Your application doesn't even need to know when a new peer comes and goes, because 'ZeroMQ' handles that. So one of the things about 'ZeroMQ' that's weird and magical, but also really useful, is that it abstracts binding and connecting and transports. So you can have the same application with the same connection pattern, and maybe, you know, one side binds and one side connects. So 'pub' binds and 'sub' connects. But you can also have 'sub' bind and 'pub' connect. And you can also have your 'pub' bind once and connect three times. And none of that changes how your application behaves, it just changes where the connections go. Right. With microservices, when you're doing HTTP requests, you always have to figure out, okay, well, what's the URL I'm going to? And sometimes that even gets real tricky with, what is even the URL of the identity server, what is the URL of the thing that manages the catalog, before I even request it? And then usually that's a single-endpoint, HTTP-request type of thing, even if it's like an update notification. So yeah, I can imagine that there's some real interesting things here. You can have, you know, a distributed work kind of situation.

35:00 You can have one or a few sources of work. And then you have an elastic number of workers that just connect and start receiving messages. And the way this works is kind of the 'push/pull' pattern. So 'pub/sub' is: every message you send, send it to everybody connected who can receive a message. Whereas 'push/pull' is: whenever you send a message, send it to exactly one peer, and I don't care which one. Yeah, and so if more peers are connected, it will load balance across all those peers. But if only one is connected, it will just keep sending to that one. And at no point in the sender do you ever need to know how many. You never get notified that peers are connecting, you never need to know whether there's one peer or there's 1000 peers, it doesn't matter. So then it's in your distributed application to say, you know, this one's sending with this 'push/pull' pattern, the ventilator/sink pattern, and then I just elastically grow my number of workers and shut them down. And they just connect and disconnect and everything. And it just works. Interesting. Okay, so a follow-up question from now on that makes me think. So he asked, he/she, sorry, asked whether it is a good idea to replace REST communication with 'ZeroMQ'. And so that leads me to wonder. You talked about how you send the message, and it's sort of fire-and-forget style, like, message sent, success. But so often, what I want to do is, I need to know what products are offered on sale right now from the sale microservice, or whatever. I need to get the answer back: these three products. Thank you. You know what I mean? How do I implement something like that, where I send a message, but I want the answer? That's the request-reply pattern. So you would use either the request/reply sockets or the router/dealer sockets for that kind of pattern, and that's: you send a message. So in that case, this does have multi-peer semantics. 
But usually, the requester is connected to one endpoint. It can be connected to several, in which case it'll load balance its requests. And the router, so the receiver side, handles requests, and each request comes in with a message prefix that identifies who that message came from. And then it can send replies using that identity prefix, and they will go to whoever sent that request. And most of the 'Jupyter' protocol is a request-reply pattern. Yeah, I guess so, that makes sense. Because you want the answer from the computation, or whatever, you want to know that it's done, and so on. Interesting. Okay. Yeah. Well, that's pretty neat. The other thing that comes to mind around this is serialization. So when I'm doing microservices, I make JSON documents. I know what things go in JSON, right? Like fundamental types, strings, integers, and so on. Surprisingly, dates and times can't go into JSON. That is still like, this is 2021, and we can't come up with a text representation of what time it is. Anyway, that's a bit of a pain. It seems to me like you might be able to exchange more data more efficiently using binary. And it's not even going over the HTTP layer, right? It's going literally over a TCP socket. Yeah, or IPC, with 'BSD sockets', or 'UDP', or even lower level than that. Yeah. Not even touching the network stack. Right? Yeah, this gets us to the last piece of 'ZeroMQ' that we haven't talked about yet. And that's: what is a message? Right. The nice thing is, you know, if you've worked with a lower-level socket library, just talking TCP sockets, you know you have to deal with chunks and then figure out when you're done with your message protocol. If you've ever written an HTTP server, you know you need to find those double blank lines and all that stuff before you know that you have a request that you can hand off to your request handler. Right, right. You're doing a whole lot of funky stuff, like parsing the header. 
Okay, the header says it's the next 100 bytes of the thing that I'm getting, and this one isn't. And so, I mean, it's gnarly stuff. I've worked on projects where we did that. And it's super fast, but boy, it's a low-level business. That's something 'Flask' does for you. Yeah, right. It implements the HTTP protocol, and then says, okay, here's a request, please send a reply. And it helps you construct that reply message. So 'pyzmq' and 'zmq' live at that level, the level of 'Flask', where a message is not one binary blob, but a collection of binary blobs. And 'ZeroMQ' always delivers whole messages. So it's atomic, and it's asynchronous, and it's messaging, which means you will never get part of a message. There's no, like, okay, I got the first third of this message, keep it in my own buffer until I get to the end. A 'ZeroMQ' socket does not become readable until an entire message is ready to be read. Right? And there may be buffering down in the 'C++' layer, but it's not going to tell you I've received a thing until it's fully baked. Got the answer. Right, right. Yeah, all that stuff still happens, 'ZeroMQ' just takes care of that. And then at the 'pyzmq' level, or the 'zmq' API level, when you receive with a socket, or in 'pyzmq' when you use 'recv_multipart', you get a list of blobs of memory. And so if you're talking about serialization with JSON, 'pyzmq' has a helper

40:00 function called 'send_json', and it's literally just 'json.dumps' the thing and then send it. Yeah, with a little ensuring UTF-8 bytes, I think. Yeah, nice. So with 'pyzmq', and this really turned out to be more important for 'IPython' parallel than it turned out to be for the 'Jupyter' protocol. I'm not as familiar with 'IPython' parallel. Tell me about this, what 'IPython' parallel is. So if you're aware of the 'Jupyter' protocol, it's a network protocol for: I've got somewhere over the network where I want to run code, and I have this protocol for sending messages, please run this code, give me the output, show me display stuff. 'IPython' parallel is a kind of weird parallel computing library based on the fact that, so, I've got a network protocol to run code remotely. Why don't I just wrap that a little bit to talk to N remote places? Right, right, maybe partition up the work across them or something like that. The fun thing about 'ZeroMQ' is that in a 'Jupyter' notebook, the kernel is the server. The kernel listens for connections on its various sockets. And then the notebook server, the web server, or the 'Qt console', or the terminal is a client, and it connects to those sockets. 'IPython' parallel, because of this fun stuff about 'ZeroMQ' not caring about connection direction or count, adds a scheduler layer, and it modifies the kernel, the 'IPython' kernel that you use in the Jupyter notebook. And the only change it makes is, instead of binding on those sockets, it connects to a central scheduler. And the kernel is otherwise identical, the message protocol is otherwise identical. But the connection direction is different, because the many-to-one relationship is different. There's one controller and many engines, instead of many clients connecting to one kernel. And then again, using kind of some of the magic of the 'ZeroMQ' routing identities. 
There's a multiplexer in 'pyzmq' called a 'Monitored Queue'. So we talked about a 'zmq' message being a sequence of blobs of memory. So it can just be one. With a router socket, it's always at least two, because the first part is the routing identity, to tell the underlying 'ZeroMQ' which peer it should actually send to. Okay. And we don't have to worry about that, because that's down at the low level, right? But that's what happens when you get a request. You need to remember that first part, so when you send the reply, the first part of the reply is the ID that came with the request, so that it goes back to the right place. Yeah, but you can also use that, if you know the IDs, to send messages to a destination without being in response to a request, right? What a router really is, is a socket that can route messages based on this identity prefix. So if you have a bundle of identity prefixes, then you can send messages to anyone at any time. And that allows us to build a multiplexing scheduler, where one client connected to one scheduler just sends messages, regular plain old 'Jupyter' protocol messages, but with an identity prefix from the client. And those will end up at the right kernel, just by the magic of 'ZeroMQ' routing identities. Yeah. And so this is a substantially different connection pattern. The request-reply patterns are all the same, but the connection patterns are totally different. And the client and the endpoint don't need to know about it at all, we just have this adapter in the middle. I feel like to really get the Zen of this and take full advantage, you've got to really think about these messaging patterns and styles a little bit, because they're fairly different than, oh, this is what I know from web servers. 
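The identity-prefix routing described above can be tried in a few lines of 'pyzmq'. This is a minimal sketch, not the 'IPython' parallel scheduler: the endpoint 'inproc://scheduler-demo', the identity 'engine-1', and the payloads are all invented for illustration. The dealer speaks first so that the router is certain to know the peer before anything is routed to it.

```python
import zmq

ctx = zmq.Context()

# The router plays the scheduler: it binds and routes by identity prefix.
router = ctx.socket(zmq.ROUTER)
router.bind("inproc://scheduler-demo")  # hypothetical endpoint name

# The dealer plays an engine: it picks its own routing identity and connects.
engine = ctx.socket(zmq.DEALER)
engine.setsockopt(zmq.IDENTITY, b"engine-1")  # must be set before connect
engine.connect("inproc://scheduler-demo")

# The engine announces itself with a single-frame message.
engine.send(b"ready")

# On the router side the message arrives as at least two frames:
# the sender's routing identity first, then the payload.
identity, payload = router.recv_multipart()

# Reply: put the remembered identity back on the front.
router.send_multipart([identity, b"task-1"])
first_task = engine.recv()  # the identity frame is stripped before delivery

# Because the identity is now known, the router can also send unprompted.
router.send_multipart([b"engine-1", b"task-2"])
second_task = engine.recv()

engine.close(linger=0)
router.close(linger=0)
ctx.term()
```

A scheduler built this way only has to keep a table of identities; the sockets take care of finding the right connection for each prefix.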
So the big thing to do if you're getting into 'ZeroMQ' is to read something called the guide. There'll be a link in the notes. And if you go to 'zeromq.org', it'll be prominently linked. And this goes through kind of the different patterns that 'ZeroMQ' thinks about, the abstractions in 'ZeroMQ', and the different socket types and what they're for. And the guide will help you. And there are examples in many languages, including 'Python'. Yeah, this seems great. Yeah. It'll help you build kind of little toy example patterns of, here's a publish-subscribe application, here's a, you know, ventilator/sink application. And then it also does things with pictures. Yes.
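In the spirit of those toy patterns, here is a small 'push/pull' sketch with one source of work and two workers, all in a single process. It is an assumption-laden illustration rather than an example taken from the guide: the endpoint 'inproc://work-demo' and the worker names are made up, and a real deployment would run each worker in its own process over TCP or IPC.

```python
import zmq

ctx = zmq.Context()

# The source of work binds a PUSH socket; it never tracks its workers.
ventilator = ctx.socket(zmq.PUSH)
ventilator.bind("inproc://work-demo")  # hypothetical endpoint name

# Workers are just PULL sockets that connect; add or remove them freely.
workers = {}
for name in ("worker-a", "worker-b"):
    pull = ctx.socket(zmq.PULL)
    pull.connect("inproc://work-demo")
    workers[name] = pull

# Each task goes to exactly one connected worker.
tasks = [f"task-{i}".encode() for i in range(6)]
for task in tasks:
    ventilator.send(task)

# Collect results with a poller so no single worker blocks the loop.
poller = zmq.Poller()
for sock in workers.values():
    poller.register(sock, zmq.POLLIN)

received = {name: [] for name in workers}
while sum(len(v) for v in received.values()) < len(tasks):
    ready = dict(poller.poll(timeout=1000))
    if not ready:
        break  # nothing arrived within a second; stop waiting
    for name, sock in workers.items():
        if sock in ready:
            received[name].append(sock.recv())

for sock in workers.values():
    sock.close(linger=0)
ventilator.close(linger=0)
ctx.term()
```

The sending loop is identical whether one worker or many are connected; how the tasks are spread across workers is decided inside the socket, not by the application.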

44:01 That's, I think, the way to internalize: what are the 'zmq' concepts, and how do I deal with this? So when it comes to serialization, this is really important for 'IPython' parallel, and it also comes up if you're in 'Jupyter' and use the interactive widgets, if you use the really intense ones that do, like, interactive 3D visualization in the browser, which sometimes are streaming a lot of data from the kernel. This combines two things, one from 'pyzmq' and one from 'ZeroMQ' itself. So one is the 'zmq' concept that a message is actually a collection of frames. And another: 'zmq' can be zero-copy, and 'pyzmq' supports zero-copy. So anything that supports the 'Python' buffer interface can be sent without copying, meaning it's still copied over the network, but at no point are there any copies in memory. So you can send a 100 megabyte 'NumPy' array with 'ZeroMQ' without copying it. But then you've got to think about, oh wait, if I send a 'NumPy' array using the 'Python' buffer interface, all I got were the bytes. Where, you know, where's your

45:00 dtype information? Like, how do I know this is a 2D array of integers? Because a message, in 'Python' language, is a list of blobs instead of a single blob, you can serialize that metadata as, like, a header, and the blob you don't want to copy, the big one, separately. So you can say, like, 'json.dumps' some message metadata that tells you how to interpret the binary blob, and then just the binary blob, and you don't copy it. Then you can send it as one message, right? We're not breaking the single message delivery. You have your metadata that's serialized with message pack or whatever, and it comes in as, like, a frame or something like that in the message. Yeah. Yeah. So one frame is your header, one frame is the data itself. And we do this in the 'Jupyter' protocol: the 'Jupyter' protocol has an arbitrary number of buffers on the end, but then there are three frames that are actually JSON-serialized dictionaries. Very cool. Very cool. Yeah. Looking at the guide here, it says there's 60 diagrams with 750 examples in 28 languages. That's a big cross-product matrix of options in here. And you can also download it as a PDF to take with you. Yeah, this looks like a really great place to get started. Speaking of getting started, let's talk about programming with the 'Python' aspect here. Alright, so you will use

45:00 'pyzmq', and this is a library that you work on as well. Yeah, yeah, I maintain 'pyzmq'. Yeah. Awesome. So maybe, you know, it's hard to talk about code, but just give us a sense of what it's like to create a server. Like in 'Flask', you know, I say 'app = Flask', I decorate a function with 'app.route'. What's the 'ZeroMQ' 'Python' equivalent of that? Yeah. So first you always have to create a context, and then the context has a socket method that creates sockets. And then you either bind or connect those sockets. And then you start sending and receiving messages. So if you're writing a server, which usually means this is the one that binds, yeah, so you create a socket, you'd call 'socket.bind' and give it a URL, maybe a TCP URL, or an IPC URL with a, you know, local path. And then you go into a loop saying, you know, receive a message, handle that message, send a reply. Or if it's a publisher... You often have a while-true loop sort of thing, right? Just while true, wait for somebody to talk to me, or while not exit. Yeah, that's the simple version. Or you could be integrated into 'asyncio' or 'Tornado' or 'gevent' or whatever. Yeah, one of the fundamental principles of 'ZeroMQ' is that it's async all over the place. What's the 'async' and 'await' story with 'pyzmq'? Is there any integration there? Yeah. So if you import 'zmq.asyncio' as 'zmq', you will have the same thing, but send and receive are awaitable instead. Oh, that's glorious. Yeah. That's really, really fantastic. So you should be able to scale that to handling lots of concurrent exchanges pretty straightforwardly, right? Yeah. And taking 'Jupyter' as an example again, the Jupyter notebook uses the 'Tornado' framework, which, if you're getting into 'asyncio', 'Tornado' is basically 'asyncio' before 'asyncio', right? Yeah, it's the early days, an early take on 'asyncio'. 
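The steps described above (create a context, create a socket, bind or connect, then loop on receive and reply) look roughly like this with the 'zmq.asyncio' flavor of 'pyzmq'. This is a sketch under stated assumptions: the endpoint 'inproc://echo-demo', the 'server' and 'main' function names, and the upper-casing "handler" are invented for illustration, and both ends run in one event loop only to keep the example self-contained.

```python
import asyncio

import zmq
import zmq.asyncio


async def server(sock, n_requests):
    """Receive a request, handle it, send a reply; repeat n_requests times."""
    for _ in range(n_requests):
        request = await sock.recv()       # awaitable instead of blocking
        await sock.send(request.upper())  # the "handler" just upper-cases


async def main():
    ctx = zmq.asyncio.Context()

    # Server side: the socket that binds.
    rep = ctx.socket(zmq.REP)
    rep.bind("inproc://echo-demo")  # hypothetical endpoint name

    # Client side: same API, but connect instead of bind.
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://echo-demo")

    server_task = asyncio.create_task(server(rep, n_requests=2))

    replies = []
    for word in (b"hello", b"world"):
        await req.send(word)              # client sends where the server receives
        replies.append(await req.recv())  # and receives where the server sends

    await server_task
    req.close(linger=0)
    rep.close(linger=0)
    ctx.term()
    return replies


replies = asyncio.run(main())
```

A long-running server would replace the fixed count with a while-true loop, as mentioned in the conversation; the fixed count here just lets the sketch terminate.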
Yeah. And there, we use something called a 'ZMQStream', which is something inspired by 'Tornado's 'IOStream', which is their wrapper around a regular socket. It's like, bytes are coming in, call events when bytes have arrived. 'ZMQStream' is a 'Tornado' thing where you have an 'on_recv' method that you pass a callback, and it says, whenever there's a message, call this callback with the message after receiving it. And so that's actually how the 'IPython' kernel and 'Jupyter Notebook' work on the 'zmq' side, with these 'ZMQStream' objects. Cool. So the example we talked about is how to create a server. But, you know, the web version would be to use 'requests' to do a 'requests.get' against the server, to be the client that talks to it. What's that version in 'pyzmq'? In 'pyzmq', a client looks very much like a server, except instead of bind, you'd call connect, and instead of receive, you'd do a send. Yeah, yeah. And so in a request-reply pattern, wherever you have a receive on the server side, you have a send on the client side, and vice versa. So in a request-reply pattern, the client is doing send a request and then receive to get the reply. In a server you're doing receive a request and send the reply. In 'pub/sub', you're only sending on the publisher side, and on the subscriber side, you're only receiving. Nice. And the way you set this up, you basically choose these things when you go to the context and create the socket. You tell it what kind of pattern you're looking for, is that where you specify that? Yeah, so 'zmq' has a bunch of constants that identify socket types. So when you create a socket, you always have to give it a single argument, that is the socket type. So it'd be like 'zmq.PUB' for a publisher socket and 'zmq.SUB' for a subscriber socket, router, dealer, push, pull. Yeah, that defines the messaging pattern of the underlying sockets. You also have some 'JupyterLab' examples, which I guess we

50:00 can link to as well, like some diagrams for that, right? Yeah. So the 'Jupyter' protocol has a diagram, a diagram that we maybe should redesign. Whoops, there we go. It shows you basically what happens when you have one kernel and multiple front ends connected to it, with the different socket types that we have in the 'Jupyter' protocol. So the 'Jupyter' kernel has two router sockets and a pub socket, and a fully featured front end would have two dealer sockets, or request sockets, and a sub socket. There's just so much happening behind the scenes. I think getting your mind around these is really neat. But basically 'ZeroMQ' is handling so much of this for everyone, right? It's handling all of that, so we never care that there are multiple peers connected, we don't need to deal with that, we have no connection events. Sometimes, for folks working on different issues, that causes headaches, because there are some aspects of 'ZeroMQ', like the not-guaranteed pub/sub delivery, where it's actually kind of a pain. Because we actually want... Yeah, of course. Is there any, you know, around a lot of libraries there's stuff that adds layers that do stuff, so like 'Flask' extensions and stuff like that. Is there an extension that will let you do reliable messaging that you can plug in on top of this, or anything like that? Yeah. So if you look at the 'ZeroMQ' guide, there are different patterns, some of which are basic uses of sockets, so there's no reason to build another layer of software in order to implement that. Yeah, a simple ventilator pattern. But if you're talking about things like reliable messaging, there are some patterns in the guide, and they have names. And so some people have written those protocols as standalone 'Python' packages that say, like, implement this scheme on top of 'zmq'. That might have things like message replay. And yeah, or, you know, election stuff. 
Like if you do Kubernetes things, there are often leader elections to allow you to scale and migrate things. So you can do that with 'ZeroMQ' applications. Yeah, and some of those reliable messaging things sound amazing, it's going to be great. But there are other drawbacks to those as well, like poison messages. Like, yeah, I've got to make sure I send this, but the client can't receive it, so they crash. So then I try to send it again, and you're just in these weird loops. They'll have their challenges. Yeah. So another thing that I think would be interesting to touch on, a conversation where, we've spent so much time talking about all the programming patterns and stuff that I don't know we have as much time as we imagined. But yes, building 'pyzmq' basically to wrap this C library, right? There are some challenges you've had. It supports both 'CPython' and 'PyPy',

50:00 and whatnot. So maybe talk about some of the ways you had to do this. Like in the early days, when there was 'Python' 2 and 3, there was a lot of stuff going on, maybe pre-wheels, right? Yes, a few years pre-wheels. So with 'IPython' and 'Jupyter', our target audience is pretty wide, right? We have a lot of people in education, a lot of students, a lot of people on Windows. A lot of these people don't even want to be programmers, or care about it. Exactly, they just want this to work. And why won't this thing install? I just need to do this for my class or for my project, it needs to work, right? That kind of thing. Yeah. And so having a compiled dependency was a big deal for a lot of people. And so making binary releases as widely installable as possible was really important to us. And supporting as many, you know, 'Python' implementations as possible was also important to us. So 'pyzmq' was originally written all in 'Cython', which is a wonderful library for this when you're interfacing with a C library, especially when you want to do things with the buffer interface. So when you have a C object and a 'Python' object that represent the same memory, 'Cython' is the best. Right? And that's a lot of what we do for the zero-copy stuff in 'pyzmq'. So when we were working on this, wheels didn't exist, wheels being the binary version that you get from 'PyPI' now. Yeah, so if you 'pip install' something like 'pyzmq', it doesn't compile it, right? You get a wheel, and that just unzips it, and it's really nice. But at the time, there were only eggs. And there was a period of time when 'pip' was taking over from 'easy_install', which was eventually wonderful. But one of the drawbacks was, during this time when 'pip' was taking over, there were still no wheels. 
And so people had started shifting to 'pip', because 'easy_install' did a lot of things that people don't like. But you had to use 'easy_install' if you wanted to get a binary, which means effectively, if you're on Windows, you had to use 'easy_install'. And so we had a really complicated... there actually used to be a big delay, it might even still be there in 'pyzmq'. No, this is definitely not there anymore. So there was a big delay in 'pyzmq' where, if you ran 'setup.py', it would sleep for 10 seconds and show you a big message that says you might want to 'easy_install'. But now we're in a very different world. Right. So even after wheels, it was a couple of years before we had 'manylinux' wheels, right? There was a while before you were even allowed to make wheels for Linux. And now we're at a place

55:00 where we've got wheels for Arm Macs and Arm Linux, and a bunch of different Linux versions, and Windows, and everything. And it's a really different world. And a lot of things would be different if we were starting this project now, right? It would be easier if you started now, probably, right? Yeah, it would be a lot easier, because we were getting in on some early stuff. But one of the wonderful things about 'Cython' is, if you're writing 'Cython', you're really writing C, right? If you're writing 'Cython' code, it's generating a C program that calls the 'Python' C API. Right, you write 'Python' with a little typing stuff, it turns that into C and then compiles that to machine instructions, right? Like, you're basically projecting C somehow. Yeah, you're basically writing C that looks like 'Python'. The nice thing about C is that with directives and things, you can have one file that actually contains 10 different files, because you can just turn off lines when it's compiling. And that means that it's much easier to write 'Cython' code that supports multiple versions of 'Python'. At the time, we were supporting 'Python' 2.5 through 3.1. And with a single code base, we had, you know, no '2to3', none of that. A single code base in 2010, supporting 'Python' 2.5 and 'Python' 3.1 and everything in between. That's quite an accomplishment. And the tricks were all, you know, dealing with... with 'pyzmq', we were early adopters of a lot of the 'Python' 3 concepts of, we talk bytes, we don't talk 'str', right? Disambiguate. We use bytes and unicode everywhere. We never use the word 'str'. Yeah, there's not necessarily that many differences between 'Python' 2 and 3. Yeah, but you could make them different. But in a lot of the more modern 'Python' 2, you could still be much closer to what eventually became 'Python' 3. Yeah. 
So I would say it was either 3.3 or 3.4 when it became the norm to have a single code base, single syntax, and support 2.7. So drop support for 2.6, support 2.7 and, I think, 3.3 and above or 3.4 and above, whenever they let you back in, because then it became easier. What's the story with 'Python' 2 now? Does it still support it? Just as of December, right? So 'Python' 2 end of life was last December, right? 'pyzmq' dropped support. The latest release requires 'Python' 3.6, actually. Nice. Yeah, I feel like a lot of people are going to 3.6. Why did you guys choose 3.6? Is it 'f-strings', or was it something else? It was actually the typing. Right? One of the main complaints about 'pyzmq' is that it's so auto-generated and dynamically defined, right? Because we didn't just target multiple versions of 'Python', there are also multiple versions of 'libzmq'. And that means that in 'pyzmq', what constants are defined is different depending on what version of 'libzmq' was linked, which meant that, with the way it's written, a lot of static analysis fails. So when you're autocompleting based on static analysis, you know, all the constants don't show up. And so that was an occasional, you know, folks saying, like, my autocomplete in 'PyCharm' is not working. That's why I added the types, to allow static analysis. Oh, yeah, that's fantastic. Does it do anything like what 'typeshed' does, where it's defining the structure in the stub files? So there are some type annotations in the pure 'Python' code, but the most relevant part was the stub files for the compiled part. And if people haven't seen those stub files, those '.pyi' files, it's a little like a 'C++' header thing, where it has the definition, but then somewhere else is the implementation of it. It's a little funky. But yeah, it's also useful for adding that in, right? So you could say, here's the structure, and we'll make that at runtime dynamically. 
But this is what you should think of it as, right? Yeah, yeah. Very cool. Very cool. All right, man. Well, you know, I think we honestly just scratched the surface; we could go on and on and on. But at the same time, I want to be respectful of your time. So maybe we should wrap it up on the main topic there. So I'll ask you the two questions on the way out. If you're gonna write some code, if you're gonna work on 'pyzmq' or something like that, what editor would you use? My favorite editor of all time is 'TextMate'. Okay. But for various reasons, I don't use that anymore. It kind of hasn't kept up with activity and things, and it didn't feel sustainable anyway. So I haven't used it in a long time. I've tried pretty much everything. And I'm in constant search of the next 'TextMate'. So right now, I'm actually using 'Nova', the new editor from Panic. I was gonna say maybe 'Nova' is your next 'TextMate'. How about that? Like, I know if I say Visual Studio Code or 'PyCharm', people are like, oh, yeah, I'm pretty sure I know what that means. 'Nova' is pretty new; tell folks about it. 'Nova' is a new text editor from Panic, one of the great Mac developers. Oh, yeah. I use some of their other apps, like Transmit for working with S3 files. They're really nice stuff. Yeah. Yeah. So they do a great job designing things. And thanks to the recent work on the Language Server Protocol and stuff, there's a lot more shared infrastructure in editor features. So it's starting further from zero than it might otherwise be. But I'm not sure I could recommend it widely to 'Python' developers; it's a bit of an early adopter situation. It's supposed to have 'Python' support, but it's

01:00:00 It's not specifically for 'Python', right? Yeah, no, it's a general-purpose editor. And the target community is more Mac developers. So Ruby and web stuff. Yeah. I feel like it's pretty JavaScript friendly. Yeah. And maybe Mac developers as well. Yeah. Yeah. And extensions are written in JavaScript. I've written a couple extensions for

01:00:17 the darker code formatter.

01:00:21 Very cool. All right. Well, that's quite neat. And I'm glad to hear that's working out for you. I've wanted to try it, but I just haven't. And then notable 'PyPI' packages: I know you picked two that have some relation back to this challenge of building binary stuff and distributing it. Yeah. So up until December, there was exactly one computer in the world that could build 'pyzmq' releases. That's my laptop. And I finally solved that problem, thanks to two wonderful packages. One is 'cibuildwheel', which is more generally useful if you have compiled 'Python' packages. 'cibuildwheel' is a wonderful thing for building and distributing all your wheels on all kinds of platforms. So now 'pyzmq' wheels are all built on GitHub Actions, and I don't need to do anything other than tag a release. And it all happens magically. The other one that I wanted to highlight, that probably fewer people know about, is related to 'cibuildwheel', because when you build a 'Python' package that has an external dependency, there's an extra step. Say I built 'pyzmq' and I linked it against 'libzmq' over here; if somebody else installs that wheel as it is, it's not going to work, because it's gonna say, like, I don't have 'libzmq'. So for a long time, there's been a Mac thing called 'delocate' and a Linux one called 'auditwheel' that say: look at the binaries in there, find the libraries they link against on your system, bring them in, and update the linking to make sure they load. So the wonderful thing that I just found is someone created something called 'delvewheel', which is 'auditwheel' or 'delocate', but for Windows. And I don't understand anything about Windows. Go grab the DLLs and all that kind of stuff that have to be there and, you know, put it in the right location. Perfect. Yeah. And so for a long, long time, for, yeah, I guess 10 years plus, 'pyzmq' built 'libzmq' on Windows as an extension.
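Editor's sketch: a wheel is a zip archive, so one way to see whether a wheel carries its own shared libraries is to list the archive's contents. The sketch below builds a tiny fake wheel in memory and lists its binary files. The package name `examplepkg`, its file layout, and the helper names are hypothetical; this is not how 'delocate', 'auditwheel', or 'delvewheel' are implemented, only a way to look at the kind of result they produce.

```python
import io
import zipfile

# File suffixes that usually indicate compiled code on Linux, macOS, Windows.
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll", ".pyd")


def bundled_libraries(wheel_file):
    """List compiled files inside a wheel (path or file-like object)."""
    with zipfile.ZipFile(wheel_file) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            # ".so." also catches versioned names such as libfoo.so.5
            if name.lower().endswith(SHARED_LIB_SUFFIXES) or ".so." in name
        )


def make_fake_wheel():
    """Build an in-memory zip that mimics a wheel with a bundled DLL."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        wheel.writestr("examplepkg/__init__.py", "")
        wheel.writestr("examplepkg/_backend.pyd", b"")
        wheel.writestr("examplepkg.libs/libexample.dll", b"")
    buffer.seek(0)
    return buffer


print(bundled_libraries(make_fake_wheel()))
```

Running `bundled_libraries` on a real wheel file path before and after a repair step would show whether the external library was copied in alongside the extension module.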
So it actually took the 'zmq' sources, and then just said, hey, this is a 'Python' extension; you don't need to worry about the fact that it's actually a C++ library, just pass it to 'distutils' and compile it as an extension. And there were a lot of issues with that. You never got good optimized results, but it worked most of the time. And that was the point. And it was a wonderful contribution from Brandon Rhodes. A huge step in making 'pyzmq' installable a lot more of the time was this bundling 'libzmq' as an extension. But thankfully, I finally got to the point where I almost never do that anymore. Nice. Thanks to 'delvewheel'. So 'delvewheel' is my big one. Yeah, on the 'cibuildwheel' side, it makes me happy to see that macOS Apple silicon has got a little checkbox there. That's what I'm recording from on my machine over here; I've got the Mac mini M1, and man, that is a sweet device. But you're a little bit back in, like, oh, we don't have wheels for your system, sorry. Yeah, I have two Mac silicon wheels, one that I just finished last week. With 'cibuildwheel', that's a universal wheel. And then I have another, like, Mac Arm wheel targeting macOS 11 plus, basically just for Homebrew 'Python', Homebrew 'Python 3.9' on Arm Macs. That's specific, but yes, I think it's actually not an insignificant target. And that one I build on my wife's new laptop. Very cool. Very cool. That's the only one that's not built on CI yet. Right. Awesome. Well, thank you so much for sharing all this stuff. And of course, all your work. I feel like I have a lot to learn. But it's exciting stuff to be able to think about new ways of doing stuff with networking and 'Python'. So thanks so much for that. And final call to action: people want to get started with this stuff, what would you tell them to do? I'd say read the guide.
So if you're interested in 'ZeroMQ', or thinking about building distributed applications and things, read the 'ZeroMQ' guide, the whole thing, and I think it'll give you some new ideas. Even if you don't use 'ZeroMQ', it'll give you some good new ideas for how to think about this kind of application. Right, like these design patterns that maybe are not so common, like pub/sub or whatever. Yeah, awesome. Well, thanks again for being on the show. It was great to chat with you. Yeah. Thanks so much. You bet. Bye.

01:04:07 This has been another episode of Talk 'Python' To Me. Our guest in this episode was Min Ragan-Kelley. It's been brought to you by 'Linode' and 'Mito'. Simplify your infrastructure and cut your cloud bills in half with 'Linode's' Linux virtual machines. Develop, deploy, and scale your modern applications faster and easier. Visit talkpython.fm/linode and click the Create free account button to get started.

01:04:29 Do you feel like you're stumbling around trying to work with 'Pandas' within your Jupyter notebooks? What if you could work with data frames visually, just like they were an Excel spreadsheet, but have it write the 'Python' code for you? With 'Mito', you can. Check them out at

01:04:29 talkpython.fm/mito. Want to level up your 'Python'? We have one of the largest catalogs of 'Python' video courses over at Talk 'Python'. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight.

01:05:00 Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for 'Python'. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on

01:05:00 talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some 'Python' code.
