#212: Python in Web Assembly with Pyodide Transcript
01:16 Michael Droettboom: Thanks, it's good to be here. Thanks for inviting me.
01:17 Michael Kennedy: Oh, you have such an interesting topic and thing that you've been working on. I was really excited when I heard about Pyodide, a little bit, I don't know maybe six months ago when I first heard about it and stuff and I was like, oh, this has some real possibilities. So I'm really excited to have you here to dig into that.
01:33 Michael Droettboom: Cool!
01:33 Michael Kennedy: Yeah. Now before we do get into that, let's start at the beginning, let's start with your story. How did you get into programming in Python?
01:39 Michael Droettboom: So I've been programming almost as long as I can remember. I think my parents brought home like, an IBM XT, sometime in the mid '80s and I learned you know, Basic on that, as one does. And just have been programming ever since. I found Python, I think in 1996 while I was at university. And I sort of used it secretly in the background to like, prototype my assignments that I had to write in other languages. I would kind of write these quick little hacks as I'm learning to program in Python, 'cause it was a lot easier for me and then once I had figured out the problem, convert it to Lisp...
02:13 Michael Kennedy: I can now solve the Syntax problem to make this happen over here, right.
02:16 Michael Droettboom: Exactly.
02:17 Michael Kennedy: Yeah what languages were you using?
02:18 Michael Droettboom: There was a lot of Lisp at my school, as well as Java, and ML was a big thing then, which, you know, sort of grew into OCaml and that family of languages.
02:29 Michael Kennedy: Yeah my first CS language, that I... I only took a couple of CS classes as a minor type of thing. But my first one was in Scheme, which is a derivative of Lisp, right?
02:39 Michael Droettboom: Right.
02:39 Michael Kennedy: And I felt that it was both very mind shifting and very interesting and also super not useful. Like, what were your feelings of studying Lisp early on? I'm like, I can't go build anything with this, what is this crazy language?
02:53 Michael Droettboom: Yeah, it definitely is kind of mind bending and, you know, I think it enforces a lot of good habits, like functional programming ideas that you can bring into any language, that are probably good habits to have. But yeah, like you say, it's not always the most practical. I know this is a question that usually is at the end of your show, but I use Emacs, and so occasionally I write Emacs extensions in Lisp. And what's fascinating about it is, Emacs is basically built by monkey patching everything else. Emacs extensions are basically monkey patching... Monkey patching is the functional thing that you do, which I find interesting, and that's...
03:31 Michael Kennedy: That is interesting, wow.
03:33 Michael Droettboom: I guess tends to happen in Lisp a lot, or it's, well that's the only big Lisp code I'm really familiar with but...
03:38 Michael Kennedy: Yeah. Emacs is definitely one of the bigger projects. Quite, quite interesting. Cool, okay, so you got into it pretty early, right? Like, what is that, four or five years into the existence of Python? And, you know, Python grew grassroots. It wasn't like it burst onto the scene from Microsoft, or Apple, or whoever, right. That is pretty early.
03:57 Michael Droettboom: Yeah. I went to my first Python conference in 2001 and at that point, I don't remember the exact number but I would guess it was maybe around 200 people. Not much larger than that and was one track. Everybody could fit in one room. It was small enough that as a grad student I was able to just walk up to Guido and have a talk with him 'cause there wasn't that many people demanding his attention and of course now you go to PyCon and you know, forget anything like that happening. There's just thousands of people.
04:24 Michael Kennedy: Yeah, it's completely crazy.
04:26 Michael Droettboom: Yeah, I mean, it's really fun to see how big the community has grown. On the other hand, I do miss the days of the sort of smaller conferences where you feel like you've got a hold of it and you feel like you got a sense of seeing a lot of the stuff, but these are good moments to have, I guess.
04:38 Michael Kennedy: I guess they are. Like, how does it feel to do Python, now versus then? It's got to be kind of different with pip install antigravity and all that.
04:46 Michael Droettboom: Yeah. I think definitely having good package management means you're no longer compelled to, like, bring everything into your projects in the same way. So I used to be the lead developer on Matplotlib, and one of its, sort of, I guess, technical debt, for lack of a better word, is the fact that for the longest time, if someone had a great idea for a new plot type, we'd just say, yeah, let's include it in Matplotlib, because the alternative was forcing all the users to install a bunch of packages, which was really hard at the time. So we would make this one really big package and, you know, in hindsight that's not the greatest thing. It means a lot of code for the core developers of that project to maintain. So it is nice that we live in this world now where we can have lots of little packages that interact, and that's not a huge burden on the user, as it once was.
05:35 Michael Kennedy: Yeah, that's pretty interesting. I mean, maybe it would've been better to have a bunch of little Matplotlib extensions that you include in your requirements files if you want to do these kinds of graphs and stuff, and you could've kept it a little more distributed.
05:46 Michael Droettboom: Right.
05:47 Michael Kennedy: in terms of support but yeah it's the right architecture and patterns for the right time, right?
05:52 Michael Droettboom: Exactly.
05:53 Michael Kennedy: That's what you needed, yeah?
05:54 Michael Droettboom: Yep.
05:54 Michael Kennedy: Cool. So you both work at a cool web company and do data science. So you kind of do both of the things that Python is really good at, simultaneously, right. Tell us about that.
06:03 Michael Droettboom: Yeah. So I'm a data engineer at Mozilla. I've been there about a year-and-a-half. Well I work on the team that manages the telemetry that comes from our products. So from Firefox on desktop and on Android and iOS and all these things. You know and the telemetry goes into improving the product. It helps us discover when things are going wrong, whether things are getting better with changes or worse with changes and things like that. And so there's a whole thing that manages collecting that data, ingesting that data and then providing ways for people to analyze it at the end of the day. And what's really exciting about how they do that at Mozilla is we have this document called our Lean Data Practices, where we try really hard to not collect anything that we don't need to collect, not collecting anything that will invade people's privacy. We really are just collecting what we need in order to improve the product. And so it's really nice to come at it from that point of view. Not just to sort of snarf it all up and see what we can do later but to really think upfront, do we need to do this and do we need to collect this.
07:01 Michael Kennedy: Yeah, that's super cool. Like maybe you had for a while, Firefox had the Address Bar and then like a Search Box, right. And most the browsers have given up on this idea but you can tell that from telemetry, right, how people are using each part and so on, yeah?
07:14 Michael Droettboom: Exactly, yeah, we're able to sort of see how the Search Bar is getting used even now that it's a unified thing. It's a really fun place to work. The engineering talent there is just beyond anything I've ever had the privilege of working with.
07:27 Michael Kennedy: I can imagine that it's super awesome. I'm personally a big fan of Firefox. I generally just use Firefox if at all possible, and it frustrates me to no end when I go to places and they're like, this page is only available on Chrome, or this site only streams this video on Safari. And you know there's no good reason for it other than they're just too lazy to do, like, the tiny bit of effort to make it work, right.
07:51 Michael Droettboom: Yeah, it's unfortunate. You know, I think one of Firefox's biggest challenges is that unfortunately there's like a network or a snowball effect, right. The more websites that don't work on Firefox, the fewer people are going to use it, and therefore the fewer websites are going to work on Firefox. And so trying to break out of that cycle is sort of a constant battle for us that we're, you know, tackling on a number of fronts.
08:15 Michael Kennedy: Props to you guys, you're doing great stuff, keeping the web open. I really am a big fan of Mozilla and I guess our topic today is just one more reason why.
08:23 Michael Droettboom: Cool.
08:23 Michael Kennedy: Yeah, absolutely. So we're going to talk about WebAssembly and Pyodide, but I think it maybe makes sense to just sort of lay out a little bit of history of, like, what even led to WebAssem... not necessarily what led to WebAssembly, but, like, what preceded WebAssembly, you know?
08:40 Michael Droettboom: Sure.
09:46 Michael Droettboom: Yeah, it's fantastic.
10:45 Michael Droettboom: Yeah, that's a great little history. I mean, most of that predates my coming to WebAssembly. I've only been using WebAssembly for about a year-and-a-half, since this whole project started, so fortunately Mozilla has a lot of people who do work on WebAssembly and have that...
10:58 Michael Kennedy: Yeah I'm sure. Today they're...
10:59 Michael Droettboom: So it's really nice...
11:00 Michael Kennedy: originate with Mozilla, or was it...
11:02 Michael Droettboom: It did.
11:02 Michael Kennedy: Yeah I felt like Rust and WebAssembly and all that stuff kind of came from you guys.
11:06 Michael Droettboom: Exactly. Yeah it originated at Mozilla but it is definitely an open standard.
11:11 Michael Kennedy: Right.
11:11 Michael Droettboom: that all the browsers are supporting and all that stuff, so.
11:14 Michael Kennedy: Yeah super cool. So I mean, maybe give the elevator pitch of like, what is WebAssembly for folks who don't necessarily know.
12:10 Michael Droettboom: Yeah. Yeah so these are details I don't know too much about but
12:14 Michael Kennedy: Yeah same.
12:14 Michael Droettboom: it definitely is making all these assurances that the sort of typical things you could do in C that become security flaws, you cannot do in WebAssembly. Or if you do them, you don't break out of the browser. Right?
12:27 Michael Kennedy: Right.
12:28 Michael Droettboom: Yep.
12:28 Michael Kennedy: You just get an exception or something, right?
12:29 Michael Droettboom: Exact, yeah.
12:31 Michael Kennedy: Cool. Dan Callahan, another Mozilla person, gave a pretty interesting call-to-action around WebAssembly last year in one of the keynotes at PyCon US. Were you there for that?
12:42 Michael Droettboom: I wasn't there. And actually, what's interesting about that is, I had already been working on Pyodide for a few months when he gave that talk. And he and I were not aware of each other at all.
12:52 Michael Kennedy: Yes.
12:52 Michael Droettboom: Just sort of explains how big a place Mozilla is.
12:54 Michael Kennedy: Yeah, yeah for sure.
12:54 Michael Droettboom: You know, it's nobody's fault at either end at all but...
12:59 Michael Kennedy: And you guys also have like PyPy.js, which I don't think is active anymore. But there's like a lot of these little flowers blooming in that world, right?
13:05 Michael Droettboom: Exactly, exactly. And so it was really cool to see his talk and sort of realize we were thinking along the same lines and we sort of check-in with each other periodically on what's going on there, which is great.
13:16 Michael Kennedy: That's cool. So I guess the quick summary of that, and I'll link to the whole half-hour presentation, was like, Python is amazing, we love Python, but the web is one of the most important places where code runs right now, and running in a browser, Python is sadly absent for the most part. I mean, we have Skulpt and a few of those other things, like PyPy.js, but they always come with some kind of, like, seven caveats and some little sliver of use case, right. What I was hoping for when I watched this was like, okay, this is the build up, please let this be an announcement. Like, please let this be an announcement involving WebAssembly, and it just turned out to be a "community, we need to work on this," and I thought it was really awesome when I saw Pyodide come out. I'm like, oh my gosh, they actually were working on something. But I guess since you didn't know about each other you couldn't really... It's not like he could do the big reveal of, like, Pyodide's a good start or something, right?
14:03 Michael Droettboom: Yeah, yeah. It would've been a little bit of a different talk, I guess, yeah, but still, I mean, the points he makes are great points. I think, you know, the web is where so much computing happens these days that if you aren't playing in that space, you know, you are becoming limited. And yeah, like you say, there have been a bunch of other projects to bring Python to the web browser. The thing that makes Pyodide a little unique is that it tries to be as close to upstream as possible. So it's using upstream CPython, the upstream versions of NumPy and SciPy and all these things, and tries to change them as little as possible, so that the effort on those projects contributes into our effort directly, rather than reinventing and then constantly having to keep up with that, right.
15:03 Michael Droettboom: Yeah that's just so many years of effort that I think you know, it would always be a poor imitation of the real thing, right. So it's...
15:17 Michael Droettboom: Right, and you see this a little bit even with, like, PyPy. I mean, PyPy is incredibly impressive, a really cool project, but they're still at, like, the 3.6 level of syntax because they're just always sort of following. It's just sort of the nature of what they're doing, and I don't mean that as criticism, but if you aren't tracking the leader you're always going to be a little bit behind, right.
15:37 Michael Kennedy: Right.
15:37 Michael Droettboom: Right.
15:38 Michael Kennedy: So I guess before we move off of just WebAssembly on its own and dig into Pyodide, how well-supported is it? Like, WebAssembly sounds new and futuristic. How well-supported is this?
15:48 Michael Droettboom: It's in all the major browsers, in the stable versions of all the major browsers right now so it's pretty easy to rely on it. It's at what they're calling sort of this MVP-level of WebAssembly. They sort of decided which features were the most critical and that's everywhere. Then there's a bunch of ways in which WebAssembly is already planning to be improved that will eventually trickle down to browsers. So things like, threading are newer features
16:16 Michael Kennedy: Right.
16:16 Michael Droettboom: that are coming down. And garbage collection is going to be added. So there's a bunch of things that are coming that you can't rely on yet but for the core stuff and actually for most of the stuff we needed for Pyodide, it's already there.
16:29 Michael Kennedy: This portion of Talk Python is sponsored by Microsoft and Visual Studio Code. Visual Studio Code is a free, open source and lightweight code editor that runs on Mac, Linux and Windows with rich Python support. Download Visual Studio Code and install the Python Extension to get coding with support for tools you love like Jupyter, Black Formatting, PyLint, py.test and more. And just announced this month, you can now work with remote Python code bases using the new Visual Studio Code Remote Extension. Use the full power of Visual Studio Code when coding in containers, in Windows Subsystem for Linux and over SSH connections. Yep, that's right, auto completions, debugging, the terminal, source control, your favorite extensions. Everything works just right in the remote environment. Get started with Visual Studio Code now at talkpython.fm/microsoft. So I opened up caniuse.com. I'll put a link to the WebAssembly report for Can I Use, which talks about, like, what browsers support what. So it looks like Edge, Firefox, Chrome, Safari, Opera, all of those on desktop are supported. Safari on iOS supports it, and on Android, Chrome and Firefox are supported. That kind of sounds like 99%, right? Yeah, I'm excited to see things like threading and stuff coming as well. There's, you know, a possibility for more interesting stuff to come along, yeah. Cool, all right. So that brings us to Pyodide. Now there's a couple of interesting projects that are saying WebAssembly plus other language plus interesting runtime means something in the browser. What did it mean for you guys?
18:00 Michael Droettboom: The reason Pyodide started is, when I arrived at Mozilla, Hamilton Ulmer and Brendan Colloran and also William Lachance and Teon Brooks were working on this sort of internal skunkworks project for data science at Mozilla. The idea was... data scientists at Mozilla were using a lot of things like Jupyter notebooks, or there's a tool called Databricks that's very similar, and the problem with these tools was that sharing them is harder than maybe it needs to be, because you have a front-end on the web but your actual computation is happening somewhere else. It might be like...
18:36 Michael Kennedy: Right, Yeah.
18:36 Michael Droettboom: But often it's the remote kernels somewhere else.
18:39 Michael Kennedy: Right, so you need maybe access to that compute cluster, or if you want to run it locally you've got to pip install a bunch of stuff, right. You're like, oh, you can run this. It's easy to run, except you have to now set up a virtual environment, now you should probably use Miniconda, here's your stuff, and you're like, whoa whoa whoa, I just want to look at the report. What is this, right? Like, for a lot of folks...
18:55 Michael Droettboom: exactly.
18:56 Michael Kennedy: it's super overwhelming.
18:56 Michael Droettboom: Yeah, your choices are generally: you either require people to install everything, which, like you say, is very difficult, or you have to sort of pay for some cloud-computing resources somehow. And so if you were to put that on, say, a public website to share your data science, you might end up with an unexpectedly huge bill, perhaps. Right?
19:16 Michael Kennedy: Right. The worst case thing happens, exactly what you want is a lot of people get interested in it.
19:20 Michael Droettboom: Exactly, exactly. And so the idea with Iodide was, let's move all the computation into the browser, and then the computation's all happening at the edges, in people's clients, right.
19:30 Michael Kennedy: Right, and to be clear for people, this is Iodide, not Pyodide, right? This is another project.
19:34 Michael Droettboom: Exactly.
19:36 Michael Kennedy: Yeah okay, yeah.
20:11 Michael Kennedy: Right. And numbers are weird, right. Like you can't have true integers for example, in other stuff, yeah.
20:16 Michael Droettboom: Exactly. And there's a lot of movement in that space trying to make that all better, but that's kind of a big lift, and definitely the Iodide project wants to encourage that, and we're working on that kind of, you know, in one thread. But in the meantime, we thought, why don't we try and come to data scientists where they are, which is in Python, and somehow bring Python to the browser. And this seemed like a very crazy idea to me when it was first raised, but fortunately, like I said, at Mozilla we have a bunch of WebAssembly experts who got on a meeting with us and said, ehhh, it can't be that hard. Other people have done things much more difficult than that. So why don't you just try it? So I went off and found this project by a GitHub user named dgym, called cpython-emscripten, which had done a lot of the, like, initial footwork on this, and starting with that I was able to get something going in probably a couple of weeks. That's kind of like that 80% in 20 weeks and then the remaining 20% is... Or, I'm sorry, in two weeks. And then the remaining 20% takes forever. Certainly getting to the proof of concept was pretty quick, and realizing, hey, you can actually compile the real CPython interpreter and get that to run, and then have the real NumPy loading in, and all those things actually do kind of work. It was pretty exciting.
21:32 Michael Kennedy: Yeah that's pretty amazing in how much of the CPython... Like, how big is the CPython.wasm or whatever it's called. Like the core runtime bits you got to take down before you can start doing stuff. Just roughly.
21:45 Michael Droettboom: Yeah, roughly... So I'm actually pulling up those numbers because they change all the time. It's about 20 megabytes for the Python core, and then of course the libraries you're going to pull in will add to that. So NumPy's around 8 megabytes. It goes from there, but one of the things that Pyodide does is it only downloads the libraries you actually import.
22:06 Michael Kennedy: I see. If you don't import Matplotlib or NumPy like those are not things it has to go hit, right?
22:11 Michael Droettboom: Exactly and also your browser will cache those things.
22:14 Michael Kennedy: Yeah, yeah, yeah.
22:14 Michael Droettboom: So the first time you'll pay the network penalty. But once it's been done once, it's on your machine. And it will actually recompile the WebAssembly each time but it doesn't actually have to download it again.
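The download behavior described here can be sketched as a small cost model. This is only an illustration, assuming the rough sizes quoted in the conversation (about 20 MB for the Python core, about 8 MB for NumPy); the function name, the package table, and the set-based cache are hypothetical and are not Pyodide's actual API.

```python
# Rough sizes in megabytes, taken from the figures quoted in the conversation.
# Everything else here (names, cache model) is a hypothetical illustration.
CORE_MB = 20
PACKAGE_MB = {"numpy": 8}


def download_cost(imports, cache):
    """Return the megabytes fetched over the network for one page load.

    `imports` lists the packages the notebook actually imports; `cache` is a
    set standing in for the browser cache and is updated in place.
    """
    needed = ["python-core"] + [name for name in imports if name in PACKAGE_MB]
    cost = 0
    for name in needed:
        if name in cache:
            continue  # already cached by the browser: no network cost
        cost += CORE_MB if name == "python-core" else PACKAGE_MB[name]
        cache.add(name)
    return cost


browser_cache = set()
first_visit = download_cost(["numpy"], browser_cache)   # core + NumPy
second_visit = download_cost(["numpy"], browser_cache)  # everything cached
core_only = download_cost([], set())                    # no NumPy import
print(first_visit, second_visit, core_only)
```

The model only counts network transfer; as noted above, the WebAssembly is still recompiled on each load even when nothing has to be downloaded again.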
23:26 Michael Kennedy: Yeah exactly.
23:26 Michael Droettboom: We can get any of them in there. Are there ways that it could be done like, more efficiently or more conveniently? Certainly, like I think...
23:33 Michael Kennedy: Right.
23:34 Michael Droettboom: you know maybe there'll be some like web extension you download that gives you languages so that it just kind of will always update those in the background and you don't have to worry about this kind of stuff. There's ways that you could make it almost like a browser just with a little bit of extra added to make...
23:49 Michael Kennedy: Yeah I'm not nearly proposing Python in Firefox. I'm proposing Firefox come with these preloaded or like you say, in the background, like preload the latest one or something. And then you could have Python. You could have C#, you could have all the languages that are building these run times in WebAssembly and make them available. And just, I think that would be super cool actually.
24:09 Michael Droettboom: Yeah, it definitely would open up the possibility for, say, writing web applications using this technology. So one of the things I do warn people about Pyodide is, the thing that's cool about Pyodide is you can actually see the Python in your web browser and run it there, which is great for the data science use case, right. But if you don't have a use case where you need to show people the code, this is probably not how you want to write your web app.
24:36 Michael Kennedy: Yeah, for sure.
24:41 Michael Kennedy: Yeah so, Pyodide is very much focused on basically, what you do with Jupyter notebooks but make that execution happen on the client's side in the browser, not connected back to a Docker thing or some kernel elsewhere, right. Like that's the...
24:56 Michael Droettboom: Exactly.
24:58 Michael Kennedy: That is the use case and that's what it's built for and optimized for, right.
24:59 Michael Droettboom: Exactly. That's the original use case that came out of Iodide. What's interesting is since then, we've been talking with the WebAssembly folks again who are really pushing the idea of WebAssembly as a containerization technology. So WebAssembly that doesn't actually run in a browser but would run on Cloud computers, right.
25:22 Michael Kennedy: Yes.
25:22 Michael Droettboom: Because it provides a really nice, sandboxed way of running arbitrary code but it does it in a way that's actually a lot lighter weight than Docker, right? So like, Docker is maybe the industry standard for this right now but Docker, essentially you're taking a whole Linux distribution and a whole OS and shoving that in a container and passing that around.
25:41 Michael Kennedy: Right... They do require the kernels of the host and container to match, right. Whereas WebAssembly doesn't care. You know there's...
25:48 Michael Droettboom: Exactly.
25:49 Michael Kennedy: There are projects like wasmer that are bringing this to Python already. So it's not a farfetched idea.
25:52 Michael Droettboom: No, it's not. I think it has a lot of promise. And for our own little data science community, what excites us about it is, we can use the browser as, like, a prototyping tool where the computation is happening locally and it's really fast, maybe while you're working on a small part of your data, but then to be able to very smoothly say, I'm now going to run that on a cluster without having to change anything, and having it still be built on the same technologies, could be a really powerful thing. This is all kind of in the pie-in-the-sky, dreaming-of-it stage for us.
26:27 Michael Kennedy: Yeah but it's a sweet pie, like it's really nice, yeah.
26:29 Michael Droettboom: Exactly, yeah.
26:29 Michael Kennedy: And you could bring in things like Dask as well to help you do distributed computation without people really even knowing or caring that that's happening. So there's a lot of stuff that could, like, expand, expand to the servers, expand to clusters and so on.
26:42 Michael Droettboom: Exactly, yeah.
26:42 Michael Kennedy: Cool. So I've seen some cool examples of this already working and like I pulled up the, what's it, L.A., data or some kind of like, city map, the one that came from your article that you recently published but there's a live example, I want to link to that, it's pretty interesting. It's doing real data science. It's doing real computations. So it takes like 15 seconds to load right.
27:03 Michael Droettboom: Yep.
27:53 Michael Kennedy: Right.
28:11 Michael Kennedy: Nice.
28:35 Michael Kennedy: Yeah, and maybe in this crazy future you could also import some other WebAssembly-based visualization thing that you don't even know what it's written in, right. Like, who knows.
28:43 Michael Droettboom: Exactly.
28:44 Michael Kennedy: Maybe it's in Swift or something crazy, right. But yeah, interesting. Okay, one thing I want to ask you is, like, if I'm a data scientist and I'm listening to this and I'm super excited, should I be excited today or should I be excited in like a year-and-a-half? Like, is this something I can reasonably use now, or is that a cool proof of concept, or, like, what's the status?
28:59 Michael Droettboom: I would say, you have to have a little bit of patience still. You know, we certainly encourage people to come up and try to do the kind of things they're doing in Jupyter now in Iodide and Pyodide and sort of see where some of the rough edges are. We do actually have it, it is being used for real work within Mozilla for data science. And people are using the Python parts of it and stuff.
29:22 Michael Kennedy: That's cool.
29:22 Michael Droettboom: So if you sort of know where the boundaries are and what you can get away with, it's already working.
29:27 Michael Kennedy: It's sharp and jagged over there, don't go over there. Just stay here.
29:30 Michael Droettboom: Exactly. Yeah, yeah, yeah right. But I would be being disingenuous if I said it was ready for everything that people might want to do.
29:36 Michael Kennedy: Sure. Well I guess the way you find out and the way you get it ready is people try to use it and you're like, wait, everyone's trying to do this and it doesn't do that, well maybe that's something to get.
29:43 Michael Droettboom: Exactly. No it's really helpful for us, actually. 'Cause we've been getting some really great bug reports. We had a blog post last week that kind of brought a lot more traffic to the site and that turned into a lot of really great bug reports of things that like, seem obvious in hindsight but we had never thought to check, is that going to work. That's really helpful. And also like it helps us prioritize what features need to be added. You know, if like 10 out of the 20 people that show up all have problems with the same thing, well that's a pretty good sign that that's where we should focus effort.
30:14 Michael Kennedy: Mmm-hmm, yeah that makes total sense. Are there full-time employees working on it at Mozilla? Like, what is its status, sort of as a project for you all?
30:21 Michael Droettboom: So there's probably a total of about three FTEs working on it, divided among five people within Mozilla, and it's still primarily sort of an internally focused project, in that our internal users are kind of helping us prioritize what gets worked on and moved forward. We do of course have the public-facing website at iodide.io where anybody can come up and create notebooks, and we do look at that too, and it's all open source. And we're just sort of hoping that if we can bootstrap it enough internally and prove it as a useful tool, then we can get, hopefully, some more resourcing to kind of make it something that will serve a broader community.
31:00 Michael Kennedy: Yeah, it's super cool. So if I am a person who maintains or is somehow in charge of a data science package that is not available in there... I'm guessing there are some that are not available, is that right? So if I am like, how do I get mine in, like, iRun, you know, whatever, and I really want that available alongside NumPy, how do I make that happen?
31:20 Michael Droettboom: So right now, all our package building is built as part of kind of this monolithic tree of Makefiles, so to add a new package, you would basically add it to the Pyodide source code, and then that causes it to automatically get built and then eventually distributed. What I'd really like to move to is something that works more like conda-forge. And maybe even literally to use conda-forge if we can make that work, so that anybody could just walk up and contribute a package sort of in their own repo, and that would automatically get picked up and then distributed, so it would be more like a distributed build system than what it is now.
31:57 Michael Kennedy: Oh yeah, that's pretty interesting, yeah. That's a cool idea. To just basically have approved sources for packages, and you guys just continually pull them, kind of like, like you said, a CI system almost.
32:08 Michael Droettboom: Exactly, yeah. Yep so you have individual package maintainers who could maintain their own package but the sort of infrastructure that makes that all work would be centralized is the idea. Yeah.
32:19 Michael Kennedy: Right. Is it hard? If I have something that has like, some C section and some Python section, what's the overhead to get it working in this?
32:28 Michael Droettboom: It varies. So, like, NumPy, which has C, but fairly straightforward C, is not too bad. One of the things we've really struggled with is SciPy, 'cause SciPy actually has a fair bit of Fortran. And Fortran has its whole other set of problems for WebAssembly. Basically, there's not a good compiler option for anything that's not Fortran 77 right now. So we kind of have to push that forward somehow. So there's kind of this range of easy to hard, and it's hard to know upfront how hard something's going to be. Pure Python stuff is very easy. For pure Python, there's actually even a little helper script: if you run it with the name of the package on PyPI, it will automatically generate the Makefile needed to build it as part of Pyodide, and you can send that as a PR and you're good to go.
33:20 Michael Kennedy: That's pretty cool. So suppose I have a pure Python package on PyPI and I want to have it in there. Does it get somehow compiled to WebAssembly? Does WebAssembly itself just, like, become an interpreter and just use Python bytecode? Like, do you know what the process there is?
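One way to picture the easy-versus-hard split described here is to look at how a package is published. The sketch below is not the Pyodide helper script mentioned above; it is a hypothetical check that relies only on the standard wheel filename convention, where a pure Python wheel ends in an ABI tag of `none` and a platform tag of `any`. The function name and example filenames are illustrative.

```python
def is_pure_python_wheel(filename):
    """Guess from a wheel filename whether the package is pure Python.

    Wheel names follow name-version[-build]-pythontag-abitag-platformtag.whl.
    A pure Python wheel has the ABI tag "none" and the platform tag "any";
    anything else was built for a specific interpreter ABI or platform and
    usually contains compiled code (C, Fortran, and so on).
    """
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return False  # not a well-formed wheel filename
    abi_tag, platform_tag = parts[-2], parts[-1]
    return abi_tag == "none" and platform_tag == "any"


# Illustrative filenames only.
pure = is_pure_python_wheel("requests-2.31.0-py3-none-any.whl")
compiled = is_pure_python_wheel("numpy-1.26.4-cp312-cp312-win_amd64.whl")
print(pure, compiled)
```

A package that fails this check is not necessarily hard to port; it only signals that there is compiled code to deal with, which is where the range from "straightforward C" to "a fair bit of Fortran" comes in.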
33:38 Michael Droettboom: Yeah, so Pyodide is literally running the Python interpreter inside your browser.
33:44 Michael Kennedy: Okay, right.
33:44 Michael Droettboom: So if you have a Pure Python package, it's actually just shipping Python to that interpreter running in the browser and that's how it runs.
33:51 Michael Kennedy: So it basically just works off PYC byte code.
33:52 Michael Droettboom: Exactly, yeah.
33:56 Michael Kennedy: And just feeds it off to the interpreter, just happens to be executing, not on C but in WebAssembly.
33:59 Michael Droettboom: Exactly.
34:01 Michael Kennedy: All right.
34:01 Michael Droettboom: Exactly...
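As a rough illustration of the point above (the interpreter itself runs in the browser, and pure Python code is simply handed to it), here is a minimal sketch using only the standard library. The `SHIPPED_SOURCE` string and the `greet` function are hypothetical stand-ins for a package's contents; whether the files that travel over the network are `.py` source or `.pyc` files is not spelled out in the conversation, so this only shows the compile-then-execute step that any CPython interpreter performs.

```python
import dis

# Hypothetical stand-in for a pure Python module shipped to the interpreter as text.
SHIPPED_SOURCE = """
def greet(name):
    return "Hello, " + name
"""

# The interpreter compiles source into a code object holding bytecode
# (this is the same thing a .pyc file caches on disk).
code_object = compile(SHIPPED_SOURCE, "<shipped_module>", "exec")

# The bytecode is then executed by the interpreter loop in a fresh namespace.
namespace = {}
exec(code_object, namespace)

result = namespace["greet"]("Pyodide")
print(result)

# Print the bytecode instructions the interpreter loop steps through for greet().
dis.dis(namespace["greet"])
```

Nothing in this snippet is specific to WebAssembly: the same steps happen whether the interpreter was compiled for a native CPU or for the browser.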
34:02 Michael Kennedy: That was my first guess but I thought maybe there's some other magic like, oh we had to add a JIT compiler to this or you know, some weird thing.
34:08 Michael Droettboom: No. And so for that reason, what you're getting is something that performs pretty similar to CPython, right. Something like PyPy.js has of course, opportunities to JIT a Python, itself and potentially get a lot more performance. So we're not doing anything sophisticated like that in Pyodide at this point. So because we have a really nice JIT sitting there in the browser it's certainly there and could be on point.
34:54 Michael Droettboom: Absolutely, yeah.
35:08 Michael Droettboom: I don't actually know those numbers. I'm sure that WebAssembly, at this point is quite a bit better than asm.js. The numbers I do have is what happens to Python. So like, comparing Python, running natively on a machine versus inside a browser. And you generally get anywhere between like the same speed and 12 to 20 times slower. And what seems to matter... So if your Python application is mainly just calling NumPy operations, which are at the bottom sort of, C-type inner loops.
35:39 Michael Kennedy: Yeah, just...
35:41 Michael Droettboom: That stuff tends...
35:41 Michael Kennedy: Orchestrate, yeah.
35:43 Michael Droettboom: Pretty much the same speed, right.
35:44 Michael Kennedy: Right.
35:44 Michael Droettboom: Those tight C loops tend to be pretty much the same in WebAssembly as outside WebAssembly. If you're doing a lot of looping in Python or calling a lot of Python functions, those things tend to get quite a bit slower. And the reason is that the Python interpreter is basically calling a lot of C function pointers all the time. That's sort of kind of how it works. It's calling other C code through C function pointers.
36:09 Michael Kennedy: PyObject* all over the place, yeah.
36:11 Michael Droettboom: Exactly yeah and calling a C function pointer in WebAssembly is quite a bit slower than it is on Native. For reasons that I don't know if I could fully articulate but probably for...
36:21 Michael Kennedy: But they are.
36:21 Michael Droettboom: Yeah, partly related to the security model like there's just ways in which that's a lot slower. And unfortunately the way the Python interpreter's designed is making lots of C function pointer calls all over the place.
36:33 Michael Kennedy: Right. But it might not matter, it depends, right. Like...
36:35 Michael Droettboom: Exactly.
36:36 Michael Kennedy: So it's five times slower, might make it go from half a millisecond to I don't know, two-point-five milliseconds. And like the user doesn't perceive these or care about them, right but now all of a sudden you have this great new deployment story and this execution engine and like, what it opens up is way more valuable than that amount of slowness, potentially.
36:54 Michael Droettboom: Exactly. It's certainly within the realm of like, I can live with that. You know it's not hundreds of times slower, right.
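To make the distinction above concrete (work done in a tight C loop versus work done by looping in Python), here is a small sketch that sums the same list two ways. It uses the built-in `sum` as a stand-in for a C-level inner loop such as a NumPy operation; that substitution is an assumption made to keep the example dependency-free. The function names are hypothetical, and the timings will differ between machines and between native and in-browser interpreters.

```python
import timeit

N = 100_000
numbers = list(range(N))


def python_level_loop(values):
    # Every iteration goes through the interpreter loop, which is where
    # the many indirect C function calls described above happen.
    total = 0
    for value in values:
        total += value
    return total


def c_level_loop(values):
    # The iteration happens inside C code; Python only orchestrates one call.
    return sum(values)


loop_result = python_level_loop(numbers)
builtin_result = c_level_loop(numbers)

loop_seconds = timeit.timeit(lambda: python_level_loop(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: c_level_loop(numbers), number=20)

print(f"Python-level loop: {loop_seconds:.4f}s")
print(f"C-level loop:      {builtin_seconds:.4f}s")
```

Running the same script natively and then inside a browser-hosted interpreter is one way to see how each style of code is affected.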
37:00 Michael Kennedy: Interesting. Cython, can I do some sort of, like I've got some doubly nested loop and I know that that's the problem, can I cythonize that puppy and then WebAssembly the result or something?
37:12 Michael Droettboom: Yeah. So Cython works just fine. And in fact like Pandas is largely written in Cython. So in order to get that to work, that needed to work. But all of that compilation happens ahead of time on the native machine, before shipping it to the browser.
37:30 Michael Kennedy: Right.
37:30 Michael Droettboom: We don't actually have the ability to compile Cython code inside of the browser.
37:34 Michael Kennedy: Right, but as a package developer maybe I could leverage that to avoid like the really, so where there's a penalty in Python versus C now, the penalty may be worse in WebAssembly but may be the answer, is Cython still something like that?
37:47 Michael Droettboom: Absolutely, yeah. Yep, totally. And then of course people have gotten the Clang compiler to run in WebAssembly.
37:53 Michael Kennedy: Of course.
37:55 Michael Droettboom: So theoretically, we could get, we could put that there as well and then we could send it Cython code and hope, get something back and maybe run that right away. Like, that's a little bit, getting into crazy territory maybe but who knows.
38:09 Michael Kennedy: Yeah, well it's turtles all the way down, or WebAssembly's all the way down. Or something like that. Right?
38:12 Michael Droettboom: Right, right.
38:13 Michael Kennedy: Yeah, interesting. Both having all of this experience and expertise in Rust and Rust being so built for WebAssembly or being so well paired with WebAssembly and then also some projects trying to do, say CPython's runtime in Rust. I guess the question is, rambling around, is to say like, if I take Rust and like rethink CPython what do you think the possibilities are there in this context?
38:43 Michael Droettboom: Yeah I mean, I think what's exciting about Rust is because it's a newer technology unlike C, it's a lot easier to get into WebAssembly because there's just not as much baggage. I mean, that's kind of Rust's advantage on WebAssembly. Plus there's the fact that I think Rust and WebAssembly both coming out of Mozilla with a lot of people overlapping between those two communities has really helped make that story much smoother. You know, for example, if I was going to write something from scratch that I wanted to run in WebAssembly, I would absolutely reach for Rust and not C in this day and age, it just is going to be an easier experience. So like you say, there's a project, maybe more than one I don't know, to rewrite the CPython interpreter in Rust, right. So then that would help us build something like Pyodide a lot easier because it's in Rust, we don't have to deal with a lot of the sort of niggly details we've had to deal with, with C.
39:37 Michael Kennedy: Sure.
39:37 Michael Droettboom: My worry there is historically whenever people write an alternative Python interpreter, it's really hard for it to catch on because there's so much catch up to do. If they can, I think this sort of uphill battle for that project and I think I've read about the project and the author is like, clearly doing it for fun and there's no...
39:57 Michael Kennedy: Exactly.
39:59 Michael Droettboom: You don't have to have a better reason than for fun and I'm not trying to say that, that's not a valid reason. But like, I think for that to kind of take over from the CPython interpreter, which I know is not a goal, it would have to like, convince that community that's currently maintaining the CPython interpreter that Rust is going to be a better way forward. If they can do that and sort of replace it and become the leader, that would be an amazing outcome but I think that's a real, uphill struggle.
40:25 Michael Kennedy: It would be an interesting outcome, but certainly with Rust being so new and C being so, such a stalwart, right. Like it's the foundation of so many things. It would be an interesting conversation, for sure. This portion of Talk Python To Me is brought to you by Microsoft and Azure Pipelines. Azure Pipelines is a CI/CD service that supports Windows, Linux and Mac. It lets you run automatic builds and tests of your Python code on each commit or a pull request. It is fully integrated with GitHub and it lets you define your continuous integration and delivery pipelines with a simple YAML file. Azure Pipelines is free for individuals and small teams. If you're maintaining an open source project you'll even get unlimited build minutes and 10 concurrent pipelines. Many Python projects are already using Azure Pipelines so get started for free at talkpython.fm/microsoft. I guess what I'm thinking also is like, it's interesting that people have these, hey I want to learn Rust and let me do that by trying to rewrite CPython in Rust. These are interesting, like you say, super good goals and people, I'm sure are getting a lot out of it. But I'm more thinking of like, what if you rethought what it meant to be the runtime for Python, specifically optimized for WebAssembly. You know what I mean? You try to make it like 99% compatible and so you can bring in C libraries and stuff like you can through a... but is there an opportunity to like, truly rethink CPython, not just rewrite it with different syntax and compilers?
41:57 Michael Droettboom: Yeah that's a really good question.
41:59 Michael Kennedy: I don't know the answer. Just enough to get you to think about.
42:03 Michael Droettboom: Yeah, I think one of the things that a lot of various projects I've kind of worked on, rethinking the Python interpreter has sort of been, what if we can assume that there's a really good JIT around, right. That's kind of what PyPy is and like, IronPython when they built it on top of the .NET CLR and those sorts of things. They sort of go, if we have a really good JIT around, what can we do. I think a lot of times they run up against the sort of really dynamic corner cases of the Python language that are really hard to deal with there. And you end up with something that's slightly different from Python. That to me always feels like where it gets a little stuck.
42:41 Michael Kennedy: Right.
42:41 Michael Droettboom: What if there's a way of unsticking that, like unfortunately the community has gone through this big transition from Python two to three already and I don't know if there's a lot of appetite for another Python that would be slightly incompatible.
42:53 Michael Kennedy: Exactly.
42:55 Michael Droettboom: If you could live with something that's slightly incompatible, I think there's a lot of ways you could make it more performant with some of these newer technologies.
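The "really dynamic corner cases" mentioned above are not named in the conversation. As one illustration of the kind of runtime dynamism that makes Python hard to compile ahead of time or to JIT aggressively, the sketch below changes what a method call means while the program is running. The `Point` and `Loud` classes are hypothetical, chosen only for this example.

```python
class Point:
    def __init__(self, x):
        self.x = x

    def scaled(self):
        return self.x * 2


def total_scaled(points):
    # A compiler would like to assume p.scaled() always means Point.scaled,
    # but the language allows that binding to change at any moment.
    return sum(p.scaled() for p in points)


points = [Point(1), Point(2), Point(3)]
before = total_scaled(points)

# Replace the method on the class at runtime; existing instances are affected.
Point.scaled = lambda self: self.x * 10
after = total_scaled(points)


class Loud(Point):
    def scaled(self):
        return -1


# Change the class of one existing instance at runtime.
points[0].__class__ = Loud
mixed = total_scaled(points)

print(before, after, mixed)
```

An implementation that wants to stay fully compatible has to keep checking for changes like these, which is part of why a "slightly incompatible" Python would be easier to make fast.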
43:39 Michael Droettboom: Yeah, absolutely.
43:39 Michael Kennedy: Yeah and the file size doesn't matter in these offline Electron apps and stuff because you're already downloading like a 60 MB Chrome binary like, what's another 10 MB that's just zipped up in there anyway, right?
43:52 Michael Droettboom: Right, exactly.
43:54 Michael Kennedy: So who knows, maybe that's a interesting corner to explore. Not necessarily for you but for like, anyone interested, right?
44:00 Michael Droettboom: Yeah, absolutely.
44:00 Michael Kennedy: Yeah, cool. So I guess maybe, tell us a little bit about where things are going like, where are you and what's the future plans?
44:08 Michael Droettboom: There's a bunch of things that don't work that we'd like to work on. So like, currently we don't support threading. Because when we started the project, WebAssembly didn't support threading. Now WebAssembly does, so it'd be good to go back and kind of build on top of that.
44:22 Michael Kennedy: Is that true operating system threads? Or is that some kind of like preemptive thread. Like, what does threading in WebAssembly mean?
44:29 Michael Droettboom: It's based on the web worker technology in browsers. My understanding is that they are sort of true separate operating system threads and they're probably even more isolated than you would think of with threads.
44:41 Michael Kennedy: Right.
44:44 Michael Kennedy: Right, they can kind of do message passing and that's all they get for data sharing and what not.
44:47 Michael Droettboom: Exactly, exactly. So you can take advantage of that in WebAssembly now and pass things between web workers. And there should be a way to kind of hopefully build the Python threading API on top of that.
44:59 Michael Kennedy: Right, it sounds like Python's multiprocessing more than Python threading.
45:04 Michael Droettboom: That's probably accurate.
45:04 Michael Kennedy: Yeah, but still like it's, it's cool that it would be some parallelism you could do, regardless of how that happens, right.
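The model described above, where workers are isolated and only exchange messages, can be sketched in plain Python. This is not how Pyodide implements anything; it is a stand-in that uses a `threading.Thread` with two `queue.Queue` objects so that the worker never touches shared state and communicates only through messages. The `worker` function and the `None` shutdown sentinel are assumptions made for the example.

```python
import queue
import threading


def worker(inbox, outbox):
    # The worker only sees what arrives in its inbox and only
    # replies through its outbox; nothing else is shared.
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        outbox.put(message * message)


inbox = queue.Queue()
outbox = queue.Queue()

thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

# Post messages to the worker, then tell it to shut down.
for number in [1, 2, 3, 4]:
    inbox.put(number)
inbox.put(None)
thread.join()

# Collect the replies; a single worker answers in the order it received them.
results = [outbox.get() for _ in range(4)]
print(results)
```

With true process-level isolation the structure stays the same, which is why the comparison to `multiprocessing` rather than `threading` comes up in the conversation.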
45:11 Michael Droettboom: Exactly, exactly. Another big sticking point is networking is obviously very different for Pyodide than it is for native Python, largely because of the sandbox, right. You can't just open up a Unix socket and start writing things to it because that would be a big security hole. So what that means is a lot of the libraries that come in the Python data science community, like Pandas, they have ways of fetching things over the network. And those don't currently work. Because they try to use, open a socket and they fail.
45:41 Michael Kennedy: I see.
46:00 Michael Kennedy: Right. Axios or something nice, yeah.
46:01 Michael Droettboom: Yeah, exactly.
46:04 Michael Kennedy: Okay, yeah. So like if I installed, imported requests and try to use that for example, that might not work?
46:10 Michael Droettboom: That's definitely not going to work, yeah.
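Since socket-based libraries fail inside the browser sandbox, code that needs to run in both places has to pick its network path at runtime. The sketch below shows one possible shape for that. The `fetch_text` function, the `browser_fetch` hook, and the `sandboxed` parameter are all hypothetical names invented for this example; checking `sys.platform == "emscripten"` is an assumption about how an Emscripten-built interpreter reports itself, and the hook stands in for whatever the browser provides.

```python
import sys
import urllib.request


def in_browser_sandbox():
    # Assumption: an Emscripten-built interpreter reports this platform name.
    return sys.platform == "emscripten"


def fetch_text(url, browser_fetch=None, sandboxed=None):
    """Fetch a URL as text, avoiding raw sockets when sandboxed.

    browser_fetch is a hypothetical callable supplied by the host page
    that takes a URL and returns its body as a string.
    """
    if sandboxed is None:
        sandboxed = in_browser_sandbox()
    if sandboxed:
        if browser_fetch is None:
            raise RuntimeError("no browser fetch hook available in the sandbox")
        return browser_fetch(url)
    # Native path: the standard library opens a socket under the hood.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Simulate the sandboxed path with a fake hook so no network is needed.
simulated = fetch_text(
    "https://example.invalid/data.csv",
    browser_fetch=lambda url: "fetched:" + url,
    sandboxed=True,
)
print(simulated)
```

The point of the indirection is that the calling code does not care which transport was used, only that it got text back.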
46:12 Michael Kennedy: Okay, cool. All right so what else? You talked about this like conda-forge like, distributed, build integration thing, that's pretty cool. What else?
47:23 Michael Kennedy: Right.
47:23 Michael Droettboom: And then again, building on top of work that's already been done and not have to build it ourselves. But there's a lot of details there. The other thing that's sort of exciting for Arrow, for us is the sort of industry standard way to bring data in for data science computation is still the comma-separated value format, right. They're not terribly efficient to read. They're not very space efficient or memory efficient. Whereas Apache Arrow provides this sort of nice, tight binary format that we could use. And that would actually allow us to sort of shove more data into the browser which is pretty memory limited to begin with. So anything that will let us kind of get away from CSVs is also on our roadmap.
48:02 Michael Kennedy: Yeah. That, just parsing all those strings is going to be a slow thing wherever.
48:06 Michael Droettboom: Yeah. And most of the libraries that do it don't, they assume tons and tons of memory, so they don't necessarily do it in the most efficient way possible. They don't necessarily stream it. They might copy the whole thing and then you know, so.
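To give a feel for the space argument above, this sketch stores the same thousand floating-point values once as CSV text and once as a packed binary buffer. It does not use Apache Arrow; the standard library `array` module is a deliberately simple stand-in for "a tight binary format", and the random values are generated only for illustration.

```python
import array
import csv
import io
import random

rng = random.Random(0)
values = [rng.random() for _ in range(1000)]

# CSV: every number becomes a string of digits plus a line terminator.
csv_buffer = io.StringIO()
writer = csv.writer(csv_buffer)
for value in values:
    writer.writerow([value])
csv_text = csv_buffer.getvalue()
csv_bytes = csv_text.encode("utf-8")

# Packed binary: every number is a fixed-size 8-byte double.
binary_bytes = array.array("d", values).tobytes()

# Reading the binary form back is a bulk copy, with no string parsing.
decoded = array.array("d")
decoded.frombytes(binary_bytes)

# Reading the CSV form back means parsing one string per value.
parsed = [float(row[0]) for row in csv.reader(io.StringIO(csv_text))]

print(f"CSV size:    {len(csv_bytes)} bytes")
print(f"Binary size: {len(binary_bytes)} bytes")
```

Real columnar formats add schemas, multiple columns, and null handling on top of this idea, but the size and parsing difference is already visible at this scale.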
48:20 Michael Kennedy: Right. Okay, yeah. That sounds pretty cool. And then I was looking around the site, I found like a pretty cool demo notebook that people can try out. I guess you know, there's that and I'll put a link to that in it. What else do you recommend for people, just trying to play around with it?
48:33 Michael Droettboom: There's a demo notebook that kind of goes through the language features that works also kind of as a tutorial for how to get started. But then also linked on the blog post from last week and maybe we can link to that blog post. In there, there's a bunch of other demo notebooks that kind of do more real world, cool things. Putting stuff together. And like you say, the call data one is pretty fun. There's another one that I used at Mozilla internally for figuring out how to time things in Firefox. It's kind of fun. So if you go to iodide.io all the notebooks that anybody has created on that public website, they're all public and so you can just kind of browse through there and see what interesting things other people are doing.
49:11 Michael Kennedy: Yeah cool there's always interesting stuff happening on the data science space. Yeah cool. One thing that I just want to give a shout out to, I don't know if you've even looked at it or something, I became aware of it basically like a week ago is, Wasmer. So, WASM is often the extension for WebAssembly. So Wasmer, like WebAssembly, all right. So, do you, are you familiar with this project?
49:30 Michael Droettboom: Yeah, yeah I am.
50:14 Michael Droettboom: Yeah what excites me about it actually, is it's going to make languages that aren't C a lot easier to integrate in Python. So like, Rust for example. There is a way to integrate Rust in Python that's actually pretty good and works really well. But if the story was compile whatever you have to WebAssembly and we can get to it from Python, I think that makes it a lot easier to have things written in whatever language is the most convenient, either the most you know, at hand. And what also is potentially exciting from the Pyodide point of view is if this causes there to be a big sort of community of WASM packages that work with Python we can use those in Pyodide basically for free.
50:52 Michael Kennedy: Right, yeah exactly. It just grows the pie for everyone, yeah.
50:55 Michael Droettboom: Exactly, yeah.
51:13 Michael Droettboom: Yeah absolutely.
51:14 Michael Kennedy: Yeah, I guess if I'm throwing out other stuff that's just kind of random, like to deal with. Here, one more I'll throw out that's really, really interesting, I don't know if you've heard of this one at all but Blazor, have you heard of that?
51:24 Michael Droettboom: Yes this is the...
51:24 Michael Kennedy: The C# one.
51:26 Michael Droettboom: The C#, yeah.
51:26 Michael Kennedy: Yeah so they had a totally different take, but they've gotten the .NET runtime, the CLR, all that stuff, running in WebAssembly and now you can do C# in the browser. Their take was to build an Angular.js-like framework that lets you write front-end code in C# and then run it in the browser. I don't know if that's a good idea or not. But it's you know, it's kind of the other half of the story I think, for Python, right, like right now, we've got the data science, like with your work going really well. But there's no story around like, what would I use in Python instead of Vue or Angular. Not necessarily saying those are bad or you should. But like, you could. You know, C# is showing the way like, on that side of the story.
52:05 Michael Droettboom: Yeah I think, there's a real advantage to having your implementation and your back-end and your front-end being the same language. I think that's kind of what Node has proven.
52:16 Michael Kennedy: Right, there's definitely an appeal there, yeah.
52:17 Michael Droettboom: Yeah and so I think, for Blazor it's the same thing if you're a shop that's done your back-end in C# for a long time, well now you can have your front-end in it, too. That's really nice. One of the things that from talking to some Jupyter developers, one of the things they're really excited about with Pyodide is now they could potentially start to have some of their front-end stuff written in Python, as well as their back-end that's currently in Python. So...
52:41 Michael Kennedy: Right.
52:42 Michael Droettboom: Because right now the world they live in is, there's sort of these arbitrary lines that get drawn between what you would write in one language versus another. And they're not always the right thing. And sometimes you have to write the same thing in two different languages just so that you can put it in both places.
52:56 Michael Kennedy: Right validation or something, yeah.
52:58 Michael Droettboom: And getting rid of those, yeah validation or even with Jupyter it's certain kinds of computation that they need in a widget as well as in the back-end and they need to match. But one...
53:08 Michael Kennedy: That's not fun.
53:10 Michael Droettboom: Python, it's not fun. And it's just sort of this arbitrary speed bump that gets created because of the world we live in. But if you imagine a world where all languages run everywhere, suddenly, hopefully, you're doing less work.
53:23 Michael Kennedy: Yeah, yeah absolutely. And you could reuse stuff in a context where you wouldn't, like, until your work, it would've been kind of insane to say, well let's reuse NumPy in the browser, right like... But now, these doors are open so it just creates more synergy, I think. It's pretty awesome. All right well I think we're pretty much out of time there but definitely a fun conversation.
53:43 Michael Droettboom: Cool.
53:43 Michael Kennedy: I think the future is bright what do you think?
53:44 Michael Droettboom: Yeah, absolutely. I'm really excited about all this stuff.
53:47 Michael Kennedy: Yeah same. All right, now before I let you out here, let me ask you the two final questions, I kind of think I can guess this first one. 'Cause of the way you opened this whole show but favorite editor for writing Python code?
53:58 Michael Droettboom: Yeah, so I use Emacs but I actually use Spacemacs. So it's kind of this like, weird Emacs Vi-hybrid. But I find it works for me, so.
54:07 Michael Kennedy: Yeah, right on. Very cool and then notable PyPI package?
54:10 Michael Droettboom: Oh gosh, I mean, the one I'm most familiar with is Matplotlib 'cause I worked on it for years and years. And you know if you're not familiar with it, go check it out. It's the kitchen sink of plotting for Python.
54:21 Michael Kennedy: Yeah it absolutely is. And you know, something exciting and like, both silly, but also kind of real in a sense, is I saw, now xkcd-style plots have come to Matplotlib. Did you see that?
54:30 Michael Droettboom: Oh yeah. I implemented that, actually.
54:34 Michael Kennedy: You did? How awesome! Yeah that's... How hard was that?
54:37 Michael Droettboom: Eh, not too bad. Strangely the infrastructure that was already there kind of made it easier than it might have been, so.
54:45 Michael Kennedy: Yeah, it looks like the sort of cartoon-y hand-drawn, like, plots and stuff but it actually looks really hard to do because it's imprecise and it has these imperfections, right. It seems like it would be hard to tell a computer to be imprecise in like a human way, but well done. And that looks great, so fun.
55:03 Michael Droettboom: Oh, thanks.
55:03 Michael Kennedy: Yeah, absolutely. Awesome, all right, final call-to-action, people are excited about Iodide, Pyodide, they want to check it out, maybe contribute. What you got for 'em?
55:11 Michael Droettboom: Yeah. Check out iodide.io where you can check out all the notebooks that people have created. And then we have a GitHub site at github.com/iodide-project/pyodide.
55:18 Michael Kennedy: Nice and I'll put those links in the show notes. Are you looking for contributors or people working on any or is it kind of still an internal project at the moment?
55:29 Michael Droettboom: We're definitely looking for contributors. Find us on Gitter if you have any great ideas. We'd be glad to help you make them reality.
55:36 Michael Kennedy: Super cool. All right well I am very thrilled to see you all working on getting this take on Python in a browser. I think the more attempts that we have here, the better, and it's an exciting time and I think it'll take off.
55:49 Michael Droettboom: Thanks a lot! It was fun talking to you.
55:50 Michael Kennedy: Yeah, you as well. Thanks for being on the show, bye!
55:52 Michael Droettboom: All right, take care, bye.
55:54 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest on this episode was Michael Droettboom and it's been brought to you by, Microsoft. If you're a Python developer, Microsoft has you covered. From VS Code and their modern editor plug-ins to Azure Pipelines for continuous integration and server-less Python functions on Azure. Check them out at talkpython.fm/microsoft. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 apps course. Or if you're looking for something more advanced, check out our new Async course, that digs into all the different types of Async programming you can do in Python and of course if you're interested in more than one of these be sure to check out our, Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite pod catcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes. The Google Play feed at /play and the Direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.