Transcript for Episode #256:
Click to run your notebook with Binder

Recorded on Monday, Feb 24, 2020.

00:00 KENNEDY: Have you come across a GitHub repo with a Jupyter notebook that has a "Run in Binder" button? It seems magical. How does it know what dependencies and external libraries you might need, and where does it run anyway? Like all technology, it's not magic. It's the result of hard work by the people behind the project, in this case, mybinder.org. On this episode, you'll meet Tim Head, who has been working to bring Binder to us all. We'll take a look inside mybinder.org, how it works, and the history of the project.

00:27 KENNEDY: This is Talk Python To Me, Episode 256, recorded February 20th, 2020.

00:44 KENNEDY: Welcome to Talk Python To Me, a weekly

00:48 KENNEDY: podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkPython.fm and follow the show on Twitter via @TalkPython. This episode is brought to you by Linode and Talk Python Training. Be sure to check out what the offers are for both of these segments. It really helps support the show.

01:13 KENNEDY: Tim, welcome to Talk Python To Me.

01:15 HEAD: Hi, Mike.

01:16 KENNEDY: It's great to have you on the show, and I'm really looking forward to learning more about Binder.

01:19 HEAD: It's fantastic to be on the show and to talk to you about Binder: what we do, how it came to be,

01:26 HEAD: and hopefully how it will continue forever in the future.

01:29 KENNEDY: I hope so. It's definitely going to continue in some sense, no matter what. You know how GitHub is taking all of the public repositories, encoding them on tape, and putting them in some vault in some Nordic country? I can't remember exactly where, but it's probably already been archived there for the world. So it's definitely gonna continue, but I hope it continues actively as well.

01:55 HEAD: Yeah, that sounds good.

01:57 HEAD: We have a contributor who lives in Norway. So maybe we should try and organize a trip to wherever this vault is.

02:06 KENNEDY: Yes, exactly! I can't remember where it is, which country it's in. But yeah, it's up there. It's gotta be near Norway. Pretty cool.

02:14 KENNEDY: I'm definitely looking forward to talking about Binder and learning more, and a lot of the behind-the-scenes stuff. But before we get to all that, let's just start with your story. How'd you get into programming and Python?

02:23 HEAD: So when I was a teenager, I wanted to do what everybody 20 years ago wanted to do: build websites. And at the time, my dad told me, "Oh, yeah, maybe you should check out Python."

02:39 HEAD: And then they had this fantastic web server called Zope. And then I learned how to use that and make forum software and other little websites like that. And that's how I got into Python and, if you wanted to be mean, I never really learned any other programming languages.

02:59 KENNEDY: If you don't have to, you know... I think there's value in knowing other programming languages, but people who know Python are in a bit of a special place because it's so widely used and accepted. You're not forced to go learn something else, necessarily, other than JavaScript. Everybody's forced to learn JavaScript if you want to do anything on the web.

03:17 HEAD: Yeah, that's right. So, a little bit of JavaScript.

03:20 HEAD: I was a physics student at university, then worked at CERN as well, as a physicist. And there we write a lot of C++, so staring at assembly code is the other thing that I do.

03:37 KENNEDY: Are you still working at CERN?

03:39 HEAD: Oh, no, I don't. I'm not an academic anymore.

03:42 KENNEDY: You've hung up your tweed jacket and your pipe.

03:46 KENNEDY: Yeah, actually, I'm not sure if you get those before you become a professor,

03:51 KENNEDY: Maybe only when you get tenure, maybe full professor. You get the hat at the end.

03:57 HEAD: Yes. I think three or four years ago, I left academia and thought there was no better thing to do in your life than be unemployed and move to Switzerland the same day.

04:10 HEAD: And I started my own small consulting company around data science, machine learning, that kind of stuff, and today I work for a small company in Zurich that is called Scribble. So, like you scribble on a piece of paper, and we do electronic signatures. So if you need to sign documents which require legal signatures, you should come to us.

04:38 KENNEDY: Okay? Yeah. Super cool. What was it like working at CERN?

04:41 HEAD: Oh, it was one of the best places I've worked in my life. There's a reason the competition to work there is so crazy.

04:49 KENNEDY: Yeah, it seems like a really amazing place. And it's one of the true cutting-edge places where science is happening these days. And they also didn't, you know, set the atmosphere on fire or create a black hole or anything like that. So it was all good?

05:03 HEAD: Yeah. It's a great place to work, and I really, really enjoyed it, but at some point I decided it wasn't what I wanted to keep doing, so I switched to do something else.

05:16 KENNEDY: Yeah. You know, I was in academics for a while, working in math, and I really loved it. But in the end, I felt like I could just have more impact and just do more interesting concrete stuff with programming than I could with, you know, just math ideas. Yeah, I've been there with you. Now let's start this conversation at a high level before we dive down into Binder, which is a tool for working with and hosting notebooks in a special way.

05:46 KENNEDY: Hopefully, I got that roughly right,

05:49 HEAD: Yeah, that's about right.

05:50 KENNEDY: Cool. But let's start at a high level, just like, what's the state of notebooks today? Now, just a little background, if you don't know: I do more on the web side, database side, random utility side. Not as much data science, right? Although I do pay attention, it's not where I, like, live day to day. So I'm genuinely asking, what's the state of notebooks today?

06:15 HEAD: I think it's more exciting than it's ever been. Where notebooks started, and you can discuss who invented them, for a long time it was a tool for describing what you were doing to a human and to the computer simultaneously. You have the narrative text for humans and the code for the computer. And now people are going crazy. There's people who are trying to automatically turn notebooks into web applications. People are wanting to run only small parts of their notebook and make it look like it's within a normal editor, and it's like a text file kind of thing. How can we make a notebook which ...

07:06 KENNEDY: Right. Both PyCharm and VS Code have that kind of flavor, right? There's like a special comment that separates the cells. But you're kind of in a text file in one of their views, at least.

07:17 HEAD: Exactly, yeah. I don't use it very much, so I'm not sure. But I think you can then just highlight lines and say, "Run these lines in my file."

07:27 KENNEDY: Right.

07:29 HEAD: Yeah, You can imagine doing all sorts of completely crazy stuff that way.

07:33 KENNEDY: Yeah, it's pretty exciting. And there's a lot of online hosted places as well, right? Google and Azure both have ways in which you can log into your notebooks in the cloud. They may cost money, they may be free. It depends, right?

07:46 HEAD: Yeah. So there's Google Colab, Azure Notebooks. Various startups offer it as a service. There's a service called CoCalc that offers hosted notebooks. So there's a lot of people out there who do it, and mybinder.org is one of them.

08:08 KENNEDY: Okay, I guess it's probably a good time to talk about Binder and, like, where does it fit in this world?

08:14 HEAD: So with Binder, what we offer people is that you send a link to somebody else and they click on it and, if you did everything right, they can just run whatever code you wanted them to run. The example you wanted them to try out should just work. And it runs in their browser, so they can do it from a tablet or from a phone. If they want to use a laptop, they can use a laptop as well, but they don't have to install anything or do anything like this, and...

08:45 HEAD: In that sense, it's just another notebook hosting service. The twist is that you cannot create an account. There's no way for you to make an account with us. Okay? That means you can just send the link to everybody and it will just start, which I think is fantastic. So people use it a lot, for example, for workshops because everybody knows

09:11 HEAD: that tutorial or workshop they went to where they spent the first half of the workshop trying to get everybody set up right. And with a service like mybinder.org, you can send them all the link. And if the class is not ginormous, they follow the link, and then they have something that they can use for the workshop.

09:31 HEAD: Then you can spend maybe the end of the workshop teaching them how to set it up locally. But you can get going.

09:37 KENNEDY: And Binder is something that you could set up locally? Or maybe they would focus more just on, "Here's how you set up this particular project with these notebooks and these requirements"? I guess either, but more likely the latter.

09:50 HEAD: The answer is, it depends what you want to do. Imagine: how would I set up something that I can use for my course? I would create a GitHub repository. And I would put, for example, a `requirements.txt` in it.

10:04 HEAD: And for human instructions, I would write "Please run `pip install -r requirements.txt`", and this is such a common pattern that we built a tool that goes looking for these files and says, "Hi, I found a requirements.txt. I bet you, the author of this repo, wanted us to run pip install -r requirements.txt", and in an overwhelming number of cases, that is what they wanted us to do, so that's what we automate. The end goal is that we didn't invent any new way of setting up your software. We try to just spot the patterns of what lots of people are already doing. That makes it easy to automate. But it also means it should be fairly straightforward to then do it locally.
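
As a rough illustration of the pattern described here, the following Python sketch looks for a `requirements.txt` in a directory and proposes the matching `pip` command. It is not the actual Binder code; the function name `suggest_install_command` is made up for this example.

```python
from pathlib import Path
import tempfile


def suggest_install_command(repo_dir):
    """Return the install command implied by a requirements.txt, or None.

    Hypothetical helper: it only mirrors the idea of spotting a
    well-known file and running the command a human would have run.
    """
    requirements = Path(repo_dir) / "requirements.txt"
    if requirements.is_file():
        return ["pip", "install", "-r", "requirements.txt"]
    return None


if __name__ == "__main__":
    # Demonstrate on a throwaway directory standing in for a cloned repo.
    with tempfile.TemporaryDirectory() as repo:
        print(suggest_install_command(repo))
        (Path(repo) / "requirements.txt").write_text("numpy\n")
        print(suggest_install_command(repo))
```

Because the sketch only automates what the README would have told a person to do, the same command works unchanged on a local machine.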

10:58 KENNEDY: Right, because as long as you follow the same basic patterns as was intended anyway, then you're good. But you get to avoid all of the challenge of, "Oh, wait, I don't have Python," or, "When I type python, it says that print has some weird error about parentheses," or I don't know whatever it's going to say, because they have Python 2 and they didn't know that when they type python, they got the wrong Python. But they don't have permission to install the right Python, and they're just like, yeah, there's just layers of these challenges,

11:33 KENNEDY: and the problem is often that's the first thing to hit people, right? Like, in a couple of weeks they're probably fine addressing these issues and go, "Yeah, I understand what's happening. I'm gonna fix this." But that's not what you want it to be: "Welcome to Python. Here's your four problems to struggle through and set up," right?

11:50 HEAD: Exactly. And it takes an infinite or seemingly infinite amount of time to debug all these problems. If you have a group of 30 people, I guarantee there's going to be people there who have a problem you've never seen before. Now, as the person trying to run the course, you're like, "Oh my God, we're going to do some live debugging. That's not what I wanted to show."

12:11 KENNEDY: Yeah, exactly. It's the way that a lot of training sessions begin or a lot of workshops begin. It's definitely not a good impression for the students, and it's certainly not productive time. So this is really nice, right? So you can set this up. Typically, I don't know. The other workshops and trainings I did all have GitHub repos, and it's like, Okay, here's your GitHub repo for the stuff we're giving you. Here's the repo. Here's the part of the repo where it has the code that you're given. Here's the code where I'm going to write and I'll put in there later.

12:40 KENNEDY: Get started. And that's a perfect fit for Binder, right?

12:42 HEAD: Exactly, yeah. A lot of people who work on Binder come from the Project Jupyter ecosystem, so there's a strong focus on notebooks. But now there are examples where you replace the UI of notebooks with VS Code, because VS Code is also just a web app.

13:05 HEAD: Actually, if you wanted to, you could start a repo which, to the user, presents as if it's just VS Code in the cloud somewhere.

13:15 KENNEDY: Oh, no kidding. Now, I knew VS Code did that. That's something they have announced under the term Visual Studio Online, and also coder.com was doing that.

13:28 KENNEDY: But how do you guys do that? Like, I guess VSCode will let you set that up for your own environment, not just for theirs. Is that how it works?

13:37 HEAD: So we actually take advantage of the very hard work of coder.com. I'll give them a shout-out because I think they're the people who did all the work for the VS Code port or modifications, I'm not sure what you call it, so that it runs as a web app.

13:58 KENNEDY: Right? Right. Coder.com is definitely the first experience that I've seen where that was happening. Yeah,

14:05 HEAD: And they publish their hard work as open source. We allow people to install that, or trigger that being installed, in their Binder. And then you can use that as an alternative UI.

14:21 KENNEDY: So that was one of the things that I was thinking of here as we were exploring some of the use cases and how this is interesting and useful, right? It makes a lot of sense that this hosts notebooks, because Jupyter and JupyterLab are web apps anyway. So, you know, it's just a matter of where the server executes its kernel and whatnot, right? So that's pretty easy to do, relatively speaking. But if it's pure Python code, it's a different kind of challenge, right? Because then how do you edit those files? How do you interact with them? How do you get a terminal into the environment?

14:55 KENNEDY: It's a lot more multi-step, I feel like. But if you could fire up, like, a Visual Studio Code in the cloud instance, right, it's got the little terminal, that terminal runs inside the container that's hosting it, and so on. I think that's pretty awesome. That really expands the appeal of Binder, I think.

15:14 HEAD: Yeah, you get a lot of the language server extensions, the full... what is it, hints on how to complete the function?

15:29 KENNEDY: I think autocomplete?

15:30 HEAD: All this stuff. If you know how to install the plugins or enable the plugins you need,

15:37 HEAD: then you get all the good stuff.

15:41 KENNEDY: This portion of Talk Python To Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the Next Generation Network, Linode delivers the performance that you expect at a price that you don't. Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40Gb network, industry-leading processors, their revamped Cloud Manager at cloud.linode.com, root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode when creating a new Linode account and you automatically get $20 credit for your next project. Oh, and one last thing: they're hiring. Go to linode.com/careers to find out more. Let them know that we sent you.

16:41 KENNEDY: Do you set that up as the project owner? When you create your Binder setup, can you say, "This is gonna be a Python plus JavaScript thing, so we're going to make sure those extensions come along for the ride"?

16:53 HEAD: Yeah. So, as the owner of the Git repo, you specify that you would also like to have, I think it's called the _code server_ executable. You would like to have that installed as well,

17:07 HEAD: and then you form the link you share with people slightly differently so that we know to send you to this other UI. Then, if you know which plugins and so on to install, during the build phase of your Binder you add a few lines that we can execute to do all that stuff.
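
To make "form the link slightly differently" a little more concrete, here is a small Python sketch that builds a mybinder.org launch link for a GitHub repository and optionally appends a `urlpath` query parameter to open a different UI. The organization and repository names are placeholders, and the `vscode` path is an assumption: it depends on which proxy package the repository installs.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref="HEAD", urlpath=None):
    """Build a mybinder.org launch link for a GitHub repository.

    `urlpath` selects an alternative UI inside the running container;
    which values work depends on what the repository installs.
    """
    base = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(org), quote(repo), quote(ref, safe="")
    )
    if urlpath:
        return base + "?" + urlencode({"urlpath": urlpath})
    return base


if __name__ == "__main__":
    # Placeholder names, not a real repository.
    print(binder_launch_url("example-org", "example-course"))
    print(binder_launch_url("example-org", "example-course", urlpath="vscode"))
```

The first link opens the default notebook interface; the second is the same environment with a different front end.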

17:29 KENNEDY: Sure. OK, well, that sounds very flexible. I want to talk about getting the "launch binder" badge on your GitHub repo and all the details of how you do this. But since we're still near the beginning, let me just ask you real quick: how'd you get started with Binder and this whole project, anyway?

17:45 HEAD: That's a good link back to my previous life at CERN, where as a student and also as a postdoc, we write so much software to analyze our data, and it does fairly complicated stuff. So when you see somebody else doing something cool, you're like, "Oh, I wish I could do that as well." But if you ask anybody in particle physics or at CERN to share their code, and then it only takes you, let's say, a week to get it running, doing what they were doing, everybody's...

18:20 KENNEDY: Compile that from source. And then you do this. You know, you got to get this right version of the header.

18:25 HEAD: Yeah, and if it only takes a week, people are super impressed and super happy. But for me and a bunch of my friends, we were like, "No, this has to take, like, an hour, maybe." It's too long if we spend a week figuring out how to run it before we start modifying it.

18:45 HEAD: So at some point, we decided we need to do something about it or shut up complaining.

18:53 KENNEDY: Yeah, that's kind of my philosophy as well. It's fine to complain and criticize, but it should come with a "but you should do this instead" or "here's how we're making it better," right? That's great.

19:03 HEAD: So, fortunately, there's a yearly hackathon hosted at CERN

19:09 HEAD: called the Web Fest. We said, "Okay, we're gonna take three days to try and build something, and at least people have tried."

19:17 HEAD: So we started, and the idea was to build something very similar to what we now call Binder.

19:25 HEAD: What was exciting is that on the Saturday (it's a weekend event), on the Saturday morning or afternoon, I read an email from Jeremy Freeman, who is the inventor or creator of mybinder.org, announcing mybinder.org. I was like, "That sounds very much like what we're trying to do here." So from day one, it was exciting to see that other people were trying to do the same thing you are. I then took a detour, just doing some more science, and lost my connection to this.

20:00 HEAD: And I think it was three years ago, two or three years ago, at PyCon, somebody said, "We're trying to restart mybinder.org.

20:12 HEAD: We've seen you did something similar. Do you want to be involved?"

20:15 HEAD: Since then, I've not managed to escape.

20:19 KENNEDY: They pulled you back in. Wonderful

20:21 KENNEDY: Cool. Well, yeah, I mean, that totally makes sense. It sort of fits with that overall theme of reproducible science and that kind of stuff, right?

20:30 KENNEDY: Yeah. If you can't create the environment to re-run or play or experiment with the code, well, that's a pretty bad shot against reproducibility. Yes,

20:42 KENNEDY: for sure. All right, so the way that I learned about Binder is I would go to various GitHub repos, and I would see a little "launch binder" badge, right? GitHub has all these little cool... I guess it's not really GitHub, you can just put all these little badges on top of your README that appear.

21:02 KENNEDY: I guess it's a convention of GitHub. It doesn't come from GitHub.

21:05 KENNEDY: And it might say this thing requires Python 3.5 or above, or the continuous integration is working, or there's 94% code coverage. Or it could say, click here to launch this project in Binder. Now, to me, that was always really impressive because, you know, it's not enough to just take the code from the repository or the notebook from the repository and just run it, right? As you described, there's all these requirements, potentially, right? Like if I'm

21:34 KENNEDY: using SciPy,

21:35 KENNEDY: you know, maybe that's not installed or, more likely, I'm using some edge package that nobody knows about, and you need a certain version or whatever, right? There's a lot of specific requirements that have to happen. So to me, it's just kind of like magic. You can click on this button and then this thing runs. But I have no idea how it might run, because it seems like the environment where it is going to run should be so specific. So how does this happen? Give us the rundown on how that happens. There's a pretty good little UI that you can get started with over at mybinder.org that talks you through it. But let's talk about what happens.

22:13 HEAD: So we use a little tool we built called _repo2docker_, and it does exactly what the name suggests. It will take your repository and look at it and say, if there's an environment.yaml, "Okay, we need to install Python".

22:33 HEAD: Well, we need to install Conda, and then we need to run whatever the Conda command is to get all the stuff which is listed in the environment.yaml installed. And that really is 80% of what repo2docker does: it will look for these very well-known files in the Python world: requirements.txt, environment.yaml, setup.py.

22:59 KENNEDY: Does it know, like, Pipfile and pyproject.toml, all of those things?

23:04 HEAD: We have some support for both of those as well. It doesn't seem to be used as much. So I would say it's something that could do with more love.

23:15 KENNEDY: Sure. Okay.

23:16 HEAD: And then we have the same for the R community. They have a whole bunch of magic files that the community has agreed on using as a format for specifying things. And the Julia community.

23:28 KENNEDY: Because the notebooks will support a bunch of different kernels these days and a bunch of different languages and runtimes. And so I guess one of the questions is, when I create a Python project, I don't typically tell it "you're a Python project". I mean, I guess the closest I ever get to that is I will have the .gitignore be the Python default .gitignore. But other than that, when I create a new project, I don't tell it "it's Python". I just put Python files in there, but I might also put JavaScript files or Less files or other things that could confuse you guys. So how do you, when you look at one of these projects, know, "Oh, this one's Python, that one's Julia"?

23:28 HEAD: We don't try and guess that so much as we have a list of files that we recognize. So, like,

23:28 HEAD: yeah, requirements.txt. And when we see that, we add, like, the command to install Conda,

23:28 HEAD: add the commands to install these pip packages to a Dockerfile, and we build a Docker container for you.

23:28 KENNEDY: I see. So, for example, if you see, like, a requirements.txt you're like "aha, Python!"

23:28 HEAD: Exactly. We don't try and look at the files in the repo in general and try and deduce, "Okay, there's 80% Python files and 20% JavaScript. Let's install Node and Python 3.7."
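
The "list of files we recognize" idea can be sketched as a lookup table from file name to Dockerfile lines. The Python below is an illustration only, not how repo2docker is implemented; the base image name and the paths inside the container are hypothetical.

```python
# Map well-known file names to the Dockerfile lines they imply.
# The table is deliberately tiny; a real tool knows many more files.
RECOGNIZED_FILES = {
    "environment.yaml": ["RUN conda env update -n base -f environment.yaml"],
    "requirements.txt": ["RUN pip install -r requirements.txt"],
}


def render_dockerfile(filenames, base_image="example/base-image:latest"):
    """Render Dockerfile text from the set of file names found in a repo.

    Only recognized file names matter; the mix of languages in the
    repository is never inspected. `base_image` is a placeholder.
    """
    lines = [
        "FROM " + base_image,
        "COPY . /home/user/repo",
        "WORKDIR /home/user/repo",
    ]
    for name, commands in RECOGNIZED_FILES.items():
        if name in filenames:
            lines.extend(commands)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile({"requirements.txt", "app.js", "analysis.ipynb"}))
```

Note that `app.js` has no effect on the result: only the presence of `requirements.txt` triggers an install step.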

23:28 KENNEDY: Yeah, it always makes me frustrated or something when I see the GitHub estimation of what the project is. You know, you can click down the languages and it'll show, like, 70% this, 20% that. Because sometimes there's just some huge file, right? Like, maybe I have one notebook, but there's a ton of output in that notebook, and it's otherwise pure Python, straight .py files, not .ipynb files. And it'll say this is like 80% Jupyter. You know,

23:28 KENNEDY: it's more like output from Jupyter that you're counting. Or I'll decide, you know, I want to bundle up some Node packages and put them in the repo so I don't ever have to worry about them vanishing or changing; I just put them there and they're fine. And then it becomes a JavaScript project because, you know, it's got more JavaScript than it does Python. But it's not really, it's just the libraries, right? So I can see that that would be very fraught with error.

23:28 HEAD: Yeah, so instead, the slogan we have is: we reward community best practices by automating them. We try and encourage people to just be boring and do the mainstream thing. And in exchange, we can automatically build a Docker container, well, Docker image, for them, which does what they want to do.

23:28 KENNEDY: Right. So the way you solve the challenge I started outlining at the beginning, which is, like, it seems like magic: why does this thing that seems like it has a bunch of dependencies and is very specific just run when I click it? It's because you've looked at the repository, you've followed these best practices, you've sort of guessed at the rest and found what you're supposed to do, and then you defined a Dockerfile and built an image that you then run as a container. When you click the button, that sets up that environment, right?

23:28 HEAD: Yeah. And then, as a user, all you need to bring is something like a web browser. And then at the very end, we connect you to it, and off you go.

23:28 KENNEDY: And now you have an environment, so that's pretty interesting. Where does that run?

23:28 HEAD: Well, we're lucky to have four clusters around the world now, which is fantastic. We rely on people donating resources to run all these containers for people. We have one cluster on Google Kubernetes Engine. We have one at OVH, which is a European cloud host. We have one sponsored by the Turing Institute in London, and we have one bare-metal Kubernetes cluster at a social science research institute in Cologne, Germany, called GESIS.

23:28 HEAD: So they are really at the front. You know, none of this cloud hosting business: "We run our own Kubernetes cluster."

23:28 KENNEDY: You go in there, you pull the cover off of the 1U server, and you could just see Binder, right?

23:28 HEAD: Yeah. Yeah,

23:28 KENNEDY: That's really cool. Are you looking for more, or is that enough? Is it okay?

23:28 HEAD: So our master plan for how to make this sustainable is to encourage more and more research institutes or cloud providers to host a small chunk of mybinder.org. Because that way, each individual contributor doesn't have to pay a cloud bill which is hundreds of thousands of dollars, but only

23:28 HEAD: maybe $20,000. And if you're

23:28 HEAD: a big research university like MIT or University of Oxford or someone like that, spending $20,000 on something which benefits you, because you can use it at your university for your researchers, but also benefits the whole world, is still a difficult sell. But it's an easier sell than if we come to you and say, "Do you have $200,000 in any currency?" Yeah, people will just laugh at you and show you the door. So that's the plan. So the short answer to your question, "Are we looking for more?", is yes.

23:28 HEAD: It would be very interesting to add more clusters to this. It should be easy to do technically now that we've gone from 1 to 4,

23:28 HEAD: all right, you see

23:28 KENNEDY: pretty much after you go from 1 to 2, that's the hardest step

23:28 KENNEDY: right? And then onward it's not naturally easy, but going from a very bespoke setup to, like, "now we have multiple environments," well, that's a pretty big step, I would say.

23:28 HEAD: Yeah, that's true. It comes with its own challenges, because now you have four times as many things that can go wrong or idiosyncrasies to deal with. But it's great to see that we can run on four such different Kubernetes flavors.

23:28 HEAD: I think it's great because it keeps us honest when we say we are not building something which is tied to any one cloud provider and their secret sauce. So you can really take it and run it at home.

23:28 KENNEDY: Right. Yeah, that's super cool. So basically, if Binder has admin access to a Kubernetes cluster, it can create containers and pods and then spin them up and spin them down. And it is basically just managing a Kubernetes cluster, wherever that happens to run.

23:28 HEAD: Okay, that's what we do.

23:28 KENNEDY: Yeah,

23:28 KENNEDY: That sounds very exciting. Actually, it's cool. Now, one of the things that's worth mentioning is, if I have a GitHub repository and I go through this minor effort described at mybinder.org to set this up and register it and whatnot, and get my little badge so people can run my code there, that's all well and good. But what happens as I make changes to my GitHub repository? Like, "Oh, I need a new version, a new library I'm going to add," or, "We're gonna update this code, but it's gonna require some underlying fundamental change to the environment."

23:28 HEAD: Every time you make a commit to your repository,

23:28 HEAD: when you then launch, we check in our cache: do we already have an image for this commit of this repo? And if the answer is no, we will rebuild it or build it again.

23:28 KENNEDY: Right. You probably tag it by the git commit hash, the SHA, something like that, right?

23:28 HEAD: Yeah, exactly. That's nice, because any time you change your repository and add a new dependency, we will just automatically rebuild it next time.
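
The cache check described here can be pictured as a lookup keyed by repository and commit. The Python sketch below is a simplified stand-in: the tag format, the in-memory dictionary, and the `build` callable are all assumptions made for illustration, not the real registry logic.

```python
import re


def image_tag(repo_url, commit_sha):
    """Derive an image tag from a repository URL and a commit SHA."""
    safe_name = re.sub(r"[^a-z0-9]+", "-", repo_url.lower()).strip("-")
    return safe_name + ":" + commit_sha


class ImageCache:
    """In-memory stand-in for a container registry lookup."""

    def __init__(self, build):
        self._build = build  # callable(repo_url, commit_sha) -> image
        self._images = {}

    def get_or_build(self, repo_url, commit_sha):
        tag = image_tag(repo_url, commit_sha)
        if tag not in self._images:
            # New commit (or first launch): build once, then reuse.
            self._images[tag] = self._build(repo_url, commit_sha)
        return self._images[tag]


if __name__ == "__main__":
    builds = []

    def fake_build(repo_url, sha):
        builds.append(sha)
        return "image built from " + sha

    cache = ImageCache(fake_build)
    repo = "https://github.com/example-org/example-course"  # placeholder
    cache.get_or_build(repo, "abc123")
    cache.get_or_build(repo, "abc123")  # served from the cache
    cache.get_or_build(repo, "def456")  # new commit triggers a build
    print(builds)
```

Launching the same commit twice builds only once, while a new commit produces a new tag and therefore a new build.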

23:28 HEAD: It has the disadvantage that it can take quite a long time to build your repository, depending on what crazy stuff you're trying to do,

23:28 HEAD: how many dependencies you need to install and compile from scratch. It can take a long time to build the Docker image for your repository.

23:28 KENNEDY: Right.

23:28 KENNEDY: Do you do anything like... so with Docker, you have, like, layers of Dockerfiles and dependencies, and it'll cache all of them and only build the changes. Do you do anything like, "Here is the layer that is the dependencies of this project, and here is the layer that actually has the code"? Because the dependencies probably change infrequently, whereas the code is probably, you know, 10 or 100 times more likely to change, right? And the slow part is not getting the code. The slow part's getting the pip install -r on Ubuntu or whatever.

23:28 HEAD: Yeah, so we try and be clever about how we order the layers in the Dockerfile.

23:28 KENNEDY: Right, 'cause that's the trick.

23:28 KENNEDY: Do the minimum work when the change happens. Yeah.

23:28 HEAD: Yeah. What makes it tricky is that a lot of package managers allow you to do arbitrary stuff. We have a few, hopefully enough, pieces of code that inspect your requirements.txt to try and figure out: are you referring to stuff in the rest of the repo, yes or no?

23:28 HEAD: If you are, then we need to update the whole rest of the repo before we run your file. If no, then we can cache it. So, yeah, we try and be clever on that front.
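
One way to picture the "does requirements.txt refer to the rest of the repo?" check is a small heuristic like the Python sketch below. It is an assumption-laden illustration, not the actual inspection code: it treats relative or absolute paths, `file:` URLs, and `-r`/`-c` includes as local references, and everything else as safe to install in an earlier, cacheable layer.

```python
def references_local_files(requirements_text):
    """Heuristically decide whether requirements.txt needs other repo files."""
    for raw_line in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Includes of other requirement or constraint files live in the repo.
        if line.startswith(("-r ", "--requirement", "-c ", "--constraint")):
            return True
        # Strip the editable flag so "-e ." is treated like ".".
        for flag in ("--editable ", "-e "):
            if line.startswith(flag):
                line = line[len(flag):].strip()
        # Paths and file: URLs point at content inside the repository.
        if line.startswith((".", "/", "file:")):
            return True
    return False


if __name__ == "__main__":
    print(references_local_files("numpy==1.18\npandas\n"))
    print(references_local_files("numpy\n-e .\n"))
```

If the function returns `False`, the dependency install step can run before the repository contents are copied in, so a code-only commit reuses the cached dependency layer.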

23:28 KENNEDY: Right, because your biggest expense... at least, I think there's actually a Jupyter notebook that talks about this, which we could get into, which is fun. But your biggest expense is compute, right? Yes, as opposed to storage or bandwidth.

23:28 HEAD: Yeah, we do have a fairly large Docker container registry.

23:28 HEAD: And the funny story about it is that for a very long time, we paid no attention whatsoever to it. And then one day, in this notebook that you were referring to, which shows you how much money we spend on which

23:28 HEAD: line item in our cloud bill, we noticed a new one, like, "What is this? Why is it getting bigger constantly?"

23:28 HEAD: And it turns out that there was,

23:28 HEAD: I think, tens of terabytes of images in our container registry and that is the level at which it starts being noticeable on our charts.

23:28 KENNEDY: Right, right, right. That's like $20,000 or $10,000, something like that. But it's a nontrivial number.

23:28 HEAD: So I actually haven't looked at it

23:28 KENNEDY: $1000, I think. Yeah.

23:28 HEAD: in quite a while. Before, when we only had two clusters, so that's probably at least half a year ago or so, we would say it would cost between $80,000 and $100,000 in Google Cloud costs to run mybinder.org. The vast majority of that is paying for the virtual machines.

23:28 KENNEDY: Right, right, right.

23:28 KENNEDY: If you're a regular listener of the podcast, you surely heard about Talk Python online courses. But have you had a chance to try them out?

23:28 KENNEDY: No matter the level you're looking for, we have a course for you. Our _Python for Absolute Beginners_ is like an introduction to Python plus that first-year computer science course that you never took. Our data-driven web app courses build a full PyPI.org clone along with you, right on the screen, and we even have a few courses to dip your toe in with.

23:28 KENNEDY: See what we have to offer at training.talkPython.fm or just click the link in your podcast player.

23:28 KENNEDY: I guess that's worth talking about as well. So there's a notebook, which I'll put the link to in the show notes. Basically, it's showing you in real time, or semi real time, you can adjust it, it's a notebook, the costs over the day or over the week, and it's a nontrivial number, right? Like, I'm on the weekly costs and it's showing, on average, maybe $1000 a week, plus a little bit more. You know, I know open source is doing well these days, but that's a lot of money, $1000 a week for compute. So where does that money come from? You sort of hinted at it before, but where does it come from?

23:28 HEAD: So the particular notebook there only tracks the costs we incur from Google, or Google Cloud.

23:28 KENNEDY: Which is probably not core but one of the four.

23:28 HEAD: Yeah, it's probably the biggest of the four.

23:28 HEAD: I'll talk about how that bill gets paid. At the very beginning, we were fortunate enough to have a grant from the Moore Foundation, which included a chunk of money to pay for the bill.

23:28 HEAD: And that was several years ago. Now we have friends, or benefactors, at Google, who so far have been able to justify why Google should give us credits high enough to pay the bill. So that's fantastic to see.

23:28 KENNEDY: So in some sense Google is donating that compute to the project. Yes, yeah, yeah. Have you considered reaching out to the other big groups, like Azure and AWS, to see if they want to be part of that party too, and get their name on it for a little halo effect?

23:28 HEAD: Yeah, so that's how we got the cluster at OVH, the European cloud host. They donate the compute resources for that cluster. And the Turing Institute's cluster runs on Azure cloud. I don't know exactly how the invoicing works there, but I would imagine that in some way it's sponsored or supported by Azure. Or it's the fact that the Turing Institute has enough spare money to finance, well, donate resources for such an altruistic project.

23:28 KENNEDY: Yeah, okay, it's really cool because it really shows you that, you know, this is not free. I mean, at the core, right, those images have to get built somewhere. The Docker containers and Kubernetes pods have to run somewhere, right? And it's probably a small percentage of the people that touch the button, but it's still got to be a nontrivial amount of real computational stuff happening, right?

23:28 KENNEDY: It's scientific, in large part, I would imagine.

23:28 HEAD: So, luckily for us, most people who start a notebook spend a lot more time thinking and reading than running code. Yeah, so we can cheat and overcommit

23:28 HEAD: our CPUs by quite a large factor.
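
As a rough illustration of the arithmetic behind overcommitting, the sketch below uses made-up numbers: an 8-core machine, an overcommit factor of 10, and sessions that are busy 5% of the time. None of these figures come from the conversation, and the function names are hypothetical.

```python
def sessions_per_node(cores, overcommit_factor):
    """Number of sessions scheduled on one machine when CPUs are overcommitted."""
    return cores * overcommit_factor


def expected_busy_cores(sessions, busy_percent):
    """Average number of cores in use if each session computes busy_percent of the time."""
    return sessions * busy_percent / 100


# Hypothetical numbers, chosen only to show the idea.
sessions = sessions_per_node(cores=8, overcommit_factor=10)
load = expected_busy_cores(sessions, busy_percent=5)
print(f"{sessions} sessions share 8 cores; about {load} cores are busy on average")
```

Under these assumed numbers the machine hosts ten times as many sessions as it has cores, yet on average only about half of the cores are doing work, which is why mostly idle notebooks make overcommitting practical.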

23:28 KENNEDY: I would think. Yeah,

23:28 HEAD: but yeah, you know, they're not sitting there idling.

23:28 KENNEDY: No, I'm sure. Also, this doesn't help for the Python file and VS Code angle, but it might help for the notebooks: a lot of those notebooks come prepopulated with the output from the last time they were run, right? You don't have to run them to read them initially. Is that true for your setup, or is that just true for GitHub?

23:28 HEAD: No. So if the notebook in your repository still has that output in it from the last time it was run, then we will show it to you, because we use the standard JupyterLab or Jupyter Notebook UI to show it to you. And so that way you get something to read before you ever try and run code.

23:28 KENNEDY: Yeah. So maybe people actually load it up and never actually run it, or they might not run the expensive bits or something. So that's got to help as well.

23:28 HEAD: Yeah. Although then it's a very expensive way to view your notebooks.

23:28 HEAD: It would be good if you just want to read it. And

23:28 KENNEDY: Just use the GitHub viewer.

23:28 HEAD: Yeah. If the GitHub viewer works well, then that's good, because that's much cheaper to run than starting up a Docker container.

23:28 KENNEDY: And, you know, I guess there are definitely benefits that you have in interactivity as well, right? It is kind of amazing that on GitHub you can look at the output of a notebook and it looks like it ran, but it's just the cached version, or saved version, of whatever ran before. But, you know, a lot of those have little widgets, with ipywidgets or whatever, sliders you can play with and adjust. In order to do those types of explorations, even if you're not writing code but you're kind of exploring parameters, you've got to have a real live system to do that, right?

23:28 HEAD: Yeah. Yeah. You need some kind of kernel connected to run the computations that are needed when you slide the slider to 11.

23:28 KENNEDY: I think it's awesome that all these people are donating the compute to the environment and making this available to the entire world. But there might be situations where people want this type of system, but for some reason they can't put their data publicly out there, or they don't want a link to it, or they just want to keep it more controlled. Is there a way to take what you guys have built and, say, create a minemineallminebinder.org, you know, like an internal version of it that's not just out there on the public cloud?

23:28 HEAD: Yes. So the software that is behind mybinder.org is called BinderHub. Like JupyterHub, but BinderHub. It's open source, like all the other Jupyter projects. And you can deploy it yourself on your own compute with your own credit card, and then you can access private repositories. You can limit access to your Binder instance, so you have to log in, instead of right now, where it is completely open to the public.

23:28 KENNEDY: Sure, yeah. So basically, with those links, you can just visit it; that's the point. You already emphasized that, right, that you're not even supposed to have an account, which is kind of the opposite if I need to keep this private.

23:28 HEAD: Exactly. And also, because we take compute which has been donated to us by people, we say: if you have private repositories, you probably also work for somebody who could pick up the bill themselves. So it's a political decision that we've disabled these features for mybinder.org, but you can take the same software and run it yourself, and then access private repos.

23:28 KENNEDY: So basically, there's no technical limitation, but right now, in order to run one of your repositories on mybinder.org, it has to be a public repository.

23:28 HEAD: That's right.

23:28 KENNEDY: Because that's the Zen that you guys are going after. That's the overriding philosophy of what you're doing, right? Yeah. Okay, yeah, that's totally reasonable. Now, I guess this repo2docker is also pretty interesting, potentially, in its own right. Like, it's cool that you guys are using it to capture these environments,

23:28 KENNEDY: so you can run them on mybinder.org. But is that something people would find useful outside of that use case?

23:28 HEAD: Yes and no. There's nothing stopping you. You can run repo2docker locally on your laptop and it will perform its little magic trick just as well as it does online. We encourage people to do that, for example, if they're trying to debug why their Binder isn't building or why it isn't doing quite what they wanted it to do.
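
For readers who want to try a local run, the sketch below only builds the command line and prints it. It assumes repo2docker was installed with `pip install jupyter-repo2docker` and that Docker is running locally. The repository URL is a placeholder and the helper name `repo2docker_command` is hypothetical.

```python
import shlex


def repo2docker_command(repo_url, ref=None):
    """Build the argument list for a local repo2docker run (does not execute it)."""
    command = ["jupyter-repo2docker"]
    if ref is not None:
        # --ref selects the branch, tag, or commit to build
        command += ["--ref", ref]
    command.append(repo_url)
    return command


# Placeholder URL: substitute the repository you are debugging.
command = repo2docker_command("https://github.com/example-user/example-repo")
print(shlex.join(command))
```

Pasting the printed command into a terminal, or passing the list to `subprocess.run`, starts the build and streams the full build log to the console, which is the part that helps with debugging.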

23:28 KENNEDY: Oh, interesting. So if things are not working right, like this container won't build, or it's not finding the files, or whatever it is, right? The volumes aren't mapped right, the ports aren't mapped right. You can play around with repo2docker locally and get a better shot at figuring that out, huh?

23:28 HEAD: Exactly. Because you can see a lot more of the logs, and the turnaround is potentially much, much faster because you have a more powerful machine.

23:28 HEAD: You're just in more direct contact with it than if you have to control it through some web form with logs automatically scrolling past. And so...

23:28 KENNEDY: I'm gonna go do another commit to the repo to force that GitHub Web hook to go off and, like, make it happen again or whatever.

23:28 HEAD: Yeah, exactly. So you can use it locally. The disadvantage is that you probably don't have as big a library of cached Docker image layers on your laptop that repo2docker uses. Or potentially you have lots of them already, because Docker is very good at filling up your hard drive.

23:28 KENNEDY: Right. It won't clean up any of those cached ones unless you run, like, a Docker cleanup. I forget the exact command. But yeah, it's really easy to fill up your computer with Docker images that are stale cached results, right?

23:28 HEAD: Exactly, so you can run it locally. I do run it locally once in a while, just to get the environment that the author of the repository had in mind. However, what I'm finding is that if the repository builds on mybinder.org with repo2docker, then, because repo2docker is in some ways very simple-minded about what it recognizes, the instructions to do it by hand are also fairly straightforward. So if you're comfortable using Conda,

23:28 HEAD: then it's quicker to make another Conda environment and just do what the author wanted you to do. And because you can check whether it actually works by clicking on the badge in the README and seeing it run on mybinder.org, you're like, yeah, I'm pretty sure the instructions are complete. I'm not going to spend two hours and then find out that I forgot something.

23:28 KENNEDY: Sure, sure, sure. That's pretty cool. I guess maybe if it depended on Linux and you are on Mac or something like that, that might be a quick way to jump-start a Docker container without knowing too much about Docker. But yeah, if it's that kind of standard environment, then there's probably not a huge benefit.

23:28 HEAD: Yeah, or if you're on a different flavor of Linux or whatever, then yes, automating the build of the Docker container and having it run locally is great, and is something people do. But it's, I would say, just not as convenient as clicking the link. So that's why not as many people are doing it.

23:28 KENNEDY: Yeah, for sure, it's easy. Can I get a public repository into mybinder.org that didn't decide to be in there? What I mean is, I've got to go to mybinder.org and put in the repo and stuff, in general, to set it up.

23:28 KENNEDY: Could I just go, this one doesn't have a link, but I'm just gonna give it to mybinder.org anyway and see what happens?

23:28 HEAD: Yes. You don't have to be an author of the repo or have any special rights on the repo to launch it.

23:28 KENNEDY: Right. So with no account, and since it's required to be public, it'd be pretty hard to stop someone from filling out the form at mybinder.org with the info they want, right?

23:28 HEAD: Exactly. And a lot of projects will just work.

23:28 KENNEDY: Yeah, because of the practices you follow, right?

23:28 HEAD: Exactly. Yeah. So then you can open a pull request and add the badge to the README.

23:28 KENNEDY: That's exactly what I was thinking. Yes. So if you find a repo that should have a _launch in Binder_ badge and yet it doesn't, a real nice way to make a simple contribution would be to go fill out the form at mybinder.org and, if it works, open a PR with the details required. I mean, you could just edit the README file

23:28 KENNEDY: on a branch, put the little icon there, and submit that as the PR, like, “Here, I was missing this. I fixed it.”
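
A small sketch of what that README line might look like, generated in Python. The URL layout (`/v2/gh/<owner>/<repo>/<ref>`) and the badge image address follow what the mybinder.org form produces for GitHub repositories as far as I know, so treat them as assumptions and compare with what the form gives you. The owner and repository names are placeholders, and `binder_badge` is a hypothetical helper.

```python
def binder_badge(owner, repo, ref="HEAD"):
    """Return a Markdown badge that launches a GitHub repo on mybinder.org."""
    launch_url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
    badge_image = "https://mybinder.org/badge_logo.svg"
    return f"[![Binder]({badge_image})]({launch_url})"


# Placeholder owner and repository names.
print(binder_badge("example-user", "example-repo"))
```

The printed line can be pasted near the top of a Markdown README. Opening the launch URL in a browser first is a quick way to confirm the repository actually builds before sending the pull request.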

23:28 HEAD: Yep, that's a super contribution for the project. It's a good way to get started. And it will help whoever comes next to the project, because they'll see the badge. And even if they don't yet know what mybinder.org is, they might click on it, and then they can run the examples.

23:28 KENNEDY: Yeah, it seems like a really nice, generous thing to do, and super easy. Cool. So we're getting near the end of our time, and I was gonna ask you just a really quick, high-level question, looking back a bit on our conversation and your experience. Over time, people recently have been asking me, why do you think Python has become so popular in the data science and scientific computing space over the last five years or so? You're someone who worked at CERN and has probably been more involved in that than I have in a lot of ways. What are your thoughts? Why Python? Why did it become so much more popular recently?

23:28 HEAD: I think the short answer is, I don't know, because since I was 15 I've thought Python is a pretty good language to program in,

23:28 KENNEDY: and finally everyone agrees with me.

23:28 HEAD: Exactly. So I don't really know what happened. I think part of why Python, and not some other language, is

23:28 HEAD: There's a quote somebody else told me: you invite people to a job interview and make them write something or do some coding on the whiteboard or something like that, and even if they are, I don't know, C++ or Julia or whatever programmers, often they will write something which looks very much like Python on the whiteboard. They will not put in all the curly brackets. Or if you read Wikipedia articles about algorithms, often there will be a little section in pseudocode, and you can almost copy and paste that and it will run, right? So I think that's a killer feature. In some sense, it makes it just so much more accessible to people who don't think of themselves as programmers. They are working in an insurance company and they want to do some data science, but they don't think of themselves as programmers; they are actuaries.

23:28 KENNEDY: Right. I think that agrees a lot with my general working theory, which is that Python is one of these special languages where it's as simple as possible to get started with, like, a tiny bit of computation. You don't even have to have a function. You know, forget compiling, linking, headers, static void main. It's just, like, three lines: I want to use those libraries, go do this.

23:28 KENNEDY: And so people easily get into it in the beginning. But then, unlike a lot of easy languages, you don't really outgrow it, right? It's not like, well, you can't use Python anymore because now we're doing astronomy or whatever. No, you just keep using the libraries and you can keep adding more computer-science-like concepts: functions, and then classes, then generators, then whatever. I feel like it sucks in people who don't believe, just like you said, that they're programmers. They believe they're biologists, astronomers, physicists, whatever. But then it's kind of got this: well, I already know this, it's totally working, why would I leave and pick up a harder language? There's just kind of this gravity. Once you get sucked into it, there's no reason to get kicked out of it, because you can just keep growing with the language and the libraries. That's my theory as well, sort of. Yeah.

23:28 HEAD: Yeah, I think so, too. And the fact that we can put a nice

23:28 HEAD: user interface, in a programming language that makes sense, onto big, trusty FORTRAN and C++ libraries that have been around to do really heavy lifting, linear algebra and whatnot, I think that is fantastic. If you imagine Python didn't have that feature, then we would be stuck. We'd have to write all this stuff from scratch.

23:28 HEAD: Like this, we can put a nice UI on FORTRAN code and get the benefits of having well-tested FORTRAN code do crazy linear algebra for us.

23:28 KENNEDY: That's perfect. But not think about it ever again.

23:28 KENNEDY: Nice. All right, well, yeah, that was definitely a good, insightful answer. I like it. All right, now, before you get out of here, before we call it a show, you've got to answer the final two questions. So if you're gonna write some Python code, what editor do you use?

23:28 HEAD: Today, I use Atom.

23:28 KENNEDY: Okay, cool. Atom is neat.

23:28 HEAD: My favorite editor probably is Emacs in a console. But at some point, I had to admit that I didn't want to spend as much time as I used to trying to understand how to get JavaScript tools to hook into my editor, and auto-formatting and stuff. So, yes, I use Atom.

23:28 KENNEDY: Nice.

23:28 KENNEDY: Very cool. And then, a notable PyPI package? Maybe not something necessarily super popular, but something like, oh, I saw this and it was so cool, you should check it out.

23:28 HEAD: Let me just think...

23:28 KENNEDY: While you're thinking, let me tell you about one thing I ran across. It's not exactly that, but it is actually super cool, and I think it's relevant to the audience who would be interested in this topic as well. So have you heard about Carnets? C-A-R-N-E-T-S, Carnets.

23:28 HEAD: No.

23:28 KENNEDY: So this is a standalone Jupyter notebook environment that runs on iOS. Not like the browser sees it, but it actually has NumPy and SciPy and all that stuff installed, and Python executing, disconnected, on iOS.

23:28 KENNEDY: So we could put that one out there for sure. And it's open source. It's cool. People can go and check it out.

23:28 HEAD: It does uncertainties. So, yeah, it's actually just called uncertainties.

23:28 KENNEDY: Nice. That's a good name.

23:28 HEAD: Yeah. So it lets you put in numbers with uncertainties on them, and then you can do complicated computations on them and it will spit out `3 ± 1.7`.

23:28 KENNEDY: Right, right. Because if you have plus or minus one, but then you square it, and then you add this other uncertainty to it, right? Like,

23:28 KENNEDY: the propagation of uncertainty is not obvious. It's uncertain, even.
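
To make the square-then-add example concrete, here is a hand-rolled sketch of first-order uncertainty propagation using only the standard library. It is not how the uncertainties package is implemented; it assumes the two inputs are independent, and the numbers (3 ± 1 and 2 ± 8) and function names are invented for illustration.

```python
import math


def square(value, sigma):
    """Propagate an uncertainty through f(x) = x**2 using |df/dx| * sigma."""
    return value ** 2, abs(2 * value) * sigma


def add(a, sigma_a, b, sigma_b):
    """Add two independent quantities; their uncertainties combine in quadrature."""
    return a + b, math.hypot(sigma_a, sigma_b)


# Hypothetical measurements: x = 3 +/- 1 and y = 2 +/- 8.
x_squared, sigma_x_squared = square(3.0, 1.0)   # 9 +/- 6
total, sigma_total = add(x_squared, sigma_x_squared, 2.0, 8.0)
print(f"{total} ± {sigma_total}")
```

Writing every derivative by hand like this is where the mistakes creep in once a calculation grows beyond a couple of steps, which is the bookkeeping a library such as uncertainties takes over.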

23:28 HEAD: Yeah, it is one of these things where it's not difficult, but it's hard to do right. You make mistakes. So you

23:28 KENNEDY: have uncertainties.

23:28 KENNEDY: Okay, uncertainties. Nice. Another one that's kind of in that realm is pint,

23:28 KENNEDY: which lets you do different units and multiplications and divides and whatnot. It's pretty cool.

23:28 KENNEDY: Okay, very cool. That's a good recommendation indeed. All right, Tim. Well, we're about out of time, so maybe a final call to action. People are excited about mybinder.org and this whole concept that we talked about. What do they do?

23:28 HEAD: Try it out and tell a friend about it. That is, I think, an easy thing to do, and you can get it done because it's easy.

23:28 KENNEDY: Yeah, yeah, it looks like you've automated a lot, and it looks really not hard at all to get started with.

23:28 HEAD: And of course, we would be super happy if people find mistakes or want to add new features, if they stop by and make code contributions, make contributions to the documentation, or help us deal with the fact that there are so many people who use it who have questions. You know, you can contribute to mybinder.org even, or especially, if you can't program. That's actually an asset, because there are lots of people who know how to program who run mybinder.org.

23:28 HEAD: But explaining how it works, advertising it, giving talks about it, all that kind of good stuff, adding _launch on Binder_ to various repositories that should have it.

23:28 KENNEDY: Good examples. Yeah, those are pretty straightforward things to do. Awesome. All right, well, it was great to learn more about mybinder.org and what you guys are up to. Thanks for being on the show.

23:28 HEAD: Thanks for taking the time to talk to me.

23:28 KENNEDY: Yeah, you bet. It was great. Bye.

23:28 KENNEDY: This has been another episode of Talk Python to Me. Our guest on this episode was Tim Head, and it has been brought to you by Linode and Talk Python’s online courses.

23:28 KENNEDY: Start your next Python project on Linode’s state-of-the-art cloud service. Just visit TalkPython.fm/linode, L-I-N-O-D-E. You automatically get a $20 credit when you create a new account.

23:28 KENNEDY: Want to level up your Python? If you're just getting started, try my _Python Jumpstart by Building 10 Apps_ course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And, of course, if you're interested in more than one of these, be sure to check out our everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at `/iTunes`, the Google Play feed at `/play`, and the direct RSS feed at `/rss` on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
