00:00 One of the fastest growing areas in Python is scientific computing. In scientific computing with Python, there are a few key packages that make it special. These include NumPy, SciPy, and related packages. The one that brings it all together, visually, is IPython (now known as Project Jupyter). That's the topic of episode #44 of Talk Python To Me.
00:00 You'll learn about "the big split", the plans for the recent $6 million in funding, Jupyter at CERN and the Large Hadron Collider and more, with Min RK & Matthias Bussonnier.
00:00 [music]
00:00 Welcome to Talk Python to Me. A weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy, follow me on twitter where I'm @mkennedy.
00:00 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on twitter via @talkpython.
00:00 This episode is brought to you by Hired and Snap CI. Thank them for supporting the show on twitter via @Hired_HQ and @snap_ci.
01:23 Hi folks. No news this week. I do have a big announcement coming and I am really looking forward to sharing it with you all, but I am not quite ready to talk about it yet, so stay tuned. For now, let's get right to the interview with the Project Jupyter core devs, Min RK and Matthias Bussonnier.
01:39 Matthias, Min, welcome to the show.
01:40 Thanks.
01:43 Thanks for having us here.
01:44 Michael: Yeah, I am really super excited to talk about Python intersected with science, and this thing called IPython or what's become Project Jupyter. So, that's going to be really great and before we get to that though, let's just talk about how you got involved, how did you get into programming, how did you get involved with IPython and all that stuff; what's your background? Min, do you want to go first?
02:04 Min: Sure. So I was an undergrad in physics at Santa Clara University working with Brian Granger, one of the founders of the IPython project. I was interested in computing and simulation and things, and I ended up on the interactive parallel computing part of IPython as my undergrad thesis, and started doing online numerical simulation homework stuff in Python even though the classes were taught in MATLAB and Octave and things. And enjoying the scientific Python ecosystem of NumPy and Matplotlib and things - that's kind of how I came to the project and scientific Python in general.
02:44 Michael: Yeah, that's really cool, and was IPython already a thing when you got started?
02:48 Min: Yeah, Fernando Perez created IPython in 2001, and I was doing my undergrad a few years after that, and so I joined the project after it had been around for about 5 years, in 2006. And I have been working on it for the past 10 years, I guess.
03:02 Michael: Yeah, about ten years, how time flies. Matthias, how about you?
03:06 Matthias: I've been on the project for less time than Min. I actually started programming a long time ago and came across one of the huge refactorings of IPython that Min did. I think it was finished in December 2011, just after the Qt console was released, and the project was then much more friendly and in good shape for beginner Python programmers. At the time I was beginning my PhD in biophysics in Paris and I started contributing to the project; it was my first big contribution to an open source project. I started to spend my nights and weekends during my PhD improving IPython, which was really helping me a lot for my PhD, and I quickly became a core contributor and I have stayed in the team since then.
04:01 Michael: Maybe a good place to start talking about this whole project is the history. Originally, this project was called IPython and IPython Notebooks, right?
04:13 Min: Yeah, IPython was around for a good ten years before we got a version of the notebook out, although we had been working on various versions of notebooks for about five years, and most attempts kind of didn't go anywhere.
04:27 Michael: So what did it look like, what did the product look like, in the early days?
04:32 Min: Yes, so initially, Fernando created IPython as just a better interactive shell for Python, giving you some better tab completions, nice colorful tracebacks, things like that. Also, Python is a nice verbose language, but when you are doing interactive stuff some of the bash shell syntax is nicer to type, when you are doing ls and cd and everything. So one of the things that Fernando did early on was add this notion of magics for extending the Python language to give convenient commands for interactively typed things, like you can type cd in IPython, which you can't type in a regular Python environment. And the magics are particularly useful for the scientific visualization things, and things like the timeit magic for profiling and good Matplotlib integration for the event loop and things like that.
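As an illustration of the magics idea Min describes, here is a minimal sketch of a toy interactive loop that treats lines starting with `%` as special commands and everything else as ordinary Python. This is not IPython's implementation; the names `MAGICS`, `register_magic`, and `run_line` are hypothetical and exist only for this example.

```python
import os
import timeit

# Hypothetical registry mapping a magic name to its handler function.
MAGICS = {}


def register_magic(name):
    """Decorator that registers a handler under the given magic name."""
    def decorator(func):
        MAGICS[name] = func
        return func
    return decorator


@register_magic("cd")
def magic_cd(arg, namespace):
    # Change the working directory, shell style; no argument means "home".
    os.chdir(arg or os.path.expanduser("~"))
    return os.getcwd()


@register_magic("timeit")
def magic_timeit(arg, namespace):
    # Time a statement in the user's namespace; return best seconds per loop.
    number = 1000
    timer = timeit.Timer(arg, globals=namespace)
    return min(timer.repeat(repeat=3, number=number)) / number


def run_line(line, namespace):
    """Run one interactive line: '%name args' is a magic, the rest is Python."""
    stripped = line.strip()
    if stripped.startswith("%"):
        name, _, arg = stripped[1:].partition(" ")
        if name not in MAGICS:
            raise ValueError(f"unknown magic: %{name}")
        return MAGICS[name](arg.strip(), namespace)
    try:
        # Expressions produce a value to show back to the user.
        return eval(stripped, namespace)
    except SyntaxError:
        # Statements such as assignments have no value.
        exec(stripped, namespace)
        return None
```

With this sketch, `run_line("%cd /tmp", ns)` changes directory without any Python syntax, while `run_line("x + 1", ns)` is evaluated as normal Python in the same namespace.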
05:29 Michael: I know what the shell looks like today, you can load it up and it kind of looks like a shell and you type in there, but you can do things like plot graphs and those will pop up in separate windows and things like that, right?
05:41 Min: Yeah, and most of that is provided by tools like Matplotlib, but often those tools need a little bit of help to make sure that the terminal stays responsive, and that's one of the things that IPython helps with, in terms of what Python calls the input hook, to ensure that the terminal remains responsive while a GUI event loop is also running.
06:05 Michael: Yeah, very cool. So how did it go from that to the more- one of the things was like articles or publish style these notebooks that you could use to communicate almost like finished work rather than something interactive.
06:18 Min: Notebook style interfaces - a lot of the IPython folks come from a physics background, so Brian and Fernando started the project doing their graduate work in physics at Boulder at the same time, and then I was Brian's physics student. Notebook environments are pretty common; there are various commercial and non-commercial products that have notebook processing environments, especially for math analysis, because often code is not the best representation of math, but richly rendered mathematical expressions are nice. Brian and Fernando knew that they wanted a notebook type interface fairly early on, but the tools just weren't there to build it, and IPython wasn't in a shape to really support it. So slowly at first, and then it kind of picked up speed, we got the pieces for putting that together, but it had been on the horizon for many years before it actually happened.
07:20 Michael: Sure, it took a while to build the maturity into it. What are some of those building blocks that it was waiting on?
07:28 Matthias: Yeah, I think that's web technology, and WebSockets were one of the technologies that were missing to actually build a notebook. If I remember correctly, one of the latest prototypes that we did not release was using AJAX polling, but the ability to actually push a result to the web front end as soon as a kernel gets a result is one of the key factors that pushed the notebook forward. The notebook that you know nowadays was actually using a still-draft version of WebSockets, which stayed in draft state for a long time. So we were really bleeding edge with this technology, adapting to what current browsers can do. We can put in the show notes a really nice blog post by Fernando that recaps the history of IPython, and even the 150 lines of Python that were IPython when it was a few weeks old, IPython 0.1, which we can dig up for people who are interested in trying a really early prototype.
08:44 Michael: Yeah, and go back and see the history. That's a really interesting point Matthias, because it's easy to think of the web as being this very rich, powerful, capable platform because it has been for the last 5 years or so. But 10, or even more than 10 years ago- it was not, right, it was basically just documents on the web right, you had a little bit of Javascript and that was about it, right?
09:10 Matthias: Yeah, I think so. I haven't used that much, I wasn't developing on the web that much 10 years ago, I was more the C, C++ person. Min maybe was more developing at that time?
09:24 Min: Yeah, I wrote an early version of the web based notebook for IPython during the summer of 2006, 2007, I think. Even then, the tools available really weren't- it was not a particularly pleasant thing to work with.
09:41 Michael: I bet it wasn't. Did you end up in a lot of situations where you were like, "Oh this only works in Firefox," or, "This one only works in IE"
09:51 Min: Yes, it's frankly still like that.
09:55 Matthias: It still never works in IE.
09:56 Michael: [laugh] Yeah, it's hard to love IE, I know.
10:03 Min: Well, I mean, recent versions of Internet Explorer are actually really nice and have good standards, implementations, and everything but the reputation of IE 6 kind of overshadows it.
10:18 Michael: Yeah, it definitely casts a long shadow. And you know, Microsoft I think just last week possibly, or very recently, just ended support for all versions of IE other than I think 11 and onward, maybe 10 and onward, but certainly knocked out a whole bunch of them and once that kicks in, that's going to be, that's going to be a good day for everyone that has to work on the web.
10:39 Matthias: Microsoft has done a lot of things recently. Last week, if I remember correctly, they released as open source the JavaScript engine that will power the next version of their browser. Google has V8, which powers both Chrome and Node.js, and which is actually one of the technologies that made the notebook become reality, because JavaScript was painfully slow 10 years ago and it's now really fast, thanks to V8. So it's pretty nice to see Microsoft nowadays actually using open source software and contributing to the community, and I hope that in the next few years Microsoft will move past the fact that everybody is complaining about IE and actually ship nice software.
11:36 Michael: It will be really nice if that comes along, because a lot of people run their software and the world would be a better place if it works really well. I certainly think they're on the right path, I think it's pretty interesting. So one thought I had while you guys were talking about this is - what's the cross platform story of IPython and Jupyter in general? Does it work kind of equally well on Windows, Linux, OS X, or are there places that are more equal than others?
12:06 Min: Linux and OS X are a little bit more equal than Windows, but it should work everywhere. Even though all of our developers are working exclusively on Linux and OS X, when we do user surveys we find that roughly half, or even slightly more than half, of our users are running Windows. So even though it often doesn't work quite as well, and we'll introduce bugs we don't notice for a while, Windows really is a first-class platform for the notebook's use case: a local desktop app that happens to use a web browser for its UI. There are certain aspects of installation that are often more challenging on Windows, especially installing kernels other than the Python one, so installing multi-language kernels is more challenging on Windows. I think that's not necessarily a specific deficiency of Windows; it's more that the developers and maintainers don't tend to use Windows, so the documentation in that case often just doesn't cover what you need to do for Windows as well.
13:16 Michael: Right, if you don't develop and test deploying your packages in the underlying compilers that have to make them go, you are more likely to run into problems, right?
13:25 Matthias: Yeah, I would say also that continuous integration is often on Linux only, and setting it up on Windows is painful, so we catch bugs much more often with continuous integration on Linux, and it's less prone to bugs on Linux. And the other thing is, I don't always like to say good things about half proprietary tools, but Conda changed a lot of things over the last few years. It was really painful to install Python on many systems, and now it's one of the solutions, especially at Software Carpentry bootcamps, where we ask people to just install Conda and then conda install Jupyter, and it almost always works out of the box. And especially for beginners, it's really nice to work with.
14:20 Min: Yeah, Conda has really moved the bar for how easy it is to get 14:24 especially on Windows. There are lots of different ways to install things on Unix platforms that work fairly reliably, but the binaries provided by Conda and Anaconda are extremely valuable for beginners, especially on Windows, where people don't tend to have working compilers set up and a lot of the scientific packages won't build on people's Windows machines. So having binaries is extremely important, and the binaries provided by Conda and Anaconda have been extremely valuable, especially for people getting started in scientific Python.
14:59 Michael: Yeah, I still think I have scars from the "vcvars.bat was not found" sort of errors trying to do stuff on Windows. We had Travis Oliphant on show #34, who is behind Conda and Continuum and all that, and I think that it's a really cool thing that those guys are doing, sort of taking that build configuration step and just pre-building it and shipping the binaries like you say. That really helps people when they are getting started, I think.
15:30 Min: Yeah, it's been a huge difference especially as Matthias mentioned in the workshop the kind of Software Carpentry and Python bootcamp type environments, which just a few years ago were- you spend the first day on installation basically, which is a high price to pay in a 2 day workshop. And now it's often down to an hour.
15:54 Michael: It's awesome, it's a super high price to pay and it's also super discouraging, right, people come not because they want to learn how to configure their compiler, they want to build something amazing right, and they have got to like plough through all these nasty configuration edge cases and they aren't very cool. So, before we get any farther, just the other day, I was trying to describe IPython to somebody in like one or two sentences. And, I didn't do a super job, I think. Could you guys maybe give me your elevator pitch for what is Jupyter or IPython which becomes Jupyter?
16:30 Matthias: It's really tough- have you seen "The Lego" movie? You know the song "Everything is Awesome"?
16:36 Michael: Yes.
16:38 Matthias: That would be my pitch.
16:39 Michael: [laugh] Everything is awesome, ok.
16:45 Min: So I would say that the IPython and Jupyter projects together provide tools for interactive computing, reproducible research, and software-based communication. That's the high level gist.
17:02 Michael: it's fairly different than a lot of what is out there from a programmers' perspective, so it does take a little explaining, doesn't it?
17:10 Min: Yeah, so we have things like an environment in which to do the interactive programming and do the exploratory work. And then we also have things like the document format which are for distributing the communication and sharing it with other people. So those are kind of the two aspects, and Fernando likes to say we have tools for the life cycle of a computational idea.
17:32 Michael: That's a very cool way to put it, it's a very cool tagline. I like it. We are talking about IPython because that's the historical place and we are talking about Jupyter because that's the present and the future. Could you guys maybe talk about how it went from one to the other, what's the story there?
17:47 Min: Yeah, so when we started working on building these UIs with rich media displays, the first one of which was the Qt Console, the first step was separating the front end from what we call the kernel, which is where the code runs. That meant essentially establishing a network protocol for a REPL. With that, the front end can say, "OK, I am going to send an execute request that has some code for the kernel to evaluate," and then the kernel sends messages back that are display formats of various types, so it can send back PNGs or HTML or text. We realized, not entirely on purpose, since it wasn't what we set out to do, that there was nothing Python-specific about this protocol: any language that understands a REPL can talk this protocol, and because the UI and the code execution were in different processes, there is no reason the two need to be in the same language. The first big community was the Julia language community, which essentially saw the UI, specifically the notebook UI, and said, "You know, we like that, we want to use that, we'd rather not re-implement it." So what they implemented was the protocol, and once they implemented the protocol they got the UI for free. Since we didn't set out to design it that way, there were a bunch of rough edges where we had assumed Python, but they were incidental, smaller assumptions to work around. Since then, we have been refining the protocols to remove Python and IPython assumptions, so that the UI is separate from the language in which execution happens, because there is no reason the benefits of the protocol and the display stuff should be confined to code executing in Python.
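To make the execute-request idea concrete, here is a simplified sketch of the message shapes involved, loosely modeled on the Jupyter messaging protocol (header, parent header, metadata, content). It deliberately leaves out the real transport (ZeroMQ sockets, message signing) and most fields; `make_message` and `ToyKernel` are hypothetical names for this example, and the toy kernel happens to evaluate Python only because that is the language of the sketch.

```python
import contextlib
import io
import uuid
from datetime import datetime, timezone


def make_message(msg_type, content, session, parent_header=None):
    """Build a dict shaped like a (simplified) Jupyter protocol message."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # placeholder value for the sketch
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            "version": "5.3",
        },
        "parent_header": parent_header or {},
        "metadata": {},
        "content": content,
    }


class ToyKernel:
    """Stand-in for a kernel process; a real one could run any language."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def handle(self, request):
        header = request["header"]
        if header["msg_type"] != "execute_request":
            raise ValueError("this sketch only handles execute_request")
        self.execution_count += 1
        code = request["content"]["code"]

        def reply(msg_type, content):
            # Every reply points back at the request through parent_header.
            return make_message(msg_type, content, header["session"], header)

        messages = []
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                try:
                    value = eval(code, self.namespace)
                except SyntaxError:
                    exec(code, self.namespace)
                    value = None
        except Exception as exc:
            messages.append(reply("error", {
                "ename": type(exc).__name__, "evalue": str(exc), "traceback": [],
            }))
            status = "error"
        else:
            if stdout.getvalue():
                messages.append(reply("stream", {
                    "name": "stdout", "text": stdout.getvalue(),
                }))
            if value is not None:
                # Results travel as a MIME bundle, so a front end can pick
                # text, HTML, PNG, and so on; only plain text is sent here.
                messages.append(reply("execute_result", {
                    "execution_count": self.execution_count,
                    "data": {"text/plain": repr(value)},
                    "metadata": {},
                }))
            status = "ok"
        messages.append(reply("execute_reply", {
            "status": status, "execution_count": self.execution_count,
        }))
        return messages
```

A front end in this sketch would build `make_message("execute_request", {"code": "1 + 1"}, session="abc")`, pass it to `ToyKernel().handle(...)`, and render whichever MIME type it prefers from the `execute_result` message. Nothing in the message shapes depends on the kernel being written in Python.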
19:50 Michael: Yeah, that's a really happy coincidence, isn't it? That's excellent.
19:53 Min: Yeah.
19:53 [music]
19:53 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
19:53 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.
19:53 Typical candidates receive 5 or more offers in just the first week and there are no obligations ever.
19:53 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!
19:53 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.
19:53 [music]
20:53 Michael: Matthias, where did Jupyter come from? It used to be called IPython obviously that doesn't make sense if you are not using Python.
21:02 Matthias: We had been thinking about renaming part of the project for a much longer time than when we actually announced that we were changing the name to Jupyter. Of course we were aware that users, especially non-Python users, were confused: "I want to use the notebook with R, why should I install IPython?" And you have to understand that many users do not distinguish between Python and IPython, and many users also write IPython with a lower case i, and everyone knows that it is an upper case I.
21:37 Michael: It's not made by Apple, come on.
21:38 Matthias: Yeah, and so we were searching for another name, for something that is easy to Google, that is not already taken, we wanted to get the domain name, and that would have a scientific connotation and we wanted to 21:59 that has been using IPython for a long time, almost since the beginning. I still remember one day Fernando wrote to me to say, "Hey, I just found this name, what do you think?" And everybody agreed, and in a couple of days we decided to grab the domain names and start working on actually separating the project and everything. It has been a really tough transition; people were really confused about the renaming, and they are still confused, but especially for our new users the distinction between Jupyter and IPython is really useful. Also, it allowed Jupyter to become something slightly bigger, which was also in the back of our minds. Jupyter is more of a specification: you have a protocol and you have a set of tools, so what is part of Jupyter is much broader, and it allows anybody to basically say, "Hey, I implement the Jupyter protocol," and so it's easier to say, "Hey, I have a Jupyter Atom plugin." There are also legal issues around that: using trademarks that are really close to Python is difficult, and Jupyter being a brand new namespace, and you know that namespaces are great, we should use more of them, allows people to use it and say that they are multi-language in much better ways than with IPython. IPython is also considered a shell, and Jupyter is more than just a notebook, so having Jupyter is much better and we are happy with that.
23:47 Michael: Yeah, it makes perfect sense. I am sure the transition was a little confusing for people who have been doing IPython, or they have heard about IPython they were going to look into it now it's this other thing, but there is more than just a couple of languages that are supported, right, how many are supported?
24:02 Matthias: It depends on what you want "supported" to mean. We have a wiki page, which is still on the IPython repository, which lists, if I remember correctly, 50 or almost 60 languages. Some languages have many kernels. It means that someone at some point wrote a kernel that works with IPython, and if I remember correctly we have around 60.
24:34 Michael: 60, that means probably if you have a language you care about, it probably works with Jupyter, or it's a real edge case.
24:42 Matthias: Most kernels won't have all the features. I would say the one I know works with most of the features is the Python one, because we maintain it, so you can see there is a reference implementation, or RPython, like only a few hundred lines to show you how to implement that. The Julia kernel is pretty feature complete; actually, many of the features that we have in the IPython kernel were moved by the Julia team into the Julia language itself, so having implemented the protocol and having seen the notebook UI allowed them to make much better abstractions for the Julia language, and actually improve performance in some small areas. The Haskell kernel also has a really good maintainer and has really nice features: if you write some code in Haskell and it could be written in a more compact form, the Haskell kernel will tell you after running your code, "Hey, you can rewrite it this way, it could be more compact and more readable." Ruby had some activity at some point; I am not sure how much activity there is now. And we definitely have people working on the R kernel. The R kernel was created by Thomas Kluyver, who is now back and working with us, and it has been taken over by some R people who are actually contributing to the R kernel and also reporting and fixing a lot of bugs in IPython itself.
26:21 Min: Yeah, and another active kernel author community is the Calico project, which is from a CS department, run by Doug Blank, where the kernel itself is a multi-language environment. It can actually switch between different runtimes and does some pretty cool stuff, and they have been very helpful with implementation and protocol testing and things.
26:47 Michael: Yeah, that's very cool, that's kind of related with what I was going to ask you next- if I want to write something in Python and then something in C++ and then something in R- can I do that like in one notebook and have the data work together?
26:58 Min: Yes. So there are a couple of things to that. One is that we have chosen in the notebook to associate one notebook with one kernel, so there is one process determining how to interpret the code cells and produce output as a result. There is another project derived from IPython called Beaker notebook that doesn't do this; it associates each cell with a kernel and then defines a data interchange for moving data around, which allows running code and passing data between JavaScript, R, and Python. However, in Jupyter a kernel can itself define semantics for running code in other languages. This is where the distinction between IPython and Jupyter comes up: as far as Jupyter is concerned, there is one kernel associated with the notebook, but the IPython kernel can define these things called "cell magics" that say, this is shorthand for actually compiling a block of C++ code with Cython and then running that, or the R magic that actually hands off code to an R interpreter. So there is only one kernel per notebook, but the kernels themselves can provide some of this multi-language functionality. And IPython does.
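The hand-off Min describes can be sketched as follows: a single "kernel" function receives every cell, and a first line starting with `%%` selects a handler that passes the rest of the cell to something else. This is an illustrative toy, not IPython's cell magic machinery; `CELL_MAGICS`, `run_cell`, and the `%%subprocess` magic are hypothetical names, and a second Python interpreter stands in for the R or C++ toolchain that a real magic would call.

```python
import subprocess
import sys


def run_in_subprocess(body):
    """Hand the cell body to a separate interpreter process; return its stdout.

    A second Python process stands in here for another language runtime.
    """
    result = subprocess.run(
        [sys.executable, "-c", body],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical registry: cell magic name -> handler taking the cell body.
CELL_MAGICS = {
    "subprocess": run_in_subprocess,
}


def run_cell(cell, namespace):
    """Run one notebook-style cell inside the single kernel.

    A first line of the form '%%name' delegates the remaining lines to the
    handler registered under that name; other cells run as Python in-process.
    """
    first_line, _, body = cell.partition("\n")
    if first_line.startswith("%%"):
        name = first_line[2:].strip()
        handler = CELL_MAGICS.get(name)
        if handler is None:
            raise ValueError(f"unknown cell magic: %%{name}")
        return handler(body)
    exec(cell, namespace)
    return None
```

From the notebook's point of view there is still exactly one kernel answering every cell; the delegation to another runtime is a private detail of that kernel.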
28:19 Matthias: To extend on what Min said, there is another kernel, one of the Calico kernels actually, which is one kernel that implements many languages. There is this nice distinction that a kernel is not always one language; it can be many languages. The Calico kernel uses a triple-percent syntax to say, "Hey kernel, change how you parse the next string," and so you can actually switch between 3 or 4 languages. I don't exactly remember; there is Python, there is Scheme and something like that. It's really interesting, and you have different ways of actually sharing the data. One of the examples I can give which is really interesting for multi-language integration is the Julia/Python magics demonstration. You can actually have a tight integration between Julia and Python; it's only on Python 2, unfortunately, we need to update the code for Python 3, but I am not 29:25 enough in Julia. You can define the Fibonacci function recursively in Julia, calling Fibonacci n-1 in Python and Fibonacci n-2 in Julia, and the Python version calls Fibonacci n-1 in Julia and Fibonacci n-2 in Python. You can ask for a number and you would actually get a cross-language [Docker] stack trace where you have a layer 29:50 of each language, which is really impressive. From Julia you can even import Matplotlib from the Python side and create a figure in Julia with a function which takes, for example, sine from Julia and cosine from the Python standard library or NumPy. And, 30:14 that on the Matplotlib figure and get back into Python, and it means that the two interpreters are actually sharing memory. So it shows you that you can do some really advanced cross-language integration without having to 30:35 force.
30:36 Michael: That sounds really interesting and useful for scientists, maybe they've got something they have done in R or some other language, some little bit of processing and they are like "I really just want to plug this in over here but it's the wrong language", right, and that sounds like that makes it kind of possible?
30:53 Min: Yeah, and Julia being a very young language community, it's been extremely valuable to them to build this bridge, largely to Python but also to C and things. In order to come up to speed with what programmers expect, with things like Matplotlib and stuff, they didn't need to write, OK, here is the Julia plotting library, just so that people could do anything. They could start out by saying, well, let's just use Python libraries, and because of these really sleek layers that let Julia talk to Python and let Python talk to Julia in a really native way, Julia basically gets the entire Python library ecosystem for free and can then re-implement as needed. As they find more idiomatic Julia ways to do things they can start building those libraries, but they didn't have to start from zero just because it was a new language. For new languages in general, I think being able to interoperate with other languages is a really valuable way to hit the ground running.
31:58 Michael: That is super interesting, because there is 10, 15 years of super solid data science stuff happening in Python and if you just go, we'll just start from there, rather than from zero, that makes all the difference.
32:10 Matthias: The other cross-language thing that few people think about is the language of the kernel and the language of the front end, which is JavaScript and HTML. The notebook allows you to do this cross-language binding really easily, especially with widgets. One of the examples I give is that you can have Fortran: you can actually compute something in Fortran, display it in JavaScript, and have a slider that you move to change the result of your Fortran computation.
32:39 Michael: That's really awesome. One of the things that I wanted to spend a few minutes talking to you guys about was what you refer to as "the big split". So, it used to be that IPython was like one giant GitHub repository, right, and now you have broken it into many smaller pieces, yeah?
32:55 Min: Yeah, we went from one to about a dozen.
32:57 Michael: It sounds like a lot of work.
32:59 Min: Yeah, that was my spring pretty much. It took 3 to 6 months to get it all split up, but since we knew we eventually wanted to do something like that, we had for the most part already organized IPython into sub-packages of dedicated functionality, so it wasn't too crazy difficult to break it up. The tricky bits were the common utilities that we use all over the place and how to deal with that.
33:32 Michael: Yeah, it's the little inter-dependencies that don't seem big, but when they are in between all the pieces, all of a sudden it gets harder and harder, right?
33:42 Min: Yeah, and so that was- it was tricky to execute the big split, in a variety of ways even from we wanted to preserve history but we also didn't want to duplicate the already large IPython repo so that installing from Git would mean that it now is 12 times as big as the IPython was so we had to do some, on the new repo, some clever git history rewriting to kind of save the history for the files that didn't survive.
34:10 Michael: This is really interesting. You talked about that in your blog post called "The Big Split", which I will link to in the show notes, and you did some funky commands to sort of make that happen, right, being really careful about how you moved the history over.
34:24 Min: Yeah, and this is what- there is some I think in the various version control tools, the fact that you can rewrite history in Git is both really scary and weird and gross, but also really useful sometimes, that it lets you do things like we did, which is selectively preserve history which has been nice, you get the history of the notebook work and the notebook repo even past the creation of the notebook repo.
34:51 Michael: But, you don't get the baggage, right?
34:54 Min: Yeah, but we don't get the history of all the rest of IPython.
34:56 Michael: Yeah, because normally you delete the file from the repo and it just doesn't show up. But you are still moving it around, right?
35:02 Matthias: To give you an idea, I will send you a link: someone made a graph visualization of the dependencies inside IPython before the big split, and the same blog post has a comparison with Django, Twisted, Flask, and Requests, so you can get an idea of the complexity. It's on grokcode.com, blog post 864, and I will give you the link so you can put it into the show notes.
35:35 Michael: Oh yeah, thanks. It's pretty seriously entangled.
35:37 Min: There aren't dependency cycles and crazy loops depending on each other, it's kind of a tree of dependencies but there are many nodes on the graph.
35:51 Michael: Sure.
35:51 [music]
35:51 SnapCI is a continuous delivery tool from ThoughtWorks that lets you reliably test and deploy your code through multistage pipelines in the cloud, without the hassle of managing hardware. Automate and visualize your deployment with ease and make pushing to production an effortless item on your to do list.
35:51 Snap also supports Docker and in-browser debugging, and they integrate with AWS and Heroku. Thank SnapCI for sponsoring this episode by trying them with no obligation for 30 days by going to snap.ci/talkpython.
35:51 [music]
36:46 Michael: Another thing I wanted to talk about with you guys is this thing called Jupyter Hub. What's the story to Jupyter Hub?
36:51 Min: So Jupyter Hub came out of the value of Jupyter notebooks in teaching contexts. A lot of people, whether in workshops or classes, are using notebooks to present: this is the material that we are talking about, here is the example code, and running it, doing live demos and things. You want your students to be able to follow along, and this is one of those cases where installing the scientific Python stack, as much as Conda has made that easier, can still be a significant bar to get over. People were building tools, kind of hacks around IPython at the time, to deploy notebooks on behalf of users, and we wanted to provide an efficient implementation of hosting notebooks on behalf of a group of users. In the context of research groups, you've got a machine that a half a dozen or so research scientists or students have access to, or you have a class of 10 or 50 or 100 students. And you say, all right, I've got these users, I can install packages for them and then point them at this URL and they can log in and run their notebooks. Basically, take away the installation problem by saying, I am going to control the installation and host the notebooks and everything. And we wanted to create the simplest, smallest version of that, and that's Jupyter Hub.
38:26 Michael: Ok, nice.
38:27 Matthias: From the technical side, you asked before which technologies were necessary for the notebook, and we talked about web sockets. One other thing which is important is that we were using really recent technology; web sockets were really new at the time, and the one problem is that many proxies or web servers were unable to correctly redirect web sockets, or even to update proxy rules without actually restarting the proxy or server. And that is one of the requirements we have for this notebook: if you want to 39:06 notebook without cutting the connection of the others, we had to have a dynamic proxy which was able to respond to, for example, REST requests that change the rules, without dropping the web sockets. And before Jupyter Hub only a handful of prototype projects were able to do that; actually Min wrote one such HTTP proxy using Node.js to suit this specific need that no other tool covered. And that is why you actually need Jupyter Hub to run something, and Jupyter Hub needs to be one of the front-facing pieces of software. I think that now nginx can do it too. You cannot, for example, use Apache or something else, or it's really much more difficult to have many notebook servers running.
39:55 Michael: It's interesting, that a lot of the web servers weren't really built for that, right, because I guess they came before web sockets, anyway, that probably was not a super important criteria for them, right?
40:06 Min: Yeah, the web servers were actually surprisingly slow to adapt and provide web socket implementations. nginx didn't take too long, but it was quite a while before you could reasonably expect an Apache installation to support web sockets. They do now, so there are notebook deployments behind Apache and nginx, but we put together configurable-http-proxy as this kind of super simple proxy where you can update the routing table without relaunching or losing existing connections or anything.
40:41 Michael: The more you can make it easy for people to set up these environments, the better, because it's not always going to be some web server admin or really experienced web developer doing this, right, it could just be a scientist who just wants this thing for their class, right, they don't want to deal with nginx.
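To make the "update the routing table without losing existing connections" idea concrete, here is a minimal sketch in Python. It is not the code of the proxy discussed above (which is a Node.js program); the class name `RoutingTable`, its methods, and the addresses are all hypothetical. The point it tries to show is that a route change only touches a lookup table consulted for new requests, so nothing has to be restarted.

```python
import threading


class RoutingTable:
    """Toy routing table mapping URL path prefixes to backend targets.

    Hypothetical illustration only; not the API of any real proxy.
    """

    def __init__(self, default_target):
        self._lock = threading.Lock()
        # "/" is the catch-all route (for example, the hub itself).
        self._routes = {"/": default_target}

    @staticmethod
    def _normalize(prefix):
        return prefix.rstrip("/") or "/"

    def add_route(self, prefix, target):
        """Add or replace a route at runtime; no restart involved."""
        with self._lock:
            self._routes[self._normalize(prefix)] = target

    def remove_route(self, prefix):
        """Remove a route; the catch-all route cannot be removed."""
        key = self._normalize(prefix)
        if key == "/":
            raise ValueError("the default route cannot be removed")
        with self._lock:
            self._routes.pop(key, None)

    def resolve(self, path):
        """Return the target of the longest prefix matching `path`."""
        with self._lock:
            matches = [
                p for p in self._routes
                if p == "/" or path == p or path.startswith(p + "/")
            ]
            return self._routes[max(matches, key=len)]


table = RoutingTable("http://127.0.0.1:8081")
table.add_route("/user/alice", "http://127.0.0.1:50001")
alice_target = table.resolve("/user/alice/api/kernels")
other_target = table.resolve("/user/bob/tree")
```

In a real proxy, `add_route` and `remove_route` would sit behind an authenticated REST endpoint, and connections that are already open would simply keep the target they were given when they were established.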
40:58 Min: Yeah, and that's something we are working on right now, because the way Jupyter Hub is put together, it has two primary extension points. One is authentication, so how users log in, and we can kind of drop in any implementation of logging in, whether it's just local authentication with the system password or using OAuth with GitHub or that kind of stuff. The other is the spawning, how it actually allocates resources for the single-user server. But because there are so many choices for how to do that, one thing we are working on is making more of a turnkey version where you can say, I want to use this authentication system and this spawning mechanism, and people can just deploy that. Because there is a range, from the very simple default behavior that works out of the box for, I've just got a shared machine that is on the internet and I want to give all the users who already have accounts on that machine access, which is pretty trivial right now, all the way to a deployment last year that Jess Hamrick did at UC Berkeley for a couple of hundred students in psychology, using Docker Swarm and nginx and a big multi-node deployment for a large number of users, and using Ansible to automate all that deployment.
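The two extension points described here, one for how users log in and one for how a single-user server gets started, can be sketched as two small interfaces that a hub is composed from. The following is a toy illustration under that assumption; `MiniHub`, `DictAuthenticator` and `PortCountingSpawner` are hypothetical names and this is not Jupyter Hub's actual class hierarchy or API.

```python
class Authenticator:
    """Extension point 1: decide who a user is."""

    def authenticate(self, username, password):
        """Return the username on success, or None on failure."""
        raise NotImplementedError


class Spawner:
    """Extension point 2: allocate a single-user server."""

    def start(self, username):
        """Start a server for `username` and return its URL."""
        raise NotImplementedError


class DictAuthenticator(Authenticator):
    """Toy authenticator backed by an in-memory dict (not secure)."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)

    def authenticate(self, username, password):
        if self._passwords.get(username) == password:
            return username
        return None


class PortCountingSpawner(Spawner):
    """Toy spawner that only hands out URLs; it starts no process."""

    def __init__(self, first_port=50000):
        self._next_port = first_port

    def start(self, username):
        port = self._next_port
        self._next_port += 1
        return f"http://127.0.0.1:{port}"


class MiniHub:
    """Combines one authenticator and one spawner, chosen at deploy time."""

    def __init__(self, authenticator, spawner):
        self.authenticator = authenticator
        self.spawner = spawner
        self.servers = {}

    def login(self, username, password):
        user = self.authenticator.authenticate(username, password)
        if user is None:
            raise PermissionError("login failed")
        # Spawn on first login, then reuse the running server.
        if user not in self.servers:
            self.servers[user] = self.spawner.start(user)
        return self.servers[user]


hub = MiniHub(DictAuthenticator({"alice": "secret"}), PortCountingSpawner())
url = hub.login("alice", "secret")
```

Swapping the system-password case for OAuth, or the local case for containers, would then mean passing a different `Authenticator` or `Spawner` subclass while `MiniHub` stays unchanged, which is the kind of turnkey combination the answer above is describing.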
42:26 Michael: That sounds awesome! Is that document somewhere, like is there an article or something on this?
42:30 Min: Yeah, so she wrote a blog post for the Rackspace developer blog, because the hosting for that class was all provided by Rackspace. She had a blog post covering that, and the Ansible setup is just a repo on GitHub that we can link to.
42:43 Michael: Ok, I'll put that in the show notes. So, I can go to Dropbox and get like storage as a service, I can go to Google apps and get word processors as a service; can I do that for Jupyter somewhere, can I just go and like pay $5 a month and get like Jupyter access?
43:03 Min: Yeah, there are a few companies hosting Jupyter notebooks. So IBM has their, I believe it's called workbench, data science workbench, Continuum Analytics has Wakari, which hosts notebooks- I am trying to think how many others there are, there is Domino Data Lab. SageMathCloud is probably the primary one, the one we are most connected to.
43:32 Michael: Ok, cool. It's good to hear.
43:34 Min: Yes. So there is a variety of these hosted notebook things.
43:40 Matthias: One thing which is slightly related is mybinder.org, which has been set up by Jeremy Freeman from Janelia labs, where basically you set up a GitHub repository with your notebooks, a requirements 43:57 file and some extra metadata if needed, and you link to mybinder.org, and it will actually spawn a Docker instance for you with the requirements and give you a temporary notebook online. So if you have an article that you want to be reproducible, you can just post it on GitHub; it's basically like nbviewer, for those who know it, but backed by a kernel. And it's paid directly out of Jeremy's pocket. So huge thanks to him, and if ever someone from Google hears this and you like the project, it would be great if you could give some free resources to Jeremy to host that.
44:43 Michael: Wow, that's really cool, I was thinking about how Docker might fit into it here and that's certainly a really nice use of it.
44:51 We have tmpnb, which is written by Kyle Kelley at Rackspace and which drives try.jupyter.org: if you go to try.jupyter.org you will get a temporary notebook server to kind of try out Jupyter with a few kernels installed, but it's very locked down, you can't get your own work in there or do network things. And so Binder basically combines the nbviewer idea with tmpnb to create a workspace where you can actually install packages and get everything running with notebooks. So we are working on a variety of these applications, a lot of them Docker based, deploying notebooks in various contexts.
45:34 Michael: So, one thing I wanted to make sure we talk about before we wrap things up, is not too long ago, I guess in July your project got $6 million in funding, and that's just so awesome, I mean I can't think of another open source project that's not deeply tied to a company that got that kind of money.
45:56 Min: Yeah, we have been extremely fortunate with the funding over the last few years, thanks to a huge amount of work from Brian Granger and Fernando Perez in chasing that funding down, which is what has given me a job for the last few years, and Matthias as well.
46:14 Michael: That's really great, I mean what kind of difference does that make for what you guys are doing, obviously a job, but what are the goals with that money?
46:22 Min: To expand the coverage of the Jupyter project. That has a lot to do with education and kind of the document and publication pipeline. With nbconvert we can convert notebooks to various formats, but that's a very young project that has a lot of room for improvement. So, integrating into publication pipelines and things like that, and converting other formats to notebooks in addition to what we already do, which is converting notebooks to various formats. And then the other side is building web applications with the tools that we have. Live collaboration, real-time collaboration, in the notebook is going to be a big part of it, and then building applications other than the notebook. We've got this protocol for execution that we have talked about as kind of the building block; you can use that without the notebook document, so you can build a web application that involves running code and producing output, but it may not necessarily be a document-editor type of application.
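As a rough picture of what "converting notebooks to other formats" involves, the sketch below turns a notebook into a plain Python script using only the standard library. It is a deliberately tiny stand-in and not how nbconvert itself works; the function name `notebook_to_script` is hypothetical, and the only thing assumed about the file format is that a version 4 notebook is a JSON document with a top-level "cells" list whose entries carry "cell_type" and "source".

```python
import json


def notebook_to_script(notebook_json):
    """Convert notebook JSON text to a Python script (toy converter).

    Code cells are copied as-is; markdown cells become comments.
    Outputs and metadata are ignored.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            commented = ["# " + line for line in source.splitlines()]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["Add two numbers.\n", "Then print the result."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1 + 1\n", "print(x)"],
        },
    ],
})
script = notebook_to_script(example)
```

Going the other way, from another format into a notebook, amounts to building the same JSON structure from parsed input, which is the direction mentioned above as future work.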
47:27 Michael: It could be something on the other side that's not the notebook app basically.
47:32 Min: O'Reilly built a version of this called Thebe, which drives a few demos and some interactive code-based blog posts on the O'Reilly website. If you look at how Thebe works, and Thebe also relies on some tmpnb-based Docker deployment stuff, the way the JavaScript side of the code is organized, and also to some degree the server side, doesn't make it easy. So while it's technically true that you can use the kernel execution and display stuff without the notebook document piece, we didn't make it easy to do that, so Thebe kind of has to hack around a few choices we've made. Right now, with the help of a lot of volunteer contributors and work from Continuum Analytics and Bloomberg and other folks, we are in the middle of refactoring the client side of the code into small JavaScript packages. Just like we broke up the Python side into the various pieces, we are breaking up the JavaScript side so you can say, I want to install just the client-side code that lets me run code and get output. And with that, you can build an application that is a web app where maybe you don't even show any code to the user, you just show sliders and buttons and everything, but the way your application works actually involves executing code and getting output with the display protocols, and you don't have to rewrite all of that logic yourself. It's kind of similar to the way Julia and Haskell drove us to revise our messaging protocol to be more useful outside of IPython: things like Thebe and other applications are driving us to rethink how our JavaScript code works, so that you can use it a piece at a time in contexts other than the ones that we have already thought about.
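For a sense of what "executing code and getting output" looks like at the protocol level, independent of any notebook document, here is a sketch of the kind of message a client sends to a kernel to run code. The field names follow the Jupyter messaging documentation as best understood here and should be checked against the specification before relying on them; the helper `make_execute_request`, the protocol version string and the session value are assumptions for illustration. Nothing is sent anywhere; the code only builds the message.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build an execute_request-style message as a plain dict (sketch)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,       # unique per message
            "session": session_id,            # identifies the client session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",                 # assumed protocol version
        },
        "parent_header": {},                  # empty: this starts a request
        "metadata": {},
        "content": {
            "code": code,                     # source for the kernel to run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
    }


message = make_execute_request("print(1 + 1)", session_id="example-session")
wire_text = json.dumps(message)
```

A slider-and-button web app of the kind described above would generate messages like this behind the scenes and render the replies, so the user never has to see the code in `content["code"]`.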
49:25 Michael: Yeah, it sounds like there is going to be a lot of innovation and cool new stuff built on top of this, now you have broken out all the building blocks.
49:32 Matthias: Beyond this purely technical side- we have money for the project and to actually write code- I want to give thanks, I mean to 49:42 who are really nice to work with, it's really back and forth, and the money is allowing us to do much more than just being employed. For example, we will now be able to host Jupyter Days where we can actually have talks and invite speakers. We already had one in November in New York, and next month in Chicago there is a Jupyter Day, and without the funds it would be difficult to actually bootstrap that. And the other thing is, we are now around 20 people working on the project; people who are at their day jobs at Continuum, Rackspace, IBM, Google or Microsoft are actually allowed to contribute to the project during their work day, and this helps a lot. And one thing that this funding allows us is to actually gather all of us twice a year, to actually meet each other, and once you meet people you can do much more work than when only interacting via the internet. And that's something that is really great and that will change a lot of things. Until last year the NumPy developers had never all met together; they got some funding to get them together, and apparently it was really great to motivate everyone and relaunch the project. And I am really looking forward to it, because some people that are working with us I have never met in person, and in a few weeks we should meet together thanks to some of these funds.
51:30 Michael: That's a really interesting point about the community, there is obvious technical stuff, but like you said, the building the community both from the outside and the inside that's definitely going to be powerful.
51:41 Min: Yeah, and another aspect of building and maintaining the community that is facilitated by this funding- one of the things that all open source projects, our own included, really suffer on is documentation. Especially in a fast moving project the documentation can really suffer, because especially when you are working with volunteers and things, it's a lot more fun to work on code than docs-
52:10 Michael: Yeah, would you rather write the feature, a new feature or would you rather talk about somebody else's feature that is going to be outdated anyway, right.
52:15 Min: Yeah, and I think it is the responsibility of funded projects like ours to devote some of those resources to maintaining communication with the community and building good docs, because it's often easier to find people who can build good docs that you can pay to do that than it is to find people who can build good docs and are willing to volunteer a bunch of their time. So being able to compensate people for contributions to documentation is, I think, an important thing for funded projects to do, and we have recently hired a couple of people to specifically emphasize improving documentation and things. I think that's a really valuable thing that the funding provides, not just for us but for the community.
53:05 Michael: Good documentation and tutorials and samples- that really makes it much easier to start using the project.
53:12 Matthias: One point about contribution that has followed me since I started on the project, and that I felt was great, especially in the Python community, and that made me stick to IPython, is that most of the core developers- if you are a beginner Python programmer and you want to contribute to the project, we will likely spend a few hours with you on GitHub, or even on Skype if you ask, to help you build a feature. I will personally take a few hours to teach you how to do it if you want to contribute. So don't be afraid to come and ask and say, I don't know how, I want to build that, I have the time, I have the motivation. It's ok if it takes you 4 weeks, it's ok to be wrong, it's ok to ask. Unlike some other communities who are known for having a 54:05 which is not really nice, we will take time to help you, to teach you, because we really think that you should feel welcome, and that's how we had some people that stayed here, because this is a project where we help you. It's not, let me do it and I am done 5 minutes later; we prefer people to take 4 weeks to do something.
54:33 Michael: It's all about teaching people to fish rather than getting them the fish, right, you can just do so much more if you can get more people brought into the project and comfortable- and that's a great message, thanks. So I basically have time for one more question for you guys and then we'll call it a show. Min and I were talking about some really cool places using Jupyter before we started recording, and I just wanted to ask you each, maybe Matthias first, what is the coolest use or example people using Jupyter or IPython that you know of?
55:06 Matthias: Peter Norvig's notebooks on nbviewer are some of the things I have seen. He is really busy, and he had some typos in his notebook and I sent him a patch, and he took the time to reply to me, so he is both someone who writes awesome notebooks and someone who is really friendly. I also like Jake Vanderplas's blog a lot; he has a blog engine that could 55:37 notebooks, so you can add the plugin and directly write your notebook and publish it as a blog post. And also my blog, which uses Nikola, is just notebooks: I just push a notebook to the GitHub repository and 55:52 takes care of compiling them to HTML and publishing, which makes publishing really easy.
56:01 Michael: Oh those are really cool examples. Min?
56:05 Min: Yeah, so the examples for me are 56:08 and I got to visit CERN in December, where they are working on some big Jupyter deployments for the scientists working on experiments at CERN, so that is definitely a really cool thing to see. As someone whose degree is in physics, seeing Jupyter and IPython be valuable in that level of physics experiment and theory research has been really cool. The other things that I think are really cool are where it's been adopted in educational contexts. So for instance Lorena Barba has her AeroPython course teaching aerodynamics with Python, it's all in notebooks on GitHub, and then Doug Blank has his computer science courses all with notebooks and in a Jupyter Hub instance for his students, and Jess Hamrick at Berkeley with her large deployment teaching computational models of cognition. Some of these are not programming classes, they are classes meant to teach scientific principles where programming just happens to be a useful way to illustrate and explore the ideas. The fact that notebooks are proving useful both for students learning how to do science and for actual scientists doing real cool science, both of those I think make me feel really good about working on this project.
57:48 Michael: Yeah, I am sure those are really motivational, and when you wake up you are like, wow, I am building software for all of this, this is fantastic. On show #29 I had Kyle Cranmer from the Atlas experiment on the Large Hadron Collider, and he talked about how they were using IPython and stuff like that there as well, which is very cool. All right gentlemen, this has been super interesting. It sounds like IPython has become Jupyter and it has been such a success, but it seems like it's on the brink of breaking out to be way more than what it has been, so it's an exciting time.
58:25 Matthias: Thanks for having us.
58:27 Min: Yeah, thanks for having us.
58:28 Michael: Yeah, you bet, thanks for being on the show. Talk to you later.
58:28 This has been another episode of Talk Python To Me.
58:28 Today's guests were Matthias Bussonnier and Min RK, and this episode has been sponsored by Hired and SnapCI. Thank you guys for supporting the show!
58:28 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $2,000 USD.
58:28 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at snap.ci/talkpython
58:28 You can find the links from the show at talkpython.fm/episodes/show/44
58:28 Be sure to subscribe to the podcast. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.
58:28 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on our website.
58:28 Don't forget to check out the podcast T-shirt at talkpython.fm/shirt, get yours and share your love for Python with the whole world.
58:28 This is your host, Michael Kennedy. Thanks for listening!
58:28 Smixx, take us out of here.