#44: Project Jupyter and IPython Transcript

Recorded on Tuesday, Jan 26, 2016.

00:00 One of the fastest growing areas in Python is scientific computing.

00:03 In scientific computing with Python, there are a few key packages that make it really special.

00:07 These include NumPy, SciPy, and the related packages.

00:11 But the one that brings it all together, visually, is IPython, now known as Project Jupyter.

00:16 And that's the topic of episode 44 of Talk Python to Me.

00:20 You'll learn about the big split, plans for the recent $6 million in funding,

00:25 Jupyter at CERN and the Large Hadron Collider with Min Ragan-Kelley and Matthias Bussonnier.

00:54 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

01:01 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

01:05 Keep up with the show and listen to past episodes at talkpython.fm.

01:09 And follow the show on Twitter via @talkpython.

01:11 This episode is brought to you by Hired and SnapCI.

01:15 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.

01:23 Hi, folks. No news this week.

01:25 I do have a big announcement coming, and I'm really looking forward to sharing it with you all,

01:29 but I'm not quite ready to talk about it yet, so stay tuned.

01:31 For now, let's get right to the interview with the Project Jupyter core devs,

01:36 Min Ragan-Kelley and Matthias Bussonnier.

01:38 Matthias, Min, welcome to the show.

01:40 Thanks.

01:41 Thanks, Mike, for having us here.

01:43 Yeah, I'm really super excited to talk about Python intersected with science in this thing called

01:49 IPython, or what's become Project Jupyter.

01:51 So that's going to be really great.

01:53 And before we get to that, though, let's just talk about how you got involved.

01:58 How do you get into programming?

01:59 How do you get involved with IPython and all that stuff?

02:03 What's your background?

02:03 Min, you want to go first?

02:04 Sure.

02:05 Yeah, so I was an undergrad in physics at Santa Clara University, working with Brian Granger,

02:11 one of the founders of the IPython project.

02:13 And I was interested in computing and simulation and things, and ended up working on the interactive

02:21 parallel computing part of IPython as my undergrad thesis, and started doing my numerical simulation

02:28 homework stuff in Python, even though the classes were taught in MATLAB and Octave and things,

02:34 and enjoying the scientific Python ecosystem of NumPy and Matplotlib and things.

02:39 And that's kind of how I came to the project and scientific Python in general.

02:44 Yeah, that's really cool.

02:45 And was IPython already a thing when you got started?

02:48 Yeah.

02:48 Fernando created IPython in 2001, and I was doing my undergrad a few years after that.

02:54 And so I joined the project after it had been around for about five years in 2006, and I've

02:59 been working on it for the past 10 years, I guess, now.

03:02 Yeah, about 10 years.

03:03 So how time flies.

03:04 Matthias, how about you?

03:05 Oh, so I came to the project much later than Min.

03:11 I actually started programming a long time ago and came across one of the huge refactorings

03:16 of IPython, which Min did.

03:18 I think it finished in summer 2011.

03:22 Just after that, the IPython team at the time released the Qt console.

03:27 And the project was much more friendly and in good shape for beginner Python programmers.

03:35 At the time, I was beginning my PhD in biophysics in Paris, and I started contributing to the project.

03:42 It was my first big contribution to an open source project.

03:45 And I started to spend my nights and weekends, while doing my PhD, improving IPython, which was really helping

03:52 me a lot with my PhD.

03:55 And I quickly became a core contributor, and I've stayed in the team since then.

04:01 Maybe a good place to start talking about this whole project is maybe we can start with the

04:07 history.

04:07 Originally, this project was called IPython and IPython Notebooks, right?

04:12 Yeah, IPython was around for a good 10 years before we got a version of the notebook out, although

06:19 we had been working on various versions of notebooks for about five years, but most attempts kind

04:24 of didn't go anywhere.

04:25 So what did it look like?

04:27 What did the product look like in the early days?

04:31 Yeah, so initially, Fernando created IPython as just a better interactive shell for Python,

04:38 so giving you some better tab completions, nice colorful tracebacks, things like that.

04:44 Also, Python's a nice, verbose language, but when you're doing interactive stuff, some of

04:50 the bash shell syntax is nicer to type when you're doing ls and cd and everything.

04:55 So one of the things that Fernando did early on was add this notion of magics for extending

05:01 the Python language to give convenient commands for interactively typed things.

05:06 Like you can type cd in IPython, which you can't type in a regular Python environment.

05:13 And then there are magics that are particularly useful for the scientific visualization things

05:20 and things like the %timeit magic for profiling and good Matplotlib integration for the event loop

05:27 and things like that.
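To make the magics Min is describing concrete, here are a few lines as they might be typed at an IPython prompt. This is a minimal illustration of built-in magics (the %matplotlib line assumes Matplotlib is installed), not a snippet from the episode.

```python
# A few IPython magics, typed at the IPython prompt or in a notebook cell.
%cd /tmp                     # shell-style navigation, invalid in plain Python
%timeit sum(range(1000))     # quick micro-benchmark of a single statement
%matplotlib                  # hook up a GUI event loop so plots stay responsive
```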

05:29 I know what the shell looks like today.

05:31 You know, you can load it up and it kind of looks like a shell and you type in there,

05:35 but you can do things like plot graphs and those will pop up into separate windows and things like that, right?

05:41 Yeah, and most of that is provided by tools like Matplotlib, but often those tools need a little bit of help to make sure that the terminal stays responsive.

05:50 And that's one of the things that IPython helps with in terms of what Python calls the input hook

05:58 to ensure that the terminal remains responsive while a GUI event loop is also running.

06:04 Right, yeah, very cool.

06:05 So how did it go from that to the more, I want to think of it as like articles or published style,

06:12 these notebooks that you can use to communicate almost like finished work rather than something interactive?

06:18 Notebook-style interfaces were a natural fit, since a lot of the IPython folks come from a physics background.

06:24 So Brian and Fernando, who started the project, were doing their graduate work in physics at Boulder at the same time.

06:32 And then I was Brian's physics student.

06:34 And notebook environments are pretty common.

06:37 There are various commercial and non-commercial products that have kind of notebook processing environments,

06:43 especially for math analysis.

06:47 So often code is not the best representation of math, but there are rich, you know, rendered mathematical expressions that are nice.

06:54 Brian and Fernando knew that they wanted a notebook type interface fairly early on,

07:00 but the tools just weren't there to build it.

07:04 And IPython wasn't in a shape to really support it.

07:07 So slowly at first, and then it kind of picked up speed.

07:10 We added the pieces for putting that together.

07:14 But it was kind of in, it was on the horizon for many years before it actually happened.

07:20 Sure.

07:20 It took a while to build the maturity into it.

07:23 What are some of those building blocks that it was waiting on?

07:27 Yeah, I think that web technologies and WebSockets were some of the things that were missing to actually build a notebook.

07:36 If I remember correctly, one of the latest prototypes that we did not release was using AJAX polling.

07:45 But the ability to actually push a result to the web front end as soon as the kernel gets a result

07:53 is one of the key factors that pushed the notebook forward and allowed us to do the notebook.

07:59 Actually, the notebook that you know nowadays, the first prototype, was using a still-draft version of WebSockets, which stayed in draft state for a long time.

08:08 And so we were really bleeding edge on this technology, adopting everything in the browser,

08:15 and relying on just what current browsers can do for the notebook.

08:19 There is, by the way, we can put that in the notes of the podcast, a really nice blog post of Fernando's that recaps the history of IPython.

08:27 And there's even 150 lines of Python, which is a version of IPython from when it was just a few weeks old, IPython 0.1,

08:38 that we can dig up for people who are interested in trying a really early prototype.

08:43 Yeah, go back and see the history.

08:45 That's a really interesting point, Matthias, because it's easy to think of the web as being this very rich, powerful, capable platform,

08:54 because it has been for the last five years or so.

08:58 But 10 or even more than 10 years ago, it was not, right?

09:03 It was basically just documents on the web, right?

09:07 You had a little bit of JavaScript, and that was about it, right?

09:10 Yeah, I think so.

09:11 I haven't used that much.

09:13 I was not developing on the web that much 10 years ago.

09:17 I was more a C, C++ person.

09:20 Min, maybe, was doing more web development at the time?

09:23 Yeah, I wrote an early version of the web-based notebook for IPython during the summer of 2006, 2007, I think.

09:33 Even then, the tools available really weren't...

09:37 It was not a particularly pleasant thing to work with.

09:42 I bet it wasn't.

09:43 Did you end up in a lot of situations where you're like, oh, this only works in Firefox, and this one only works in IE, and just partly working in a lot of places?

09:52 It's frankly still like that.

09:54 It still never works in IE.

09:56 Yeah.

09:58 Yeah, it's hard to love IE, I know.

10:02 Well, but I mean, recent versions of Internet Explorer are actually really nice and have good standards implementations and everything, but the reputation of IE6 kind of overshadows.

10:16 Yeah, it definitely casts a long shadow.

10:18 And, you know, Microsoft, I think just last week, possibly, like very recently, just ended support for all versions of IE other than, I think, IE 11 and onward.

10:30 Maybe 10 and onward, but certainly knocked out a whole bunch of them.

10:33 And, you know, once that kicks in, that's going to be a good day for everyone that has to work on the web.

10:38 Microsoft has done a lot of things recently.

10:41 Last week also, if I remember correctly, they did release as open source the JavaScript engine that will power the next version of their browser.

10:51 So Google has V8, which powers both Chrome and Node.js, which is actually one of the technologies that helped the notebook become reality because JavaScript was painfully slow 10 years ago and is now really, really fast thanks to V8.

11:10 And so it's really nice to see nowadays Microsoft actually releasing open source software and contributing to the community.

11:20 And I hope that in the next few years, Microsoft will lose this reputation where everybody is complaining about IE and everything, and actually produce nice software, without that many security bugs and so on and so forth.

11:35 It will be really nice if that comes along because a lot of people run their software and it would be, you know, the world would be a better place if it works really well.

11:44 I certainly think they're on the right path.

11:46 I think it's pretty interesting.

11:48 So one thought I had while you guys were talking about this is, what's the cross-platform story for IPython and Jupyter in general?

11:57 Does it work kind of equally well on Windows, Linux, OS X or are there places that are more equal than others?

12:05 Linux and OS X are a little bit more equal than Windows, but it should work.

12:12 It should work everywhere.

12:13 And even though all of our developers and everything are working exclusively on Linux and OS X, when we do user surveys and things, we find that roughly half or even slightly more than half of our users are running Windows.

12:28 So even though it often doesn't work quite as well or we frequently during the development process will introduce bugs that we don't notice for a while, Windows really is a first-class platform for the kind of local desktop app that happens to use a web browser for UI case of the notebook.

12:46 There are certain aspects of installation that are often more challenging on Windows, especially in terms of installing kernels other than the Python one.

12:55 So installing multi-language kernels is more challenging on Windows.

13:01 And I think that's not necessarily a specific deficiency of Windows.

13:05 It's more just the kind of developer maintainers don't tend to use Windows.

13:10 So the documentation and education often just don't cover what you need to do for Windows as well.

13:15 Right.

13:16 If you don't develop and test deploying your packages in the underlying compilers that have to make them go, well, you're more likely to run into problems, right?

13:25 Yeah.

13:26 I would say also that Travis CI, so continuous integration, is often on Linux only.

13:31 Setting up on Windows is painful.

13:33 So we catch bugs with continuous integration, most often with continuous integration on Linux.

13:41 So it's less prone to bugs on Linux.

13:44 And the other thing is, I don't always like to say good things about half-proprietary tools, but Conda changed a lot of things over the last few years.

13:55 It was really painful to install Python on many systems.

13:59 And now it's one of the solutions, especially at Software Carpentry bootcamps, where we ask people to just install Conda and conda install Jupyter, which now even comes bundled in it.

14:10 And it almost always works out of the box.

14:14 And especially for beginners, it's a really, really nice tool.

14:19 Yeah, Conda has really moved the bar for how easy it is to get set up, especially on Windows.

14:26 There are lots of different ways to install things on Unix-y platforms that work fairly reliably.

14:32 But the binaries provided by Conda and Anaconda are extremely valuable for beginners, especially on Windows, where people don't tend to have a working compiler set up.

14:42 And a lot of the scientific packages won't build on people's Windows machines.

14:48 So having binaries is extremely important.

14:51 And the binaries provided by Conda and Anaconda have been extremely valuable, especially for people getting started in scientific Python.

14:58 Yeah, I still think I have scars from the vcvars.bat was not found sort of errors trying to do stuff on Windows.

15:07 And we had Travis Oliphant on show 34, who is behind Conda and Continuum and all that.

15:14 And I think it's a really cool thing that those guys are doing, sort of taking that build configuration step and just pre-building it and shipping the binaries, like you say.

15:26 That really helps people when they're getting started, I think.

15:29 Yeah, it's made a huge difference, especially, as Matthias mentioned, in the workshop, the kind of Software Carpentry and Python boot camp type environments, which often, you know, just a few years ago, meant you'd spend the first day on installation, basically.

15:48 Which is a high price to pay in a two-day workshop.

15:51 And now it's often down to an hour.

15:53 It's awesome.

15:54 It's a super high price to pay.

15:55 And it's also super discouraging, right?

15:58 People come not because they want to learn how to configure their compiler.

16:01 They want to come build something amazing, right?

16:04 And they've got to, like, plow through all these nasty configuration edge cases.

16:08 And, yeah, very, very cool.

16:09 So, before we move farther, you know, just the other day, I was trying to describe IPython to somebody in, like, one or two sentences.

16:19 And I didn't do a super job, I think.

16:21 Could you guys maybe give me your elevator pitch for what is Jupyter or IPython, which becomes Jupyter?

16:29 It's really tough.

16:30 Have you seen the Lego movie?

16:32 Do you know the song Everything is Awesome?

16:35 Yes.

16:37 That would be my pitch.

16:39 Yeah.

16:42 Everything is awesome.

16:43 Okay.

16:43 Yeah.

16:44 So, I would say the IPython and Jupyter projects together provide tools for interactive computing and reproducible research and software-based communication.

16:58 Okay.

16:59 It's kind of the high-level gist.

17:01 It's fairly different than a lot of what's out there from a programmer's perspective.

17:06 So, it does take a little explaining, doesn't it?

17:08 Yeah.

17:10 So, we have things like an environment in which to do the interactive programming and do the exploratory work.

17:16 And then we also have things like the notebook document format, which are for distributing the communication and sharing it with other people.

17:23 So, those are kind of the two aspects.

17:26 And Fernando likes to say we have tools for the life cycle of a computational idea.

17:31 That's a very cool way to put it.

17:33 It's a very cool tagline.

17:34 I like it.
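As a concrete picture of that notebook document format Min mentions, here is a minimal sketch using the nbformat package; the file name and cell contents are made up for illustration, not taken from the episode.

```python
# Build a tiny notebook document programmatically and write it to disk.
# The .ipynb file is plain JSON, which is what makes it easy to share and render.
import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell

nb = new_notebook(cells=[
    new_markdown_cell("# A tiny reproducible analysis"),
    new_code_cell("print(1 + 1)"),
])
nbformat.write(nb, "tiny.ipynb")
```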

17:34 We're talking about IPython because that's the historical place.

17:39 And we're talking about Jupyter because that's the present and the future.

17:42 Could you guys maybe talk about how it went from one to the other?

17:46 What's the story there?

17:47 Yeah.

17:47 So, when we started working on building these UIs with rich media displays, the first one of which was the Qt console, the first step of that was separating the front end from what we call the kernel, which is where code runs.

18:04 That meant essentially establishing a network protocol for a REPL, basically.

18:10 And with that, we have the ability, an expression of, okay, I'm going to send an execute request that has some code for the kernel to evaluate.

18:19 And then the kernel sends messages back that are display formats of various types.

18:25 So, it can send back PNGs or HTML or text.
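Here is a hedged sketch of that request/response cycle using the jupyter_client package (my illustration, assuming ipykernel is installed; not code discussed in the episode): an execute_request goes out, and result messages come back over the IOPub channel as MIME bundles.

```python
# Talk the Jupyter messaging protocol to a kernel, without any notebook UI.
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")   # kernel + client in one call
try:
    msg_id = kc.execute("1 + 1")                   # send an execute_request

    while True:
        msg = kc.get_iopub_msg(timeout=10)         # results arrive on IOPub
        if msg["parent_header"].get("msg_id") != msg_id:
            continue                               # not a reply to our request
        if msg["msg_type"] == "execute_result":
            # The content is a MIME bundle; kernels can also send text/html,
            # image/png, and so on.
            print(msg["content"]["data"]["text/plain"])   # -> 2
        if msg["msg_type"] == "status" and msg["content"]["execution_state"] == "idle":
            break                                  # kernel finished our request
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```

This language-neutral exchange is exactly what non-Python kernels implement to get the UI "for free."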

18:29 We realized, not entirely on purpose, this wasn't what we set out to do, but we realized when we had this protocol that there was nothing Python-specific about it, that any language that understands a REPL can talk this protocol.

18:44 And because the UI and the code execution were in different processes, there's no reason that the two need to be in the same language.

18:53 Communities like, the first big one was the Julia language community, essentially saw the UI, specifically the notebook UI, and said, you know, we like that, we want to use that, we'd rather not reimplement it.

19:07 So, what they implemented was the protocol.

19:09 And once they implemented the protocol, they got the UI for free.

19:13 The result of that, since we didn't set out to design that, there were a bunch of rough edges where we had assumed Python, but they were kind of incidental, smaller assumptions to work around.

19:25 And so, since that started, we've been kind of refining protocols and things to remove Python and IPython assumptions, so that the UI is separate from the language in which execution happens.

19:40 Because we don't really, you know, a lot of the benefits of the protocol and the display stuff, there's no reason it should be confined to code executing in Python.

19:49 Yeah, that's a really happy coincidence, isn't it?

19:53 That's excellent.

19:54 Yeah.

20:05 This episode is brought to you by Hired.

20:07 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

20:13 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.

20:20 Typically, candidates receive five or more offers within the first week, and there are no obligations, ever.

20:26 Sounds awesome, doesn't it?

20:28 Well, did I mention the signing bonus?

20:30 Everyone who accepts a job from Hired gets a $1,000 signing bonus.

20:33 And as Talk Python listeners, it gets way sweeter.

20:35 Use the link Hired.com slash Talk Python to me, and Hired will double the signing bonus to $2,000.

20:41 Opportunity's knocking.

20:43 Visit Hired.com slash Talk Python to me and answer the call.

20:53 Matthias, where did Jupyter come from?

20:55 It used to be called IPython.

20:57 Obviously, that doesn't make sense if you're not using Python.

21:00 We'd been thinking about renaming part of the project for much longer than when we actually announced that we would be renaming to Jupyter.

21:11 Of course, we were aware that users, especially non-Python users, were confused.

21:16 Like, I want to use a notebook with R.

21:20 Why should I install IPython?

21:22 And you have to understand that many users even don't make the difference between Python and IPython.

21:30 And many users also write IPython with lowercase i, and everyone knows that it's uppercase i.

21:34 It's not made by Apple.

21:37 Come on.

21:37 Yeah.

21:38 And so, yeah, we were searching for another name, something that is easy to Google, that is not already taken, where we can get a domain name.

21:50 And that would have a connotation, a scientific connotation.

21:54 And we wanted to say thanks to the astronomy community, which has been using IPython for a long, long time, almost since the beginning.

22:04 And I still remember one day, Fernando wrote a mail to the whole team and said, hey, I just found this name.

22:12 What do you think?

22:14 And everybody agreed.

22:16 And almost within a couple of days, we decided to grab all the domain names and start working on actually separating the project and everything.

22:26 It has been a really tough transition.

22:29 People were really, really confused about the renaming.

22:33 People are still confused.

22:35 But especially for new users, the distinction Jupyter-IPython is really, really useful.

22:42 And also, it allowed Jupyter to become something slightly bigger.

22:47 That was also in the back of our minds.

22:49 Which was that Jupyter is more a specification: you have a protocol and you have a set of tools.

22:56 What is part of Jupyter is much broader and it can allow anybody to basically say, hey, I implement the Jupyter protocol.

23:05 And so, it's easier to say, hey, I have a Jupyter Atom plugin.

23:11 There are also legal issues around that, that using trademarks that are really close to Python is difficult.

23:17 And Jupyter, being a brand new namespace, and you know that namespaces are great, we should use more of them.

23:24 It allows people to use that and say that they are multi-language in a much better way than saying we are compatible with IPython.

23:34 Because IPython is also strongly associated with being a shell.

23:38 And Jupyter is more than just a notebook.

23:40 So, having Jupyter is much better and we are happy with that.

23:46 Yeah, it makes perfect sense.

23:48 I'm sure the transition was a little confusing for people who have been doing IPython or they've heard about IPython.

23:54 They were going to look into it.

23:55 Now it's this other thing.

23:57 But there's more than just a couple of languages that are supported, right?

24:01 How many are supported?

24:01 It depends on when you want to be supported.

24:06 We have a wiki page which is still on the IPython repository, which lists, if I remember correctly, 50 or almost 60 languages.

24:17 It means that you can have languages that have many kernels.

24:22 It means that someone at some point wrote a kernel or a toy kernel that works with IPython.

24:29 And if I remember correctly, we have around 60.

24:32 60.

24:34 That means probably if you have a language you care about, it probably works with Jupyter, right?

24:40 Or it's a very edge case.

24:41 Most kernels won't have all the features.

24:44 I would say that the one I know works with most of the features are the Python one because we maintain it.

24:53 So you can see it as a reference implementation.

24:55 There are other Python ones, like toys that are only a few hundred lines to show you how to implement that.
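To show how small such a toy kernel can be, here is a sketch modeled on the echo-kernel example in the ipykernel wrapper-kernel documentation; it is an illustration of the idea, not one of the kernels mentioned in this conversation.

```python
# A kernel that just echoes whatever code it receives, built on ipykernel.
from ipykernel.kernelbase import Kernel


class EchoKernel(Kernel):
    implementation = "Echo"
    implementation_version = "1.0"
    banner = "Echo kernel - repeats whatever you type"
    language_info = {
        "name": "Any text",
        "mimetype": "text/plain",
        "file_extension": ".txt",
    }

    def do_execute(self, code, silent, store_history=True,
                   user_expressions=None, allow_stdin=False):
        if not silent:
            # Send the "output" back to the front end over the IOPub channel.
            self.send_response(self.iopub_socket, "stream",
                               {"name": "stdout", "text": code})
        return {"status": "ok",
                "execution_count": self.execution_count,
                "payload": [],
                "user_expressions": {}}


if __name__ == "__main__":
    from ipykernel.kernelapp import IPKernelApp
    IPKernelApp.launch_instance(kernel_class=EchoKernel)
```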

25:02 The Julia kernel is pretty feature complete.

25:07 Actually, many of the features that we have in the IPython kernel were moved by the Julia team into the Julia language itself.

25:18 So actually having implemented the protocol, having seen the notebook UI, allowed them to make much better abstractions for the Julia language and actually improve performance in some small areas.

25:30 So that's almost a thing.

25:32 The Haskell kernel also has a really good maintainer and has really nice features.

25:39 Like if you write some code in Haskell and you can rewrite it in a more compact form, the Haskell kernel will tell you that after running your code.

25:47 They say, hey, you can rewrite it this way.

25:49 It will be more compact and more readable by someone who does Haskell.

25:53 So Ruby had some activity at some point.

25:57 I'm not sure now how much activity there is.

26:00 And we definitely have people from the R kernel.

26:03 The R kernel was created by Thomas Kluyver, who is now back in the UK, still working with us.

26:08 And it's been taken over by some R people who are actively contributing to the R kernel and are also reporting bugs and fixing bugs a lot in IPython itself.

26:20 Yeah, and another active kernel author community is the Calico project, which is from the CS department of Bryn Mawr by Doug Blank, where it's kind of a multi-language.

26:33 The kernel itself is a multi-language environment.

26:37 They can actually switch between different runtimes.

26:39 That does some pretty cool stuff.

26:41 And they've been very helpful with implementation and protocol testing and things.

26:46 Oh, that's cool.

26:47 Yeah, and that's kind of related to what I was going to ask you next is if I want to write something in Python and then something in C++ and then something in R, can I do that in like one notebook and have the data work together?

26:58 Well, yes.

27:01 So there are a couple things to that.

27:03 One is we have chosen in the notebook to associate one notebook with one kernel.

27:10 So there's one process determining how to interpret the code cells and produce output as a result.

27:16 There's another project derived from IPython called Beaker notebook that doesn't do this, that associates each cell with a kernel and then defines a data interchange for moving data around that allows running code and passing data around from JavaScript to R, Python, and like this.

27:34 However, a kernel, from Jupyter perspective, a kernel can itself define semantics for running code in other languages.

27:43 And IPython, this is where sort of the distinction between IPython and Jupyter comes up.

27:49 That as far as Jupyter is concerned, there's one kernel associated with the notebook.

27:53 But the IPython kernel can define these things called cell magics that say this is shorthand for actually compiling a block of C++ code with Cython and then running that, or the R magic that actually hands off code to an R interpreter.

28:09 As far as Jupyter is concerned, there's only one kernel per notebook, but the kernels themselves can actually provide some of this multi-language functionality.

28:17 And IPython does.
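For concreteness, here is a hedged sketch of those IPython cell magics; each chunk below stands for one notebook cell, the Cython and rpy2 extensions are assumed to be installed, and `values` is a hypothetical NumPy array defined in an earlier cell.

```python
# In[1]:  load the extensions that provide the %%cython and %%R cell magics
%load_ext cython
%load_ext rpy2.ipython

# In[2]:
%%cython
# The body of this cell is compiled by Cython to C before it runs.
def csum(int n):
    cdef int i, total = 0
    for i in range(n):
        total += i
    return total

# In[3]:
%%R -i values
# This cell is handed off to an embedded R interpreter; -i copies the
# Python variable `values` into R for the duration of the cell.
print(mean(values))
```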

28:18 Yeah, to extend on what Min said, there is another kernel, one of the Calico kernels, actually, which is actually one kernel that implements many languages; I won't go exactly into details.

28:31 And there is this nice distinction that a kernel is not always one language, it can be many languages, and in particular the Calico kernel uses a triple-percent syntax to say,

28:41 hey kernel, change how you parse the next string.

28:46 And so you can actually switch in between three or four languages.

28:50 I don't exactly remember.

28:51 There is Python, there is Scheme, and something like that.

28:55 And it's really interesting.

28:57 And you have different ways of actually sharing data between languages.

29:01 One of the examples I can try to dig, which is really interesting in multi-language integration,

29:07 is the Julia Python magics demonstration.

29:12 You can actually move data and actually have a tight integration between Julia and Python.

29:19 It's only on Python 2, unfortunately.

29:22 We need to update the code for Python 3, but I'm not fluent enough in Julia.

29:26 You can define the Fibonacci function recursively in Julia, calling Fibonacci n-1 in Python and Fibonacci n-2 in Julia.

29:35 And the Python version calls Fibonacci n-1 in Julia and Fibonacci n-2 in Python.

29:41 And you can ask for a number, and you will actually get a cross-language call stack, or a stack trace where you have a layer cake of each language,

29:51 which is really, really impressive.

29:53 You can even create from Julia, you can import Matplotlib from the Python side,

30:01 create a figure in Julia with a function which takes sine, for example, from the Julia standard library,

30:09 and cosine from the Python standard library or NumPy, and plot that on the Matplotlib figure and get back into Python

30:17 and annotate the figure from the Python side without copying the figure.

30:21 It means that the two interpreters are actually sharing memory.

30:24 So it shows you that you can do some really, really advanced cross-language integration

30:31 without having to copy data back and forth.

30:34 That sounds really interesting and useful for scientists.

30:39 You know, maybe they've got something they've done in R or some other language.

30:43 Yeah.

30:44 Some little bit of processing, and they're like, I really just want to plug this in over here, but it's the wrong language, right?

30:50 And that sounds like it makes it kind of possible.

30:52 Yeah, and it's been, so Julia being a very young language community, it's been extremely valuable to them to build this bridge,

31:01 largely to Python, but also to C and things.

31:05 That Julia didn't, in order to come up to speed with what scientific programmers expect,

31:11 with things like Matplotlib and stuff, they didn't need to write, okay, here's the Julia plotting library, just so that people could do anything.

31:18 They could start out by saying, well, we'll just use Python libraries.

31:22 And because of these really, really slick layers that let Julia talk to Python

31:28 and Python talk to Julia in a really native way, Julia basically gets the entire Python library ecosystem for free,

31:35 and then can kind of re-implement as needed.

31:39 And as they find more idiomatic Julia ways to do things, they can start building those libraries.

31:44 But they didn't have to start from zero just because it was a new language.

31:48 For new languages in general, I think being able to interoperate with other languages

31:52 is a really important, really valuable way to start kind of hit the ground running.

31:57 That is super interesting because, you know, there's 10, 15 years of super solid data science stuff happening in Python.

32:04 And if you can just go, we'll just start from there, rather than from zero, that makes all the difference.

32:09 The other cross-language thing that a few people think about is actually the language in the kernel

32:15 and the language of the front end, which is JavaScript and HTML.

32:19 And the notebook allows you to do like these cross-language bindings really easily, especially with widgets.

32:25 And one of the examples I give is you can have interactive Fortran.

32:30 You can actually compute something in Fortran and display something in JavaScript

32:34 and have a slider that you move and change the result of your Fortran computation.
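As a Python-side sketch of that slider idea (my example, using ipywidgets and Matplotlib rather than Fortran): in a notebook, moving the slider re-runs the function with the new value.

```python
# A slider bound to a plotting function; in a notebook, dragging the slider
# recomputes and redraws the figure.
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def plot_wave(freq=1.0):
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, np.sin(freq * x))
    plt.show()

interact(plot_wave, freq=(0.5, 5.0, 0.5))   # renders a FloatSlider for `freq`
```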

32:38 Wow, that's really awesome.

32:40 One of the things I wanted to spend a few minutes talking with you guys about was what you referred to as the big split.

32:46 So it used to be that IPython was like one giant GitHub repository, right?

32:51 And now you've broken it into many smaller pieces, yeah?

32:54 Yeah, I went from one to about a dozen.

32:57 That sounds like a lot of work.

32:58 Yeah, that was my spring, pretty much.

33:02 It took three to six months to kind of get it all split up.

33:09 But since we knew we eventually wanted to do something like that, we had, for the most part, already organized IPython into kind of sub-packages of dedicated functionality.

33:20 So it wasn't too crazy difficult to break it up.

33:26 But the tricky bits were like the common utilities that we use all over the place and how to deal with that.

33:31 Yeah, it's the little interdependencies that don't seem big, but when they're woven in between all the pieces, all of a sudden it gets harder and harder, right?

33:40 Yeah.

33:41 So it was tricky to execute the big split in a variety of ways.

33:46 So we wanted to preserve history, but we also didn't want to duplicate the already large IPython repo so that installing from Git would mean that it now is 12 times as big as getting IPython was.

34:00 So we had to do some, on the new repos, some clever Git history rewriting to kind of prune out the history for the files that didn't survive.

34:10 That's really interesting.

34:12 You talked about that in your blog called The Big Split, which I'll link to in the show notes.

34:17 And you had to use some funky commands to sort of make that happen, right?

34:21 Be really careful about how you move the history over.

34:23 Yeah, and this is what, there are some, I think, in the various version control tools, the fact that you can rewrite history in Git is both really scary and weird and gross, but also really useful sometimes.

34:35 But it lets you do things like we did, which is selectively preserve history, which has been nice that you get, you know, you get the history of the notebook work in the notebook repo, even past the creation of the notebook repo.

34:50 But you don't get the baggage, right?

34:52 Yeah, but we don't get the history of all the rest of IPython.

34:56 Yeah, because normally you delete a file out of the repo and it's just, it doesn't show up, but you're still moving it around, right?

35:01 To give you an idea, I will just share, I will send you a link.

35:06 Someone made a graph visualization of the dependencies into IPython before the Big Split.

35:14 And on the same blog post, you have comparison with Django, Twisted, Flask, requests, so that you can get an idea of what the complexity of the entanglement was.

35:26 It's on grokcode.com and it's blog post 864.

35:30 And I will give you the link so you can put it in the notes.

35:34 Oh yeah, thanks.

35:35 It's pretty seriously entangled.

35:37 There aren't dependency cycles and crazy loops of depending on each other.

35:44 It's kind of a tree of dependencies, but there are many nodes on the graph.

35:50 Sure, sure.

35:51 SnapCI is a continuous delivery tool from ThoughtWorks that lets you reliably test and deploy your code through multi-stage pipelines in the cloud without the hassle of managing hardware.

36:16 Automate and visualize your deployments with ease and make pushing to production an effortless item on your to-do list.

36:22 Snap also supports Docker and in-browser debugging, and they integrate with AWS and Heroku.

36:28 Thank SnapCI for sponsoring this episode by trying them with no obligation for 30 days by going to snap.ci slash talkpython.

36:45 Another thing I wanted to talk about with you guys is this thing called JupyterHub.

36:48 What's the story of the JupyterHub?

36:50 So JupyterHub came out of the value of Jupyter Notebooks and things in teaching context.

36:57 So a lot of people, whether in workshops or classes, are using notebooks to present.

37:04 This is the material that we're talking about and the example code and running, you know, doing live demos and things.

37:10 You want your students to be able to follow along, and this is one of those cases where, you know, installing scientific Python stack is as much as Conda has made that easier.

37:20 It still can be, you know, a significant bar to get over.

37:24 So we wanted, and people were building kind of, people were building tools for kind of hacks around IPython at the time to deploy notebooks on behalf of users.

37:37 And we wanted to provide kind of an official implementation of hosting notebooks on behalf of a group of users in the context of a research group.

37:49 So you've got a machine that a bunch of, you know, half a dozen or so research scientists or students have access to,

37:55 or you have a class of 10 or 50 or 100 students.

38:00 And you say, all right, I've got these users.

38:03 I can install packages for them and then point them at this URL and they can log in and run their notebook.

38:10 So basically take away the installation problem by saying, I'm going to control the installation and host the notebooks and everything.

38:21 And we wanted to kind of create the simplest, smallest version of that.

38:24 And that's JupyterHub.

38:25 Okay, nice.

38:26 From the technical side, we asked before which technologies were necessary for the notebook.

38:32 And we spoke about WebSockets.

38:34 One of the things which is important is we were using really recent technology and WebSockets was really new at the time.

38:44 And one of the problems is many proxies or web servers were unable to correctly redirect WebSockets or even to update proxy rules without actually restarting the proxy or server.

39:01 And that's one of the requirements we had for the notebook.

39:04 If you want to spawn someone's notebook without cutting the connections of the others, we had to have a dynamic proxy that was able to respond to, for example, REST requests, making changes without dropping the WebSockets.

39:19 And before JupyterHub, only a handful of prototypes of projects were able to do that.

39:24 And actually, Min wrote one such HTTP proxy using Node.js to suit this specific need that no other tool provided.

39:37 And that's why you actually need JupyterHub to run something.

39:40 And JupyterHub needs to be the front-facing piece of software.

39:43 I think that not even Nginx can do it.

39:45 You cannot, for example, use Apache or something else.

39:50 Or it's really much more difficult to have many notebook servers running.

39:54 It's interesting that a lot of the web servers weren't really built for that, right?

39:59 Because I guess they came before WebSockets anyway.

40:03 And that probably was not a super important criteria for them, right?

40:06 Yeah, the web servers were actually surprisingly slow to adopt, to provide WebSocket implementations.

40:13 Nginx didn't take too long, but it was quite a while before you could reasonably expect an Apache installation to support WebSockets.

40:21 They do now, so there are notebook deployments behind both Apache and Nginx.

40:26 But, yeah, we put together the configurable HTTP proxy as this kind of super simple proxy that you can update.

40:34 You can update the routing table without relaunching or without losing existing connections or anything.
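The REST calls below are a rough sketch of what "updating the routing table" means in practice for configurable-http-proxy; the port, token, endpoint paths, and payload are assumptions based on my reading of that project's documentation, not anything quoted in the episode.

```python
# Updating the proxy's routing table over its REST API, without restarting it
# or dropping existing WebSocket connections.
import requests

PROXY_API = "http://127.0.0.1:8001"            # CHP's API endpoint (assumed)
HEADERS = {"Authorization": "token secret"}     # proxy auth token (assumed)

# Route /user/alice to a freshly spawned single-user notebook server.
resp = requests.post(
    f"{PROXY_API}/api/routes/user/alice",
    json={"target": "http://127.0.0.1:52359"},
    headers=HEADERS,
)
resp.raise_for_status()

# Inspect the current routing table.
print(requests.get(f"{PROXY_API}/api/routes", headers=HEADERS).json())
```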

40:41 The more you can make it easier for people to set up these environments, the better.

40:45 Because it's not always going to be like some web server admin or a really experienced web developer doing this, right?

40:52 It could just be a scientist who just wants this thing for their class, right?

40:56 They don't want to deal with Nginx.

40:57 Yeah, and that's something we're working on right now.

41:00 Because the way JupyterHub is put together, it has two primary extension points.

41:05 One is authentication.

41:07 So how users log in, and you can kind of drop in any implementation of logging in, whether it's just local authentication with the system, with the password, or using OAuth with GitHub, or your campus sign-on, that kind of stuff.

41:22 And then the other is the spawning, how it actually allocates resources for the single-user servers.

41:28 But because there are so many choices for how to do that, it actually, one thing we're working on is making kind of more of a turnkey version that you can say,

41:40 I want to use this authentication system and this spawning mechanism, and people can just deploy that.

41:46 Because there are, from the very simple default behavior of just that works out of the box for, I've just got a shared machine that's on the internet,

41:55 and I want to give all the users who already have accounts on that machine access.

41:59 That's pretty trivial right now, all the way to a deployment last year that Jess Hamrick did at UC Berkeley

42:08 for a couple hundred students in psychology using Docker Swarm and Nginx and a big multi-node deployment for a large number of users

42:22 and using Ansible to automate all that deployment.
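To make those two extension points concrete, here is a hedged sketch of a jupyterhub_config.py; the authenticator and spawner classes come from the separate oauthenticator and dockerspawner packages, and the hostname and image are placeholders, not details of the deployments described here.

```python
# jupyterhub_config.py -- a sketch of JupyterHub's two main extension points.
c = get_config()  # noqa: F821  (JupyterHub provides get_config when loading this file)

# Extension point 1: authentication, i.e. how users log in.
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"

# Extension point 2: spawning, i.e. how each user's notebook server is started.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"
```

Swapping either class for local system accounts, campus single sign-on, or a batch or cloud spawner is the kind of turnkey choice Min describes.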

42:25 That sounds awesome.

42:26 Is that documented somewhere?

42:28 Like, is there an article or something on this?

42:30 Yeah.

42:30 So she wrote a blog post for the Rackspace developer blog because the hosting for that class was all provided by Rackspace.

42:37 So she wrote a blog post covering that, and her Ansible setup is just a repo on GitHub that we can link to.

42:43 Okay, awesome.

42:44 Yeah, we'll put that in the show notes.

42:45 So I can go to Dropbox and get, like, storage as a service.

42:50 I can go to Google Apps and get Word Processor as a service.

42:54 Can I do that for Jupyter somewhere?

42:57 Can I just go and, like, pay $5 a month and get, like, Jupyter?

43:02 Yeah, there are a few companies hosting Jupyter Notebooks.

43:06 So IBM has their, I believe it's called a workbench, data science workbench.

43:11 Continuum Analytics has Wakari, which hosts notebooks.

43:17 I'm trying to think how many others there are.

43:20 There's Domino Data Lab.

43:23 So there's William Stein with SageMath.

43:26 Yeah, SageMath Cloud is probably the primary, the one we're most connected to.

43:31 Okay, cool.

43:32 That's good to hear.

43:34 Yeah, so there are a variety of these hosted notebook things.

43:38 Yeah, one thing which is slightly related, I don't know if we might talk to that later.

43:44 It's mybinder.org, which has been set up by Jeremy Freeman from Janelia, where basically you set up a GitHub repository with your notebooks, a requirements file, and some extra metadata if needed.

44:01 And you link to mybinder.org.

44:03 I will give you the link.

44:05 And it will actually, just for you, spawn a Docker instance with the requirements and give you a temporary notebook online.

44:15 So if you have an article that you want to be reproducible, you can just post it on GitHub.

44:20 It's basically like nbviewer, for those who know, but backed by a kernel.

44:26 And it's paid directly out of Jeremy's pocket.

44:28 And huge thanks to him.

44:31 If ever someone from Google hears this and likes the project, we would be grateful if you could give some free resources to Jeremy to host that.

44:42 Wow, that's really cool.

44:44 You know, I was thinking about how Docker might fit into it here.

44:47 And that's certainly a really nice use of it.

44:50 We have tmpnb, which is written by Kyle Kelley at Rackspace, which drives try.jupyter.org; if you go there, you'll get a temporary notebook server to kind of try out Jupyter with a few kernels installed.

45:03 But it's very locked down.

45:05 You can't get your own work in there or do network things.

45:09 And so Binder basically combined the nbviewer idea with tmpnb to kind of create a workspace where you can actually install packages and get everything running and preload it with notebooks.

45:22 So we're working on a variety of these kinds of applications.

45:28 A lot of them are Docker-based, deploying notebooks in various contexts.

45:34 So one thing I wanted to make sure we talk about before we wrap things up is not too long ago, I guess in July, your project got $6 million in funding.

45:44 And that's just so awesome.

45:47 I mean, I can't think of another open source project that's not deeply tied to a company that got that kind of money.

45:55 Yeah, we've been extremely fortunate with the funding over the last few years, thanks to a huge amount of work from Brian Granger and Fernando Perez in chasing that funding down.

46:06 And which is what has given me a job for the last few years and Matthias as well.

46:13 So, yeah, that's really great.

46:15 I mean, what kind of difference does that make for what you guys are doing?

46:18 Obviously, a job, but what are the goals for that money?

46:21 Really to expand the coverage of the Jupyter project; a lot of that has to do with education and kind of the document and publication pipeline.

46:33 So with nbconvert, we can convert notebooks to various formats, but that's a very young project that has a lot of room for improvement.

46:41 And so integrating into publication pipelines and things like that and converting other formats to notebooks in addition to converting what we already do, which is converting notebooks to rendered formats.
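For a sense of what nbconvert does today, here is a minimal sketch using its Python API; the notebook filename is made up for illustration.

```python
# Convert a notebook document into rendered HTML with nbconvert's exporter API.
from nbconvert import HTMLExporter

exporter = HTMLExporter()
body, resources = exporter.from_filename("analysis.ipynb")   # hypothetical file

with open("analysis.html", "w", encoding="utf-8") as f:
    f.write(body)   # a shareable, rendered version of the notebook
```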

46:52 And then the other side is in the building web applications with the tools that we have.

47:00 So live collaboration, real time collaboration in the notebook is going to be a big part of it.

47:05 And then building applications other than the notebook.

47:08 So we've got this protocol for execution that we've talked about as kind of the building block.

47:13 You can use that without the notebook document.

47:17 So you can build a web application that involves running code and producing output, but it may not necessarily be a document editor type application.

47:26 It could be something on the other side that's not the notebook app, basically.

47:31 O'Reilly built something, basically built a version of this called Thebe that drives a few demos and some interactive code-based blog posts on the O'Reilly website.

47:43 But if you look at how Thebe works, Thebe itself also relies on some tmpnb-based Docker deployment stuff.

47:51 The way the JavaScript side of the code is organized and also to some degree the server side, we don't make it easy.

48:00 So while it's technically true that you can use the kernel execution display stuff without the notebook document piece, we didn't make it easy to do that.

48:09 So Thebe kind of has to hack around a few of the choices we made.

48:12 And right now we're in the middle, with the help of a lot of volunteer and contributed work from Continuum Analytics and Bloomberg and other folks.

48:23 We're working on refactoring the client side of the code into small JavaScript packages, just like we broke up the Python side into the various pieces.

48:31 We're breaking up the JavaScript side.

48:33 So you can say, I want to install just the client side code that lets me run code and get output.

48:40 And with that, then you can build an application that is a web app that maybe you don't even show any code to the user.

48:46 You just show sliders and buttons and everything.

48:50 But the way your application works actually involves this executing code and getting output with the display protocol.

48:55 So you don't have to rewrite all of that logic yourself.

48:59 It's kind of similar to the way Julia and Haskell drove us to revise our messaging protocol to be more useful outside of Python.

49:09 Things like Thebe and other applications are driving us to kind of rethink how our JavaScript code works so that you can actually use it kind of pieces at a time, in contexts other than the ones that we've already thought about.

49:24 Yeah, it sounds like there's going to be a lot of innovation and cool new stuff built on top of this now that you've broken out all the building blocks.

49:32 Beyond just the purely technical side of having money for the project to actually write code,

49:37 I first want to say thanks.

49:39 I mean, the Moore and Sloan Foundations are really people who are nice to work with.

49:45 I mean, they don't just ask us for reports.

49:47 It's really a back and forth.

49:48 And the money is allowing us to do much more things than just being employed.

49:53 We, for example, now will be able to host a Jupyter Day where we can actually have talks and invite speakers.

50:01 We already had one in November in New York.

50:04 And there is, no, it's in November.

50:07 Anyway.

50:07 And one next month, so in February in Chicago, there is a Jupyter Day.

50:13 And without these funds, it would be difficult to actually bootstrap that.

50:18 And the other thing is now we are around 20 people across the globe working on the project.

50:26 People that, in their day jobs at Continuum or at Rackspace or IBM, Google, Microsoft, are actually allowed during their work day to contribute to the project.

50:37 And this helps a lot.

50:39 And one of the things that this funding will allow us to do is to actually gather twice a year, all of us, to actually meet each other.

50:46 And once you meet people, you can do much more work than only interacting through the Internet.

50:52 And that's something that is really, really great.

50:55 And that will change a lot of things, I think, for the project.

51:00 I mean, at least until last year, I think the NumPy developers had never all met together.

51:06 And they got some funding to get them together at SciPy.

51:10 And apparently, it was really great to motivate everyone and relaunch the project.

51:14 And I'm really looking forward to it, because some people that are working with us have never met in person.

51:20 And in a few weeks, we should meet together thanks to some of these funds.

51:27 And that's something I'm really looking forward to.

51:30 That's a really interesting point about the community.

51:33 There's obvious technical stuff.

51:35 But like you said, building the community both from the outside and the inside, that's definitely going to be powerful.

51:41 Yeah.

51:41 And another aspect of the building and maintaining the community that is facilitated by this funding is being able to,

51:51 and one of the things that a lot of open source projects, our own included, really suffer on is documentation.

51:58 Especially for a fast-moving project, the documentation can really suffer.

52:02 Because especially when you're working with volunteers and things, it's a lot more fun to work on code than docs.

52:08 Yeah.

52:09 Would you rather write a feature, a new feature, or would you rather talk about somebody else's feature that's going to be outdated anyway, right?

52:15 Yeah.

52:16 And I think it is the responsibility of funded projects like ours to devote some of those resources to maintaining communication with the community and building good docs.

52:29 Because it's often easier to find people who can build good docs that you can pay to do that than it is people who can build good docs who are willing to volunteer a bunch of their time.

52:42 And so being able to compensate people for contribution to documentation is, I think, an important thing for funded projects to do.

52:50 And so we've recently hired a couple people to specifically emphasize improving documentation and things.

52:57 And I think that's a really valuable thing that the funding provides for not just us, but for the community.

53:04 Good documentation and tutorials and samples.

53:08 That really makes it much easier to start using a project.

53:11 One point about contribution that has followed me since I started on the project, and that I felt was great, especially in the Python community, and made me stick to IPython, is that most of the core developers, if you are a beginning Python programmer and you want to contribute to the project,

53:31 We will likely spend a few hours with you on GitHub or even on Skype, if you ask, to help you build a feature.

53:39 Even if it would take us five minutes, I will personally take a few hours to teach you how to do it if you want to contribute.

53:47 So don't be afraid to come and ask and say, I don't know.

53:51 I want to build that.

53:52 I have the time.

53:54 I have the motivation.

53:55 It's okay if it takes you four weeks.

53:57 It's okay to be wrong.

53:58 It's okay to ask.

54:00 Unlike some other communities that are known for having a BDFL who is not really nice on the mailing list, we will take time to help you, to teach you Git, because we really think that you should feel welcome.

54:17 And that's how we've had some people who stayed, because this is a project where we help you contribute.

54:23 It's not, let me do it and I'm done in five minutes later.

54:27 We prefer for people to take four weeks to do something than doing it ourselves.

54:32 It's all about teaching people to fish rather than getting them a fish, right?

54:36 You can just do so much more if you can get more people brought into the project and comfortable.

54:41 And that's really, it's a great message.

54:43 Thanks.

54:43 So I basically have time for one more question for you guys, and then we'll call it a show.

54:48 Min and I were talking about some really cool places using Jupyter before we started recording.

54:56 And I just want to ask you each, maybe Matthias first, what's the coolest use or example of people using Jupyter or IPython that you know of?

55:07 Peter Norvig's posts on nbviewer are some of the most awesome things I've seen.

55:12 He is really busy and he had some typos on his notebook and I sent him a patch and he took the time to reply to me.

55:20 So he is both someone who wrote awesome notebooks and someone who is really friendly.

55:27 And I also like Jake VanderPlas's blog a lot, when he's blogging.

55:33 He has a blog engine that publishes notebooks directly.

55:37 So it's Pelican.

55:38 And you can add a plugin and directly write your notebooks, and they're published directly as blog posts.

55:45 And also my blog, which is built with Nikola, is just notebooks.

55:49 I just push notebooks to the GitHub repository.

55:51 And Travis CI takes care of compiling them to HTML and publishing them, which makes publishing really easy, I find.

56:00 Oh, that's, yeah, those are really cool examples.

56:02 Min?

56:03 Yeah, so the examples for me are Thomas and I got to visit CERN in December where they're working on some big Jupyter deployments for the scientists working on the LHC and other experiments at CERN.

56:22 So that's definitely a really cool thing to see, you know, Jupyter and IPython, as someone whose degrees are in physics; seeing Jupyter and IPython valuable in that level of physics experiment and theory research has been really cool.

56:39 The other things that I think are really cool are where it's been adopted in educational contexts.

56:47 So, for instance, Lorena Barba's AeroPython course, teaching aerodynamics with Python, is all in notebooks on GitHub.

56:58 And then Doug Blank has his computer science courses all with notebooks in a JupyterHub instance for his students.

57:06 And Jess Hamrick at Berkeley with her large deployment teaching computational models of cognition.

57:15 And some of these are not programming classes, but they're classes meant to teach scientific principles where programming just happens to be a useful way to illustrate and explore the ideas.

57:28 And the fact that notebooks are proving useful in kind of just learning, both learning how to do science for students at the undergraduate and graduate level, and then also for actual scientists doing, you know, doing real cool science.

57:43 Both of those, I think, make me feel really good about working on the project.

57:47 Yeah, I'm sure those are really motivational.

57:50 You wake up, you're like, wow, I'm building software for all of this.

57:53 This is fantastic.

57:55 On show 29, I had Kyle Cranmer from the ATLAS experiment at the Large Hadron Collider on, and he talked about how they were using IPython and stuff like that there as well, which is very cool.

58:06 All right, gentlemen, this has been super interesting.

58:10 It sounds like IPython has become Jupyter, and it has been such a success, but it seems like it's on the brink of breaking out to be way more than what it has been.

58:22 So, it's an exciting time.

58:24 Thanks for having us.

58:26 Yeah, thanks for having us.

58:27 Yeah, you bet.

58:28 Thanks for being on the show.

58:29 Talk to you later.

58:30 This has been another episode of Talk Python to Me.

58:34 Today's guests were Matthias Bussonnier and Min RK, and this episode has been sponsored by Hired and SnapCI.

58:41 Thank you guys for supporting the show.

58:43 Hired wants to help you find your next big thing.

58:45 Visit Hired.com slash Talk Python to me to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000.

58:54 SnapCI is modern, continuous integration and delivery.

58:58 Build, test, and deploy your code directly from GitHub, all in your browser with debugging, Docker, and parallelism included.

59:05 Try them for free at snap.ci slash Talk Python.

59:08 You can find the links from today's show at talkpython.fm/episodes/show/44.

59:14 Be sure to subscribe to the podcast.

59:16 Open your favorite podcatcher and search for Python.

59:19 We should be right at the top.

59:20 You can also find the iTunes and direct RSS feeds in the footer of the website.

59:24 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

59:28 You can hear the entire song on talkpython.fm.

59:32 Oh, and don't forget to check out the podcast t-shirt at talkpython.fm/shirt.

59:37 Get yours and share your love for Python with the whole world.

59:41 This is your host, Michael Kennedy.

59:43 As always, I really appreciate you listening to the show.

59:45 Smix, take us out of here.

59:49 I'll see you next time.

01:00:08 Bye.
