Analyzing dozens of notebook environments

Are you using interactive notebooks for your data exploration or day-to-day programming? What environment do you use? Was it Jupyter and now you've made the move to JupyterLab? That's a great choice. But did you know there are more environments out there to choose from and compare? Have you heard of Callisto or Iodide? How about CoCalc or PolyNote? That's just the tip of the iceberg!

That's why I'm happy to have Sam Lau and Philip Guo here to share their research comparing and categorizing over 60 notebook environments.

Episode Deep Dive

Guests Introduction and Background

Sam Lau is a PhD student at UC San Diego focusing on cognitive science and data science education. He first got into programming through massive open online courses in high school, majored in computer science at UC Berkeley, and now studies how people learn and use Python-based data tools. Philip Guo is a professor and researcher at UC San Diego specializing in human-computer interaction (HCI). He has appeared on Talk Python To Me before to discuss CPython internals, coding later in life, and maintaining open source projects, and returns here to share his latest research on notebook environments.

What to Know If You're New to Python

If you’re new to Python, this episode explores interactive notebook systems such as Jupyter and alternatives. A basic grasp of Python’s syntax, importing libraries, and printing output is all you need to follow the main ideas. Being comfortable running Python code in a web-based or local environment will help you understand the benefits and limits of these notebook tools. And don’t worry, most notebook systems aim to reduce setup friction so you can focus on data and code rather than complex configuration.

Key Points and Takeaways

Dozens of Notebook Systems Beyond Jupyter
- Jupyter notebooks are popular, but Sam and Philip identified over 60 computational notebook solutions for data exploration and collaboration. Many are built on Jupyter or directly inspired by it, each offering a twist on features like collaboration, versioning, or execution models.
- Links / Tools:
  - Jupyter Notebook
  - JupyterLab
Defining "Notebook Environments" and Literate Programming
- Notebooks blend code, visualizations, and explanatory text into one unified document, often referred to as literate programming. This format preserves the logical flow of data analysis while keeping the underlying code in place for reproducibility.
- Links / Tools:
  - Observable (JavaScript-focused reactive notebooks)
  - Iodide (browser-based notebook) / Pyodide (CPython compiled to WebAssembly)
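
To make the definition concrete, here is a minimal sketch of a notebook as a data structure: an ordered list of prose and code cells, where code cells run against one shared namespace and keep their captured output next to the source. The `Cell` and `Notebook` classes are hypothetical illustrations, not Jupyter's actual document format or kernel protocol.

```python
from contextlib import redirect_stdout
from dataclasses import dataclass, field
from io import StringIO


@dataclass
class Cell:
    kind: str          # "markdown" (prose) or "code"
    source: str
    output: str = ""   # captured stdout of the last run (code cells only)


@dataclass
class Notebook:
    cells: list = field(default_factory=list)
    namespace: dict = field(default_factory=dict)  # shared state, like a kernel's globals

    def run_cell(self, index: int) -> str:
        """Execute one code cell in the shared namespace and keep its printed output."""
        cell = self.cells[index]
        if cell.kind != "code":
            return ""
        buffer = StringIO()
        with redirect_stdout(buffer):
            exec(cell.source, self.namespace)
        cell.output = buffer.getvalue()
        return cell.output

    def render(self) -> str:
        """Interleave prose, code, and outputs into one plain-text document."""
        parts = []
        for cell in self.cells:
            if cell.kind == "markdown":
                parts.append(cell.source)
            else:
                parts.append(f"In: {cell.source}")
                if cell.output:
                    parts.append(f"Out: {cell.output.rstrip()}")
        return "\n".join(parts)


nb = Notebook([
    Cell("markdown", "Mean of the sample:"),
    Cell("code", "data = [3, 5, 7]\nprint(sum(data) / len(data))"),
])
nb.run_cell(1)
print(nb.render())
```

The rendered text keeps the explanation, the code, and its result together in one artifact, which is the literate-programming property the definition describes.
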
Academic vs. Product vs. R&D Notebook Projects
- The research grouped notebook systems into three main categories: Academic prototypes (pushing new interaction ideas), commercial product offerings (cloud-hosted or enterprise solutions), and R&D or experimental tools. Each aims to solve distinct user problems like large datasets, real-time collaboration, or domain-specific integration.
- Links / Tools:
  - Google Colab (Cloud-hosted Jupyter environment)
  - Azure Notebooks (Microsoft’s Jupyter-based environment)
“Out-of-Order Execution” Challenges
- One of the most common notebook pitfalls is the ease of running cells in a non-linear order, possibly deleting or skipping cells that defined variables. This can result in mysterious bugs and stale state. Yet, out-of-order execution is also a superpower for interactive exploration.
- Links / Tools:
  - Papermill (parameterizes and runs notebooks top to bottom; popularized by Netflix)
  - Streamlit (Automatically re-runs code in a linear flow under the hood)
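
The stale-state problem is easy to reproduce with plain `exec` and a shared dictionary standing in for the kernel. In this hypothetical sketch (the `run` helper and the cell contents are made up for illustration), editing one cell without re-running its dependents leaves an old result in memory, and deleting a defining cell only surfaces as a `NameError` after a restart-and-run-all.

```python
from typing import Optional


def run(cells: list, order: list, namespace: Optional[dict] = None) -> dict:
    """Execute code cells in the given order against one shared namespace (the "kernel")."""
    ns = {} if namespace is None else namespace
    for i in order:
        exec(cells[i], ns)
    return ns


cells = [
    "threshold = 10",
    "values = [4, 12, 25]",
    "big = [v for v in values if v > threshold]",
]

# Live session: run every cell once, then edit cell 0 and re-run only that cell.
session = run(cells, [0, 1, 2])
cells[0] = "threshold = 20"
run(cells, [0], session)
stale_big = session["big"]          # still computed with the old threshold of 10

# A fresh top-to-bottom run reflects the edited code.
fresh_big = run(cells, [0, 1, 2])["big"]

# Delete the cell that defined `values`; the live session does not notice...
del cells[1]
run(cells, [1], session)            # cells[1] is now the `big = ...` cell; it still runs
# ...but a clean restart-and-run-all fails.
try:
    run(cells, [0, 1])
    restart_ok = True
except NameError:
    restart_ok = False

print(stale_big, fresh_big, restart_ok)
```

Tools that always re-run from the top, as Streamlit appears to from the user's side, trade away this flexibility to rule out exactly these inconsistencies.
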
Versioning and Collaboration
- While plain Jupyter notebooks can be hard to merge in Git, some notebook systems integrate robust versioning and real-time collaboration. Projects like Callisto added chat windows and cursors for a “Google Doc–like” experience, and Gigantum extended notebooks with built-in project tracking.
- Links / Tools:
  - nbdime (Diff and merge for Jupyter notebooks)
  - Gigantum (Collaboration and versioning platform for notebooks)
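
Much of the merge pain comes from outputs and execution counts churning inside the `.ipynb` JSON. A content-aware diff, in the spirit of nbdime but far simpler, can compare only cell types and sources. This sketch uses the standard library's `difflib`; the function names are hypothetical, and it does not reproduce nbdime's actual algorithm or output format.

```python
import difflib
import json


def cell_lines(notebook_json: str) -> list:
    """Flatten a notebook to lines of cell source, ignoring outputs and execution counts."""
    nb = json.loads(notebook_json)
    lines = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # nbformat allows the source to be one string or a list of line strings
        if isinstance(src, list):
            src = "".join(src)
        lines.append(f"# --- {cell.get('cell_type', 'code')} cell ---")
        lines.extend(src.splitlines())
    return lines


def diff_notebooks(old_json: str, new_json: str) -> list:
    """Unified diff of cell sources only; an empty list means the code and prose match."""
    return list(difflib.unified_diff(
        cell_lines(old_json), cell_lines(new_json),
        fromfile="old.ipynb", tofile="new.ipynb", lineterm="",
    ))


old = json.dumps({"cells": [{"cell_type": "code", "source": "x = 1\n",
                             "outputs": [], "execution_count": 1}]})
new = json.dumps({"cells": [{"cell_type": "code", "source": "x = 2\n",
                             "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}],
                             "execution_count": 9}]})
print("\n".join(diff_notebooks(old, new)))
```

Re-running a notebook without editing it then produces no diff at all, while a real code change shows up as ordinary `-`/`+` lines.
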
Exporting Notebooks as Reusable Artifacts
- Besides static HTML exports, newer approaches let you build entire Python packages from notebooks. nbdev, for instance, generates library modules, tests, and documentation from notebooks, all within the same environment.
- Links / Tools:
  - nbdev (Build libraries from Jupyter notebooks)
  - Carbide (Experimental reactive system with adjustable code variables)
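
The core idea behind exporting a library from notebooks can be sketched as: scan the code cells, keep the ones carrying an export directive, and concatenate them into a module. The `#| export` spelling below follows nbdev 2 and is an assumption here (older releases used a slightly different comment syntax). Real nbdev also handles target module names, `__all__`, documentation, and tests, none of which this hypothetical `export_module` function attempts.

```python
import json

# Assumption: nbdev 2-style directive; older nbdev releases spelled it differently.
EXPORT_DIRECTIVE = "#| export"


def export_module(notebook_json: str) -> str:
    """Concatenate code cells whose first line is the export directive into module source."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # prose cells feed the docs, not the module
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        first_line, _, body = src.partition("\n")
        if first_line.strip() == EXPORT_DIRECTIVE:
            chunks.append(body.strip("\n"))
        # unmarked code cells (examples, asserts) stay in the notebook as tests
    return "\n\n\n".join(chunks) + "\n" if chunks else ""


notebook = json.dumps({"cells": [
    {"cell_type": "markdown", "source": "# Helpers\nDoubling numbers."},
    {"cell_type": "code", "source": "#| export\ndef double(x):\n    return 2 * x\n"},
    {"cell_type": "code", "source": "assert double(3) == 6\n"},
]})
print(export_module(notebook))
```

The same notebook thus holds the implementation (exported), its documentation (markdown), and its tests (unexported asserts).
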
The “Kitchen Sink” vs. Specialized Solutions
- Many advanced features (live collaboration, reactive execution, real-time streaming data) are scattered across various tools. Yet no single notebook solution combines them all, often due to complexity and potential user-interface overload.
- Links / Tools:
  - Wrex (Automates data cleaning via program synthesis)
  - Nextjournal (Notebook platform for reproducible research)
Future Directions for Notebook Innovation
- While most products focus on data scientists today, Philip hopes to see “weird” notebooks exploring areas like augmented reality, multi-device usage, or specialized creative workflows. Pushing these boundaries could reveal new ways to learn and write code beyond the web browser.
- Links / Tools:
  - Callisto Paper (Research on real-time collaboration)
  - Pyodide (Python running entirely in the browser through WebAssembly)
Use Cases for Production Workflows
- Beyond data exploration, some teams run notebooks as scheduled jobs inside their production and DevOps pipelines. Netflix’s use of Papermill shows how a notebook run as a daily or hourly job can preserve its outputs, logs, and error state in the executed notebook for later inspection.
- Links / Tools:
  - Papermill blog post (Netflix’s approach)
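
Papermill's own entry point is `papermill.execute_notebook(input_path, output_path, parameters=...)`. The parameterization mechanism behind it can be sketched with the standard library: find the cell tagged `parameters`, insert a new cell right after it that overrides those defaults, then run every cell in order. `inject_parameters` and `run_all` are hypothetical helpers for illustration; the real tool executes through a Jupyter kernel and writes the executed notebook, outputs included, to `output_path`.

```python
import copy


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of the notebook with a cell that overrides the default parameters."""
    nb = copy.deepcopy(notebook)  # leave the template notebook untouched
    assignments = "".join(f"{name} = {value!r}\n" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "# Parameters\n" + assignments,
        "outputs": [],
        "execution_count": None,
    }
    position = 0  # with no "parameters" cell, inject at the top
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = i + 1  # right after the defaults, so the overrides win
            break
    nb["cells"].insert(position, injected)
    return nb


def run_all(notebook: dict) -> dict:
    """Run every code cell top to bottom in a fresh namespace (a stand-in for a kernel)."""
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            exec(cell["source"], namespace)  # sources are plain strings in this sketch
    return namespace


template = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]},
     "source": "run_date = '2020-01-01'\nlimit = 5\n"},
    {"cell_type": "code", "metadata": {},
     "source": "report = f'{run_date}: top {limit}'\n"},
]}
job = inject_parameters(template, {"run_date": "2020-06-11"})
print(run_all(job)["report"])
```

Because every scheduled run starts from the same template and only the injected cell differs, each saved output notebook records exactly which parameters produced it.
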
Academic Prototypes & The State of HCI Research

- From Sam and Philip’s HCI perspective, notebooks are still a young research area with ample room for experimentation. Many academic notebook ideas target real developer and researcher pain points but haven’t yet become mainstream tools.
- Links / Tools:
  - UC San Diego Cognitive Science Dept.
  - Philip Guo’s Research

Interesting Quotes and Stories

"I came into this project thinking, oh, it's probably just Jupyter plus like maybe three or four other systems. And I think the deeper we went, the more we were like, whoa." – Sam Lau

"One of the roles of academic research and prototypes is you want to push the bounds and try stuff that's very different. So I'd love to see more weird notebooks." – Philip Guo

"It looks to you as a user like the notebook re-runs from top to bottom every time, but it really uses caching and clever tracking behind the scenes." – Sam Lau on Streamlit’s execution model

Key Definitions and Terms

Notebook Environment: A tool that allows code, output, and explanatory text to live in a single document for iterative and literate programming.
Literate Programming: An approach where code and human-friendly text are interwoven, making the analysis and computational thinking more transparent.
Out-of-Order Execution: The ability to run cells in a notebook in an arbitrary sequence, which can lead to inconsistent variable states.
Reactive Programming: A style where changes to certain variables automatically trigger updates in dependent outputs, often used for live dashboards.
Papermill: A tool that parameterizes and executes Jupyter notebooks, useful for scheduled jobs and capturing final notebook states on failure.
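
The reactive-programming definition can be illustrated with a toy dependency graph: when a cell changes, only that cell and everything downstream of it is recomputed, in dependency order, so no output is left stale. In this hypothetical `ReactiveNotebook`, each cell lists its dependencies explicitly, whereas reactive notebooks such as Observable infer them from the code; the ordering uses `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy reactive model: each cell defines one name from an expression over other names."""

    def __init__(self):
        self.cells = {}   # name -> (expression source, names it reads)
        self.values = {}  # name -> current value

    def define(self, name, expr, deps=()):
        """Add or edit a cell, then recompute it and everything downstream of it."""
        self.cells[name] = (expr, tuple(deps))
        self._recompute(self._downstream(name))

    def _downstream(self, changed):
        """The changed cell plus every cell that depends on it, directly or transitively."""
        affected = {changed}
        grew = True
        while grew:
            grew = False
            for name, (_, deps) in self.cells.items():
                if name not in affected and affected.intersection(deps):
                    affected.add(name)
                    grew = True
        return affected

    def _recompute(self, names):
        # Map each affected cell to the affected cells it reads, then evaluate in dependency order
        graph = {n: set(self.cells[n][1]) & names for n in names}
        for name in TopologicalSorter(graph).static_order():
            expr, _ = self.cells[name]
            self.values[name] = eval(expr, dict(self.values))


nb = ReactiveNotebook()
nb.define("price", "100")
nb.define("quantity", "3")
nb.define("total", "price * quantity", deps=["price", "quantity"])
nb.define("label", "f'Total: {total}'", deps=["total"])
nb.define("price", "200")  # total and label update without being re-run by hand
print(nb.values["label"])
```

Contrast this with the out-of-order execution entry above: here editing `price` cannot leave `total` holding a value computed from the old price.
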

Learning Resources

If you want to deepen your Python knowledge, especially if you’re newer to Python, here are some resources you might enjoy:

Python for Absolute Beginners: This course guides you through Python essentials with no prior coding experience required.
Data Science Jumpstart with 10 Projects: Ideal if you want a practical foray into working with data in Python through multiple hands-on projects.

Overall Takeaway

Notebooks have revolutionized the Python data science workflow but exist in an evolving, increasingly diverse ecosystem. From the mainstream popularity of Jupyter to highly specialized or experimental notebook systems, these environments continue to expand what’s possible for scientists, developers, and educators alike. At the same time, challenges around collaboration, versioning, and execution order remain ongoing research and development frontiers. Sam Lau and Philip Guo’s work highlights both the richness of today’s notebook landscape and the exciting potential for future innovations.

Links from the show

Sam on Twitter: @samlau95
Philip's site: pgbovine.net
The paper (PDF download): computational-notebooks-design-space_VLHCC-2020.pdf
NBInteract: nbinteract.com
nbstripout: pypi.org/project/nbstripout
Audio live coding: foxdot.org
nbdev: github.com/fastai/nbdev
Pyodide episode: talkpython.fm
Carnets: holzschu.github.io/Carnets_Jupyter

Episode #268 deep-dive: talkpython.fm/268
Episode transcripts: talkpython.fm

Episode Transcript

00:00 Are you using interactive notebooks for your data exploration or your day-to-day programming?

00:04 What environments do you use?

00:06 Was it Jupyter and now you've made your way over to JupyterLab?

00:08 That's a great choice, but did you know there are more environments out there to choose from and to compare?

00:13 Have you heard of Callisto or Iodide?

00:15 How about CoCalc or PolyNote?

00:17 Those are just the tip of the iceberg.

00:19 That's why I'm happy to have Sam Lau and Philip Guo here to share their research,

00:24 comparing and categorizing over 60 notebook environments.

00:27 This is Talk Python To Me, episode 268, recorded June 11th, 2020.

00:32 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:52 This is your host, Michael Kennedy.

00:53 Follow me on Twitter where I'm @mkennedy.

00:56 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

01:02 This episode is brought to you by Linode and Sentry.

01:05 Please check out what they're offering during their segments.

01:07 It really helps support the show.

01:09 Sam and Philip, welcome to Talk Python To Me.

01:12 Pleasure to be here.

01:12 Same here.

01:13 Yeah, nice to have you on the show, Sam.

01:15 Philip, welcome back.

01:16 You've been on the show a few times.

01:18 You were one of the very first guests way back when, diving into the CPython source code internals.

01:23 And then we've talked about a bunch of fun topics, you know, coding in your golden years and maintaining open source projects individually and things like that.

01:32 It's been a lot of fun to have you on the show and nice to have you back.

01:34 Yeah, it's great to be back.

01:35 And yeah, that was my first ever podcast with you.

01:38 That was, I think, almost five years ago.

01:40 It's good to be back as a podcast veteran now.

01:43 That's right.

01:44 You know the whole deal.

01:44 And Sam, welcome to the show.

01:46 Before we get to the main topic, I always like to set the stage for where the guests are coming from.

01:52 So when somebody says, you know, like, I think you should really do this, like, they might say that because they're a biologist using Python, not a person running Facebook or the other way around, right?

02:02 So maybe just tell us really quickly about your background and how you got into Python.

02:06 My background in Python came from, actually, back in high school, I ran into the first wave of, like, massive open online courses.

02:15 Right.

02:17 Right.

02:18 And so, like, at my school, there wasn't much emphasis on programming much at all.

02:23 And so when I ran into, when I found out about those courses, I got really into it and decided to major in computer science at UC Berkeley and then ended up doing, ended up starting this PhD at UCSD with Philip studying cognitive science.

02:36 Cognitive science is really interesting.

02:38 I've talked to a few folks from there.

02:40 There's a lot of interesting computational CS type problems happening in cognitive science.

02:46 Yeah.

02:46 And data science as well.

02:47 Absolutely.

02:48 I worked for a while at a company that was basically a spinoff from a research lab that was a bunch of cognitive science PhDs.

02:55 And we did all sorts of cool stuff with eye tracking and programming.

02:58 And, man, it's just a lot of fun.

02:59 I really enjoyed my time in that space.

03:01 Yeah.

03:01 People here are, cognitive science is so broad that you have people here using programming for all sorts of tasks.

03:07 Like, we have people here training deep neural networks, but also people studying fMRI signals and processing those and working with neuroimaging.

03:18 And so you just get a really diverse, broad range of perspectives and uses for Python.

03:23 Yeah.

03:23 It's interesting, too, because I feel like there's still a lot of open questions.

03:26 Yeah.

03:27 There's a lot we don't know.

03:28 And it feels like the more we know, the more we know about what we don't know.

03:31 Yeah, exactly.

03:33 Every answer opens up to more questions.

03:36 That's right.

03:36 Exactly.

03:37 All right.

03:37 Well, we're going to talk today about the research that the two of you have done in the space of notebooks, as in computational notebooks, Jupyter notebooks, that type of thing.

03:48 And it turns out, actually, there's more than just IPython and Jupyter, right?

03:53 Quite a few more.

03:54 Yeah.

03:55 So in our paper, we talk about a total of 60 different notebook systems that we put together and try to draw out some patterns from.

04:03 Yeah, it's really interesting.

04:04 I mean, people, I think Jupyter, IPython, when it was first started, became Jupyter, really changed the way people thought about programming.

04:12 I mean, there were tools like this.

04:13 Maybe you guys can throw out some that felt kind of like that.

04:17 So when I say that, the one that comes most to mind is, I think, Mathematica or Maple, right?

04:24 They're really beautifully visual, but then they also had computation, but they also had like output, you know, MATLAB, but MATLAB's not quite as pretty, for whatever reason, more utilitarian, I guess.

04:36 And when you thought about programming, that was, you know, maybe if you're trying to emulate that, it would be a script, and then it would like output some stuff, or it would pop up a window, it'd save like a JPEG or something.

04:46 But the notebooks really brought those two worlds together.

04:49 Say, look, you can write code in a, like a pro programming language in a way that you really would use like libraries and stuff that are not just baked into these tools.

04:57 But then they have this kind of interactive world.

05:00 I think it's really interesting to see all these derivative concepts, right?

05:04 We've got Jupyter that had its way of doing things, but there's challenges there, and there's a lot of variation.

05:10 Like, were you surprised to find this many different systems around?

05:13 I certainly was.

05:14 I think I came into this project thinking, oh, it's probably just Jupyter plus like maybe three or four other systems.

05:21 And I think the deeper we went, the more we were like, whoa.

05:25 People are really taking Jupyter and changing it, adding new things to it, making it fit their use case more effectively.

05:36 And there's a lot of, it turns out that there's a lot of different groups studying Jupyter and notebook systems in general and trying to figure out how they can make it better for themselves.

05:45 Yeah, I think I'd just add to that, to what Sam said, was that, you know, we started this project.

05:50 So this is, this paper is co-authored by Sam, one of my other PhD students, Ian Drosos, one of my undergrad students, Julia Markel, and myself.

05:59 And we basically just kind of split up the work and just went to find a lot of these notebook systems.

06:04 And the motivation for this was that, you know, in the academic literature, in the kind of academic world of studying programming tools, in the past few years, there's been a lot of people studying how people use notebooks.

06:15 I believe Adam Rule was on your podcast, Michael, was Adam?

06:19 It sounds familiar.

06:21 Yeah, he was about the Jupyter, the, you know, analyzing the million Jupyter notebooks.

06:25 Yes, that's right.

06:26 Yeah, that's right.

06:27 Yeah.

06:27 So there's a bunch of studies around Jupyter notebooks and how people use them for science and stuff, and then people developing new prototypes.

06:34 And at the same time, there's a bunch of industry people, you know, every startup company is like, we're going to do a notebook, we're going to do this other notebook, this other notebook.

06:40 So it became like this wild west.

06:41 That's right. Google Colab, Datalore, Azure Notebooks, all those, yeah.

06:44 Yeah, so it became like this wild west, and we felt it was around time for this sort of what we call a survey paper or a meta-analysis, right?

06:51 It's like, this paper is not doing original research in that we're not making a new notebook, but we are collecting together dozens of these notebooks from both academic prototypes and also industry products and trying to, like,

07:04 analyze them and see, you know, how does this space fit together?

07:06 So I got a few of my students together, and we kind of brainstormed and looked for a lot of notebooks and stuff.

07:12 And then Sam, who is the first author on this, kind of, you know, did a lot of the leadership on this work.

07:17 So I'll let him talk more about the details, because I forgot all the details, so I just delegate them.

07:22 Yeah, sure.

07:22 Yeah, and your primary research is in human-computer interaction?

07:26 Right, right.

07:27 So my primary research area that my students are in is called HCI, or human-computer interaction.

07:31 And the kind of industry term for this is, you know, people know it as UX design, right?

07:36 User experience design, user interface design.

07:39 And, you know, HCI, or human-computer interaction, is sort of the academic term for the research that goes into how do you make better user experiences for, you know, different kinds of users.

07:49 In this case, it's for scientific programmers and data scientists.

07:52 Yeah.

07:53 Okay.

07:54 Now, I think probably we should start this conversation with a definition, because I kind of opened it saying, well, Jupyter is the prototypical thing of what a notebook is.

08:05 And it's kind of the natural progression of something like Mathematica.

08:09 But what would you all define a notebook system to be?

08:14 So, like, how do other things that are, you know, not exactly that fit into this world?

08:19 In this paper, we define a computational notebook as a system that supports literate programming, where you can blend a text-based programming language with program outputs in a single document.

08:32 Right.

08:32 And this is not a super new idea, right?

08:34 This, I mean, Donald Knuth, like in the 80s or something, talked about this idea, right?

08:38 Yeah.

08:38 And actually, I think the entire TeX program, like LaTeX, the TeX that Donald Knuth made, was a literate program to begin with.

08:45 He was like the, I would say, like most prominent proponent of literate programming.

08:50 Okay.

08:50 So, notebooks are these embodiments of this literate programming idea.

08:54 Basically, I guess I'm hearing that it's the ability to blend together the presentation of, like, prose almost, and computation and visualization around that.

09:07 Yeah, yeah.

09:08 And it kind of shifted the dynamic of how we view a program.

09:12 Instead of having a program be instructions for a computer, it allowed us to see programs as kind of like the outputs themselves, as a thing that you would show to other people in the end.

09:22 Yeah, because traditionally, it would be, here's the graph.

09:25 Exactly.

09:26 We ran the analysis, and this is the output.

09:28 Here's the description of the output, maybe a description of the algorithm, and here's the output.

09:33 And, you know, there's been a big push in the scientific space for reproducibility.

09:37 And I think also just in the data science space for explorability, right?

09:42 Like, I've got data.

09:43 I don't really know what it is.

09:44 I need to just get in there and play with it before I turn the algorithms loose on it.

09:48 And that's kind of what's happening here, right?

09:50 Yeah, definitely.

09:51 I would say many of the projects that we looked at in this paper were birthed out of this, directly out of, like, the reproducibility issue, where some person might write a script to generate a graph somewhere, but then lose it.

10:04 And then all you have now is, like, a graph in the end.

10:06 If you had a notebook, the pitch is, if you had a notebook, you wouldn't need to have a separate graph, like a JPEG, and then a separate program to run it.

10:14 You could just have it in one place.

10:16 Right, right, right.

10:17 How often does version control in the science space or the sort of beginner developer space look like a bunch of zip files where they're named the thing and then the date or the thing of the date and then, like, two and then three?

10:30 Oh, yeah, exactly.

10:31 Something like that.

10:32 It's like zero five underscore final final V3, you know?

10:35 Exactly.

10:37 Final edited.

10:38 That's not amazing.

10:39 So having something in here that sort of brings a little more formality and brings, I guess, the paper and the presentation and the script or code that ran it, like, into one thing that's versioned continuously seems a little bit better.

10:52 Yeah, and actually, the basic way to version control a Jupyter notebook is to put it in a version control system like Git.

11:00 But oftentimes, it's not good enough for actual scientists who have, like, the notebook files themselves, but also their data and also the libraries that the notebook uses to generate the outputs.

11:11 And so there's a group of projects that we surveyed that version not only the notebook but also its dependencies.

11:17 Okay.

11:18 Yeah, very nice.

11:19 One of the challenges, however, I think it's ironic, like, it brings stuff together here to make the code and the analysis and the presentation all as one.

11:28 But it actually, Jupyter notebooks are not very easy to version.

11:30 Quite hard, actually.

11:32 Yeah.

11:32 Yeah, like, if they store the last run of the output, and if that is dynamic in the sense that it's, like, not every time you run it, you get the same output.

11:40 That can be a, I mean, even if it changes, right?

11:43 Even if it is that case, it's still hard.

11:44 But if it's, like, changes every time you run it because it pulls something from the internet that varies or whatever, it's basically always a merge conflict.

11:51 So there's tools like nbstripout, which are nice, that are, like, pre-commit hooks that'll let you do that, and other ones.

11:57 But still, it's, I think it's, these notebooks, they come both with, like, a new power and accessibility, but also their own challenges.
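
What a tool like nbstripout does can be sketched in a few lines: load the notebook JSON, clear each code cell's outputs and execution count, and write it back with stable formatting so that two runs of the same code produce the same file. This `strip_outputs` function and the `make_run` helper are hypothetical illustrations, not nbstripout's implementation; the real tool is usually wired into a repository as a Git filter (for example via `nbstripout --install`) so stripping happens automatically when files are staged.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Clear outputs and execution counts from code cells, keeping code and prose intact."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Sorted keys and fixed indentation keep the committed text stable between runs
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


def make_run(printed: str, count: int) -> str:
    """Hypothetical helper: the same one-cell notebook after a run that printed `printed`."""
    return json.dumps({
        "cells": [{
            "cell_type": "code",
            "metadata": {},
            "source": ["import random\n", "print(random.random())"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": [printed]}],
            "execution_count": count,
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    })


monday, tuesday = make_run("0.42\n", 3), make_run("0.97\n", 8)
print(monday == tuesday)                                # raw files differ
print(strip_outputs(monday) == strip_outputs(tuesday))  # stripped files are identical
```

The trade-off is that outputs no longer travel with the committed notebook, which is why some of the surveyed systems version outputs, data, and dependencies together instead.
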

12:05 Yeah, exactly.

12:06 Yeah.

12:06 So in your paper, one of the things you all talked about is you said around notebooks, they have these several themes.

12:12 They have end-user programming, exploratory programming, live programming, and literate computing.

12:19 So what do you mean by, like, live programming?

12:23 I understand end-user programming, which is pretty cool.

12:25 Exploratory makes sense.

12:27 But what's the live here?

12:28 So, yeah, so there are a few kinds of definitions of live programming.

12:31 One definition comes from kind of the creative space.

12:34 So there are these people who are graphic artists or musicians who actually literally, you know, they're streaming on Twitch or they're doing these, you know, pre-COVID, I guess.

12:42 They're doing these live performances on stage where they're using some kind of graphical software, you know, Max/MSP or whatever, to put together live music performances or graphical art.

12:52 Another form of live programming that I think what we mean in this paper is more like a programming environment that updates live, right?

12:59 So as you're typing the code, it's actually just constantly running it and generating output.

13:03 So some of the notebook systems we've looked at allow you to kind of do this live programming where as you start typing, it might autocomplete.

13:10 It might start generating graphs for you just to kind of cut down on this edit-run loop and such.

13:16 Right, right.

13:17 It's not this I'm going to edit when it's time to hit run, I run.

13:20 Or, you know, maybe in some languages there's a compile, then a link, then a run, and there's a big delay.

13:25 Would something like some of the features in the editors, like PyCharm and probably VS Code, I don't know what plugin you need for it, but where it's like continuously running unit tests just as you type?

13:35 Yeah, I mean, that's a form of liveness, right?

13:38 The kind of, you know, whatever.

13:39 And even like, you know, even the squiggly lines, right?

13:42 The IntelliSense and the, you know, linting and those things.

13:45 You know, anything that kind of cut down on the, you know, the friction of having to write a bunch of code.

13:50 Because especially for beginners, right?

13:51 So for beginners and for people who might be scientists and people who may not be as programming experienced, you know, one thing you often see beginners do is they write a lot of code, right?

14:00 They write a lot of code and they sit and they hit run.

14:02 They're like, why doesn't it work, right?

14:03 Whereas more experienced programmers kind of know to write a line, check it, print something out, make sure it works, and then write the next line, write the next section and such.

14:11 Yeah, it's easy to do a bunch of work and then have it, especially when you're new, and have it not come out the way you like.

14:16 And you're like, well, now it's broken.

14:18 Oh, no.

14:18 What am I going to do, right?

14:20 It's small bites, of course, small steps.

14:22 This portion of Talk Python To Me is brought to you by Linode.

14:28 Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level.

14:37 With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the next-generation network,

14:48 Linode delivers the performance that you expect at a price that you don't.

14:51 Get started on Linode today with a $20 credit, and you get access to native SSD storage, a 40-gigabit network, industry-leading processors, their revamped cloud manager at cloud.linode.com, root access to your server, along with their newest API, and a Python CLI.

15:08 Just visit talkpython.fm when creating a new Linode account, and you'll automatically get $20 credit for your next project.

15:16 Oh, and one last thing.

15:17 They're hiring.

15:18 Go to linode.com slash careers to find out more.

15:21 Let them know that we sent you.

15:25 When you had live here, and you talked about the live, sort of interactive, almost onstage type of stuff, I was thinking of FoxDot and the live, like, musical composition with Python.

15:36 Have either of you guys seen that?

15:38 No, I haven't seen this.

15:38 I have not.

15:39 I think it's called FoxDot.

15:40 I'll put a link of a presentation.

15:42 So there's this thing called FoxDot, which is Python-based.

15:45 People can go up there, and they basically create little, like, symphonies.

15:48 Oh, that is awesome.

15:50 And with Python, and it's super interesting.

15:52 Anyway, I was going to ask your thoughts on that as a sort of HCI person.

15:56 Yeah, yeah.

15:57 I'll get it next time.

15:58 Yeah, I grabbed a, yeah, I'm just looking at the page now.

16:01 But yeah, that's a whole other world.

16:02 I mean, it's an awesome world.

16:03 I mean, this whole world of just live performance and, you know, live streaming and stuff is fascinating.

16:08 You know, maybe the next Twitch stars are going to be data scientists, right, who are, you know, live streaming their exploratory analysis.

16:14 There are some data scientists I've seen who do these live streams or do these, you know, recorded sessions.

16:19 Like, let me just play with this data or do this Kaggle thing and just watch me play with it, which is really cool.

16:24 That's really, I think you're right.

16:26 I think it's super interesting.

16:27 I feel like a lot of algorithms and libraries and just software products, I guess you'd call them.

16:34 It could be very stale, you know, much like math or science.

16:38 Like, here's the final result.

16:39 Here's the algorithm or the formula.

16:41 But that doesn't give you an appreciation of, like, seeing the, how do I, like, bumble around and, like, discover and think and try and explore and then come up with an answer.

16:51 I think that would be really interesting for data science and, like, Kaggle competitions and stuff.

16:55 Yeah.

16:55 And I think that segues to our notebook work, too, because I feel like a lot of these notebooks, and maybe Sam is going to talk more on this later, is that, like, a lot of these sorts of interfaces could support this sort of liveness and this sort of iteration and keeping different versions around and seeing how you, you know, bumble your way toward a final result.

17:11 Yeah, yeah.

17:11 Maybe the next big particle physics discovery Nobel Prize will be live streamed.

17:16 All right.

17:16 So, Sam, let's get to the analysis.

17:19 So, you said there were a bunch of different environments that you studied and you broke them into three categories, the academic world, the product world, and the R&D world.

17:31 Maybe give us a sense of what's in each one.

17:34 I don't know if we want to go through all of them, but read off maybe, like, five from each of those that you feel like is representative of that area, just to give people a sense of what's all out there.

17:42 Yeah.

17:43 So, probably start with the industry world because that world is probably the most familiar to us and maybe to your listeners as well.

17:51 So, in the product world, we kind of have Jupyter Notebook and we also have, like, Google Colab, which is Google's version of Jupyter Notebook, as well as Azure Notebooks, which is kind of like this up-and-coming notebook system for JavaScript.

18:15 Okay. Oh, right, right. Yeah. There is this move to try to make a JavaScript equivalent, basically, of what Python has, right? I mean, it's pretty natural to run JavaScript, but the trick is to find the libraries, right?

18:28 Exactly. Yeah.

18:29 And maybe integers.

18:30 Yeah.

18:31 No integers in JavaScript can make that a little hard, but still, very cool.

18:36 All right. And then, next category is academic notebook projects. And these projects typically come in the form of, like, papers. So, people will prototype some interaction and then submit a paper in a human-computer interaction conference.

18:53 And these papers typically have, like, they take a base notebook system and extend it in one particular way. For example, the Callisto system takes Jupyter Notebook and adds in, like, some live collaboration tools, like a chat window, as well as, I think, being able to see other people's cursors and where they are on the screen.

19:12 A little bit like a Google Doc-ification of Jupyter.

19:15 Yeah, yeah. That's what they're going for. We have tools like Wrex, which is a tool specifically for data cleaning. The idea there is that you could display a data frame in your notebook and then, by example, show Python, in a sense, what you want the data to look like after cleaning. And it tries to infer the Python code that would change the data in the way that you want.

19:41 Oh, that's pretty cool. And so, it uses a bit of machine learning or something?

19:45 Yeah, it uses what's called program synthesis. That actually was a project led by Ian, who was my other PhD student and a co-author on this paper. He did this at a Microsoft Research internship. They have this program synthesis technology, this engine that basically synthesizes pieces of code, and they applied it to notebooks. So what you can actually do is, in a data table, you can say, here are some examples of the way I want to clean the data, and then it'll infer the code. I don't think it's actually machine learning, because it's not data-driven. It's more rules-based.

20:15 Right, right, right, right.
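To make the programming-by-example idea concrete, here is a minimal sketch. It is not Wrex's engine; it assumes a hypothetical, tiny list of candidate string transformations and simply returns the first one that is consistent with every before/after example the user supplies. Like the approach described above, it is rules-based enumeration rather than machine learning.

```python
from typing import Callable, Optional

# Hypothetical candidate programs: (readable code, function that implements it).
# A real synthesizer searches a far larger, structured space of programs.
CANDIDATES: list[tuple[str, Callable[[str], str]]] = [
    ("s.strip()", lambda s: s.strip()),
    ("s.lower()", lambda s: s.lower()),
    ("s.strip().lower()", lambda s: s.strip().lower()),
    ("s.strip().title()", lambda s: s.strip().title()),
    ("s.split(',')[0].strip()", lambda s: s.split(",")[0].strip()),
]

def synthesize(examples: list[tuple[str, str]]) -> Optional[str]:
    """Return the code of the first candidate that maps every input to its expected output."""
    for code, fn in CANDIDATES:
        if all(fn(inp) == out for inp, out in examples):
            return code
    return None  # no candidate explains all the examples

# The user fixes two cells by hand; the search infers a transformation for the column.
examples = [("  ALICE smith ", "Alice Smith"), ("bob JONES", "Bob Jones")]
print(synthesize(examples))  # s.strip().title()
```

Returning readable code, rather than just cleaned values, is the part that fits notebooks: the inferred line can be pasted into a cell and reused.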

20:45 So, you kind of have, like, these experimental, but also, like, you look at it and you're like, well, that could be useful in a real system.

20:53 Yeah, yeah, yeah, I could see that for sure. You maybe just drop a marker, like, hey, I'm going to go explore this; a save point type of thing, in case you want to get back here.

21:02 Exactly.

21:03 And then, right, behind the scenes it'll just create a Git branch and start doing stuff for you, right? That'd be really nice for people who don't know how to deal with Git branches, or don't want to deal with them.

21:14 Yeah, that's exactly the use case that they were going for. It's for this very exploratory, early stage, like, data exploration. Yeah.

21:21 Yeah. Yeah. Okay, cool. The one in the product space that seems to be closest to that is Gigantum. I had those guys on the show and there's a little bit of, like, auto-versioning collaboration stuff in that world.

21:34 Yeah, yeah. We looked at Gigantum in this paper and I think we did mark it down as being relatively similar to that.

21:40 Yeah, yeah, cool. All right, the third category was experimental and R&D.

21:43 Yeah. So experimental R&D is, in a sense, very much like the academic projects in that they're trying to test out new interactions or new features that don't have immediate industry use cases or business propositions.

22:00 And so we have tools like Carbide, which is an interesting one. Carbide creates widgets for your Python variables inline with your Python code.

22:12 And then it allows you to, like, use sliders to change your variables on the spot.

22:17 Okay.

22:18 And it's, like, way out there. It's really interesting. It actually infers what values your Python variables should take on to produce an output.

22:25 So if you use a widget to say, like, oh, I want Y to be, like, 32, then if Y depends on X, then it'll also update X at the same time.

22:35 So it's weird. It looks like it's doing some really intriguing program synthesis, or program inference, behind the scenes to get your program to look right.

22:44 Right, to look at the dependencies and figure out what it has to rerun.

22:46 Yeah, yeah.

22:47 Yeah, that's really cool. It sounds a little bit like the interactive widgets in Jupyter that you can have, you know, you can sort of put some sliders and adjust some things.

22:55 But I don't believe the Jupyter one has a lot of dependency type of stuff down the line, right? It's just that this one cell recomputes as you slide the widget.

23:03 Yeah, yeah. Like, Carbide is like those Jupyter widgets, but super next level. The widgets can change the code itself.

23:09 Okay, that's pretty wild.
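The dependency tracking discussed above ("look at the dependencies and figure out what it has to rerun") can be sketched in plain Python. This is an illustration under stated assumptions, not how Carbide or Jupyter widgets work: each cell is a code string, the standard `ast` module records which names a cell assigns and reads, and changing one variable marks every downstream cell that reads it, directly or transitively, for a rerun.

```python
import ast

def names_in(source: str) -> tuple[set[str], set[str]]:
    """Return (names written, names read) for one cell, using ast.Name contexts."""
    written, read = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                written.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)
    return written, read

def cells_to_rerun(cells: list[str], changed: str) -> list[int]:
    """Indices of cells that read `changed`, directly or via values derived from it."""
    dirty_vars, dirty_cells = {changed}, []
    for i, src in enumerate(cells):  # assumes cells are already in a valid top-to-bottom order
        written, read = names_in(src)
        if read & dirty_vars:
            dirty_cells.append(i)
            dirty_vars |= written  # anything this cell writes is now stale too
    return dirty_cells

cells = [
    "x = 10",          # cell 0: the value a slider widget would control
    "y = 2 * x + 12",  # cell 1: depends on x
    "label = 'hi'",    # cell 2: independent of x
    "print(y)",        # cell 3: depends on y, so indirectly on x
]
print(cells_to_rerun(cells, "x"))  # [1, 3]
```

Moving the slider for `x` would rerun cells 1 and 3 and skip cell 2. Carbide's backward inference (pick a value for `y`, solve for `x`) goes further than this forward propagation.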

23:10 Yeah. We have a few industry projects here as well. We have projects from, like, Netflix and Stripe, where they use notebooks as recurring jobs instead of just one-off exploratory scripts.

23:29 So the notebook might automatically rerun, like, once a day, to perform some data operation.

23:36 They're kind of, like, broadening the scope of what notebooks can be used for in a large-scale setting.

23:40 Yeah, you know, the stuff that Netflix was doing with, like, Papermill was super interesting.

23:45 How they're using it to automate a bunch of stuff and sort of schedule a bunch of things to work, but also the fact that if it fails, the notebook has the output of the failure.

23:56 So basically, you can, you know, save the notebook in that state and just go look at it and see what went wrong.

24:02 Rather than a log message that says something went wrong, there's the whole sequence with the history of the notebook output, which is pretty awesome.

24:09 Yeah. I think the Papermill stuff, I mean, they have a blog post about it, but we link to all of this in the paper.

24:13 So if you, you know, look at our paper, we have all the URLs.

24:15 I mean, I thought that was interesting.

24:17 I mean, we didn't really capture this in our space because, you know, most of it is for data scientists.

24:22 But the Papermill project is interesting because it's using notebooks for, like, DevOps and for, you know, production deployment work, right?

24:28 It's like really using the notebooks as a kind of a DevOps-y sort of tool rather than a direct data science tool.

24:34 So I thought that was a really interesting and out-of-the-box use case.

24:37 It's like instead of running cron jobs and seeing, like, gigantic log files on your terminal or grepping through text files,

24:44 you can just, you know, reconstruct it in a notebook and actually use a notebook to debug your production software, which is cool.

24:50 Yeah. And the essence of that, I think, was one of the things you do with Papermill, if I remember correctly,

24:56 is the notebooks can take inputs and they can generate outputs.

24:59 They sort of become functions in the traditional black-box sense.

25:03 Yeah.

25:04 Yeah. That's pretty neat.
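Papermill's own entry point is `papermill.execute_notebook(input_path, output_path, parameters=...)`, which runs a real `.ipynb` file on a Jupyter kernel. The sketch below does not use that library; it is a hypothetical, standard-library mini-runner that imitates the three ideas mentioned here: override values are injected after a cell tagged `parameters`, cells run strictly top to bottom, and a failure is saved inside the executed notebook instead of a separate log. The in-memory notebook (plain dicts with string sources) is a simplifying assumption.

```python
import contextlib
import copy
import io
import traceback

def run_notebook(nb: dict, parameters: dict) -> tuple[dict, dict]:
    """Hypothetical mini-runner: inject parameters, run code cells top to bottom,
    and return (executed copy of the notebook, final variable namespace)."""
    nb = copy.deepcopy(nb)  # never mutate the input notebook
    cells = nb["cells"]
    # Insert an override cell right after the cell tagged "parameters".
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            injected = "\n".join(f"{k} = {v!r}" for k, v in parameters.items())
            cells.insert(i + 1, {"cell_type": "code",
                                 "metadata": {"tags": ["injected-parameters"]},
                                 "source": injected})
            break
    namespace: dict = {}
    for cell in cells:
        if cell["cell_type"] != "code":
            continue
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(cell["source"], namespace)
        except Exception:
            # Record the failure in the notebook itself, then stop like a batch job would.
            cell["outputs"] = [{"output_type": "error", "traceback": traceback.format_exc()}]
            break
        cell["outputs"] = [{"output_type": "stream", "text": buf.getvalue()}]
    return nb, namespace

notebook = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]},
     "source": "day = 'monday'\nthreshold = 10"},
    {"cell_type": "code", "metadata": {}, "source": "count = len(day) * threshold\nprint(count)"},
    {"cell_type": "code", "metadata": {}, "source": "assert count < 100, 'too many rows'"},
]}

executed, ns = run_notebook(notebook, {"day": "friday"})
print(ns["count"])  # 60
```

Calling `run_notebook(notebook, {"threshold": 50})` instead makes the last cell fail, and the returned notebook carries the traceback in that cell's outputs, which is the "open the notebook and see what went wrong" workflow described above.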

25:05 Another one I see that you have in there is nbdev.

25:08 nbdev is a really interesting project.

25:10 Brian Okken and I talked about it a couple of episodes ago on Python Bytes.

25:15 And I was like, oh, well, this might be kind of interesting.

25:18 And I looked and I'm like, wow, this is something really quite special.

25:21 It does a lot of neat things over there.

25:24 Right. So if I'm remembering correctly, nbdev allows you to take a notebook and, like, deploy it as a Python library.

25:32 Yes, exactly. Yeah. You can basically turn it into a package.

25:35 Right. And it can have the test, like, built into it.

25:38 And, yeah, it's, again, a really interesting example of, like, notebooks are kind of being reused in all sorts of ways that we would see traditional Python scripts being reused.

25:47 So in this case, instead of writing a set of Python files to become a Python package, you could write a notebook instead.

25:54 And that could be, like, your package.

25:56 Yeah. I'll give you the quick rundown, straight from their highlights, for people who are listening and might be interested.

26:01 So you can export the functionality of a notebook to a library.

26:04 They have CLI commands you can use to interact with nbdev.

26:08 Export to HTML builds documentation for your library.

26:11 Sync brings the Python back.

26:13 So you can edit the Python code in the library and then bring those changes back into your notebooks.

26:17 You can put the tests in there and run them in CI.

26:20 It also has this, like, nbstripout concept for checking stuff in.

26:24 So, yeah, that's, I mean, there's a bunch of exploration and innovation happening here.
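The core of "export a notebook to a library" is small enough to sketch. In nbdev, cells meant for the package carry an export directive (written `#| export` in recent versions), and the CLI writes them into modules. The function below is a hypothetical, standard-library imitation of only that step; it does not reproduce nbdev's behavior, file layout, or documentation generation.

```python
import json

# Export directives this sketch recognizes (nbdev-style; an assumption, not nbdev's parser).
EXPORT_MARKERS = ("#| export", "#export")

def export_cells(notebook_json: str) -> str:
    """Collect code cells whose first line is an export directive into module source."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])  # .ipynb stores source as a string or a list of lines
        first, _, body = source.partition("\n")
        if first.strip() in EXPORT_MARKERS:
            chunks.append(body.strip("\n"))
    header = "# AUTOGENERATED from a notebook; edit the notebook instead.\n\n"
    return header + "\n\n\n".join(chunks) + "\n"

nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Utilities"]},
    {"cell_type": "code", "source": ["#| export\n", "def add(a, b):\n", "    return a + b\n"]},
    {"cell_type": "code", "source": ["assert add(2, 3) == 5  # a test that stays in the notebook\n"]},
]}
module = export_cells(json.dumps(nb))
print(module)
```

The unmarked cell holding the `assert` stays in the notebook, which is how tests can live next to the code they check and still run in CI.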

26:29 Yeah, it's really interesting.

26:30 One other interesting project in the experimental section is, like, Iodide.

26:34 Iodide is, like, super experimental.

26:37 But they compiled Python and some of its data science libraries into WebAssembly.

26:44 And because of that, you can write and run Python in the browser.

26:49 And it doesn't require a separate Python process to run that.

26:53 Instead, it runs completely within the browser because your browser is interpreting the WebAssembly version of, like, the Python interpreter.

27:00 And, like, I think they have, like, NumPy and, like, Matplotlib support in there, too.

27:05 Yeah, they've recompiled many of the core, what would be C-based Python libraries as well, right?

27:11 Like you said, like, NumPy and maybe Matplotlib.

27:13 I can't remember.

27:14 But some of those types of libraries.

27:15 Right.

27:15 It's really, it took a while for me to wrap my mind around it.

27:19 But I was really amazed when I first saw that.

27:21 I think WebAssembly has some real interesting possibilities to bring a lot of this stuff together.

27:25 You know, it's one of the things that's interesting about some of the stuff you were covering in these different projects here is just what it means to run the notebook, right?

27:34 Sometimes that means a Docker image is spun up and configured exactly for you in some environment or even, like, with Gigantum on your local machine.

27:42 With Google Colab, there's, like, a way to say, run this on a GPU, by the way.

27:47 Yeah.

27:47 Just to get a GPU to run it on.

27:49 And Iodide runs in the browser through WebAssembly.

27:53 There's just, there's a bunch of options here that are pretty cool.

27:56 Yeah.

27:56 Kind of to Michael's point, it's like, you know, when we were just informally starting this project,

28:01 We've kind of heard about all these.

28:02 It's felt like the Wild West, right?

28:04 It's like, oh, you can run the notebooks locally.

28:05 You can run some JavaScript.

28:07 Observable runs in the browser.

28:08 And, like, you know, Iodide uses Pyodide, which compiles Python

28:10 to WebAssembly that runs in the browser.

28:12 And then some other ones run in the cloud.

28:14 And then some of them have this live editing.

28:16 Some of them don't.

28:16 So that was kind of the motivation for putting together this paper: how do we categorize all this in the most concise way we can?

28:25 And, I mean, we're going to talk about more of this detail, but we have one category about runtime environment, right?

28:29 So we categorize them: here are the four main types of runtime environments people have, and other things.

28:34 So the kind of main contribution of this paper was really people have all these intuitions.

28:38 They've heard about these notebooks doing all these different things.

28:40 Let's just try to plop them all together in a map, in a sense, right?

28:44 Like, where does everything lie on this wild, this high-dimensional space?

28:47 Yeah, it's highly dimensional.

28:49 We're going to talk about the dimensions in a minute.

28:51 It's more than three dimensions.

28:52 This portion of Talk Python To Me is brought to you by Sentry.

28:58 How would you like to remove a little stress from your life?

29:00 Do you worry that users may be having difficulties or are encountering errors with your app right now?

29:06 Would you even know it until they send that support email?

29:09 How much better would it be to have the error details immediately sent to you, including the call stack and values of local variables, as well as the active user stored in the report?

29:20 With Sentry, this is not only possible, it's simple and free.

29:23 In fact, we use Sentry on all the Talk Python web properties.

29:26 We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email.

29:33 That was a great email to write back.

29:35 We saw your error and have already rolled out the fix.

29:37 Imagine their surprise.

29:38 Surprise and delight your users today.

29:41 Create your free account at talkpython.fm/sentry and track up to 5,000 errors a month across multiple projects for free.

29:49 And if you use the code Talk Python, all one word, it's good for two free months of Sentry's team plan, which will give you up to 20 times as many monthly events and some other features.

30:00 So create that free account today.

30:06 One thing I do want to talk about maybe first is just like some of the challenges of notebooks and then the paradox of choice.

30:12 So first of all, I think this comes from the paper, but also some of the feelings I had.

30:17 So you talked about some of the challenges, Sam: stale data, out-of-order execution, abundance of code, and then the inability for composition, things that Papermill, for example, is trying to solve.

30:29 Do you want to speak to some of the challenges you saw out in this space?

30:32 Well, one of the first challenges that a user of Jupyter notebooks encounters is typically the out-of-order execution problem: because you can run cells in any order you want, you might run some cells, then delete some cells, then write more cells, and run those in some out-of-order way.

30:51 And what happens is, well, one problem is that sometimes your variables just get changed, and you don't know where a change came from because your code might be gone.

30:59 Right. You might even delete the cell that defined them.

31:01 Yeah, yeah.

31:02 But the kernel's still running, so they're still in memory. For now, it's going to be okay.

31:06 Exactly, exactly. It happens really, really often, in my experience, when we teach students how to use Jupyter notebooks for the first time.

31:13 They hit some random keyboard shortcut, and their cell's gone. And it looks like the code's running fine, but then when they try to turn in their code, everything kind of goes up in flames.
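The "it works until you turn it in" failure can be reproduced without Jupyter at all. This sketch assumes a hypothetical notebook of four cells and models the kernel as one shared dictionary passed to `exec`: cells are run in the order the user clicked, a cell is then deleted, and the session keeps working because the value is still in kernel memory. A fresh top-to-bottom run, as a grader would do, fails.

```python
# Simulate a kernel: one shared namespace that outlives the cells that wrote to it.
kernel: dict = {}
cells = {
    "A": "data = [3, 1, 2]",
    "B": "data = sorted(data)",
    "C": "total = sum(data) * scale",
    "D": "scale = 10",
}

def run(cell_id: str) -> None:
    exec(cells[cell_id], kernel)

for cid in ["A", "D", "B", "C"]:  # the order the user clicked, not top to bottom
    run(cid)
print(kernel["total"])  # 60: works in this live session

del cells["D"]            # the cell defining `scale` is deleted by accident
print("scale" in kernel)  # True: the value survives in kernel memory

# Restart the kernel and run all remaining cells top to bottom.
kernel.clear()
fresh_run_error = None
try:
    for cid in sorted(cells):
        run(cid)
except NameError as err:
    fresh_run_error = str(err)
print("fresh run fails:", fresh_run_error)
```

Nothing on screen reveals that `total` depended on a cell that no longer exists; only restarting and running everything exposes it.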

31:22 Yeah. Well, but that's so much of the power of Jupyter as well, right? Like, there are some reports I have for my business behind the scenes, where there's like 30 seconds of reading data and correlating data, and then you want to graph it and slice it and analyze it.

31:40 And the ability to just go, rerun the cell that sorts it this way and shows that aspect, it runs instantly, right? It just goes and goes and goes. And it's so powerful to not have to rerun that code to be able to change your analysis or how you're exploring it, right?

31:56 Yeah, exactly. That was the selling point of Jupyter. That's what got everyone hooked on Jupyter to begin with.

32:00 Yeah.

32:00 It was like that ability to really quickly.

32:01 Yeah, you do the expensive part. Exactly. Exactly.

32:04 But it's like the world's worst go-to and then some, you know?

32:09 Sure. It's like the human-powered go-to, right? Like, you're running the cells and doing the go-to statements yourself.

32:16 Exactly. Like, with a real go-to, it might be really hard to understand, but you can literally go through the code and see it.

32:23 Yeah, it's at least deterministic.

32:24 Yeah, it's at least deterministic, right? But with a notebook, it's not even that. So that's a challenge.

32:29 Yeah, exactly.

32:29 To me, that's the biggest challenge, this sort of out-of-order stuff. But it's so hard to say, like, well, let's just not do that, because it's also the superpower.

32:37 Right, right. One of the dimensions in this space, I don't know if we're getting there quite yet, but one of the dimensions in this space specifically addresses this because a number of projects that we see from academia and from industry try to, like, address this specific in-order, out-of-order execution issue.

32:53 Projects like Papermill, which we briefly talked about before, address it in a different way, by making sure that whenever a notebook is run, it only runs from top to bottom like a computer would, rather than how a human might execute it.

33:06 So there are different angles on how to get around out-of-order execution. But as you mentioned, it's both the main benefit and the main weakness of these notebook systems.

33:16 Yeah. Did you guys study Streamlit?

33:19 Streamlit?

33:20 Yeah.

33:21 So Streamlit has a really interesting way of solving that, in that they basically use functools' lru_cache.

33:27 They don't exactly use that, but they more or less, the concept is the same.

33:31 And they've got these different functions that more or less act like cells.

33:35 And when you rerun it, if you give it the same inputs, it uses the cached version.

33:38 But if the inputs happen to change, then it'll give you a different output.

33:42 So it sort of keeps that, you know, don't recompute the stuff I already know aspect.

33:47 Right, right.

33:48 So Streamlit has an interesting execution model, which I think we also noticed in some other systems, where it looks to you as a user that whenever you make a change, your notebook gets rerun from top to bottom every time.

34:01 But then behind the scenes, it does some smart caching to avoid redoing work it has already done.

34:07 So you kind of get the benefit of rerunning a script every time without the drawback of having to like wait a long time for some expensive operation to recompute.
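Streamlit ships its own caching decorators rather than using `functools.lru_cache` directly, but the standard-library decorator is enough to show the execution model described here. In this sketch, a hypothetical `script` function stands in for the whole app and is rerun from the top on every interaction; the expensive `load_data` step is memoized on its arguments, so only a new input pays the cost again. The file names are placeholders that are never actually read.

```python
import functools
import time

calls = {"load": 0}  # counts how often the slow step really runs

@functools.lru_cache(maxsize=None)
def load_data(path: str) -> tuple[int, ...]:
    """Stand-in for a slow read-and-join step; `path` only serves as the cache key here."""
    calls["load"] += 1
    time.sleep(0.05)  # pretend this is expensive
    return tuple(range(1_000))  # immutable, so callers can't corrupt the cached value

def script(path: str, top_n: int) -> list[int]:
    """The whole 'app', rerun from top to bottom on every interaction."""
    data = load_data(path)                     # cached after the first call per path
    return sorted(data, reverse=True)[:top_n]  # cheap, recomputed on every rerun

script("sales.csv", 3)
script("sales.csv", 5)  # the user moves a slider: full rerun, but the load is a cache hit
script("other.csv", 3)  # a new input misses the cache
print(calls["load"])    # 2
```

The user gets the simple mental model of "the script always runs top to bottom" without waiting on the slow step for every widget change.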

34:16 Yeah.

34:17 Interesting.

34:17 This is a big challenge to put all these things together.

34:21 I mentioned The Paradox of Choice, which is a fantastic book if people haven't read it.

34:25 But I feel like one of the challenges you all may experience is that if you look at all these different things, you're like, oh, Wrex does this amazing thing.

34:33 Callisto does that.

34:34 And Streamlit does this other thing.

34:36 And then Jupyter has this amazing aspect.

34:38 Not any single one of these has all the good things you want.

34:42 But any choice down one path will make you feel like, well, I don't get the machine learning like auto cleanup in this one.

34:50 And I don't get the, you know, the cursors.

34:52 Like, do you feel like there's a bit of, I don't know, fear of missing out, or feeling of missing out, as you have to go down one path?

34:59 You got to use one.

35:00 You can't use them all.

35:01 Oh, yeah, for sure.

35:02 Nowadays, when I use Jupyter, I mean, Jupyter is a system I still use.

35:06 But now that I know about all these other systems, oftentimes when I use Jupyter now, I'm like, dang, if only I had, you know, Nextjournal.

35:13 If only I had like Databricks right now.

35:15 Yeah.

35:16 But then you'd want JupyterLab back for something else, right?

35:18 I mean, it's like, there's a lot of different things going on here.

35:21 It's a lot of flowers blooming, I would say, I guess.

35:23 Right, right.

35:24 So we talked a little bit at the end of the paper in the kind of discussion part about, you know, what if you did this kitchen sink, right?

35:30 What if you just threw everything together?

35:32 What if someone made a notebook that has everything?

35:34 And, you know, one, it's an enormous software engineering challenge, right?

35:38 Nobody would want to take on that challenge of maintaining 500 different sorts of features.

35:43 And the other one is that it just may make the interface really complicated, right?

35:46 Imagine you could choose different kinds of execution orders, or choose a reactive versus an in-order one, and choose all the other stuff; it would just be really, really hard for people to set up.

35:58 So, you know, some of these notebook systems, you know, say Jupyter or some of the other ones, I think they've been successful because they've, I would say, I don't want to say straightforward, but their feature set is limited, right?

36:08 Like that, you know, it's, even though Jupyter has this out-of-order issue, it's like, okay, what you see is what you get as long as you execute in that order.

36:16 Yeah.

36:16 And some of the real power is you can have semi-structured, reusable tiny bits in, say, cells when you're just a biologist or an economist and you don't have to become a computer scientist person to, like, learn Haskell to take advantage of this thing.

36:33 Like, it's incredibly, yeah, it's incredibly accessible to just jump in and write a few lines of imperative code and then get some really awesome output.

36:42 And so, yeah, I feel like if it became too advanced, right, you're like, well, do you want the reactive model, the asynchronous model?

36:50 You're like, I don't know what this means.

36:52 I just want a graph, right?

36:53 Yeah, because, I mean, one of the challenges with, say, Observable, right?

36:56 I mean, Observable has this really elegant reactive thing where this stuff auto-updates, and they have a lot of examples, right, on their website.

37:03 But I still think it's going to be hard for regular people, regular programmers to pick up because it looks kind of like JavaScript.

37:09 But then when you write it, it's like it has all these, like, little extensions to JavaScript that you need to wrap your head around.

37:15 So, you know, it's the age-old thing: more powerful programming languages and environments are just going to have a higher barrier to learning, right?

37:22 So it's very hard in practice.

37:24 Yeah, yeah.

37:24 So let's talk about the dimensions.

37:26 Sam, you put together 10 different dimensions on how you sort of evaluate these things and, you know, a spectrum along them.

37:34 For example, like data sources, you have local files, cloud storage, large data, streaming data.

37:39 Want to run us through this?

37:41 Sure.

37:42 We can start with, like, the highest level breakdown.

37:45 We organize the design space dimensions here around the steps in the workflow of a data scientist.

37:51 So as a data scientist, we imagine that you might start a project by importing your data and then writing and editing your code and your prose around that code.

38:02 And then you would want to run that code.

38:04 And then finally, you'd want to publish your notebook in some way.

38:07 So it's like the input, editing, and output steps for a data scientist.

38:12 And those are, like, the workflow steps.

38:16 And the 10 dimensions that we pulled out were data sources, as you mentioned, and things like versioning and collaboration, as well as execution models and execution liveness.

38:29 And for publishing a notebook, we talked about, like, you might want a notebook as a static HTML page or maybe as a software package, as we alluded to earlier.

38:38 There is a lot to consider here.

38:39 Let's just grab a couple to give people a sense, and then they can go look at the paper.

38:45 You've got a big full page graph chart type of thing.

38:48 Yep.

38:49 Chart, I guess.

38:50 For example, I mentioned data sources, right?

38:52 We've got local files.

38:53 And you say, okay, this is accessing the local file system.

38:56 And you always give an example of a system that implements that.

38:59 Like, so RStudio, for example.

39:01 Right, right.

39:02 What else is on that axis there?

39:03 Yeah.

39:04 So we have local files, which means that the system basically only, like, natively supports opening files that are stored locally on the same computer.

39:12 Some notebook systems also allow you to read in cloud files, like files on their servers as though they were local.

39:20 This is what Google Colab does.

39:23 It allows you to read in files from Google Drive as if they were stored locally alongside a notebook.

39:29 We have some systems that have special handling for large data sets, like data sets that don't fit in a computer's memory. Databricks, for example, uses Spark to handle that.

39:38 Right.

39:38 And we also have some systems that support streaming data.

39:41 So, like, you might hook up your notebook to some, like, web socket on the internet.

39:47 And as new data comes in, your notebook will automatically rerun and update to reflect the latest versions of those data.

39:52 Oh, like if you want to set up a dashboard and as things behind the scenes change, like, the notebook is always sort of up to date with the state of the world.

40:02 Polling almost, but better.

40:03 Yeah, exactly.
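The difference between polling and streaming is who triggers the update. In this sketch, a hypothetical in-process `Stream` class stands in for a socket feed and pushes each new record to its subscribers, so the dashboard "cell" recomputes as data arrives instead of on a timer. The class names and the moving-average summary are illustrative assumptions, not any notebook system's API.

```python
from collections import deque
from typing import Callable

class Stream:
    """Hypothetical stand-in for a live feed: pushes each record to every subscriber."""
    def __init__(self) -> None:
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def push(self, record: dict) -> None:
        for callback in self._subscribers:
            callback(record)

class Dashboard:
    """A 'cell' whose output is recomputed every time a new record arrives."""
    def __init__(self, window: int = 3) -> None:
        self.recent: deque = deque(maxlen=window)  # keeps only the latest `window` values
        self.renders: list[str] = []

    def on_record(self, record: dict) -> None:
        self.recent.append(record["price"])
        avg = sum(self.recent) / len(self.recent)
        self.renders.append(f"latest={record['price']} moving_avg={avg:.2f}")

feed = Stream()
board = Dashboard(window=3)
feed.subscribe(board.on_record)
for price in [10, 12, 11, 15]:
    feed.push({"price": price})
print(board.renders[-1])  # latest=15 moving_avg=12.67
```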

40:04 So you can kind of see from the way you described that, that every single, like, dimension here has some use case.

40:11 And the use case is kind of motivated by some real world, like, somebody sat down and was like, man, I don't like rerunning my notebook to update my dashboard every time.

40:19 I just wish I had some way to, like, have it automatically, like, streaming the data and update itself.

40:23 Yeah.

40:23 Yeah, for sure.

40:25 So let's go just through two more and then I'll let people go check out the details for the rest.

40:29 So how about execution order?

40:31 Yeah, so execution order goes directly to the in-order, out-of-order problem that we were talking about earlier, where Jupyter notebooks allow users to run cells in any order they choose.

40:42 So I can jump around cells freely.

40:44 Other systems like Observable require a certain order of cells.

40:50 So what Observable does is it takes a cell, looks at all the dependencies, like, the variable dependencies of that cell, and then forces the notebook to run in the topologically sorted order of cells, right?

41:02 Depending on which cells depend on other cells.

41:06 We also have systems like Streamlit, which force, like, this in order execution model that we discussed earlier, where it looks to the user, like, the cell, like, the notebook always runs from top to bottom every time.

41:17 So those are the three main variations we found of ordering execution.
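The dependency-ordered model can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The notebook below is hypothetical: each cell lists the cells whose variables it reads, written in the order an author might lay them out on the page rather than in dependency order. The sorter produces a valid execution order, and editing one cell reruns only that cell and everything downstream of it. This illustrates the idea; it is not Observable's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: each cell maps to the cells whose variables it reads.
# Listed in page order, which is not the order they can run in.
depends_on = {
    "plot":    {"clean"},
    "summary": {"clean", "params"},
    "clean":   {"load"},
    "load":    {"params"},
    "params":  set(),
}

# Every cell appears after all the cells it depends on.
order = list(TopologicalSorter(depends_on).static_order())

def downstream(changed: str) -> list[str]:
    """Cells to rerun after `changed` is edited, listed in a valid execution order."""
    affected = {changed}
    for cell in order:  # dependencies always precede dependents in `order`
        if depends_on[cell] & affected:
            affected.add(cell)
    return [c for c in order if c in affected]

print(order)
print(downstream("clean"))  # clean first, then plot and summary
```

A cycle between cells (`a` reads `b` and `b` reads `a`) makes `static_order()` raise `graphlib.CycleError`, which is one reason reactive notebooks restrict how cells can reference each other. By contrast, the in-order model simply runs the list from top to bottom, and the any-order model leaves the choice to the user.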

41:22 Yeah, interesting.

41:23 Let's go to the very last part, the end of this lifecycle, if you will, and talk about notebook outputs.

41:29 Yeah, so notebook outputs, we found the most common use case, I think, reflected across all of our systems was taking a notebook and publishing it as a static report, so to speak.

41:39 So it's essentially like taking a notebook and converting it to HTML and emailing the HTML file or putting it up as a web page somewhere.

41:47 The next step up would be, like, a dashboard, where you don't show viewers the code you needed to produce the charts or graphs.

41:58 You just show the charts or graphs, like, themselves.
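In practice, `jupyter nbconvert --to html` handles the static-report case, and its `--no-input` option drops the code cells. The sketch below is a hypothetical, standard-library stand-in that shows the only real difference between the first two output types discussed here: a static report renders code and outputs, while a dashboard-style page renders outputs alone. It assumes a minimal nbformat-like dict and does no Markdown rendering.

```python
import html

def to_html(nb: dict, show_code: bool = True) -> str:
    """Render a minimal nbformat-like dict as one static HTML page.
    show_code=False approximates a dashboard that hides the code."""
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for cell in nb["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            parts.append(f"<p>{html.escape(source)}</p>")  # shown as plain text in this sketch
            continue
        if show_code:
            parts.append(f"<pre><code>{html.escape(source)}</code></pre>")
        for out in cell.get("outputs", []):
            if out.get("output_type") == "stream":
                parts.append(f"<pre>{html.escape(''.join(out['text']))}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)

nb = {"cells": [
    {"cell_type": "markdown", "source": ["Weekly sales"]},
    {"cell_type": "code", "source": ["print(sum([3, 4]) < 10)"],
     "outputs": [{"output_type": "stream", "text": ["True\n"]}]},
]}
report = to_html(nb)                      # static report: code and output
dashboard = to_html(nb, show_code=False)  # dashboard-style: output only
```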

42:01 Right.

42:01 Okay.

42:02 And then finally, within that space, we have, like, what we call software artifacts, which encapsulate the idea of you can use notebooks as cron jobs, or you can use notebooks as software packages.

42:13 Right, right, like nbdev creating something you could put on PyPI.

42:17 Yeah, exactly.

42:18 Exactly.

42:18 Yeah.

42:19 Yeah.

42:19 I really think it's cool that you broke it down like this, and that each one has an example type of thing.

42:28 So you're like, okay, well, how does, say, Gigantum do it?

42:30 Or how does Streamlit do this thing?

42:32 Or Observable?

42:33 And so on.

42:34 Yeah, I think actually seeing all this stuff together might inspire features to start cross-pollinating.

42:40 Hopefully not every single feature, right?

42:42 That would be too much, but still.

42:44 The kitchen sink.

42:44 Yeah.

42:45 It seems powerful.

42:45 Kitchen sink notebook.

42:46 Yeah.

42:48 Yeah.

42:49 I mean, maybe kitchen sink is available on PyPI.

42:52 Like, you could just go for that.

42:53 Yeah, yeah.

42:53 Pip install kitchen sink.

42:54 Yeah.

42:54 Exactly.

42:55 Gives you half the kitchen sink.

42:57 Takes half an hour.

42:58 Yeah.

42:58 Kitchen sink.

42:59 Wait five minutes.

43:00 Awesome.

43:01 All right.

43:02 Well, this is a really interesting view.

43:04 I guess, you know, maybe let's kind of close out the conversation around this stuff by asking,

43:08 I'll let you both weigh in on this.

43:10 What surprised you?

43:12 Like, as you went through this, what did you expect and what was different than that?

43:16 What surprised you as you went through this?

43:18 I mean, first thing that surprised me is how many there were, but what else?

43:21 For me, I think it was this tension between we hear about the problems of notebooks quite often.

43:27 And yet we, from this analysis, we don't often see those desires reflected in actual products.

43:33 Take the in-order versus out-of-order execution model, for example: the vast majority of the systems we looked at only allowed any-order execution, where the user defines the execution order.

43:43 Collaboration between users and notebook authors is relatively common.

43:49 In a company, you'd rarely find someone working on a notebook all by their lonesome.

43:53 But at the same time, we see very few systems that support like the sort of real-time collaboration that Google Docs has made popular.

44:00 Right.

44:00 And it seems so, so easily on the cusp of that because it's already in the browser.

44:06 Often it's already hosted.

44:08 Right?

44:09 Right.

44:10 Exactly.

44:10 So it kind of reveals, I think as a whole, the analysis reveals that there's still room for improvement for a lot of these products.

44:18 Yeah.

44:19 Yeah.

44:19 Good.

44:20 Yeah.

44:20 And I guess, I don't know if it was a surprise as much as, you know, something that, you know, we mentioned toward the end of the paper was that, you know, on the, you know, Sam kind of mentioned the product side of, you know, practically speaking, things are pretty tied to Jupyter-like things because people are used to that.

44:34 And a lot of products are Jupyter extensions or Jupyter hosted on Azure, Jupyter hosted on AWS, Jupyter hosted on Google Cloud.

44:41 And so my kind of focus is on the academic side, thinking about the research papers that people have been prototyping for this.

44:48 And, you know, I don't know if it was a surprise, but it's something that was kind of a call to action we had at the end of the paper is that, you know, I felt personally that the academic work was not sufficiently weird, maybe.

44:58 Right.

44:58 So it's like you want, you kind of, you, you know, one of the roles of academic research and prototypes is you want to push the bounds and try stuff that's very different.

45:06 Right.

45:06 So imagine experimental programming language or experimental toolkits.

45:09 You know, the point isn't to be practical.

45:11 The point is to stretch the limits of what we know.

45:14 So because I think notebook research is still pretty early, and all these systems in academia were probably built within the last four or five years,

45:22 It makes sense that the first wave is fairly kind of, you know, making the extensions that people expect, right?

45:28 Versioning, in order execution, collaboration and stuff.

45:31 So I think that what would excite me more to see in the coming years is just to see the academic projects or the R&D projects becoming more weird, right?

45:39 So one of the things that we pitched at the end is just thinking about multi-device, different kinds of off-desktop devices, right?

45:46 I mean, already we're computing with phones and smartwatches and augmented reality and displays, but we're still just hunched over in front of our laptop, typing in this, literally just typing in a text box in the web browser, right?

45:57 Like that's all the notebooks are.

45:58 They're just little text boxes in web browsers.

46:00 So I think we can really expand our imagination and think way beyond that.

46:04 That would excite me a lot more.

46:06 Yeah.

46:07 Very interesting.

46:08 Yeah.

46:08 And to echo that point, what we drew out of the notebook systems was a data science workflow.

46:14 But we drew out that workflow because I think every single system we evaluated here was designed, for the most part, for data scientists.

46:24 And with data scientists as the audience of choice, that necessarily limits what we can imagine notebooks being used for.

46:35 So in the paper, we talk about broadening notebook use cases to other types of users, like instructors or maybe artists, let's say.

46:45 And each of those audiences probably has an entirely different set of desires and things they want to accomplish with programming.

46:53 And with that comes a whole other set of trade-offs and perhaps design dimensions for notebook systems.

46:58 That's a good point.

46:59 Yeah.

46:59 Like DJs, musicians, orchestras.

47:03 Like what could you do?

47:04 Like what kind of world would you build if you were trying to, you know, coordinate that kind of stuff?

47:08 Philip, you brought up our phones, which are pretty incredible these days.

47:13 Did you guys look at Carnets?

47:15 I mean, it doesn't really need separate analysis because it's basically Jupyter, but Carnets, C-A-R-N-E-T-S, is an open source, or at least a free, thing.

47:28 I'm pretty sure it's open source. It's Jupyter running natively on iOS.

47:31 Oh, interesting.

47:32 I haven't looked at that.

47:34 Yeah, it's pretty new.

47:35 I just learned about it a few months ago.

47:37 Oh, and it's on GitHub.

47:38 Yeah.

47:39 We should add that to our analysis if we have time.

47:43 That's pretty cool.

47:44 It is, but I mean, it's not really fundamentally different, but it's interesting that it has NumPy.

47:49 Right.

47:50 It's just on the platform.

47:51 Yeah.

47:51 And a lot of those libraries execute natively on iOS, not through an interface to some cloud thing, but legitimately there, you know?

47:58 Yeah.

47:59 I mean, people are already talking about this on our mobile phones, right?

48:02 For machine learning, right?

48:04 You download pre-trained models so you can do inference on your phone without going to the cloud, right?

48:09 That helps with privacy, right?

48:10 If you want to do image recognition, it helps both with privacy and with bandwidth.

48:13 You can do all this computation on your phone locally.

48:16 So imagine what it would mean if you could do data science out in the world, right?

48:20 You have your smartwatch, you have your phone.

48:22 You could just do your analysis on your Google Glass or whatever.

48:25 There's this kind of sci-fi future.

48:27 Yeah.

48:28 I guess the question we ask as human-computer interaction, or HCI, researchers is: what will the future of computing look like, right?

48:35 Right now, it still looks like people hunched over on their laptops, typing in text boxes, right?

48:40 Like, you want to go beyond that somehow.

48:42 Still waiting to just plug an interface right into the neck, right?

48:45 No more typing or mousing.

48:47 Projects onto your retina.

48:49 You can have Jupyter in your eyes.

48:51 Exactly.

48:52 Sign me up.

48:53 You'll be the first test subject.

48:55 Yeah.

48:56 Maybe not the first, but an early adopter.

49:00 Let's put me in that category.

49:01 All right, you guys.

49:02 This was super interesting.

49:03 And it definitely opened my eyes to some things that are going on out there that I had no idea about.

49:08 So super cool research.

49:09 Thanks.

49:10 It was really fun for us to talk about it.

49:11 Yeah, for sure.

49:12 Now, before you get out of here, I'm going to ask you the two questions I always ask at the end of the show.

49:17 Philip, I'll pick on you first since you're a veteran.

49:19 If you're going to write some Python code, what editor do you use?

49:23 I still use Vim because I haven't learned anything new.

49:26 So I still use Vim with whatever defaults.

49:28 Right on.

49:29 All right.

49:30 And then a notable PyPI package, something you've used that's interesting.

49:34 I'll plug Sam's, actually.

49:36 So Sam, for his master's thesis at Berkeley, made this package called nbinteract, which is related to Jupyter notebooks.

49:42 It's called nbinteract.

49:44 And you can pip install it.

49:45 So I'll let Sam talk more about that.

49:46 That's a lead into Sam's stuff.

49:49 Thanks.

49:50 Yeah, yeah.

49:50 Cool.

49:51 All right.

49:51 Sam, if you're going to write some Python code, what editor do you use?

49:53 Yeah.

49:53 So I use VS Code.

49:54 But I've hopped from Sublime to Emacs to Vim and now to VS Code.

49:59 So now I'm a happy VS Code user.

50:02 All right.

50:02 Cool.

50:03 Do you do any bindings?

50:04 Like you set up Vim bindings or Emacs bindings in there or something?

50:07 I set up Vim bindings.

50:08 But if the VS Code team is listening, there are some problems I have with the Vim bindings in VS Code that I'd love to talk about.

50:14 Oh, we're calling them out.

50:14 This is the call-out section.

50:16 This is where we start the beefs.

50:18 The call-out section at the end of the episode.

50:20 That's right.

50:20 It's the backhanded compliment.

50:23 Not exactly.

50:24 But it's like, I love it, but beautiful.

50:26 No, VS Code is a good one.

50:28 All right.

50:28 You want to tell us about nbinteract?

50:29 Sure.

50:30 nbinteract lets you take a Jupyter notebook with some widgets.

50:35 And normally those widgets don't work outside of the notebook environment.

50:38 If you convert the notebook to an HTML page, by default the widgets just break, because there's no Python server running underneath the notebook.

50:46 But nbinteract takes your HTML page and hooks it up to the Binder service from the Jupyter team, so you get an HTML page with interactive widgets that doesn't require a notebook server running locally.

51:00 So you can share those things.

51:02 You can share your interactive notebooks more readily.

51:05 That's the idea.

51:06 I see.

51:06 You don't have to set up a proper Jupyter server with all the execution and stuff.

51:10 Yeah.

51:10 You can send someone a link to a web page, and then they can run your interactive widgets.

51:14 That's the idea.
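As Sam describes it, nbinteract gets around the missing kernel by borrowing one from Binder instead of requiring a local notebook server. Binder launches a live Jupyter environment from a public Git repository through URLs of the form `https://mybinder.org/v2/gh/<user>/<repo>/<ref>`. The sketch below only builds such a launch URL; the `binder_launch_url` helper and the example repository names are hypothetical, and this is not nbinteract's own code.

```python
BINDER_BASE = "https://mybinder.org/v2/gh"


def binder_launch_url(user: str, repo: str, ref: str) -> str:
    """Return a mybinder.org launch URL for a GitHub repository.

    Hypothetical helper for illustration only; it is not part of nbinteract.
    Binder builds an environment from the repo at `ref` and starts a Jupyter
    server there, the kind of remote server a static HTML page can connect to.
    """
    for part in (user, repo, ref):
        if not part or "/" in part:
            # Keep the sketch simple: reject empty parts and refs containing
            # slashes, which would need URL-encoding.
            raise ValueError(f"unsupported path component: {part!r}")
    # Binder's GitHub provider pattern: /v2/gh/<user>/<repo>/<ref>
    return f"{BINDER_BASE}/{user}/{repo}/{ref}"


if __name__ == "__main__":
    print(binder_launch_url("example-user", "example-repo", "main"))
```

The point of the design is that the reader's browser never needs Python installed: the page is static, and the computation happens in whatever environment Binder spins up from the repository.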

51:15 Okay.

51:15 Yeah.

51:16 Yeah.

51:16 Ah, excellent.

51:17 As for personal use, I like the tqdm package.

51:20 It gives you a progress bar when you run for loops.

51:23 Really handy.
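For readers who haven't used tqdm: wrapping any iterable in `tqdm(...)` draws a progress bar that advances with the loop. A minimal sketch, assuming tqdm is installed (`pip install tqdm`); the `sum_of_squares` function is a hypothetical stand-in for real work, and the bar is written to a `StringIO` buffer only so its output can be inspected.

```python
import io

from tqdm import tqdm  # third-party: pip install tqdm


def sum_of_squares(n: int, stream=None) -> int:
    """Sum i*i for i in range(n), showing a tqdm progress bar while looping."""
    total = 0
    # tqdm infers the bar's total from len(range(n)); `desc` labels the bar,
    # and `file` picks where it is drawn (None falls back to stderr).
    for i in tqdm(range(n), desc="squares", file=stream):
        total += i * i
    return total


if __name__ == "__main__":
    buf = io.StringIO()
    print(sum_of_squares(5, stream=buf))  # 0 + 1 + 4 + 9 + 16 = 30
    print(buf.getvalue().strip())         # the final bar, e.g. "squares: 100%|...| 5/5 ..."
```

Inside a Jupyter notebook, `from tqdm.notebook import tqdm` renders the same bar as a widget instead of text.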

51:24 Yeah.

51:25 That's really cool.

51:25 There are a couple of interesting progress bars you can use, but it's those little touches,

51:31 color, progress bars, maybe a few emojis.

51:35 It just makes it feel so much nicer to work in a CLI app.

51:38 Yeah.

51:38 Our next paper is analyzing 60 progress bar libraries.

51:41 Yeah.

51:41 60, 60 progress bar libraries.

51:44 And emojis.

51:44 What emojis they use.

51:46 And emojis.

51:46 Yeah.

51:47 Perfect.

51:48 All right.

51:49 Well, you guys, this is really fun.

51:50 People want to know more about your research, your paper.

51:52 What do they do?

51:53 How do they find it?

51:54 I mean, I'll put a link in the show notes, but what else?

51:56 Yeah.

51:56 They can just go to the links.

51:57 Michael, I sent you a link in the chat about nbinteract.

52:00 And then we sent you the paper links and such.

52:03 We can put both of our homepages on there and such.

52:06 Sam, any other plugs?

52:07 Yeah.

52:08 I think you reminded me that I haven't put the paper up online.

52:13 I don't know if you have the paper online anywhere.

52:14 Yeah.

52:15 I have it on my page.

52:16 Yeah.

52:16 Oh, you do.

52:16 Okay.

52:17 So we can maybe link to the paper from Philip's page.

52:19 Yep.

52:19 Sounds good.

52:20 Well, I'll definitely do that.

52:21 All right, you guys.

52:22 Thank you so much for being on the show.

52:23 It's been really interesting.

52:24 And yeah, there's a lot of notebook exploration and flowers blooming.

52:30 So thanks for putting it all together into one analysis here.

52:34 Yeah.

52:34 Really big thanks to you, Michael, for having us on.

52:37 And it's been really fun to talk about all the flowers, all the fields of flowers.

52:42 Likewise.

52:43 Thank you so much, Michael.

52:44 Have a good day.

52:45 See you later.

52:45 Bye.

52:46 This has been another episode of Talk Python To Me.

52:49 Our guests on this episode were Sam Lau and Philip Guo.

52:53 And it's been brought to you by Linode and Sentry.

52:55 Start your next Python project on Linode's state-of-the-art cloud service.

53:00 Just visit talkpython.fm/Linode.

53:03 L-I-N-O-D-E.

53:04 You'll automatically get a $20 credit when you create a new account.

53:07 Take some stress out of your life.

53:10 Get notified immediately about errors in your web applications with Sentry.

53:15 Just visit talkpython.fm/sentry and get started for free.

53:19 Want to level up your Python?

53:21 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

53:26 Or if you're looking for something more advanced, check out our new async course that digs into

53:31 all the different types of async programming you can do in Python.

53:34 And of course, if you're interested in more than one of these, be sure to check out our

53:38 Everything Bundle.

53:39 It's like a subscription that never expires.

53:41 Be sure to subscribe to the show.

53:43 Open your favorite podcatcher and search for Python.

53:45 We should be right at the top.

53:46 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

53:51 and the direct RSS feed at /rss on talkpython.fm.

53:55 This is your host, Michael Kennedy.

53:57 Thanks so much for listening.

53:59 I really appreciate it.

54:00 Now get out there and write some Python code.

54:02 I'll see you next time.