
#394: Awesome Jupyter Libraries and Extensions in 2022 Transcript

Recorded on Thursday, Dec 1, 2022.

00:00 Jupyter is an amazing environment for exploring data and generating executable reports with Python. But there are many external tools, extensions and libraries to make it so much better and to make you more productive. On this episode, we're going to cover a ton of them. We have Markus Schanta, the maintainer of the Awesome Jupyter list, on the show, and we'll highlight a bunch of Jupyter gems. This is Talk Python To Me, episode 394, recorded December 1, 2022. Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org. Be careful with impersonating accounts on other instances. There are many. Keep up with the show and listen to over seven years of past episodes at talkpython.fm. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:14 This episode is sponsored by the AWS Insiders Podcast. AWS is changing fast. Listen in to keep up over at talkpython.fm/awsinsiders. And it's brought to you by Sentry. Don't let those errors go unnoticed. Use Sentry. Get started at talkpython.fm/sentry. Transcripts for this episode are sponsored by AssemblyAI, the API platform for state-of-the-art AI models that automatically transcribe and understand audio data at a large scale. To learn more, visit talkpython.fm/assemblyai. Markus, welcome to Talk Python To Me.

01:50 Hi, Michael. Thanks for having me.

01:52 This is going to be a very broad, not necessarily super deep episode, but we're going to talk about a ton of cool little extensions and widgets and libraries that you can plug into your Jupyter work and make it awesome.

02:06 I really like talking about these topics where it's like, oh, that's only a ten-minute commitment to see if it's going to help me out or not, right? Not some huge framework you've got to learn. And I think there's going to be a lot for people to take away if they do anything at all with notebooks.

02:18 Yeah. And I think one of the nice things about notebooks is they're very easy to use. The barrier to entry for just being able to do something tangible with them is very low, much more so than with a lot of other things.

02:32 Yeah, they're very welcoming environments, especially if someone set them up for you, if it's one of the hosted ones, or if you're not in there trying to configure a kernel with a special virtual environment. How do I get to the right installs? And why is this thing out there?

02:47 Or JupyterHub running on Kubernetes.

02:50 Yes, exactly. But once you're kind of beyond that shell, and many people have that set up for them, then it's ready to roll. All right. So going to be tons of fun to talk about these things. But before we get to them, let's start with your story. How did you get into programming in Python?

03:02 How did I get into programming? I think it started when my parents decided to buy a computer when I was eight years old. Then, at twelve, I started to realize you can program these things, and some well-intentioned adult bought me a book on C. And twelve-year-olds and pointers don't make...

03:25 C, the easy parts.

03:28 Come on kids, get with it.

03:30 That was just conceptually a bit too much for a twelve-year-old. But luckily, once I got into high school, they actually taught us Borland Delphi. I don't know if you ever came across that. It was a very easy-to-use, GUI-driven way to get into programming, and you could do lots of things that even a 14-year-old would find exciting. Not just C-style Hello World and printing numbers in a loop, but actually fun GUI stuff.

04:00 I never used that, but I used Visual Basic and some Windows Forms and other stuff, and I really think that style of development has kind of withered, even today, and it's too bad, right? It used to be so amazing. You would just go over there: button, text box, I click on this thing, here's what it calls. The interaction was so simple. And now maybe you're doing the web, and it's like, I'm going to do Webpack and bring in all these things, and I'm doing TypeScript, and then it's all CSS and HTML. I'm not bemoaning that style of development, but that "just give me a visual of what it looks like and let me build something simple" approach has been replaced with...

04:44 I feel like it was a very good pedagogical tool to teach programming and GUI development. I remember we did fun things like rectangles on the screen that just chased each other around.

04:57 That was a fun way into programming.

04:59 Yeah.

05:00 And I think there are some bits like that still around, aren't there? Like that language for learning programming, something with an animal. Is it Smalltalk, or the thing with a turtle?

05:10 Yeah, yeah, the turtle where you can do some drawing. There's that. And there's Anvil, actually, which is a Python front-end and back-end web framework that has this; it's really, really similar. My daughter played with it a lot. But I would love to see, like, a Visual Basic for Python or something like that. That would be so amazing.

05:31 Maybe it exists. But that's where I really started programming on my own. Then I went to university. That was in the early 2000s, and they taught us mostly Java, but also a weird mix of logic-oriented programming and functional programming. I really only got in touch with Python during some courses when I did my master's in the States, and then professionally. I learned proper Python when I was working in London at a company called Man Group, which had a lot of really good Python developers. That's where I picked up proper Python, and I've stuck with it ever since. If I have a choice, I will stick to it.

06:13 It's interesting, my background was not that different from yours in some ways. I did C++ in the early days. And you'd think that as you advance in your career and get more technically capable, you'd be doing more intense stuff, more of a deep dive, like, now you're writing kernel drivers. For me at least, and it sounds like for you too, it's been quite the opposite: more towards these higher-level languages that let you build really amazing things without juggling pointers to pointers and all these crazy aspects. Right?

06:44 I think the first expectation that you're describing is maybe a function of technical expertise. The second development, what you end up doing, is a function of age, as, inevitably, you focus more on the high level.

06:59 You just want to get stuff done. You don't care about showing off. What's...

07:04 The shortest way for me to be done?

07:06 And very often that's Python. Absolutely, it is. All right, so some real-time follow-ups. The audience has helped us out here. Don says Delphi is a wonderful way back into Pascal from the college days. Indeed. And as for those languages, Logo was one of these visual ones, and the one with the cat is Scratch. What an amazing name. I haven't really used Scratch, but I...

07:26 I think cats, or animals in general, seem to be good metaphors for teaching programming. Yes, they do.

07:32 And the publisher O'Reilly has built their entire book series around animals and programming. All right, well, how about now? What are you doing these days?

07:41 Right now I am a founding partner at Blue Balance Capital. We are a small, independent alternative asset management firm based in Vienna, Austria. I started this company three years ago with my partners. Before that, I worked at an asset management firm in London, and before that, I worked at Goldman Sachs as a quantitative analyst.

08:06 So basically, during my professional career, I was always more of a data analyst than a systems developer. Most of the things I use Python for are from the perspective of: here's a piece of data, and I want to analyze that data set to better understand, I don't know, a company or an economic trend or an industry or a particular trade. So I was always interested in Python in the context of some data set and some insight that you can glean from analyzing it.

08:40 How much of that was taking a block of historical data, like here's the last quarter's reports or whatever, versus trying to make predictions, like real-time trading or other types of real-time information?

08:55 I think the starting point is almost always some kind of time series. At Man Group, the asset management firm, they were actually developing automated trading systems. So that was very much the first thing you described: you have an input that is a time series, you apply some transformation, and then a computer actually produces trades on the back of that. Whereas at other times, and this is more like what I do now, you get some insight from the data, but then there are also other real-world considerations or exogenous factors that you think about as a person before you make your decision. So for me it was the whole gamut: the product is a trade that the machine puts on directly, or a trade that you put on, or maybe just some advice that you give to a company.

09:44 Right. "We think the tech indexes are going down over the next three months" versus "buy today, sell tomorrow."

09:53 Predictions are hard, especially about the future.

09:56 Yeah, well, especially now with COVID and wars and, like...

10:02 I have been in finance for more than ten years now, but I've never seen anything like the last two years.

10:07 Yeah, it's nuts.

10:08 I think much more happened in the last two years than in the ten years before.

10:11 It is living through some history, isn't it? One more question on this background side of things. You did a lot of work for these other companies like Goldman and now you're founding a smaller company. How do you feel that your programming and data science background has suited you to be more of leading this new company?

10:31 I think I had a great education from both Goldman and Man Group. They're both very technically capable organizations with a lot of very smart people. And as I was starting out with my partners as a very small team, one of the things that gave me comfort is that I have a stack of things that I know work together. I understand how they work together and how to apply them to do useful things.

11:00 So you have that under your tool belt, ready to go, and you don't have to figure out how the pieces go together. On day one, it's just: here's an AWS EC2 instance, spin it up, get a notebook running, and produce some nice charts. Being able to do that very quickly was very valuable for me.

11:19 That's what I would think. And even if you're not the one doing the work, say you hire a team or find a consultant, being able to speak with them and say, "I have a recommendation: that tool is good, but this one is better and it would fit because of X and Y," that's super valuable. So I just wanted to check in with you on that.

11:42 Yeah, I try to keep a tested stack of tools. Every once in a while, you get to branch off and try something new and shiny, and see if it works, see if it sticks. But having this trusted thing where you know how it works is valuable.

11:55 Like if you're doing JavaScript, you want to have a really trusted tool. It's been around for at least a month.

12:01 Just kidding. Luckily we don't have it quite that bad in Python. There's a lot more stability there. Okay, well, let's talk about this list, which is why we got together and why I invited you here: this Awesome Jupyter list. What is it? Where did it come from? Most people are familiar with awesome lists, but what's the philosophy of yours?

12:20 Yes, so awesome lists are just curated lists of resources that are useful or pertinent to some particular topic, and in this case, it's Jupyter notebooks. I started it with 20 entries and just put it up on GitHub, and over time more people added things they find useful that have some relationship with Jupyter. Up to this point, I think more than 100 people have collaborated on it. So it's a living, breathing list of whatever people find useful that has a relationship to Jupyter.

13:01 I still find these very valuable. Even though a good part of my job is to track what is new and what's interesting and what's trending, I still find so many things here that are new to me. When I first got into Python, I was like, wow, look, it just keeps going. I thought there were three web frameworks, or just one way to talk to a database. Look how many there are. It's so amazing, and it's always delightful.

13:22 You probably remember back in the old days of the internet, you had directories, right?

13:28 I mean, that was Yahoo.

13:29 Those were the first search engines, right? You had these catalogs of things: here's a website about cats, and here's one about dogs. In some ways, it feels like in this day and age we've come back to that: you actually have a person keeping a directory of things that are useful or pertinent to a particular topic. It's kind of funny that way. And one of the benefits I get from doing this is that I see what other people find useful. So I know, hey, these are the things that people are using, and I've got a pretty good radar of the whole Jupyter and notebook ecosystem just because I'm curating this thing.

14:09 Yeah, you've probably had people recommend things where you're like, no idea what that is, but it looks awesome, so it belongs on Awesome Jupyter.

14:16 I try to be very inclusive with this.

14:19 So when people suggest things for this list, more often than not I include them. Even in the cases where I think, I don't see myself using that, I'm sure there's some category of people who might find it useful, and then it just goes on the list.

14:35 Yeah, you don't want to over-index on your specific use of Jupyter and...

14:40 Totally, your vertical.

14:41 Right. Because we've got astronomers who are using this stuff, we've got economists, we've got biologists, we've got students, right.

14:49 Publishers.

14:52 People who care more about, I don't know, running their computation on a cluster; other people who are more into visualization. You name it, you have it.

15:01 ML folks might have different concerns than others.

15:05 A lot of people use it in education. One of the sections in there is a whole section dedicated to education: people teaching courses using notebooks, and what the best tools around that are for, I don't know, maybe even grading homework assignments that you distribute to people in the form of notebooks.

15:24 Yeah, I definitely want to highlight that, because I haven't talked about it very much, but I was a graduate TA. A lot of calculus, a lot of linear algebra, and various other applied, calculation-heavy, MATLAB-type stuff. Automating all that would have been good.

15:42 Okay, you just replace yourself with a regression test.

15:46 Exactly.

15:48 Submit your calculus test to continuous integration. We'll see how you did. This portion of Talk Python To Me is brought to you by the AWS Insiders podcast.

16:01 When was the last time you ordered a physical server to host your functions as a service, your latest API, or your most recent web app? I remember the last time I did that was around the year 2001. And yes, it was quite the odyssey. Of course, we don't do that anymore. We run our code in the cloud with near-instant provisioning and unparalleled data centers. And the most popular cloud provider is AWS. But for all the ways that AWS has made our lives easier, it has also opened a massive box of choices. Should you choose platform as a service? Or maybe it's still VMs with IaaS? What about your database? Maybe you should choose a managed service like RDS with Postgres. Or is DynamoDB better? Maybe Aurora? No, wait, I hear good things about Amazon DocumentDB too. And that's where the AWS Insiders podcast comes in. This podcast helps technology leaders stay ahead of Amazon's constant pace of change and innovation. Some relevant recent episodes include Storage Wars Database Edition, Microservices or Macro Disaster, and Exploring Computer Vision at the Edge with AWS Panorama. They bring on guests to debate the options, and the episodes are vibrant and fun. So if you want to have fun and make sense of AWS, head on over to talkpython.fm/awsinsiders. Yes, I know you probably already have a podcast player and can just search for it there, but please use the link so that they know you came from us. Thank you to the AWS Insiders podcast for keeping this podcast going strong.

17:36 So there are a couple of sections that I'm not sure I really want to dive into, because I think they're not exactly about notebooks themselves. But one is this collaboration and education stuff, so maybe we could start there. Let me just set the stage: a little while ago I talked with Sam Lau. He and Philip Guo did a research project where they studied 60 different notebook environments, not just Jupyter, but Google Colab and 58 others. So, just putting it out there: you might think Jupyter versus JupyterLab is the whole discussion, but there are a whole lot of different places where you can do notebooks, right?

18:19 Yeah. And some of them you can run on your own machine; those tend to go in the runtime/environment section. There's a separate category in the list that I call hosted notebook solutions. Those are things you don't really run on your own machine; they run somewhere in the cloud. So that's one way to break them down into categories: do you run them yourself, or do they run in the cloud somewhere else?

18:47 One thing that I didn't see on the list, and maybe I should open a pull request for it, is the JupyterLab Desktop app. Here we go. Have you seen this?

19:00 Not sure I've seen that.

19:01 Yeah. It's an Electron app that bundles the JupyterLab runtime environment, and it comes with its own Python and everything. So it's a thing you can hand to somebody that runs locally and lets them do notebook stuff without having to install Python and set up an environment. It just has a little wizard to get started. I'm not sure I would use it personally, but it's pretty interesting.

19:26 It sounds like a very low barrier to entry notebook environment.

19:30 Yeah, I think it could be good in, say, a school environment, where you're like, all right kids, just take this and run it. I don't want to have to debug why you installed Python 3.10 but you need 3.7, or whatever.

19:43 Lots of different people have different kinds of Jupyter or notebook setups. I personally run mine in the cloud, because I find it convenient to be able to access it from different machines. I access it from work, and I can access it from my laptop, even when I'm at a friend's place or something. All I need is a browser, and I can just continue where I left off on the other machine. Other people prefer...

20:12 You can probably be closer to the data.

20:14 Right, exactly right. Yeah. And you've got a lot faster pipe, and you're not that dependent on what your own network situation looks like, wherever you are.

20:24 Yeah. You're just sipping the answer, not the data required.

20:30 Exactly right. And on the other hand, you've got a powerful machine that can deal with all the calculations. So I find that pretty cool. At some point, I overdid it and even Dockerized the whole thing, and then I felt like that was getting in the way more than it was helping.

20:47 Yeah, I gave myself a DevOps job. Why did I do that?

20:51 Yeah, exactly. I only have one of these installations. Why did I Dockerize it?

20:56 Yeah, that's actually a really good point. I have the same philosophy on web apps: well, if there's just going to be one of them and it's just me, how much flexibility does this thing really need? Okay. So there's a whole section on these, with, honestly, many places and ways to run notebooks I haven't heard of. But let's talk about two things in this collaboration and education section. Three, actually, but in my mind two of them go together. One is nbgrader. This is a pretty cool project. Tell people about it.

21:24 This is pretty much what I described before in the abstract. If you're a person in education, you teach a course, and you want your students to do a particular assignment and send in their submissions, you don't want to hand-grade them one by one. What you can do is formalize what you want the answers to look like in the form of regression tests. That's basically what nbgrader is. You take a notebook, define what you want the answers to look like, and then it does the rest for you.
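To make the "regression tests for homework" idea concrete, here is a minimal sketch in plain Python. It is not nbgrader's API: the assignment function, the test functions, the point values, and the `grade` helper are all hypothetical. It only mirrors the core idea that the instructor writes assert-based tests and each passing test earns its points.

```python
from typing import Callable

# Hypothetical student submission: the assignment asks for the
# derivative of f(x) = x**2, evaluated at x.
def derivative_of_square(x: float) -> float:
    return 2 * x

# Hypothetical instructor tests, written as plain assert statements.
def test_at_zero() -> None:
    assert derivative_of_square(0) == 0

def test_positive() -> None:
    assert derivative_of_square(3) == 6

def test_negative() -> None:
    assert derivative_of_square(-1.5) == -3

# Each test is paired with the number of points it is worth.
TESTS: list[tuple[Callable[[], None], int]] = [
    (test_at_zero, 1),
    (test_positive, 2),
    (test_negative, 2),
]

def grade(tests: list[tuple[Callable[[], None], int]]) -> tuple[int, int]:
    """Run every test and award its points only if it raises nothing."""
    earned = possible = 0
    for test, points in tests:
        possible += points
        try:
            test()
        except Exception:
            continue  # any error, including a failed assert, earns zero points
        earned += points
    return earned, possible

print(grade(TESTS))  # (5, 5)
```

In nbgrader itself, the instructor marks cells in the assignment notebook (for example, autograded test cells containing asserts) and the tool runs them against each submission; check its documentation for the actual workflow rather than treating this sketch as a stand-in.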

21:56 That's pretty interesting.

21:57 I've never used it myself because I'm not working in academia, but the value proposition is obvious.

22:04 Well, I think there are two values here. Obviously, less effort for the instructor, but there's also a little bit more fairness.

22:13 Sure.

22:14 There's an interesting angle. I'm sure that this is true for grading.

22:18 Is it morning and you're rested and patient, or is it late and you're in a rush and you're frustrated? I don't know which way it affects the grades, but it's got to have an effect, right? I was talking to some folks on Talk Python who did machine learning for discovering planets, and they said that after the afternoon coffee and cake, or cookies, or whatever it was they had at this university, more exoplanets got discovered than in the morning.

22:46 Okay.

22:47 I was always told to call people after lunch. That's when they're usually most content and most open to...

22:55 Exactly. There's probably a version of that with grading. So the fact that this doesn't care: it doesn't get coffee, it gets electrons.

23:02 And I think there's also, like, a social science paper on, like, jury verdicts. Sort of the harshness of jury verdicts over time of day. Right.

23:12 That's a little bit harsh to think about, isn't it? I got an extra year in prison because they were grumpy. That's not how justice should work.

23:19 Yeah. They didn't have their coffee yet.

23:21 The other angle that I think is interesting is, if you're a student, you get to know whether or not you passed that question. A lot of times when you're doing complicated things, it's like, I think this is right. If it's really straightforward, like calculus, where here's what the derivative of the formula is, fine. But if it's slightly more nuanced, it's hard to know what the right answer is. Here, you're like, well, the test passed, so we're good to go. I like that.

23:45 I had a course like that at university once, where you could do multiple submissions and the system would tell you how many points you scored. It was actually very motivating to keep going until you got a perfect score. I think having something like that in a course would be super cool.

24:02 I totally agree. Before we move on, real quick, in the audience, David says, "I use nbgrader for my teaching. It's super helpful. nbgrader identifies wrong answers, and then you can go in and assign partial credit." Yeah, I love it. That's actually really neat. Okay. The other one, which is more of an educational demonstration or exploration tool, is nbtutor. Will you tell people about it, if you're familiar with this one?

24:28 No, I haven't used it lately, but it looks like...

24:32 I mentioned Philip Guo and Sam Lau. Philip Guo is behind Python Tutor, which lets you write some Python code and shows you how it executes and how variables are related through pointers and so on. nbtutor is inspired by that. It lets you run a magic command, a %% cell magic. You run the cell magic to turn it on, and then to the right of the cell it starts showing the pointers and how things relate. So if you're trying to understand computer science concepts, I think this would be cool for teaching.

25:07 And you have your code, and all you have to add to get the visualization is this one short cell magic. You get the rest for free. That's pretty cool.
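nbtutor, like Python Tutor, draws arrows from variable names to the objects they refer to. As a rough, text-only stand-in (our own illustration, not nbtutor output), plain Python can expose the same relationships with `is` and `==`. In a notebook, nbtutor's README describes loading it with `%load_ext nbtutor` and then starting a cell with `%%nbtutor`; confirm those commands and their options against the project's docs.

```python
# Two names, one object: `b` is an alias for the same list as `a`.
a = [1, 2, 3]
b = a

# `c` is a new list object that merely starts with equal contents.
c = list(a)

# Mutating through `b` is visible through `a`, but not through `c`.
b.append(4)

print(a)               # [1, 2, 3, 4]
print(c)               # [1, 2, 3]
print(a is b, a is c)  # True False
print(a == c)          # False (the contents now differ)
```

A visualizer would show `a` and `b` pointing at one box and `c` pointing at a separate one, which is exactly the aliasing that trips up people learning Python.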

25:15 Yeah, it's really cool. They give credit right here to Online Python Tutor. There are some other ones: the Jupyter Drive one, to integrate Google Drive, looks pretty neat. I think it was a little bit out of date when I opened it.

25:28 It's definitely more of the experimental flavor, and I imagine whoever develops this is also at the mercy of Google keeping their Drive API stable.

25:39 Absolutely.

25:40 In general, it's a nontrivial question, figuring out how best to store your notebooks. If you're just one person, you can probably stick them into Google Drive. But as soon as you have more than a handful of people working on the same set of notebooks, you probably want a solution that is a bit more sophisticated than that.

26:00 I totally agree. One of those solutions might be a proper Git repository, right? And some of the tools we'll talk about are going to cover that.

26:08 Yes, exactly.

26:09 The other one could be a collaborative environment, like Google Colab or one of these other environments where it's like Google Docs.

26:16 There are hosted environments that have that built in, or the other way is you roll your own and make it Git-based. Both have their advantages and disadvantages. With Git, you always know a little better what you have and what it does, whereas the other one might come with some fringe benefits, like being able to comment, or having versions of the notebook nicely integrated with the GUI. Whatever you prefer, both work.
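For the roll-your-own Git route, one common practice (an assumption about your workflow, not something prescribed in this conversation) is to clear cell outputs before committing, so diffs show code changes rather than regenerated numbers and plots. Tools such as nbstripout automate this; the sketch below just shows the idea on the raw `.ipynb` JSON, where code cells carry `outputs` and `execution_count` fields. The example notebook is hand-written for illustration.

```python
import json

def strip_outputs(notebook_text: str) -> str:
    """Return notebook JSON with all code-cell outputs and execution counts cleared."""
    nb = json.loads(notebook_text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # indent=1 keeps the file line-oriented, which diffs more readably in Git.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"

# A tiny hand-written notebook (nbformat 4 layout) with one executed code cell.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [{"output_type": "execute_result", "execution_count": 7,
                         "data": {"text/plain": ["2"]}, "metadata": {}}],
        },
    ],
})

cleaned = json.loads(strip_outputs(example))
print(cleaned["cells"][1]["outputs"], cleaned["cells"][1]["execution_count"])  # [] None
```

You could run something like this from a Git pre-commit hook. A real setup would also decide what to do with notebook and cell metadata, which this sketch leaves untouched.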

26:48 Yeah. The online ones often have infrastructure that comes with them too, right? The ability to press go and run it on a GPU, if you're willing to pay, or whatever.

26:57 Yeah.

26:57 Cool. Okay, I think those are probably the interesting ones that jumped out at me from there. The next one is visualization. I mean, this is at the heart of the value of notebooks in the first place. So, Altair, tell us about that one.

27:13 I'm biased, even though I'm not a developer of Altair. I think what that team has developed is pretty amazing. I use it for most of my visualization needs, almost exclusively. What's neat about Altair is that it is declarative, and it is built on top of another package, which is called Vega. Vega is a platform-agnostic visualization framework. So basically, if you want a chart like that, you just write a JSON declaration of, basically: here's your data set, this is the URL to the data set, it's in a tabular format. The variable that I want on the x-axis is called foo, the variable that I want on the y-axis is called bar, and I want a scatter plot. And, in this example here, please make Origin the color of the dots. You just specify that in a declarative format. You create this declaration from Python, but it might just as well be JavaScript or even handwritten JSON, right?

28:27 Right. There's some kind of JSON data definition that goes to Vega, and that drives the picture. So basically Altair generates that definition, and it goes down to the next layer, right?

28:37 Exactly. Altair is the Python binding on top of Vega. And I think declarative systems, most of the time, have a higher level of abstraction and a more concise notation. The way I always explain this to people when they ask about it is: Vega and Altair are to visualization what SQL is to data queries.

29:01 A SQL query, you can execute from within Python, or you can execute it from within a GUI. It's a language-agnostic specification of what data you want to query. And this is basically the same thing for visualizations.
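To see what "declarative" means in practice, here is a hand-written spec for the kind of scatter plot described above, built as a plain Python dict. One detail worth knowing: Altair actually emits Vega-Lite, the higher-level grammar built on top of Vega, so that is the schema used here. The data URL is a placeholder, and the field names (`Horsepower`, `Miles_per_Gallon`, `Origin`) are borrowed from the classic cars example dataset rather than from this conversation.

```python
import json

# A declarative Vega-Lite spec: it says *what* to draw, not *how* to draw it.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/cars.json"},  # placeholder URL for a tabular dataset
    "mark": "point",                    # a scatter plot
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "color": {"field": "Origin", "type": "nominal"},  # color the dots by Origin
    },
}

# The same JSON could be written by hand, built from JavaScript, or generated by Altair.
print(json.dumps(spec, indent=2))
```

With Altair installed, a chart along the lines of `alt.Chart(cars).mark_point().encode(x="Horsepower", y="Miles_per_Gallon", color="Origin")` should produce a spec of this shape, which you can inspect with the chart's `to_dict()` or `to_json()` methods. Altair fills in extra details such as inlined data and config, so expect more keys than shown here.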

29:16 Yeah, it looks really great. There's a beautiful picture of a scatter plot with a legend and multiple colors, pulling out some nuance in the data. And I don't know how many lines of code that is, maybe four if you didn't split any of them over multiple lines. I mean, it's really...

29:34 You can probably golf it into four lines without semicolons.

29:38 If you used semicolons, you could do it in one, but that would be wrong. But with four reasonable lines, including the import, you get this beautiful picture here.

29:46 And that already gives you quite an impressive visualization. I think it has some nice defaults; it knows how to nicely space the labels of the axes and stuff like that. And it still allows you to build pretty complex visualizations, too. There's this one example where you have a scatter plot on top and something like a histogram at the bottom, and you can select a range in the histogram at the bottom. And there you go: you get this beautiful interactive animation, and you don't actually have to write any imperative code. You just specify what you want, and Altair and Vega do the rest for you. Years ago, I used to see these really beautiful animations built with D3, and I'm like, I want to do cool stuff like that. That's really pretty. But I don't know any JavaScript. I feel like this levels the playing field a little bit more, and it allows you to do similar things from within Python.
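The interactive example can be expressed declaratively too. Below is a sketch, again as a hand-written Vega-Lite spec (v5 syntax) in a Python dict rather than Altair code, reusing the placeholder data and field names from the cars example. An interval selection (we named it `brush`) on a binned histogram controls which points in the scatter plot above keep their color. Treat it as an illustration of the approach and check it against the Vega-Lite documentation before relying on it; in Altair the selection is built with `alt.selection_interval`, and the method that attaches it to a chart differs between Altair 4 and 5.

```python
import json

# Selection parameter: drag horizontally on the histogram to select a range.
brush = {"name": "brush", "select": {"type": "interval", "encodings": ["x"]}}

scatter = {
    "mark": "point",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        # Points inside the brushed range keep their Origin color; the rest turn gray.
        "color": {
            "condition": {"param": "brush", "field": "Origin", "type": "nominal"},
            "value": "lightgray",
        },
    },
}

histogram = {
    "params": [brush],  # the selection lives on the bottom chart
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "bin": True, "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/cars.json"},  # placeholder URL, shared by both views
    "vconcat": [scatter, histogram],    # scatter on top, histogram below
}

print(json.dumps(spec, indent=2))
```

Nothing in this spec says how to handle mouse events or redraw marks; the renderer derives all of that from the declared selection and the color condition.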

30:51 It's really nice. One thing I think is a little ironic: the people who create these tools, like the people who created Altair, have to write so much JavaScript and not that much Python, because they're building these interactive, beautiful experiences on the front end. We get to write the Python, and a lot of that complex JavaScript is encapsulated in these tools that we don't have to think about but get to use, which is great.

31:17 It's a dirty job, but somebody's got to do it. And I have a lot of appreciation for what those folks are doing for the rest of us.

31:24 I do too.

31:25 All right, so Altair is number one on the visualization list, right?

31:29 It doesn't hurt that it starts with an A, but it's also maybe one of the best on it.

31:34 Some of the other shout-outs...

31:36 In all fairness, these things are usually alphabetized. Every once in a while, things end up in the wrong place, and by now I've even built myself a linter that keeps the list nicely alphabetized.

31:49 I'm sure that makes a lot of sense. So Bokeh is one that's out there that's pretty well known.

31:54 People use a lot.

31:55 Yeah, there's a lot of them.

31:58 Right? There are. And I think when you talk about visualization, the other very popular ones are probably Matplotlib, which was probably one of the first plotting engines or back ends for Python, and then Seaborn, which builds on top of it.

32:13 Yeah, absolutely. Seaborn is nice. One that I've seen just recently in notebooks is tqdm. I've always used it in CLI applications. With tqdm, you take a for loop, and whatever you're going to loop over, you just wrap in tqdm(...), and it becomes this live, animated progress bar, which is really neat. I'd only thought of it as a terminal, CLI type of thing, but it works in notebooks too, I just learned. Right?

32:43 Yes, it does. tqdm is one of those does-exactly-what-it-says-on-the-tin kind of things. It does one thing and it does it very well.

32:51 Yeah, it's not incredibly fancy output, but at the same time, it's like, you know what, I want to have a little bit of feedback for the users or for myself. And you literally just wrap your iterator in tqdm and it's good to go.

33:06 It's a very natural thing to want. Just imagine you've got like, some long running computation over a loop, right, and you just don't want to stare at a blank screen for two minutes.

33:16 Yes.

33:16 You can kind of see how maybe the idea developed from there.

33:19 I've got a lot of those types of things.
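The "just wrap your iterator" idea is easy to see in a minimal, standard-library-only sketch. This is not tqdm's implementation: `simple_progress` is a hypothetical helper that yields each item unchanged and redraws a one-line counter as the loop advances. With the real library, the usage is simply `from tqdm import tqdm` followed by `for item in tqdm(items): ...`.

```python
import sys
import time
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def simple_progress(items: Iterable[T], total: Optional[int] = None,
                    stream: TextIO = sys.stderr) -> Iterator[T]:
    """Yield items unchanged while redrawing a one-line progress counter.

    Hypothetical helper that mimics the tqdm idea; it is not tqdm's code.
    """
    if total is None:
        try:
            total = len(items)  # lists, ranges, etc. know their length
        except TypeError:
            total = None  # generators don't; show a plain count instead
    for count, item in enumerate(items, start=1):
        yield item  # the caller's loop body runs here
        if total:
            stream.write(f"\r{count}/{total} ({100 * count / total:5.1f}%)")
        else:
            stream.write(f"\r{count} items")
        stream.flush()
    stream.write("\n")


if __name__ == "__main__":
    for _ in simple_progress(range(20)):
        time.sleep(0.01)  # stand-in for slow work
```

Because it is a generator, the counter only advances when the loop body asks for the next item, which is the same reason a real progress bar reflects actual work done.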

33:24 This portion of Talk Python to Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users may be encountering errors, slowdowns, or crashes with your app right now? Would you even know it until they sent you that support email? How much better would it be to have the error or performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in the report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email. That was a great email to write back: Hey, we already saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users. Create your Sentry account at talkpython.fm/sentry, and if you sign up with the code talkpython, all one word, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features. Create better software, delight your users, and support the podcast. Visit talkpython.fm/sentry and use the coupon code talkpython.

34:39 Brian out in the audience says hvPlot, HoloViews, Bokeh, and Panel are all awesome and tightly interconnected. Yeah, those are really nice. All right, so the next one is the publishing section. This might also be a little bit at the very heart of notebooks. The original idea of the notebook was: I want to have some explanation, then some executable code, and then some visualization, almost like telling the story of a research project or something like that. Right? And so this section, it's right there, isn't it?

35:11 Yeah, exactly. I think it's less clear what belongs in this category compared to some of the others, but it's basically anything in that space of how you run a notebook, how you tell a story with it, how you point out little things inside it. One of the entries that I find quite interesting and useful, and also pretty awesome from a technical perspective, is Binder. What Binder allows you to do is take any GitHub, or even GitLab or other hosted Git, URL and put it in there. Then Binder builds a Docker image with all the dependencies of those notebooks, finds an execution node somewhere in the cloud in its infrastructure, and points you to a Jupyter instance that has that notebook running. So when you see a notebook on GitHub and you think, geez, I'd like to poke around with this thing, you just go to Binder, put in the URL, and you can play around with the notebook interactively.

36:24 It's really cool. So people have seen the little Launch Binder badge, or whatever you call it, on a GitHub repo or somewhere else. I guess it could even be in an article that points back to one of these. If you click it, just as you said, it creates an executable environment with the right dependencies and lets you run the code there. Which is kind of impressive, that that's available to the world openly, publicly, without authentication. Right?
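A Launch Binder badge is just a link. As a sketch, assuming mybinder.org's GitHub launch pattern `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` and its `badge_logo.svg` image, the link and a Markdown badge for a README could be built like this; the repository names are hypothetical.

```python
def binder_url(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Build a mybinder.org launch link for a public GitHub repository.

    Assumes the https://mybinder.org/v2/gh/<owner>/<repo>/<ref> pattern and
    simple owner/repo/ref names (no URL-encoding is done here).
    """
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


def binder_badge_markdown(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Markdown for the familiar badge image that links to the launch URL."""
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({binder_url(owner, repo, ref)})"


# Hypothetical repository, for illustration only.
print(binder_badge_markdown("example-org", "example-notebooks"))
```

Clicking such a link is what kicks off the image build and notebook launch described above; the badge itself carries no state.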

36:49 It's an incredible engineering feat, right? Yeah, just all the considerations of assigning a node that has sufficient resources available to do that assembling. Basically, you don't know what people are going to throw at you in those repos, right? And they're doing a pretty good job of making a whole lot of notebooks executable.

37:11 I had Carol Willing, among others, on the show recently to talk about Mastodon, and we talked about the federated story: there are a bunch of people creating servers and allowing others to use them, sort of volunteering a little bit of resources. And she said there are some parallels in the Binder space, with how certain universities and other places will set up the ability to run some of these Binders, to add a little bit of compute and resources to the world. So yeah, it's similar.

37:45 It's pretty amazing. From what I can see of what goes on behind the scenes, I think some of the execution runs on Google hardware, and beyond Google they have three other hardware providers. So not only do they manage to make all those notebooks executable, they even run them on four different sets of infrastructure, which is pretty amazing.

38:11 Very cool. Okay, a couple of others. Another one here that really jumps out at me is Jupyter Book.

38:19 Let me pull it up here: "Build beautiful, publication-quality books and documents from computational content." Really nice.

38:27 I think there are a couple of projects that try to do similar things: basically, you create a set of notebooks, and then you get either a web page or a book. There are two cases where these are useful. One, you're trying to write a book about some subject matter, a machine learning book or something like that. The other case where it's really useful is documentation, right? If you're a developer or maintainer of a software package and you want to document your API, something like this can be very useful. It gives you not only the ability to write documentation, but also to include code in that documentation. And in some cases, if you have the Binder link, you can even set it up so that people can click a tab next to a piece of code, try out what that code does, fiddle with it a little bit, and see what happens.

39:29 It allows you to do some pretty cool stuff.

39:31 That's a pretty interesting way to bring it back around. We've taken this computational thing, got it going, and turned it into a static book. But if you click this button, you can go back and kind of...

39:40 What we just talked about also sounds very much like nbdev, which is a project with a very similar flavor, specifically geared towards people who write software packages. The idea there is that you take your code and define your classes inside a notebook.

40:03 So your code actually lives inside that notebook, and you can also define your tests in that notebook. Some added benefits are that you can run your tests from a notebook, and these things don't live in seven different places, like your code base, then your test base, then your documentation repo; they all live together in one space. So if you make a change that has an impact on all three of them, you don't need to go to three places; you can just do it in the notebook, where it's all together.
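To make "your code lives inside the notebook" concrete, here is a toy exporter that reads a notebook's JSON, keeps only code cells whose first line is a `#| export` marker (modeled loosely on nbdev's directive comments), and writes them out as a module, while documentation and test cells stay behind in the notebook. It is a sketch of the idea under those assumptions, not how nbdev itself is implemented; nbdev handles much more than this.

```python
import json
from pathlib import Path

EXPORT_MARKER = "#| export"  # modeled on nbdev's directive comments; exact spelling assumed


def cell_source(cell: dict) -> str:
    """Notebook JSON stores a cell's source as one string or a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def export_cells(nb: dict) -> str:
    """Join the bodies of code cells whose first line is EXPORT_MARKER."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown (docs) stays in the notebook
        lines = cell_source(cell).splitlines()
        if lines and lines[0].strip() == EXPORT_MARKER:
            chunks.append("\n".join(lines[1:]).strip("\n"))
        # unmarked code cells (examples, tests) are not exported
    return "\n\n\n".join(chunks) + "\n"


def export_notebook(nb_path: Path, module_path: Path) -> None:
    """Read an .ipynb file and write its exported cells to a .py module."""
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    module_path.write_text(export_cells(nb), encoding="utf-8")
```

The design choice worth noticing is that the notebook stays the single source of truth; the .py file is a generated artifact.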

40:38 Of all the things that plug into Jupyter, I think I'm most impressed with nbdev. It's pretty nuts.

40:44 Yeah. I think it was Donald Knuth who called this literate programming, and I feel like this is the kind of stuff he envisaged back in, like, '83, when he wrote his book about literate programming. Right. This is really what he had in mind.

41:01 It took a while to get there. Yeah, the tools he was working with weren't like these. So with nbdev, you can have your documentation.

41:08 You can take your notebook and export it, or build it into a Python package or a conda package. You can publish it to PyPI and conda. You can have tests, you can have continuous integration.

41:21 Then also, if you've got complicated code that you need to integrate in other ways, you can sync it to Python files and then back. I think that two-way integration is pretty cool for when you think, I really need to get this out of the notebook and into Python directly, but you don't want to just carve it off. You want to keep them connected.

41:39 Right, exactly. That's one of the key points: nbdev allows you to do all of these things in the same place. Sure, you can do all of these things separately, and a lot of people do, but having them all in one place is just so much easier when you, say, make an API change.

41:58 Another one, which isn't even shouted out in their highlights, is nbdev_clean. We talked about the two possible ways to collaborate; if you're doing it the Git way, these notebook files contain the output, which can vary completely from run to run. So every time you save or rerun a notebook, you risk a merge conflict, right? This strips out that kind of information to avoid merge conflicts, so it can be a really good way to prepare a notebook for committing. I suspect that could be a Git pre-commit hook. I'm not sure of it.
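A minimal sketch of the output-stripping idea, assuming the standard .ipynb JSON layout where each code cell has `outputs` and `execution_count` fields: clear both, leave the sources alone, and only rewrite the file if something changed. This is not nbdev_clean's implementation (real tools also tidy metadata), and `strip_file` is a hypothetical helper; it could plausibly be called from a Git pre-commit hook, but that wiring is not shown.

```python
import json
from pathlib import Path


def strip_outputs(nb: dict) -> dict:
    """Return a copy of notebook JSON with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # cheap deep copy of plain JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: Path) -> bool:
    """Rewrite an .ipynb in place; return True if anything changed."""
    original = json.loads(path.read_text(encoding="utf-8"))
    cleaned = strip_outputs(original)
    if cleaned == original:
        return False  # already clean: leave the file (and its mtime) alone
    # indent=1 is close to how Jupyter itself lays out notebook files
    path.write_text(json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return True
```

After stripping, two runs that differ only in their outputs produce byte-identical files, which is exactly what keeps the merge conflicts away.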

42:29 Probably, that's one way to do it. And I think the fact that the notebook format contains the cell output and also some metadata is a bit of a blessing and a curse at the same time. On one hand, when you have a notebook file, you can load it up and immediately see the output without having to run the cells, right? So imagine someone sends you a notebook file and you cannot run it on your machine. You still get the benefit of seeing what their output was when they generated it. So it's nice for that. It's also nice that if you put it up on GitHub, GitHub can render the notebook file with the output in a very nice and sensible way, so you immediately see what the output was. But the disadvantage is that whenever you rerun it, the contents of that file change, and it doesn't produce clean diffs. For example, if you change just one line and that line produces a plot, your diff might be a couple of kilobytes long, right? If you don't want that, one of the tools that deals with it is something we have on the list, Jupytext. Jupytext deals with this problem by giving you paired notebooks. You have an .ipynb file, and you tell your Jupyter IDE: in addition to that .ipynb file, I also want a Markdown file or a .py file that contains just the input cells. Then, if you want to version your notebook, you just check in that clean, stripped .py or Markdown file and ignore the .ipynb. That is what we actually do in our company. We have a couple of notebooks that serve as reports, and we edit them collaboratively. The way we version them is by stripping out the output via Jupytext, so you get really nice, clean diffs, and when someone makes a change, you can tell what they changed in a reasonable way.
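Paired notebooks work because the text side only carries inputs. As a simplified illustration of the "percent" script format Jupytext can pair with an .ipynb (cells separated by `# %%` lines, Markdown cells commented out), this sketch renders only cell sources, so re-running the notebook never changes the script. It is not Jupytext's code; real Jupytext also handles metadata, magics, raw cells, and syncing edits back to the .ipynb.

```python
def to_percent_script(nb: dict) -> str:
    """Render notebook cells as a py:percent-style script containing inputs only."""
    blocks = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        if cell.get("cell_type") == "markdown":
            # Comment out every Markdown line so the script stays valid Python.
            commented = "\n".join(
                ("# " + line) if line else "#" for line in text.splitlines()
            )
            blocks.append("# %% [markdown]\n" + commented)
        elif cell.get("cell_type") == "code":
            # Outputs are deliberately ignored: they never reach version control.
            blocks.append("# %%\n" + text.rstrip("\n"))
    return "\n\n".join(blocks) + "\n"
```

Checking in only this text file means a diff shows the lines someone edited, not a re-rendered plot.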

44:33 That's super valuable. Can we go back to nbdev for just a second? There's a big long list of commands you get if you type nbdev_help, and so many of them would be, as a standalone command, like, oh my gosh, what an amazing tool. Another one I just saw is nbdev_changelog, which creates a CHANGELOG.md file just from your closed and labeled GitHub issues, and that's a cool feature on its own. I could see installing it and never even using a notebook, just running that on my Git repo.
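The changelog idea is simple enough to sketch: take closed issues, group them by label, and emit a Markdown section per group. The issue dictionaries and the label-to-heading mapping below are hypothetical; nbdev_changelog reads issues from GitHub and follows its own conventions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical label -> heading mapping; nbdev_changelog has its own conventions.
SECTIONS = {
    "breaking": "Breaking Changes",
    "enhancement": "New Features",
    "bug": "Bug Fixes",
}


def changelog_entry(version: str, issues: List[Dict]) -> str:
    """Render closed, labeled issues as one Markdown changelog section."""
    grouped: Dict[str, List[Dict]] = defaultdict(list)
    for issue in issues:
        if issue.get("state") != "closed":
            continue  # open issues haven't shipped yet
        for label in issue.get("labels", []):
            if label in SECTIONS:
                grouped[label].append(issue)
    lines = [f"## {version}", ""]
    for label, heading in SECTIONS.items():  # fixed section order
        if grouped.get(label):
            lines += [f"### {heading}", ""]
            lines += [f"- {i['title']} (#{i['number']})" for i in grouped[label]]
            lines.append("")
    return "\n".join(lines)
```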

45:04 So there's a bunch of neat things here.

45:06 You look at the list of what nbdev does for more than five minutes, and it makes you want to build a software package.

45:13 Yes, it does.

45:16 Which you could do with nbdev, by the way, so it's kind of meta. Another one around there is nbconvert.

45:22 nbconvert.

45:24 It's almost there in the name, right? Basically, one thing it gives you is a command-line command, where you can run jupyter nbconvert foo.ipynb and say, I want to convert this notebook to HTML, so you can script the conversion of notebooks at the command-line level. Beyond that, nbconvert is also accessible as a Python package, so you can control the notebook conversion from within your Python code. That is useful for a number of things when you really want fine-grained control over how you execute or convert your notebook. One of the things I have built with it is a way to convert notebooks without the input cells. So if you want to build a report out of a notebook for a non-technical audience, and you want to get the code out of their way and really just show them the output, that is something you can build with nbconvert.

46:25 That's really cool. I hadn't realized it did that. So here's what I want to show you. And if you actually want to see the code, click this binder version or view it statically on GitHub. But for most people, they just want to see. Here's the description and here's the figures. It's still potentially accessible, at least.

46:43 Totally. And I think notebooks are actually quite a nice way to design a report, right? Because what you have is building blocks. Say my report is: I want the total number of orders for the last week, then a chart, and then the number of orders by zip code or whatever. You build that in a notebook, then you just nbconvert it with the input stripped out, and you get an HTML file that you can serve on a web server and have something like a dashboard for your team.
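For a one-off report, the nbconvert command line has a flag for exactly this: `jupyter nbconvert --to html --no-input report.ipynb` (the notebook name is an example). To show what "strip the inputs" means, the sketch below walks the notebook JSON and emits Markdown cells plus code-cell outputs (stream text, PNG images, HTML, plain text) but never the code itself. It is a toy, not nbconvert's exporter, which uses templates and renders Markdown properly.

```python
import html
from typing import List, Union


def _text(value: Union[str, List[str]]) -> str:
    """Notebook JSON stores multi-line strings either whole or as a list of lines."""
    return "".join(value) if isinstance(value, list) else value


def outputs_only_html(nb: dict, title: str = "Report") -> str:
    """Render markdown cells and code-cell outputs, never code-cell sources."""
    parts = [f"<html><head><title>{html.escape(title)}</title></head><body>"]
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            # Shown as preformatted text; this sketch does no Markdown rendering.
            md = html.escape(_text(cell.get("source", "")))
            parts.append(f"<pre>{md}</pre>")
        elif cell.get("cell_type") == "code":
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream":
                    parts.append(f"<pre>{html.escape(_text(out.get('text', '')))}</pre>")
                    continue
                data = out.get("data", {})  # "error" outputs have no data and are skipped
                if "image/png" in data:
                    png = _text(data["image/png"]).strip()
                    parts.append(f'<img src="data:image/png;base64,{png}">')
                elif "text/html" in data:
                    parts.append(_text(data["text/html"]))
                elif "text/plain" in data:
                    parts.append(f"<pre>{html.escape(_text(data['text/plain']))}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string to an .html file gives the kind of self-contained page that can be served as a simple team dashboard.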

47:17 Very good.

47:18 So it's super cool for building things like that.

47:20 Yeah, very neat. One more in this section before we move on. There's a bunch, but I think Papermill is pretty unique. Do you want to tell people about Papermill?

47:30 I haven't, to be honest. I haven't used it that much.

47:32 Yeah, I haven't either. But I did read this Netflix article about what they were doing with Papermill, with Kyle Kelley. They were doing some work to take notebooks and use them as building blocks for managing their infrastructure, and there were a lot of interesting benefits. Papermill basically lets you turn the variables at the top of the notebook into inputs and the variables at the end of the notebook into outputs, so you can treat it like a function. They were chaining these together for all sorts of crazy DevOps-y type things.

48:11 It sounds like something I once rolled my own version of, basically what I just described doing with nbconvert. I once built a version where you could have a variable in the first cell and treat it like a function input. And it sounds like they probably did a much better job of it. Imagine you have a dashboard for the US, and you want one for international orders, right? Or for Canada, or whatever country. You can basically do that out of the same notebook file and just say, hey, these are separate versions: one for the US, one for Canada, one for Mexico, whatever.
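Papermill's approach, roughly: you tag one cell `parameters`, and for each run it inserts a new cell right after it with your values (tagged `injected-parameters`), then executes the notebook. With the real library that is a call like `papermill.execute_notebook("dashboard.ipynb", "dashboard_ca.ipynb", parameters={"country": "CA"})`, where the file names are hypothetical. The sketch below reproduces only the injection step on plain notebook JSON, to show why one template can yield a US, Canada, or Mexico version; it does not execute anything.

```python
import copy
from typing import Any, Dict


def inject_parameters(nb: dict, params: Dict[str, Any]) -> dict:
    """Return a copy of nb with an assignment cell inserted after the 'parameters' cell.

    Values are written with repr(), so this sketch only supports values whose
    repr is a valid Python literal (str, int, float, bool, list, dict, ...).
    """
    new_nb = copy.deepcopy(nb)  # the template notebook is never modified
    cells = new_nb.setdefault("cells", [])
    source = "".join(f"{name} = {value!r}\n" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    # Insert after the first cell tagged "parameters"; fall back to the top.
    index = 0
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            index = i + 1
            break
    cells.insert(index, injected)
    return new_nb
```

Because the injected cell runs after the defaults, later cells see the overridden values, and the executed copy records exactly which inputs produced which outputs.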

48:49 Exactly. One of the benefits they said they were getting was for when something goes wrong, like if a step crashes. Because, as you described, the notebook stores the exact outputs and inputs as it executes, it's a snapshot in time of what happened when it went wrong. So instead of just having a log message saying it went wrong and here was the input, it's like, well, here are all the steps, you can see the variables and the output coming, and then here's the crash. You're like, oh, look, these are the three inputs, and here's how this one got...

49:17 It actually gives you nice diagnostics.

49:20 Yeah, it's like a report of what went wrong.

49:22 It's almost like, you know how you have these parameterized tests, right? It is, yes. A parameterized test with some diagnostic output about what went wrong during that execution. That's pretty cool.

49:35 It's unique, I think. What other shout-outs are there? How about the version control side? Anything you want to give a shout-out to there?

49:41 That is a section that is a bit more well-defined, where it's clear what goes in there. It's basically all sorts of tools for diffing, merging, and code-reviewing changes in notebooks. nbdime is probably one of the more well-known packages in that category, and it looks very well built.

50:01 Right. The diff and merge tools. Yeah.

50:03 So imagine you have an image in an .ipynb file. You would usually get a horrendously large diff if something in that image changes. nbdime displays the diff not as 50 changed lines, but neatly: here is something that changed, in one line, and it doesn't mess up your whole diff.

50:25 Right. This plot changed, not these 700 lines of encoded image data changed.
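The core trick of a content-aware notebook diff is to compare cells, not raw JSON, so a regenerated base64 image does not drown out the one line that changed. This toy pairs cells by position and diffs only their sources with the standard `difflib` module; it is not nbdime's algorithm, which also matches inserted or deleted cells and can render output changes such as images.

```python
import difflib
from typing import List


def _source(cell: dict) -> str:
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def cell_diff(old_nb: dict, new_nb: dict) -> List[str]:
    """Unified diff of cell sources only; outputs (e.g. base64 images) are ignored.

    Cells are paired by position, so inserted or deleted cells are handled crudely.
    """
    report: List[str] = []
    old_cells, new_cells = old_nb.get("cells", []), new_nb.get("cells", [])
    for i in range(max(len(old_cells), len(new_cells))):
        a = _source(old_cells[i]).splitlines() if i < len(old_cells) else []
        b = _source(new_cells[i]).splitlines() if i < len(new_cells) else []
        if a != b:
            report.extend(difflib.unified_diff(
                a, b, f"cell {i} (old)", f"cell {i} (new)", lineterm=""))
    return report
```

Even though every output image in the example below changes, the report only mentions the single edited source line.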

50:31 Very cool. And merging as well, which is neat. All right.

50:36 A little short on time. Let's see.

50:38 I know you want to give a shout-out to Deepnote, both because you like them and because they support the list. Yes, they support the list. Yeah. So maybe tell us about them.

50:47 Deepnote is basically one of these hosted notebook solutions. We already mentioned Binder and Colab, and they have built one that is a bit more centered on team collaboration. You also get an execution environment with different sets of hardware from them, and they really emphasize collaboration: you can comment and have discussions about individual cells in your notebook, and you can very nicely see what changes other people made to your notebook. All of that comes in a very well done, very dense GUI that makes a lot of things very easy for you. If you're not already using one of these environments, and you have a team that maybe isn't particularly focused on the technical side of where and how it runs, this is a very turnkey solution.

51:39 Less DevOps, more Google Docs type of style.

51:43 They solve a lot of those problems for you.

51:46 It looks really neat. The collaboration seems very nice. I think that's a pretty unique thing to do in a polished way now.

51:53 And they also have nice integrations for a lot of data sources. You can query SQL directly from their GUI and pipe the SQL results into a DataFrame, right? So you have one cell where you write your SQL, and the results go directly into a DataFrame that you can then visualize in the next cell. So the ergonomics of Deepnote are probably better than those of many other notebook solutions.

52:18 That's pretty neat. And like I said, it's worth pointing out they're a sponsor of your list, not of our show. They are a commercial venture, but they do have a free version. There's something about it where I kind of like to support companies that are purpose-built, right? Deepnote is built for running notebooks and collaboration, whereas with a lot of these big tech things, it's like, well, I know Facebook is a social media company, but I could also do this other thing that runs on it, or I could run this on Google. And there's something about it: okay, there's a company whose only job is to do this, versus, I'm not sure I could ever get support. Like, if I had a problem with my Gmail, I don't know that I could ever get help. Right? Whereas if I went to an email company and got email from them, they would help me with email, because it's their thing. Right.

53:09 I think where it shows is that they spend a lot of time thinking about how people use notebooks and what they want to do with them. They think a bit beyond just the technical Jupyter side, more about the ergonomics of how you actually use a notebook within a team. And I think you can definitely tell that they know how to use notebooks and have thought about all the ways people use them and how to make that better.

53:36 Nice. Well, I've never used them, but if I have the need, I'll check it out. Sounds good. All right, what else? Maybe we got time for one more. Is there one we haven't talked about yet? You're like, we really got to cover this one.

53:48 Let's do cell magics, right?

53:50 Okay.

53:50 If you maybe just google for IPython cell magics.

53:53 I'll Kagi for it.

53:55 Yeah. They are not strictly a Jupyter thing, but I think they're a not-so-well-known thing that is super neat.

54:04 Since Jupyter itself was born out of IPython, these are also built into IPython. One of them, for example, is %debug. When you have a notebook open and your code raises an exception and gives you an error message, you can type %debug in the cell directly below it, and that opens a debugger session that takes you directly to the point where your code failed. So say you access the array foo at index position five, and that's where your code fails. You can run %debug and look at what was actually stored in the list foo at that point in the execution. Did it actually have enough elements, or did I run out of bounds? So whenever I get an error while executing a notebook and it's not obvious, I like to use %debug to find out what's going wrong with my code.
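In a notebook you would simply run `%debug` in the next cell. Outside Jupyter, the standard library's `pdb.post_mortem()` plays a similar role. The sketch below does not open a debugger; it pulls out the same information programmatically by walking to the innermost traceback frame and reading its local variables. `get_item` and `locals_at_failure` are hypothetical names for illustration.

```python
from typing import Any, Dict, List


def get_item(foo: List[int], index: int) -> int:
    return foo[index]  # raises IndexError when index is out of bounds


def locals_at_failure(exc: BaseException) -> Dict[str, Any]:
    """Return the local variables of the frame where exc was raised."""
    tb = exc.__traceback__
    if tb is None:
        return {}
    while tb.tb_next is not None:  # walk down to the innermost frame
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals)


try:
    get_item([10, 20, 30], 5)
except IndexError as err:
    snapshot = locals_at_failure(err)
    # Interactively, pdb.post_mortem(err.__traceback__) would open a debugger here.
    print(snapshot)  # shows foo and index as they were inside get_item
```

Seeing that `foo` had only three elements while `index` was 5 answers the "did I run out of bounds?" question directly.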

55:07 Very nice.

55:08 Another one that is very useful is %time. Have you used that?

55:13 No, but I often want to answer the question of, is this getting a little bit faster or a little bit slower? And I don't want to go into a profiler, and I don't want to write the datetime or timestamp code to print it out myself.

55:27 I use it a lot in situations where I've got way A to do something and way B to do something, and I wonder, is this actually faster, or is it performance-wise worse than doing it the other way? This gives you a very quick and dirty answer in terms of orders of magnitude: is this the same level, or should I be doing things differently? It's very useful for that.

55:51 Yeah. %time some function call and see how long it takes.
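In a notebook, `%time some_call()` times a single run, and `%timeit` repeats it for a steadier number. The standard library's `timeit` module gives the same quick comparison in plain Python. The two functions below are hypothetical "way A" and "way B" implementations of the same sum; the point is the order-of-magnitude check, not a rigorous benchmark.

```python
import timeit


def total_loop(values):
    """Way A: accumulate in an explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    """Way B: let the built-in sum() do the work."""
    return sum(values)


values = list(range(10_000))
assert total_loop(values) == total_builtin(values)  # same answer; only speed differs

CALLS = 50
timings = {}
for name, fn in [("way A (loop)", total_loop), ("way B (sum)", total_builtin)]:
    # Best of 5 repeats of 50 calls each: a quick sanity check, not a benchmark.
    best = min(timeit.repeat(lambda: fn(values), number=CALLS, repeat=5))
    timings[name] = best / CALLS
    print(f"{name}: {timings[name] * 1e6:.1f} microseconds per call")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; the exact numbers will vary by machine.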

55:54 It's brilliant. And then another one I want to mention is just an exclamation point followed by a shell command. Yeah, it's very useful for... I think it's also available as %sx.

56:08 Yeah.

56:08 So it's basically a shorthand. You can do !ls to get a directory listing, just to see what is in the directory. Or you can do, I don't know, !whoami to see what user you're running as. Or you can ping a machine to see if it's up, all directly from your notebook. So this is a quick and dirty way of running command-line commands without having to leave your notebook. It's really cool.

56:36 You know, if you just want to interact with the file system, or see what files are available to you, or all these things, right. You wouldn't know, because this hasn't been published yet, but in the sequence of when these shows come out, I just talked to Jeroen Janssens from Data Science at the Command Line. Have you seen this book?

56:52 No.

56:52 Yeah, so it's got a bunch of interesting things you can do on the command line for querying data or running things in parallel, and a whole bunch of these ideas of, how do I do really cool stuff with the shell? Just use your bang command and they become integrated into your notebook, right? I think that's super cool.

57:12 You can even take it further: you do an ls, your Python session gets passed the contents of that directory as a list, and you can assign that list to a variable. So that is a really easy, but also somewhat shoddy, way of listing the directory contents very quickly.
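In IPython, `files = !ls` stores the command's output lines in a list-like object you can keep working with. In plain Python, a rough equivalent is `subprocess.run` with captured output, sketched below as a hypothetical `shell_lines` helper; for directory listings specifically, `pathlib` avoids the shell altogether and is platform-independent.

```python
import subprocess
from pathlib import Path
from typing import List


def shell_lines(command: str) -> List[str]:
    """Run a shell command and return its stdout split into lines.

    Roughly what `files = !ls` gives you in IPython. shell=True sends the string
    through the system shell, so only pass commands you trust.
    """
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()


def list_dir(path: str = ".") -> List[str]:
    """Platform-independent alternative that skips the shell entirely."""
    return sorted(p.name for p in Path(path).iterdir())
```

The shell version is handy for one-off exploration; the `pathlib` version is what you would keep once the notebook becomes something other people run.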

57:33 You should probably use that. We should bring that up. If there's really interesting stuff happening on the shell that you want to use, for example, there's this example that says total_lines equals... That's a brilliant one.

57:45 Yeah.

57:45 ...which is a Jupyter-level command. And then it says bang, wc -l, with an arrow to redirect input from some text file, and that tells you how many lines there are. Like, maybe you'd want to make this all Python so it's not platform-dependent, right?

57:59 Sure, if you want to package this up. But if you only end up doing it once and just want to know how many lines there are, that's the way to do it.
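The `total_lines = !wc -l < file` trick depends on `wc` being available. A platform-independent sketch in pure Python, assuming you want the same number `wc -l` reports (the count of newline characters, so a final line without a trailing newline is not counted), reads the file in binary chunks; `count_lines` is a hypothetical helper name.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count newline characters, like `wc -l`, without relying on a shell."""
    count = 0
    with path.open("rb") as fh:  # binary mode: no decoding, works for any encoding
        for chunk in iter(lambda: fh.read(1 << 16), b""):  # 64 KiB at a time
            count += chunk.count(b"\n")
    return count
```

Reading in fixed-size chunks keeps memory flat even for very large files, which matters when the file was just generated by an exploratory step.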

58:08 Right. You're in an exploratory situation, you just generated this file, and now you want to know how many lines it has. So that's a really powerful one. I'm glad you brought that up.

58:17 And then just the last one I'll mention, because it has definitely saved me in a couple of very hairy situations: %history. That basically gives you a history of the last 20, 10, or however many commands you executed in a Python session. Why is it sometimes useful? Just imagine you accidentally delete a cell in a notebook. I mean, you can always undo that, right? But every once in a while, your undo history is so horribly messed up that you somehow lose that cell forever, and there was a lot of code in it. You can go back to it with %history and get your deleted cells back from beyond the grave.

59:01 Yeah. And you can ask %history for just the last five or so. Yeah, it's really nice. Cool. All right, Marcus, I think we might be out of time, but what a cool project this awesome list is.

59:15 We've got a lot of people listening in. I'm sure there's a lot more awesome packages, tools, resources that could and should go on this list. So if whoever is watching or listening to this has a package that they feel should be mentioned there, it's just as easy as doing a pull request and adding your favorite tool or resource to that list. We're very inclusionist on the list. Right. We like to include people's suggestions.

59:43 Yeah, fantastic. There are a couple that I could see showing up there, like a few from comments in the live stream that could be PRs. So how about it? That'd be awesome. Yeah, awesome. Final two questions before I let you out of here, though. If you're going to write some Python code, what editor do you use these days?

59:59 Jupyter, of course. I overuse it.

01:00:02 Beautiful. And then a notable PyPI or conda package, or even a Jupyter plugin. I know this entire show has basically been one after another, but is there something you want to give a shout-out to?

01:00:14 If I can just give one a shout-out, it's Altair. You should be doing your plotting in...

01:00:18 Altair, right on. It's quite nice. All right, well, thank you so much for being here. It's been great to have you on the show.

01:00:24 Thank you for having me.

01:00:25 Yeah, you bet. Bye.

01:00:26 Have a good one. Bye.

01:00:28 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. AWS is the leading cloud for developers, but with over 250 services, it's an overwhelming set of choices. That's where the AWS Insiders podcast comes in. Their job is to help you make sense of all those AWS options. Listen to an episode at talkpython.fm/awsinsiders. Take some stress out of your life. Get notified immediately about errors and performance issues in your web or mobile applications with Sentry. Just visit talkpython.fm/sentry and get started for free. And be sure to use the promo code talkpython, all one word. When you level up your Python, we have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:01:44 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
