#343: Do Excel things, get notebook Python code with Mito Transcript
00:00 Here's a question. What's the most common way to explore data? Would you say Pandas and Matplotlib?
00:06 Maybe you went a little broader and more general and said Jupyter Notebooks. How about Excel or
00:13 Google Sheets or Numbers or some other spreadsheet app? Yeah, my bet is on Excel. And while it has
00:18 many drawbacks, it makes exploring tabular data very accessible to many people, most of whom
00:24 aren't even developers or data scientists. On this episode, we're talking about a tool called
00:30 Mito. This is an add-in for Jupyter Notebooks that injects an Excel-like interface right into the
00:36 notebook. You pass it data via a Pandas data frame or some other source, and then you can explore it
00:42 as if you're using Excel. The cool thing is, though, just below that, in another cell, it's writing the
00:48 Pandas code you need to actually accomplish that outcome in code. I think this will make Pandas and
00:54 Python data exploration way more accessible to many more people. If you've been intimidated by
00:59 Pandas or know someone who has, this could be what you're looking for. This is Talk Python to Me,
01:04 episode 343, recorded November 8th, 2021. Welcome to Talk Python to Me, a weekly podcast
01:24 on Python. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy,
01:28 and keep up with the show and listen to past episodes at talkpython.fm. And follow the show
01:33 on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube.
01:39 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows
01:45 and be part of that episode. This episode is brought to you by Shortcut and Linode,
01:51 and the transcripts are sponsored by Assembly AI. Aaron, Nate, and Jake, welcome to Talk Python to Me.
01:58 Hey. Hey. Hey, y'all. Thanks, man. How's it going?
02:00 It's going really well. I'm excited to be on the data science side of the world today with you guys.
02:06 Cool. So are we.
02:07 Yeah, you built some really neat tools to help people get started and get up to speed and just be
02:11 more efficient than just solely writing Python code, but not, you know, excluding that either with your
02:18 product Mito. So that's super fun, and we're going to talk about that. But before we do, you know,
02:22 let's just kind of go around and how did you get interested in data science and working on this
02:27 Python tool? Aaron, you want to go first?
02:29 Yeah. So a little background. Jake and I, you can't tell just by the first names,
02:32 but we are twin brothers. So we've been working on projects together for a long time. Nate has been
02:38 our best friend since middle school. I think I didn't get invited to his eighth grade birthday.
02:43 So I think maybe starting in high school. And then he also went to college with us.
02:46 And I think like, yeah, we got our first like taste of data science at Penn. We all studied a mix of
02:53 computer science and business. And in the business classes, you do a lot of Excel based,
02:59 mostly, unfortunately, Excel based data analytics work and stat classes and finance classes and stuff
03:04 like that. So I think that's really where we got our first taste of data analytics or data science
03:09 work. And then we've each had some experiences through internships and jobs that we've had over
03:14 the past few years in the data science space as well. But really, I think it all goes back to the
03:20 beginnings of the coursework that we did at Penn.
03:22 That's great that you all are able to stay together. I mean, obviously brothers,
03:25 continue to work together. And on this project, you know, business schools,
03:30 whole business programs just run on Excel, don't they?
03:33 It's kind of amazing the contrast because so Aaron and I specifically both got degrees in computer
03:39 science and in the business school. And so there was this transition where you'd be hanging out
03:43 in, you know, in class in the engineering school, and you'd be writing code. And then you would like
03:47 walk up campus into the business school, and it'd be like returning to the dark ages in some ways,
03:51 right? Yeah. And what's really cool, I think about Excel generally is that what we observed is that it
03:55 let, you know, our peers in school and us as well kind of complete really some amazing projects that we
04:01 might have not been able to do with code because our skills, you know, we were still learning at that
04:04 point. And so it's really kind of this really beginner friendly, amazingly powerful tool for what it is.
04:08 But then we would go back down to the engineering school and be like, Oh my God, there's all this
04:11 tooling here. We could have superpowers, but we don't know how to use this stuff. Right? So there was this
04:16 very direct contrast that we witnessed where there's very cool stuff happening all over the
04:20 place. But the tooling differential is pretty dramatic between business school and
04:24 computer science world. And I think that's kind of what initially made us interested in this space.
04:27 Sure. You don't want to underestimate the power of just firing up Excel, selecting a section,
04:32 throwing up a graph or two. And that's incredible. And all of the functions and stuff. But you know,
04:38 if you think of bad programming practices, one of the worst ones has got to be like,
04:44 do these three things and then go to over here and then do a couple of things, then go over there and
04:48 then get that thing and then go back. You know, we've banished this type of programming from regular
04:53 programming long ago. And Excel is like that without even being able to see where the gotos point.
04:58 You know, it's really, it's really not very predictable, right?
05:01 Yeah. It's really amazing. And it's really, it's crazy, I think, because once you start thinking about
05:05 spreadsheets, you quickly realize, I mean, there's a couple of crazy things about spreadsheets that
05:08 people don't really acknowledge. We, you know, talk about them internally because we're
05:12 spreadsheet nerds at this point. And, you know, we like, you know, hyping up spreadsheets and stuff,
05:15 but it's like, you know, the original killer applications of computers were spreadsheets,
05:19 right? Oh yeah. And more than that, it's spreadsheets are the most successful programming
05:23 environment in the world. Hundreds of millions of people can program in Excel. The next leading
05:28 programming language has 10, 20 million. You know what I mean? So it's really, there's an order of
05:32 magnitude difference in how, you know, well adopted these things are, but you're totally right.
05:35 The number of foot guns in Excel and the amount of like 50 megabyte insane models that we've seen
05:40 where people are like, it's 75 tabs and they're all linked to each other in some crazy circular way.
05:45 It's really slow. I don't know why.
05:46 Well, I know why.
05:48 Not sure what's happening, but yeah. And we probably have built a couple of those ourselves,
05:51 probably even worse ones. But yeah, so, you know, we've seen that and it's definitely,
05:55 those are some of the problems of spreadsheets that we kind of initially,
05:58 we were like, hmm, maybe these are some things that we can, you know, try to help solve.
06:02 Yeah, absolutely. And Aaron and Nate, as you would go from business school back to the computer science
06:07 side, I guess specifically to the business school, did your business peers look at you like, oh,
06:13 these are the guys that have the power to make the thing happen. They can help us build the thing
06:17 that we can't quite automate or can't quite pull off.
06:20 I think in some ways it was like trying to work in a group in an Excel spreadsheet is a miserable
06:25 experience. I hope you haven't had to do it, but it's like you upload, you have a Google drive
06:29 and then you end up uploading new versions to the Google drive. And then it's usually a Google drive
06:34 paired with a text message group chat. And it's like, I just finished this sheet. Why don't you go up
06:39 and, you know, download it again or something? Yeah, exactly. Make sure you're not doing it at the same
06:44 time I'm doing it. And I don't think Nate and I solved those collaboration
06:50 problems while we were at Penn, but I think it was those experiences, and our thinking of
06:55 programming as a superpower, that made us want to start doing this. I don't know if it was
06:59 always recognized by everybody else though.
07:01 I think the big thing was maybe we weren't the best at attending class. So that was, it was hard to be a
07:05 good group member in the first place, but one day, one day we'll be helpful.
07:08 Yeah. Group work was always hard for me as well. Jake, how about you? How'd you get into this whole
07:12 project here?
07:13 Well, mostly by bloodline. I'm Aaron's twin brother. So sort of like a, some sort of covenant,
07:18 I think comes through that. But no, we, so we, we started working really in this like Excel
07:23 collaboration space. My background, I worked at a software company during college, sort of on the
07:28 project management side of some data science projects. So I had, you know, not quite the business
07:33 side of it yet, but at least a little bit separate from like the coding product side of it.
07:37 Yeah. And yeah, we started working on these collaboration issues and we built a few other
07:41 products before it, had some, you know, modicum of success there, but really took a step back
07:45 at one point. We were looking at like, what are the biggest problems with spreadsheets? It's the speed,
07:50 it's the inability to handle large data sizes, and it's the lack of repeatability.
07:55 It just doesn't allow you to do, you know, repeatable processes in an efficient way. And so the place we
08:01 found that does all this really well is Python. So the idea was to
08:06 stick a spreadsheet interface on top of Python, and that was sort of like a sentence we had written
08:12 down. Really, okay, now we need to backtrack and realize like, what does that mean? How do we do
08:15 that? How do we deal with that? And we opened a few thousand cans of worms doing so.
08:20 Yeah, I'm sure. But it's a super neat idea. You know, there's a lot of things that you can automate
08:25 with Python, but with what you guys built, we'll get to it in a little bit, but what you all built
08:29 lets you interact in this spreadsheet way, and then it writes the Python code. It doesn't
08:35 just sort of allow you to make changes. And then you got to stay in your tool, right? You
08:41 use the tool to write code that otherwise might be a little bit of a stretch for you.
08:45 Yeah, I think, and I can talk more high level, they can talk about exactly why and how we did
08:49 that. But just from the business side, a lot of other tools will try and
08:53 abstract Python away. So they'll allow you to do the types of workflows that you
08:57 would do in Python, but in a GUI, visual environment. We're much more tethered to the Python,
09:03 or try to be more tethered to the Python and the notebook. It's really important to us that you're
09:06 staying in your Python environment, and you're not at a disadvantage at
09:11 all from not having the code. The code is right there. It's being generated real time.
09:14 And that's important for yourself. If you're learning Python, if you're trying to use the code,
09:19 or if it's a communication layer, you want that code because you want to send that to a developer
09:23 who's working in Python as well. So really important for us to be tethered there.
09:27 Right. Yeah. It probably allows you to bring more people into the actual project than
09:32 before.
09:32 Yeah. You're much less siloed by being in the environment.
09:35 Yeah, absolutely.
09:36 That kind of like mentality of when you're building tools for beginners or people that don't know,
09:41 like maybe the professional software, make it really point and click and kind of like hide a lot
09:46 of the complexity. I think that's something that we've experienced with tools that we've used.
09:50 For example, like we use Stripe and Stripe creates a bunch of dashboards for you. But the problem is
09:55 we have no idea what those dashboards, like what, what is the nitty gritty details of how those numbers
10:00 are calculated. And so we have all these metrics and we have really poor understanding of like,
10:04 what is this actually telling us? And so I think something interesting that we definitely try to do
10:09 is we give you, you know, people that are maybe less familiar with writing the syntax yourself,
10:14 the ability to, as you said, point and click and use the spreadsheet environment and then generate the
10:18 code. And then if you ever have questions about, oh, like what did this, what is this pivot table
10:22 that I created? What does it really mean? Then you can look at the generated code and see exactly
10:27 what's going on. And I think that kind of like understanding where users need help and where
10:31 users want the as professional as possible, you know, nitty gritty details is a stratification that
10:37 we've thought about and I think have a somewhat unique approach to when it comes to these like no code,
10:42 low code tools.
10:43 Yeah, absolutely. So I kind of want to set the stage by talking about some of the different
10:49 things people are doing with notebooks because notebooks have really taken over in the data
10:54 science space for a good reason. I think, you know, we had IPython notebooks and we had Jupyter and we
10:59 had JupyterLab, which is doing a little bit more than just Jupyter and people really love them,
11:05 I think. And while JupyterLab is great, I think there's even, there's a bunch of creative things
11:12 going on, trying to extend it and use it in different ways. And I feel like Mito falls in there. So I
11:17 wanted to throw out a couple and just see if you all have heard of these and if so, get your thoughts
11:20 on them. One of them is this thing called JUT, J-U-T, J-U-T, maybe something like that. And what it allows you
11:29 to do is it allows you to actually view notebooks in the browser. So have you guys, or not the
11:35 browser, in the terminal, have you guys seen this?
11:36 I haven't seen it, but I do have a lot of sympathy for the unknown pronunciation of JUT or JUT because
11:42 we get Mito and Mito a lot. So my heart goes out to the JUT for sure.
11:46 Yeah, I'm sure, I'm sure I'm messing it up, but yeah. Yeah. So here's a way to say like, well,
11:51 these notebooks are so popular. Let's see if we can show them in the terminal. Like if I'm SSH'd into a
11:57 remote machine and it has, you know, IPYNB files and I just want to see them, how do I look at them?
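For context on what a terminal viewer has to work with: an .ipynb file is plain JSON. The following is a minimal sketch of printing one as text using only the standard library. It is not how JUT itself is implemented, and the notebook contents and the `render_notebook` helper are made up for illustration.

```python
import json

# A tiny notebook in the nbformat v4 JSON layout, built in memory so the
# example needs no file on disk. The cell contents are made up.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sales report"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["total = 2 + 3\n", "print(total)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["5\n"]}
            ],
        },
    ],
})


def render_notebook(raw: str) -> str:
    """Return a plain-text view of a notebook's cells and stream outputs."""
    nb = json.loads(raw)
    lines = []
    for cell in nb["cells"]:
        # "source" may be a single string or a list of strings.
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            lines.append(f"In [{cell.get('execution_count')}]:")
            lines.extend("    " + line for line in text.splitlines())
            for out in cell.get("outputs", []):
                # Only stream outputs (stdout/stderr) are shown in this sketch.
                if out.get("output_type") == "stream":
                    out_text = out["text"]
                    if not isinstance(out_text, str):
                        out_text = "".join(out_text)
                    lines.extend("    > " + l for l in out_text.splitlines())
        else:
            lines.append(text)
        lines.append("")
    return "\n".join(lines)


rendered = render_notebook(notebook_json)
print(rendered)
```

A real viewer would also handle rich outputs such as images and HTML tables, which this sketch ignores.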
12:03 And so you just say JUT, I'm going to go with JUT. You just say JUT and then the file name and then
12:08 boom, it shows it right there using Rich, I think. Super cool. Yeah, I think that this is, and I'm sure
12:13 I'll expand on this comment as we see more of these, but I think it really demonstrates something
12:18 that we've really observed, which is that notebooks are not like, they're just like Excel in many ways.
12:22 They're not a tool that's used by one person for one specific thing, right? So we use a tool called
12:27 Mixpanel, for example. It helps us track metrics. It's a product analytics
12:32 tool. Yeah. I've used it before. Yeah. It gives you insane amount of analytics of like, where did
12:36 these people come from? How did they find my product and stuff like that, right? Exactly. And you can
12:40 kind of imagine though, it's really in some ways, it's just for product analytics, right? In some ways.
12:44 And the really interesting thing about a lot of our users is that we, you know, one thing as we're
12:49 trying to learn about our users and work with them to improve the tool, one thing we really realized is
12:52 there's people from all over the place and they're interested in notebooks for 472 different
12:57 reasons, right? And so some people using notebooks are people who've never written any Python code
13:01 before in their life. Some people have two weeks of experience and some people are 75 year old
13:06 developers who only use the terminal. And if you show them anything else, they'll try and fight you,
13:09 right? So it's, you know, it's really, I think this demonstrates that there's really an appetite for a
13:13 wide range of kind of ways of consuming these things and presenting these things and editing these things
13:18 on these notebooks specifically. Yeah. That's a great, yeah. Yeah. Great analysis.
13:22 I totally agree. That's another cool thing about a tool like this or tools like this more generally
13:26 is that I think what they sort of have to do from a product perspective is condense down. What are the
13:30 really powerful things about notebooks? Cause they're taking a notebook and bringing it out of a notebook
13:34 environment. And I think kind of what we're trying to do with a spreadsheet is like, what are the values
13:38 of a spreadsheet? How can we bring a spreadsheet into other environments? So I think a tool like JUT
13:42 is just trying to do that. I think it's a, it's an interesting way to think about product. It's like,
13:47 sort of like, what is the essence? What is it? What are the essential parts of a notebook? What
13:51 are the essential parts of a spreadsheet? How do we translate those and bring that value to other
13:55 environments? So I always, I'm interested in tools like that. Yeah, for sure. This is an interesting
13:58 one. Another one is they just came out with JupyterLab desktop version. And I suspect that you guys
14:06 could even integrate with the JupyterLab desktop app, right? I hope so. Most likely. Yes. The answer is
14:12 probably yes. But, you know, I'll be honest with you, like candidly, if I spend one more minute
14:17 on installation problems, I might, you know, chop my arms off or something. I don't know,
14:20 but it's, it's really, I mean, I'm sure I don't have to preach to you. I'm sure you've heard this a
14:24 million times, but the, the Python installation ecosystem environment issues is a, it's a massive
14:30 blocker. I knew it was a massive blocker from like an individual level. Oh, I bet if you're trying to
14:35 reach people who are going through it, like I would just want to use Excel and like have a little bit of
14:39 code. Yeah. Conda, virtual environments, pip, versions of Python. Yeah. You probably get a
14:46 couple of questions about that every now and then. Yeah. Yeah, exactly. I think that this is a really
14:54 cool example of, I guess, you know, the Jupyter devs realizing that distribution is one of the primary
14:59 problems here. And certainly with us, the primary bottleneck in users trying our product
15:04 is not that they can't figure out how to use it. It's that they can't even get the thing installed in the
15:07 first place. Yeah. You know, making that as easy as possible. I'm, you know, I'm sure there's still
15:11 work to do here, but, you know, really hats off to them and definitely something that we're
15:14 interested in and working with them in the future. If we don't already, there's probably some hack to do
15:18 it currently, but I have not, I've not done anything in earnest with JupyterLab Desktop,
15:23 but I have installed it and run it. And I think it comes pre-assembled with Python and Conda.
15:28 You don't have to have, it basically comes all set and then it just hosts JupyterLab
15:33 locally inside of an Electron app. So it might even be better for you guys.
15:38 Yeah. I'm not totally sure, but if you can make it work at all, I bet it's better.
15:41 I think what Nate is talking about is it's a trade-off that any tool or product building
15:45 as an extension to Python is going to face, especially if it's one that's trying to make
15:49 parts of data science more accessible to a newer audience. If you're building it in a
15:53 Jupyter environment, there's a lot of freebies you get. There's a lot of great nuggets,
15:56 valuable things to the user you get, but, you know, installation can be such a nightmare that you might
16:01 be casting away a certain part of the top of the funnel just by doing it.
16:05 One of the benefits, I guess, that you all will receive is people can go Google for help setting
16:11 up notebooks and getting the notebook started and all that. And you don't have to be part of that,
16:14 right? Like there's a whole ecosystem of, of people running notebooks, people writing articles
16:19 about using notebooks for beginners. And so you can just sort of level up on top of that and say,
16:24 once you go through all what they show you over here, here's how you go. Right. I mean,
16:28 obviously you want to help people succeed because if they can't get Jupyter going,
16:31 they can't use Mito. Yeah. I was creating documentation for getting set up with Mito and
16:35 I went to the JupyterLab documentation and, for things like creating a new sheet,
16:39 I just grabbed the JupyterLab YouTube video and put it in our documentation for free.
16:44 Well, that's one of the things I like about your documentation and your site is it's sprinkled
16:49 with screencasts, like little examples of how to do stuff or how to demonstrate stuff. And
16:54 I think more places should do that, right? There's so many places or so many projects that I just
17:00 don't understand, though. It'll be a UI framework and there won't be a single picture of anything.
17:05 Like, what? It's about pictures. The whole purpose is pictures. Give us a picture, at least like a
17:10 gallery or something. And similarly with how do you use things. And just, yeah, you know, hats off to you
17:16 guys for putting those in there. Cause I think it makes a big difference.
17:19 Yeah. One of the reasons I think we've done that and had some, some good videos is just in terms of
17:23 just growing the tool, we've partnered with a lot of people in the YouTube data science community,
17:27 and they're all really good at making demos. I think we've learned a lot from them. People like
17:31 the Data Professor and Chris Nick are two we work with a lot, and I think at least
17:37 I myself learned a lot from them about how to present a tool in a video. And that's obviously really valuable
17:42 for going to teach.
17:43 Yeah. I prefer to just fire up a two or three minute video and watch it instead of reading
17:47 through and see what I really got to pay attention to.
17:51 This portion of Talk Python to Me is brought to you by Shortcut, formerly known as Clubhouse.io.
17:56 Happy with your project management tool? Most tools are either too simple for a growing engineering team
18:01 to manage everything or way too complex for anyone to want to use them without constant prodding.
18:06 Shortcut is different though, because it's worse. No, wait, no, I mean, it's better.
18:10 Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible,
18:16 powerful, and many other nice positive adjectives. Key features include team-based workflows.
18:22 Individual teams can use default workflows or customize them to match the way they work.
18:27 Org-wide goals and roadmaps. The work in these workflows is automatically tied into larger company
18:32 goals. It takes one click to move from a roadmap to a team's work to individual updates and back.
18:38 Tight version control integration. Whether you use GitHub, GitLab, or Bitbucket,
18:43 Shortcut ties directly into them so you can update progress from the command line.
18:47 Keyboard-friendly interface. The rest of Shortcut is just as friendly as their power bar,
18:52 allowing you to do virtually anything without touching your mouse. Throw that thing in the trash.
18:57 Iteration planning. Set weekly priorities and let Shortcut run the schedule for you with
19:03 accompanying burndown charts and other reporting. Give it a try over at
19:08 talkpython.fm/shortcut. Again, that's talkpython.fm/shortcut. Choose Shortcut because you shouldn't
19:16 have to project manage your project management.
19:18 JupyterLab Desktop is one of these sort of interesting things. Another one is JupyterLite,
19:25 which is Jupyter written in WebAssembly that runs just in the front end of the browser.
19:29 You guys check this thing out?
19:31 No, but this sounds amazing. I don't know if I saw the WebAssembly version, but I saw a Python
19:35 compiled to JS version of this, etc. But I mean, it would be very cool if everything could happen
19:41 in the browser because I feel like installation problems would evaporate, which is the goal.
19:46 It's the goal at the end of the day. So this is super interesting. We'll definitely check this out.
19:49 Yeah, this might also be relevant to you guys because it's basically Jupyter running and then they run
19:55 in-browser WebAssembly language kernels. So a limited set of CPython in WebAssembly in the browser and
20:03 probably also Julia and R and all those things.
20:04 Maybe for a partially non-technical audience like myself, what is the benefit here?
20:09 I'll take a shot. Yeah. So how I would describe this is that normally the way JupyterLab works is
20:14 that it's a client server model, right? So your client is effectively the thing that you as a user
20:19 interact with. You have a notebook that you see, you write code in the notebook, you press shift enter
20:23 to run a notebook cell. What actually happens when you press shift enter is that code gets sent to
20:28 the server. The server in this case is a Python kernel and the Python kernel actually runs that
20:33 code and then sends you back the results. So there's what you see on your web browser and then
20:37 there's a server running usually on your command line or something and you send code to it. It executes
20:42 the code and sends you the results. The benefit of this thing is that you can get rid of that backend.
20:45 So there's no more installation and instead you only need this kind of notebook that lives on the front
20:50 end. WebAssembly is a tool that allows us to run the code on the front end. And so instead of taking
20:56 code, sending it to the backend and getting it back, we can do that all in one location.
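The client and server split described here can be pictured with a toy model. This is only a sketch: `ToyKernel` and `ToyFrontEnd` are hypothetical names, and real Jupyter sends messages to a separate kernel process rather than calling a method in the same program.

```python
import contextlib
import io


class ToyKernel:
    """Stand-in for a language kernel: runs code and remembers variables."""

    def __init__(self):
        self.namespace = {}  # state lives in the kernel, not in the front end

    def execute(self, code: str) -> str:
        # Capture anything the code prints so it can be sent back as a result.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


class ToyFrontEnd:
    """Stand-in for the notebook UI: it only sends code and shows results."""

    def __init__(self, kernel: ToyKernel):
        self.kernel = kernel

    def run_cell(self, code: str) -> str:
        # In real Jupyter this hop is a message to a separate kernel process.
        return self.kernel.execute(code)


notebook = ToyFrontEnd(ToyKernel())
notebook.run_cell("rows = [3, 4, 5]")           # first "cell" defines data
result = notebook.run_cell("print(sum(rows))")  # second "cell" uses it
print(result)
```

The in-browser approach keeps the same two roles but puts the kernel next to the front end, so there is no separate backend to install.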
20:59 Right. And the challenges you've already been talking about are the challenge of setting up
21:02 that backend to get Python to run on your machine with the environments and the dependencies. And this
21:07 would theoretically in principle, at least avoid all that. It's just loaded up and it goes.
21:11 Yeah. It's super cool. No, we'll definitely, we'll definitely look into it because this is a,
21:14 I mean, this is an area of continual research for us is how to improve our installation process.
21:18 Yeah. Very cool. All right. What else have I pulled up here? Papermill. So Papermill lets
21:22 you treat notebooks almost like functions: you can execute a notebook and then get a value out
21:28 for all sorts of things. This is probably the least relevant to you guys, of the things that I pulled up
21:33 here, because it's exactly about making it not interactive and, you know, kind of skipping that,
21:38 but just another interesting thing that people are doing to kind of add more or do more with these
21:43 notebooks. Yeah. It seems super cool. Now I was just going to say, I've seen people,
21:48 we've seen people who have notebooks where they like call functions in other notebooks and all of that
21:52 syntax, you know, the pandas syntax is confusing, but then like linking notebooks and all of that is
21:57 confusing as well. So I think it's all interesting. It's all interesting to see, you know, I think just
22:01 like spreadsheets have so many different use cases and Nate touched on this earlier. Notebooks have
22:05 an incredibly large number of different use cases with people from like top level data scientists to
22:10 people just getting started. So the tooling ecosystem that's being created, and that already
22:15 exists, is really quite diverse and really powerful. Absolutely. All right. I think I've got one more
22:20 for you here. This might be the one that you heard of, Nate, called notebook.js. No, this one renders,
22:26 parses them and then renders it down to HTML, which is, I guess I don't have one to open, but that's
22:34 all right. So yeah, that's, that's pretty neat that it'll basically allow you to turn notebooks into
22:38 HTML. And I think just to kind of draw a categorization model over what we've kind of seen
22:43 before, I think there's, there's a couple areas of focus and these are things that we focus on
22:47 internally. And I think they're themes that will come up throughout the rest of the conversation.
22:50 So there's things that we'll call, let's say presentation, right? And presentation is this
22:55 looking at the outputs here in the terminal or looking at the outputs easily on a webpage without
22:59 running a server, right? And presentation is something that we encounter. How do we present data?
23:03 how do we present conclusions from analysis, et cetera. Then there's another thing, which I think
23:07 this Papermill thing that we looked at is super relevant to, which is this idea of repeatability,
23:13 right? As anyone who's worked in notebooks knows, they're really great areas for being a scratch pad.
23:18 It can get pretty confusing sometimes when you've got like all these cells flying around,
23:22 you're executing things out of order.
23:23 That's the thing, you know, we have sort of solved the, the Excel, this cell refers to that cell.
23:29 And then, but you have this human aspect of notebooks, right? I can go, I want to try this,
23:33 try this, try this, go back and change this and then run this and the stuff below it, you know,
23:38 you can run them in different orders and even change them and then have some of the invisible
23:42 changes stuck in another unrun cell. Yeah. That's yeah. That can be a problem.
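The out-of-order problem can be reproduced without a notebook at all. In this minimal sketch, three made-up "cells" share one namespace, the way notebook cells share a kernel, and the final answer depends on which cells ran and how many times.

```python
# Three made-up notebook cells, keyed by their position on the page.
cells = {
    1: "price = 100",
    2: "price = price * 2",
    3: "report = f'price is {price}'",
}


def run_cells(order):
    """Run the given cells in a fresh shared namespace, like one kernel session."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["report"]


in_order = run_cells([1, 2, 3])      # top to bottom, as written
skipped = run_cells([1, 3])          # cell 2 was never run
rerun = run_cells([1, 2, 2, 3])      # cell 2 was run twice by accident
print(in_order, skipped, rerun, sep="\n")
```

All three runs show the same code on screen, but only the first produces the result a reader would expect from reading top to bottom.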
23:47 I mean, Aaron and I spend all day in notebooks and the amount of times where I'll go up to Aaron and be
23:51 like, Aaron, dude, look at this, this code makes no sense. What is this bug? I don't understand it.
23:54 And Aaron's like, you fool. You ran one, three, four, not one, two, three. What were you thinking?
23:58 You know, like just the amount of times that happens, even to us as people who use notebooks all day,
24:01 is really, you know, it's dramatic. And so I think that, you know, that area of
24:06 reproducibility and repeatability is something that we spend a lot of time thinking about mostly
24:10 because a lot of our users are really interested in it and these are problems that they struggle with.
24:13 And it's something I think we'll definitely get into, you know, as we kind of get into
24:17 what we actually ended up building. Yeah. Awesome. So speaking of getting into it, let's
24:20 talk about Mito. So I think Mito lives in this realm of these interesting ways to do more
24:27 with notebooks. And as we've already hinted at, you basically take an Excel user interface,
24:32 stick it into a notebook, and then allow people to interact with the data within the notebook
24:37 in an Excel-style way. Right. But what's your elevator pitch?
24:42 Oh, I can do elevator pitch. Elevator pitch. Essentially, it's a spreadsheet interface for Python.
24:47 So everything you do in the spreadsheet is going to generate the equivalent code for you in the code cell
24:51 below. So if you're watching the video right now, you're seeing a little demo happen, but we have
24:55 features for exploratory data analysis. We have features for data wrangling, data munging. I've
25:00 heard lots of different words for that type of process. So yeah, there's that, and then graphing as well.
25:06 And also we have the ability to save and replay analysis sort of like a macro. And in terms of
25:11 users, you know, we have people who are newer to Python using it to sort of introduce themselves to
25:15 the Python world. They're learning as they go. But the nice thing is that they're not held back by the
25:19 syntax at any point. So you're not Googling syntax, not going to Stack Overflow. You're in your
25:23 notebook the entire time, sort of staying in a state of flow. And then we also have more advanced Python
25:28 users and people intermediate in the middle using it just to get their analysis done more quickly.
25:32 It's really fast to do something usually, especially like the code for graphing or pivot tables or
25:37 merging, which takes a lot of time to type out and get the syntax right. So in our tool, you just do it
25:41 in an Excel interface that you're used to and it spits out the correct code for you.
25:45 Yeah, it's cool. People definitely need to see the little video. So if you haven't checked it out,
25:49 you know, obviously I'll put a couple of videos in the show notes, but yeah, that's the idea is you
25:54 come in and you just basically inside the notebook as part of the cell, you know, you might be familiar
26:00 with having an interactive widget for like some kind of graph, right? Where it's got like some
26:04 sliders and stuff. It's like that, but it's Excel ish. Right? Yeah, exactly. Was this your first idea or
26:10 where did this whole idea start? We've worked on a lot of different like spreadsheet related tools
26:15 over, I guess, a year and a half ago or two years, a long time ago at this point, we started building
26:22 like a GitHub for Excel. We're building essentially like, you know, difference detection,
26:27 allowing you to merge, going back to those original problems that we talked about that we experienced
26:32 at school. So we were building kind of this GitHub for Excel platform. We were primarily talking to
26:38 investment bankers and people in private equity and these like really Excel power users. We built the
26:44 tool. We've interacted with a lot of them, eventually realized that wasn't maybe the most
26:48 helpful space that we could be working in and ended up finding our way to this where it was more
26:54 Python based Jupyter notebooks and just making them more accessible. And as Jake said, you know,
26:58 helping people across the spectrum from beginners to more advanced data analysts, get their analysis
27:04 done faster. Sure. I think there's two angles here. One is just, you could be really fast, right?
27:08 You could say, okay, I want to do, I see the data. I want to sort by this. Now I want to drop these
27:13 three columns and then I want to, you know, do a computed column sort of thing,
27:18 or maybe join two pieces of data, like two data frames and create like a larger one out of that
27:25 and so on. You could be really quick with that. But I think maybe even more important than that is
27:30 helping people who are just stepping into the data science side of the world, right? They've,
27:35 they've been working in some other tools or no tools whatsoever, really. And then they just,
27:39 they hear, oh, I should go do Python and I should do notebooks. And then they're confronted
27:43 with pandas, which pandas is great and it's not that hard to use, but there's a zillion things
27:48 you can do with pandas and it's not super discoverable. What it is you should choose.
27:53 And also the notebooks don't really help as much as they could, I think, in presenting
27:57 the features of an API, right? Like if I'm working in PyCharm or VS Code and I say dot,
28:04 like, boom, there's a bunch of descriptions. And if I get the mouse near it, it'll give me like
28:07 examples and documentation. And here it just lets you type unless you hit, you know, dot tab,
28:12 and you explicitly asked for it. And a lot of people don't, I suspect if they're coming from
28:16 economics, they don't know that, oh, like tab has this magic to show me what I can do and stuff like
28:21 that. Right.
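[Editor's sketch] The quick operations listed in this turn (sort, drop columns, add a computed column, join two data frames) map onto a handful of pandas calls. The data and column names below are hypothetical, invented only for illustration; this is not output from Mito.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
sales = pd.DataFrame({
    "zip": ["97201", "10001", "97035"],
    "units": [3, 5, 2],
    "unit_price": [10.0, 4.0, 7.5],
    "notes": ["a", "b", "c"],
})
zip_to_state = pd.DataFrame({
    "zip": ["97201", "97035", "10001"],
    "state": ["OR", "OR", "NY"],
})

# Sort by a column, largest first.
sales = sales.sort_values(by="units", ascending=False)

# Drop a column that is not needed.
sales = sales.drop(columns=["notes"])

# Add a computed column.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Join two data frames into a larger one (left join keeps every sales row).
combined = sales.merge(zip_to_state, on="zip", how="left")

print(combined)
```

None of these calls is hard on its own; the discoverability problem raised in this turn is knowing that `sort_values`, `drop`, and `merge` are the names to look for.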
28:21 Those people making the transition to the notebook environment are doing so at a, along a spectrum
28:27 of willingness as well. Some people are doing it because they're really excited about, you know,
28:31 up-leveling their skills, learning Python, making their analysis faster. And then some people are doing
28:37 it because, you know, upper level management wants me to do that. And I think those people are,
28:42 really grateful for having a tool that will help them, you know, meet the workflow requirements that
28:46 they're working in, but not necessarily having to, you know, I'm sure you've read at some point the
28:52 Pandas documentation. And it's, it's one of those that it has lots of great examples in fairness,
28:56 but no pictures. Yeah. It's not the most intuitive.
28:59 Probably not a lot of tutorial videos in there.
29:01 Yeah. So, I would add a sort of a third category to what Aaron was saying, which is like, it's, it's people not
29:07 forced by management, but it's not sort of out of their own interest in Python. It's more so they're just sort of
29:12 forced to, based on what they're trying to accomplish. If they have a certain data set of a certain size, you simply can't do it
29:18 in Excel or Google Sheets. So they're like, I need a medium that's going to allow me to facilitate data analysis on this scale.
29:23 And it'd be really frustrating if you're thrown into pandas to do that and you have to spend all your time.
29:28 So it's kind of doing simple things that you're so used to doing in Excel, like adding columns, doing pivot tables,
29:32 whatever it is, using formulas. And now you're spending a new set of your time going through syntax and making sure that you're
29:37 typing out and capitalizing the right letters in a line of code. That can be really frustrating. So our tool's really
29:42 trying to allow you to stay focused on the actual analysis the entire time and less about getting your code.
29:48 And many people are Excel literate. Right. And so it kind of leverage that a little bit.
29:52 Yeah. And then for, you know, people who aren't even coming from Excel, but are just data scientists, there's a word. It's not Excel
29:58 literate, but they're sort of like visual literate. That's a horrible phrase, but what it means essentially is that they
30:03 understand how to do this. They understand, okay, these are the things I need to do. Here are the buttons to do it. Here are
30:07 the options I could use. So it's still a really valuable environment for everyone, Excel or Python.
30:12 Nice. So, a question from the audience out there. Spawn says, hello, what are the limitations of Mito?
30:18 That's a great question. I think the, the honest answer is that Mito is certainly a work in progress and
30:23 there's a large portion of pandas functionality that we don't currently support. So kind of, I think a
30:29 helpful context here is how we're actually developing this tool. So Mito is very much, I would say a
30:34 collaboration with our users. And what that practically means is that join our discord,
30:37 there's a feature request channel and anyone has the ability to show up and essentially say, Hey,
30:41 this is missing from my workflow. And we'll say, Oh, that's on a roadmap or, Oh, that wasn't on a
30:46 roadmap. We'll add that or help us engage with your workflow and understand where that's coming from.
30:49 But essentially what's happening is over the past six to eight to nine months or so, we've been working
30:54 really heavily with our users to kind of build out the core pieces of pandas functionality that most of
30:58 our users work with one by one and kind of investing their workflows back into the tool.
31:03 Yeah. Which way do you take it? Do you go and say, all right, these are the things people need
31:07 to do in Pandas. How do we surface that in our interface? Or do you say, these are the Excel type
31:12 things people are happy with? How do I make that happen? And it's like, which direction do you find
31:17 yourself going?
31:18 It's a really great question. And Aaron, you can definitely speak to this more because this is
31:21 kind of part of our process internally that's been evolving, but really kind of what we try and do
31:26 at this point is we try and, we try and work very heavily with our users to understand, you know,
31:30 at the highest of levels, what are they trying to accomplish? Right? Someone really isn't usually
31:34 trying to make a pivot table. Usually what they're trying to do is conclude, should I tell a salesperson
31:40 to do X or Y?
31:40 What state are we spending, getting the most sales in? And all I have is zip codes or something
31:45 like that.
31:45 Exactly. Exactly. Right. There's really a lot of people that we work with and this isn't the only
31:49 thing, but one of the things, for example, is they're operating at the level of, I'm looking to
31:52 predict this feature or understand what affects this, this piece of my data. And so what we do is we kind
31:57 of work with them to understand their workflow. And then we use that to internally figure out,
32:01 you know, what features drive this, what does Pandas support and how can we provide an interface
32:06 on that that really lets people, you know, get this done as quickly as possible in a way that gives
32:09 them as much flexibility at the end of the day, if they need to take this from Mito and
32:13 and go run with the code somewhere else.
32:14 Yeah.
32:14 Yeah.
32:14 And Aaron, go for it.
32:17 I was going to say, I feel like in the beginning, kind of when they were saying, we were very focused
32:22 on let's put Excel into Python. And it was like more about the Excel functionality. But I think over
32:27 time we've come to think more about the questions more, what is the best visual interface for Python,
32:33 for data science? And it's less so. And so some of those things are from Excel. Some of those things are
32:37 our own creations. It's still a spreadsheet interface, but the goal is certainly not to take
32:43 Microsoft Excel and give you all of that in Python.
32:45 Interesting. Yeah. Cool. All right. I think maybe a good way to understand this, how this workflow
32:51 works and get a peek inside of the features is over here somewhere, I think in the documentation side of
32:59 things you've got, yeah, right at the beginning here, quick tutorial here, you've got this sort of,
33:04 I don't know, I'm not bouncing around, I'm not finding it quite where it was to tell you, but
33:08 there's this example where you go and load up a couple of CSV files and then like join them to
33:13 create a pivot table and stuff on that, like that. Maybe give us a talk through what working with that
33:18 data flow kind of looks like. You know, I talked about this idea of like turning zip codes into states
33:23 and that kind of stuff. What does work with Mito feel like? Give us a sense.
33:27 I can talk a little about that. So I think working with Mito feels hopefully very fast and very
33:32 intuitive are, I think, two of the things, and maybe robust, are maybe like the three kind of things we
33:37 strive for. I think in terms of workflows that we see people doing, a lot of people kind of fit this
33:43 like feature creator and automator use case that we think about. So it's people that have some business
33:50 sort of question like where are all my sales coming from when I just have these zip codes.
33:54 And they're probably trying to do that in Python because they've been doing it in Excel over and
34:00 over again each month and they're looking for a more robust and more automatic way of doing that.
34:05 So I think a lot of, a lot of how, a lot of the ways that we see these workflows play out is people
34:11 either have a CSV file or they have like a Snowflake connection to get some data. And then they start by
34:18 doing some simple EDA and trying to get a better sense of what their data actually looks like.
34:23 Right. Without Mito, it might be something like df.head or df.sample and just like kind of get a
34:29 visual look of a grid of data, right?
34:30 Exactly. Or even some people are, they want the more visual interface. So they're downloading it to
34:36 Excel initially and doing some manipulations or something like that there. But then that workflow is
34:42 very separate from their more automatic script that they're creating. And then they have to kind of
34:47 reconcile any changes that they've made back into the Python workflow. But yeah, so Mito has a bunch of
34:53 features. There's some here, you know, if you scroll down in the documentation on this left-hand side,
34:57 you might see some stuff that might be helpful. So there's things like summary stats right there,
35:02 which will show you like a graph, a distribution of the data in each column, a lot of those .describe()
35:08 functions, and it has really intuitive filtering. So things like filter by value. So you
35:12 can see all the, all the unique values in your column, and then you can toggle them in and out of your data
35:18 set, or you can add more, you know, customized filters, whichever you would like. But then once you
35:23 move past this kind of like initial data cleaning, some people do write spreadsheet formulas. So we
35:28 have a bunch of Excel's most popular formulas, things like date manipulation and date parsing.
35:34 Here's a bunch of, here's a few of them. Once you kind of get a sense of what your data looks like,
35:39 you can do some of these transformations. And then ultimately you'll end up with this script that you can
35:44 use to run over and over again and never have to go back into the Excel world and fight through the
35:49 manual process again. This portion of Talk Python to me is sponsored by Linode. Cut your cloud bills in
35:57 half with Linode's Linux virtual machines. Develop, deploy, and scale your modern applications faster
36:03 and easier. Whether you're developing a personal project or managing larger workloads, you deserve
36:08 simple, affordable, and accessible cloud computing solutions. Get started on Linode today with $100 in
36:14 free credit for listeners of Talk Python. You can find all the details over at talkpython.fm/linode.
36:19 Linode has data centers around the world with the same simple and consistent pricing,
36:26 regardless of location. Choose the data center that's nearest to you. You also receive 24/7/365
36:34 human support with no tiers or handoffs, regardless of your plan size. Imagine that real human support
36:39 for everyone. You can choose shared or dedicated compute instances, or you can use your $100 in credit on
36:46 S3 compatible object storage, managed Kubernetes clusters, and more. If it runs on Linux, it runs
36:52 on Linode. Visit talkpython.fm and click the create free account button to get started. You can also
36:58 find the link right in your podcast player show notes. Thank you to Linode for supporting Talk Python.
37:04 "The DataFrame" to get data into this spreadsheet-like front-end that is Mito.
37:10 You basically just have to have a data frame. If you have a data frame, there's a ton of flexibility for
37:16 doing that. You could load a file, you could get it from the internet, you could do even read HTML off of
37:24 a URL and then go grab a table and then there's your data frame, work with that. There are simplifications,
37:31 things like when you're in Mido, you can hit file load equivalent and browse to the files and then it'll
37:37 write the pandas code, say like pd.readcsv, whatever you selected, right?
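[Editor's sketch] Loading a file into a data frame and taking a first look at it could be sketched as follows. An in-memory string stands in for a CSV file on disk, and the data is hypothetical. `pd.read_html` is the pandas function for pulling tables off a web page, but it needs an HTML parser installed and a reachable URL, so it is not run here.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the contents are invented for illustration.
csv_text = "zip,units\n97201,3\n10001,5\n97035,2\n"

# Reading zip codes as strings keeps any leading zeros intact.
df = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})

print(df.head())      # first few rows, the "grid of data" look
print(df.describe())  # summary statistics for the numeric columns
```

With a real file, the first argument would simply be the path selected in the file browser.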
37:42 Yeah, exactly. One of the things I think would be really interesting to talk with you about is
37:47 generally in your survey of the Python ecosystem, it's interesting because Python is code, right?
37:53 And so in some ways we're on this boundary, this flexible border between pure code and low-code,
38:01 no-code tool. And there's been a hundred million low-code, no-code tools that have existed over the
38:05 past 25, 30 years, whatever. And some of them are around and some of them aren't. And really the
38:09 question for us that we kind of ask ourselves is, what unique thing do we bring to the table here? And as a
38:14 low-code tool, how do we differentiate ourselves? And it's that exact idea that you can also just pass a
38:19 data frame. We're not necessarily interested in, we don't want to stop you from writing Python code,
38:23 we want to enable you to write Python code as easily as you possibly can. And really that's
38:26 how we see ourselves, you know, spanning that spectrum.
38:29 I think it's a really good place to be because my hesitation with all these low-code, no-code tools
38:35 is they usually, one, lock you into their thing, which is often a SaaS thing, right? So you're locked
38:42 into having your data there and continuing to subscribe to a thing, which don't get me started
38:47 on subscribing to so many things. I was suggested that I subscribe to my internet speed checker,
38:53 app, not pay for it.
38:55 You need to know, but like...
38:56 Once an hour, you need to check, right?
38:57 Yeah, yearly, I can subscribe to it. If I pay yearly for my speed checker, like,
39:02 what are you doing? Too many things that are subscribed to. Anyway, not that subscriptions
39:07 don't make sense for a lot of tools. But what I like about this is, like you said, you bring the
39:12 data from wherever and you could do this out of a Postgres database and you build up a data frame,
39:17 and then you could throw into this visual place that speeds you up. And then what comes out the
39:21 other end are more data frames, right? So you could even do multiple, I mean, maybe tell me this is
39:26 true. It seems like you could. You could do some regular Python code, some Mito that generates a
39:32 really interesting transformation, some more Python code, and then maybe another Mito block that takes
39:37 another bit of output and then brings in more data, does other stuff, right? You can kind of mix and match
39:41 throughout these, right? Throughout the notebook? Yeah, exactly. And actually, the other day, we
39:45 went into GitHub and we searched mitosheet to see how people are using it. And you see people using
39:50 it in that exact way where they're importing data, they then generate some code using Mito. And then
39:55 they'll, you know, one of the things, admittedly, that the tool is maybe not the best at right now
39:59 is graphing. We have, you know, we support creating basic graphs, but not changing the colors,
40:05 changing the titles, all of that. What UI frameworks do you support? Matplotlib? Plotly?
40:10 We use Plotly. We use Plotly. Okay. Yeah. We generate Plotly code, which has great documentation. It's really, you know, you can go in there, we give you a link,
40:18 and you can go in and make your edits. But this is one of those places where generating Python code,
40:22 and you know, in this case, Plotly code is super helpful for everybody involved. For us,
40:27 it's helpful because, you know, we didn't have to recreate the Plotly library, which is massive.
40:31 But for everybody else, it's also helpful because, you know, because we didn't generate the Plotly,
40:36 we didn't recreate the Plotly library, you are able to use, you know, the entire Plotly ecosystem,
40:41 and you're not locked in at all. So for something like Alteryx, you know, where you can use their
40:47 graphing features, and if, you know, they don't have the graphs that you want to create, then you're kind
40:51 of out of luck because you're locked in, and there's not really an easy customizability path.
40:56 But you own your Python code that Mito generates. So it's up to you to do whatever you want.
41:00 Yeah, I suspect you guys don't recommend this. But you technically could delete out,
41:05 in a lot of cases, the Mito bits after you generate your Python code and keep going like,
41:10 okay, this was really helpful, but we actually don't need this anymore. There are cases where some of the
41:15 Excel-like functions really come from you guys, right? But a lot of it is what it's writing is pandas,
41:20 and NumPy and Plotly code, right? Just speaking to the lock-in or not the lock-in side of the story,
41:27 right? Yeah, I think we definitely support that. In Mito, actually, we have a clear button.
41:32 Data analysis is all about, it's very iterative, building your understanding of what is useful and
41:38 where you want to go. So one of the, actually, the most commonly done things in the tool is actually
41:43 clearing all of the edits that you've made to your analysis. And getting rid of Mito now that you have
41:47 that understanding and wanting to take it in another direction, I think we're huge supporters,
41:51 well, we're big supporters in our own development process of cleaning your workspace. Nate is proud
41:56 of how clean our code is. And, you know, if Mito isn't helpful for you right now, definitely get rid
42:01 of it so you have an easy notebook to clean up and debug. Yeah, I think that's a big contrast going to
42:07 some other no-code SaaS system. There is no, I don't want to use this exactly anymore, just let me carry
42:13 on with my analysis. There's none of that in most of these tools. And with yours, I think it's there to
42:18 support you and I can see it being incredibly valuable. But at the same time, it's not the
42:22 essence of what you're doing. It's the UI on top of Python, as Jake likes to say. Yeah, I was just going to
42:27 say to the point of deleting out the Mito sheets, we have a society of secret Mito users who are trying to
42:33 convince their bosses that they're really good at Python to get pay raises. So we support that workflow very,
42:39 very well. Exactly. So one of the challenges that you can have when you have a machine write code
42:44 is that it writes bad code that's hard to understand. And Nate, it sounds like you might have had some
42:50 input on this, that when you interact with Mito, you do certain operations. Like I do a filter,
42:57 I create a pivot table, or I filter out certain things. It'll actually write step one, you did this,
43:02 step two, you did that. And it writes what looks like pretty well formatted Python code with little
43:07 bits of documentation, like pivot the table, reset the column name and indexes and stuff like it'll
43:11 even comment your code that it writes. You want to speak to that a little?
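[Editor's sketch] Step-by-step, commented code of the kind described here might look like the following. This is a hand-written illustration in that style, assuming a small hypothetical sales table; it is not actual Mito output, and the exact code Mito emits may differ.

```python
import pandas as pd

# Hypothetical input data, invented for this illustration.
df = pd.DataFrame({
    "state": ["OR", "OR", "NY", "NY", "CA"],
    "product": ["a", "b", "a", "a", "b"],
    "sales": [100, 50, 70, 30, 0],
})

# Step 1: filter out rows with no sales
df = df[df["sales"] > 0]

# Step 2: pivot the table to get total sales per state
pivot = df.pivot_table(index="state", values="sales", aggfunc="sum")

# Step 3: reset the index so "state" is a regular column again
pivot = pivot.reset_index()

print(pivot)
```

Because each step is one short statement with its own comment, a reader can delete, reorder, or edit a step without untangling the rest.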
43:14 Yeah, absolutely. I mean, this is a really interesting area that we've put a bunch of
43:19 research time in, if you can call it that. So there's a couple things that I want to,
43:22 I think, highlight here. And I'd love to also hear your thoughts on as well. So you're totally right.
43:26 Machines writing code, it's all the rage these days, in some ways, with these fancy,
43:30 you know, machine learning systems where you can, you know, write a little prompt and everything will get
43:34 written for you. Mito takes a little bit of a different approach where exactly as you say,
43:37 when you, for example, add a filter, it'll generate the line of code that corresponds to that filter.
43:41 Of course, immediately, the question becomes, this is not exactly how I want the code to be.
43:46 I didn't mean to do that. I meant to do something else. And so really, the way we think about it is
43:51 giving the user that code in the cleanest way that we possibly can. So just as an example of this,
43:56 if the user adds a column, and then immediately after renames that column, a feature that we're actually
44:00 releasing this week is that is going to get collapsed into just the adding of the column with the new name.
44:03 Right? Nice. Yeah.
44:04 That's ultimately what the user was intending to do. There's other more fancy things that you can do.
44:08 You can start getting into kind of code optimization, where it's like, you made a pivot table, then you
44:13 overwrote the pivot table. And those are all things that we're definitely interested in, and kind of are
44:17 on the roadmap for improving. But generally, you're totally right. If we want users to be able to learn
44:22 from this code, to be able to use this code, we need to generate clean semantic Python that really
44:27 works in the wild and is actually editable by the users. Otherwise, it's just a blob,
44:31 a mass that you can't actually interact with.
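[Editor's sketch] Collapsing "add a column, then immediately rename it" into a single step can be pictured as a small pass over a list of recorded steps. Everything below is hypothetical: the `AddColumn` and `RenameColumn` classes, the `collapse` pass, and the `to_code` generator are invented to illustrate the idea and say nothing about how Mito is implemented internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddColumn:
    name: str


@dataclass(frozen=True)
class RenameColumn:
    old: str
    new: str


def collapse(steps):
    """Merge an AddColumn that is immediately followed by a rename of that column."""
    result = []
    for step in steps:
        previous = result[-1] if result else None
        if (
            isinstance(step, RenameColumn)
            and isinstance(previous, AddColumn)
            and previous.name == step.old
        ):
            # The user really wanted a new column with the final name.
            result[-1] = AddColumn(step.new)
        else:
            result.append(step)
    return result


def to_code(step):
    """Turn one recorded step into one line of pandas code."""
    if isinstance(step, AddColumn):
        return f"df[{step.name!r}] = 0"
    return f"df = df.rename(columns={{{step.old!r}: {step.new!r}}})"


recorded = [AddColumn("A"), RenameColumn("A", "profit")]
optimized = collapse(recorded)
generated = [to_code(step) for step in optimized]

print(generated)
```

Working on a list of recorded steps rather than on generated text is one plausible design: the pass only has to compare neighboring steps, and code generation stays a separate, simple mapping.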
44:33 Yeah, well, you get it, but at least from the examples I've seen, I would be happy to take that
44:37 and then start writing directly on that. Even though you don't see it on the screen,
44:41 it's like right above, I scrolled it right a little bit off there. But it says, don't edit this section.
44:45 This is Mito code. I guess that's if you still want to be able to use Mito on it.
44:49 Right. You know, exactly. Don't mess it up, right?
44:51 We're actually, that's a feature we're working on this week is improving that communication. You can
44:56 take Mito code and edit it and change it however you want. The only problem is Mito might have
45:00 trouble reinterpreting that if you try and then later replay an analysis. But that's also a work
45:04 in progress. And ideally, you know, the kind of perfect version of what we, you know, envision
45:09 long term, and maybe we'll get there. Maybe this is impossible. But one really cool thing that
45:13 we're kind of thinking about is edit a spreadsheet to generate Python.
45:17 Yeah, I was thinking if this could be bidirectional, that'd be fantastic.
45:21 Yeah, really cool. So really, it's this world where you're really fluidly writing code
45:25 and editing a spreadsheet. And if something's easier in code, go write the code. If something's easier
45:29 in the spreadsheet, go write the spreadsheet. And that's definitely a vision that we've seen
45:31 ourselves kind of realizing over time currently. And something that, as you mentioned, you can go
45:35 Mito, Python, Mito, Python and do that currently. But making that easier for our users to really use the
45:41 code in a dynamic and real way, as you would any other Python code that you write is definitely
45:45 something that we're actively investing in right now and really trying to improve.
45:48 Nice. Yeah, I like the idea of being able to have it do some optimizations rather than create a
45:53 variable and then overwrite the variable with some other thing and just do it all at once. I'm sure
45:58 there's a ton of stuff in pandas that could be done better rather than like a really naive,
46:03 straightforward, like multi step stage.
46:05 Yeah. And yet the other thing I will say is the one benefit actually to generating this code is that
46:11 and not to insult anyone because I'm the biggest offender of this, but most data science scripts you
46:15 see in the wild are not the like pinnacle of clean, well-kept code. It's usually out of order notebooks
46:21 where, you know, because it's such a dynamic process, it's just very hard in practice to keep
46:25 these things well organized. And so actually what we can do in practice is generate some documentation
46:30 for what's happening and help users save and manage these scripts in a more linear and organized way,
46:35 et cetera, and help users kind of adopt these best practices. And that's sort of the stuff that we've,
46:39 you know, we've been exploring recently and Aaron was actually working on some of this today,
46:43 improving some of this code generation stuff, but it's not the highest of bars to meet. And we
46:47 definitely think that we can, you know, continue to improve and surpass that and make sure that the
46:50 code we're generating is really great, great stuff and stuff you'd be happy to edit.
46:53 One thing that while I'm looking at this, I'll just throw this out there as a piece of feedback with very
46:57 little actual experience. So take it with a grain of salt is you've got like step one,
47:02 step two, step three. It'd be cool if those were actually separate cells.
47:05 So like at the end of say step three, I could do like a pivot table dot head or something just to
47:09 like sort of touch on it and explore it a little bit along there.
47:13 Yeah, no, definitely. And a similar thing that we've also certainly thought of that that
47:17 multi-cell approach would enable is, you know, changing the order of steps, switching things
47:21 around and saying, oh, I actually want to filter first and then pivot versus pivoting and then
47:24 filtering. Yeah, certainly on the roadmap, but definitely, definitely something we
47:27 want to do, giving users more options on how they actually export this code and what they
47:31 do with it at the end.
47:32 Yeah, super neat. I've got a question out here that kind of leads into where I was going to
47:35 go with this anyway, and I'm going to switch the order so I don't cover your head, Jake.
47:39 Spawn also asked, does Mito support switching or you switching from or to or using Dask?
47:47 So, gotcha.
47:48 Yeah, not currently. No. So the one really cool thing is because of the Mito, the way Mito works
47:53 internally, which is something we can definitely get into depending on, you know, if you're,
47:56 you think your audience would be interested in it, what the appetite is. But we really have the
47:59 ability to kind of switch out, let's say, what the backend of Mito is. The kind of what code
48:06 do we actually end up generating is something that we can leave up in the air. And really,
48:09 our interface can be a more general thing than just Python code or just pandas in Python code.
48:14 Yeah, especially for these frameworks that are near compatible, API compatible with pandas,
48:19 like Dask.
48:20 Like I think Dask is almost exactly, and many, for most of the basic operations,
48:24 it's, they've aligned on the pandas UI just, or the pandas, the pandas API just for, you know,
48:29 how to kind of handle things. So it's definitely something that in the future we could, you know,
48:33 we could support. And if we have users who are working with huge data sets and need Dask,
48:37 then it's definitely something we'd be interested in learning about and exploring.
48:40 You know, Dask, when I first think of Dask, I think large clusters, massively scaled out data.
48:46 But then at the same time, right over here, I have my MacBook Pro Max, which has, you know,
48:52 10 cores on it. And when I run Python code, I get 10% of that CPU, right? And Dask will allow you to
48:59 scale across your CPUs, even on your local machine, right, or scale larger than your memory and stuff.
49:04 And so I feel like it's Dask, even for just making your local work go better, is actually
49:09 probably under realized or under utilized. That's super cool. It's interesting that the range of
49:14 data set sizes that we actually see in practice, it's very, I would say, at least from my observations,
49:19 and Aaron, Jake, feel free to hop on this, but it's like very bimodal and that there's a lot of people
49:22 hanging out with 100,000 rows. And then there's some people are like, hey, I have, you know,
49:26 a hundred million records I'm looking to analyze. And we're like, well, you know,
49:29 good luck on your 2000 fall MacBook. You know what I got? It's going to take a bit.
49:34 Get a coffee.
49:36 Chill out, go to sleep, wake up tomorrow, hope it hasn't crashed, I think is the general strategy.
49:40 Yeah. So that does lead me towards my final two little areas I want to speak about before
49:45 we run out of time here. Now, on one hand, this is just writing Python code. So its performance and
49:51 its limitations are what effectively pandas can deal with. On the other, it is showing that stuff and
49:57 allowing you to sort it visually. So there might be some constraints on like amount of data you can
50:01 work with. What's the data size story?
50:04 It's a great question. We have a release coming out within the next two weeks. Generally,
50:09 but here's our, here's our motto. We obviously we're providing a visual interface. There's going
50:13 to be a little bit of overhead, but the way we like to think about what we do is that it's a tiny
50:18 little bit of flat overhead, no matter how big your data is.
50:21 Okay. For example, you won't try to show the entire hundred million rows in a grid or like
50:25 you'll do some sort of like virtual lazy load list or something like that.
50:29 Exactly. A lazy load list. And we actually have a lazy load of the entire data set. That's a
50:33 feature coming out within the next two weeks or so. It's all written. We just got to test it a bit
50:37 better in the wild. But yeah, effectively, it's, we're a very thin wrapper on top of pandas
50:42 functionality. And in practice, what that means, what that means is anything you can do in pandas,
50:46 you should be able to do in Mito from a data set size perspective. It's something that was
50:50 very important to us, especially because a lot of our users are in Python because of data set
50:53 size limitations in the first place. As I think Jake mentioned. Yeah. Yeah. Yeah. That's a good
50:57 point. This is something that we've spent a lot of time on. We were previously using AG Grid as our
51:01 actual display unit. And it just wasn't made for this, a combination of probably us not implementing it 100%
51:08 how they might implement it and, you know, them not optimizing for these ginormous
51:13 data sets. Nate in particular just spent a huge amount of time recreating the entire grid from
51:19 scratch, so we could have complete customizability over it and show as much data as possible.
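The "flat overhead" and lazy-load idea discussed above can be sketched roughly as follows. This is an illustrative guess at the general technique, not Mito's actual grid code: `RowWindow` and `fake_source` are hypothetical names, and with pandas the fetch function might be something like a slice `df.iloc[start:stop]`.

```python
from typing import Callable, List, Sequence


class RowWindow:
    """Serve only the rows a grid currently displays (hypothetical helper)."""

    def __init__(self, fetch_rows: Callable[[int, int], Sequence[tuple]],
                 total_rows: int, page_size: int = 50) -> None:
        self._fetch_rows = fetch_rows  # callable(start, stop) -> rows
        self.total_rows = total_rows
        self.page_size = page_size

    def visible_rows(self, first_visible: int) -> List[tuple]:
        # Clamp the requested window to the bounds of the data set.
        start = max(0, min(first_visible, self.total_rows))
        stop = min(start + self.page_size, self.total_rows)
        return list(self._fetch_rows(start, stop))


def fake_source(start: int, stop: int) -> List[tuple]:
    # Stand-in for a large table: rows are computed on demand, never stored.
    return [(i, i * i) for i in range(start, stop)]


window = RowWindow(fake_source, total_rows=100_000_000, page_size=3)
print(window.visible_rows(0))           # first three rows only
print(window.visible_rows(99_999_998))  # the window is clamped at the end
```

The cost of each call depends on the page size, not on the total number of rows, which is what a small flat overhead amounts to here.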
51:24 Nice. I mean, sometimes you got to do that, right? You're like this, this control is amazing,
51:28 but we've outgrown it and bite the bullet and just do it right. Exactly. Yeah. Now,
51:33 interesting question from Samir out there. Hey Samir. Can I use Mito in VS Code? And I'll sort of
51:39 expand on that just a little bit. Can I use it in some of these other tools that are not exactly
51:44 notebooks? So VS Code has its kind of own way of presenting and showing notebooks. We've got
51:50 PyCharm and we've got DataSpell, which is JetBrains' new data science IDE thing. What's the story with
51:57 these environments? So unfortunately right now, Mito only works in JupyterLab, but this question is
52:04 something that comes up all the time. I think the places we hear the most interest are VS Code and
52:08 Google Colab. And we're definitely excited and really going to support those environments as well. And I
52:14 think we've done a lot of work internally on how we design Mito to make it
52:18 extendable. Right now, you know, we have a lot of functionality that we're trying to pack
52:23 into the tool, and then handling these new environments is, you know, a decent amount of development work as
52:28 well. So it's all a prioritization game at this point. Yeah, of course. Yeah. So short answer is we
52:33 don't support them now, but in the future we definitely will. Cool. All right. Two other areas quickly I want to
52:38 touch on. One, tell us a little bit about how this is implemented internally. I think it, I don't know
52:43 how it's implemented, but I'm, I'm guessing that it's somewhat like a lot of Jupyter stuff. Like I
52:48 want to do Jupyter things for Python, but I got to write them in JavaScript. Is that the story here as
52:52 well? You know, actually your earlier comment of, I imagine it's just like a slidey widget that you can
52:58 use in a graph in JupyterLab was spot on. The IPy interactive widget or whatever it's called?
53:02 We are actually just a very fancy IPy interactive widget. Okay. In practice, how that actually works,
53:07 for your audience, if they're interested, is there's kind of two pieces to our code base. There's a
53:11 JavaScript front end and there's a Python backend. The JavaScript front end is the sheet that you see.
53:16 It's the buttons that you click. And what that actually does is it just, it's a very thin wrapper
53:20 that just then sends a message to this mitosheet Python package in the back end and says, Hey, I just
53:25 clicked this add column button. You should add a column, excuse me, to this data frame. And then that
53:30 Python processes that message and then responds to the front end and says, okay, great. Display the new data
53:34 frame and also write this code to the cell below. And that's kind of the high level of what happens
53:39 there. It gets, you know, as you can say, more complex in the nitty gritty. In practice,
53:43 we're a React code base. We use TypeScript because we like strong typing and we kind of hate Python's
53:47 weak typing. But do you use type annotations on your Python side? We are gradually adding them to our
53:53 code base. We don't currently type check. Actually, we mostly use them as like IDE support to make
53:57 things easier. That's the main thing I use them for as well, because a lot of times the
54:00 IDEs will show you the errors if you make them anyway, right?
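The point about annotations serving mainly as IDE support can be shown in a few lines. This is a small sketch with a made-up function, `add_column`, not code from the Mito code base: Python records the annotations so tools can read them, but does not enforce them when the function is called.

```python
from typing import List, get_type_hints


def add_column(columns: List[str], name: str) -> List[str]:
    """Return a new column list with `name` appended (illustrative only)."""
    return columns + [name]


# Annotations are metadata that IDEs and type checkers can inspect...
hints = get_type_hints(add_column)
print(hints["name"])

# ...but the interpreter does not check them at call time.
result = add_column(["a", "b"], 3)  # wrong type for `name`, still runs
print(result)
```

An IDE or a checker such as mypy would flag the second call; running the script will not.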
54:04 Yeah. Yeah. No, you definitely, you get some support, but pandas, I'll say pandas typing
54:08 support. Obviously pandas is the main library we interact with. It's not perfect in all cases. It's
54:12 a very complex library, so for sure. But effectively in those cases, things kind of break down and the
54:17 errors that you get are maybe sometimes false positives, sometimes false negatives, and you can
54:21 shoot yourself in the foot sometimes. Yeah. Interesting. Okay. So it's a blend.
54:24 It's the JavaScript, React, TypeScript front end, and then the Python back end. Yep. Yeah, exactly.
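The front end to back end flow described earlier (a button click becomes a message, the Python side updates the data and replies with the new state plus generated code) might look roughly like this on the Python side. Everything here is a hypothetical sketch: the message format, the `add_column` event name, and the handler are invented for illustration and are not Mito's real protocol, and a plain dict stands in for the data frame.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical state: column name -> list of values (stand-in for a data frame).
Table = Dict[str, List[Any]]


def handle_add_column(table: Table, params: Dict[str, Any]) -> str:
    """Add a zero-filled column and return the code that reproduces the step."""
    name = params["name"]
    n_rows = len(next(iter(table.values()), []))
    table[name] = [0] * n_rows
    # Code the front end could write into the cell below.
    return f"df[{name!r}] = 0"


HANDLERS: Dict[str, Callable[[Table, Dict[str, Any]], str]] = {
    "add_column": handle_add_column,
}


def process_message(table: Table, raw_message: str) -> str:
    """Apply one front-end event and reply with the new state plus code."""
    message = json.loads(raw_message)
    handler = HANDLERS[message["event"]]
    generated_code = handler(table, message.get("params", {}))
    return json.dumps({"table": table, "code": generated_code})


table: Table = {"price": [10, 20]}
reply = json.loads(process_message(
    table, json.dumps({"event": "add_column", "params": {"name": "tax"}})))
print(reply["code"])
print(reply["table"])
```

Keeping the handlers in a dictionary keyed by event name means the front end stays a thin wrapper: it only needs to know which event to send, not how the data is changed.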
54:29 That question of that, let's say that stratification in that architecture is that's exactly what's going
54:34 to have to evolve as we kind of move into other places like Google Colab, VS Code, etc. They all
54:39 have slightly different extension architectures. And so architecting our code bases so these things are
54:43 separable and reconfigurable in the ways that other data science environments expect is something
54:50 that we've kind of been trying to do. But you know, a plea to everyone who's developing data
54:54 science IDEs, settle on one extension environment, please. I know it's never going to happen, but it'd
54:59 be really nice for us extension developers. We'd love it for sure. Right. Or making it an adapter.
55:03 Yeah. Right. If somebody created things that if you have a Jupyter UI and you want to put in
55:08 Google Colab, you just insert this thing and talk to it and then magic happens. Yeah. Shims when they work
55:14 are great. So if someone's done that, let us know. Please reach out and we'd really appreciate it.
55:17 Yeah. Yeah. Super cool. All right. Another thing I do want to make sure that we touch on a little bit
55:22 is up here at the top, I see plans. And so this is not for every possible use case a free tool,
55:30 right? You have a free version and you have a higher order paid version for teams, if I'm
55:35 understanding that correctly. Right. So Mito is a free tool. We have free users;
55:40 90-whatever percent of our users are free. To an end user: please download it. You know,
55:44 you can, you can pivot solve and it's free. We work with some larger organizations in sort of more
55:50 of a scoped manner, doing custom development, custom integration. And that's where the payment is happening
55:56 there. Sometimes it's some of those larger enterprises. What we're building out now is
55:59 sort of that middle piece where we, you know, we want to have sort of a plan for teams, maybe with better
56:03 security, some free development hours, things like that. And that business model is evolving, but we'll
56:09 probably be, what we want is a SaaS model there where you're paying, you know, 10 bucks a month
56:13 or something. It does seem like some kind of online system. I mean, you go to Jupyter through a browser
56:20 anyway, some kind of system that's like really already configured because you're helping people come into
56:24 Python who probably don't totally want to pip install and manage their path and activate virtual
56:30 environments and all those kinds of things. Yeah. Yeah. It's kind of like really helped them there.
56:34 Right? Totally. And that's one of the things we do when we work with some of these larger enterprises is
56:37 help them with the setup of their JupyterHub environment, help them get the packages they
56:40 need, help them get Mito installed, obviously. But I was going to say to your viewers, you know,
56:44 we're definitely looking to partner with more organizations or teams. If anyone wants to reach
56:47 out, my email or something will be in a link somewhere. Yeah. Yeah. We'll put your
56:52 contact information in the show notes. Sounds good. But yeah, we're definitely, you know,
56:55 we're looking to work with teams right now as we have a really good, really strong user base.
56:58 Yeah. Cool. Cool. So I'm glad that you guys have some kind of business model because a lot of
57:02 these things, they come and then people kind of lose interest and then they go. And there's a
57:07 real big difference of this is my job and my investment. So I'm really going to work on it
57:11 versus this is a thing I'm kind of excited about for a few months. So it's cool. You got a free plan
57:16 for people to use that. That's awesome. It's also cool that there's a path to support to just make it
57:21 better. Yeah. Yeah. We're here for the long haul and works out. The other thing I'll add here is it's,
57:25 and I don't mean this as a knock on the, like, hundreds of amazing data tools that are out there, but
57:30 there's, I think like this level of polish that we really feel a desire to reach with our tool.
57:36 It's like that, that kind of when you're delivering a tool to a paying customer, there's often a
57:40 different expectation that comes from the paying customer. And we do our very best to hold ourselves
57:44 to the highest standard possible. But when someone who's paying you reaches out and says, Hey, this
57:48 button doesn't look the way it should. That extra level of polish really kind of kicks the tool over
57:52 the edge. And all of those, all of that feature development ends up getting, you know,
57:55 pushed back to the individual users, the free users I'll say, who are, you know, most of the
58:00 people who use our tool. And so really, you know, we're trying to build the best tool that we can here
58:03 and making sure that we can do that sustainably long-term and really invest in, you know, what
58:08 we're doing and build a team around it is something that, you know, really is necessary if we're going to
58:11 deliver on what we think we can and the promise. Yeah. Cool. Another nice thing about the paying users
58:17 is just that we get to, we need to work in a much more close relationship with them. So there's a lot,
58:21 that's where we get to zone in on like specific use cases around financial services or around
58:27 bio research. So we've come to work on specific features and specific workloads that we definitely
58:31 wouldn't otherwise, which I think are going to really benefit everyone who uses the tool.
58:35 Yeah. Yeah. Very cool. I'm fascinated with the different ways people are working and operating in
58:40 open source space or building on top of open source tools to create businesses. You know,
58:44 we've got the Anacondas, we've got the MongoDBs and stuff out there. So yeah, good,
58:49 good luck to you guys. I'd like to see you succeed here.
58:51 Yeah. Thanks so much. And then the last thing I'll say is, you know, we do our best to give back to
58:56 open source tools as well, especially the ones that we work on. So you'll see me sometimes being
59:01 annoying opening issues on GitHub. And I think that's another big piece of this is as we build
59:04 on open source tools, making sure we contribute back to them in ways that are meaningful and actually
59:08 helpful is, you know, certainly really important as well. And definitely something to say something.
59:12 Yeah. Fantastic.
59:13 All right. Well, I think that's about it for the time that we have to talk about extending notebooks and
59:19 Mito and all this really cool stuff that you all built. Before I let you out of here,
59:22 there's the final two questions. However many, in what order and whatnot, you want to take this,
59:26 just jump on in. Notable PyPI package out there, something you've come across like,
59:30 oh, this library is awesome. It doesn't get enough attention. Anything come to mind?
59:34 Yes. Honestly, something that a lot of our users use, and Jake,
59:37 actually feel free to hop in after me, but I would say pandas-profiling. It's a tool that does somewhat
59:41 similar stuff to us, but it's a super great tool for many of our users. And I don't think a ton of
59:45 people know about it. And it works right in JupyterLab as well. So it's called Lux, which is cool. I don't
59:49 even know if it's actually still being supported and developed at all, but I really love it.
59:53 LogTrestles. You look up Lux Python. It's cool. It does like automatic graph suggestions. So you
01:00:00 can pass in a data frame and it'll suggest sort of give you options of visualization to just click
01:00:04 on and use, which I think is a really quick thing. It's not the most, like, fully fledged package,
01:00:09 but for what it does, I think it's really good.
01:00:10 Now these little things that people don't know about, they're like,
01:00:12 oh, that's cool. I'm going to go check this out. It might help.
01:00:14 Yeah.
01:00:15 A Python API for intelligent visual discovery.
01:00:17 I may not could be exact more, but we're really close and we love the Deepnote
01:00:22 product. It's like, it's pushing notebooks forward. So trying to add collaboration,
01:00:28 like collaboration, like Google Sheets and Google Docs, it has a lot more, potentially a more friendly
01:00:33 interface than sending out raw Jupyter notebooks that are a little bit bare bones at times.
01:00:38 Okay.
01:00:38 There's also another notebook.
01:00:40 Yeah, really cool. I've spoken to the Deepnote people just a little bit and
01:00:43 they're doing cool stuff for sure. Yeah, he's cool. Obviously,
01:00:45 there's another notebook called Hex. We had to talk to their founders recently
01:00:47 about some potential collaboration, but they're doing cool stuff as well.
01:00:51 Okay, fantastic. All right. And then if you're going to write some Python code,
01:00:55 not notebooks, obviously, but anything else, what editor do you use?
01:00:59 We're big VS Code users.
01:01:00 Okay. All three of you guys?
01:01:01 Jake when he dabbles. Yeah.
01:01:03 Yeah. Yeah.
01:01:04 Yeah. No, VS Code.
01:01:05 I'm in notebooks.
01:01:06 I was thinking about this yesterday recently. It's like, I wish at school someone taught a class called actually doing software development in the real
01:01:12 world because I feel like I lived my whole school life writing Java code in Eclipse and
01:01:17 then I'm in the real world. And then I started using VS Code and it was like this transcendent moment of bliss where I was like, oh, programming can actually be fun. And it turns out the tools I was using just made my computer heat up to a thousand degrees and burn my lap.
01:01:29 Yeah, I really agree with that statement that there's a bit of a mismatch of what is taught in sort of computer science and then what is expected of people when they get out in the real world. And it might not be as academically highly valued, but really good, like working with tools like VS Code and PyCharm and these other tools that help you write code better quicker. And some of the software engineering science, I think that really could be valuable.
01:01:54 No, totally. And also the other thing I'll say is for developers like us, maybe who came out of school and moved into a startup and, you know, didn't have a ton of experience, let's say writing Python in production in the wild. The other thing I would highly recommend is continuous integration. You can set it up through GitHub or GitLab, I'm sure, if you use that as well. Testing your code automatically on a server: huge productivity gains for us, and it's really increased our confidence that we're able to deliver like the best possible product, and not something anyone ever told us about, you know, when we were in school. So.
01:02:23 No, go implement a database in Lisp. Okay. All right. And John out in the audience has a quick funny comment about the editors: Neovim, of course. That's starting to get some attention lately as well. Very cool.
01:02:36 I'm scared of anything with the word Vim and it scares me, but I'm sure you're a superhuman for using it. But,
01:02:41 You know how you generate a proper random number or character set is you get a first year computer science student into Vim and then you ask them to quit.
01:02:51 Yeah. I mean, you've seen the most, the most like Stack Overflow questions like I'm trapped in Vim. How the hell do I get out of here?
01:02:57 Yes, exactly. It's so funny. It's like half of what Stack Overflow does is answering just that question specifically.
01:03:01 Exactly. For sure. That's great.
01:03:04 All right. So final call to action. You guys, people are excited about this. How do they get started? Where do they go from here?
01:03:09 docs.trymito.io, which is our documentation website. And if you're an organization or enterprise,
01:03:14 just reach out to me at my email, cheek@sagacollab.com, which will have all links here. Or here, the plans page is a link there. But yeah, the documentation,
01:03:22 Yeah. Right on. Also, I'll throw out there while you're on the docs, watch the videos. That's a quick and easy way to really see what it's about. And before we wrap it up here, Mr. Hypermagnetic has a little comment that Vim is like the eighth of the deadly sins.
01:03:38 It's weird though, because it's not just the eighth of the deadly sins. It's the eighth of the deadly sins that also like 10% of the population swears is the greatest thing since sliced bread. And so it's like half the population, like me, I'm terrified of the damn thing. But my father is like, son, like my dad's dad from the eighties is like, son, have you heard of this? I'm like that, please. I can't take this right now.
01:03:56 Yeah, it's amazing. Yeah. My co-host in Python Bytes, Brian, he's all about Vim. Everything's Vim. It's great. But I did some Emacs and then I kind of did some other more UI oriented things. Awesome. All right. Well,
01:04:08 Jake, Aaron, Nate, it's been fun to have you here. Congratulations on this project. I think it's going to help a lot of people get into Python and data science quicker, more easy.
01:04:16 Awesome. Yeah. Thanks for having us. Great chatting.
01:04:18 Talk soon. Bye bye. Yep. Bye.
01:04:20 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Choose Shortcut, formerly Clubhouse.io, for tracking all of your projects work, because you shouldn't have to project manage your project management.
01:04:37 Visit talkpython.fm/shortcut. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy and scale your modern applications faster and easier. Visit talkpython.fm/linode and click the create free account button to get started.
01:04:53 Do you need a great automatic speech to text API? Get human level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai.
01:05:03 Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not
01:05:15 a subscription in sight, check it out for yourself at training.talkpython.fm. Be sure to subscribe to
01:05:21 the show, open your favorite podcast app, and search for Python. We should be right at the top.
01:05:25 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
01:05:31 and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our
01:05:37 recordings these days. If you want to be part of the show and have your comments featured on the air,
01:05:42 be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host,
01:05:47 Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write
01:05:51 some Python code.