#273: CoCalc: A fully collaborative notebook development environment Transcript
00:00 Everyone in the Python space is familiar with notebooks these days. One of the original notebook environments was SageMath, created by William Stein and collaborators. It began as an open source, Python-based computational environment focused on mathematicians. It has since grown into a full-blown company and has become a proper collaborative environment for things like Jupyter notebooks, Linux bash shells, and much more. Think Google Docs, but across all these facets of development, in your browser. We welcome back William Stein to give us an update on his journey from professor to entrepreneur, building CoCalc along the way. This is Talk Python To Me, Episode 273, recorded July 7, 2020.
00:53 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and Talk Python Training. Be sure to check out what the offers are for both of these segments. It really helps support the show. Speaking of Talk Python Training, have you been thinking about taking one of our courses? We're participating in the latest Humble Bundle deal for Python developers, along with a bunch of other great educators and tool developers. Until July 22, you can get $1,400 worth of Python goodies, including three of our popular courses, for just $25. Yeah, Humble Bundles are crazy. That's $25 for our three courses and all those other things combined. Just visit talkpython.fm/humble2020, that's talkpython.fm/humble2020, all together, before July 22 to take advantage of this offer. Now let's get to that interview.
02:01 William, welcome back to Talk Python To Me. Thanks for having me back. It's really great to have you back. Your story is one that I've thought about a couple of times over the years, because it's really inspiring. And it's quite interesting how you created SageMath and went on this journey. And it sounds like the journey is still going strong, and you've come a long way. Yeah, it's been a very intense and exciting couple of years. I started SageMath way back in 2004 and worked on it an enormous amount until about 2013. And for the last few years, I've been working on CoCalc, which is a web application whose goal, partly, is to make it really easy to use Sage without having to install anything, but also a lot of other open source software. Right. Absolutely. So kind of notebooks before notebooks were cool. Yeah. So in Sage, we needed some sort of notebook interface. So we wrote something called the Sage notebook, back in 2006 through 2008, which back then looked a lot like it was a first sort of web-based computational notebook. And we had to make a lot of interesting design decisions at the time. We put a lot of work into it over many years. And then eventually the whole Sage project switched to using Jupyter, using their file format and using Jupyter notebooks with a Sage kernel, rather than having to maintain our own separate notebook just for Sage. Yeah, I guess before we get too far into it, maybe just tell folks what Sage is. It has been almost four years since you were on episode 59 talking about SageMath and what it was. So a lot's happened since then. Yeah. So Sage is a big piece of software that's open source and free. And the goal is to provide an alternative to Mathematica, Maple, Magma, and various mathematical software systems. But technically, what it is is basically a Python library for mathematics. 
And the main thing that distinguishes it from, say, SymPy is that we really, really focus on advanced research mathematics applications. So things in pure mathematics like number theory, topology, differential geometry, etc. So it gets used in physics and mathematical research. But it also has a lot of functionality related to undergraduate mathematics teaching. Oh, nice. Yeah, that's a very interesting angle that we're going to talk about. And your background's in number theory, right? Yes, yeah. I did my PhD in number theory 20 years ago at UC Berkeley. Number theory is so interesting. And that's one of the things I didn't study that much when I was working on my math PhD, which, to be clear, I didn't finish. I dropped out halfway through and went and did computer stuff, which is fine. But yeah, I was more on the analysis side. But very, very cool. Number theory is quite an interesting topic. Now, I think, you know, let's just get straight into a little bit of history. So back in 2006, it was you and some of your grad students, and you had some other students helping along the way, right, to build out SageMath? At the time, you were a professor at the University of Washington in the math department.
05:00 Right, yeah. And then you built this up to be used in the classroom and to be used in research, also with this undergraduate focus that you talked about. Yeah, yes, that's right. Probably 20 different students at the University of Washington worked on Sage over the years, and then hundreds and hundreds of people around the world. And I frequently taught a class to about 20 to 50 undergraduates, really about how to use software in mathematics. So I guess nowadays it might be called a data science class. Yeah, exactly. Right. It's sort of like, remember back when you would go get an electrical engineering degree to actually do computer science or programming, right? It wasn't really a fit, so it would kind of go there. Interesting. Okay. The course covered a bunch of topics, like how to use R, how to use pandas, which is a Python library, and how to use LaTeX. So I'd spend a week or two just on how to use LaTeX. And, I mean, it was really kind of a whirlwind. And it was difficult to coordinate all this software with the students. To get them to install something on their computer was just, I mean, not really an option, right? You have all these different platforms, right? Yeah. And I like to have a part of class, every meeting, where I would give them something to work on, and then walk around and help them and just see what was happening live. So it was, you know, more interactive. And so I ended up, well, first using the Sage notebook, and then later writing CoCalc, specifically to make teaching that class more effective. So they could do LaTeX, they could use R, these Python libraries, they could use Sage, everything in the course of the class. And moreover, I did it in a way that was collaborative. So absolutely all the functionality of CoCalc is collaborative. 
So you have multiple people simultaneously using the Linux terminal, editing a Python script, editing a Jupyter notebook, editing a LaTeX document, whatever. That's really baked in from the ground up in CoCalc. Yeah. So CoCalc is, I guess the way I see it, like SageMath as a service plus a bunch more. Is that a fair representation of it? Yeah. Okay, we'll talk about what those things are. But there's a lot going on there. And this collaborative side, I think, is super interesting. I mean, we're just starting to see that in a lot of other areas, for example with VS Code having the Live Share stuff, right? You know, Jupyter notebooks still don't have, that I'm aware of, this rich, Google Docs level of back and forth that you can actually have, right? There are some online systems that have it. I think maybe Datalore has it, but it synchronizes a sentence or a cell, right? It's not like you see it going. And what's really interesting about the collaborative stuff, and I think this is really valuable, so I'm diving into it a lot here, right at the beginning, is you can even do a collaborative bash shell and stuff, right? Yes. It's more than just, I can work on the notebook and then there's other stuff. All of that is collaborative. Every single thing is collaborative. You have a collaborative bash shell with a little chat on the side, so we can chat back and forth to each other. It's basically integrated. You see the other person as they're typing. And actually, it's kind of funny, we were looking at different terms to, you know, kind of do SEO for. So just for fun, we played around with "Linux online" and "online terminal". And strangely, we've been the top hit for both of those search terms for quite a while now. And we're getting a lot of users for learning Linux online. They use CoCalc, they use the terminal, and they learn Linux. Right, right, right. Sure. 
Because you get basically a synchronized, interactive bash shell. Exactly. Oh, I hadn't thought about that. But of course, that makes a lot of sense. And there's also a code editor, so you can edit a .sh file and then run it in the bash shell. Okay. Very cool. You know, you mentioned the chat, and there's one of the things on the website that I thought was cool. So you've got cocalc.com. Pretty straightforward. That's awesome. One of the things you mention, which I recently ran into in a programming variant, is you have math-friendly chat.
08:56 Tell us what math-friendly chat is. Okay. So first, anytime you write... so the chat is Markdown, but you can use dollar signs, and it will typeset, using LaTeX, whatever's between the dollar signs. So in that sense, it's kind of like writing in LaTeX. So that basically makes it math friendly. One other thing that I think makes it math friendly is you can look at any message in the past, double-click on it, and edit it in order to fix something that's wrong. Mathematicians like everything to be exactly right, always. So whenever you see something that's wrong, you really want to be able to fix it exactly. You can edit other people's chats, not just your own. So if somebody else writes a chat and you see a typo or something that's wrong, you can make an edit to their chat. And it's recorded, so there's a history of the changes. And you can click a button to see the history of edits to a particular chat cell. Wow, that's really, really interesting. Yeah. You know, slight diversion: I think that would fix so many problems with social media. Like Twitter, for example. They say, you know, we're not going to let you edit tweets because we want to have an authoritative view. We don't want people to twist what it means. Like, why isn't there just a history, like
10:00 "show original", you know what I mean? Like, yeah, it's so bad. I just recently tweeted, trying to amplify this message as much as I can, and I'm like, oh no, there's a bad grammatical error in that. But I can't fix it. I've just got to live with it. Right. So I think that's a super cool feature. The scenario where I ran into the equivalent of this was when I was in a Zoom meeting. I have office hours for my online courses, and I wanted to share some bit of code. I'm like, this little fragment of Python is what you need to fix it, and I paste it into Zoom: no indentation, right, not even indentation. The person's got to go through and rebuild it. And it's obviously not a fixed-width font and all those things, which I guess doesn't matter if you take away the indentation. But still, just having those little touches, I think, is actually really nice in the code. And CoCalc stands for collaborative calculation, I'm guessing. Is that right? Yes, that's right. Yeah. So really focusing on that rich exchange, those sort of interactive bits, seems valuable. Yeah, for a long time it was called SageMathCloud. And that really made you think it was just a cloud-based way to run SageMath. And we were really supporting pretty much everything in the open source math-related ecosystem equally. And so focusing just on Sage was really not what we were about. And we also put so much effort into collaboration. So we ended up coming up with a name that really focused on collaboration, and just general calculation, not just... Ah, yeah, this is a great name. I really like it. I think it's the right move too. One thing I do want to touch on, which I think is probably interesting for a lot of people out there, since we've got a lot of academics in the audience, is that you started out as a professor at the University of Washington in the math department, which is a very coveted and challenging position to get, and worked on this project.
And I'm sure it was not easy, but you decided to leave the professor position, leave the department, and then focus exactly on this. Could you talk to us a little bit about that, for folks out there maybe wondering how, or when, or in similar situations? I, you know, went to graduate school, and, you know, I was terrified whether I would ever be able to get a job at all in academia as a postdoc, and didn't expect to, but then, amazingly, I did. And I had a job as a postdoc for a while. And then I got a job at UC San Diego. And surprisingly, I got hired with tenure out of my postdoc, which was not at all what I expected to have happen. And so the job at the University of Washington was fantastic. It's a great department. The chair of the department is a very active Sage developer. So it's, you know, a very supportive environment. I had no issues with the department at all. I didn't leave for reasons like that. But I just really, really felt it was the right time to focus full time on making CoCalc better. And you know, we had a critical mass of users, so it was worth it to do so. We have three full-time employees at the company, including myself. So, you know, there are enough users to at least support significant full-time development. But yeah, it was a terrifying choice to leave a tenured job and take a significant salary cut to work on CoCalc all the time. So I really had to be very, very confident that this was going to work. You know, I know exactly how you feel. It's quite a stressful thing to have a really good job in hand and say, I have this theory that this thing is going to work out.
13:18 Let me, you know, with bills to pay and responsibilities, let me just jump in and try this thing. But also, my wife is faculty at the University of Washington. So we had solved a two-body problem. Oh, my God. So yeah, that's really hard to solve a second time. And I'm also very well aware of how hard it is to get an academic position, or to go back, because half the time I was at the University of Washington there was very, very little hiring, due to the 2008 financial crisis. Oh, yeah, it was around that time too. Now, looking back, now that you're on the other side of that fence, how do you feel about it? I really, really like my current job a lot. Working on CoCalc, I love it. You get pulled in a lot of directions as a professor or as an academic, right? It's hard to focus on any one thing. Yes. You have the, you know, Monday, Wednesday, Friday... you end up with a full schedule, basically. So for me, Monday, Wednesday, Friday was all about teaching, and then Tuesday and Thursday were all about meeting with students. From, you know, 9am to 6pm, I would just schedule in an hour per student, or per pair of students, and just fill up my days. And so you just end up with an incredibly rigid schedule. So instead of waking up and thinking, okay, what is the best possible thing I can spend my time on today, it was sort of already decided what I would be doing for the next, you know, 10 weeks. You've got that 30-minute gap in there where maybe you could squeeze it in for the day, and the rest is already booked. Right. I do like the freedom a lot, of being able to iteratively decide what's the best thing to work on. And I feel like I still have a positive impact on education with CoCalc, because a lot of educators use it in their classes. And I'm still contributing to the education of students, just in a slightly less direct way. Maybe more so in some senses, right? Like, hopefully, yeah, at least in terms of numbers, and maybe average out the impact.
15:00 Right. Like, what I was doing before was developer training, and I would go and work for a week with 20 people or something like that. And I had a big impact with those 20 people. But, you know, I had to leave that world to go do the podcast and to do online courses, where, you know, you work with thousands of people. It's a really interesting trade-off to make. It's very difficult to scale teaching in certain contexts. Like, as a university professor, there are various limits to how much you can scale up, due to it, yeah, sort of all being owned by the university system. Yeah, we're both running into similar things. So yeah, it's nice that CoCalc can scale up a lot more, and your online courses as well. So in that context, it's basically a matter of awareness, rather than, you know, making more of you or whatever.
15:47 This portion of Talk Python To Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the next-generation network, Linode delivers the performance that you expect at a price that you don't. Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40 gigabit network, industry-leading processors, their revamped Cloud Manager at cloud.linode.com, root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode when creating a new Linode account, and you'll automatically get $20 credit for your next project. Oh, and one last thing: they're hiring. Go to linode.com/careers to find out more. Let them know that we sent you.
16:47 Before we move off this whole academic side of things, let me just ask you really quick. With all this COVID stuff going on, and everything around universities and online education and Zoom education or whatnot, what's going to happen to universities? Are they going to get a massive transformation? Is it going to be hard for them to fill up their courses? Do you think it's just going to go back to the way it was? What's your view of the next five years of universities, quickly? Well, I really wish I had a good answer to that question. I really want to know the answer to that, because it has a big impact on how I should develop CoCalc. Yeah, and my wife's job, and many other things. You'll see a lot of comments online by people who kind of think they know the answer to that question and have pretty dire predictions. But they often are not seeing the whole picture of what a university is. Because a really important part of the university is the research aspect of it, right? Also the social aspects of people meeting. And it is definitely possible to have an online course, as long as it isn't too large, where I think you get a lot of the value of an in-person course, if you, you know, conduct it over Zoom in real time, where you don't just record the lectures and play them back, but you actually have real-time discussion between the students. So I think the traditional value of the university can still be propagated via web-based mechanisms. Yeah, so I think that doesn't go away with COVID. But I really think things aren't going to change dramatically, but in more of an evolutionary way. So there'll be some changes, but it's not going to all come crumbling down. Yeah. And that probably speaks good things for CoCalc, which we'll get into in just a second. It's just, like, the real value of the universities is the professors and the students kind of meeting. 
And you still have that. The professors still exist, and the students still want to talk with them and, yeah, learn a lot from them. And I guess the research side of things, I hadn't really considered how much money grants bring into the university, right? Like, if you get a million-dollar grant, half a million or something on that scale goes to the university, just straight out of pocket. Right. And so I think the whole research side of the universities is still going to have an interesting story there, supporting them. Yeah, yeah, it brings in... like, at the University of Washington, well over a billion dollars of their budget every year is just research money that comes in. How interesting. All right, so let's dive into some of the features of CoCalc. I mean, we've been bouncing around a couple of them. And maybe, with this conversation, let's start with talking about teaching a course. So one of the things that you focused on with CoCalc, in its origins, was helping teach a scientific or mathematical class, either in a classroom or potentially remotely these days, which I don't know how much you envisioned before. But now it's pretty easy to do. When I would teach classes, I would usually put all the content for the class on GitHub or on my website, and then tell the students, hey, here's the website for the course, grab the content, download it, upload it, you know, whatever. But as soon as other people started using CoCalc, and I made it available more generally, instructors would just write, "I need functionality in CoCalc that does the following: it takes a directory of files, an assignment, and pushes it out to all the students. It lets them work on it in a way where I can see what they're doing. And then
at a certain point, I can collect the assignment, grade it, and return it to them. And I want that to all be integrated in the system." So we implemented that in 2014, and then iterated on it and polished that kind of functionality over the years. So now CoCalc has its own integrated course management system, where you paste in a list of email addresses of all of your students, they get invited, and they have their own little private workspace, a Docker container, where they work on content for the course. And they periodically receive assignments. The assignments just appear in their project, and they work on them. And they don't have to do anything to submit the result. When it's due, it just gets collected. So that functionality is integrated into CoCalc. And it's done in a way where, while the students are working, you can see their cursor moving. Basically, you can look over their shoulder and see how they're doing, see if they're running into errors, see if things are working well. Right, get a report of when they were working. If you wonder how their notebook got to a particular state, we have this time travel feature. It's kind of like track changes: you click a button and you get a slider, and it shows you exactly what the notebook was at about every three seconds as it's been actively edited. That's pretty fine grained. Yeah, it's very, very fine grained. And the students also have access to it. And so I often noticed students, when I would walk around in a physical classroom with them while they were working, would say, "I had this figured out five minutes ago, and I messed it up." And it's really hard to just undo, you know, back five minutes in time in a Jupyter notebook, especially since it doesn't even have a global undo. So with time travel, they just have a separate panel right next to the notebook, which is every version of the notebook, every three seconds. 
And so they can just look and see exactly what they had five minutes ago, copy and paste it back over, and they're back in business. So it's a kind of reproducibility in a really practical sense. That's super interesting. I did notice that; it jumped out at me that the time travel story was pretty cool. I didn't realize it was every three seconds. That's great. And I suspect you're already exchanging that data with a server for sync, for the collaborative side, so just save it, right? How this came about was, I implemented an algorithm in 2014 for real-time collaboration called differential sync, which passes around lots of messages. It's this really weird, complicated algorithm, and it doesn't record any history of what happened. And then an undergraduate, Jonathan Lee, who was working on CoCalc as a student project, added in some sort of time travel history feature, because I think maybe he had seen something like it in some online Hackpad; I forget. When I looked at what he was doing, I realized that you could basically rewrite our whole sync algorithm, or come up with a different sync algorithm, that automatically recorded history. And that was sort of the basis for the real-time sync as well. So I threw out the old algorithm we were using and wrote a new one, and it made the thing he was implementing a lot easier to implement. And then I started using it just in coding, like when I'm editing a Python file or a TypeScript file or something. And I realized, hey, five minutes ago I had some code that I wanted, and I don't want to undo to get it. It was not put in git yet. It's gone. But with time travel, I just have it there. So now, whenever I'm coding, it's like a critical, I don't know, tool for me. I can't imagine writing code without having the slider that gives me all past states of my code. Well, it's a little bit like writing a document in Google Docs, or Sheets, or something like that, right? 
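As a rough illustration of the idea described above, a sync design where the record of edits doubles as the history, here is a minimal Python sketch. It is not CoCalc's actual algorithm, which the conversation does not spell out. The names Edit and HistoryDoc, the character-offset edit format, and the three-second timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Edit:
    """One recorded change: replace `removed` characters at `pos` with `inserted`."""
    time: float      # seconds since the document was created (hypothetical clock)
    pos: int         # character offset where the change starts
    removed: int     # number of characters deleted at `pos`
    inserted: str    # text inserted at `pos`


@dataclass
class HistoryDoc:
    """Hypothetical document whose only state is an append-only edit log."""
    log: List[Edit] = field(default_factory=list)

    def apply(self, time: float, pos: int, removed: int, inserted: str) -> None:
        # Edits are only ever appended, never rewritten, so history cannot be lost.
        if self.log and time < self.log[-1].time:
            raise ValueError("edits must be appended in time order")
        self.log.append(Edit(time, pos, removed, inserted))

    def version_at(self, time: float) -> str:
        # "Time travel": rebuild the text by replaying every edit up to `time`.
        text = ""
        for edit in self.log:
            if edit.time > time:
                break
            text = text[:edit.pos] + edit.inserted + text[edit.pos + edit.removed:]
        return text

    def current(self) -> str:
        return self.version_at(float("inf"))


doc = HistoryDoc()
doc.apply(0.0, 0, 0, "x = 1\n")
doc.apply(3.0, 6, 0, "y = x + 1\n")
doc.apply(6.0, 6, 10, "y = x / 0\n")   # a later edit that overwrites the second line

print(doc.current())        # latest state, including the overwritten line
print(doc.version_at(4.0))  # state as of 4 seconds, before the overwrite
```

Because the current text is derived from the log rather than stored separately, recovering "what I had five minutes ago" is just a replay with an earlier cutoff. A real system would also need to merge concurrent edits from several users, which this sketch leaves out.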
You know, I don't envision that I ever have to save or anything. It doesn't quite have that fine-grained slider feel to it. But you know, I was watching a show yesterday, a sitcom type of thing, and the joke was these college- or high-school-aged kids were working on something, it was actually advanced math, on their laptop, and there was a power surge and their laptop died and they lost all their work. I'm just like, what do you mean, you lost all your work? Right. Like, this is such an outdated show. I feel really sad, actually. But still, the concept of "my work is not being saved" is kind of wild. With CoCalc, also with development, I've always assumed... for a while, we had a lot of bugs, and you'd have to refresh your browser a lot because of, you know, memory leaks, or it just crashes. So I've always designed it kind of assuming that you're going to have to refresh your browser frequently, or your network connection is really flaky. And so a second or so after you type something in, it goes to the server. And that's it. After that happens, you can refresh your browser and you'll be exactly where you were. And so that's just the assumption with the system: that your browser is going to be crashing all the time and your network's terrible. It's not true now, but... It's a little bit like the chaos engineering from Netflix, right? Like, let's just throw monkeys, you know, wrenches into the mix, or whatever analogy you want, and say, let's design assuming that it's going to be wrecked. That's also a fundamental design constraint with collaboration, I think. With CoCalc, you make a project, which is a Docker container, you use Jupyter notebooks in it, and you invite collaborators to work on it. And the collaborators could easily, in a few keystrokes, cause all kinds of damage. And so you want to be able to trust adding collaborators. 
And so with CoCalc, as they edit your files, all the changes are saved, and if you want to revert back to before they messed it up, you just do that, and you won't have to worry about anything being lost. Also
25:00 the files on disk are snapshotted every 20 minutes, like the complete state of all the files. So if they accidentally delete all the files, you might lose a few minutes, but you're not going to lose very much. And there's no way for collaborators that you add to delete the history of editing. So it's really immutable. It just is moving forward, immutable. The only way to delete it is to create a support request and ask us to delete it. Yeah, that's great. In a teaching context, that's really good. Look, if you want to have, like, a scientific paper you published where you have no history of it, copy, paste, here's the new version, right? I mean, same thing with Google Docs. I work on something and I share the Google Doc with someone else. Yeah, say it's like a business proposal or something. I assume that what I've typed, they're going to potentially go and look at. So either export as a PDF, or just, you know, just know, going in, that's how it's going to be, and that's fine, right? Yes. So let's round out this teaching side of things, I guess, really quick. So probably in the fall, I don't know what the world is going to look like. Fingers crossed, there are vaccines and everything's well, but you know, that's not necessarily the likely outcome. So there are probably a lot of teachers, at the middle school, high school, college, and even grad school level, thinking, I'm going to have to do this online. One option is I get a whiteboard and I write on it with a pen, you know, like a digital pen, to teach something. It sounds like this might be a better option for certain types of classes, at least as a side-by-side with a video call or something. What would you say to those people? Like, how could they use it? 
What role would it fill there? If you ever had, like, a computer lab part to your class, where you wanted all the students to work on a Jupyter notebook or something like that, CoCalc basically just puts that in the cloud, in a way where you really can walk around and look at students' work, and, you know, jump in and start editing, and see when they're active. So it solves that. And it makes it a lot easier to have homework assignments that involve Jupyter. Could I have, like, 20 students and just open 20 tabs and just, you know, Command-Alt-arrow between my students? And also, whenever they have an issue, they can type a chat in the side and you'll see a notification, and you can click on that notification and you'll open exactly their document. But yeah, absolutely, you could just tab through all the students, or if you had a really big monitor, watch them all at once. Exactly, yeah. Put that 55-inch 4K TV to good use. We are constantly being contacted by instructors who are basically being forced to move classes online and are looking into options. They find CoCalc, they think it might work, maybe not. They ask us a bunch of questions, the answers are all yes, and then they're amazed and they start using it. So people often assume, I think, about some of the things that we can do, that, well, probably nothing can do them yet. But we can do that now. So, very cool. Now, I was going to save this question for a little bit later, but let me jump ahead, because it might be more relevant here. What does this cost? I know there's a free tier and you can go over to cocalc.com and do stuff for free. But is teaching a course free? Or what's the story there? You can use CoCalc for free in a couple of ways. 
But if you really want to teach a course and have the students have a good experience, etc., then it's basically, I mean, there's a range of different costs, but basically think $15 per student. That's roughly, on average, what it comes out to. There's a fee that students can pay directly, or you can have your university pay us, or something else. Yeah, I mean, it seems like maybe in a high school scenario you might have the school district pay, but in a college scenario, students are used to just paying and paying. I remember when I first started out in community college, I paid my tuition, and it was like $400. Like, imagine that, right? For a full, like, 15-hour course load. And then I went to the bookstore, and it was like $600. I'm like, how can the books cost more than the entire tuition of all the courses? This doesn't make any sense. Like $200 for a chemistry book or something like that. And so I think 15 bucks to take a math course is probably a drop in the bucket. And it's probably easy enough. If there's a textbook, it's about the sales tax on the textbook. It's about the cost of the sales tax on the textbook, yeah. It's interesting, you know, from a starting-a-business side of things. At some universities, it's the culture that students always pay directly, and at other universities it's considered unethical and crazy to have the students pay directly. It's kind of random which one it is. Like, at the University of Washington, the students pay. That's just how it is. Whereas at various universities in other places, it would be considered just crazy to do that. Right, right. How interesting. Well, if you're talking Mathematica, or MATLAB, or Maple, student pay is, like, a lot of money. Yeah, there's the base rate, but oh, I also need the wavelet decomposition toolbox, and I need this other toolbox. And now all of a sudden it is legitimately really expensive. All right, cool. 
So the teaching side, I think it's really interesting. And maybe let's talk about some of the other things that it does. So you talked about the time machine thing, and that's a form of version control. It's a little less structured than I'm saving a point in time actively.
30:00 Is there a way to integrate this with a GitHub-like thing, or external source control? Can I export it? How does that happen? A CoCalc project is literally just a Docker container running Ubuntu, currently 18.04 but soon to be 20.04, and you can use the terminal and any Git commands in the terminal. That's what people do. So in the shared Docker collaborative space, there's a Sage workbook or something to that effect plus the other files, and I can just git add those things, git init, and then git push? Yeah, you just have a home directory. It's a file system. The default free quota is three gigabytes per project, and you can create as many projects as you want. And within a project you can git clone, git pull, git push, git add, and just use all the Git commands, which usually is pretty easy, because anything you could possibly imagine wanting to do with Git, you just Google it and you find it pretty quickly. There's like 10 examples. Yeah, for sure, for sure. But there's no graphical integration? Yeah, it's all, you use the terminal. Sure. What about file exchange? Is there like a draggy-droppy UI in the tool, or do I have to use SCP or anything like that? There's a pretty nice, I guess, file explorer application inside of CoCalc. And it has drag and drop functionality, so you can just drag a file or directory from your desktop onto the file listing, and it just gets uploaded automatically. And next to any file, there's a download button, so you can pull the files back. You can also set up an SSH key, just like in GitHub, either for an individual project or across all projects you're using. And once you upload your public SSH key, you can SSH to any of your projects. And then you could use rsync or some graphical tool on your desktop to sync files back and forth. Yeah, it's really just like a remote Linux server, basically. Yeah. Cool.
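The Git workflow described here is just ordinary commands run in a project terminal. As a rough sketch only, here is that sequence written out in Python. The file names, commit message, remote name, and branch name are all hypothetical placeholders, and nothing here is specific to CoCalc.

```python
import shlex


def git_snapshot_commands(paths, message, remote="origin", branch="main"):
    """Return the Git commands (as argv lists) that commit `paths` and push them.

    The remote and branch names are assumptions; use whatever your repository has.
    """
    return [
        ["git", "init"],                    # safe to run again in an existing repository
        ["git", "add", "--", *paths],       # stage the notebook and any other files
        ["git", "commit", "-m", message],   # record a deliberate point in time
        ["git", "push", remote, branch],    # publish to the external host
    ]


commands = git_snapshot_commands(
    ["analysis.ipynb", "notes.md"], "Save notebook and notes"
)
for argv in commands:
    # Print each command the way you would type it in the terminal.
    print(shlex.join(argv))
```

To actually run the commands rather than print them, each list could be passed to `subprocess.run(argv, check=True)` from inside the project directory, assuming Git is installed and a remote has been configured.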
So you say Docker. A lot of people hear Docker, they think transient.
31:54 Is there like a persistent mapped volume or something like that, where it's kind of permanent? We have a really cool thing. What we do is, it feels like you're running a Docker container where the image is about 300 gigabytes, because we install like 9,500 Python packages, like 10 different versions of Sage, a whole bunch of versions of Anaconda. We just install everything and the kitchen sink. But what we do is we store all of that persistently on a disk somewhere, and then NFS mount it into a Docker container when the Docker container spins up. And we also plug a home directory into that Docker container. And so you have a fairly lightweight little Docker container that, via remote file systems, has a huge amount of useful information inside of it. Yeah, that's cool. It's important, because we want to be able to start the thing fairly quickly. Yeah, every little bit of computational resource, you want to keep that, right? Yeah. Speaking of which, where are you running this? When I go to CoCalc and I calculate, where do I calculate? It's on Google Cloud Platform, in Google Compute Engine, and physically it's on the East Coast currently. That seems like one of the best choices, right? That's where my servers are, in New York City, actually, right now. You've got to pick somewhere. It's good for the US, it's good for Europe, it's less good for other folks. But, you know, you've got to pinpoint somewhere on the globe. In terms of users, at least from what I've experienced, that's a good place. We have a lot of users in Europe, and so it makes sense timewise. We do plan to expand to other data centers when we have more users. And also one of our main developers, Harald Schilly, lives in Austria, and so he wants it to be at least not ridiculously far from him. Yeah, no, that makes perfect sense. I agree. I think it's a good spot.
You talked about running graphical software. I guess let me go back and touch on one other thing. You talked about all these packages. So it sounds like if I can SSH in, or I can go to the Bash shell through cocalc.com's live interactive bits, I can pip install whatever I want when I get in there? Yes, you can do pip install --user, and it just installs it into ~/.local. And yes, you can install any packages. You might find that almost anything you want to install has already been installed, because our Python environments have basically everything anybody's ever requested since 2014. Is it standard Python? Is it Anaconda? What do I get? Yes, in the sense that there's a Python 2 system-wide, a Python 3 system-wide, there's several versions of Anaconda, and each version of Sage has its own Python install inside of it. So there are many, many different Pythons available. If you look at the list of Jupyter kernels that we have there, it takes up the whole page. Interesting. So speaking of Jupyter and Jupyter kernels, it sounds like Sage math is slightly different than Jupyter itself, Jupyter notebooks, but also that there's some compatibility between them. Like, could I take a Jupyter notebook, .ipynb, whatever the extension is, and drop it up there and work on it in CoCalc? You can use Jupyter notebooks. We have a reimplementation of the Jupyter stack, but
35:00 It uses standard Jupyter kernels. And it's designed to be as compatible as we can possibly make it with the official upstream Jupyter project. And that's a big difference, I think, from some of the other Jupyter-ish implementations, like Colab and Kaggle. We have all the same menu options as Jupyter classic. Basically my to-do list when implementing the new version of Jupyter for CoCalc was to implement every single feature that Jupyter classic has. Maybe we're at 99%, but our goal is to be 100% compatible. That's cool. So it sounds like if you're a student, or you're talking to students who know Jupyter or are looking at Jupyter documentation, it'll pretty much just work. Yeah. Not only that, in CoCalc we have our built-in Jupyter client, but we also have a button that you can click which runs a Jupyter classic server from the project. And then you're looking at exactly, it literally is, yeah, it is Jupyter classic. And we have another button for JupyterLab. So you can fully just use JupyterLab or Jupyter classic to interact with your project instead of CoCalc's view of the project. You probably lose the collaboration at that point? Yeah, because those don't have collaboration. You lose the collaboration, but at least you can get 100% compatibility with those programs. Right, right. That's a good escape hatch. One thing that I thought was pretty interesting, that I didn't necessarily expect, was I saw X11 in the documentation. Yes. That's like a remote windowing system for full-on Linux applications that are window-based. So you can do that in here as well? Yes, this is kind of weird. But there's a couple of HTML5-based X11 servers out there, like Apache Guacamole and Xpra. Every once in a while, people requested that we add something like that to CoCalc.
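On the notebook compatibility point from earlier in this segment: part of why a notebook file can move between Jupyter classic, JupyterLab, and a compatible reimplementation is that the file itself is plain JSON naming a kernel. The following is a minimal sketch, assuming the version 4 notebook format and a kernel named `python3`. The function name and cell contents are made up for illustration.

```python
import json


def minimal_notebook(source, kernel_name="python3", display_name="Python 3"):
    """Build a minimal version-4 notebook document with a single code cell.

    The kernel name is an assumption; a real environment may list many kernels.
    """
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,   # not run yet
                "metadata": {},
                "outputs": [],             # filled in by whichever client runs the cell
                "source": source,
            }
        ],
        "metadata": {
            "kernelspec": {"name": kernel_name, "display_name": display_name}
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = minimal_notebook("print('hello from a standard kernel')")
text = json.dumps(notebook, indent=1)
# Saving `text` under a name ending in .ipynb yields a file a notebook client can open.
```

Nothing in the file ties it to one particular front end, which is the property the compatibility goal relies on.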
And it was always like, you know, they can do 99% of what they need to teach this class, but there's this 1% where they want to use an old Java application with a graphical interface for a few days in the class to look at some data, or something like that. And so I decided to finally try to figure out how to do it, and ended up writing an Xpra client. So Xpra is an X11 server that runs under Linux, and it has an HTML5 client. And I took their code, and I basically rewrote it. In a way that would paint the screen on an HTML canvas or something like that? It sends you a bunch of data, and then you use an HTML5 canvas to display that data, and you handle clicks, and so on. So I just rewrote their client to fit into CoCalc, collaboratively, so you can have multiple people looking at the same X11 application, and when somebody moves the cursor, you'll see the cursor move around. Wow, that's actually really cool. The gotcha is just that, you know, it's over the network, so it's slow. There can be latency, which is really annoying for graphical applications. Yeah, but it exists. It's better than nothing. If you want to play around with Tcl/Tk to do some old school Python graphical user interface development for, like, an assignment in a class, you can totally do that in CoCalc. You can fire up IDLE, which is the official Python graphical interface that nobody uses. But it's the official one. Yeah, and you can type in some commands to pop up graphics, and they'll work. Yeah, that's pretty neat. Or you can run GIMP or Declan or Inkscape or something. Yeah. So I don't think it gets used a lot. But again, it was really meant for the very edge case that would be something that stops somebody from teaching a course or otherwise using CoCalc. Right. Or they have this one GUI interface that they did for this research project.
And they want to move the research project there. Yeah. And if they're going to do that, they need it to work or something. Right. Yeah. Very interesting. All right. Well, what else is notable? Something that jumps out at me is you kind of have MATLAB support, in a sense, right, through Octave. Tell people about that. The Octave Jupyter kernel, we make sure that graphics work nicely. And yeah, so you can use Octave. Octave is very similar to MATLAB, at least for the basic language, in Jupyter. And in the graphical interface we just talked about, there's also an Octave client, which shows Octave graphics and kind of looks like MATLAB. I don't know much about Octave. I used to do some MATLAB stuff, but I never did anything with Octave. How close are those two things? Very close. But there's deep functionality for certain special domains that isn't implemented in Octave but is available in MATLAB. It's like, there's always toolboxes and so on, right? Yeah, I think it's nearly compatible. Cool. So that's a pretty good option. And again, online as part of this thing. Yeah. And another thing to note about CoCalc: you can run CoCalc yourself. There's a Docker image that we regularly update. And so you just type a command in Docker and you're running your own CoCalc server, on your laptop or on a remote server. So it's like a mini version of CoCalc with all the same functionality that you run locally or anywhere you want. Yeah, very cool. Okay. Nice. LaTeX. We already talked about LaTeX in chat. And for those who don't know, LaTeX is a markup language, kind of like HTML, but for visually accurate mathematical representation.
40:00 So if you want to do a sum from n equals one to 1000, it has the little n equals one at the bottom and the 1000 above that. It's the representation that you would write in, you know, a proper math class, right? So it's kind of a markup for that. And you have some pretty good support for that there as well. Yeah, so it's what everybody who writes papers in math and physics, and probably some other areas in academia, will use, for sure, because it looks very, very professional. It has excellent support for cross-referencing between different sections, and splitting your document up into different files, and so on. So we fully support that. Also, there's a package called PythonTeX. You \usepackage{pythontex}, and then you go \py, and then right in your LaTeX document you can type a bit of Python code, and it gets evaluated whenever the document gets updated. So that's pretty cool. And there's something called SageTeX, which lets you embed plots and arbitrary Sage code in your document, which automatically get updated when the document gets compiled. And we make that very easy to use inside of CoCalc, so you don't have to install anything or mess with anything. It just works. Yeah. And one of the things I thought was pretty cool was right on the home screen. You've got a representation of some nice, it's not that fancy, but it is, you know, proper LaTeX representation of fractions and square roots and stuff. So you say show LaTeX. You've got a solver that solves an equation, and then you say, just show me the LaTeX of the solution, and boom, there it is. That's pretty cool. You just say latex of the solution, and it shows you the code that you would put into a LaTeX document or a Markdown file to get that beautiful formula. Cool. So it shows you the LaTeX markup, the dollar, whatever. Yeah, exactly. Yeah. Okay.
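For readers who have not seen the markup, here is roughly what the sum example and the inline Python evaluation look like in a LaTeX source file. This is a small sketch, not taken from the episode: the summand n is an assumption (only the limits were mentioned), and it assumes the PythonTeX package is installed, which usually also means running its own processing step between LaTeX runs.

```latex
\documentclass{article}
\usepackage{pythontex}
\begin{document}

% The limits n = 1 and 1000 are typeset below and above the summation sign.
\[
  \sum_{n=1}^{1000} n
\]

% \py{...} evaluates a Python expression and inserts its value into the text.
The value of this sum is \py{sum(range(1, 1001))}.

\end{document}
```

SageTeX follows the same idea with Sage code in place of plain Python.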
That's basically a feature of Sage, and other systems like SymPy have something similar, where you build up some complicated answer, and then you just say, give me the LaTeX, so that I don't have to try to convert this ASCII-looking thing into something nice, because you'll make a typo. Yeah, you definitely would. So another thing that stood out to me was that you have database support for things like Postgres. Well, you know, again, it's just Ubuntu Linux. So we have a lot of things preinstalled that people requested, and it's just sitting there, and you can use it from Python or other languages. Yeah, sure. So maybe you analyze some data, throw it in there, put some indexes on it, and get to ask a bunch of questions, quick and fast. But the nice thing is, it's all in one place, all in the same site, all in the same container. You can work with some Python code, you can use it in a Jupyter notebook, you can play around with the data from a database, and then write a paper in LaTeX, all in the same place. Yeah, with collaborators. Yeah, that's really cool. And it's all saved there. Very nice. So there's a bunch of other features that we're not talking about, but these all seem really, really cool, and the collaborative bits are really neat. Like, there's R support and other stuff. We're getting shorter on time here, I guess, but maybe we could talk a little bit about some of the internals. You already mentioned that it's basically your own reimplementation of the Jupyter stack, because the way Jupyter worked, it didn't really support real-time operation, right? Yeah, basically, Jupyter has a front-end client that receives a bunch of messages from the kernel. It's kind of a proxy from the backend to the front-end client, and the front-end client decides what to do with all that information. So a huge amount of processing goes on on the front end.
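To make "deciding what to do with the messages" concrete, here is a minimal sketch of folding a stream of kernel messages into the state of one notebook cell. The message dictionaries are heavily simplified stand-ins for the Jupyter messaging protocol (only a message type and content), the cell layout and the function name are assumptions for illustration, and this is not the actual Jupyter or CoCalc code.

```python
def apply_kernel_message(cell, msg):
    """Fold one simplified kernel message into a cell's state and return the cell."""
    msg_type = msg["msg_type"]
    content = msg["content"]
    outputs = cell["outputs"]

    if msg_type == "status":
        # The kernel reports whether it is busy or idle.
        cell["state"] = content["execution_state"]
    elif msg_type == "stream":
        # Consecutive writes to the same stream are merged into one output.
        last = outputs[-1] if outputs else None
        if last and last["output_type"] == "stream" and last["name"] == content["name"]:
            last["text"] += content["text"]
        else:
            outputs.append(
                {"output_type": "stream", "name": content["name"], "text": content["text"]}
            )
    elif msg_type == "execute_result":
        # The value of the last expression, plus the execution counter.
        cell["execution_count"] = content["execution_count"]
        outputs.append({"output_type": "execute_result", "data": content["data"]})
    elif msg_type == "error":
        outputs.append(
            {"output_type": "error", "ename": content["ename"], "evalue": content["evalue"]}
        )
    return cell


messages = [
    {"msg_type": "status", "content": {"execution_state": "busy"}},
    {"msg_type": "stream", "content": {"name": "stdout", "text": "hello "}},
    {"msg_type": "stream", "content": {"name": "stdout", "text": "world\n"}},
    {"msg_type": "execute_result",
     "content": {"execution_count": 1, "data": {"text/plain": "42"}}},
    {"msg_type": "status", "content": {"execution_state": "idle"}},
]

cell = {"outputs": [], "execution_count": None, "state": "idle"}
for message in messages:
    apply_kernel_message(cell, message)
```

Whichever side of the connection runs a function like this ends up owning the document state, which is why the choice of where to do this processing shapes the rest of the architecture.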
And with CoCalc, that kind of model really doesn't make sense for us. So instead, what happens is, back on our servers, we process all the messages from the Jupyter kernel, and then figure out how that should make the document evolve. And we just synchronize the document between the front end and the back end. And it's a very different architecture than Jupyter currently has. And so it made real-time sync difficult. I did implement real-time sync directly on top of Jupyter classic for a couple of years, and it mostly worked. But every once in a while things would go wrong, and I'd get complaints from users. And I really wanted that last 1% to work. I kept trying to figure out how to do it, and I couldn't find any way to do it besides just rewriting the whole thing, which took months and months of effort. It's one of those things that doesn't sound like a tough decision: hey, we're gonna have to do this, here we go. You have to underestimate the difficulty in order to start. Otherwise, you wouldn't do it. Exactly. I bet we could pull this off. Two months later, you're like, boy, yeah, this was more than I bargained for. Exactly. Now you're committed, right? Yep. Yep. Yeah. And we continue to be committed. So we have to reimplement all the extensions as well. So we prioritize that based on what is most in demand. So like, people want the Table of Contents extension that Jupyter has; we have to reimplement that. So we don't get anything for free. Do you have telemetry and stuff like that, that tells you what's going on, or is it sort of support requests coming in? We have a lot of telemetry because of this time travel, recording everything that's happening and so on, but we basically ignore it for the most part and make our day-to-day
45:00 development decisions and direction based almost entirely on support requests from users. We make it very easy for users to ask us questions: just click the Help button at the top of the screen, type something, and then we get the support request. And so that really drives our development. It's just what people are wanting and willing to share. We also have a Discord chat room, where people often tell us what they want. Nice. So there's probably a lot of Python going on here, but there's also probably a lot of JavaScript, this being basically a very fancy single-page application. Exactly right. Yeah, it's lots and lots of TypeScript code. Both the front end and the back end are mostly JavaScript. But we use Python mostly for managing the complicated back-end stuff, like getting projects to run and moving data around and managing that. And there's just a lot of tasks that periodically happen on a Kubernetes cluster, and monitoring tasks. But CoCalc is mostly a TypeScript application, because a lot of it runs in the web browser, and that's just the canonical language these days. It sadly is. I wish that the story could be a little bit different. I mean, on one hand, I lament it for Python. On the other, you know, in what other environment is there only one language? Yeah, it's pretty amazing how JavaScript has been dominant. Other things haven't really picked up in the browser. That's really surprising. Things like Dart just haven't caught on yet. And a lot of the foundations we might have with WebAssembly, and so on. I guess I'll take us on a slight diversion. I really wish that there was a way to have more runtimes running as WebAssembly in the browsers, right? I mean, on one hand, you say, well, sure, you compile it to WebAssembly, and then you can just link it and download it. But no app is gonna say it's acceptable to have a 10 megabyte download to get my page to show, right?
Yeah, maybe if it's something like CoCalc, where you're there for a while, but not in the general sense. But, you know, if you had a set of canonical runtimes, like here's the Java runtime, here's the .NET one, here's CPython, and those just got shipped as part of Firefox, Chrome, Edge, so that you didn't have to download them, all of a sudden, you know, we would be free from JavaScript. That's a world I see in the future. It doesn't sound like Python on the front end has a lot of value to you guys at this moment. We don't get any requests for that from our users ever. I'm not saying that there isn't a huge demand for it. It's just that I think maybe other sites like Repl.it and other things already solve that problem. So people don't come to us, because they don't need us to solve that problem. You already have a way for them to execute Python. Yeah. Right. It happens back there on the server in a totally fine way. It would really be just for you guys, so that you could say, we'd rather write Python. If that were the case, I don't even know if that necessarily makes sense, right? Because you probably have nice front-end frameworks doing the binding. It's TypeScript and React.js, and TypeScript is pretty good and provides a lot of value. And it kind of fits into the browser ecosystem. So there's that. I mean, Python somehow developed as a language that's super good for data science and scripting and a lot of types of applications. But I think TypeScript and JavaScript have developed to be really good for asynchronous applications, where you're often responding to a flow of events, and where everything is done as non-blocking as possible. Most of our code runs very quickly, and it's responding to things. So they kind of tend to be solving different problems well, or differently, right?
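For listeners less familiar with the "responding to a flow of events, as non-blocking as possible" style mentioned here, the same shape can be sketched in Python with asyncio. This is only an illustration of the programming style. The event names are invented, and it says nothing about how CoCalc itself is written.

```python
import asyncio


async def handle_events(queue, handled):
    """Consume events one at a time until a None sentinel arrives."""
    while True:
        event = await queue.get()   # suspends this task without blocking the loop
        if event is None:
            break
        handled.append(f"handled {event}")


async def main():
    queue = asyncio.Queue()
    handled = []
    consumer = asyncio.create_task(handle_events(queue, handled))

    # Hypothetical events, such as a browser client might produce.
    for event in ["keypress", "cursor-move", "save"]:
        await queue.put(event)
        await asyncio.sleep(0)      # yield so the consumer can run between events

    await queue.put(None)           # tell the consumer to stop
    await consumer
    return handled


results = asyncio.run(main())
print(results)
```

Each handler does a small amount of work and returns control to the event loop, which is the pattern described for code that mostly reacts to incoming events.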
Well, that's what you get when you write a language that has no mechanism for synchronous. Yeah. Right, it has to be callbacks or async and await, really, depending on how modern it is. Interesting. All right. Well, it sounds like a pretty cool project. I love the collaborative side of this thing. When I heard CoCalc and thought about it, I envisioned the notebook thing to be very collaborative, but I didn't expect, you know, collaborative GUIs, collaborative Linux terminals, collaborative all these things. I think this is a higher level of collaboration than I realized, which is a real testament to what you guys built. Cool. Yeah. All right, William, before I let you out of here, though, I've got to ask you the two questions, and I'm looking forward to the answer to this one. First of all, if you're going to write some code, what editor do you use? So CoCalc itself: when I started writing it, I also bought one of the original Chromebooks from Google. I can't remember what it's called, but it was one of the nice ones. Yeah, but it was really running Chrome OS, which was very limited back in 2013, 2014. And I decided to write CoCalc entirely from within CoCalc. You know, I bootstrapped it a little bit to get going, but after that, all the development since then has been from within CoCalc. So really, the editor under the hood is CodeMirror. But I've added a ton of extensions to CodeMirror and functionality. Just for example, if you're editing a page of code, you can split it horizontally or vertically as many times as you want and look at lots of different points in the document. There's code for syntax highlighting. The time travel feature, I just kind of consider that a basic
50:00 need for editing code. Now I find it impossible to imagine editing code without that. So the one thing that is really lacking from CoCalc, relative to full IDEs, is VS Code style Language Server Protocol support, where you get extensive information about typing and other sorts of static analysis of your code. And there's no reason that I can't add it. It's just hard, and I haven't done it yet. And I'm working on that for sure. But before CoCalc, I used Emacs for, you know, 20 years. Well, you always think about, you know, when is a language a proper language? When it can build itself, can compile its own runtime, when it's creating itself. And this is an interesting thing for the editor as well, right? When can you create this tool with the tool itself? And I think that's pretty awesome. I think it's got to help make it better, right? Because it's one thing to say, well, students get to type in here, and they're a little annoyed with the way it works, but whatever, right? Versus, I've got to live in this thing day to day. No, we're fixing that problem all of a sudden, right? Is there some of that? Oh, yeah, yeah. We've had a million problems over the years, and they hit me all day long, every day. So I definitely prioritize them. I'm not taking this paper cut anymore. Exactly. Tomorrow, I'm fixing this, or whatever. Alright, then. Notable PyPI package? So there is a package for Sage math. You can do pip install sagemath, but it's a kind of silly package, because all it does is check that Sage is installed somewhere on your computer. It doesn't actually install anything except this sort of tiny little five lines of code. In the future, though, I hope someday, maybe in five years, when you do pip install sagemath, you will actually install Sage into whatever Python environment you're using. So that's a major challenge, because Sage is like a million lines of code.
That's new code written for Sage, plus a bunch of dependencies, which are themselves, you know, hundreds of thousands of lines of C code and assembly and other things. But, you know, if I had a million dollars to spend on Sage development, that would be what I would want to do. I want to break Sage up into smaller pieces: take the really cool Python functionality we've written for Sage, break it out of Sage, and we'd get lots of Python packages. So my favorite PyPI packages are the many nonexistent packages that exist only in my head, which together someday will mean pip install sagemath installs Sage math. There you go. That's like calling your shots, right? Yeah, we're gonna do this. Here it comes. Awesome. It's for 2025. Exactly. Put it on the calendar, right? Final call to action: people want to get started with CoCalc, maybe try it in their classroom or try it on one of their research projects. What do they do? Just type cocalc.com into your browser, and you'll get to the website, and there's lots of information right on the website about how to use it. You can also type share.cocalc.com, and you can browse through tens of thousands of documents people have publicly shared from CoCalc. So there are Jupyter notebooks, Markdown files, all kinds of different things. And when you're looking at any one of those documents, there's a big green button at the top, which is called Open and Run or Run Now. And when you click it, it will start up CoCalc and interactively run that thing. And you don't have to make an account. You don't have to do anything at all. Just sit back for a few seconds, and then you'll be using CoCalc. And it's completely anonymous, and free to do that. Yeah, very, very cool. Well, great work on this project. It's so nice to check in with you. You know, we last spoke in 2016, and I think at the time you were still in your professor role, but working on Sage math. So what a long journey you've been on, and well done.
Thank you very much for having me on the podcast. Yeah, you bet. Bye. Bye. This has been another episode of Talk Python To Me. Our guest on this episode was William Stein, and it's been brought to you by Linode and us over at Talk Python Training. Start your next Python project on Linode's state-of-the-art cloud service. Just visit talkpython.fm/linode, L-I-N-O-D-E. You'll automatically get a $20 credit when you create a new account. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Get out there and write some Python code.