#247: Solo maintainer of open-source in academia Transcript
00:00 Michael Kennedy: Do you run an open source project? Does it seem like you never have enough time to support it? Have you considered starting one but are unsure if you can commit to it? The challenge is real. On this episode we welcome back Philip Guo, who has been the solo maintainer of the very popular PythonTutor.com project for over 10 years. He has some nontraditional advice to help keep your sanity and keep your project going while holding down a busy full time job. This is Talk Python to Me, Episode 247, recorded December 10th, 2019. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Tidelift and Clubhouse. Please check out what they're offering during their segment. It really helps support the show. Philip, welcome back to Talk Python to Me.
01:11 Philip Guo: Alright, I'm super excited to be here. I think it's my third time here, I believe.
01:14 Michael Kennedy: I do believe this is your third time here. The first time you came we talked about the CPython source code, and we spent a lot of time talking about ceval.c. You had been doing a graduate student course walking them through basically the source code of Python to talk about interpreters, right?
01:32 Philip Guo: Yeah, yeah, that was back when I was at the University of Rochester. That was back in the Python 2.7 days, and I've heard recently on your shows there've been people who've done updated versions for Python 3. Updated versions of this interpreter walk.
01:43 Michael Kennedy: Yeah, yeah, exactly. We had Anthony Shaw on not long ago. He wrote almost a book on it. So, yeah, we had a good time talking about that. And the other one was really well received as well. It was something like geeking out in your golden years or something like that. Like coming into programming basically near retirement. You've done some research on that, right?
02:01 Philip Guo: Yeah, so that was right when I came to UC San Diego, which is where I'm working now. And that was a research study actually done on my Python Tutor platform, which we'll talk about a lot today. It was a survey I deployed to a bunch of programmers. Explicitly, we wanted to find people over 60 years old who are learning programming in all sorts of settings. And we found all these really interesting things about them. So check out that episode.
02:29 Michael Kennedy: Yeah, it was really surprising and a lot of folks really enjoyed hearing it because I think they were in that situation and I think they felt kind of alone or they felt like they were doing something that was weird. It was not going to work. And it turns out there's a bunch of people who really appreciated getting into programming. One of the ones that touched me was the idea of I want to get into programming so I can help my grandchild either do robotics or automate Minecraft. Something like that was a really interesting reason for it.
02:57 Philip Guo: Yeah, and it was such a cool intergenerational thing too. So that was awesome.
03:01 Michael Kennedy: Yeah yeah, for sure. So we're not going to talk about either of those things really today, we're going to talk about, as you mentioned, your project called Python Tutor at pythontutor.com, right? Do I have the domain correct?
03:13 Philip Guo: Yep pythontutor.com.
03:14 Michael Kennedy: So you like to mix it up and keep things a little bit different because I can go tutor myself on C++ at pythontutor.com right?
03:23 Philip Guo: Yeah, the name is quite outdated, right? It started as a Python-only tool and then it got expanded to a bunch of languages, of which C, C++, Java, and JavaScript are the most widely used. I really need to think of a better name, but for now it's just Python Tutor.
03:39 Michael Kennedy: Python Tutor is fine now, keep the roots. It's kind of as if IPython Notebooks didn't get renamed to Jupyter, right?
03:45 Philip Guo: That's right, yes, that's right.
03:47 Michael Kennedy: It sounds something to that effect. So, yeah super cool. Before we dive into the topics though, let's focus just for a moment on kind of what you do day to day. You've already told your story, how you got into programming in Python, but you said you're at the University of San Diego where I also was a grad student for a little while, so yeah, it's a beautiful place to be. And what do you do there?
04:07 Philip Guo: Especially in the winter. So I'm at UCSD, or University of California, San Diego, and I'm an assistant professor in the cognitive science department. It's a very interdisciplinary program where we have people from all sorts of backgrounds who are interested in studying the mind, studying how people interact with technology, building new technologies and such. It's a very kind of vibrant interdisciplinary place. And my research and teaching interests are in a field called HCI or Human Computer Interaction. So that's more widely known in industry as UX or User Experience. So I teach a bunch of courses on web development, user experience design, basically how to develop products that are very user-focused. And my research is on a topic that I think many of your listeners would be interested in: how do you build new kinds of interactive technologies that teach people programming and also, increasingly now, data science? So both of those are obviously super relevant to the Talk Python audience.
05:06 Michael Kennedy: Yeah, absolutely. It sounds like super interesting research. And for a long time I worked at a scientific company that was spun out of a cognitive science lab. And there's just a ton of interesting technology stuff going on there. We were using eye tracking, like E-Y-E not the letter I, tracking to understand how people interacted with software and other things. And yeah, it's a fun area to work, isn't it?
05:29 Philip Guo: Yeah, it's really cool. I think we have, in our cognitive science department, we have professors from all sorts of different backgrounds from like neuroscience, to psychology, to linguistics to computer science, artificial intelligence, and you know, emerging kind of interdisciplinary fields. And it's this nexus of a lot of, like you mentioned, of kind of people and minds and technology all together in one place. So it's a really unique field to be in.
05:53 Michael Kennedy: Yeah, it's got a lot of kind of interdisciplinary cross-pollination stuff. More so than, I don't know, I don't want to put any discipline on the spot, but more so than maybe a lot of them, right?
06:03 Philip Guo: Yeah, yeah I think so. You're very diplomatic. I mean, I originally was from a computer science department. So my background, my degrees are in computer science. I was at the University of Rochester in a computer science department. And that's much more traditional single field. Even though my work is clearly very interdisciplinary in education and technology and UX and everything. I feel very at home in a very interdisciplinary place.
06:25 Michael Kennedy: Nice, I can imagine. I doubt that a CPython code walkthrough would really be a big hit in a cognitive science type of thing. But certainly in the computer science world. Alright, so let's talk about Python Tutor. Give us a quick picture, like what is this thing? I've actually been using it recently and I'll share my experience with it. But let's start by just the elevator pitch. I'm driving in a car, I should not be pulling up websites while I'm doing so. Tell us what it's like.
06:53 Philip Guo: Alright, so if you're in your car, if you're in a Tesla and you have your big touchscreen and then while you're driving you want to learn some code. So Python Tutor is a website. It's basically like a very simple online IDE. You just paste in code that you find online or you just type in code. Very simple text box. And when you hit run, it runs your code and shows the output just like many online coding environments do. But what's really unique is that it steps you through, step by step, what's going on. And at every step it draws diagrams of what's going on in memory. So what the stack frames are, what your global variables are, where your local variables are, where the pointers are, what the values are, and such. And it basically tries to emulate what a teacher would draw on the board, right? So if you had a teacher explain what does this little bit of code do? They'll start drawing on the board like here's some data, here's some variables, here's what they point to. And the Python Tutor tool just tries to automatically render that so that you could either teach yourself or you could actually use that to show someone as an instructional tool.
07:56 Michael Kennedy: Yeah, it's really interesting and it does, I feel like it addresses the situation where someone's pretty new to programming, they're starting to think about, what is a variable? What is a list? What is a data structure? What is a reference type versus a value type? What is pass by reference? What is pass by value? And all that kind of stuff that you need to start to at least just get a bit of an intuition for if you're not necessarily doing a computer science degree. But you still need to have a sense of this thing is actually shared by multiple variables and if I interact with it from either, it affects all of them, and stuff like that. And so you can do really simple things like create a list and then assign it to multiple variables and just very clearly show how that works.
08:36 Philip Guo: Yeah. Of the examples that we show, the list aliasing is a great one. Just always great to say code on the air, right? So to say like x = [1, 2, 3] and then y = x. So it's not clear exactly what y = x does, right? In some languages it might actually call a copy constructor and make a copy of that [1, 2, 3]. But in Python it happens that y = x copies the reference, right? So after you do that, then x and y both point to the same list of [1, 2, 3] in memory. Then if we asked you, what does y.append(4) do? And if I do y.append(4), what does x print out as? If you see a diagram, it's very clear that x will print out as [1, 2, 3, 4] because there's only one list. But if you show people a bunch of code, x = [1, 2, 3], y = x, y.append(4). What does print(x) do? It's not at all obvious. They've done these research studies, which are fascinating basically. It's quite low tech, right? So you actually just give students a bunch of code, like introductory programming students, and you ask them to either draw out what they think is happening or just say what's happening. And people have all sorts of misconceptions. They have all sorts of different mental models, right? But the nice thing about showing the diagram is like there's one right mental model there. And once people get it, it's clear as day, right? Like, oh, clearly x and y point to the same thing. And that's going to happen. So the diagrams really go a long way into helping people learn these fundamentals.
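The aliasing example Philip reads out on air can be written down as a few lines of Python. This is only a sketch of the code he describes; the explicit copy into z is an added illustration (not part of the conversation) to contrast with the shared reference.

```python
x = [1, 2, 3]
y = x              # copies the reference, not the list
y.append(4)

print(x)           # [1, 2, 3, 4]: the change made through y is visible through x
print(x is y)      # True: one list object, two names

z = list(x)        # an explicit copy creates a second, independent list
z.append(5)
print(x)           # [1, 2, 3, 4]
print(z)           # [1, 2, 3, 4, 5]
```

Pasting code like this into a step-by-step visualizer is the kind of use discussed in this episode: the diagram shows a single list with two arrows pointing at it.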
09:59 Michael Kennedy: Yeah, I think they really do. It's kind of the thing where it's hard to unsee it. Once you've seen it, you can't unsee. Like, oh well, obviously this is what's happening in memory, here's the list. And then all the things in the list are other things that are pointed to by the list and so on. But when you're new, you don't really know. I think it's even more challenging when you're doing something like a class or an object. Some complex thing that is you could easily imagine like if I have a dictionary, a bunch of dictionaries in a list, those dictionaries are in the list, right? Allocated as part of the list. But obviously as you put it together you have the cool diagrams there I think that helps a lot. I also think it actually helps understanding memory management a little bit if you go and explore it. Because Python's core memory management story is reference counting.
10:46 Philip Guo: Yeah, and it's interesting because the tool's designed to keep the visualization simple for beginners. But you can imagine augmenting it. If you want to make a more advanced version, maybe you put the ref counts next to everything and you actually see. If there are three pointers pointing in, the ref count is 3, and then maybe there's like a weak pointer or some other thing pointing. And this is not just for Python. In C++ or something you can see, oh, there's 3 real pointers and one weak pointer. If every real pointer goes away, the weak pointer's still there, but it can still be garbage collected or deallocated and stuff. I think there's a lot of stuff there. And the thing you mentioned before, just a brief aside on this, the can't unsee it. So in the CS education literature, that's sometimes called a threshold concept. So a threshold is, it's like you've crossed the threshold, right? I'm not super familiar with all of them that they've identified, but there are certain things that are called threshold concepts. So once you get it, like getting the concept of aliasing or something, you can't unsee it, right? You always will get it. But it's so hard for students to get to that point unless someone really shows you well.
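The reference counts and weak pointers mentioned here can be poked at directly in CPython. The following is a minimal sketch, assuming CPython's reference-counting behavior; the Node class and variable names are made up for illustration, and the absolute numbers returned by sys.getrefcount vary by interpreter version, so only the difference between two readings is shown.

```python
import sys
import weakref


class Node:
    """Tiny placeholder class; plain lists cannot be weakly referenced."""


a = Node()
b = a
c = a              # three names now refer to the same object

count_with_three_names = sys.getrefcount(a)
del c              # drop one strong reference
count_with_two_names = sys.getrefcount(a)
print(count_with_three_names - count_with_two_names)   # 1

weak = weakref.ref(a)   # a weak reference does not keep the object alive
print(weak() is a)      # True while strong references still exist

del a, b                # the last strong references go away
print(weak())           # None on CPython, which frees the object right away
```

Other Python implementations may reclaim the object later, so the final line is a CPython-specific observation rather than a language guarantee.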
11:44 Michael Kennedy: Oh, interesting. I didn't realize how formalized it was. Because of course it makes a lot of sense, right? Because once you identify those, if you can get people over the gap, well, they're ready to just proceed, right? Just start understanding what data structures do or understanding how reference counting works. But before that it's kind of this weird fuzzy world. Because they don't really understand even what refers to what else. And so how do you understand how...
12:08 Philip Guo: Exactly. I mean, I don't personally do this kind of research but I've read up on a lot of it. It's just giving even small bits of code to beginners and just seeing what diagrams they draw. And this is very like cognitive science, right? It's very much like what mental representations are you building in your head? And people draw all sorts of things. They draw the arrows backwards. They draw boxes in other boxes. They draw variables pointing to other variables. And if you have the students explain it to you, it all makes sense in their mind, right? Because the thing with programming is it's all an artificial construct, right? I think for people like us and your listeners who've been doing programming for a while, it seems so natural to us, but it's all artificial, right? It's all just a bunch of rules that are made up.
12:47 Michael Kennedy: Yeah, I mean, our boxes and our lines, those are conceptual ideas, but at least the concept isn't incongruent with the way the computer works, but the computer doesn't actually care about these concepts to the large degree, right? It just has pages of memory and numbers and whatnot.
13:03 Philip Guo: That's right, and I think that's another rabbit hole that we're about to go down. But it's just, what diagram should we draw, right? Like should we draw the bits of memory? Should we draw the quantum states of the atoms, right? And obviously for Python we want something a bit more abstract, right? And the whole point of abstraction in these higher level languages is you don't have to worry about all the bits of memory. You just worry about conceptual data structures and stuff.
13:25 Michael Kennedy: It's interesting. It's all about finding the best conceptual model that is both accurate but not too low level that you get lost in the details but not too high level that you don't understand the important parts, right? I think that programming and computer science, there's a lot of just having the right mental model.
13:40 Philip Guo: Yeah, and from the machines side it's the right abstractions right? It's really like the abstractions for your mind. And I'm sure because you teach a lot of these online courses and you make your own materials and stuff and we can talk about that bit, like I'm sure you think a lot about, in these domain specific things, if you're teaching async programming, what diagram should I draw? I don't know. Is it high level or is it low level? What is enough of an abstraction so that people can actually understand it and do stuff with it, but it's not too low level to get people confused.
14:08 Michael Kennedy: Yeah, I think one of the challenges I see in teaching in general, also in like async for example, is having the right level. Often you want to have something that's easy to understand, but if you give all the detail, it's just too much. If it's too easy, people go, oh, that's fake, that's not real. I need to actually understand what's going on. And so you've got to walk this tight line. And that's also in the applications you present, right? Do you present something that's easily understandable but not real? Or do you rebuild Instagram, and people are like, what is all this caching and what is this database thing? I just want to know a little bit about web development. It's definitely interesting. Actually, for a course I'm working on I have been really diving into pythontutor.com.
14:51 Philip Guo: Oh cool. Alright.
14:53 Michael Kennedy: So I'm working on a course called Python for Absolute Beginners or Python for the Absolute Beginner. So what I'm hoping it will be is a first year computer science course for people who don't think they want computer science. What I mean is like take away all the abstract sort of theoretical stuff and just talk enough about data structures and pointers to understand like the shared list concept and whatnot. And it turns out Python Tutor is really good for creating those pictures. I was thinking about, how do I draw them? Maybe I could hook up my iPad with my Apple Pencil and I could do some stuff, right? I could obviously make some graphics, but it's really nice to just walk people through. Let's throw this into pythontutor.com and see what it does. Let's just step through it. And the other thing I think is interesting is it's not like you just drop code in there and say run this and out pops the resulting in-memory structures and values and so on. But you can step line by line and see how the pointers and the data structures evolve. And you can even step backwards. This portion of Talk Python to Me is brought to you by Tidelift. Tidelift is the first managed open source subscription giving you commercial support and maintenance for the open source dependencies you use to build your applications. And with Tidelift, you not only get more dependable software but you pay the maintainers of the exact packages you're using, which means your software will keep getting better. The Tidelift subscription covers millions of open source projects across Python, JavaScript, Java, PHP, Ruby, .NET and more. And the subscription includes security updates, licensing verification and indemnification, maintenance and code improvements, package selection and version guidance, roadmap input, and tooling and cloud integration.
The bottom line is you get the capabilities you'd expect and require from commercial software, but now for all the key open source software you depend upon. Just visit talkpython.fm/tidelift to get started today.
16:48 Philip Guo: A big part of this tool is that it's step by step. So, let's say your code runs for 100 steps, then it brings you to a UI that has a slider and two buttons that go forward and back, and you can scrub back and forth through all the steps. And this works because all the code has already run on the server. It runs all 100 steps. And this tool is not meant for giant pieces of code, so the code doesn't run for that many steps. If it's just a few lines of code, we can exhaustively run it and collect the in-memory trace at every one of those 100 or 1,000 or whatever steps, and then we bring it back to the front end, and every time you do a step, either forward or back, we just render that in a visual form. So like you said, people can go at their own pace and go back and forth and try to see, oh, what just happened between this line and this line? Or why did this thing do that? Hopefully some people can figure it out on their own, right? If they have some intuitions about it. But even if they can't, then they can use this as a tool to show their friend and say like, "Oh, can you explain why this thing goes down? It doesn't copy it." And at least there's something to talk about rather than just saying, "My code doesn't work."
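The run-everything-first, scrub-afterwards idea can be sketched in a few lines. This is not Python Tutor's actual implementation; it is a minimal illustration that assumes the standard library's sys.settrace hook, and record_trace, SOURCE, and the "<user>" filename are hypothetical names chosen for the example.

```python
import sys

SOURCE = """\
x = [1, 2, 3]
y = x
y.append(4)
"""


def record_trace(source):
    """Run source to completion and save a snapshot of its variables per step."""
    steps = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the user's code.
        if frame.f_code.co_filename != "<user>":
            return None
        # "line" fires just before a line runs; "return" captures the final state.
        if event in ("line", "return"):
            snapshot = {
                name: repr(value)   # repr() freezes the value as it is right now
                for name, value in frame.f_locals.items()
                if not name.startswith("__")
            }
            steps.append((frame.f_lineno, snapshot))
        return tracer

    code = compile(source, "<user>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return steps


steps = record_trace(SOURCE)

position = 0                                    # slider starts at the first step
position = min(position + 1, len(steps) - 1)    # "forward" button
position = max(position - 1, 0)                 # "back" button

for line_number, variables in steps:
    print(line_number, variables)
```

Because the whole trace is collected up front, stepping backwards is just indexing into a list; nothing has to be re-executed. A real service would also need sandboxing and step limits, which this sketch leaves out.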
17:50 Michael Kennedy: Yeah, there's a lot of people just throwing code out onto Stack Overflow or whatever. But you could permalink back into these examples, right? So you can actually put it in there and you could say, you see this step five, this is where your conception has gone off the rails. And my answer applies to that right there. See this picture, right?
18:09 Philip Guo: Yeah, and it's cool because the nice thing about taking advantage of the web as a medium is that the URL concept is so powerful, right? So not only is the code embedded in the URL, the step number is as well. So if you ran a diagram to step 20 out of 50 and you see something funny, you can send someone a link, and when they open that link it'll run the code and step to step 20, and then you can ask them about that. So people have posted on Stack Overflow and on discussion forums for like a MOOC, like an online course and stuff. They're just like, here's the Python Tutor link. Can you tell me what's going on here?
18:43 Michael Kennedy: Yeah, and it's used in some textbooks and it's used in, like you said, some of the MOOCs, Massive Online Courses, whatever the MOOC stands for, I forgot.
18:53 Philip Guo: Yeah, Massive Open Online Courses. So it's used by a few textbooks. So like Brad Miller, who was on your podcast pretty early on, who has this Runestone Interactive and interactive Python textbooks. Also it's used in UC Berkeley's introductory course, which is I think one of the biggest intro programming courses probably in the world, right? It's over 1,500 students a term, almost 2,000. They can't even fit in a lecture hall. They have to give several sessions, and it's because UC Berkeley is a giant school for computer science and all their students have to take intro Python, and they're using Python Tutor all throughout their course materials. And a bunch of other schools use it too that I haven't even kept track of.
19:31 Michael Kennedy: Yeah, that's got to be a pretty rewarding feeling to have that many folks using it and benefiting from it.
19:36 Philip Guo: Yeah, it's really nice. We can talk about the organic growth and everything, but every day we have probably over 10,000 active users on the site. The other thing that is relatively new since the last time we talked is this live help mode. So there's a public help queue. If you're brave enough to press the get help button, you actually put your session on a help queue and anybody on the site, whether they're like a tutor who is just hanging out on the site or just another student who is procrastinating or stuck on their own problem, can click your name and join a shared session. And it's as though you have like a screen-share in the browser and you can see each other's mice and you can walk through and write code together. And then also chat in a little chat box. So we have like a few dozen people a day using this feature and getting help from these absolute strangers around the world. And it's basically like a Stack Overflow like thing, except it's real time, it's private, and it's chat based. So they're not shy about posting their questions. That's been really successful.
20:38 Michael Kennedy: Yeah, that was really interesting. I did see that and when I was messing with trying to visualize the code that I was trying to explain, I did see, so-and-so from Argentina or a user from Argentina is asking for help on Python. So and so from Germany is asking for help on C++ code or whatever. And there's a little button that just says help them.
20:57 Philip Guo: Yeah, you can just jump in and it's all self moderating, right? It's all voluntary. If you don't like it, you can just leave. There's no private information being exchanged. It's very kind of lightweight and it's worked really well so far just because the community is still relatively modestly sized, right? So that the people on the site, they're usually pretty well behaved because they're there because they're trying to learn or genuinely wanting to help each other, right? It really reminds me of the good parts of the internet in a sense where people were actually helpful and friendly to each other.
21:28 Michael Kennedy: Yeah, that is actually really nice. There's not the permanent snarkiness, right? You just get there to help people or whatever.
21:33 Philip Guo: Yeah, and there was no harm done. If you can't help you say, "Sorry, good luck." And someone else might jump in and stuff.
21:39 Michael Kennedy: Yeah, for sure. You talked about some interesting things that it does. Maybe we could just talk a little bit about that before we get into the history and just the maintaining of it. So I have this Python code or this C++ code and I want to put it onto your server and run it. That already seems a little interesting and risky. And the other thing is, you've talked about it being stateless and yet there's all these interesting things like I can bookmark and share this code on step five with this visualization run or I can have this interactive chat with these people and so on. So how does that all work?
22:14 Philip Guo: The blog post that I think you'll link to is about maintaining and scaling the system as just one person, right? And we'll talk about that later on the show. But one of the, I guess, design principles, or inadvertent design principles, is that I didn't want to have much permanent state at all. So, for the most part I call it stateless in the sense that, I guess, the state is all explicit, right? If you visit the site, for every URL you go to, that state is completely in the URL. So if you go to the site, it's blank, it's blank code. You start typing and if you want to save your code, quote unquote, the only way to do that is to create a URL. And your code is actually in the URL.
22:51 Michael Kennedy: Is it like Base64 encoded or something like that?
22:54 Philip Guo: Yeah, something, yeah. I don't think it's even compressed. But you can imagine compressing it. It's probably Base64 or some kind of encoding. It's all in the URL. And the thing is, modern URLs, they can go up to a few megabytes and stuff. I mean, it's not recommended, but you can fit a fair amount of code in there. And again, this tool isn't for a lot of code, right? So it's like a few lines of code, it's in the URL. And then also the state of, do you want to execute this code? Which step do you want to be on? What options do you have toggled? They're all just parameters in the URL. So the nice thing about that is that I don't have a database, right? There's no database anywhere. There's no user accounts. Like you don't register, we don't keep track of the history of your code. There's none of the frills of like an online editor. What you mentioned about the chat is, of course there is a chat server, right? You need a chat server in order to maintain that. And I guess the chat server has in-memory state, but it doesn't keep any on disk, right? So if my server gets rebooted or something crashes, the worst that happens is your chat session dies, and then you wait till the server auto reboots and then you reconnect and stuff. So it's very janky like that.
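Keeping all of the state in the URL can be illustrated with the standard library. This is only a sketch under assumptions: the base URL and the parameter names code, step, and lang are placeholders rather than the site's real URL scheme, and plain percent-encoding is used here in place of whatever encoding the site actually applies.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://example.com/visualize"   # placeholder, not the real site's URL


def make_permalink(code, step, language="python3"):
    """Pack the program, the current step, and the language into one link."""
    query = urlencode({"code": code, "step": step, "lang": language})
    return f"{BASE_URL}?{query}"


def read_permalink(url):
    """Recover the same state from a link, with no database lookup involved."""
    params = parse_qs(urlparse(url).query)
    return {
        "code": params["code"][0],
        "step": int(params["step"][0]),
        "lang": params["lang"][0],
    }


program = "x = [1, 2, 3]\ny = x\ny.append(4)\n"
link = make_permalink(program, step=2)
state = read_permalink(link)
print(link)
print(state["step"], state["lang"])
```

Since the link carries everything, the server has nothing to store per user: opening the link is enough to rerun the code and jump to the saved step.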
24:02 Michael Kennedy: Well, I think that's actually really interesting. It's all about the trade offs, right? What do you want to build? Are you trying to build a community around this thing? Are you trying to build a tool? And one of the things we're going to dive into is, this is something that you've grown quite a bit, even though you have a full time job and you're not getting paid for it, and it's sort of focusing on one thing that you really wanted to build instead of just letting it grow and grow. Because there are so many knock on effects from the stuff that you're talking about, right? So once you have user accounts, well, now you have to have email, because one of the very first features of a website that gets used is, I can't log in, I forgot my email, click here to reset it. Like within hours of launching my site, that thing got used right away, right? It means as users were just signing up, that thing got used. And once you have email, you've got to worry about spam and then you've got to worry about the American CAN-SPAM Act. You've got accounts and now you've got to worry about GDPR policies and all of these. The tentacles of it just grow like crazy, right? And then there's the support stuff that goes on. And it's so easy to ask for these simple things. And we haven't even talked about patching databases and migrations and backups and those kinds of things. And it's fine if that's what you want to build, go do it. I went and built something kind of like that with my platform. But it's not your main job or your main focus, right?
25:22 Philip Guo: Yeah, that's great. As you were talking about all these things, it just made me have all these...
25:26 Michael Kennedy: Feeling of all these, I don't enjoy any of that.
25:30 Philip Guo: Either that or vicariously feel like, because I know folks like yourself and I have other friends who are building their own software businesses, essentially their own SaaS businesses. And of course if you're building a business and you have users, and not to mention having money involved, right? You have payment processing.
25:46 Michael Kennedy: Oh right, we haven't even talked about bank, bank accounts and merchant accounts and all that kind of stuff. That's a whole 'nother level.
25:52 Philip Guo: Yeah, I like your framing of like just from minute one of login, right? Let's say you just like you want to have accounts from the minute people log in, people are going to forget their password and then they need an email reset and then you need to send out emails. So you need to figure out how to not get on everyone's spam filters and like all these. And then if you keep any user data, there's all these laws and terms of service and you want, and then when you have money involved and stuff. So my goal with all this, I mean this all started out as just like a personal project in grad school, almost 10 years ago. And like many, I followed a lot of these independent creators and independent open source developers and a lot of these project just start out like mine, right? There's someone's personal itch, someone has a personal interest, they start a project that starts small and then it organically grows. And then it just depends on what people's goals are with that. And for myself, I'm in a very traditional academic role, my day job is teaching and doing research and all the professor things and it just happens that I have this thing that I keep running and it's been beneficial to me both in terms of my research and teaching obviously and also just publicity and just general personal enrichment. But then on the other hand I want to be very careful about scoping so that it doesn't take over my whole life.
27:04 Michael Kennedy: Yeah, absolutely. That's sort of the blessing and the curse of popular things. Be it a free website or an open source project like SQLAlchemy or something like that. All of a sudden all of these people are asking you to help add this feature or do this thing or, "I can't get it to work. Can you help me with this?" It can just overwhelm you, right? A lot of people get burned out trying to deal with that. Not even to mention the folks who come along and ask for help, clearly coming from sort of uninformed foundations, and then they're angry if you won't take an hour out of your day to help them.
27:37 Philip Guo: Right, right, yeah. I think everyone has experiences of that. We've seen a lot of blog posts and things on Twitter and it's a very common sentiment. Basically I think if you've maintained any piece of software, whether it's your own business or it's an open source one, when you get thousands, hundreds of thousands, millions of users, even if 99% of the users are great, that 1% or the 0.1% of bad interactions can just be really bad just because it's kind of a large numbers thing, right? And then those bad interactions really sour your mood for the whole day and such.
28:09 Michael Kennedy: Right, and it's not just that there's only just a couple of them, right? There might just be one a week, but human psychology of it is, we feel the negativity much more than we feel the either, explicit, "Hey, thanks, this thing really helped me." Or even just the satisfied people using it and not saying anything, right? But that negative stuff that sticks with you and it can really drag you down. If you could somehow wash it away with the 10,000 other good experiences, you could drown it out. But like that's just not how people are.
28:37 Philip Guo: Yeah, yeah, and I think Brett Cannon has talked a lot about that. So, Brett is one of the core Python developers and he's written some great blog posts, given some great talks and interviews about that concept and others as well. I mean, in my article, my blog posts links to a bunch of prior work from other people talking about open source maintenance and burnout and this sort of volunteer labor, right? And I learned a lot from reading a lot of this stuff through the years and I explicitly want to design this project so that I hopefully don't suffer from that, right? Because that's not my full time job to do open source.
29:08 Michael Kennedy: Right, right. And you want to have a healthy psychology and feel good about yourself and not just feel beat up all day, that's great. In terms of the history of Python Tutor, basically this is something that you created in grad school, right? And it's taken off. It kind of rode the wave of MOOCs and online education and interactive books and all that. And since then you've mostly been able to just sort of keep the lights on and add a few features, not to spend tons of time on it. Is that pretty accurate?
29:35 Philip Guo: Yeah, so I think it started around 2010 and then the first few years, I would say 2010 to 2013, those first three or so years were very active in development. Both because I was still in school, so I had much more free time. And then also the MOOCs, massive open online courses. MOOCs were coming online around the early 2010s. Khan Academy, a lot of these paid online courses, right? On platforms like yours and a bunch of others, Lynda and Pluralsight. All these platforms that were coming on. And also like Hour of Code and a lot of this, teaching everyone programming and kids getting into programming. So there was just so much energy in the first half of this decade around online tools for programming. Fortunately a lot of the intro courses were taught in Python, so I had this tool that had very good organic Google searches. It's called Python Tutor, and as more people used it, more people found it and they linked to it from their blogs and from online courses and online textbooks and lecture notes and stuff. And it just really grew and then like you said, in the last five years or so it's mostly been in maintenance mode because I've been very busy with my early professor career and such.
30:41 Michael Kennedy: Right, and that's a critical time in that career for sure to make it through to tenure and so on. This portion of Talk Python to Me is sponsored by Clubhouse. Clubhouse is a fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Great teams choose Clubhouse because they get flexible workflows where they can easily customize workflow states for teams or projects of any size. Advanced filtering. Quickly filtering by project or team to see how everything is progressing. Effective sprint planning, setting their weekly priorities with iterations and then letting Clubhouse run the schedule. All the core features are completely free for teams with up to 10 users. As Talk Python listeners you'll get two free months on any paid plan with unlimited users and access to the premium features, so get started today. Visit talkpython.fm/clubhouse. That's talkpython.fm/clubhouse. So you have an interesting quote in the article that I'll link to, where you talk about maintaining this as a solo open source developer and so on. You say that Python Tutor is probably, as far as you know, the most widely used piece of open source software that's maintained by a single active assistant professor. That's quite an interesting statistic. I think you're probably right.
31:58 Philip Guo: Yeah. As far as I know, it's always good to say as far as I know, because it's true as far as I know, it's true. So the quote there was about like an assistant professor is someone who's basically in the first five or six years of their career
32:11 Michael Kennedy: Tenure track but not yet tenured, right?
32:12 Philip Guo: Yeah, so tenure track means that you're on a path to work toward getting tenure, but I'm not there yet. So basically at these big universities, it's a lot about publishing, getting grants, writing research papers, teaching well, all that stuff goes in your portfolio. And building open source software is not really part of that portfolio, although there are people who do it because that's part of their research lab, right? I guess my sort of claim to fame is, I think, out of people who are early in their careers, I don't know of anybody who's really been maintaining software that's been so widely used. And there's of course software projects that are much more widely used. The Jupyter project is a great example, right? So Jupyter Notebooks and the whole Jupyter ecosystem, that started out of academia and now there's a lot of industry partnerships, but that's a big team effort with a lot of funding. It's not just one person in their home office hacking away late at night.
33:06 Michael Kennedy: Yeah, for sure. There's a bunch of folks that work on that. Another example that came to mind was SageMath. Are you familiar with SageMath and William Stein's work?
33:13 Philip Guo: Yes.
33:14 Michael Kennedy: He was at the University of Washington in Seattle and he worked on SageMath. I don't think he did it solo, but he actually left academics to just focus on SageMath and the platform. I interviewed him many years ago and hopefully he's still doing well. I mean that's kind of the pressure, right? For him he's like, I can't do both, I'm going to just go work on this project. Which seems pretty interesting, but you've got to balance it, right?
33:37 Philip Guo: Yeah, it just depends on priorities, right? So William is quite a bit more senior than I am. I mean, he made tenure and became a full professor, so he advanced quite a bit in the ranks at University of Washington as a math professor, and in the meantime, obviously his passion had been making the Sage project for computational mathematics. And again, I think that started with him, right? Like many projects. And then it grew to the point where, and he wrote some great blog posts about this over the years, it's really hard to sustain this in academia, especially if it's growing, because that's not really your day job, right? It's hard to get funding for it. It's hard to get students to work on it. But then, at least as of a year or two ago, he decided to quit his professor job and full time just basically run it as a business, right? I think it's still open source, but there's a consultancy model, and hosting and everything.
34:26 Michael Kennedy: Yeah there's paid hosting cloud version as well. That's interesting. Have you considered, you are subject to the publish or perish segment of your career, right? So have you considered publishing things about Python Tutor on say JOSS, the Journal of Open Source Software or something like that?
34:43 Philip Guo: So far actually this whole thing about Python Tutor in these past few years, it's not just, for my own personal enrichment or benefit in the world. There was actually a great career benefit as well. So because the platform has a lot of users, I'm actually able to use it to do data analysis or to run experiments, or to deploy these user surveys like for the older adults. Imagine it's hard to reach thousands of older adults coding all around the world if you don't have a platform that you can just deploy.
35:08 Michael Kennedy: Especially beginner coders.
35:08 Philip Guo: Exactly.
35:10 Michael Kennedy: Like the advanced ones, you can go through the standard dev channels.
35:14 Philip Guo: Like Stack Overflow or whatever.
35:15 Michael Kennedy: It's really hard to reach the beginners though.
35:17 Philip Guo: Yeah, so I actually have been able to, my students, I've actually been able to do a good amount of research on the platform. So we have a bunch of papers using the platform. But you're right, we haven't actually thought about writing technical papers about the system itself per se. And part of it is just the lack of time and priorities. And the hope is in the future if I have more time to think about these things I might transition that. But so far, like you're saying the, I've been down a much more traditional, academic research route. So our papers are much more academic in nature, which is good because I obviously like doing that too.
35:50 Michael Kennedy: That's right, that's right. It's the right fit. Alright so let's talk a little bit about your article and you basically laid out the various steps and trade-offs. I think trade-offs is the key word or key underpinning here of how can you both keep this very popular thing going and yet focus on your academic career. And you have a ton of students, right? At UCSD, that need to come to office hours, and you need to grade their papers or their code and work with them, and that already is draining. You probably don't want to go and then take care of a bunch of issues and bugs and stuff afterwards that you don't need to bring upon yourself, right? So you kind of laid out some of the steps that you went through and it's a little bit of a counter example of what people say you should do to be a really good open source maintainer. And it all comes back to where do you want this thing to lead you to? Do you want to create Requests, a super-popular library, or do you want to create Jupyter, or do you want to keep this thing interesting and useful while not letting it consume your life, right?
36:54 Philip Guo: Yeah, and I've definitely chosen sustainability as the highest priority, right? It's like, I like this thing, it's running, it's running well, it can sustain itself. I feel like it's at a good equilibrium now. And if I try to grow it in any way, it would just create more work, right? Doing more stuff just creates more work and such. And I like the framing of it as kind of a counter example. I think the explicit framing of the article, which people can read, is that there's all these best practices that people talk about in open source, which I'm sure many people on your podcast have talked about: building a community, being responsive to users, having good documentation, good tutorials.
37:31 Michael Kennedy: Inviting great collaborators.
37:33 Philip Guo: That's right, inviting collaborators, maintaining a community, both of users and of contributors and such. And I basically try to turn every one of these best practices and think about the opposite, right? Because my use case is that I don't want to grow this, I want this to keep going, but I don't want to, just let it grow.
37:51 Michael Kennedy: Right well, I do think one of the key missing elements is, or key assumptions is of course I want my open source project to be the most widely used and highest contributed to thing, period, right? And that may be, but if it's not, then maybe that advice no longer applies.
38:08 Philip Guo: Yeah I like that. I like the frame and actually, and that's on a related note, this is kind of, you read these blog posts about, in the technology world, right? There's always more technologies coming about. You can do Kubernetes and you can do all of these crazy setups and you have all these new cloud engines and all platforms and stuff.
38:25 Michael Kennedy: Teaching our frameworks. So you can use the new async stuff and all that right?
38:28 Philip Guo: Yeah and then people are just like, alright, you can just chill because you're not running Facebook, or Google, or Twitter. You're just starting a minimal viable product or a small business or something. Just pick something that works. It's fine, just build your product out. And I feel very similar. I don't use all the new technologies. My tech stack is pretty old and pretty crufty but it kind of works. And as long as it keeps working and it's reasonable, I don't want to poke at it at all because I don't want to have to deal with it if anything breaks.
38:55 Michael Kennedy: Right, Well if you're trying to chase the most modern JavaScript front end framework, think how many times you'd rewrite that.
39:00 Philip Guo: Right exactly.
39:02 Michael Kennedy: It's Angular, no, we're angry, we're angry at Angular now. So it's Vue, oh Vue went to 3, and people are upset about something there. So now it's React, it goes and goes and goes, right? So pretty interesting. So let me just take you through some of the key points in the article, some of the steps you've taken to help keep this balance that you talked about. There's one that I think we've already hit on a lot, which is you hyperfocused on a single use case.
39:27 Philip Guo: Yeah. The main use case or I guess the only use case is emulating what a teacher would draw on the board. So I felt like that focus is great because that gets rid of a lot of the scalability issues, right? Because you're like, oh, I need to run arbitrary code. And like, how do you render a bunch of code and a bunch of diagrams and stuff? And it's like, the way to use this is to think about what a teacher would draw on the board. If the teacher can't draw it on the board, you probably can't understand it anyways, because that's not the use case. And if you have too much code, if your code is too complex, we just throw up our hands and we bring you to like an unsupported features page, and like, alright, this is really outside of the scope of this tool. Focus really helps in eliminating feature creep.
40:06 Michael Kennedy: Right, we already talked about databases, email, reset passwords, accounts, GDPR, all that kind of stuff, right? And just saying, look, we just need this really cool diagramming, this auto diagramming feature. This is what we're going to focus on. It's been really successful. And I think that leads pretty naturally into not listening to user requests, right? People ask for accounts, or social gamification, or integration with GitHub, or programmer like auto-complete, like PyCharm or Visual Studio Code, or an LMS, all these different things people are asking for.
40:40 Philip Guo: Yeah, I mean these are all great ideas. If other people want to build them or if I had a team to build those, that'd be great. But again, if it's only me, there's no way to implement all of those.
40:51 Michael Kennedy: Yeah, they all do sound fun, but they all, it's one of those things where you want to ask, could you just make this small change and let's focus it down, not just from a whole application but let's just take it down to a little library, right? Some open source library you got. Could you just add this overload or this default value to this function? Or could you just add one other function that does something slightly different? It's probably only three lines to write. Please do that. Why won't you do that? Well, because now I've got to go write a bunch of tests and then I've got to go rewrite the documentation. And then there was that screenshot that showed the output, but now the output is different. So we've got to go regenerate screenshots for all these things and then got to rewrite the tutorial because now this would be an alternative way and there's just these three lines blows up into a week long experience, right? It's super hard to see those knock on effects.
41:38 Philip Guo: Yeah, it's like you could have this whole hour just you talking to yourself. Because, I mean, it's like you basically said all of these things way better than I could, right. Yeah, that's right. These things just keep piling up. I think that goes with the focus, right? Like if you're really focused on providing one thing. And the anecdote here is that I don't have any flexibility in how the diagrams are drawn. I'm sure you've run into this too, you're like, I really wish this diagram was drawn in a slightly different way.
42:02 Michael Kennedy: Yeah, I'm like, "Okay, let's go across, and that stuff really cut behind?"
42:06 Philip Guo: So it's just like, yeah, too bad, right? Because to make it more flexible, it's just a lot more work. And in a way, I kind of view this tool as, because it's pretty stable, if you're an instructor and you want to work around it, you just basically would explain it. It's like, this is a tool, it works, but just be careful that this thing actually should point backwards or whatever. Early on, I actually, I have a little anecdote here. I actually did work with a few professors. It was way in the early days, right? And I actually customized several versions. With some version they wanted the pointer drawn this way. And then I had a few versions for different classes just because these are my friends and colleagues. And also it was early on so I wanted to help them out. But after a while I'm like, alright, there's no way I can do this for everybody. This is a canonical graphic visualization, take it or leave it.
42:49 Michael Kennedy: Yeah, yeah. I think it's fine. Do you need a better picture? Use a whiteboard or...
42:54 Philip Guo: Or manually draw it yourself.
42:56 Michael Kennedy: That's what I meant. Draw it on, either on a chalkboard or a whiteboard. Like some sort of digital equivalent. Here's the picture I want to draw and this is actually how it should look. I think that's probably just fine.
43:08 Philip Guo: Yup.
43:08 Michael Kennedy: Nice. The other, the next one up in the article was that you resist talking to users.
43:14 Philip Guo: Yeah, yeah. That's a very blunt way to put it. Yeah. In the early days, it was great because I had my email address on the site and it was very helpful to talk to users and get bug reports and feature requests because that was how I was able to iterate on the tool so well. I'm very grateful for the early users. Even though I put it so bluntly, I put an asterisk, I'm like, in the beginning it was awesome. But then again, after the tool stabilized, for a few years there wasn't anything obvious I wanted to add. Then the bug reports just were corner cases or things that had already been said before or like, "Could you do this?" I'm like, "No, this is way too complicated." And so what I do is, I have a very comprehensive kind of FAQ/unsupported features page and I list out basically anything people would ask. And if people actually ask something that's unusual, I would add it to the FAQ, and that seems to work reasonably well.
44:05 Michael Kennedy: Yeah, that's really nice. Instead of just answering it privately over email, which is frustrating, find a way to answer it in a public permanent form so it can just stay. You could just either just have an autoresponder that says, first you need to look here and then you can email me, or something like that, right?
44:21 Philip Guo: Yeah. Basically now, whenever there's an error message on the site for whatever reason, right? Either it's a user-triggered error or just like the server went down or something, I just put a little link to the page where they can just go read the page. Before, I put my email address there, and obviously if I have an email address there they would just email me a lot. So again, it's like the thing with design: if you want to create less work for yourself, then, you know, make yourself less available.
44:43 Michael Kennedy: Right, right, absolutely. So another one that you decided to do is not to go and explicitly try to do marketing to promote the thing, but more somehow you just sort of grew organically in the MOOC era and have been going strong since then.
44:58 Philip Guo: Yeah. I'm a really big fan of following a lot of these open source conferences, PyCon and others, and watching their videos, and seeing how people promote and market and spread the word about their open source project. But again, it's just a time thing, right? It's like I have to spend my time giving talks and traveling to much more academic research conferences. It's actually something I would love to do in the future, when I have a bit more freedom and time. I would love to explore this world of nonacademic conferences because I've been to many academic conferences. They're all pretty similar. All of the stereotypes that you hear about. And I would love to participate in things like PyCon and OSCON and all these things. Again, it's just a priority thing for me at this point; I haven't really prioritized it.
45:44 Michael Kennedy: Sure, trade offs. The other one that you mentioned that we have covered fairly deeply is keeping everything stateless.
45:50 Philip Guo: Yeah, so just briefly on that is that I don't have, basically I don't have a persistent database. I guess by stateless I'd mean there's no persistent data store.
45:59 Michael Kennedy: Right, right. Sure, there might be in-memory chat logs to sync that up, but that's not the same as I need to migrate Postgres to the latest version, or we had a failure so we had to go over to the backup cluster. Like that's not a problem you worry about.
46:12 Philip Guo: Yeah, there's none of that. I mean basically, I just reboot the servers periodically if something's pretty bad. I mean it's like there's some weird memory leak issues, maybe with Docker, maybe with something else, I don't know. That's the point, right? I don't actually know, I don't bother to debug. I just have some cron job that just checks my memory usage, and if the memory usage starts spiking too high for too long, I just reboot the server, right? I think I have a few servers so it kind of load balances. So it seems to work pretty well. And at worst it's a free thing. People would just go try again a minute later and it works. It's been holding up for a few years like this. It's not the way to run a DevOps system at all, but it works.
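A minimal sketch of the kind of memory watchdog described here, assuming a Linux server where a cron job samples /proc/meminfo once a minute. The function names, the 90% threshold, and the five-sample window are illustrative assumptions, not details of the actual Python Tutor setup.

```python
def used_fraction(meminfo_text):
    """Return the fraction of memory in use, given /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def should_reboot(samples, threshold=0.9, min_consecutive=5):
    """True only if the most recent `min_consecutive` samples all exceed
    `threshold`, i.e. usage has been too high for too long, not just a spike."""
    recent = list(samples)[-min_consecutive:]
    return len(recent) == min_consecutive and all(s > threshold for s in recent)


if __name__ == "__main__":
    # Hypothetical one-shot check; a real cron job would also persist the
    # samples between runs and trigger the reboot itself.
    with open("/proc/meminfo") as f:
        print(f"memory in use: {used_fraction(f.read()):.0%}")
```

Requiring several consecutive high samples is what separates "too high for too long" from a brief spike, so a single busy minute would not restart the server.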
46:51 Michael Kennedy: Yeah, no look, practicality definitely beats purity, absolutely. You're saying this and it's like, I'm just this guy, I got to just keep it working and I can't be debugging this weird Docker issue. So you know what, just forget it. We're just going to reboot it every now and then. If you were a real company you would definitely not do this. But on the other hand, there's a really cool article from Instagram's engineering blog called Dismissing Python Garbage Collection at Instagram. You can go import gc and call gc.disable() in Python and it'll turn off the generational garbage collector that catches the cycles. But because most of the stuff is caught by reference counting, you can actually live for a long time. So they ended up doing that in production and saving tons of memory usage because they got better memory sharing across the forked-out processes. Yeah. And then they just reboot it because eventually you've got a bunch of cycles, you've got to get rid of them. So they just recycle the process.
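A small sketch of the trade-off mentioned here, using only the standard library gc module. The Node class and the function names are made up for illustration, and this is not a reconstruction of Instagram's production configuration. With the cyclic collector off, reference counting still frees ordinary objects, but reference cycles pile up until something collects them or the process is recycled.

```python
import gc


class Node:
    """Tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def leak_cycles(n):
    """Create n two-object reference cycles and drop all outside references."""
    for _ in range(n):
        a, b = Node(), Node()
        a.other, b.other = b, a  # a <-> b: refcounts never reach zero


def count_uncollected(n=1000):
    """Return how many unreachable objects pile up while the collector is off."""
    was_enabled = gc.isenabled()
    gc.collect()   # start from a clean slate
    gc.disable()   # turn off automatic cyclic garbage collection
    try:
        leak_cycles(n)
        # An explicit collection still works while automatic collection is off;
        # it returns the number of unreachable objects it found.
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's setting


if __name__ == "__main__":
    print("unreachable objects found:", count_uncollected())
```

In a long-running server the explicit gc.collect() call would be replaced by simply restarting the worker process now and then, which is the "just recycle it" approach discussed above.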
47:42 Philip Guo: That's great.
47:43 Michael Kennedy: Yeah, so it's the same thing, right? Like it's not as crazy as it sounds, I guess.
47:48 Philip Guo: Yeah, and this is like that's great. That's an engineering hack in the truest sense of the word, right? That it's like a simple and kludgy solution. But I'm sure it saves them all sorts of time and money. You can imagine calculating literally the savings of money in the data centers from that efficiency. And also the savings of money and paying engineers, to debug and maintain all that. If they had implemented a more complicated custom memory allocator scheme, it just takes all this money for highly paid engineers to maintain all that. It's like, no, we'll just reboot it.
48:17 Michael Kennedy: Yeah, exactly. It's so weird that they get actually better performance by just letting it leak memory, but apparently when there's the right use case. They definitely do.
48:26 Philip Guo: That's great.
48:27 Michael Kennedy: Yeah, yeah. Another, maybe to put a couple of things together, kind of in that bucket: not super worried about performance or reliability and, as we already mentioned, not super dedicated to staying up on the latest version of the hot new web framework or whatever.
48:43 Philip Guo: Yeah, and actually ironically, me having a very stable sort of setup is kind of good for reliability in a sense, because if I try to change anything around it might fail in some weird way. If I'm always upgrading to the latest libraries or latest framework, there might be some weird memory leak that's undiagnosed. But if I stick with a super old kind of LAMP stack, super old setups, those things are pretty patched and those things are pretty well debugged and fairly stable. But on the other hand, I'm not trying to squeeze every ounce of performance out. It works well enough. Sometimes you have to wait a little longer if the server's busy or you have to retry. But this is not Wall Street high speed trading or something.
49:24 Michael Kennedy: Yeah, maybe some large lecture has just finished and now they said, "Everybody open up your laptop." Like, "All 1,000 of you go here now and try this." Maybe you run into it. But my experience was, it was super fast and it was totally fine. I had no latency issues. And I think that's interesting because I feel like so many of the tradeoffs that we consider imagine a world where we're so successful that we can barely stand it. You know what I mean? What if we're featured on the front page of Forbes or we're stuck at the top of Hacker News for like three weeks, or Product Hunt. And just the people come and they just rush it. I think a lot of people, not everyone, don't really appreciate how much traffic just a simple Pyramid, Flask, or Django app on a $10 server can handle. I mean, my server, we get millions of requests. We do like 15 terabytes of data traffic exchange. It's ridiculous. And it's like the CPU usage is 5%. It's nothing.
50:24 Philip Guo: Yeah. I mean this goes back to the point earlier that we mentioned, these blog posts about don't worry about designing for scale up front, right? This is, I mean, a classic case, the modern instance of premature optimization. I think a lot of this premature optimization is that as software developers we often like to try new technologies, and they're intellectually interesting, right? Like, if you hook this up to this, it's intellectually interesting. And I feel, personally, it's sort of like procrastination from thinking about your product, right? So the hard thing when you're building either open source or a product or anything is the actual product and talking to users and the real core of whether you're building something people really want, right? Because then you have to actually talk to people and face criticism.
51:06 Michael Kennedy: Documentation or tutorials or other boring stuff in your mind that's not like cutting edge, yeah?
51:11 Philip Guo: Yeah. If you're building the best tech stack, then no one's going to tell you no, but then you might over-engineer that. So for me it's not that I'm some engineering genius, it's just I didn't have time to do any of this. I stuck with whatever stuff was available 10 years ago. That's another thing that people ask. Like "Oh do you use all these things?" Like "They didn't exist 10 years ago." So of course I didn't use them. I haven't really upgraded at all.
51:33 Michael Kennedy: Yeah yeah, super interesting. The other last two I guess kind of fit together as well as the code is available on GitHub and I'll link to it, but you don't make it super easy for people to work on and you don't have a lot of contributors.
51:45 Philip Guo: Yeah yeah, so this is kind of like the last part. Like when people talk about open source, another assumption that people have is that open source projects have a community of both users and contributors, right? Like you think about these projects with contributors and GitHub, pull requests and issues and all of this very vibrant thing of open source. And for me it's open source in the strictest sense, in that the source code is open and there is a valid open source license on the code. You can use it, you can put it in your products, whatever you want according to the license. It's not quote unquote open source in that I don't foster a community of contributors. I explicitly don't spend time on documenting the code or giving people really any instructions for how to install it, or run it on their own servers, or under different edge cases and stuff. And also I don't really solicit contributions, right? Anyone can fork the code and use it. I'm sure people use it in all sorts of ways I don't even know. And that's the great part about it, but I don't personally have time to merge the contributions and manage all of this complexity around it. When you go from one developer, myself, to anything more than one, you're dealing with a team project you have to manage, and I just didn't want all that.
52:54 Michael Kennedy: Yeah, and it's another one of these knock-on effects of, well it would be great to have people contribute but then your code has to be a little bit higher quality so that it's easier for them to do so. Oh, and then also you got to make sure that you have proper test coverage completely across the board. Because if they contribute something and the Travis CI automation says it passes, well, is it really broken? I got to go test it. I mean like here we are again going down this rat hole. Like it's about where you want to go. Like I said at the beginning, if that's the way you want to go, you definitely want to do that. But if that's not the way you want to go, then maybe that's not the right thing.
53:27 Philip Guo: Yeah, totally. And yeah, I think you summarized it really well. On every one of these points, you summarize the examples better than I could have.
53:36 Michael Kennedy: Well I'm happy to do it. You wrote a good article there that I read through and thought about. I guess an interesting term that I've heard, I heard this from Scott Hanselman first, but maybe he heard it somewhere along the way. I don't know the original attribution, but I've heard of this type of project, at least the way you described it at the end, as source open instead of open source. It's like the source is there, but it's not sort of participating in the whole PR flow. That said, you do have people who have contributed, right? The Java visualizer was done by other people and so on. So it's not that nobody contributes, it's more that you're not fostering it. So it probably counts as open source with maybe a little asterisk, like limited for special cases or something.
54:17 Philip Guo: Yeah, very limited. And like the Java, that one's a great example because it was actually done by a professor who taught in Java. So, funny to admit, I haven't done Java since like the beginning of college in one class, because I've just never worked in Java.
54:33 Michael Kennedy: Let's go use something else.
54:34 Philip Guo: Yeah, I mean it just never happened. That just was never the world I worked in. I didn't work in enterprise apps in the 2000s, or widgets or applets or whatever. So he taught in Java and he actually made a Java extension. He actually hosted it on his own site for a while, and then after a while I actually merged his thing into the main thing. Because it was well contained, right? It's self-contained. It was its own thing. It just interfaced with my visualizer and it works great. It works great, but the thing is, if anyone reports a bug in it, I'm just like, I have no idea what to do. Because this guy has moved on to something else too.
55:04 Michael Kennedy: The software is as is.
55:06 Philip Guo: It's as is, it's literally as is. I don't actually know Java, I don't know how to debug it. It's running some old version of Java. It works and I don't want to touch it.
55:15 Michael Kennedy: Alright yeah, it's the version of Java from six years ago, but that's what it's going to be. It's all good. Nice. At the end of the article you said it's a bit of a fluke that Python Tutor is doing so well, that you have all these millions of users who have visited it and benefit from it and so on. And the reason it's a bit of a fluke is your day job doesn't incentivize it. So let me ask you, do you think it should? I mean, I did mention the Journal of Open Source Software as a way to kind of shoehorn it in a tad, but what are your thoughts on things like this in academics, and should academics get credit for these kinds of creations? I mean, it's clearly of value to the world if 10 million people are understanding code better through it.
55:55 Philip Guo: Yeah, that's a great point. That could be its own hour. And I honestly get that question a lot. The question is really about the current academic path, where research studies and publications and grants, all the more traditional metrics, are the main thing. And I think that things are broadening out in some fields, right? In fields like my own, HCI, human-computer interaction, user experience, a lot of our things are open source products that you write papers on, you do studies on. It's all very symbiotic. I think my most clear stance on this is that research is really about sharing generalizable knowledge with the world, right? So an example is, if I made Python Tutor and I just stuck the code on GitHub and said, "Here's the link," I think that by itself is not really much generalizable knowledge. It's just a bunch of stuff. But like you mentioned, if I wrote about it in detail, if I showed how it works for users, if I have some general design ideas, maybe these blog posts and these things, I think that there is a way to work that into a more scholarly sort of portfolio. I see that now in more interdisciplinary fields. We have colleagues in visual arts, right? So at UC San Diego, we have a great visual arts department. What is their portfolio, right? Their portfolio is, they write some papers, they write some scholarly works, analysis, history. But a lot of the portfolio is my art pieces and my film. A film professor or music professor, you're doing performances and stuff. So I think that the world of science is broadening out its definition more. And I definitely do see that broadening in the future. But universities and academia are a very kind of traditional and slow-moving sort of institution.
57:30 Michael Kennedy: It's a big ship to turn.
57:32 Philip Guo: That's right. So for the time being, I still think that, like at the ending of my article. I have a lot of students, especially very programming-oriented students, who are like, "Oh, I want to be able to build software and stuff." And then I actually ask them, what's your goal? If your goal is to be working in industry or working in open source, that's great. But if your goal is still to build a more traditional academic career, at least for the time being, the more traditional research and scientific studies are the way to go. And if you can do your science in a way that can foster open science and open source, which some people have done very well, including some people we've interviewed, that's probably the most sustainable way, right? You do your work in the open, but you also kind of adhere to more traditional scholarship as well. It's a lot of work, right? But that's the most practical way now.
58:16 Michael Kennedy: Yeah, it does seem like it's shifting a little bit, but I definitely hear where you're coming from. I think some of the drivers are, "Hey, we have this great paper about, we just took the first picture of black holes, but, by the way, we can't actually take the pictures. We have to use artificial intelligence to interpret a bunch of different things that actually compile the picture." And how can you possibly write your article without somehow talking about what you built? There's probably a component that can be extracted out. I think as people are moving away from things like MATLAB and other proprietary and SAS and whatever proprietary systems. They're starting to get into open source and it just draws them into building stuff that helps do their research. And I think that that's eventually, that's going to put enough pressure to turn the ship a little.
58:59 Philip Guo: Yeah, I think so. And I think that people at the more senior levels can have more flexibility in doing these sorts of things and advocating for that. And I think a lot of this stuff works top down too. And already we've seen the top-down view, which is, the funding agencies say that you must put your code open, your data open and stuff. Some funding agencies, the NIH and NSF, are starting to do that. I think those are great efforts because that at least gets people to think about, oh, we might need to clean up our data and our code so that other people can use it. And eventually that stuff will percolate downward.
59:31 Michael Kennedy: Yeah, I agree in the reproducibility aspect, that's becoming a bigger focus. It's always been super important, but now it's easy to test whether it's reproducible. So I think that also leads to, it's more of the code being open and whatnot.
59:44 Philip Guo: Yeah, exactly.
59:45 Michael Kennedy: Alright, well this has been a super interesting conversation but you've got to answer the two questions before you get out of here. If you're going to write some Python code, what editor do you use?
59:54 Philip Guo: I still use Vim because I learned it in grad school, and I still use it even though everybody's onto Visual Studio Code by now. I can, just like with the old technologies, right? I've passed several generations, right? Atom was big, Sublime was big, and VS Code is now obviously really big. I still use Vim and that's what I do.
01:00:15 Michael Kennedy: We talk about these threshold concepts. There's probably some kind of theory about editors as well. And that's how it's interesting. And then notable PyPI package.
01:00:24 Philip Guo: That's really funny. I was trying to prepare for this, last time I gave this cop out, you know, just Anaconda or just this package manager and stuff. It's hilarious because I actually don't write a lot of Python code anymore. Because this Python Tutor, it's all JavaScript. It's all web code.
01:00:40 Michael Kennedy: Right, you've got to write the front end stuff. Like that's where all the magic is.
01:00:44 Philip Guo: That's right. Ironically, I'll plug one thing. I was actually scrambling because I knew you were going to ask that question. I was scrambling to look at, is this Python code or for something else? This is code that I wrote to inventory my file system. So I've been basically doing a lot of stuff with kind of building my own personal archiving and rsync and Dropbox kind of thing. And one of the problems is, how do you just crawl through a directory hierarchy super fast when you have a million files? And I've actually found, this is a plug for upgrading to Python 3. I found that, this is not a package, but you know, the os, the built-in os library, os.walk. And os.walk I believe is this thing that's only in Python, I think, 3.5 or above. And it's super fast. At least on the Mac, and I think it's because it probably uses some system calls or stuff. And it's notable because I had a Python 2.7 version before using scandir or some other thing, right? And if you upgrade to Python 3.5 and use this function, os.walk, it's just like thousands of times faster. And people talk about this a lot, right? That's one of the things that gets people to upgrade: something is just slow and there's a new version of the API. A lot of the things that are new in the Python standard library are either drop-in replacements or a slightly different API, but they just go so much faster. I just encourage people to look at that. So this is not a PyPI package.
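A minimal sketch of the kind of file system inventory described above, not Philip's actual code. One note on the names: in the standard library it is `os.scandir` that was added in Python 3.5, while `os.walk` also exists in Python 2 and was reimplemented on top of `os.scandir` in 3.5, which is where its speedup comes from. The function names `inventory_with_walk` and `inventory_with_scandir` are hypothetical, and the `with os.scandir(...)` form assumes Python 3.6 or later.

```python
import os


def inventory_with_walk(root):
    """Return (file_count, total_bytes) for everything under root via os.walk."""
    file_count = 0
    total_bytes = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the size of a symlink itself, not its target.
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
    return file_count, total_bytes


def inventory_with_scandir(root):
    """Same inventory, calling os.scandir directly with an explicit stack."""
    file_count = 0
    total_bytes = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # real directory: visit later
                    elif not entry.is_dir():
                        # Regular files, symlinks to files, broken symlinks.
                        try:
                            size = entry.stat(follow_symlinks=False).st_size
                        except OSError:
                            continue
                        total_bytes += size
                        file_count += 1
        except OSError:
            continue  # unreadable directory; skip it
    return file_count, total_bytes


if __name__ == "__main__":
    print(inventory_with_walk("."))
    print(inventory_with_scandir("."))
```

Both functions should report the same totals for a given tree. The `os.scandir` version can avoid extra system calls because each directory entry may already carry its file type, and on some platforms its stat information as well.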
01:02:10 Michael Kennedy: No, that counts, like I used to like it.
01:02:12 Philip Guo: We're looking at the standard. Just re-looking at the standard library for things that they've optimized. That was the first thing that came to my mind because I was just working on that code.
01:02:23 Michael Kennedy: Very nice, very nice. Alright, final call to action. People are either teachers or they're students, they're hearing about Python Tutor, they think it's maybe useful. What do you tell them?
01:02:32 Philip Guo: I tell them to just go try out the site, pythontutor.com. And to go participate in the community of just helping each other out and asking for help and such, and it's really community driven. So any help I can get on the site would be great. And also if they want to integrate it with their teaching materials and stuff. Despite what I say in my article, I actually really do like hearing from instructors and students in the pile of emails that I get. So I love hearing about how people are using it. If you have good user stories or interesting stories, I always write, and despite again, in my article I say, "Oh, I don't listen to users." But I actually do listen to all the users. I actually write down all of their notes. In the GitHub repo there are notes like, these are cool future directions. I don't think I have time to do this, but these are awesome suggestions.
01:03:18 Michael Kennedy: It's good to write that down. So once you get tenure, or you have a sabbatical and you want to come back and spend some time on it, you can actually harness all that feedback.
01:03:26 Philip Guo: That's right yeah. That's always a dream of like, oh, someday I'll get to it to make some of that. And then you just end up getting more busy over time because everyone gets. That's just the reality of life.
01:03:37 Michael Kennedy: That's how life is. Alright Philip, it's really great to chat with you as always. Thanks for being here.
01:03:40 Philip Guo: Awesome, thank you so much, Michael. Thank you so much again.
01:03:43 Michael Kennedy: You bet, bye. This has been another episode of Talk Python to Me. Our guest in this episode was Philip Guo and it's been brought to you by Tidelift and Clubhouse. If you run an open source project, Tidelift wants to help you get paid for keeping it going strong. Just visit talkpython.fm/tidelift, search for your package and get started today. Clubhouse is a fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Fall in love with project planning. Visit talkpython.fm/clubhouse. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course, or if you're looking for something more advanced, check out our new Async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.