#247: Solo maintainer of open-source in academia Transcript

Recorded on Tuesday, Dec 10, 2019.

00:00 Do you run an open source project? Does it seem like you never have enough time to support it?

00:04 Have you considered starting one but are unsure if you can commit to it? The challenge is real.

00:09 On this episode, we welcome back Philip Guo, who has been a solo maintainer of the very popular

00:15 PythonTutor.com project for over 10 years. He has some non-traditional advice to help

00:20 keep your sanity and keep your project going while holding down a busy full-time job.

00:26 This is Talk Python to Me, episode 247, recorded December 10th, 2019.

00:31 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:49 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:54 Keep up with the show and listen to past episodes at talkpython.fm and follow the

00:59 show on Twitter via @talkpython. This episode is brought to you by Tidelift and Clubhouse. Please

01:05 check out what they're offering during their segment. It really helps support the show.

01:08 Philip, welcome back to Talk Python to Me.

01:11 All right. I'm super excited to be here. I think it's my third time here, I believe.

01:14 This, I do believe this is your third time here. The first time you came, we talked about the

01:20 CPython source code and we spent a lot of time talking about ceval.c. You had been doing a

01:25 graduate student course, walking them through the, basically the source code of Python to talk about

01:31 interpreters, right?

01:31 Yeah, yeah. That was back when I was at the University of Rochester. That was, that was back in the Python

01:35 2.7 days. And I've heard recently on your shows, there've been people who've done updated versions

01:40 for Python 3, right? Updated versions of this interpreter walk.

01:44 Yeah, yeah, exactly. Yeah. We had Anthony Shaw not long ago. He wrote almost a book on it. So yeah,

01:49 we had a lot of time, a good time talking about that. And the other one was really well received

01:52 as well. And it was something like geeking out in your golden years or something like that,

01:57 like coming into programming, basically near retirement. You'd done some research on that,

02:01 right?

02:01 Yeah. So that was right when I came to UC San Diego, which is where I'm working now. And that was a research

02:07 study actually done on my Python tutor platform, which we'll talk about a lot today. And it was a survey

02:12 I deployed to a bunch of programmers who were, you know, explicitly, we want to find people over 60

02:17 years old and kind of trying to find, you know, these people who are, you know, 60 and plus who are

02:22 learning programming and all sorts of settings. And we found all these really interesting things about

02:26 them. So check out that episode.

02:28 Yeah, it was really surprising. And a lot of folks really enjoyed hearing it because I think they were

02:33 in that situation. And I think they felt kind of alone, or they felt like they were doing something

02:38 that was weird or was not going to work. And it turns out there's a bunch of people who

02:43 really appreciated getting into programming. One of the ones that touched me was the idea of I want

02:48 to get into programming so I can help my grandchild either do robotics or automate Minecraft or something

02:55 like that was a really interesting reason for it.

02:57 Yeah, it was such a cool, you know, intergenerational thing too. So that was awesome.

03:00 Yeah, yeah, for sure. So we're not going to talk about either of those things really today.

03:05 We're going to talk about, as you mentioned, your project called Python Tutor at PythonTutor.com,

03:11 right? Do I have the domain correct?

03:13 Yep, PythonTutor.com.

03:14 So you like to mix it up and keep things a little bit different because I can go tutor myself on C++

03:21 at PythonTutor.com, right?

03:23 Yeah, so the name is quite outdated, right? It started as a Python only tool and then it expanded to a bunch

03:29 of languages of which, you know, C, C++, Java and JavaScript are the most, you know, the most widely

03:34 used. And I really need to think of a better name, but for now it's just PythonTutor.

03:39 No, PythonTutor is fine. Yeah, keep the roots. It's kind of as if IPython notebooks didn't get renamed

03:45 to Jupyter, right?

03:45 That's right. Yes, that's right.

03:46 It's something to that effect. So yeah, super cool. Let's, before we dive into the topics though,

03:51 let's focus just for a moment on kind of what you do day to day. You've already told your story,

03:57 how you got into programming in Python, but you said you're at the University of San Diego,

04:02 where I also was a grad student for a little while. So yeah, it's a beautiful place to be.

04:06 And what do you do there?

04:07 Especially in the winter. Yeah. So I'm at UCSD or University of California, San Diego, and

04:12 I'm an assistant professor in the cognitive science department. So in our department, we actually,

04:17 it's a very interdisciplinary program where we have people from all sorts of backgrounds who are

04:22 interested in studying the mind, studying how people interact with technology,

04:27 studying, you know, building new technologies and such. It's a very kind of vibrant interdisciplinary

04:31 place. And my research and teaching interests are in a field called HCI or human computer interaction.

04:37 So that's more widely known in industry as UX or user experience. So I teach a bunch of courses on web

04:43 development, user experience, design, basically how to develop products that are very user focused.

04:49 And my research is on, you know, a topic that I think many of your listeners be interested in is

04:54 on, you know, how do you build new kinds of interactive technologies that teach people programming

04:59 and also increasingly now data science. So both of those are obviously super relevant to the,

05:04 to the Talk Python audience.

05:05 Yeah, absolutely. It sounds like super interesting research. And for a long time,

05:10 I worked at a scientific company that was spun out of a cognitive science lab. And there's just a ton

05:16 of interesting technology stuff going on there. You know, we were using eye tracking, like EYE,

05:21 not the letter I tracking to understand how people interacted with software and other things. And yeah,

05:27 it's, it's a fun area to work, isn't it?

05:29 Yeah, it's really cool. I mean, I think we have, you know, in our cognitive science department,

05:32 we have professors from all sorts of different backgrounds from like neuroscience to psychology

05:37 to linguistics to computer science, artificial intelligence, and, you know, emerging kind of

05:42 interdisciplinary fields. And it's, it's this nexus of a lot of like you mentioned of kind of people

05:47 and minds and technology all together in one place. So it's a, it's a really unique field to be in.

05:52 Yeah, it's got a lot of kind of interdisciplinary cross pollination stuff more so than I don't know,

05:59 I don't want to put any discipline on the spot, but you know, more so than maybe a lot of them,

06:02 right?

06:03 Yeah, yeah, I think so. You're very diplomatic. Yeah. I mean, I originally was from a computer

06:07 science department, right? So my background, my degrees are in computer science. I was at the

06:11 University of Rochester in a computer science department, and that's much more, you know,

06:15 traditional single field, even though my work is clearly very interdisciplinary in education

06:19 and technology and UX and everything. So, you know, I feel, feel very at home in a very

06:24 interdisciplinary place.

06:25 Nice. Yeah, I can imagine. Yeah, I doubt that a CPython code walkthrough would really be a big hit in a cognitive science type of thing,

06:34 but certainly in the computer science world. All right, so let's talk about Python Tutor.

06:38 Give us the quick picture. Like, what is this thing? I've actually been using it recently,

06:43 and I'll share my experience with it. But let's start by just the elevator pitch. I'm driving in a

06:48 car. I should not be pulling up websites while I'm doing so. Tell us what it's like.

06:54 All right, so if you're in your car, if you're in your Tesla, and you have your big touchscreen,

06:57 and while you're driving, you want to, you know, learn some code. So Python Tutor is a website. It's

07:02 basically like a very simple online IDE, a very simple IDE. So you just paste in code that you find

07:08 online, or you just type in code, very simple text box. And when you hit run, the really unique thing

07:14 about it is that it runs your code. It shows the output, just like many online coding environments do.

07:18 But what's really unique is that it steps you through step by step what's going on. And at

07:23 every step, it draws diagrams of what's going on in memory. So what the stack frames are, what your

07:29 global variables are, what your local variables, what the pointers are, what the, you know, values

07:33 are and such. And it basically tries to emulate what a teacher would draw on the board, right? So you

07:39 had a teacher explain, what does this little bit of code do? They'll start drawing on the board,

07:43 like here's some data, here's some variables, here's what they point to. And the Python Tutor tool

07:47 just tries to automatically render that, so that you could just either teach yourself, or you could

07:53 actually use that to show someone as an instructional tool.

07:56 Yeah, it's really interesting. And it does, I feel like it addresses the situation where someone's

08:01 pretty new to programming, they're starting to think about, you know, what is a variable? What is a list?

08:05 What is a data structure? What is a reference type versus a value type? What is passed by reference?

08:11 What is passed by value? And all that kind of stuff that you need to start to at least just get a bit of an

08:17 intuition for if you're not necessarily doing a computer science degree, but you still need to

08:21 have a sense of this thing is actually shared by multiple variables. And if I interact with it from

08:26 either, it affects all of them, stuff like that. And so you can do really simple things like create a

08:31 list, and then assign it to multiple variables, and just very clearly show how that works.

08:36 Yeah. And you know, the examples that we show, you know, the list aliasing is a great one, like,

08:40 you know, just always great to say code on the air, right? So it's, you know, say like x equals

08:44 bracket, you know, one comma two comma three, and then y equals x. So, you know, it's not clear exactly

08:49 what y equals x does, right? In some languages, it might actually call a copy constructor and make a

08:54 copy of the list one, two, three. But in Python, it happens that, you know, y equals x kind of,

08:59 it copies the reference, right? So after you do that, then x and y both point to the same list of

09:04 one, two, three in memory. But then if we asked you, you know, what does y.append(4) do? And if I do y.append(4),

09:10 what does x print out as? And if you see a diagram, you know, it's very clear that x will print out as

09:15 one, two, three, four, because there's only one list. But if you show people a bunch of code, you know,

09:19 x equals one, two, three, y equals x, y.append(4), what does print(x) do? It's not at all obvious,

09:25 because people actually, you know, they've done these research studies, which are fascinating,

09:28 basically, which is, it's quite low tech, right? So you actually just give students a bunch of code,

09:33 like introductory programming students, and you ask them to either draw out what they think is happening,

09:37 or just say what's happening. And people have all sorts of misconceptions, they think like

09:41 all sorts of different mental models, right? But the nice thing about showing the diagram is like,

09:45 there's one right mental model there. And once people get it, it's like bright as day,

09:50 right? Like, oh, yeah, clearly x and y point to the same thing. And that's going to happen. So

09:54 the diagrams really go a long way into, you know, helping people learn these

09:58 fundamentals.
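
The list-aliasing example spoken on air can be written out as a short Python sketch. The variable names `x` and `y` come from the conversation; the extra `z` copy is an added contrast, not something from the episode.

```python
# Two names, one list: assignment copies the reference, not the list.
x = [1, 2, 3]
y = x
y.append(4)

print(x)        # [1, 2, 3, 4]  (x sees the change made through y)
print(x is y)   # True          (both names point at the same object)

# An explicit copy behaves the way many beginners expect `y = x` to behave.
z = list(x)
z.append(5)

print(x)        # [1, 2, 3, 4]
print(z)        # [1, 2, 3, 4, 5]
```

Pasting these lines into a step-by-step visualizer should show a single list with two arrows pointing at it, which is the picture the diagram-based explanation relies on.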

09:59 Yeah, I think they really do. It's kind of the thing where it's hard to unsee it. Once you've

10:04 seen it, you can't unsee like, oh, well, obviously, this is what's happened in memory, here's the list.

10:09 And then all the things in the list are other things that are pointed to by the list, and so on. But

10:14 when you're new, you don't really know. And I think it's even more challenging when you're doing

10:19 something like a class or an object, some complex thing that is, you could easily imagine,

10:24 like, if I have a dictionary, a bunch of dictionaries in a list, those dictionaries are

10:28 in the list, right? They're allocated as part of the list. But obviously, as you put it together,

10:33 you have the cool diagrams there. And I think that helps a lot. I also think it actually helps

10:38 understanding memory management a little bit, if you go and explore, right? Because Python's core

10:43 memory management story is reference counting.

10:45 Yeah. And you know, it's interesting, because the tool is designed to keep the visualization simple,

10:49 you know, for beginners, but you can imagine augmenting it with, you know, if you want to

10:54 make a more advanced version, you know, maybe you put the ref counts next to everything, and you actually

10:58 see, you know, oh, if there's three pointers pointing, and the ref count is three, and maybe if there's like

11:02 a weak pointer, you know, or some other thing pointing, and this is not just for Python, for like C++

11:06 or something, you can see, oh, there's three real pointers and one weak pointer. So if, you know,

11:10 if every real pointer goes away, the weak pointer is still there, it can still be garbage collected or

11:13 deallocated and stuff. So I think there's a lot of stuff there. And the thing you mentioned before,

11:17 just as a brief aside on this, can't unsee it. So in the CS education literature, that's sometimes

11:22 called a threshold concept. So a threshold is like, you know, it's like you've

11:27 crossed this threshold, right? So they have some, I'm not super familiar with all of them that

11:31 they've identified, but like, there are certain things that like, they're called threshold concepts.

11:35 So once you get it, like getting the concept of aliasing or something, you can't unsee it,

11:39 you always will get it. But it's so hard for students to get to that point, unless someone really

11:43 shows you well.
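
The reference-count and weak-pointer idea raised a moment earlier can be explored directly in CPython. This is a minimal sketch, not a feature of Python Tutor: the `Node` class and all variable names are made up for illustration, and the exact counts and the immediate cleanup are CPython behavior that other Python implementations may not share.

```python
import sys
import weakref


class Node:
    """Tiny illustrative class; instances support weak references, plain lists do not."""

    def __init__(self, label):
        self.label = label


a = Node("shared")
# getrefcount reports one extra reference for its own argument,
# so only differences from this baseline are compared below.
baseline = sys.getrefcount(a)

b = a
c = a
strong_refs_added = sys.getrefcount(a) - baseline   # two more names now point at the object

weak = weakref.ref(a)                               # a weak reference does not raise the count
count_with_weak = sys.getrefcount(a) - baseline

alive_before = weak() is a
del a, b, c                                         # every strong reference goes away
alive_after = weak() is not None                    # on CPython the object is freed right away

print(strong_refs_added, count_with_weak, alive_before, alive_after)
```

On CPython this should show that the two extra names add two to the count, the weak reference adds nothing, and the weak reference stops resolving once the last strong reference is deleted.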

11:44 Oh, interesting. I didn't realize how formalized it was. Of course, it makes a lot of sense, right?

11:49 Because once you identify those, if you can get people over the gap, well, they're ready to just

11:54 proceed, right? Just start understanding what data structures do or understanding how reference counting

12:00 works. But before that, it's kind of this weird, fuzzy world that you don't really understand even

12:05 what refers to what else. And so how do you understand how?

12:08 Exactly. And there's all, I mean, I don't personally do this kind of research, but I've read up on a lot of it.

12:13 Just, you know, giving even small bits of code to beginners and just seeing what diagrams they draw.

12:18 And this is very like cognitive science, right? It's very much like what mental models are you,

12:22 mental representations are you building in your head? And people have all sorts, you know, they draw the arrows

12:26 backwards. They draw like boxes in other boxes. They draw like, you know, variables pointing to other

12:30 variables. And it's like, if you have the students explain it, it all makes sense in their mind,

12:34 right? But like, because the thing with programming is it's all an artificial construct,

12:37 right? Like it's, you know, I think for people like us and your listeners who've been doing

12:41 programming for a while, it seems so natural to us, but it's all artificial, right? It's all just

12:45 a bunch of rules that are made up.

12:47 Yeah. I mean, our boxes and our lines, those are conceptual ideas, but at least they, the concept

12:53 isn't incongruent with the way the computer works, but the computer doesn't actually care about these

12:58 concepts to the large degree, right? It just has pages of memory and numbers and whatnot.

13:03 That's right. And I think that's another, you know, more rabbit holes that we might have time

13:06 to go down, but it's just, you know, what diagram should we draw, right? Like, should we draw the

13:11 bits of memory? Should we draw, you know, the quantum states of the atoms, right? And, you know,

13:15 obviously for Python, we want something a bit more abstract, right? And the whole point of abstraction

13:19 in these higher level languages, you don't have to worry about all the bits of memory. You just worry

13:23 about conceptual data structures and so on.

13:25 It's interesting. It's all about finding the best conceptual model that is both accurate, but not too low

13:31 level that you get lost in the details, but not too high level that you don't understand the important

13:35 parts, right? I think that programming and computer science, there's a lot of just having the right

13:39 mental model.

13:40 Yeah. And from the machine side, it's the right abstractions, right? It's really like the

13:43 abstractions for your mind, right? And I'm sure, you know, because you teach a lot of these online

13:47 courses and you make your own materials and stuff, and we can talk about that, but like, I'm sure you

13:51 think a lot about, you know, in these domain specific things, if you're teaching async programming,

13:55 you know, what diagram should I draw, right? I don't know, you know, is it high level? Is it low level?

13:59 You know, what is enough of an abstraction so that people can actually understand it and do stuff

14:05 with it, but it's not too low level to get people confused?

14:07 Yeah. I think one of the challenges I see in teaching in general, also in like async, for example,

14:13 but in general is having the right levels. Often you want to have something that's easy to understand,

14:18 but if you give all the detail, it's just too much. If it's too easy, people go, oh, that's fake.

14:23 That's not real. I need to actually understand what's going on. And so you've got to walk this

14:28 tight line. That's also in the applications you present, right? Do you present something

14:32 that's easily understandable, but not real? Or do you rebuild Instagram? And people are like,

14:37 what is all this, you know, caching? And what is this database thing? And like, I just want to know

14:42 a little bit about web development. So yeah, it's definitely interesting. I actually, for a course

14:47 I'm working on, have been really diving into pythontutor.com.

14:51 Oh, cool. Yeah. So, so I'm working on a course called Python for absolute beginners or Python for

14:58 the absolute beginner. So it's kind of like what I'm hoping to be is a first year computer science

15:03 course for people who don't think they want computer science. What I mean is like, take away all the

15:08 abstract sort of theoretical stuff and just talk enough about data structures and pointers to

15:13 understand like the shared list concept and whatnot. And it turns out Python tutor is really good for

15:19 creating those pictures. I was thinking about how do I draw them? Maybe I could hook up my iPad with

15:23 my Apple pencil and I could do some stuff or I could obviously make some graphics, but it's really nice

15:27 to just walk people through, you know, let's throw this into pythontutor.com and see what, what it does.

15:33 Let's just step through it. And the other thing I think is interesting is it's not like you just drop

15:38 code in there and say, run this and out pops the resulting in memory structures and values and so on.

15:44 But you can step line by line and see how the pointers and the data structures evolve and you can

15:49 even step backwards.

15:50 This portion of Talk Python to Me is brought to you by Tidelift. Tidelift is the first managed open source

15:58 subscription, giving you commercial support and maintenance for the open source dependencies you use

16:03 to build your applications. And with Tidelift, you not only get more dependable software, but you pay the

16:09 maintainers of the exact packages you're using, which means your software will keep getting better.

16:13 The Tidelift subscription covers millions of open source projects across Python, JavaScript, Java, PHP,

16:19 Ruby, .NET, and more. And the subscription includes security updates, licensing verification and

16:25 indemnification, maintenance and code improvements, package selection and version guidance, roadmap input,

16:31 and tooling and cloud integration. The bottom line is you get the capabilities you'd expect and

16:36 require from commercial software, but now for all the key open source software you depend upon.

16:42 Just visit talkpython.fm/Tidelift to get started today.

16:48 A big part of this tool is that it's, you know, it's step by step. So let's say your code runs for 100

16:53 steps, and it brings you to a UI that has a slider and two buttons that go forward and back, and you

16:59 can scrub back and forth to go forward and back on all the steps. And this works because all the code is already

17:04 run on the server, it runs all 100 steps. And the idea behind this tool is not meant for, you know, giant pieces of code.

17:10 So the code doesn't run for that many steps, if it's just a few lines of code, and we can exhaustively run it and then

17:15 collect the in-memory trace at every one of those hundred or thousand whatever steps.
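
The run-everything-first, then-scrub approach described here can be sketched with Python's standard `sys.settrace` hook. This is an assumption-laden illustration, not Python Tutor's actual backend: `record_trace`, the `"<user_code>"` filename, and the snapshot format are all hypothetical, and values are stored as `repr` strings rather than as a real heap diagram.

```python
import sys


def record_trace(source):
    """Run `source` to completion and return one snapshot per executed step."""
    steps = []
    code = compile(source, "<user_code>", "exec")

    def snapshot(frame):
        # Keep only user-visible names; store values as text so later steps cannot alter them.
        return {
            name: repr(value)
            for name, value in frame.f_locals.items()
            if not name.startswith("__")
        }

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != "<user_code>":
            return None                      # ignore code that is not the user's
        if event in ("line", "return"):
            steps.append({"event": event, "line": frame.f_lineno, "vars": snapshot(frame)})
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(previous)               # restore whatever tracer was active before
    return steps


steps = record_trace("x = [1, 2, 3]\ny = x\ny.append(4)\n")

# Stepping forward or back is now just moving an index through the recorded list.
position = len(steps) - 1                    # jump to the final state
position -= 1                                # one step back
for index, step in enumerate(steps):
    marker = "->" if index == position else "  "
    print(marker, step["event"], step["line"], step["vars"])
```

Because the whole trace is collected before anything is shown, going backward costs nothing extra, which only stays practical for the short programs the tool is aimed at.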

17:19 And then we bring it back to the front end. And then every time you do a step, either forward or back, we just

17:24 render that in a visual form. So like you said, you know, people can go at their own pace and go back and forth and

17:30 try to see, oh, what just happened between this line and this line? Oh, why did this thing do that? And, you know, hopefully

17:35 some people can figure out on their own, right, if they have some intuitions about it. But, you know,

17:39 even if they can't, then they can use this as a tool to show their friend and say, like, oh, can you

17:43 explain why this thing goes down? It doesn't copy it. And at least there's something to talk about rather

17:47 than just saying, my code doesn't work. Yeah, there's a lot of people just throwing code out onto

17:53 Stack Overflow or whatever, but you could permalink back into these examples, right? So you can actually

17:59 put it in there and you could say, you see this step five, this is where your conception has gone off the

18:04 rails. And my answer applies to that right there. See this picture, right? Yeah. And it's cool because,

18:10 you know, the nice thing about taking advantage of the web as a medium is that URL concept is so

18:15 powerful, right? So not only the code is embedded in the URL, the step number is as well. So if you're

18:21 at a diagram, you know, step 20 out of 50, and you see something funny, you can send someone a link.

18:26 And then when they go to that link, it'll run the code and it'll step to step 20. And then they can,

18:31 you know, you can ask them about that. So people have posted on Stack Overflow and on,

18:35 you know, discussion forums for like a MOOC, like an online course and stuff. They just like,

18:40 here's a Python tutor link. Can you tell me what's going on here?
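
The permalink idea, with both the code and the step number carried in the URL, can be sketched with the standard library's `urllib.parse`. The base address and the parameter names `code` and `step` are placeholders chosen for this sketch; the real site's URL format is not described in the episode.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.com/visualize"   # placeholder address, not the real site


def make_permalink(code, step):
    """Pack the program text and the current step into a shareable link."""
    return BASE_URL + "?" + urlencode({"code": code, "step": step})


def read_permalink(url):
    """Recover the program text and step number from a link."""
    params = parse_qs(urlsplit(url).query)
    return params["code"][0], int(params["step"][0])


program = "x = [1, 2, 3]\ny = x\ny.append(4)\n"
link = make_permalink(program, 2)
code, step = read_permalink(link)

print(link)
print(step)
```

Whoever opens the link has everything needed to rerun the program and jump to the same step, so nothing has to be stored on a server.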

18:42 Yeah. And it's used in some textbooks and it's used in, like you said, some of the MOOCs,

18:47 massive online courses, whatever the MOOC stands for. I forgot.

18:51 Yeah. The massive open online courses. Yeah. So it's used by a few textbooks. So like Brad Miller,

18:57 who was on your podcast pretty early on, who has this Runestone interactive and interactive Python

19:02 textbooks. Also it's used in UC Berkeley's introductory course, which is I think one of

19:07 the biggest intro programming courses, probably in the world, right? It's over 1500 students,

19:14 a term, almost 2000. They can't even like fit in a lecture hall. They have to give several sessions.

19:18 And it's, you know, because UC Berkeley is a giant school for computer science and all their

19:22 students have to take intro Python and they're using the Python tutor all throughout their

19:27 course materials. And a bunch of other schools use it too, that I haven't even kept track of.

19:30 Yeah. That's gotta be a pretty rewarding feeling to have that many folks using it and benefiting from it.

19:36 Yeah. It's really nice. I mean, it's been, you know, we can talk about the organic growth and

19:40 everything, but you know, every day we have maybe, you know, over 10,000 probably active users a day.

19:46 And it's just on a site. The other thing that is relatively new since the last time we talked was

19:50 this live help mode. So there's a public help queue. So if you just press, if you're, you know,

19:55 brave enough to press a get help button, you actually put your, just your session on a help queue

20:00 and anybody on the site, whether they're like a tutor who is just hanging out on the site or just

20:04 another student who is just procrastinating or just, you know, stuck on their own problem,

20:08 they can click your name and you join a shared session. And it's as though you have like a screen

20:13 share in the browser and you can see each other's mice and you can walk through and write code

20:18 together and then also chat in a little chat box. So we have like a few dozen people a day

20:22 using this feature and like getting help from just absolute strangers around the world. And it's like,

20:27 it's basically like a, you know, a stack overflow like thing, except it's real time and it's,

20:31 it's private. So it's, you know, they're not, you know, shy about posting their questions and it's

20:35 private and it's chat based. And it's, that's been really successful.

20:38 Yeah. That was really interesting. I did see that in when I was messing with trying to visualize

20:43 the code that I was trying to explain, I did see, you know, so-and-so from Argentina or a user from

20:48 Argentina is asking for help on Python. So-and-so from Germany is asking for help on C++ or whatever.

20:55 And there's a little button that just says help them.

20:57 Yeah. And you can just jump in and it's all very self, you know, it's all self-moderating,

21:01 right? It's all voluntary. If you don't like it, you can just leave. There's no, you know,

21:04 no private information being exchanged. It's very kind of lightweight and it's,

21:08 it's worked really well so far just because it's, you know, the community is still relatively

21:12 modestly sized, right? So that the people on the site are, they're usually pretty well behaved

21:17 because they're there because they're trying to learn or genuinely wanting to help each other.

21:20 It really reminds me of, you know, the, the good parts of the internet in a sense, right?

21:24 Where people are actually, you know, helpful and friendly to each other.

21:27 Yeah. That is actually really nice. There's not the permanent snarkiness, right? You just go there

21:32 to help people or whatever.

21:33 Yeah. And there's no harm done. If you, if you can't help you say, sorry, you know,

21:36 good luck and someone else might jump in and stuff.

21:39 Yeah, for sure. So you talked about some interesting things that it does. Maybe we

21:43 could just talk a little bit about that before we get into the history and just the maintaining of it.

21:48 So I have this Python code or this C++ code, and I want to put it onto your server and run it.

21:54 That already seems a little interesting and risky. The other one is you've talked about it being

22:00 stateless. And yet there's all these interesting things. Like I can bookmark and share this code,

22:06 on step five with this visualization run, or I can have this interactive chat with these people

22:11 and so on. So how does that all work?

22:14 The blog post that I think you'll link to is about, you know, maintaining and scaling the system as just

22:18 one person, right? And we'll talk about that later on in the show. But one of the, I guess,

22:23 design principles or I guess inadvertent design principles is that I didn't want to have like

22:28 much permanent state at all. So, you know, for the most part, I call it stateless in the sense that

22:34 I guess the state is all explicit, right? So like if you visit the site, you know, for every URL you go to,

22:40 that state is completely in the URL. So if you go to the site, it's blank, it's blank code. You start

22:44 typing. And if you want to save your code, quote unquote, the only way to do that is to create a

22:49 URL and that your code is actually in the URL.

22:51 Is it like base64 encoded or something like that?

22:53 Yeah, something. Yeah. I don't think it's even compressed, but you can imagine compressing it.

22:57 And yeah, it's probably base64, some kind of encoding. It's all in the URL. And the thing is like

23:02 modern URLs can be up to, you know, a few megabytes and stuff. I mean,

23:06 it's not recommended, but you can fit a fair amount of code in there. And again, you know,

23:09 the tool isn't for a lot of code, right? So it's like, you know, a few lines of code.

23:12 It's in the URL. And then also the status of, you know, do you want to execute this code? Which step

23:17 do you want to be on? What options do you have toggled? They're all just, you know, parameters

23:21 in the URL. So the nice thing about that is that I don't have a database, right? There's no database

23:26 anywhere. There's no user accounts. Like you don't register. You know, we don't keep track of your

23:32 history of your code. There's none of the, you know, the frills of like an online editor.
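
The "you can imagine compressing it" remark can be made concrete with `zlib` and URL-safe base64 from the standard library. This is purely a sketch of that imagined variant: the guest says the real URLs are probably not compressed, and the function names `pack` and `unpack` are invented here.

```python
import base64
import zlib


def pack(code):
    """Compress program text and encode it with a URL-safe alphabet."""
    raw = zlib.compress(code.encode("utf-8"))
    return base64.urlsafe_b64encode(raw).decode("ascii")


def unpack(token):
    """Reverse pack(): decode, decompress, and return the original text."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return zlib.decompress(raw).decode("utf-8")


# A repetitive snippet, so compression has something to work with.
program = "total = 0\n" + "total = total + 1\n" * 20 + "print(total)\n"
token = pack(program)
restored = unpack(token)

print(len(program), len(token))
```

Since the token can always be turned back into the full program, the link itself still acts as the saved file and no database is involved.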

23:37 Like what you mentioned about the chat is, of course, there is a chat server, right? There is a,

23:41 you need a chat server in order to maintain that. And I guess the chat server has in memory state,

23:46 but like that doesn't keep any on disk state, right? So like if my server gets rebooted or something

23:51 crashes, then at the worst that happens is your chat session dies. And then you hope to wait till the

23:56 server auto reboots and then, you know, you reconnect and stuff. So it's very, very janky like that.

24:01 Well, I think that's actually really interesting. It's all about the trade-offs, right? What do you

24:05 want to build? Are you trying to build a community around this thing? Are you trying to build a tool?

24:09 And one of the things we're going to dive into is this is something that you've grown quite a bit,

24:15 even though you have a full-time job and it's, you're not getting paid for it. And it's,

24:19 it's sort of focusing on one thing that you really wanted to build instead of just letting it grow and

24:24 grow, because there are so many knock-on effects from the stuff that you're talking about,

24:28 right? So once you have user accounts, well, now you have to have email because one of the very first

24:34 features of a website that gets used is I can't log in. I forgot my email. Click here to reset it. Like

24:39 within hours of launching my site, that thing got used right away, right? It means, you know,

24:44 as users were just signing up, that thing got used. And once you have email, you got to worry about

24:49 spam. And then you've got to worry about the American CAN-SPAM Act. You've got accounts,

24:55 and now you've got to worry about GDPR policies and all these, there's, it just, the tentacles of it

25:01 just grow like crazy, right? And then there's the support stuff that goes on and just, it's so easy to

25:07 ask for these simple things. And we haven't even talked about patching databases and migrations and

25:12 backups and those kinds of things, right? So, and it's fine if that's what you want to build,

25:16 go do it. You know, I went and built something kind of like that with my, my platform, but it's not

25:20 your main job or your main focus, right? Yeah. That's a great, yeah. As you were talking about

25:24 all those things, it just made me, you know, have all these, I don't have to worry about any of that.

25:30 Either that or vicariously feeling, you know, cause you know, I, I know folks like yourself and I have

25:35 other friends who are building their own software businesses, essentially a software, you know,

25:39 their own SaaS businesses. And of course, if you're building a business and you have users and not to

25:43 mention having money involved, right? You have payment processing.

25:45 Oh, right. We haven't even talked about bank accounts and merchant accounts and all that

25:50 kind of stuff. That's a whole nother level.

25:51 Yeah. And I like your framing of like, just from, you know, minute one of login, right? Like to say,

25:57 you know, let's say you just want to have accounts. From the minute people log in, people are going to forget

26:01 their password and then they need an email reset. And then you need to send out emails. So you need

26:04 to figure out how to not get on everyone's spam filters and like all these. And then if you keep any

26:09 user data, there's all these laws and you want a terms of service and you want, and then we have

26:13 money involved and stuff. So yeah, so my goal with all this, I mean, this all started out as just like

26:17 a personal project in grad school, you know, many, almost 10 years ago. And like many, you know, I've,

26:23 I followed a lot of these independent creators and independent open source developers. And a lot of

26:28 these projects just start out like mine, right? There's someone's personal itch, you know, someone has a

26:31 personal interest, they start a project, it starts small, and then it organically grows. And then it just

26:37 depends on what people's goals are with it. And for myself, you know, I'm in a very traditional

26:41 academic role. You know, my day job is teaching and doing research and all the professorly things.

26:47 And, you know, it just happens that I have this thing that I keep running. And it's been beneficial

26:51 to me both in terms of my research and teaching, obviously, and also just publicity and just, you

26:57 know, general personal enrichment. But then on the other hand, I want to be very careful about scoping

27:01 it so that it doesn't, you know, take over my whole life.

27:03 Yeah, absolutely. That's sort of the blessing and the curse of popular things, be it a free website

27:10 or an open source project like SQLAlchemy or something like that. All of a sudden,

27:14 all these people are asking you to help add this feature or do this thing, or I can't get it to

27:20 work. Can you help me with this? And yeah, it can just be overwhelming, right? A lot of people

27:25 get burned out trying to deal with that. Not even to mention the folks who come along, ask for help,

27:30 which is clearly uninformed sort of foundations. And then they're angry if you won't take an hour out of

27:36 your day to help them.

27:37 Right, right. Yeah, I think everyone has experiences of that. Yeah, I mean, everybody,

27:40 you know, we've seen a lot of blog posts and things on Twitter. And it's a very common, you know,

27:45 sentiment, you know, basically, I think if you've maintained any piece of software, whether it's

27:49 your own business, or it's an open source one, you know, when you get thousands, hundreds of

27:55 thousands, millions of users, you know, even if 99% of the users are great, you know, that 1% or the

27:59 0.1% of bad interactions can just be really bad, just because of kind of a large numbers thing,

28:04 right? And then those bad interactions really sour you, sour your mood for the whole day and such.
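The "large numbers thing" can be made concrete with a little arithmetic. This is only an illustrative sketch: the user counts and the assumption of one interaction per user are hypothetical, and the 1% and 0.1% rates are the ones mentioned in the conversation.

```python
def expected_bad_interactions(users, bad_fraction):
    """Expected number of bad interactions, assuming one interaction per user."""
    return round(users * bad_fraction)


# Hypothetical user counts; the rates are the ones quoted in the conversation.
for users in (1_000, 100_000, 1_000_000):
    for bad_fraction in (0.01, 0.001):
        count = expected_bad_interactions(users, bad_fraction)
        print(f"{users:>9,} users at {bad_fraction:.1%} bad: about {count:,}")
```

Even a tiny rate stops being negligible once the user base is large, which is the point being made here.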

28:09 So, right. And it's not just that there's only just a couple of them, right? There might just be

28:14 one a week, but the human psychology of it is we feel the negativity much more than we feel the either

28:21 explicit, hey, thanks, this thing really helped me, or even just the satisfied people using it and not

28:26 saying anything, right? But that negative stuff that sticks with you, and it can really drag you down.

28:30 If you could somehow wash it away with the 10,000 other good experiences, you could drown it out. But

28:36 that's just not how people are.

28:37 Yeah, yeah. And I think Brett Cannon has talked a lot about that. So, you know, Brett is one of the

28:41 core Python developers, and he's written some great blog posts, given some great talks and interviews

28:45 about that concept and others as well, right? I mean, my article, my blog post links to a bunch of

28:50 prior work from other people talking about, you know, open source maintenance and burnout and

28:54 this sort of volunteer labor, right? And I learned a lot from reading a lot of this stuff through the

28:59 years, and I explicitly want to design this project so that I hopefully don't suffer from that, right?

29:04 Because, you know, that's not my full-time job to do open source.

29:07 Right, right. And you want to have a healthy psychology and feel good about yourself and

29:12 not just feel beat up all day. That's great. So, in terms of the history of Python Tutor,

29:17 basically, this is something that you created in grad school, right? And it's just,

29:21 it's taken off, it kind of rode the wave of MOOCs and online education and interactive books and all

29:27 that. And since then, you've mostly been able to just sort of keep the lights on and add a few

29:32 features, not spend tons of time on it. Is that pretty accurate?

29:35 Yeah. So, I think it started around 2010. And then the first few years, you know, I would say 2010 to

29:40 2013, those first three or so years were very active in development, right? Both because I was still

29:46 in school, so I had much more free time. And then also the MOOCs, you know, the Massive Open Online

29:51 Courses, you know, MOOCs were coming online around the early 2010s. Khan Academy, a lot of these paid

29:56 online courses, right, on platforms like yours and a bunch of others, Lynda and Pluralsight,

30:02 and all these platforms are coming on. Also, like Hour of Code and a lot of this, you know,

30:06 teaching everyone programming and kids getting into programming. So, there's just so much energy in the

30:11 first half of this decade around online tools for programming. And then a lot of, fortunately, a lot

30:16 of the intro courses were taught in Python. So, then I had this tool that had very good organic Google

30:21 searches, because it's called Python Tutor. And as more people used it, more people found it, and they linked

30:25 to it from their blogs and from online courses and online textbooks and lecture notes and stuff. And it

30:31 just really grew. And then, like you said, in the last five years or so, it's mostly been in maintenance

30:35 mode because I've been, you know, very busy with my early professor career and such.

30:40 Right. And that's a critical time in that career, for sure, to make it through to tenure and so on.

30:45 This portion of Talk Python to Me is sponsored by Clubhouse. Clubhouse is a fast and enjoyable project

30:53 management platform that breaks down silos and brings teams together to ship value, not features.

30:58 Great teams choose Clubhouse because they get flexible workflows where they can easily customize workflow

31:03 states for teams or projects of any size. Advanced filtering, quickly filtering by project or team

31:09 to see how everything is progressing. Effective sprint planning, setting their weekly priorities

31:14 with iterations and then letting Clubhouse run the schedule. All the core features are completely

31:18 free for teams with up to 10 users. As Talk Python listeners, you'll get two free months on any paid

31:24 plan with unlimited users and access to the premium features. So get started today. Visit

31:29 talkpython.fm/clubhouse. That's talkpython.fm/clubhouse.

31:35 So you have an interesting quote in the article that I'll link to that you talk about maintaining this as a solo open source developer and so on. You say that Python Tutor is probably, as far as you know, the most widely used piece of open source software that's maintained by a single active assistant professor. That's quite an interesting statistic. I think you may be right.

31:58 Yeah. I mean, as far as I know, that's always good to say, as far as I know, because it's true. As far as I know, it's true. Yeah. So the quote there was about like, so an assistant professor, someone who's, you know, basically in the first five or six years of their career.

32:11 Tenure track, but not yet tenured, right?

32:13 Yeah. So tenure track means that, you know, I'm on a path to work toward getting tenure, but I'm not there yet. So, you know, you're basically at, you know, these big universities, a lot about publishing, getting grants, you know, writing research papers, you know, teaching well, all that stuff goes in your portfolio. And, you know, building open source software is not really part of that portfolio. Although there are people who do it because that's part of their research lab, right?

32:35 So, and I guess my, you know, I guess somewhat claim to fame is I think out of people who are early in their careers, I don't know of anybody who's really been maintaining software that has been so widely used. And there's, of course, software projects that are much more widely used. You know, the Jupyter project is a great example, right? So Jupyter notebooks and the whole Jupyter ecosystem, that started out of academia. And now there's a lot of industry partnerships, but that's, you know, a big team effort with a lot of funding. It's not just one person in there.

33:04 In their home office hacking away late at night.

33:06 Yeah, for sure. There's a bunch of folks that work on that. Another example that came to mind was SageMath. Are you familiar with SageMath?

33:12 Yes, yes.

33:13 And William Stein's work? So yeah, he was at University of Washington near Seattle, and he worked on SageMath. I don't think he did it solo, but he actually left academics to just focus on SageMath and the platform. I interviewed him many years ago, and I hope, hopefully he's still doing well. But yeah, I mean, that's kind of the pressure, right? Is for him, he's like, I can't do both.

33:33 I'm going to just go work on this project, which, you know, seems pretty interesting, but you got to balance it, right?

33:37 Yeah, it just depends on priorities, right? Yeah. So William is quite a bit more senior than I am. I mean, he made tenure and full professor. So he advanced quite a bit in the ranks at University of Washington as a math professor.

33:49 And all the meantime, you know, obviously his passion has been making the Sage project for computational mathematics. And that, you know, again, I think that started with him, right? Like many projects. And then it grew. And to the point where, you know, he wrote some great blog posts about this kind of over the years of how it's really hard to sustain this in academia, especially if it's growing, because that's not really your day job, right?

34:09 It's hard to get funding for it. It's hard to get students to work on it. But then, you know, he decided, at least as of, you know, a year or two ago, he decided to quit his professor job and full time just basically run his, run it as a business, right?

34:21 I think it's still open source, but there's, you know, a consultancy model and hosted and everything.

34:26 Yeah, there's a paid hosted cloud version as well. That's interesting. You are subject to the publish-or-perish segment of your career, right? So have you considered publishing things about Python Tutor in, say, JOSS, the Journal of Open Source Software, or something like that?

34:43 So far, actually, you know, this whole thing about Python Tutor in these past few years, it's not just, you know, for my own personal enrichment or benefit in the world. There was actually a great career benefit as well. So because the platform has a lot of users, I'm actually able to use it to do data analysis or to run experiments or to deploy these user surveys like for the older adults. So like, imagine like it's hard to reach thousands of older adults coding all around the world if you don't have a platform that you can just deploy.

35:08 Especially beginner coders, right? Like the advanced ones, you can go through the standard dev channels, but right, Stack Overflow or whatever. Yeah, it's really hard to reach the beginners, though.

35:17 Yeah, so my students and I have actually been able to do a good amount of research on the platform. So we have a bunch of papers using the platform. But you're right, we haven't actually thought about writing technical papers about the system itself, per se. And part of it is just the lack of time and priorities. And, you know, the hope is in the future, if I have more time to think about these things,

35:37 I might transition that. But so far, you know, like you're saying, I've been down a much more traditional, you know, academic research route. So our papers are much more academic in nature, which is, you know, which is good, because I mean, I obviously like doing that, too.

35:49 That's right. That's right. Yeah, it's the right fit. All right. So let's talk a little bit about your article. And you basically laid out the various steps and tradeoffs.

36:01 I think tradeoffs is the key word or key underpinning here of how you can both keep this very popular thing going and yet focus on your academic career.

36:11 And, you know, you have a ton of students right at UCSD that need to come to office hours and you need to grade their papers or their code and work with them.

36:20 That already is, you know, draining. You probably don't want to go and then take care of a bunch of issues and bugs and stuff afterwards that you don't need to bring upon yourself. Right.

36:30 So you kind of laid out some of the steps that you went through. And it's a little bit of a counter example of what people say you should do to be a really good open source maintainer.

36:40 And it all comes back to where do you want this thing to lead you to? Do you want to create Requests, a super popular library, or do you want to create Jupyter?

36:49 Or do you want to keep this thing interesting and useful while not letting it consume your life? Right.

36:54 Yeah. And I've definitely chosen, you know, sustainability as the most as the highest priority. Right. It's like I like this thing. It's running. It's running well. You know, it could sustain itself.

37:03 But if I try to grow it in any way, really, you know, I feel like it's at a good equilibrium now. And if I try to do more stuff, it would just create more work. Right. Doing more stuff just creates more work and such.

37:13 So and I like your framing of it. It's kind of a counter example. I think the explicit framing of the article, which people can read, is that there's all these best practices that people talk about in open source, which I'm sure many people on your podcast have talked about, you know, building a community, being responsive to users, having good documentation, good tutorials, just inviting collaborators.

37:33 That's right. You know, inviting collaborators, you know, maintaining a community, both of users and of contributors and such. And I basically try to turn every one of these best practices, you know, and think about the opposite. Right. Because my use case is that I don't want to grow this. I want this to keep going, but I don't want to, you know, grow it.

37:51 I do think one of the key missing elements is or key assumptions is, of course, I want my open source project to be the most widely used and highest contributed to thing, period. Right. And that may be, but if it's not, then maybe that advice no longer applies.

38:08 Yeah, I like that. I like the framing, actually. And that's, it's on a related note, this is kind of, you know, you read these blog posts about in the technology world, right? There's always more technologies coming about, you know, you can do Kubernetes and you can do all these crazy setups and you have all these new cloud engines and all platforms and stuff. And then, you know, there's these people like framework.

38:25 So you can use the new async stuff and all that, right?

38:28 Yeah. And then people are just like, all right, you can just chill because you're not running Facebook or Google or Twitter. You know, if you're just starting a minimal viable product or small business, something, just pick something that works. It's fine. Just build your product out. And, and I feel very similar, right? I don't use all the new technologies. I don't, you know, my tech stack is pretty old and pretty crufty, but it kind of works. And as long as it keeps working and it's reasonable, I don't want to like, I don't want to poke it at all because, you know, I don't want to have to deal with anything breaking.

38:55 Right. Well, if you're trying to chase the most modern JavaScript front-end framework, think how many times you'd rewrite that.

39:00 Right. Exactly.

39:01 Right. It's, it's Angular. No, we're angry. We're angry at Angular now. So it's Vue. Oh, Vue went to three and people are upset about something there. So now it's React. It's just, it goes and goes and goes. Right. So pretty interesting. So let me just take you through some of the key points in the article that you talked about, some of the steps you've taken to help keep this balance that you talked about.

39:22 The one that I think we've already hit on a lot is you hyper-focused on a single use case.

39:27 Yeah. So the, the main use case, or I guess the only use case is, you know, emulating what a teacher would draw on the board. Right. So I felt like that focus is great because that gets rid of a lot of the scalability issues. Right. Cause you're like, Oh, I need to run arbitrary code. And like, how do you render, you know, a bunch of code and a bunch of diagram stuff.

39:43 And it's like, no, you know, the way to use this is think about what would a teacher draw on the board. If the teacher can't draw on the board, you probably can't understand anyways, because, you know, that's not for the use case.

39:53 And also, you know, if you have too much code, if your code is too complex, you know, we just throw up our hands and, you know, we bring you to like an unsupported features page and be like, all right, you know, this is really outside of the scope of this tool.

40:03 So focus really helps, you know, with eliminating feature creep.

40:06 Right. We already talked about databases, email, reset passwords, accounts, GDPR, all that kind of stuff. Right.

40:13 And just saying, look, we just need this really cool diagramming, this auto diagramming feature.

40:18 This is what we're going to focus on. It's been really successful.

40:21 And I think that leads pretty naturally into not listening to user requests, right?

40:27 People ask for accounts or social gamification or integration with GitHub or programmer, like autocomplete, like PyCharm or Visual Studio Code or an LMS, all these different things people are asking for, right?

40:40 Yeah. I mean, these are all great ideas.

40:41 If other people want to build them or if, if I had a team to build this, that'd be great.

40:45 But again, it's just, you know, there's, if it's only me, there's no way to, you know, there's no way to implement all those.

40:50 Yeah. They all do sound fun, but they all, it's one of those things where you want to ask,

40:55 could you just make this small change and let's focus it down, not just from a whole application,

41:00 but let's just take it down to a little library, right? Some open source library you got.

41:04 Could you just add this overload or this default value to this function?

41:08 Or could you just add one other function that does something slightly different?

41:11 It's probably only three lines to write. Please do that. Why won't you do that?

41:15 Well, because now I got to go write a bunch of tests and then I've got to go rewrite the documentation.

41:20 And then there was that screenshot that showed the output, but now the output is different.

41:24 So I've got to go regenerate screenshots for all these things. And then I've got to rewrite the

41:27 tutorial because now this would be an alternative way. And there's just this three lines blows up into

41:33 a week long experience, right? It's as it's super hard to see those knock on effects.

41:38 Yeah. It's like, you could have this whole hour, just you talking to yourself. Cause I mean,

41:42 there's like, you basically said all these things way better than I could, right? Yeah.

41:46 That's right. That, you know, these things just, just keep piling up. And,

41:50 I think that goes with the focus, right? Like if you're really focused on providing one thing and

41:54 an anecdote here is that, you know, I don't have any flexibility in how the diagrams are. I'm sure

41:58 you've run into this too. You're like, I really wish this diagram was drawn in a slightly different

42:01 way. And it's just like, the list go across and then stuff like behind.

42:06 So it's just like, yeah, too bad. Right. Cause, cause you know, to make it more flexible,

42:09 it's just a lot more work. And in a way, you know, I kind of view this tool as because it's

42:13 kind of pretty stable. I just said, you know, if as an instructor, you want to work around it,

42:17 you just basically would explain it. It's like, oh, this is a tool, you know, it works,

42:20 but just be careful that, you know, this thing actually should point backwards or whatever.

42:25 And early on, I actually, I have a little anecdote here. I actually did work with a few professors

42:29 way in the early days, right? And I actually customized several versions,

42:33 some version they wanted the pointer drawn this way. And then I had a few versions for

42:36 different classes just because these are my friends and colleagues. Right. And also it was

42:40 early on. So I wanted to help them out. But after a while, I'm like, all right, there's no way I can

42:44 do this for everybody. Right. Like, so this is a canonical graphic, the visualization, take it or leave

42:49 it. Yeah. Yeah. I think it's fine. You know, if you need a better picture, use a whiteboard or

42:54 manually draw it yourself. Yeah. That's what I mean. Yeah. Draw it on either on a chalkboard,

42:59 a whiteboard, like some sort of digital equivalent, right? You could, here's the picture I want to draw. And this is actually how it should look, right? I think that's

43:07 probably just fine. Yep. Nice. The other, the next one up in the article was that you

43:12 resist talking to users. Yeah. Yeah. That's a very, very blunt way to put it. Yeah. So like, you know, in the early days, you know, it was great

43:21 because I had my email address on the site and it was very helpful to talk to users and get bug reports

43:26 and feature requests because that was how I was able to iterate on the tool so well. So I'm very grateful

43:30 for the early users. I mean, this is like, you know, even though I put it so bluntly,

43:33 I put an asterisk, I'm like in the beginning, it was awesome. But then after, again, after the tool

43:37 stabilized after a few years, there wasn't anything obvious I wanted to add. Then, you know, the bug

43:42 reports just were corner cases or things that have already been said before, or like, could you do this?

43:47 I'm like, no, you know, this is way too complicated. So then instead what I do is I have a very

43:51 comprehensive kind of FAQ slash unsupported features page. And I list out basically anything

43:57 people would ask, it's kind of listed out. And if people actually ask something that's unusual,

44:01 I would add it to the, you know, the FAQ. And that seems to work reasonably well.

44:05 Yeah. That's really nice. Instead of just answering it privately over email, which is frustrating,

44:09 find a way to answer it in a public permanent form. So it can just stay. You could just either

44:15 just have an autoresponder that says, first, you need to look here, and then you can email me or

44:20 something like that, right? Yeah. And basically now, wherever there's an error message on

44:25 the site for whatever reason, right? Either it's a user triggered error or just like the server went

44:29 down or something. I just put a little link to the page. They can just go read the page. Whereas

44:33 before I put my email address, so then obviously my email address, people would just email me a lot.

44:37 So again, this is like the thing with design, right? Like if you want to create less work for

44:41 yourself, then, you know, make yourself less available. Right, right. Absolutely. So another

44:45 one that you decided to do is not to go and explicitly try to do marketing to promote the thing,

44:52 but more somehow you just sort of grew organically in the MOOC era and have been going strong since then.

44:58 Yeah. And like, you know, I'm a really big fan of following, you know, a lot of these open source

45:02 conferences, you know, PyCon and others and watching their videos and, and seeing how people,

45:07 you know, promote and market and spread the word about the open source project. But again,

45:12 it's just a time thing, right? It's like, I have to spend my time giving talks and traveling on much

45:17 more academic, you know, and research conferences and stuff. It's actually something I would love to do

45:21 in the future. You know, when I have a bit more kind of freedom and time to, I would love to explore

45:26 this world of non-academic conferences because I, cause you know, it's like I've been to many

45:31 academic conferences, all pretty similar, right? It's all the stereotypes that you hear about.

45:34 And I would love to participate in things like PyCon and, you know, OSCON and all these things,

45:40 but it's just, again, it's just a priority for me at this point that I haven't, you know, really

45:43 prioritized it. Sure. Yeah. Trade-offs. The other one that you mentioned that we have covered fairly

45:48 deeply is keeping everything stateless. Yeah. So just briefly on that is that I don't have,

45:53 basically I don't have a persistent database. Yeah. I guess by, by stateless, I mean there's

45:57 no persistent data store. Right, right, right. Right. Sure. There might be in memory chat logs

46:02 to sync that up, but that's not the same as I need to migrate Postgres to the latest version

46:07 or had a failure. So we had to go over to the backup cluster. Like that's not a problem you worry about.

46:11 Yeah. There's, there's none of that. I mean, basically it's, I have, you know, I just reboot the

46:16 servers periodically if something, it's pretty bad. I mean, it's like, there's some weird memory leak

46:20 issues because maybe with Docker, maybe with something else. I don't know. That's the point,

46:23 right? I don't actually know. I don't bother debugging. I just have some cron job that just

46:28 checks my memory usage. And if the memory usage starts spiking too high for too long, I just reboot

46:32 the server and it's all, you know, and I think I have a few servers. So it kind of, you know,

46:37 load balances. So like, it seems to work like pretty well. And at worst, you know, it's a free thing.

46:42 People will just go try again a minute later and it works. It's been holding up for a few years.

46:46 This is not a way to run, you know, DevOps or sysadmin at all, but it works.
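A rough sketch of the kind of memory watchdog described above, not the actual script running on the site. It assumes a Linux host (it parses `/proc/meminfo`-style text), and the threshold, the number of consecutive checks, and the function names are all hypothetical choices for illustration:

```python
import subprocess

# Hypothetical tuning values; the real thresholds are not given in the conversation.
THRESHOLD = 0.90        # fraction of RAM in use that counts as "too high"
SUSTAINED_CHECKS = 5    # consecutive high readings that count as "too long"


def memory_used_fraction(meminfo_text):
    """Compute used-memory fraction from /proc/meminfo-style text (values in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def should_reboot(samples, threshold=THRESHOLD, sustained=SUSTAINED_CHECKS):
    """True only if the last `sustained` readings were all above the threshold."""
    recent = samples[-sustained:]
    return len(recent) == sustained and all(s > threshold for s in recent)


def reboot():
    """Blunt fix: restart the machine instead of hunting for the leak."""
    subprocess.run(["sudo", "reboot"], check=False)
```

A scheduled job would append one reading per run (for example from `open("/proc/meminfo").read()`) to a small history and call `reboot()` once `should_reboot(history)` returns true. Requiring several consecutive high readings avoids rebooting on a brief spike.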

46:50 Yeah. No, look, practicality definitely beats purity, absolutely. You're saying this as like,

46:57 I'm just this guy. I got to just keep it working and I can't be debugging this weird Docker issue.

47:03 So you know what? Just forget it. We're just going to reboot it every now and then.

47:06 If you were a real company, you would definitely not do this. But on the other hand,

47:11 there's a really cool article from Instagram's engineering blog called Dismissing Python Garbage

47:16 Collection at Instagram. And they, you can go import gc and say gc.disable() in Python and it'll

47:21 turn off the generational garbage collector that catches the cycles. But because most of the stuff

47:25 is caught on reference counting, you can actually live for a long time. So they ended up doing that

47:29 in production and saving tons of memory usage because they got better memory sharing across like

47:34 the forked out processes. Yeah. Yeah. And then they just reboot it because eventually you got a bunch of

47:39 cycles. You got to get rid of it. So they just recycle the process. That's great. Yeah. So the

47:43 same thing, right? Like it's, it's not as crazy as it sounds, I guess. Yeah. And this is like,

47:49 that's great. That's, you know, that's an engineering hack in the, you know, in the truest sense of the

47:53 word, right? That it's like a simple and, you know, kludgy solution, but I'm sure it saves them all sorts

47:58 of time and money, you know, both the, you know, you can imagine calculating literally the savings of

48:02 money in the data centers from that efficiency and also the savings of money and paying engineers to

48:06 debug and maintain all that. You know, if they had implemented a more complicated custom memory

48:11 allocator scheme, it just takes all this money for highly paid engineers to maintain all that.
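To make the `gc.disable()` discussion concrete, here is a small sketch of the behavior being described. It is not Instagram's code, and it relies on CPython's reference counting; the `Node` class and `make_cycle` helper are made up for the example. Objects without cycles are still freed immediately, while reference cycles pile up until something collects them or the process is recycled:

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node (hypothetical example class)."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that reference each other, then drop them."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a


gc.collect()   # start from a clean slate
gc.disable()   # turn off the automatic cyclic collector

# Reference counting still frees acyclic garbage right away (CPython behavior).
plain = Node()
probe = weakref.ref(plain)
del plain
freed_without_gc = probe() is None

# Cycles are never freed by reference counting alone, so these accumulate.
for _ in range(1000):
    make_cycle()

# A manual collection still works while automatic collection is disabled,
# and it reports how many unreachable objects it found.
leaked_objects = gc.collect()
gc.enable()
```

In a long-running server the leaked cycles would keep growing, which is why the approach is paired with periodically restarting the process.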

48:16 Right. It's like, no, we'll just reboot it. Yeah, exactly. It's, it's so weird that they get

48:20 actually better performance by just letting it leak memory, but apparently under the right use case,

48:25 they definitely do. So yeah, that's great. Yeah. Yeah. Another, maybe put two, a couple of things

48:30 together, like kind of in that bucket are that I'm not super worried about performance reliability

48:34 and not, we already mentioned, not super dedicated to staying up on the latest version of the hot new

48:42 web framework or whatever. Yeah. And actually, you know, the, the ironically, you know, me having a very

48:47 stable sort of setup is kind of good for reliability in a sense, because, you know, if I try to change

48:53 anything around, it might fail in some weird way. If I'm always upgrading the latest libraries or latest

48:57 framework, there might be some weird memory leak that's undiagnosed. Right. But if I stick with,

49:01 you know, super old, you know, sort of, you know, kind of a LAMP stack, you know, super old setups,

49:06 those things are pretty, you know, Apache and those things are pretty well debugged and fairly stable.

49:10 But on the other hand, you know, I'm not trying to squeeze every ounce of performance out, right,

49:14 that it works well enough. Sometimes you have to wait a little longer if the server is busy or you have

49:18 to retry, but you know, this is not a, you know, Wall Street, you know, high-speed trading or something.

49:24 Yeah. Maybe some, some large lecture is just finished. And, or they said, everybody open up

49:29 your laptop and try this. It's like all thousand of you go here now and try this. Maybe, maybe you run

49:34 into it. But my experience was it was super fast and it was totally fine. I had no latency issues. And I,

49:39 I think that's interesting because I feel like so much the trade-offs that we consider,

49:44 imagine a world where we're so successful that we can barely stand it. You know what I mean? Like,

49:50 there's, what if we're featured on the front page of Forbes or we're stuck to the top of Hacker News for

49:57 like three weeks or, you know, Product Hunt and just the people come and they just crush it.

50:03 I think people really don't appreciate a lot of people, not everyone. A lot of people don't

50:08 really appreciate how much traffic, just simple Pyramid, Flask, or Django on a $10 server can handle. I

50:14 mean, my server, we get millions of requests. We do like 15 terabytes of data traffic exchange. It's

50:20 ridiculous. And it's like the CPU usage is 5%. You know, it's, it's nothing.

50:24 Yeah. I think that, I mean, this goes back to the point earlier that we've mentioned, you know,

50:28 these blog posts about, you know, don't worry about designing for scale up front, right? This is,

50:31 I mean, it's a classic case, the modern instance of premature optimization, right? It's like,

50:35 you know, I think a lot of this premature optimization is, you know, as software developers,

50:40 you know, we often like to try on new technologies and try to, you know, they're intellectually

50:43 interesting, right? Like, Oh, if you can hook this up and this up, it's intellectually interesting.

50:47 And it's, I feel personally, it's sort of like procrastination from thinking about your product,

50:52 right? So it's like the hard thing when you're building either open source or a product or

50:56 anything is the actual product and talking to users and the, the real core of what you really,

51:00 well, whether you're building something people really want, right? Because then you have to

51:03 actually talk to people and face criticism, but right. Documentation or tutorials or other boring

51:08 stuff in your mind. That's not like cutting edge. Yeah. Yeah. But if you just yak shave, I'm building

51:12 the best tech stack, then, you know, no one's going to tell you no, but then, you know, you might

51:17 over-engineer that. So for me, you know, it's not that I'm some, you know, engineering genius. It's just,

51:21 I didn't have time to do any of this. I just stuck with, I stuck with whatever stuff was available 10 years ago,

51:25 right? So that's another thing that people ask like, Oh, you know, do you use all these things?

51:28 Like they didn't exist 10 years ago. So of course I didn't use it. I haven't really upgraded at all.

51:33 Yeah. Yeah. Super interesting. The other last two, I guess, kind of fit together as well as

51:38 the code is available on GitHub and I'll link to it, but you don't make it super easy for people to work

51:43 on and you don't have a lot of contributors. Yeah. Yeah. So this is kind of like the last part,

51:47 like, you know, when people talk about open source, another assumption that people have is that

51:52 open source projects have a community of both users and contributors, right? Like you think

51:56 about these projects with contributors and, you know, GitHub pull requests and issues and all this

52:01 very vibrant thing of open source. And for me, it's like, it's open source in the strictest sense is

52:06 that the source code is open and there is a, you know, a open source valid open source license on the

52:11 code. So you can use it, you can put in your products, whatever you want, according to the licenses.

52:16 It's not quote unquote open source in that I don't foster a community of contributors. So I don't,

52:20 I explicitly don't spend time on documenting the code or giving people really detailed instructions

52:26 for how to install it or run on their own servers or under different edge cases and stuff. And also,

52:31 I don't really solicit contributions, right? That anyone can fork the code and, you know, use it.

52:36 I'm sure people use it in all sorts of ways I don't even know. And that's the great part about it,

52:40 but I don't personally have time to, you know, merge the contributions and manage all of this

52:45 complexity around, you know, when you go from one developer myself to anybody who's more than one,

52:50 you're dealing with a team project you have to manage. And I just didn't want to deal with that.

52:53 Yeah. And it's another one of these knock on effects of, well, it would be great to have people

52:57 to contribute, but then your code has to be a little bit higher quality so that it's easier for

53:02 them to do so. Oh, and then also you got to make sure that you have proper test coverage

53:06 completely across the board. Cause if they contribute something and the Travis CI automation

53:11 says it passes, well, is it really broken? Now you got to go test it. I mean, like here we are again,

53:16 going down this rat hole, which if that's the direction, right? Like it's about where you want

53:20 to go. Like I said at the beginning, if that's the way you want to go, you definitely want to do that.

53:23 But if that's not the way you want to go, then maybe that's not the right thing.

53:27 Yeah, totally. And yeah, I think you summarize it really. It's funny because every one of these

53:32 points, you summarize the examples better than I could have.

53:35 Perfect. Well, I'm happy to do it. You wrote a good article that I read through and thought about,

53:39 I guess an interesting term that I've heard, I heard this from Scott Hanselman first, but maybe he heard

53:45 it somewhere along the way. I don't know the original attribution, but I've heard of this type of project,

53:50 at least the way you described it at the end as source open instead of open source. It's like

53:56 the source is there, but it's not sort of participating in the whole PR flow.

54:01 That said, you do have people who have contributed, right? Like the Java run,

54:06 the Java visualizers was done by other people and so on. So it's not that nobody contributes. It's more

54:10 that you're not fostering it. So I think it probably counts as open source and maybe the little asterisk

54:16 like limited for special cases.

54:18 Yeah, it's very limited. And like the Java one is a great example because it was actually done by a

54:23 professor who taught in Java. So like funny to admit, I haven't done Java since, like, the beginning of

54:29 college in one class because I've just never worked in Java. I just never worked in Java.

54:33 Let's go use something else.

54:34 Yeah. I mean, I just never happened. That just was never the world I worked in. I didn't work in,

54:38 you know, enterprise apps in the 2000s or widgets or applets or whatever. So he taught in Java and he

54:45 actually made a Java extension. He actually hosted it on his own site for a while. And then after a while,

54:50 I actually merged his thing into the main thing because it was well-contained, right? It's self-contained.

54:54 It was its own thing. It just interfaces my visualizer and it works great. It works great.

54:58 But the thing is, if anyone reports a bug in it, I'm just like, I have no idea what to do,

55:01 you know, because this guy has moved on to something else too. But I like that.

55:04 Software is as is.

55:06 It's as is. It's literally as is. I don't actually know Java. I don't know how to debug it. You know,

55:10 it's running some old version of Java. It's, you know, it works and I don't want to touch it.

55:13 All right. Yeah. It's the version of Java from six years ago, but that's what it's going to be.

55:18 It's all good. Nice. So at the end of the article, you said it's a bit of a fluke that Python Tutor is

55:24 doing so well that you've, you know, all these millions of users who have visited it and benefit

55:29 from it and so on. And the reason it's a bit of a fluke is your day job doesn't incentivize it.

55:34 So let me ask you, what do you think? Do you think it should? I mean, I did mention the Journal

55:39 of Open Source Software as a way to kind of like shoehorn it in a tad. But what are your thoughts on

55:44 things like this and academics and should academics get credit for these kinds of creations? I mean,

55:50 it's clearly a value to the world if 10 million people are understanding code better through it.

55:55 Yeah, that's a great, that's a great question. I mean, that could be its own hour. And I obviously

55:59 get that question a lot. I mean, the question is really about currently academic path, you know,

56:03 kind of research studies and publications and grants, all the more traditional metrics are the main

56:08 thing. And, and I think that things are, you know, broadening out in some fields, right? And, you know,

56:14 in fields like my own and HCI, human computer interaction, user experience, a lot of our things

56:19 are open source products that you write papers on, you do studies on, it's all very symbiotic.

56:24 I think my kind of most clear stance on this is that, you know, research is really about sharing

56:29 generalizable knowledge with the world, right? So like, an example is, if I made Python Tutor,

56:34 I just stuck the code on GitHub and said, here's the link, I think that by itself is not really any

56:38 much generalizable knowledge. It's just a bunch of stuff. But like you mentioned, if I wrote about

56:43 in detail, I showed how it works for users, if I have some general design ideas, you know, maybe these

56:49 blog posts and these things, I think that there is a way to work that into a more scholarly sort of

56:54 portfolio. And I see that now that I'm in, you know, in a more interdisciplinary field, you know,

56:59 we have colleagues in, you know, in visual arts, right? So in UC San Diego, we have a great visual arts

57:03 department, you know, what is their portfolio, right? Their portfolio is, yeah, they write some

57:07 papers, they write some scholarly works, analysis, history, but a lot of the portfolio is my art

57:12 pieces and my film piece. If you're a film professor or music professor, you're doing performances and

57:17 stuff. So I think that the world of science is broadening out its definition more. And I,

57:22 I definitely do see that broadening in the future, but you know, universities and academia is a very kind

57:28 of traditional and slow moving sort of institution.

57:30 It's a big ship to turn. Yeah, that's right.

57:32 So for the time being, you know, I still think that, you know, like the ending of my

57:36 article, you know, I have a lot of students, especially, you know, very programming oriented

57:40 students are like, Oh, you know, I want to be able to build software and stuff. And then I,

57:43 I actually tell them, what's your goal? If your goal is to be working in industry or working open source,

57:46 that's great. But if your goal is still to build up a more traditional academic career,

57:50 at least for the time being, the more traditional research and scientific studies are the way to go.

57:56 And if you can do your science in a way that can foster open science and open source, which,

58:01 you know, some people have done very well, you know, including some people we've interviewed,

58:04 that's probably the most sustainable way, right? That you do your work in the open,

58:08 but you also kind of adhere to more traditional scholarship as well. It's a lot of work, right?

58:12 But it, you know, that that's the most practical way now.

58:15 Yeah, it does seem like it's shifting a little bit, but yeah, I definitely hear where you're

58:20 coming from. I think some of the drivers are, Hey, we have this great paper about,

58:24 we just took the first picture of black holes, but Oh, by the way, we can't actually take the

58:29 pictures. We have to use artificial intelligence to interpret a bunch of different things to actually

58:33 compile the picture. And how can you possibly write your article without somehow talking about

58:40 what you built? There's probably a component that can be extracted out. I think as people are moving

58:44 away from things like MATLAB and other proprietary, you know, SAS and whatever proprietary systems,

58:49 they're starting to get into open source and it just draws them into building stuff that helps do

58:53 their research. And I think that that's eventually that's going to put enough pressure to turn the

58:58 ship a little. Yeah, I think so. And I think that people at the more senior levels can be,

59:03 can have more flexibility and in doing these sorts of things and advocating for that. And I think a lot

59:08 of this stuff works on the top down too. Already we've seen, you know, the top down view is,

59:12 you know, if the funding agency, you know, if the funding agencies say that you must put your code

59:16 open, your data open and stuff, and which some funding, you know, the NIH and NSF, they're starting

59:21 to do that. I think those are great efforts because that at least gets people to think about, Oh, we

59:24 might need to like clean up our data and our code so that other people can use it. And eventually those,

59:29 you know, the stuff will percolate downward.

59:31 Yeah, I agree. And the reproducibility aspect that's becoming a bigger focus, it's always been

59:36 super important, but now it's easy to test whether it's reproducible. So, you know, I think that also

59:41 leads to more of the code being open and whatnot.

59:43 Yeah, exactly.

59:44 All right. Well, this has been a super interesting conversation, but you've got to answer the two

59:49 questions before you get out of here. So if you're going to write some Python code, what editor do you

59:54 use?

59:54 I still use Vim because I learned it in, you know, in grad school and I still use it, even though,

59:59 you know, everybody's on to Visual Studio Code by now. You know, again, just like with the old

01:00:04 technologies, right? I've passed several generations, right? There was, you know, Atom was big, Sublime

01:00:09 was big, and VS Code is now, you know, obviously really big. I still use Vim and that's just what I do.

01:00:15 We talk about these threshold concepts. There's probably some kind of theory about editors as well.

01:00:19 Yeah.

01:00:19 Well, interesting. And then notable PyPI package, something that That's really funny. Yeah. I was trying to prepare for this, you know, last time I gave

01:00:28 this, you know, cop out answer, I'll just Anaconda or, you know, you know, just this package manager

01:00:32 and stuff. It's hilarious because I actually don't write a lot of Python code anymore, right? Because

01:00:36 you know, this Python tutor, it's all JavaScript, it's all web code.

01:00:38 That's right, that you've got to write the front end stuff. Like that's where all the magic is.

01:00:43 And it's not.

01:00:44 That's right. Yeah.

01:00:45 Ironically, I'll plug one thing. I was actually scrambling because I knew you were going to ask that

01:00:49 question, then. I was scrambling to look at this Python code I wrote for something else. This code that I

01:00:53 wrote to, you know, inventory my file system. So I've been basically, you know, doing a lot of stuff

01:00:57 with basically kind of building my own personal archiving and rsyncing, Dropbox kind of thing.

01:01:04 And one of the problems is how do you just crawl through a directory hierarchy super fast when you

01:01:09 have, you know, a million files. And I've actually found this is a plug for upgrading to Python 3. I found

01:01:15 that this is not a package, but you know, the os, the built-in os library, os.walk, and os.walk,

01:01:22 I believe is this thing that's only in Python, I think 3.5 or above. And it's super fast,

01:01:28 at least on the Mac. And I think, you know, it's because it probably uses some system calls or stuff.

01:01:32 And it's notably because I had, you know, a Python 2.7 version before using a scanner or some other

01:01:38 thing, right? And like, if you upgrade to Python 3, 3.5, and use this, this function, you know, os.walk,

01:01:45 it just goes like, it's just like thousands of times faster. And this is, you know, we people talk about

01:01:49 this a lot, right? And like, that's one of the things that gets people to upgrade is that, you know,

01:01:54 if, if something just is slow, and there's a new version of the API, and a lot of things in the new,

01:01:59 in new Python standard library, in the standard library, I guess, are either drop-in replacements,

01:02:04 or a slightly different API, but it just goes so much faster. I just encourage people to look at

01:02:09 that. So this is not a PyPI package. It's just really just...

01:02:11 No, that counts. Yeah, like a useful library.

01:02:12 Yeah, just relooking at the standard library for, you know, things that they optimize. And,

01:02:17 and yeah, that was the first thing that came to my mind, because I was just working on that code.
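To make the directory-crawling point concrete, here is a minimal sketch of inventorying a file tree with the standard library. It is not the script described in the conversation, which is not shown; the function names `count_files_walk` and `inventory_scandir` are hypothetical. One clarification worth hedging: `os.walk` itself exists in older Python versions, and what changed in Python 3.5 is that `os.scandir` was added (PEP 471) and `os.walk` was reimplemented on top of it, which is generally where the speedup comes from. How large the speedup is will depend on the file system and platform.

```python
import os


def count_files_walk(root):
    """Count files under root using os.walk (scandir-backed on Python 3.5+)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def inventory_scandir(root):
    """Return (file_count, total_bytes) under root using os.scandir directly.

    Symlinks are not followed, so a linked directory is never descended into.
    """
    file_count = 0
    total_bytes = 0
    pending = [root]  # directories still waiting to be scanned
    while pending:
        current = pending.pop()
        for entry in os.scandir(current):
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                file_count += 1
                # DirEntry caches file-type information from the directory
                # listing where the OS provides it, avoiding extra system calls.
                total_bytes += entry.stat(follow_symlinks=False).st_size
    return file_count, total_bytes
```

Calling `inventory_scandir("/some/path")` on a large tree and timing it against an `os.listdir`-based crawl is a reasonable way to check whether the gain matters for a given machine.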

01:02:22 Very nice. Very nice. All right, final call to action. People are either teachers or their students,

01:02:28 they're hearing about Python Tutor, they think it's maybe useful. What do you tell them?

01:02:32 I tell them that just go try out the site pythontutor.com. And to go participate in the

01:02:38 community of just, you know, helping each other out and, you know, asking for help and such. And it's

01:02:43 really community driven. So any help I can get on the site would be great. And also, if they want to

01:02:49 integrate it into their teaching materials and stuff. And you know, despite what I say in my article,

01:02:53 I actually really do like hearing from instructors and students in the piles of emails that I get.

01:02:57 So I love hearing about how people are using it. So you know, if you have good user stories, or interesting

01:03:03 stories, I always write, you know, despite again, in my article, I say, Oh, I don't listen to users. But I actually

01:03:08 do listen to all the users, I actually write down all of their notes in the GitHub repo as notes of like,

01:03:13 these are cool future directions, I don't think I have time to do this. But these are awesome suggestions.

01:03:17 So I would just write that down. So it's once you get tenure, or you want to take you have a

01:03:22 sabbatical, and you want to come back and spend some time on it, you can actually harness all that

01:03:26 feedback. That's right. Yeah. I mean, that's just that's always a dream of like, Oh, someday I'll

01:03:30 get to make some of that. And then you just end up getting more busy over time. Because everyone

01:03:34 does. That's just the reality of life. That's how life is. All right, Philip, it's been great to chat

01:03:39 with you. As always. Thanks for being here. Awesome. Thank you so much, Michael. Thank you so much again.

01:03:43 You bet. Bye. This has been another episode of Talk Python to Me. Our guest on this episode was

01:03:48 Philip Guo. And it's been brought to you by Tidelift and Clubhouse. If you run an open source

01:03:53 project, Tidelift wants to help you get paid for keeping it going strong. Just visit talkpython.fm/tidelift,

01:03:59 search for your package and get started today. Clubhouse is a fast and enjoyable

01:04:05 project management platform that breaks down silos and brings teams together to ship value, not features.

01:04:10 Fall in love with project planning. Visit talkpython.fm/clubhouse.

01:04:15 Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10

01:04:21 Apps course. Or if you're looking for something more advanced, check out our new async course that

01:04:27 digs into all the different types of async programming you can do in Python. And of course,

01:04:31 if you're interested in more than one of these, be sure to check out our everything bundle. It's like

01:04:35 a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher

01:04:40 and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes,

01:04:45 the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:04:51 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:04:56 Now get out there and write some Python code.

01:04:58 I'll see you next time.

