#22: CPython Internals and Learning Python with pythontutor.com Transcript

Recorded on Monday, Aug 3, 2015.

00:00 It's time to look deep within the machine and understand what *really* happens when your Python code executes. We're code-walking through the CPython code and visualizing it at pythontutor.com.

00:00 This is episode #22 with Philip Guo, recorded Monday, August 3rd 2015.

00:00 [music]

00:00 Welcome to Talk Python to Me. A weekly podcast on Python- the language, the libraries, the ecosystem, and the personalities.

00:00 This is your host, Michael Kennedy. Follow me on twitter where I'm @mkennedy

00:00 Keep up with the show and listen to past episodes at talkpython.fm and follow us on twitter where we are @talkpython.

00:00 This episode is brought to you by Hired and Codeship. Thank them for supporting the show on twitter via @hired_hq and @codeship.

00:00 Let me introduce Philip.

00:00 Philip Guo is an assistant professor of computer science at the University of Rochester in New York. He researches human-computer interaction (HCI), with a focus on user interfaces for online learning. He is especially interested in studying how to better train software engineers and data scientists. He created a free Web-based visualization tool for learning programming called “Online Python Tutor” (pythontutor.com), which has been used by over 1.2 million people in over 165 countries to visualize over 11 million pieces of code

01:45 Philip, welcome to the show.

01:47 My pleasure.

01:48 Yeah, it's really exciting to have you here. We are going to talk about many things: CPython and a really cool project that you put on your website and on YouTube, called "CPython a ten hour code walk". So we'll be digging into CPython, and we are also going to talk about this thing called "Python tutor" at pythontutor.com that you are working on to help people understand the internals of Python better. So that's going to be great stuff.

02:14 Cool, I'm looking forward to it.

02:16 Yeah, before we get into the details, everyone likes to know how people got into programming and how they got started in Python- what's your story?

02:24 So my story- I was always interested in computers as a kid, like many people who got into computer science, but I never really had a strong programming background until I went to college. I tried to learn QBasic by myself when I was ten, from a how-to book, and I failed after a few weeks because I had no one teaching me. I took an AP computer science course in high school, that was in C++, and that was really fun. That was kind of my first introduction to really doing programming. And, in college, I decided to major in Electrical Engineering and Computer Science.

02:57 And that's when I started just learning programming formally, but really the Python relevance is I didn't actually start hacking for fun, until my senior year of college, and the first language that I learned for programming for fun and not just because I had to do it for class, was actually Python. So, the first kinds of programs I wrote were scripts to manage my photos and kind of manipulate and manage my own personal photo gallery, and put it up on a simple website. So that was where I got started getting hooked on Python, that was ten years ago, that was around 2005, that was like Python 2.4 or something like that.

03:36 Yeah, that's a great way to get started. I think a lot of people have interesting stories like that, you know just they have some small problem they are trying to solve, and it leads you down this path and all of a sudden you discover this world where hey there is this great thing, programming in Python or whatever.

03:53 Yeah. That's exactly-

03:55 So, I see you are calling in from Seattle, what are you doing up there?

03:59 So I am currently an assistant professor of computer science at the University of Rochester in New York, so that's nowhere near Seattle.

04:07 That's what I was going to say, you are not, it's not at all in Seattle.

04:09 So one of the real benefits of being a professor is that your summers are free to do research or to travel or to do other sorts of scholarly work, so most professors usually stay on campus over the summer and do research full time for three months. What I decided to do this summer, since I had some colleagues at Microsoft, was to spend most of my summer at Microsoft Research doing research in both software engineering and online education at the lab in Seattle. And I came here because I actually was an intern here a long time ago, back when I was in grad school, so I am actually back interning in the same group; it's sort of a homecoming.

04:52 Back to the future. That's excellent, yeah. I've done some work with some of the guys up in Microsoft, it's a cool place up there, excellent. Is this related to Python tutor?

05:00 No, not really, this is just a completely separate sort of research project so there is nothing Python related in the work here.

05:10 All right, cool. So let's talk about your CPython internals class; this was the class you did at the University of Rochester, right? In 2014, or at least that was the recorded version, 2014.

05:25 Yes, so this was a class I taught in fall 2014, and the name of the course was "Dynamic languages and software development". I actually inherited this course from another professor who was taking a leave and teaching another class. That class was originally in Ruby, so it was sort of a graduate-level programming language class about these sorts of dynamically typed languages. Originally he did it in Ruby, but since I knew Python a lot better, I revamped the class to be in Python, and basically turned it into what the videos are online. So I am happy to talk about that in detail.

06:01 Just for everyone listening, the videos are online and I actually spent like the last week going to your class, so I feel like I've had like some super intense summer course or something, you know, doing like ten lectures. And people are going to find those on your website, at pgbovine.net/cpython-internals.htm and I actually went through unrelated to this conversation or maybe preceding this whole having you on the show, I just saw your videos and I thought they are awesome, and I put them into a YouTube playlist at bitly/cpythonwalk. So both of those work well.

06:01 What was the main goal of the class, sort of get people to understand what happens when you actually run dynamic code like Python?

06:45 Yeah, I think that was basically the philosophy. So a lot of programming languages classes are taught from more of a theoretical perspective, right. It's usually some formal syntax and semantics, and maybe doing some proofs, and it's very formalism heavy. And I thought it would be interesting to do a very different sort of class for graduate students, from the opposite side, which is something extremely applied, to say, you know, "Here is a piece of Python code, let's start with 'hello world' or a simple for loop or a simple function call", and what actually happens throughout all the steps between that code being parsed and the output appearing on your screen, let's say.

07:29 So, I wanted to dive into the interpreter and show students how everything works under the hood, and by deconstructing it you can show there is really no magic here, there is just a lot of C code behind the scenes that keeps track of a lot of stuff, and eventually your program runs. We don't do the parsing stage, because parsing is fairly standard, and that's covered by most introductory compilers classes: you write a grammar and use a parser generator and some code that gives you an AST, and then that gets walked to turn it into some kind of bytecode. So the class actually starts by assuming you have a bunch of Python bytecode: how does the bytecode actually get interpreted, step by step, by the bytecode interpreter runtime system, to do your program's operations?

08:17 Yeah, that's really cool, and I think you know, if I think about how like C code runs and then my intuition about how that C code actually executes, if you understand a little bit about registers and memory addresses and pointers, your intuition more or less will carry the day, I think. With interpreted languages, all bets are off, right, I mean, you have some concept of the programming language doing things, but then the way that happens, you really have to look inside, right?

08:46 Yeah, exactly, because these interpreted languages are often not implemented the way you would conceptually think of them. You think of something as having frames and variables and pointers to each other, but really the Python bytecode is for a stack-based kind of virtual machine. I think the Java virtual machine is too, but I forgot the exact semantics. It's not something that you would think about normally, but they do it that way because it's really compact, and it leads to really compact code and sort of easy-to-understand code for the implementer. But yeah, that's very different from the conceptual model in your head, at the very high level, of how a program ought to work, and we can talk about that later when we talk about Python tutor as well, because that kind of leads into that other tool. So we can keep talking about the CPython stuff first.

09:32 Sure, so one of the things I thought was interesting was in your very first session, you have kind of a cool whiteboarding thing you are doing with a Microsoft Surface and a pen where you can draw, and that's cool. You do a cool little sketch about what actually happens when you type Python [space] some file.py. And, I mean, on one level I knew it and on the other it was a little surprise for me to see that the first step is compilation. Could you maybe talk just briefly about what happens when I run my Python code, before we get into the interpreter itself?

10:06 Yes, so many people are surprised when there is a compilation step in Python, or in these sorts of dynamic or what people call scripting languages. Because usually you think of running Python [space] whatever, Perl [space] whatever, Ruby [space] whatever and it just runs-

10:21 I just thought, here we go with the interpreter and now it's interpreting, right.

10:27 Right, so with Java or C or C# you have a compilation step and then you run a compiled binary, and those are two separate steps. But with Python, as with many other languages, the compilation happens right before the execution. What happens is, like a standard front-end compiler, it takes the source code and does the lexical analysis, it does the parsing, it creates an AST or abstract syntax tree from that, and then it walks that tree and creates a bunch of bytecode.

10:56 So the Python bytecode language you can read in documentation it has a few dozen operations like add, load, store, and also some operations that are a little bit more Python specific like build the list, build the dictionary, function call, those sorts of things, so the compilation step really takes your source code which is in human readable somewhat human readable form and turns it into a linear stream of instructions, very much like assembly language except that you can think of bytecode as an assembly language for a Python virtual computer.
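The compile-then-run sequence described here can be sketched with the standard library alone. This is a minimal illustration, not a description of CPython's internal code paths: the source snippet and the variable names are made up, and only `ast.parse`, the built-in `compile`, and `exec` are used.

```python
import ast

source = "total = 1 + 2\nprint(total)\n"

# Step 1: lexing and parsing produce an abstract syntax tree (AST).
tree = ast.parse(source, filename="<example>", mode="exec")
first_statement = tree.body[0]
print(type(first_statement).__name__)    # node type of the first statement

# Step 2: the compiler walks the tree and emits a code object holding bytecode.
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__)
print(len(code.co_code), "bytes of bytecode")

# Step 3: only now does the interpreter execute the bytecode.
namespace = {}
exec(code, namespace)
```

Nothing in the snippet runs until the final `exec` call; the first two steps only transform the program from one representation to another.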

11:35 Right, that was kind of the impression I got as well. It's a much, much richer assembly language where you have operations like build class and call method, push and pop stuff off stacks and so on...

11:47 Yep, exactly.

11:49 If we want to go work with this, right, we can go to python.org and download the code and decompress it or untar it or whatever, and it literally is a bunch of C code, right? The C in CPython means here is your C implementation of this interpreter, right?

12:04 That's right, so if you go to- this is what I do on the first day of class, we have everybody download the C interpreter source code from, sorry, the CPython source code from python.org and unzip it and do configure and make. Now, as part of the class, I didn't require students to actually run the interpreter if they didn't want to, because most of the class was actually reading through the code and walking through it. The students who were a bit more adventurous could try to compile the interpreter themselves and then try to put in debug statements or print statements, to see how it works behind the scenes.

12:40 But actually, compiling the interpreter itself might not be easy, especially if you are on, say, a Windows machine, which doesn't have a lot of the development tools and compilers. It's usually easier on Linux and Mac machines if you install the standard developer toolchain with GCC and make and configure and all that stuff. Building is always hard, but in theory, if you do ./configure and then you type make, you actually call the C compiler on your machine and it will compile all the C files, the .c and .h files in the CPython directory, and in the end it will produce a binary executable file called python, and that python you can just run, and that is the Python interpreter that you just compiled from C source code. So for most of the class, what we do is go over what a lot of those C files actually do.

13:35 Maybe you could give us like a ten thousand foot view of what are the interesting parts of that source code and what is just noise and details. So there is like objects, and then there is include, there is like a few really common parts that you come back to over and over and then there is a bunch of details.

13:55 Yeah, so on the website with all the videos I actually show the files that they reference, but really the core file that I keep going back to, like you are saying, is Python/ceval.c. That file, at its core, is the main interpreter. So conceptually, how Python executes code is this: bytecode is just a list of instructions, each one is add or subtract or build list or function call and so forth, and all the interpreter does is go through one instruction at a time, take it off the list of instructions, do something, then move to the next instruction, do something, move to the next instruction and do something else. And it might jump around the stream of instructions if you have, say, a function call or a loop, but really, the main interpreter loop in ceval.c is just a big while-true infinite loop that just-
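The "big loop that dispatches on each instruction" idea can be sketched in a few lines. This is a toy stack machine, not CPython's actual loop: the opcode names (`PUSH`, `ADD`, `MUL`, `JUMP`, `RETURN`) and the `run` function are hypothetical and exist only for this illustration.

```python
def run(instructions):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        opcode, arg = instructions[pc]
        pc += 1
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == "JUMP":
            pc = arg  # continue at another position in the instruction list
        elif opcode == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")

program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("JUMP", 5),       # skip the next instruction
    ("PUSH", 1000),    # never executed
    ("PUSH", 4),
    ("MUL", None),
    ("RETURN", None),
]

print(run(program))
```

The `if`/`elif` chain plays the role of the large switch statement discussed next; the real interpreter has many more cases and much more bookkeeping per case.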

14:55 Yeah, there is like a huge switch statement and it is huge, right?

14:57 That's right. There's like a three thousand or whatever line switch statement. There is a fun fact in there- I don't know if it's in all the versions, but at least in some of the versions I saw there is some kind of comment in there saying that they needed to break up the switch statement in some weird way because some C compilers just can't take switch statements that big. So they had to actually break up the code into pieces because, you know, it wouldn't compile on some computers because that code was just too giant.

15:29 Yeah, that's pretty funny, it's like a 3000 line switch statement. It's pretty cool.

15:32 Yeah.

15:32 But those are more or less the steps that handle all the opcodes, and so if I look at Python, it's not necessarily mapping one to one from the Python code I write to these opcodes, which is a good thing for Python programmers, right? That means you are working in a high-level language, you are not working down in the details, right? But it also means it's hard for me to understand, if I create a class and I say t = new test class, what does that actually mean, how do I line that up? And so you had a cool way to disassemble that, right?

15:32 [music]

15:32 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

15:32 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive 5 or more offers in just the first week and there are no obligations, ever.

15:32 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!

15:32 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

15:32 [music]

17:19 Right. So the disassembler actually comes in the standard Python library, so if you do python [space] -m [space] dis, which runs the dis module, then a space and the name of a Python file, it actually runs the main function in the dis module. What that will do is print out a somewhat human-readable representation of the bytecode, and the cool thing about that is that it shows the line number of which line of your Python source code compiles into which bytecode. And as you mentioned, it's not a one-to-one mapping, so one line usually compiles to several bytecodes because bytecode is lower level.
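A small sketch of the same thing done from inside Python rather than from the command line. It uses `dis.dis`, `dis.get_instructions`, and `dis.findlinestarts` from the standard library; the two-line source snippet is made up, and the exact opcode names printed vary between Python versions.

```python
import dis

source = "x = 1\ny = x + 2\n"
code = compile(source, "<example>", "exec")

# Print the human-readable listing, similar to what `python -m dis file.py` shows.
dis.dis(code)

# Group opcode names by the source line they were compiled from.
line_for_offset = dict(dis.findlinestarts(code))
by_line = {}
current_line = None
for instruction in dis.get_instructions(code):
    # An instruction with no entry of its own belongs to the most recent line start.
    current_line = line_for_offset.get(instruction.offset, current_line)
    by_line.setdefault(current_line, []).append(instruction.opname)

for line, opnames in by_line.items():
    print(line, opnames)
```

Each source line ends up with several opcodes, which is the one-to-many mapping mentioned above.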

18:03 So you can run the dis command, and for the dis module you can just search on your favorite search engine for python [space] dis; you should see the documentation for this disassembler module, which is in the standard library, and that gives you all of the stuff. That said, though, that only prints out the instructions. Somebody made a library called Byteplay, and that library is an enhanced version of the disassembler that lets you get the disassembled bytecode and the objects. You can actually play with it yourself, you can manipulate it, you can take it apart, you can analyze it. So this Byteplay library- I haven't used it myself personally but I know people who really like playing with it.

18:52 Yeah, that's cool. A little more powerful. One thing about the dis module is that it's super easy to look at just sort of flat code in Python files, but if I want to look in the functions, or I've got nested functions and classes, it's a little more work to do that, right?

19:08 Yes. So the default with the dis module is that it just disassembles the top level of your program. All the top level says, if you define a function, is just function definition. Then what you have to do is actually go inside that function and disassemble that function itself, so it is a little bit more hairy, and I don't know if Byteplay handles all that out of the box, but it might. But the main idea is that if you just run the dis module by default, it just disassembles the top-level program, and any functions will not be disassembled by that; you have to actually grab the code of those functions and go in there and call dis on that. So, it is a little bit more tricky to do that.
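One way to "go inside" nested functions by hand is to walk the constants of each code object, since a nested function's code is stored as a constant of its enclosing code. The helper name `iter_code_objects` and the sample source are hypothetical. Note that `dis.dis` in Python 3.7 and later recurses into nested code objects on its own; the sketch passes `depth=0` to switch that off and make each level explicit.

```python
import dis
import types

source = '''
def outer():
    def inner():
        return 42
    return inner()
'''

def iter_code_objects(code):
    """Yield a code object and, recursively, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

top_level = compile(source, "<example>", "exec")
names = [code.co_name for code in iter_code_objects(top_level)]
print(names)

for code in iter_code_objects(top_level):
    print(f"--- {code.co_name} ---")
    dis.dis(code, depth=0)  # depth=0: disassemble this level only
```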

19:51 Sure. The other thing I thought was interesting is if I've got a function, let's say foo, in Python I could say foo.func_bytecode- the bytecode is actually there, on the function, and you could look at it in its encoded form, which is kind of some binary string type thing, and you can also disassemble that as well, right?

20:16 That's right, and the idea is that dis itself, if you just run it, disassembles the bytecode of, I guess, the top-level file, but each function itself has its own code. For what you said, I think the name of it is actually different in Python 2 and 3; I think in one version it's on the function object as func_code, and in the other one it's just like .code or something like that. But the idea is that the code of the function just appears inside of it as a binary string of data, so if you actually print it out it just looks like some garbled string, but if you run it through dis, it actually shows you the bytecode of the function. So the function object is some context plus an actual string of bytecode that represents the instructions that the function is supposed to execute when you run it.
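For reference, a short sketch of where the bytecode lives on a function object. In Python 3 the attribute is `__code__`; Python 2 spelled it `func_code`. The function `foo` here is just an example.

```python
import dis

def foo(a, b):
    return a * b + 1

code = foo.__code__          # Python 3 name; Python 2 used foo.func_code
print(type(code.co_code))    # the raw bytecode is a bytes object
print(code.co_code)          # looks like garbled binary data when printed
print(code.co_varnames)      # local variable names, part of the surrounding context
print(code.co_consts)        # constants referenced by the bytecode

dis.dis(foo)                 # the same bytes, decoded into readable instructions
```

The code object bundles the raw instruction bytes with the context needed to interpret them, such as names and constants.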

21:14 Yeah, the other thing I thought was pretty cool, or interesting to understand, is that compile step that you talk about, right? When I run python on my Python file, first I get the compile step to bytecode and then the dynamic interpreted execution, but all those functions, all that bytecode is there and ready to roll, it's just not wired together until it gets to the interpreter, right?

21:39 That's right, so you can actually compile- I think the Python interpreter does the compiling and running all at the same time, but I think there is actually a mode in Python where you can just compile the bytecode and not actually run it yet. I'm not sure exactly- but sometimes people actually ship precompiled Python bytecode instead of the source code. I don't know for what reason people do this, because you can just run the source code; some people like to obfuscate with the bytecode maybe, but I don't know how well that actually works because you can kind of reverse engineer it. But yeah, the compile step is completely separate from the running step and, like you said, once you compile it, instead of a text file, a .py file, it's just a bunch of garbled stuff, and then that garbled stuff you can just run through the interpreter and it'll run your program.
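Compiling without running can be sketched with the built-in `compile` and the `marshal` module, which serializes code objects to bytes. This is an in-memory illustration with a made-up source string; the standard library's `py_compile` and `compileall` modules are the usual way to write actual `.pyc` files, and marshaled bytecode is only expected to load on the same Python version that produced it.

```python
import marshal

source = "greeting = 'hello ' + 'world'\n"

# Compile only: nothing in the source has run yet.
code = compile(source, "<example>", "exec")

# Serialize the code object to bytes, roughly what a .pyc file stores after its header.
blob = marshal.dumps(code)
print(type(blob), len(blob))

# Later, load the bytes back and execute them without the original source text.
restored = marshal.loads(blob)
namespace = {}
exec(restored, namespace)
print(namespace["greeting"])
```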

22:41 Yeah, and it's really interesting to see how it's all coming together. What do you think some of the main reasons for studying Python at this level are, and how does it make you a better programmer? What do you think?

22:53 That's a good question. I think that studying Python at this level, at the implementation level, makes you a better programmer in that it builds a really good mental model of what goes on behind the scenes, and you see that these languages are just tools made by people. And there is something really powerful about that; I feel this is a very systems kind of perspective on programming. One analogy is why people study, say, operating systems or compilers. The classic thing in colleges is that a lot of people take an operating systems course where they build a very simple sort of kernel in C and maybe some assembly, and their kernel runs and does a simple hello world, or you do a compilers course where you build a compiler using some basic building blocks.

23:49 And the idea there is not that you are ever going to build an operating system or a compiler in real life, or implement a new programming language, but by studying the principles behind how it works, I feel like it makes you a better programmer in that you understand how large complex code bases are organized and logically broken down. So I view this class, like you've seen in these videos, as more of a code reading or literature exercise in a way, because we are actually reading through dozens of- actually not that many, maybe a dozen really core complex files, and seeing how the pieces fit together. So it's sort of like dissecting a large piece of code, and I think that's really interesting.

24:39 Yeah. A lot of people in school, at least studying this stuff, find it all very abstract, meaning that- that's not quite what I'm looking for, but- it doesn't have the nitty gritty details of the real world applied to it, all the error conditions that are so bizarre, and all the optimizations; you don't necessarily have to deal with that. And so when you do finally get to a real-world complex code base, it's super hard to feel comfortable, and I think you helped your students do that a lot in there, so that was cool.

25:09 Yeah, there is always a trade-off, even in my choice of what to cover in this class. If you noticed, I only covered maybe a dozen or so files, but the Python code base has hundreds or thousands of source code files. Obviously I don't have time to cover all that, and I feel like these dozen are really the conceptual core of the interpreter. A lot of the files are just modules, right? A lot of the files are just: here is how strings are implemented, here is how the socket class is implemented, here is how memory is implemented. Those are all, I feel, auxiliary things, but the core thing is, you know, what is an object, what is a class, what is a function, what is the interpreter. And even then, as you noticed from watching the videos, I don't go over every single line in excruciating detail; I basically gloss over things, like, "look, this block happens if there is some kind of error, you ran out of memory, so you know, look at that in your spare time". There is a balance of exposing students to the nitty gritty, like you said, but also not too nitty gritty, because there is so much complexity in the code.

26:22 Yeah, a lot of times as programmers, to be effective, we have to kind of zoom in, look at the tree, zoom out, look at the forest, zoom back in on another tree, and that scale of going in and out is pretty awesome.

26:22 We talked about the opcodes and ceval.c, that C file that has the main eval loop running around and around; that's one of the main architectural pieces of CPython. Another one that struck me was that everything is this type of C object called PyObject. Everything is a PyObject, right? Pretty much. So numbers, strings, custom classes, those all kind of make sense, but even the class definition itself, functions, methods... that was really interesting to me. And then we have derivatives of those, things that have PyObject kind of as their base class, like PyIntObject for int, PyListObject for lists and so on. But C is not an object-oriented language, so how does that work?

27:30 Right. So, like you mentioned, the PyObject structure in C is the base of how all the objects are implemented in Python, and what it contains is actually just a few basic pieces of data. I think the most basic one is a reference count of how many pointers are pointing to this object at once, because Python implements garbage collection by doing reference counting, so if you have nobody pointing to you then you get garbage collected and your memory gets reclaimed. Everything is conceptually a subclass of PyObject, so if you want to make an integer object it's a PyIntObject, if you want to make a string it's a PyStringObject, if you want to make a function object it's a PyFunctionObject... And, like you mentioned, C is not an object-oriented language, so there is no inheritance in the language, but really, you can fake it by doing what's called structural inheritance or structural subtyping. It's a hack where you basically create a struct whose first few elements are exactly the same as the base class. So basically the PyIntObject, the first whatever-
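Reference counting can be observed from Python itself with `sys.getrefcount` and a weak reference. This sketch assumes a standard CPython build, where an object is reclaimed as soon as its last strong reference goes away; other implementations and build variants may report different numbers or reclaim later. The `Node` class and `holders` list are hypothetical names.

```python
import sys
import weakref

class Node:
    """A hypothetical class used only to have something to count references to."""

node = Node()
before = sys.getrefcount(node)

holders = [node]                 # one more reference, held by the list
after = sys.getrefcount(node)
print(after - before)            # how many references the list added

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(node)
holders.clear()
del node                         # the last strong reference is gone
print(probe() is None)           # True once the object has been reclaimed
```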

28:49 What is it, the type, like the class type, and then the ref count like you are saying, right?

28:55 That's right, those are the two things in PyObject: there is the pointer to a tag saying what type it is, and then there is the number of references. So every struct that represents some kind of Python class starts with those two things, and the cool thing there is that if you have C code that expects a PyObject pointer and operates on it, it knows that the first thing it accesses in memory is the type and the second thing is the reference count. So all of your code will work perfectly fine if it's an int object or a string object, as long as the function you are passing it to expects just the base class of PyObject. So conceptually it's just subclassing or subtyping, but that's how it ends up being implemented in C. And actually, how C++ does subtyping, I think, in its most basic form is basically that, because C++ is meant to be somewhat backward compatible with C. So this idea of piling one class on top of another structurally, with the fields in the same places, is a pretty classic technique.

30:10 Yeah, you kind of have to really understand C pointers pretty well to get it, but once you do, it's pretty straightforward, right? Because when you have a pointer and you dereference that pointer and say name, that really just maps to an offset from the base address, so as long as they have the same shape up to that point in memory, you basically have inheritance, right? That's cool.
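The shared-prefix trick can be imitated from Python with `ctypes` structures. The struct names `BaseObject` and `IntObject` and their fields are hypothetical stand-ins, not CPython's real definitions. As far as I know, standard CPython builds declare the reference count field before the type pointer, but the order does not matter for the technique as long as every struct repeats the same leading fields.

```python
import ctypes

class BaseObject(ctypes.Structure):
    """Hypothetical stand-in for a common object header."""
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("type_tag", ctypes.c_int),
    ]

class IntObject(ctypes.Structure):
    """Hypothetical 'subclass': same leading fields, then its own data."""
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("type_tag", ctypes.c_int),
        ("value", ctypes.c_long),
    ]

def describe(base_ptr):
    """Code written against the base layout only."""
    base = base_ptr.contents
    return base.refcount, base.type_tag

number = IntObject(refcount=1, type_tag=7, value=42)

# Reinterpret the IntObject pointer as a BaseObject pointer, as a C cast would.
as_base = ctypes.cast(ctypes.pointer(number), ctypes.POINTER(BaseObject))
print(describe(as_base))

# The shared fields sit at identical byte offsets in both structs.
print(BaseObject.refcount.offset, IntObject.refcount.offset)
print(BaseObject.type_tag.offset, IntObject.type_tag.offset)
```

`describe` never learns that it was handed an `IntObject`; it only reads the fields both layouts have in common, which is the whole basis of the technique.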

30:10 [music]

30:10 This episode is brought to you by Codeship. Codeship has launched organizations: create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with Codeship's new organizations plan.

30:10 And, as Talk Python listeners you can save 20% off any premium plan for the next 3 months. Just use the code TALKPYTHON.

30:10 Check them out at codeship.com and tell them "thanks" for supporting the show on Twitter where they are at @codeship.

30:10 [music]

31:23 Yep, exactly. And that's another kind of side effect of studying the implementation: most implementations are in C, so you get to see these interesting C tricks and see how other language features, like object-oriented programming, are built on top of that.

31:41 Yeah, it's cool. I definitely have a better appreciation for macros after spending ten hours looking through that. I did a lot of C++ but not a lot of pure C, so some of the tricks you might do differently in C++ have really nice macro solutions, so that's cool.

31:41 Your other project, Python tutor at pythontutor.com- what's the relationship to this? I mean, certainly pythontutor.com helps you understand what's happening in memory inside your Python code, so I kind of see these two projects that you have as somewhat related. Maybe you could just introduce Python tutor for everyone and then we could talk a little bit about it.

32:25 Sure. So Python tutor, at pythontutor.com, is a web-based tool where you can write Python code, and actually, now you can write code in a lot of other languages. It supports Python, Java, JavaScript, TypeScript, which is a Microsoft version of JavaScript that works very well, and also Ruby. What you do is write code in your browser and run it, and it sends your code to a server to run in a sandbox, so it actually runs a real version of the language and not some kind of JavaScript simulation of it. So it runs the code, and it sends back the execution trace, which is everything that happened when your code ran.

33:07 What it did at every step, what it printed out, what the variables are, what the data structures are... And then it produces a visualization of every step of the code execution that you can step through, and you can use the slider to go through it and see the variables being created, the function stack frames being created, the pointers. And what that does is let beginners especially build up a mental model of what is going on inside of their program.
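The idea of an execution trace can be sketched with `sys.settrace` from the standard library. This is not necessarily how Python Tutor is implemented; `record_trace` and `sum_to` are hypothetical names, and the sketch records only line events and local variables for a single function.

```python
import sys

def record_trace(func, *args):
    """Run func(*args) and record (line number, local variables) before each line executes."""
    steps = []
    target_code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target_code:
            return None              # ignore frames of other functions
        if event == "line":
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer                # keep tracing inside this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)           # always switch tracing off again
    return result, steps

def sum_to(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

result, steps = record_trace(sum_to, 3)
print(result)
for lineno, local_vars in steps:
    print(lineno, local_vars)
```

Each recorded step is a snapshot that a front end could render as one position of a slider.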

33:37 Because, even for code for experienced programmers we actually build up this model, we look at the piece of Python code and we think in our heads, "oh there is a variable here that's pointing something else here and that's pointing this other thing here and we call a function and that function points to the same thing we do". But those structures are really hard for beginners to build up in their heads and this tool has just been really helpful for a lot of people to build up that model.

34:02 And the relationship between that and the CPython stuff is actually very interesting, because the CPython stuff is really for advanced learners who want to learn how things really work behind the scenes. And like we mentioned earlier, the Python tutor is, I think, more useful for most people, because it draws the picture of what happens at the conceptual level. Conceptually, all you want to think about is that you run every line of code and something happens; you don't need to know about the bytecode, or the stack, or the main interpreter loop, or PyObject, or anything like that. So I think those two are really complementary: one is for advanced programmers who want to study internals, and the other, Python tutor, is for beginners who are just learning the language.

34:46 Yeah. That's for sure, I kind of saw it the same way. I feel like, you know, there is sort of this understanding of the thing that is CPython, and Python tutor is a great way to help beginners kind of form good mental models, and your CPython walk is really good, it actually shows super deep understanding, but they kind of give two perspectives on the same thing. So even though I've been doing Python for a long time, and I know C very well and C++ very well, I still thought that Python tutor has some really great visualizations for showing basically variable scope and things like that, because that can be kind of hard to understand for beginners. Because it's not just, well, it's in the curly braces, and so when it leaves the curly braces this variable is gone; there is a whole different mechanism for finding what is defined where and so on...

35:37 Right. That's right. And also, with the nested scopes and closures in Python, that gets even more tricky. So the Python tutor has a way of visualizing your parent frames. For example, the classic case is if you define a function within a function, that inner function has access to the outer function's variables as well as the global variables. And it gets even trickier when you have a function foo, and inside foo you define bar, and bar accesses something within foo, but then foo returns bar to its caller and the stack frame of foo is gone. But when you call bar again, you can actually still get back to the variables that foo had, even though foo has finished executing. And the Python tutor and these sorts of tools visualize that for you. And it's been used by quite a few classes, especially to teach things like nested functions and closures, which are not as obvious and are more advanced concepts.
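
The foo/bar case described here can be sketched in a few lines of Python. The variable `count` and the name `counter` are illustrative choices, not something from the conversation:

```python
def foo():
    count = 0  # local variable that lives in foo's frame

    def bar():
        nonlocal count  # bar reads and rebinds the variable owned by foo
        count += 1
        return count

    return bar  # foo returns bar to its caller; foo's frame is now finished


counter = foo()
print(counter())  # 1
print(counter())  # 2

# The variable survives in a closure cell attached to bar, not on the stack.
print(counter.__closure__[0].cell_contents)  # 2
```

Pasting something like this into a step-by-step visualizer shows the parent frame staying reachable from `bar` after `foo` has returned.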

36:34 Yeah, I do professional training for Python and other technologies as well, and I was thinking I will probably pull that out when it gets to the scope stuff for students, just because I am teaching a lot of guys who have done C++ or .NET or something like that, and their mental model is just not appropriate, right? Seeing it is a lot easier than spending 5 minutes talking about it and writing some demos, so I think that's really cool. I think it can help a lot in those areas as well.

37:00 Yeah, definitely, please use it and let me know if you have use of them. It's pretty robust at this point. I mean, the thing is, it does require an expert such as yourself to guide people through. It's helpful for people by themselves, but what instructors like yourself do is just pull up a browser and start writing code, running it, and explaining the code to the students one step at a time. That is a lot more useful, I think, than starting a terminal, because the alternative now is you start a terminal, write a function or a nested function or whatever, and then put a bunch of print statements inside, and then you just run it in a terminal that prints a bunch of stuff and you are like, "ok, I've got to explain why it's printing this". In Python tutor it is printing to the web terminal, but also at every step you see, "oh, it's printing this because x is now pointing at this, and now x points at something else and it's printing that..." It's extremely clear-

37:52 Yeah, it is very clear and it's like if you were to do your terminal example and then go over to the whiteboard and sketch out what's really happening as you try to describe it like Python tutor just does that drawing for you, right.

38:04 Exactly, so the exact use case is what you said: it really replaces a combination of a text editor, an interpreter terminal session like a REPL, and a separate whiteboard, all in one. And I thought it was really interesting that you mentioned the .NET developers switching to the mental model of Python. A funny story about this is that recently I wanted to learn Ruby. I've always wanted to learn Ruby, I'd never done it before, and I felt a good way for me to learn Ruby was to actually write my own Ruby backend for the Python tutor. So the Python tutor is actually a language-independent interface. You'll notice that nothing about the visualizations is specific to Python: there are variables and stack frames, and functions, and lists and objects with attributes and stuff, and you can imagine, squinting, that it makes sense in another language like JavaScript or Java or Ruby. So if you actually write a backend, say in Ruby, by hooking into the Ruby debugger and printing out what happens at every step, you can actually generate visualizations for Ruby. So I spent about two weeks really deep diving into the Ruby language implementation and debugger and how it works, and I actually created the Ruby backend, which is live on the site. And that gave me some really interesting revelations about how scoping especially works in Ruby. Have you done Ruby before?

39:31 I started to learn Ruby on Rails a little bit, played around with it, never really got very far with it.

39:38 Yeah, we can do a whole other podcast on that. I think for people who have done Python for a long time, when we learn Ruby it just seems crazy and weird, and I'm sure Ruby people say the same thing about Python. The scoping is really weird, so I have an example of Ruby scoping, something that looks really basic in Python. A classic example is if you have what looks like a global variable, like, you know, x=5, and you define a function in Ruby, inside that function you cannot access the global variable.

40:10 Ok.

40:10 It's like insane. And it's because when you define a global variable, it's actually not a global variable, it's a local variable in that scope. And when you define what looks like a function, that's actually a method on the default object, which is outside of the scope of your normal thing. So it can't actually access what you think of as a global. So if you think of things in the Python way and you do that in Ruby, it gets super confusing in terms of scope. But the Python tutor actually draws that all for you, and it's like, "oh, I can see why I can't access that variable", even though I thought I could.
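
For contrast, here is a small sketch of the Python side of this comparison, where a function can read a module-level name directly. The function names are hypothetical and chosen only for illustration:

```python
x = 5  # module-level (global) name


def read_global():
    # No local x exists, so the lookup falls through to the module-level x.
    return x


def shadow_global():
    x = 10  # assignment creates a new local x; the global is untouched
    return x


print(read_global())    # 5
print(shadow_global())  # 10
print(x)                # 5
```

A Python programmer carrying this model over would expect the equivalent top-level `x` to be visible inside a Ruby method, which is the expectation described above as failing.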

40:45 Yeah, and in that way I think Python tutor is really interesting for experienced developers, because we have these really strong mental models, but they are not necessarily portable, right? You can't just plug them into different situations, and so seeing the differences might just quickly connect those two together.

41:03 Exactly. And I think that's why I've been extending it to other languages, because I want this tool to be useful not only for someone who is an absolute beginner, who has never learned anything, but also for experienced programmers. And one of the future pieces of work I would love to get into, if I have time, is to kind of make the bridge between different languages, now that Python tutor is this nice visualization that is pretty much language agnostic. One awesome thing would be, if I'm a Python programmer and I want to learn Ruby, I want to write some Python examples and then see some equivalent or similar Ruby examples and step through the code and see, "oh, this is how you do lambdas or nested functions in Ruby, it looks kind of like Python". I think that with this tool, because it's on the web, people can build up interactive examples showing how different languages differ from each other.

41:52 Yeah. I think that's actually really valuable. Although you might need a 301 redirect to a language tutor or something like that.

42:01 Yeah, the name is funny. The name kind of stuck, because it started with Python. I debated changing the name, but the domain is really good, and it's pretty highly ranked, and a lot of people know it, so... I think it just might be an inside joke. I will just have to stick with Python for now even though it supports these other languages.

42:20 Yeah, yeah, of course, it's cool. So, that's all really well and good. Before we move on, let me go back to one other thing that I think is really helpful, even in this mode that we have already spoken about (we'll get to the other modes that are available in Python tutor as well). One thing I thought was really cool is forwards and backwards execution. So I'm a huge fan of PyCharm, and PyCharm has really nice interactive debugging, and speaking of Microsoft, you are somewhere physically near the Visual Studio guys, and they have Python Tools for Visual Studio, which has nice interactive debuggers. But going back in a debugger is not the same thing as actually going forward and in reverse in time, and I think that's one of the things that's cool about Python tutor: I can run forward, "oh wait, I didn't understand what happened, let me go back 3 steps, forward 2 steps..." That's a really cool feature.

43:09 Yeah, I know that's one of the key features. People have been trying to do reverse debuggers in production for a long time, and there are actually some teams doing stuff for various languages. I mean, in production it's really hard to do, but in this educational case the forward and backward is sort of a trick, because what happens is the whole program has already executed by the time you see it in the browser. So when I'm scrubbing forward and backward, I'm just looking at different pieces of the log of what the program has already done. So that's really nice because it allows you to go forward and backward. It's not like I have to re-execute; the whole program is done, I am just seeing what it did at step 1, what it did at step 2, and such.
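
The "trick" described here, stepping through a finished log rather than re-running anything, can be sketched as a cursor over a list. The `TraceViewer` class and the hand-written `steps` list are hypothetical, purely for illustration:

```python
class TraceViewer:
    """Read-only cursor over an execution log that was recorded earlier."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0  # index of the step currently shown

    def current(self):
        return self.steps[self.position]

    def forward(self, n=1):
        # Moving forward only changes the index; no code is executed again.
        self.position = min(self.position + n, len(self.steps) - 1)
        return self.current()

    def backward(self, n=1):
        # Going back is equally cheap: clamp at the first recorded step.
        self.position = max(self.position - n, 0)
        return self.current()


# A hand-written log standing in for what a server might send back.
steps = [
    {"line": 1, "locals": {}},
    {"line": 2, "locals": {"x": 1}},
    {"line": 3, "locals": {"x": 1, "y": 2}},
    {"line": 4, "locals": {"x": 4, "y": 2}},
    {"line": 5, "locals": {"x": 4, "y": 2, "z": 6}},
]

viewer = TraceViewer(steps)
viewer.forward(4)          # run to the end
viewer.backward(3)         # "let me go back 3 steps"
print(viewer.forward(2))   # "forward 2 steps"
```

Because the viewer never touches a live interpreter, it is naturally a read-only view of what already happened.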

43:49 Right. That's awesome. Yeah, there is no edit-and-continue or dragging the execution pointer to skip this if check or something like that.

43:57 That would be cool, but yeah, it's a read only view, it's done and this is the program you wrote, let's just see what happens every step.

44:05 I think that's amazing. So, the other thing that you can do, that you seem to be building up more and more is to bring other perspectives into this. So if I go into Python tutor and I go click some stuff it's like an automated system, showing me stuff, but if I want to sit over the shoulder of somebody and help them understand, or if I was teaching a class and a bunch of people doing it, you've got tools for that as well, right?

44:27 Yes, so the media tool is on the site right now. If you go to the Python tutor site there is a "start share session" button on the upper left, and what that does is create a unique URL that you can send to your friend or your tutor, and when they join that URL they actually get into your session, so it's like you are both virtually in the same session, you are sharing it. So you can write code together just like you are in Google Docs, and then when you run the code, your visualizations are synced up, so when you step backward, the other person's screen also steps. And then there is a chat box so you can talk to each other. And you can also see each other's mouse cursor.

45:11 So that simulates the experience of, say, a tutor and a learner getting together and sitting side by side and trying to work out a piece of code together, except you can do this anywhere and remotely. We've deployed this for about a year now, and hundreds of people have used the service to do tutoring remotely, just saying, you know, "I sent a link to my tutor and they can tutor me and I don't have to be in the same room with them". But it is also being used for collaborative learning, which is really neat, since this chat room supports arbitrary numbers of users. In reality, after you get more than 4 or 5 it gets confusing, because so many people are trying to code together and chat and it just...

45:52 So it's a storm of little mouse cursors.

45:55 Exactly. Because you see everyone's cursor. So I've seen people with 3, 4 maybe 5 people kind of talking about stuff, so that's sort of the tool that simulates a kind of personal interaction with the visualization.

46:12 Then you have another form that is sort of almost a dashboard of many learners, right?

46:17 Right. So then there is another tool that I've been building, that isn't exactly live on the site yet because it's really beta, for tutors only, and what it does is solve the problem of there not being enough tutors. So imagine a large real class: in a college class you might have 50 students in a computer lab and one tutor or TA there, and what the TA has to do is run around the computer lab helping everybody as people raise their hands, helping one person, then someone else raises their hand, and going around to help another person... In an online course, there may be a thousand students for every TA on the course, and there is no way they can help everybody at once, obviously. So what I've done is build a dashboard that shows a tutor or a teacher, in real time, what a lot of students are doing at the same time on a website like the Python tutor.

47:12 So this dashboard can show up to dozens of people, and each person's actions are in a little tile, so it's like you have dozens of little rectangular tiles in a big dashboard on your monitor, and each one is updating in real time as the student is editing code, or running code, or seeing compiler errors. So then as a teacher you can glance at it, and it's sort of like you are looking over the shoulders of, say, 20 or 30 students at once and seeing at a glance what they are doing. And most of the time students are just coding along, or they are paused, or they are thinking, but sometimes you see a student who keeps getting the same compiler error, or you see a student changing their code back and forth who seems confused, and in that case you can start a chat with those students directly in the tile.

47:58 So you can chat with any number of students you want, and each chat shows up directly in their coding session. And as a tutor, because you have this dashboard, you can chat with many students at once. The reason this works really well in practice is that a lot of students are just paused or thinking, so you can jump in to help, say, 3 or 4 or 5 students at once. It's not like you are chatting all the time: you are giving them a suggestion or a hint or something, and they go on and do some work and you go help someone else. You can do all that from the comfort of your own home without having to run around a giant computer lab. Or if you are on an online course, you don't know where the students are, they are all over the world, but you can just sit there in one central location and help up to dozens of people at once.

48:44 Yeah, that's really awesome. So you've got all that and a blog post is coming up pretty soon, right? Do you know when is that coming out?

48:50 Yeah, so I am writing a blog post. I don't exactly know when it is coming out, hopefully in the middle of the month. I'm still kind of shopping it around to different folks and seeing where I can get it published, and the research papers on these projects are coming out soon on my website as well. So all of these are along the lines of my research projects, all of which are around the theme of how you build better interactive tools for teaching programming.

49:18 Awesome. Speaking of interactive tools, we had Brad Miller on from Interactive Python, and you guys are doing some work together as well, right, they are doing something with Python tutor to integrate that?

49:32 Yes, so Brad Miller was one of the first users of the Python tutor back in the day. I started this project about 5 years ago, as a graduate student, and it had zero users. It was just something I did, like many hobby projects, like many of these open source projects. I was teaching a bit of Python, I wanted to create some visualizations to help me with that, and it was a fun thing to do. I had this project online for about a year or two and no one really used it. I showed it to some friends and colleagues and they thought it was a cool hobby project, but then around 2011 Brad was starting to build some digital textbook resources. He had been an author, he's been a professor for over a dozen years, and he had written some Python textbooks.

50:22 And he was trying to experiment back then, in 2011, with putting his Python education materials online in a digital format. And because he is really innovative, he was thinking, "I don't want to just put some text and code online, because then it's no better than reading a book except you are on the computer". It's better only in the sense that it's free. Which is cool, that's already great, I mean, having a free digital textbook that's open source is great, because many more people can read it. But he was saying, "if we are already on the computer, can't we do something more interactive?" So he found my Python tutor project and actually challenged me to try to make it so that you can embed it within other web pages. And theoretically that was possible because it was just a web-based interface, but, you know, I had to write a bunch of code to get it so that it can embed within other web pages.

51:17 And we did that, this was 4 years ago, and I've been working with him ever since, on and off, to embed the Python tutor in his pages. If you actually look at his interactivepython.org digital textbooks, throughout the textbook you'll see little widgets here and there that show a piece of Python code with a slider, and then you can just slide and the visualizations appear. And then if you hit edit, you can actually edit the code in the Python tutor and see the visualization. That has been tremendously helpful for students, because they can not only read the code, they can actually see what is going on. And the cool thing is that it all happens within the context of their normal interactive textbook. So yeah, Brad is an early power user and long-time power user of the system, so I am glad you had him on the show...

52:01 Yeah, that's awesome. And the students can even kind of like customize the code samples and save them and stuff, right, so that's really neat.

52:09 Yeah, exactly. And that's the cool thing about being online with the Python tutor visualizations. You can imagine someone making a visualization, like an instructor using PowerPoint to very carefully draw out pointers and data structures, and that's all good, but what if as a student you are like, "wait, I want to change this code to make it go backwards or something"? The cool thing with having a real tool is that you can just change your own code and see the new visualization instead of just seeing what the teacher imagined.

52:41 Yeah, that's wonderful. Because that playful exploration is sort of the key to becoming a good programmer I think.

52:48 Exactly, and on a more philosophical level, that sort of tinkering mindset, like you said, is key. I mean, on one hand it's really important to understand fundamental principles like variables and scoping and functions, but as you know, software is really a craft. It's just like woodworking or being a carpenter or something: yeah, you learn some of the basic physics and material science behind it, but really, you can't become an expert woodworker by just reading a bunch of books. You have to start tinkering and making mistakes and, you know, bruising your hands. And it is very similar here: you have to write a lot of code, play around, see a bunch of errors, and just build up this intuition about how things work behind the scenes, and hopefully the visualizations help scaffold that learning.

53:38 Yeah, I think that's absolutely what they do, so, very cool project. I think Philip, that might be a good place to kind of wrap it up. Before I let you go I am going to ask you two final questions I always ask a guest. First of all what is your favorite editor?

53:51 Starting with religious wars here. I've used-

53:59 Hey, there's no judgments passed.

53:59 I started as an Emacs user in college for about a year or two, but then I saw the light and switched over to Vim a few years ago. So, I've been using Vim for the past decade or so. And I don't have any good reason for doing it beyond just muscle memory, so I am not dogmatic about text editors. I just think you should pick one that you work well in and get really good at it, whatever it might be.

54:26 Yeah. Learn it so that it really just becomes comfortable, that's important. So the other question is, there is a ton of stuff out on the Python Package Index. Have you got any notable favorites out there? Things people should know about?

54:41 I think that one of the really useful Python packages is more of a metapackage. There is a company called Enthought, which started as the scientific Python company, and they make some packages; there is one called "Enthought Canopy". And then there is another one, I don't know which company does it, but it's called "Anaconda". I think it might be Continuum or Enthought. But look up Canopy and Anaconda: those are really all-in-one metapackages for installing a hundred or so Python packages in a nice one-click installer. It's mostly meant for scientific programming, so it has things like the IPython notebook, and NumPy, SciPy, Matplotlib, all the scientific packages, but also a lot of stuff for data science, for data processing and analysis. The reason why I suggest those, especially for beginners, is that they have a one-click installer for Mac, Windows and Linux. One of the annoying things about starting up with any kind of new language is just having to install extra packages. You know, people who are a bit more savvy can use PyPI and easy_install, pip, those things, but sometimes they get annoying, as you try to install something and it says some dependency is not found, or your operating system doesn't have this compiler or stuff...

56:00 vcvarsall.bat not found.

56:03 Exactly. Especially on Windows, right, development is hard. So the one-click installers, Canopy and Anaconda, have a company backing them, and I think there are free versions. I just want to get past that headache and just get to the programming.

56:17 Yeah, especially if you are doing data science and you are doing it on Windows, because it is super hard to get some of the things to compile over there.

56:26 Yeah, and, you know, people have been talking in Python user groups and keynotes at places like PyCon about how, if we want to improve Python exposure, we need a better story on Windows, because the current story on Windows is that it's pretty hard to get going. But these one-click package installers are going to help. And I'm hoping in the future, as web and cloud stuff gets better, more of this could be hosted in the cloud. So imagine a web-based Python cloud service where the web IDEs are so good (I bet some companies have already started to do this) that you can just have a web-based IDE for Python, which is really responsive, and your stuff just runs in the cloud. It has every one of the thousands of libraries available, you don't even have to worry about installing, you just import whatever you want and everything works. I mean, that's the dream.

57:11 Yeah, that's definitely a cool dream. There is a company called PythonAnywhere, and they started down that path. It's not quite that far, but it is a pretty cool thing, and it's free for people to try.

57:11 All right, so Philip, awesome conversation, thank you so much for being on the show. Is there anything that you would like to talk about, to tell people about, that I forgot? A final call to action?

57:34 Final call to action, well, this is high pressure here. I would say my call to action is to go to YouTube and watch some of the videos that people have put up, especially talks. PyCon, which is the main Python conference, has some great keynote talks that are just amazing, and they are really good about putting up talks publicly. I mean, they have hundreds of talks on all sorts of topics, and they are well produced too, both PyCon and also the affiliates in different countries. I just listen in the background if I am working or doing errands, both to the keynotes, which are more high level about where Python is going, and also to very detailed things, like if I want to learn about networking in Python or data science in Python. I think those videos are amazing, and there are so many of them, I mean, it's just like thousands of them.

58:34 Absolutely. I totally second that. That's really a good suggestion. And on a more focused level for this conversation, I really recommend people go and watch your ten hour code walk through the CPython code base. You'll absolutely learn something no matter what your experience level is, it's very cool. So, check that out.

58:54 Great. Well, thank you very much for promoting that, and I am really glad I made these videos. I didn't plan on producing those videos, I mean, those were just part of my class, and the great decision I made was that I just turned on the screen recording capture. I have a Microsoft Surface tablet, and that allows me to do a lot of the drawings and the more interactive things, but really what you are hearing is exactly what I gave in those ten lectures or so. I did some light editing at the beginning where people were setting up in class, but you'll notice the audio quality isn't amazing. That's the one downside, because I didn't have a nice mic, I was basically just teaching in front of the class. But I feel like maybe another call to action is, if you are giving a lecture or giving a talk about something, just record it and put it online. The quality doesn't have to be amazing. With the CPython walkthrough, it wasn't like I pre-planned to go to a studio and spend 10,000 dollars making some high quality production. I just recorded this as part of my class and I added a disclaimer that this is kind of rough, in some parts I'm stuttering, I'm kind of backtracking over my mistakes, but it's great to have it as a resource out there. And something like Khan Academy, which is really famous now, with Sal Khan making these really simple sketches explaining basic math and arithmetic, that is exactly how they started. He was just tutoring his cousins and he just recorded videos, he didn't care whether they were great. And why people like that is because it seems really genuine, it wasn't just some million dollar production in a studio somewhere. The cool thing is it didn't take much work, I just recorded it with some light editing. I probably didn't do as much editing as you'll do on this podcast, but people happen to like them, so I'm glad that I did that.

01:00:55 Yeah, I think it's a great contribution to the community. Thanks for doing it and thanks for being on my show.

01:01:02 Great, thank you very much.

01:01:03 Yeah. See you later.

01:01:03 This has been another episode of Talk Python To Me. Today's guest was Philip Guo and this episode has been sponsored by Hired and Codeship. Thank you guys for supporting the show.

01:01:03 Hired wants you to find your next big thing. Visit hired.com/talkpythontome and get 5 or more offers with salary and equity presented right upfront and a special listener signing bonus of $4000.

01:01:03 Codeship wants you to always keep shipping. Check them out at Codeship.com and thank them on Twitter via @codeship. And don't forget the discount code for listeners, it's easy: TALKPYTHON.

01:01:03 You can find the links from the show at talkpythontome.com/episodes/show/22.

01:01:03 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.

01:01:03 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on our website.

01:01:03 This is your host, Michael Kennedy.

01:01:03 Thanks for listening!

01:01:03 Smixx, take us out of here.

01:01:03 [music]
