
#152: Understanding and using Python's AST Transcript

Recorded on Monday, Feb 5, 2018.

00:00 Michael Kennedy: Have you heard of ASTs, or Abstract Syntax Trees? If you have, it was probably in the context of a compiler or some kind of parser. They're a really powerful data structure, but we often only use them really indirectly by using those types of tools. They're just such an abstract idea to most of us. This week you'll meet Emily Morehouse. She's here to make this abstract concept much more concrete, and discuss the places where the AST can help us write and maintain better code. This is Talk Python To Me Episode 152 recorded February 5, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Park My Cloud and Rollbar. Please check out what they're offering during their segments, it really helps support the show. Emily, welcome to Talk Python.

01:12 Emily Morehouse-Valcarcel: Hello, how are you doing?

01:13 Michael Kennedy: I'm doing really well, it was so nice to meet you in Vancouver, and I really enjoyed the talk that you gave about the Abstract Syntax Tree. You made this abstract data kind of concrete, it was nice.

01:23 Emily Morehouse-Valcarcel: Thank you. I've gotten some really positive feedback from it, so I was very happy to be a part of it.

01:28 Michael Kennedy: Yeah, it was really cool. And we're going to talk all about the AST and what it means in Python, where it comes in, how you can actually leverage it to do all sorts of cool stuff, but before we get to all those things let's start with your story. How'd you get into programming and Python?

01:41 Emily Morehouse-Valcarcel: So I kind of stumbled into it. I was a student at Florida State University. I was actually studying theater and biochemistry, and I did an internship--

01:48 Michael Kennedy: That's quite the mix.

01:51 Emily Morehouse-Valcarcel: Yeah, yeah. But I did an internship at a lab, and realized that I could not see myself doing biochemistry for the rest of my life. I also knew that I wasn't going to make a whole lot of money doing theater. So I was really interested in forensics, so I figured if I couldn't do the biochem side that I would go to the criminology side. So I started in criminology, and it turned out that FSU had just launched a computer criminology program, and so this was, I think, my junior year, summer of my junior year of college, and they were like hey, you have really great math scores, why don't you go take a programming class? And I did, and I absolutely fell in love with it, it was an intro to programming, C++ class, and I built an enigma machine simulator.

02:35 Michael Kennedy: Oh wow, that's cool.

02:35 Emily Morehouse-Valcarcel: The rest is history.

02:38 Michael Kennedy: That's really awesome. So this data-driven criminology, is this a little bit like C.S.I. forensic-type stuff? What kind of things would you have done there?

02:49 Emily Morehouse-Valcarcel: It was a lot of focus on security, a lot of focus on digital forensics, so taking a dead hard drive and trying to reformulate a lot of the data that you can get off of it, that kind of stuff.

03:00 Michael Kennedy: That sounds pretty interesting. And I guess you were thrown right into the deep end with C++, so if you like that then this whole thing must be for you, right?

03:10 Emily Morehouse-Valcarcel: Exactly. I think that looking at my history and how I got into programming, and coming from a CS background that was very, very focused on theory, and compilers, and all that, that's definitely where a lot of this stems from.

03:23 Michael Kennedy: Yeah, yeah, I'm sure. So how'd you go from C++ to Python?

03:28 Emily Morehouse-Valcarcel: That's a bit of a complicated answer, so part of it was that I started building a lot of side projects with friends, and so we would go through and figure out how to build a web scraper and stuff like that, and then we started building APIs, and then my university actually offered Python courses so I was able to take Python, an advanced Python course, and then I wound up using it for a lot of other research programs that I was working on, 'cause it's really great with data processing, and building graphs, and all that.

03:57 Michael Kennedy: Right, it's a lot quicker than trying to do that in C++, and I don't know what, OpenGL, who knows, something crazy.

04:03 Emily Morehouse-Valcarcel: Yeah. I had to do a little bit of R at one point as well, and I much prefer Python over R.

04:09 Michael Kennedy: Yeah, yeah, that's cool. Alright, well that's a really cool origin story, how you got started. How about now, are you still working in criminology?

04:17 Emily Morehouse-Valcarcel: Not quite, I almost did, but I had another interesting turn and wound up starting a company. I actually am the co-founder and Director of Engineering of a company called Cuttlesoft. We are a digital product development company, so we work with a lot of different types of clients, anywhere from non-profits to other tech companies, and we build anything from web and mobile applications to cloud migrations and embedded systems.

04:46 Michael Kennedy: That sounds like a lot of stuff, but it sounds like a really cool company. I looked around your website, it looks really slick, and friendly, and I like it a lot. So it sounds like you must have a ton of technologies at play, right? As soon as you say mobile there's probably some Java and some Swift, or at least a lot of JavaScript somewhere in there, but, what's it like?

05:05 Emily Morehouse-Valcarcel: It's really fun. I consider myself lucky to be able to really hone my polyglot skills. I really enjoy knowing a lot of different languages, and being able to see how building a product in one language is different than using another. Even different type systems between Swift and Java, and yeah, those are all debates that I love getting into.

05:28 Michael Kennedy: Those are cool, and I think knowing those languages gives you a richer perspective on any one of them, being able to think about it and look at these things from different perspectives.

05:37 Emily Morehouse-Valcarcel: I absolutely agree.

05:40 Michael Kennedy: I think it's cool, yeah nice. So you guys also run a podcast, I just learned, tell me a little bit about your podcast.

05:47 Emily Morehouse-Valcarcel: Yeah, so our podcast is called Startup Capital, it's a little bit of a play on words. Our original office started in Tallahassee, Florida, and there was a huge movement of entrepreneurs, and there's a business incubator, and just a huge focus on entrepreneurship. And so we started a podcast to highlight each of those stories, and how everybody's businesses were going, how they relate to the community, how everything is giving back to Tallahassee, and really taking a look at what Tallahassee's ecosystem for entrepreneurs looks like.

06:19 Michael Kennedy: That's cool, that sounds really interesting. There definitely seems to be a lot of entrepreneurship in Florida for some reason. I don't know why that is but it definitely seems like it.

06:29 Emily Morehouse-Valcarcel: Yeah, I mean, we got started there.

06:32 Michael Kennedy: Well there's absolutely one cool example. So let's talk a little bit about the conference and then we'll get into your talk. So I learned about your presentation just by going to PyCascades, which was a new conference in Vancouver this year. I thought it went really well, your talk, but also the whole conference, what do you think of the experience?

06:54 Emily Morehouse-Valcarcel: I absolutely agree, so this was actually my first regional conference that I had attended. I attended PyCon previously. But for me, regional conferences are a lot more digestible, and a lot less intimidating, a lot less exhausting, so for me the two-day conference experience is the perfect length.

07:10 Michael Kennedy: I feel like when you go to PyCon it's like the paradox of choice, it's like oh, there's 10 tracks, and there's the open spaces, and I can even skip all that and hang out with these people, and cruise around the expo hall, and there's just so much to do. It was pretty clear you either do the main track and you're in there with everybody, or not, and that's definitely a different experience.

07:31 Emily Morehouse-Valcarcel: I think the whole thing was really great, I know that I felt very welcomed and very supported as a first-time conference speaker, and I was very, very happy to be invited into the PyCascades community, even as somebody who ventured all the way from Colorado.

07:51 Michael Kennedy: That's cool, it was really nice. There were a lot of Americans for a Canadian conference, but it was still nice to meet everybody from all over the place. I got to say, your presentation was really well done, and I think it was, from a slides visual perspective, one of the most interesting ones there, it had animated gifs, little puppies, it had all sorts of stuff to keep it interesting. People so often just go present a wall of text, and you're just like, I'm sure what you're saying is super interesting, but this isn't convincing me of it, right?

08:22 Emily Morehouse-Valcarcel: I had to be very conscious about how I presented a lot of this information because it can be very dense, and a little bit confusing, so I knew that I had to make it interesting, for one, cue puppy gifs. I also knew that I had to relay a lot of information in a very visual fashion so that people could see the trees and understand what they looked like, instead of trying to put words together to describe it.

08:46 Michael Kennedy: Absolutely, there's definitely a lot of visual stuff in there. Let's start by maybe asking the question, should you care about Python internals? We're going to dive pretty deep into the internals about how Python works and stuff, and on one hand you could blissfully ignore that, pip install requests, and BeautifulSoup, screen scrape, you're done. So maybe why do you think people should look a little deeper?

09:11 Emily Morehouse-Valcarcel: By understanding how your code works under the hood it can give you insights into the approaches that you're using to write your code. I also think that it's really interesting to learn about how Python works as a language, and then how you can leverage that to build different tools. Whether you're using the actual Python AST that gets generated, or just knowledge of ASTs to build other really neat tools.

09:36 Michael Kennedy: Just to know that you can leverage these ASTs to do more or even change how things execute, it's pretty cool to know, and it's not at all obvious that that's possible.

09:47 Emily Morehouse-Valcarcel: Absolutely.

09:49 Michael Kennedy: So what we're going to focus on mostly is around taking Python source code, getting it to Abstract Syntax Trees, and into bytecode. Once the bytecode runs, that is a whole 'nother area. I just want to give a shout out, for people who are interested in this, to Philip Guo's 10-hour CPython codewalk, and I did an episode on that with him looking inside the CPython implementation of, at that disassembly part, a whole bunch way back on Episode 22, so that's a good complement to this. But let's maybe start by talking about how do we go from source code to machine instructions executing, actual stuff happening, and you posed a really interesting question of asking whether Python is interpreted or compiled. What's the answer?

10:37 Emily Morehouse-Valcarcel: The answer is both. Usually there's somebody in the audience who actually gets that one right, which I really appreciate, but yes, Python is actually both.

10:46 Michael Kennedy: It seems like you hear that Python is interpreted, so there must be no compile step, but the thing that actually gets interpreted is a bunch of compiled bytecode, it doesn't JIT down to machine instructions, it just stays in these higher level bytecode things. So maybe talk a little bit about that process.

11:03 Emily Morehouse-Valcarcel: Exactly, so the compiler generates your bytecode and then the interpreter actually executes your bytecode. So essentially what you do is you take your source code that gets parsed into what's called a parse tree, and then for various reasons your parse tree gets to be a little bit too detailed, and it's a little bit harder to work with, so you actually transform that into an Abstract Syntax Tree.

11:24 Michael Kennedy: Is the parse tree basically almost exactly what you typed whereas the Abstract Syntax Tree is the meaning, the essence of what you typed?

11:34 Emily Morehouse-Valcarcel: Yes, exactly. So one of my favorite examples to use is a math expression, and so all of us are taught from a very young age the order of operations, so you know that if you have one plus two in parentheses, times three, you know based on the order of operations that you execute the parentheses first, and then you multiply. And so that's one of my favorite examples to use from parse tree to AST, because when you have your AST you can actually automatically know the order in which you have to perform your operations, just based on how the tree itself is laid out, instead of having to know, oh, well I have parentheses so I have to worry about that too.

12:18 Michael Kennedy: Right, right, and we don't have your great pictures to help here, 'cause this is audio, but imagine a tree where the top operation is multiply, and then each branch one is the three, and the other is one plus two, you just go down it and evaluate it, it's pretty straightforward once it's in that mode. Yeah, that's cool. Okay, so we were talking, so we have source code, it gets parsed into a parse tree, and then into an Abstract Syntax Tree, and then what?
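
The tree described here can be inspected with the standard library's `ast` module. The following is a minimal sketch, not the example from the talk: the `evaluate` helper and the `OPERATORS` table are names made up for illustration, and the evaluator only handles addition and multiplication.

```python
import ast
import operator

# Parse the expression from the conversation into an AST.
tree = ast.parse("(1 + 2) * 3", mode="eval")
root = tree.body  # the top-level binary operation

# Map AST operator node types to the functions that implement them.
# (Illustrative table: only the two operators used above.)
OPERATORS = {ast.Add: operator.add, ast.Mult: operator.mul}


def evaluate(node):
    """Evaluate a tiny arithmetic AST by working from the leaves up."""
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left)
        right = evaluate(node.right)
        return OPERATORS[type(node.op)](left, right)
    # Leaves are number literals; literal_eval accepts an AST node directly.
    return ast.literal_eval(node)


print(type(root.op).__name__)       # operator at the top of the tree
print(type(root.left.op).__name__)  # operator in the left branch
print(evaluate(root))
```

The parentheses never appear in the tree. The addition simply sits below the multiplication, so evaluating children before parents gives the right order.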

12:42 Emily Morehouse-Valcarcel: And then you take your Abstract Syntax Tree, that gets transformed into what's called a Control Flow Graph. So if you think about it, an AST is still represented in a very hierarchical fashion, whereas your CFG you can actually have cycles or loops that can mimic the actual control flow of your program in a better way, and then from there you can actually emit your bytecode that winds up being executed.

13:08 Michael Kennedy: Right, and can we find this bytecode, does it land on disk somewhere when we run our code?

13:11 Emily Morehouse-Valcarcel: Yeah, absolutely, so most of the time all of your bytecode is actually stored in PyC files, it used to also get stored in PyO files back in Python 2, if you're still using that--

13:23 Michael Kennedy: Yeah, and it's in that __pycache__ that you sometimes see hanging around.

13:28 Emily Morehouse-Valcarcel: Yeah, so Python 3, they were very kind to organize all those PyC files into your pycache directory to get them out of the way.
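
To see where Python 3 would put the cached bytecode for a given source file, `importlib.util.cache_from_source` can be asked directly. This is a small sketch; `example.py` is a hypothetical file name and does not need to exist.

```python
import importlib.util

# Ask the import system where the cached bytecode for a source file would live.
# "example.py" is a made-up name used only for illustration.
cached_path = importlib.util.cache_from_source("example.py")
print(cached_path)
```

The result normally points into a `__pycache__` directory and includes a tag for the interpreter version, which is why several Python versions can share one source tree.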

13:35 Michael Kennedy: Yeah, the one thing is, here's junk, you can throw it away. Yeah, so let's start talking about working with this, maybe just by focusing on the tools. There's a couple of modules built into Python directly that let you work with this bytecode and these trees and stuff, right?

13:52 Emily Morehouse-Valcarcel: There are two modules that are built in that I use a lot, one of them is the AST module, so this is made to assist you in actually interacting with the AST that Python generates, and it also has a bunch of helpers that you can use to support translations from your source code into an AST, AST into code objects, and then actually executing the code from those code objects.

14:19 Michael Kennedy: That sounds really interesting. So let's see, you could take the AST module and you could parse just source text, and I guess you could either get that from a file by reading the text of the file, or you literally could just give it text, right? That's pretty interesting, and if you want to look at it though, it's just like an object in memory, it's _ast., I don't know, node, or expression, or something that's some address, it's completely hard to make any sense of, but there's a nice dump thing which will dump it back to, what is that, a text version of a tree or something, I'm not entirely sure.
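
The steps mentioned so far (parse text, dump the tree, compile it, execute it) can be sketched with the standard library alone. The source string and the `namespace` dictionary below are illustrative choices.

```python
import ast

source = "total = 40 + 2"

# Source text -> AST. The text could just as well come from reading a file.
tree = ast.parse(source)

# The tree object's default repr is not very informative; ast.dump gives a
# readable text form of the nodes and their fields.
print(ast.dump(tree))

# AST -> code object -> execution, all in memory.
code = compile(tree, filename="<ast>", mode="exec")
namespace = {}
exec(code, namespace)
print(namespace["total"])
```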

14:53 Emily Morehouse-Valcarcel: It's this huge glob of tree data, so the easiest way to actually interact with it is to do a traversal and actually walk over your AST, which is actually, I have a few third party libraries that make that a little bit easier, that abstract away actually interacting with the tree, and it handles what is your next step. Then there's actually some really neat visualizations that have been built that you can actually just give it your source code that you want to visualize, and then it'll print out an actual tree representation, as an SVG.
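
Even without third-party libraries, the standard library offers two ways to traverse a tree: `ast.NodeVisitor` and `ast.walk`. The sketch below uses a made-up two-function module, and `FunctionCollector` is a hypothetical class name.

```python
import ast

source = """
def greet(name):
    return "hello " + name

def shout(name):
    return greet(name).upper()
"""


class FunctionCollector(ast.NodeVisitor):
    """Collect the name of every function definition in a tree."""

    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)  # keep descending into nested nodes


tree = ast.parse(source)
collector = FunctionCollector()
collector.visit(tree)
print(collector.names)

# ast.walk yields every node in the tree, in no guaranteed order,
# which is enough for simple counting.
call_count = sum(isinstance(node, ast.Call) for node in ast.walk(tree))
print(call_count)
```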

15:29 Michael Kennedy: So you can visualize that you can even take these nodes, these Abstract Syntax Trees, and say, compile them, and then look at their underlying bytecode, just all in memory.

15:39 Emily Morehouse-Valcarcel: Exactly. So what I did was I approached it as trying to step through as much of the compilation process as I could, just by looking at the AST. So actually going from your source code, skipping the parse tree, going to the AST, and then being able to see your bytecode and interact with it. And so everybody always says that everything in Python is an object, and that actually is true down to the smallest, most integral parts of the actual implementation of the language, so your source code gets compiled into what's called code objects, and then those code objects store things like disassembled bytecode, or sorry, the assembled bytecode.

16:25 Michael Kennedy: Just the straight numbers, like 3, 102, 7.

16:30 Emily Morehouse-Valcarcel: Yeah, and well it's actually stored as bytes so they're not even human readable. You have to go in there and poke at it a little bit to get everything translated into integers so that you can actually read it as a human.

16:43 Michael Kennedy: Right, so once you compile it here, or if you use the disassembly module, you can just go to the code objects, and they have a .co_code, which is, that's the bytes you're talking about, and then you can do a couple of things. You could actually just re-execute it in case you're messing with it here, or you could disassemble it. So I sidetracked you when you were talking about the tools, so there's the AST module which does all of this stuff that we're talking about, it'll take source code and you can actually turn it into an AST, but you were also talking about the disassembly module?
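
A quick way to look at those raw bytes is to compile a snippet and read `co_code` off the resulting code object. This is a sketch; the exact byte values depend on the CPython version, so none are shown here.

```python
# Compile a one-line program into a code object.
code = compile("x = 1 + 2", "<example>", "exec")

raw = code.co_code           # the assembled bytecode
print(type(raw).__name__)    # a bytes object, not readable text
print(list(raw))             # the same data as a list of small integers

# The code object can still be executed directly.
namespace = {}
exec(code, namespace)
print(namespace["x"])
```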

17:14 Emily Morehouse-Valcarcel: Yes, so the disassembly module allows you to take your code objects and actually shows you the machine instructions as they get executed, and it also shows you the registers that they use, and a bunch of information, but it makes it a lot easier to actually just look at the disassembled bytecode because it goes through a lot of the lookup processes for you. So I'm not going to remember what these are off the top of my head, but if you see Instruction 101 there's actually a giant switch statement in the CPython interpreter that goes through and says, okay, Instruction 101, I know that that is supposed to be, I don't know, a load.
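
The module being described is `dis` in the standard library. The sketch below disassembles a small made-up function, `add`, and uses the `dis.opname` table to turn a numeric opcode into its name, which is the lookup the interpreter's big switch statement is keyed on. Instruction names and numbers vary between CPython versions, so no expected listing is given.

```python
import dis


def add(a, b):
    return a + b


# Print a human-readable listing of the function's bytecode.
dis.dis(add)

# The same information is available as objects.
names = [instruction.opname for instruction in dis.get_instructions(add)]
print(names)

# Translate a raw opcode number from co_code into its instruction name.
first_opcode = add.__code__.co_code[0]
print(first_opcode, dis.opname[first_opcode])
```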

17:53 Michael Kennedy: Load function, or like, add a variable to the call stack, or something like that. Yeah, there's literally this huge switch statement that is 3,000 lines in ceval.c in CPython that's just every one of those codes, okay, it's Code 101, what does that mean, what do I do with that? And so the disassembly module basically takes this bytecode, whether you got it just from running directly some bit of code, or from messing with the AST, and it'll show you that in the raw steps, almost the assembly language of Python, it's as close as you're going to get anyway. This portion of Talk Python To Me is brought to you by Park My Cloud. The last time you parked your car did you leave it running? No? Well then why are you leaving your cloud resources running? Every year $13 billion are wasted on cloud instances that no one is using. Don't let any of those be yours. Park My Cloud automatically identifies and eliminates wasted cloud spend, saving you 65% on AWS, Azure, and Google's cloud. You are up and running quickly with a 10-minute setup and no scripting required, plus governed users can easily integrate into your DevOps process. See why Park My Cloud was chosen by McDonald's, Unilever, Fox, and more. Start a free trial today at parkmycloud.com/talkpython. So those are the two built-in ones, and then there's a couple of others that let you dig around, there's A-S-T-or, or as-tor, how do you say it, how do you--

19:28 Emily Morehouse-Valcarcel: I've pronounced it as Astor.

19:30 Michael Kennedy: Astor, that sounds more fun, yeah. Tell us about that, and it's a derivative of codegen which comes from Armin Ronacher, of Flask fame as well.

19:36 Emily Morehouse-Valcarcel: Indeed. He started out by writing some codegen, and there were a few little holes that it had so other people in the community picked it up and patched it together. Both Astor, and there's another package called Meta, they are kind of focused on taking bytecode or an AST and generating the code from it, your original source code from it. It's focused on this reverse engineering of bytecode and ASTs. And then Astor also provides some really cool pretty printing and ways to manipulate the ASTs, so if you want to inject code in certain parts of the AST it provides helpers to do that.

20:16 Michael Kennedy: Right, every time you see this type of pattern, wrap it in whatever, something to that effect, right?

20:21 Emily Morehouse-Valcarcel: Mmhmm.

20:22 Michael Kennedy: Yeah, yeah, cool. With this straight AST module can you modify the ASTs as well?

20:28 Emily Morehouse-Valcarcel: I don't know if there are any helpers that make that process easier, but you can go in and actually edit the data structure, so there is an actual _ast data structure that you can interact with.
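
For editing a tree with only the standard library, `ast.NodeTransformer` is one option: each `visit_*` method returns the node that should replace the one it was given. The following is a toy sketch, and `AddToMult` is a made-up transformer that rewrites every addition into a multiplication.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Toy transformer: turn every `a + b` into `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("3 + 4", mode="eval")
tree = AddToMult().visit(tree)

# Newly created nodes lack line and column information; fill it in
# before compiling.
ast.fix_missing_locations(tree)

result = eval(compile(tree, "<ast>", "eval"))
print(result)
```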

20:42 Michael Kennedy: If you want to just do it at the lowest level I guess. Yeah, so there's a couple of interesting things that the Astor and codegen did, you talked about the pretty printing, they let you traverse the tree in a less heavyweight way, I guess is what I'm thinking it does, but it also does this concept of round-tripping. Tell us about this idea of round-tripping.

21:04 Emily Morehouse-Valcarcel: I am really interested in round-tripping for a few different reasons. Basically what you can do is you can take your source code, translate it into an AST, and then translate it back into Python and see how much your code has changed. The idea of round-tripping can actually be used in refactoring, or going through and linting code. There are certain changes that you can make to your code that don't change the underlying semantic meaning that you can see through the AST. It's a way to guarantee that you've made changes to the syntax of your code, but you haven't actually changed how your code is working, or what it's doing.

21:42 Michael Kennedy: Right, it's like the linter says you really should format this way, or you should tweak it around that way, but it really should have exactly the same meaning. If the AST changes, well, it doesn't have the same meaning, so this is kind of a byte level check almost. I could see a unit test that snapshots the AST, saves it as a pretty printed text, and then just compares, again, does that change, did that change, and if it ever changed, it would fail, that's pretty interesting. Another one that I ran across recently that falls into this general realm of digging inside the bytecode, it's not technically to do with ASTs I don't think, it's more about the disassembly, is this thing called Python Hunter. You've seen that, right?
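
The check described here can be approximated by comparing `ast.dump` output before and after a formatting change. This is a minimal sketch: `same_ast` is a hypothetical helper, and the three source strings are invented. By default `ast.dump` leaves out line and column information, so pure layout changes do not affect the comparison.

```python
import ast


def same_ast(before, after):
    """Return True if two pieces of source parse to the same tree."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "total = (price*quantity)+tax"
reformatted = "total = price * quantity + tax"    # whitespace and parens only
changed = "total = price * (quantity + tax)"      # different meaning

print(same_ast(original, reformatted))
print(same_ast(original, changed))
```

A test built on this idea would fail whenever a supposedly cosmetic edit altered the structure of the code.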

22:27 Emily Morehouse-Valcarcel: I've recently looked into it. They do some really cool things with code tracing, and it'll actually show you at a very low level each step that your code is taking under the hood.

22:38 Michael Kennedy: It's kind of like logging the disassembly as your code executes it, it's pretty funky, that thing. But yeah, it's definitely just another library, there's probably 10 others that we're forgetting, or we don't even know exist, out of the 125,000 PyPI packages there's got to be some more in there. Nice, okay, so let's see. One thing that you talked about in your talk that I thought was interesting is that we can use the disassembly module to check out our code, take the bytecode and look at it in these lower level operations, but the bytecode and the source code, they don't always line up, right? Sometimes the bytecode is more verbose, sometimes it's less verbose than our source code, and maybe tell people about that mismatch there.

23:22 Emily Morehouse-Valcarcel: One of the things that you always assume is that there's some sort of optimization that happens under the hood, and so you can actually see how a lot of those optimizations happen by looking at your bytecode.

23:34 Michael Kennedy: The two I remember are the peephole optimization and the constant folding.

23:38 Emily Morehouse-Valcarcel: Yeah, so, I guess I can describe them first. The two most common ways that Python is optimized under the hood are the peephole optimizer and constant folding. The way I like to think about the peephole optimizer is looking around without moving your head, so using your peripheral vision to see your direct surroundings, and then being able to make intelligent choices based on those surroundings. So as humans we have learned that if you have x=1 and then y=x+2, we automatically know since x=1 that we can substitute those values in, and that's one of the things that the peephole optimizer does. And the other optimization is called constant folding, and so basically you can evaluate constant expressions at compile time instead of at run time.

24:30 Michael Kennedy: Right, so for example if you had, what I have all the time in my web apps is there's usually some part that says, how long do you want to cache this for? And it's sometimes in milliseconds and it's sometimes in seconds, and I don't want to write 236,414 seconds, I'll write 60 times 60 times 24 times 31, a month's worth of seconds, do the math for me, and what you're saying is that Python will actually do the constant folding and go, well that's the, whatever, actual number that is just before it ever runs.
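
Constant folding can be observed by compiling such an expression and looking at the constants stored on the code object. This sketch assumes CPython; `cache_seconds` is a made-up variable name, and other interpreters may not fold the expression at compile time.

```python
# Compile, but do not run, an assignment with a constant expression.
code = compile("cache_seconds = 60 * 60 * 24 * 31", "<example>", "exec")

# On CPython the product is computed at compile time, so the final value
# shows up among the code object's constants before anything executes.
print(code.co_consts)

namespace = {}
exec(code, namespace)
print(namespace["cache_seconds"])  # 60 * 60 * 24 * 31 = 2,678,400
```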

25:02 Emily Morehouse-Valcarcel: Exactly. It tries to look out for you a little bit, so one of my other favorite examples is if you have a double negative in your logic, so if you have something not a, not in b, the compiler actually goes, oh, no, no-no-no, I know what you're trying to say here, this is simply just a and b, let me simplify that for you.

25:26 Michael Kennedy: I don't need to do three tests for you, I'll just do the one, or two, or however many you count not in. Yeah, that's totally a good idea, and that's one of the peephole optimizations as well, there's a whole class there. So things that I don't think happen are things like what C++ might do where you might inline non-virtual function calls, and stuff like that, so I don't know. I think it's pretty interesting that one of the big focuses recently on Python is around its performance. I think that was the 3.6 work, I can't remember, but Victor Stinner, and some of those guys are like, we're going to make a bunch of improvements on function call speed, and stuff, and so they're getting better at the optimizations for sure. But yeah, it's not too advanced, right?

26:11 Emily Morehouse-Valcarcel: Correct. I think one of the things that happened with Python 3 is that when Python 3 was first released it was actually a little bit slower than Python 2, and so I think they realized that they really needed to put a lot of focus into at least making it as fast, if not faster, than Python 2.

26:30 Michael Kennedy: We've had this big challenge for quite a while, which I feel like we're on the verge of putting that behind us, but this Python 3, Python 2 wars, it's hard to say you should choose Python 3 when it actually slows down everything you do, right?

26:45 Emily Morehouse-Valcarcel: Exactly.

26:46 Michael Kennedy: That definitely is not what you want on your side of the argument, so it's good that they're making it faster and using less memory. There was that whole presentation at PyCon 2017 from Instagram, I don't know, did you see that one?

26:58 Emily Morehouse-Valcarcel: I might have.

26:58 Michael Kennedy: Yeah, so they talked about upgrading from Python 2 to Python 3 from Django, some old version, to modern Django, and they basically saved 12% on their memory usage, and something similar around performance, just by upgrading. So it's finally getting to the point where it's better to upgrade, not worse, so that's pretty awesome. Alright, so these Abstract Syntax Trees, it's interesting to know that they work, it's kind of cool it gives you some insight into your code like we talked about, but let's talk about some of the applications, because it's sort of an abstract idea until you can do something constructive with it. So you had a bunch of different applications that you talked about.

27:40 Emily Morehouse-Valcarcel: I think that it can be interesting to see these micro-speed ups that you get based on some of the optimizations that the compiler does. There's also a few very random cases where, if you're trying to get 100% code coverage for your tests, there's actually certain cases that if you have an if/else statement, for example, that your else, the actual line for the else, doesn't actually get executed, so there's certain weird things where you're like, I don't know, I can't get my else to execute, but I know that it executes. So there are certain ways that you can use this knowledge to debug different errors that you're encountering.

28:21 Michael Kennedy: Or if you're super obsessed about getting 100% code coverage. You just want to force that to happen. Another thing is, if you wanted to create your own domain-specific language derived from Python, if you want to change the grammar a little bit.

28:36 Emily Morehouse-Valcarcel: So that's kind of one of the fun things about Python, is that anybody can propose a PEP and propose a change to the language. So if you did want to actually get in there and change Python's grammar you can do that fairly easily. There's a lot of blog posts out there about how to actually get in there and do it.

28:53 Michael Kennedy: Yeah. It's probably a pretty tough sell to get them to accept a new concept, right?

29:00 Emily Morehouse-Valcarcel: Usually, yeah.

29:01 Michael Kennedy: It's one of the things where once it's in there you have to live with it, no matter what, if it's in the language, unless you're doing a major breaking change like Python 2 to 3, generally it's like the gift of a puppy, you have it once you've received it, nice. And then we talked about round-tripping already, that's another interesting application to verify that changes based on linting or automatic linting actually don't make any meaningful change to the underlying bytecode, that's pretty interesting. Maybe one of the ones where people see the most is around this idea of code generation.

29:40 Emily Morehouse-Valcarcel: So code generation is really cool, and it's actually, one of the things that I learned in this process is that there's actually some decent chunks of Python itself in Python's compiler that are actually generated.

29:51 Michael Kennedy: Oh, that's pretty cool.

29:54 Emily Morehouse-Valcarcel: But you can use code generators for a lot of really neat things, so one of them is called Pythoscope.

30:00 Michael Kennedy: What's that?

30:01 Emily Morehouse-Valcarcel: It will actually essentially generate unit tests for you, so if you have a project that you haven't actually written any tests for, you can use Pythoscope to kick start that process.

30:10 Michael Kennedy: That's cool, so the logo they have is this doctor checking out a sick snake. It's a pretty good logo actually, it's funny. So looking at this it says, you take the old code and run this thing across it, and it will write what it thinks are the unit tests for you in comments for all the various things, and it just fails all the tests, but you can uncomment them and make them real, which is pretty cool and apparently uses the Abstract Syntax Tree to understand the various pieces, right?

30:46 Emily Morehouse-Valcarcel: Yeah, so it uses the AST to figure out what tests that it can actually generate and what needs to be tested and all that.

30:53 Michael Kennedy: That's a really nice way to do it. So another thing that's pretty interesting is this thing called Transcrypt, and I've never heard of Transcrypt, it's apparently a Python in the browser, sort of a JavaScript to, sorry, a Python to JavaScript compiler type thing, like Babel. Have you played with this, have you seen it?

31:14 Emily Morehouse-Valcarcel: I haven't played with it before, but it's really interesting. I think that that's one of the things that a lot of people are trying to do, is they're trying to make Python a lot more portable. And one of the things I have also learned, I'm just a fountain of knowledge over here, Python was almost the JavaScript of browser languages.

31:34 Michael Kennedy: Oh really?

31:34 Emily Morehouse-Valcarcel: Yes.

31:35 Michael Kennedy: That would've been nice.

31:38 Emily Morehouse-Valcarcel: Yeah, way back in the day, I think there was an old Netscape browser that was actually built using Python as its in-browser language.

31:45 Michael Kennedy: Wow, that's cool. Netscape, back in the day, they came up with JavaScript just for Netscape. Aw too bad, but yeah, there are a couple options, there's Skulpt, and there's a couple of others, but I don't know how the others work, maybe similarly, but at least this Transcrypt one, which I didn't know was an option until now, that one uses the AST to generate the JavaScript equivalent from Python, which seems like a pretty good way to do it.
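
A toy version of the AST-to-JavaScript idea, assuming Python 3.8 or later. This is not how Transcrypt is implemented; the class `ToyJsEmitter` and function `to_js` are invented for illustration and handle only names, constants, and a few arithmetic operators.

```python
import ast


class ToyJsEmitter(ast.NodeVisitor):
    """Turn a small subset of Python expressions into JavaScript text."""

    _ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(node.op, ast.Pow):
            # The ** operator is mapped to Math.pow in this sketch.
            return f"Math.pow({left}, {right})"
        return f"({left} {self._ops[type(node.op)]} {right})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected explicitly.
        raise NotImplementedError(type(node).__name__)


def to_js(expression):
    return ToyJsEmitter().visit(ast.parse(expression, mode="eval"))


print(to_js("x ** 2 + y * 3"))
```

Because the tree already encodes operator precedence, the emitter never has to re-parse the text; it only decides how each node type is spelled in the target language.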

32:15 Emily Morehouse-Valcarcel: Yeah, I think that transpiling languages is fairly common, so like you mentioned, there's definitely a lot of transpilation in the JavaScript world, just going from newer ES7, ES6 syntax to original ES5.

32:32 Michael Kennedy: That's a pretty interesting thing to just say we're going to solve the version problem by recompiling it down to a different version of source code for you.

32:38 Emily Morehouse-Valcarcel: Yeah.

32:42 Michael Kennedy: I'm not sure it's a good way, but it's an interesting way.

32:43 Emily Morehouse-Valcarcel: As somebody who has to work with a lot of JavaScript, I definitely prefer the newer syntax, and I definitely appreciate being able to use arrow functions and all that.

32:53 Michael Kennedy: Yeah, the new JavaScript is a lot better, although it feels much more engineered. You almost need a CS degree to properly work with modern JavaScript where it used to be just a few jQuery selectors, and go with it, real simple type thing, right?

33:07 Emily Morehouse-Valcarcel: Yeah, but I definitely think that that's also reflected in the amount of weight that we're making JavaScript carry in modern web applications.

33:19 Michael Kennedy: Yeah, that's true, they are doing a lot, aren't they?

33:20 Emily Morehouse-Valcarcel: Yeah.

33:21 Michael Kennedy: So another thing that you talked about was reformatting code, how's that work?

33:25 Emily Morehouse-Valcarcel: There are these ideas using things like autopep8, where you can translate your code and impose certain restrictions, and so one of my favorite ones is, I think it's just called YAPF, Yet Another Python Formatter.

33:45 Michael Kennedy: Yeah, of course.

33:45 Emily Morehouse-Valcarcel: Yeah, and so that one's really neat. It takes a very intelligent approach to actually looking at your code and seeing at a more underlying level what your AST is actually doing, and being able to make certain choices based on this advanced knowledge that it has of your code, and how it can actually transform it, and almost refactor it in a way.
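
The general idea of formatting from the tree rather than from the text can be sketched with the standard library alone, assuming Python 3.9 or later for `ast.unparse`. This is not how YAPF works internally; the helper `normalize` is hypothetical, and real formatters also preserve comments and respect line-length limits, which this sketch does not.

```python
import ast


def normalize(source):
    """Re-emit source from its AST, discarding the original layout."""
    return ast.unparse(ast.parse(source))


messy = "x=[1,2,\n   3]\nd={'a':1,\"b\":2}\n"
clean = normalize(messy)
print(clean)
```

Since the output depends only on the tree, any two layouts of the same code normalize to the same text, which is the property that keeps formatting noise out of version control.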

34:09 Michael Kennedy: It's pretty cool, they have a little online demo that you can play with, and I think, this comes from Google, right?

34:15 Emily Morehouse-Valcarcel: Yes.

34:15 Michael Kennedy: And it lets you pick different formatting guidelines, so you can format via PEP8, or Google, or Facebook, whatever their rules for how Python code should look you can just make it look like that, that's pretty cool. I would guess you could extend it.

34:29 Emily Morehouse-Valcarcel: Oh yeah, I think you definitely can. I'm a huge fan of code formatting.

34:32 Michael Kennedy: It seems like a pretty nice, right-before-you-check-in type of feature, just make sure you run that against your code always because there's always those weird, why did this file change, oh, it's just their formatting indented, my formatting unindents more in the version control, and so having something stabilizing that, that's kind of cool.

34:56 Emily Morehouse-Valcarcel: I think it helps normalize the look of code, so code is there for people to read, humans have to understand what your code is doing, and if you always know that certain line breaks are in certain places, or stuff like that, I think it makes it a lot easier for people to actually interact with it. So for most of my projects I will set up Git hooks, so it's like a pre-commit hook that'll run the formatting and then also run any tests or anything like that to make sure that it didn't break anything.

35:26 Michael Kennedy: That's a really cool idea, I like it. Most of the stuff we've talked about so far really has to do with CPython, the disassembly, and the interpreter and stuff, but there's alternate interpreters as well, and that of course involves Abstract Syntax Trees as well.

35:41 Emily Morehouse-Valcarcel: We talked a little bit about Python's speed, and so the best way to speed up CPython is actually to just completely switch out your interpreter. And so that's why there are a lot of other interpreters like PyPy, Jython, Cython, the list goes on, and so you can actually use these other interpreters to run almost exactly the same syntax, although it is one of the downsides, is that in order to make some of these optimizations you have to have certain sacrifices in the way that your language is represented. So your AST's going to look a bit different in these different interpreters because a lot of them will try to make different optimizations and translations of that AST.

36:28 Michael Kennedy: Right, and so for example Cython would be totally different probably, but maybe PyPy is actually really similar. I think PyPy starts out the same as an interpreted CPython, but eventually if it finds a hot spot it'll replace it with a JIT compiled version. So maybe it starts out the same, I don't actually know.

36:48 Emily Morehouse-Valcarcel: Yeah, I'm definitely not an expert on all the different interpreters.

36:53 Michael Kennedy: It's interesting though, there's all these different trade-offs, and there's all these different interpreters trying to explore the advantages or disadvantages of something, right? Yeah, another one you talked about was the BeeWare project, or was it PyBee, is that also from the BeeWare project?

37:05 Emily Morehouse-Valcarcel: I know PyBee is like their GitHub account, I believe.

37:09 Michael Kennedy: Yeah, yeah, yeah, there's a ton of projects under there, yeah.

37:12 Emily Morehouse-Valcarcel: PyBee is doing some really, really awesome work in trying to use Python for native mobile development. Whether that is in the form of transpiling code in order to interact with native mobile components, or actually they have a project called Batavia, I probably am butchering that.

37:35 Michael Kennedy: That's right, that is their transpiler one, that's right.

37:38 Emily Morehouse-Valcarcel: So they're actually, they'll transpile Python into JavaScript as well.

37:43 Michael Kennedy: How interesting. Yeah, they definitely have some interesting stuff going on over there, I would love for that to become a thing, like proper native mobile apps in Python, because right now I don't think there's a lot of great options. I know you could do some stuff with Pythonista, but it's kind of stuck within that app. You can't just ship your own app to the App Store. And I've been playing with Ionic framework, and Electron JS, and Cordova, and all those things, and I would rather just not, but right now there's not a super awesome option, so it'd be cool if they were successful. This portion of Talk Python To Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors. Ugh, relying on users to report errors, digging through log files, trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self-hosting tools for security or compliance reasons? Then you should really check out Rollbar's Compliant SaaS option. Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more. They'd love to give you a demo. Give Rollbar a try today, go to talkpython.fm/rollbar and check them out.

39:17 Emily Morehouse-Valcarcel: I think that being able to actually transpile into native code, so a lot of other platforms like Ionic, et cetera, will use WebView, so you're still running a JavaScript app in your native application.

39:32 Michael Kennedy: Right, and you have all these performance issues, like large lists are super weird unless you choose virtual scrolling add-ons, and all kinds of stuff.

39:40 Emily Morehouse-Valcarcel: Animations are a lot harder, you have that separation between your app and the bare metal, so you can't do a lot of GPU optimizations.

39:50 Michael Kennedy: So fingers crossed for a little bit more from the BeeWare folks, that'd be awesome. Another one that a lot of people likely interact with, especially from the Flask world, is Jinja2.

40:00 Emily Morehouse-Valcarcel: That's one of the things, the more you start looking around at the tools that you're using, the more you realize that a lot of them are using ASTs under the hood to accomplish some really cool things. But it makes sense, because HTML in itself can be thought of in a tree structure, and so it's really easy to parse HTML into a tree, and then if you have variables in your Jinja template that need to be filled in, it's really easy to pop those values in, and then transpile everything back into an AST, or into the HTML from the AST.

40:34 Michael Kennedy: The template languages are pretty impressive, the way they work, for all the different web frameworks it's kind of cool to see them go.

40:40 Emily Morehouse-Valcarcel: If you think about it, you can have little bits and pieces of Python code in your Jinja templates, and so you can see how this comes together where you've got an AST that you can then edit, and piece additional bits of code into the AST, and then transpile it all back.
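
The parse, fill in, and render cycle described above can be sketched with a toy template engine. This is not Jinja's implementation; the node classes `Text` and `Var`, and the functions `parse_template` and `render`, are invented for illustration. The "tree" here is just a flat list of nodes, the simplest shape that still separates parsing from rendering.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Var:
    name: str


# Matches placeholders such as {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def parse_template(source):
    """Split a template into Text and Var nodes."""
    nodes, position = [], 0
    for match in _PLACEHOLDER.finditer(source):
        if match.start() > position:
            nodes.append(Text(source[position:match.start()]))
        nodes.append(Var(match.group(1)))
        position = match.end()
    if position < len(source):
        nodes.append(Text(source[position:]))
    return nodes


def render(nodes, context):
    """Turn the node list back into text, filling in variables."""
    return "".join(
        node.value if isinstance(node, Text) else str(context[node.name])
        for node in nodes
    )


nodes = parse_template("<h1>Hello {{ name }}!</h1>")
print(nodes)
print(render(nodes, {"name": "Emily"}))
```

Parsing once and rendering many times with different contexts is the main payoff of keeping the intermediate node structure around.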

41:00 Michael Kennedy: It's definitely a cool example. So all those are within the realm of Python, how about outside Python?

41:07 Emily Morehouse-Valcarcel: There's a lot of different code analysis that you can do, any sort of linting is usually going to use an AST under the hood. And then we already talked about Babel in the JavaScript world, but there's also some really cool stuff that they're doing with CSS. So especially with the advent of all these newer JavaScript frameworks, there's a huge debate over how you actually handle your CSS and your styling now. So you can use an AST to do CSS transformations, and really optimize the use of mix-ins, and different media queries, and you can automate a bit more than what you used to be able to.
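
As a small illustration of linting on top of an AST, the sketch below flags bare `except:` clauses using only the standard-library `ast` module. The function name `find_bare_excepts` and the sample source are hypothetical; real linters apply many such checks in one pass.

```python
import ast


def find_bare_excepts(source):
    """Return the line numbers of `except:` clauses that name no exception."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical code under inspection; it is parsed, never executed.
SAMPLE = """\
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
"""

print(find_bare_excepts(SAMPLE))
```

Working on the tree means the check cannot be fooled by the word "except" appearing in a string or a comment, which is the usual weakness of text-based pattern matching.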

41:47 Michael Kennedy: There's definitely a ton of that stuff over in the JavaScript web world, it's quite interesting. All the minification, and all types of stuff that happens before your code actually goes out to the web, it's pretty cool. So what else would you like to let people know about, while you've got some time to speak with them?

42:05 Emily Morehouse-Valcarcel: One of the things that I have definitely been reflecting on a lot recently is just going through my first time speaking at a conference, and so I think I mentioned, I wrote a blog post on it, but I think that I want to shed some light on what that process looks like, and for a lot of people it's really intimidating to actually go out there, especially to start off with a technical talk.

42:29 Michael Kennedy: I think one of the really scary parts is, I think a lot of people are afraid that when you get up in front of them, the audience is just going to find that little one thing you don't know and just tear into you because you didn't know, well actually there's this one aspect of ASTs that does one thing. And I think one of the things I've learned about the Python community is that people are generally really supportive, and those types of things are not likely to happen unless you just have really an unprepared presentation, I think.

42:59 Emily Morehouse-Valcarcel: I've gotten some really great feedback, and I did have these little, oh, well you missed this little thing, but I really liked this, this, this, and this. But I really urge people, if you have a desire to speak at a conference, to just do it and submit that proposal, and you might get accepted, and you might not, but you also might get some really great feedback on your proposal.

43:19 Michael Kennedy: I think it can absolutely change your position within the community, once you've done one of these talks your talk is now on YouTube. If for some reason you ever wanted to apply for a job, it sounds like you're being super successful not having them up. If for some reason you wanted to, having that up there would be a really great resume item, you could say, do you want to know if I know about it, look, here's me speaking for half an hour in front of hundreds of people doing it.

43:46 Emily Morehouse-Valcarcel: Yeah, and it's also a really great way to get accepted for other conferences too. So as soon as you have that first conference talk you can point to that and be like, yeah, I can actually hold my own in front of a crowd, and stand up on the stage, and not completely forget how to speak.

44:03 Michael Kennedy: I'm sure that's a concern for conference organizers, and seeing someone who's obviously good at it is really, it really opens doors, I would think. And it doesn't have to just be the main PyCon. For example, PyCascades is a regional one, they're all over the world, but there's also meet ups, and user groups, and smaller things, maybe even a brown bag lunch talk at your company for the first type thing. I'm going to tell you guys all about web scraping, nobody knows about here, so let's talk about it.

44:33 Emily Morehouse-Valcarcel: Yeah, and I always remind myself to go back to that, it's a very well-known Venn diagram of what you think everybody else knows, and what you think you know, and how there's really a lot less overlap in the spaces than you'd think, and how much value you can really add, even though you just assume, oh, everybody else probably already knows this.

44:56 Michael Kennedy: That's right, it's easy to assume that, but I find it's actually rarely the case. I used to do a lot of in-person professional training, and I would get on the phone to set up some kind of event with the company I was going to, and the managers would also say, our developers are super advanced, they're really advanced, no beginner stuff, we only want advanced stuff for them, 'cause they're the best. Alright, great. Then we'd show up, we're like, alright, we're just doing advanced stuff, but hold on, can we talk about this not advanced stuff for a day or two, 'cause nobody knows it. They're like, what do you mean nobody knows it, your manager just told me that, alright great, that's what I actually believed was the case, but you don't want to tell that to people, right? But I think there's a lot of value in some of the foundational stuff, for sure.

45:39 Emily Morehouse-Valcarcel: Yeah, absolutely.

45:40 Michael Kennedy: So I think there's a bunch of opportunities for people to speak, and it's also, you can level up, you do a talk at a user group and then that gives you the confidence to do your talk at a regional conference, and then you can do whatever else you want to do.

45:52 Emily Morehouse-Valcarcel: I was very pleasantly surprised at how rewarding it was, and how validating it was, going from somebody who has a very technical, theoretical background and being mildly obsessed with ASTs, and learning about CPython's compiler and interpreter, and then being like, oh yeah, I'm going to get to talk on this and having people be interested in it, and want to talk about their perspectives and interests. One of the things that I gained from doing this talk is seeing all the dozens of different interpretations and interests that people had that this talk sparked for them, which was really interesting.

46:30 Michael Kennedy: That's really rewarding. I guess the last thought on this one is how much time did it take you to prepare this whole half hour presentation with all the research and everything, what kind of commitment was it?

46:43 Emily Morehouse-Valcarcel: I put 30-plus hours, easy.

46:43 Michael Kennedy: So that's like, more than 10 times, it was a 30-minute talk, so 60 times? Yeah, that still seems about right, 'cause it's a lot of research, and your talk was really well put together.

47:00 Emily Morehouse-Valcarcel: That's just the talk preparation itself, so I've been iterating on my own side research into CPython for, oh geez, probably almost a year now, maybe over a year. So there's definitely a lot of research that's gone into it, a lot of shower thoughts, and all that stuff.

47:22 Michael Kennedy: Sure. Well, like I said, it was a really good talk, we'll link to it from the show notes, people can go watch it. I think they just have two whole days of videos so I have a timestamp in the link so hopefully it starts playing, but if for some reason it doesn't, it's 2 hours and 45 minutes into day two, or something like that. But yeah, it was a really, really good talk, and I'm glad you gave it there.

47:46 Emily Morehouse-Valcarcel: Thank you, I'm glad to have given it.

47:48 Michael Kennedy: So let me hit you with the final two questions before I let you out of here. First, if you're going to write some Python code what editor do you use?

47:55 Emily Morehouse-Valcarcel: I always use Sublime on a daily basis. I'm also really comfortable in Vi and Vim, just because I did a lot of server management, so I can get around there, but I am a huge fan of Sublime's Anaconda package.

48:11 Michael Kennedy: That is really nice, yeah. That's not the regular, that's not Anaconda as in the NumPy scientific world, this is a totally different thing, maybe tell people about it.

48:21 Emily Morehouse-Valcarcel: It brings a little bit of the IDE experience to Sublime and allows you to highlight where a function is defined, and that kind of stuff.

48:32 Michael Kennedy: Yeah, it's a really nice add-on for it, that's cool. And notable PyPI package, we talked about a bunch, but grab one of those or another one?

48:40 Emily Morehouse-Valcarcel: Yeah, I think the one that I always have to talk about when people ask me this is the requests library. I think that was the first Python code that I saw that I was like, wow, it felt Pythonic to me, I was like, this is how interacting with a package should feel. And I think that Kenneth Reitz always does a really excellent job of thinking about your package's API. All of your code and your classes and all that have an API that humans have to interact with, I love that that's a theme for all of his work.

49:14 Michael Kennedy: The tagline, all of his stuff is for humans: Requests is HTTP for humans, and then Records is SQL for humans, and all sorts of stuff. Yeah, definitely good stuff, he's doing really good work. Alright, so final call to action, if people are excited about this stuff, if they want to learn more, obviously they should check out your talk, but what else can they do to get started?

49:34 Emily Morehouse-Valcarcel: That is a great question. I would encourage people to find the way that they want to apply ASTs, so whether that's in linting, or code highlighting, or actually getting in there and messing around with ASTs, and find something that is a really small thing that you can tweak and have fun with, and see how you can actually improve your current workflow.
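
One very small way to start messing around, sketched with the standard library only: parse a line of code and count the kinds of nodes in its tree. The helper `count_node_types` is a made-up name for this illustration.

```python
import ast
from collections import Counter


def count_node_types(source):
    """Tally AST node class names for a piece of source code."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


counts = count_node_types("total = price * quantity + 1")
for name, number in counts.most_common():
    print(f"{name}: {number}")
```

From here it is a short step to filtering for one node type, such as function calls or imports, and building a small check or report around it.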

49:58 Michael Kennedy: That sounds awesome. Alright, well Emily, thank you so much for being on the show, it was great to chat with you, and thanks for sharing your AST project with us.

50:07 Emily Morehouse-Valcarcel: Thank you.

50:07 Michael Kennedy: Yup, bye. This has been another episode of Talk Python To Me. Today's guest was Emily Morehouse, and this episode has been brought to you by Park My Cloud and Rollbar. Do you hear that sucking noise? That's your cloud provider making you pay for your idle instances. Turn on Park My Cloud, plug the leaks, and save money. Visit talkpython.fm/park to get started. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might've gone unnoticed, until your users complain, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point-by-point? Well check out my online course Python Jumpstart By Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. And if you're looking for something a little more advanced try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show, open your favorite podcatcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.
