#152: Understanding and using Python's AST Transcript
00:00 Have you heard of ASTs or abstract syntax trees?
00:02 If you have, it was probably in the context of a compiler or some kind of parser.
00:06 They're a really powerful data structure, but we often only use them really indirectly by using those types of tools.
00:12 They're just such a, well, you know, abstract idea to most of us.
00:16 This week, you'll meet Emily Morehouse. She's here to make this abstract concept
00:20 much more concrete and discuss the places where the AST can help us write and maintain
00:24 better code. This is Talk Python to Me, episode 152, recorded February 5th, 2018.
00:31 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries,
00:50 the ecosystem, and the personalities. This is your host, Michael Kennedy.
00:54 Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes
00:58 at talkpython.fm and follow the show on Twitter via @talkpython.
01:02 This episode is brought to you by ParkMyCloud and Rollbar. Please check out what they're
01:07 offering during their segments. It really helps support the show.
01:10 Emily, welcome to Talk Python.
01:12 Hello, how are you doing?
01:13 I'm doing really well. It was so nice to meet you in Vancouver, and I really enjoyed the talk
01:18 that you gave about the abstract syntax tree. You made this abstract idea kind of concrete.
01:23 It was nice.
01:23 Thank you. Yeah, I've gotten some really, really positive feedback from it. So I was very happy
01:28 to be a part of it.
01:28 Yeah, it was really cool. And we're going to talk all about the AST and what it means in
01:33 Python, where it comes in, how you can actually leverage it to do all sorts of cool stuff.
01:37 But before we get to all those things, let's start with your story. How did you get into
01:39 programming in Python?
01:40 So I kind of stumbled into it. I was a student at Florida State University. I was actually studying
01:46 theater and biochemistry. And I did an internship.
01:50 That's quite the mix.
01:51 Yeah. Yeah. But I did an internship at a lab and realized that I could not see myself doing
01:57 biochemistry for the rest of my life. I also knew that I wasn't going to make a whole lot
02:01 of money doing theater. So I was really interested in forensics. So I figured if I couldn't do the
02:07 biochem side, that I would go to the criminology side.
02:10 So I started in criminology. And it turned out that FSU had just launched a computer criminology
02:16 program. And so this was, I think, my junior year, like summer of my junior year of college.
02:22 And they were like, hey, you have really great math scores. Like, why don't you go take a programming
02:26 class? And I did. And I absolutely fell in love with it. It was like an intro to programming C++ class.
02:32 And I built an Enigma machine simulator.
02:35 Oh, wow. That's cool.
02:36 The rest is history. Yeah.
02:37 Yeah. That's really awesome. So this like sort of data driven criminology, is this like
02:43 a little bit like CSI, like forensic type stuff? Or what kind of things would you have done there?
02:49 Yeah. It was a lot of focus on security, a lot of focus on digital forensics. So taking like a dead
02:55 hard drive and trying to reformulate a lot of the data that you can get off of it,
03:00 that kind of stuff. That sounds pretty interesting. And I guess, you know, you were thrown right into
03:05 like the deep end with C++. So if you like that, then, you know, this whole thing must be for you,
03:11 right? Exactly. Yeah. And I think that looking at my history and how I got into programming and
03:16 coming from a CS background, that was very, very focused on theory and compilers and all that.
03:21 That's definitely where a lot of this stems from. Yeah, yeah, I'm sure. So how'd you go from
03:27 C++ to Python? That's a bit of a complicated answer. So part of it was that I started building
03:34 a lot of like side projects with friends. And so we would go through and like figure out how to build
03:39 a web scraper and stuff like that. And then we started building APIs. And then my university
03:44 actually offered Python courses. So I was able to take Python and like an advanced Python course.
03:50 And then I wound up using it for a lot of the research programs that I was working on,
03:54 because it's really great with data processing and building graphs and all that.
03:58 Right. It's a lot quicker than trying to do that in C++.
04:00 Yeah.
04:01 I don't know what. OpenGL, who knows? Something crazy.
04:04 Yeah. I had to do a little bit of R at one point as well. And I much prefer Python over R.
04:09 Yeah, yeah, that's cool. All right. Well, that's a really cool origin story,
04:12 how you got started. How about now? Are you still working in criminology?
04:17 Not quite. I almost did. But I kind of had another interesting turn and wound up starting a company.
04:23 So I actually am the co-founder and director of engineering of a company called Cuttlesoft.
04:28 And so we are a digital product development company. So we work with a lot of different types of clients,
04:35 anywhere from nonprofits to other tech companies. And we build anything from web and mobile applications
04:42 to cloud migrations and embedded systems. And yeah.
04:46 Yeah. That sounds like a lot of stuff, but it sounds like a really cool company. I looked around your website.
04:50 It looks really, really slick and friendly. And I like it a lot. So it sounds like you must have a ton of technologies at play, right?
04:58 Like as soon as you say mobile, there's probably some Java and some Swift or at least a lot of JavaScript somewhere in there.
05:03 But yeah, what's it like?
05:05 It's really fun. I think I consider myself lucky to be able to really hone my sort of polyglot skills.
05:12 I really enjoy knowing a lot of different languages and being able to see how building a product in one language is different than using another.
05:21 Even like different type systems between Swift and Java. And like, yeah, those are all debates that I love getting into.
05:28 Yeah, those are cool. And I think knowing those languages gives you a richer perspective on any one of them, right?
05:35 Being able to think about it and look at these things from different perspectives.
05:38 I absolutely agree.
05:39 I think it's cool. Yeah, nice.
05:41 So you guys also run a podcast, right? I just learned. Tell me a little bit about your podcast.
05:47 Yeah. So our podcast is called Startup Capital. It's a little bit of a play on words.
05:51 So our original office started in Tallahassee, Florida.
05:56 And there was a huge movement of entrepreneurs and like there's a business incubator and just a huge focus on entrepreneurship.
06:05 And so we started a podcast to kind of highlight each of those stories and how everybody's businesses were going, how they relate to the community, how everything is kind of giving back to Tallahassee and really taking a look at what Tallahassee's ecosystem for entrepreneurs looks like.
06:20 That's cool. That sounds really interesting.
06:22 There's definitely seems to be a lot of entrepreneurship in Florida for some reason.
06:26 I don't know why that is, but it definitely seems like it.
06:29 Yeah. I mean, we got started there.
06:30 Yeah. Well, there's exactly one cool example.
06:35 So let's talk a little bit about the conference and then we'll get into your talk.
06:39 So I learned about your presentation and stuff just by going to PyCascades, which was a new conference in Vancouver this year.
06:49 I thought it went really well, your talk, but also the whole conference.
06:52 What do you think of the experience?
06:53 I absolutely agree. So this was actually my first regional conference that I had attended.
06:58 I attended PyCon previously, but for me, regional conferences are a lot more digestible, a lot less intimidating, a lot less exhausting.
07:06 So for me, like the two day conference experience is like the perfect length.
07:11 Yeah. I feel like when you go to PyCon, it's kind of like the paradox of choice, right?
07:14 It's like, oh, there's 10 tracks and there's open spaces and I can even skip all that and hang out with these people and cruise around the expo hall.
07:22 And there's just like, you know, so much to do, right?
07:24 It was pretty clear, like you either do this, the main track and you're in there with everybody or not.
07:29 And that's definitely a different experience.
07:31 Yeah. But I think the whole thing was really great.
07:34 I know that I felt very welcomed and very supported as a first time conference speaker.
07:39 And I was very, very happy to be invited into the PyCascades community, even as somebody who ventured all the way from Colorado.
07:50 Yeah, that's cool. It was really nice.
07:52 Yeah, there were a lot of Americans for a Canadian conference, but it was still nice to meet everybody from all over the place.
08:00 I got to say, your presentation was really well done.
08:02 And I think it was, from a sort of slides and visuals perspective, one of the most interesting ones there. It had animated GIFs, little puppies, and all sorts of stuff to keep it interesting.
08:15 You know, people so often just go present like a wall of text and you're just like, I'm sure what you're saying is super interesting, but this isn't convincing me of it, right?
08:22 I had to be very conscious about how I presented a lot of this information because it can be very dense and a little bit confusing.
08:30 So I knew that I had to make it interesting for one, cue puppy gifs.
08:34 And I also knew that I had to relay a lot of the information in a very visual fashion so that people could see the trees and understand what they looked like instead of trying to put words together to describe it.
08:46 Yeah, absolutely. There's definitely a lot of visual stuff in there.
08:49 So let's start by maybe asking the question, like, should you care about Python internals?
08:56 I mean, we're going to dive pretty deep into the internal of how Python works and stuff.
09:00 And on one hand, you could kind of blissfully ignore that: you know, pip install requests and Beautiful Soup, screen scrape, you're done.
09:08 So maybe why do you think people should look a little deeper?
09:11 By understanding how your code works under the hood, it can give you insights into the approaches that you're using to write your code.
09:20 I also think that it's really interesting to learn about how Python works as a language and then how you can leverage that to build different tools, whether you're using the actual Python AST that gets generated or just knowledge of ASTs to build other really neat tools.
09:37 Just to know that you can leverage these ASTs to do more or even change how things execute is, it's pretty cool to know.
09:46 And it's not at all obvious that that's possible, right?
09:48 Absolutely.
09:48 Yeah, yeah.
09:49 So what we're going to focus on mostly is around taking Python source code, getting it to abstract syntax trees and into bytecode.
09:59 Like once the bytecode runs and that sort of is a whole nother area.
10:03 So I just want to give a shout out for people who are interested in this to Philip Guo's 10-hour CPython code walk.
10:10 And I did an episode on that with him, like looking inside the CPython implementation at that sort of disassembly part.
10:16 A whole bunch way back on episode 22.
10:19 So that's kind of a good complement to this.
10:21 But let's maybe start by talking about how do we go from source code to machine instructions executing, right?
10:30 Like actual stuff happening.
10:31 And you posed a really interesting question of asking whether Python is interpreted or compiled.
10:36 What's the answer?
10:37 The answer is both.
10:39 Usually there's somebody in the audience who actually gets that one right, which I really appreciate.
10:43 But yes, Python is actually both.
10:45 It seems, you know, like you hear that Python is interpreted, so there must be no compile step.
10:50 But the thing that actually gets interpreted is a bunch of compiled bytecode.
10:55 It doesn't get down to machine instructions.
10:57 It just stays these sort of higher level bytecode things, right?
11:00 So maybe talk a little bit about that process.
11:02 Exactly.
11:03 So the compiler generates your bytecode, and then the interpreter actually executes your bytecode.
11:09 So essentially what you do is you take your source code that gets parsed into what's called a parse tree.
11:14 And then for various reasons, your parse tree gets to be a little bit too detailed and is a little bit harder to work with.
11:22 So you actually transform that into an abstract syntax tree.
11:25 Is the parse tree like basically almost exactly what you typed, whereas the abstract syntax tree is like the meaning, like the essence of what you typed?
11:33 Yes, exactly.
11:34 So one of my favorite examples to use is a math expression.
11:38 And so all of us are taught from a very young age the order of operations, right?
11:44 So you know that if you have one plus two in parentheses times three, you know, based on the order of operations, that you execute the parentheses first and then you multiply.
11:56 And so that's one of my favorite examples to use from parse tree to AST, because when you have your AST, you can actually automatically know the order in which you have to perform your operations just based on how the tree itself is laid out.
12:12 Instead of having to know, oh, well, I have parentheses, so I have to worry about that too.
12:17 Right, right.
12:18 And we don't have your great pictures to help here because this is audio.
12:22 But, you know, imagine a tree where the top operation is multiply and then like each branch, one is like the three and the other is one plus two.
12:29 You just go down it and evaluate it.
12:31 Right.
12:31 It's pretty, pretty straightforward once it's in that mode.
12:33 Yeah, that's cool.
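The (1 + 2) * 3 example can be sketched with the standard library `ast` module. The `evaluate` helper and the `OPERATORS` table are hypothetical names made up for this illustration, not something from the talk. The point is that recursing down the tree gives the right order of operations without ever looking at parentheses.

```python
import ast
import operator

# Hypothetical lookup table for this sketch: AST operator type -> function.
OPERATORS = {ast.Add: operator.add, ast.Mult: operator.mul}


def evaluate(node):
    """Evaluate a tiny arithmetic AST by recursing into its branches."""
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left)
        right = evaluate(node.right)
        return OPERATORS[type(node.op)](left, right)
    # Leaves are plain number literals.
    return ast.literal_eval(node)


tree = ast.parse("(1 + 2) * 3", mode="eval")
root = tree.body

top_operation = type(root.op).__name__        # the multiply sits at the top
left_operation = type(root.left.op).__name__  # one branch is the addition
right_value = ast.literal_eval(root.right)    # the other branch is the 3
result = evaluate(root)

print(top_operation, left_operation, right_value, result)
```

The parentheses never appear in the tree; they only influenced which node ended up as the child of which.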
12:34 OK, so we're talking.
12:36 So we have source code.
12:37 It gets parsed into a parse tree and then into an abstract syntax tree.
12:42 And then what?
12:43 And then you take your abstract syntax tree that gets transformed into what's called a control flow graph.
12:49 So if you think about it, an AST is still represented in a very hierarchical fashion, whereas your CFG, you can actually have cycles or loops that can mimic the actual control flow of your program in a better way.
13:02 And then from there, you can actually emit your byte code that winds up being executed.
13:07 Right.
13:08 And can we find this byte code?
13:09 Like, does it land on disk somewhere when we run our code?
13:11 Yeah, absolutely.
13:12 So most of the time, all of your byte code is actually stored in .pyc files.
13:17 It used to also get stored in .pyo files back in Python 2, if you're still using that.
13:23 Yeah.
13:24 And it's in that __pycache__ directory that you sometimes see hanging around, right?
13:28 Yeah.
13:28 So in Python 3, they were very kind to organize all those .pyc files into your __pycache__ directory to kind of get them out of the way.
13:35 Yeah.
13:36 The one thing is, here's junk, you can throw it away.
13:38 Yeah.
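A small sketch of where the cached bytecode lands, using the standard library `py_compile` and `importlib.util` modules. The file name `example.py` and the temporary directory are assumptions for illustration only; the exact cache file name depends on the interpreter version and configuration.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "example.py")
    with open(source_path, "w") as handle:
        handle.write("answer = 6 * 7\n")

    # py_compile writes the bytecode cache file and returns its path.
    pyc_path = py_compile.compile(source_path, doraise=True)

    # cache_from_source reports where the interpreter expects that file.
    expected_path = importlib.util.cache_from_source(source_path)

    pyc_exists = os.path.exists(pyc_path)
    pyc_name = os.path.basename(pyc_path)

print(pyc_name)
print(pyc_exists)
```

Deleting the cache file is harmless: the interpreter simply recompiles the source the next time the module is imported.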
13:39 So let's start talking about working with this, maybe just by focusing on the tools.
13:45 So there's a couple of modules built into Python directly that lets you work with this byte code and these trees and stuff, right?
13:52 Yeah.
13:53 So there are two modules that are built in that I use a lot.
13:56 One of them is the AST module.
13:58 So this is made to assist you in actually interacting with the AST that Python generates.
14:05 And it also has a bunch of helpers that you can use to support translations from your source code into an AST, AST into code objects, and then actually executing the code from those code objects.
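The three translations just described (source to AST, AST to code object, code object to execution) can be sketched with `ast.parse` and the built-in `compile` and `exec`. The source string and the `namespace` dictionary are made up for this example.

```python
import ast

source = "result = 6 * 7"

tree = ast.parse(source)                         # source text -> AST
code_object = compile(tree, "<sketch>", "exec")  # AST -> code object
namespace = {}
exec(code_object, namespace)                     # code object -> execution

print(type(tree).__name__)
print(type(code_object).__name__)
print(namespace["result"])
```

Passing the tree rather than the source string to `compile` is what makes it possible to inspect or modify the AST before anything runs.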
14:19 Yeah, that sounds really interesting.
14:20 So let's see, you could take the AST module and you could like parse just source text, right?
14:27 And I guess you could either get that from a file by reading the text of the file, or you literally could just give it text, right?
14:32 That's pretty interesting.
14:33 And if you want to look at it, though, you can't, you know, it's just like an object in memory.
14:37 It's like underscore AST dot, I don't know, node or expression or something at some address, right?
14:43 It's completely hard to make any sense of.
14:44 But there's a nice dump thing, which will dump it sort of back to, what is that, like a text version of a tree or something?
14:52 I'm not entirely sure.
14:53 It's this huge glob of tree data.
14:56 So the easiest way to actually interact with it is to do like a traversal and actually walk over your AST. There are actually a few third-party libraries that make that a little bit easier, that abstract away actually interacting with the tree and handle, you know, what is your next step?
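Even without third-party libraries, the built-in `ast` module offers two ways to traverse a tree: `ast.walk` and `ast.NodeVisitor`. The sketch below uses both; the sample functions and the `CallCollector` class are hypothetical names for illustration.

```python
import ast

source = '''
def greet(name):
    return "Hello, " + name

def shout(name):
    return greet(name).upper()
'''

tree = ast.parse(source)

# ast.walk yields every node in the tree, in no guaranteed order.
function_names = sorted(
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
)


class CallCollector(ast.NodeVisitor):
    """Record the names of plain function calls such as greet(...)."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):
            self.calls.append(node.func.id)
        # Keep descending so nested calls are seen too.
        self.generic_visit(node)


collector = CallCollector()
collector.visit(tree)

print(function_names)
print(collector.calls)
```

The visitor only records calls whose target is a bare name, so the method call `.upper()` is skipped while the nested `greet(name)` is still found.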
15:15 Right. Okay.
15:16 Yeah. And then there's actually some really neat visualizations that have been built that you can actually just give it your source code that you want to visualize.
15:24 And then it'll print out an actual tree representation.
15:28 It has like an SVG.
15:30 So you can visualize that. You can even take these nodes, these abstract syntax trees, and say compile them and then look at their underlying bytecode, right?
15:40 Just all in memory.
15:41 Yeah, exactly.
15:42 So what I kind of did was I approached it as trying to kind of step through as much of the compilation process as I could just by looking at the AST.
15:52 So actually going from your source code, kind of skipping the parse tree, going to the AST and then being able to see your bytecode and interact with it.
16:01 And so everybody always says that everything in Python is an object.
16:05 And that actually is true down to the smallest, most integral parts of the actual implementation of the language.
16:13 So your source code gets compiled into what's called code objects.
16:18 And then those code objects store things like disassembled bytecode or sorry, the assembled bytecode.
16:25 Just the straight numbers, right?
16:27 Like three, 102, seven.
16:30 Yeah.
16:30 And it's actually stored as bytes.
16:33 So they're not even like human readable.
16:35 So you can actually, you have to go in there and poke at it a little bit and to get everything translated into integers so that you can actually read it as a human.
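A minimal sketch of poking at those raw bytes. The source string is made up for this example, and the actual integers differ between Python versions, so none are shown here.

```python
source = "total = price + tax"
code_object = compile(source, "<sketch>", "exec")

raw_bytecode = code_object.co_code   # the assembled bytecode, stored as bytes
as_integers = list(raw_bytecode)     # each byte becomes an int from 0 to 255

print(type(raw_bytecode).__name__)
print(as_integers)
print(code_object.co_names)          # the names the bytecode refers to
```

The integers on their own are still hard to read, which is where the disassembler comes in.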
16:44 Right.
16:44 So once you compile it here or if you use the disassembly module, you can just go to the code objects and they have like a .co_code, which is the bytes you're talking about.
16:54 And then you can do a couple of things.
16:57 You could actually just re-execute it in case you're messing with it here or you could disassemble it.
17:01 So I kind of sidetracked you when you're talking about the tool.
17:04 So there's the AST module, which does all the stuff that we're talking about, right?
17:08 It'll take source code and you can actually turn it into an AST, but you are also talking about the disassembly module?
17:14 Yes.
17:15 So the disassembly module allows you to take your code objects and actually shows you sort of the machine instructions as they get executed.
17:25 And it also shows you like the registers that they use and a bunch of information, but it makes it a lot easier to actually just look at the disassembly bytecode because it goes through a lot of the lookup processes for you.
17:38 So I'm not going to remember what these are off the top of my head, but if you see like instruction 101, there's actually a giant switch statement in the CPython interpreter that goes through and says, okay, instruction 101.
17:52 I know that that is supposed to be, I don't know, a load.
17:54 Load function or like add a variable to the call stack or something like that.
18:00 Yeah.
18:01 There's literally like this huge switch statement that is like 3000 lines in ceval.c in CPython.
18:07 That's like just every one of those codes.
18:10 Okay.
18:10 It's code 101.
18:11 What does that mean?
18:12 What do I do with that?
18:13 Right.
18:13 And so the disassembly module basically takes this bytecode, whether you got it directly from running some bit of code or from messing with the AST, and it'll show you that and kind of the raw steps, almost the assembly language of Python.
18:28 Right.
18:29 As close as you're going to get anyway.
18:30 Yeah.
18:31 Right.
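The disassembly module being discussed is the standard library `dis` module. A short sketch follows; the `add` function is made up for illustration, and the exact instruction names printed depend on the Python version, so treat the output as version-specific.

```python
import dis


def add(a, b):
    return a + b


# Human-readable listing: offsets, operation names, and their arguments.
dis.dis(add)

# The same information as objects, convenient for programmatic inspection.
operation_names = [
    instruction.opname for instruction in dis.get_instructions(add)
]
print(operation_names)

# dis.opname maps a numeric opcode to its name, which is the lookup the
# interpreter's big switch statement performs. What opcode 101 means
# varies between Python versions.
name_for_101 = dis.opname[101]
print(name_for_101)
```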
18:34 This portion of Talk Python to Me is brought to you by ParkMyCloud.
18:37 The last time you parked your car, did you leave it running?
18:40 No.
18:41 Well, then why are you leaving your cloud resources running?
18:44 Every year, $13 billion are wasted on cloud instances that no one is using.
18:50 Don't let any of those be yours.
18:52 ParkMyCloud automatically identifies and eliminates wasted cloud spend, saving you 65% on AWS, Azure,
19:00 and Google's cloud.
19:01 You're up and running quickly with a 10 minute setup and no scripting required.
19:05 Plus govern users and easily integrate into your DevOps process.
19:09 See why ParkMyCloud was chosen by McDonald's, Unilever, Fox, and more.
19:14 Start a free trial today at parkmycloud.com slash talk Python.
19:19 So those are the two built-in ones.
19:21 And then there's a couple of others that let you dig around.
19:24 There's astor, or AST-or, how do you say it?
19:28 How do you think?
19:29 I've pronounced it as Astor.
19:30 Astor.
19:31 That sounds more fun.
19:31 Yeah.
19:32 Tell us about that.
19:32 And it's a derivative of codegen, which comes from Armin Ronacher of Flask fame as well.
19:37 Indeed.
19:37 Yeah.
19:38 So he started out by writing codegen and there were a few little holes that it had.
19:44 So other people in the community kind of picked it up and patched it together.
19:48 And so both astor and there's another package called Meta.
19:52 They're kind of focused on taking bytecode or an AST and generating the code from it, like your original source code from it.
20:01 And so it's kind of focused on this like reverse engineering of bytecode and ASTs.
20:05 And then astor also provides some really cool like pretty printing and ways to manipulate the AST.
20:10 So if you want to like inject code in certain parts of the AST, it provides helpers to do that.
20:16 Right.
20:16 Every time you see this type of pattern, like wrap it in whatever, right?
20:20 Something to that effect, right?
20:22 Mm-hmm.
20:22 Yeah.
20:23 Yeah.
20:23 Cool.
20:23 With the straight AST module, can you modify the ASTs as well?
20:28 I don't know if there are any helpers that kind of make that process easier, but you can go in and actually like edit the data structure.
20:36 So there is an actual like underscore AST data structure that you can interact with.
20:42 If you want to just do it at the lowest level, I guess.
20:44 Yeah.
20:45 Yeah.
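For editing the tree with only the standard library, the `ast` module does include an `ast.NodeTransformer` base class. The sketch below is a deliberately silly rewrite (every addition becomes a multiplication) to show the mechanics; the `SwapAddForMult` class is a hypothetical name for this example.

```python
import ast


class SwapAddForMult(ast.NodeTransformer):
    """Rewrite every `a + b` in the tree into `a * b`."""

    def visit_BinOp(self, node):
        # Transform nested expressions first.
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("result = 3 + 4")
tree = SwapAddForMult().visit(tree)

# Newly created nodes lack line numbers; fill them in before compiling.
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
print(namespace["result"])
```

The source text still says `3 + 4`, but the code that runs is the multiplication, because the change was made between parsing and compiling.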
20:45 So there's a couple of interesting things that astor and codegen did.
20:51 You talked about the pretty printing.
20:52 They let you kind of traverse the tree in a less heavyweight way, I guess, is one thing it does.
20:59 But it also does this concept of round tripping.
21:01 Tell us about this idea of round tripping.
21:04 I am really interested in round tripping for a few different reasons.
21:07 But basically what you can do is you can take your source code, translate it into an AST, and then translate it back into Python
21:14 and see how much your code has changed.
21:16 So the idea of round tripping can actually be used in sort of refactoring or going through and linting code.
21:24 So there are certain changes that you can make to your code that doesn't change the underlying semantic meaning that you can see through the AST.
21:34 So it's a way to kind of guarantee that you've made changes to the syntax of your code, but you haven't actually changed how your code is working or what it's doing.
21:43 Right.
21:43 It's like the linter says, you really should format this way or you should tweak it around that way.
21:48 Right.
21:49 But it really should have exactly the same meaning.
21:51 If the AST changes, well, it doesn't have the same meaning.
21:54 Right.
21:55 So this is kind of a byte level check almost.
21:59 Yeah, I could see a unit test that kind of like snapshots the AST, saves it as like a pretty printed text, and then, you know, just compares again.
22:09 Does that change?
22:10 Did that change?
22:11 All right.
22:11 And if it ever changed, it kind of failed.
22:13 That's pretty interesting.
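The snapshot-and-compare idea can be sketched with `ast.dump`, which by default leaves out line and column information, so pure formatting changes disappear. The `same_meaning` helper and the sample strings are hypothetical, chosen for illustration.

```python
import ast


def same_meaning(before, after):
    """True when two source strings parse to structurally identical ASTs."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "total=(price+tax)*quantity"
reformatted = "total = (price + tax) * quantity  # tidied by a linter"
changed = "total = price + tax * quantity"

formatting_only = same_meaning(original, reformatted)
behavior_changed = not same_meaning(original, changed)

print(formatting_only)
print(behavior_changed)
```

Whitespace and comments never reach the AST, so the first pair matches. Dropping the parentheses changes which operation sits at the top of the tree, so the second pair does not.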
22:14 Another one that I ran across recently that kind of falls into this general realm of like digging inside the bytecode, it's not technically to do with ASTs, I don't think.
22:24 It's more about the disassembly. It's this thing called Python Hunter.
22:27 You've seen that, right?
22:28 Yeah, I recently looked into it.
22:30 They do some really cool things with code tracing, and it'll actually kind of show you at a very low level each step that your code is taking under the hood.
22:39 Yeah, so it's kind of like logging the disassembly as your code executes.
22:44 It's pretty funky, that thing.
22:46 But yeah, it's definitely just another library.
22:48 There's probably, you know, 10 others that we're forgetting or we don't even know exist, right?
22:52 Out of the 125,000 PyPI packages, there's got to be some more in there.
22:56 Nice.
22:57 Okay, so let's see.
22:59 One thing that you talked about in your talk that I thought was interesting is that we can use the disassembly module to check out our code,
23:08 like take the bytecode and sort of look at it in these lower level operations.
23:12 But the bytecode and the source code, they don't always line up, right?
23:15 Sometimes the bytecode is more verbose.
23:17 Sometimes it's less verbose than our source code.
23:19 And maybe tell people about that mismatch there.
23:21 One of the things that you always kind of assume is that there's some sort of optimization that happens under the hood.
23:29 And so you can actually see how a lot of those optimizations happen by looking at your bytecode.
23:35 The two I remember are the peephole optimization and the constant folding.
23:39 Yeah.
23:40 So I guess I can describe them first.
23:42 The two most common ways that Python is optimized under the hood is the peephole optimizer.
23:48 So the way I like to think about that is looking around without moving your head.
23:53 So kind of using your peripheral vision to see your direct surroundings and then being able to make intelligent choices based on those surroundings.
24:01 So as humans, we have learned that if you have x equals 1 and then y equals x plus 2, we automatically know since x equals 1 that we can kind of substitute those values in.
24:15 And that's one of the things that the peephole optimizer does.
24:19 And the other optimization is called constant folding.
24:23 And so basically you can evaluate constant expressions at compile time instead of at runtime.
24:30 Right.
24:30 So for example, if you had like what I have all the time in my web apps is there's usually some part that says, how long do you want to cache this for?
24:39 And it's sometimes in milliseconds and it's sometimes in seconds.
24:42 And I don't want to write, you know, 236,414 seconds.
24:47 All right.
24:48 Like 60 times 60 times 24 times 31, a month's worth of seconds.
24:53 Do the math for me.
24:54 And what you're saying is that Python will actually do the constant folding and go, well, that's the, you know, whatever actual number that is.
25:00 Just before it ever runs.
25:02 Right.
25:02 Yeah, exactly.
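Constant folding can be observed by compiling the month-of-seconds expression and looking at the constants stored on the code object. This is a sketch that assumes CPython; folding is an implementation detail, so other interpreters or versions may store different constants. The variable name `cache_seconds` is made up for this example.

```python
source = "cache_seconds = 60 * 60 * 24 * 31"
code_object = compile(source, "<sketch>", "exec")

# The value the expression works out to, computed here for comparison.
folded_value = 60 * 60 * 24 * 31

# On CPython the product is computed at compile time and stored as a
# single constant, so no multiplication happens when the code runs.
print(code_object.co_consts)
print(folded_value in code_object.co_consts)
```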
25:03 And it tries to look out for you a little bit.
25:06 So one of my other favorite examples is if you have like a double negative in your logic.
25:12 So if you have something like not A not in B, the compiler actually goes, oh, no, no, no, no.
25:19 I know what you're trying to say here.
25:20 This is simply just A in B.
25:22 Like, let me simplify that for you.
25:25 Right, right.
25:25 I don't need to do three tests for you.
25:27 I'll just do the one or two.
25:29 However, however you count not in.
25:31 Yeah, that's totally a good idea.
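The double negative can be inspected at the AST level. The sketch below shows how the expression parses and checks on a few made-up sample values that it agrees with the simpler `a in b`. Whether and how the bytecode is simplified depends on the interpreter version, so that part is not asserted here.

```python
import ast

expression = ast.parse("not a not in b", mode="eval").body

# The tree is a `not` wrapped around a `not in` comparison.
outer_operation = type(expression.op).__name__
inner_operation = type(expression.operand.ops[0]).__name__
print(outer_operation, inner_operation)

# Sample values chosen only for illustration.
samples = [(1, [1, 2]), (3, [1, 2]), ("x", "xyz"), ("q", "xyz")]
always_equal = all((not a not in b) == (a in b) for a, b in samples)
print(always_equal)
```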
25:34 And that's one of the peephole optimizations as well.
25:37 It's like there's a whole class there.
25:39 So things that I don't think happen are things like what C++ might do where you might like inline non-virtual function calls and stuff like that.
25:48 So I don't know.
25:50 I think it's pretty interesting that one of the big focuses recently in Python is around its performance.
25:56 I think that was the 3.6 work.
25:58 I can't remember.
25:59 But like Victor Stinner and some of those guys are like, we're going to make a bunch of improvements on like function call speed and stuff.
26:06 And so they're getting better at the optimizations for sure.
26:09 But yeah, it's not too advanced, right?
26:11 Correct.
26:12 Yeah.
26:12 I think one of the things that happened with Python 3 is that when Python 3 was first released, it was actually a little bit slower than Python 2.
26:20 And so I think they realized that they really needed to put a lot of focus into at least making it as fast, if not faster, than Python 2.
26:28 Yeah, we've had this big challenge for quite a while, which I feel like we're on the verge of sort of putting that behind us.
26:36 But the sort of Python 3, Python 2 wars, it is hard to say you should choose Python 3 when it actually slows down everything you do, right?
26:46 Exactly.
26:46 Yeah, that definitely is not what you want on your side of the argument.
26:49 So it's good that they're making it faster and using less memory.
26:52 Like there was that whole presentation at PyCon 2017 from Instagram.
26:57 I don't know.
26:57 Did you see that one?
26:58 I might have.
26:59 Yeah.
26:59 So they've talked about upgrading from Python 2 to Python 3 from Django, some old version to like modern Django.
27:05 And they basically save like 12% on their memory usage and something similar around performance just by upgrading.
27:11 So, you know, it's finally getting to the point where it's better to upgrade, not worse.
27:17 So that's pretty awesome.
27:18 All right.
27:19 So these abstract syntax trees, it's interesting to know that they work.
27:24 It's kind of cool.
27:25 It gives you some insight into your code like we talked about.
27:28 But, you know, let's talk about some of the applications because, you know, it's sort of an abstract idea until you can do something constructive with it.
27:37 So you had a bunch of different applications that you talked about.
27:41 I think it's kind of a lot of things that you're going to do something like 100% of the code coverage for your tests.
27:45 So you're going to do something like 100% of the compiler does.
27:49 There's also a few like very random cases where if you're trying to get like 100% code coverage for your tests.
27:57 There's actually certain cases that if you have an if else statement, for example, that your else, the actual line for the else doesn't actually get executed.
28:07 So there are certain like weird things where you're like, I don't know, I can't get my else to execute, but I know that it executes.
28:14 Yeah.
28:15 Yeah.
28:15 So there are certain ways that you can use this knowledge to sort of debug different errors that you're encountering.
28:21 Right.
28:21 Or if you're super obsessed about getting 100% code coverage.
28:25 Yeah.
28:25 You just want to force that to happen.
28:27 Yeah.
28:28 Another thing is if you wanted to create your own sort of domain specific language derived from Python, right, if you want to change the grammar a little bit.
28:37 So that's kind of one of the fun things about Python is that anybody can sort of propose a PEP and propose a change to the language.
28:44 So if you did want to actually get in there and change Python's grammar, you can do that fairly easily.
28:50 There's a lot of blog posts out there about how to actually get in there and do it.
28:54 Yeah.
28:54 It's probably a pretty tough sell to get them to accept like a new concept, right?
29:01 Usually.
29:01 Yeah.
29:01 Yeah.
29:02 I mean, that's one of the things where like once it's in there, you have to live with it no matter what.
29:07 Right.
29:08 If it's in the language, unless you're doing a major breaking change like Python 2 to 3, like generally,
29:14 you know, it's kind of like the gift of a puppy, right?
29:16 You have it once you've received it.
29:19 Nice.
29:19 And then we talked about round tripping already.
29:21 That's another interesting application to sort of verify that changes based on linting or automatic linting actually don't make any significant, you know, any meaningful change to the underlying bytecode.
29:34 That's pretty interesting.
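A round-trip check of this kind can be sketched with the standard library alone. The helper name `same_ast` is hypothetical, and the sketch compares ASTs rather than bytecode, on the assumption that identical trees compile to the same thing.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if two source strings parse to the same AST."""
    # ast.dump leaves out line and column attributes by default, so pure
    # layout changes (whitespace, line breaks, comments) do not affect it.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = {  'a':1,\n 'b':2 }\n"
reformatted = "x = {'a': 1, 'b': 2}\n"
changed = "x = {'a': 1, 'b': 3}\n"

print(same_ast(original, reformatted))  # layout-only change
print(same_ast(original, changed))      # a value actually changed
```

A formatter run that only moves whitespace around should pass the first check; anything that fails it deserves a closer look.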
29:36 Maybe one of the ones where people see the most is around this idea of code generation.
29:40 Yeah.
29:40 So code generators are really cool.
29:42 And it's actually one of the things that I learned in this process is that there's actually some decent chunks of Python itself and Python's compiler that are actually generated.
29:52 Oh, that's pretty cool.
29:53 Yeah.
29:53 But you can use code generators for a lot of really neat things.
29:57 So one of them is called Pythoscope.
29:59 Now, what's that?
30:00 It will actually essentially generate unit tests for you.
30:05 So if you have a project that you haven't actually written any tests for, you can use Pythoscope to kind of kickstart that process.
30:11 That's cool.
30:12 So the logo they have is like this doctor checking out like a sick snake.
30:18 It's a pretty good logo, actually.
30:21 It's funny.
30:23 So look at this.
30:24 It says you take the old code and sort of run this thing across it.
30:29 It will write what it thinks are the unit tests for you in comments for all the various things.
30:35 And it just fails all the tests.
30:37 But you can sort of uncomment them and make them real, which is pretty cool.
30:41 And it apparently uses the abstract syntax tree to go understand the various pieces, right?
30:47 Yeah.
30:47 So it uses the AST to kind of figure out what tests that it can actually generate and what needs to be tested and all that.
30:54 Oh, yeah.
30:54 That's cool.
30:54 That's a really nice way to do it.
30:56 So another thing that's pretty interesting is this thing called Transcrypt.
31:00 And I've never heard of Transcrypt.
31:02 It's apparently a Python in the browser, sort of a JavaScript to, sorry, a Python to JavaScript compiler type thing like Babel.
31:13 Have you played with this?
31:14 Have you seen it?
31:15 I haven't played with it before, but it's really interesting.
31:18 So I think that that's one of the things that a lot of people are trying to do is they're trying to make Python a lot more portable.
31:23 And one of the things I have also learned, I'm just a fountain of knowledge over here.
31:28 Python was almost the JavaScript of browser languages.
31:33 Oh, really?
31:34 Yes.
31:35 That would have been nice.
31:37 Yeah.
31:38 Way back in the day, I think there was like an old Netscape browser that was actually built using Python as its, like, in-browser language.
31:46 Wow, that's cool.
31:47 Yeah.
31:47 So, you know, Netscape back in the day, that's when they came up with JavaScript just for Netscape, right?
31:54 So, too bad.
31:56 But yeah, there are a couple options.
31:58 There's Skulpt and there's a couple of others.
32:00 But I don't know how the others work.
32:02 Maybe similarly.
32:02 But at least this Transcrypt one, which I didn't know was an option until now, that one uses the AST to sort of generate the JavaScript equivalent from Python, which seems like a pretty good way to do it.
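As a rough sketch of what generating JavaScript from a Python AST can look like, the toy visitor below handles only assignments, names, numeric literals, and four arithmetic operators. The class `ToyJsEmitter` is hypothetical and is not how any real transpiler is structured; it just shows the walk-the-tree-and-emit-text pattern.

```python
import ast


class ToyJsEmitter(ast.NodeVisitor):
    """Translate a tiny subset of Python into JavaScript source text."""

    OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_Assign(self, node):
        # Only the first target is handled; chained assignment is ignored.
        target = self.visit(node.targets[0])
        return f"let {target} = {self.visit(node.value)};"

    def visit_BinOp(self, node):
        op = self.OPERATORS[type(node.op)]
        # Always parenthesize so operator precedence is never in question.
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise NotImplementedError("only numeric literals are supported")
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected loudly.
        raise NotImplementedError(f"unsupported syntax: {type(node).__name__}")


javascript = ToyJsEmitter().visit(ast.parse("total = price * quantity + 2"))
print(javascript)
```

A real transpiler also has to map Python semantics (integers, scoping, the standard library) onto JavaScript, which is where most of the work is.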
32:15 Yeah, I think that transpiling languages is fairly common.
32:20 So, like you mentioned, there's definitely a lot of transpiling in the JavaScript world just going from newer, you know, ES7, ES6 syntax to original ES5.
32:31 That's a pretty interesting thing to just say, we're going to solve the version problem by recompiling it down to a different version of source code for you.
32:40 Yeah.
32:40 I'm not sure it's a good way, but it's an interesting way.
32:44 As somebody who has to work with a lot of JavaScript, I definitely prefer the newer syntax.
32:49 And I definitely appreciate being able to use, you know, arrow functions and all that.
32:53 Yeah, the new JavaScript is a lot better, although it feels much more engineered.
32:58 Like you almost need a CS degree to properly work with modern JavaScript, where it used to be just this few jQuery selectors and go with it.
33:06 Real simple type thing, right?
33:08 Yeah, but I definitely think that that's also sort of reflected in the amount of weight that we're making JavaScript carry in, you know, modern web applications.
33:18 Yeah, that's true.
33:19 They are doing a lot, aren't they?
33:21 Yeah.
33:21 So another thing that you talked about was reformatting code.
33:24 How does that work?
33:26 So I think there are these ideas using things like autopep8, where you can kind of translate your code and impose certain restrictions.
33:35 And so one of my favorite ones is YAPF, I think it's just Yet Another Python Formatter.
33:43 Yep, of course.
33:46 Yeah.
33:47 And so that one's really neat.
33:48 It takes a very intelligent approach to actually looking at your code and seeing at a more underlying level what your AST is actually doing and being able to make certain choices based on this advanced knowledge that it has of your code and how it can actually transform it and almost refactor it in a way.
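A very small version of tree-based reformatting can be tried with the standard library. This sketch assumes Python 3.9 or newer for `ast.unparse`, and it is not how any particular formatter works: the plain AST drops comments and original layout, so real tools keep more information than this.

```python
import ast  # ast.unparse requires Python 3.9 or newer

messy = "def  total( a,b ,c=1 ):\n  return a+b  *c\n"

# Parse to a tree, then print the tree back out in a normalized layout.
tidy = ast.unparse(ast.parse(messy))
print(tidy)

# Running the same step again changes nothing, so the layout is stable.
print(tidy == ast.unparse(ast.parse(tidy)))
```

Because the output is produced from the tree rather than from the original text, two differently spaced versions of the same code come out identical.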
34:09 It's pretty cool.
34:10 They have a little online demo that you can play with.
34:12 And I think this comes from Google, right?
34:15 Yes.
34:15 And it lets you pick different formatting guidelines.
34:19 You can format via PEP 8 or Google or Facebook, like whatever their rules for how Python code should look.
34:25 You can just make it look like that.
34:26 That's pretty cool.
34:27 I would guess you could extend it.
34:29 Oh, yeah.
34:29 I think you definitely can.
34:31 I'm a huge fan of code formatting.
34:33 It seems like a pretty nice sort of right before you check in type of feature.
34:38 Just make sure you run that against your code always because there's always those weird like, why did this file change?
34:46 Oh, it's just their formatting indents,
34:48 my formatting unindents, like a war in the version control, right?
34:53 And so having like something sort of stabilizing that, that's kind of cool.
34:56 Yeah.
34:57 And I think it helps normalize the look of code.
35:00 So code is there for people to read, right?
35:03 Like humans have to understand what your code is doing.
35:05 And if you always know that certain line breaks are in certain places or stuff like that, I think it makes it a lot easier for people to actually interact with it.
35:15 So I actually have, for most of my projects, I'll set up Git hooks.
35:19 So it's like a pre-commit hook that'll run the formatting and then also run any tests or anything like that to make sure that it didn't break anything.
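A hook along these lines might look like the sketch below. The tool choices are assumptions, not something stated in the conversation: it supposes `yapf` as the formatter and `pytest` as the test runner, and the helper `staged_python_files` is a hypothetical name.

```python
#!/usr/bin/env python3
"""Sketch of a .git/hooks/pre-commit script (tool choices are assumptions)."""
import subprocess
import sys


def staged_python_files(name_only_output: str) -> list:
    """Filter `git diff --cached --name-only` output down to .py files."""
    return [line for line in name_only_output.splitlines() if line.endswith(".py")]


def main() -> int:
    # Ask Git which added, copied, or modified files are staged for commit.
    listing = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    files = staged_python_files(listing)
    if files:
        # Assumes yapf is installed; any formatter with an in-place mode would do.
        subprocess.run(["yapf", "--in-place", *files], check=True)
        # Re-stage the files so the formatted versions are what gets committed.
        subprocess.run(["git", "add", *files], check=True)
    # Assumes the tests run under pytest; a non-zero exit aborts the commit.
    return subprocess.run([sys.executable, "-m", "pytest", "-q"]).returncode
```

To use it as a hook, the file would be made executable and end with `sys.exit(main())`. Note that re-staging whole files also stages any changes in them that were deliberately left unstaged.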
35:27 Yeah, that's a really cool idea.
35:28 I like it.
35:29 Yeah.
35:29 Most of the stuff we've talked about so far really has to do with CPython, you know, the disassembly and the interpreter and stuff.
35:35 But there's alternate interpreters as well, right?
35:38 And that, of course, involves abstract syntax trees as well.
35:41 We talked a little bit about Python's speed.
35:44 And so the best way to speed up CPython is actually to just completely switch out your interpreter.
35:51 And so that's why there are a lot of interpreters like PyPy, Jython, Cython, the list goes on.
36:00 And so you can actually use these other interpreters to run almost exactly the same syntax.
36:08 So that is one of the downsides is that in order to make some of these optimizations, you have to have certain sacrifices in the way that your language is represented.
36:18 So your AST is going to look a bit different in these different interpreters because a lot of them will try to make different optimizations and translations of that AST.
36:29 Right.
36:29 And so, like, for example, Cython would be totally different probably.
36:34 But maybe PyPy is actually really similar.
36:37 I think PyPy starts out the same as an interpreted CPython, but eventually if it finds a hotspot, it will replace it with a JIT compiled version.
36:45 So maybe it starts out the same.
36:47 I don't actually know.
36:48 Yeah, I'm definitely not an expert on all the different interpreters.
36:51 It's interesting, though.
36:53 There's all these different tradeoffs and there's all these different interpreters trying to explore the advantages or disadvantages of something, right?
37:00 Yeah.
37:01 Another one you talked about was the BeeWare project.
37:04 Is it PyBee?
37:04 Is that also from the BeeWare project?
37:06 I know PyBee is, like, their GitHub account, I believe.
37:09 Yeah, yeah.
37:10 There's a ton of projects under there, yeah.
37:12 Yeah.
37:13 PyBee is doing some really, really awesome work in trying to use Python for, like, native mobile development.
37:20 And so whether that is in the form of transpiling code in order to interact with native mobile components or actually they have a project called Batavia.
37:33 I probably am butchering that.
37:35 Yeah, yeah.
37:36 That's right.
37:36 That is their transpiler one.
37:38 That's right.
37:38 Yeah.
37:38 So they're actually, they'll transpile Python into JavaScript as well.
37:43 How interesting.
37:44 Yeah.
37:44 They definitely have some interesting stuff going on over there.
37:47 I would love for that to become a thing.
37:49 You know, like, proper native mobile apps in Python.
37:53 Because right now, you know, I don't think there's a lot of great options.
37:57 I know you can do some stuff with Pythonista, but it's kind of stuck within, like, that app, right?
38:03 You can't just ship your own app to the app store.
38:05 And I've been playing with Ionic framework and Electron.js and Cordova and all those things.
38:13 And I would rather just not, you know.
38:14 But right now, there's not a super awesome option.
38:17 So it'd be cool if they were successful.
38:21 This portion of Talk Python to Me has been brought to you by Rollbar.
38:24 One of the frustrating things about being a developer is dealing with errors.
38:28 Relying on users to report errors, digging through log files, trying to debug issues,
38:33 or getting millions of alerts just flooding your inbox and ruining your day.
38:37 With Rollbar's full stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster.
38:43 Adding Rollbar to your Python app is as easy as pip install rollbar.
38:48 You can start tracking production errors and deployments in eight minutes or less.
38:52 Are you considering self-hosting tools for security or compliance reasons?
38:56 Then you should really check out Rollbar's compliant SaaS option.
38:59 Get advanced security features and meet compliance without the hassle of self-hosting,
39:04 including HIPAA, ISO 27001, Privacy Shield, and more.
39:09 They'd love to give you a demo.
39:10 Give Rollbar a try today.
39:12 Go to talkpython.fm/Rollbar and check them out.
39:17 I think that being able to actually transpile into native code.
39:22 So a lot of other platforms like Ionic, etc. will use WebViews.
39:27 So you're still running a JavaScript app in your native application.
39:31 Right.
39:32 And you have all these performance issues.
39:34 Like large lists are super weird unless you choose like virtual scrolling add-ons and all kinds of stuff, right?
39:40 Yeah.
39:40 And like animations are a lot harder.
39:42 You just, you have that separation between your app and the bare metal.
39:47 So you can't do a lot of like GPU optimizations.
39:50 Yeah.
39:50 So fingers crossed for a little bit more from the BeeWare folks.
39:54 That'd be awesome.
39:55 Another one that a lot of people likely interact with, especially from the Flask world, is Jinja2.
40:01 Yeah.
40:01 So that's one of the things like the more you start looking around at the tools that you're using, the more you realize that a lot of them are using ASTs under the hood to accomplish some really cool things.
40:12 So, but it makes sense because, I mean, HTML in itself can be thought of in a tree structure.
40:17 And so it's really easy to parse HTML into a tree.
40:21 And then if you have variables in your Jinja template that need to be filled in, it's really easy to kind of pop those values in and then transpile everything back into an AST or into the HTML from the AST.
40:34 The template languages are pretty impressive the way they work for all the different web frameworks.
40:39 It's kind of cool to see them go.
40:41 Yeah.
40:41 And if you think about it, you can have little bits and pieces of Python code in your Jinja templates.
40:48 And so you can kind of see how all this comes together where you've got an AST that you can then edit and kind of piece additional bits of code into the AST and then transpile it all back.
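The parse, fill in, and render cycle can be sketched with a toy template engine. This is not Jinja2's implementation or API; the `Text` and `Variable` node classes and the `parse` and `render` functions are hypothetical, and the "tree" here is just a flat list because the toy has no loops or conditionals.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Variable:
    name: str


# Matches placeholders such as {{ name }}.
TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def parse(template: str) -> list:
    """Split a template into a list of Text and Variable nodes."""
    nodes, position = [], 0
    for match in TOKEN.finditer(template):
        if match.start() > position:
            nodes.append(Text(template[position:match.start()]))
        nodes.append(Variable(match.group(1)))
        position = match.end()
    if position < len(template):
        nodes.append(Text(template[position:]))
    return nodes


def render(nodes: list, context: dict) -> str:
    """Turn the nodes back into text, filling in each variable."""
    return "".join(
        node.value if isinstance(node, Text) else str(context[node.name])
        for node in nodes
    )


nodes = parse("<h1>Hello, {{ name }}!</h1>")
print(nodes)
print(render(nodes, {"name": "Emily"}))
```

A full engine would nest nodes for blocks like loops and conditionals, which is where the structure becomes a real tree.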
41:00 Yeah.
41:00 Yeah.
41:01 It's definitely a cool example.
41:02 So all those are kind of within the realm of Python.
41:05 How about outside Python?
41:06 There's a lot of different code analysis that you can do.
41:10 Any sort of linting is usually going to use an AST under the hood.
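A single lint rule built on the AST can be quite short. The sketch below uses the standard library `ast.NodeVisitor`; the class name `BareExceptChecker` and the choice of rule are just for illustration.

```python
import ast


class BareExceptChecker(ast.NodeVisitor):
    """Report `except:` clauses that do not name an exception type."""

    def __init__(self):
        self.problems = []

    def visit_ExceptHandler(self, node):
        # A handler with no exception type is a bare `except:`.
        if node.type is None:
            self.problems.append(f"line {node.lineno}: bare 'except:'")
        # Keep walking so nested try blocks are checked too.
        self.generic_visit(node)


source = """\
try:
    risky()
except:
    pass
"""

checker = BareExceptChecker()
checker.visit(ast.parse(source))
print(checker.problems)
```

The code being checked is only parsed, never run, so `risky` does not need to exist.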
41:13 And then we already talked about Babel in the JavaScript world.
41:17 There's also some really cool stuff that they're doing with CSS.
41:21 So especially with the advent of all these newer JavaScript frameworks, there's a huge debate over how you actually handle your CSS and your styling now.
41:32 So you can use an AST to do CSS transformations and really optimize the use of mixins and different media queries.
41:43 And you can kind of automate a bit more than what you used to be able to.
41:47 There's definitely a ton of that stuff over in the JavaScript web world.
41:50 It's quite interesting.
41:51 And all the minification and all types of stuff that happens before your code actually goes out to the web.
41:59 Yeah.
41:59 It's pretty cool.
42:00 So what else would you like to sort of let people know about while you've got some time to speak with them?
42:06 One of the things that I have definitely been sort of reflecting on a lot recently is just going through my first time speaking at a conference.
42:14 And so I think I mentioned I wrote a blog post on it.
42:17 But I think that I want to kind of shed some light on what that process looks like.
42:22 And for a lot of people, it's like, it's really intimidating to actually go out there and especially to start off with a technical talk.
42:30 I think one of the really scary parts is I think a lot of people are afraid that when you get up in front of them,
42:35 the audience is just going to find that little one thing you don't know and just tear into you because you didn't know.
42:42 Well, actually, there's this one aspect of ASTs that does one thing.
42:45 And I think one of the things I've learned about the Python community is that people are generally really supportive.
42:51 And those types of things are not likely to happen unless you just have like really an unprepared presentation, I think.
42:59 I've gotten some really great feedback.
43:00 And I did have these little like, oh, well, you missed this little thing.
43:04 But I really liked this, this, this, and this, you know.
43:07 But I think I really urge people if you have a desire to speak at a conference to just do it and submit that proposal.
43:14 And you might get accepted and you might not.
43:16 But you also might get some really great feedback on your proposal.
43:19 I think it can absolutely change your position within the community, right?
43:25 Like once you've done one of these talks, like your talk is now on YouTube.
43:28 If for some reason you ever wanted to apply for a job, it sounds like you're being super successful not having a boss.
43:34 But if for some reason you wanted to, having that up there would be a really great resume item, right?
43:39 You could say, do you want to know if I know about it?
43:42 Look, here's me speaking for half an hour in front of hundreds of people doing it, right?
43:46 Yeah.
43:47 And it's also a really great way to like get accepted for other conferences too.
43:51 So like as soon as you have that first conference talk, you can kind of point to that and be like, yeah, I can actually hold my own in front of a crowd and stand up on stage and not completely forget how to speak.
44:03 I'm sure that's a concern for conference organizers and, you know, seeing someone who's obviously good at it is really, really opens doors, I would think.
44:12 And it doesn't have to just be like the main PyCon, right?
44:16 For example, PyCascades is a regional one.
44:19 They're all over the world.
44:21 But there's also meetups and user groups and smaller things, maybe even like a brown bag lunch talk at your company for the first type thing, right?
44:30 Like here, I'm going to tell you guys all about like web scraping.
44:32 Nobody knows about it here.
44:33 So let's talk about it.
44:34 Yeah.
44:34 And I always kind of remind myself to go back to that.
44:37 It's like a very, very well-known Venn diagram of like what you think everybody else knows and what you think you know and how there's really a lot less overlap in those spaces than you'd think and how much value you can really add.
44:52 Even though you just kind of assume, oh, everybody else probably already knows this.
44:56 Yeah, that's right.
44:57 It's easy to assume that, but I find it's actually rarely the case.
45:01 I used to do a lot of like in-person professional training and I would get on the phone to like set up some kind of event with the company I was going to.
45:10 And the managers would always say, our developers are super advanced.
45:14 They're really advanced.
45:15 Like just no beginner stuff.
45:16 We only want advanced stuff for them because they're the best.
45:19 All right, great.
45:20 You know, then we show up like, all right, we're just doing advanced stuff.
45:22 But hold on.
45:23 Can we talk about this not advanced stuff for like a day or two?
45:26 Because nobody knows it.
45:28 Like, what do you mean nobody knows it?
45:29 Like your manager just told me that, all right, great.
45:32 That's what I actually believe was the case.
45:33 But, you know, you don't want to tell that to people, right?
45:36 But I think there's a lot of value in sort of some of the foundational stuff for sure.
45:39 Yeah, absolutely.
45:40 Yeah.
45:40 So I think there's a bunch of opportunities for people to speak.
45:42 And it's also, you can kind of level up, right?
45:44 Like you do a talk at a user group and then that gives you the confidence to do a talk at like a regional conference.
45:49 And then you can do, you know, whatever else you want to do.
45:52 I was very pleasantly surprised at how rewarding it was and how validating it was that like me going from somebody who has like a very technical theoretical background and being kind of mildly obsessed with ASTs and learning about CPython's compiler and interpreter.
46:10 And then being like, oh, yeah, I'm going to give a talk on this and having people be interested in it and want to talk about, you know, their perspectives and interests.
46:19 So one of the things that I gained from doing this talk is seeing all the dozens of different interpretations and interests that people had that this talk sparked for them, which was really interesting.
46:31 Yeah.
46:32 Yeah.
46:32 That's really rewarding.
46:33 I guess last thought on this one is how much time did it take you to prepare this whole half hour presentation with all the research and everything?
46:41 Like what kind of commitment was it?
46:43 Yeah.
46:43 I put 30 plus hours easy.
46:47 So that's like more than 10 times.
46:50 It was 30 minute talk.
46:51 So 60 times.
46:53 Yeah.
46:53 That's it still seems about right because it's a lot of research and your talk was really well put together.
46:59 So, yeah.
47:00 And that's just so that's just the talk preparation itself.
47:04 So I've been kind of iterating on my own like side research into CPython for.
47:10 Oh, geez.
47:11 Probably almost a year now.
47:13 Maybe over a year.
47:16 So there's definitely a lot of a lot of research that's gone into it.
47:20 A lot of, you know, shower thoughts and all that stuff.
47:22 Yeah.
47:23 Yeah.
47:23 Sure.
47:23 Well, like I said, it was really good talk.
47:25 We'll link to it from the show notes.
47:28 People can go watch it.
47:29 I think they just have like two whole days of video.
47:32 So I have a timestamp in the link.
47:34 So hopefully it starts playing.
47:35 But if for some reason it doesn't, it's like two hours and 45 minutes into day two or something like that.
47:42 But yeah, it was a really, really good talk.
47:44 And I'm glad you gave it there.
47:46 Thank you.
47:46 I'm glad to have given it.
47:47 Yeah.
47:48 So let me hit you with the final two questions before I let you out of here.
47:53 First, if you're going to write some Python code, what editor do you use?
47:56 I always use Sublime on a daily basis.
48:01 I'm also really comfortable in VI and Vim just because I did a lot of sort of server management.
48:07 So I can get around there.
48:09 But I am a huge fan of Sublime's Anaconda package.
48:14 Oh, that is really nice.
48:15 Yeah.
48:15 Yeah.
48:15 That's not the regular.
48:16 That's not Anaconda as in the NumPy scientific world.
48:20 It's a totally different thing.
48:21 Maybe tell people about it.
48:22 It kind of brings a little bit of like the IDE experience to Sublime and allows you to
48:29 highlight, you know, where a function is defined and that kind of stuff.
48:33 Yeah.
48:33 It's really a really nice add on for it.
48:35 That's cool.
48:35 And notable PyPI package.
48:38 We talked about a bunch.
48:39 Let's grab one of those or another one.
48:41 Yeah.
48:41 I think the one that I always have to talk about when people ask me this is the Requests library.
48:48 So I think that was the first Python code that I saw that I was like, wow, this like it felt Pythonic to me.
48:56 Like I was like, this is how interacting with a package should feel.
49:00 And I think that Kenneth Reitz always does a really excellent job of thinking about your package's API.
49:08 All of your code and your classes and all that have an API that humans have to interact with.
49:12 I love that that's a theme for all of his work.
49:15 Yeah.
49:15 The tagline, all of his stuff is for humans.
49:17 Requests is for humans.
49:19 And then records is like SQL for humans and all sorts of stuff.
49:22 Yeah.
49:23 Yeah.
49:23 Definitely good stuff.
49:24 He's doing really, really good work.
49:25 All right.
49:26 So final call to action.
49:28 People are excited about the stuff.
49:29 They want to learn more.
49:30 I mean, obviously they should check out your talk, but what else can they do to get started?
49:34 That is a great question.
49:36 I would encourage people to find the way that they want to apply ASTs.
49:43 So whether that's in linting or code highlighting or actually like getting in there and messing around with ASTs and find something that is like a really small thing that you can tweak and have fun with and see how you can actually improve your current workflow.
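For anyone who wants a first small thing to tweak, a starting point might be to parse one line and look at what comes back. This sketch uses only the standard library; the sample statement is made up, and the `indent` argument to `ast.dump` assumes Python 3.9 or newer.

```python
import ast
from collections import Counter

source = "total = price * quantity + tax\n"
tree = ast.parse(source)

# Print the nested structure (indent= needs Python 3.9 or newer).
print(ast.dump(tree, indent=2))

# Count how many nodes of each type the statement produced.
counts = Counter(type(node).__name__ for node in ast.walk(tree))
print(counts.most_common())
```

Changing the source string and rerunning is a quick way to learn which syntax maps to which node types.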
49:59 Yeah.
49:59 That sounds awesome.
50:00 All right.
50:00 Well, Emily, thank you so much for being on the show.
50:03 It was great to chat with you and thanks for sharing your AST project with us.
50:07 Thank you.
50:07 Yep.
50:07 Bye.
50:09 This has been another episode of Talk Python to Me.
50:12 Today's guest was Emily Morehouse.
50:15 And this episode has been brought to you by ParkMyCloud and Rollbar.
50:18 Do you hear that sucking noise?
50:20 That's your cloud provider making you pay for your idle instances.
50:24 Turn on ParkMyCloud, plug the leaks and save money.
50:27 Visit talkpython.fm/park to get started.
50:30 Rollbar takes the pain out of errors.
50:33 They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course.
50:41 As Talk Python to Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome.
50:48 Are you or a colleague trying to learn Python?
50:51 Have you tried books and videos that just left you bored by covering topics point by point?
50:56 Well, check out my online course, Python Jumpstart, by building 10 apps at talkpython.fm/course to experience a more engaging way to learn Python.
51:04 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.
51:13 Be sure to subscribe to the show.
51:14 Open your favorite podcatcher and search for Python.
51:16 We should be right at the top.
51:18 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.
51:27 This is your host, Michael Kennedy.
51:29 Thanks so much for listening.
51:30 I really appreciate it.
51:31 Now get out there and write some Python code.