
#49: Microsoft's JIT-based Python Project: Pyjion Transcript

Recorded on Wednesday, Feb 3, 2016.

00:00 This episode you'll learn about a project that has the potential to unlock massive innovation around how CPython understands and executes code. And it's coming from what many of you may consider an unlikely source: Microsoft and the recently open-sourced, cross-platform .NET Core runtime.

00:00 You'll meet Brett Cannon who works on Microsoft's Azure Data group. Along with Dino Viehland, he is working on a new initiative called Pyjion (pronounced Pigeon) P-y-j-i-on, a JIT framework that can become part of CPython itself paving the way for many new just-in-time compilation initiatives in the future.

00:00 This is episode number 49 of Talk Python To Me, recorded February 4th 2016.

00:00 Welcome to Talk Python to Me. A weekly podcast on Python the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and Snap CI. Thank them for supporting the show on twitter via @Hired_HQ and @snap_ci.

00:00 Hi everyone. I think you are going to love this episode. Brett is doing some amazing work and we talk about that in depth. But he's also a Python core developer and we spend a decent amount of time on Python 3 and moving from Python 2 and the whole story there.

00:00 I have just one piece of news for you before we get to the interview. It's just t-10 days until my kickstarter for Python Jumpstart by Building 10 Apps closes.

00:00 The initial feedback from the early access students has been universally positive. If you have backed the kickstarter with early access be sure to create an account at and send me a message via kickstarter so you can check out the first 6 chapters (3 hours) of the course.

00:00 If you're not sure what I'm talking about, check out my online course at

00:00 Now, let's hear about JIT innovation in CPython and more with Brett Cannon

02:22 Brett, welcome to the show.

02:24 Thanks for having me Michael.

02:24 I'm super excited to talk to you about this new project that you guys have going on with Python and Microsoft and yeah, we are going to dig into it, it'll be fun.

02:32 Yeah, I'm looking forward to it.

02:33 Absolutely. Before we get into that topic though, what's your story how did you get going in Python and programming and all that?

02:39 They are slightly long stories. Getting into programming, probably my earliest experience with anything you could potentially call programming was with Turtle back in third grade. I was lucky enough to be in a school that had a computer lab full of Apple IIes, and they would bring us in and say, "Oh, look, you can do this little forward command and make this little turtle graphic draw a line," and all this stuff.

03:00 Was it on the monitor that was just like monochrome green?

03:03 Yep, and that is partly why sometimes my terminal is in that all green and black style, because it's just what I started with back in the day.

03:12 That's awesome.

03:12 So I did that, but I didn't realize what the heck programming was. I was thinking the computer is kind of this fascinating black box that you somehow stick these 5 1/4 inch floppies into, and somehow Where in the World Is Carmen Sandiego plays, and it's like, amazing. And then, in junior high I ended up taking a summer class on computers and it involved a little bit of BASIC, and I really took to it. I actually lucked out and got so far ahead of the class that the teacher just said, "Yeah, you can stop coming to class if you want for the rest of the summer." So, that was like, half way through. So, I got bit kind of early, but I didn't really have any guidance or anything back then. I mean, this is pre access to the internet, so I didn't really have any way to know how to carry on. And then, when I went to junior college, my mom made me promise her that I would take a class in philosophy and a class in computer science, and I did both, and I loved them both. But in terms of computer science, I read through my C book within two weeks, and then one night I spent 6 hours in front of my computer writing tic tac toe from scratch using really basic terminal output. And I was basically hooked for life. In terms of-

04:20 That's really cool, I think we all have that moment where you sit down at a computer and you haven't- maybe you really enjoyed working with them or whatever, but then you kind of get into programming and you realize, "Wow, 8 hours have passed, and it feels like I just sat down." And like, then you are in the world, that's it.

04:39 Brought me my dinner at my desk, and just said, "Ok, I get it, you are just into this, just go with it, here is your food, eat it at some point tonight."

04:46 Awesome. And in terms of Python, I actually ended up going to Berkeley and getting a degree in philosophy, because there were some issues trying to double major like I originally planned to do, but I did still try to take all the CS courses there, and there was a test to basically get into the intro CS course at Berkeley at the time. I thought they might have something about object oriented programming, and having learned C, I knew procedural but I didn't know object oriented programming. So, in fall of 2000, before I took the class in spring, I decided to try to find an object oriented programming language to learn OO from. I was reading and all that stuff, and Perl and Python caught my eye, but I kept reading that Perl should be like the fifth or sixth language you learn, while people kept saying, "Oh, Python is great for teaching," and I was like, "All right, then I will learn Python." And I did, and I loved it, and I just continued to use it for anything I could, and all my personal projects, and just kept going and going with it and haven't looked back since.

05:41 Yeah, that's really cool. What language was your CS 101 course actually?

05:46 Scheme actually.

05:47 Interesting, my CS 101 class was Scheme as well, and I thought that was a very interesting choice for the introduction.

05:53 Yeah, it was really interesting. I mean, it does kind of minimize the syntax, but obviously, now being a Python user, I really understand what it means to minimize the syntax in a nice way instead of a slightly painful way with all those parentheses. And it was interesting, I mean, it is a nice way to get into procedural programming and object oriented and functional, so it's a really nice multi paradigm, teach you the basics kind of introduction. Interestingly enough, for the last project they actually had us write a really basic Logo interpreter, which, funnily enough, was such a bad experience for me, partly because of the way it worked out in terms of having to work with another team, and I had some issues with my teammates, that I actually kind of got turned off on language design, of all things, for a little while. Then over time I realized I loved programming languages and learning how they worked, so I reevaluated my view and realized, "Ok, it was just a bad taste from a bad experience," and that I actually do have this weird little fascination with programming languages, and luckily got over that little issue of mine.

06:57 Yeah, no kidding, and now you are a Python core developer among other things, right?

07:02 Yeah. [laugh]

07:02 So back to the language design, at least on the internals.

07:06 Yeah, yeah.

07:07 Awesome. So, we are going to talk about Pyjion, this cool new JIT extension for CPython, and you are going to have to tell me a little more about how to most correctly characterize it. But before we do, I thought maybe you could give us a high level view of two things: how CPython works, you know, what is going on when we run our code as is with the interpreter, and then maybe a survey of the different implementations or runtimes, because a lot of people think there is just one Python from an implementation or runtime perspective, and there is actually quite a variety already, right?

07:44 Yeah. Actually we are kind of lucky in the Python community for having a lot of really top quality implementations. But, to target your first question of how CPython works: for those who don't know, CPython is the version of Python you get from and the reason it's called CPython is because it's implemented in C and has a C API, which makes it easy to embed in stuff like Blender. Anyway, basically the way Python works is more or less like a traditional interpreted programming language. You write your source code; Python, acting as a VM, reads the source code and parses it into individual tokens like if and def and the plus sign and whatever, and then that gets turned into what's called a concrete syntax tree, which kind of nests just the way the grammar is written. This is how you get your priorities in terms of precedence, like multiplication happens before plus, which happens before whatever; that all works on the concrete syntax tree in terms of how it nests itself. Then that gets passed into a compiler within Python that turns it into what's called an abstract syntax tree, which is much more high level: this is addition instead of plus, this is loading a value, this is an actual number, and this is a function call. That gets passed further down into the bytecode compiler, which will take that AST and spit out Python bytecode, and that is actually what's stored in your .pyc files. Technically they are marshalled code objects. Then when Python wants to execute that, it just loads up those bytecodes and has a really big for loop that reads through those individual bytecodes and goes, "Ok, what do you want me to do? You want me to load a const." Const zero happens to be None in every code object, so I am going to put None onto what is called the execution stack, because Python is stack based instead of register based. CPUs are register based; stack based VMs such as Python (Java is another one) are fairly common because they are easier to implement. Anyway, you can do stuff like load const None, or load a number, then load another number on the stack so the stack now has two numbers, and that loop is the ceval loop, or evaluation loop.
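A minimal sketch of the stages described above, using only standard library modules. The sample source string and variable names are illustrative assumptions, and the concrete syntax tree stage is skipped because current standard library modules do not expose it directly:

```python
import ast
import io
import tokenize

source = "x = 1 + 2 * 3\n"

# Stage 1: split the source text into individual tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop the NEWLINE and ENDMARKER tokens
]

# Stage 2: parse into an abstract syntax tree (AST).
tree = ast.parse(source)
assign = tree.body[0]
top_level_op = type(assign.value.op).__name__     # outermost operation
nested_op = type(assign.value.right.op).__name__  # binds tighter, so it nests deeper

# Stage 3: compile the AST into a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Stage 4: hand the code object to the evaluation loop.
namespace = {}
exec(code, namespace)

print(tokens)
print(top_level_op, nested_op)
print(namespace["x"])
```

Precedence shows up in the nesting: the multiplication sits inside the right operand of the addition, so it is evaluated first.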

10:01 Yeah, it's worth pointing out to the listeners, I think, who maybe haven't gone and looked at the source code there: when you say it's a big loop, it's like 3000 lines of C code, right? It's a big for loop.

10:13 Yeah, it literally is a massive for loop. If you actually go to the Python source code and look in the Python directory, there is a file in there called ceval.c. You can open that up, and you will literally find nested in that file somewhere just the for loop, with a huge switch statement, that does nothing more than execute these little bytecodes. If it hits add, what it will do is pop two values off of what is basically a chunk of memory where we know what pointers are on the stack, and go, "We are going to take this Python object and that Python object and execute dunder add (__add__) in the right way, or dunder radd (__radd__)," make that all happen, get back a Python object and stick that back on the stack, and then just go back to the top of the for loop and keep going and going until you are done and your program exits.
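To make the shape of that loop concrete, here is a toy stack machine in Python. It is only an illustration of the idea, not CPython's actual ceval.c; the opcode names are borrowed for flavor and the `run` function and tuple-based instruction format are assumptions made for this sketch:

```python
def run(bytecode, constants):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for opcode, arg in bytecode:        # the "big for loop"
        if opcode == "LOAD_CONST":      # push a constant onto the stack
            stack.append(constants[arg])
        elif opcode == "BINARY_ADD":    # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            # The + operator dispatches to __add__ or __radd__ as needed.
            stack.append(left + right)
        elif opcode == "RETURN_VALUE":  # hand the top of the stack back
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("bytecode ended without RETURN_VALUE")


# Constant slot 0 holds None here, mirroring the description above.
program = [
    ("LOAD_CONST", 1),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program, constants=(None, 40, 2))
print(result)
```

The real loop is a C switch statement over numeric opcodes, but the push, pop, and dispatch pattern is the same.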

11:01 Yeah, and you can actually see that bytecode by loading up some Python module, or function, or class, or whatever, and importing the disassembly module, and you can actually have it spit out the bytecode for, say, a function, right?

11:16 Yep, and I do this all the time on Pyjion actually. Basically, you can import the dis module, and in there, there is a dis function, so if you call dis.dis and pass in basically any callable, so a function, method, whatever, it will print out to standard out in your REPL all the bytecode, and it will give you information like what line this correlates to, what the bytecode is, what the argument to that bytecode is, the actual byte offset, and a whole bunch of other interesting things. The dis module documentation actually lists most of the bytecode. I have found a couple of opcodes that weren't actually documented, and there is a bug for that, but the majority of the bytecode is documented in there, so if you are really interested you can have a look to see how we break down the operations for Python for performance reasons and such.
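For readers who want to try this, a short example with the standard library `dis` module. The `add` function is just a stand-in; the exact opcodes printed vary between Python versions, so no particular listing is shown here:

```python
import dis


def add(a, b):
    return a + b


# Print a human-readable disassembly to standard output.
dis.dis(add)

# The same information is available programmatically.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]
offsets = [ins.offset for ins in instructions]
print(opnames)
```

Each `Instruction` carries the opcode name, its argument, and its byte offset, which are the fields mentioned above.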

12:04 Yeah. That's really interesting. And, for the listeners who want to dig deeper into this, on show 22, I talked with Philip Guo about his sort of CPython internals graduate course, he did in the University of New York, have you seen his work?

12:21 No, I haven't yet.

12:21 He basically recorded ten hours of a graduate computer science course studying the internals of CPython, and spent a lot of time in ceval.c, and it's on YouTube, you can go check it out, so it's really cool. So, that's interesting.

12:35 Oh, I should probably actually answer your second question, about all the other interpreters.

12:37 Yeah, let's talk about the interpreters.

12:40 As I said earlier, CPython is the one you get from and kind of the one most people are aware of, but there are actually a bunch of other ones. One of the more commonly known alternative interpreters, or VMs, or implementations of Python is Jython, which is Python implemented in Java. A lot of people love that whenever they have to write a Java app and want some easy scripting to plug in, or have some requirement that they have to run on the JVM. Apparently it's really popular in the defense industry, for some reason.

13:11 That's interesting. Once you get a VM approved, you just don't mess with it I'd say.

13:14 Yeah, and one really cool perk of this is that at PyCon, every so often, there is a really cool talk about, like, flying fighter jets with Python using Jython and stuff like that. So it does at least lead to some really cool talks.

13:28 Nice. And here is the after burner function, you just call this.

13:32 Exactly. There is IronPython, which is Python implemented in C#, so that is usable from .NET; it's often used for embedding in .NET applications that need scripting, or by anyone who needs to run on top of the CLR. Those are the two big ones. In terms of direct alternatives, there is obviously PyPy, which I think a lot of people know about, and which is two things. There is PyPy the implementation of Python written in Python, although technically it's written in a subset of Python called RPython, which is specifically restricted such that they can infer a lot of information about it, so that it can be compiled straight down to basically assembly. And then there is PyPy the tool chain, which they developed for PyPy the Python implementation, which is basically a tool chain to create custom JITs for programming languages. So you can take the PyPy tool chain and not just implement Python in Python; they've done it for PHP, for instance, so you can actually write alternative implementations of languages in RPython and have it spit out a custom JIT designed for your language. Those are the key ones that have actually finished, in terms of compatibility with some specific version of Python. All of them currently target 2.7; PyPy has support for Python 3.2, but obviously that's kind of old in terms of Python 3. And then there is the new up and comer, which is Pyston, which is being sponsored by Dropbox. They are also targeting 2.7, and they are trying to make a version of Python that is as compatible with CPython as possible, including the C extension API. What they are doing is using a JIT from LLVM, so they are trying to make 2.7 fast using the LLVM JIT while pulling in as much of the C code and APIs as they can from CPython, to try to be compatible with extension modules, which is a common problem that PyPy, IronPython and Jython have.

15:27 Right, that one actually seems to be really interesting and have a lot of potential, because if you think of companies that are sort of Python power houses, Dropbox is definitely among them.

15:40 Yeah, it definitely does not hurt that Guido went to work there, and they are just going to color their- and several other people. Benjamin Peterson works for them, so they already have a couple of high up people in the Python community working there, and their whole server stack in the back I believe is at least mostly Python, their desktop clients are Python, so they are definitely Python heavy there.

16:00 Yeah, absolutely. So, how does Pyjion relate to- the thing that came to mind for me when I saw it announced was- a friend of mine, Craig Berstein, sent me a message on Twitter and said, "Hey, you have to check this out," and I was like, "Oh, that is awesome." It was just a Twitter message: check out this JIT version of Python coming from Microsoft. Well, I didn't know anything about it, but I thought maybe it's like PyPy. So, what are you guys actually building over there, what is this?

16:29 Pyjion was actually started by Dino Viehland, my coworker, who I believe is, I don't know if he is necessarily the sole creator, but definitely one of the original creators of IronPython. Back at PyCon US 2015, which was in Montreal, during the language summit, Larry Hastings, the release manager for Python 3.4 and 3.5, got up in front of the core developers and said, "What can we do to get more people to switch to Python 3 faster?" Because, obviously we all think Python 3 is awesome and legacy Python 2 is fine, but everyone should get off that at some point.

17:01 Yeah. I agree. So what do you do, right?

17:04 Yeah. That could be a whole other question on that one, Michael. So he said, what can we do, what can we do, and he said performance is always a good thing; people always seem to want more performance no matter how well Python does, people are always hungry for more. And Dino went, yeah, that's a good idea. .NET just got open sourced, back in April 2015, and he said, I will see if I can write a JIT for CPython using coreclr, because Dino also happens to have been on the CLR team, so he knows the code like the back of his hand. So he started to hack on it at the conference, and actually managed to get somewhere, and he premiered it at PyData Seattle back in July, which we hosted at Microsoft. I got brought on to basically help him flesh out the goals. There are basically three goals. One is to develop a C API for CPython to make it pluggable for a JIT. One of the tough things is that what people have always done is directly tie into a fork of CPython, more or less, which really tightly couples it, but it also means that, for instance, if LLVM does not work for your workload for whatever reason, you are kind of just stuck and it's just not an option. We would rather make it so that there is just an API to plug in a JIT, and that way CPython doesn't have to ship with a JIT but it's totally usable by a JIT. And in that way, whether it's LLVM, or coreclr, which is the .NET JIT, or Chakra, or V8, or whatever JIT you want, if someone writes the code to plug CPython into that JIT, you can use whatever works best for you.

18:46 That's really cool. I think it's a super noble goal to say, let's stop everybody starting from scratch, rebuilding the CPython implementation and weaving in their version of a JIT, and say let's just find a way so that you don't have to write that ever again and you just plug in the pieces.

19:08 Yeah, exactly. And actually one of the other goals we have with this is not only developing the API; goal number two is to write a JIT for CPython using the coreclr, and using that to drive the API design that we want to push back up to CPython eventually. But the third goal is actually to design a kind of JIT framework for CPython, such that we write the framework that drives the code emission for the JIT, and then all the JIT people have to do is write to the interface of this framework and not have to worry about specific semantics necessarily. So for instance, as a JIT author you would go, "Ok, I need to know how to emit an integer onto a stack and I need to know how to do add or add int," but then the framework would actually handle going, "Ok, well, here is the Python bytecode that implements add, let's actually do an add call," or, "Hey, I know this thing is actually an integer, let's do an add int call and not just a generic Python add," and be able to handle that level of difference, so that there is a lot less busy work that is common to all the JITs, like type inference and such. We extract that out so that it's even easier to add a JIT to CPython.
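The split between framework and JIT backend described here can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the class `RecordingBackend`, the function `drive`, and the method names `emit_load`, `emit_add_int`, and `emit_add_generic` are invented for illustration and are not Pyjion's actual interface (which is written in C++):

```python
class RecordingBackend:
    """Hypothetical JIT backend: it only records what it is asked to emit."""

    def __init__(self):
        self.emitted = []

    def emit_load(self, value):
        self.emitted.append(("load", value))

    def emit_add_int(self):
        self.emitted.append(("add_int",))

    def emit_add_generic(self):
        self.emitted.append(("add_generic",))


def drive(bytecode, backend):
    """Hypothetical framework: walk the bytecode, track inferred types,
    and decide which emit call the backend should receive."""
    type_stack = []  # inferred type of each value on the evaluation stack
    for opcode, arg in bytecode:
        if opcode == "LOAD_CONST":
            backend.emit_load(arg)
            type_stack.append(type(arg))
        elif opcode == "BINARY_ADD":
            right = type_stack.pop()
            left = type_stack.pop()
            if left is int and right is int:
                backend.emit_add_int()      # specialized path
                type_stack.append(int)
            else:
                backend.emit_add_generic()  # fall back to generic add
                type_stack.append(object)   # result type is unknown
        else:
            raise ValueError(f"unsupported opcode: {opcode}")


int_backend = RecordingBackend()
drive([("LOAD_CONST", 1), ("LOAD_CONST", 2), ("BINARY_ADD", None)], int_backend)

str_backend = RecordingBackend()
drive([("LOAD_CONST", "a"), ("LOAD_CONST", "b"), ("BINARY_ADD", None)], str_backend)
print(int_backend.emitted[-1], str_backend.emitted[-1])
```

The point of the design is that the type tracking lives in `drive`, so a second backend would only need to supply its own `emit_*` methods.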

20:22 So is that like two levels? On one hand you have a straight C API at the CPython level, and then optionally you can choose to use a C++ framework that makes it so you do less work and you plug in your events or steps?

20:35 Yeah, exactly. It's getting the bare minimum into CPython so that CPython at least has this option without everyone having to do a fork, as well as pushing down to a separate project where the common stuff is extracted out, because it's built off the same baseline, and then the only thing that is really different is what's unique to the JITs. That way everyone's work is as simple as possible to try to make this work.

20:57 Ok, that makes a lot of sense.

20:57 [music]

20:57 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

20:57 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

20:57 Typically, candidates receive 5 or more offers in just the first week and there are no obligations ever.

20:57 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $1,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link and Hired will double the signing bonus to $2,000!

20:57 Opportunity is knocking, visit and answer the call.

20:57 [music]

21:57 Would you still be able to support things like method inlining and things like that with the C++ framework?

22:04 We don't know yet, but there is technically no reason why not. What's actually really interesting is we started all this work and we actually weren't ready to premiere any of this yet. We've been doing this out in the open on GitHub, but as you mentioned, Michael, people started to tweet it, and then it made it to Reddit and then to Hacker News, and suddenly everyone is asking questions and stuff. But in the middle of all this, there has been a lot of work in the past two months by various core developers putting a lot of time and effort into trying to speed up CPython itself. Part of this is actually trying to cache method objects, so that they can get cached in the code object. That way, every time you execute a call bytecode, you don't have to go to the object, pull out the method object and then call that; you just cache the method object: I already have it, I don't need to re-access that attribute on the object. That's already starting to make its way up into CPython, and there shouldn't technically be any reason why we can't just piggyback off of that and go, "Oh well, they've already cached this," or use some technique of basically, if the object hasn't changed, I really don't need to worry about previous versions of this being different, so I can just cache it and reuse it and save myself the hassle of getting the method back. Same thing with built-ins, right? If you ever want to call len, some people cache it locally for performance, but the work that is going on is actually going to make that a moot point, because it's going to start to notice when the built-ins and the globals for your code have not changed and just go, well, I have len cached locally because I have used it previously, so I might as well just pull that object immediately out of my cache instead of trying the local namespace and not finding it there, going to the global namespace and not finding it there, then going to the built-in namespace and having to pull out len again, every time through a loop for instance, and call that.

23:54 Yeah, that's really great, and I suspect you could just say here are the JIT compiled machine instructions, just cache that, or something like this, yeah?

24:01 Yeah, exactly. So, a lot of this work that is happening directly in CPython bubbles down in both directions into helping JITs in various ways, right? This whole detecting what state the namespace is in compared to the last time you looked at it, has it changed or not, that's probably going to end up in CPython itself as an implementation detail, but it also means all the JITs will be able to go, "Oh look, the built-in namespace hasn't changed, so that means if I cached len I don't need to worry about anything changing, I won't have to pay for a dictionary lookup, I can just pull it right out of my array of cached objects and just go with it."
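The "has the namespace changed since I last looked" idea can be sketched in pure Python. This is an illustration under stated assumptions, not CPython's implementation: `VersionedNamespace` and `CachedLookup` are hypothetical classes, and the version counter here is only bumped by item assignment and deletion (other mutators such as `update()` are not covered in this sketch):

```python
class VersionedNamespace(dict):
    """Hypothetical stand-in for a dict that carries a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1  # any write invalidates cached lookups

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """Cache a name lookup, guarded by the versions of the namespaces searched."""

    def __init__(self, name, *namespaces):
        self.name = name
        self.namespaces = namespaces  # searched in order, e.g. globals then built-ins
        self.cached_value = None
        self.cached_versions = None
        self.slow_lookups = 0

    def get(self):
        versions = tuple(ns.version for ns in self.namespaces)
        if versions != self.cached_versions:  # guard failed: redo the slow path
            self.slow_lookups += 1
            for ns in self.namespaces:
                if self.name in ns:
                    self.cached_value = ns[self.name]
                    break
            else:
                raise NameError(self.name)
            self.cached_versions = versions
        return self.cached_value


global_ns = VersionedNamespace()
builtin_ns = VersionedNamespace(len=len)
lookup = CachedLookup("len", global_ns, builtin_ns)

lengths = [lookup.get()("abc") for _ in range(3)]  # one slow lookup, two cache hits
lookups_before_change = lookup.slow_lookups

global_ns["len"] = lambda obj: -1                  # shadow len in the globals
shadowed_result = lookup.get()("abc")              # guard fails, cache refreshed
print(lengths, lookups_before_change, shadowed_result, lookup.slow_lookups)
```

Comparing a couple of integers is much cheaper than walking through several dictionaries, which is where the saving comes from.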

24:35 Ok, that sounds like it will be great regardless of whether you are talking about a JIT or just running your code, right.

24:43 Yeah, that's going to be fantastic, everyone is going to win on that one.

24:44 Yeah, that's cool. One of the things that I think is surprisingly slow in Python is calling methods, right? It's more expensive maybe than it should be. What other stuff kind of falls into that class that you can think of?

24:59 So just to give an explanation of why that is so slow is, if you look at what you can do, with a method or function call, Python's got a really rich set of semantics, right? We have positional arguments, we have keyword arguments, we have *args, I mean we have **kwargs, we have keyword only arguments in Python 3, I mean, there are default values in that, there is a lot of different ways to try to build this stuff up, into something that we can use to call a function with and some of them are-
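As a small illustration of how many argument styles a single call can mix, the following function uses each kind of parameter listed above. The function `call_me` and its parameter names are made up for this example:

```python
import inspect


def call_me(a, b=2, *args, key_only, **kwargs):
    """Accept positional, default, *args, keyword-only, and **kwargs arguments."""
    return a, b, args, key_only, kwargs


# One call that exercises every kind of parameter.
result = call_me(1, 20, 30, 40, key_only="k", extra=True)

# inspect.signature reports the kind of each parameter.
kinds = [p.kind.name for p in inspect.signature(call_me).parameters.values()]
print(result)
print(kinds)
```

The interpreter has to sort every incoming argument into one of these slots before the function body can start, which is part of the call overhead being discussed.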

25:34 Right, and maybe even closures as well, right, on top of that?

25:37 Yes. Actually, luckily, that is not too costly for the actual call, it's just when it comes time to look up the value you've got to work your way up. But that kind of ties into it, right? The other kind of expensive thing you have to do in Python: there is the cost of making a call itself, because it just takes so much effort to build up what all the arguments should be and then make the call, and then there is the cost of just looking up the method or the function. As you mentioned, there are closures, so you have local scope, you have this potential closure scope, which are like cell variables, or free variables, you've got your global namespace, you've got your built-in namespace, and then that's on top of whether or not you define a dunder getattr (__getattr__) method on your object, which is going to have its own set of code to call to try to figure out what the heck you want and whether it can get it for you. That's one of the real expenses: trying to access attributes, which methods happen to be. So that's one of the reasons method calls can be so expensive; it's not just the cost of getting the object, but also the call itself.
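Both lookup mechanisms mentioned here are visible from plain Python. In this short example the names `make_counter` and `Fallback` are invented for illustration:

```python
def make_counter():
    count = 0  # captured by the inner function, so it becomes a cell variable

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


increment = make_counter()
cell_vars = make_counter.__code__.co_cellvars  # names the outer function shares
free_vars = increment.__code__.co_freevars     # names the closure borrows


class Fallback:
    def __init__(self):
        self.present = "found normally"

    def __getattr__(self, name):
        # Only called after the normal attribute lookup has failed.
        return f"computed {name}"


obj = Fallback()
print(cell_vars, free_vars, increment(), increment())
print(obj.present, "|", obj.missing)
```

A failed normal lookup followed by a call into `__getattr__` is the extra code path that makes attribute access hard to predict ahead of time.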

26:43 Ok. Interesting, and this caching in CPython, putting Pyjion aside for a moment, that would make a big difference?

26:51 Yeah. Yuri, I believe he lives in Toronto actually, has developed new opcodes, for instance load method and call method, which by themselves have a slight performance perk because they skip some steps you typically have to take to make a method ready. But Yuri has also been the one working on this caching stuff, building off of Victor Stinner's dictionary versioning. What he is doing with his call method and load method is basically grabbing the unbound method, sticking it on the stack, and just calling it directly without doing some extra work. But with the caching, that thing he sticks on the stack you can actually squirrel away and say, hey, next time I come to this call method, or load method, I can just pull it right out of this cache as long as stuff hasn't changed in the namespaces above me, and that is how he is trying to make method calls cheaper. It's basically storing away the method object and fetching it right back if he can make sure for a fact that nothing has changed since the last time he tried to get that object out.
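The "extra work" being skipped can be observed from Python itself: every attribute access on an instance builds a fresh bound method object around the same underlying function. The `Greeter` class below is an invented example, and calling the function through the class only approximates, at the Python level, what the new opcodes do inside the interpreter:

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()

first = g.hello
second = g.hello

# Each access creates a new bound method object...
fresh_each_time = first is not second

# ...but both wrap the very same function stored on the class.
same_function = first.__func__ is second.__func__
wraps_class_function = first.__func__ is Greeter.hello

# Calling the plain function with the instance passed explicitly
# avoids creating the intermediate bound method.
direct_result = Greeter.hello(g)
print(fresh_each_time, same_function, wraps_class_function, direct_result)
```

Skipping the temporary bound method on every call is a small saving, but method calls are frequent enough for it to add up.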

28:03 Ok, that's awesome, what's the time frame? Any ideas? Is it still just experimental or-

28:08 That's a good question. So, there is a PEP- so Victor Stinner has started what he has called FAT Python, you can google for that and I am sure you'll find it. He currently has three PEPs, actually. PEP 509 handles dictionary versioning, which is important for namespaces and caching, because you need to know if something in your global namespace or your built-in namespace or even your local namespace has changed, because all namespaces in Python are dictionaries, which is why you can introspect so much. PEP 510 is adding guards to bytecode so that he can do stuff like add a guard saying, "Hey, if globals hasn't changed and built-ins hasn't changed, use this version of len"; this is before Yuri started. And then with PEP 511 he is trying to add an API for doing AST transformations, so that you can basically plug in AST transformations that do things like, well, if you are doing a number plus a number, we can just make it a number and skip the plus. As of right now, PEP 510 and 511, I don't know where they are heading quite yet, but PEP 509 seems to be fairly well accepted and it's just a question of Victor finalizing the PEP and the design exactly and getting it accepted, so I really don't see any reason at all why that won't make it into Python 3.6. And Yuri's stuff, he has already got patches and he has benchmarked it and shown it working. There is some discussion about whether or not these current approaches are the best or not, but I personally do not see any reason why any of this won't make it into 3.6 either.
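The "number plus a number becomes a number" transformation can be tried today with the standard library `ast` module. This is a sketch of the general technique, not the API proposed in PEP 511; the class name `FoldAddition` and the sample source are assumptions made for the example:

```python
import ast


class FoldAddition(ast.NodeTransformer):
    """Replace `<number> + <number>` with the precomputed number."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


tree = ast.parse("total = 1 + 2 + x")
tree = ast.fix_missing_locations(FoldAddition().visit(tree))
folded_source = ast.unparse(tree)

# The transformed tree still compiles and runs as usual.
namespace = {"x": 10}
exec(compile(tree, "<folded>", "exec"), namespace)
print(folded_source, namespace["total"])
```

Only the `1 + 2` part is folded, because `x` is not known until run time.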

29:42 3.6, ok, that's pretty excellent, that's not too far out.

29:45 Yeah, I know. I think we are due to hit beta in September, so as long as all this can get wrapped up by then, it will all land in Python 3.6. And I should mention, Yuri's stuff I think is adding up to between 5 and 10 percent across the board speed improvements, and depending on how your code looks I think he is up to 20 percent faster.

30:08 Yeah, that's a really big deal. Ok, awesome. I want to talk about the coreclr a little bit, but before we do, you said something that I didn't expect you to say when we were talking about JITs and pluggable JITs, and that was V8 or Chakra; that is awesome. So, somehow we could plug in the JavaScript engine from Chrome, V8, or the one from IE and Edge? What would that look like?

30:32 We haven't really explored it yet, but it's definitely an idea we had. Actually, before Chakra went open source, the Chakra team reached out to Dino and said, hey, we think this might be useful to your project. The thinking is, because JavaScript is as dynamic as it is, all these JITs have to be designed to JIT quickly, because obviously, if you are in your browser, no one wants to wait for their favorite web based email client to start running, so they are really fast at the start. But they also have to handle dynamism really well, because JavaScript, just like Python, can easily have attributes added, removed, and changed at any time, so they have to be really flexible in terms of how they handle that kind of workload. While coreclr obviously does its best to be a really good all around JIT, its heavy uses are F# and C# and more statically typed languages. The thinking is that if we use a JIT that worries about a language as dynamic as JavaScript, we should be able to piggyback on all that work and have a JIT that works really well for Python, because it's already designed to deal with all the dynamism that programming languages like Python and JavaScript have.

31:37 That's super interesting, and I think if you have two distinct example implementations of the API as different as the CLR and JavaScript, you have a pretty robust API, right?

31:52 Yeah, and that's the other thinking too. We want to get the coreclr done and passing as much of the Python test suite as reasonably possible so that we can go, ok, our JIT framework that we designed to help drive these JITs covers all the possible edge cases and basically is good enough that if you implement these things you will get Python compatibility. And that way we can just plug in and make sure that all the stuff just works in two completely different JITs, targeted at different types of languages, and have it just all fall through. And honestly, it's a nice way to do performance comparisons for what kind of JIT would probably work best for Python.

32:33 Awesome, that sounds like a really good idea. I've done a fair amount of work with CSharp and the CLR and I know what the coreclr is, but I suspect most listeners, when they hear .NET, think, oh, it's a Windows thing. But you guys actually are doing quite a bit of different stuff, now that Satya is in charge there is kind of a new man there, right, so tell people about the coreclr?

32:56 I believe it was last year, it was before I joined Microsoft this past July, basically all of .NET was open sourced. Previously this was all a closed source thing that was very Windows only, except for Mono, which kind of initially reverse engineered a bunch of things, and then Microsoft said, oh, you know, we can at least open source, I believe, the test suite and some other things for you to test your compatibility. But Satya Nadella, the CEO of Microsoft, has really pushed for open source at Microsoft, both its use but also contributing and doing things in the open, both in open sourcing things that Microsoft has started from scratch and also in giving back to pre-existing open source projects. And one of the things they did was they completely open sourced .NET, so .NET actually- I don't know if they have the official release yet, but if you look at least at their integration tests, they are passing on Linux and OS X on top of Windows. For instance, Pyjion right now is Windows only, purely because of momentum and laziness on Dino's and my part, and it has nothing to do with using coreclr, because coreclr uses CMake for its builds, so it's already got cross platform build scripts set up and all that. It's just that Dino and I, for Pyjion, haven't bothered to rewrite the Visual Studio solution file in CMake to be able to run it on Linux or OS X.

34:20 I think that's going to breathe a lot of new interest into the whole CLR and the CSharp side of things from people that are just saying, look, Windows is not an option for whatever reason for us.

34:33 Yeah, and I really hope it does too, because I did Java development at Google and honestly I like CSharp a lot more. Microsoft has done a really good job of shepherding that language forward and continually evolving it, while I don't think Oracle has done such a great job with Java, and CSharp has done a better job at going forward continuously. I mean, CSharp has local type inference, this is not a new technology in this day and age, and CSharp has it yet Java still doesn't have it, and it always drove me nuts. I mean, bloody C++ has local type inference in C++11 using auto, and yet Java still doesn't have that kind of stuff, and it's always kind of boggled my mind that unless you use generics in Java you can't leave out the type. So I really do hope that the open sourcing of coreclr, and it being available on Linux and OS X on top of Windows, is really going to get more people to take a serious look at CSharp and FSharp.

35:29 Absolutely. And it definitely makes your project broadly applicable to all the Python guys, right, because if for some reason you said it's kind of like IronPython, which is a really cool implementation of Python on .NET, but it's just tied to Windows, right, that would really 35:46 it, but the fact that it's starting out with a base that could be on any of the major platforms is cool.

35:29 [music]

35:29 This episode is brought to you by SnapCI, the only hosted, cloud based continuous integration and delivery solution that offers the multi stage pipelines as a built in feature.

35:29 SnapCI is built to follow best practices like automated build, testing before integration, and it provides high visibility into who is doing what. Just connect Snap to your GitHub repo and it automatically builds the first pipeline for you, it's simple enough for those who are new to continuous integration, yet powerful enough to run dozens of parallel pipelines. More reliable and frequent releases, that's Snap.

35:29 For a free, no obligation 30 day trial just go to

35:29 [music]

36:54 Technically, I am on the data science tools team in data analytics in Cloud and Enterprise at Microsoft, and Azure supports Linux, right, on top of Windows, so it would be really silly of us to develop something that only part of our client base could use. We want to get Pyjion to the point where you can use it in your Azure apps or in data analytics, and that includes Azure machine learning. We have this thing called Azure ML Studio, which is this whole drag and drop machine learning system in the browser, and it's really cool. You can actually use Python code to transform data and run analyses on it, and do all this cool stuff. And because it's machine learning it doesn't happen in a second, it can take 30 seconds or 5 minutes or half an hour, whatever. These workloads take enough time that a JIT would be really beneficial. So it makes total sense both for Azure ML but also just Azure in general to support multiple platforms. It honestly would be stupid of us not to try to support more than just Windows, because we would be leaving out part of our client base and that is just not how you win users.

37:59 It's definitely not. So, I have a couple of questions about maybe like what the future might hold in a Pyjion type of world. I've been thinking about kind of what you guys were talking about, and what does it take to dramatically move people into Python 3? Performance is good, 20% increase in performance is really good for like we were talking before, and those types of things, but what would really sort of hit people in the face and go, "Yeah, this is different," and I think better threading is possibly number one, like removing the global interpreter lock in some way-

38:37 [laugh]

38:39 Does this at all touch this concept?

38:41 No, because with Pyjion and the JIT API we are trying to design, one of the key things is we are trying to be compatible with extension modules, C extension modules. Because that's always been a big limitation of PyPy, right. If you write C code and interface using cffi, that will get you a C extension module for Python that works in both PyPy and in CPython itself, but unfortunately that requires getting people to use cffi, which is a great project by the way, and I do encourage people to consider it when they need to wrap some C code. But there is also a lot of pre-existing C extension code. I mean, this is why PyPy, for instance, before they created cffi, started to rewrite NumPy from scratch in RPython. That's their NumPyPy project.

39:26 You guys definitely don't want to get down that path.

39:27 Yeah, exactly, right, we are trying to avoid that completely. The problem is extension modules are designed around the concept of the GIL, right. The way garbage collection works in Python is reference counting, and all the C code works with the assumption that that's how it works and stuff won't magically disappear if you don't 39:47 Python object at the C level. It will stick around, so if you get an object and increment that reference count and just leave it incremented until you are finally done with it at the very end, that will guarantee that the object isn't garbage collected. And there are just a ton of assumptions like that in the C code, and it's not just Python itself but any third party C code. So getting rid of the GIL without breaking basically the world of C extension modules would be very difficult. I get where it all comes from, people's desire to get rid of the GIL. I do think some people get a little huffy about it when they really don't need to. I mean, if you do any IO it really doesn't matter, it's only when you are CPU bound that it ever comes up.
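The reference counting behavior Brett describes can be observed from Python itself. The following is a minimal sketch, assuming CPython (other interpreters manage memory differently); the `Payload` class and variable names are hypothetical, and the list stands in for C code that takes and later releases references.

```python
import sys
import weakref


class Payload:
    """Hypothetical stand-in for any object a C extension might hold on to."""


obj = Payload()
baseline = sys.getrefcount(obj)   # count includes a temporary reference for the call

holders = [obj, obj]              # two extra references, like two Py_INCREF calls in C
while_held = sys.getrefcount(obj)

del holders                       # dropping the list releases both references
after_release = sys.getrefcount(obj)

probe = weakref.ref(obj)          # a weak reference does not keep the object alive
del obj                           # last strong reference gone: CPython frees it at once
freed = probe() is None

print(while_held - baseline, after_release - baseline, freed)
```

C extension code relies on exactly this determinism: as long as it holds a counted reference, the object stays alive, which is the assumption a GIL-free interpreter would have to preserve.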

40:28 Yeah, absolutely.

40:29 But I do get why people want faster, and I know this comes a lot from the scientific Python community. If you are doing a lot of CPU bound stuff you really want to not have the GIL, right, and we get it. It's just one of these rock and a hard place situations, where the rock is CPU performance and the hard place is all the backwards compatibility with all the pre-existing C extension code.

40:51 Right, like hey we have this really fast thing, but you can't use all the stuff that you want to use.

40:57 Yeah, exactly, it's like telling the scientific Python community, "Ok, you can't use NumPy and possibly scikit-learn," although that's written in Cython so at least they have a chance, but like "NumPy would not work. You ok with that?" I don't really see that going down very well.

41:15 Yeah, we tried that with Python 2 and Python 3 kind of.

41:17 Yeah, exactly.

41:19 It hasn't gone on so well.

41:20 Exactly, look at NumPy. PyPy isn't fully compatible so it's like, "Hey, scientific community, do you want to run on PyPy?" It's like, "Do you have NumPy?" "No." "Meh."

41:29 "I'm not so excited any more."

41:30 "Maybe sometimes. For some things." So, it's a really tough position to be in, where people ask for this without realizing the ramifications for the community. And as you pointed out, Michael, we've done this once with Python 2 and 3, right, where we said, "Ok, for the benefit of the community we are going to break backwards compatibility." There is a way to write code that works in Python 2 and 3, but it takes an effort. It's not like going from Python 2.6 to 2.7, there is actually some effort that has to be put in. And we paid a price for it. Now, I don't regret the decision, but it does bring up the point: does the community really want to put up with this again at the C level? I don't know if they do, even if it does get them the GIL-free life. Now, I am sure some people are going to say, "God yes, I will totally rewrite all my C extension code to completely ignore whatever it has to and change whatever it has to to get around the GIL," but the question is what solutions we have that would help migrate existing code, and would it be reasonable. And I simply just don't have an answer to that.

42:30 Ok, well, I think those are really interesting sides of the debate for the listeners to think about when they talk about that topic. So, with Pyjion, is it too soon to ask about performance or anything like this and how that's looking, or-

42:47 You can always ask the question, I just can't always give a good answer. [laugh]

42:53 Any news there or you are just not fully baked yet.

42:56 The current update on that is- I'll give two updates, one on compatibility and one on performance. I'll start with the performance: it's not bad but it's not better. Back in November I was lucky enough to be invited to give the opening keynote at PyCon Canada, and the video is up on YouTube so you can find that if you want. Basically I did an unscientific survey of Python interpreters. I listed the history of all the different implementations of Python over the decades, because Python is 25 years old. I benchmarked everything, because it had been a while since someone had benchmarked all the interpreters, and I included Pyjion in it because I was curious and we hadn't really done any benchmarking. In general, some things were faster, some things were slower. The median across the entire Python benchmark suite was slightly slower than Python 2.7, but if you looked at the geometric mean it was actually faster. But it was all within not a huge jump between the two, and I think we were still faster than, for instance, Jython or IronPython. The performance isn't bad, it's kind of maybe on par, a little slower, but this is with like zero optimization.
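It may seem odd that the median can say "slower" while the geometric mean says "faster". A small sketch shows how that happens when most benchmarks are slightly slower and one is much faster. The ratios below are invented purely for illustration and are not Pyjion's actual results.

```python
import math
import statistics

# Hypothetical ratios of (JIT run time / baseline run time), one per benchmark.
# Above 1.0 means slower than the baseline, below 1.0 means faster.
ratios = [1.05, 1.10, 1.02, 0.40, 1.03]

# The median only looks at the middle benchmark once the ratios are sorted.
median_ratio = statistics.median(ratios)

# The geometric mean multiplies all ratios together, so one large win counts.
geo_mean_ratio = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print("median ratio:", median_ratio)
print("geometric mean ratio:", round(geo_mean_ratio, 3))
```

With these made-up numbers the median lands above 1.0 and the geometric mean below it, which is the same shape of result described in the keynote survey.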

44:16 Yes, so my follow up question was, if you are already sort of tied and you have not done a ton of optimization, that's actually a really good place to be.

44:23 Yeah, exactly, that was one of the key metrics we wanted to hit initially: "Ok, can we get to compatibility and not have performance suck?" More or less. And use that as kind of a showing that this is not a waste of my time and Dino's time to pursue. That it's actually going to be worth all of this effort, and that there is actually a chance that this is going to pay off and be useful. And I will say that as of yesterday we are more or less compatible with CPython, minus supporting tracing, profiling, and anyone who touches sys._getframe. So we are basically fairly compatible now.
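For listeners unfamiliar with the features named as exceptions here, the sketch below shows what frame inspection and tracing look like in plain CPython. It only illustrates the standard `sys._getframe` and `sys.settrace` hooks; the function names are hypothetical, and nothing here reflects how Pyjion itself handles them.

```python
import sys


def caller_name():
    # Frame 0 is this function; frame 1 is whoever called it.
    return sys._getframe(1).f_code.co_name


def handle_request():
    return caller_name()


calls = []


def tracer(frame, event, arg):
    # Record every Python-level function call; returning None skips line tracing.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return None


previous = sys.gettrace()
sys.settrace(tracer)
result = handle_request()
sys.settrace(previous)

print(result, calls)
```

Both hooks need the interpreter to expose live frame objects for every call, which is the kind of bookkeeping a JIT would normally prefer to skip.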

45:08 Yeah, that's really good actually.

45:09 Yeah. There were some hairy bugs in there, but Dino deserves a lot of credit for figuring out how to fix most of the bugs. We actually have test45:18 file in our tests directory that lists the nearly 400 tests that Python 3.5 has, and you can look at it to see what is left to do, but most of them are actually profiling related or tracing related. There are one or two that deal with odd semantic compatibility that we think probably needs to get changed upstream in CPython itself, but otherwise everything seems to be tracing based, profiling based, or using sys 45:45 basically, stuff that would slow everything down anyway if you use it. If you are using a JIT you are probably not going to want to touch that stuff anyway. So, yeah, we are actually pretty happy and we think we are more or less compatible at this point, enough to be willing to go to PyCon and say, "We are basically compatible, you should give us a shot." If performance ends up being good by then.

46:03 Ok. That's a really good start, I think.

46:06 Yeah, we are really happy that we've managed to hit this compatibility spot now, because we've proposed a talk at PyCon. Obviously we don't know if it's been accepted yet, but our hope, now that we have compatibility, is to spend the next 2 or 3 months trying to ramp up performance somewhat and see how far we can get, and see if we can more consistently either match or actually start beating Python 3.5 somehow.

46:28 Right. Interesting. Do you feel that CPython itself is getting better because of the pressure that you are putting on it from this slightly different use case?

46:38 I don't think it's really coming from us, I think it's coming from all the core devs who are honestly a little tired of people dragging their feet switching to Python 3.

46:47 Yeah.

46:46 We realize that we can only give so many carrots in terms of features and stuff, and you kind of have to be inspired to come up with a new feature. Actually, there is a really cool one I can talk about if you want, coming to Python 3.6, that I think a lot of people are going to love.

47:03 Yes, tell us.

47:03 Eric Smith has implemented something we are calling format strings, or f-strings. So if you take a string 47:13 prefix it with f, you can use the formatting that you use with str.format, except you don't have to make the format call, and you can specify the name of a variable and it will directly do string substitution. So, if you did span = 42 and had a string constant starting with f that said my value is {span}, and that was it, no format call, no percent or whatever, and you executed that in Python 3.6, it will actually turn that string into my value is 42.
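Written out, the example from the conversation looks roughly like this. It is a minimal sketch that requires Python 3.6 or later; the two older spellings are included only for comparison.

```python
span = 42

# f-string: the variable name goes straight inside the braces, no format call.
with_fstring = f"my value is {span}"

# The two older ways of doing the same substitution.
with_format = "my value is {}".format(span)
with_percent = "my value is %d" % span

print(with_fstring)
```

All three produce the same text; the f-string simply removes the separate call and keeps the variable next to the place where it is used.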

47:46 Ok, that's pretty awesome. That sounds a little bit like the Swift string interpolation, or you know, what CSharp 6 adopted after that as well. The same basic idea?

47:56 Yeah. Exactly. What is even cooler about it is, beyond the fact that it keeps almost full compatibility, there is an edge case in the substitution that I don't remember, but basically it works exactly the same way as str.format. But what is really cool is Eric implemented a new bytecode for it, so it's actually faster than str.format and faster than using modulo, the percent sign, for string interpolation. So it's actually going to be the fastest way to do string interpolation in Python.
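One way to see the dedicated bytecode is to disassemble an f-string next to the equivalent str.format call. This is only an exploratory sketch: the function names are hypothetical, and the exact opcode names differ between CPython versions, so the code prints whatever the running interpreter produces rather than assuming a fixed listing.

```python
import dis


def with_fstring(value):
    return f"my value is {value}"


def with_format(value):
    return "my value is {}".format(value)


def opnames(func):
    """Return the opcode names CPython compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


fstring_ops = opnames(with_fstring)
format_ops = opnames(with_format)

print("f-string:  ", fstring_ops)
print("str.format:", format_ops)
```

The f-string version is compiled to formatting and string-building instructions directly, while the str.format version has to look up and call a method at run time.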

48:25 All right. So, I asked about the GIL, you know, and that was a really interesting answer, thanks. Are there other advantages that this type of JIT API might bring that I am not thinking about or that are not entirely obvious?

48:39 Beyond just raw performance increases with compatibility with C extension modules, not specifically. Basically, as I was so not eloquently saying earlier, we realize that to get people to Python 3 we can add new features, which is one form of a carrot, but it requires inspiration to come up with those new features. The other way to do it, which doesn't require any inspiration because everyone always wants more, is to improve performance, and this is why Dino decided to give this a try at PyCon last year, and it's looking like it's going to pay off. But this is also why, for instance, Victor Stinner is putting all this effort into it for Red Hat, and why Yuri, for his own consulting company, is putting in all this time. It's basically just that people really like fast, and if we can give them fast, we hope that it gives people more ammunition to go to their manager and say, "Look, Python 3 is faster than Python 2, we should put in the effort, we are going to get the performance, it's worth it." Because, I did a blog post on this where I compared the 5 stages of grief to the 5 stages of the Python 3 transition for the community. Everyone seems to at least be at the depression stage, which is stage 4, and then some people will be lucky enough to get to stage 5 and move on to Python 3 and realize how much nicer it is and all that. But those who are stuck in the depression stage are usually people who work at corporations where they've just been told, "We don't see enough of a win, I don't want to put any time and effort and resources into getting our code to move to Python 3."

50:03 Right. It might be better but the manager who makes that decision doesn't want to possibly bear the burden of saying, "Yeah, we decided to switch but now we can't release our app for 6 months because we are actually not as quick at converting-" It's just easier to do nothing right?

50:20 Yeah, exactly, and for me it's a little frustrating, because I put a lot of personal time and effort in, back in the summer/fall of 2014, to make porting a lot easier, and it can be done file by file, right. I think one of the big problems is people feel like they have to port the entire code base at once, and they really don't have to. There are still people out there who think 2to3 is the cutting edge of porting Python 2 code to Python 3, and it's not at all, I don't even recommend it. If you go to there is a how to section and there is a doc in there that I wrote that explains the best practices for porting your code from 2 to 3, but basically, you can write Python 2/3 code that's compatible with both versions, and you can do it file by file. You don't have to do this huge massive "let's change everything." And that's great for engineers, who understand things like having tracebacks attached to exceptions and having chained tracebacks in your code, so that when you trigger an exception you can say this exception was caused by this exception that was caused by this exception. It's really useful, but you might not be able to sell that to your manager. But if you can tell your manager, "Hey, you know what, if we switch, Python 3.6 is looking to be 10, 20% faster than Python 2.7, that's a great performance win that will allow us to handle x number more requests per second with no new hardware if we just put the time into moving our code over, wouldn't that be fantastic?" That's hopefully an easier sell for some of these companies.
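The chained exceptions mentioned here are a Python 3 feature. A minimal sketch, with a hypothetical `load_port` helper, shows one exception recording the exception that caused it:

```python
def load_port(raw):
    """Hypothetical helper: parse a port number from a config value."""
    try:
        return int(raw)
    except ValueError as exc:
        # "raise ... from ..." stores the original error on the new one.
        raise RuntimeError("bad port setting: {!r}".format(raw)) from exc


caught = None
try:
    load_port("eighty")
except RuntimeError as err:
    caught = err

# The cause and the traceback travel with the exception object itself.
print(type(caught.__cause__).__name__)
print(caught.__traceback__ is not None)
```

When such an exception goes unhandled, Python 3 prints both tracebacks, joined by a line noting that the first exception was the direct cause of the second.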

51:42 I feel it's a little bit like boiling the frog, you know, the analogy. You guys keep adding awesome new stuff every time and it is just getting cooler and cooler, but there is not that jolt that goes, "We have to go now," right, it's just been so sort of smooth.

51:57 Yeah. Exactly. We tend to be adding new features slowly over time instead of hiding them in our back pocket and suddenly springing them on the community, like, "Hey look, all this new stuff!"

52:09 That darn open source, people keep figuring out what you are up to.

52:10 That's when the stick of 2020 comes in, right, of like, all right, legacy Python 2 being supported by the core developers for free is going to go away, so either port your code if you want free support, or go pay someone like Red Hat or Canonical to support you in Python 2, because it isn't going to be free anymore.

52:30 Yeah.

52:31 So do you want to pay, basically, Red Hat in 2020 to support your Python 2 code, or do you want to pay your own engineers now to move to Python 3? And then it becomes a cost analysis of-

52:41 Yeah, get all the benefits now, 2020 sounds so far away, but it is actually 2016, I mean, that's only 4 years, that's not really that far for large code bases.

52:51 And the other thing I am afraid people aren't thinking of is like, "Oh, 2020 is not that far, I'll start in 2020." It's like, "No, no, you need to finish your transition by 2020." It's not start in 2020, it's done by 2020. So you had better start sooner rather than later. And there is still stuff being done to make the porting easier. For instance, with the type hints that Guido is working on in Python 3 and is backporting to Python 2 using mypy at Dropbox, he is hoping to make it so that if you add this type hint information they will be able to develop a tool to help warn you statically, offline, that "hey, this code, while it's fine in Python 2, is kind of questionable in Python 3, and you might want to tweak it so that there is no question of compatibility, so that when you run this code under Python 3 in the future it will be ok and you won't have any issues." So there is even some tool work being done to make it easier.
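The tool described here is not specified in the conversation, but the annotation style that works on both Python 2 and Python 3 is the comment-based form defined in PEP 484. A minimal sketch, with a hypothetical function, looks like this:

```python
def repeat_greeting(name, times=1):
    # type: (str, int) -> str
    """Return the greeting repeated `times` times, separated by commas."""
    return ", ".join(["hello " + name] * times)


print(repeat_greeting("Ada", 2))
```

Because the annotation lives in a comment, the interpreter ignores it entirely, so the same file runs unchanged on either version while a static checker can still read the types.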

53:42 Yeah, that's really interesting.

53:43 Yeah, the real problem is people are still writing new code that's Python 2 only instead of saying all new code should at least be Python 2 and 3 compatible. Because if you do that, then at least your problem is, like, set, but if you are still writing all your code in 2 you are just making your problem worse and worse as time goes on. So this is why whenever I give talks on Python 3 I always go, "Ok, I want you to go home and I want you to do 2 things: do not write any more new Python 2 code, only write code that works in Python 2 and Python 3, and then slowly start porting your code over to Python 3, file by file." I'm not expecting anyone to do their whole code base, but at least start and at least get the practices in place, like adding the __future__ statements or running Pylint with the --py3k flag to get some of the warnings. If you run Python 2, make sure you run it with the -3 flag so you get py3k warnings in the interpreter. I mean, there is a bunch of stuff you can do that you can just integrate as part of your continuous integration or day to day practices, and it will just make your life easier when you finally do get to flip the switch. It doesn't have to be all shut down for 6 months while you do the port, it's like, "No, I'll just spend a little time to maintain and tweak some code to make it more compatible and just slowly work my way forward." Because otherwise, you will just make it that much more of a burden in the future.
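The `__future__` statements mentioned above go at the top of each file being ported. A minimal sketch of such a file header, with illustrative variable names, might look like this:

```python
# These must come first in the file; only comments, blank lines, and a
# module docstring may appear before them.
from __future__ import absolute_import, division, print_function, unicode_literals

halves = 3 / 2      # true division on both Python 2 and Python 3
floored = 3 // 2    # floor division has to be asked for explicitly
label = "text"      # a text (unicode) string on both versions

print("halves:", halves, "floored:", floored)
```

On Python 3 these imports change nothing, which is the point: the file already behaves the Python 3 way while still running on Python 2.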

54:53 That's really good advice, I think if you find yourself in the hole the first step to get out of it is to stop digging, right?

54:59 Exactly. And I suspect it's a little easier sell if you don't necessarily tell management. You just say, we are going to support Python 2 and 3 with the new code and slowly fix things as we fix bugs, we are going to now run Python 2 with the -3 flag in our continuous integration tests, we are going to use Pylint to actually check for errors, and we are just going to start fixing up the places where it's really kind of ambiguous whether we are working with binary data or textual data, so that we know exactly what needs to support unicode and what needs to support bytes. And you just do it slowly over time and make it easier, until you can say, "Ok management, look, we already mostly support it, can we just have a week or a month," or however much time you think you need to get over the last hump and get done?
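Removing the ambiguity between binary and textual data usually means converting at the edges of the program. The following is a minimal sketch under that assumption; `to_text` and `to_bytes` are hypothetical helper names, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when handed bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return bytes, encoding only when handed text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


wire_data = b"caf\xc3\xa9"      # bytes as they might arrive from a socket or file
text = to_text(wire_data)       # decoded once at the boundary
round_trip = to_bytes(text)     # encoded again only when leaving the program

print(len(wire_data), len(text))
```

Inside the program everything is then text, and bytes appear only where data enters or leaves, which is the split that Python 3 enforces.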

55:44 Well, maybe as we get closer to 2020 the 20% time concept maybe could be applied as well like look we are not going to stop and just do the switch, but just dedicate some of our time like Friday afternoons or like, whatever, as a team and eventually you will get there, right?

56:01 Yes. Exactly.

56:03 Cool. So, we're kind of getting closer to the end of the show, let me ask you just a few more questions. There is an interesting story how the name Pyjion came to be, can you tell me like what that was sort of derived from?

56:13 As I said earlier, Dino started this project basically on a whim after being inspired, like, oh, what can I do to make Python 3 faster and get people to switch. And he wanted a name that somehow involved Python and JIT, and he came up with Pyjion, which is spelled P-Y-J-I-O-N. It throws everyone for a loop until Dino tells them it is pronounced "pigeon," or at least that is how we expect people to pronounce it. But then again, I am kind of used to it. For instance, people always mispronounce PyPI, the Python package index. The abbreviation is PyPI and it is pronounced PyPI, but I've heard so many people call it paipai, peipe, paipi, pipi, I've heard every different way of saying it, and I am just going to tell your audience, it's pai pi ai, or call it the Cheese Shop, which was its original name until some people were too worried that managers would not take Python seriously, back in like 2005, and we renamed it to the Python package index.

57:13 That's awesome.

57:13 And by the way, does work and it will redirect you to

57:17 Yeah, very cool. Two other questions- if you are going to go write some Python code what editor do you open?

57:23 I am actually currently opening Visual Studio Code, or VS Code. I have very little allegiance to code editors, I totally jump around constantly. I learned Vim way back in my undergrad days and used that for a long time, but I've tried Eclipse, I was a TextMate user for quite a while until updates kind of dried up, and then I ended up switching to Sublime, especially when the Sublime 3 beta came out using Python 3, and I was like, I can throw a couple of bucks this way to support someone going with Python 3, but then updates kind of slowed down a lot. And then I used Atom for a while, from GitHub, and I actually still do. I joined Microsoft and Microsoft released VS Code and we actually announced-

58:16 Which is not the same thing as Visual Studio, right?

58:18 No, not at all. So, Visual Studio is an integrated development environment, right, it's a full fledged IDE, it does everything, and if you like IDEs it's actually really great. It is Windows only though, and it is an IDE, and I am personally a code editor kind of guy. I like separate tools, like I will have a git bash open to do my own git work, I don't need an IDE to give me a fancy tree view of all my branches, for instance. I like having a separate code editor. And VS Code is more like Atom than it is like Visual Studio, but it is from the same team, so it's from a team that's been doing code editing and IDE development for basically decades. So there is a lot of knowledge there for the design of it. And we have actually announced that my team, which is in charge of Python Tools for Visual Studio, which is actually a really cool plugin that lets you do crazy stuff like debug across Python and C code and other stuff, is in charge of adding Python to VS Code.

59:17 Oh that's cool to hear.

59:18 Yeah. We don't have a timeline or anything like that, but my manager announced it on Hacker News, so I can talk about it publicly, that we have been put in charge of doing that once we get around to it.

59:34 Very cool, there is a lot of stuff going on in Python around there, more than people might think, these days.

59:37 Yes. Exactly, so I am actually using VS Code because I want to make sure I fully understand it for when we do development with it, and know where we need to add stuff in and be familiar with it, so that I can be either contributing to the project or at least be an internal tester of all the stuff we have.

59:52 Right, and adviser, very cool. So the other question is, on PyPI, there are many thousands of packages, and everybody has their own sort of favorite that a lot of people don't have experience with. What is yours?

01:00:02 I'm going to cheat, and I am going to say PyPI. I think it kind of goes a little unnoticed. You hear people complain about the state of packaging in Python and all that, on occasion, but I don't know if people truly realize how organically grown it is, which is partly why it's taken so long to get stuff done in it, but also how difficult of a problem it is and how useful it is to have PyPI. I remember back when I was starting with Python, when people would ask, what is Python, or, is that the language with the white space. This was back in the day when CPAN was a big deal, and I would look longingly at Perl and go, oh my God, a central repository for all the projects, this is amazing. And then Richard Jones and Martin von Löwis, at PyCon 2005 I think, did PyPI, and suddenly we had this central place where people could upload their own packages and it wasn't manually maintained. Suddenly we had this index, and I think it was a real boon for the community, because suddenly there was a single place to find your code and fetch your code, and you could just keep track of stuff. And so, I would say PyPI, and actually specifically the sequel to PyPI that is being developed right now, called Warehouse, being led by Donald Stufft. That would probably be my project of choice, because that is going to be a big deal, and he actually is working with someone to do user experience design on it and it looks really sharp. If you want to help contribute, they are taking contributions, so I believe if you search for PyPI Warehouse or maybe if you go to which is for the Python Packaging Authority, there should be a Warehouse repo, and you should be able to take a look at what the next version of PyPI is going to look like.

01:01:43 It's a very meta but very good answer, thanks for that. All right Brett, it's been super interesting, I've really learned a lot talking about all the internals and I wish you guys a lot of luck with this project, it seems really promising.

01:01:56 Yeah, well thanks a lot Michael, I really hope it works out. And thanks for having me, I am actually a listener so I was really honored to be on the podcast.

01:02:03 You are absolutely welcome, it's been great. Thanks so much.

01:02:03 This has been another episode of Talk Python To Me.

01:02:03 Today's guest was Brett Cannon and this episode has been sponsored by Hired and SnapCI. Thank you guys for supporting the show!

01:02:03 Hired wants to help you find your next big thing. Visit to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.

01:02:03 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at

01:02:03 Do check out the video course I'm building. The kickstarter is open until March 18th and you'll find all the details at

01:02:03 You can find the links from the show at

01:02:03 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.

01:02:03 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on

01:02:03 This is your host, Michael Kennedy. Thanks for listening!

01:02:03 Smixx, take us out of here.
