#49: Microsoft's JIT-based Python Project: Pyjion Transcript
00:00 This episode, you'll learn about a project that has the potential to unlock massive innovation
00:04 around how CPython understands and executes code.
00:07 And it's coming from what many of you may consider an unlikely source, Microsoft and the recently open-sourced cross-platform .NET Core Runtime.
00:15 You'll meet Brett Cannon, who works in Microsoft's Azure Data Group.
00:19 Along with Dino Viehland, he is working on a new initiative called Pyjion, P-Y-J-I-O-N,
00:25 a JIT framework that can become part of CPython itself, paving the way for many new just-in-time compilation initiatives in the future.
00:33 This is episode number 49 of Talk Python to Me, recorded February 4th, 2016.
00:51 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
01:09 This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy.
01:13 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
01:20 This episode is brought to you by Hired and SnapCI.
01:23 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.
01:31 Hey, everyone. I think you're going to love this episode.
01:33 Brett is doing some amazing work, and we talk about that in depth, but he's also a Python core developer,
01:39 and we spend a decent amount of time on Python 3 and moving from Python 2 to Python 3 and that whole story there.
01:45 I do have just one piece of news for you before we get to the interview.
01:49 It's just T minus 10 days until my Kickstarter for Python Jumpstart by Building 10 Apps closes.
01:55 The initial feedback from the early access students has been universally positive.
02:00 If you have backed the Kickstarter with early access, be sure to create an account at training.talkpython.fm
02:05 and send me a message via Kickstarter so I can get you the first six chapters, about three hours, of the course.
02:11 If you're not sure what I'm talking about here, check out my online course at talkpython.fm/course.
02:16 Now, let's hear about JIT innovation in CPython and more with Brett Cannon.
02:22 Brett, welcome to the show.
02:23 Thanks for having me, Michael.
02:24 I'm super excited to talk to you about this new project that you guys have going on with Python and Microsoft.
02:29 And yeah, we're going to dig into it. It'll be fun.
02:31 Yeah, I'm looking forward to it.
02:32 Absolutely.
02:33 So before we get into that topic, though, what's your story?
02:36 How do you get going in Python and programming and all that?
02:39 They're slightly long stories.
02:40 So getting into programming, probably my earliest experience with anything you could potentially call programming was Turtle back in third grade.
02:48 I was lucky enough to be in a school that had a computer lab full of Apple IIes.
02:52 And they'd bring us in and say, oh, look, you can do this little forward command and make this little turtle graphic draw a line and all this stuff.
02:59 Was that on the monitor that was just like monochrome green?
03:02 Yep. I think I used one of those, too.
03:05 Yeah. I sometimes run my terminal with that old green and black style because it's just what I started with back in the day.
03:11 Oh, that's awesome.
03:12 So I did that, but I didn't realize what the heck programming was.
03:15 But I always found computers kind of this fascinating black box that somehow you stick in these five-and-a-quarter-inch floppies, which dates me.
03:21 And somehow Where in the World Is Carmen Sandiego plays.
03:24 I was like, wow, this is amazing.
03:26 And then in junior high, I ended up taking a summer class on computers and it involved a little bit of Apple BASIC.
03:33 And I really took to it.
03:35 I actually lucked out and got so far ahead of the class.
03:38 The teacher just said, yeah, you can stop coming to class if you want for the rest of the summer.
03:41 So that was like halfway through.
03:44 So I got bit kind of early, but I didn't really have any guidance or anything back then.
03:49 I mean, this is pre-access to the Internet, so I didn't really have any way to really know how to carry on.
03:54 And then when I went to junior college, my mom made me promise her that I would take a class in philosophy and a class in computer science.
04:01 And I did both and I loved them both.
04:03 But in terms of the computer science, I read through my C book within two weeks.
04:08 And then one night, spent six hours in front of my computer writing tic-tac-toe from scratch.
04:14 Using really basic terminal output.
04:15 And I was basically hooked for life.
04:17 In terms of Python.
04:19 That's really cool.
04:20 I think we all have that moment where you sit down at a computer and you haven't, maybe you've really enjoyed working with them or whatever.
04:28 But then you kind of get into programming and you realize, wow, eight hours have passed.
04:33 And it feels like I just sat down.
04:35 And then you're in the world.
04:37 That's it.
04:37 They brought you your dinner at your desk.
04:39 And said, okay, I get it.
04:40 You're just into this.
04:42 Just go with it.
04:44 Here's your food.
04:44 Make sure you eat at some point tonight.
04:45 Awesome.
04:46 Yeah.
04:47 And in terms of Python, I actually ended up going to Berkeley and getting a degree in philosophy because there were some issues trying to double major like I originally planned to do.
04:56 But I did try to still take all the CS courses there.
04:59 And there was a test to basically get into the intro of CS course at Berkeley at the time.
05:05 And I thought they might have something about object-oriented programming.
05:08 And having learned C, I knew procedural, but I didn't know object-oriented programming.
05:11 So in fall of 2000, before I took the class in spring, I decided to try to find an object-oriented programming language to learn OO from.
05:20 And I was reading and all this stuff.
05:22 And Perl and Python caught my eye.
05:25 But as I kept reading, people said Perl should be like the fifth or sixth language you learn.
05:28 While people kept saying, oh, Python's great for teaching.
05:30 So I thought, all right, I'll learn Python.
05:31 And I did.
05:33 And I loved it.
05:33 And then I just continued to use it for anything I could and all my personal projects.
05:37 And just kept going and going with it.
05:39 And I haven't looked back since.
05:40 Yeah, that's really cool.
05:41 What language was your CS 101 course actually in?
05:45 Scheme, actually.
05:46 Interesting.
05:47 My CS 101 class was Scheme as well.
05:50 And I thought that was a very interesting choice for an introduction.
05:53 Yeah, it was really interesting.
05:55 I mean, it does kind of do away with the syntax.
05:58 But obviously, now being a Python user, I really understand what it means to kind of really minimize the syntax in a nice way instead of a slightly painful way with all those parentheses.
06:06 And it was interesting.
06:08 I mean, it is a nice way to try to get in procedural programming and object-oriented and functional.
06:14 So it was really nice to do multi-paradigm, teach you the basics kind of introduction.
06:19 They did actually, interestingly enough, for the last project have us write a really basic Logo interpreter, which, funny enough, was such a bad experience for me,
06:28 partially because of the way it worked out in terms of having to work with another team.
06:32 And I had some issues with my teammates.
06:35 I actually kind of got turned off on language design, of all things, for a little while.
06:40 And then I just, over time, kept realizing I loved programming languages, learning how they worked.
06:44 So I just re-evaluated my view and just realized, okay, it was just a bad taste from a bad experience and realized that I actually do have this weird little fascination with programming languages.
06:55 And luckily got over that little issue of mine.
06:57 Yeah, no kidding.
06:58 And now you're a Python core developer, among other things, right?
07:01 Yeah.
07:01 So back to the language design, at least on the internals.
07:05 Yeah, yeah.
07:06 Awesome.
07:07 So we're going to talk about Pyjion, this cool new JIT extension.
07:14 You're going to have to tell me a little more about how you'd most correctly characterize it for CPython.
07:19 But before we do, I thought maybe you could give us like a high-level view of two things.
07:24 How CPython works, what's sort of going on when we run our code as is, right, with the interpreter.
07:32 And then maybe a survey of the different implementations or runtimes.
07:36 Because a lot of people think there's just one Python from an implementation or runtime perspective.
07:42 And there's actually quite a variety already, right?
07:44 Yeah, actually, we're kind of lucky in the Python community of having a lot of really top-quality implementations.
07:50 But to target your first question of how CPython works, which is, for those who don't know,
07:55 CPython is the version of Python you get from python.org.
07:59 And the reason it's called CPython is because it's implemented in C and has a C API,
08:04 which makes it easy to embed in stuff like Blender.
08:07 Anyway, basically, the way Python works is more or less like a traditional interpreted programming language
08:12 where you write your source code.
08:14 Python acts as a VM, reads the source code, parses it into individual tokens like
08:20 if and def and, oh, that's a plus sign and whatever.
08:24 And then that gets turned into what's called a concrete syntax tree, which is kind of just like the way the grammar is written kind of nests things.
08:32 And this is how you get your priorities in terms of precedence, like multiplication happens before plus, which happens before whatever.
08:40 And that all works out in the concrete syntax tree in terms of how it nests itself.
08:45 And then that gets passed into a compiler within Python that turns that into what's called an abstract syntax tree,
08:51 which is much more high level.
08:52 Like this is addition instead of plus and two things.
08:55 And this is loading a value.
08:58 And this is an actual number.
08:59 And this is a function call.
09:02 And then that gets passed farther down into the bytecode compiler, which will then take that AST and spit out Python bytecode.
09:09 And that's actually what's stored basically in your PYC files.
09:13 Actually, technically, they're marshaled code objects.
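To make the pipeline Brett describes concrete, here is a small sketch using the standard library's tokenize, ast, compile, and marshal pieces; the variable names and the toy source line are invented for illustration.

```python
import ast
import io
import marshal
import tokenize

source = "answer = 20 + 22\n"

# 1. Tokenization: the source text becomes individual tokens.
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)  # ['answer', '=', '20', '+', '22']

# 2. Parsing: the concrete syntax tree stays internal to CPython;
#    ast.parse hands back the higher-level abstract syntax tree.
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign

# 3. Bytecode compilation: the AST becomes a code object.
code = compile(tree, "<example>", "exec")

# 4. A .pyc file stores (roughly) a header plus the marshaled code object.
payload = marshal.dumps(code)
namespace = {}
exec(marshal.loads(payload), namespace)
print(namespace["answer"])  # 42
```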
09:15 And then when Python wants to execute that, it just loads up those bytecodes and just has a really big for loop that basically reads through those individual bytecodes.
09:24 It goes, OK, what do you want me to do?
09:26 All right, you want me to load a const.
09:27 Const is zero.
09:29 And that happens to correlate to none in every code object.
09:32 So I'm going to put none onto what's called the execution stack because Python is stack-based instead of register-based.
09:39 So CPUs are register-based.
09:40 But stack-based VMs, such as Python's, are fairly common.
09:43 Java is another one.
09:44 They're popular because they're easier to implement.
09:48 Anyway, you can do stuff like load const none or load a number, load another number on the stack.
09:53 So the stack now has two numbers.
09:54 And then there's the loop, the ceval loop, short for evaluation loop.
10:00 Yeah.
10:00 So it's worth pointing out to the listeners, I think, who maybe haven't gone and looked at the source code there.
10:06 When you say it's a big loop, it's like 3,000 lines of C code or something, right?
10:11 It's a big for loop.
10:13 Yeah, it literally is a massive for loop.
10:15 If you actually go to Python source code and you look in the Python directory, there's a file in there called ceval.c.
10:24 You can open that up and you will literally find nested in that file somewhere just a for loop with a huge switch statement that does nothing more than just execute these little byte codes.
10:35 So like if it hits add, what it'll do is just pop two values off of what's basically a chunk of memory where we know what pointers are on the stack and just go, I'm going to take that Python object.
10:47 I'm going to take that Python object and execute the __add__ in the right way, or the __radd__, and then make that all happen.
10:53 Get back a Python object and stick that back on the stack and then just go back to the top of the for loop and just keep going and going and going until you're done and your program exits.
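As a toy model of that big switch, here is a hypothetical stack machine in Python; the opcode names echo CPython's, but the instruction format is invented purely for illustration.

```python
def run(instructions, constants):
    """A tiny stack-based evaluation loop, loosely modeled on ceval.c."""
    stack = []
    for opcode, arg in instructions:        # the "big for loop"
        if opcode == "LOAD_CONST":          # push a constant onto the stack
            stack.append(constants[arg])
        elif opcode == "BINARY_ADD":        # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)      # real CPython dispatches to __add__/__radd__
        elif opcode == "RETURN_VALUE":      # pop the result and hand it back
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Roughly what the expression `2 + 3` boils down to:
program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program, constants=[2, 3]))  # 5
```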
11:01 Yeah, and you can actually see that byte code by taking loading up some Python module or function or class or whatever and importing the disassembly module and you can actually have it spit out the byte codes for like say a function, right?
11:15 Yep.
11:16 And I do this all the time on Pyjion, actually.
11:18 Basically, you can import the dis module, D-I-S.
11:22 And in there, there's a dis function.
11:24 So if you go dis.dis and then pass in any callable, basically, so function, method, whatever, and it'll just print out to standard out in your REPL all the byte code.
11:35 And it'll give you information like what line does this correlate to?
11:38 What is the byte code?
11:40 What's the argument to that byte code?
11:42 The actual byte offset and a whole bunch of other interesting things.
11:45 And the dis module documentation actually lists most of the byte code.
11:50 I actually found a couple of opcodes that weren't actually documented.
11:53 Now there's a bug for that.
11:54 But the majority of the byte code is actually documented there.
11:57 So if you're really interested, you can have a look to see actually how we kind of break down the operations for Python for performance reasons and such.
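For example, disassembling a trivial function looks like this (the exact opcodes you see will vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# Prints one line per bytecode instruction: source line number, byte offset,
# opcode name, its argument, and a human-readable hint about that argument.
dis.dis(add)
```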
12:05 Yeah, that's really interesting.
12:07 And for the listeners who are wanting to dig deeper into this, on show 22, I talked with Philip Guo about the CPython internals graduate course he did at the University of Rochester.
12:19 Have you seen his work?
12:20 No, I haven't yet.
12:21 He basically recorded 10 hours of a graduate computer science course studying the internals of CPython and spent a lot of time in ceval.c.
12:30 And it's on YouTube.
12:31 You can go check it out.
12:32 So it's really cool.
12:32 So that's interesting.
12:35 Oh, I should probably actually answer your second question, too, about all the other interpreters.
12:38 Yeah, so let's talk about the interpreters.
12:39 As I said earlier, CPython is kind of, it's the one you get from python.org and kind of the one most people are aware of.
12:46 But there's actually a bunch of other ones.
12:49 So one of the more commonly known alternative interpreters or VMs or implementations of Python is Jython, which is Python implemented in Java.
12:58 So a lot of people love that whenever they have to write a Java app and want some easy scripting to plug in.
13:04 Or have some requirement that they have to run on the JVM.
13:06 Apparently, it's really popular in the defense industry for some reason.
13:10 Interesting.
13:10 Once you get a VM approved, you just don't mess with it, I'd say.
13:13 Yeah.
13:14 Well, and one really cool perk of this is at PyCon, every so often there's a really cool talk about flying fighter jets with Python using Jython and stuff like that.
13:25 So it does at least lead to some really cool talks.
13:27 Nice.
13:27 And here's the afterburner function.
13:29 You just call this.
13:30 Exactly.
13:32 There's IronPython, which is Python implemented in C#.
13:35 So that's usable from .NET.
13:37 So once again, it's often used for embedding in .NET applications that need scripting or anyone who needs to run on top of the CLR.
13:48 Those are the two big ones.
13:49 Obviously, in terms of direct alternatives, there's obviously PyPy, which I think a lot of people know about, which is two things.
13:57 There's PyPy, the implementation of Python written in Python, although technically it's a subset of Python called RPython, which is specifically restricted such that they can infer a lot of information about it.
14:09 So that can be compiled down straight to basically assembly.
14:13 And then there's PyPy, the toolchain, which they developed for PyPy, the Python implementation, which is basically this toolchain to create custom JITs for programming languages.
14:25 So you can take the PyPy tool chain and not just implement Python in Python, but they've done it for like PHP, for instance.
14:33 And so you can actually write alternative implementations of languages in RPython and have it spit out a custom JIT designed for your language.
14:40 Those are the key ones that have actually finished in terms of compatibility with some specific version of Python.
14:46 All of them currently target 2.7.
14:48 PyPy has support for Python 3.2, but obviously that's kind of an old support in terms of Python 3.
14:55 And then there's the new up-and-comer, which is Piston, which is being sponsored by Dropbox.
15:00 And they're also targeting 2.7.
15:02 And they're trying to make a version of Python that is as compatible with CPython as possible, including the C extension API.
15:09 But what they're doing is they've added a JIT or using a JIT from LLVM.
15:14 So they're trying to make 2.7 fast using the LLVM JIT and pulling as much of the C code and API as they can from CPython to try to be compatible with extension modules, which is a common problem that PyPy, IronPython, and Jython have.
15:27 Right. That one actually seems to be really interesting and have a lot of potential.
15:32 Because if you think of companies that are sort of Python powerhouses, Dropbox is definitely among them.
15:39 Yeah, it definitely does not hurt when Guido went to go work there as well.
15:43 And they have Jessica McKellar there and several other people.
15:46 Benjamin Peterson works for them.
15:48 So they already have a couple of core devs and high up people in the Python community working there.
15:52 And their whole server stack in the back, I believe, is at least mostly Python.
15:56 Their desktop clients are Python.
15:58 They're definitely Python heavy there.
16:00 Yeah, absolutely.
16:02 So how does Pyjion relate to all this? The thing that came to mind for me when I saw it announced was, you know, a friend of mine, Craig Bernstein, sent me a message on Twitter and said, hey, you have to check this out.
16:13 And I'm like, oh, that is awesome.
16:15 And it was just, you know, a Twitter message.
16:17 You know, check out this JIT version of Python coming from Microsoft.
16:22 Well, I don't know anything about it, but maybe it's like PyPy.
16:26 So what are you guys actually building over there?
16:28 What is this?
16:29 Pyjion was actually started by Dino Viehland, one of my coworkers.
16:32 And I believe, I don't know if he's necessarily the sole creator, but he's definitely one of the original creators of IronPython. It started back at PyCon US 2015, which was in Montreal.
16:43 During the language summit, Larry Hastings, the release manager for Python 3.4 and 3.5,
16:49 got up in front of the core developers and said, what can we do to get more people to switch to Python 3 faster?
16:55 Because obviously we all think Python 3 is awesome and legacy Python 2 is fine, but everyone should get off that at some point.
17:01 Yeah, I hear you.
17:02 I agree.
17:02 So what do you do, right?
17:03 Yeah, that could be a whole other question on that one, Michael.
17:07 So he said, what can we do?
17:09 What can we do?
17:09 And he said, performance is always a good thing.
17:11 People always seem to want more performance, no matter how well Python does.
17:15 People are always hungry for more.
17:16 And Dino went, yeah, that's a good idea.
17:18 I know, I'll see.
17:19 .NET just got open sourced back in April 2015.
17:23 And he said, you know what?
17:25 I will see if I can write a JIT for CPython using CoreCLR.
17:29 Because Dino also happened to used to be on the CLR team.
17:32 So he knows the opcodes like the back of his hand.
17:35 And so he started to hack on it at the conference and actually managed to get somewhere.
17:40 And he premiered it at PyData Seattle back in July when we hosted it at Microsoft.
17:45 And I got brought on to basically help him flesh out the goals.
17:50 There's basically three goals.
17:52 One is to develop a C API for CPython to basically make it pluggable for a JIT.
17:58 Like one of the tough things that people have always done, like Unladen Swallow did and Piston is also doing, is directly tying a JIT into a fork of CPython, more or less, which really tightly couples it.
18:11 But it also means that, for instance, if LLVM does not work for your workload for whatever reason, you're kind of just stuck and it's just not an option.
18:18 Well, we would rather basically make it so that there's just an API to plug in a JIT.
18:24 And then that way CPython doesn't have to ship with a JIT, but it's totally usable by a JIT.
18:29 And then that way, if LLVM or CoreCLR, which is the .NET JIT or Chakra or V8 or whatever JIT you want, as long as someone basically writes the code to plug from CPython into that JIT, you can use whatever works best for you.
18:46 That's really cool.
18:47 I think it's a super noble goal to say, let's stop everybody starting from scratch, rebuilding the CPython sort of implementation and weaving in their version of a JIT and saying, let's just find a way so that you don't have to write that ever again.
19:05 And you just plug in the pieces.
19:07 Yeah, exactly.
19:08 And actually, one of the other goals we have with this is not only developing the API, but goal number two is to write a JIT for CPython using the CoreCLR and using that to drive the API design that we need that we want to push back up to CPython eventually.
19:25 But the third goal is actually to design kind of a JIT framework for CPython such that we write the framework that drives the code emission for the JIT.
19:35 And then all the JIT people have to do is basically just write to the interface of this framework and don't have to worry about specific semantics necessarily.
19:45 So, for instance, you would be able to, as a JIT author, go, OK, I need to know how to emit an integer onto a stack and I need to know how to do add or add int.
19:55 But then the framework would actually handle going, OK, well, here's the Python bytecode that implements add.
20:01 Let's actually do an add call or, hey, I know this thing is actually an integer.
20:05 Let's do an add-int call and not just a generic Python add and be able to handle that level of difference so that there's a lot less busy work that's common to all the JITs like type inference and such and be able to extract that out so that it's even easier to add a JIT to CPython.
20:21 So is that like two levels?
20:23 Like on one hand, you have a straight C API at the CPython level and then optionally you could choose to use the C++ framework that makes it so you do less work and you plug in your sort of events or steps?
20:34 Yeah, exactly.
20:35 It's getting the bare minimum into CPython so that CPython at least has this option without everyone having to do a fork, as well as pushing down a level to a separate project where the common stuff is factored out and everyone can just build off the same baseline.
20:49 And then only thing that has to really differ is what's unique to the JITs.
20:53 And then that way, everyone's work is as simple as possible to try to make this work.
20:56 OK, that makes a lot of sense.
21:04 This episode is brought to you by Hired.
21:11 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
21:16 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.
21:25 Typically, candidates receive five or more offers within the first week and there are no obligations ever.
21:30 Sounds awesome, doesn't it?
21:31 Well, did I mention the signing bonus?
21:33 Everyone who accepts a job from Hired gets a $1,000 signing bonus.
21:36 And as Talk Python listeners, it gets way sweeter.
21:39 Use the link Hired.com slash Talk Python to me and Hired will double the signing bonus to $2,000.
21:46 Opportunity's knocking.
21:47 Visit Hired.com slash Talk Python to me and answer the call.
21:56 Would you still be able to support things like method inlining and things like that with the C++ framework?
22:03 We don't know yet, but there's technically no reason why not.
22:08 What's actually really interesting is we started all this work and we actually weren't ready to premiere any of this yet.
22:15 We've been doing this out in the open on GitHub.
22:17 But as you mentioned, Michael, people started to tweet it and then it made it to Reddit and then it made it to Hacker News.
22:21 And suddenly everyone's asking questions and stuff.
22:23 But in the middle of all this, there's been a lot of work literally the past, I don't know, maybe two months of various core developers putting in a lot of time and effort trying to speed up CPython itself.
22:34 And part of this is actually trying to cache method objects so that they can get cached in the code object and actually not have to, every time you try to execute, like, a call bytecode,
22:47 not have to go to the object, pull out the method object and then call that, but actually just cache the method object.
22:52 I already have it.
22:53 I don't need to re-access that attribute on the object.
22:56 And so it's already starting to bubble its way up into CPython.
23:00 And there shouldn't technically be any reason why we can't just piggyback off of that and just go, oh, well, they've already cached this or use a similar technique of basically,
23:08 if the object hasn't changed, I really don't need to worry about previous versions of this being different.
23:14 So I can just cache it and reuse it and just save myself the hassle of having to get a method back.
23:19 Or same thing with built-ins, right?
23:21 Like if you ever want to call len, some people cache it locally for performance.
23:26 But the work that's going on is actually going to make that a moot point because it's going to start to notice when the built-ins and the globals for your code have not changed.
23:35 And just go, well, I've already cached len locally because I already know I've used it previously.
23:39 So I might as well just pull that object immediately out of my cache instead of trying it in the local namespace, not having it there, going to the global namespace, not having it there, then going to the built-in namespace and having to pull out len again for every time through a loop, for instance, and call that.
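The hand-written version of that trick looks like this; the timings are illustrative and the speed gap is exactly what the namespace-versioning work aims to make unnecessary.

```python
import timeit

data = list(range(1000))

def global_lookup():
    total = 0
    for _ in data:
        total += len(data)    # resolves len via globals, then builtins, every time
    return total

def local_cache(len=len):     # the classic workaround: bind len as a local default
    total = 0
    for _ in data:
        total += len(data)    # now just a fast local-variable lookup
    return total

assert global_lookup() == local_cache()
print("global lookup:", timeit.timeit(global_lookup, number=200))
print("local cache:  ", timeit.timeit(local_cache, number=200))
```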
23:53 Yeah, that's really great.
23:54 And I suspect you could just say, here's the JIT compiled machine instructions.
23:58 Just cache that or something like this.
24:00 Yeah, exactly.
24:02 So a lot of this work that's happening directly in CPython bubbles down both directions into helping JITs in various ways, right?
24:10 Like this whole detecting what state a namespace is from the last time you looked at it.
24:15 Has it changed at all or not?
24:17 That's probably going to end up in CPython itself as an implementation detail.
24:20 But it also means all the JITs will be able to go, oh, look, the built-in namespace hasn't changed.
24:25 So that means if I've cached len, I don't need to worry about it being changed.
24:28 I don't have to pay for a dictionary lookup.
24:30 I can just pull it right out of my array of cached objects and just go with it.
24:34 Okay.
24:34 Yeah, that sounds like it'll be great regardless of whether you're talking about a JIT or just running your code, right?
24:41 Yeah, no, it's going to be fantastic.
24:42 Everyone's going to win on that one.
24:44 Yeah, that's cool.
24:44 One of the things that I think is surprisingly slow in Python is calling methods, right?
24:51 Yeah.
24:52 It's more expensive maybe than it should be.
24:54 What other stuff kind of falls into that class that you can think of?
24:58 So the reason, just to give an explanation of why that's so slow, is if you look at what you can do with a method or function call,
25:08 Python's got a really rich set of semantics, right?
25:11 We have positional arguments.
25:13 We have keyword arguments.
25:14 We have star args and we have star-star kwargs.
25:19 We have keyword only arguments in Python 3.
25:21 I mean, there are default values or not.
25:24 There's a lot of different ways to try to build this stuff up into something that we can use to call a function with.
25:32 And some of them are really, really safe.
25:33 Right.
25:33 And maybe even closures as well, right?
25:35 On top of that.
25:36 Yeah.
25:37 Actually, luckily, that's not actually too costly for the actual call.
25:42 It's just when it comes time to look up the value, you've got to work your way up.
25:46 But that kind of ties into it, right?
25:47 So that's the other kind of expensive thing you have to do in Python is there's the cost of making a call itself because it just takes so much effort to build up what all the arguments should be.
25:57 And then there's the cost of just looking up the method or the function, right?
26:03 Because as you mentioned, there's closures.
26:05 So you have kind of this, you have local scope.
26:08 You have this potential closure scope, which are like sole variables or free variables.
26:13 If you're the guy calling out, you've got your global namespace.
26:16 You've got your built-in namespace.
26:23 And then that's on top of whether or not you've defined like a __getattr__ method on your object.
26:23 This is going to have its own set of code to call to try to figure out what the heck you want, whether it can get it for you.
26:29 And that's the other real expense is trying to basically access attributes, which methods happen to be.
26:35 So that's one of the reasons that the calls can be so expensive.
26:37 It's not just the cost of getting the object, but it's also the call itself and just basically preparing for it.
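A quick illustration of how much machinery one def can ask for; the function and numbers here are made up, but each call style below forces the interpreter to do a different amount of argument-building work before the body even runs.

```python
def report(a, b, c=10, *args, scale=1, **kwargs):
    # positional args, a default, *args, a keyword-only arg, and **kwargs
    return (a + b + c + sum(args)) * scale, kwargs

# Same function, increasingly elaborate call shapes:
print(report(1, 2))                              # (13, {})
print(report(1, 2, 3, 4, scale=2))               # (20, {})
print(report(*[1, 2], **{"c": 0, "tag": "x"}))   # (3, {'tag': 'x'})
```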
26:43 Okay, interesting.
26:44 And this caching in CPython, you know, putting Pyjion aside for a moment, that would make a big difference?
26:50 Yeah.
26:51 Yury, I'm going to butcher his last name, so I honestly don't want to try.
26:54 Selivanov?
26:55 Yury Selivanov, I think, yeah.
26:59 Yury, I believe it's spelled with a Y.
27:00 I believe he lives in Toronto, actually.
27:04 He has actually developed some new opcodes.
27:13 For instance, load method and call method, which directly by themselves
have a slight performance perk because they kind of skip some of the steps
27:17 you typically have to take to make a method call ready.
27:19 But Yury's also been the one working on this caching stuff, building off of Victor Stinner's dictionary versioning.
27:26 And what he's doing is with his call methods and load methods, he's basically grabbing the unbound methods and sticking them on stack and just calling them directly without doing some extra work.
27:39 But with the caching, that thing he sticks on the stack, he can actually squirrel away and say, hey, next time I come to this call method or load method, I can just pull it right out of this cache as long as stuff hasn't changed in the namespaces above me.
27:50 And that's how he's trying to make method calls cheaper.
27:54 It's basically storing away the method object and fetching it right back if he can make sure for a fact that nothing has changed since last time he tried to get that object out.
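You can do by hand what those opcodes and caches do automatically; this micro-benchmark is just a sketch of the idea, with invented function names.

```python
import timeit

def attribute_each_time(n):
    items = []
    for i in range(n):
        items.append(i)        # attribute lookup on every iteration
    return items

def cached_bound_method(n):
    items = []
    append = items.append      # look the bound method up once, reuse it
    for i in range(n):
        append(i)
    return items

assert attribute_each_time(100) == cached_bound_method(100)
print("lookup each time:", timeit.timeit(lambda: attribute_each_time(1000), number=500))
print("cached method:   ", timeit.timeit(lambda: cached_bound_method(1000), number=500))
```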
28:03 Okay, that's awesome.
28:04 What's the time frame?
28:05 Any ideas?
28:06 Is it still just experimental or?
28:07 That's a good question.
28:09 So there's a pep.
28:10 So Victor Stinner has started what he's called FAT Python, F-A-T.
28:14 You can Google for that.
28:16 I'm sure you'll find it.
28:17 He currently has three peps, actually.
28:19 PEP 509 handles dictionary versioning, which is important for namespaces and caching.
28:25 Because you need to know if something in, like, your global namespace or your built-in namespace or even your local namespace has changed, because all namespaces in Python are dictionaries, which is why you can introspect so much.
28:37 PEP 510 is adding guards to bytecode so that he can do stuff like add a guard saying, hey, if globals hasn't changed and built-ins hasn't changed, use this cached version of len.
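Here's a rough sketch of the dictionary-versioning guard idea in plain Python. `VersionedDict` and `guarded_lookup` are made-up names for illustration only; PEP 509 implements the version tag inside CPython's C dict implementation.

```python
# Toy sketch of the PEP 509 idea: give a namespace dict a version number,
# and reuse a cached lookup only while the version is unchanged.

class VersionedDict(dict):
    def __init__(self, *args, **kwargs):
        self.version = 0
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1          # any mutation bumps the version

namespace = VersionedDict(len=len)

# Cache a lookup together with the version it was made at.
cached_value, cached_version = namespace["len"], namespace.version

def guarded_lookup():
    # Guard: if the namespace hasn't changed, skip the dict lookup entirely.
    if namespace.version == cached_version:
        return cached_value
    return namespace["len"]        # slow path: redo the lookup

assert guarded_lookup() is len     # guard holds, cached value reused
namespace["len"] = max             # mutation invalidates the guard
assert guarded_lookup() is max     # slow path sees the new value
```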
28:51 This was before Yury's stuff had started.
28:53 And then he's implemented PEP 511.
28:56 He's actually trying to add an API for doing AST transformations, so that you can basically plug in custom AST transformers to go, well, if you're doing a number plus a number, we can just make it a number and skip the plus.
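As a minimal, hypothetical example in the spirit of what PEP 511 proposed, here's a constant-folding transformer built on the stdlib `ast` module (PEP 511 itself was about an official hook for registering such transformers; this just shows the transformation):

```python
# Fold "number + number" into a single constant before compiling.
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold children first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = ast.parse("result = 2 + 3 + 4")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))

scope = {}
exec(compile(folded, "<folded>", "exec"), scope)
assert scope["result"] == 9        # the addition happened at compile time
```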
29:09 As of right now, PEP 510 and 511, I don't know where they're headed quite yet.
29:14 But 509 seems to be fairly well accepted.
29:18 And it's just a question of Victor finalizing the PEP and the exact design and getting it accepted.
29:24 So I really don't see any reason at all why that won't make it into Python 3.6.
29:28 And Yury's stuff, he's already got patches and has benchmarked it and shown it working.
29:32 And there's some discussion about whether or not his current approach is the best or not.
29:37 But I personally don't see any reason why any of this won't make it in 3.6 either.
29:41 3.6.
29:42 Okay.
29:42 That's pretty excellent.
29:43 That's not too far out.
29:44 Yeah, no.
29:45 I think we're due to hit beta in September.
29:48 So as long as all this can wrap up by then,
29:51 It'll all land in Python 3.6.
29:53 And I should mention, all this stuff, Yury's stuff, I think, is adding up to
29:58 between 5% and 10% across-the-board speedup improvements.
30:01 And depending on how your code looks, I think you're seeing up to 20% faster.
30:05 So definitely wins.
30:07 Yeah, that's a really big deal.
30:08 Okay.
30:09 Awesome.
30:10 I want to talk about the core CLR a little bit.
30:12 But before we do, you said something that I didn't expect you to say when we were talking
30:17 about jitters and plugging in jitters.
30:18 And that was V8 or Chakra.
30:21 That is awesome.
30:22 So somehow we could plug in the JavaScript engine from Chrome V8 or the one from IE and
30:29 Edge.
30:29 What would that look like?
30:30 We haven't really explored it yet, but it's definitely an idea we had.
30:33 Actually, before Chakra went open source, the Chakra team reached out to Dino and said,
30:37 hey, we think this might be useful to your project.
30:40 The thinking is, because JavaScript is as dynamic as it is, and all these JITs have to be designed
30:46 to JIT quickly, because obviously, if you're in your browser, no one wants to wait for their
30:51 favorite web-based email client to start running.
30:53 So they're really fast at the start.
30:55 But they also have to handle dynamism really well, because JavaScript, just like Python,
31:00 can easily have attributes added and removed and changed at any time.
31:05 And so they have to be really flexible in terms of how they handle that kind of workload.
31:09 While Core CLR obviously does its best to be a really good all-around JIT,
31:15 its heavy uses are more static languages like F-sharp and C-sharp.
31:19 The thinking is that if we try to use a JIT that worries about a language that's as dynamic
31:25 as JavaScript, we should be able to actually piggyback on all that work and actually have
31:30 a JIT that works really well for Python, because it's already designed to deal with all the
31:34 dynamism that programming languages like Python and JavaScript have.
31:37 That's super interesting.
31:38 And I think if you have two distinct examples working against your API as different as the
31:46 CLR and JavaScript, you'll have a pretty robust API, right?
31:51 Yeah.
31:52 And that's the other thinking, too, is we want to get the Core CLR version done and passing
31:57 all of the Python test suite as much as reasonably possible so that we can go, OK, our JIT framework
32:04 that we've designed to help drive these JITs covers all the possible edge cases and basically
32:11 is good enough that if you implement these things in a reasonable fashion, you will get Python
32:15 compatibility.
32:16 And then that way we can just plug in and make sure that all this stuff just works both in
32:22 two completely different JITs targeted at different types of languages and have it just all fall
32:27 through.
32:27 And honestly, it's a nice way to do performance comparisons for what kind of JIT would probably
32:32 work best for Python.
32:32 Awesome.
32:33 That sounds like a really good idea.
32:34 I've done a fair amount of work with C Sharp and the CLR.
32:38 And I know what the Core CLR is, but I suspect most listeners, when they hear .NET, they think,
32:44 oh, it's a Windows thing.
32:46 But you guys actually are doing quite a bit of different stuff now that Satya's in charge.
32:53 There's kind of a new mandate, right?
32:54 So tell people about the Core CLR.
32:56 I believe it was last year.
32:58 It was before I joined Microsoft this past July.
33:01 Basically, all of .NET was open sourced.
33:05 So previously, it was all this closed source thing that was very Windows only, except for
33:10 Mono, which kind of initially reverse engineered a bunch of things.
33:13 And then Microsoft said, oh, you know, well, we can at least open source, like I believe,
33:17 like the test suite and some other things for you to test your compatibility.
33:20 But Satya Nadella, as CEO of Microsoft, has really pushed for open source at Microsoft,
33:27 both its use, but also contributing and doing things in the open, both as in starting projects
33:32 from scratch that Microsoft has then open sourced, and also giving back to pre-existing
33:37 open source projects.
33:38 And one of the things they did was they completely open sourced .NET.
33:41 So .NET actually, I don't know if they've done the official release yet, but if you look at
33:46 least their continuous integration tests, they're passing on Linux and OS X on top of Windows.
33:52 For instance, Pyjion right now is Windows-only, purely because of momentum and laziness on
33:58 Dino's and my part.
33:59 And it has nothing to do with using Core CLR because Core CLR uses like CMake for its builds.
34:05 So it's already got a cross-platform build scripts set up and all that.
34:10 It's just basically that Dino and I, for Pyjion,
34:12 haven't bothered to port the Visual Studio solution file to CMake to be able to run it on
34:18 Linux or OS X.
34:18 I think that's going to breathe a lot of new interest into sort of the whole CLR and
34:24 the C Sharp side of things from people that are just saying, look, Windows is not an option
34:30 for whatever reason for us.
34:31 Yeah.
34:32 And I really hope it does, too, because I did Java development at Google.
34:37 And honestly, I like C Sharp a lot more.
34:40 Microsoft has done a really good job of shepherding that language forward and continuously evolving
34:46 it.
34:46 Well, I don't think Oracle has done such a great job with Java.
34:50 And C Sharp has just done a better job of going forward continuously.
34:52 I mean, C Sharp has local type inference.
34:56 This is not new technology in this day and age.
34:59 And C Sharp has it.
35:00 And yet Java still doesn't have it.
35:01 It always drove me nuts.
35:02 I mean, bloody C++ has local type inference, and C++ has been using auto.
35:08 And yet Java still doesn't have that kind of stuff.
35:11 And it's always kind of boggled my mind that unless you use generics in Java, you can't
35:16 just leave out a type.
35:18 So I really do hope the open sourcing of Core CLR and it being available on Linux and OS X on top of Windows is really going to get more people to really take serious look at C Sharp and F Sharp.
35:28 Absolutely.
35:29 And it definitely makes your project broadly applicable to the whole Python community.
35:36 Right.
35:36 Because if, for some reason, it ended up kind of like IronPython, this really cool implementation of Python on .NET, but just tied to Windows, right?
35:44 That would really stifle it.
35:45 But the fact that it's starting out with a base that could be on any of the major platforms is cool.
35:51 This episode is brought to you by SnapCI, the only hosted cloud-based continuous integration and delivery solution that offers multi-stage pipelines as a built-in feature.
36:17 SnapCI is built to follow best practices like automated builds, testing before integration, and provides high visibility into who's doing what.
36:25 Just connect Snap to your GitHub repo and it automatically builds the first pipeline for you.
36:30 It's simple enough for those who are new to continuous integration, yet powerful enough to run dozens of parallel pipelines.
36:36 More reliable and frequent releases.
36:39 That's Snap.
36:40 For a free, no obligation, 30-day trial, just go to snap.ci slash talkpython.
36:46 Technically, I'm on the data science tools team in data and analytics in cloud and enterprise at Microsoft.
37:01 And Azure supports Linux, right, on top of Windows.
37:05 So it'd be really silly of us to develop something that only part of our client base could use, right?
37:11 We want to get Pyjion such that you can use this on your Azure apps or, as I said, in data analytics, and that includes Azure Machine Learning.
37:18 So we have this thing called Azure ML Studio where it's this whole drag-and-drop machine learning system in the browser.
37:26 It's really cool.
37:27 And you can actually use Python code to, like, transform data and actually run analysis on it and do all this cool stuff.
37:33 And because it's machine learning, it doesn't happen necessarily in one second.
37:36 It can take 30 seconds or five minutes or half an hour or whatever.
37:39 These workloads take enough time that a JIT would be really, really beneficial.
37:43 So it makes total sense both from Azure ML but also just Azure in general to support multiple languages.
37:49 So it just would honestly be stupid of us not to try to support more than just Windows because we'd be leaving out part of our client base.
37:56 And that's just not how you win users.
37:59 It's definitely not.
37:59 So I have a couple of questions about, like, the future, what the future might hold in a Pyjion type of world.
38:08 I've been thinking about kind of what you guys were talking about in what does it take to dramatically move people into Python 3.
38:17 Performance is good.
38:19 20% increase in performance is really good, like we were talking before, and those types of things.
38:23 But what would really sort of hit people in the face and go, yeah, this is different?
38:28 And I think better threading is possibly number one, like removing the global interpreter lock in some way.
38:36 Does this at all touch this concept?
38:40 No, because with Pyjion and the JIT API we're trying to design,
38:47 one of the key things is we're trying to be compatible with extension modules, C extension modules.
38:52 Because that's always been a big limitation of PyPy, right?
38:55 Like if you write C code and interface using CFFI, that will get you a C extension module for Python that works in both PyPy and in CPython itself.
39:06 But unfortunately, that requires getting people to use CFFI, which is a great project, by the way.
39:09 And I do encourage people to consider that next time they need to wrap some C code.
39:13 But there is also a lot of pre-existing C extension code.
39:16 I mean, this is why PyPy, for instance, before they created CFFI, started to rewrite NumPy from scratch in RPython.
39:23 That's their NumPyPy project.
39:24 You guys definitely don't want to get down that path.
39:26 Yeah, exactly, right?
39:27 We're trying to avoid that completely.
39:28 The problem is extension modules are designed around the concept of the GIL, right?
39:34 The way garbage collection works in Python is reference counting.
39:38 And all the C code works with the assumption that that's how it works and stuff won't magically disappear.
39:44 If you don't decref a Python object at the C level, it will stick around.
39:50 So if you get it and then increment that reference count and just leave it incremented until you're finally done with it at the very end, that will guarantee that the object isn't garbage collected.
39:58 And there's just a ton of assumptions in the C code.
40:00 This is not just Python itself, but any third-party C code.
40:05 And so getting rid of the GIL without breaking, basically, the world of C extension modules would be very difficult.
40:13 So I get where it all comes from, people's desire to get rid of the GIL.
40:17 I do think some people get a little huffy about it when they really don't need to.
40:22 I mean, if you do any I.O., it really doesn't matter.
40:25 It's only when you're CPU-bound does this ever even come up.
40:28 Yeah, absolutely.
40:28 But I do get why people do want faster.
40:31 And if you're doing like – I know this comes a lot from the scientific Python community.
40:35 If you're doing a lot of CPU-bound stuff, you really want to not have to have the GIL, right?
40:39 And we get it.
40:41 It's just one of these rock-in-a-hard place where the rock is CPU performance, but then the hard place is all the backwards compatibility with all the pre-existing C extension code.
40:51 Right.
40:51 Like, hey, we have this really fast thing.
40:53 Oh, but you can't use all the stuff that you want to use.
40:55 So you can start from scratch there, right?
40:56 Yeah, exactly.
40:57 Like, it's like going to the scientific Python community and going, okay, you can't use NumPy, and possibly scikit-learn, although that's written in Cython, so they at least have a chance.
41:07 But like, NumPy would not work.
41:09 You okay with that?
41:11 I don't really see that going down very well.
41:13 Yeah, we tried that with Python 2 and Python 3, kind of.
41:16 Yeah, exactly.
41:17 It hasn't gone down so well.
41:19 Exactly.
41:19 Or look at NumPy, right?
41:20 Where they just – Yeah.
41:21 NumPyPy isn't fully compatible.
41:23 So it's like, hey, scientific community, you want to run on PyPy?
41:26 It's like, do you have NumPy?
41:27 No.
41:28 I'm not so excited anymore.
41:29 Maybe sometimes for some things.
41:31 So it's a really tough position to be in, where people ask for this without realizing the ramifications for the community, right?
41:38 And as you pointed out, Michael, we've done this once with Python 2 and 3, right, where we said, okay, for the benefit of the community, we are going to break backwards compatibility.
41:48 And there's totally a way to write code that works in Python 2 and 3.
41:53 It takes some effort.
41:54 It's not like going from Python 2.6 to 2.7.
41:57 There's actually some effort that has to be put in.
41:58 And we've paid a price for it.
42:00 Now, I don't regret the decision, but it does bring up to the point that does the community really want to put up with this again at the C level?
42:07 And I don't know if they do, even if it does get them a GIL-free life.
42:11 Now, I'm sure some people are going to say, God, yes, I will totally rewrite all my C extension code to completely ignore whatever it has to and change however it has to to get around the GIL.
42:21 But the question is, what solutions we have that would help migrate existing code and would it be reasonable?
42:27 And I simply just don't have an answer to that.
42:29 Okay.
42:30 Well, that's, I think, a really interesting sort of both sides of the debate to think about for the listeners to think about when they talk about that topic.
42:40 So, with Pyjion, is it too soon to ask about performance or anything like this and how that's looking?
42:45 Or?
42:46 You can always ask the question, Michael.
42:48 I just can't always give a good answer.
42:51 Any news there?
42:53 Or you're just not fully baked yet?
42:56 The current update on that is, I'll give two updates.
42:59 I'll give one on compatibility and one on performance.
43:01 I'll start with the performance.
43:02 It's not bad, but it's not better.
43:05 But this is out of date information, although I don't see it having changed much.
43:09 So, back in November, I was lucky enough to be invited to give the opening keynote of PyCon Canada.
43:14 And the video is up on YouTube.
43:16 And so, you can find that if you want.
43:18 But basically, I did a survey of all of the – an unscientific survey of Python interpreters.
43:24 And I basically listed the history of all the different implementations of Python over the decades because Python is 25 years old.
43:32 I benchmarked everything because it had been a little while since someone had benchmarked all the interpreters.
43:38 And I included Pyjion in it because I was curious, because we hadn't really done any benchmarking.
43:42 In general, some things were faster.
43:45 Some things were slower.
43:46 The median across the entire Python benchmark suite was slightly slower than Python 2.7.
43:54 But if you looked at the geometric mean, it was actually faster.
43:58 But it was all pretty close; not a huge jump between the two.
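To see how those two summaries can disagree, here's a small illustration with made-up per-benchmark speed ratios (the numbers are purely hypothetical; a ratio below 1.0 means faster than the baseline):

```python
# Median and geometric mean weigh a set of speed ratios differently: a few
# big wins can pull the geometric mean below 1.0 even when most benchmarks
# are marginally slower. These ratios are invented for illustration.
import math
import statistics

ratios = [1.05, 1.02, 1.01, 0.60, 0.70]   # hypothetical pyjion/CPython times

median = statistics.median(ratios)
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

assert median > 1.0     # "slightly slower" by the median...
assert geomean < 1.0    # ...yet "faster" by the geometric mean
```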
44:01 And I think we were still faster than, for instance, Jython or IronPython.
44:05 The performance isn't bad.
44:09 It's kind of maybe on par or a little slower.
44:11 But this is with, like, zero optimizations.
44:14 Yeah.
44:14 So, my follow-up question was, if you're already sort of tied and you've not done a ton of optimization, that's actually a really good place to be.
44:23 Yeah, exactly.
44:24 That was one of the key metrics we wanted to hit initially was, okay, can we get to compatibility and not have performance suck, more or less?
44:34 And use that as kind of a showing that, okay, this is not a waste of my time and Dino's time to pursue.
44:40 That it's actually going to be worth all of this effort.
44:43 And that there's actually a chance that this is going to pay off and actually be useful.
44:47 And I will say that as of yesterday, we are more or less compatible with CPython, minus supporting tracing, profiling, and anyone who touches sys._getframe.
45:04 So, we basically are – we're actually fairly compatible now.
45:08 Yeah, that's really good, actually.
45:09 Yeah.
45:09 There were some hairy bugs in there.
45:11 But Dino deserves a lot of the credit in figuring out how to fix most of the bugs.
45:15 We actually have a text file in our tests directory that lists the nearly 400 tests that Python 3.5 has.
45:24 You can look at it to see what's left to do.
45:27 But most of them are actually profiling-related or tracing-related.
45:30 There's one or two that are dealing with actually an odd semantic compatibility that we think probably needs to get changed actually upstream in CPython itself.
45:39 But otherwise, everything seems to be tracing-based, profiling-based, or using sys._getframe.
45:44 Basically, stuff that would slow everything down anyway if you used it.
45:48 So, if you're using a JIT, you're probably not going to want to touch that stuff anyway.
45:51 So, yeah, we're actually pretty happy.
45:53 And we think we're more or less compatible at this point, or at least enough to be willing to go to PyCon and say, we're basically compatible.
45:59 You should give us a shot.
46:00 If performance ends up being good by then.
46:03 Okay.
46:03 That's a really good start, I think.
46:05 Yeah, we're really happy that we've managed to hit this compatibility spot now because we proposed a talk at PyCon.
46:11 Obviously, we don't know if it's been accepted yet.
46:13 But our hope is to, now that we've hit compatibility, to try to spend the next two, three months trying to ramp up performance somewhat and seeing how far we can get.
46:22 And whether we can more consistently either match or actually start beating Python 3.5 somehow.
46:27 Right.
46:28 Interesting.
46:29 Do you feel that CPython itself is getting better because of the pressure that you're putting on it from this slightly different use case?
46:37 I don't think it's really coming from us.
46:40 I think it's coming from all the core devs who are honestly a little tired of people dragging their feet, switching to Python 3.
46:46 We realize that we can give so many carrots in terms of features and stuff, but you kind of have to be inspired to come up with a new feature.
46:57 And actually, there's a really cool one I can talk about if you want coming in Python 3.6 that I think a lot of people are going to love.
47:02 Yeah, tell us.
47:03 Eric Smith has implemented something we're calling format strings or f strings.
47:09 So if you take a string constant and prefix it with f, you can use the formatting that you use with str.format, except you don't have to make the format call.
47:20 And you can specify the name of a variable and it will directly do string substitution.
47:25 So if you did spam equals 42 and had a string constant starting with f and then said, my cost is, and then curly brace, spam, close curly brace, and that was it.
47:37 No format call or no percent, no whatever.
47:39 And you execute that in Python 3.6, it'll actually turn that string into my cost is 42.
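That example, runnable on Python 3.6 and later:

```python
# The f-string example from the conversation: no .format() call, no %
# operator, the variable is substituted directly into the string literal.
spam = 42
message = f"my cost is {spam}"
assert message == "my cost is 42"

# f-strings accept the same format specs as str.format:
price = 3.14159
assert f"{price:.2f}" == "3.14"
```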
47:45 Okay, that's pretty awesome.
47:46 That sounds a little bit like the Swift string interpolation or what C Sharp 6 adopted after that as well.
47:54 Yep.
47:55 Same basic idea?
47:56 Yeah, exactly.
47:56 What's even cooler about it is beyond the fact that it keeps almost full compatibility, there's an edge case that I don't remember off the top of my head of the substitution.
48:06 But basically, it works exactly the same way as str.format.
48:08 But what's really cool is Eric implemented a new bytecode for it.
48:13 So it's actually faster than str.format and faster than using modulo, the percent sign for string interpolation.
48:22 So it's actually going to be the fastest way to do string interpolation in Python.
48:25 All right, so I asked about the GIL, you know, and that was really an interesting answer.
48:31 Thanks.
48:31 Are there other advantages that this type of JIT API might bring that I'm not thinking of or that are not entirely obvious?
48:39 Beyond just raw performance increases with compatibility with C extension modules, not specifically.
48:47 Basically, as I was not so eloquently saying earlier, we realize that to get people to Python 3,
48:53 we can add new features, which is one form of carrot, but it requires inspiration to come up with those new features.
48:58 The other way to do it, which doesn't require any inspiration because everyone always wants more, is to improve performance.
49:03 And hence, this is why Dino decided to give this a try at PyCon last year, and it's looking like it's going to pay off.
49:09 But this is also why, for instance, Victor Stinner is putting all this effort into it for Red Hat, and why Yury, through his own consulting company, is putting in all this time.
49:17 It's basically just people really like fast.
49:20 And if we can give them fast, we hope that it gives people more ammunition to go to their managers and say, look, Python 3 is faster than Python 2.
49:30 We should put in the effort.
49:31 We're going to get a performance win that's worth it.
49:33 Because I did a blog post on this where I compared the five stages of grief to the five stages of the Python 3 transition for the community.
49:41 Everyone seems to at least be at the depression stage, which is stage four.
49:45 And then some people have been lucky enough to get away to stage five and have moved on to Python 3 and are realizing how much nicer it is and all that.
49:52 But those who are stuck in the depression stage are usually people who work at corporations where they've just been told, eh, we don't see enough of a win.
49:59 And we don't want to put the time, effort, and resources into getting our code moved to Python 3.
50:03 Right. It might be better.
50:04 But the manager who makes that decision doesn't want to possibly bear the burden of saying, yeah, we decided to switch.
50:11 But now we can't release our app for six months because we're actually not as quick at converting or something.
50:16 Right. It's just it's easier to just do nothing.
50:18 Right.
50:19 Yeah, exactly.
50:20 And for me, it's a little frustrating because I put a lot of personal time and effort in back in the summer, fall of 2014 to make porting a lot easier.
50:28 And it can be done file by file.
50:30 Right. I think one of the big problems is people feel like they have to port their entire code base at once and they really don't have to.
50:35 Like there's still people out there who think 2to3 is the cutting edge of porting Python 2 code to Python 3.
50:40 And it's not at all. I don't even recommend it.
50:43 If you go to docs.python.org, there's a how to section and there's a doc in there that I wrote that explains the current best practices for porting your code from two to three.
50:51 But basically, you can write your Python code so that it's compatible in both versions, and you can do it file by file.
50:57 You don't have to do this huge, massive, let's change everything.
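A tiny sketch of what that single-source style looks like: with the `__future__` imports at the top, each file gets the Python 3 semantics under Python 2 (under Python 3 they're simply no-ops). `average` is just a made-up example function.

```python
# Minimal single-source Python 2/3 module: the __future__ imports make
# print a function and division true division under Python 2 as well.
from __future__ import absolute_import, division, print_function

def average(values):
    # true division in both 2 and 3, thanks to the division import
    return sum(values) / len(values)

# print is a function in both versions now
print(average([1, 2, 3, 4]))
assert average([1, 2]) == 1.5
```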
51:01 But the deal is, that's great for the engineers who understand, like, oh hey, having tracebacks put in exceptions and having chained tracebacks in your code, so that when you trigger an exception you can actually see that this exception was caused by this exception, which was caused by this exception, is really useful.
51:17 But you might not be able to sell that to your manager.
51:19 But if you can tell your manager, hey, you know what?
51:22 If we switch, Python 3.6 is looking to be 10, 20 percent, whatever, faster than Python 2.7; that's a real performance win that allows us to handle X number more requests per second with no new hardware.
51:35 If we just put the time in to move our code over, wouldn't that be fantastic?
51:38 That's hopefully an easier sell for some of these companies.
51:41 I feel a little bit like boiling the frog, you know, the analogy of you guys keep adding awesome new stuff every time and it's just getting cooler and cooler.
51:50 But there's not that jolt that goes, oh, yeah, we have to go now.
51:54 Right.
51:55 It's just been so, so sort of smooth.
51:57 Yeah, exactly.
51:59 That's the downside of adding new features slowly over time instead of, like, hiding them in our back pocket and then suddenly springing them on the community like, hey, look, all this new stuff.
52:08 That darn open source.
52:09 People keep figuring out what you're up to.
52:11 That's when the stick of 2020 comes in, right?
52:13 Of like, all right, legacy Python 2 being supported by the core developers for free is going to go away.
52:19 So either port your code if you want free support or go pay someone like Red Hat or Canonical to support your install of Python 2 because it ain't going to be free anymore.
52:29 Yeah.
52:29 So do you want to pay, basically, do you want to pay Red Hat in 2020 to support your Python 2 code or do you want to pay your own engineers now to move to Python 3?
52:37 And then it becomes a cost analysis.
52:40 Yeah, and get all the benefits now.
52:42 2020 sounds so far away, but it is actually 2016.
52:46 I mean, that's only four years.
52:48 That's not really that far for a large code base.
52:50 Well, and the other thing I'm afraid people aren't thinking of is like, oh, 2020 is not that far.
52:55 I'll start in 2020.
52:56 It's like, no, no, you need to finish your transition by 2020.
52:59 It's not start in 2020.
53:01 It's done by 2020.
53:03 Exactly.
53:03 So it'd be better to start sooner rather than later.
53:07 And I mean, there's still stuff being done to make porting easier.
53:10 For instance, the type hints that Guido's added in Python 3 and is backported to Python 2 using mypy at Dropbox.
53:19 He's hoping to make it so that if you add this typing information, they'll be able to develop a tool to help warn you statically offline that, hey, this code, while it's fine in 2, is kind of questionable and iffy in 3.
53:31 And you might want to tweak it so that there's no question of compatibility so that when you run this code under 3 in the future, it'll be okay and you won't have any issues.
53:38 So there's even still tool work being done to make it easier.
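A small illustration of that gradual-typing idea: annotations make the bytes/text boundary explicit, which is exactly what a checker like mypy can verify statically (Python 2 code would use type comments instead of this Python 3 annotation syntax; `decode_header` is a hypothetical example function).

```python
# The annotations state which values are bytes and which are text, the
# distinction that typically trips up a Python 2 to 3 port.
def decode_header(raw: bytes) -> str:
    # bytes in, text out; mypy can flag callers that pass the wrong kind
    return raw.decode("utf-8")

assert decode_header(b"hello") == "hello"
```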
53:40 Yeah, that's really interesting.
53:42 Yeah, the real problem is that people are still writing new code that's 2-only, instead of going, like, all new code should at least be 2 and 3 compatible.
53:49 Because if you do that, then at least your problem is, like, bounded, and it's not getting any worse.
53:54 But if you're still writing all your code in 2,
53:56 you're just making your problem worse and worse as time goes on.
53:59 So this is why I always, whenever I give talks on Python 3, I always go, okay, I want you to go home and I want you to do two things.
54:06 Don't write any more new Python 2 code.
54:08 Only write Python 2 code that will work in 2 and 3, and then slowly start porting your code over to 3.
54:13 File by file.
54:14 I'm not expecting everyone to do their whole code base, but at least start.
54:17 And at least get the practices in place, like adding the __future__ statements or running pylint with the --py3k flag to get some of the warnings.
54:25 If you run Python 2, make sure you run it with the -3 flag so you get Py3k warnings out of the interpreter.
54:30 I mean, there's a bunch of stuff you can do that you can just integrate as part of your continuous integration or day-to-day practices that will just make your life easier when you finally do get to flip the switch.
54:39 It doesn't have to be all, shut down the app for six months while you do the port.
54:43 It's like, no, just spend a little time any time you tweak some code to make it more compatible, and you'll slowly work your way forward.
54:49 Because otherwise, you just make it that much more of a burden in the future.
54:53 That's really good advice.
54:54 I think, you know, if you find yourself in a hole, the first step to get out of it is stop digging, right?
54:58 Exactly.
54:59 And I suspect it's a little easier sell if you don't necessarily tell management that we're going to support Python 2 and 3 with new code and slowly fix things as we fix bugs.
55:09 You just institute practices of, oh, okay, we are going to now run Python 2 with the dash 3 flag in our continuous integration tests.
55:17 We're going to use PyLint to actually check for errors.
55:21 And we're just going to start fixing up where we're really kind of ambiguous, whether we're working with binary data or textual data, so that we know exactly what needs to support Unicode and what needs to support bytes.
55:32 And you can just do it slowly over time, and just make it slowly easier, until you go, okay, management, look, we already support it.
55:38 Or can we just have like a week or a month or however much time you think you need to get over that last hump and get done?
55:43 Right.
55:44 Well, you know, maybe as we get closer to 2020, the 20% time concept maybe could be applied as well.
55:51 Like, look, we're not going to stop and just do the switch, but, you know, just dedicate some of our time.
55:56 Like Friday afternoons is like whatever, you know, as a team.
56:00 And eventually you'll get there, right?
56:01 Yeah, no, exactly.
56:02 Cool.
56:02 So we're kind of getting closer to the end of the show.
56:05 Let me ask you just a few more questions.
56:07 There's an interesting story of how the name Pyjion came to be.
56:11 Can you tell me like what that's sort of derived from?
56:13 As I said earlier, Dino started this project just on kind of basically a whim after being inspired.
56:18 Like, oh, what can I do to make Python 3 faster to get people to switch?
56:21 And he wanted a name that somehow involved Python and JIT.
56:25 He came up with Pyjion, which is spelled P-Y-J-I-O-N.
56:30 It throws everyone for a loop until Dino or I tell them it's pronounced "pidgin."
56:35 At least that's how we expect people to pronounce it.
56:37 But then again, I'm kind of used to it.
56:39 For instance, people always mispronounce PyPI.
56:43 The Python package index, the abbreviation is P-Y-P-I.
56:46 And it's pronounced "py-P-I."
56:48 But I've heard so many people call it PyPy, "hippie," "pie-pie," "P-P."
56:53 I've heard every different way of saying it.
56:56 And I'm just going to tell your audience, it's "py-P-I."
57:00 Or call it the cheese shop, which was its original name until some people were too worried that pointy-haired managers would not take Python seriously back in like 2005.
57:10 And we renamed it the Python package index.
57:12 No, that's awesome.
57:13 And by the way, cheeseshop.python.org does work.
57:15 And it will redirect you to pypi.python.org.
57:17 Yeah, yeah.
57:18 Very cool.
57:18 Very cool.
57:19 Two other questions.
57:20 If you're going to write some Python code, what editor do you open?
57:23 I'm actually currently opening Visual Studio Code or VS Code.
57:26 I have very little allegiance to code editors.
57:30 I totally jump around constantly.
57:32 I learned Vim way back in my undergrad days.
57:36 And I used that for a long time.
57:38 But I've tried Eclipse.
57:41 It never really went anywhere.
57:43 But I did try it.
57:44 I was a TextMate user for quite a while until updates kind of dried up.
57:48 And then I ended up switching to Sublime, especially when Sublime 3 beta came out using Python 3.
57:55 And it's like, all right, I can throw a couple bucks this way to support someone going out on a limb and going with Python 3.
57:59 But then their updates kind of slowed down a lot.
58:03 And then I used Atom for a while from GitHub.
58:07 I was using that.
58:08 And I actually still do whenever I do Dart development.
58:10 I joined Microsoft.
58:11 And Microsoft released VS Code.
58:13 And we actually announced.
58:15 Which is not the same thing as Visual Studio, right?
58:18 No, not at all.
58:19 So Visual Studio is an integrated development environment, right?
58:22 It's a full-fledged IDE.
58:24 It does everything.
58:26 And if you like IDEs, it's actually really great.
58:29 It is Windows only, though.
58:31 And it is an IDE.
58:32 And I'm personally a code editor kind of guy.
58:34 Like, I like separate tools.
58:35 Like, I will have a command.
58:37 Like, I will have Git Bash open to do my own Git work.
58:41 I don't need an IDE to give me a fancy tree view of all my branches, for instance.
58:45 I like having a separate code editor.
58:47 And VS Code is more like Atom than it is like Visual Studio.
58:52 But it is from the same team as Visual Studio.
58:54 So it's from a team that's been doing code editing and IDE development for basically decades.
59:00 So there's a lot of wealth of knowledge there for the design of it.
59:03 And we've actually announced that my team, which is in charge of Python tools for Visual Studio, which is actually a really cool plugin, which lets you do crazy stuff like debug across Python and C code and other stuff.
59:13 We're actually in charge of adding Python support to VS Code.
59:17 Oh, that's cool to hear.
59:18 Yeah.
59:18 We don't have a timeline or anything like that.
59:20 But my manager announced it on Hacker News.
59:22 So I can talk about publicly that we've been put in charge of doing that once we get around to it.
59:27 And we're actually hiring for that kind of stuff.
59:29 pythonjobs@microsoft.com.
59:30 Very cool.
59:32 If you want to work in Ruby.
59:32 Yeah, there's a lot of stuff going on with Python around there.
59:35 More than people might think these days.
59:36 Yeah, exactly.
59:37 So I'm actually using VS Code because I want to make sure I fully understand it for when we do development with it and know where we need to add stuff in and be familiar with it.
59:45 So that I can either contribute to the project or at least be an internal tester of all the stuff we have.
59:50 Right.
59:51 An advisor.
59:51 Very cool.
59:52 So the other question is on PyPI, there are many thousands of packages.
59:56 Everybody has their own sort of favorite that a lot of people don't have experience with.
01:00:01 What's yours?
01:00:02 I'm going to cheat.
01:00:03 And I'm going to say PyPI.
01:00:05 I think it kind of goes a little unnoticed that you hear people complain about the state of packaging in Python and all that on occasion.
01:00:15 But I don't know if people truly realize how organically grown it is, which is partially why it's taken so long to get stuff straightened out in it.
01:00:22 But also how difficult of a problem it is and how useful it is to have PyPI.
01:00:28 I remember back starting with Python when people would ask, what's Python or is that the language with the white space?
01:00:35 And so this is back in the day when CPAN was a big deal.
01:00:38 And I had to longingly look at Perl and like, oh, my God, they have the central repository for all their projects.
01:00:43 This is amazing.
01:00:44 And then Richard Jones and Martin von Löwis at PyCon 2005, I think, maybe earlier, did PyPI.
01:00:51 And suddenly we had this central place where people could upload their own packages and it wasn't manually maintained.
01:00:56 And so we had this index.
01:00:59 And I think it was a real boon for the community because suddenly there was a single place to find your code and fetch your code and just keep track of stuff.
01:01:08 And so I would say PyPI, and specifically the successor to PyPI that's being developed right now, called Warehouse, being led by Donald Stufft of pip fame.
01:01:16 That would probably be my project of choice because that's going to be a big deal.
01:01:21 And he actually is working with someone to actually do user experience design on it.
01:01:25 And it looks really sharp.
01:01:26 And actually, if you want to help contribute, they're taking contributions.
01:01:29 So I believe if you search for PyPI warehouse, or maybe if you go to GitHub.com slash PyPA, which stands for the Python Packaging Authority, there should be a warehouse repo.
01:01:39 And you should be able to take a look at what the next version of PyPI is going to look like.
01:01:43 Oh, that's a very meta, but a very good answer.
01:01:45 Thanks for that.
01:01:46 All right, Brett.
01:01:48 It's been super interesting.
01:01:49 I've really learned a lot talking about all the internals.
01:01:52 And I wish you guys a lot of luck with this project.
01:01:54 It seems really promising.
01:01:55 Yeah.
01:01:56 Well, thanks a lot, Michael.
01:01:57 I really hope it works out.
01:01:58 And thanks for having me.
01:01:59 I'm actually a listener.
01:02:00 So I feel really honored to be on the podcast.
01:02:03 You're absolutely welcome.
01:02:04 It's been great.
01:02:05 Thanks so much.
01:02:05 Talk to you later.
01:02:06 This has been another episode of Talk Python to Me.
01:02:10 Today's guest was Brett Cannon, and this episode has been sponsored by Hired and SnapCI.
01:02:14 Thank you guys for supporting the show.
01:02:16 Hired wants to help you find your next big thing.
01:02:18 Visit Hired.com slash Talk Python to me to get five or more offers with salary and equity
01:02:23 right up front and a special listener signing bonus of $2,000.
01:02:27 SnapCI is modern continuous integration and delivery.
01:02:30 Build, test, and deploy your code right from GitHub, all in your browser with debugging,
01:02:35 Docker, and parallelism included.
01:02:36 Try them for free at Snap.CI slash Talk Python.
01:02:39 And do check out my video course I'm building.
01:02:42 The Kickstarter is open until March 18th, and you'll find all the details at talkpython.fm
01:02:48 slash course.
01:02:48 You can find the links from this show at talkpython.fm/episodes/show/49.
01:02:54 And be sure to subscribe to the show.
01:02:57 Open your favorite podcatcher and search for Python.
01:02:59 We should be right at the top.
01:03:00 You can also find the iTunes and direct RSS feeds in the footer of the website.
01:03:04 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.
01:03:09 And you can hear an entire song on talkpython.fm and find links to all of his music on SoundCloud.
01:03:14 This is your host, Michael Kennedy.
01:03:17 As always, thank you so much for listening.
01:03:19 I really appreciate it and hope you enjoyed it.
01:03:21 Smix, take us out of here.
01:03:23 Outro Music.