Time to JIT your Python with Pyjion?

Episode #340, published Wed, Nov 10, 2021, recorded Wed, Nov 3, 2021

Is Python slow? We touched on that question with Guido and Mark last episode. This time we welcome back friend of the show, Anthony Shaw. He's here to share the massive amount of work he's been doing to answer that question and speed things up where the answer is yes. He's just released version 1.0 of the Pyjion project.

Pyjion is a drop-in JIT compiler for Python 3.10. Pyjion uses the power of the .NET 6 cross-platform JIT compiler to optimize Python code on the fly, with NO changes to your source code required. It runs on Linux, macOS, and Windows, x64 and ARM64.

Episode Deep Dive

Guest Introduction and Background

Anthony Shaw is a seasoned Python developer, prolific contributor to open-source projects, and a Python Software Foundation Fellow. He has worked extensively on Python performance and internals, including the new JIT compiler project called Pyjion. Anthony is also well-known for his contributions to CPython and for creating numerous libraries and tools that blend Python with other ecosystems such as .NET and C++. He’s passionate about bridging technology gaps and sharing his deep curiosity with the broader Python community.

What to Know If You're New to Python

If this is one of your first deep-dives into Python, here are a few points to help you follow along:

Python has multiple “implementations” beyond just the official CPython (e.g., PyPy, Jython, IronPython). Pyjion sits atop standard CPython but gives it JIT superpowers.
You’ll hear references to “bytecode,” “C API,” and “machine code.” Just know these are lower-level details that make Python run.
If you see or hear the term “PEP,” it stands for Python Enhancement Proposal. These are design documents that guide and shape Python’s development.

Key Points and Takeaways

Why JIT (Just-In-Time) Compilation in Python Matters Pyjion aims to address Python’s sometimes-slower runtime performance by compiling Python functions into native machine code on the fly. By skipping bytecode interpretation in tight loops or numeric-heavy tasks, this can greatly accelerate certain workloads.
- Links and Tools:
  - Pyjion (on PyPI)
  - Try it live
How Pyjion Works Under the Hood Pyjion uses .NET 6’s cross-platform JIT compiler as a backend, translating Python bytecode into .NET’s intermediate language, which is then turned into machine code. Thanks to PEP 523, Pyjion can “take over” code evaluation from CPython without needing a custom fork.
- Links and Tools:
  - .NET 6 documentation
  - PEP 523
Profile-Guided Optimizations for Speed Pyjion supports a “profile-guided” approach, where the first pass through a function profiles data types (e.g., ints, floats) and then recompiles it for faster execution. This means code that is run frequently (“hot code”) can get significant speedups over time.
- Links and Tools:
  - Pyjion GitHub repository
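
The profile-then-specialize idea can be sketched in plain Python. This is a toy model only, not Pyjion's implementation: the names `profile_guided`, `add_generic`, and `add_floats` are hypothetical, and the "specialized" function is a stand-in for the machine code a real JIT would emit.

```python
def profile_guided(func_generic, specializations):
    """Toy model: observe argument types on the first call, then pick a fast path.

    `specializations` maps a tuple of argument types to a specialized callable.
    """
    state = {"impl": None, "seen": None}

    def wrapper(*args):
        if state["impl"] is None:
            # First call: run the generic version and record the types seen.
            state["seen"] = tuple(type(a) for a in args)
            state["impl"] = specializations.get(state["seen"], func_generic)
            return func_generic(*args)
        if tuple(type(a) for a in args) != state["seen"]:
            # Guard failed: the types changed, so fall back to generic code.
            return func_generic(*args)
        return state["impl"](*args)

    wrapper.state = state
    return wrapper


def add_generic(a, b):
    return a + b


def add_floats(a, b):
    # Stand-in for specialized code that would work on raw doubles.
    return float.__add__(a, b)


add = profile_guided(add_generic, {(float, float): add_floats})
first = add(1.5, 2.0)    # profiled with the generic path
second = add(0.5, 0.25)  # served by the float specialization
mixed = add("a", "b")    # guard fails, generic path handles it
print(first, second, mixed)
```

The type check before the specialized call plays the role of a guard: if later calls stop matching the profiled types, the generic path still gives the right answer.
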
Impressive Gains in Numeric and Math-Heavy Code Many Python users who do numeric computations directly in Python (as opposed to calling out to libraries like NumPy) can see big boosts. Anthony mentioned micro-benchmarks showing up to a 20x speedup in certain pure-Python math loops, and a big lift on the “n-body” planetary simulation problem.
- Links and Tools:
  - NumPy
  - n-body problem explanation
Fallback to CPython for Unsupported Features Not every code path is fully optimized by Pyjion, especially certain async/await features or specialized C extension modules. Instead of crashing, Pyjion gracefully hands unsupported code back to CPython’s normal interpreter loop. This makes it less risky to try on large existing codebases.
- Tools Mentioned:
  - Flask
  - Django
Comparisons to Other Speedups: PyPy, Numba, and Cython While PyPy reimplements the entire Python interpreter and Numba focuses on accelerating NumPy-based workflows, Pyjion is a drop-in “extension” to stock CPython. It aims for broad compatibility, minimal code modifications, and integration with standard C APIs that tools like Cython also rely upon.
- Tools:
  - PyPy
  - Numba
  - Cython
Working with Web Frameworks and WSGI Middleware Pyjion offers a WSGI middleware so you can enable JIT compilation in frameworks such as Flask or Django with minimal setup. While dynamic frameworks may not see the same dramatic speedups as math-bound code, the overhead of repeated function calls can still see improvements in certain endpoints.
- Tools:
  - WSGI PEP 3333
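
As a rough illustration of what a JIT-enabling WSGI middleware does, here is a generic sketch. It is not Pyjion's own middleware class: `jit_middleware` and `hello_app` are hypothetical names, and the `enable` and `disable` hooks are passed in as plain callables so the example runs without Pyjion installed.

```python
from typing import Callable, Iterable


def jit_middleware(app, enable: Callable[[], object], disable: Callable[[], object]):
    """Wrap a WSGI app so `enable` runs before each request and `disable` after."""

    def wrapped(environ, start_response) -> Iterable[bytes]:
        enable()
        try:
            # Materialize the body so the app's code runs while the hook is active.
            return list(app(environ, start_response))
        finally:
            disable()

    return wrapped


def hello_app(environ, start_response):
    """A minimal WSGI application used only for this demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


calls = []
wrapped_app = jit_middleware(
    hello_app,
    enable=lambda: calls.append("enable"),
    disable=lambda: calls.append("disable"),
)
body = wrapped_app({}, lambda status, headers: None)
print(body, calls)
```

In a real deployment the two hooks would be whatever switches the JIT on and off for the current thread; the recording lambdas here only make the call order visible.
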
Installation and Activation Getting started with Pyjion is straightforward on Python 3.10:
1. Install .NET 6 or higher
2. pip install pyjion
3. import pyjion; pyjion.enable()
Or use the pyjion command instead of python to launch scripts or modules.
- Links:
  - Pyjion on PyPI
  - Installation Guide
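
A minimal sketch of the activation step, written so that it still runs when Pyjion is not installed. The `try`/`except ImportError` fallback and the `half_sum` function are illustrative additions, not part of the Pyjion documentation.

```python
# Optional-accelerator pattern: use Pyjion when it is installed, otherwise
# fall back to the stock CPython interpreter without failing.
try:
    import pyjion  # needs CPython 3.10 and a .NET 6 runtime

    pyjion.enable()  # from here on, newly called functions are JIT-compiled
    jit_active = True
except ImportError:
    jit_active = False


def half_sum(n: int) -> float:
    """A small numeric loop of the kind the episode says benefits most."""
    total = 0.0
    for i in range(n):
        total += i / 2
    return total


result = half_sum(1_000)
print(f"JIT active: {jit_active}, result: {result}")
```

The function body is ordinary Python either way, which is the point of the drop-in design: the only change is the two lines that import and enable the JIT.
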
Type Specialization vs. Python’s Dynamic Nature Much of the performance speedup comes from “unboxing” integers, floats, and other small data types. When Python’s dynamic typing remains unpredictable, Pyjion may produce more general code. This dynamic vs. specialized tradeoff is a core challenge for any JIT in a dynamically typed language.
- Links and Tools:
  - Python Data Model Docs
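
To make "boxed" versus "unboxed" concrete, the sketch below compares a Python float object with a raw C double using only the standard library. It illustrates the storage difference, not how Pyjion itself unboxes values; exact object sizes vary by platform and CPython build.

```python
import sys
from array import array

boxed = [0.5, 1.5, 2.5]       # a list of references to separate float objects
unboxed = array("d", boxed)   # one contiguous buffer of raw C doubles

bytes_per_boxed_float = sys.getsizeof(1.5)  # object header plus the double
bytes_per_raw_double = unboxed.itemsize     # just the double

print(bytes_per_boxed_float, bytes_per_raw_double)
```

A JIT that can prove a value is always a float can keep it in the raw form for intermediate steps and only create the full object when the value has to be handed back to Python.
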
Future Directions and .NET Collaboration Anthony has worked closely with the .NET compiler team, filing issues and collaborating to improve the JIT for scenarios like Python. This paves the way for deeper cross-ecosystem synergy, possibly letting Python devs reuse .NET’s tooling and speed without rewriting code.
- Links and Tools:
  - Python for .NET Developers Course

Interesting Quotes and Stories

On Curiosity and Starting Pyjion: “It was one of those projects I got started on because it just had my curiosity. I didn’t understand how it could work and really wanted to learn.” — Anthony Shaw
On Speeding Up ‘N-Body’: “We went from around 33% faster than CPython to more like 65%, and even 20 times faster for certain micro-benchmarks.” — Anthony Shaw

Key Definitions and Terms

CPython: The default and most widely used implementation of the Python language, written in C.
JIT Compiler: A Just-In-Time compiler converts bytecode into machine code as the program runs, often with real-time optimizations.
PEP 523: A Python Enhancement Proposal that allows customizing Python’s “ceval” loop, enabling alternative bytecode evaluators like Pyjion.
Specializations: Optimizations where the JIT compiler narrows down code paths based on data types (e.g., int vs. float).
Profile-Guided Optimization (PGO): A technique where runtime data about code usage (frequently used paths, variable types, etc.) feeds into a second or third compilation pass to make the code even faster.

Learning Resources

If you are new to Python or want to deepen your understanding, check out these courses and references:

Python for Absolute Beginners: A thorough introduction to Python covering all the basics you’ll need before exploring advanced topics like JIT compilation.
Python for .NET Developers: Ideal for .NET developers looking to leverage or integrate Python effectively, especially relevant if you’re interested in .NET-based JIT methods like Pyjion.

Overall Takeaway

Pyjion represents a major leap in bridging Python’s ease of use with highly optimized machine code. It aims to provide a near-seamless drop-in JIT solution compatible with most of your existing Python code, all while retaining the strengths of CPython and .NET. For data scientists, web developers, and Python enthusiasts at large, Pyjion signals the exciting possibility that you can keep writing idiomatic Python while still harnessing serious performance gains—no major rewrites or specialized frameworks required.

Links from the show

Anthony on Twitter: @anthonypjshaw
Pyjion: github.com
Restarting Pyjion Presentation: youtube.com
Hathi: SQL host scanner and dictionary attack tool: github.com
Try Pyjion online: trypyjion.com
Pyjion optimizations: readthedocs.io
Pyjion docs: readthedocs.io
.NET: dotnet.microsoft.com
PEP 523: python.org
Pydantic validation decorator: helpmanual.io
Tortoise ORM: github.com
pypy: pypy.org
Numba: numba.pydata.org
NGen AOT Compiler: microsoft.com
Watch this episode on YouTube: youtube.com
Episode #340 deep-dive: talkpython.fm/340
Episode transcripts: talkpython.fm

Episode Transcript

00:00 Is Python slow? We touched on that question with Guido and Mark in the last episode.

00:04 This time, we welcome back friend of the show, Anthony Shaw. He's here to share the massive

00:09 amount of work that he's been doing to answer that question and speed up things where the answer

00:13 is yes. And he just released version one of the Pyjion project. Pyjion is a drop-in JIT compiler

00:21 for Python 3.10. It uses the .NET 6 cross-platform JIT to compile and optimize Python code on the fly

00:29 with zero changes to your source code. It runs on Linux, macOS, and Windows, both x64 and ARM64.

00:36 It's a cool project, and I'm excited Anthony's here to tell us all about it.

00:39 This is Talk Python To Me, episode 340, recorded November 3rd, 2021.

00:58 Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow

01:03 me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at

01:08 talkpython.fm. And follow the show on Twitter via @talkpython. We've started streaming most of our

01:14 episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get

01:20 notified about upcoming shows and be part of that episode. This episode is brought to you by Shortcut

01:26 and Linode, and the transcripts are sponsored by Assembly AI. Anthony Shaw, welcome to Talk Python To

01:34 Me.

01:34 Hi, Michael. Good to see you again.

01:35 Yeah, it's great to have you back. Your at-least-yearly appearance here. If not by name,

01:41 you know, you at least appear yourself once. But I think you get mentioned a bunch of times with

01:46 all the stuff you're doing.

01:47 Yeah, we were just trying to work out when the last time was. It's like a year, over a year ago.

01:51 Yeah, yeah, it was May 2020. See, that was when the whole COVID thing was about to end. Like,

01:58 hey, it's just a couple of months. We'll be through this. It'll be fine. Everyone will wear their mask

02:01 and get their shots and it'll be totally normal.

02:03 I had to pause for a second. I was trying to work out what year it was.

02:08 I know. I know. Well, you're in Australia, so it's probably the next year.

02:12 2022, yeah.

02:13 Yeah, you guys are always ahead by a little bit.

02:15 Yeah, exactly.

02:16 Awesome. Well, welcome to the show. This time we're here to talk about a project that you've

02:21 been spearheading for the last year or so, Pyjion, which is a JIT compiler for Python,

02:27 which is pretty awesome.

02:29 Yeah, I kind of, it was sitting on the shelf for a few years and I decided to pick it up. It was

02:35 related to a thread of things that I've been working on, looking at Python performance. And also

02:39 when I kind of finished working on the book, so.

02:42 The CPython internals book, yeah.

02:44 CPython internals book. Yeah, it was like really interesting to dig into a compiler and I really

02:50 wanted to put some of that theory into practice and work on my own compiler. So yeah, that's kind

02:56 of what led into it.

02:57 It looks like no small feat as people will see as we get into it, but how much did the diving into

03:02 the C internals make you feel like, all right, I'm ready to actually start messing with this and

03:06 messing with how it runs and, you know, compiling stuff to machine instruction. Yeah.

03:11 Yeah, massively. I don't know where I would have started otherwise. It's a pretty steep learning

03:15 curve.

03:15 Yeah. If someone gave me that job, I'd be like, I have no idea how to do this.

03:19 Yeah. It was one of those projects I got started on because it just had my curiosity. I didn't

03:24 understand how it could work and yeah, I just really wanted to learn and it just seemed like

03:29 a really big challenge. So yeah, I looked at it and thought this is an interesting thing.

03:33 I know that Brett and Dino were no longer working on it and just decided to pick it up and see how

03:38 far I could take it.

03:38 Yeah, absolutely. I think you've taken it pretty far and it's about to go to 1.0. Is that right?

03:45 Yeah. So it could be launching version 1.0 of Pyjion in a few days. So I'm waiting for .NET 6 to be

03:53 released and I'll explain why probably a bit later.

03:56 Wait, wait, wait, wait, wait, wait, wait, wait, wait, hold on, hold on. This is the Python project

04:01 on a Python podcast and you're waiting for .NET 6? Oh my gosh, I don't understand.

04:04 Yeah, this bit confuses people and it did originally, I think when the project was written. So

04:10 it's a JIT compiler for Python written in C++ and you could write your own JIT compiler,

04:18 which is no small feat.

04:20 Yeah. And then you'd be done in like 10 years.

04:22 Exactly. Or you can take a JIT compiler off the shelf. So a JIT compiler compiles some sort of

04:28 intermediary language into machine code. So assembly. A lot of JIT compilers are working with basically

04:36 like which registers to use, which operations and instructions are supported on different CPUs.

04:42 There's a whole bunch of stuff to work on. Very, very involved. And .NET Core, what was called .NET Core,

04:50 now just .NET, has a JIT compiler in it. And yeah, you can actually use just the JIT compiler.

04:57 So that's what the project originally did was to basically use the JIT compiler for .NET Core.

05:04 Other JITs for Python, some of them use the LLVM JIT. And there's a few other JITs as well you can get

05:10 off the shelf. So yeah, I started to use the .NET one. That was originally Dino and Brett when they

05:16 built this. It was actually before .NET Core was even released. It was still in beta back then,

05:20 but they used .NET Core's JIT.

05:22 Yeah. I think it was a much earlier version of .NET Core back then, right? Like it's come a long way.

05:28 I think back then, you probably know this better than I do now, but back then I think there was like

05:33 a fork in the road, two paths for the .NET world. You could do traditional .NET on Windows only,

05:41 but that was like the 15-year polished version of .NET. And then there was this alternative,

05:47 funky, open source .NET Core thing that sort of derived, but was not the same thing as that.

05:54 And so I think probably when Brett and Dino were working on it, it was really early days on that

05:58 version of the JIT compiler.

06:00 Yeah, it was version 0.9 of .NET Core.

06:02 Right.

06:02 So yeah, I did actually get involved back then at just helping out, upgrading it from version

06:08 0.9 of .NET Core to version one. So I did like help on the original Pyjion project with some of the

06:15 builds and stuff, but that early version of Pyjion required a fork of CPython and a fork of .NET Core.

06:22 So it required you to compile both projects from source with patches and like stick a whole bunch

06:29 of stuff together. And yeah, very tricky to set up. And that's one of the things when I kind of came in

06:34 a year ago to pick this project up again, that I really wanted to tackle was let's make it easy

06:39 to install this, which means it should just be pip installable. So you can just pip install pyjion

06:45 on Python. And that's the second thing I had to do was to upgrade it to the latest versions of Python,

06:50 the latest versions of .NET. So yeah, it's running on .NET 6 and CPython 3.10. So yeah,

06:58 it's basically a JIT compiler for Python 3.10.

07:01 Right. So now we have .NET is, I think they've closed that fork. It's a little bit like the two

07:07 to three boundary that we've crossed. It's just back to this .NET thing and it's open source,

07:12 which is cool. So that closes the .NET JIT side of things from a beta thing. On the other hand,

07:19 there's a PEP and you'll, I'm sure you know the number. I don't know the number off the top of my head

07:24 that allowed for extensions into standard CPython. So you don't have to fork it and reprogram

07:32 ceval.c. There's an extensible, an extensibility layer for this kind of stuff now, right?

07:38 Yeah. So maybe to backtrack a little bit, but when you write Python code and then execute it

07:42 in CPython, which is the most popular Python interpreter, the one you get from Python.org,

07:47 when you compile the, well, you don't compile the Python code. Python does it for you.

07:52 When you compile Python code, it compiles down into an abstract syntax tree. And then the next level

07:59 down is a bytecode sequence, which you can see if you import the dis module and you run the dis

08:06 function on your code. I've got a talk about Pyjion, which I did at PyCon this year.

08:12 Yeah. We'll link to that in the show notes. Yeah.

08:13 Yeah. So it gives some explanations and examples. So that's kind of the bytecode. And yeah, basically

08:20 this looks at how can the bytecode gets handed off onto an evaluation loop in Python, which is called

08:28 a ceval.
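
To see the bytecode Anthony describes, a minimal sketch with the standard-library `dis` module follows. The `add` function is just an illustration, and the exact opcode names printed depend on the CPython version (for example, the addition opcode was renamed after 3.10).

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names; exact names differ between CPython versions.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# Human-readable listing of the same bytecode.
dis.dis(add)
```

Each line of the listing is one instruction that the evaluation loop in ceval.c would handle in turn.
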

08:29 Is that one of the first things you looked at when you started diving into the C source code? Is that

08:34 the first file you went to?

08:35 It was one of, yeah. It's such a big one. I remember on Python Bytes last week, you mentioned that

08:42 Łukasz had been analyzing bits of Python that changed the most. And that was like the most changed bit of

08:48 Python. It's kind of like the brain of Python, really. It's the loop that evaluates all the

08:52 instructions and then calls all the different C APIs to actually make your code do things.

08:58 And what you can do in this PEP, which Brett proposed originally when he was working on

09:04 Pyjion, was that you can actually tell CPython to not use its own evaluation loop, but use a

09:10 replacement one. So this PEP 523 it is, you can basically write an extension module for Python

09:17 and then say, okay, from now on, I will evaluate all Python code. Well, Python compiles it for you and

09:25 then just gives it to you as a bytecode code object with bytecodes in it. And then you can write a custom

09:31 function which will evaluate those bytecodes.

09:34 Right. Exactly. Normally there's just a switch statement that says, if I get this bytecode,

09:38 here's what I do. If I get that bytecode, here's what I do.

09:41 One of the drawbacks of that, that makes it super hard to optimize, among other things, is it's one

09:47 statement at a time, right? Like the ceval.c, that switch statement, that loop doesn't go,

09:53 here's a series of possibly related opcodes, make that happen. It goes, no, you need to load this

10:01 variable. Now create this object. And there's just not a lot of room for optimization there,

10:07 right? You're not going to inline a function or do other types of things when it's, you know,

10:10 instruction by instruction.

10:12 Yeah, exactly. So what Pyjion does essentially is it implements that API. When you install Pyjion

10:19 and activate Pyjion, which you do by importing pyjion and then just doing pyjion.enable(),

10:24 it will tell CPython that Pyjion will be the function to evaluate all Python code from now on.

10:31 And when it sees a new function for the first time, instead of interpreting all the bytecode

10:37 instructions, when you execute the function, it will basically interpret those ahead of time and

10:43 then compile them into machine code instructions and then store that machine code in memory and then

10:50 re-execute it every time you run the function. So it basically compiles the function down into

10:56 assembly, essentially, and then puts that assembly object in memory. And then when you run the Python

11:03 function, if it's already been compiled, it will then just run those instructions.

11:07 Right. As standard JIT stuff, it has to do it once. But then once it's hit one method, it's like,

11:13 okay, this one, here's the machine instructions for it. We're just going to reuse that, right?

11:17 Yeah, because the computer needs machine code to do anything. So something has to have compiled it

11:24 down into machine code. And in the case of normal CPython, CPython is written in C and the C compiler

11:30 has compiled that down into machine code. But it's a loop that it runs through for each bytecode

11:36 instruction to go, okay, this is an add operator. So, you know, I'm going to take two objects off the

11:43 stack, the left-hand side and the right-hand side, and then I'm going to call the add function in the

11:48 C API.
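
The loop described here, one branch per instruction with operands popped off a stack, can be modeled in a few lines of Python. This is a toy sketch, not CPython's ceval.c: the `evaluate` function and the three instruction names it handles are chosen purely for illustration.

```python
import operator


def evaluate(instructions):
    """Interpret a list of (opname, argument) pairs one instruction at a time."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "BINARY_ADD":
            # Take the right-hand and left-hand operands off the stack,
            # then call the generic add routine on them.
            right = stack.pop()
            left = stack.pop()
            stack.append(operator.add(left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise ValueError("program ended without RETURN_VALUE")


program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = evaluate(program)
print(result)
```

Because the loop only ever sees one instruction at a time, it has no opportunity to notice that a whole sequence could be fused or simplified, which is the limitation a JIT works around by looking at the whole function.
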

11:48 Right. Exactly. I kind of got you diving down deep a little too quick. I do want to set the stage just

11:54 a moment before we get into how all of this works. Because understanding where you're coming from and

12:01 understanding some of the problems you're trying to solve are really going to be helpful to seeing

12:05 the value here. So back in, what was it, April of 2020, whatever PyCon was, that virtual PyCon,

12:13 the first virtual PyCon, you had that talk called Why Python is Slow. And you talked about some

12:18 interesting things that really set the stage for, well, if you had a JIT and you had all sorts of

12:23 control over it, as you do, how could it be faster? What could we do, right? So one of the things you

12:27 talked about was this n-body problem, how C++ relative to say Python, there was a bit of a

12:33 difference there, but also .NET Core was a lot faster. And really importantly, JavaScript was

12:37 faster, right?

12:38 Yeah. So that was 2019.

12:41 My gosh. Okay.

12:43 Yeah. This is part of this lovely pandemic. So yeah, I covered the n-body problem, which is

12:49 interesting because it's not a, the n-body problem is a mathematical formula that calculates

12:55 the position of the Jovian planets.

12:59 And as soon as you have three or more, it starts to get super complicated, right?

13:02 Yeah. It's basically like a big mathematical formula and it just loops through the iterations

13:07 to work out the position of different planets. So it's, it's kind of the difference between C is

13:13 seven seconds it takes to run the algorithm in C and 14 minutes it takes to run it in Python.

13:18 Python's even slower than Perl, which is embarrassing.

13:21 That's a little embarrassing. Yeah. It's actually pretty much the worst case scenario for all

13:27 the reasonable languages. Yeah.

13:28 In that talk, I dug into the details about why some of the reasons why that is. And kind of the core of

13:34 the n-body algorithm is this, is a few lines of code, which basically calculate, look at floating

13:40 point numbers and it calculates, does the big calculation. So there's a lot of mathematical

13:45 operations. There's like minus divide power add, which is great. And it can all be done in line.

13:51 CPUs are very efficient at doing this because CPUs natively understand floating point numbers.

13:58 Yeah. But a number in C and a number in Python, these are not equivalent, right? A floating point

14:03 number in C is probably eight bytes on the stack. A floating point number in Python is a, what is that?

14:09 A PyFloat object that's 50 bytes and is out on the heap, probably separated in space from the other

14:15 numbers in terms of like, it'll cause more cache misses and flushes and all sorts of stuff, right?

14:21 Yeah. So a floating point number in Python is a, is an immutable object and it's basically a wrapper

14:28 around a double. So yeah, basically you have to create a Python object to store the floating

14:35 point number. And then if the value changes, you have to create a new one. So the issue in n-body

14:40 is that you have to create for one, one line of Python that just does a whole bunch of work to get a

14:46 single answer, like a single floating point number, all the interim values in that calculation create

14:52 like 18 objects, which are immediately discarded.

14:55 Right, right, right. So the memory management kicks in just constantly. Yeah.

15:00 Yeah. And Python is pretty efficient at allocating small objects, but when you magnify that to the

15:06 level that is seen in the n-body problem, then yeah, that's why it's so slow effectively because

15:11 it's creating all these temporary objects and then destroying them in the next operation.
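
The "immutable wrapper around a double" behavior is easy to observe from Python itself. The short sketch below uses illustrative variable names and shows that arithmetic never changes a float in place; each step produces a new object.

```python
x = 1.5
original = x        # second reference to the same float object
x = x * 2.0 + 1.0   # each operation yields a new float object

# The multiply produced a temporary 3.0 that was discarded after the add.
print(x, original, x is original)
```

With two operations there is one discarded temporary; a long n-body expression multiplies that effect, which matches the roughly 18 short-lived objects per line mentioned above.
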

15:19 This portion of Talk Python To Me is brought to you by Shortcut, formerly known as Clubhouse.io. Happy

15:25 with your project management tool? Most tools are either too simple for a growing engineering team to

15:30 manage everything or way too complex for anyone to want to use them without constant prodding.

15:35 Shortcut is different though, because it's worse. No, wait, no, I mean, it's better.

15:39 Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible,

15:45 powerful, and many other nice positive adjectives. Key features include team-based workflows.

15:50 Individual teams can use default workflows or customize them to match the way they work.

15:55 Org wide goals and roadmaps. The work in these workflows is automatically tied into larger company

16:01 goals. It takes one click to move from a roadmap to a team's work to individual updates and back.

16:07 Tight version control integration. Whether you use GitHub, GitLab, or Bitbucket, clubhouse ties directly into

16:13 them so you can update progress from the command line. Keyboard friendly interface. The rest of

16:18 shortcut is just as friendly as their power bar allowing you to do virtually anything without

16:24 touching your mouse. Throw that thing in the trash. Iteration planning. Set weekly priorities and let

16:29 shortcut run the schedule for you with accompanying burndown charts and other reporting. Give it a try over at

16:36 talkpython.fm/shortcut. Again, that's talkpython.fm/shortcut. Choose shortcut because you shouldn't have to project manage your project management.

16:46 One of the things you can do is maybe understand that there are numbers there and treat them.

16:53 Keep, say, the intermediate values as floating points and only return the result to Python.

16:58 Like, okay, this Python runtime is going to need a PyInt or a PyLong or a PyFloat or whatever, but we don't need to do all the intermediate steps that way.

17:09 We could compute those in a lower level thing because we, again, understand the whole function, not just understand, you know, multiply two numbers,

17:17 multiply two numbers, add two numbers, but like that whole thing as a group, right?

17:21 Yeah, exactly. So the principle behind some of the design ideas in Pyjion is that, and in lots of other compilers, this is not something I came up with. But the idea is to try and keep things as efficient as possible by just carrying

17:36 values on the CPU registers and then not allocating memory in the heap. And a floating point number fits in a CPU register.

17:44 So a 64-bit integer or a floating point number fit in a CPU register. So let's just carry those values on the registers and then do low level instructions to do addition and minus and multiplication as well, but not divide.

17:59 Because I found out Python has a whole bunch of like custom rules for division.

18:05 Yeah.

18:06 Yeah.

18:06 So I can't rely on the CPU instructions to do that.

18:09 So these are the types of things that you're like, well, maybe we could use this PEP 523 and some kind of JIT compiler and turn it loose on this.

18:19 I just earlier this week interviewed Guido and Mark Shannon about just general Python performance.

18:26 They're working on sort of a parallel branch of making Python faster, which is great.

18:31 So out of the live stream, Josh Peake asks, Mark and Guido teased the potential addition of a JIT to CPython 3.13 or 3.14.

18:39 Would this potentially intersect with that project?

18:43 Is this totally separate?

18:44 Do you have any visibility there?

18:45 Yeah, I haven't asked to be involved in that yet.

18:48 I don't know if, I mean, Mark Shannon's experience with compilers is like miles ahead of mine.

18:56 And to be frank and, and, and, you know, Guido has invented the language.

19:00 So like their knowledge surpasses mine quite substantially.

19:03 Yeah.

19:04 I hope that the work done in this project will be insightful when they're designing the JIT.

19:10 And I've already spoken to both of them about this project and walked through like what's working and what isn't.

19:17 And cause it's, I guess, quite a bit ahead and it's dealing with some challenges, which they're probably going to hit when they come to this.

19:23 And then yeah, it will, it will steer, steer them in that direction.

19:26 Cool.

19:26 Let's talk about compiling just for a bit, because I did a lot of C++.

19:31 I remember pressing compile the build button or the run, which would build and run and you would see it grind.

19:38 And actually thinking back, that was when computers actually made noises.

19:42 They would like, you know, like their hard drive would like make noises that they are, they, you would hear it compiling even.

19:48 Yeah.

19:49 Also in C#, .NET compile, but less and much faster.

19:53 But in Python, it's just, it runs.

19:55 It feels like it just runs.

19:56 And I don't remember this compile step.

19:58 And yet there is a, there is an aspect of compiling, right?

20:01 Yeah, it happens.

20:02 You just don't see it.

20:03 So yeah, it happens behind the scenes.

20:04 It compiles it, but it doesn't compile it into machine code.

20:07 It compiles it into bytecode.

20:09 Right.

20:09 Which is the same as .NET and Java.

20:11 But the difference is what happens to that bytecode next, right?

20:14 Yeah, it is similar, but the Python bytecode is, is much higher level.

20:19 So there's like single operations for just add two objects, for example.

20:24 Right.

20:24 Or put this thing in a list.

20:25 Yeah.

20:25 Like add this thing to a list or merge two dictionaries.

20:28 It's like dict merge is, is a single bytecode instruction.

20:32 Whereas .NET uses a specification.

20:35 It's an open specification called ECMA 335.

20:39 And this specification describes different stack types.

20:44 So it says, you know, there's like a 32 bit integer, 64 bit integer, 16 bit, et cetera.

20:51 There's floating point numbers, which come in the form of four byte, eight or four or eight

20:58 byte floating point numbers.

21:00 And there's also things like Booleans and then how branches and evaluations work.

21:05 So it's closer to assembly.

21:08 But the reason you don't want to write things in assembly is because assembly is specific to a CPU.

21:12 And you often find yourself writing instructions, which would only work on that particular CPU.

21:18 And then when you ship it to the real world, like that doesn't work.

21:21 Right.

21:22 One of the big benefits of JIT is it can look exactly at what you're running on and say,

21:26 oh, this has this vectorized hardware thing.

21:29 So let's use that version here.

21:31 Or this has this type of threading.

21:33 So we're going to do some sort of memory management around that.

21:37 C and C++ are ahead of time compilers.

21:39 They will interpret your code, parse it and compile it down into machine code instructions,

21:45 and then put it in a binary format, like a shared library or a standalone executable.

21:50 .NET Java and other languages that have a JIT, they have both a compiled VM, which is something

21:59 which has actually been compiled into a standalone executable, which is the framework.

22:03 So the, you know, the Java.exe, for example.

22:06 And then it could compile down the code into an intermediary language and then evaluate that

22:12 just in time, and then typically cache the machine code onto a disk or into memory.

22:19 And it does that using a JIT.

22:21 And Python, CPython interprets everything at runtime, essentially.

22:27 So it does cache the bytecode, but it doesn't cache the machine code because it doesn't compile

22:32 to machine code.

22:32 And that's what Pyjion does.

22:33 Right, exactly.

22:34 If you've seen .pyc files in the __pycache__ directory, right, down in there, that's the compiled

22:41 output of Python.

22:41 But like you said, it's much higher level and it gets interpreted after that.
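To see how high-level that bytecode is, the standard library's dis module can list the instructions CPython produces for a function. This is a minimal sketch; the function name add is made up for illustration, and opcode names differ between versions (BINARY_ADD on Python 3.10, BINARY_OP on 3.11 and later).

```python
import dis

def add(a, b):
    # One source-level "+" becomes a single bytecode instruction.
    return a + b

# Collect the opcode names CPython generated for the function body.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Whatever the two objects are (ints, strings, lists), the same single instruction handles the addition, which is why the interpreter has to work out the types at runtime.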

22:45 Yeah.

22:46 So part of the insight of Pyjion is like, well, let's take that compile step.

22:52 And instead of outputting Python bytecode, what if we output intermediate language bytecode?

22:58 Because there's a nice compiler hanging around that can compile that.

23:01 If you could somehow feed it that IL instead of .pyc content, right?

23:06 Yeah.

23:06 So the steps are, it's quite involved.

23:09 Yeah.

23:10 The steps are, and I do go into this in the talk, but Python code, abstract syntax tree, code object,

23:16 which has Python bytecode.

23:18 And then Pyjion will basically compile Python bytecode into .NET intermediary bytecode.

23:25 And then .NET will compile the intermediary bytecode into assembly, into machine code.

23:30 And then you attach that to say the function object or a class or something like that, right?

23:35 Yeah.

23:35 And then that bytecode is, that machine code, sorry, is essentially an executable, which lives

23:40 in memory.

23:41 And then when you want to call it, you just call that memory address and it runs the function.

23:45 Just in the same way that you would load a shared library and just call the address.

23:51 Right.

23:52 You might have a .so file and you import it and run it.

23:55 And as far as you're concerned, magically, it just runs, right?

23:58 Exactly.

23:58 Okay.
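The first stages of that pipeline (source, abstract syntax tree, code object with bytecode) can be walked through with the standard library alone. The sketch below stops where plain CPython stops; the later steps described above (bytecode to .NET IL to machine code) happen inside Pyjion and are not reproduced here. The function name double is only an example.

```python
import ast

source = "def double(x):\n    return x * 2\n"

# Step 1: source text -> abstract syntax tree.
tree = ast.parse(source)

# Step 2: AST -> code object, which carries the Python bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Step 3 (plain CPython): the interpreter evaluates that bytecode.
# A JIT would instead translate the bytecode further before running it.
namespace = {}
exec(code, namespace)
result = namespace["double"](21)
print(type(tree).__name__, type(code).__name__, result)
```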

23:59 Is it slow?

24:00 The compilation step in particular, not necessarily, we'll get to the performance of the overall

24:04 system.

24:05 But like, is this JIT step, is this a big deal?

24:08 Does it take a lot of memory?

24:09 What's it like?

24:09 Yeah.

24:10 I haven't actually focused too hard on the performance of the compilation step because

24:13 a lot of the problems that I'm looking at are compile once, execute 50,000 times.

24:19 And the overhead doesn't really matter that much.

24:23 Although it's pretty fast.

24:25 I've been really impressed.

24:26 Pyjion is written in C++ and the compilation step is actually pretty quick.

24:32 The overhead is 10 to 15% of the execution time on the first pass, depending on the complexity

24:39 of the function.

24:39 But yeah, like if the function takes a second to run, then, you know, 0.15 of a second is

24:45 around how much it'll take to compile it.

24:48 Yeah.

24:48 That's not bad.

24:49 And then it goes faster.

24:50 Yeah.

24:50 And then once it's done it once, that's it.

24:53 With the exception that Pyjion has a feature called profile-guided compilation, which is

24:58 kind of something that I designed to get around how dynamic Python is.

25:05 So JIT compilers are brilliant when you've got statically typed languages.

25:09 So if you know that this variable is an integer and this variable is a string and this variable is

25:13 an object, then you can compile all the correct instructions.

25:17 But in Python, variable A could be assigned as a string and then changed to an integer and then

25:24 you can assign it to the return of a function, which could be anything.

25:27 So one of the challenges I kind of looked at was that a JIT is only going to be

25:32 faster if you've got optimizations and you can't make optimizations if you have to generalize

25:37 everything.

25:37 Yeah.

25:38 So what it does is a feature called PGC, which it will compile a profiling function.

25:44 So the first time it runs the Python code, it's basically going to sort of look at what variables

25:50 are.

25:50 It's almost like you're doing a cProfile on itself.

25:54 Yeah.

25:54 So basically it compiles a function that runs and then when that function is running, it

25:59 kind of captures a whole bunch of information about what's actually happening.

26:03 And then it makes some assumptions and says, oh, you, when you were adding these three variables

26:07 last time, they were all integers.

26:09 So let's optimize that for integers next time.

26:12 And if they do change, then it depends.

26:15 It won't crash.

26:17 What happens if they change?

26:18 Okay.

26:18 Crashing is an option.

26:19 You probably don't totally want to go with the crashing part, but that might be an intermediate,

26:24 like we're building it and it's getting dialed in.

26:27 Yeah.

26:27 Some options that come to mind is you could have a, an alternate compiled version that says,

26:33 okay, we've also seen this come as a string and, and so we're going to compile a separate

26:37 one and then do like a lookup on the arguments and go from there.

26:41 Yes.

26:41 So they're called specializations and that's something that Mark Shannon talked about.

26:46 And I think when CPython does its own JIT, they will definitely have specializations.

26:50 The downside is that you have a lot of memory overhead if there

26:57 are lots of specializations.

26:58 And a good example would be in the unittest module, the assertEqual function.

27:03 This is pretty much the first one I kind of slammed into. Pyjion tried to optimize the

27:09 assertEqual function, which could take anything.

27:12 Like it could take two strings, a string and a number.

27:14 Yeah.

27:15 Probably the first question is, are they the same type or can they be coerced into the same

27:18 type?

27:19 Yeah.

27:19 And then just keep going down like these different cases.

27:22 It can't be simple.

27:22 Yeah.

27:23 And it was actually in a conversation with Guido, he suggested looking at type guards.

27:28 So the type guard is, so before I go into the optimized code, it will check to see, has the

27:36 variable changed from what it was last time it got profiled?

27:40 And then if it has changed type, then it will default back into a generic path.

27:45 So that's essentially how it deals with different types.
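A type guard can be pictured in pure Python as a check that runs before the optimized code and falls back to a generic path when the types no longer match the profile. This is only an illustration of the idea, not Pyjion's implementation; every name here (generic_add, int_add, make_guarded) is hypothetical.

```python
def generic_add(a, b):
    # Stand-in for the general path that works for any types.
    return a + b

def int_add(a, b):
    # Stand-in for code that was specialized for two ints.
    return a + b

def make_guarded(profiled_types):
    """Return a callable that uses the specialized path only while the
    argument types still match what the profiler recorded."""
    calls = {"fast": 0, "generic": 0}

    def guarded(a, b):
        # The type guard: an exact type check before the optimized code.
        if (type(a), type(b)) == profiled_types:
            calls["fast"] += 1
            return int_add(a, b)
        calls["generic"] += 1
        return generic_add(a, b)

    guarded.calls = calls
    return guarded

add = make_guarded((int, int))
print(add(2, 3))      # guard passes, specialized path
print(add("a", "b"))  # guard fails, generic path
print(add.calls)
```

The guard costs one comparison per call, which is far cheaper than keeping a separately compiled specialization for every type combination a function such as assertEqual might see.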

27:49 Yeah.

27:49 One of the fall through paths could be just, well, let Python have it, right?

27:53 Let Python just run the bytecode.

27:54 Yeah.

27:55 There are some things that Pyjion doesn't support.

27:57 Async and await is one major, major feature.

28:00 If it comes across asynchronous generators, then it will just hand them back to Python and Python

28:05 executes them.

28:06 It should be in there though, right?

28:07 I mean, C# and .NET also have async and await.

28:10 I know that means quite a bit differently, but theoretically in a future down the road,

28:14 when you have more time, maybe it's not completely out of the world possible.

28:18 I actually kind of started implementing it and put most of it together, only to realize that

28:23 the APIs for asynchronous generators are all private in CPython.

28:28 So I can't import them, which makes it technically impossible to implement, which is a bit of a

28:33 shame.

28:34 But yeah, that's one of the drawbacks at the moment is you can't do async and await.

28:38 Right.

28:39 But if this were super successful, I can see that that's like, okay, well, let's go ahead and expose

28:44 that because Anthony's so close.

28:45 Yeah.

28:47 They could probably maybe be coerced.

28:49 It's like a one line code change, I think.

28:51 Yeah.

28:51 Now this doesn't apply to the profile guided optimizations, but one of the things that

28:58 these frameworks have, you know, I know that .NET has had it at several levels, like they've got this

29:03 NGen utility that will take a .NET assembly in IL and you can pre-compile it like ahead of time, compile it

29:11 and generate a native image on your machine.

29:13 And then Xamarin had that because they had to have something like this to get onto iOS where they

29:19 pre like ahead of, they ran the JIT compiler in advance and saved it.

29:23 Is that something that could potentially be done here?

29:25 Or is it, is it too much?

29:27 Watch this space.

29:28 I've been researching that.

29:31 That's kind of one of the things I've been looking into is, can you compile it down into a

29:37 format which can be stored and then loaded or marshaled?

29:41 Or can it be stored into a portable executable format or some other binary format?

29:47 Lots of security implications there as well.

29:49 So that's one thing I'm cautious of.

29:51 But yeah, I want to look into.

29:53 Yeah, I hadn't even thought about all the challenges you got there.

29:55 Yeah, that could be interesting.

29:57 But you know, if it comes along that, yeah, this is pretty good, but there's this slower startup.

30:03 And I know something that the core devs and, you know, have been very protective of is the startup

30:08 speed of Python, right?

30:10 That they don't want it to start super slow because often it's, it's sort of run on a little tiny bit of

30:16 do this tiny thing and then we're going to drop back and then maybe run Python again on this tiny

30:20 thing or even, like, multiprocessing, you know, fork it, run these things, drop out of it.

30:25 So I'm just thinking of like, how do you protect, like, how do you still achieve that goal and gain these advantages?

30:32 Yeah, I think there probably could be work done to make the compiler more efficient.

30:35 Also, you can set the threshold of how many times should a function be called before you JIT compile it.

30:41 Right.

30:41 So that's the threshold setting.

30:43 So if you call a function once, there's probably no need to JIT compile it.

30:47 Well, there's, there is no need to JIT compile it because, you know, you're compiling and then

30:52 just running it straight afterwards.

30:53 Whereas if it gets called a lot, then you would, you would want to compile it.

30:56 So, and that's kind of where you get these sort of things, your hot functions, which is

31:00 a function, which is run a lot.

31:02 You want to specialize and make more efficient essentially.
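The threshold idea can be sketched as a call counter that only pays the compile cost once a function has proven to be hot. This is a toy model under stated assumptions: jit_after and fake_compile are hypothetical names, and the "compiler" here just records that it was invoked.

```python
import functools

def jit_after(threshold, compile_fn):
    """Hypothetical decorator: swap in compile_fn(func) once func has been
    called more than `threshold` times."""
    def decorator(func):
        state = {"calls": 0, "compiled": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["compiled"] is not None:
                return state["compiled"](*args, **kwargs)
            state["calls"] += 1
            if state["calls"] > threshold:
                # The function is now "hot": pay the compile cost once.
                state["compiled"] = compile_fn(func)
                return state["compiled"](*args, **kwargs)
            return func(*args, **kwargs)

        wrapper.state = state
        return wrapper
    return decorator

compiled_log = []

def fake_compile(func):
    # Stand-in for a real compiler; it just records the event.
    compiled_log.append(func.__name__)
    return func

@jit_after(threshold=2, compile_fn=fake_compile)
def square(x):
    return x * x

results = [square(n) for n in range(5)]
print(results, compiled_log)
```

A function called once or twice never reaches the compiler, so short scripts keep their startup time, while a function called thousands of times is compiled exactly once.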

31:07 So yeah, if you are like sorting a list, for example, then doing comparisons between two

31:13 different types, you'd want to make that as efficient as possible.

31:16 And that would inherently make sorting algorithms all quicker.

31:19 For sure.

31:19 Yeah.

31:20 So there's multiple stages here, right?

31:23 There's the uncompiled code or letting Python run the code.

31:26 Then there's compiling it with those hooks to understand what types come in for the specialization.

31:31 And then there's the generate, generating the optimized version.

31:35 So if it's run once at best, you'll get the unoptimized compiled version and unoptimized

31:41 compiled code is probably not that much better, right?

31:44 Yeah.

31:44 There are some things that it can do to optimize.

31:47 For example, there's a list of all the built-ins and it knows what return types the built-ins

31:53 have.

31:53 So for sure, like it knows if you run list as a built-in function, then it will return a list.

32:00 And I have even put in a check if somebody's overridden the list built-in, which is possible

32:08 and had to test that as well, which is interesting.

32:11 But yeah, it does make a whole bunch of assumptions like that, which is generic and works in most code.

32:17 And for example, if you're accessing the fourth item in a list, so you've got a list called

32:24 names.

32:25 And in square brackets, you put names, square bracket, the number three, then three is

32:31 a constant.

32:32 So it can't change.

32:33 It's compiled into the function.

32:34 If the code knows for sure that names is a list, then instead of calling a C API to see what it is and get the

32:43 index, et cetera, et cetera, et cetera.

32:44 The JIT compiler can go, oh, I already know this is a list.

32:49 I know the index you want is the fourth number.

32:51 So instead of calling all the stuff, let's just calculate the memory address of the fourth item

32:57 in the list.

32:58 And then put in a little check to make sure that there are four items in that list.

33:02 And then, yeah, just compile that into the function.

33:05 And it's immediately significantly quicker.

33:08 Go to where the data is stored in the list.

33:10 Go over by the size of four pointers.

33:13 So eight times four or something like that.

33:15 And just read it right there.

33:16 Something like that.

33:17 Yeah, exactly.
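The address arithmetic is simple enough to write down. A CPython list keeps an array of object pointers, so item 3 sits three pointer widths past the start of that array (24 bytes on a typical 64-bit build). The helper below is a hypothetical illustration of the calculation and the bounds check, not code from Pyjion.

```python
import struct

# Size of one pointer on this platform (8 bytes on typical 64-bit builds).
POINTER_SIZE = struct.calcsize("P")

def item_offset(index, length):
    """Byte offset of items[index] from the start of the pointer array,
    with the bounds check the compiled code still has to perform."""
    if not 0 <= index < length:
        raise IndexError("list index out of range")
    return index * POINTER_SIZE

names = ["Ada", "Grace", "Guido", "Anthony"]
offset = item_offset(3, len(names))
print(offset)
```

Because the index is a constant, the multiplication can be done at compile time; only the length check remains at runtime, and skipping it is exactly the kind of mistake that turns into a buffer overflow.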

33:17 I mean, for sure that that's what it is, right?

33:21 It sounds dangerous, isn't it?

33:23 Yeah, exactly.

33:24 This is some of the security things, right?

33:26 Go over some part in memory and read it and then do something like that.

33:30 Sounds like buffer overflow when done wrong.

33:32 So I can see why you'd be nervous.

33:34 Yeah, exactly.

33:35 But that's how compilers work.

33:37 You're dealing with memory addresses, essentially, and low level instructions.

33:42 So yeah.

33:43 Something we thankfully don't have to do a ton of in Python.

33:45 But like in CPython, you're in C, right?

33:48 That's a C thing.

33:49 Yeah, definitely.

33:49 And when you're working with tuples, yeah, you do that as well.

33:52 So you work out the address of the nth element and then just use that address and increment

33:58 the reference counter.

34:00 Yeah.

34:00 Also, don't forget that or memory management breaks.

34:03 This portion of Talk Python To Me is sponsored by Linode.

34:10 Cut your cloud bills in half with Linode's Linux virtual machines.

34:14 Develop, deploy, and scale your modern applications faster and easier.

34:18 Whether you're developing a personal project or managing larger workloads, you deserve simple,

34:23 affordable, and accessible cloud computing solutions.

34:25 Get started on Linode today with $100 in free credit for listeners of Talk Python.

34:31 You can find all the details over at talkpython.fm/linode.

34:35 Linode has data centers around the world with the same simple and consistent pricing,

34:40 regardless of location.

34:42 Choose the data center that's nearest to you.

34:44 You also receive 24/7/365 human support with no tiers or handoffs, regardless of your plan size.

34:52 Imagine that real human support for everyone.

34:55 You can choose shared or dedicated compute instances, or you can use your $100 in credit on S3 compatible

35:01 object storage, managed Kubernetes clusters, and more.

35:05 If it runs on Linux, it runs on Linode.

35:08 You can find the link right in your podcast player show notes.

35:15 Thank you to Linode for supporting Talk Python.

35:18 Where are you with this?

35:22 You said it's going to go to 1.0, which sounds like I could install this and I could run it and

35:27 it would do its magic, right?

35:28 It's going to 1.0.

35:30 Works only on Python 3.10.

35:32 That's one big thing.

35:33 I've upgraded it from 3.9 to 3.10.

35:36 When 3.10 was released, actually, and I won't be backporting it.

35:40 It's just so much has changed in Python and the APIs.

35:45 You can pip install it on Python 3.10.

35:47 You need to have .NET 6 installed when that is released.

35:51 Or you can install release candidate 2, which is already out.

35:55 Yeah, you can enable it.

35:57 Yeah, it sounds like it's pretty close to done, right?

35:59 I don't know when it's actually, but they've got their .NET conference in, what is that?

36:03 Six days?

36:04 Yeah.

36:04 Surely.

36:05 Yeah, they say it launches then.

36:06 It's already online.

36:07 By the time this episode is out, it's very likely very close to just out.

36:11 So, okay, that's a pretty easy thing.

36:13 Can I brew install .NET?

36:15 Do you know?

36:15 I've not tried.

36:16 I don't know.

36:17 That's a good question.

36:17 Yeah.

36:18 There is an installer for Mac.

36:19 Yeah.

36:19 Pyjion also works on ARM64, which is worth noting.

36:23 And it's a very complicated detail of JIT compilers.

36:26 But yeah, ARM support was no small feat, but it's something that people just expected to be there.

36:31 Yeah, sure.

36:32 Well, I mean, there's obviously the M1s, right?

36:35 Everyone with an M1 would like, who wants to do this would really like it to work on ARM.

36:40 But there's also Raspberry Pis and other places that Python shows up.

36:44 I don't know, helicopters on Mars.

36:46 I don't know if that's ARM or not, probably.

36:47 Yeah.

36:48 So I've tested Linux ARM64, which will be the Raspberry Pis and other.

36:53 And also Mac ARM64.

36:55 I have not tested Windows ARM64 because there is no Python for ARM64 on Windows.

37:03 Interesting.

37:03 Okay.

37:04 Steve Dower has released a preview package of the libraries, but not a standalone executable.

37:12 But it may come out in the future.

37:13 We're going off on a bit of a tangent, but...

37:16 No, but that whole story with Windows on ARM is...

37:18 I would love to see it better handled.

37:21 But it's like you can't even buy it, right?

37:24 Is it supported?

37:25 It's provided to OEMs, but kind of, sort of.

37:28 I don't know.

37:28 It's in a weird state, right?

37:30 It's not normal Windows.

37:31 It's not super supported.

37:33 Yeah.

37:33 I think...

37:34 Who knows?

37:34 I can't speak...

37:35 I do work for Microsoft, so I'm definitely not going to give my opinion.

37:38 Yeah.

37:38 I'm not asking for your...

37:40 This is more of me just making a proclamation.

37:42 I have Windows 11 running on my Mac Mini M1, and it is running the ARM.

37:47 But to get it, I had to go join the Insiders program and then install it.

37:52 And it's permanently got this, like, watermark that it's a tainted version, but you can try

37:57 to use it.

37:58 And, you know, it works fine, but it's kind of sluggish and whatnot.

38:01 So, anyway.

38:02 Hopefully, that comes along better.

38:04 Yeah.

38:04 So, it sounds like it's supported on the various things.

38:07 Yeah.

38:07 I have .NET 6 installed and Python 3.10 installed.

38:11 Neither of those have to be messed with, right?

38:14 You've got the PEP for Python, and you've got just the JIT.

38:17 Vanilla installations, yeah.

38:19 Yeah.

38:19 That's beautiful.

38:19 And so, give us a little walkthrough on, like, how we might use this.

38:23 What do I have to do to change my code to get these compilation steps?

38:27 Wherever your code starts running, you need to import pyjion and then call the enable function

38:32 on pyjion.

38:33 Okay.

38:33 There's also a config function, which configures different settings in terms of how Pyjion

38:38 runs.

38:39 Like that threshold, for example, like how many times before you compile.

38:43 Yeah.

38:43 The hot code threshold, the optimization level, which is a level between zero and two, which is

38:49 like how aggressive the optimizer is.

38:51 And you can also enable or disable the profiler.
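As a rough sketch of what that looks like at the top of a program: the enable and config functions are the ones named in the conversation, but the keyword names passed to config below (threshold, level, pgc) and the values chosen are assumptions to check against the Pyjion documentation. The import is guarded so the snippet still runs where Pyjion is not installed.

```python
try:
    import pyjion  # third-party; needs Python 3.10 and .NET 6
except ImportError:
    pyjion = None  # lets this sketch run without Pyjion installed

def enable_jit(threshold=100, level=1, profiler=True):
    """Turn the JIT on if Pyjion is importable; return whether it was."""
    if pyjion is None:
        return False
    # Keyword names and values here are assumptions; check the Pyjion docs.
    pyjion.config(threshold=threshold, level=level, pgc=profiler)
    pyjion.enable()
    return True

def total(n):
    # Ordinary Python; nothing in the function itself changes.
    return sum(i * i for i in range(n))

jit_on = enable_jit()
print(jit_on, total(10))
```

The point of the design is that only the entry point changes; the rest of the code base is left exactly as it was.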

38:54 Yeah.

38:54 I remember in C, way back when I was doing C for projects, there were these different optimization

39:01 levels.

39:02 Yeah.

39:02 And it was like, well, you can go to one or two and it'll probably still work.

39:06 If you go too far, it'll just crash.

39:08 It's like, what is going on?

39:10 Like, I don't understand.

39:11 But okay, we'll just, we'll dial it back until it's stable.

39:14 And that's as fast as we can make it go.

39:16 Yeah.

39:17 This is the same.

39:18 This is my recommendation is go as hard as you can without it catching fire.

39:22 Take a step back.

39:23 Go with that.

39:24 The start, maybe start at zero.

39:26 I know it defaults to one.

39:28 Yeah.

39:28 So yeah, don't turn it up to 11, but yeah.

39:31 And then the profiler is on by default, which I may disable in the future because the profiler

39:37 probably causes the most issues where you've got a function, which ran with integers a thousand

39:44 times.

39:45 And then all of a sudden somebody gave it some floating point numbers.

39:48 It won't crash.

39:49 It will just, it will either fall back to default path or it will raise an exception

39:53 to say that it got some values, which it didn't expect.

39:55 And if you do see that, it's called a, it'll tell you in the error message and it will suggest

40:00 that you turn the profiler off and then rerun the code.

40:03 You know, for me, I feel like that would suggest to me that maybe I should go check my code.

40:07 Yeah.

40:08 Not always, but often if I got something like I'm trying to take these things and add them

40:12 and get what I think is the result.

40:14 And I'm trying to do math, not string concatenation.

40:17 And I get a string, chances are that's actually a mistake, not something that I wanted to take

40:23 account for.

40:23 It could be, but probably not.

40:24 Yeah.

40:25 I actually saw a useful example of this yesterday on, on Twitter.

40:28 Somebody shared adding two floating point numbers in Python, A and B, A plus B is not the

40:34 same as B plus A.

40:35 They actually give different results, which is crazy.
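The tweet itself isn't quoted here, so what it showed is unknown. For ordinary finite floats, swapping the two operands of a single addition gives the same result; what does change the result is how three or more terms are grouped, and that may be what the example demonstrated. A quick check with arbitrary values:

```python
a, b, c = 0.1, 0.2, 0.3

# Swapping the operands of one addition does not change the result.
commutes = (a + b) == (b + a)

# Regrouping three terms can, because each step rounds separately.
left = (a + b) + c
right = a + (b + c)
print(commutes, left, right, left == right)
```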

40:38 But yeah, when you work with floating point numbers and, and integers and you, you don't

40:43 mean to, but you end up with different types, you will get some weird results anyway.

40:47 Interesting.

40:47 Yeah.

40:48 Yeah.

40:48 So yeah, in terms of whether this will work, I've compiled some pretty big libraries and

40:53 they've worked fine.

40:54 Pandas, Flask, Django. I've chucked a lot of CPython's test suite at

40:59 Yeah.

40:59 this as well.

41:00 So yeah, it will run, I guess, about 50,000 tests or something.

41:05 I think quite happily.

41:06 That's really awesome.

41:06 A lot of CPython tests, testing specific internal things in CPython.

41:12 So some of them do fail, but it's not anything that Pyjion's done.

41:15 And pytest works as well.

41:17 So yeah, there's a lot of big libraries.

41:20 NumPy works fine.

41:22 I have tests for NumPy, tests for Pandas, tests for Flask, Django, all the stuff that I'd

41:27 expect people to try is in there.

41:29 If you're working with a lot of C extension modules, they also do work.

41:32 Cython extensions work.

41:34 So in terms of like compatibility, that was one of the main things I wanted to focus on

41:39 was instead of going super aggressive with the optimizations, I just want to make sure

41:42 this works with existing code because there are lots of other projects which.

41:47 Right.

41:47 We have PyPy already, P-Y-P-Y, which is also a JIT compiler.

41:52 And it works in not with the .NET backing the JIT, but some of like the hot functions getting

41:58 compiled versus just running in Python.

42:00 That kind of stuff is pretty similar now, but they made the big trade-off like we're going

42:04 to just go all in on compiling and we're going to not necessarily integrate with the C APIs

42:08 in the same way, which means breaking with some of these things like NumPy or Pandas that

42:13 people sometimes care about.

42:14 Yeah.

42:14 And they also have to play catch up on the language features.

42:17 So if there's new features in 3.9, 3.10 in the language, like new operators or whatever,

42:24 then PyPy has to then go and implement that, which is hard.

42:27 But Pyjion is, you load it into CPython.

42:30 So like in terms of language, it would be exactly the same.

42:33 Right.

42:33 Often a lot of those language features are just syntactic sugar over existing things,

42:38 right?

42:39 Yeah, exactly.

42:39 And then if there's anything which is not compatible, like I mentioned, async and await, then it will

42:44 default back to CPython.

42:45 And that transition is seamless.

42:47 And you wouldn't, you won't notice it will just, it will just run the code regardless.

42:51 Awesome.

42:51 So it looks like, I'll say it works for the most part.

42:55 I haven't totally tried it, but it sounds like it works quite extensively.

42:58 The way you use it is you pip install pyjion and then just import pyjion, pyjion.enable()

43:02 is option one.

43:04 And then that's it, right?

43:05 There's nothing else that you have to do.

43:06 Nothing else you have to do.

43:07 You just run the Python code and it all just automatically spots stuff that it should compile

43:13 and compile it for you.

43:14 Fantastic.

43:14 And then another option I see on the page here is I can say pyjion space some Python

43:19 file and not necessarily modify the file, but tell it to execute.

43:23 What does it do?

43:24 Import pyjion, pyjion.enable(), eval, something like that.

43:30 Yeah, basically it's a very small script.

43:34 So yeah, pyjion is a standalone command that you can run instead of, so instead of running

43:38 Python, you run pyjion against a script or a module and all the arguments should work as

43:44 normal.

43:45 Awesome.

43:45 You also have the -m for built-in stuff, right?

43:48 Yeah.

43:48 So if you want to run a script, then you'd run pyjion and then the name of the script.

43:52 If you want to run a module like pytest, for example, then you would do pyjion -m pytest

43:57 and it would run pytest with the JIT enabled.

44:01 Yeah.

44:01 Fantastic.

44:02 Or Flask or something like that, right?

44:04 Yeah, exactly.

44:04 Yeah.

44:05 So I guess the dash M would work with external libraries, right?

44:07 Long as like Python can see them.

44:09 Yeah.

44:10 And I've shipped a WSGI extension as well so that you can use it in WSGI apps.

44:16 I think that that's an interesting use case, actually.

44:20 So when I run my regular Python code, it just loads up and runs.

44:24 But when I run Flask or FastAPI or Django or Pyramid or whatever, there's all sorts of

44:30 layers of indirection or layers of not directly running it, right?

44:35 In production, you would say, hey, I want uWSGI or Gunicorn to run this.

44:39 Like for FastAPI, it would be, I want Gunicorn to run this, but with Uvicorn workers and run

44:44 five of them and like, boom, boom, boom.

44:46 Now you've described like this chain of events, right?

44:50 Yeah.

44:50 But it sounds like there's, what, middleware to make this work still?

44:55 Yeah.

44:55 It's a WSGI middleware that is for Pyjion, which will do the enabling and disabling.
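The shape of such a middleware is easy to sketch with nothing but the WSGI calling convention. The class below is a hypothetical stand-in, not the middleware Pyjion ships: it takes the enable and disable callables as parameters so that it runs without Pyjion, and the tiny hello_app exists only to exercise it.

```python
class JitMiddleware:
    """Hypothetical WSGI middleware: run `enable` before the wrapped app
    handles a request and `disable` afterwards."""

    def __init__(self, app, enable, disable):
        self.app = app
        self.enable = enable
        self.disable = disable

    def __call__(self, environ, start_response):
        self.enable()
        try:
            return self.app(environ, start_response)
        finally:
            self.disable()

def hello_app(environ, start_response):
    # Minimal WSGI application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

events = []
app = JitMiddleware(
    hello_app,
    enable=lambda: events.append("enable"),
    disable=lambda: events.append("disable"),
)

statuses = []
body = app({"REQUEST_METHOD": "GET"}, lambda status, headers: statuses.append(status))
print(body, statuses, events)
```

Wrapping at the WSGI layer means the server process (uWSGI, Gunicorn, and so on) is launched as usual and the JIT is switched on inside each worker when a request arrives.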

45:01 Fantastic.

45:02 So that sounds like it works for any WSGI app, Flask, Django, Pyramid, et cetera.

45:06 What about ASGI apps?

45:08 Well, due to the lack of async and await support, then no.

45:12 It doesn't really make much sense, right?

45:13 Because like the big thing that it does is like not the thing that's supported, right?

45:18 Yeah.

45:18 I mean, if it's an async function, then it will just give it back to CPython.

45:22 I'm sure there's a lot of synchronous things happening in those various places, right?

45:28 Maybe the view method itself is async, but it might call a whole bunch of, you know,

45:32 give me the headers and the cookies synchronously.

45:34 Who knows?

45:35 Yeah, exactly.

45:35 It also depends on the nature of the program as to whether Pyjion's actually going to make

45:40 a difference to their performance.

45:42 Yeah.

45:43 So that's kind of where I'm up to at the moment is different benchmarks and running Pyjion

45:50 against some standard benchmarks.

45:52 I shared the N-body execution time at my PyCon talk, and that was 33% faster.

45:58 It's now 65% faster.

46:01 So I've doubled that.

46:02 Oh, nice.

46:02 Gain.

46:03 So, however, most people aren't calculating the position of planets.

46:08 But the few who are, they'll be super thrilled.

46:12 Yeah.

46:13 For the few people who are and are doing it in Python, the system doing it in Python.

46:16 Yeah.

46:17 Then they'll be delighted.

46:18 So there are, so code, which is doing a lot of math and is in pure Python would be faster

46:24 up to 20% fast, 20 times, not 20%, 20 times faster.

46:30 I've got some micro benchmarks that do like simple calculus and stuff like that.

46:35 And they're 20 times faster with floating point numbers.

46:39 And for, I'll say, small integers, because an integer in Python, an int in Python is a

46:46 called a...

46:47 It's an unbounded thing.

46:48 It's bounded by your memory, right?

46:50 Yeah.

46:50 It's actually a list of digits.

46:51 It's not...

46:51 Yeah.

46:52 So it can have like an almost infinitely large number inside it.

46:56 Whereas CPUs work with 32-bit or 64-bit numbers.

47:00 And the other languages, instead of keep growing, they just go like, we broke.

47:04 So instead of going one more, it goes like negative 2 billion.

47:07 Yeah.

47:08 Yeah.

47:09 You get funny overflows and stuff.
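The contrast between Python's unbounded int and a fixed-width machine integer can be shown with ctypes, which stores a value in a real 32-bit slot without overflow checking. A small sketch:

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647

# Python ints simply grow.
python_result = INT32_MAX + 1

# Forcing the same value into a 32-bit signed integer wraps it around.
wrapped = ctypes.c_int32(INT32_MAX + 1).value
print(python_result, wrapped)
```

This is why a JIT that wants to use native integers has to prove, or check, that a value stays within the machine word before it drops the Python object.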

47:11 Yeah.

47:11 So one of the challenges I've had with Pyjion is trying to optimize integers, but trying to

47:18 understand where it potentially could be a very big number and where the number is like

47:23 five.

47:24 What's five times five?

47:25 You don't need to allocate half a meg of memory to do five times five.

47:30 Yeah.

47:31 Yeah.

47:32 So that's one of the challenges.

47:33 So if you're working with integers and you're working with floating point numbers and you're

47:37 doing a lot of math, then Pyjion will make a dramatic difference.

47:39 There's also a feature called the graph, which will create .graphviz files for the functions

47:48 that it's compiled.

47:49 And you can see this on the website.

47:51 So if you go to live.trypyjion.com.

47:53 You've got this interactive live website, right?

47:55 Yeah.

47:56 So I've kind of made a sort of live demo site where you can type Python code in and click

48:01 compile and then it will show you.

48:04 Oh, don't change it, Michael.

48:05 You broke it.

48:06 It's going to be fine.

48:08 And then do you compile?

48:10 It's going to be fine.

48:11 I got faith in you.

48:11 Now I can delete it.

48:13 It catches fire.

48:13 Okay.

48:14 That's the assembly that it has compiled that Python code into.

48:18 Okay.

48:18 And a fun thing I actually added was there's a comment in assembly, which says which byte

48:23 code this is for, which is fun.

48:25 If you scroll down on the page, you see this graph.

48:28 I got to do my screen.

48:29 Press the IL and then keep going down.

48:31 Okay.

48:32 There we go.

48:33 Here we go.

48:33 I don't have a big enough screen there.

48:35 Here we go.

48:36 Okay.

48:37 So this instruction graph gets generated if you enable graphing.

48:42 It's on the documentation.

48:44 But when you enable graphing, it will show you all the Python byte codes.

48:48 And then what values are being sent between those byte codes.

48:51 So this is basically the output of the profiler.

48:54 It can see here that you've got A and B.

48:57 B was a float.

48:58 A is an integer.

49:00 And then it's doing a multiplication operation.

49:03 And it knows that if you multiply a float by an integer, then the output is a float.

49:06 So it carries that value through the graph.
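Carrying a type through the graph amounts to a small inference pass over the expression. The sketch below does this over Python's own AST for a deliberately tiny subset (names, numeric constants, and +, -, *); it is an illustration of the idea, not how Pyjion represents its instruction graph, and result_type is a hypothetical helper.

```python
import ast

def result_type(node, var_types):
    """Infer int/float for a tiny subset of expressions: names, numeric
    constants, and + - * between them."""
    if isinstance(node, ast.Name):
        return var_types[node.id]
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return type(node.value)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub, ast.Mult)):
        left = result_type(node.left, var_types)
        right = result_type(node.right, var_types)
        # Mixing int and float yields float; int with int stays int.
        return float if float in (left, right) else int
    raise TypeError("unsupported expression")

expr = ast.parse("a * b + 1", mode="eval").body
inferred = result_type(expr, {"a": int, "b": float})
print(inferred.__name__)
```

Once every intermediate value has a known numeric type, none of them needs to exist as a Python object, which is what makes the unboxing step described next possible.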

49:09 PyPy actually does things quite similar to this.

49:11 And then once the profiler has run, it will look at the graph and then make some decisions

49:18 about which values don't need to be Python objects.

49:22 So for example, A times B.

49:24 Right.

49:25 There's these intermediate binaries here and stuff, right?

49:29 This in-place add, all those, yeah?

49:30 So yeah, exactly.

49:31 So all of those values do not need to be Python objects.

49:35 So what it will do on the next pass is it will recompile that.

49:39 And then it will, what's called, unbox them.

49:41 So it basically just carries them in CPU registers as integers and floats.

49:46 And then instead of running the C function to add two numbers together, it will just emit the assembly instruction to add two numbers on the register.

49:54 Yeah, that's fantastic.

49:55 That's what you're talking about where you don't have to drop back into allocating Python numbers if you know for sure that no one's going to look at it.

50:04 It's just an intermediate value.
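
To get a feel for why unboxing helps, compare the size of a full Python float object with the size of a bare C double. This does not show Pyjion's unboxing itself; it is just a rough illustration of the per-object overhead, and the exact numbers depend on the platform and the CPython build.

```python
import struct
import sys

boxed_size = sys.getsizeof(1.5)    # bytes used by a Python float object
raw_size = struct.calcsize("d")    # bytes used by a bare C double
print(boxed_size, raw_size)
```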

50:06 And this is where it gets tricky with integers, right?

50:08 Because A and B might be small, but A to the power of B might be larger.

50:12 Exactly.
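
A small sketch of the problem: both operands fit in a 64-bit register, but the result of the power operation does not. The helper `fits_in_int64` is a hypothetical name used only for this illustration.

```python
def fits_in_int64(n):
    """Return True if n can be stored in a signed 64-bit register."""
    return -(2 ** 63) <= n < 2 ** 63

a, b = 10, 30
checks = (fits_in_int64(a), fits_in_int64(b), fits_in_int64(a ** b))
print(checks)
```

Python integers grow as needed, so a JIT that unboxes integers has to account for results like `a ** b` that no longer fit.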

50:13 And then it goes one step beyond as well if you have code that uses fast values.

50:19 It's tricky because when you do eval on the website, it will never have used fast locals.

50:24 But if you do have a function that has fast locals, if it detects that that local is only ever used in places where it can be unboxed,

50:33 then it won't actually store the variable as a Python object either.

50:38 It will store it as a native stack value.

50:41 So that's something it does as well. If you have a variable called A and you assign it the number two,

50:46 then it will actually just reserve an area in memory just to store that stack value.

50:52 And it will be an offset.

50:54 Which is way more efficient.

50:55 Enormous, like thousands of times more efficient.

50:58 And when you refer to that variable in your function, it will basically just access that point in memory to get the actual value.
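
The difference between function code and eval-style code is visible in the code objects. In this sketch (the names `uses_fast_locals` and `module_style` are made up), the function has a fast local slot for `a`, while the module-style code looks `a` up by name.

```python
def uses_fast_locals():
    a = 2
    return a * 3

# Code compiled at module level, similar to what an eval box would run.
module_style = compile("a = 2\nresult = a * 3", "<demo>", "exec")

print(uses_fast_locals.__code__.co_varnames)  # fast local slots
print(module_style.co_varnames)               # fast local slots (none)
print(module_style.co_names)                  # names looked up in a dict
```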

51:06 Yeah.

51:07 Yeah.

51:07 Awesome.

51:08 This is really neat.

51:09 One thing that stands out to me here is I wrote name equals Anthony plus Shaw as two strings.

51:14 And that was, you're like, don't do it.

51:16 And yet what I see in the graph here is that it loads the string Anthony Shaw.

51:22 So did Python look at that and then decide that that's actually, those are two constants.

51:28 So we'll just make it one constant or was that .NET or what happened there?

51:32 Yeah.

51:32 That's constant folding is a feature of Python.

51:35 Yeah.

51:35 That's what I thought.

51:36 If you do, yeah, two strings and a plus, it will actually compile those into one string.

51:41 You'd never, you'd never see it.

51:43 Oh, interesting.

51:43 Because they're statics, right?

51:45 Like it knows both of them.

51:46 Yeah.

51:47 Okay.

51:47 Very interesting.
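
Constant folding can be checked directly by compiling a snippet and looking at the constants stored in the code object. This is a sketch that relies on CPython's compile-time optimizer; other implementations may behave differently.

```python
# Compile the assignment without running it and inspect the stored constants.
code = compile('name = "Anthony " + "Shaw"', "<demo>", "exec")
print(code.co_consts)
```

The two string literals are already joined into one constant before any bytecode runs.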

51:48 So it sounds like for numerical stuff, this is quite a bit faster.

51:52 Did you do any tests on the web frameworks?

51:54 I mean, it's kind of appealing to say, what if I could get Flask to kind of run natively?

51:58 Yeah, I have.

51:59 Because when you look at what, when you think about I'm writing this code to make this website go, most of what you're doing is you're doing like a little tiny bit of code on top of a big framework that's doing most of the heavy lifting, right?

52:11 So if you could make Flask do that magic faster or Django or whatever.

52:16 Yeah.

52:18 So the areas where Pyjion is faster are numerical work, similar to PyPy. PyPy is a lot faster with numerical work.

52:25 Yeah.

52:25 It can make clear and simple assumptions and optimize based on the CPU.

52:30 So that's brilliant.

52:31 Areas where it's sometimes not faster, or sometimes even slower, are code which just uses a lot of classes and very small functions.

52:41 Partly just because of the way the PEP is designed, it will JIT compile functions.

52:47 And if your functions are just working with custom classes and you're passing things around, then trying to decide what type things are and then how it can optimize types.

52:57 Yeah.

52:57 If everything is a Python, custom Python object, custom Python class, there's very little it can actually do to optimize.

53:04 Yeah.

53:05 And when you were talking about the specializations earlier, it's one thing to say, well, I'm passing a customer and an order object, but then they have fields themselves, each of which have potential types, right?

53:15 Like it's this, this object graph, this closure of all the object graphs.

53:20 And you got to look at all those types you might touch, right?

53:22 Yeah.

53:23 And then you've also got to check that fields exist.

53:25 It's been set.

53:26 It's not none.

53:27 I mean, like if you Django.

53:28 By the time you're done testing them all, you might as well just run it.

53:31 By the time you've done all of that, you've basically just written what CPython would have done anyway.

53:36 Yeah.

53:36 But the difference is that if you JIT compile it, you've got to emit all those instructions.

53:40 And the compiled function, the JIT compiled function ends up being bigger because, you know, if it's compiled in C, it just has one function that does that, that's shared by all libraries.

53:51 Whereas in the JIT, you have to make it so that it's their standalone functions.

53:56 So that's one downside is that if you're working with stuff which is similar to Django and Flask, I guess like lots of classes, lots of variables, which are all custom types, probably not going to see a performance improvement or potentially it could even be slower.

54:10 Will it be transitive?

54:11 If I write code that runs in Flask, let's just pick one of them.

54:14 And I run Flask with Pyjion.

54:17 Will that then make all my code that Flask is executing also run in Pyjion?

54:22 Yeah.

54:23 So maybe if I was doing like highly computational stuff in my Flask app, having to do that might be worthwhile.

54:29 Yeah, definitely.

54:29 And in that case, you can just enable Pyjion before those functions or you can set the threshold to be higher.

54:36 That's probably what makes more sense, I think.
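
Enabling the JIT only around the computational part might look roughly like this. It is a sketch under assumptions: it assumes the `pyjion` package is installed and exposes `enable()` and `disable()`, it falls back to plain CPython if the import fails, and `heavy_math` is a made-up function. Setting the threshold is left out because the exact call is not described in this conversation.

```python
def heavy_math(n):
    # Numeric loop of the kind discussed: ints and floats only.
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

try:
    import pyjion  # third-party; only available if installed
except ImportError:
    pyjion = None

if pyjion is not None:
    pyjion.enable()   # JIT functions called from here on
result = heavy_math(1000)
if pyjion is not None:
    pyjion.disable()  # back to the regular interpreter
print(result)
```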

54:38 Yeah.

54:38 The other area is strings.

54:40 So if you're doing a lot of work with strings, I haven't done any work on optimizing strings.

54:44 And I'm not particularly sure what you would optimize either because they're so complicated.

54:50 Because you're dealing with Unicode, different encodings, different bit lengths.

54:56 Yeah, I don't even know how you would improve upon CPython's string implementation, to be honest.

55:03 Yeah, there's a lot of nuances to strings in all the languages about it, especially in Python, because you don't have to think about it, right?

55:09 The fact that you don't know how complicated Unicode is and, you know, word alignment and null characters and so on and so on.

55:19 Yeah, if you want a glimpse of how complicated it is, look at the Unicode object source file in CPython.

55:25 Is it big?

55:26 It's an absolute monster.

55:27 I bet it is.

55:29 It's probably more complicated than ceval, actually.

55:31 It's...

55:32 Oh my goodness.

55:33 Yeah.

55:33 All of that for emojis.

55:34 For emojis.

55:35 Okay, I understand there's other languages.

55:37 No, interesting.

55:39 So one thing I wanted to ask you about here, I was hunting around on the screen for it, is it's cool that you can compile these things down and run them as native machine instructions,

55:48 instead of piping them through ceval.c as single line operations.

55:52 That's great.

55:53 But when I think of the stuff that compilers can do and JIT compilers, it's a lot about the optimizations.

56:00 It's like, I saw this function and this function.

56:03 Actually, we're going to inline that one.

56:04 And this one, we're going to rewrite this as some other thing that's going to be more efficient because we're seeing that.

56:11 And so you do have some optimizations, right?

56:13 Like that's part of the magic.

56:14 Yeah.

56:15 So I've kind of documented them as best I can on the Pyjion documentation.

56:19 There's a section called built-in optimizations.

56:21 And I've given them all numbers.

56:23 And if it uses that optimization, it will flag the function to say, oh, I've used this optimization on this function.

56:30 And then in the documentation, I'll explain what's the background.

56:33 What was the idea?

56:34 What difference does it make?

56:36 You want to give some examples from some of these, like the is one, for example?

56:40 Yeah.

56:40 So if you're using the is operator in Python, so if something is false, or you actually probably use not something.

56:49 Is none or something like that.

56:50 Yeah, is none.

56:51 Then it won't run the pyis function.

56:54 It won't run the C API to do an is comparison.

56:57 It will just look at the address of both objects and see if they're the same.

57:01 Because is actually asking, are these objects the same, not are they equivalent, right?

57:06 So the same in CPython is the pointer addresses are equal.

57:10 Yeah, exactly.

57:11 It will just compile that down to a simple pointer comparison.
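
In CPython terms, `is` matches comparing object identities, which is what makes the pointer comparison valid. A minimal sketch with made-up variable names:

```python
a = [1, 2]
b = a         # same object as a
c = [1, 2]    # equal contents, different object

same_object = (a is b, id(a) == id(b))
equal_only = (a is c, a == c)
print(same_object, equal_only)
```

`a` and `c` are equivalent but not the same object, so `is` and `==` give different answers for them.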

57:14 That's actually one of the first ones I wrote, and it was good to learn.

57:18 Also, when it's doing comparisons for small numbers.

57:22 So Python interns, it kind of immortalizes, numbers between, I can't remember the range.

57:28 Negative five and 256?

57:29 Yeah, that's it.

57:30 Because they're used so much that...

57:32 Maybe 255, but basically.

57:34 It's around that many.

57:35 It immortalizes them, and then that keeps them as constant objects so that they're just reused.

57:40 So if you create a new integer with the number one, it will just reuse the same number one that it used before.

57:45 Right.

57:45 Because they're not created on the stack.

57:47 They're like complicated things on the heap.

57:49 So yeah.

57:50 Exactly.

57:50 So if you do, if something equals equals 25, Pyjion will go, oh, well, I know 25 is an interned number.

57:58 So I'm actually just going to do a pointer comparison instead of a value comparison if the left-hand side is a number.
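
The cached range can be observed from Python. This sketch depends on a CPython implementation detail (the small integer cache covering -5 through 256) and is not guaranteed by the language; `is_cached` is a hypothetical helper.

```python
def is_cached(n):
    # Build the integer twice at runtime so the compiler cannot share a constant.
    return int(str(n)) is int(str(n))

print(is_cached(-5), is_cached(256), is_cached(257))
```

Inside the range both conversions return the very same object, which is why an identity check is enough there.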

58:03 So there are sort of little things like this, which make small differences, but when you add up...

58:09 They add up, right?

58:09 Yeah.

58:10 So I felt like, I don't know where you are now from where you were when you spoke at PyCon,

58:14 but I feel like this is an area that people could come contribute to.

58:18 This is an area for growth that doesn't require a lot of changes.

58:22 It could be super focused.

58:23 But if you see this situation, you know, here's one more way to make it faster.

58:27 Absolutely.

58:28 So yeah, this is one area where I was hoping that the research that I was doing whilst writing Pyjion could contribute to other projects.

58:36 And I've been talking to Carl Friedrich Bolz-Tereick as well, who works on PyPy.

58:41 And we're going to do some pair programming at some point to see kind of like how different projects work and stuff like that.

58:50 But kind of hopefully these optimizations can be learned from and then used when CPython gets to implementing its JIT.

58:58 Yeah.

58:58 Surely the .NET compiler has all sorts of optimizations in it.

59:03 Are you already taking advantage of those to some degree?

59:06 Yeah, some of them.

59:07 Just by nature of letting it run, basically.

59:09 Yeah, some of those optimize it.

59:11 It does a lot of them already.

59:13 And I've been working with the compiler team on the .NET project as well.

59:18 Pyjion is one of the only projects to use the compiler directly.

59:22 Even though it can be used in that way, and I think it was designed to be run directly, it's not advertised as an off-the-shelf JIT.

59:32 So yeah, there aren't many projects that are using it in that way.

59:36 And I do work with the compiler team, like specific test cases and stuff as well.

59:40 You do have that advantage of being on the inside at Microsoft, even if you're like half a world away, you still can get direct access remotely, which is like being down the street these days.

59:51 Yeah, I've done everything.

59:52 It's all been via GitHub though.

59:53 So I think as far as they're concerned, I'm just another name on GitHub, but like I don't, it doesn't.

59:59 They might not even know, right?

01:00:00 They probably don't even know.

01:00:02 Yeah.

01:00:02 That I work.

01:00:03 Yeah.

01:00:04 Let's see.

01:00:06 What else?

01:00:07 I think we pretty much have covered it given the time that we got.

01:00:10 I mean, another area that's interesting is how does it compare to the other things that have been done before it or going along a parallel?

01:00:18 And, you know, you do have a whole section on the README on the GitHub page.

01:00:22 Like how does this compare to X, Y, Z, PyPy, Pyston, Numba, IronPython, and so on.

01:00:29 Do you want to maybe just have a quick statement about that?

01:00:31 Yeah, I think it's really hard to compare them.

01:00:33 So probably the most obvious ones to compare it with would be Cython and Numba and PyPy.

01:00:39 So PyPy is a Python interpreter that has a JIT.

01:00:44 So it interprets, compiles, and runs Python code written in Python.

01:00:48 PyPy has been around.

01:00:50 Right.

01:00:50 It's like a fork of Python that was rewritten to behave differently rather than PEP 523, right?

01:00:55 Yeah, exactly.

01:00:56 So it's not CPython. CPython is written in C.

01:00:58 PyPy is written in Python and probably significantly faster in many cases.

01:01:05 It's a very mature project.

01:01:07 But obviously that there's limitations around C APIs and some things don't work in PyPy.

01:01:14 Numba is a JIT.

01:01:16 It's a JIT specific for NumPy.

01:01:19 If you're using NumPy, then if you use Numba, Numba can JIT compile NumPy data arrays and stuff like that.

01:01:27 That's actually a very specific use case.

01:01:29 Right.

01:01:30 That's great.

01:01:30 But if you're not doing NumPy, then not too much.

01:01:33 In any other use case, it would make very little, if any, difference at all.

01:01:37 And then Cython is not a JIT.

01:01:40 It's an AOT compiler.

01:01:42 And it's a way of annotating Python code with concrete types.

01:01:46 And then it compiles them into C.

01:01:49 Concrete compiles them into C extensions.

01:01:50 Yeah.

01:01:51 It's a little bit like what you're talking about, trying to understand what are these types and then can we create a specialization.

01:01:56 But it's the developer who just says, no, these are integers.

01:01:59 This is the list.

01:02:00 For sure.

01:02:01 Go with it.

01:02:02 Yeah.

01:02:02 That's the point where you actually specify the length of the integer as well.

01:02:05 So in Cython, you would say this is a 64-bit integer so that it knows that it can be converted into a 64-bit integer in C.

01:02:14 Pyjion is probably the closest to Cython.

01:02:17 But obviously with Cython, you have to compile it to a C extension ahead of time.

01:02:21 Whereas with Pyjion, you just import it into Python and just turn it on and it just runs and compiles live.

01:02:27 And it will, yeah, you don't have to have different compiled versions of your app or your library, not necessarily because of this anyway, for different platforms, right?

01:02:35 And you don't have to annotate the code in this sort of special syntax either.

01:02:39 Yeah.

01:02:40 Well, let's close this out with a question on that.

01:02:43 So in Python, we've been able to say optionally that this thing is an integer, you know, x colon int, or it's an optional integer or it's a customer or whatever the heck it is.

01:02:54 We've had these type annotations and traditionally they've meant nothing, right?

01:02:58 Except for if you run tools against them.
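
A quick illustration of annotations meaning nothing at runtime: the function below is annotated for integers, yet Python happily runs it with a string. The function `double` is a made-up example.

```python
def double(x: int) -> int:
    return x * 2

# The annotation says int, but nothing stops a string from being passed.
result = double("ab")
print(result)
print(double.__annotations__)
```

The annotations are stored on the function for tools to read, but the interpreter does not check them.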

01:03:00 On the other hand, we've had things come along like Pydantic, like FastAPI that look at that and go, you know what?

01:03:06 I'm going to do something with that because I got a string because this is the web and that's all I usually get.

01:03:10 I'm going to make that into an int because you said it's an int.

01:03:13 Is there a scenario where type annotations play into this to enhance it somehow?

01:03:18 I'm against that idea, potentially.

01:03:21 I mean, that's how Cython works.

01:03:23 Yeah.

01:03:24 I think, and with type annotations as well, having a type checker is great until the types are wrong and it's not a fault of your own.

01:03:33 And having worked on strongly typed languages like C#, C# is brilliant, except when you need to do things like reflection.

01:03:42 So let's say you're working with JSON data or YAML, for example.

01:03:47 And working with JSON and YAML in C# is incredibly hard.

01:03:53 Because you're like, oh, I've had to do parsing YAML files in Java.

01:03:59 It's incredibly difficult because you're like, oh, well, this could be a list of strings or it could be a dictionary where the key is a string, but sometimes it's a number.

01:04:12 Which is like YAML is like that sometimes.

01:04:14 And JSON is like that.

01:04:15 It's completely free.

01:04:16 So when you're working with dynamic content using strongly typed languages is extremely difficult.

01:04:21 It just gets in the way and just makes your life harder.

01:04:23 And the compiler just complains and just won't let you parse because it's saying, oh, well, that's not compliant with my view on the world.

01:04:31 With Python, I think it's cool to say, okay, if I just tag this as an integer, then it should be able to optimize it for integers.

01:04:38 And I think that's what Mark was suggesting.

01:04:39 If it stops there, that's fine.

01:04:41 I think if it goes beyond that, then that's where things get very, very complicated.

01:04:44 And it just becomes a thing that's in the way being noisy and it slows you down.

01:04:50 That's where I have a strong disagreement with it is that we use Python because it's fast to develop in.

01:04:56 And it's fast to develop in because, you know, if you know what you're trying to get it to do, then the compiler doesn't really give you any grief.

01:05:03 If it's syntactically, it's okay.

01:05:04 I'll just try and run it.

01:05:05 Yeah.

01:05:06 And I think the thing that's most similar to this is TypeScript.

01:05:10 And my experiences with TypeScript have been often met with frustration.

01:05:14 I think it's really cool that you can write.

01:05:17 The language specification is really neat, but I want to use this library and that thing and pull them together.

01:05:21 And if that thing doesn't have the type specified just right, I can't get JavaScript to pop out the other side because the transpiler gives you a compiler error.

01:05:30 Like, this doesn't work right.

01:05:31 Well, they know it's going to work right.

01:05:32 Just let this little part go through.

01:05:34 I just, I don't want to re...

01:05:36 I know there's ways to say this is just any and stuff, but still.

01:05:39 I feel like my experience was bumping up against that stuff, even though it's kind of the same as Python type annotations in a lot of ways.

01:05:47 Yeah.

01:05:47 And under the hood, it's still dynamic.

01:05:49 Yeah.

01:05:49 Yeah, exactly.

01:05:50 At runtime, it would still be the same thing, but you won't get to the runtime because you can't get the stuff compiled.

01:05:56 Yeah.

01:05:56 Or it does really weird things at runtime that at compile time, it made assumptions that wouldn't happen.

01:06:01 All right.

01:06:01 Let me throw one quick thing out there before you pass final judgment on this type thing.

01:06:07 Because you're right.

01:06:08 I could write a thing that says my function takes an int and a string, and I could run mypy against my code.

01:06:15 And sure enough, it only takes ints and strings in these situations.

01:06:18 But if it's a library, all bets are off.

01:06:20 There's nothing that says people who use your library are going to run mypy and listen to that or do anything, right?

01:06:27 They could write in Notepad if they had no self-respect, but that would like you could write whatever you want, right?

01:06:32 And then you could just feed it on and then you'd have these problems.

01:06:35 That said, over in the Pydantic world, there is a validator, a decorator called @validate_arguments, that will make sure at runtime they really are a string and an int.

01:06:47 Yes.

01:06:48 Does that make you feel any better?

01:06:49 Probably not, because the transitive closure of objects is too complicated to check and describe.

01:06:54 validate_arguments is, I think, a convenience function.

01:06:57 It's there because you often have to validate data, user-provided data that's coming in from an interface, like a web interface, or via a file, or an API of some sort.

01:07:08 So if people are submitting user-provided data, then you want to have to validate the types.

01:07:13 And that is just a lot of boilerplate code, and it's annoying to write.

01:07:17 As a convenience function, let's write this thing.

01:07:20 Right.

01:07:20 Probably because we're going to immediately try to convert it in the next thing, right?

01:07:24 Yeah.

01:07:24 Yeah, exactly.
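
The kind of boilerplate such a decorator saves can be sketched with the standard library alone. This is not Pydantic's implementation: `check_args` and `label` are hypothetical names, and the sketch only handles plain class annotations such as `int` and `str`.

```python
import functools
import inspect

def check_args(func):
    """Hypothetical runtime validator; handles simple class annotations only."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip unannotated parameters and *args/**kwargs.
            if expected is inspect.Parameter.empty:
                continue
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper

@check_args
def label(count: int, name: str) -> str:
    return f"{count} x {name}"

print(label(3, "widgets"))
```

Calling `label("3", "widgets")` raises a `TypeError` instead of silently continuing, which is the runtime check being discussed.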

01:07:25 So I think it's quite different to using it internally to make changes in terms of how the code is compiled.

01:07:34 Yeah.

01:07:35 For numbers, like I said, numbers and strings and base types, I think is cool.

01:07:39 But for objects, like which fields do you assume exist?

01:07:43 Yes, exactly.

01:07:44 And then do they all have to match the types or just the two that you touch?

01:07:47 Yeah.

01:07:48 You've got an attribute there.

01:07:49 Like sometimes it has a getter and a setter.

01:07:51 Sometimes it's just an attribute.

01:07:53 Sometimes it's a descriptor that inside the descriptor is the thing you actually want.

01:07:58 Is that the right type?

01:07:59 Yeah, you're right.

01:07:59 It's insane.
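
The three cases just listed look identical from the calling side, which is part of what makes them hard to optimize. A small sketch with made-up classes:

```python
class Plain:
    def __init__(self):
        self.value = 1          # ordinary instance attribute

class WithProperty:
    @property
    def value(self):            # getter runs on every access
        return 2

class Computed:
    """Descriptor that produces the value on access."""
    def __get__(self, obj, objtype=None):
        return 3

class WithDescriptor:
    value = Computed()          # class-level descriptor

values = [obj.value for obj in (Plain(), WithProperty(), WithDescriptor())]
print(values)
```

The expression `obj.value` is the same in every case, but what actually runs behind it is different each time.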

01:08:00 It's just, oh yeah.

01:08:01 Once you kind of open, I think that once you open custom types, then it just kind of mushrooms

01:08:06 into this.

01:08:07 Sure.

01:08:07 And that's why you see things like Cython and stuff having like little fragments of here's

01:08:12 some limited code that we're going to use, often fundamental types.

01:08:16 Yeah.

01:08:16 Cool.

01:08:16 All right, Anthony, this is really awesome.

01:08:18 I could tell you've done a massive amount of work on it.

01:08:21 There's more stuff I want to talk to you about, but I think we're out of time, like calling

01:08:26 into .NET, stuff like that.

01:08:27 But we'll save that for some other time, huh?

01:08:29 Yeah.

01:08:30 Until you have a little more progress.

01:08:34 Now, you know, there's always the two questions at the end of the show.

01:08:37 So if you're going to write some code, what editor are you using these days?

01:08:41 VS Code.

01:08:42 Right on.

01:08:42 Yeah.

01:08:43 And notable PyPI library?

01:08:45 I mean, it could be Pyjion.

01:08:47 Anything else?

01:08:48 It's like you come across like, wow, this is awesome.

01:08:50 Yeah.

01:08:50 I think I mentioned it before.

01:08:51 Tortoise, I'm a big fan of at the moment.

01:08:53 Yeah.

01:08:54 Tortoise ORM.

01:08:54 Yeah.

01:08:55 It's a nice async ORM.

01:08:56 Yeah.

01:08:57 And Beanie as well.

01:08:59 I'm really enjoying playing with Beanie as an ODM.

01:09:01 Yeah.

01:09:02 Async ODM on top of Mongo.

01:09:04 Yep.

01:09:04 Beanie is very cool.

01:09:05 I'm actually having Roman Wright on the show talk about Beanie not too far out as well.

01:09:10 And Beanie is cool because it's basically Pydantic plus async and await on top of MongoDB, which

01:09:15 is cool.

01:09:15 Can I have three?

01:09:17 You can have three.

01:09:17 I released like a small package called Hathi, which is a SQL attack tool for Postgres, MySQL, and SQL Server.

01:09:26 I'm looking at your GitHub profile here.

01:09:28 You've got one of these fancy profiles with the new readme that shows up and all of your stars.

01:09:33 Look at this.

01:09:33 Yeah.

01:09:34 Like I made a custom graphic for everything.

01:09:35 Yeah, you did.

01:09:36 That's fantastic.

01:09:37 Okay.

01:09:38 What is this one called?

01:09:38 Hathi.

01:09:39 H-A-T-H.

01:09:41 I have too many repositories.

01:09:42 I need to go up.

01:09:43 Hathi.

01:09:43 Got it.

01:09:44 It is a dictionary attack tool for Postgres, MySQL, and MSSQL, Microsoft SQL Server designed for internal testing, of course.

01:09:53 Don't be bad.

01:09:54 Yeah.

01:09:55 So don't break the law.

01:09:56 And yeah, I've been using it to test like internal, not internal stuff, but test environments and look at like bad passwords.

01:10:05 See if like an admin has a login, like password, password one or admin or super user has a login or whatever.

01:10:11 Yeah.

01:10:12 It was also a test of async and await networking and how fast I could make it.

01:10:17 It can do up to 120 login attempts a second.

01:10:21 So yeah, on my machine.

01:10:23 But if you maybe this is a four core Mac, but yeah, if you had a few more CPUs, you could probably get a bit more than that.

01:10:30 But yeah, it'll go through a few thousand passwords.

01:10:32 And there's a password list in there as well of about 10,000 common database passwords.

01:10:37 Yeah.

01:10:37 Nice.

01:10:38 So yeah, just make sure your password is not on the list.

01:10:40 And if it is, you can raise a pull request to remove it.

01:10:43 Yeah.

01:10:44 Yeah.

01:10:45 You don't want to have your database open.

01:10:48 Like it doesn't get much worse than I just have read write access to your database.

01:10:53 But that's what this would test for, right?

01:10:55 Yeah.

01:10:56 So you can scan a cluster or a machine and see if it can predict what the username password is.

01:11:01 Yeah.

01:11:01 Cool.

01:11:02 All right.

01:11:02 Well, those are three great recommendations.

01:11:04 It's awesome to have you back on the show.

01:11:07 And congratulations on the work here.

01:11:08 You know, final call to action.

01:11:10 You know, people maybe to try Pyjion.

01:11:12 You want them to contribute.

01:11:13 Got other ideas.

01:11:14 What do you say?

01:11:15 Yeah.

01:11:15 Go to trypyjion.com.

01:11:17 Yeah.

01:11:17 On trypyjion.com, you'll see at the top, you can try it out live.

01:11:21 Link to documentation.

01:11:22 Link to download.

01:11:23 Yeah.

01:11:24 I'd love more contributions or discussion, I think, around what optimizations you could have and people using it and checking it out.

01:11:31 And if you have any issues, you just raise them on GitHub and I'll check them out.

01:11:34 All right.

01:11:34 Fantastic.

01:11:35 Well done.

01:11:36 I can tell this is a ton of work.

01:11:37 So you've come really a long ways.

01:11:40 And congrats on 1.0 when it comes out in a few days.

01:11:42 Thanks, Michael.

01:11:43 Yeah.

01:11:43 You bet.

01:11:43 See ya.

01:11:44 See ya.

01:11:45 This has been another episode of Talk Python To Me.

01:11:48 Thank you to our sponsors.

01:11:50 Be sure to check out what they're offering.

01:11:51 It really helps support the show.

01:11:53 Choose Shortcut, formerly Clubhouse.io, for tracking all of your project's work.

01:11:58 Because you shouldn't have to project manage your project management.

01:12:01 Visit talkpython.fm/shortcut.

01:12:04 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

01:12:09 Develop, deploy, and scale your modern applications faster and easier.

01:12:13 Visit talkpython.fm/Linode and click the Create Free Account button to get started.

01:12:17 Do you need a great automatic speech-to-text API?

01:12:21 Get human-level accuracy in just a few lines of code.

01:12:23 Visit talkpython.fm/assemblyai.

01:12:26 Want to level up your Python?

01:12:28 We have one of the largest catalogs of Python video courses over at Talk Python.

01:12:32 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:12:37 And best of all, there's not a subscription in sight.

01:12:40 Check it out for yourself at training.talkpython.fm.

01:12:43 Be sure to subscribe to the show.

01:12:45 Open your favorite podcast app and search for Python.

01:12:47 We should be right at the top.

01:12:49 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:12:54 and the direct RSS feed at /rss on talkpython.fm.

01:12:58 We're live streaming most of our recordings these days.

01:13:02 If you want to be part of the show and have your comments featured on the air,

01:13:05 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:13:10 This is your host, Michael Kennedy.

01:13:11 Thanks so much for listening.

01:13:13 I really appreciate it.

01:13:14 Now get out there and write some Python code.

01:13:16 Thank you.

01:13:16 Bye.
