
#474: Python Performance for Data Science Transcript

Recorded on Thursday, Jul 18, 2024.

00:00 Python performance has come a long way in recent times, and it's often the data scientists,

00:05 with their computational algorithms and large quantities of data, who care the most about this form of performance.

00:11 It's great to have Stan Siebert back on the show to talk about Python's performance for data

00:16 scientists. We cover a wide range of tools and techniques that will be valuable for many Python

00:21 developers and data scientists. This is Talk Python to Me, episode 474, recorded July 18th, 2024.

00:30 Are you ready for your host? Here he is!

00:32 You're listening to Michael Kennedy on Talk Python to Me.

00:35 Live from Portland, Oregon, and this segment was made with Python.

00:39 Welcome to Talk Python to Me, a weekly podcast on Python.

00:46 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy,

00:50 and follow the podcast using @talkpython, both accounts over at fosstodon.org.

00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.

01:01 If you want to be part of our live episodes, you can find the live streams over on YouTube.

01:05 Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about

01:11 upcoming shows. This episode is sponsored by Posit Connect from the makers of Shiny.

01:16 Publish, share, and deploy all of your data projects that you're creating using Python.

01:21 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:27 Posit Connect supports all of them. Try Posit Connect for free by going to talkpython.fm/posit.

01:33 And it's also brought to you by us over at Talk Python Training. Did you know that we have

01:41 over 250 hours of Python courses? Yeah, that's right. Check them out at talkpython.fm/courses.

01:48 Hey, Stan.

01:49 Hello.

01:49 Hello, hello. Welcome back to Talk Python to Me.

01:51 I'm glad to be here. Glad to talk performance.

01:53 I know. I'm excited to talk performance. It's one of those things I just never

01:57 get tired of thinking about and focusing on. It's just so multifaceted. And as we will see,

02:04 even for a language like Python that is not primarily performance-focused, there's a lot

02:08 to talk about.

02:09 Yeah, there's an endless bag of tricks.

02:11 Yeah. And I would say, you know, sort of undercut my own comment there. Python is

02:16 increasingly focusing on performance since like 3.10 or so, right?

02:21 Mm-hmm. And the secret is that it's because Python integrates so well with other languages,

02:26 it sort of always cared about performance in some way. It's just sometimes you had to

02:30 leave Python to do it, but you still got to keep the Python interface.

02:33 There's been such an easy high-performance escape hatch that making Python itself faster is

02:39 obviously not unimportant, but maybe not the primary focus, right? Like usability, standard

02:43 library, et cetera, et cetera. All right. For people who have not heard your previous episode,

02:49 let's maybe just do a quick introduction. Who's Stan?

02:52 Yeah. So I am Stan Siebert. I am a manager at Anaconda, well-known purveyor of Python

02:58 packages and such. My day-to-day job is actually managing most of our open source developers

03:03 at Anaconda. So that includes the Numba team. And so we'll be talking about Numba today,

03:06 but other things like we have people working on Jupyter and BeeWare for mobile Python and

03:11 other projects like that. And so that's what I do mostly is focus on how do we have an impact

03:16 on the open source community and what does Python need to sort of stay relevant and keep evolving.

03:22 I love it. What a cool job as well, position in that company.

03:24 Yeah. I'm really grateful. It's a rare position, so I'm really glad I've been able to do it for

03:28 so long.

03:29 Yeah. And I would also throw out in the list of things that you've given a shout out to,

03:33 I would point out PyScript as well.

03:36 Oh yes, of course. Yeah. I just started managing the PyScript team again, actually. And so

03:40 I forgot about that one too. Yes. PyScript. So Python in your web browser, Python everywhere,

03:44 on your phone, in your browser, all the places.

03:46 Yeah. I mean, it's a little bit out of left field compared to the other things that you

03:50 all are working on, but it's also a super important piece, I think. So yeah, really cool.

03:56 Really cool there. So I think we set the stage a bit, but maybe let's start with Numba. That

04:04 one's been around for a while. Some people know about it. Others don't. What is Numba and how do

04:09 we make Python code faster with Numba?

04:11 Yeah. So there've been a lot of Python compilation projects over the years. Again, Numba's very

04:16 fortunate that it's now 12 years old. We've been doing it a long time and I've been involved

04:20 with it probably almost 10 of those years now. And Python, I think one of Numba's success

04:25 points is trying to stay focused on an area where we can have a big impact. And that is

04:29 trying to speed up numerical code. So there's a lot of, again, in data science and other

04:34 sciences, there's a lot of need to write custom algorithms that do math. And Numba's sweet

04:39 spot is really helping you to speed those up. So we see Numba used in a lot of places

04:43 where maybe the algorithm you're looking for isn't already in NumPy or already in

04:47 JAX or something like that. You need to do something new. Projects like UMAP, which do

04:52 really novel sort of clustering algorithms, or I just at SciPy, I learned more about a

04:56 project called STUMPY, which is for time series analysis. Those authors were able to

05:00 use Numba to take the numerical core of that project that was the sort of the time bottleneck

05:06 and speed it up without having to leave Python. And so that is, I think, really where Numba's

05:10 most effective.

05:12 Sure. If you look at a lot of programs, there might be 5,000 lines of code or more, but

05:18 even just something only as big as 5,000 lines, there's a lot of code, but only a little bit

05:23 of it really actually matters, right?

05:25 Yeah. That's what we find a lot is when you sit down and measure your code, you'll spot

05:31 some hotspots where 60 or 70 or 80% of your time is spent in just like three functions,

05:35 or something. And if that's the case, that's great because you can just

05:40 zero in on that section for speeding things up and not ruin the readability of the rest

05:44 of your program. Sometimes optimization can make it harder to read the result. And so

05:49 there's always a balance of you have to keep maintaining this project. You don't want to

05:53 make it unreadable just to get 5% more speed.

05:55 Yeah, absolutely. Not just the readability, but the ability to evolve it over time, right?

06:02 So maybe it's like, "Oh, we're going to compile this section here using Numba or

06:08 Cython or something like that." Well, maybe I was going to use this cool new

06:11 API package I found, but I can't just jam it in there where it's compiled. That's

06:16 unlikely to work well, right? And things like that. And so, yeah, a lot of times there's

06:22 these big sections that look complicated. They look slow. They're not actually.

06:26 Yeah. And one thing I also often emphasize for people is that when you think about the

06:30 time your program takes, think about the time you spent working on it as well as the time

06:34 you spent running it. And so, because we've heard from a lot of projects who said they

06:38 were able to get major speedups, not necessarily because Numba compiled their code to be incredibly

06:44 fast, but it compiled it to be fast enough that they could try new ideas quicker. And

06:49 so they got to the real win, which was a better way to solve their problem because they weren't

06:54 kind of mired in just kind of boilerplate coding for so long.

06:58 Right, right, right. It turns out I learned I should use a dictionary and not a list.

07:02 And now it's a hundred times faster. And that wasn't actually a compiling thing. That was

07:07 a visibility thing or something, right?

07:09 Yeah. Try more things is always helpful. And so something that a tool that lets you do

07:13 that is really valuable.

07:14 A hundred percent. So what tools do you recommend for knowing? Because our human intuition sometimes

07:20 is good, but sometimes it's really off the mark in terms of thinking about what parts

07:24 are slow, what parts are fast.

07:25 That's something I definitely, when I've talked to people, everyone thinks they know where

07:28 the hot, slow part is, but sometimes they're surprised. And so you definitely,

07:32 before you do anything, and this is not just Numba advice, this is any time before

07:36 you're going to speed up your program: measure something. So what you want is you want a

07:40 representative benchmark, something that's not going to run too fast because often, you

07:45 know, like unit tests run too quickly to really exercise the program in a realistic

07:49 way. So you want a benchmark that doesn't run too long, but maybe like five minutes

07:53 or something. And then you're going to want to run that through a profiling tool.

07:57 And there are several options. I just usually tell people to use cProfile. It's built into

08:01 the standard library in Python. It's a great tool. It does the job for most stuff. And

08:05 so sometimes there may be other tools, things like SnakeViz and other things to

08:08 help you interpret the results of the profile, but often you'll use cProfile to collect

08:12 the data. And what this does is it samples, it sort of records as the program is running,

08:18 what are all the functions that are being called and how much time are

08:22 they taking? And there are different strategies for how to do this, but fundamentally what

08:26 you get out is essentially a dataset that says, you know, 2% of the time in

08:32 your program this function was running, and 3% of the time this function was running. And you

08:36 can just sort that in descending order and look and see where, what pops out at the top.

08:41 And sometimes you're surprised. Sometimes you find out it's actually, it wasn't my

08:44 numerical code. It's that I spent, you know, 80% of my time doing some string operation

08:48 that I didn't realize I needed to do over and over again.

08:50 - Right, right. Exactly. Some weird plus equals with a string was just creating a thousand

08:57 strings to get to the endpoint or something like that. Yeah.

08:59 - Yeah. And I could have just done that once up front. It's good to do the profiling just

09:02 to make sure there isn't an obvious problem before you get into the more detailed optimization.
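As a minimal sketch of that measure-first workflow with the standard library's cProfile, assuming a made-up build_report function as the stand-in hot spot (nothing here is from the show itself):

```python
import cProfile
import pstats

def build_report(rows):
    # Hypothetical hot spot: repeated string concatenation in a loop.
    out = ""
    for r in rows:
        out += f"{r}\n"
    return out

def main():
    build_report(list(range(50_000)))

# Record which functions run and how long they take, then sort in
# descending order so the expensive ones pop out at the top.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```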

09:08 - Yeah. Before you start changing your code completely, its execution method or whatever.

09:13 - Yep.

09:13 - Yeah. Yeah. And, you know, shout out to the PyCharm folks. You just push the

09:17 button to profile and they visualize it and they just run cProfile right in there.

09:21 So that's like cProfile on easy mode. You know, you get a spreadsheet and you get a

09:25 graph. What about other ones like Fil or anything else? Like any other recommendations?

09:30 - Yeah. So that's an interesting point: cProfile is for compute time profiling. An

09:36 interesting problem you run into is what this tool handles, which is memory profiling, which

09:41 is often a problem when you're scaling up. And that's actually one of the other good things to

09:45 keep in mind when you're optimizing is what am I trying to do? Am I trying to get done faster?

09:48 Am I trying to save on compute costs? Am I trying to go bigger? And so I have to speed things up so

09:53 that I have room to put more data in. If that's where you're going, you might want to-

09:56 - Or am I just out of memory, right? Can I just not do-

09:59 - Yeah. Or am I already stuck? And so there it is very easy in Python to not recognize when you

10:05 have temporary arrays and things. Because again, it's also very compact and you're not seeing

10:09 what's getting allocated. You can accidentally blow up your memory quite a lot. And so this kind

10:14 of a profiler is a great option there, and what it can often show you is a kind of line-by-

10:20 line view: this is how much memory was allocated on each line of your program. So you can see, oh,

10:25 that one line of pandas, oops, that did it. - Yeah. I can't remember all the details. I

10:31 talked to Itamar about this one, but I feel like it also keeps track of the memory used even down

10:37 into like NumPy and below, right? Not just Python memory where it says now there's some opaque blob

10:44 of data science stuff. - Yeah. And actually even on the compute part, there's sort of two approaches. So cProfile is focused on counting function time,

10:52 but sometimes you have a long function and if you're making a bunch of NumPy calls,

10:56 you might actually care line by line how much time is being taken. And that can be a better

11:00 way to think about it. And so I think the tool is called LineProfiler. I forget the exact URL, but

11:06 it's an excellent tool in Python for this; there's one in R and there's an equivalent one. Yes, Robert

11:13 Kern's line_profiler. There you go. - line_profiler. Oh, it's archived, but still can be used. Yeah. - I have to find another tool now. This has been my go-to for so long.

11:22 I didn't realize it had already been archived. Oh, there's a- - Hey, if it still works, it's all good.

11:26 It's all good. - It's been transferred to a new location. So that's where it lives now. But yeah,

11:31 line profiling is another; I often use them as complementary sort of tools. I zero in on

11:35 one function with cProfile, and then I'll go line-profile that function.

11:39 Oh, interesting. Yeah. Okay. - Drill in further.

11:41 Yeah. Like, okay, this is the general area. Now let's really focus on it.
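A small sketch of drilling in with line_profiler used programmatically; the kernprof command-line wrapper is the more common route, and the smooth function below is invented purely for illustration:

```python
import numpy as np
from line_profiler import LineProfiler

def smooth(a):
    # Several NumPy calls in one function; per-line timing shows which dominates.
    b = a - a.mean()
    c = np.convolve(b, np.ones(5) / 5, mode="same")
    return c / (c.std() + 1e-9)

lp = LineProfiler()
wrapped = lp(smooth)               # wrap just the function we want timed line by line
wrapped(np.random.rand(1_000_000))
lp.print_stats()                   # per-line hit counts and timings
```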

11:45 Memray is another one. I talked to the folks from Bloomberg about that.

11:48 Oh, okay. I have not used this one. Yeah. This is a pretty new one and it's quite neat the way

11:53 it works. It's, yeah, this one actually tracks C and C++ and other aspects of allocations as well.

12:01 So one of the problems you can run into with profiling, especially memory profiling,

12:06 I think, is that the more you monitor it,

12:09 the more it becomes kind of a Heisenberg quantum mechanics type thing. Once you observe it,

12:14 you change it. And so the answers you get by observing it are not actually what are happening.

12:19 So you got to keep a little bit of an open mind towards that as well. Right?

12:23 Yeah. And that's even a risk with, you know, the compute side of the profiling:

12:27 you're using some compute time to actually observe the program, which means that it can,

12:32 and these tools try to subtract out that bias, but it does impact things. And so you may want

12:38 to have kind of a benchmark that you can run as your kind of real source of truth that you run

12:43 without the profiler turned on just to see a final run time, then run with the profiler to break it down.

12:48 And then when you're all done, you're going to want to run that program again with the profiler off

12:52 to see if you've actually improved it, wall-clock-time-wise.
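A tiny sketch of that profiler-off sanity check using the standard library's timeit; the benchmark function is a placeholder for your own representative workload:

```python
import timeit

def benchmark():
    # Stand-in for a representative workload.
    return sum(i * i for i in range(200_000))

# Wall-clock timings with no profiler attached; compare before and after a change.
runs = timeit.repeat(benchmark, number=1, repeat=5)
print(f"best: {min(runs):.3f}s  worst: {max(runs):.3f}s")
```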

12:55 Yeah. Yeah, absolutely. That's a really good point. Maybe do a %timeit type of thing,

13:00 something along those lines. Okay. That was a little bit of a side deep dive into profiling

13:05 because you, before you apply some of these techniques like Numba and others, you certainly

13:11 want to know where to apply it. And part of that is you might need to rewrite your code a little

13:16 bit to make it more optimizable by Numba or these things. So first of all, like, what do you do to

13:23 use Numba? Right. It's just, you just put a decorator on there and off you go.

13:27 At the very simplest level, Numba's interface is supposed to be just one decorator. Now there's

13:31 some nuance obviously and other things you can do, but we tried to get it down to, for most people,

13:35 it's just that. And the N in njit means no Python, meaning this code is not calling

13:41 the Python interpreter anymore at all. It is purely machine code, no interpreter access.
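A minimal sketch of that one-decorator interface; the numeric kernel here is made up, but the shape is typical:

```python
import numpy as np
from numba import njit

@njit  # "no python" JIT: compiled to machine code, no interpreter in the loop
def pairwise_sum(a):
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(i + 1, a.shape[0]):
            total += a[i] * a[j]
    return total

x = np.random.rand(2_000)
pairwise_sum(x)         # first call triggers type inference and compilation
print(pairwise_sum(x))  # later calls run the already-compiled machine code
```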

13:46 Interesting. Okay. So some of these compilers do a thing and compile your Python code to

13:52 machine instructions. I feel like they still interact with like Py object pointers and they

13:57 still kind of work with the API of the Python data types, which is nice, but it's a whole,

14:06 whole lot slower of an optimization than now it's int32 and it's, you know, float32 and these are

14:12 on registers, you know? Yeah. And this is part of the reason why Numba focuses on numerical code

14:17 is that NumPy arrays and actually other arrays and PyTorch and other things that support the

14:21 buffer protocol, the Python buffer protocol. So really when Numba compiles this, it compiles sort of two functions.

14:27 One is a wrapper that handles the transition from the interpreter into no Python land, as we call

14:31 it. And then there's the core function that is kind of like you could have written in C or Fortran

14:35 or something. And that wrapper is actually doing all the Py object stuff. It's reaching in and

14:40 saying, ah, this, this integer, I'm going to pull out the actual number and oh, this NumPy array,

14:44 I'm going to reach in and grab the data pointer and pass those down into the core where the actual,

14:49 all the math happens. So the only time you interact with the interpreter is really at the edge.

14:54 And then once you get in there, you try not to touch it ever again. Now, Numba does have a feature

14:58 that we added some years ago called an object mode block, which lets you in the middle of your

15:02 no Python code, go back and actually start talking to the interpreter again. Right. Maybe use a

15:07 standard library feature or something. Yeah. The most common use we've seen is you like,

15:11 you want a progress bar to update or something that's not in your, you know, hot loop. You

15:15 don't want it. Right. You don't want to be going back to the interpreter in something that's really

15:18 performance critical, but inside of a function, you might have parts that are more or less,

15:22 you know, one out of a million iterations. I want to go update the progress bar or something

15:25 that's totally valid. And you can do that with Numba. That is, there's a way to get back to the interpreter if you really need to.
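A rough sketch of that object mode escape hatch, assuming a simple progress print as the interpreter-only work:

```python
import numpy as np
from numba import njit, objmode

@njit
def long_running(a):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * a[i]
        if i % 1_000_000 == 0:
            with objmode():  # briefly drop back to the interpreter
                print("processed", i, "elements")  # stand-in for a progress bar update
    return total

long_running(np.random.rand(3_000_000))
```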

15:29 Okay. Yeah. And it says it takes and translates Python functions

15:34 to optimized machine code at runtime, which is cool. So that makes deploying it super easy

15:39 and you don't have to have like compiled wheels for it and stuff using industry standard LLVM

15:44 compilers and then similar speeds to C and Fortran. Yeah. Which is awesome, but also has

15:52 implications if I can speak. For example, when I came to Python, I was blown away that I could

16:00 just have integers as big as I want. If I keep adding to them, they just get bigger and bigger

16:05 and like billions, bazillions of, you know, bits of accuracy. And I came from C++ and C#, where

16:14 you explicitly said it's an int32, it's an int64, it's a double. And these all had ranges of

16:20 valid numbers. And then you got weird like wraparounds and maybe you create an unsigned

16:24 one so you can get a little bit bigger. I suspect that you may fall victim or be subjected to these

16:32 types of limitations without realizing them in Python, if you add njit, because you're

16:37 back in that land. Right. Or do you guys do magic to allow us to have big integers?

16:42 We do not handle the big integer, which is what you're describing as sort of that

16:46 integer that can grow without bound because our target audience is very familiar with NumPy.

16:50 NumPy looks at numbers sort of the way you're, you described from C++ and other languages.

16:55 The D type and all that stuff, right? Yeah. NumPy arrays always have a fixed

16:59 size integer and you get to pick what that is, but it has to be 8, 16, 32, 64. Some machines

17:04 can handle bigger, but it is fixed. And so once you've locked that in,

17:10 if you go too big, you'll just wrap around and overflow. Yeah. So that limitation

17:13 is definitely present again in Numba, but fortunately NumPy users are already familiar

17:17 with thinking that way. So it isn't an additional constraint on them too much.
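A small sketch of the difference being described: Python's own integers grow without bound, while NumPy's (and therefore Numba's) fixed-width integers wrap around:

```python
import numpy as np
from numba import njit

print(2**64 + 1)                      # a plain Python int just keeps growing

a = np.array([2**62], dtype=np.int64)
print(a * 4)                          # fixed-width int64 silently wraps around

@njit
def keep_doubling(n, times):
    # n is inferred as a fixed-width integer inside Numba,
    # so it can overflow instead of growing like a Python int.
    for _ in range(times):
        n = n * 2
    return n

print(keep_doubling(2**62, 2))
```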

17:22 This portion of talk Python to me is brought to you by Posit, the makers of Shiny, formerly RStudio

17:29 and especially Shiny for Python. Let me ask you a question. Are you building awesome things?

17:35 Of course you are. You're a developer or data scientist. That's what we do. And you should

17:39 check out Posit Connect. Posit Connect is a way for you to publish, share, and deploy all the

17:45 data products that you're building using Python. People ask me the same question all the time.

17:50 Michael, I have some cool data science project or notebook that I built. How do I share it with my

17:55 users, stakeholders, teammates? Do I need to learn FastAPI or Flask or maybe Vue or React JS?

18:02 Hold on now. Those are cool technologies and I'm sure you'd benefit from them, but maybe stay

18:07 focused on the data project. Let Posit Connect handle that side of things. With Posit Connect,

18:11 you can rapidly and securely deploy the things you build in Python. Streamlit, Dash, Shiny,

18:17 Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs. Posit Connect supports all of them.

18:24 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise

18:29 requirements. Make deployment the easiest step in your workflow with Posit Connect. For a limited

18:35 time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.

18:40 That's talkpython.fm/posit. The link is in your podcast player show notes.

18:46 Thank you to the team at Posit for supporting Talk Python.

18:49 And then one thing you said was that you should focus on using arrays.

18:55 Yes.

18:56 And that kind of data structures before you apply Numba JIT compilation to it. Does that mean

19:03 list as in bracket or these NumPy type vector things? We all have different definitions.

19:10 Yes. That's true, array can mean a few things. Generally, yeah. Usually the go-to I talk about is a NumPy array.

19:15 So it has a shape. The nice thing, NumPy arrays can be multidimensional. So you can represent a

19:20 lot of complex data that way. But within an array, there's a fixed element size. That element could

19:26 be a record. So if you want for every cell in your array to store maybe a set of numbers or

19:30 a pair of numbers, you can do that with custom D types and things. And Numba will understand that.

19:35 That's the ideal data structure. Numba does have, we added support a couple of years ago

19:40 for other data structures because the downside to a NumPy array is that it's fixed size. Once you

19:45 make it, you can't append to it like you can a Python list. So Numba does have support for both

19:51 what we call typed lists and typed dictionaries. So these are sort of special cases of lists and

19:56 dictionaries in Python where the, in the case of a list, every element in the list has the same type

20:02 or in the case of a dictionary, the keys are all the same type and the values are all the same type.

20:06 And those cover a lot of the cases where, you know, when users want to make things where they

20:11 don't know how long it's going to be, you're going to append in the algorithm. A list is a much more

20:16 natural thing than a NumPy array, where you might, like, over-allocate or something.

20:19 And dictionaries, our dictionary implementation is basically taken straight from CPython's

20:24 dictionary implementation. So it's very tuned and very fast in the same way CPython's is.

20:29 We just had to modify it a little bit to add this type information, but it's really good for kind

20:34 of lookup random items kind of stuff. So those are available as additional data structures

20:39 in addition to the array. And to use those, I would say from Numba import and something like this.

20:44 They're new types; in the docs, I'll show you, you can sort of import a typed list as a

20:49 special class that you can create. The downside by the way, is that, and the reason we have those,

20:55 and we don't just take, historically Numba used to try and let you pass in a Python list,

21:00 is that wrapper function would have to go recursively through the list of list of lists

21:04 of whatever you might have and pop out all of the elements into some format that wasn't all

21:09 Py objects so that the no Python code could manipulate them quickly. And then how do you

21:14 put it all back if you modify that sort of shadow data structure? And so what we realized is that

21:19 was confusing people and actually added a lot of overhead and calling functions took too long.

21:23 So we instead went up a level and said, okay, we're going to make a new kind of list that you

21:27 at the interpreter level can opt into for your algorithm. And so accessing that list from Python

21:32 is slower than a Python list, but accessing it from Numba is like a hundred times faster.

21:36 So you kind of have to decide: while I'm in this mode, I'm optimizing for Numba's

21:43 performance, not for the Python interpreter performance. Which is reasonable often I'd

21:47 imagine because this is the part you found to be slow.

21:49 Yeah. That's the trade-off you make. And so, yeah. So we would not suggest people use typed

21:54 list just in random places in their program. It's really intended to be used. Yeah.

21:59 I heard this is fast, so we're just going to replace them all. Like, new rule: bracket bracket is

22:04 disallowed. We're not using this one, right? Yeah. When you're working with Python objects,

22:08 Python's data structures can't be beat. They are so well-tuned that it's very,

22:13 very hard to imagine something that could be faster than them.
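A minimal sketch of those typed containers; the two functions are invented examples of the "append as you go" and "count things" patterns:

```python
import numpy as np
from numba import njit, types
from numba.typed import List, Dict

@njit
def positive_values(a):
    # Typed list: every element shares one type, so no PyObjects are involved.
    out = List.empty_list(types.float64)
    for x in a:
        if x > 0:
            out.append(x)
    return out

@njit
def count_labels(labels):
    # Typed dict: one key type, one value type, tuned like CPython's dict.
    counts = Dict.empty(key_type=types.int64, value_type=types.int64)
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return counts

print(len(positive_values(np.random.randn(1_000))))
print(count_labels(np.array([1, 2, 2, 3, 3, 3], dtype=np.int64)))
```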

22:16 All right. So maybe one more thing on Numba we could talk about is so far, I imagine people have

22:22 been in their mind thinking of, I have at least running on CPUs, be that on Apple Silicon or

22:28 Intel chips or AMD or whatever, but there's also support for graphics cards, right?

22:34 Yes. Yeah. So for a very long, I mean, we've had this again for 10 plus years. We were very early

22:40 adopters of CUDA, which is the programming interface for NVIDIA GPUs. CUDA is supported

22:45 by every NVIDIA GPU, whether it's a low end gamer card or a super high end data center card,

22:50 they all support CUDA. So that was really nice for people who were trying to get into GPU

22:54 programming. You could use inexpensive hardware to learn. And so on both Windows and Linux,

22:59 Macs haven't had NVIDIA GPUs for a long, long time now, but on Windows and Linux, you can

23:04 basically write what they call a CUDA kernel in pure Python. And it just, you know, and you can

23:10 pass up, you know, arrays, either NumPy arrays, which then have to be sent to the card or special

23:15 GPU arrays that are already on the card. That is a great way for people to learn a bit more about

23:20 GPU programming. I will say Numba might not be the best place to start with GPU programming in

23:25 Python because there's a great project called CuPy, C-U-P-Y, that is literally a copy of NumPy,

23:33 but does all of the computation on the GPU. And CuPy works great with Numba. So I often tell

23:38 people, if you're curious, start with CuPy, use some of those NumPy functions to get a sense of,

23:44 you know, when is an array big enough to matter on the GPU, that sort of thing. And then when you

23:48 start wanting to do more custom algorithms, Numba is where you kind of turn to for that second

23:53 level.
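A rough sketch of that progression, starting with CuPy's NumPy-style functions and then a hand-written Numba CUDA kernel; this only runs on a machine with an NVIDIA GPU and the CUDA toolkit available:

```python
import cupy as cp
from numba import cuda

x = cp.arange(1_000_000, dtype=cp.float32)  # array allocated on the GPU
y = cp.sqrt(x) + 1.0                        # NumPy-style math, computed on the GPU

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)          # this thread's global index
    if i < arr.size:
        arr[i] *= factor

# CuPy arrays expose the CUDA array interface, so the Numba kernel can
# work on them in place without copying back and forth.
threads_per_block = 256
blocks = (y.size + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](y, 2.0)
print(float(y[:5].sum()))
```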

24:01 Yeah. So I feel like I'm referencing a lot of Itamar's work over here, but what if we didn't have an NVIDIA GPU? Is there anything we could do? Yeah. So there are other projects. So

24:06 things like, as I mentioned here, like PyTorch and things are, have been ported to a number of

24:11 different backends. This is one thing the Numba team, we are frequently talking about is how do

24:16 we add non-NVIDIA GPU support, but I don't have an ETA on that. That's something

24:21 that we just still are kind of thinking about, but PyTorch definitely. And you can use PyTorch as an

24:27 array library. You don't have to be doing machine learning necessarily. You can use it for fast

24:32 arrays. It's just most popular for, because it supports, I mean, JAX is a very similar thing

24:36 because it adds the extra features you want for those machine learning models, but at the core

24:40 of every machine learning model, it's just array math. And so you could choose to just do that if

24:44 that's what you want. And then you could even still pass those arrays off to Numba at some point in

24:47 the future. Yeah. I didn't realize there was integration with that as well. Yeah. A while

24:55 back, we kind of worked with a number of projects to define a GPU array interface that's used by a

24:59 number of them so that we can see each other's arrays without having to copy the data, which is

25:03 very helpful. Yeah. Yeah. We have a lot more topics than Numba, but I'm still fascinated

25:06 with it. So, you know, one of the big things, all the rage now, is the vector databases,

25:13 obviously, because I want to query my way through LLM outputs. Like where in a hundred thousand

25:20 dimensional space does this question live or whatever? Is there any integration with that

25:25 kind of stuff back into Numba? Numba, not directly, although Numba does have interfaces,

25:30 oh, an easy way to call out to C functions. So a lot of these vector databases are implemented in,

25:36 you know, C or C++ or something. And so if you did have a use case where you needed to call out

25:40 to one of them, if there was a C function call to make directly to sort of the underlying library

25:45 that bypassed the interpreter, you can do that from Numba. And so I haven't seen anyone do that

25:49 yet, but it's, it's a generic sort of C interface. Yeah. Maybe there's a database driver written in

25:55 C, in which case, I don't know all the different databases. I know there are some that are

25:58 specifically built for it. Maybe DuckDB has got something going on here, but also MongoDB has,

26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at Lance DB

26:09 is one I've seen mentioned by used by a couple of projects. That's just for vector stuff. It

26:14 doesn't do anything else. Lance DB. Lance DB. Okay. I heard about it in the context of another Python

26:20 LLM project. Well, that's news to me, but it is a developer friendly open source database for AI.

26:26 Okay. Brilliant. All right. Well, like I said, we have more things to talk about.

26:30 So many things, but this is super. Okay. One more thing I want to ask you about here before we go

26:37 on, this has a Cython feel to it. Can you compare and contrast Numba to Cython? So Cython uses,

26:44 sort of requires you to put in some type information in order to be able to generate C

26:48 code that is more efficient. Numba is mainly focused on type inference. So we try to figure

26:55 out all the types in your function based on the types of the inputs. And so in general, we,

27:00 although Numba has places where you can put type annotations, we generally discourage people from

27:05 doing it because we find that it adds work and is error prone and doesn't really help the performance

27:10 in the end. Numba will figure out all the types directly. And so when it JIT compiles it, if it

27:15 comes in, if you call it twice with different types, does it just say, well, now we're going

27:19 to need a, this version of the function underscore strings list rather than integers list or

27:24 something? Yeah. Yeah. Every Numba compiled function actually contains a dispatcher that

27:28 will look at the argument types and pick the right one. And it's at pretty high granularity.

27:32 For example, people who are familiar with multidimensional arrays and like Fortran and C

27:37 know that they lay out the rows and columns in a different order, which has impact on how you do

27:42 loops and stuff for, for kind of to maximize locality. Numba can tell the difference between

27:46 those two cases and will generate different code for those two cases. So this is stuff that you as

27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole another

27:56 level. So you were like, okay, well, if it's laid out in this order, it's probably this, it appears

28:02 on the L1, you know, the local cache for the CPU in this way. And so if we loop in that direction,

28:07 we'll like iterate through the cache instead of blow through it every loop or something like that.

28:11 Basically, we want to make sure that LLVM knows when the step size is one, and that's either on

28:16 the row or the column access, depending on that. And because, because I mean, compilers in general

28:20 are magic. Like we are grateful that LLVM exists because they can do so many tricks at that level.

28:26 I mean, because I mean, this is the same thing that powers Clang and other stuff. So, you know,

28:29 all of, all of macOS compilers are built on LLVM. And so we can leverage all of the tricks

28:35 they've figured out in decades of development. Yeah, that's cool. And Python itself is compiled

28:40 with that, at least on macOS. I just saw it last night, you know, that I have Clang some version,

28:46 whatever. And I was just looking at the version for my, my Python. It was compiled with that.

28:49 Cool. Okay. So we've been on the Numba JIT. Anthony Shaw wrote an article, Python 3.13

28:57 gets a JIT. And this is a pretty comprehensive and interesting article on, on what's going on

29:03 here. What's your experience with this, this JIT coming to Python 3.13?

29:07 This is a, and they've definitely tried to set expectations here that this first release is

29:11 really planting a flag is to say, we're going to start building on top of this base. And so

29:17 as far as I've seen the benchmarks for 3.13 are not going to be like the world on fire kind of

29:22 stuff. Throwing away our C code and rewriting operating systems and JIT-ed Python.

29:27 But you have to take a first step. And this is honestly pretty, it impressed me because

29:32 as a library, we can take a lot of risks and do things that I know we can depend on LLVM. We can

29:37 do all sorts of stuff that may be not work for everyone because if Numba doesn't solve your

29:41 problem, you just don't use it. You can just leave it out of your environment. You don't have to

29:45 install it. And so it's easy to kind of us to zero in on just the problems that we're good at and

29:50 say, if there, you know, if this doesn't solve your problem, just leave us out. When you actually

29:54 are putting a JIT into the core interpreter, everyone gets it. And you have to consider,

29:59 and Python is so broad that, you know, you could grab two Python experts and they may actually have

30:05 nothing in common with each other, but both have an equal claim to being experts at using

30:10 Python, but they might use them in different domains and have different libraries they care

30:14 about and all of that. - I feel that way when I talk to pandas people and I think about me doing like web development, APIs and stuff. I'm like,

30:22 I think I'm really good at Python generally, you know, good enough to write good apps.

30:26 But then I look at this, I'm like, I don't even really, this, some of these like expressions that

30:31 go into the filter brackets. I'm like, I didn't even know that that was possible or really how

30:37 to apply, you know, like, it's just, it's weird to both feel really competent and understand it,

30:41 but also have it kind of no idea. And what I think what you're getting at is those are two

30:46 really different use cases and they're getting the same JIT and it has to work for both of them. But

30:50 you know, combinatorically explode that problem, right?

30:53 - Yeah. And you know, all the different hardware, I mean, Numba supports a lot of different

30:56 computers, but not every one that Python supports. - Like MicroPython.

31:00 - Yeah. Or we don't work on, you know, HP UX or anything like that necessarily. Python has an

31:06 enormous support, range of supported platforms, an enormous set of use cases and anything you do

31:11 is going to affect everyone. So this approach, which I would say this copy and patch JIT

31:15 approach is really clever because one of the, you know, again, Numba has to bring, we build a,

31:20 you know, custom version of LLVM. It's a little stripped down, but it's mostly still there.

31:23 So we have to bring that along for the ride. That's a heavy, heavy dependency to put on the

31:27 core interpreter for everyone. So the clever bit here is they figured out how to have a JIT,

31:33 but still do all the compiler stuff at build time. So when you build this copy and patch JIT,

31:38 you actually need LLVM, but only at build time and then it can go away. And so the person who

31:42 receives the interpreter doesn't need LLVM anymore. And so they basically built for themselves a

31:46 bunch of little template fragments. This is the patching part is basically you're saying,

31:51 I've got a bunch of fragments that implement different op codes in the bytecode different

31:55 ways, and I'm going to string them together and then go in. And there's a bunch of fill in the

32:00 blank spots that I can go in and swap in the, okay, you get your value from here and then you

32:06 put it over here and all that. But the, but the actual machine code was generated by LLVM by the

32:10 person who built Python in the first place. I see. It just amazes me this works. And I'm,

32:15 I'm excited to see where they go with it because it was a clever way to avoid

32:19 adding a huge heavy dependency to Python, but still get some of that JIT benefit.

32:23 - So it looks at some of the common patterns. I see, okay, we're looping over a list of loads

32:30 or something and replaces that with more native code or something along those lines.

32:35 - Yeah. Yeah. You essentially have a compiler print a bunch of little recipes that are,

32:40 if I do this pattern, sub in this machine code, fill in these blanks, and it just has a table of them. So the challenge there is that there is a combinatorial

32:48 explosion again of how many different, a full blown compiler like LLVM has a bunch of rules.

32:55 It's rule-based. And so it's saying, if I see this pattern, I do this replacement and it keeps doing

33:00 all of this. And then at the end it says, okay, now I'm going to generate my machine code from

33:03 that, those transformations. If I don't have LLVM at runtime, I have to kind of figure out what are

33:08 the best templates up front, put them in this table. And then, and so there, this is where

33:13 honestly looking at different usage patterns will probably be a huge help is in practice,

33:18 you could have any sequence of byte codes, but in reality, you're probably going to have certain

33:21 ones a lot. And those are the ones you want to focus on. So I think once we, I don't know,

33:26 once we start getting this out in the community and start getting feedback on it, I'm curious to

33:29 see how rapidly it can evolve. That'll be really interesting. - Yeah. And this whole copy and

33:33 patch JIT is, we often hear people say, I have a computer science degree.

33:38 And I think what that really means is I have a software engineering degree in, or I am a software

33:43 engineering person. They are not often creating new computer science theories. They're

33:51 more like, I really understand how operating system works and programmers and compilers. And I,

33:55 I write JSON API. So I talk to databases. This is like true new research out of legitimate computer

34:02 science, right? This copy-and-patch JIT. - Yeah. They mentioned, I mean, they cite a paper from

34:05 2021 and in computer science, going from paper to implementation in one of the most popular

34:11 languages on earth in three years seems pretty fast. - It does seem pretty fast, right? It

34:16 definitely seems pretty fast. And the reason I bring this up is I imagine that dropping it into

34:21 one of the most popular programming languages with a super diverse set of architectures and

34:26 use cases will probably push that science forward. - Yeah. This will be tested in one of the most

34:32 intense environments you could imagine. - You know, I mean, whatever they did for their,

34:36 their research paper or their dissertation or whatever, this is another level of putting it

34:41 into a test and experimentation to put it into Python. Yeah. Wild. Okay. Hopefully this speeds

34:47 things up. This is going to be interesting because it just happens, right? It just happens.

34:51 If you have Python 3.13, it's going to be looking at its patterns and possibly swapping out.

34:55 - This is complementary to all of the techniques. The faster CPython folks have been doing all

35:00 along for many releases now is they've been looking at other ways to speed up the interpreter

35:05 without going all the way to a full blown compiler, which this is kind of getting you the final step.

35:10 So that's again, another interesting place is how does this compliment those? You know,

35:15 I don't know those, those details, but it's another tool in the toolbox to sort of go back

35:19 to the beginning. It's speed is about having a bunch of tools and you kind of pick up 5% here

35:24 and 5% there and you pile up enough 5% and pretty soon you have something substantial.

35:29 - Yeah, I absolutely, that was the plan, right? The faster CPython was to make it

35:33 multiples of times faster by adding 20% improvements, release over release,

35:38 over release and compounding percentages basically. The one thing I don't know about this

35:43 is we've had the specializing adaptive interpreter that was one of those faster CPython things that

35:49 came along. You know, is this just the next level of that or is this a replacement for that? I don't

35:54 know. I'm sure people can. - Yeah. I don't know. I don't know what the, what their roadmap is for that. Cause I think part of this is, this is so new. I think

36:00 they got to see how it works in practice before they start figuring out.

36:04 - I agree. It feels like an alternative to that specialized and adaptive interpreter,

36:08 but I don't know. Maybe some of the stuff they've learned from one made it made possible

36:12 or even as just an extension of it. Okay. What did we want to talk about next here? I think.

36:16 - You want to talk about threads? - No, let's talk. I want to talk about,

36:19 I want to talk about Rust really quick before we talk about then, because that'll be quick.

36:23 And then I want to talk about threads because threads will not be as quick. And it's super

36:27 interesting. It's been a problem that people have been chipping at for, for years and years and

36:33 years, the threads thing. But what do you think about all this Rust mania? I mean, it's shown

36:38 some real positive results, things like Ruff and Pydantic and others, but it's actually a little

36:44 bizarrely controversial or maybe not bizarre, non-obviously controversial.

36:48 - Yeah. I mean, my take on the Rust stuff is I view it in the same light as when we use C and

36:52 Fortran historically, it's just Rust is a nicer language in many ways. And so being a nicer

36:57 language means it's certainly, you know, you could have taken any of these things and rewritten them

37:02 in C a long time ago and they would have been faster. You just didn't want to write that C code.

37:06 - Exactly. You know what? We could do this in assembler and it would fly, you guys.

37:11 - Yeah. So Rust is moving things, Rust is lowering the bar to saying, okay, maybe I'll implement

37:18 the core of my algorithm outside of Python entirely. It's interesting. And honestly,

37:22 I would happily see Rust completely replace C as the dominant extension language in Python.

37:28 The trade-off here, and this is one of those things that's sometimes easy to forget again,

37:32 because the Python community is so diverse, is when you do switch something to Rust, you do

37:37 reduce the audience who can contribute effectively in some cases. That Python,

37:43 using Python to implement things has a benefit for the maintainers if it lets them get more

37:48 contributions, more easily onboard new people. I hear this a lot actually from academic software

37:55 where you have this constant rotating, you know, students and postdocs and things. And so how

38:00 quickly you can onboard someone who isn't a professional software developer into a project

38:03 to contribute to it is relevant. And so, but I think it's different for every project. There

38:09 are some things like, you know, again, Rust and cryptography makes total sense to me because

38:14 that's also a very security conscious thing. You really don't want to be dealing with C buffer

38:18 overflows in that kind of code. And so the guarantees Rust offers are valuable also.

38:23 Well, and I think that that's also makes sense, even outside of just security directly,

38:29 you're going to build a web server. It's a nerve wracking thing to run other people's code

38:33 on an open port on the internet. And so this is better. One of the things I switched to is,

38:39 I recently switched to Granian for a lot of my websites, which is a Rust HTTP server. It's

38:46 comparable in performance, slightly faster than other things, but it's way more,

38:50 it's deviation from its average is way, way better.

38:55 So it's just more consistent.

38:57 More consistent, but also, you know, like the average, for example, the average versus the

39:02 third-party server. That's the one I want. So against uWSGI, for example,

39:07 right. It's six milliseconds versus 17 milliseconds. So like, whatever. But then

39:12 you look at the max latency is 60 versus three seconds. It's like, oh, hold on. Right. But the

39:16 fact it's written in Rust, I think, feels like a little bit of extra safety, all other things

39:21 being equal. Right. And that, I mean, obviously a lot of caveats there.

39:25 Yeah. Actually the interesting point about, and this is not unique to Rust, this is again,

39:29 the same problem with C and other things is that it's a little bit interesting. On one hand,

39:32 we're pushing the Python interpreter and JITs and all this other stuff at the same time as you're

39:36 thinking about whether to pull code entirely out of Python. And it creates a barrier where the JIT

39:42 can't see what's in the Rust code. And so if there was an optimization that could have crossed that

39:46 boundary, it's no longer available to the compilers. Yeah. This is a problem the Numba team

39:51 has been thinking about a lot because our number one request, aside from, you know, other GPUs,

39:57 is can Numba be an ahead of time compiler instead of a just-in-time compiler? And we were like,

40:02 superficially, yes, that's straightforward. But then we started thinking about the user experience

40:06 and the developer experience. And there are some things that you lose when you go ahead of time

40:10 that you have with the JIT and how do you bridge that gap? Yeah, it gets tricky.

40:14 We've been trying to figure out some tooling to try and bridge that. So we, at SciPy, we did a

40:19 talk on a project we just started called Pixie, which is a sub-project of Numba that is trying to,

40:24 which doesn't have Rust support yet, but that's been one of the requests. So if you go to

40:28 github.com/numba/pixie, see if they've indexed it. Oh, they're perfect. Okay.

40:33 Search engines. Search engines are pyramidic.

40:36 They really are.

40:36 But yeah, but Pixie, we gave a talk about it at SciPy. It's very early stages,

40:42 but what we're trying to do is figure out how to, in the ahead of time compilation,

40:45 whether that's C or Rust or even Numba, eventually, capturing enough info that we can

40:51 feed that back into a future JIT so that the JIT can still see what's going on in the compiled code

40:56 as kind of a future-proofing ecosystem.

40:58 Yeah, that's cool. I know some compilers have profiling-based optimization type things.

41:04 Like you can compile it with some instrumentation, run it, and then take that output and feed it back

41:09 into it. And I have not, I don't know if I've ever practically done anything with that, but I'm like,

41:14 "Oh, that's kind of a neat idea to like, let it see what it does and then feed it back."

41:17 Is this sort of like that or what do you think?

41:19 Yeah, this is different. This is sort of, this is basically capturing in the library file. So

41:24 you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and

41:29 embed it into your JIT, which might be have other LLVM bit codes. So then you can optimize, you can

41:35 have a function you wrote in Python that calls a function in C and you could actually optimize

41:39 them together, even though they were compiled at different times, implemented in different

41:43 languages, you could actually cross that boundary.

41:45 One's like ahead-of-time, just standard compilation. And one is like a JIT thing,

41:49 but it's like, "Oh, we're going to click it together in the right way." Yeah, yeah. Because JITs are nice in that they can see everything that's going on,

41:55 but then they have to compile everything that's going on and that adds time and latency and

41:59 things. And so can you have it both ways? Is that's really what we're trying to do.

42:03 It's nice when you can have your cake and eat it too, right?

42:05 Yes.

42:06 My cake before my vegetables and it'd be fine.

42:08 I said that this Rust thing was a little bit controversial. I think there's some just,

42:14 "Hey, you're stepping on my square of Python space with a new tool." I don't think that has

42:20 anything to do with a Rust per se. It's just somebody came along and made a tool that is now

42:25 doing something maybe in better ways, or I don't know. I don't want to start up a whole debate

42:29 about that. But I think the other one is what you touched on is if we go and write a significant

42:34 chunk of this stuff in this new language, regardless what language it is, Rust is a

42:39 relatively not popular language compared to others. Then people who contribute to that,

42:44 either from the Python side, we're like, "Well, there's this big chunk of Rust code now that I

42:47 don't understand anything about, so I can't contribute to that part of it." And you might

42:52 even say, "Well, what about the pro developers or the experienced core developers and stuff?

42:57 They're experienced and pro at C in Python, which is also not Rust, right? It's this new area that

43:03 is more opaque to most of the community, which I think that's part of the challenge.

43:08 Yeah. Some people like learning new programming languages and some don't. So on some hand,

43:12 Rust can be, "This is a new intellectual challenge and it fixes practically some

43:16 problems you have with C." Or in other cases, it's the, "I wanted to worry about what this

43:20 project does and not another programming language." Right, right, right.

43:23 Kind of have to look at your communities and decide what's the right trade-off for you.

43:26 Maybe in 10 years, CPython will be RustPython and it'll be written in Rust. I mean,

43:32 if we move to WebAssembly and like PyScript, Pyodide land a lot, having that written in,

43:36 there's a non-zero probability, but it's not a high number, I suppose. Speaking of something

43:42 I also thought was going to have a very near zero probability. PEP 703 is accepted. Oh my

43:49 goodness. What is this?

43:50 Yeah. So this was, again, a couple of years ago now, or I guess a year ago, it was finally

43:54 accepted. So for, since very long time, the Python interpreter has, because again, threads

44:00 are an operating system feature that let you do something in a program concurrently. And

44:05 now that all of our computers have four, eight, even more cores, depending on what kind of

44:11 machine you have, even your cell phone has more than one core. Using those cores requires

44:16 you have some kind of parallel computing in your program. And so the problem is that you

44:21 don't want, once you start doing things in parallel, you have the potential for race

44:25 conditions. You have the two threads might do the same thing at the same time or touch

44:30 the same data, get it inconsistent. And then your whole program starts to crash and other

44:34 bad things happen. So historically, the global interpreter lock has been sort of sledgehammer

44:39 protection of the CPython interpreter. But the net result was that threads that were

44:44 running pure Python code basically got no performance benefits. You might get other

44:48 benefits. Like you could have one block on IO while the other one does stuff. And so

44:52 it was easier to manage that kind of concurrency. But if you were trying to do compute on two

44:56 cores at the same time in pure Python code, it was just not going to happen because every

45:01 operation touches a Python object, has to lock the interpreter while you make that

45:04 modification.

45:05 Yeah. You could write all the multi-threaded code with locks and stuff you want, but it's

45:08 really just going to run one at a time anyway. A little bit like preemptive multi-threading

45:13 on a single core CPU. I don't know, it's weird. I've added all this complexity, but

45:17 I haven't got much out of it.

45:18 The secret of course, is that if your Python program contained not Python, like C or Cython

45:23 or Fortran, as long as you weren't touching Python objects directly, you could release

45:27 the GIL. And so Python, so especially in the scientific and computing and data science

45:31 space, where multi-threaded code has been around for a long time and we've been using

45:35 it and it's fine, Dask, you can use workers with threads or processes or both. And so

45:40 I frequently will use Dask with four threads and that's totally fine because most of the

45:44 codes in NumPy and Pandas, that release the GIL. But that's only a few use cases. And

45:48 so if you want to expand that to the whole Python interpreter, you have to get rid of

45:52 the GIL. You have to have a more fine-grained approach to concurrency. And so this proposal

45:58 from Sam Gross at Meta was basically a, one of many historical attempts to kind of make

46:05 that, get rid of that global interpreter lock. Many have been proposed and failed historically.

46:09 So getting this all the way through the approval process is a real triumph. At the point where

46:15 it was really being hotly contested, my, you know, maybe slightly cynical take is we have

46:20 between zero and one more chance to get this right in Python. Either it's already too late

46:26 or this is it. And I don't know which it is. I think there were two main complaints against

46:32 this change. Complaint number one was, okay, you theoretically have opened up a parallel

46:38 compute thing. So for example, on my Apple M2 Pro, I have 10 cores, so I could leverage

46:45 all of those cores, maybe get a five times improvement. But single core regular programming

46:52 is now 50% slower. And that's what most people do and we don't accept it. All right. That's

46:58 one side of it, you know; the Gilectomy and all that was kind of in that realm, I believe.

47:02 The other, which is yet to be determined, I think, is much like the Python 2 to 3 shift.

47:08 The problem with Python 2 to 3 wasn't that the code of Python changed. It was that

47:13 all the libraries I like and need don't work here. Right. And so what is going to happen

47:18 when we take half a million libraries that were written in a world that didn't

47:23 know or care about threading and are now subjected to it?

47:26 Yeah. And there's sort of two levels of problem there. There's one that there's work that has

47:30 to be done to libraries, usually ones with C extensions that, you know,

47:34 assumed a global interpreter lock, and they'll have to make some changes for that.

47:38 But the other one is a much more kind of cultural thing where the existence of the

47:43 GIL just meant that Python developers just wrote less threaded code.

47:46 Yeah. They don't think about locks. They don't worry about locks. They just assume

47:49 it's all going to be fine.

47:50 Because again, the GIL doesn't protect threaded code from race conditions; it just protects the

47:57 interpreter from race conditions. So you and your application logic are free to make all

48:01 the thread mistakes you want. But if no one ever ran your code in multiple threads, you would never

48:05 know. And so we're going to have to face that now.

48:08 I think that's a super interesting thing, that it's a huge cultural issue: people just don't think about it. Like I said, I used to do a lot of C++ and C#.

48:16 And over there, it's, you're always thinking about threading. You're always thinking about,

48:19 well, what about these three steps? Does it go into a temporarily invalid state? Do I need to

48:23 lock this? Right. And like C# even had literally a keyword lock, which is like a context manager.

48:28 You say lock, curly brace, and everything in there runs inside the lock and then comes out of it,

48:32 because it's just so much a part of that culture. And then in Python, you kind of just forget about

48:37 it and don't worry about it. But that doesn't mean that you aren't taking multiple, like

48:41 five lines of Python code. Each one can run all on its own, but taken as a block, they may still

48:47 get into these like weird states where if another thread after three lines observes the data,

48:52 it's still busted. Right. It's just the culture doesn't talk about it very much.
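
Here's a minimal sketch of that kind of application-level race, with names invented purely for illustration: each statement is fine on its own, but the check-then-act sequence can interleave across threads, and the fix is Python's own version of that lock-block idea, a threading.Lock used as a context manager.

```python
import threading

balance = 100          # shared state touched by several threads
lock = threading.Lock()

def withdraw_unsafe(amount):
    global balance
    # Each line runs fine on its own, but another thread can slip in
    # between the check and the update and leave the data inconsistent.
    if balance >= amount:
        balance = balance - amount

def withdraw_safe(amount):
    global balance
    # The Python analog of a lock block: a lock as a context manager.
    with lock:
        if balance >= amount:
            balance = balance - amount

threads = [threading.Thread(target=withdraw_unsafe, args=(60,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # can be -20: both threads passed the check before either subtracted
```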

48:57 Yeah. No one ever runs your code in multiple threads. All of those bugs are theoretical.

49:01 And so it's now what's going to shift is, you know, all of those C extensions will get fixed

49:07 and everything will be, you know, they'll fix those problems. And then we're gonna have a

49:10 second wave of everyone seeing their libraries used in threaded programs and starting to discover

49:16 what are the more subtle bugs? Do I have global state that I'm not being careful with? And

49:21 it's going to be painful, but I think it's necessary for Python to stay relevant

49:25 into the future. I'm a little worried. I mean, one of the common questions we hear is sort of

49:29 why do this when multiprocessing is fine? Why don't we just do that? And definitely,

49:34 multiprocessing's big challenge is that processes don't get to share data directly. So,

49:40 you know, even if I have read-only data, if I have to load two gigabytes of data

49:44 in every process, and I want to start 32 of them because I have a nice big computer,

49:48 I've just 32x'd my memory usage, just so that I can have multiple concurrent computations.

49:56 Now there are tricks you can play on things like Linux, where you load the data once and then rely

50:00 on forking to preserve pages of memory. Linux does cool copy on write stuff when you fork,

50:06 but that's like fragile and not necessarily going to work. And then the second thing,

50:09 of course, is if any of those have to talk to each other. Now you're talking about pickling

50:13 objects and putting them through a socket and handing them off. And that is again,

50:17 for certain kinds of applications, just a non-starter. Yeah, but then people just start

50:21 going, well, we're just going to rewrite this in a language that lets us share pointers.

50:24 Yeah. Or at least share memory in-process. Yeah. Yeah. There are, again, a lot of Python users

50:28 where this, they don't need this. They don't care about this. This will never impact them.

50:32 And then there's a whole class of Python users who are desperate for this and really, really want it.
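
A rough sketch of that trade-off, with sizes and chunking made up for illustration: a thread pool shares the one array (and NumPy releases the GIL inside the summing), while a process pool gives every worker its own interpreter, so data can get rebuilt per process and everything crossing the boundary is pickled.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# A few hundred MB of read-only data living in the parent process.
data = np.arange(50_000_000, dtype=np.float64)

def chunk_sum(bounds):
    lo, hi = bounds
    return data[lo:hi].sum()

if __name__ == "__main__":
    step = 10_000_000
    bounds = [(i, i + step) for i in range(0, len(data), step)]

    # Threads: every worker sees the same array, and NumPy releases the GIL
    # inside .sum(), so the chunks genuinely run in parallel even today.
    with ThreadPoolExecutor(max_workers=5) as pool:
        print(sum(pool.map(chunk_sum, bounds)))

    # Processes: each worker is its own interpreter. On spawn platforms
    # (Windows, and macOS by default) the module is re-imported and the
    # array rebuilt in every worker; on Linux, fork plus copy-on-write
    # shares the pages until something writes to them. Arguments and
    # results are pickled and shipped over a pipe either way.
    with ProcessPoolExecutor(max_workers=5) as pool:
        print(sum(pool.map(chunk_sum, bounds)))
```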

50:38 Sure. You know, I think there's a couple of interesting things here.

50:43 One, I think this is important for stemming people leaving. I actually

50:50 don't hear this that much anymore, but I used to hear a lot of, we've left for Go because we

50:56 need better parallelism, or we've left for this performance reason. And I don't know

51:00 if that's just a success story of the faster CPython initiative, or if all the people who had been around

51:05 and decided they needed to leave are simply gone, and we just don't hear them anymore because

51:09 they left. It's like, you know, I used to hear them say this at the party, but then they said

51:13 they're going to leave, and now I don't hear anyone say they're leaving. Is it because everyone's

51:16 still here, or because they already left? I don't know, but I do think having this as a capability will be

51:22 important for people to be able to maybe adopt Python where Python was rejected at the proposal

51:29 stage. You know, like, should we use Python for this project or something else? Oh, we, we need

51:33 threading. We need computational threading. We've got, you know, 128 cores. It's out. Right. And then

51:38 no one comes and complains about it because they never even started that process. Right. So it'll

51:42 either allow more people to come into Python or prevent them from leaving for that same argument

51:47 on some projects. I think that's, that's a pretty positive thing here.

51:50 Yeah. There's, there's, yeah, we, we don't get to count all of the projects that didn't come

51:54 into existence because of the, of the global interpreter lock. It's easy when you're in it

51:59 to sort of adjust your thinking to not see the limitation anymore because you're so used to

52:04 routing around it. You don't even stop and think, oh man, I got to worry about threads. You just

52:08 don't think threads. I totally agree. And I'll give people two other examples that maybe resonate

52:12 more if this doesn't resonate with them. What have I said before? Oh, it's a little

52:16 bit challenging to write this type of mobile phone application in Python. Like, well, it's nearly

52:24 impossible to write a mobile phone application in Python. So we're not even focusing on that as an

52:29 issue because no one is. I know, BeeWare and a few other things are there, but there's a little bit of

52:34 work. So I don't want to just, I don't want to like, yeah, I'm not trying to talk bad about them,

52:37 but as a community, there's not like, it's not a React Native sort of thing or a Flutter where

52:42 there's a huge community of people who are just like, and we could do this. And then how do we,

52:46 like, there's just not a lot of talk about it. And that doesn't mean that people wouldn't just

52:50 love to write mobile apps in Python. It's just, it's so far out of reach that it's, it's just a

52:57 little whisper in the corner for people trying to explore that rather than a big din. And I think,

53:02 you know, same thing about desktop apps. Wouldn't it be awesome if we could not have electron,

53:06 but like some really cool, super nice UI thing, that's almost pure Python. It would, but people

53:12 were not focused on it because no one's trying to do it, but no one's trying to do it because

53:16 there weren't good options to do it with. Right. And I think the same story is going to happen

53:20 around performance and stuff with this. Just to jump in, you know, since I have to

53:24 talk about the BeeWare folks, I mean, you've described exactly the reason why we funded

53:28 the BeeWare development, because yeah, if we don't work on that now, before people sort of,

53:33 there's a lot of work that has to happen before you reach that point where it's easy. And so

53:37 recently the team was able to get sort of tier three support for iOS and Android into CPython

53:43 3.13. So now we're at the first rung of the ladder of iOS and Android support in CPython.

53:48 That's awesome. Toga and Briefcase, the two components of BeeWare, are really focused again

53:52 on that. Yeah. How do I make apps? How do I make it for desktop and mobile? And so, but it's,

53:56 yeah, what we ran into with people is that they just didn't even realize you could even think about doing

54:00 that. And so they just, they never stopped to say, oh, I wish I could do this in Python

54:04 because they just assumed you couldn't. And all the people who really needed to,

54:07 like were required to leave the ecosystem and make another choice.

54:12 And it will take the same amount of, I was gonna say, it takes the same amount of time with this.

54:15 Even once threads are possible in Python, it'll take years to shift the perception.

54:19 Yeah. And probably some of the important libraries, too. Yeah. Yeah. All right. So I'm pretty

54:24 excited about this. I was hoping something like this would come on. I didn't know what form it

54:27 would be. I said there were the two limitations, the libraries and the culture, which you

54:31 called out very well. And then there's also the performance: this one is either neutral or

54:37 a little bit better in terms of performance. So it doesn't have that disqualifying killing of

54:43 the single-threaded performance. The thing they're doing, I will say, again,

54:46 because you have to be fairly conservative with CPython, because so many people use it

54:51 is that this will be an experimental option that, by default, Python won't turn on. You will

54:56 have Python 3.13 when you get it, and it will still have the global interpreter lock. But if you build

55:01 Python 3.13 yourself, or you get another kind of experimental build of it, there's a flag now at

55:06 the build stage to turn off the GIL. And for this mode, they decided, you know,

55:11 not to make people deal with double negatives. This is Python in free-threading mode. And that will be an

55:16 experimental thing for the community to test, to try out, to benchmark and do all these things

55:21 for a number of years. They've taken a very measured approach and they're saying, we're not

55:25 going to force the whole community to switch to this until it's proven itself out. Everyone's had

55:30 time to port the major libraries, to try it out, to see that it really does meet the promise of

55:36 not penalizing single threaded stuff too much. Yeah. Or breaking the single threaded code too

55:41 much. Yeah. Yeah. The steering council is reserving the right to decide when this becomes, or if this

55:48 becomes the official way for Python, you know, I don't know, 3.17 or something. I mean, it could be,

55:52 it could be several years. And so I just want everyone not to panic. Yeah, exactly. Don't,

55:57 this doesn't get turned on in October. No, and this is super interesting. It's accepted. It only

56:04 appears in your Python runtime if you build it with this. So I imagine, you know, some people

56:10 will build it themselves, but someone could also just create a Docker container with Python built with

56:14 this and you can get the free threaded Docker version or whatever. Right. We've already put

56:18 out Conda packages as well. So if you want to build a Conda environment, yeah, actually, if

56:21 you jump over to the py-free-threading page. Yeah. Tell people about this. Yeah. We didn't

56:25 make this. The community made this; the scientific Python community put this together.

56:30 And this is a really great resource, again, focused on, you know, that, that community,

56:35 which really wants threading. Cause we have a lot of, you know, heavy numerical computation.

56:39 And so this is a good resource for things like how do you install it? So there's a link there on

56:44 what are your options for installing the free threaded CPython? You can get it from Ubuntu or

56:48 pyenv or Conda. And, you know, you could build it from source or get

56:53 a container. Yeah. So these are, again, this is very focused on the kind of things the scientific

56:57 Python community cares about, but, but these are things like, you know, have we ported Cython?

57:00 Have we ported NumPy? Is it being automatically tested? Which release has it? And the nice thing

57:06 actually is that pip, as of 24.1, I believe, can tell the difference between wheels for regular Python

57:12 and free-threaded Python. Oh, you can tell by the wheel? There are different wheels as well. Yeah. So there's

57:16 a, you know, Python has always had this thing called an ABI tag, which is just a letter that

57:20 you stick after the version number, and T is the one for free-threading. And so now a project

57:26 can choose to upload wheels for both versions and make it easier for people to test out stuff. So

57:31 for example, I mean, Cython, it looks like there are nightly wheels already being built. And so

57:36 this is, they're moving fast. And, you know, definitely, at Anaconda, we're also very

57:40 interested in getting into this as well. So that's why we built the conda package for free-threading.

57:44 And we're going to start looking at building more conda packages for these things in order to be

57:47 able to facilitate testing. Because I think the biggest thing we want to make sure of is, if you want

57:51 to know if your code works, you want the quickest way to get an environment to have some place to

57:55 test. And so making this more accessible to folks is a really high priority. This is cool.
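
If you do grab one of those experimental builds, whether from a container, from Conda (the conda-forge package name python-freethreading is my reading of the current packaging, so treat it as an assumption), or from a source build with the --disable-gil configure option, a quick sanity check from inside Python looks roughly like this sketch:

```python
import sys
import sysconfig

# 1 on an interpreter built without the GIL, 0 or None on a regular build.
print("free-threaded build:", sysconfig.get_config_var("Py_GIL_DISABLED"))

# Even on a free-threaded build, the GIL can come back at runtime
# (for example via PYTHON_GIL=1 or -X gil=1, or an extension that asks for it).
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())

# The "t" ABI tag the episode mentions shows up here too, e.g. on a cp313t
# build, which is how pip tells the two kinds of wheels apart.
print("ABI flags:", getattr(sys, "abiflags", ""))
```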

58:01 There was something like this for Python 2 to 3. I remember it showed like the

58:05 top 1,000 packages on PyPI, and then how many of them were compatible with Python 3,

58:11 basically by expressing their language tag or something like that.

58:14 Yep.

58:14 So this is kind of like that. It's also kind of like the Can I Use site, I don't know if you're

58:18 familiar with that, Can I Use, from the web dev world?

58:22 Oh yeah. Oh, awesome. Yeah. Yeah. I've seen this.

58:24 You go and say, I want to use this, I want to use this feature and it'll, or, you know,

58:28 if I want to say web workers or something like that, and then it'll, you can, it'll show you

58:32 all the browsers and all the versions and when were they supported. And, and this sounds a little

58:36 bit like that, but for free threaded Python, which by the way, free threaded Python is the

58:40 terminology, right? Not no Gil, but free threaded.

58:42 That is what they've decided. I think they're worried about people trying to talk about no,

58:45 no Gil or, I mean, I don't know.

58:47 Gilful. Are you running on a Gilful? You know? Oh my gosh. Okay. Interesting. Now we have a few

58:56 other things to talk about, but we don't have really much time to talk about it. But there was

58:59 one thing that we were maybe going to talk about a bit with compiling. You said, you mentioned some

59:05 talk or something where people were talking about, well, what if we had a static language Python and

59:09 we compiled it? And related to that, Mr. Magnetic asks, could a Python program be compiled

59:15 into a binary, like a JAR or, you know, a Go app or whatever?

59:20 There are other tools that look at that as a, yeah, a standalone executable. So yeah, one of

59:24 the things, I just wanted to shout out a colleague of mine at Anaconda, Antonio Cuni, who is a well-

59:29 known PyPy developer from long ago. He's worked on PyPy for 20 years. He's been working-

59:33 Not the package installing thing, but the JIT compiler.

59:36 PyPy.

59:37 PyPy. Sometimes phonetically, like over audio, that's hard to tell.

59:41 Yes. Yeah. So he's been thinking about this stuff for a very long time. His sort of key insight,

59:46 at least clicked in my head, was that Python is hard to compile because it is so dynamic.

59:51 I can, in principle, modify the attributes, like even the functions of a class at any point in the

59:57 execution of the program. I can monkey patch anything. This dynamism is really

01:00:02 great for making kind of magical metaprogramming libraries that do amazing things with very little

01:00:07 typing, but it makes compiling them really hard because you don't get to ever say,

01:00:13 okay, this can't ever change. And so what he's been trying to do with a project called SPy,

01:00:18 which he gave a talk on at PyCon 2024, but I think the recordings aren't up yet for that.

01:00:24 And so there isn't a, I don't think there's a public page on it, but he does have a talk on it.

01:00:28 Though I think they've got the keynotes up. The key kind of insight for me with SPy was to

01:00:33 recognize that in a typical Python program, all the dynamic metaprogramming happens at the

01:00:38 beginning. You're doing things like data classes, generating stuff and all kinds of things like

01:00:42 that. And then there's a phase where that stops. And so if we could define a sort of variant of

01:00:49 Python where those two phases were really clear, then you would get all of the dynamic expressiveness,

01:00:55 almost all the dynamic expressiveness of Python, but still have the ability to then feed that

01:01:00 into a compiler tool chain and get a binary. This is super early R and D experimental work,

01:01:05 but I think that's a really great way to approach it because often there's always been this tension

01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different

01:01:16 keywords? Do I lose the thing I loved about Python, which was how quickly I could express my

01:01:21 idea? And so this is, again, our, you know, having your cake and eating it too. This is

01:01:25 trying to find a way to split that difference in a way that lets us get most of the benefits

01:01:30 of both sides. That's pretty interesting. And hopefully that talk's up soon. That'd be really

01:01:34 neat. Maybe by the time this episode's out, I know the PyCon videos are starting to roll,

01:01:38 like not out on YouTube, but out on, out on the podcast channels. It would be fantastic to have,

01:01:43 here's my binary of Python. Take my data science app and run it. Take my desktop app and run it.

01:01:49 I don't care what you have installed on your computer. I don't need you to set up Python

01:01:53 3.10 or higher on your machine and set up a virtual environment. Just here's my binary.

01:01:59 Do it as you will. That's another, I throw that in with the mobile apps and the front end or the

01:02:06 desktop apps or the front end Python. You know, that's another one of those things that it's,

01:02:10 nobody's pushing towards it. Not nobody, not that many people are pushing towards it because there's

01:02:14 not that many use cases for it that people are using it for because it was so challenging that

01:02:19 people stopped trying to do that. You know? Yeah. That's one thing I also, you know, people

01:02:23 probably hear me say this too many times, but most people use apps when they use a computer,

01:02:29 not packages or environments. And so in the, in the Python space, we are constantly grappling

01:02:35 with how hard packages and environments are to work with, to talk about, you know, deciding, again,

01:02:40 what languages are in what, you know, do I care about everything or just Python or whatever?

01:02:44 That's all very hard, but that's actually not how most people interact with the computer at all.

01:02:48 And so it really is one of those things. Again, this is one of the reasons I'm so interested in

01:02:52 BeeWare is that Briefcase is like the app packager. And the more they can push on that, the more we have

01:02:58 a story. And again, there are other tools that have been around for a long time, but that's

01:03:01 just what I think about a lot. We need to focus on tools for making apps because that's how we're

01:03:05 going to share our work with 99% of the earth. Yes. Yeah. A hundred percent. I totally agree.

01:03:11 And lots, lots of props to Russell Keith-Magee and the folks over at BeeWare for doing that. And for

01:03:17 you guys supporting that work, because it's, it's one of those things where there's not a ton of

01:03:22 people trying to do it. It's not like, well, we're using Django, but is there another way we could

01:03:25 do it? It's basically the same thing, right? There it's creating a space for Python where it kind of,

01:03:31 I know there's PyInstaller and Py2App, but it's, it's pretty limited, right?

01:03:34 Yeah. There's not a lot of effort there. And so it's, it's, there are a few people who've

01:03:38 been doing it for a long time and others are getting more into it. And, and yeah, so it's,

01:03:42 I just, yeah, I wish that we could get more focus on it because there's, there are tools,

01:03:46 they just don't get a lot of attention. Yeah. And they're not very polished and there's so many

01:03:50 edge cases and scenarios. All right. Let's close it out with just a final thought on this little

01:03:54 topic, and then we'll have you wrap this up for us. Do you think that's maybe a core developer thing?

01:03:59 I mean, I know it's awesome that Py2App and PyInstaller and PyFreeze and those sorts of tools

01:04:03 are doing their things to try to make this happen. But I feel like they're kind of

01:04:07 looking in at Python and going, how can we grab what we need out of Python and jam it into an

01:04:12 executable and make it work? Like, should we be encouraging the core developers to just go like,

01:04:16 python myscript.py --windows, and there you go, you get an .exe or something?

01:04:22 I don't know, actually, that would be a great question. Actually, I would ask Russell that

01:04:26 question. He would have probably better perspective than I would. At some level,

01:04:30 it is a tool that is dealing with a lot of problems that aren't core to the Python language.

01:04:34 And so maybe having it outside is helpful, but maybe there are other things that the core could

01:04:40 do to support it. I mean, again, a lot of it has to do with the realities of when you drop an

01:04:45 application onto a system, you need it to be self-contained. You need, sometimes you have to,

01:04:49 you know, do you have to trick the import library to know where to find things and all of that?

01:04:53 That's exactly what I was thinking is right. If Python itself didn't require like operating system

01:04:58 level fakes to make it think it, if it could go like, here is a thing in memory where you just

01:05:05 import, this is the import space. It's this memory address for these things. And we just run from the

01:05:10 exe rather than dump a bunch of stuff temporarily on disk, import it, throw it, you know, like that

01:05:14 kind of weirdness that happens sometimes. There is probably definitely improvements that could

01:05:18 be made to the import mechanism to support applications. Yeah, exactly. Well, we've

01:05:22 planted that seed. Maybe it will grow. We'll see. All right, Stan, this has been an awesome

01:05:26 conversation. You know, give us a wrap up on all this stuff, just like sort of final call to action

01:05:31 and summary of what you guys are doing at Anaconda. Because there's a bunch of different stuff we

01:05:34 talked about that are in this space. Yeah. I mean, mainly I would say, I would encourage people that

01:05:39 if you want to speed up your Python program, you don't necessarily have to leave Python.

01:05:42 Go take a look at some of these tools. Go, you know, measure what your program's doing. Look

01:05:47 at tools like Numba, but there are other ones out there, you know, PyTorch and JAX and all sorts of

01:05:50 things. There are lots of choices now for speed. And so Python doesn't have to be slow. You just

01:05:55 have to sort of figure out what you're trying to achieve and find the best tool for that.
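
As a tiny taste of the Numba route mentioned here, a sketch along these lines (the function and sample count are invented for illustration) compiles an ordinary Python loop to machine code the first time it runs:

```python
import numpy as np
from numba import njit

@njit  # compile this function with Numba on first call
def monte_carlo_pi(n_samples):
    inside = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

# The first call pays the compilation cost; later calls run at compiled
# speed, typically far faster than the same loop in pure Python.
print(monte_carlo_pi(10_000_000))
```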

01:05:58 Oh, one other thing I do want to shout out. I'm teaching a tutorial in a month over at the

01:06:04 Anaconda sort of live tutorials system, which will be how to use Numba. So if there's something you saw here

01:06:11 you want to go deep on, there will be a tutorial, hopefully linked in the show notes or something.

01:06:15 Yeah, I can link that in the show notes. No problem. Absolutely.

01:06:18 So I'll be going in. Is that the high performance Python with Numba?

01:06:22 Yes. Yes. So yeah, we'll be doing worked examples and you'll get to ask questions and all that

01:06:27 stuff. Cool. I'll make sure to put that in the show notes. People can check it out. All right.

01:06:31 Well, thanks for sharing all the projects that you guys are working on and just the broader

01:06:36 performance stuff that you're tracking. Yeah. Awesome. Glad to chat.

01:06:39 You bet. See you later.

01:06:40 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check

01:06:46 out what they're offering. It really helps support the show. This episode is sponsored by Posit

01:06:51 Connect from the makers of Shiny. Publish, share and deploy all of your data projects that you're

01:06:56 creating using Python. Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards

01:07:03 and APIs. Posit Connect supports all of them. Try Posit Connect for free by going to

01:07:08 talkpython.fm/posit. P-O-S-I-T. Want to level up your Python? We have one of the largest catalogs

01:07:16 of Python video courses over at Talk Python. Our content ranges from true beginners to deeply

01:07:21 advanced topics like memory and async. And best of all, there's not a subscription in sight.

01:07:26 Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show. Open your

01:07:31 favorite podcast app and search for Python. We should be right at the top. You can also find

01:07:36 the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on

01:07:43 talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of

01:07:48 the show and have your comments featured on the air, be sure to subscribe to our YouTube channel

01:07:52 at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.

01:07:58 I really appreciate it. Now get out there and write some Python code.
