#474: Python Performance for Data Science Transcript
00:00 Python performance has come a long way in recent times, and it's often the data scientists,
00:05 with their computational algorithms and large quantities of data, who care the most about this form of performance.
00:11 It's great to have Stan Seibert back on the show to talk about Python's performance for data
00:16 scientists. We cover a wide range of tools and techniques that will be valuable for many Python
00:21 developers and data scientists. This is Talk Python to Me, episode 474, recorded July 18th, 2024.
00:30 Are you ready for your host? Here he is!
00:32 You're listening to Michael Kennedy on Talk Python to Me.
00:35 Live from Portland, Oregon, and this segment was made with Python.
00:39 Welcome to Talk Python to Me, a weekly podcast on Python.
00:46 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy,
00:50 and follow the podcast using @talkpython, both accounts over at fosstodon.org.
00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.
01:01 If you want to be part of our live episodes, you can find the live streams over on YouTube.
01:05 Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about
01:11 upcoming shows. This episode is sponsored by Posit Connect from the makers of Shiny.
01:16 Publish, share, and deploy all of your data projects that you're creating using Python.
01:21 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
01:27 Posit Connect supports all of them. Try Posit Connect for free by going to talkpython.fm/posit.
01:33 And it's also brought to you by us over at Talk Python Training. Did you know that we have
01:41 over 250 hours of Python courses? Yeah, that's right. Check them out at talkpython.fm/courses.
01:48 Hey, Stan.
01:49 Hello.
01:49 Hello, hello. Welcome back to Talk Python to Me.
01:51 I'm glad to be here. Glad to talk performance.
01:53 I know. I'm excited to talk performance. It's one of those things I just never
01:57 get tired of thinking about and focusing on. It's just so multifaceted. And as we will see,
02:04 even for a language like Python that is not primarily performance-focused, there's a lot
02:08 to talk about.
02:09 Yeah, there's an endless bag of tricks.
02:11 Yeah. And I would say, you know, sort of undercut my own comment there. Python is
02:16 increasingly focusing on performance since like 3.10 or so, right?
02:21 Mm-hmm. And the secret is that it's because Python integrates so well with other languages,
02:26 it sort of always cared about performance in some way. It's just sometimes you had to
02:30 leave Python to do it, but you still got to keep the Python interface.
02:33 There's been such an easy high-performance escape hatch that making Python itself faster is
02:39 obviously not unimportant, but maybe not the primary focus, right? Like usability, standard
02:43 library, et cetera, et cetera. All right. For people who have not heard your previous episode,
02:49 let's maybe just do a quick introduction. Who's Stan?
02:52 Yeah. So I am Stan Seibert. I am a manager at Anaconda, well-known purveyor of Python
02:58 packages and such. My day-to-day job is actually managing most of our open source developers
03:03 at Anaconda. So that includes the Numba team. And so we'll be talking about Numba today,
03:06 but other things like we have people working on Jupyter and BeeWare for mobile Python and
03:11 other projects like that. And so that's what I do mostly is focus on how do we have an impact
03:16 on the open source community and what does Python need to sort of stay relevant and keep evolving.
03:22 I love it. What a cool job as well, position in that company.
03:24 Yeah. I'm really grateful. It's a rare position, so I'm really glad I've been able to do it for
03:28 so long.
03:29 Yeah. And I would also throw out in the list of things that you've given a shout out to,
03:33 I would point out PyScript as well.
03:36 Oh yes, of course. Yeah. I just started managing the PyScript team again, actually. And so
03:40 I forgot about that one too. Yes. PyScript. So Python in your web browser, Python everywhere,
03:44 on your phone, in your browser, all the places.
03:46 Yeah. I mean, it's a little bit out of left field compared to the other things that you
03:50 all are working on, but it's also a super important piece, I think. So yeah, really cool.
03:56 Really cool there. So I think we set the stage a bit, but maybe let's start with Numba. That
04:04 one's been around for a while. Some people know about it. Others don't. What is Numba and how do
04:09 we make Python code faster with Numba?
04:11 Yeah. So there've been a lot of Python compilation projects over the years. Again, Numba's very
04:16 fortunate that it's now 12 years old. We've been doing it a long time and I've been involved
04:20 with it probably almost 10 of those years now. And Python, I think one of Numba's success
04:25 points is trying to stay focused on an area where we can have a big impact. And that is
04:29 trying to speed up numerical code. So there's a lot of, again, in data science and other
04:34 sciences, there's a lot of need to write custom algorithms that do math. And Numba's sweet
04:39 spot is really helping you to speed those up. So we see Numba used in a lot of places
04:43 where maybe the algorithm you're looking for isn't already in NumPy or already in
04:47 JAX or something like that. You need to do something new. Projects like UMAP, which do
04:52 really novel sort of clustering algorithms, or I just at SciPy, I learned more about a
04:56 project called Stumpy, which is for a time series analysis. Those authors were able to
05:00 use Numba to take the numerical core of that project that was the sort of the time bottleneck
05:06 and speed it up without having to leave Python. And so that is, I think, really where Numba's
05:10 most effective.
05:12 Sure. If you look at a lot of programs, there might be 5,000 lines of code or more, but
05:18 even just something only as big as 5,000 lines, there's a lot of code, but only a little bit
05:23 of it really actually matters, right?
05:25 Yeah. That's what we find a lot is when you sit down and measure your code, you'll spot
05:31 some hotspots where 60 or 70 or 80% of your time is spent in just like three functions,
05:35 or something. And that's great. And if that's the case, that's great because you can just
05:40 zero in on that section for speeding things up and not ruin the readability of the rest
05:44 of your program. Sometimes optimization can make it harder to read the result. And so
05:49 there's always a balance of you have to keep maintaining this project. You don't want to
05:53 make it unreadable just to get 5% more speed.
05:55 Yeah, absolutely. Not just the readability, but the ability to evolve it over time, right?
06:02 So maybe it's like, "Oh, we're going to compile this section here using Numba or
06:08 Cython or something like that." Well, maybe I was going to use this cool new
06:11 API package I found, but I can't just jam it in there where it's compiled. That's
06:16 unlikely to work well, right? And things like that. And so, yeah, a lot of times there's
06:22 these big sections that look complicated. They look slow. They're not actually.
06:26 Yeah. And one thing I also often emphasize for people is that when you think about the
06:30 time your program takes, think about the time you spent working on it as well as the time
06:34 you spent running it. And so, because we've heard from a lot of projects who said they
06:38 were able to get major speedups, not necessarily because Numba compiled their code to be incredibly
06:44 fast, but it compiled it to be fast enough that they could try new ideas quicker. And
06:49 so they got to the real win, which was a better way to solve their problem because they weren't
06:54 kind of mired in just kind of boilerplate coding for so long.
06:58 Right, right, right. It turns out I learned I should use a dictionary and not a list.
07:02 And now it's a hundred times faster. And that wasn't actually a compiling thing. That was
07:07 a visibility thing or something, right?
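A minimal sketch of that kind of data-structure swap (the timings are illustrative, not from the episode):

```python
import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)

# A list membership test scans element by element; a set or dict hashes the key.
list_time = timeit.timeit(lambda: 99_999 in as_list, number=1_000)
set_time = timeit.timeit(lambda: 99_999 in as_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```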
07:09 Yeah. Try more things is always helpful. And so something that a tool that lets you do
07:13 that is really valuable.
07:14 A hundred percent. So what tools do you recommend for knowing? Because our human intuition sometimes
07:20 is good, but sometimes it's really off the mark in terms of thinking about what parts
07:24 are slow, what parts are fast.
07:25 That's something I definitely, when I've talked to people, everyone thinks they know where
07:28 the hot spot, the slow part, is, but sometimes they're surprised. And so you definitely,
07:32 before you do anything, and this is not just Numba advice, this is any time before
07:36 you're going to speed up your program: measure something. So what you want is a
07:40 representative benchmark, something that's not going to run too fast, because often, you
07:45 know, unit tests run too quickly to really exercise the program in a realistic
07:49 way. So you want a benchmark that doesn't run too long either, maybe like five minutes
07:53 or something. And then you're going to want to run that through a profiling tool.
07:57 And there are several options. I just usually tell people to use cProfile. It's built into
08:01 the standard library in Python. It's a great tool. It does the job for most stuff. And
08:05 so sometimes there may be other tools, things like SnakeViz and other things, to
08:08 help you interpret the results of the profile, but often you'll use cProfile to collect
08:12 the data. And what this does is it sort of records, as the program is running,
08:18 what are all the functions that are being called and how much time are
08:22 they taking? And there are different strategies for how to do this, but fundamentally what
08:26 you get out is essentially a dataset that says, you know, 2% of the time in
08:32 your program this function was running, 3% of the time that function was running. And you
08:36 can just sort that in descending order and see what pops out at the top.
08:41 And sometimes you're surprised. Sometimes you find out it's actually, it wasn't my
08:44 numerical code. It's that I spent, you know, 80% of my time doing some string operation
08:48 that I didn't realize I needed to do over and over again.
08:50 - Right, right. Exactly. Some weird plus equals with a string was just creating a thousand
08:57 strings to get to the endpoint or something like that. Yeah.
08:59 - Yeah. And I could have just done that once up front. It's good to do the profiling just
09:02 to make sure there isn't an obvious problem before you get into the more detailed optimization.
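A minimal sketch of the cProfile workflow described here; the functions are made up for illustration:

```python
import cProfile
import pstats

def slow_string_build(n):
    s = ""
    for i in range(n):
        s += str(i)          # repeated concatenation: a classic accidental hotspot
    return s

def analysis():
    data = [i * 0.5 for i in range(200_000)]
    total = sum(x * x for x in data)
    slow_string_build(50_000)
    return total

# Run the representative benchmark under the profiler, then sort by time.
cProfile.run("analysis()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)
```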
09:08 - Yeah. Before you start changing your code completely, it's execution method or whatever.
09:13 - Yep.
09:13 - Yeah. Yeah. And, you know, shout out to the PyCharm folks. They've got a push-the-
09:17 button profiler and they visualize it, and they just run cProfile right in there.
09:21 So that's like cProfile on easy mode. You know, you get a spreadsheet and you get a
09:25 graph. What about other ones like Fil or anything else? Like any other recommendations?
09:30 - Yeah. So that's an interesting point: cProfile is for compute time profiling. An
09:36 interesting problem you run into, which is what this tool addresses, is memory profiling, which
09:41 is often a problem when you're scaling up. And that's actually one of the other good things to
09:45 keep in mind when you're optimizing is what am I trying to do? Am I trying to get done faster?
09:48 Am I trying to save on compute costs? Am I trying to go bigger? And so I have to speed things up so
09:53 that I have room to put more data in. If that's where you're going, you might want to-
09:56 - Or am I just out of memory, right? Can I just not do-
09:59 - Yeah. Or am I already stuck? And so there it is very easy in Python to not recognize when you
10:05 have temporary arrays and things. Because again, it's also very compact and you're not seeing
10:09 what's getting allocated. You can accidentally blow up your memory quite a lot. And so this kind
10:14 of a profiler is a great option, and what it can often show you is kind of a line-by-
10:20 line view: this is how much memory was allocated on each line of your program. So you can see, oh,
10:25 that one line of pandas, oops, that did it. - Yeah. I can't remember all the details. I
10:31 talked to Itamar about this one, but I feel like it also keeps track of the memory used even down
10:37 into like NumPy and below, right? Not just Python memory where it says now there's some opaque blob
10:44 of data science stuff. - Yeah. And actually even on the compute part, there's sort of two approaches. So cProfile is focused on counting function time,
10:52 but sometimes you have a long function and if you're making a bunch of NumPy calls,
10:56 you might actually care line by line how much time is being taken. And that can be a better
11:00 way to think about it. And so I think the tool is called line_profiler. I forget the exact URL, but
11:06 it's an excellent tool in Python for that. There's one in R and this is an equivalent one. Yes, Robert Kern's
11:13 line_profiler. There you go. - line_profiler. Oh, it's archived, but still can be used. Yeah. - I have to find another tool now. This has been my go-to for so long.
11:22 I didn't realize it had already been archived. Oh, there's a- - Hey, if it still works, it's all good.
11:26 It's all good. - It's been transferred to a new location. So that's where it lives now. But yeah,
11:31 line profiling is another; I often use them as complementary sorts of tools. I zero in on
11:35 one function with cProfile, and then I'll go line profile that function.
11:39 Oh, interesting. Yeah. Okay. - Drill in further.
11:41 Yeah. Like, okay, this is the general area. Now let's really focus on it.
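A rough sketch of that two-step workflow using line_profiler's programmatic API (it's more often run via its kernprof command with an @profile decorator); the compute function is just a stand-in:

```python
from line_profiler import LineProfiler
import numpy as np

def compute(n):
    a = np.random.rand(n)
    b = np.sqrt(a) + np.log1p(a)   # which of these lines dominates?
    c = np.sort(b)
    return c.sum()

lp = LineProfiler()
wrapped = lp(compute)     # wrap the one function cProfile pointed us at
wrapped(2_000_000)
lp.print_stats()          # per-line hit counts and timings
```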
11:45 Memray is another one. I talked to the folks from Bloomberg about that.
11:48 Oh, okay. I have not used this one. Yeah. This is a pretty new one and it's quite neat the way
11:53 it works. It's, yeah, this one actually tracks C and C++ and other aspects of allocations as well.
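Fil and Memray have their own interfaces; as a standard-library-only illustration of the same idea, which source lines allocated the most memory, tracemalloc can give a rough view:

```python
import tracemalloc

tracemalloc.start()

squares = [i * i for i in range(500_000)]              # a large Python list
labels = ["item-" + str(i) for i in range(200_000)]    # lots of small strings

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)   # top allocation sites, grouped by source line
```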
12:01 So one of the problems you can run into with profiling, and especially memory profiling,
12:06 I think, is that the more you monitor it,
12:09 the more it becomes kind of a Heisenberg quantum mechanics type thing. Once you observe it,
12:14 you change it. And so the answers you get by observing it are not actually what are happening.
12:19 So you got to keep a little bit of an open mind towards that as well. Right?
12:23 Yeah. And that's even a risk with, you know, the compute side of the profiling:
12:27 you're using some compute time to actually observe the program, which means that it can,
12:32 and these tools try to subtract out that bias, but it does impact things. And so you may want
12:38 to have kind of a benchmark that you can run as your real source of truth: run it
12:43 without the profiler turned on just to see a final run time, then run with the profiler to break it down.
12:48 And then when you're all done, you're going to want to run that program again with the profiler off
12:52 to see if you've actually improved it wall clock time wise.
12:55 Yeah. Yeah, absolutely. That's a really good point. Maybe do a %timeit type of thing,
13:00 something along those lines. Okay. That was a little bit of a side deep dive into profiling
13:05 because you, before you apply some of these techniques like Numba and others, you certainly
13:11 want to know where to apply it. And part of that is you might need to rewrite your code a little
13:16 bit to make it more optimizable by Numba or these things. So first of all, like, what do you do to
13:23 use Numba? Right. It's just, you just put a decorator on there and off you go.
13:27 At the very simplest level, Numba's interface is supposed to be just one decorator. Now there's
13:31 some nuance obviously and other things you can do, but we tried to get it down to, for most people,
13:35 it's just that. And the N in njit means no Python, meaning this code is not calling
13:41 the Python interpreter anymore at all. It is purely machine code, no interpreter access.
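A minimal sketch of that one-decorator interface; the distance function here is just an example of the kind of numerical loop Numba targets:

```python
import numpy as np
from numba import njit

@njit   # compile to machine code; the "n" is for no-Python mode
def pairwise_min_distance(points):
    n = points.shape[0]
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(points.shape[1]):
                diff = points[i, k] - points[j, k]
                d += diff * diff
            if d < best:
                best = d
    return np.sqrt(best)

pts = np.random.rand(500, 3)
pairwise_min_distance(pts)        # first call triggers compilation
print(pairwise_min_distance(pts))
```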
13:46 Interesting. Okay. So some of these like compile, do a thing and compile your Python code to
13:52 machine instructions. I feel like they still interact with like PyObject pointers and they
13:57 still kind of work with the API of the Python data types, which is nice, but it's a whole,
14:06 whole lot slower of an optimization than now it's int32 and it's, you know, float32 and these are
14:12 on registers, you know? Yeah. And this is part of the reason why Numba focuses on numerical code
14:17 is that NumPy arrays and actually other arrays and PyTorch and other things that support the
14:21 buffer protocol. So really when Numba compiles this, it compiles sort of two functions.
14:27 One is a wrapper that handles the transition from the interpreter into no Python land, as we call
14:31 it. And then there's the core function that is kind of like you could have written in C or Fortran
14:35 or something. And that wrapper is actually doing all the Py object stuff. It's reaching in and
14:40 saying, ah, this, this integer, I'm going to pull out the actual number and oh, this NumPy array,
14:44 I'm going to reach in and grab the data pointer and pass those down into the core where the actual,
14:49 all the math happens. So the only time you interact with the interpreter is really at the edge.
14:54 And then once you get in there, you try not to touch it ever again. Now, Numba does have a feature
14:58 that we added some years ago called an object mode block, which lets you in the middle of your
15:02 no Python code, go back and actually start talking to the interpreter again. Right. Maybe use a
15:07 standard library feature or something. Yeah. The most common use we've seen is you like,
15:11 you want a progress bar to update or something that's not in your, you know, hot loop. You
15:15 don't want it. Right. You don't want to be going back to the interpreter in something that's really
15:18 performance critical, but inside of a function, you might have parts that are more or less,
15:22 you know, one out of a million iterations. I want to go update the progress bar or something
15:25 that's totally valid. And you can do that with Numba. That is, there's a way to get back to the
15:29 interpreter if you really need to. Okay. Yeah. And it says it takes and translates Python functions
15:34 to optimized machine code at runtime, which is cool. So that makes deploying it super easy
15:39 and you don't have to have like compiled wheels for it and stuff using industry standard LLVM
15:44 compilers and then similar speeds to C and Fortran. Yeah. Which is awesome, but also has
15:52 implications if I can speak. For example, when I came to Python, I was blown away that I could
16:00 just have integers as big as I want. If I keep adding to them, they just get bigger and bigger
16:05 and like billions, bazillions of, you know, bits of accuracy. And I came from C++ and C#, where
16:14 you explicitly said it's an int32, it's an int64, it's a double. And these all had ranges of
16:20 valid numbers. And then you got weird like wraparounds and maybe you create an unsigned
16:24 one so you can get a little bit bigger. I suspect that you may fall victim or be subjected to these
16:37 types of limitations without realizing them in Python, if you add njit, because you're
16:42 back in that land. Right. Or do you guys do magic to allow us to have big ints?
16:42 We do not handle the big integer, which is what you're describing as sort of that
16:46 integer that can grow without bound because our target audience is very familiar with NumPy.
16:50 NumPy looks at numbers sort of the way you're, you described from C++ and other languages.
16:55 The D type and all that stuff, right? Yeah. NumPy arrays always have a fixed
16:59 size integer and you get to pick what that is, but it has to be 8, 16, 32, 64. Some machines
17:04 can handle bigger, but it is fixed. And so once you've locked that in,
17:10 if you go too big, you'll just wrap around and overflow. Yeah. So that limitation
17:13 is definitely present again in Numba, but fortunately NumPy users are already familiar
17:17 with thinking that way. So it isn't an additional constraint on them too much.
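A tiny illustration of that difference, assuming NumPy's usual fixed-width behavior:

```python
import numpy as np

# Plain Python integers grow without bound...
print(2 ** 64 + 1)                    # 18446744073709551617

# ...but fixed-width NumPy integers (the model Numba follows) wrap on overflow.
a = np.array([2 ** 62], dtype=np.int64)
print(a * 2)                          # wraps to [-9223372036854775808]
```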
17:22 This portion of talk Python to me is brought to you by Posit, the makers of Shiny, formerly RStudio
17:29 and especially Shiny for Python. Let me ask you a question. Are you building awesome things?
17:35 Of course you are. You're a developer or data scientist. That's what we do. And you should
17:39 check out Posit Connect. Posit Connect is a way for you to publish, share, and deploy all the
17:45 data products that you're building using Python. People ask me the same question all the time.
17:50 Michael, I have some cool data science project or notebook that I built. How do I share it with my
17:55 users, stakeholders, teammates? Do I need to learn FastAPI or Flask or maybe Vue or React JS?
18:02 Hold on now. Those are cool technologies and I'm sure you'd benefit from them, but maybe stay
18:07 focused on the data project. Let Posit Connect handle that side of things. With Posit Connect,
18:11 you can rapidly and securely deploy the things you build in Python. Streamlit, Dash, Shiny,
18:17 Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs. Posit Connect supports all of them.
18:24 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise
18:29 requirements. Make deployment the easiest step in your workflow with Posit Connect. For a limited
18:35 time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.
18:40 That's talkpython.fm/posit. The link is in your podcast player show notes.
18:46 Thank you to the team at Posit for supporting Talk Python.
18:49 And then one thing you said was that you should focus on using arrays.
18:55 Yes.
18:56 And that kind of data structures before you apply Numba JIT compilation to it. Does that mean
19:03 list as in bracket or these NumPy type vector things? We all have different definitions.
19:10 Yes. That's true, array can mean different things. Generally, yeah. Usually the go-to I talk about is a NumPy array.
19:15 So it has a shape. The nice thing, NumPy arrays can be multidimensional. So you can represent a
19:20 lot of complex data that way. But within an array, there's a fixed element size. That element could
19:26 be a record. So if you want for every cell in your array to store maybe a set of numbers or
19:30 a pair of numbers, you can do that with custom D types and things. And Numba will understand that.
19:35 That's the ideal data structure. Numba does have, we added support a couple of years ago
19:40 for other data structures because the downside to a NumPy array is that it's fixed size. Once you
19:45 make it, you can't append to it like you can a Python list. So Numba does have support for both
19:51 what we call typed lists and typed dictionaries. So these are sort of special cases of lists and
19:56 dictionaries in Python where the, in the case of a list, every element in the list has the same type
20:02 or in the case of a dictionary, the keys are all the same type and the values are all the same type.
20:06 And those cover a lot of the cases where, you know, when users want to make things where they
20:11 don't know how long it's going to be, you're going to append in the algorithm. A list is a much more
20:16 natural thing than a NumPy array, where you might have to over-allocate or something like that.
20:19 And dictionaries, our dictionary implementation is basically taken straight from CPython's
20:24 dictionary implementation. So it's very tuned and very fast in the same way CPython's is.
20:29 We just had to modify it a little bit to add this type information, but it's really good for kind
20:34 of lookup random items kind of stuff. So those are available as additional data structures
20:39 in addition to the array. And to use those, you would do a from numba import, something like this.
20:44 They're in the docs, I'll show you, you can sort of import a typed list as a
20:49 special class that you can create. The downside by the way, is that, and the reason we have those,
20:55 and we don't just take, historically Numba used to try and let you pass in a Python list,
21:00 is that wrapper function would have to go recursively through the list of list of lists
21:04 of whatever you might have and pop out all of the elements into some format that wasn't all
21:09 Py objects so that the no Python code could manipulate them quickly. And then how do you
21:14 put it all back if you modify that sort of shadow data structure? And so what we realized is that
21:19 was confusing people and actually added a lot of overhead and calling functions took too long.
21:23 So we instead went up a level and said, okay, we're going to make a new kind of list that you
21:27 at the interpreter level can opt into for your algorithm. And so accessing that list from Python
21:32 is slower than a Python list, but accessing it from Numba is like a hundred times faster.
21:36 So you kind of have to decide for the, while I'm in this mode, I'm optimizing for numbers
21:43 performance, not for the Python interpreter performance. Which is reasonable often I'd
21:47 imagine because this is the part you found to be slow.
21:49 Yeah. That's the trade-off you make. And so, yeah. So we would not suggest people use type
21:54 list just in random places in their program. It's really intended to be used. Yeah.
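A rough sketch of the typed containers (import paths per the Numba docs; the functions themselves are made-up examples):

```python
import numpy as np
from numba import njit, types
from numba.typed import List, Dict

@njit
def running_sums(values):
    out = List.empty_list(types.float64)   # typed list: one element type throughout
    total = 0.0
    for v in values:
        total += v
        out.append(total)
    return out

@njit
def count_buckets(values):
    counts = Dict.empty(key_type=types.int64, value_type=types.int64)
    for v in values:
        b = int(v) % 10                    # bucket by the ones digit
        if b in counts:
            counts[b] += 1
        else:
            counts[b] = 1
    return counts

data = np.random.rand(1_000) * 100
sums = running_sums(data)
print(len(sums), sums[len(sums) - 1])
print(count_buckets(data))
```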
21:59 I heard this is fast. So we're just going to replace them all. Like, new rule: bracket bracket is
22:04 disallowed. We're not using this one, right? Yeah. When you're working with Python objects,
22:08 Python's data structures can't be beat. They are so well-tuned that it's very,
22:13 very hard to imagine something that could be faster than them.
22:16 All right. So maybe one more thing on Numba we could talk about is so far, I imagine people have
22:22 been in their mind thinking of this as at least running on CPUs, be that on Apple Silicon or
22:28 Intel chips or AMD or whatever, but there's also support for graphics cards, right?
22:34 Yes. Yeah. So for a very long, I mean, we've had this again for 10 plus years. We were very early
22:40 adopters of CUDA, which is the programming interface for NVIDIA GPUs. CUDA is supported
22:45 by every NVIDIA GPU, whether it's a low end gamer card or a super high end data center card,
22:50 they all support CUDA. So that was really nice for people who were trying to get into GPU
22:54 programming. You could use inexpensive hardware to learn. And so on both Windows and Linux,
22:59 Macs haven't had NVIDIA GPUs for a long, long time now, but on Windows and Linux, you can
23:04 basically write what they call a CUDA kernel in pure Python. And it just, you know, and you can
23:10 pass up, you know, arrays, either NumPy arrays, which then have to be sent to the card or special
23:15 GPU arrays that are already on the card. That is a great way for people to learn a bit more about
23:20 GPU programming. I will say Numba might not be the best place to start with GPU programming in
23:25 Python because there's a great project called Cupy, C-U-P-Y, that is literally a copy of NumPy,
23:33 but does all of the computation on the GPU. And CuPy works great with Numba. So I often tell
23:38 people, if you're curious, start with CuPy, use some of those NumPy functions to get a sense of,
23:44 you know, when is an array big enough to matter on the GPU, that sort of thing. And then when you
23:48 start wanting to do more custom algorithms, Numba is where you kind of turn to for that second
23:53 level. Yeah. So I feel like I'm referencing a lot of Itamar's work over here, but what if we
24:01 didn't have an NVIDIA GPU? Is there anything we could do? Yeah. So there are other projects. So
24:06 things like, as I mentioned here, like PyTorch and things are, have been ported to a number of
24:11 different backends. This is one thing the Numba team, we are frequently talking about is how do
24:16 we add non-GPUs or non-NVIDIA GPU support, but it's, I don't have an ETA on that. That's something
24:21 that we just still are kind of thinking about, but PyTorch definitely. And you can use PyTorch as an
24:27 array library. You don't have to be doing machine learning necessarily. You can use it for fast
24:32 arrays. It's just most popular for, because it supports, I mean, JAX is a very similar thing
24:36 because it adds the extra features you want for those machine learning models, but at the core
24:40 of every machine learning model, it's just array math. And so you could choose to just do that if
24:44 that's what you want. And then you could even still pass those arrays off to Numba at some point in
24:47 the future. Yeah. I didn't realize there was integration with that as well. Yeah. A while
24:55 back, we kind of worked with a number of projects to define a GPU array interface that's used by a
24:59 number of them so that we can see each other's arrays without having to copy the data, which is
25:03 very helpful. Yeah. Yeah. We have a lot more topics of the number, but I'm still fascinated
25:06 with it. So, you know, one of the big things, all the rage now, is vector databases,
25:13 obviously, because I want to query my way through LLM outputs. Like where in a hundred thousand
25:20 dimensional space does this question live or whatever? Is there any integration with that
25:25 kind of stuff back into Numba? Numba, not directly, although Numba does have interfaces,
25:30 or an easy way to call out to C functions. So a lot of these vector databases are implemented in,
25:36 you know, C or C++ or something. And so if you did have a use case where you needed to call out
25:40 to one of them, if there was a C function call to make directly to sort of the underlying library
25:45 that bypassed the interpreter, you can do that from Numba. And so I haven't seen anyone do that
25:49 yet, but it's, it's a generic sort of C interface. Yeah. Maybe there's a database driver written in
25:55 C, in which case, I don't know all the different databases. I know there are some that are
25:58 specifically built for it. Maybe DuckDB has got something going on here, but also MongoDB has,
26:04 has added vector stuff to it. And I know they have a C library as well. Yeah. I've looked at LanceDB;
26:09 it's one I've seen mentioned and used by a couple of projects. That's just for vector stuff. It
26:14 doesn't do anything else. LanceDB. LanceDB. Okay. I heard about it in the context of another Python
26:20 LLM project. Well, that's news to me, but it is a developer friendly open source database for AI.
26:26 Okay. Brilliant. All right. Well, like I said, we have more things to talk about.
26:30 So many things, but this is super. Okay. One more thing I want to ask you about here before we go
26:37 on, this has a Cython feel to it. Can you compare and contrast Numba to Cython? So Cython uses,
26:44 sort of requires you to put in some type information in order to be able to generate C
26:48 code that is more efficient. Numba is mainly focused on type inference. So we try to figure
26:55 out all the types in your function based on the types of the inputs. And so in general, we,
27:00 although Numba has places where you can put type annotations, we generally discourage people from
27:05 doing it because we find that it adds work and is error prone and doesn't really help the performance
27:10 in the end. Numba will figure out all the types directly. And so when it JIT compiles it, if it
27:15 comes in, if you call it twice with different types, does it just say, well, now we're going
27:19 to need a, this version of the function underscore strings list rather than integers list or
27:24 something? Yeah. Yeah. Every Numba compiled function actually contains a dispatcher that
27:28 will look at the argument types and pick the right one. And it's at pretty high granularity.
27:32 For example, people who are familiar with multidimensional arrays and like Fortran and C
27:37 know that they lay out the rows and columns in a different order, which has impact on how you do
27:42 loops and stuff, kind of to maximize locality. Numba can tell the difference between
27:46 those two cases and will generate different code for those two cases. So this is stuff that you as
27:52 the user don't want to even know. No, you don't want to worry about that. That's a whole another
27:56 level. So you were like, okay, well, if it's laid out in this order, it's probably this, it appears
28:02 in the L1, you know, the local cache for the CPU, in this way. And so if we loop in that direction,
28:07 we'll like iterate through the cache instead of blow through it every loop or something like that.
28:11 Basically, we want to make sure that LLVM knows when the step size is one, and that's either on
28:16 the row or the column access, depending on that. And because, because I mean, compilers in general
28:20 are magic. Like we are grateful that LLVM exists because they can do so many tricks at that level.
28:26 I mean, because I mean, this is the same thing that powers Clang and other stuff. So, you know,
28:29 all of, all of macOS compilers are built on LLVM. And so we can leverage all of the tricks
28:35 they've figured out in decades of development. Yeah, that's cool. And Python itself is compiled
28:40 with that, at least on macOS. I just saw it last night, you know, that I have Clang some version,
28:46 whatever. And I was just looking at the version for my, my Python. It was compiled with that.
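Going back to the dispatcher point: a small sketch of that dispatch-on-type-and-layout behavior, assuming the .signatures attribute Numba exposes on compiled functions:

```python
import numpy as np
from numba import njit

@njit
def total(a):
    s = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            s += a[i, j]
    return s

c_order = np.ones((500, 500))              # row-major (C) layout
f_order = np.asfortranarray(c_order)       # column-major (Fortran) layout
floats32 = c_order.astype(np.float32)

total(c_order)
total(f_order)
total(floats32)

# One compiled specialization per distinct argument type/layout seen so far.
for sig in total.signatures:
    print(sig)
```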
28:49 Cool. Okay. So we've been on the Numba JIT. Anthony Shaw wrote an article, Python 3.13
28:57 gets a JIT. And this is a pretty comprehensive and interesting article on, on what's going on
29:03 here. What's your experience with this, this JIT coming to Python 3.13?
29:07 This is a, they've definitely tried to set expectations here that this first release is
29:11 really planting a flag, to say, we're going to start building on top of this base. And so
29:17 as far as I've seen, the benchmarks for 3.13 are not going to be like set-the-world-on-fire kind of
29:22 stuff. Throwing away our C code and rewriting operating systems in JIT-ed Python.
29:27 But you have to take a first step. And this is honestly pretty, it impressed me because
29:32 as a library, we can take a lot of risks and do things that I know we can depend on LLVM. We can
29:37 do all sorts of stuff that may be not work for everyone because if Numba doesn't solve your
29:41 problem, you just don't use it. You can just leave it out of your environment. You don't have to
29:45 install it. And so it's easy to kind of us to zero in on just the problems that we're good at and
29:50 say, if there, you know, if this doesn't solve your problem, just leave us out. When you actually
29:54 are putting a JIT into the core interpreter, everyone gets it. And you have to consider,
29:59 and Python is so broad that, you know, you could grab two Python experts and they may actually have
30:05 nothing in common with each other, but both have equal claim to being experts at using
30:10 Python. They might use it in different domains and have different libraries they care
30:14 about and all of that. - I feel that way when I talk to pandas people and I think about me doing like web development, APIs and stuff. I'm like,
30:22 I think I'm really good at Python generally, you know, good enough to write good apps.
30:26 But then I look at this, I'm like, I don't even really, this, some of these like expressions that
30:31 go into the filter brackets. I'm like, I didn't even know that that was possible or really how
30:37 to apply, you know, like, it's just, it's weird to both feel really competent and understand it,
30:41 but also kind of have no idea. And I think what you're getting at is those are two
30:46 really different use cases and they're getting the same JIT and it has to work for both of them. But
30:50 you know, combinatorically explode that problem, right?
30:53 - Yeah. And you know, all the different hardware, I mean, Numba supports a lot of different
30:56 computers, but not every one that Python supports. - Like MicroPython.
31:00 - Yeah. Or we don't work on, you know, HP UX or anything like that necessarily. Python has an
31:06 enormous support, range of supported platforms, an enormous set of use cases and anything you do
31:11 is going to affect everyone. So this approach, which I would say this copy and patch JIT
31:15 approach is really clever because one of the, you know, again, Numba has to bring, we build a,
31:20 you know, custom version of LLVM. It's a little stripped down, but it's mostly still there.
31:23 So we have to bring that along for the ride. That's a heavy, heavy dependency to put on the
31:27 core interpreter for everyone. So the clever bit here is they figured out how to have a JIT,
31:33 but still do all the compiler stuff at build time. So when you build this copy and patch JIT,
31:38 you actually need LLVM, but only at build time and then it can go away. And so the person who
31:42 receives the interpreter doesn't need LLVM anymore. And so they basically built for themselves a
31:46 bunch of little template fragments. This is the patching part is basically you're saying,
31:51 I've got a bunch of fragments that implement different op codes in the bytecode different
31:55 ways, and I'm going to string them together and then go in. And there's a bunch of fill in the
32:00 blank spots that I can go in and swap in the, okay, you get your value from here and then you
32:06 put it over here and all that. But the, but the actual machine code was generated by LLVM by the
32:10 person who built Python in the first place. I see. It just amazes me this works. And I'm,
32:15 I'm excited to see where they go with it because it was a clever way to avoid
32:19 adding a huge heavy dependency to Python. Let's start to get some of that JIT benefit.
32:23 - So it looks at some of the common patterns. I see, okay, we're looping over a list of loads
32:30 or something and replaces that with more native code or something along those lines.
32:35 - Yeah. Yeah. You essentially have the compiler print a bunch of little recipes that say,
32:40 if I see this pattern, sub in this machine code, fill in these blanks, and you just have a table of them. So the challenge there is that there is a combinatorial
32:48 explosion again of how many different, a full blown compiler like LLVM has a bunch of rules.
32:55 It's rule-based. And so it's saying, if I see this pattern, I do this replacement and it keeps doing
33:00 all of this. And then at the end it says, okay, now I'm going to generate my machine code from
33:03 that, those transformations. If I don't have LLVM at runtime, I have to kind of figure out what are
33:08 the best templates up front, put them in this table. And then, and so there, this is where
33:13 honestly looking at different usage patterns will probably be a huge help is in practice,
33:18 you could have any sequence of byte codes, but in reality, you're probably going to have certain
33:21 ones a lot. And those are the ones you want to focus on. So I think once we, I don't know,
33:26 once we start getting this out in the community and start getting feedback on it, I'm curious to
33:29 see how rapidly it can evolve. That'll be really interesting. - Yeah. And this whole copy and
33:33 patch JIT is, we often hear people say, I have a computer science degree.
33:38 And I think what that really means is I have a software engineering degree, or I am a software
33:43 engineering person. They are not often creating new computer science theories. They're
33:51 more like, I really understand how operating systems work and compilers. And I
33:55 write JSON APIs. So I talk to databases. This is like true new research out of legitimate computer
34:02 science, right? This copy and patch JIT. - Yeah. They mentioned, I mean, they cite a paper from
34:05 2021 and in computer science, going from paper to implementation in one of the most popular
34:11 languages on earth in three years seems pretty fast. - It does seem pretty fast, right? It
34:16 definitely seems pretty fast. And the reason I bring this up is I imagine that dropping it into
34:21 one of the most popular programming languages with a super diverse set of architectures and
34:26 use cases will probably push that science forward. - Yeah. This will be tested in one of the most
34:32 intense environments you could imagine. - You know, I mean, whatever they did for their,
34:36 their research paper or their dissertation or whatever, this is another level of putting it
34:41 into a test and experimentation to put it into Python. Yeah. Wild. Okay. Hopefully this speeds
34:47 things up. This is going to be interesting because it just happens, right? It just happens.
34:51 If you have Python 3.13, it's going to be looking at its patterns and possibly swapping out.
34:55 - This is complementary to all of the techniques the faster CPython folks have been doing all
35:00 along for many releases now. They've been looking at other ways to speed up the interpreter
35:05 without going all the way to a full blown compiler, which this is kind of getting you: the final step.
35:10 So that's, again, another interesting place: how does this complement those? You know,
35:15 I don't know those, those details, but it's another tool in the toolbox to sort of go back
35:19 to the beginning. It's speed is about having a bunch of tools and you kind of pick up 5% here
35:24 and 5% there and you pile up enough 5% and pretty soon you have something substantial.
35:29 - Yeah, I absolutely, that was the plan, right? The faster CPython was to make it
35:33 multiples of times faster by adding 20% improvements, release over release,
35:38 over release and compounding percentages basically. The one thing I don't know about this
35:43 is we've had the specializing adaptive interpreter that was one of those faster CPython things that
35:49 came along. You know, is this just the next level of that or is this a replacement for that? I don't
35:54 know. I'm sure people can. - Yeah. I don't know. I don't know what the, what their roadmap is for that. Cause I think part of this is, this is so new. I think
36:00 they got to see how it works in practice before they start figuring out.
36:04 - I agree. It feels like an alternative to that specialized and adaptive interpreter,
36:08 but I don't know. Maybe some of the stuff they've learned from one made it made possible
36:12 or even as just an extension of it. Okay. What did we want to talk about next here? I think.
36:16 - You want to talk about threads? - No, let's talk. I want to talk about,
36:19 I want to talk about Rust really quick before we talk about them, because that'll be quick.
36:23 And then I want to talk about threads because threads will not be as quick. And it's super
36:27 interesting. It's been a problem that people have been chipping at for, for years and years and
36:33 years, the threads thing. But what do you think about all this Rust mania? I mean, it's shown
36:38 some real positive results, things like Ruff and Pydantic and others, but it's actually a little
36:44 bizarrely controversial or maybe not bizarre, non-obviously controversial.
36:48 - Yeah. I mean, my take on the Rust stuff is I view it in the same light as when we use C and
36:52 Fortran historically, it's just Rust is a nicer language in many ways. And so being a nicer
36:57 language means it's certainly, you know, you could have taken any of these things and rewritten them
37:02 in C a long time ago and they would have been faster. You just didn't want to write that C code.
37:06 - Exactly. You know what? We could do this in assembler and it would fly, guys.
37:11 - Yeah. So Rust is moving things, lowering the bar to saying, okay, maybe I'll implement
37:18 the core of my algorithm outside of Python entirely. It's interesting. And honestly,
37:22 I would happily see Rust completely replace C as the dominant extension language in Python.
37:28 The trade-off here, and this is one of those things that's sometimes easy to forget again,
37:32 because the Python community is so diverse, is when you do switch something to Rust, you do
37:37 reduce the audience who can contribute effectively in some cases. That Python,
37:43 using Python to implement things has a benefit for the maintainers if it lets them get more
37:48 contributions, more easily onboard new people. I hear this a lot actually from academic software
37:55 where you have this constant rotating, you know, students and postdocs and things. And so how
38:00 quickly you can onboard someone who isn't a professional software developer into a project
38:03 to contribute to it is relevant. And so, but I think it's different for every project. There
38:09 are some things like, you know, again, Rust and cryptography makes total sense to me because
38:14 that's also a very security conscious thing. You really don't want to be dealing with C buffer
38:18 overflows in that kind of code. And so the guarantees Rust offers are valuable also.
38:23 Well, and I think that that's also makes sense, even outside of just security directly,
38:29 you're going to build a web server. It's a nerve wracking thing to run other people's code
38:33 on an open port on the internet. And so this is better. One of the things I switched to is,
38:39 I recently switched to Granian for a lot of my websites, which is a Rust HTTP server. It's
38:46 comparable in performance, slightly faster than other things, but it's way more,
38:50 its deviation from its average is way, way better.
38:55 So it's just more consistent.
38:57 More consistent, but also, you know, like the average, for example, the average, where's the
39:02 versus the third-party server? That's the one I want. So against uWSGI, for example,
39:07 right. It's six milliseconds versus 17 milliseconds. So like, whatever. But then
39:12 you look at the max latency, it's 60 versus three seconds. It's like, oh, hold on. Right. But the
39:16 fact that it's written in Rust, I think, gives it a little bit of extra safety, all other things
39:21 being equal. Right. And that, I mean, obviously a lot of caveats there.
39:25 Yeah. Actually the interesting point about, and this is not unique to Rust, this is again,
39:29 the same problem with C and other things is that it's a little bit interesting. On one hand,
39:32 we're pushing the Python interpreter and JITs and all this other stuff at the same time as you're
39:36 thinking about whether to pull code entirely out of Python. And it creates a barrier where the JIT
39:42 can't see what's in the Rust code. And so if there was an optimization that could have crossed that
39:46 boundary, it's no longer available to the compilers. Yeah. This is a problem the Numba team
39:51 has been thinking about a lot because our number one request, aside from, you know, other GPUs,
39:57 is can Numba be an ahead of time compiler instead of a just-in-time compiler? And we were like,
40:02 superficially, yes, that's straightforward. But then we started thinking about the user experience
40:06 and the developer experience. And there are some things that you lose when you go ahead of time
40:10 that you have with the JIT and how do you bridge that gap? Yeah, it gets tricky.
40:14 We've been trying to figure out some tooling to try and bridge that. So we, at SciPy, we did a
40:19 talk on a project we just started called Pixie, which is a sub-project of Numba that is trying to,
40:24 which doesn't have Rust support yet, but that's been one of the requests. So if you go to
40:28 github.com/numba/pixie, see if they've indexed it. Oh, there it is. Perfect. Okay.
40:33 Search engines. Search engines are pyramidic.
40:36 They really are.
40:36 But yeah, but Pixie, we gave a talk about it at SciPy. It's very early stages,
40:42 but what we're trying to do is figure out how to, in the ahead of time compilation,
40:45 whether that's C or Rust or even Numba, eventually, capturing enough info that we can
40:51 feed that back into a future JIT so that the JIT can still see what's going on in the compiled code
40:56 as kind of a future-proofing ecosystem.
40:58 Yeah, that's cool. I know some compilers have profile-guided optimization type things.
41:04 Like you can compile it with some instrumentation, run it, and then take that output and feed it back
41:09 into it. And I have not, I don't know if I've ever practically done anything with that, but I'm like,
41:14 "Oh, that's kind of a neat idea to like, let it see what it does and then feed it back."
41:17 Is this sort of like that or what do you think?
41:19 Yeah, this is different. This is sort of, this is basically capturing in the library file. So
41:24 you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and
41:29 embed it into your JIT, which might be have other LLVM bit codes. So then you can optimize, you can
41:35 have a function you wrote in Python that calls a function in C and you could actually optimize
41:39 them together, even though they were compiled at different times, implemented in different
41:43 languages, you could actually cross that boundary.
41:45 One's like ahead-of-time, just standard compilation. And one is like a JIT thing,
41:49 but it's like, "Oh, we're going to click it together in the right way." Yeah, yeah. Because JITs are nice in that they can see everything that's going on,
41:55 but then they have to compile everything that's going on and that adds time and latency and
41:59 things. And so can you have it both ways? Is that's really what we're trying to do.
42:03 It's nice when you can have your cake and eat it too, right?
42:05 Yes.
42:06 My cake before my vegetables and it'd be fine.
42:08 I said that this Rust thing was a little bit controversial. I think there's some just,
42:14 "Hey, you're stepping on my square of Python space with a new tool." I don't think that has
42:20 anything to do with a Rust per se. It's just somebody came along and made a tool that is now
42:25 doing something maybe in better ways, or I don't know. I don't want to start up a whole debate
42:29 about that. But I think the other one is what you touched on is if we go and write a significant
42:34 chunk of this stuff in this new language, regardless what language it is, Rust is a
42:39 relatively not popular language compared to others. Then people who contribute to that,
42:44 either from the Python side, we're like, "Well, there's this big chunk of Rust code now that I
42:47 don't understand anything about, so I can't contribute to that part of it." And you might
42:52 even say, "Well, what about the pro developers or the experienced core developers and stuff?
42:57 They're experienced in pro at C in Python, which is also not Rust, right? It's this new area that
43:03 is more opaque to most of the community, which I think that's part of the challenge.
43:08 Yeah. Some people like learning new programming languages and some don't. So on some hand,
43:12 Rust can be, "This is a new intellectual challenge and it fixes practically some
43:16 problems you have with C." Or in other cases, it's the, "I wanted to worry about what this
43:20 project does and not another programming language." Right, right, right.
43:23 Kind of have to look at your communities and decide what's the right trade-off for you.
43:26 Maybe in 10 years, CPython will be our Python and it'll be written in Rust. I mean,
43:32 if we move to WebAssembly and like PyScript, Pyodide, Land a lot, having that right in,
43:36 there's a non-zero probability, but it's not a high number, I suppose. Speaking of something
43:42 I also thought was going to have a very near zero probability. PEP 703 is accepted. Oh my
43:49 goodness. What is this?
43:50 Yeah. So this was, again, a couple of years ago now, or I guess a year ago, it was finally
43:54 accepted. So for a very long time, the Python interpreter has had the GIL, because again, threads
44:00 are an operating system feature that let you do something in a program concurrently. And
44:05 now that all of our computers have four, eight, even more cores, depending on what kind of
44:11 machine you have, even your cell phone has more than one core. Using those cores requires
44:16 you have some kind of parallel computing in your program. And so the problem is that you
44:21 don't want, once you start doing things in parallel, you have the potential for race
44:25 conditions. Two threads might do the same thing at the same time or touch
44:30 the same data and get it inconsistent. And then your whole program starts to crash and other
44:34 bad things happen. So historically, the global interpreter lock has been sort of sledgehammer
44:39 protection of the CPython interpreter. But the net result was that threads that were
44:44 running pure Python code basically got no performance benefits. You might get other
44:48 benefits. Like you could have one block on IO while the other one does stuff. And so
44:52 it was easier to manage that kind of concurrency. But if you were trying to do compute on two
44:56 cores at the same time in pure Python code, it was just not going to happen because every
45:01 operation touches a Python object, has to lock the interpreter while you make that
45:04 modification.
45:05 Yeah. You could write all the multi-threaded code with locks and stuff you want, but it's
45:08 really just going to run one at a time anyway. A little bit like preemptive multi-threading
45:13 on a single core CPU. I don't know, it's weird. I've added all this complexity, but
45:17 I haven't got much out of it.
45:18 The secret of course, is that if your Python program contained not Python, like C or Cython
45:23 or Fortran, as long as you weren't touching Python objects directly, you could release
45:27 the GIL. And so Python, so especially in the scientific and computing and data science
45:31 space, where multi-threaded code has been around for a long time and we've been using
45:35 it and it's fine, Dask, you can use workers with threads or processes or both. And so
45:40 I frequently will use Dask with four threads and that's totally fine because most of the
45:44 code is in NumPy and Pandas, and that releases the GIL. But that's only a few use cases. And
45:48 so if you want to expand that to the whole Python interpreter, you have to get rid of
45:52 the GIL. You have to have a more fine-grained approach to concurrency. And so this proposal
45:58 from Sam Gross at Meta was basically a, one of many historical attempts to kind of make
46:05 that, get rid of that global interpreter lock. Many have been proposed and failed historically.
46:09 So getting this all the way through the approval process is a real triumph. At the point where
46:15 it was really being hotly contested, my, you know, maybe slightly cynical take is we have
46:20 between zero and one more chance to get this right in Python. Either it's already too late
46:26 or this is it. And I don't know which it is. I think there were two main complaints against
46:32 this change. Complaint number one was, okay, you theoretically have opened up a parallel
46:38 compute thing. So for example, on my Apple M2 Pro, I have 10 cores, so I could leverage
46:45 all of those cores, maybe get a five times improvement. But single core regular programming
46:52 is now 50% slower. And that's what most people do and we don't accept it. All right. That's
46:58 one of the sides, you know, the Gilectomy and all that was kind of in that realm, I believe.
47:02 The other is yet to be determined, I think, is much like the Python two to three shift.
47:08 What the problem with Python two to three wasn't that the code of Python changed. It was that
47:13 all the libraries I like and need don't work here. Right. And so what is going to happen
47:18 when we take half a million libraries that were written in a world that didn't
47:23 know or care about threading and are now subjected to it?
47:26 Yeah. And there's sort of two levels of problem there. One is that there's work that has
47:30 to be done to libraries, usually ones with C extensions that, you know,
47:34 assumed a global interpreter lock, and they'll have to make some changes for that.
47:38 But the other one is a much more kind of cultural thing where the existence of the
47:43 GIL just meant that Python developers just wrote less threaded code.
47:46 Yeah. They don't think about locks. They don't worry about locks. They just assume
47:49 it's all going to be fine.
47:50 Because again, the GIL doesn't protect your threaded code from race conditions, it just protects the
47:57 interpreter from race conditions. So you and your application logic are free to make all
48:01 the thread mistakes you want. But if no one ever ran your code in multiple threads, you would never
48:05 know. And so we're going to have to face that now.
48:08 I think that's a super interesting thing that it's a huge cultural issue that just people don't think about it. Like I said, I used to do a lot of C++ and C#.
48:16 And over there, it's, you're always thinking about threading. You're always thinking about,
48:19 well, what about these three steps? Does it go into a temporarily invalid state? Do I need to
48:23 lock this? Right. And like C# even had literally a keyword lock, which is like a context manager.
48:28 You say lock curly brace and everything in there is like into a lock and out of like,
48:32 cause it's just so part of that culture. And then in Python, you kind of just forget about
48:37 it and don't worry about it. But that doesn't mean that you aren't taking multiple, like
48:41 five lines of Python code. Each one can run all on its own, but taken as a block, they may still
48:47 get into these like weird states where if another thread after three lines observes the data,
48:52 it's still busted. Right. It's just the culture doesn't talk about it very much.
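A tiny sketch of what's being described here; the counter and thread counts are arbitrary, and whether the unlocked version actually loses updates on a particular run comes down to timing.

```python
# Sketch: the GIL keeps the interpreter's own data structures consistent, but it
# does not make multi-step application logic atomic. "counter += 1" is several
# bytecodes (load, add, store), so updates from different threads can interleave.
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1              # read-modify-write, not atomic

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:                # the lock makes the whole update atomic
            counter += 1

def run(worker):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print("without lock:", run(unsafe_increment))  # may come up short
print("with lock:   ", run(safe_increment))    # always 800000
```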
48:57 Yeah. No one ever runs your code in multiple threads. All of those bugs are theoretical.
49:01 And so it's now what's going to shift is, you know, all of those C extensions will get fixed
49:07 and everything will be, you know, they'll fix those problems. And then we're gonna have a
49:10 second wave of everyone seeing their libraries used in threaded programs and starting to discover
49:16 what are the more subtle bugs? Do I have global state that I'm not being careful with? And it's
49:21 going to be painful, but I think it's necessary for Python to stay relevant
49:25 into the future. I'm a little worried. I mean, one of the common questions we hear is sort of
49:29 why do we need this when multiprocessing is fine? Why don't we just do that? And definitely,
49:34 multiprocessing's big challenge is that processes don't get to share data directly. So,
49:40 you know, even if I have read-only data, if I have to load two gigabytes of data
49:44 in every process, and I want to start 32 of them because I have a nice big computer,
49:48 I've just 32x'd my memory usage, just so that I can have multiple concurrent computations.
49:56 Now there are tricks you can play on things like Linux, where you load the data once and rely
50:00 on forking to preserve pages of memory. Linux does cool copy on write stuff when you fork,
50:06 but that's like fragile and not necessarily going to work. And then the second thing,
50:09 of course, is if any of those have to talk to each other. Now you're talking about pickling
50:13 objects and putting them through a socket and handing them off. And that is again,
50:17 for certain kinds of applications, just a non-starter. Yeah, but then people just start
50:21 going, well, we're just going to rewrite this in a language that lets us share pointers.
50:24 Yeah. Or at least memory in process. Yeah. Yeah. There's again, there are a lot of Python users
50:28 where this, they don't need this. They don't care about this. This will never impact them.
50:32 And then there's a whole class of Python users who are desperate for this and really, really want it.
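For reference, the fork/copy-on-write trick mentioned a moment ago looks roughly like this; the array, sizes, and worker count are invented for illustration, and, as noted, it's POSIX-only and fragile.

```python
# Sketch of the fork/copy-on-write trick (Linux/macOS only, and fragile):
# load the big read-only data once in the parent, then use the "fork" start
# method so workers inherit those memory pages instead of each loading their
# own copy. Anything the workers return still gets pickled and sent back.
import multiprocessing as mp
import numpy as np

BIG = np.random.random(50_000_000)      # ~400 MB, loaded once in the parent

def chunk_mean(bounds):
    lo, hi = bounds
    return float(BIG[lo:hi].mean())     # reads the inherited pages, no copy

if __name__ == "__main__":
    ctx = mp.get_context("fork")        # not available on Windows
    step = len(BIG) // 8
    jobs = [(i * step, (i + 1) * step) for i in range(8)]
    with ctx.Pool(processes=8) as pool:
        means = pool.map(chunk_mean, jobs)
    print(sum(means) / len(means))
```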
50:38 Sure. I, you know, my, I think there's a couple of interesting things here.
50:43 One, I think that this, I think this is important for stemming people leaving. I thought, I actually
50:50 don't hear this that much anymore, but I used to hear a lot of, we've left for Go because we
50:56 need better parallelism or we've left for this performance reason. And I don't know,
51:00 that's just a success story of the faster CPython initiative or all the people who had been around
51:05 and decided they needed to leave. They're gone. And they, we just don't hear them anymore because
51:09 they left. It's like, you know, I used to hear them say this at the party, but then they said,
51:13 they're going to leave. And I don't hear anyone say they're leaving. Well, it's because everyone's
51:16 still here. Didn't say that. I don't know, but I do think having this as a capability will be
51:22 important for people to be able to maybe adopt Python where Python was rejected at the proposal
51:29 stage. You know, like, should we use Python for this project or something else? Oh, we, we need
51:33 threading. We need computational threading. We've got, you know, 128 core it's out. Right. And then
51:38 no one comes and complains about it because they never even started that process. Right. So it'll
51:42 either allow more people to come into Python or prevent them from leaving for that same argument
51:47 on some projects. I think that's, that's a pretty positive thing here.
51:50 Yeah. There's, there's, yeah, we, we don't get to count all of the projects that didn't come
51:54 into existence because of the, of the global interpreter lock. It's easy when you're in it
51:59 to sort of adjust your thinking to not see the limitation anymore because you're so used to
52:04 routing around it. You don't even stop and think, oh man, I got to worry about threads. You just
52:08 don't think threads. I totally agree. And I'll give people two other examples that maybe resonate
52:12 more if this, this doesn't resonate with them. It's the, what have I said? Oh, it's a little
52:16 bit challenging to write this type of mobile phone application in Python. Like, well, it's nearly
52:24 impossible to write a mobile phone application in Python. So we're not even focusing on that as an
52:29 issue because no one is, I know, beware and a few other things there, but there's a little bit of
52:34 work. So I don't want to just, I don't want to like, yeah, I'm not trying to talk bad about them,
52:37 but as a community, there's not like, it's not a react native sort of thing or a flutter where
52:42 there's a huge community of people who are just like, and we could do this. And then how do we,
52:46 like, there's just not a lot of talk about it. And that doesn't mean that people wouldn't just
52:50 love to write mobile apps in Python. It's just, it's so far out of reach that it's, it's just a
52:57 little whisper in the corner for people trying to explore that rather than a big din. And I think,
53:02 you know, same thing about desktop apps. Wouldn't it be awesome if we could not have electron,
53:06 but like some really cool, super nice UI thing, that's almost pure Python. It would, but people
53:12 were not focused on it because no one's trying to do it, but no one's trying to do it because
53:16 there weren't good options to do it with. Right. And I think the same story is going to happen
53:20 around performance and stuff with this. Just to jump in, you know, since I have to
53:24 talk about the beware folks, I mean, you've described exactly the reason why we funded
53:28 the beware development is because yeah, if we don't work on that now before people sort of,
53:33 there's a lot of work that has to be done before you reach that point where it's easy. And so
53:37 recently the team was able to get sort of tier three support for iOS and Android into CPython
53:43 3.13. So now we're at the first rung of the ladder of iOS and Android support in CPython.
53:48 That's awesome. Toga and Briefcase, the two components of BeeWare, are really focused again
53:52 on that. Yeah. How do I make apps? How do I make it for desktop and mobile? And so, but it's,
53:56 yeah, what we ran into with people is that they just didn't even realize you could even think about doing
54:00 that. And so they just, they never stopped to say, oh, I wish I could do this in Python
54:04 because they just assumed you couldn't. And all the people who really needed to,
54:07 like were required to leave the ecosystem and make another choice.
54:12 And it will take the same amount of, I was gonna say, it takes the same amount of time with this.
54:15 Even once threads are possible in Python, it'll take years to shift the perception.
54:19 Yeah. And probably some of the important libraries. Yeah. Yeah. All right. So I'm pretty
54:24 excited about this. I was hoping something like this would come along. I didn't know what form it
54:27 would take. As I said, there were the two limitations, the libraries and the culture, which
54:31 you called out very awesomely. And then also the performance: this one is either neutral or
54:37 a little bit better in terms of performance, so it doesn't have that disqualifying hit to
54:43 single-threaded performance. The approach they're taking, I will say again,
54:46 because you have to be fairly conservative with CPython, because so many people use it
54:51 is that this will be an experimental option that by default, Python won't turn this on. You will
54:56 have Python 3.13, and when you get it, it'll still have the global interpreter lock. But if you build
55:01 Python 3.13 yourself, or you get another kind of experimental build of it, there's a flag now at
55:06 the build stage to turn off the GIL. So in this mode, they decided, you know,
55:11 to avoid double negatives: this is Python in free-threading mode. And that will be an
55:16 experimental thing for the community to test, to try out, to benchmark and do all these things
55:21 for a number of years. They've taken a very measured approach and they're saying, we're not
55:25 going to force the whole community to switch to this until it's proven itself out. Everyone's had
55:30 time to port the major libraries, to try it out, to see that it really does meet the promise of
55:36 not penalizing single threaded stuff too much. Yeah. Or breaking the single threaded code too
55:41 much. Yeah. Yeah. The steering council is reserving the right to decide when this becomes, or if this
55:48 becomes the official way for Python, you know, I don't know, 3.17 or something. I mean, it could be,
55:52 it could be several years. And so I just want everyone not to panic. Yeah, exactly. Don't,
55:57 this doesn't get turned on in October. No, and this is super interesting. It's accepted. It only
56:04 appears in your Python runtime if you build it with this. So I imagine, you know, some people
56:10 will build it themselves, but someone also just create a Docker container with Python built with
56:14 this and you can get the free threaded Docker version or whatever. Right. We've already put
56:18 out Conda packages as well. So if you want to build a Conda environment, yeah, actually, if
56:21 you jump over to the py-free-threading page. Yeah. Tell people about this. Yeah. We didn't
56:25 make this. This is the, the community made this, the scientific Python community put this together.
56:30 And this is a really great resource, again, focused on, you know, that, that community,
56:35 which really wants threading. Cause we have a lot of, you know, heavy numerical computation.
56:39 And so this is a good resource for things like how do you install it? So there's a link there on
56:44 what are your options for installing the free threaded CPython? You can get it from Ubuntu or
56:48 pyenv or Conda. If you go look at that, you know, you could also build it from source or get
56:53 a container. Yeah. So these are, again, this is very focused on the kind of things the scientific
56:57 Python community cares about, but, but these are things like, you know, have we ported Cython?
57:00 Have we ported NumPy? Is it being automatically tested? Which release has it? And the nice thing
57:06 actually is pip as of 24.1, I believe can tell the difference between wheels for regular Python
57:12 and free threaded Python. Oh, you can tell, there are different wheels as well. Yeah. So there's
57:16 a, you know, Python has always had this thing called an ABI tag, which is just a letter that
57:20 you stick after the version number and T is the one for free threading. And so now you, a project
57:26 can choose to upload wheels for both versions and make it easier for people to test out stuff. So
57:31 for example, I mean, Cython, it looks like there are nightly wheels already being built. And so
57:36 this is, they're moving fast. And, you know, definitely, at Anaconda we're also very
57:40 interested in getting into this as well. So that's why we built the conda package for free threading.
57:44 And we're going to start looking at building more conda packages for these things in order to be
57:47 able to facilitate testing. Cause I think the biggest thing we want to make sure is if you want
57:51 to know if your code works, you want the quickest way to get an environment to have some place to
57:55 test. And so making this more accessible to folks is a really high priority. This is cool.
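If you're experimenting with one of these builds, here's a quick, illustrative way to check which mode you're actually running in; note that sys._is_gil_enabled() is an underscore-prefixed helper added in 3.13, so treat this as a sketch rather than a stable API.

```python
# Sketch: checking which mode you're on (Python 3.13+). On older versions the
# getattr fallback just assumes the GIL is there.
import sys
import sysconfig

free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
gil_enabled_now = getattr(sys, "_is_gil_enabled", lambda: True)()

print("built for free-threading:", free_threaded_build)
print("GIL enabled right now:   ", gil_enabled_now)
```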
58:01 There was something like this for Python two to three. I remember it showed like the top,
58:05 top 1000 packages on PyPI. And then how many of them were compatible with Python three,
58:11 basically by expressing their language tag or something like that.
58:14 Yep.
58:14 So this is kind of like that. It's also kind of like the Can I Use site, I don't know if you're
58:18 familiar with that, caniuse from the web dev world?
58:22 Oh yeah. Oh, awesome. Yeah. Yeah. I've seen this.
58:24 You go and say, I want to use this feature, you know,
58:28 say web workers or something like that, and then it'll show you
58:32 all the browsers and all the versions and when they were supported. And this sounds a little
58:36 bit like that, but for free threaded Python, which by the way, free threaded Python is the
58:40 terminology, right? Not no Gil, but free threaded.
58:42 That is what they've decided. I think they're worried about people trying to talk about no,
58:45 no Gil or, I mean, I don't know.
58:47 Gilful. Are you running on a Gilful? You know? Oh my gosh. Okay. Interesting. Now we have a few
58:56 other things to talk about, but we don't have really much time to talk about it. But there was
58:59 one thing that we were maybe going to talk about a bit with compiling. You said, you mentioned some
59:05 talk or something where people were talking about, well, what if we had a static language Python and
59:09 we compiled it and related to that kind of Mr. Magnetic says, could a Python program be compiled
59:15 into a binary, like a jar or a, you know, a go app or whatever.
59:20 There are other tools that look at that as a, yeah, a standalone executable. So yeah, one of
59:24 the things I just wanted to shout out a colleague of mine at Anaconda, Antonio Cuni, who is a well
59:29 known PyPy developer from long ago. He's worked on PyPy for 20 years. He's been working-
59:33 Not the package installing thing, but the JIT compiler.
59:36 PyPy.
59:37 PyPy. Sometimes phonetically, like over audio, that's hard to tell.
59:41 Yes. Yeah. So he's been thinking about this stuff for a very long time. His sort of key insight,
59:46 at least clicked in my head, was that Python is hard to compile because it is so dynamic.
59:51 I can, in principle, modify the attributes, like even the functions of a class at any point in the
59:57 execution of the program. I can monkey patch anything I want. This dynamism is really
01:00:02 great for making kind of magical metaprogramming libraries that do amazing things with very little
01:00:07 typing, but it makes compiling them really hard because you don't get to ever say,
01:00:13 okay, this can't ever change. And so what he's been trying to do with a project called Spy,
01:00:18 which he gave a talk on at PyCon 2024, but I think the recordings aren't up yet for that.
01:00:24 And so there isn't a, I don't think there's a public page on it, but he does have a talk on it.
01:00:28 And because I think they've got the keynotes up. The key kind of insight for me for Spy was to
01:00:33 recognize that in a typical Python program, all the dynamic metaprogramming happens at the
01:00:38 beginning. You're doing things like data classes, generating stuff and all kinds of things like
01:00:42 that. And then there's a phase where that stops. And so if we could define a sort of variant of
01:00:49 Python where those two phases were really clear, then you would get all of the dynamic expressiveness,
01:00:55 almost all the dynamic expressiveness of Python, but still have the ability to then feed that
01:01:00 into a compiler tool chain and get a binary. This is super early R and D experimental work,
01:01:05 but I think that's a really great way to approach it because often there's always been this tension
01:01:10 of, well, if I make Python statically compilable, is it just, you know, C with, you know, different
01:01:16 keywords? Do I lose the thing I loved about Python, which was how quickly I could express my
01:01:21 idea. And so this is again, to our, you know, having your cake and eating it too. This is
01:01:25 trying to find a way to split that difference in a way that lets us get most of the benefits
01:01:30 of both sides. That's pretty interesting. And hopefully that talk is up soon. That'd be really
01:01:34 neat. Maybe by the time this episode's out, I know the PyCon videos are starting to roll,
01:01:38 like not out on YouTube, but out on the podcast channels. It would be fantastic to have,
01:01:43 here's my binary of Python. Take my data science app and run it. Take my desktop app and run it.
01:01:49 I don't care what you have installed on your computer. I don't need you to set up Python
01:01:53 3.10 or higher on your machine and set up a virtual environment. Just here's my binary.
01:01:59 Do it as you will. That's another, I throw that in with the mobile apps and the front end or the
01:02:06 desktop apps or the front end Python. You know, that's another one of those things that it's,
01:02:10 nobody's pushing towards it. Not nobody, not that many people are pushing towards it because there's
01:02:14 not that many use cases for it that people are using it for because it was so challenging that
01:02:19 people stopped trying to do that. You know? Yeah. That's one thing I also, you know, people
01:02:23 probably hear me say this too many times, but most people use apps when they use a computer,
01:02:29 not packages or environments. And so in the, in the Python space, we are constantly grappling
01:02:35 with how hard packages and environments are to work with, talk with, you know, decide again,
01:02:40 what languages are in what, you know, do I care about everything or just Python or whatever?
01:02:44 That's all very hard, but that's actually not how most people interact with the computer at all.
01:02:48 And so it really is one of those things. Again, this is one of the reasons I'm so interested in
01:02:52 BeeWare: Briefcase is like the app packager. And the more they can push on that, the more we have
01:02:58 a story. And again, there are other tools that have been around for a long time, but that's
01:03:01 just what I think about a lot. We need to focus on tools for making apps because that's how we're
01:03:05 going to share our work with 99% of the earth. Yes. Yeah. A hundred percent. I totally agree.
01:03:11 And lots, lots of props to Russell Keith-Magee and the folks over at BeeWare for doing that. And for
01:03:17 you guys supporting that work, because it's, it's one of those things where there's not a ton of
01:03:22 people trying to do it. It's not like, well, we're using Django, but is there another way we could
01:03:25 do it? It's basically the same thing, right? There it's creating a space for Python where it kind of,
01:03:31 I know there's PyInstaller and Py2App, but it's, it's pretty limited, right?
01:03:34 Yeah. There's not a lot of effort there. And so it's, it's, there are a few people who've
01:03:38 been doing it for a long time and others are getting more into it. And, and yeah, so it's,
01:03:42 I just, yeah, I wish that we could get more focus on it because there's, there are tools,
01:03:46 they're just don't get a lot of attention. Yeah. And they're not very polished and there's so many
01:03:50 edge cases and scenarios. All right. Let's close it out with just final thought on this little
01:03:54 topic and then, well, you wrapped this up for us. Do you think that's maybe a core developer thing?
01:03:59 I mean, I know it's awesome that Py2App and PyInstaller and the other freezing tools
01:04:03 are doing their things to try to make this happen. But I feel like they're kind of
01:04:07 looking in at Python and go like, how can we grab what we need out of Python and jam it into an
01:04:12 executable and make it work? Like, should we be encouraging the core developers to just go like a,
01:04:16 python myscript.py --windows and there you go, you get a .exe or something.
01:04:22 I don't know, actually, that would be a great question. Actually, I would ask Russell that
01:04:26 question. He would have probably better perspective than I would. At some level,
01:04:30 it is a tool that is dealing with a lot of problems that aren't core to the Python language.
01:04:34 And so maybe having it outside is helpful, but maybe there are other things that the core could
01:04:40 do to support it. I mean, again, a lot of it has to do with the realities of when you drop an
01:04:45 application onto a system, you need it to be self-contained. You need, sometimes you have to,
01:04:49 you know, do you have to trick the import library to know where to find things and all of that?
01:04:53 That's exactly what I was thinking is right. If Python itself didn't require like operating system
01:04:58 level fakes to make it think it, if it could go like, here is a thing in memory where you just
01:05:05 import, this is the import space. It's this memory address for these things. And we just run from the
01:05:10 exe rather than dump a bunch of stuff temporarily on disk, import it, throw it, you know, like that
01:05:14 kind of weirdness that happens sometimes. There is probably definitely improvements that could
01:05:18 be made to the import mechanism to support applications. Yeah, exactly. Well, we've
01:05:22 planted that seed. Maybe it will grow. We'll see. All right, Stan, this has been an awesome
01:05:26 conversation. You know, give us a wrap up on all this stuff, just like sort of final call to action
01:05:31 and summary of what you guys are doing at Anaconda. Because there's a bunch of different stuff we
01:05:34 talked about that are in this space. Yeah. I mean, mainly I would say, I would encourage people that
01:05:39 if you want to speed up your Python program, you don't necessarily have to leave Python.
01:05:42 Go take a look at some of these tools. Go, you know, measure what your program's doing. Look
01:05:47 at tools like Numba, but there are other ones out there, you know, PyTorch and Jax and all sorts of
01:05:50 things. There are lots of choices now for speed. And so Python doesn't have to be slow. You just
01:05:55 have to sort of figure out what you're trying to achieve and find the best tool for that.
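As a reminder of how low the barrier is, here's a minimal Numba sketch; the function and data are arbitrary, just enough to show the shape of the @njit workflow discussed earlier in the episode.

```python
# Minimal Numba sketch: decorate a numeric hot loop and it gets JIT-compiled
# to machine code on the first call.
import numpy as np
from numba import njit

@njit
def sum_of_squares(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

data = np.random.random(1_000_000)
print(sum_of_squares(data))   # first call compiles; later calls run at native speed
```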
01:05:58 Oh, one other thing I do want to shout out. I'm teaching a tutorial in a month over at the
01:06:04 Anaconda sort of live tutorials system, which will be how to use Numba. So if something you saw here
01:06:11 you want to go deep on, there will be a tutorial, hopefully linked in the show notes or something.
01:06:15 Yeah, I can link that in the show notes. No problem. Absolutely.
01:06:18 So I'll be going in. Is that the high performance Python with Numba?
01:06:22 Yes. Yes. So yeah, we'll be doing worked examples and you'll get to ask questions and all that
01:06:27 stuff. Cool. I'll make sure to put that in the show notes. People can check it out. All right.
01:06:31 Well, thanks for sharing all the projects that you guys are working on and just the broader
01:06:36 performance stuff that you're tracking. Yeah. Awesome. Glad to chat.
01:06:39 You bet. See you later.
01:06:40 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check
01:06:46 out what they're offering. It really helps support the show. This episode is sponsored by Posit
01:06:51 Connect from the makers of Shiny. Publish, share and deploy all of your data projects that you're
01:06:56 creating using Python. Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards
01:07:03 and APIs. Posit Connect supports all of them. Try Posit Connect for free by going to
01:07:08 talkpython.fm/posit. P-O-S-I-T. Want to level up your Python? We have one of the largest catalogs
01:07:16 of Python video courses over at Talk Python. Our content ranges from true beginners to deeply
01:07:21 advanced topics like memory and async. And best of all, there's not a subscription in sight.
01:07:26 Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show. Open your
01:07:31 favorite podcast app and search for Python. We should be right at the top. You can also find
01:07:36 the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on
01:07:43 talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of
01:07:48 the show and have your comments featured on the air, be sure to subscribe to our YouTube channel
01:07:52 at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening.
01:07:58 I really appreciate it. Now get out there and write some Python code.