
#474: Python Performance for Data Science Transcript

Recorded on Thursday, Jul 18, 2024.

00:00 Python performance has come a long way in recent times, and it's often the data scientists,

00:05 with their computational algorithms and large quantities of data, who care the most about

00:10 this form of performance.

00:11 It's great to have Stan Siebert back on the show to talk about Python's performance for

00:16 data scientists.

00:17 We cover a wide range of tools and techniques that will be valuable for many Python developers

00:22 and data scientists.

00:23 This is Talk Python to Me, episode 474, recorded July 18th, 2024.

00:30 Are you ready for your host?

00:31 Here he is.

00:32 You're listening to Michael Kennedy on Talk Python to Me.

00:36 Live from Portland, Oregon, and this segment was made with Python.

00:39 Welcome to Talk Python to Me, a weekly podcast on Python.

00:45 This is your host, Michael Kennedy.

00:48 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:53 both accounts over at fosstodon.org.

00:56 And keep up with the show and listen to over nine years of episodes at talkpython.fm.

01:01 If you want to be part of our live episodes, you can find the live streams over on YouTube.

01:05 Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming

01:11 shows.

01:12 This episode is sponsored by Posit Connect from the makers of Shiny.

01:16 Publish, share, and deploy all of your data projects that you're creating using Python.

01:20 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:27 Posit Connect supports all of them.

01:29 Try Posit Connect for free by going to talkpython.fm/posit, P-O-S-I-T.

01:35 And it's also brought to you by us over at Talk Python Training.

01:39 Did you know that we have over 250 hours of Python courses?

01:44 Yeah, that's right.

01:45 Check them out at talkpython.fm/courses.

01:48 Hey, Stan.

01:49 Hello.

01:49 Hello, hello.

01:50 Welcome back to Talk Python to Me.

01:51 I'm glad to be here.

01:52 Glad to talk performance.

01:53 I know.

01:54 I'm excited to talk performance.

01:56 It's one of those things I just never get tired of thinking about and focusing on.

02:00 It's just so multifaceted.

02:02 And as we will see, even for a language like Python that is not primarily performance focused,

02:08 there's a lot to talk about.

02:09 Yeah, there's an endless bag of tricks.

02:11 Yeah.

02:12 And I would say, you know, sort of undercut my own comment there.

02:16 Python is increasingly focusing on performance since like 3.10 or so, right?

02:21 And the secret is that it's because Python integrates so well with other languages.

02:26 It's sort of always cared about performance in some way.

02:29 It's just sometimes you had to leave Python to do it.

02:31 But you've still got to keep the Python interface.

02:33 There's been such an easy, high-performance escape hatch that making Python itself faster

02:39 is obviously not unimportant, but maybe not the primary focus, right?

02:42 Like usability, standard library, et cetera, et cetera.

02:45 All right.

02:45 For people who have not heard your previous episode, let's maybe just do a quick introduction.

02:51 Who's Stan?

02:52 Yeah.

02:52 So I am Stan Siebert.

02:54 I am a manager at Anaconda, well-known purveyor of Python packages and such.

02:59 My day-to-day job is actually managing most of our open source developers at Anaconda.

03:04 So that includes the Numba team.

03:05 And so we'll be talking about Numba today, but other things like we have people working

03:09 on Jupyter and Beware for Mobile Python and other projects like that.

03:13 And so that's what I do mostly is focus on how do we have an impact on the open source

03:17 community and what does Python need to sort of stay relevant and keep evolving?

03:22 I love it.

03:22 What a cool job as well, position in that company.

03:24 Yeah, I'm really grateful.

03:25 It's a rare position.

03:27 So I'm really glad I've been able to do it for so long.

03:29 Yeah.

03:29 And I would also throw out in the list of things that you've given a shout out to, I would

03:34 point out PyScript as well.

03:36 Oh, yes, of course.

03:37 Yeah.

03:37 I just started managing the PyScript team again, actually.

03:40 And so I forgot about that one too.

03:41 Yes, PyScript.

03:42 So Python in your web browser, Python everywhere, on your phone, in your browser, all the places.

03:47 Yeah.

03:47 I mean, it's a little bit out of left field compared to the other things that you all are

03:50 working on, but it's also a super important piece, I think.

03:54 So yeah, really cool.

03:56 Really cool there.

03:57 So I think we set the stage a bit, but maybe let's start with Numba.

04:04 That one's been around for a while.

04:05 Some people know about it.

04:07 Others don't.

04:08 You know, what is Numba and how do we make Python code faster with Numba?

04:11 Yeah.

04:12 So there have been a lot of Python compilation projects over the years.

04:15 Again, Numba's very fortunate that it's now 12 years old.

04:18 We've been doing it a long time, and I've been involved with it probably almost 10 of those

04:22 years now.

04:22 And I think one of Numba's success points is trying to stay focused on an area where

04:27 we can have a big impact, and that is trying to speed up numerical code.

04:30 So there's a lot of, again, in data science and other sciences, there's a lot of need to

04:36 write custom algorithms that do math.

04:38 And Numba's sweet spot is really helping you to speed those up.

04:41 So we see Numba used in a lot of places where maybe the algorithm you're looking for isn't

04:46 already in NumPy or already in JAX or something like that.

04:48 You need to do something new.

04:50 Projects like UMAP, which do really novel sort of clustering algorithms.

04:54 Or I just at SciPy, I learned more about a project called Stumpy, which is for time series analysis.

04:59 Those authors were able to use Numba to take the numerical core of that project that was

05:04 the sort of the time bottleneck and speed it up without having to leave Python.

05:08 And so that is, I think, really where Numba's most effective.

05:12 Sure.

05:13 If you look at a lot of programs, there might be 5,000 lines of code or more, but even just

05:19 something only as big as 5,000 lines.

05:22 There's a lot of code, but only a little bit of it really actually matters, right?

05:26 Yeah, that's what we find a lot is when you sit down and measure your code, you'll spot

05:30 some hotspots where 60 or 70 or 80% of your time is spent in just like three functions or

05:36 something.

05:36 And that's great.

05:38 And if that's the case, that's great because you can just zero in on that section for speeding

05:42 things up and not ruin the readability of the rest of your program.

05:45 Sometimes optimization can make it harder to read the result.

05:49 And so there's always a balance of you have to keep maintaining this project.

05:52 You don't want to make it unreadable just to get 5% more speed.

05:55 Yeah, absolutely.

05:56 Not just the readability, but the ability to evolve it over time, right?

06:02 So maybe it's like, oh, we're going to compile this section here using Numba or Cython or something

06:09 like that.

06:09 Well, maybe I was going to use this cool new PyPI package I found, but I can't just jam it

06:15 in there where it's compiled.

06:16 That's unlikely to work well, right?

06:19 And things like that.

06:20 And so, yeah, a lot of times there's these big sections that look complicated.

06:24 They look slow.

06:25 They're not actually.

06:26 Yeah.

06:26 And one thing I also often emphasize for people is that when you think about the time your

06:31 program takes, think about the time you spent working on it as well as the time you spent

06:35 running it.

06:35 And so because we've heard from a lot of projects who said they were able to get major speed

06:40 ups, not because necessarily because Numba compiled their code to be incredibly fast,

06:45 but it compiled it to be fast enough that they could try new ideas quicker.

06:49 And so they got to the real win, which was a better way to solve their problem because

06:54 they weren't kind of mired in just kind of boilerplate coding for so long.

06:58 Right, right, right.

06:59 It turns out I learned I should use a dictionary and not a list.

07:02 And now it's a hundred times faster.

07:04 And that wasn't actually a compiling thing.

07:07 That was a visibility thing or something, right?

07:09 Yeah.

07:09 Try more things is always helpful.

07:11 And so something that a tool that lets you do that is really valuable.

07:14 A hundred percent.

07:15 So what tools do you recommend for knowing?

07:18 Because our human intuition sometimes is good, but sometimes is really off the mark in terms

07:23 of thinking about what parts are slow, what parts are fast.

07:25 That's something I definitely want.

07:26 I've talked to people.

07:27 Everyone thinks they know where the hot part, the slow part, is, but sometimes they're surprised.

07:31 And so, definitely, before you do anything.

07:34 This is not just Numba advice.

07:35 This is any time, before you're going to speed up your program: measure something.

07:38 So what you want is you want a representative benchmark, something that's not going to run

07:43 too fast, because often, you know, unit tests run too quickly to really

07:48 exercise the program in a realistic way.

07:50 So you want a benchmark that doesn't run too long, but maybe like five minutes or something.

07:53 And then you're going to want to run that through a profiling tool.

07:57 And there are several options.

08:00 I just usually tell people to use cProfile.

08:00 It's built into the standard library in Python.

08:03 It's a great tool.

08:04 It does the job for most stuff.

08:05 And so sometimes there may be other tools, things like SnakeViz and other things,

08:09 to help you interpret the result of the profile.

08:10 But often you'll use cProfile to collect the data.

08:13 And what this does is it samples, it sort of records as the program is running, what are

08:19 all the functions that are being called and how much time are they taking?

08:23 And there are different strategies for how to do this, but fundamentally what you get out

08:27 is essentially a data set that says, you know, 2% of the time in

08:32 your program, this function was running, and 3% of the time this function was running.

08:35 And you can just sort that in descending order and look and see where, what pops out at the

08:40 top.
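
For reference, a minimal cProfile session looks something like this (a sketch; the main() benchmark entry point is assumed, not from the episode):

    import cProfile
    import pstats

    # Record where time is spent while the benchmark runs.
    cProfile.run("main()", "profile.out")

    # Sort by cumulative time and print the top 10 functions.
    stats = pstats.Stats("profile.out")
    stats.sort_stats("cumulative").print_stats(10)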

08:41 And sometimes you're surprised.

08:42 Sometimes you find out it's actually, it wasn't my numerical code.

08:45 It's that I spent, you know, 80% of my time doing some string operation that I didn't

08:49 realize it needed to do over and over again.

08:51 Right, right.

08:52 Exactly.

08:52 Some weird plus equals with a string was just creating a thousand strings to get to

08:58 the end point or something like that.

08:59 Yeah.

08:59 Yeah.

08:59 And it could have just done that once upfront.
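
As a tiny illustration of that pattern (a sketch; the words list is hypothetical):

    words = ["a"] * 10_000

    # += builds a brand-new string every iteration, roughly O(n^2) overall.
    out = ""
    for w in words:
        out += w + ","

    # join does the work once, in linear time.
    out = ",".join(words)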

09:01 It's good to do the profiling just to make sure there isn't an obvious problem before you

09:06 get into the more detailed optimization.

09:08 Yeah.

09:08 Before you start changing your code completely, its execution method or whatever.

09:13 Yep.

09:13 Yeah.

09:14 Yeah.

09:14 And, you know, shout out to the PyCharm folks.

09:16 They've got a push-the-button profiler and they've got a visualizer, and they just run

09:20 cProfile right in there.

09:21 So that's like cProfile on easy mode.

09:24 You know, you get a spreadsheet and you get a graph.

09:25 What about other ones like Fil, F-I-L, or anything else?

09:29 Like any other recommendations?

09:30 Yeah.

09:31 So that's an interesting point: cProfile is for compute time profiling.

09:35 A different problem you run into is what this tool does, which is memory profiling,

09:40 which is often a problem when you're scaling up.

09:43 And that's actually one of the other good things to keep in mind when you're optimizing is what

09:46 am I trying to do?

09:47 Am I trying to get done faster?

09:48 Am I trying to save on compute costs?

09:51 Am I trying to go bigger?

09:52 And so I have to speed things up so that I have room to put more data in.

09:55 If that's where you're going, you might be worried about running out of memory.

09:57 Right.

09:58 Can I just not run out?

09:59 Yeah.

09:59 Or am I already stuck?

10:01 And so there it is very easy in Python to not recognize when you have temporary arrays and

10:07 things.

10:07 Because again, it's also very compact and you're not seeing what's getting allocated.

10:11 You can accidentally blow up your memory quite a lot.

10:13 And so this kind of a profiler is a great option for it.

10:17 What it can often show you is, kind of line by line,

10:20 you know, how much memory was allocated in each line of your program.

10:25 So you can see, oh, that one line of pandas.

10:27 Oops.

10:27 That that did it.

10:29 Yeah, I can't remember all the details.

10:31 I talked to Itamar about this one, but I feel like it also keeps track of the memory

10:36 used even down into like NumPy and below, right?

10:41 Not just Python memory where it says now there's some opaque blob of data science stuff.

10:46 Yeah.
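
If you want to try Fil, it's driven from the command line; roughly:

    # pip install filprofiler, then run your script under it.
    # It reports peak memory usage, attributed line by line.
    # fil-profile run your_script.py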

10:46 And actually, even on the compute side, there's sort of two approaches.

10:49 So cProfile is focused on counting function time.

10:52 But sometimes you have a long function.

10:54 And if you're making a bunch of NumPy calls, you might actually care line by line how much time

10:59 is being taken.

11:00 And that can be a better way to think about it.

11:01 And so I think the tool is called LineProf.

11:03 Yeah.

11:03 I forget the exact URL, but it's an excellent tool in Python; there's one in R and there's

11:10 an equivalent one.

11:11 Yes.

11:12 Robert Kern's LineProfiler.

11:14 There you go.

11:15 LineProfiler.

11:16 Oh, it's archived, but still can be used.

11:18 Yeah.

11:19 I have to find another tool now.

11:20 This is my go-to for so long.

11:22 I didn't realize it had already been archived.

11:24 Oh, there's a copy.

11:25 Hey, but it still works.

11:26 It's all good.

11:26 It's all good.

11:27 It's been transferred to a new location.

11:29 So that's where it lives now.

11:29 But yeah, line profiling is another.

11:32 I often use them as complementary sorts of tools.

11:34 I zero in on one function with cProfile, and then I'll go line profile that

11:39 function.

11:39 Oh, interesting.

11:40 Yeah.

11:40 Okay.

11:40 Drilling further.

11:41 Yeah.

11:41 Okay.

11:42 Okay.

11:42 This is the general area.

11:43 Now let's really focus on it.
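
The usual line_profiler workflow is something like this (a sketch; hot_function and script.py are hypothetical):

    # script.py: kernprof injects the `profile` decorator at runtime.
    @profile
    def hot_function(data):
        total = 0.0
        for x in data:
            total += x * x
        return total

    # Then, from the shell:
    #   kernprof -l -v script.py
    # which prints the time spent on each line of hot_function.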

11:45 Memray is another one.

11:46 I talked to the folks from Bloomberg about that.

11:48 Oh, okay.

11:49 I have not used this one.

11:50 Yeah.

11:50 This is a pretty new one.

11:51 And it's quite neat the way it works.

11:54 Yeah.

11:54 This one actually tracks C and C++ and other aspects of allocations as well.
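
Memray is also command-line driven; roughly:

    # pip install memray, then:
    #   memray run my_script.py            (writes a capture file)
    #   memray flamegraph <capture file>   (renders an HTML flame graph)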

12:01 So one of the problems you can run into with profiling is especially memory profiling, I think.

12:06 Although if you just want to know about memory, but the more you monitor it, the more it becomes

12:10 kind of a Heisenberg quantum mechanics type thing.

12:14 Once you observe it, you change it.

12:16 And so the answers you get by observing it are not actually what are happening.

12:19 So you got to keep a little bit of an open mind towards that as well, right?

12:23 Yeah.

12:23 And that's even a risk with, you know, other the compute side of the profiling is some you're

12:28 using some compute time to actually observe the program, which means that it can.

12:32 And these tools try to subtract out that bias, but it does impact things.

12:37 And so you may want to have kind of a benchmark that you can run as your kind of real source of

12:43 truth that you run without the profiler turned on just to see a final runtime run with the

12:47 profiler to break it down.

12:48 And then when you're all done, you're going to want to run that program again with the profiler

12:51 off to see if you've actually improved it while clock time wise.

12:55 Yeah.

12:56 Yeah, absolutely.

12:56 That's a really good point.

12:57 It's just maybe do a %timeit type of thing or something along those lines.
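
Something like this (assuming a main() entry point, which is hypothetical):

    # In IPython or Jupyter:
    %timeit main()

    # Or from the shell, with the stdlib timeit module:
    #   python -m timeit -s "from myapp import main" "main()"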

13:01 Okay.

13:02 That was a little bit of a side deep dive into profiling because you, before you apply

13:07 some of these techniques like Numba and others, you certainly want to know where to

13:12 apply it.

13:13 And part of that is you might need to rewrite your code a little bit to make it more optimizable

13:19 by Numba or these things.

13:21 So first of all, like what do you do to use Numba, right?

13:24 It's just, you just put a decorator on there and have you go.

13:27 At the very simplest level, Numba's interface is supposed to be just one decorator.

13:30 Now there's some nuance, obviously, and other things you can do, but we tried to get it down

13:34 to for most people, it's just that.

13:37 And the N in NGIT means no Python, meaning we get, this code is not calling the Python

13:42 interpreter anymore at all.

13:43 It is purely machine code, no interpreter access.

13:46 Interesting.

13:47 Okay.

13:47 So some of these like compile, do a thing and compile your Python code to machine instructions.

13:53 I feel like they still interact with like Py object pointers and they still kind of work

13:59 with the API of the Python data types.

14:02 Yeah.

14:03 Which is nice, but it's a whole lot slower of an optimization

14:08 than when it's an int32 and, you know, a float32, and these are in registers, you know?

14:13 Yeah.

14:14 And this is part of the reason why Numba focuses on numerical code is that NumPy arrays and actually

14:19 other arrays and PyTorch and other things that support the buffer protocol.

14:22 The API.

14:23 So, so really when Numba compiles this, it compiles sort of two functions.

14:26 One is a wrapper that handles the transition from the interpreter into no Python land, as we call

14:32 it, and then there's the core function that is kind of like you could have written in C

14:35 or Fortran or something.

14:36 And that wrapper is actually doing all the Py object stuff.

14:39 It's reaching in and saying, ah, this, this integer, I'm going to pull out the actual number

14:43 and oh, this NumPy array, I'm going to reach in and grab the data pointer and pass those down

14:47 into the core where the actual, all the math happens.

14:50 So the only time you interact with the interpreter is really at the edge.

14:54 And then once you get in there, you try not to touch it ever again.

14:57 Now, Numba does have a feature that we added some years ago called an object mode block.

15:01 Which lets you in the middle of your no Python code, go back and actually start talking to

15:05 the interpreter again.

15:06 Right.

15:07 Maybe use a standard library feature or something.

15:09 Yeah.

15:09 The most common use we've seen is you like you want a progress bar to update or something

15:13 that's not in your, you know, hot loop.

15:15 You don't want, you don't want to be going back to the interpreter in something that's

15:18 really performance critical.

15:19 But inside of a function, you might have parts that are more or less, you know, one out

15:23 of a million iterations.

15:23 I want to go update the progress bar or something that's totally valid.

15:26 And you can do that with Numba.

15:28 There's a way to get back to the interpreter if you really need to.
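
A sketch of what that looks like (the progress-printing function is hypothetical):

    from numba import njit, objmode

    @njit
    def crunch(arr):
        s = 0.0
        for i in range(arr.size):
            s += arr[i]
            if i % 1_000_000 == 0:
                # Briefly hop back into the interpreter,
                # e.g. for a progress update.
                with objmode():
                    print("progress:", i)
        return s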

15:31 Okay.

15:31 Yeah.

15:31 And it says it takes and translates Python functions to optimize machine code at runtime,

15:37 which is cool.

15:38 So that makes deploying it super easy.

15:39 And you don't have to have like compiled wheels for it and stuff.

15:42 Using industry standard LLVM compilers.

15:45 And then similar speeds to Cs in Fortran.

15:49 Yeah.

15:50 Which is awesome, but also has implications, if I can speak to that.

15:55 For example, when I came to Python, I was blown away that I could just have integers as big

16:02 as I want.

16:03 If I keep adding to them, they just get bigger and bigger.

16:05 Like billions, bazillions of, you know, bits of accuracy.

16:10 And I came from C++ and C Sharp and where you explicitly said it's an N32, it's an N64,

16:17 it's a double.

16:18 And these all had ranges of valid numbers.

16:21 And then you've got weird like wraparounds.

16:23 Maybe you create an unsigned one so you can get a little bit bigger.

16:26 I suspect that you may fall victim or be subjected to these types of limitations without realizing

16:34 them in Python if you at NGIT it because you're back in that land, right?

16:38 Or does it do you guys magic to allow us to have big?

16:42 We do not handle the big integer, which is what you're describing as sort of that integer

16:46 that can grow without bound.

16:47 Because our target audience is very familiar with NumPy.

16:50 NumPy looks at numbers sort of the way you're described from C++ and other languages.

16:55 The D type and all that stuff, right?

16:57 Yeah.

16:57 NumPy arrays always have a fixed size integer and you get to pick what that is, but it has

17:02 to be 8, 16, 32, 64.

17:04 Some machines can handle bigger, but that it is fixed.

17:07 And so once you've locked that in, you can't over, if you go too big, you'll just wrap around

17:12 and overflow.

17:12 Yeah.

17:13 So that limitation is definitely present again in Numba, but fortunately NumPy users are already

17:17 familiar with thinking that way.

17:19 So it isn't an additional constraint on them too much.
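
A quick sketch of that difference (a hypothetical example; exact wraparound values depend on the dtype):

    import numpy as np
    from numba import njit

    @njit
    def double_it(x):
        return x * 2

    print((2**62) * 2)                 # pure Python: arbitrary precision, no overflow
    print(double_it(np.int64(2**62)))  # fixed-width int64: wraps around to a negative number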

17:24 This portion of Talk Python to Me is brought to you by Posit, the makers of Shiny, formerly

17:28 RStudio, and especially Shiny for Python.

17:31 Let me ask you a question.

17:33 Are you building awesome things?

17:35 Of course you are.

17:36 You're a developer or a data scientist.

17:37 That's what we do.

17:38 And you should check out Posit Connect.

17:41 Posit Connect is a way for you to publish, share, and deploy all the data products that you're

17:46 building using Python.

17:47 People ask me the same question all the time.

17:51 Michael, I have some cool data science project or notebook that I built.

17:54 How do I share it with my users, stakeholders, teammates?

17:57 Do I need to learn FastAPI or Flask or maybe Vue or React.js?

18:02 Hold on now.

18:03 Those are cool technologies, and I'm sure you'd benefit from them, but maybe stay focused

18:07 on the data project.

18:08 Let Posit Connect handle that side of things.

18:10 With Posit Connect, you can rapidly and securely deploy the things you build in Python.

18:15 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

18:21 Posit Connect supports all of them.

18:24 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise

18:29 requirements.

18:30 Make deployment the easiest step in your workflow with Posit Connect.

18:34 For a limited time, you can try Posit Connect for free for three months by going to talkpython.fm

18:39 slash posit.

18:40 That's talkpython.fm/P-O-S-I-T.

18:44 The link is in your podcast player show notes.

18:46 Thank you to the team at Posit for supporting Talk Python.

18:50 And then one thing you said was that you should focus on using arrays.

18:55 Yes.

18:56 And that kind of data structures before you apply Numba JIT compilation to it.

19:02 Does that mean list as in bracket bracket or these NumPy type vector things?

19:07 We all have different definitions.

19:09 Yes, that's true.

19:10 What is an array?

19:11 Generally, yeah, usually the go-to I talk to about is a NumPy array.

19:15 So it has a shape.

19:17 The nice thing, you know, NumPy arrays can be multidimensional.

19:19 So you can represent a lot of complex data that way.

19:22 But within an array, there's a fixed element size.

19:25 That element could be a record.

19:27 So if you want for every cell in your array to store maybe a set of numbers or a pair of

19:31 numbers, you can do that with custom D types and things.

19:34 And Numba will understand that.

19:35 That's the ideal data structure.

19:37 Numba does have, we added support a couple years ago for other data structures because

19:42 the downside to a NumPy array is that it's fixed size.

19:45 Once you make it, you can't append to it like you can a Python list.

19:49 So Numba does have support for both what we call typed lists and typed dictionaries.

19:54 So these are sort of special cases of lists and dictionaries in Python where the, in the

20:00 case of a list, every element in the list has the same type.

20:02 Or in the case of a dictionary, the keys are all the same type and the values are all the

20:06 same type.

20:06 Right.

20:07 And those cover a lot of the cases where, you know, when users want to make things where

20:12 they don't know how long it's going to be, you're going to append in the algorithm.

20:15 A list is a much more natural thing than a NumPy array, where you might, like, over-allocate

20:18 or something.

20:19 And dictionaries, our dictionary implementation is basically taken straight from CPython's dictionary

20:25 implementation.

20:26 So it's very tuned and very fast in the same way CPython's is.

20:29 We just had to modify it a little bit to add this type information, but it's really good

20:33 for kind of look up random items kind of stuff.

20:36 So those are available as additional data structures in addition to the array.

20:40 To use those, you would say from numba import... something like this.

20:45 They're new types; in the docs, I'll show you, you can sort of import a typed list

20:49 as a special class that you can create.
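
A sketch of those typed containers (names as in the Numba docs):

    from numba import njit, types
    from numba.typed import List, Dict

    # Typed list: every element has the same type.
    tl = List()
    for x in (1.0, 2.0, 3.0):
        tl.append(x)

    # Typed dict: fixed key type and fixed value type.
    td = Dict.empty(key_type=types.unicode_type, value_type=types.float64)
    td["a"] = 1.0

    @njit
    def total(lst):
        s = 0.0
        for v in lst:
            s += v
        return s

    print(total(tl))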

20:51 The downside, by the way, is that, and the reason we have those and we don't just take,

20:56 historically Numba used to try and let you pass in a Python list, is that that wrapper function

21:01 would have to go recursively through the list of list of lists of whatever you might have

21:05 and pop out all of the elements into some format that wasn't all Py objects so that the no Python

21:11 code could manipulate them quickly.

21:13 And then how do you put it all back if you modify that sort of shadow data structure?

21:17 And so what we realized is that was confusing people and actually added a lot of overhead

21:21 and calling functions took too long.

21:23 So we instead went up a level and said, okay, we're going to make a new kind of list that

21:27 you at the interpreter level can opt into for your algorithm.

21:29 And so accessing that list from Python is slower than a Python list, but accessing it from

21:35 Numba is like a hundred times faster.

21:37 So you kind of have to decide: while I'm in this mode, I'm optimizing for Numba's

21:43 performance, not for the Python interpreter performance.

21:46 Which is reasonable often, I'd imagine, because this is the part you found to be slow.

21:50 Yeah, that's the trade-off you make.

21:52 And so, yeah.

21:52 So we would not suggest people use type lists just in random places in their program.

21:57 It's really intended to be used.

21:58 Yeah.

21:59 I heard this is fast.

22:00 So we're just going to replace them all. Like, new rule.

22:03 Bracket, bracket is disallowed.

22:05 We're now using this one, right?

22:06 Yeah.

22:06 When you're working with Python objects, Python's data structures can't be beat.

22:09 They are so well-tuned that it's very, very hard to imagine something that could be faster

22:15 than them.

22:16 All right.

22:16 So maybe one more thing on Numba we could talk about is, so far, I imagine people have

22:22 been in their mind thinking of, I have at least, running on CPUs, be that on Apple Silicon or

22:28 Intel chips or AMD or whatever.

22:31 But there's also support for graphics cards, right?

22:34 Yes.

22:34 Yeah.

22:35 So for a very long time, I mean, we've had this, again, for 10-plus years.

22:39 We were very early adopters of CUDA, which is the programming interface for NVIDIA GPUs.

22:44 CUDA is supported by every NVIDIA GPU, whether it's a low-end gamer card or a super high-end

22:49 data center card.

22:50 They all support CUDA.

22:51 So that was really nice for people who were trying to get into GPU programming.

22:54 You could use inexpensive hardware to learn.

22:57 And so on both Windows and Linux, Macs haven't had NVIDIA GPUs for a long, long time now,

23:02 but on Windows and Linux, you can basically write what they call a CUDA kernel in pure Python.

23:08 And you can pass in, you know, arrays, either NumPy arrays, which

23:13 then have to be sent to the card or special GPU arrays that are already on the card.
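
A sketch of a pure-Python CUDA kernel (assumes an NVIDIA GPU and the CUDA toolkit are available):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_one(arr):
        i = cuda.grid(1)           # this thread's global index
        if i < arr.size:
            arr[i] += 1.0

    a = np.zeros(1_000_000, dtype=np.float64)
    d_a = cuda.to_device(a)        # copy the array onto the card

    threads = 256
    blocks = (a.size + threads - 1) // threads
    add_one[blocks, threads](d_a)  # launch the kernel
    result = d_a.copy_to_host()    # copy the result back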

23:17 That is a great way for people to learn a bit more about GPU programming.

23:21 I will say Numba might not be the best place to start with GPU programming in Python because

23:26 there's a great project called CuPy, C-U-P-Y, that is literally a copy of NumPy, but does

23:34 all of the computation on the GPU.

23:35 And CuPy works great with Numba.

23:37 So I often tell people, if you're curious, start with CuPy, use some of those NumPy functions

23:42 to get a sense of, you know, when is an array big enough to matter on the GPU, that sort of

23:47 thing.

23:47 And then when you start wanting to do more custom algorithms, Numba is where you kind of turn

23:52 to for that second level.
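
A sketch of how literal that NumPy mirroring is (assuming a CUDA-capable machine):

    import cupy as cp

    x = cp.random.random((10_000, 10_000))  # allocated on the GPU
    y = (x * 2.0).sum(axis=0)               # same NumPy-style API, computed on the GPU
    print(cp.asnumpy(y)[:5])                # explicit copy back to the host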

23:54 Yeah.

23:54 So I feel like I'm referencing a lot of Itamar's work over here.

23:59 But what if we didn't have an NVIDIA GPU?

24:04 Is there anything we could do?

24:05 Yeah.

24:05 So there are other projects.

24:06 So things like, as I mentioned here, like PyTorch and things are, have been ported to a

24:11 number of different backends.

24:12 Well, this is one thing the Numba team, we are frequently talking about is how do we add

24:16 non-GPU or non-NVIDIA GPU support?

24:19 But it's, I don't have an ETA on that.

24:21 That's something that we just still are kind of thinking about.

24:24 But PyTorch, definitely.

24:25 And you can use PyTorch as an array library.

24:28 You don't have to be doing machine learning necessarily.

24:29 You can use it just for fast arrays.

24:32 It's just most popular for machine learning, and JAX is a very similar thing, because

24:37 it adds the extra features you want for those machine learning models.

24:39 But at the core of every machine learning model, it's just array math.

24:42 And so you could choose to just do that if that's what you want.

24:45 And then you could even still pass those arrays off to Numba at some point in the future.

24:48 I'm interested.

24:49 Yeah, I didn't realize there was integration.

24:50 Yeah, I didn't realize there was integration with that as well.

24:53 Yeah.

24:54 A while back, we kind of worked with a number of projects to define a GPU array interface

24:59 that's used by a number of them so that we can see each other's arrays without having

25:02 to copy the data, which is very helpful.

25:03 Yeah.

25:04 Yeah.

25:04 We have a lot more topics than Numba, but I'm still fascinated with it.

25:07 Oh, yeah.

25:08 So, you know, one of the big, all the rages, all the rage now is vector databases, obviously,

25:13 because I want to query my way through LLM outputs.

25:17 Right?

25:18 Like, where in 100,000 dimensional space does this question live?

25:22 Yep.

25:22 Or whatever.

25:23 Is there any integration with that kind of stuff back into Numba?

25:27 Numba, not directly.

25:28 Although Numba does have interfaces, an easy way to call out to C functions.

25:33 So a lot of these vector databases are implemented in, you know, C or C++ or something.

25:38 And so if you did have a use case where you needed to call out to one of them, if there

25:41 was a C function call to make directly to sort of the underlying library that bypassed the

25:46 interpreter, you can do that from Numba.

25:47 And so I haven't seen anyone do that yet, but it's a generic sort of C interface.
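
As a sketch of that generic C interface, Numba can call a ctypes-wrapped function from nopython mode (here libm's cos; the library lookup varies by platform):

    import ctypes
    import ctypes.util
    from numba import njit

    # Wrap a C function with ctypes, declaring its signature.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    c_cos = libm.cos
    c_cos.restype = ctypes.c_double
    c_cos.argtypes = [ctypes.c_double]

    @njit
    def use_c_cos(x):
        # Direct call into the C function, bypassing the interpreter.
        return c_cos(x)

    print(use_c_cos(0.0))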

25:52 Yeah.

25:53 Maybe there's a database driver written in C, in which case, I don't know all the different

25:57 databases.

25:58 I know there are some that are specifically built for it.

26:00 Maybe DuckDB has got something going on here, but also MongoDB has added vector stuff to

26:06 it.

26:06 And I know they have a C library as well.

26:07 Yeah.

26:08 I've looked at LanceDB; it's one I've seen mentioned, used by a couple of projects.

26:12 That's just for vector stuff.

26:14 It doesn't do anything else.

26:15 LanceDB.

26:16 LanceDB.

26:17 Okay.

26:17 I heard about it in the context of another Python LLM project.

26:21 Cool.

26:22 Well, that's news to me, but it is a developer-friendly open source database for AI.

26:26 Okay.

26:27 Brilliant.

26:27 All right.

26:28 Well, like I said, we have more things to talk about.

26:31 So many things.

26:32 So many things.

26:34 But this is super.

26:34 Okay.

26:35 One more thing I want to ask you about here before we go on.

26:37 This has a Cython feel to it.

26:40 Can you compare and contrast Numba to Cython?

26:43 So Cython sort of requires you to put in some type information in order to be able

26:48 to generate C code that is more efficient.

26:50 Numba is mainly focused on type inference.

26:53 So we try to figure out all the types in your function based on the types of the inputs.

26:59 And so in general, we, although Numba has places where you can put type annotations, we

27:03 generally discourage people from doing it because we find that it adds work and is error prone

27:09 and doesn't really help the performance in the end.

27:11 Numba will figure out all the types directly.

27:13 And so when it compiles it, if it comes in, if you call it twice with different types,

27:17 does it just say, well, now we're going to need this version of the function, underscore

27:22 strings-list, rather than integers-list or something?

27:24 Yeah.

27:25 Yeah.

27:25 Every Numba compiled function actually contains a dispatcher that will look at the argument

27:29 types and pick the right one.

27:30 And it's at pretty high granularity.

27:32 You know, for example, people who are familiar with multidimensional arrays and like Fortran and

27:36 C know that they lay out the rows and columns in a different order, which has impact

27:41 on how you do loops and stuff for, for kind of to maximize locality.

27:45 Numba can tell the difference between those two cases and will generate different code for

27:49 those two cases.
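
You can actually see that dispatch happening (a sketch; .signatures lists the compiled specializations):

    import numpy as np
    from numba import njit

    @njit
    def first_elt(a):
        return a[0, 0]

    first_elt(np.zeros((100, 100), order="C"))  # row-major layout
    first_elt(np.zeros((100, 100), order="F"))  # column-major layout

    # Two entries: one compiled version per array layout.
    print(first_elt.signatures)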

27:50 Wow.

27:50 So this is stuff that you as the user don't want to even know.

27:53 No, you don't want to worry about that.

27:55 That's a whole nother level.

27:56 So you were like, okay, well, if it's laid out in this order, it's probably this, it appears

28:02 in the L1, you know, the local cache for the CPU, in this way.

28:06 And so if we loop in that direction, we'll like iterate through the cache instead of blow

28:09 through it every loop or something like that.

28:11 Basically, we want to make sure that LLVM knows when the step size is one.

28:15 And that's either on the row or the column access, depending on that.

28:18 And because, I mean, compilers in general are magic.

28:21 Like we are grateful that LLVM exists because they can do so many tricks at that level.

28:26 I mean, because, I mean, this is the same thing that powers clang and other stuff.

28:29 So, you know, all of macOS's compilers are built on LLVM.

28:33 And so we can leverage all of the tricks they've figured out in decades of development.

28:37 Yeah, that's cool.

28:38 And Python itself is compiled with that, at least on macOS.

28:42 I just saw it last night, you know, that I have clang some version, whatever.

28:46 When I was just looking at the version of my Python, it was compiled with that.

28:49 Cool.

28:50 Okay.

28:50 So we've been on the Numba JIT.

28:53 Anthony Shaw wrote an article, Python 3.13 gets a JIT.

28:57 And this is a pretty comprehensive and interesting article on what's going on here.

29:03 What's your experience with this JIT coming to Python 3.13?

29:07 This is, and they've definitely tried to set expectations here, that this first release is really planting a flag.

29:12 It's to say, we're going to start building on top of this base.

29:17 And so as far as I've seen, the benchmarks for 3.13 are not going to be like the world on fire kind of stuff.

29:23 So, no throwing away our C code and rewriting operating systems in JITted Python.

29:27 But you have to take a first step.

29:29 And this is honestly pretty, it impressed me because as a library, we can take a lot of risks and do things that I know we can depend on LLVM.

29:37 We can do all sorts of stuff that may be not work for everyone, because if Numba doesn't solve your problem, you just don't use it.

29:43 You can just leave it out of your environment.

29:44 You don't have to install it.

29:46 And so it's easy for us to kind of zero in on just the problems that we're good at and say, if this doesn't solve your problem, just leave us out.

29:53 When you actually are putting a JIT into the core interpreter, everyone gets it.

29:57 And you have to consider, and Python is so broad that you could grab two Python experts, and they may actually have nothing in common with each other.

30:08 But both have equal claim to being experts at using Python; they might just use it in different domains and have different libraries they care about and all of that.

30:14 I feel that way when I talk to pandas people.

30:17 And I think about me doing like web development APIs and stuff.

30:21 I'm like, I think I'm really good at Python, generally, you know, good enough to write good apps.

30:26 But then I look at this, and I'm like, I don't even really get this.

30:29 Some of these, like, expressions that go into the filter brackets.

30:33 I'm like, I didn't even know that was possible, or really how to apply it, you know? It's just, it's weird to both feel really competent and understand it, but also kind of have no idea.

30:43 Yeah, and I think what you're getting at is those are two really different use cases, and they're getting the same JIT, and it has to work for both of them.

30:50 But, you know, combinatorially explode that problem, right?

30:53 Yeah, and, you know, all the different hardware.

30:55 I mean, Numba supports a lot of different computers, but not every one that Python supports.

30:59 Like MicroPython?

31:00 Yeah, or we don't work on, you know, HP-UX or anything like that, necessarily.

31:05 Python has an enormous support, range of supported platforms, an enormous set of use cases.

31:09 And anything you do is going to affect everyone.

31:13 So this approach, which I would say this copy-and-patch JIT approach is really clever, because, you know, again, Numba has to bring, we build a, you know, custom version of LLVM.

31:21 It's a little stripped down, but it's mostly still there.

31:23 So we have to bring that along for the ride.

31:25 That's a heavy, heavy dependency to put on the core interpreter for everyone.

31:29 So the clever bit here is they figured out how to have a JIT, but still do all the compiler stuff at build time.

31:35 So when you build this copy and patch JIT, you actually need LVM, but only at build time.

31:40 And then it can go away.

31:42 And so the person who receives the interpreter doesn't need LVM anymore.

31:44 And so they basically built for themselves a bunch of little template fragments.

31:49 This is the patching part is basically you're saying, I've got a bunch of fragments that implement different opcodes in the bytecode different ways.

31:56 And I'm going to string them together.

31:59 And then there's a bunch of fill-in-the-blank spots that I can go in and swap in: okay, you get your value from here and then you put it over here and all that.

32:07 But the actual machine code was generated by LLVM by the person who built Python in the first place.

32:12 I see.

32:13 It just amazes me this works.

32:15 And I'm excited to see where they go with it because it was a clever way to avoid adding a huge heavy dependency to Python.

32:21 Let's start to get some of that JIT benefit.

32:24 So it looks at some of the common patterns.

32:27 I see.

32:28 Okay, we're looping over a list of floats or something, and it replaces that with more native code or something along those lines.

32:35 Yeah.

32:36 Yeah.

32:36 You essentially have a compiler from a bunch of little recipes that are if I do this pattern, sub in this machine code, fill in these blanks, and you just have a table of them.

32:46 So the challenge there is that there is a combinatorial explosion again of how many different patterns there are. You know, a full-blown compiler like LLVM has a bunch of rules.

32:55 It's rule based.

32:56 And so it's saying, if I see this pattern, I do this replacement and it keeps doing all of this.

33:00 And then at the end, it says, okay, now I'm going to generate my machine code from that, those transformations.

33:04 If I don't have LLVM at runtime, I have to kind of figure out what are the best templates up front and put them in this table.

33:11 And then, you know, and so this is where honestly looking at different usage patterns will probably be a huge help is in practice, you can have any sequence of byte codes, but in reality, you're probably going to have certain ones a lot.

33:22 And those are the ones you want to focus on.

33:24 So I think once we, I don't know, you know, once we start getting this out in the community and start getting feedback on it, I'm curious to see how rapidly it can evolve.

33:31 That'll be really interesting.

33:32 Yeah.

33:32 And this whole copy-and-patch JIT is, we often hear people say, I have a computer science degree.

33:38 And I think what that really means is I have a software engineering degree in, or I am a software engineering person.

33:44 They are not often creating new science, computer science of theories.

33:50 They're more like, I really understand how operating system works and programmers and compilers.

33:54 And I, I write JSON APIs, or talk to databases.

33:58 This is like true new research out of legitimate computer science, right?

34:02 This copy-and-patch JIT.

34:04 Yeah.

34:05 They mentioned, I mean, they cite a paper from 2021 and in computer science, going from paper to implementation in one of the most popular languages on earth in three years seems pretty fast.

34:14 It does seem pretty fast, right?

34:16 It definitely seems pretty fast.

34:17 And the reason I bring this up is I imagine that dropping it into one of the most popular programming languages with a super diverse set of architectures and use cases will probably push that science forward.

34:29 Yeah.

34:29 This will be tested in one of the most intense environments you could imagine.

34:34 You know, I mean, whatever they did for their research paper or their dissertation or whatever, putting it into Python is another level of testing and experimentation.

34:44 Yep.

34:45 Yeah.

34:45 Wild.

34:46 Okay.

34:46 Hopefully this speeds things up.

34:47 This is going to be interesting because it just happens, right?

34:50 It just happens.

34:51 If you have Python 3.13, it's going to be looking at its patterns and possibly swapping out.

34:55 This is complementary to all of the techniques the faster CPython folks have been doing all along for many releases now.

35:02 Yeah.

35:03 They've been looking at other ways to speed up the interpreter without going all the way to a full-blown compiler, which this is kind of giving you as the final step.

35:11 So that's, again, another interesting place: how does this complement those?

35:15 You know, I don't know those, those details, but it's another tool in the toolbox to sort of go back to the beginning.

35:20 Speed is about having a bunch of tools, and you kind of pick up 5% here and 5% there, and you pile up enough 5%s and pretty soon you have something substantial.

35:29 Yeah, I absolutely.

35:31 That was the plan, right?

35:31 The faster CPython was to make it multiples of times faster by adding 20% improvements release over release over release and, you know, compounding percentages basically.

35:41 The one thing I don't know about this is we've had the specializing adaptive interpreter that was one of those faster CPython things that came along.

35:49 You know, is this just the next level of that or is this a replacement for that?

35:54 I don't know.

35:54 I'm sure people can.

35:55 Yeah, I don't know.

35:56 I don't know what the, what their roadmap is for that because I think part of this is, this is so new.

36:00 I think they got to see how it works in practice before they start figuring out.

36:04 I agree.

36:05 It feels like an alternative to that specialized and adaptive interpreter, but I don't know.

36:09 Maybe some of the stuff they've learned from one made it possible or even as just an extension of it.

36:14 Okay.

36:14 What do we want to talk about next here?

36:16 I think.

36:16 You want to talk about threads?

36:17 No, let's talk.

36:19 I want to talk about, I want to talk about Rust really quick before we talk about threads.

36:22 Because that'll be quick.

36:23 And then I want to talk about threads because threads will not be as quick and it's super interesting.

36:27 It's been a problem that people have been chipping at for years and years and years, the threads thing.

36:34 But what do you think about all this Rust mania?

36:38 I mean, it's shown some real positive results, things like Ruff and Pydantic and others,

36:43 but it's actually a little bizarrely controversial or maybe not bizarre, non-obviously controversial.

36:48 Yeah.

36:48 I mean, my take on the Rust stuff is I view it in the same light as when we use C and Fortran

36:53 historically, it's just Rust is a nicer language in many ways.

36:55 And so being a nicer language means it's certainly, you know, you could have taken any of these things

37:01 and rewritten them in C a long time ago and they would have been faster.

37:04 You just didn't want to write that C code.

37:07 Exactly.

37:07 You know what?

37:08 We could do this in assembler and you would fly, guys.

37:11 Yeah.

37:12 So Rust is moving things; Rust is lowering the bar to saying, okay, maybe I'll implement the core of my algorithm outside of Python entirely.

37:21 It's interesting.

37:22 And honestly, I would happily see Rust completely replace C as the dominant extension language in Python.

37:28 The trade-off here, and this is one of those things that's sometimes easy to forget, again, because the Python community is so diverse,

37:34 is when you do switch something to Rust, you do reduce the audience who can contribute effectively in some cases.

37:41 That using Python to implement things has a benefit for the maintainers if it lets them get more contributions, more easily onboard new people.

37:51 I hear this a lot actually from academic software where you have this constant rotating students and postdocs and things.

37:59 And so how quickly you can onboard someone who isn't a professional software developer into a project to contribute to it is relevant.

38:06 And so, but I think it's different for every project.

38:09 There are some things like, you know, again, Rust in cryptography makes total sense to me because that's also a very security conscious thing.

38:16 You really don't want to be dealing with C buffer overflows in that kind of code.

38:20 And so the guarantees Rust offers are valuable also.

38:23 Well, and I think that that's also makes sense even outside of just security directly.

38:29 You're going to build a web server.

38:30 It's a nerve wracking thing to run other people's code on an open port on the Internet.

38:35 And so this is better.

38:38 One of the things I switched to, I recently switched to Granian for a lot of my websites, which is a Rust HTTP server.

38:46 It's comparable in performance, slightly faster than other things, but its deviation from its average is way, way better.

38:55 So it's just more consistent.

39:03 More consistent, but also, you know, the average. For example, where's the one versus the third-party server?

39:03 That's the one I want.

39:03 So against uWSGI, for example, right, it's six milliseconds versus 17 milliseconds.

39:10 Like, whatever.

39:11 But then you look at the max latency of 60 versus three seconds.

39:14 It's like, oh, wow.

39:16 Hold on.

39:16 Right.

39:16 But the fact it's written in Rust, I think, gives it a little bit of extra safety, all other things being equal.

39:22 Right.

39:23 And that, I mean, obviously a lot of caveats there.

39:25 Yeah.

39:26 The actually the interesting point about, and this is not unique to Rust.

39:29 This is, again, the same problem with C and other things is that it's a little bit interesting.

39:32 On one hand, we're pushing the Python interpreter and JITs and all this other stuff.

39:35 At the same time as you're thinking about whether to pull code entirely out of Python.

39:39 And it creates a barrier where the JIT can't see what's in the Rust code.

39:43 And so if there was an optimization that could have crossed that boundary, it's no longer available to the compilers.

39:49 Yeah.

39:50 This is a problem the Numba team has been thinking about a lot, because our number one request, aside from, you know, other GPUs, is: can Numba be an ahead-of-time compiler instead of a just-in-time compiler?

40:01 And we were like, superficially, yes, that's straightforward.

40:04 But then we started thinking about the user experience and the developer experience.

40:07 And there are some things that you lose when you go ahead of time that you have with the JIT.

40:12 And how do you bridge that gap?

40:13 Yeah, it gets tricky.

40:14 We've been trying to figure out some tooling to try and bridge that.

40:17 So at SciPy, we did a talk on a project we just started called Pixie, which is a subproject of Numba. It doesn't have Rust support yet, but that's been one of the requests.

40:27 So if you go to github.com/numba/pixie, see if they've indexed it.

40:32 Oh, they're perfect.

40:33 Okay.

40:33 Search engines.

40:34 Search engines are pure magic.

40:36 They really are.

40:36 But yeah.

40:37 But Pixie, we gave a talk about it at SciPy.

40:41 It's very early stages.

40:42 But what we're trying to do is figure out how to, in the ahead of time compilation, whether that's C or Rust or even Numba eventually, capturing enough info that we can feed that back into a future JIT so that the JIT can still see what's going on in the compiled code as kind of a future-proofing.

40:57 Yeah, that's cool.

40:59 I know some compilers have profiling-based optimization type things.

41:04 Like you can compile it with some instrumentation, run it, and then take that output and feed it back into it.

41:10 And I don't know if I've ever practically done anything with that, but I'm like, oh, that's kind of a neat idea to like try it.

41:16 Let it see what it does and then feed it back.

41:18 Is this sort of like that?

41:19 Or what do you think?

41:19 This is different.

41:21 This is sort of, this is basically capturing in the library file.

41:24 So you compiled ahead of time to a library, capturing the LLVM bitcode so that you could pull it out and embed it into your JIT, which might have other LLVM bitcode.

41:32 So then you can optimize.

41:34 You can have a function you wrote in Python that calls a function in C, and you could actually optimize them together.

41:39 Even though they were compiled at different times, implemented in different languages, you could actually cross that boundary.

41:45 One's like an ahead-of-time compilation, just standard compilation.

41:48 And one is like a JIT thing, but it's like, oh, we're going to click it together in the right ways.

41:52 Yeah.

41:52 Wow.

41:52 Yeah.

41:52 Because JITs are nice in that they can see everything that's going on, but then they have to compile everything that's going on.

41:57 And that adds time and latency and things.

41:59 And so, can you have it both ways? That's really what we're trying to do.

42:03 It's nice when you can have your cake and eat it too, right?

42:05 Yes.

42:06 I'd have my cake before my vegetables and it'd be fine.

42:09 I said that this Rust thing was a little bit controversial.

42:12 I think there's some just, hey, you're stepping on my square of Python space with a new tool.

42:20 I don't think that has anything to do with Rust per se.

42:22 It's just somebody came along and made a tool that is now doing something maybe in better ways or I don't know.

42:28 I don't want to start up a whole debate about that.

42:30 But I think the other one is what you touched on is if we go and write a significant chunk of this stuff in this new language, regardless what language it is,

42:38 Rust is a relatively not-popular language compared to others, then people who contribute to that from the Python side are like, well, there's this big chunk of Rust code now that I don't understand anything about.

42:49 So I can't contribute to that part of it.

42:50 And you might even say, well, what about the pro developers or the experienced core developers and stuff?

42:57 They're experienced and great at C and Python, which is also not Rust, right?

43:01 Like it's this new area that is more opaque to most of the community, which I think that's part of the challenge.

43:08 Yeah.

43:09 Some people like learning new programming languages and some don't.

43:12 So on one hand, you know, Rust can be a new intellectual challenge, and it practically fixes some problems you have with C.

43:18 In other cases, it's: I wanted to worry about what this project does, and not about another programming language.

43:22 Right, right, right.

43:23 Kind of have to look at your communities and decide what's the right tradeoff for you.

43:26 Maybe in 10 years, CPython will be RustPython and it'll be written in Rust.

43:31 You know, I mean, if we move to WebAssembly and, like, PyScript, Pyodide land a lot, like having that right in there, there's a non-zero probability, but it's not a high number, I suppose.

43:42 Speaking of something I also thought was going to have a very near zero probability, PEP 703 is accepted.

43:49 Oh, my goodness.

43:49 What is this?

43:50 Yeah.

43:50 So this was, you know, again, a couple of years ago now, or I guess a year ago, it was finally accepted.

43:55 So for a very long time, the Python interpreter has, you know... again, threads are an operating system feature that lets you do something in a program concurrently.

44:04 And now that all of our computers have, you know, four, eight, even more cores, depending on, you know, what kind of machine you have, even your cell phone has more than one core.

44:14 Using those cores requires that you have some kind of parallel computing in your program.

44:20 And so the problem is that you don't want, you know, once you start doing things in parallel, you have the potential for race conditions.

44:26 Two threads might do the same thing at the same time or touch the same data, get it inconsistent, and then your whole program starts to crash and other bad things happen.

44:35 So historically, the global interpreter lock has been sort of sledgehammer protection of the CPython interpreter.

44:41 But the net result was that threads that were running pure Python code basically got no performance benefits.

44:48 You might get other benefits, like you could have one block on IO while the other one does stuff.

44:52 And so it was easier to manage that kind of concurrency.

44:54 But if you were trying to do compute on two cores at the same time in pure Python code, it was just not going to happen, because every operation that touches a Python object has to lock the interpreter while you make that modification.

45:05 Yeah, you could write all the multi-threaded code with locks and stuff you want, but it's really just going to run one at a time anyway.
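To make that concrete, here's a minimal sketch of the classic demonstration (the function and loop sizes are just illustrative): a CPU-bound pure-Python function run on two threads takes about as long as running it twice in a row on a GIL build.

```python
import threading
import time

def countdown(n):
    # A pure-Python busy loop: every iteration touches Python objects,
    # so under the GIL only one thread can make progress at a time.
    while n:
        n -= 1

N = 20_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# On a GIL build this takes roughly the same time (or slightly longer,
# due to lock contention); on a free-threaded build it should be near half.
print(f"two threads: {time.perf_counter() - start:.2f}s")
```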

45:10 Yeah.

45:11 A little bit like preemptive multi-threading on a single core CPU, right?

45:15 It's weird.

45:16 Like I've added all this complexity, but I haven't got much out of it.

45:18 The secret, of course, is that if your Python program contained non-Python code, like C or Cython or Fortran, then as long as you weren't touching Python objects directly, you could release the GIL.

45:27 And so, especially in the scientific computing and data science space, multi-threaded Python code has been around for a long time.

45:34 And we've been using it and it's fine.

45:36 Dask, you can use workers with threads or processes or both.

45:39 And so I frequently will use Dask with four threads.

45:42 And that's totally fine, because most of the code is in NumPy and pandas, which release the GIL.
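As a rough sketch of that pattern (assuming Dask and NumPy are installed; the array shape and worker count are arbitrary), a threaded Dask computation parallelizes fine today precisely because the NumPy kernels underneath release the GIL:

```python
import dask.array as da

# A large random array split into chunks; nothing is computed yet.
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))

# The threaded scheduler runs chunk computations on 4 threads in one
# process. This works well because NumPy releases the GIL while it
# crunches numbers, so the threads genuinely overlap.
result = x.mean().compute(scheduler="threads", num_workers=4)
print(result)
```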

45:47 But that's only a few use cases.

45:48 And so if you want to expand that to the whole Python interpreter, you have to get rid of the GIL.

45:53 You have to have a more fine-grained approach to concurrency.

45:56 And so this proposal from Sam Gross at Meta was basically one of many historical attempts to get rid of that global interpreter lock.

46:07 Many have been proposed and failed historically.

46:09 So getting this all the way through the approval process is a real triumph.

46:14 At the point where it was really being hotly contested, my maybe slightly cynical take is we have between zero and one more chance to get this right in Python.

46:23 Either it's already too late or this is it.

46:27 I don't know which it is.

46:29 I think there were two main complaints against this change.

46:33 Complaint number one was, okay, you theoretically have opened up a parallel compute thing.

46:39 So, for example, on my Apple M2 Pro, I have 10 cores.

46:44 So I could leverage all of those cores, maybe get a five times improvement.

46:49 But single core regular programming is now 50% slower.

46:54 And that's what most people do, so that's not acceptable.

46:57 All right.

46:57 That's the one that sank, you know, the Gilectomy, and all that was kind of in that realm, I believe.

47:02 The other is yet to be determined, I think, is much like the Python 2 to 3 shift.

47:08 The problem with Python 2 to 3 wasn't that the code of Python changed.

47:12 It was that all the libraries I like and need don't work here.

47:16 Right.

47:17 And so what is going to happen when we take half a million libraries that were written in a world that didn't know or care about threading and are now subjected to it?

47:26 Yeah.

47:26 And there's sort of two levels of problem there.

47:28 There's one that there's work that has to be done to libraries.

47:31 It's usually C extensions that assumed a global interpreter lock, and they'll have to make some changes to account for that.

47:38 But the other one is a much more cultural thing, where the existence of the GIL just meant that Python developers wrote less threaded code.

47:46 Yeah.

47:46 They don't think about locks.

47:48 They don't worry about locks.

47:49 They just assume it's all going to be fine.

47:50 Because, again, the GIL doesn't protect threaded code from race conditions; it just protects the interpreter from race conditions.

47:59 So you and your application logic are free to make all the thread mistakes you want.

48:03 But if no one ever ran your code in multiple threads, you would never know.

48:06 And so we're going to have to face that now.

48:08 I think that's a super interesting thing that it's a huge cultural issue that people don't think about it.

48:13 Like I said, I used to do a lot of C++ and C Sharp.

48:16 And over there, you're always thinking about threading.

48:18 You're always thinking about, well, what about these three steps?

48:21 Does it go into a temporarily invalid state?

48:23 Do I need to lock this?

48:24 Right.

48:24 And C# even has literally a lock keyword, which is like a context manager.

48:29 You just say lock, curly brace, and everything in there runs inside the lock, because it's just so much part of that culture.

48:35 And then in Python, you kind of just forget about it and don't worry about it.

48:38 But that doesn't mean you're safe. Take, like, five lines of Python code.

48:43 Each one can run atomically on its own, but taken as a block, they may still get into these weird states where, if another thread observes the data after three lines, it's still busted.

48:53 Right.

48:54 It's just the culture doesn't talk about it very much.
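A minimal sketch of the kind of latent bug being described here (the names and counts are just illustrative): each line below is fine on its own, but the read-modify-write as a whole is not atomic, so it can lose updates once real threads run it.

```python
import threading

balance = 0
lock = threading.Lock()

def unsafe_deposit(n):
    global balance
    for _ in range(n):
        balance += 1  # really three steps: read, add, write

def safe_deposit(n):
    global balance
    for _ in range(n):
        with lock:        # the lock makes the whole read-modify-write atomic
            balance += 1

threads = [threading.Thread(target=unsafe_deposit, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Can print less than 400000: the GIL never protected this application
# logic, it only protected the interpreter's own internals.
print(balance)
```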

48:57 Yeah.

48:57 If no one ever runs your code in multiple threads, all of those bugs are theoretical.

49:01 And so what's going to shift now is, you know, all of those C extensions will get fixed; they'll fix those problems.

49:09 And then we're going to have a second wave of everyone seeing their libraries used in threaded programs and starting to discover what are the more subtle bugs.

49:17 Do I have global state that I'm not being careful with?

49:20 And it's going to be painful, but I think it's necessary for Python to stay relevant into the future.

49:26 I'm a little worried.

49:27 I mean, one of the common questions we hear is sort of, why is this needed? Multiprocessing is fine.

49:31 Why don't we do that?

49:32 And definitely multiprocessing's big challenge is that processes don't get to share data directly.

49:39 So even if I have, like, read-only data, if I have to load two gigabytes of data in every process and I want to start 32 of them because I have a nice big computer,

49:48 I've just 32x'd my memory usage, just so that I can have multiple concurrent computations.

49:56 Now, there are tricks you can play on things like Linux, where you load the data once and rely on forking to preserve pages of memory.

50:03 Linux does cool copy on write stuff when you fork, but that's like fragile and not necessarily going to work.

50:09 And then the second thing, of course, is if any of those have to talk to each other.

50:12 Now you're talking about pickling objects and putting them through a socket and handing them off.

50:16 And that is, again, for certain kinds of applications, just a non-starter.
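Here's a small sketch of that fork trick (Linux-centric; the data size and worker count are arbitrary): the parent loads the data once, and forked workers see it through copy-on-write pages instead of pickling it across a socket.

```python
import multiprocessing as mp
import numpy as np

# Loaded once in the parent, before the workers fork (~80 MB).
BIG = np.random.random(10_000_000)

def partial_sum(i):
    # Forked children inherit BIG via copy-on-write pages, so it is
    # not re-loaded or pickled into each worker.
    return BIG[i::4].sum()

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # Linux; fragile or unavailable elsewhere
    with ctx.Pool(4) as pool:
        print(sum(pool.map(partial_sum, range(4))))
    # Fragile, as noted above: touching Python objects bumps refcounts,
    # which dirties pages and silently erodes the memory savings.
```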

50:20 Yeah, but then people just start going, well, we're just going to rewrite this in a language that lets us share pointers.

50:24 Yeah.

50:24 Or at least memory in process.

50:26 Yeah.

50:26 Yeah.

50:27 There are, again, a lot of Python users who don't need this, don't care about this; it will never impact them.

50:32 And then there's a whole class of Python users who are desperate for this and really, really want it.

50:38 Sure.

50:39 You know, my, I think there's a couple of interesting things here.

50:43 One, I think this is important for stemming the flow of people leaving.

50:49 Actually, I don't hear this that much anymore, but I used to hear a lot of, we've left for Go because we need better parallelism, or we've left for this performance reason.

50:59 And I don't know whether that's just a success story of the Faster CPython initiative, or whether all the people who had been around and decided they needed to leave

51:07 are gone.

51:07 We just don't hear them anymore because they left.

51:10 It's like, you know, I used to hear them say this at the party, but then they said they're going to leave.

51:14 And I don't hear anyone say they're leaving.

51:15 Well, it's because everyone's still here.

51:16 Didn't say that.

51:17 I don't know.

51:18 But I do think having this as a capability will be important for people to be able to maybe adopt Python where Python was rejected at the proposal stage.

51:30 You know, like, should we use Python for this project or something else?

51:32 Oh, we need threading.

51:34 We need computational threading.

51:35 We've got, you know, 128 cores.

51:37 It's out, right?

51:38 And then no one comes and complains about it because they never even started that process, right?

51:42 So it'll either allow more people to come into Python or prevent them from leaving over that same argument on some projects.

51:48 I think that's a pretty positive thing here.

51:50 Yeah, we don't get to count all of the projects that didn't come into existence because of the global interpreter lock.

51:58 It's easy when you're in it to sort of adjust your thinking to not see the limitation anymore because you're so used to routing around it.

52:05 You don't even stop and think, oh, man, I got to worry about threads.

52:08 You just don't think threads.

52:09 I totally agree.

52:09 And I'll give people two other examples that maybe resonate more if this doesn't resonate with them.

52:14 It's the, what if I said, oh, it's a little bit challenging to write this type of mobile phone application in Python.

52:22 Like, well, it's nearly impossible to write a mobile phone application in Python.

52:26 So we're not even focusing on that as an issue, because no one is. I know BeeWare and a few other things are doing a little bit of work there.

52:34 So I don't want to, I'm not trying to talk badly about them.

52:37 But as a community, it's not a React Native or a Flutter sort of thing, where there's a huge community of people who are just saying, we could do this.

52:45 There's just not a lot of talk about it.

52:48 And that doesn't mean that people wouldn't just love to write mobile apps in Python.

52:53 It's just so far out of reach that it's just a little whisper in the corner for the people trying to explore it, rather than a big din.

53:01 And I think, you know, same thing about desktop apps.

53:03 Wouldn't it be awesome if we could not have Electron, but like some really cool, super nice UI thing that's almost pure Python?

53:11 It would.

53:12 But people were not focused on it because no one's trying to do it, but no one's trying to do it because there weren't good options to do it with.

53:18 Right.

53:18 And I think the same story is going to happen around performance and stuff with this.

53:22 Just to jump in, you know, since I have to talk about the BeeWare folks: you've described exactly the reason why we fund the BeeWare development, because there's a lot of work that has to happen before you reach that point where it's easy.

53:36 And so recently the team was able to get sort of tier-three support for iOS and Android into CPython 3.13.

53:43 So now we're at the first rung of the ladder of iOS and Android support in CPython.

53:48 That's awesome.

53:49 Toga and Briefcase, the two main components of BeeWare, are really focused again on that.

53:52 Yeah.

53:53 How do I make apps?

53:54 How do I make for desktop and mobile?

53:55 But the thing we ran into with people is that they just didn't even realize you could even think about doing that.

54:01 And so they just, they never stopped to say, oh, I wish I could do this in Python because they just assumed you couldn't.

54:06 And all the people who really needed it were required to leave the ecosystem and make another choice.

54:12 And it will take, I was going to say, the same amount of time with this.

54:15 Even once threads are possible in Python, it'll take years to shift the perception.

54:19 Yeah.

54:19 And probably some of the important libraries.

54:21 Yeah.

54:22 Yeah.

54:22 All right.

54:23 So I'm pretty excited about this.

54:25 I was hoping something like this would come and I didn't know what form it would be.

54:28 I said there were the two limitations, the libraries and the culture, which you called out very awesomely.

54:33 And then also the performance: this one is either neutral or a little bit better in terms of performance.

54:40 So it doesn't have that disqualifying penalty that kills single-threaded performance.

54:45 The path they're taking here, I will say, again because you have to be fairly conservative with CPython since so many people use it,

54:51 is that this will be an experimental option that Python won't turn on by default.

54:56 Python 3.13, when you get it, will still have the global interpreter lock.

55:00 But if you build Python 3.13 yourself, or you get another kind of experimental build of it, there's a flag now at the build stage to turn off the GIL.

55:08 So in naming this mode, they decided, you know, not to make people deal with double negatives.

55:13 This is Python in free-threaded mode.

55:15 And that will be an experimental thing for the community to test, to try out, to benchmark and do all these things for a number of years.
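If you do get hold of such a build, here's a quick sketch for checking what you're running on (the install routes in the comments come from the community docs and are assumptions to verify, not gospel):

```python
import sys
import sysconfig

# Getting a free-threaded 3.13 (see py-free-threading.github.io), e.g.:
#   from source:  ./configure --disable-gil
#   conda-forge:  conda create -n ft -c conda-forge python-freethreading
# The official installers also ship a separate "python3.13t" binary.

# 1 on a free-threaded build, 0 (or None) otherwise.
print(sysconfig.get_config_var("Py_GIL_DISABLED"))

# A free-threaded build can still run with the GIL re-enabled
# (e.g. PYTHON_GIL=1), so also check what is active right now.
if hasattr(sys, "_is_gil_enabled"):
    print(sys._is_gil_enabled())
```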

55:22 They've taken a very measured approach and they're saying, we're not going to force the whole community to switch to this until it's proven itself out.

55:29 Everyone's had time to port the major libraries, to try it out, to see that it really does meet the promise of not penalizing single threaded stuff too much.

55:39 Yeah.

55:39 Or breaking single threaded code too much.

55:41 Yeah.

55:41 Yeah.

55:42 The steering council is reserving the right to decide when this becomes, or if this becomes the official way for Python, you know, I don't know, 3.17 or something.

55:52 I mean, it could be, it could be several years.

55:53 And so I just want everyone not to panic.

55:56 Yeah, exactly.

55:57 This doesn't get turned on in October.

55:59 No.

55:59 And this is super interesting.

56:01 It's accepted.

56:03 It only appears in your Python runtime if you build it with this.

56:08 So I imagine, you know, some people will build it themselves, but someone will also just create a Docker container with Python built with this.

56:14 And you can get the free threaded Docker version or whatever, right?

56:17 We've already put out conda packages as well.

56:19 So if you want to build a conda environment, you can. Actually, if you jump over to the py-free-threading page...

56:24 Yeah.

56:24 Tell people about this.

56:25 Yeah.

56:25 We didn't make this.

56:26 The community made this.

56:28 The scientific Python community put this together.

56:30 And this is a really great resource, again, focused on, you know, that community, which really wants threading because we have a lot of, you know, heavy numerical computation.

56:40 And so this is a good resource for things like how do you install it?

56:43 So there's a link there on what are your options for installing the free threaded CPython.

56:46 You can get it from Ubuntu or PyEnv or conda.

56:49 And, you know, you can build it from source.

56:53 Or get a container.

56:53 Yeah.

56:54 So these are, again, this is very focused on the kind of things the scientific Python community cares about.

56:58 But these are things like, you know, have we ported Cython?

57:01 Have we ported NumPy?

57:02 Is it being automatically tested?

57:03 Which release has it?

57:05 And the nice thing actually is pip as of 24.1, I believe, can tell the difference between wheels for regular Python and free threaded Python.

57:13 Oh, you can tell by the filename; there are different wheels as well.

57:15 Yeah.

57:15 So there's a, you know, Python has always had this thing called an ABI tag, which is just a letter that you stick after the version number.

57:21 And T is the one for free threading.

57:24 And so now a project can choose to upload wheels for both versions and make it easier for people to test out stuff.
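For the curious, a short sketch of what that tag looks like in practice (using the third-party packaging library; the wheel filenames here are illustrative, not real releases):

```python
# pip install packaging
from packaging.tags import sys_tags

# The ABI tag ends up in wheel filenames, e.g. (illustrative names):
#   numpy-2.1.0-cp313-cp313-manylinux2014_x86_64.whl    regular 3.13
#   numpy-2.1.0-cp313-cp313t-manylinux2014_x86_64.whl   free-threaded 3.13
# pip >= 24.1 matches the running interpreter's tags against these.
for tag in list(sys_tags())[:3]:
    print(tag)  # contains "cp313t" on a free-threaded 3.13 build
```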

57:31 So for example, I mean, Cython, it looks like there are nightly wheels already being built.

57:35 And so they're moving fast there, you know, definitely.

57:39 And on the conda side, we're also very interested in getting into this as well.

57:42 So that's why we built the conda package for free threading.

57:44 And we're going to start looking at building more conda packages for these things in order to be able to facilitate testing.

57:49 Because I think the biggest thing is, if you want to know whether your code works, you want the quickest way to get an environment where you can test.

57:56 And so making this more accessible to folks is a really high priority.

58:00 This is cool.

58:01 There was something like this for Python 2 to 3, I remember.

58:04 It showed like the top 1,000 packages on PyPI.

58:08 And how many of them were compatible with Python 3, basically by expressing their language tag or something like that.

58:14 Yep.

58:14 Yep.

58:15 So this is kind of like that.

58:15 It's also like the Can I Use site.

58:17 I don't know if you're familiar with that.

58:18 Can I use from the web dev world?

58:22 Oh, yeah.

58:23 Oh, awesome.

58:24 Yeah, yeah.

58:24 I've seen this.

58:24 You're going to say, I want to use this.

58:26 I want to use this feature.

58:27 And, you know, if I search for web workers or something like that, it'll show you all the browsers and all the versions.

58:34 And when were they supported?

58:35 And this sounds a little bit like that, but for free threaded Python.

58:39 Which, by the way, free threaded Python is the terminology, right?

58:41 Not no gil, but free threaded.

58:42 That is what they've decided.

58:43 I think they're worried about people trying to talk about "no no-GIL" or, I mean, I don't know.

58:47 Gilful.

58:49 Gilful Python.

58:51 Are you running on a gilful?

58:52 Yeah.

58:53 Oh, my gosh.

58:54 Okay.

58:54 Interesting.

58:55 Now, we have a few other things to talk about, but we don't really have much time to talk about them.

58:59 But there was one thing that we were maybe going to talk about a bit with compiling.

59:04 You mentioned some talk or something where people were talking about, well, what if we had a statically typed Python and we compiled it?

59:10 And related to that, kind of Mr. Magnetic says, could a Python program be compiled into a binary like a JAR or a, you know, Go app or whatever?

59:20 There are other tools that look at that as a, yeah, a standalone executable.

59:24 So, yeah, one of the things I wanted to do is shout out a colleague of mine at Anaconda, Antonio Cuni, who is a well-known PyPy developer from long ago.

59:31 He's worked on PyPy for 20 years.

59:32 He's been working.

59:33 And not the package-installing thing, but the JIT compiler.

59:36 PY, PY.

59:37 PY, PY, PY.

59:38 Yes.

59:38 Yeah.

59:39 Sometimes phonetically, like over audio, it's hard to tell.

59:41 Yes.

59:42 Yeah, yeah.

59:42 So he's been thinking about this stuff for a very long time.

59:44 His sort of key insight, at least the one that clicked in my head, was that Python is hard to compile because it is so dynamic.

59:52 I can, in principle, modify the attributes, like even the functions of a class at any point in the execution of the program.

59:58 I can monkey patch anything.

01:00:00 This dynamism is really great for making kind of magical metaprogramming libraries that do amazing things with very little typing.

01:00:08 But it makes compiling them really hard because you don't get to ever say, okay, this can't ever change.
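A tiny example of the dynamism he means, nothing SPy-specific, just plain Python that defeats a naive compiler (the class and method names are made up for illustration):

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())  # hello

# Any code, anywhere, can swap the method out at runtime, so a
# compiler can never assume Greeter.greet is fixed.
Greeter.greet = lambda self: "bonjour"
print(g.greet())  # bonjour
```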

01:00:14 And so what he's been trying to do with a project called SPy, which he gave a talk on at PyCon 2024, but I think the recordings aren't up yet for that.

01:00:24 And so there isn't a, I don't think there's a public page on it, but he does have a talk on it.

01:00:28 I think they've only got the keynotes up so far. The key insight for me with SPy was recognizing that in a typical Python program, all the dynamic metaprogramming happens at the beginning.

01:00:38 You're doing things like data classes, generating stuff and all kinds of things like that.

01:00:43 And then there's a phase where that stops.

01:00:45 And so if we could define a sort of variant of Python where those two phases were really clear, then you would get almost all of the dynamic expressiveness of Python, but still have the ability to feed that into a compiler toolchain and get a binary.
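To illustrate the two-phase idea with ordinary Python (this is not SPy syntax, just the shape of the insight): the dataclass decorator does its metaprogramming once, up front, and everything after that point is stable enough to specialize.

```python
from dataclasses import dataclass

# Phase 1 (import time): dynamic metaprogramming. @dataclass generates
# __init__, __repr__, and __eq__ on the fly.
@dataclass
class Point:
    x: float
    y: float

# Phase 2 (steady state): from here on, Point's shape never changes,
# so a compiler could in principle treat this code as static.
def norm_squared(p: Point) -> float:
    return p.x * p.x + p.y * p.y

print(norm_squared(Point(3.0, 4.0)))  # 25.0
```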

01:01:03 This is super early R&D, experimental work, but I think that's a really great way to approach it, because there's always been this tension of, well,

01:01:12 if I make Python statically compilable, is it just, you know, C with different keywords?

01:01:17 Do I lose the thing I loved about Python, which was how quickly I could express my idea?

01:01:22 And so this again comes back to, you know, having your cake and eating it too.

01:01:25 This is trying to find a way to split that difference in a way that lets us get most of the benefits of both sides.

01:01:31 That's pretty interesting.

01:01:31 And hopefully that talk is up soon.

01:01:33 That'd be really neat.

01:01:34 Maybe by the time this episode's out. I know the PyCon videos are starting to roll out, like not on YouTube yet, but out on the conference channels.

01:01:42 It would be fantastic to have, here's my binary of Python.

01:01:45 Take my data science app and run it.

01:01:47 Take my desktop app and run it.

01:01:49 I don't care what you have installed on your computer.

01:01:51 I don't need you to set up Python 3.10 or higher on your machine and set up a virtual environment.

01:01:57 Just here's my binary.

01:01:59 Do it as you will.

01:02:00 That's another one I throw in with the mobile apps and the desktop apps and front-end Python.

01:02:08 You know, it's another one of those things where nobody's pushing towards it.

01:02:12 Not that many people are pushing towards it, because there aren't that many use cases people are using it for, because it was so challenging that people stopped trying to do that.

01:02:21 You know?

01:02:21 Yeah, there's one thing, and people probably hear me say this too many times, but most people use apps when they use a computer, not packages or environments.

01:02:31 And so in the Python space, we are constantly grappling with how hard packages and environments are to work with: you know, deciding what languages are in them, whether I care about everything or just Python, or whatever.

01:02:44 That's all very hard.

01:02:45 But that's actually not how most people interact with the computer at all.

01:02:48 And so it really is one of those things.

01:02:51 Again, this is one of the reasons I'm so interested in BeeWare: Briefcase is, like, the app packager.

01:02:55 And the more they can push on that, the more we have a story.

01:02:59 And again, there are other tools that have been around for a long time, but that's just what I think about a lot.

01:03:02 We need to focus on tools for making apps because that's how we're going to share our work with 99% of the earth.

01:03:09 Yeah, 100%.

01:03:10 I totally agree.

01:03:11 And lots of props to Russell Keith-Magee and the folks over at BeeWare for doing that, and for you guys supporting that work, because it's one of those things where there's not a ton of people trying to do it.

01:03:23 It's not like, well, we're using Django, but is there another way we could do it?

01:03:26 It's basically the same thing, right?

01:03:27 It's creating a space for Python where, I know there's PyInstaller and py2app, but it's pretty limited, right?

01:03:34 Yeah, there's not a lot of effort there.

01:03:36 And so there are a few people who have been doing it for a long time and others are getting more into it.

01:03:41 And yeah, I just wish that we could get more focus on it, because these are tools that just don't get a lot of attention.

01:03:47 Yeah, and they're not very polished and there's so many edge cases and scenarios.

01:03:51 All right, let's close it out.

01:03:52 Let's get a final thought on this little topic and then let you wrap this up for us.

01:03:56 Do you think that's maybe a core developer thing?

01:03:59 I mean, I know it's awesome that py2app and PyInstaller and cx_Freeze are doing their things, that the Toga folks are doing their things to try to make this happen.

01:04:06 But I feel like they're kind of looking in at Python from the outside, going, how can we grab what we need out of Python and jam it into an executable and make it work?

01:04:13 Should we be encouraging the core developers to support something like python myscript.py --windows,

01:04:21 and out you get a .exe or something?

01:04:23 I don't know, actually, that would be a great question.

01:04:24 Actually, I would ask Russell that question.

01:04:26 He would have probably better perspective than I would.

01:04:29 At some level, it is a tool that is dealing with a lot of problems that aren't core to the Python language.

01:04:34 And so maybe having it outside is helpful, but maybe there are other things that the core could do to support it.

01:04:41 I mean, again, a lot of it has to do with the realities of when you drop an application onto a system, you need it to be self-contained.

01:04:47 Sometimes you have to, you know, trick the import machinery into knowing where to find things and all of that, right?

01:04:53 That's exactly what I was thinking is, right?

01:04:55 If Python itself didn't require, like, operating-system-level fakes to make it think files are where it expects them; if it could just say, here is a thing in memory that you import from, this is the import space,

01:05:07 it's this memory address for these things, and we just run from the EXE rather than dumping a bunch of stuff temporarily on disk, importing it, and throwing it away, you know, that kind of weirdness that happens sometimes.

01:05:16 There are definitely improvements that could be made to the import mechanism to support applications.
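One existing hook in that direction, as a minimal sketch: CPython's zipimport can already import pure-Python modules straight out of an archive with no temp files, and it's largely the C extensions that force the dump-to-disk dance (the file and module names below are hypothetical).

```python
import sys
import zipfile

# Build a tiny archive standing in for an app bundle.
with zipfile.ZipFile("bundle.zip", "w") as zf:
    zf.writestr("greet.py", "def hello():\n    return 'hi from the bundle'\n")

# zipimport hooks into the import system: pure-Python modules load
# directly from the archive, and nothing is extracted to disk.
sys.path.insert(0, "bundle.zip")
import greet

print(greet.hello())  # C extensions can't be loaded this way, though,
                      # which is why freezers still unpack them to disk.
```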

01:05:20 Yeah, exactly.

01:05:21 Well, we've planted that seed.

01:05:23 Maybe it will grow.

01:05:24 We'll see.

01:05:25 All right, Stan, this has been an awesome conversation.

01:05:27 You know, give us a wrap up on all this stuff, just like sort of find a call to action and summary of what you guys are doing in Anaconda, because there's a bunch of different stuff we talked about that are in this space.

01:05:36 Yeah, mainly I would encourage people: if you want to speed up your Python program, you don't necessarily have to leave Python.

01:05:42 Go take a look at some of these tools.

01:05:44 Go, you know, measure what your program's doing.

01:05:47 Look at tools like Numba, but there are other ones out there, you know, PyTorch and JAX and all sorts of things.

01:05:51 There are lots of choices now for speed, and so Python doesn't have to be slow.

01:05:54 You just have to sort of figure out what you're trying to achieve and find the best tool for that.

01:05:59 Oh, one other thing I do want to shout out.

01:06:01 I'm teaching a tutorial in a month over at the Anaconda sort of live tutorial system, which will be how to use Numba.

01:06:09 So if something you saw here is something you want to go deep on, there will be a tutorial, hopefully linked in the show notes or something.

01:06:15 Yeah, I can link that in the show notes.

01:06:17 No problem.

01:06:17 Absolutely.

01:06:18 So I'll be going in.

01:06:20 Is that the high performance Python with Numba?

01:06:22 Yes.

01:06:23 Yes.

01:06:24 So, yeah, we'll be doing worked examples and you'll get to ask questions and all that stuff.

01:06:28 Cool.

01:06:28 I'll make sure to put that in the show notes so people can check it out.

01:06:30 Mm-hmm.

01:06:31 Cool.

01:06:31 All right.

01:06:31 Well, thanks for sharing all the projects that you guys are working on and just the broader performance stuff that you're tracking.

01:06:37 Yeah, awesome.

01:06:38 Glad to chat.

01:06:39 You bet.

01:06:39 See you later.

01:06:40 This has been another episode of Talk Python to Me.

01:06:43 Thank you to our sponsors.

01:06:45 Be sure to check out what they're offering.

01:06:47 It really helps support the show.

01:06:49 This episode is sponsored by Posit Connect from the makers of Shiny.

01:06:53 Publish, share, and deploy all of your data projects that you're creating using Python.

01:06:57 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:07:04 Posit Connect supports all of them.

01:07:06 Try Posit Connect for free by going to talkpython.fm/posit, P-O-S-I-T.

01:07:12 Want to level up your Python?

01:07:14 We have one of the largest catalogs of Python video courses over at Talk Python.

01:07:18 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:07:23 And best of all, there's not a subscription in sight.

01:07:25 Check it out for yourself at training.talkpython.fm.

01:07:28 Be sure to subscribe to the show.

01:07:30 Open your favorite podcast app and search for Python.

01:07:33 We should be right at the top.

01:07:35 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

01:07:40 direct RSS feed at /rss on talkpython.fm.

01:07:44 We're live streaming most of our recordings these days.

01:07:47 If you want to be part of the show and have your comments featured on the air, be sure to

01:07:51 subscribe to our YouTube channel at talkpython.fm/youtube.

01:07:55 This is your host, Michael Kennedy.

01:07:57 Thanks so much for listening.

01:07:58 I really appreciate it.

01:08:00 Now get out there and write some Python code.

01:08:01 And I'll see you next time.
