WEBVTT

00:00:00.001 --> 00:00:03.500
Understanding how your Python application is using memory can be tough.

00:00:03.500 --> 00:00:10.220
First, Python has its own layer of reused memory, arenas, pools, and blocks, to help it be more efficient.

00:00:10.220 --> 00:00:15.460
And many important Python packages are built in native compiled languages like C and Rust,

00:00:15.460 --> 00:00:19.000
oftentimes making that section of your memory usage opaque.

00:00:19.000 --> 00:00:23.300
But with Memray, you can get way deeper insight into your memory usage.

00:00:23.300 --> 00:00:29.120
We have Pablo Galindo Salgado and Matt Wozniak back on the show to dive into Memray.

00:00:29.560 --> 00:00:32.200
The sister project to their PyStack one we recently covered.

00:00:32.200 --> 00:00:38.240
This is Talk Python to Me, episode 425, recorded June 20th, 2023.

00:00:38.240 --> 00:00:54.600
Welcome to Talk Python to Me, a weekly podcast on Python.

00:00:54.600 --> 00:00:56.340
This is your host, Michael Kennedy.

00:00:56.340 --> 00:00:59.000
Follow me on Mastodon, where I'm @mkennedy,

00:00:59.120 --> 00:01:03.820
and follow the podcast using @talkpython, both on fosstodon.org.

00:01:03.820 --> 00:01:06.420
Be careful with impersonating accounts on other instances.

00:01:06.420 --> 00:01:07.380
There are many.

00:01:07.380 --> 00:01:12.460
Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

00:01:13.040 --> 00:01:16.500
We've started streaming most of our episodes live on YouTube.

00:01:16.500 --> 00:01:24.060
Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

00:01:24.420 --> 00:01:30.520
This episode is brought to you by JetBrains, who encourage you to get work done with PyCharm.

00:01:30.520 --> 00:01:37.420
Download your free trial of PyCharm Professional at talkpython.fm/done-with-pycharm.

00:01:38.060 --> 00:01:39.900
And it's brought to you by InfluxDB.

00:01:39.900 --> 00:01:46.460
InfluxDB is the database purpose built for handling time series data at a massive scale for real-time analytics.

00:01:46.460 --> 00:01:50.340
Try them for free at talkpython.fm/InfluxDB.

00:01:51.400 --> 00:01:52.000
Hey, all.

00:01:52.000 --> 00:01:56.940
Before we dive into the interview, I want to take just a moment and tell you about our latest course over at Talk Python,

00:01:56.940 --> 00:01:59.520
MongoDB with Async Python.

00:01:59.520 --> 00:02:05.240
This course is a comprehensive and modernized approach to MongoDB for Python developers.

00:02:05.240 --> 00:02:14.420
We use Beanie, Pydantic, async and await, as well as FastAPI to explore how you write apps for MongoDB and even test them with Locust for load testing.

00:02:14.580 --> 00:02:23.000
And just today, yes, exactly today, the last of these frameworks were upgraded to use the newer, much faster Pydantic 2.0.

00:02:23.000 --> 00:02:25.200
I think it's a great course that you'll enjoy.

00:02:25.200 --> 00:02:29.980
So visit talkpython.fm/async-mongodb to learn more.

00:02:29.980 --> 00:02:34.620
And if you have a recent full course bundle, this one's already available in your library of courses.

00:02:34.620 --> 00:02:39.360
Thanks for supporting this podcast by taking and recommending our courses.

00:02:39.360 --> 00:02:41.460
Hey, guys.

00:02:41.460 --> 00:02:42.320
Hey, Pablo.

00:02:42.320 --> 00:02:43.080
Matt.

00:02:43.180 --> 00:02:44.700
Welcome back to Talk Python to Me.

00:02:44.700 --> 00:02:47.320
It hasn't been that long since you've been here last time, has it?

00:02:47.320 --> 00:02:53.160
With the magic of the issue, maybe even minutes since the person listening to us listens to the previous one.

00:02:53.160 --> 00:02:53.700
Exactly.

00:02:53.700 --> 00:02:54.640
We don't know.

00:02:54.640 --> 00:02:55.680
We don't know when they're going to listen.

00:02:55.680 --> 00:02:57.900
And they don't know when we recorded it necessarily.

00:02:57.900 --> 00:02:59.040
It could be magic.

00:02:59.040 --> 00:03:00.420
It's a little bit apart.

00:03:00.420 --> 00:03:06.840
But we got together to talk about all these cool programs that give insight into how your app runs in Python.

00:03:07.060 --> 00:03:14.940
So we talked about PyStack previously about figuring out what your app is doing, if it's locked up or at any given moment, if it crashes, grab a core dump.

00:03:14.940 --> 00:03:19.340
And maybe we thought about combining that with Memray and just talking through those.

00:03:19.400 --> 00:03:25.200
But they're both such great projects that, in the end, we decided, nope, they each get their own attention.

00:03:25.200 --> 00:03:26.900
They each get their own episode.

00:03:26.900 --> 00:03:32.560
So we're back together to talk about Memray and memory profiling in Python.

00:03:32.560 --> 00:03:37.080
An incredible, incredible profiler we're going to talk about in a minute.

00:03:37.080 --> 00:03:43.960
Pablo, you were just talking about how you were releasing some of the new versions of Python, some of the point updates and some of the betas.

00:03:43.960 --> 00:03:48.760
You want to give us just a quick update on that before we jump into talking about memory?

00:03:48.760 --> 00:03:49.420
Yeah, absolutely.

00:03:49.420 --> 00:03:57.500
I mean, just to clarify also, like the ones I released myself are 3.10 and 3.11, which are the best versions of Python you will ever find.

00:03:57.500 --> 00:04:01.020
But the one we are releasing right now is 3.12.

00:04:01.020 --> 00:04:03.160
We got the beta 3 today.

00:04:03.160 --> 00:04:05.380
You should absolutely test beta 3.

00:04:05.380 --> 00:04:10.060
Maybe they are not as exciting as 3.11, but there is a bunch of interesting things there.

00:04:10.280 --> 00:04:13.960
And, you know, there is the work of the Faster CPython team.

00:04:13.960 --> 00:04:21.380
And we have a huge change in the parser, but technically tokenizer, because we are getting f-strings even better.

00:04:21.380 --> 00:04:26.220
And that has a huge amount of changes everywhere, even if you don't think a lot about that.

00:04:26.220 --> 00:04:29.900
But having this tested is quite important, as you can think.

00:04:30.200 --> 00:04:33.720
So far, we really, really want everyone to test this release.

00:04:33.720 --> 00:04:38.920
Everyone that is listening to the live version of the podcast can go to python.org.

00:04:38.920 --> 00:04:43.500
Download the latest pre-release, which is Python 3.12 beta 3.

00:04:43.500 --> 00:04:45.060
And tell us what's broken.

00:04:45.060 --> 00:04:46.340
Hopefully it's not my fault.

00:04:46.340 --> 00:04:47.840
But yeah.

00:04:47.840 --> 00:04:48.740
No, that's excellent.

00:04:48.740 --> 00:04:49.900
Thanks for keeping those coming.

00:04:49.900 --> 00:04:52.560
Do you know how many betas are planned?

00:04:52.560 --> 00:04:53.880
We're on three now.

00:04:54.040 --> 00:04:57.880
This is going to be a bit embarrassing because I should know being a release manager, I think there is two more.

00:04:57.880 --> 00:05:03.400
It's a bit tricky though, because I think we released beta two a week after beta one, because we shift the schedule.

00:05:03.400 --> 00:05:07.300
It's a bit difficult to know, but there is a PEP that we can certainly find.

00:05:07.300 --> 00:05:11.040
You just search for Python releases schedule, Python 3.12.

00:05:11.520 --> 00:05:13.420
They will tell you exactly how many betas there are.

00:05:13.420 --> 00:05:14.740
I think there is two more betas.

00:05:14.740 --> 00:05:20.520
Then we will have one release candidate, if I recall correctly, if we do things the way I did them.

00:05:20.520 --> 00:05:21.900
And then the final version in October.

00:05:21.900 --> 00:05:23.120
I'm looking forward to it.

00:05:23.120 --> 00:05:28.900
And, you know, the release of 3.12, actually, it's going to have some relevance to our conversation.

00:05:28.900 --> 00:05:29.740
Yes, indeed.

00:05:29.740 --> 00:05:30.160
Yeah.

00:05:30.160 --> 00:05:35.920
I assume people probably listened to the last episode, but maybe just, you know, a real quick introduction to yourself.

00:05:35.920 --> 00:05:37.980
You know, Pablo, you go first, just so people know who you are.

00:05:37.980 --> 00:05:38.480
Yeah, absolutely.

00:05:38.480 --> 00:05:41.580
So I'm Pablo Galindo, I have many things in the Python community.

00:05:41.580 --> 00:05:44.880
I have been practicing to say them very fast, so I don't take a lot of time.

00:05:44.880 --> 00:05:54.840
So I'm a CPython core developer, Python release manager, a Steering Council member, and I work at Bloomberg on the Python infrastructure team doing a lot of cool tools.

00:05:54.840 --> 00:05:58.860
I think I don't, I'm not forgetting about anything, but yeah, I'm around.

00:05:58.860 --> 00:06:04.040
I break things so you can, I like to break tools, like with my new changes in CPython.

00:06:04.040 --> 00:06:04.680
That's what I do.

00:06:04.680 --> 00:06:05.120
Excellent.

00:06:05.120 --> 00:06:06.280
Sorry, Wukesh.

00:06:07.680 --> 00:06:09.140
And I am Matt Wozniak.

00:06:09.140 --> 00:06:12.740
I am Pablo's co-worker on the Python infrastructure team at Bloomberg.

00:06:12.740 --> 00:06:16.680
I am the co-maintainer of Memray and PyStack.

00:06:16.680 --> 00:06:24.620
I'm also a moderator on Python Discord, and that is the extent of my very short list of community involvement compared to Pablo's.

00:06:24.620 --> 00:06:25.000
Excellent.

00:06:25.240 --> 00:06:28.960
Well, yeah, you both are doing really cool stuff, as we're going to see.

00:06:28.960 --> 00:06:38.540
Let's start this conversation off about profilers at a little bit higher level, and then we'll work our way into what is Memray and how does it work and where does it work, all those things.

00:06:38.940 --> 00:06:43.320
So let's just talk about what comes with Python, right?

00:06:43.320 --> 00:06:47.340
We have, interestingly, we have two options inside the standard library.

00:06:47.340 --> 00:06:49.340
We have cProfile and profile.

00:06:49.340 --> 00:06:55.240
Do I use cProfile to profile CPython and profile for other things, or what's going on here, guys?

00:06:55.240 --> 00:06:56.500
That's already...

00:06:56.500 --> 00:06:59.800
You use cProfile whenever you can, is the answer.

00:06:59.800 --> 00:07:00.760
Yes, indeed.

00:07:00.760 --> 00:07:01.980
There is a lot of caveats here.

00:07:01.980 --> 00:07:07.820
I already see, like, two podcast episodes just with this question, so let's keep this short.

00:07:07.820 --> 00:07:11.600
cProfile and profile are the profilers that come with the standard library.

00:07:11.600 --> 00:07:13.020
I'm looking at the...

00:07:13.020 --> 00:07:20.680
I mean, probably you are not looking at this in the podcast version, but here we are looking at the Python documentation for the profile and cProfile modules.

00:07:21.220 --> 00:07:27.040
And there is this lovely sentence that says cProfile and profile provide deterministic profiling of Python programs.

00:07:27.040 --> 00:07:30.520
And I'm going to say, wow, that's an interesting take on them.

00:07:30.520 --> 00:07:35.140
I wouldn't say deterministic, although I think the modern terminology is tracing.

00:07:35.140 --> 00:07:44.160
And this is quite important because, like, it's not really deterministic in the sense that if you executed the profiler 10 times, you're going to get the same results.

00:07:44.160 --> 00:07:48.620
You probably are not because programs in general are not deterministic due to several things.

00:07:48.620 --> 00:07:49.880
We can go into detail.

00:07:49.880 --> 00:07:50.200
Why not?

00:07:50.280 --> 00:07:52.140
Even what else is running on your computer, right?

00:07:52.140 --> 00:07:52.420
Like...

00:07:52.420 --> 00:07:52.640
Exactly.

00:07:52.640 --> 00:07:58.760
What this is referring to is actually a very important thing for everyone to understand because everyone gets this wrong.

00:07:58.760 --> 00:08:04.080
And, like, you know, there is so many discussions around this fact and comparing apples to oranges that is just very annoying.

00:08:04.080 --> 00:08:08.360
So what this is referring to is that cProfile is what is called a tracing profiler.

00:08:08.360 --> 00:08:15.160
The other kind of profiler that we will talk about is a sampling profiler, also called a statistical profiler.

00:08:15.700 --> 00:08:23.740
So this one, the tracing profiler, assuming that it's a performance one, because cProfile checks time.

00:08:23.740 --> 00:08:27.540
So how much time do your functions take, or why your code is slow, in other words.

00:08:28.100 --> 00:08:32.640
So this profiler basically checks every single Python call that happens, all of them.

00:08:32.640 --> 00:08:34.400
So it sees all of the calls that are made.

00:08:34.400 --> 00:08:37.580
And every time a function returns, it goes and sees them.

00:08:37.580 --> 00:08:41.540
Unfortunately, this has the disadvantage that it is very slow.

00:08:41.920 --> 00:08:44.420
So running the profiler will make your code slower.

00:08:44.420 --> 00:08:46.580
So if your code takes one hour to run,

00:08:46.580 --> 00:08:50.780
it's not surprising that running it under cProfile makes it two hours.

00:08:50.780 --> 00:08:53.200
And then you will say, well, how can I profile anything?

00:08:53.200 --> 00:08:55.240
Well, because it's going to report a percentage.

00:08:55.240 --> 00:08:57.740
So hopefully it's the same percentage, right?
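
NOTE
Editor's sketch, not something shown in the episode: a minimal way to run the standard-library cProfile tracing profiler on a single call and print the most expensive functions. The helper names fib and profile_call are made up for illustration; the cProfile and pstats calls are standard-library APIs.
```python
import cProfile
import io
import pstats
def fib(n):
    """Deliberately call-heavy toy workload (hypothetical example)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)  # every Python call is traced
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)  # keep the top 5 entries
    return result, report.getvalue()
if __name__ == "__main__":
    value, text = profile_call(fib, 18)
    print(value)
    print(text)
```
Because every call is traced, the run is slower than normal, so the relative share per function is usually more useful than the absolute times.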

00:08:57.740 --> 00:08:59.680
Not just that it makes it slow.

00:08:59.680 --> 00:09:05.360
The other problem with it is that it makes it slow by different amounts, depending on the type of code that's running.

00:09:05.580 --> 00:09:12.920
If what's executing is I/O and you're waiting for a network service or something like that to respond, cProfile isn't making that any slower.

00:09:12.920 --> 00:09:16.880
It takes the amount of time that it takes and it's able to accurately report when that call finishes.

00:09:16.880 --> 00:09:28.360
But if what's running is CPU bound code, where you're doing a bunch of enters and exits into Python functions and executing a bunch of Python bytecode, the tracing profiler is tracing all of that and it's got overhead added to that.

00:09:28.360 --> 00:09:40.960
So the fact that it isn't adding overhead to network calls or to disk IO or things like that, but is adding overhead to CPU bound stuff means that it can be tough to get a full picture of where your program is spending its time.

00:09:40.960 --> 00:09:46.240
It's very good at telling you where it's spending its CPU, but not as good at telling you where it's spending its time.
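
NOTE
Editor's sketch of the uneven overhead described above, assuming a plain sys.setprofile hook as a stand-in for a tracing profiler. It counts how often the hook fires: code made of many small Python calls triggers it thousands of times, while a single blocking call triggers it only a few times. The names count_profile_events, cpu_bound and io_like are hypothetical.
```python
import sys
import time
def count_profile_events(func, *args):
    """Count how many times a sys.setprofile hook fires while func runs."""
    count = 0
    def hook(frame, event, arg):
        nonlocal count
        count += 1
    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return count
def add_one(x):
    return x + 1
def cpu_bound(n):
    """Many tiny Python calls: the hook fires on every call and return."""
    total = 0
    for _ in range(n):
        total = add_one(total)
    return total
def io_like():
    """One blocking call: the hook fires only a handful of times."""
    time.sleep(0.01)
if __name__ == "__main__":
    print("cpu_bound events:", count_profile_events(cpu_bound, 1000))
    print("io_like events:", count_profile_events(io_like))
```
Each hook invocation costs time, so the call-heavy function is slowed down far more than the waiting one, which is why reported time can be skewed toward CPU-bound Python code.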

00:09:46.240 --> 00:09:46.980
Right, right.

00:09:46.980 --> 00:09:51.260
Because it has this, it's one of these Heisenberg quantum mechanics sort of things.

00:09:51.260 --> 00:09:53.140
By observing it, you make a change.

00:09:53.140 --> 00:09:55.700
And it's really, Matt, that's a great point.

00:09:55.700 --> 00:10:06.060
I think also you could throw into there specifically in the Python world that it's really common that we're working with computation involving C or Rust, right?

00:10:06.060 --> 00:10:16.040
And so if I call a function where the algorithm is written in Python, every little step of that through the loops of that algorithm are being modified and slowed down by the profiler.

00:10:16.040 --> 00:10:22.540
Whereas once it hits a C or a Rust layer, it just says, well, we're just going to wait till that comes back.

00:10:22.600 --> 00:10:23.940
And so it doesn't interfere, right?

00:10:23.940 --> 00:10:33.080
And so it, even across things like where you're using something like say Pandas or NumPy, potentially, it could misrepresent how much time you're spending there.

00:10:33.280 --> 00:10:38.940
On the other hand, it's not going to interfere with the Rust or C, but it's also not going to report inside that.

00:10:38.940 --> 00:10:43.580
So you are going to see a very high level view of what's going on.

00:10:43.580 --> 00:10:48.420
So it's going to tell you algorithm running, but like you're not going to see what's going on, right?

00:10:48.420 --> 00:10:52.580
Well, the advantage here is that it comes with the standard library, and it's a very simple profiler.

00:10:52.580 --> 00:10:56.120
So if you know what you're doing, which is maybe a lot to ask.

00:10:56.620 --> 00:11:00.500
Because, you know, it's not, no, I mean it like in the sense that it's not that you are a professional.

00:11:00.500 --> 00:11:05.960
It's that sometimes it's very hard to know when it's a good choice of a tool.

00:11:05.960 --> 00:11:11.560
Because as Matt was saying, you know, if you have a lot of CPU-bound code and you don't have a lot of I/O, then you're safe.

00:11:11.560 --> 00:11:14.820
But like sometimes it's very difficult to know that that's true or like how much do you have.

00:11:15.040 --> 00:11:18.820
So if you have a very simple situation, like a script maybe or a simple algorithm,

00:11:18.820 --> 00:11:21.600
it may work and you don't need to reach for something more sophisticated, right?

00:11:21.600 --> 00:11:21.800
Yeah.

00:11:21.800 --> 00:11:29.400
Knowing what type of problem you're falling into and whether this is the right tool already requires you to know something about where your program is spending most of its time.

00:11:29.400 --> 00:11:37.100
If you are using this tool to find out where your program is spending its time, you might not even be able to accurately judge if this is the right tool to use.

00:11:37.100 --> 00:11:37.620
That's true.

00:11:37.620 --> 00:11:40.620
But also it can give you some good information, right?

00:11:40.620 --> 00:11:52.120
It's not completely, but it certainly, as long as you are aware of those limitations that you laid out, Matt, you could look at it and say, okay, I understand that these things that seem to be equal time, they might not be equal.

00:11:52.120 --> 00:11:56.420
But it still gives you a sense of like within my Python code, I'm doing more of this.

00:11:56.420 --> 00:11:56.700
Right.

00:11:56.700 --> 00:11:58.860
Or here's how much time I'm waiting.

00:11:58.860 --> 00:12:06.060
Also another thing to mention here, which is going to become relevant when we talk about Memray as well, is an advantage of it being in the standard library.

00:12:06.060 --> 00:12:10.100
What this tool produces is a file with the information of the profile run.

00:12:10.340 --> 00:12:17.920
And because it's in the standard library and it's so popular, there is a lot of tools that can consume the file and show you the information in different ways.

00:12:17.920 --> 00:12:21.660
So you have a lot of ways to choose how you want to look at the information.
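
NOTE
Editor's sketch of the "profile file" idea mentioned above: cProfile can dump its raw statistics to a file, and any consumer can load that file later, here with the standard-library pstats module. The names work, profile_to_file and functions_in_profile are hypothetical, and the file name run.prof is an arbitrary choice.
```python
import cProfile
import os
import pstats
import tempfile
def work():
    """Toy workload (hypothetical)."""
    return sum(i * i for i in range(1000))
def profile_to_file(path):
    """Profile work() and save the raw statistics to path."""
    profiler = cProfile.Profile()
    profiler.runcall(work)
    profiler.dump_stats(path)
def functions_in_profile(path):
    """Load a saved profile and return the function names it recorded."""
    stats = pstats.Stats(path)
    # stats.stats maps (filename, line, function name) keys to timing data
    return {func_name for (_filename, _line, func_name) in stats.stats}
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run.prof")
        profile_to_file(path)
        print(sorted(functions_in_profile(path)))
```
Third-party viewers read the same kind of file, which is what makes the choice of visualization independent of the profiler itself.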

00:12:21.660 --> 00:12:29.040
Some people, for instance, like to look at the information into kind of a graph that will tell you the percentage of the calls and things like that.

00:12:29.040 --> 00:12:32.200
Some other people like to see it in a graphical view.

00:12:32.200 --> 00:12:37.100
So there's this box with boxes inside that will tell you the percentage and things like that.

00:12:37.220 --> 00:12:43.340
And some people like to see it in Terminal or in the GUI or in PyCharm or whatever it is.

00:12:43.340 --> 00:12:49.620
So there is a lot of ways to consume it, which is very good because, you know, different people have different ways to consume the information.

00:12:49.620 --> 00:12:50.820
And that is a fact.

00:12:50.820 --> 00:12:55.520
Depending on who you are and what you're looking at, some visualizations may be better than others.

00:12:55.640 --> 00:12:57.060
And there is a lot to choose here.

00:12:57.060 --> 00:13:00.620
And that is an advantage compared to something that, you know, just offers you one and that's all.

00:13:00.620 --> 00:13:00.940
Indeed.

00:13:00.940 --> 00:13:08.920
So I mentioned that 3.12 might have some interesting things coming around this profiling story.

00:13:08.920 --> 00:13:14.760
We have PEP 669, Low Impact Monitoring for CPython.

00:13:14.760 --> 00:13:19.480
This is part of the Faster CPython initiative, I'm guessing, because Mark Shannon is the author of it.

00:13:19.480 --> 00:13:20.440
It's kind of related.

00:13:20.440 --> 00:13:22.060
I don't think it's immediately there.

00:13:22.060 --> 00:13:25.840
I mean, it's related to the fact that it's trying to make profiling itself better.

00:13:26.040 --> 00:13:29.820
I know he has spent time from the Faster CPython project on implementing this.

00:13:29.820 --> 00:13:32.600
I need to double check if this is in 3.12.

00:13:32.600 --> 00:13:33.500
I think it is.

00:13:33.500 --> 00:13:37.040
But it may be accepted for 3.12 but going into 3.13.

00:13:37.040 --> 00:13:39.200
We should double check.

00:13:39.200 --> 00:13:45.120
So I can't 100% say that it's in 3.12 because I don't know if he had the time to fully implement it.

00:13:45.120 --> 00:13:49.820
Yeah, I don't know if it's in there, but from a PEP perspective, it says accepted and for Python 3.12.

00:13:49.820 --> 00:13:50.820
For Python 3.12, yes.

00:13:50.820 --> 00:13:51.860
I do believe it's in there.

00:13:51.860 --> 00:14:02.500
I'm pretty sure that I've been talking with Ned Batchelder a bit about coverage, and I'm pretty sure he said he's been testing with this in the coverage, testing coverage against this with 3.12 betas.

00:14:02.500 --> 00:14:08.720
So the idea here is to add additional events to the execution in Python, I'm guessing.

00:14:08.720 --> 00:14:14.700
It says it's going to have the following events: PY_START, PY_RESUME, PY_THROW, PY_YIELD, PY_UNWIND, CALL.

00:14:15.920 --> 00:14:17.580
How much do either of you guys know about this?

00:14:17.580 --> 00:14:18.440
Yeah, I'm quite.

00:14:18.440 --> 00:14:25.980
I mean, I was involved in judging this, so I know quite a lot since I was accepting this.

00:14:25.980 --> 00:14:35.180
But the idea here is that just as a high-level view, because if we go into detail, we can again make two more podcast episodes, and maybe we should invite Mark Shannon in that case.

00:14:35.180 --> 00:14:45.420
But the idea here is that the tools that the interpreter exposes for profiling and debugging, because debugging is also involved here, impose quite a lot of overhead on the program.

00:14:45.420 --> 00:14:51.160
What this means is that running the program under a debugger or a profiler will make it slow.

00:14:51.160 --> 00:15:00.040
We are talking about tracing profilers, yes, because the other kind of profilers, sampling profilers, they work differently, and they will not use these APIs.

00:15:00.040 --> 00:15:02.700
They trade accuracy for a lower impact, yeah.

00:15:02.700 --> 00:15:03.140
Yes.

00:15:03.140 --> 00:15:08.460
I mean, just to be clear, because I don't think we are going to talk that much about them, but just to be clear what is the difference.

00:15:08.460 --> 00:15:18.040
The difference here is that a sampling profiler, instead of tracing the program as it runs and seeing everything that the program does, just takes photos of the program at regular intervals.

00:15:18.040 --> 00:15:26.180
So it's like, you know, imagine that you're working on a project, and then I enter your room every five minutes and tell you, what file are you working on?

00:15:26.180 --> 00:15:29.760
And then you tell me, oh, it's a program.cpp, right?

00:15:29.760 --> 00:15:31.800
And then I enter again, it's a program.cpp.

00:15:31.800 --> 00:15:34.520
And then I enter again, it's like a other thing.cpp.

00:15:34.520 --> 00:15:41.900
So if I enter 100 times and 99 of them you were in this particular file, then I can tell you that that file is quite important, right?

00:15:41.960 --> 00:15:43.340
So that's the idea.

00:15:43.340 --> 00:15:46.900
But maybe when I was not there, you were doing something completely different, and I miss it.

00:15:46.900 --> 00:15:52.780
It just happens that every five minutes you were checking the file because you really like how it's written, but you were not doing anything there.

00:15:52.780 --> 00:15:57.320
So, you know, like there is a lot of cases when that can misrepresent what actually happened.
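
NOTE
Editor's sketch of the "enter the room every few minutes" idea, assuming a helper thread that periodically peeks at the main thread's current frame through sys._current_frames(). Real sampling profilers work differently in detail (many sample from outside the process); the names sample_thread, busy_work and run_with_sampler and the 5 ms interval are hypothetical choices for illustration.
```python
import collections
import sys
import threading
import time
def sample_thread(thread_id, interval, stop, counts):
    """Every interval seconds, note which function thread_id is running."""
    while not stop.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)
def busy_work(seconds):
    """Spin in a loop so the sampler has something to observe."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n
def run_with_sampler(func, *args, interval=0.005):
    """Run func in this thread while a helper thread samples it."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_thread,
        args=(threading.get_ident(), interval, stop, counts),
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop.set()
        sampler.join()
    return counts
if __name__ == "__main__":
    print(run_with_sampler(busy_work, 0.2).most_common(3))
```
Anything that starts and finishes between two samples never shows up in the counts, which is the accuracy that gets traded away for lower overhead.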

00:15:57.320 --> 00:16:03.560
An advantage here is that nobody is annoying you while I'm not entering the room, right?

00:16:03.560 --> 00:16:06.180
So you can do actual work at actual speed, right?

00:16:06.180 --> 00:16:12.620
And therefore, these profilers are faster, but as you say, they trade kind of like accuracy for speed.

00:16:12.620 --> 00:16:16.860
But this PEP tries to make tracing profilers faster, closer to the other ones.

00:16:16.860 --> 00:16:26.800
And the idea here is that the kind of APIs that CPython offers are quite slow because they are super generic, in the sense that what they give you is that every time a function call,

00:16:26.800 --> 00:16:36.480
in the case of the profiler APIs, is made or returns, it will call you, but it will basically pre-compute a huge amount of, well, not a huge amount,

00:16:36.480 --> 00:16:39.480
but it's quite a lot of information for you, so you can use it.

00:16:39.480 --> 00:16:44.480
Most of the time you don't care about that information, but it's just there, and it was just pre-computed for you, so it's very annoying.

00:16:45.020 --> 00:16:56.540
And in the case of the tracing, sys.settrace, so this is for debuggers and, for instance, coverage uses that as well, which is the same idea, but instead of every function call, it's every bytecode instruction.

00:16:56.540 --> 00:17:05.240
So every time the bytecode execution loop executes the instruction, it calls you, or you can have different events, like every time it changes lines, something like that.

00:17:05.240 --> 00:17:07.620
But the idea is that the overhead is even bigger.
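
NOTE
Editor's sketch of the sys.settrace style of hook that debuggers and coverage tools build on, assuming we only want line events for one function. The helper trace_lines and the toy function sign are hypothetical; line numbers are reported relative to the def line so the output does not depend on where the code is pasted.
```python
import sys
def trace_lines(func, *args):
    """Run func and record which of its lines executed (offsets from def)."""
    code = func.__code__
    executed = []
    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace lines in other functions
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events for this frame
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, executed
def sign(x):
    if x < 0:
        return "negative"
    return "non-negative"
if __name__ == "__main__":
    print(trace_lines(sign, 5))
    print(trace_lines(sign, -1))
```
The interpreter has to call back into Python for every new line it reaches, which is why this hook costs more than the per-call profiling hook.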

00:17:07.620 --> 00:17:11.080
And again, you may not care a lot about all these things.

00:17:11.260 --> 00:17:19.340
So the idea here is that instead of, like, calling you every single time, you can tell the interpreter what things you are interested in.

00:17:19.340 --> 00:17:27.260
So you say, well, look, I'm a profiler, and I am just interested in, you know, when a function starts and when a function ends, I don't care about the rest.

00:17:27.260 --> 00:17:31.560
So please don't pre-compute line numbers, don't give me any of these other things, just call me.

00:17:31.560 --> 00:17:32.860
Just don't do anything.

00:17:32.860 --> 00:17:40.080
So the idea is that then you only pay for these particular cases, and the idea is that it's as fast as possible.

00:17:40.220 --> 00:17:47.700
Because also the fact that this is event-based makes the implementation a bit easier in the sense that it doesn't need to slow down the normal execution loop by a lot.

00:17:47.700 --> 00:17:50.780
Only if you register a lot of events, then it will be quite slow.

00:17:50.780 --> 00:17:57.560
But as you can see here in the list of events, there is a bunch of things that you may not care about, like, for instance, raised exceptions or line changes and things like that.

00:17:58.020 --> 00:18:05.380
But the idea here is that, you know, because it's event-based, then if you are not interested in many of these things, then you don't register the events for that.

00:18:05.380 --> 00:18:10.860
So you are never called for them and you don't pay the cost, which in theory will make some cases faster.

00:18:10.860 --> 00:18:11.900
Some others not.

00:18:11.900 --> 00:18:12.340
Sure.

00:18:12.340 --> 00:18:15.800
It depends on how many of these events the profiler subscribes to, right?
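
NOTE
Editor's sketch of the subscribe-only-to-what-you-need idea from PEP 669, assuming Python 3.12 or newer where the sys.monitoring module exists. It registers a callback for PY_START only, so the tool is never called for lines, instructions or exceptions. The helper count_function_starts and the toy functions inner and outer are hypothetical, and the sketch returns None on older interpreters.
```python
import sys
def count_function_starts(func, *args):
    """Count PY_START events per function name while func runs.
    Returns None on interpreters without sys.monitoring (before 3.12).
    """
    monitoring = getattr(sys, "monitoring", None)
    if monitoring is None:
        return None
    tool_id = monitoring.PROFILER_ID
    events = monitoring.events
    counts = {}
    def on_start(code, instruction_offset):
        counts[code.co_name] = counts.get(code.co_name, 0) + 1
    monitoring.use_tool_id(tool_id, "start-counter-sketch")
    try:
        monitoring.register_callback(tool_id, events.PY_START, on_start)
        monitoring.set_events(tool_id, events.PY_START)  # nothing else
        func(*args)
    finally:
        monitoring.set_events(tool_id, events.NO_EVENTS)
        monitoring.register_callback(tool_id, events.PY_START, None)
        monitoring.free_tool_id(tool_id)
    return counts
def inner():
    return 1
def outer():
    total = 0
    for _ in range(3):
        total += inner()
    return total
if __name__ == "__main__":
    print(count_function_starts(outer))
```
Subscribing to more events, such as LINE, would make the same tool progressively more expensive, which matches the point that the cost depends on what the profiler registers for.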

00:18:15.800 --> 00:18:22.680
This portion of Talk Python to Me is brought to you by JetBrains and PyCharm.

00:18:22.680 --> 00:18:27.460
Are you a data scientist or a web developer looking to take your projects to the next level?

00:18:27.460 --> 00:18:30.160
Well, I have the perfect tool for you, PyCharm.

00:18:30.160 --> 00:18:39.440
PyCharm is a powerful integrated development environment that empowers developers and data scientists like us to write clean and efficient code with ease.

00:18:39.800 --> 00:18:45.780
Whether you're analyzing complex data sets or building dynamic web applications, PyCharm has got you covered.

00:18:45.780 --> 00:18:52.920
With its intuitive interface and robust features, you can boost your productivity and bring your ideas to life faster than ever before.

00:18:52.920 --> 00:18:59.120
For data scientists, PyCharm offers seamless integration with popular libraries like NumPy, Pandas, and Matplotlib.

00:18:59.120 --> 00:19:06.240
You can explore, visualize, and manipulate data effortlessly, unlocking valuable insights with just a few lines of code.

00:19:06.860 --> 00:19:10.880
And for us web developers, PyCharm provides a rich set of tools to streamline your workflow.

00:19:10.880 --> 00:19:19.880
From intelligent code completion to advanced debugging capabilities, PyCharm helps you write clean, scalable code that powers stunning web applications.

00:19:19.880 --> 00:19:28.180
Plus, PyCharm's support for popular frameworks like Django, FastAPI, and React make it a breeze to build and deploy your web projects.

00:19:28.180 --> 00:19:33.120
It's time to say goodbye to tedious configuration and hello to rapid development.

00:19:33.120 --> 00:19:34.620
But wait, there's more.

00:19:35.160 --> 00:19:43.360
With PyCharm, you get even more advanced features like remote development, database integration, and version control, ensuring your projects stay organized and secure.

00:19:43.360 --> 00:19:48.820
So whether you're diving into data science or shaping the future of the web, PyCharm is your go-to tool.

00:19:48.820 --> 00:19:50.900
Join me and try PyCharm today.

00:19:50.900 --> 00:20:01.520
Just visit talkpython.fm/done-with-pycharm, links in your show notes, and experience the power of PyCharm firsthand for three months free.

00:20:01.520 --> 00:20:02.700
PyCharm.

00:20:02.700 --> 00:20:04.440
It's how I get work done.

00:20:04.440 --> 00:20:10.560
For example, so one of the events is PyUnwind.

00:20:10.560 --> 00:20:14.680
So exit from a Python function during exception unwinding.

00:20:14.680 --> 00:20:20.060
You probably don't really care about recording that and showing that to somebody in a report.

00:20:20.060 --> 00:20:26.160
But the line event, like an instruction is about to be executed that has a different line number from the preceding instruction.

00:20:26.560 --> 00:20:27.120
There we go.

00:20:27.120 --> 00:20:27.640
All right.

00:20:27.640 --> 00:20:28.240
Something like that.

00:20:28.240 --> 00:20:29.180
This is an interesting one.

00:20:29.180 --> 00:20:31.440
Sorry, Matt, do you want to mention something?

00:20:31.440 --> 00:20:33.300
I think you do need to care about unwind.

00:20:33.300 --> 00:20:36.560
Actually, you need to know what function is being executed.

00:20:36.560 --> 00:20:42.540
And in order to keep track of what function is being executed at any given point in time, you have to know when a function has exited.

00:20:42.540 --> 00:20:54.640
There's two different ways of knowing when the function has exited, either a return or an unwind, depending on whether it returned due to a return statement or due to falling off the end of the function or because an exception was thrown and not caught.
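
NOTE
Editor's sketch of the two exit paths just described, assuming Python 3.12 or newer with sys.monitoring. It logs PY_RETURN for functions that return normally and PY_UNWIND for functions left because an exception propagated out of them. The helper record_exits and the toy functions ok, fails and outer are hypothetical; the sketch returns None on older interpreters.
```python
import sys
def record_exits(func, *args):
    """Log how each Python function called by func exits.
    Returns None on interpreters without sys.monitoring (before 3.12).
    """
    monitoring = getattr(sys, "monitoring", None)
    if monitoring is None:
        return None
    tool_id = monitoring.PROFILER_ID
    events = monitoring.events
    log = []
    def on_return(code, instruction_offset, retval):
        log.append((code.co_name, "return"))
    def on_unwind(code, instruction_offset, exception):
        log.append((code.co_name, "unwind"))
    monitoring.use_tool_id(tool_id, "exit-sketch")
    try:
        monitoring.register_callback(tool_id, events.PY_RETURN, on_return)
        monitoring.register_callback(tool_id, events.PY_UNWIND, on_unwind)
        monitoring.set_events(tool_id, events.PY_RETURN | events.PY_UNWIND)
        try:
            func(*args)
        except Exception:
            pass  # the sketch only cares about how frames were left
    finally:
        monitoring.set_events(tool_id, events.NO_EVENTS)
        monitoring.register_callback(tool_id, events.PY_RETURN, None)
        monitoring.register_callback(tool_id, events.PY_UNWIND, None)
        monitoring.free_tool_id(tool_id)
    return log
def ok():
    return 1
def fails():
    raise ValueError("boom")
def outer():
    ok()
    fails()
if __name__ == "__main__":
    print(record_exits(outer))
```
A tool that keeps its own call stack has to pop a frame on either event, otherwise an uncaught exception would leave stale entries on that stack.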

00:20:54.640 --> 00:20:54.920
Okay.

00:20:54.920 --> 00:21:00.100
Give us an example of one that you might not care about from a Memray-style perspective.

00:21:00.100 --> 00:21:02.080
Instruction is one that we wouldn't care about.

00:21:02.080 --> 00:21:04.580
In fact, even line is one that we wouldn't care about.

00:21:04.580 --> 00:21:12.360
Memray, or profilers in general, for the most part, will care not about what particular instruction is being executed in a program.

00:21:12.360 --> 00:21:19.740
They care about what function is being executed in a program because that's what's going to show up in all the reports they give you rather than line-oriented stuff.

00:21:19.740 --> 00:21:23.520
So maybe coverage and Ned Batchelder might care about line.

00:21:23.680 --> 00:21:24.180
Yeah, yeah, yeah.

00:21:24.180 --> 00:21:25.000
But you guys would.

00:21:25.000 --> 00:21:26.780
He very much cares about line.

00:21:26.780 --> 00:21:27.820
Yeah, I can imagine.

00:21:27.820 --> 00:21:28.500
And that's the slow one.

00:21:28.500 --> 00:21:29.340
That's the slow one.

00:21:29.340 --> 00:21:31.380
And it's important to understand why it's slow.

00:21:31.380 --> 00:21:36.780
It's slow because the program doesn't really understand what a line of code is, right?

00:21:36.780 --> 00:21:41.640
A line of code is a construct that only makes sense for you, the programmer.

00:21:41.640 --> 00:21:45.680
The parser doesn't even care about the line because it sees code in a different way.

00:21:45.680 --> 00:21:46.540
It's a stream of bytes.

00:21:46.540 --> 00:21:51.860
And lines don't have semantic meaning for most of the program compilation and execution.

00:21:52.400 --> 00:22:02.920
The fact that you want to do something when a line changes forces the interpreter to not only keep around that information, which mostly is somehow there, compressed, but also reconstruct it.

00:22:02.920 --> 00:22:06.660
So basically every single time, I mean, it's made in an obviously better way.

00:22:06.760 --> 00:22:11.940
But the idea is that every single time it executes the instruction, it needs to check, oh, did I change the line?

00:22:11.940 --> 00:22:14.380
And then if the answer is yes, then it calls you.

00:22:14.380 --> 00:22:16.780
That is basically the old way, sort of.

00:22:16.780 --> 00:22:21.120
Because instead of doing that, it has kind of a way to know when that happens, so it's not constantly checking.

00:22:21.380 --> 00:22:24.460
But this is very expensive because it needs to reconstruct that information.

00:22:24.460 --> 00:22:31.360
That slowness is going to happen every single time you're asking for something that doesn't have kind of meaning in the execution of the program.

00:22:31.360 --> 00:22:32.820
And an exception has it.

00:22:32.820 --> 00:22:37.100
Like the interpreter needs to know when an exception is raised and what that means because it needs to do something special.

00:22:37.100 --> 00:22:39.640
But the interpreter doesn't care about what a line is.

00:22:39.640 --> 00:22:41.240
So that is very expensive.

00:22:41.240 --> 00:22:41.600
Right.

00:22:41.600 --> 00:22:48.780
You could go and write statement one, semicolon statement two, semicolon statement three, and that would generate a bunch of bytecodes, but it still would just be one line.
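
NOTE
Editor's sketch of the semicolon example above, using the standard dis module: three statements on one source line compile to several bytecode instructions that all map back to line 1. The variable names are made up for illustration.
```python
import dis
source = "a = 1; b = 2; c = 3"
code = compile(source, "<demo>", "exec")
instructions = list(dis.get_instructions(code))
# One STORE_NAME per assignment statement.
stores = [ins for ins in instructions if ins.opname == "STORE_NAME"]
# Real source lines that start an instruction run (drops the synthetic 0/None entries).
source_lines = {line for _, line in dis.findlinestarts(code) if line}
print(len(instructions), "instructions,", len(stores), "stores, lines:", source_lines)
```
The exact instruction count varies by Python version, but the three stores always share a single source line.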

00:22:48.780 --> 00:22:49.040
Sure.

00:22:49.040 --> 00:22:56.420
Hey, Pablo, sidebar, it sounds like there's some clipping or some popping from your mic, so maybe just check the settings just a little bit.

00:22:56.420 --> 00:22:57.000
Oh, absolutely.

00:22:57.000 --> 00:22:59.640
Yeah, hopefully we can clean that up just a bit.

00:22:59.640 --> 00:23:00.940
But it's not terrible either way.

00:23:00.940 --> 00:23:01.480
All right.

00:23:01.480 --> 00:23:03.140
So you think this is going to make a difference?

00:23:03.140 --> 00:23:06.480
This seems like it's going to be a positive impact here?

00:23:06.480 --> 00:23:15.600
One particular way that it'll make a difference is that for the coverage case that we just talked about, coverage needs to know when a line is hit or when a branch is hit, but it only needs to know that once.

00:23:15.600 --> 00:23:18.840
And once it has found that out, it can stop tracking that.

00:23:18.840 --> 00:23:30.980
So the advantage that this new API gives is the ability for coverage to uninstall itself from watching for line instructions or watching for function call instructions from a particular frame.

00:23:31.200 --> 00:23:40.180
Once it knows that it's already seen everything that there is to see there, then it can speed up the program as it goes by just disabling what it's watching for as the program executes.
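
NOTE
Editor's sketch of the idea above, assuming Python 3.12+ (sys.monitoring): a coverage-style callback records each line once and returns sys.monitoring.DISABLE so the interpreter stops reporting that location. The tool id 4 and the names busy_loop and run_with_line_events are made up for illustration; this is not coverage.py's actual code.
```python
import sys
seen_lines = set()   # (function name, line number) pairs observed at least once
callback_calls = 0   # how many times the interpreter actually called back
def _on_line(code, line_number):
    global callback_calls
    callback_calls += 1
    seen_lines.add((code.co_name, line_number))
    return sys.monitoring.DISABLE  # seen it once, stop watching this location
def busy_loop():
    total = 0
    for i in range(1000):
        total += i
    return total
def run_with_line_events(func, tool_id=4):
    """Run func() with LINE events enabled; requires Python 3.12+."""
    mon = sys.monitoring
    mon.use_tool_id(tool_id, "demo-line-coverage")
    mon.register_callback(tool_id, mon.events.LINE, _on_line)
    mon.set_events(tool_id, mon.events.LINE)
    try:
        return func()
    finally:
        mon.set_events(tool_id, mon.events.NO_EVENTS)
        mon.register_callback(tool_id, mon.events.LINE, None)
        mon.free_tool_id(tool_id)
```
Even though the loop body runs 1000 times, callback_calls should stay in the single or low double digits, because each line location is disabled after its first report.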

00:23:40.180 --> 00:23:40.580
Okay.

00:23:40.580 --> 00:23:41.680
That's an interesting idea.

00:23:41.680 --> 00:23:47.640
It's like, it's decided it's observed that section enough in detail, and it can just kind of step back a little bit higher.

00:23:47.640 --> 00:23:47.900
Yep.

00:23:47.900 --> 00:23:48.180
All right.

00:23:48.180 --> 00:23:48.520
Okay.

00:23:48.520 --> 00:23:49.160
Excellent.

00:23:49.160 --> 00:23:52.120
So this is coming, I guess, in October.

00:23:52.120 --> 00:23:54.160
Pablo will release it to the world.

00:23:54.160 --> 00:23:54.940
Thanks, Pablo.

00:23:54.940 --> 00:24:00.480
No, this time it's Thomas Wouters, who is the release manager of 3.12.

00:24:00.480 --> 00:24:01.400
Oh, this is Thomas.

00:24:01.400 --> 00:24:02.520
Oh, this is 3.12.

00:24:02.520 --> 00:24:02.980
That's right.

00:24:02.980 --> 00:24:03.460
That's right.

00:24:03.460 --> 00:24:06.840
So if you want to blame someone, don't blame me for this one.

00:24:06.840 --> 00:24:08.140
Exactly.

00:24:08.140 --> 00:24:08.800
Exactly.

00:24:08.800 --> 00:24:09.480
All right.

00:24:09.480 --> 00:24:17.900
So that brings us to your project, Memray, which actually has a bit of a different focus than, at least, cProfile, right?

00:24:17.900 --> 00:24:26.100
And many of the profilers, I'll go and say most of the profilers answer the question of where am I spending time, not where am I spending memory, right?

00:24:26.100 --> 00:24:27.240
I would agree that that's true.

00:24:27.240 --> 00:24:29.140
There are definitely other memory profilers.

00:24:29.140 --> 00:24:32.660
We're not the only one, but the majority of profilers are looking at where time is spent.

00:24:32.660 --> 00:24:36.200
And yet understanding memory in Python is super important.

00:24:36.200 --> 00:24:47.720
I find Python to be interesting from the whole memory, understanding the memory allocation algorithms, and there's a GC, but it only does stuff some of the time.

00:24:47.720 --> 00:24:49.040
Like, how does all this work, right?

00:24:49.040 --> 00:25:05.940
And we as a community, maybe not Pablo as a core developer, but as a general rule, I don't find people spend a ton of time obsessing about memory like maybe they do in C++ where they're super concerned about memory leaks or some of the garbage collected languages where they're always obsessed with.

00:25:05.940 --> 00:25:10.920
You know, is the GC running and how's it affecting real time or near real time stuff?

00:25:10.920 --> 00:25:15.500
It's a bit of a black box, maybe how Python memory works.

00:25:15.500 --> 00:25:17.400
Would you say for a lot of people out there?

00:25:17.400 --> 00:25:18.320
Oh, yeah, absolutely.

00:25:18.320 --> 00:25:20.280
Yeah, I think that's definitely true.

00:25:20.280 --> 00:25:21.520
I think it is as well.

00:25:21.520 --> 00:25:30.100
And even these days, with all the machine learning and data science and the higher the abstraction goes, the easier it is to just allocate three gigabytes without you knowing.

00:25:30.100 --> 00:25:35.980
Like, you do something and then suddenly you have half of the RAM filled by something that you don't know what it is.

00:25:35.980 --> 00:25:36.260
Yeah.

00:25:36.260 --> 00:25:40.520
Because, you know, you are so high level that you didn't allocate any of this memory. It's the library.

00:25:40.700 --> 00:25:45.580
Yeah. Profiling for where time is being spent is something that pretty much every developer wants to do at some point.

00:25:45.580 --> 00:25:50.740
From the very first programs you're writing, you're thinking to yourself, well, I wish this was faster and how can I make this faster?

00:25:50.740 --> 00:26:01.580
I think looking at where your program is spending memory is more of a special case that only comes up either when you have a program that's using too much memory and you need to figure out how to pare it back.

00:26:01.580 --> 00:26:14.420
Or if you are trying to optimize an entire suite of applications running on one set of boxes and you need to figure out how to make better use of a limited set of machine resources across applications.

00:26:14.420 --> 00:26:17.540
So that comes up more at the enterprise level.

00:26:17.540 --> 00:26:18.140
Yeah, sure.

00:26:18.140 --> 00:26:21.820
We heard Instagram give a talk about, what did they title it?

00:26:21.820 --> 00:26:26.500
Something like dismissing the GC or something like that where they talked about actually.

00:26:26.500 --> 00:26:33.240
It's very funny because they made that talk and then they made a follow-up saying, like, the previous idea was actually bad.

00:26:33.240 --> 00:26:34.940
So now we have a refined version of that.

00:26:34.940 --> 00:26:35.060
I know.

00:26:35.060 --> 00:26:36.100
I remember they did all that.

00:26:36.100 --> 00:26:37.740
We have a refined version of the idea.

00:26:37.740 --> 00:26:38.540
But yeah.

00:26:38.540 --> 00:26:43.260
This was the one where they were disabling GC in their worker processes.

00:26:43.260 --> 00:26:44.120
Yeah.

00:26:44.120 --> 00:26:45.700
For, I think, their Django workers.

00:26:45.700 --> 00:26:45.980
Yeah.

00:26:45.980 --> 00:26:46.240
Yes.

00:26:46.240 --> 00:26:47.400
They have a forecast check.

00:26:47.400 --> 00:26:49.400
Quite interesting use case because it's quite common.

00:26:49.400 --> 00:26:54.840
But I want to add to what Matt said, that memory has this funny thing compared with time,

00:26:54.840 --> 00:26:59.520
which is that when people think about the time my program is spending on something,

00:26:59.520 --> 00:27:01.740
they really know what they are talking about.

00:27:01.740 --> 00:27:02.040
Right.

00:27:02.040 --> 00:27:02.980
They know what they want.

00:27:02.980 --> 00:27:06.080
Memory is funny because most of the time they actually don't.

00:27:06.080 --> 00:27:07.800
And you will say, how is that possible?

00:27:07.800 --> 00:27:10.740
Like, the problem with memory is that you understand the problem.

00:27:10.740 --> 00:27:15.000
Like I have this thing called memory on my computer and it's like a number,

00:27:15.000 --> 00:27:18.040
like 12 gigabytes or six gigabytes or whatever it is.

00:27:18.040 --> 00:27:19.440
And it's half full.

00:27:19.520 --> 00:27:23.820
And I understand that concept, but the problem is, why is it half full? Or, like,

00:27:23.820 --> 00:27:28.060
what is even like memory in my program, which is different from that value.

00:27:28.060 --> 00:27:30.320
Now there's a huge disconnect.

00:27:30.320 --> 00:27:30.900
Right.

00:27:30.900 --> 00:27:33.000
And this is so much, so interesting.

00:27:33.000 --> 00:27:36.960
Like, I don't know if like, this is going to be a lot, like a super interesting to talk

00:27:36.960 --> 00:27:41.400
about, but like, I want to just highlight this because when, when I, imagine that I ask

00:27:41.400 --> 00:27:43.320
you, what is allocating memory for you?

00:27:43.320 --> 00:27:44.380
Like, what is that?

00:27:44.380 --> 00:27:45.320
It's calling malloc.

00:27:45.320 --> 00:27:46.900
It's creating a Python object.

00:27:46.900 --> 00:27:49.600
It like, because when you, this is, this is very interesting.

00:27:49.600 --> 00:27:52.320
And in Python, because we are so high level, who knows?

00:27:52.320 --> 00:27:57.040
Because when you create a Python object, well, it may or may not require memory.

00:27:57.040 --> 00:28:00.980
But when you call malloc, it may or may not actually allocate memory.

00:28:00.980 --> 00:28:01.260
Right.

00:28:01.700 --> 00:28:07.000
And if you really go and say, okay, so, so just tell me when I really go to that, you

00:28:07.000 --> 00:28:11.500
know, physical memory, and I really spend some of that physical memory in my program.

00:28:11.500 --> 00:28:16.540
If you want just that, then you are not going to get information about your program because

00:28:16.540 --> 00:28:21.180
you are above so many abstractions that if I just told you when that happens, you're going

00:28:21.180 --> 00:28:28.260
to miss so much because you're going to find that Python, the C or C++ runtime, and

00:28:28.260 --> 00:28:31.460
the OS really like to batch these operations.

00:28:31.680 --> 00:28:35.220
The same way you don't want to, you know, you're going to read a big file.

00:28:35.220 --> 00:28:39.040
When you call read, you're not going to read one byte at a time because that will be very

00:28:39.040 --> 00:28:39.440
expensive.

00:28:39.440 --> 00:28:42.480
The OS is going to kind of read a big chunk.

00:28:42.480 --> 00:28:47.460
And every time you call read, it's going to give you the pre-chunk that it already fetched.
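
NOTE
Editor's sketch of the read analogy above, shown one layer up from the OS: Python's own io.BufferedReader fetches a large chunk from the underlying raw stream and then serves many tiny reads from that chunk. CountingRaw is a made-up helper that only counts how often the raw layer is asked for data.
```python
import io
class CountingRaw(io.RawIOBase):
    """Raw stream that counts how often the buffered layer asks it for data."""
    def __init__(self, data):
        super().__init__()
        self._source = io.BytesIO(data)
        self.raw_reads = 0
    def readable(self):
        return True
    def readinto(self, buffer):
        self.raw_reads += 1
        return self._source.readinto(buffer)
raw = CountingRaw(b"x" * 10_000)
reader = io.BufferedReader(raw, buffer_size=4096)
# One hundred one-byte reads, served from the buffer rather than the raw stream.
small_reads = [reader.read(1) for _ in range(100)]
print("small reads:", len(small_reads), "raw reads:", raw.raw_reads)
```
The raw read count ends up far below the number of small reads, which is the same batching pattern allocators apply to memory.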

00:28:47.460 --> 00:28:47.620
Right.

00:28:47.620 --> 00:28:49.180
And here it will happen the same.

00:28:49.180 --> 00:28:53.100
It's going to basically, even if you ask for like a tiny amount, like let's say you want

00:28:53.100 --> 00:28:55.280
just a 5k bytes, right?

00:28:55.280 --> 00:28:59.020
It's going to, like, grab a big chunk and then it's going to give you from the chunk until

00:28:59.020 --> 00:29:00.140
it runs out.

00:29:00.300 --> 00:29:03.880
So what's going to happen is that you may be very unlucky and you're going to ask for

00:29:03.880 --> 00:29:04.920
a tiny, tiny object.

00:29:04.920 --> 00:29:09.660
And if you only care when I really go to the physical memory, you're going to get like maybe

00:29:09.660 --> 00:29:13.160
a 4k allocation from that very, very tiny object that you ask.

00:29:13.160 --> 00:29:16.980
And then you're going to, that doesn't make any sense because I just wanted space for this

00:29:16.980 --> 00:29:17.520
tiny object.

00:29:17.520 --> 00:29:19.960
And then you allocated four kilobytes of memory or even more.

00:29:20.200 --> 00:29:22.280
It's super not obvious, isn't it?

00:29:22.280 --> 00:29:22.720
Yeah.

00:29:22.720 --> 00:29:27.560
On Linux, the smallest amount you could possibly allocate from the system is always a multiple

00:29:27.560 --> 00:29:28.420
of four kilobytes.

00:29:28.420 --> 00:29:29.560
Well, that's by default.

00:29:29.560 --> 00:29:30.780
You can actually change that.

00:29:30.780 --> 00:29:32.340
The page size.

00:29:32.340 --> 00:29:33.640
The page size can be changed.
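
NOTE
Editor's sketch of the page-size point above: the standard mmap module exposes the system page size, and any request made directly to the OS is effectively rounded up to a whole number of pages. round_up_to_pages is a made-up helper for illustration.
```python
import mmap
PAGE_SIZE = mmap.PAGESIZE  # commonly 4096 on Linux, but platform dependent
def round_up_to_pages(nbytes):
    """Smallest multiple of the page size that can hold nbytes."""
    pages = -(-nbytes // PAGE_SIZE)  # ceiling division
    return pages * PAGE_SIZE
print(PAGE_SIZE, round_up_to_pages(1), round_up_to_pages(PAGE_SIZE + 1))
```
A one-byte request already costs a full page at this level, which is why runtimes cache and subdivide what they get from the system.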

00:29:33.640 --> 00:29:33.940
Yes.

00:29:33.940 --> 00:29:34.620
Can it be lowered?

00:29:34.900 --> 00:29:37.340
I don't think it can be lowered, but certainly it can be made higher.

00:29:37.340 --> 00:29:41.080
And when you make it higher, there is this big-page optimization.

00:29:41.080 --> 00:29:43.300
When it's super ridiculous.

00:29:43.300 --> 00:29:46.560
Actually, Windows, you can do the same, if I recall, because Windows has something called

00:29:46.560 --> 00:29:47.740
huge pages.

00:29:47.740 --> 00:29:49.280
There's something called huge pages.

00:29:49.480 --> 00:29:53.960
And it's very funny because it affects some important stuff, like the speed of hard drives

00:29:53.960 --> 00:29:54.580
and things like that.

00:29:54.580 --> 00:30:02.700
This portion of Talk Python to Me is brought to you by Influx Data, the makers of InfluxDB.

00:30:02.700 --> 00:30:10.000
InfluxDB is a database purpose built for handling time series data at a massive scale for real-time

00:30:10.000 --> 00:30:10.500
analytics.

00:30:10.500 --> 00:30:16.500
Developers can ingest, store, and analyze all types of time series data, metrics, events, and

00:30:16.500 --> 00:30:17.860
traces in a single platform.

00:30:18.520 --> 00:30:20.320
So, dear listener, let me ask you a question.

00:30:20.320 --> 00:30:25.380
How would boundless cardinality and lightning-fast SQL queries impact the way that you develop

00:30:25.380 --> 00:30:26.520
real-time applications?

00:30:26.520 --> 00:30:32.640
InfluxDB processes large time series data sets and provides low-latency SQL queries, making

00:30:32.640 --> 00:30:38.040
it the go-to choice for developers building real-time applications and seeking crucial insights.

00:30:38.040 --> 00:30:44.140
For developer efficiency, InfluxDB helps you create IoT analytics and cloud applications using

00:30:44.140 --> 00:30:47.160
timestamped data rapidly and at scale.

00:30:47.580 --> 00:30:52.520
It's designed to ingest billions of data points in real-time with unlimited cardinality.

00:30:52.520 --> 00:30:58.440
InfluxDB streamlines building once and deploying across various products and environments from

00:30:58.440 --> 00:31:00.900
the edge, on-premise, and to the cloud.

00:31:00.900 --> 00:31:04.920
Try it for free at talkpython.fm/influxDB.

00:31:05.440 --> 00:31:08.360
The link is in your podcast player show notes.

00:31:08.360 --> 00:31:10.520
Thanks to InfluxData for supporting the show.

00:31:13.080 --> 00:31:17.280
Maybe one of you two can give us a quick rundown on the algorithm for all the listeners.

00:31:18.080 --> 00:31:25.380
But the short version is if Python went to the operating system for every single byte of memory

00:31:25.380 --> 00:31:26.380
that it needed.

00:31:26.380 --> 00:31:30.440
So if I create the letter A, it goes, oh, well, I need, you know, what is that?

00:31:30.440 --> 00:31:31.540
30, 40 bytes.

00:31:31.540 --> 00:31:32.400
Turns out.

00:31:32.400 --> 00:31:33.160
Hopefully less.

00:31:33.160 --> 00:31:34.760
Hopefully less.

00:31:34.760 --> 00:31:36.900
But yeah, it's not eight.

00:31:37.020 --> 00:31:39.880
Yeah, it's not just the size, actually, of like you would have in C.

00:31:39.880 --> 00:31:42.380
There's like the reference count and some other stuff.
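
NOTE
Editor's sketch of the object-overhead point above: sys.getsizeof reports the size of the Python object itself, header included. Exact numbers depend on the CPython version and platform, so none are quoted here.
```python
import sys
one_char = sys.getsizeof("A")    # a one-character string object
small_int = sys.getsizeof(1)     # a small integer object
empty_list = sys.getsizeof([])   # a list object with no items
print(one_char, small_int, empty_list)
```
On CPython each of these is well above the single byte or machine word the raw value would need in C.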

00:31:42.380 --> 00:31:43.120
Whatever.

00:31:43.120 --> 00:31:45.600
Like it's, let's say, 30, 20 bytes.

00:31:45.600 --> 00:31:48.360
It's not going to go to the operating system and go, I need 20 more.

00:31:48.360 --> 00:31:49.320
20 more bytes.

00:31:49.320 --> 00:31:50.000
20 more bytes.

00:31:50.200 --> 00:31:55.620
It has a whole algorithm of getting certain like blocks of memory, kind of like 4K blocks

00:31:55.620 --> 00:31:56.400
of page size.

00:31:56.400 --> 00:32:02.040
And then internally say, well, here's where I can put stuff until I run out of room to store,

00:32:02.040 --> 00:32:02.960
you know, new.

00:32:02.960 --> 00:32:03.460
Right.

00:32:03.460 --> 00:32:05.240
20 byte size pieces.

00:32:05.240 --> 00:32:07.240
And then I'll go ask for more.

00:32:07.240 --> 00:32:13.160
So you need something that understands Python to tell you what allocation looks like, not just

00:32:13.160 --> 00:32:16.480
something that looks at how the process talks to the OS, right?

00:32:16.480 --> 00:32:18.320
Yeah, I think that's definitely the case.

00:32:18.480 --> 00:32:22.560
There's one pattern that you'll notice with large applications is that there tend to be

00:32:22.560 --> 00:32:23.900
caches all the way down.

00:32:23.900 --> 00:32:28.920
And you can think of this as the C library fetching, allocating memory from the system and then

00:32:28.920 --> 00:32:32.540
caching it for later reuse once it's no longer in use.

00:32:32.540 --> 00:32:36.480
And above that, you've got the Python allocator doing the same thing.

00:32:36.480 --> 00:32:42.640
It's fetching memory from the system allocator and it's caching it itself for later reuse and

00:32:42.640 --> 00:32:46.340
not freeing it back to the system immediately, necessarily.

00:32:46.340 --> 00:32:46.640
Yeah.

00:32:46.640 --> 00:32:51.500
The key here, which is a conversation that I have with some people that are surprised, like,

00:32:51.500 --> 00:32:52.140
like, okay.

00:32:52.140 --> 00:32:54.840
So when they ask like, what is this Python allocator business?

00:32:54.840 --> 00:32:59.360
And when you explain it, they say, well, it's doing the same thing as malloc in the sense that

00:32:59.360 --> 00:33:02.420
when you call malloc, it doesn't really go to the system every single time.

00:33:02.420 --> 00:33:06.220
It does the same thing in a different way with a different algorithm.

00:33:06.220 --> 00:33:08.500
I mean, that the Python allocator does.

00:33:08.500 --> 00:33:10.480
So what's the point if they are doing the same thing?

00:33:10.480 --> 00:33:12.720
The key here is that is the focus.

00:33:12.720 --> 00:33:15.520
Like the algorithm that malloc follows is generic.

00:33:15.520 --> 00:33:17.620
Like it doesn't know what you're going to do.

00:33:17.620 --> 00:33:20.020
It's trying to be fast, as fast as possible.

00:33:20.500 --> 00:33:24.460
But for the, because it doesn't know how you're going to use it, it's going to be, try to make

00:33:24.460 --> 00:33:26.620
it as fast as possible for all possible cases.

00:33:26.620 --> 00:33:31.180
But the Python allocator knows something which is very important, which is that most Python

00:33:31.180 --> 00:33:33.220
objects are quite small.

00:33:33.220 --> 00:33:36.700
And the object itself, not the memory that it holds to, right?

00:33:36.700 --> 00:33:39.700
Because like the list object by itself is small.

00:33:39.700 --> 00:33:44.080
It may contain a lot of other objects, but that's a big array, but the object itself is very small.

00:33:44.420 --> 00:33:46.460
And the other thing is that they tend to be short-lived.

00:33:46.460 --> 00:33:49.860
This means that there is a huge amount of objects that are being created and destroyed very fast.

00:33:49.860 --> 00:33:52.180
And that is a very specific pattern of usage.

00:33:52.180 --> 00:33:57.560
And it turns out that you can customize the algorithm doing the same basic thing that Matt

00:33:57.560 --> 00:33:58.960
mentioned, this caching of memory.

00:33:58.960 --> 00:34:02.420
You can customize the algorithm to make that particular pattern faster.

00:34:02.420 --> 00:34:05.580
And that's why we have a Python allocator in Python.

00:34:05.580 --> 00:34:06.800
And we have also malloc.

00:34:06.800 --> 00:34:07.160
Right.

00:34:07.160 --> 00:34:09.700
So there's people can go check out the source code.

00:34:09.700 --> 00:34:15.720
There's a thing called PyMalloc that has three data structures that are not just bytes, but

00:34:15.720 --> 00:34:20.980
it has arenas, chunks of memory that PyMalloc directly requests.

00:34:20.980 --> 00:34:25.300
It has pools, which contain fixed sizes of blocks of memory.

00:34:25.300 --> 00:34:32.180
And then these blocks are basically the places where the variables are actually stored, right?

00:34:32.180 --> 00:34:35.780
Like I needed 20 bytes, so that goes into a particular block.

00:34:35.980 --> 00:34:41.480
Often the block is dedicated to a certain size of object, if possible, right?

00:34:41.480 --> 00:34:42.800
And these tend to be quite small.

00:34:42.800 --> 00:34:46.820
Because the other important thing is that this is only used if your object is smallish.

00:34:46.820 --> 00:34:50.520
I think it's 512 bytes or something like that.

00:34:50.520 --> 00:34:51.260
There's a limit.

00:34:51.260 --> 00:34:51.740
It doesn't matter.

00:34:51.740 --> 00:34:57.100
The important thing is that if the object is medium size or big, it goes directly to malloc.

00:34:57.100 --> 00:35:01.060
So it doesn't even bother with any of these arenas or blocks.

00:35:01.060 --> 00:35:02.800
So this is just for the small ones.
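
NOTE
Editor's sketch of the routing described above, as a toy model only: small requests are rounded up to a fixed size class and served from pools, larger ones go straight to the system allocator. The 512-byte threshold and 16-byte alignment are assumptions based on recent 64-bit CPython builds, and route_request is a made-up name, not a CPython function.
```python
SMALL_REQUEST_THRESHOLD = 512  # bytes; assumed cutoff for "small" requests
ALIGNMENT = 16                 # bytes; assumed block granularity on 64-bit builds
def route_request(nbytes):
    """Return (allocator, bytes reserved) for a positive request size."""
    if nbytes > SMALL_REQUEST_THRESHOLD:
        return ("system malloc", nbytes)
    block_size = -(-nbytes // ALIGNMENT) * ALIGNMENT  # round up to a size class
    return ("pymalloc", block_size)
print(route_request(20), route_request(512), route_request(4096))
```
Under these assumptions a 20-byte request lands in a 32-byte block, while a 4096-byte request bypasses arenas, pools, and blocks entirely.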

00:35:02.800 --> 00:35:07.380
And I guess that's because it's already different from the normal allocation pattern that we see

00:35:07.380 --> 00:35:09.260
for Python objects, that they tend to be small.

00:35:09.260 --> 00:35:12.960
At the point where you're getting bigger ones, we might not have as good of information about

00:35:12.960 --> 00:35:14.580
what's going on with that allocation.

00:35:14.580 --> 00:35:18.060
And it might make sense to just let the system malloc handle it.

00:35:18.060 --> 00:35:18.300
Okay.

00:35:18.300 --> 00:35:19.780
So there's that side.

00:35:19.780 --> 00:35:22.300
We have reference counting, which does most of the stuff.

00:35:22.300 --> 00:35:24.560
And then we have the GC that catches the cycles.

00:35:24.560 --> 00:35:28.720
Not really worth going in, but primarily reference counting should be people's mental model,

00:35:28.720 --> 00:35:29.740
I would imagine, right?

00:35:29.740 --> 00:35:31.000
For the lifetime, you mean?

00:35:31.000 --> 00:35:32.360
For the lifetime of objects, yeah.

00:35:32.460 --> 00:35:32.680
Yeah.

00:35:32.680 --> 00:35:33.040
Yeah.

00:35:33.040 --> 00:35:33.360
Yeah.

00:35:33.360 --> 00:35:37.580
That's why it was at least conceivable that Instagram could turn off the GC and

00:35:37.580 --> 00:35:39.280
not instantly run out of memory, right?

00:35:39.280 --> 00:35:39.940
Right.

00:35:39.940 --> 00:35:40.080
Right.

00:35:40.080 --> 00:35:45.940
I mean, when they turn off, this is just the pedantic compiler engineer mindset turning on

00:35:45.940 --> 00:35:46.240
here.

00:35:46.240 --> 00:35:48.720
But technically, reference counting is a GC model.

00:35:48.720 --> 00:35:51.500
So technically, there are two GCs in Python, right?

00:35:51.500 --> 00:35:52.460
But yeah.

00:35:52.460 --> 00:35:54.580
But normally, when people say the GC...

00:35:54.580 --> 00:35:57.220
How about not the mark and sweep GC?

00:35:57.220 --> 00:35:57.440
Right.

00:35:57.440 --> 00:36:00.480
When people say the GC, they mean the cycle GC.

00:36:00.480 --> 00:36:01.320
Yeah.

00:36:01.320 --> 00:36:01.540
Right.

00:36:01.640 --> 00:36:01.780
Yeah.

00:36:01.780 --> 00:36:02.240
Cool.

00:36:02.240 --> 00:36:04.760
Python doesn't actually have a mark and sweep GC.

00:36:04.760 --> 00:36:08.660
The way the cycle collecting GC works is not mark and sweep.

00:36:08.660 --> 00:36:10.980
It's actually implemented in terms of the reference counts.
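
NOTE
Editor's sketch of the two mechanisms discussed above, assuming standard CPython behavior: with the cycle collector disabled, a lone object is still freed immediately by reference counting, while a reference cycle survives until gc.collect() runs. Node and demo are made-up names for illustration.
```python
import gc
import weakref
class Node:
    pass
def demo():
    """Return (freed_by_refcount, cycle_survives, cycle_collected)."""
    was_enabled = gc.isenabled()
    gc.disable()  # turn off only the cycle collector; refcounting stays on
    try:
        lone = Node()
        lone_ref = weakref.ref(lone)
        del lone
        freed_by_refcount = lone_ref() is None
        a, b = Node(), Node()
        a.other, b.other = b, a  # build a reference cycle
        cycle_ref = weakref.ref(a)
        del a, b
        cycle_survives = cycle_ref() is not None
        gc.collect()  # an explicit collection still works while disabled
        cycle_collected = cycle_ref() is None
        return freed_by_refcount, cycle_survives, cycle_collected
    finally:
        if was_enabled:
            gc.enable()
```
This is why a process with the cycle GC switched off does not immediately leak everything: only objects caught in cycles are left behind.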

00:36:10.980 --> 00:36:13.500
It was something that surprised me a lot when I learned it.

00:36:13.500 --> 00:36:13.680
Yeah.

00:36:13.680 --> 00:36:19.040
There is an interesting page in the dev guide written by a crazy Spanish person that goes

00:36:19.040 --> 00:36:20.820
into detail over how it is done.

00:36:20.820 --> 00:36:21.120
Yeah.

00:36:21.120 --> 00:36:21.920
I wonder who wrote that.

00:36:21.920 --> 00:36:22.220
Okay.

00:36:22.220 --> 00:36:23.820
We talked a bit about profilers.

00:36:23.820 --> 00:36:26.500
We, I think, probably dove enough into the memory.

00:36:26.700 --> 00:36:28.080
Again, that could be a whole podcast.

00:36:28.080 --> 00:36:29.540
Just like, how does Python memory work?

00:36:29.540 --> 00:36:34.000
But let's focus on not how does it work, but just measuring it for our apps.

00:36:34.360 --> 00:36:39.020
And you touched on this earlier, you guys, when you talked about there's memory and there's

00:36:39.020 --> 00:36:43.300
performance, but there's also a relationship between memory and performance, right?

00:36:43.300 --> 00:36:47.860
Like, for example, you might have an algorithm that allocates a bunch of stuff that's thrown

00:36:47.860 --> 00:36:48.720
away really quickly.

00:36:48.720 --> 00:36:51.400
And allocation and deallocation has a cost, right?
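
NOTE
Editor's sketch of allocation churn, using the standard library's tracemalloc rather than Memray: a function builds a large temporary list and throws it away, so peak traced memory is far higher than what is still held afterwards. churn is a made-up name for illustration.
```python
import tracemalloc
def churn():
    # Roughly 1 MB of short-lived objects that are gone once we return.
    temporary = [bytes(1000) for _ in range(1000)]
    return len(temporary)
tracemalloc.start()
count = churn()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(count, current, peak)
```
The gap between peak and current is memory that was allocated and freed along the way, work that costs time even though it never shows up as a leak.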

00:36:51.400 --> 00:36:57.580
You might have more things in memory that mean cache misses on the CPU, which might make

00:36:57.580 --> 00:36:58.320
it run slower, right?

00:36:58.320 --> 00:37:02.160
There's a lot of effects that kind of tie together performance and memory.

00:37:02.160 --> 00:37:04.940
So I think it's not just about memory,

00:37:04.940 --> 00:37:07.740
is what I'm trying to say. You want to know what it's up to.

00:37:07.740 --> 00:37:08.840
So tell us about Memray.

00:37:08.840 --> 00:37:10.060
It's such a cool project.

00:37:10.060 --> 00:37:16.640
So yeah, Memray is our memory profiler. It has a lot of fairly interesting features.

00:37:16.640 --> 00:37:17.120
It does.

00:37:17.240 --> 00:37:23.260
One of them is that it supports a live mode where you can see where your application is

00:37:23.260 --> 00:37:28.480
spending memory as it's running, as like a nice little automatically updating grid that

00:37:28.480 --> 00:37:31.320
has that information in it that you can watch as the program runs.

00:37:31.320 --> 00:37:35.680
It also has the ability to attach to an already running program and tell you some stuff about

00:37:35.680 --> 00:37:35.900
it.

00:37:35.900 --> 00:37:41.760
But sort of the main way of running it is just capturing a capture file as the program runs in

00:37:41.760 --> 00:37:44.560
the same way as cProfile would capture its capture file.

00:37:44.560 --> 00:37:45.520
Check out the report.

00:37:45.520 --> 00:37:45.800
Yeah.

00:37:46.180 --> 00:37:46.620
Yeah.

00:37:46.620 --> 00:37:49.220
Doing some reporting based on that capture file after the fact.

00:37:49.220 --> 00:37:54.000
So just for people listening, because I know they can't see this, the live version is awesome.

00:37:54.000 --> 00:38:01.900
If you've ever run glances or htop or something like that, where you can kind of see a TUI type

00:38:01.900 --> 00:38:07.460
of semi-graphical live updating dashboard, it's that, but for memory.

00:38:07.460 --> 00:38:09.400
This is really nice.

00:38:09.620 --> 00:38:09.740
Yeah.

00:38:09.740 --> 00:38:09.740
Yeah.

00:38:09.740 --> 00:38:16.060
And the other really cool feature that it's got is the ability to see into C, Rust, or C++ extension

00:38:16.060 --> 00:38:16.520
modules.

00:38:16.520 --> 00:38:23.280
So you can see what's happening under the hood inside of things that are being used from your Python code.

00:38:23.380 --> 00:38:30.060
So if you're calling a library that's implemented partly in C, like NumPy, you can see how NumPy is doing its allocations under the hood.

00:38:30.060 --> 00:38:30.360
Right.

00:38:30.360 --> 00:38:30.560
Yeah.

00:38:30.560 --> 00:38:37.960
Pablo, you were touching on this a little bit, like how the native layer is kind of a black box that you don't really see into, you don't see into a CPython.

00:38:38.360 --> 00:38:41.900
Sorry, with cProfile, but also with some of the other memory profilers.

00:38:41.900 --> 00:38:42.060
Right.

00:38:42.060 --> 00:38:44.320
And this, this looks at it across the board.

00:38:44.320 --> 00:38:45.920
C, C++, Rust.

00:38:45.920 --> 00:38:46.360
Right.

00:38:46.360 --> 00:38:51.020
So this is kind of important because as we discussed before, what is memory?

00:38:51.020 --> 00:38:53.680
Not only is it complicated, but it also depends on what you want.

00:38:53.680 --> 00:38:59.100
Like the thing is that, and this is quite a big, important part is that you really need to know what you're looking for.

00:38:59.580 --> 00:39:06.940
So for instance, Memray kind of highlights two important parts, which is that it sees all possible allocations.

00:39:06.940 --> 00:39:17.320
So not only the ones made by Python, because, like, Python has a way to tell you when an object is going to be created, but it's not going to tell you if you are going to, kind of, use memory for it or not.

00:39:17.320 --> 00:39:21.780
Among other things, because, for instance, Python even caches entire objects.

00:39:21.780 --> 00:39:23.400
There is this concept of free lists.

00:39:23.400 --> 00:39:26.340
So object creation doesn't really mean memory allocation.

00:39:26.760 --> 00:39:34.440
It also tells you when you are going to allocate memory, when you normally run Python, you may use PyMalloc and PyMalloc caches that memory.

00:39:34.440 --> 00:39:37.560
So you don't really, you may not go to the actual system.

00:39:37.560 --> 00:39:42.080
So by default, Memray tracks all allocations made to the system allocator.

00:39:42.080 --> 00:39:43.120
So malloc basically.

00:39:43.120 --> 00:39:46.680
So every time you call malloc or mmap or one of these, we see it.

00:39:46.680 --> 00:39:54.120
And apart from seeing it and recording it, we also can tell you who made the allocation, from C++ and Python.

00:39:54.240 --> 00:40:02.580
On top of that, if you really want to know when you create objects, well, not objects, but like when, when Python says I need memory, we can also tell you that if you want.

00:40:02.580 --> 00:40:07.620
So if you really want to know, well, I don't really care if PyMalloc caches and whatnot.

00:40:07.620 --> 00:40:11.100
Every single time Python requires memory,

00:40:11.100 --> 00:40:17.920
just tell me, even if you reuse it. Maybe I just want to know, because that will show you a bit of when you require object creation.

00:40:17.920 --> 00:40:22.300
Again, not 100%, but mostly, mostly doing that.

00:40:22.300 --> 00:40:28.260
And the idea here is that you can, you can really customize what you want to track and you don't pay for what you don't want.

00:40:28.260 --> 00:40:37.260
So for instance, most of the time you don't want to know when Python requires memory, because most of the time it's not going to actually impact your memory usage.

00:40:37.260 --> 00:40:37.260
Right.

00:40:37.260 --> 00:40:43.260
Because as you mentioned, PyMalloc is going to use one of these arenas and you're going to see the actual malloc.

00:40:43.260 --> 00:40:48.720
But sometimes you do, so Memray lets you decide which one you want to track.

00:40:48.720 --> 00:40:55.720
And by default, it's going to use the faster method, which is also the most similar to what happens when you execute your program.

00:40:55.720 --> 00:41:03.720
Another interesting feature that, as of this time, only Memray has is that it can tell you who actually made the allocation.

00:41:03.720 --> 00:41:05.720
So who called who, et cetera.

00:41:05.720 --> 00:41:05.720
Right.

00:41:05.720 --> 00:41:10.720
So it's going to tell you this Python function called this C function, which in turn called this Python function.

00:41:10.720 --> 00:41:15.720
And this one actually made a call to malloc or, or created a Python list or something like that.

00:41:15.720 --> 00:41:20.720
I think that was really a fantastic feature that it's easy to kind of miss the significance of that.

00:41:20.720 --> 00:41:27.720
But if you get a memory profiler, it just says, look, you allocated a thousand lists and they used a good chunk of your memory.

00:41:27.720 --> 00:41:31.720
You're like, well, okay, well let's go through and find where lists are coming from.

00:41:31.720 --> 00:41:32.720
Right.

00:41:32.720 --> 00:41:42.720
Like, like converting that information back of how many of these types of objects and how many of those objects you allocated back to like, where can I look at my code and possibly make a change about that?

00:41:42.720 --> 00:41:44.720
That can be really, really tricky.

00:41:44.720 --> 00:41:49.720
And so the fact that you can see this function is allocating this much stuff is super helpful.

00:41:49.720 --> 00:41:53.720
One of the important things here to highlight, which I think is interesting.

00:41:53.720 --> 00:42:00.720
Maybe Matt can also cover it in more detail, but most memory profilers are actually sampling profilers.

00:42:00.720 --> 00:42:06.720
The reason is that, the same way tracing profilers for function calls need to trace every single function call,

00:42:06.720 --> 00:42:11.720
a tracing memory profiler needs to trace every single allocation.

00:42:11.720 --> 00:42:15.720
But allocations happen much more often than function calls.

00:42:15.720 --> 00:42:28.720
So if you make the calculation based on normal programs, and it can be anything that you want, even just opening Python, or any C or C++ program, you're going to see that you actually allocate a huge amount. So doing something per allocation is super expensive.

00:42:28.720 --> 00:42:30.720
It's extremely expensive.

00:42:30.720 --> 00:42:32.720
And most profilers, what they do is that they do sampling.

00:42:32.720 --> 00:42:33.720
It's a different kind of sampling.

00:42:33.720 --> 00:42:35.720
So it's not this photo kind of thing.

00:42:35.720 --> 00:42:37.720
They use a different statistic based on bytes.

00:42:37.720 --> 00:42:42.720
So they basically see this memory as a stream of bytes, and they decide to sample some of them.

00:42:42.720 --> 00:42:47.720
So they are inaccurate, but normally they try to use statistics to tell you some information.

00:42:47.720 --> 00:42:48.720
So Memray, on the other hand.

00:42:48.720 --> 00:42:55.720
To give an example, instead of sampling every 10 milliseconds and seeing what the process is doing right now, it's sampling every 10 bytes.

00:42:55.720 --> 00:43:01.720
So every time a multiple of 10 bytes is allocated from the system, it checks what was allocating that.

00:43:01.720 --> 00:43:07.720
Although it'll use a bigger number than 10 in order for this to actually be effective, since most allocations will get at least 10 bytes.

00:43:07.720 --> 00:43:08.720
But something like that.
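
NOTE
Editor's sketch of the byte-based sampling idea described above, not the
implementation of any particular profiler. The function name, the list of
allocation sizes, and the 256-byte interval are all hypothetical. An
allocation is sampled only when the running byte total crosses a multiple
of the interval, so small allocations are usually skipped.
```python
# Hypothetical sketch of byte-based sampling; names and numbers are made up.
def sample_allocations(sizes, interval):
    """Return the indexes of allocations that cross a multiple of `interval` bytes."""
    sampled = []
    total = 0
    for index, size in enumerate(sizes):
        before = total // interval
        total += size
        if total // interval > before:
            sampled.append(index)  # this allocation pushed us past a boundary
    return sampled
sizes = [16, 32, 8, 512, 24, 64, 1024, 8]
picked = sample_allocations(sizes, interval=256)
print(picked)
```
Only the two large allocations get sampled here, which illustrates why a
sampling profiler is cheap but cannot see every individual allocation.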

00:43:08.720 --> 00:43:09.720
Yeah.

00:43:09.720 --> 00:43:10.720
Right.

00:43:10.720 --> 00:43:13.720
So Memray is tracing, which means that it sees every single allocation.

00:43:13.720 --> 00:43:20.720
This is quite an interesting kind of decision here because like, you know, it's very, very hard to make a tracing profiler that is not extremely slow.

00:43:20.720 --> 00:43:25.720
So, you know, Memray tries to be very fast, but obviously it's going to be a bit slower than sampling profilers.

00:43:25.720 --> 00:43:34.720
But the advantage of this, what makes Memray quite unique, is that it captures every single allocation into the file, which has a huge amount of technical challenges.

00:43:34.720 --> 00:43:36.720
For instance, these files can be ginormous.

00:43:36.720 --> 00:43:42.720
Like we are talking gigabytes and gigabytes, and we put a ridiculous amount of effort into making them as small as possible.

00:43:42.720 --> 00:43:44.720
So it has double compression and things like that.

00:43:44.720 --> 00:43:46.720
So you're not using XML to store that?

00:43:46.720 --> 00:43:47.720
No, certainly not.

00:43:47.720 --> 00:43:56.720
You know, the first version almost, I think every, if you look at our release notes from one version to the next, every version, we're like, and the capture files are now 90% smaller.

00:43:56.720 --> 00:43:59.720
Again, we've continued to find more and more ways to shrink.

00:43:59.720 --> 00:44:00.720
Sure.

00:44:00.720 --> 00:44:00.720
Right.

00:44:00.720 --> 00:44:12.720
At the cost that reasoning about what is in the file is now just bananas, because we kind of do a first manual compression based on the information we know is there, and then we run LZ4 on that.

00:44:12.720 --> 00:44:14.720
So, so it's like double compression already.

00:44:14.720 --> 00:44:20.720
And there is even a mode where we kind of pre-massage the data down to only what you care about.

00:44:20.720 --> 00:44:21.720
So it's even smaller.

00:44:21.720 --> 00:44:23.720
So it is a lot of effort.

00:44:23.720 --> 00:44:28.720
But the advantage of having that much information is that now we can produce a huge amount of reports.

00:44:28.720 --> 00:44:40.720
So for instance, not only can we show you the classic flame graph, this visualization of who called what, where instead of where you're spending your time it's where you allocated your memory, but we can also do some cooler things.

00:44:40.720 --> 00:44:46.720
So for instance, we can, you mentioned that there is this relationship between like running time and memory.

00:44:46.720 --> 00:44:55.720
So one of the things that we can show you in the latest versions of Memray is this. For instance, imagine that you have a Python list, or if you're in C++, a vector, right.

00:44:55.720 --> 00:45:01.720
And then you have a huge amount of data you want to put into the vector and you start adding, which in Python will be append.

00:45:01.720 --> 00:45:02.720
So you start calling append.

00:45:02.720 --> 00:45:07.720
And then at some point the list has a pre-allocated size and you're going to fill it.

00:45:07.720 --> 00:45:10.720
And then there is no more size, no more room for the data.

00:45:10.720 --> 00:45:12.720
So it's going to say, well, I need more memory.

00:45:12.720 --> 00:45:14.720
So you're going to require a bigger chunk of memory.

00:45:14.720 --> 00:45:17.720
You're going to copy all the previous elements into the new chunk.

00:45:17.720 --> 00:45:22.720
And then it's going to keep adding elements and it's going to happen again and again and again and again.

00:45:22.720 --> 00:45:28.720
So if you want to introduce millions of elements into your list, because it doesn't know how many you need.

00:45:28.720 --> 00:45:31.720
I mean, you could tell it, but in Python it's a bit more tricky than in C++.

00:45:31.720 --> 00:45:36.720
C++ has a call, reserve, where you can say, I'm going to need this many.

00:45:36.720 --> 00:45:39.720
So just, just make one call to the allocator and then let me fill it.

00:45:39.720 --> 00:45:42.720
But in Python, there is a way to do it, but not a lot.

00:45:42.720 --> 00:45:45.720
So the idea here is that it's going to go through these cycles of getting bigger and bigger.
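
NOTE
Editor's sketch of the growth cycle described above. It watches
sys.getsizeof on a list while appending, so each change in reported size
marks a point where the list asked for a bigger chunk. The exact growth
steps are a CPython implementation detail and are assumed, not guaranteed.
```python
import sys
items = []
resizes = 0
last_size = sys.getsizeof(items)
for value in range(100_000):
    items.append(value)
    size = sys.getsizeof(items)
    if size != last_size:  # the underlying array was reallocated
        resizes += 1
        last_size = size
print(f"{len(items)} appends, {resizes} size changes")
```
The list over-allocates, so it resizes far less often than once per
append, but every resize can still mean a new allocation plus a copy.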

00:45:45.720 --> 00:45:49.720
And obviously it's going to be slow because every time you require memory, you pay time.

00:45:49.720 --> 00:45:52.720
And Memray can detect this pattern because we have the information.

00:45:52.720 --> 00:45:58.720
So Memray can tell you when you are doing this pattern of like creating a bigger chunk, copying, creating a bigger chunk, copying.

00:45:58.720 --> 00:46:05.720
And it's going to tell you, hey, in these areas of your code, you could pre-reserve a bigger chunk.

00:46:05.720 --> 00:46:11.720
In Python, there are idioms depending on what you're doing, but it's going to tell you, maybe you want to tell whatever you're creating to just allocate once.

00:46:11.720 --> 00:46:15.720
So for instance, in Python, you can multiply a list of None by 10 million.

00:46:15.720 --> 00:46:17.720
And it's going to create a list of 10 million Nones.

00:46:17.720 --> 00:46:21.720
And instead of calling append, you set the element by index.

00:46:21.720 --> 00:46:22.720
Oh, interesting.

00:46:22.720 --> 00:46:27.720
Yeah, kind of keep track of yourself of where it is instead of just using len of.

00:46:27.720 --> 00:46:28.720
Exactly.
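
NOTE
Editor's sketch of the pre-allocation idiom just described: multiply a
one-element list of None to get the final size in one go, then assign by
index instead of appending. The function names and the squares example are
hypothetical. As noted in the conversation, you have to track the fill
position yourself, because len() reports the pre-allocated size.
```python
def build_with_append(n):
    squares = []
    for i in range(n):
        squares.append(i * i)  # the list may have to grow and copy as it fills
    return squares
def build_preallocated(n):
    squares = [None] * n  # one allocation of the final size up front
    for i in range(n):
        squares[i] = i * i  # assign by index instead of appending
    return squares
same = build_with_append(1000) == build_preallocated(1000)
print(same)
```
Both functions produce the same list. Whether the second one is worth the
extra bookkeeping depends on how large the list is and how hot the loop is.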

00:46:28.720 --> 00:46:35.720
But in C++, for instance, which Memray also sees, as long as it's called from Python, it's going to tell you, well, you should use reserve.

00:46:35.720 --> 00:46:38.720
So tell the vector how many elements you need.

00:46:38.720 --> 00:46:40.720
Therefore, you're not going to go into this.

00:46:40.720 --> 00:46:43.720
There's not a way to do that in Python lists though, is there?

00:46:43.720 --> 00:46:46.720
To actually set like a capacity level when you allocate it?

00:46:46.720 --> 00:46:47.720
With this trick.

00:46:47.720 --> 00:46:48.720
With this trick.

00:46:48.720 --> 00:46:49.720
Yeah, yeah.

00:46:49.720 --> 00:46:50.720
Then you can't use len on it anymore, right?

00:46:50.720 --> 00:46:53.720
There's not something in the initialization.

00:46:53.720 --> 00:46:54.720
Yeah, okay.

00:46:54.720 --> 00:46:57.720
I didn't think so either, but I could have missed it and it would be important.

00:46:57.720 --> 00:46:58.720
No, no, no.

00:46:58.720 --> 00:47:05.720
There are ways that I don't want to reveal because the list has a, it works the same as a vector.

00:47:05.720 --> 00:47:12.720
It's just that the reserve call is not exposed, but there are ways to trick the list into thinking that it needs a lot of memory.

00:47:12.720 --> 00:47:15.720
But I don't want to reveal them so people don't rely on them.

00:47:15.720 --> 00:47:18.720
Those ways are implementation details that can change from one Python version to the next.

00:47:18.720 --> 00:47:19.720
Yeah, exactly.

00:47:19.720 --> 00:47:20.720
For instance, one example.

00:47:20.720 --> 00:47:21.720
Let me give you one example.

00:47:21.720 --> 00:47:26.720
Imagine that you have a tuple of 10 million elements and then you call list on the tuple.

00:47:26.720 --> 00:47:31.720
So you want a list of those 10 million elements. Because Python knows that it's a tuple, it knows the size.

00:47:31.720 --> 00:47:33.720
It knows how many elements it needs.

00:47:33.720 --> 00:47:35.720
So it's just going to request the 10 million element array.

00:47:35.720 --> 00:47:36.720
And then you're going to just copy them in one go.

00:47:36.720 --> 00:47:38.720
So it's not going to go through this.

00:47:38.720 --> 00:47:39.720
I see.

00:47:39.720 --> 00:47:44.720
You can pass a, some kind of iterable to a list to allocate it.

00:47:44.720 --> 00:47:53.720
But if it is a specific type where Python knows about it and says, oh, I actually know how big that is instead of doing the growing algorithm, it'll just initialize.

00:47:53.720 --> 00:47:53.720
Okay.

00:47:53.720 --> 00:47:57.720
I think it's an implementation detail of CPython in the sense that this only works in CPython.

00:47:57.720 --> 00:48:02.720
I don't really remember, but there is this magic method you can implement on your classes called length hint.

00:48:02.720 --> 00:48:10.720
So this is underscore, underscore, length, underscore, hint, underscore, underscore, that is not the len, but it's a hint to Python.

00:48:10.720 --> 00:48:14.720
And it's going to say, well, this is not the real len, but it's kind of an idea.

00:48:14.720 --> 00:48:17.720
And this is useful for instance, for generators or iterators.

00:48:17.720 --> 00:48:23.720
So, so you may not know how many elements there are because it's a generator, but you may know, like, at least this many.

00:48:23.720 --> 00:48:28.720
So Python uses this information sometimes to pre-allocate, but I don't think this is like in the language.

00:48:28.720 --> 00:48:30.720
I think this is just a CPython.
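
NOTE
Editor's sketch of the length hint protocol mentioned above. The
Countdown class is hypothetical. The __length_hint__ method and
operator.length_hint are part of the standard library (PEP 424), but
whether list() uses the hint to pre-allocate is up to the implementation,
so treat that part as an assumption.
```python
import operator
class Countdown:
    """Hypothetical iterator that knows roughly how many items remain."""
    def __init__(self, start):
        self.remaining = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining
    def __length_hint__(self):
        return self.remaining  # an estimate, not a promise
hint = operator.length_hint(Countdown(5))
values = list(Countdown(5))
print(hint, values)
```
The hint only has to be a reasonable estimate. Consumers must still work
correctly if the iterator yields more or fewer items than it suggested.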

00:48:30.720 --> 00:48:31.720
Sure.

00:48:31.720 --> 00:48:32.720
Okay.

00:48:32.720 --> 00:48:33.720
Excellent.

00:48:33.720 --> 00:48:37.720
So let's talk about maybe some of the different reporters you've got.

00:48:37.720 --> 00:48:39.720
So you talked about the flame graph.

00:48:39.720 --> 00:48:43.720
You've got a TQDM style report.

00:48:43.720 --> 00:48:47.720
You can put it just out on, you know, nice colors and emoji out onto the terminal.

00:48:47.720 --> 00:48:50.720
Like, give us some sense of like how we can look at this data.

00:48:50.720 --> 00:48:51.720
Yeah.

00:48:51.720 --> 00:48:53.720
That one is showing you kind of just aggregate statistics about the run.

00:48:53.720 --> 00:48:58.720
So it tells you a histogram of how large your allocations tended to be.

00:48:58.720 --> 00:49:07.720
It gives you some statistics about the locations that did the most allocating and the locations that did the largest number of allocations.

00:49:07.720 --> 00:49:13.720
So the most by number of bytes and the most by count, as well as just what your total amount of memory allocated was.

00:49:13.720 --> 00:49:18.720
It's interesting because this one looks across the entire runtime of the process.

00:49:18.720 --> 00:49:24.720
A lot of our other reports don't. Like, the other major one that we need to talk about is the flame graph reporter.

00:49:24.720 --> 00:49:31.720
That's probably the most useful way for people in general to look at what the memory usage of their program is.

00:49:31.720 --> 00:49:32.720
But the flame graph.

00:49:32.720 --> 00:49:34.720
So what a flame graph is, let's start there.

00:49:34.720 --> 00:49:39.720
A flame graph shows you memory broken out by call tree.

00:49:39.720 --> 00:49:48.720
So rather than showing any time dimension at all, the flame graph shows you this function called that function called that function called that function.

00:49:48.720 --> 00:50:04.720
And at any given depth of the call tree, the width of one of the function nodes in the graph shows you what percentage of the memory usage of the process can be allocated to that call or one of the children below it.

00:50:04.720 --> 00:50:11.720
That can be a really useful way, a really intuitive way of viewing how time or memory is being spent across a process.
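
NOTE
Editor's sketch of how flame graph widths could be derived, assuming a
hypothetical list of (call stack, bytes) records. This is not Memray's
data model. Each call-stack prefix accumulates the bytes allocated by
itself and everything below it, and its width is that share of the total.
```python
from collections import defaultdict
# Hypothetical data: (call stack from outermost to innermost, bytes allocated).
allocations = [
    (("main", "load_data", "parse"), 600),
    (("main", "load_data", "read"), 200),
    (("main", "report"), 200),
]
def flame_widths(allocations):
    """Map each call-stack prefix to the fraction of total bytes under it."""
    totals = defaultdict(int)
    grand_total = sum(size for _, size in allocations)
    for stack, size in allocations:
        for depth in range(1, len(stack) + 1):
            totals[stack[:depth]] += size  # credit every ancestor frame
    return {prefix: size / grand_total for prefix, size in totals.items()}
widths = flame_widths(allocations)
for prefix, width in sorted(widths.items()):
    print(" > ".join(prefix), f"{width:.0%}")
```
Here load_data spans most of the graph because both of its children
allocate under it, which is the reading described in the conversation.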

00:50:11.720 --> 00:50:15.720
But the downside to it is that it does not have a time dimension.

00:50:15.720 --> 00:50:26.720
So with a memory flame graph like this, it's showing you a snapshot at a single moment in time of how the memory usage at that time existed.

00:50:26.720 --> 00:50:29.720
There's two different points in time that you can select for our flame graph reports.

00:50:29.720 --> 00:50:37.720
So you can either pick time right before tracking started or sorry, right before tracking stopped, which is sort of the point at which you would expect everything to have been freed.

00:50:37.720 --> 00:50:41.720
And you can use that point to analyze whether anything was leaked.

00:50:41.720 --> 00:50:43.720
Something was allocated and not deallocated.

00:50:43.720 --> 00:50:44.720
And you want to pay attention to that.

00:50:44.720 --> 00:50:52.720
The other place where you can ask it to focus in on is the point at which the process used the most memory.

00:50:52.720 --> 00:50:58.720
So the point during tracking when the highest amount of memory was used, it'll by default focus on that point.

00:50:58.720 --> 00:51:03.720
And it will tell you at that point how much memory could be allocated to each unique call stack.
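
NOTE
Editor's sketch of the two snapshot points described above, using a
hypothetical event log of allocations and frees. It is an illustration of
the idea only. The peak snapshot is what was live when usage was highest,
and the leak snapshot is whatever is still live when tracking stops.
```python
# Hypothetical event log: ("alloc" or "free", address, size in bytes).
events = [
    ("alloc", 1, 100),
    ("alloc", 2, 300),
    ("free", 1, 0),
    ("alloc", 3, 50),
    ("free", 2, 0),
]
def peak_and_leaks(events):
    """Return (peak bytes in use, live allocations at peak, allocations never freed)."""
    live = {}
    peak = 0
    at_peak = {}
    for kind, address, size in events:
        if kind == "alloc":
            live[address] = size
        else:
            live.pop(address, None)
        in_use = sum(live.values())
        if in_use > peak:
            peak = in_use
            at_peak = dict(live)  # remember what was live at the high-water mark
    return peak, at_peak, live
peak, at_peak, leaked = peak_and_leaks(events)
print(peak, at_peak, leaked)
```
The two views answer different questions: the peak view explains why the
process needed so much memory, the leak view shows what was never freed.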

00:51:03.720 --> 00:51:05.720
Yeah, these flame graphs are great.

00:51:05.720 --> 00:51:06.720
You have nice search.

00:51:06.720 --> 00:51:12.720
You got really good tool tips, obviously, because some of these little slices can be incredibly small tool tips there.

00:51:12.720 --> 00:51:13.720
But you can click on them.

00:51:13.720 --> 00:51:15.720
If you click on one of them, it will zoom.

00:51:15.720 --> 00:51:16.720
Oh yeah.

00:51:16.720 --> 00:51:17.720
Okay.

00:51:17.720 --> 00:51:21.720
And then it, yeah, if you click on one, then it'll like expand down and just focus on.

00:51:21.720 --> 00:51:31.720
For instance, the example that you're looking at, for the people listening to the podcast who are not going to see it: here there is one of these flame graphs and one of the paths in the flame graph.

00:51:31.720 --> 00:51:34.720
One of the nodes in the tree is about imports.

00:51:34.720 --> 00:51:37.720
So here I'm looking at a line that says from something import core.

00:51:37.720 --> 00:51:41.720
So that's obviously memory that was allocated during importing.

00:51:41.720 --> 00:51:45.720
So obviously you can't really get rid of that, unless you're implementing the library.

00:51:45.720 --> 00:51:47.720
So you may not care about that one.

00:51:47.720 --> 00:51:48.720
You may care about the rest.

00:51:48.720 --> 00:51:52.720
So you could click on the other path, since you don't care about this one.

00:51:52.720 --> 00:51:56.720
You are going to see only the memory that was not allocated during imports.

00:51:56.720 --> 00:51:57.720
Right.

00:51:57.720 --> 00:51:58.720
Or you could be surprised.

00:51:58.720 --> 00:52:01.720
You could go, wait, why is half my memory being used during an import?

00:52:01.720 --> 00:52:03.720
And I only sometimes even use that library.

00:52:03.720 --> 00:52:05.720
You could push that down.

00:52:05.720 --> 00:52:07.720
So it's like conditionally imported or something.

00:52:07.720 --> 00:52:08.720
Right.

00:52:08.720 --> 00:52:10.720
Like here, as you can see, you go up in this example.

00:52:10.720 --> 00:52:12.720
I think this example uses NumPy.

00:52:12.720 --> 00:52:13.720
Yes.

00:52:13.720 --> 00:52:16.720
So you hover over this line that says import numpy as np.

00:52:16.720 --> 00:52:17.720
Yeah.

00:52:17.720 --> 00:52:21.720
You may be surprised that importing NumPy is 63 megabytes.

00:52:21.720 --> 00:52:22.720
Megabyte.

00:52:22.720 --> 00:52:25.720
And 44,000 allocations as well.

00:52:25.720 --> 00:52:26.720
Yeah.

00:52:26.720 --> 00:52:27.720
Just by importing.

00:52:27.720 --> 00:52:28.720
So here you go.

00:52:28.720 --> 00:52:29.720
Surprise.

00:52:29.720 --> 00:52:30.720
So yes, that's...

00:52:30.720 --> 00:52:36.720
And if someone wants to be extremely surprised, just try to import TensorFlow and see what happens.

00:52:36.720 --> 00:52:37.720
Okay.

00:52:37.720 --> 00:52:39.720
I can tell you that it's not a nice surprise.

00:52:39.720 --> 00:52:43.720
But here you can kind of focus on different parts if you want.

00:52:43.720 --> 00:52:48.720
Also, we have these nice like check boxes in the top that automatically hide the imports.

00:52:48.720 --> 00:52:51.720
So if you don't care about the imports,

00:52:51.720 --> 00:52:52.720
it just hides them.

00:52:52.720 --> 00:53:00.720
So you can just focus on the part that is just not imports, which is a very common pattern because, again, you may not be able to optimize NumPy yourself.

00:53:00.720 --> 00:53:01.720
Right?

00:53:01.720 --> 00:53:04.720
If you decide you have to use it, then you have to use it.

00:53:04.720 --> 00:53:08.720
So it allows you to clean a bit because these ones can get quite complicated.

00:53:08.720 --> 00:53:09.720
Mm-hmm.

00:53:09.720 --> 00:53:14.720
So another thing that stands out here is I could see that it says the Python allocator is PyMalloc.

00:53:14.720 --> 00:53:20.720
This is the one that we've been talking about with arenas, pools, and blocks, and pre-allocating, and all of those things.

00:53:20.720 --> 00:53:21.720
That's not what's interesting.

00:53:21.720 --> 00:53:26.720
What's interesting is you must be showing us this because there might be another one.

00:53:26.720 --> 00:53:27.720
That's right.

00:53:27.720 --> 00:53:27.720
Okay.

00:53:27.720 --> 00:53:28.720
Well, not another one.

00:53:28.720 --> 00:53:29.720
Python only ships with...

00:53:29.720 --> 00:53:31.720
Well, Python does ship with two, kind of.

00:53:31.720 --> 00:53:34.720
It's also got a debug one that you wouldn't normally use.

00:53:34.720 --> 00:53:42.720
But the reason we're showing this to you is because it makes it very hard to find where memory leaks happen if you're using the PyMalloc allocator.

00:53:42.720 --> 00:53:51.720
So if you're using PyMalloc as your allocator, you can wind up with memory that has been freed back to Python but not yet freed back to the system.

00:53:51.720 --> 00:53:57.720
And we won't necessarily know what objects were responsible for that.

00:53:57.720 --> 00:54:05.720
And if you're looking at memory leaks, we won't be able to tell you whether every object has been destroyed because we won't see that the memory has gone back to the system.

00:54:05.720 --> 00:54:07.720
And that's what we're looking for at the leaks level.

00:54:07.720 --> 00:54:08.720
Now, as Python...

00:54:08.720 --> 00:54:09.720
Sorry.

00:54:09.720 --> 00:54:12.720
As Pablo said earlier, there's an option of tracing the Python allocators as well.

00:54:12.720 --> 00:54:24.720
So in memory leaks mode, you either want to trace the Python allocators as well so that we can see when Python objects are freed and we know not to report them as having been leaked as long as they were ever freed.

00:54:24.720 --> 00:54:28.720
Or you can run with a different allocator, just malloc.

00:54:28.720 --> 00:54:36.720
You can tell Python to disable the PyMalloc allocator entirely and just whenever it needs any memory to always just call the system malloc.

00:54:36.720 --> 00:54:37.720
And in that case...

00:54:37.720 --> 00:54:38.720
Oh, interesting.

00:54:38.720 --> 00:54:39.720
Okay.

00:54:39.720 --> 00:54:40.720
In that case, I'm not saying...

00:54:40.720 --> 00:54:43.720
Yeah, there is an environment variable called PYTHONMALLOC.

00:54:43.720 --> 00:54:45.720
So all uppercase, all together, PYTHONMALLOC.

00:54:45.720 --> 00:54:50.720
And then you can set it to malloc, the word malloc, and that will deactivate PyMalloc.

00:54:50.720 --> 00:54:54.720
You can set it to pymalloc, which will do nothing because by default you get that.

00:54:54.720 --> 00:54:57.720
But you can also set it to pymalloc debug or something like that.

00:54:57.720 --> 00:54:59.720
I don't recall exactly that one.

00:54:59.720 --> 00:55:01.720
I think it's pymalloc plus debug.

00:55:01.720 --> 00:55:02.720
Right.

00:55:02.720 --> 00:55:06.720
And that will set the debug version of PyMalloc, which will tell you like if you use it wrong or things like that.
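
NOTE
Editor's sketch of setting the PYTHONMALLOC environment variable for a
child interpreter. The helper name run_with_allocator is hypothetical. The
values malloc, pymalloc and pymalloc_debug are the documented spellings of
the settings discussed above. The child here only echoes the setting back.
```python
import os
import subprocess
import sys
def run_with_allocator(code, allocator):
    """Run `code` in a child interpreter with PYTHONMALLOC set to `allocator`."""
    env = dict(os.environ, PYTHONMALLOC=allocator)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
# The child simply reports the setting it was started with.
echo = "import os; print(os.environ['PYTHONMALLOC'])"
seen = run_with_allocator(echo, "malloc")
print(seen)
```
The variable has to be set before the interpreter starts, which is why
the sketch launches a new process instead of changing os.environ in place.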

00:55:06.720 --> 00:55:12.720
The important thing also, apart from what Matt said, is that using PyMalloc can be slightly surprising sometimes.

00:55:12.720 --> 00:55:16.720
But the important thing to highlight here is that this is what really happens.

00:55:16.720 --> 00:55:20.720
So normally you want to run with this on because that is good to tell you what happened.

00:55:20.720 --> 00:55:23.720
It's just that what happened may be a bit surprising.

00:55:23.720 --> 00:55:28.720
Imagine, for instance, the case that we mentioned before, imagine that you allocate a big list.

00:55:28.720 --> 00:55:30.720
Not a huge one, but quite a big one.

00:55:30.720 --> 00:55:36.720
And then it turns out that that didn't allocate any memory because, you know, it was already there available in the arenas.

00:55:36.720 --> 00:55:36.720
Right.

00:55:36.720 --> 00:55:39.720
And then you allocated like the letter A.

00:55:39.720 --> 00:55:43.720
Well, maybe not the letter A, but the letter E from the Spanish alphabet, right?

00:55:43.720 --> 00:55:44.720
Yeah.

00:55:44.720 --> 00:55:48.720
Which is, it's especially not cached because like, where are you going to cache that?

00:55:48.720 --> 00:55:52.720
If you allocate the letter E, then suddenly there is no more memory.

00:55:52.720 --> 00:55:55.720
So PyMalloc says, well, I don't have any more memory.

00:55:55.720 --> 00:55:57.720
So let me allocate four kilobytes.

00:55:57.720 --> 00:56:04.720
And then when you look at your flame graph, the flame graph is going to tell you your letter E took four kilobytes.

00:56:04.720 --> 00:56:06.720
And you're going to say, what?

00:56:06.720 --> 00:56:07.720
How is that possible?

00:56:07.720 --> 00:56:10.720
And then you're going to go onto Reddit and rage about how.

00:56:10.720 --> 00:56:11.720
Yeah, Python is stupid.

00:56:11.720 --> 00:56:12.720
How bad you like.

00:56:12.720 --> 00:56:13.720
Exactly.

00:56:13.720 --> 00:56:15.720
And you are going to say, how is this even possible?

00:56:15.720 --> 00:56:23.720
Well, the important fact here is that, yes, it's possible, because it's not that the letter E itself needed four kilobytes.

00:56:23.720 --> 00:56:29.720
But when you wanted that, then this happened, which is what the flame graph is telling you.

00:56:29.720 --> 00:56:31.720
You may say, oh, but that's not what I want to know.

00:56:31.720 --> 00:56:33.720
I want to know how much the letter E took.

00:56:33.720 --> 00:56:37.720
Then you need to deactivate PyMalloc or trace the Python allocators, which you can.

00:56:37.720 --> 00:56:45.720
It's just that normally the actual thing that you want, which is very intuitive if you think about it, is what happened when I requested this object.

00:56:45.720 --> 00:56:47.720
Because that's what is going to happen when your program runs.

00:56:47.720 --> 00:56:53.720
Because like, normally you reach for one of these memory profilers not just for looking at your program.

00:56:53.720 --> 00:56:56.720
Like, oh, let me look at my beautiful program.

00:56:56.720 --> 00:56:57.720
How is this in memory?

00:56:57.720 --> 00:56:58.720
How is this in memory?

00:56:58.720 --> 00:56:59.720
You reach for it because you have a problem.

00:56:59.720 --> 00:57:03.720
The problem normally is that I don't have enough memory and my program is using too much.

00:57:03.720 --> 00:57:04.720
Why is that?

00:57:04.720 --> 00:57:09.720
And to answer that question, you normally want to know what happens when you run your program.

00:57:09.720 --> 00:57:13.720
You don't want to know what happens if I deactivate this thing and yada, yada, right?

00:57:13.720 --> 00:57:17.720
And you want to absolutely take care of like, okay, there is this thing that is caching memory.

00:57:17.720 --> 00:57:22.720
Because like, if you run it without PyMalloc, it may report a higher peak, right?

00:57:22.720 --> 00:57:29.720
Because like, it's going to simulate that every single object that you want to request, require memory when it really didn't happen, right?

00:57:29.720 --> 00:57:32.720
Because maybe actually it was cached before.

00:57:32.720 --> 00:57:38.720
Or in other words, the actual peak that your program is going to reach may be in a different point as well.

00:57:38.720 --> 00:57:45.720
Because if you deactivate this caching, then the actual peak is going to happen at a different point, right?

00:57:45.720 --> 00:57:46.720
Or under different conditions.

00:57:46.720 --> 00:57:51.720
So you really want that letter E to report 4K most of the time, except with leaks.

00:57:51.720 --> 00:57:53.720
Because in leaks, it's a very specific case.

00:57:53.720 --> 00:57:57.720
In leaks, you want to know, did I forget to deallocate an object?

00:57:57.720 --> 00:58:02.720
And for that, you need to know really like, you know, the relationship between every single allocation and deallocation.

00:58:02.720 --> 00:58:03.720
And you don't want caching.

00:58:03.720 --> 00:58:04.720
Right.

00:58:04.720 --> 00:58:08.720
So you've got to be exactly always traced and always removed.

00:58:08.720 --> 00:58:15.720
We show a big red warning if you run with leaks and PyMalloc saying like, this is very likely not what you want.

00:58:15.720 --> 00:58:16.720
But who knows?

00:58:16.720 --> 00:58:18.720
Maybe someone wants that, right?

00:58:18.720 --> 00:58:19.720
Maybe.

00:58:19.720 --> 00:58:21.720
You might still detect it, but you might not.

00:58:21.720 --> 00:58:22.720
Well, yeah.

00:58:22.720 --> 00:58:28.720
Like, I have used that in CPython itself, for instance.

00:58:28.720 --> 00:58:34.720
We have successfully used Memray in several cases in CPython to find memory leaks.

00:58:34.720 --> 00:58:46.720
And to great success, because the fact that we can see C code is just fantastic for CPython, because it literally tells you where you forgot to put a Py_INCREF or Py_DECREF or something like that, which is fantastic.

00:58:46.720 --> 00:58:52.720
We have found bugs that were there for almost 15 years just because we couldn't.

00:58:52.720 --> 00:58:56.720
It was so complicated to locate those bugs until we had something like this.

00:58:56.720 --> 00:58:57.720
Nobody saw it.

00:58:57.720 --> 00:58:58.720
Right, exactly.

00:58:58.720 --> 00:59:09.720
But I have sometimes needed to see the leaks with PyMalloc enabled, just to understand how PyMalloc was holding onto memory, which for us is important, but maybe not for the user.

00:59:09.720 --> 00:59:10.720
All right.

00:59:10.720 --> 00:59:11.720
Two more things.

00:59:11.720 --> 00:59:13.720
We don't have a lot of time left.

00:59:13.720 --> 00:59:16.720
Let's talk about temporary allocations real quick.

00:59:16.720 --> 00:59:25.720
I think that that's an interesting aspect that can affect your memory usage, but also can affect, you know, just straight performance, both from caching and because you're also spending time allocating things.

00:59:25.720 --> 00:59:27.720
Maybe you don't have to.

00:59:27.720 --> 00:59:28.720
Who wants to take this one?

00:59:28.720 --> 00:59:33.720
Yeah, I think we talked about this for a while when Pablo was talking about how lists allocate memory.

00:59:33.720 --> 00:59:45.720
One thing that Memray has that most memory profilers don't have is an exact record of what allocations happened when and in what order relative to other allocations.

00:59:45.720 --> 00:59:59.720
And based on that, we can build a new reporting mode that most memory profilers could not do, where we can tell you if something was allocated and then immediately thrown away after being allocated and then something new is allocated and then immediately thrown away.

00:59:59.720 --> 01:00:12.720
We can detect that sort of thrashing pattern where you keep allocating something and then throwing it away very quickly, which lets you figure out if there's places where you should be reserving a bigger list or pre-allocating a vector or something like that.

01:00:12.720 --> 01:00:18.720
So that's based on just this rich temporal data that we're able to collect that most other memory profilers can't.

01:00:18.720 --> 01:00:19.720
Yeah, that's excellent.

01:00:19.720 --> 01:00:22.720
And you can customize what it means to be temporary.

01:00:22.720 --> 01:00:33.720
So by default it is what Matt mentioned, this allocate, deallocate, allocate, deallocate, allocate, deallocate, but you can decide for whatever reason that any allocation that is followed by a bunch of things.

01:00:33.720 --> 01:00:34.720
And then its deallocation.

01:00:34.720 --> 01:00:37.720
And a bunch of things is two, three, four, five, six allocations.

01:00:37.720 --> 01:00:44.720
Then it's considered temporary because you have, I don't know, some weird data structure that just happens to work like that.

01:00:44.720 --> 01:00:46.720
So you can select that, that N, let's say.
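
NOTE
Editor's sketch of the "temporary allocation" idea described above. This is not Memray's implementation.
The trace format, the function name count_temporary and the parameter max_intervening are all hypothetical,
chosen only to illustrate how a threshold N changes what counts as temporary.
```python
from typing import Dict, List, Tuple
# Hypothetical, simplified trace format: ("alloc", address) or ("free", address).
Event = Tuple[str, int]
def count_temporary(events: List[Event], max_intervening: int = 0) -> int:
    """Count blocks freed after at most `max_intervening` other allocations."""
    seen_at_alloc: Dict[int, int] = {}  # address: allocations seen when it was created
    allocs_seen = 0
    temporary = 0
    for op, addr in events:
        if op == "alloc":
            allocs_seen += 1
            seen_at_alloc[addr] = allocs_seen
        elif op == "free" and addr in seen_at_alloc:
            # How many other allocations happened while this block was alive.
            intervening = allocs_seen - seen_at_alloc.pop(addr)
            if intervening <= max_intervening:
                temporary += 1
    return temporary
# Block 1 is freed immediately; block 2 is freed after one other allocation.
trace = [("alloc", 1), ("free", 1), ("alloc", 2), ("alloc", 3), ("free", 2)]
print(count_temporary(trace, 0), count_temporary(trace, 1))
```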

01:00:46.720 --> 01:00:47.720
Excellent.

01:00:47.720 --> 01:00:48.720
Yeah.

01:00:48.720 --> 01:00:52.720
And you've got some nice examples of that list.append story you were talking about.

01:00:52.720 --> 01:00:53.720
Yeah.

01:00:53.720 --> 01:00:57.720
And this absolutely matters because allocating memory is pretty slow.

01:00:57.720 --> 01:01:05.720
So when you're doing this, it really transforms something that is quadratic, like O(n squared), into something that is constant.

01:01:05.720 --> 01:01:07.720
So you absolutely want that.

01:01:07.720 --> 01:01:08.720
You do want that.

01:01:08.720 --> 01:01:08.720
That's right.

01:01:08.720 --> 01:01:09.720
Yeah.

01:01:09.720 --> 01:01:19.720
When I was thinking of temporary variables, I was thinking of math and like, as you multiply some things, maybe you could change, change the orders or do other operations along those lines.

01:01:19.720 --> 01:01:26.720
But yeah, the growing list is huge because it's not just, oh, there's, there's one object that was created.

01:01:26.720 --> 01:01:30.720
You're making 16 and then you're making 32 and copying the 16 over.

01:01:30.720 --> 01:01:33.720
Then you're making 64 and copying the 32 over.

01:01:33.720 --> 01:01:34.720
It's, it's massive, right?
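
NOTE
Editor's sketch of the growing-list effect described above, assuming CPython, where sys.getsizeof
reports a list's current capacity. The helper name count_resizes is hypothetical. The exact growth
pattern is an implementation detail and may differ from the doubling used as an illustration in the
conversation, but the point stands: repeated appends trigger repeated reallocations.
```python
import sys
def count_resizes(n: int) -> int:
    """Append n items one at a time and count how often the list's reported size changes."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the backing array was reallocated with more capacity
            resizes += 1
            last_size = size
    return resizes
# Building the list at its final length asks for the backing array once, up front.
presized = [None] * 10_000
print("resizes while appending 10,000 items:", count_resizes(10_000))
```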

01:03:12.720 --> 01:03:14.720
And so we have actually used this thing to speed up CPython.

01:03:14.720 --> 01:03:15.720
Oh, that's amazing.

01:03:15.720 --> 01:03:19.720
And like I said, this is a feature that we're able to do exactly because we're a tracing profiler.

01:03:19.720 --> 01:03:21.720
And we do see every single allocation.

01:03:21.720 --> 01:03:28.720
We built a new feature that was just released literally last week where we have a new type of flame graph we can generate.

01:03:28.720 --> 01:03:35.720
That is a temporal flame graph that gives you sliders on it where you can adjust the range of time that you are interested in.

01:03:35.720 --> 01:03:42.720
So instead of only being limited to looking at that high watermark point or only being limited to looking at the point right before tracking stopped to see what's going on.

01:03:42.720 --> 01:03:44.720
What was allocated and not deallocated.

01:03:44.720 --> 01:03:51.720
You can tell the flame graph to focus in on this spot or on that spot to see what was happening at a particular point in time.

01:03:51.720 --> 01:04:02.720
And that's again a pretty unique feature that requires tracing profiling in order to be able to do because you need to know allocations that existed at any given point in time from one moment to the next.
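
NOTE
Editor's sketch of why a complete, time-ordered trace makes point-in-time views possible. This is
not Memray's code. The event tuple layout and the names live_bytes_at and high_watermark are
hypothetical. A sampling profiler could not answer these questions exactly, because it would miss
some of the allocations and deallocations between samples.
```python
from typing import Dict, List, Tuple
# Hypothetical trace format: (timestamp, op, address, size); op is "alloc" or "free".
Event = Tuple[float, str, int, int]
def live_bytes_at(events: List[Event], when: float) -> int:
    """Return the bytes allocated and not yet freed at time `when`."""
    live: Dict[int, int] = {}
    for ts, op, addr, size in sorted(events, key=lambda e: e[0]):
        if ts > when:
            break
        if op == "alloc":
            live[addr] = size
        else:
            live.pop(addr, None)
    return sum(live.values())
def high_watermark(events: List[Event]) -> Tuple[float, int]:
    """Return (timestamp, bytes) for the moment live memory peaked."""
    live: Dict[int, int] = {}
    current = 0
    peak = (0.0, 0)
    for ts, op, addr, size in sorted(events, key=lambda e: e[0]):
        if op == "alloc":
            live[addr] = size
            current += size
        else:
            current -= live.pop(addr, 0)
        if current > peak[1]:
            peak = (ts, current)
    return peak
trace = [(0.0, "alloc", 1, 100), (1.0, "alloc", 2, 50), (2.0, "free", 1, 0), (3.0, "alloc", 3, 30)]
print(live_bytes_at(trace, 2.5), high_watermark(trace))
```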

01:04:02.720 --> 01:04:09.720
Yeah, that ability to actually assign an allocation to a place in time really unlocks a lot of cool things.

01:04:09.720 --> 01:04:10.720
Right.

01:04:10.720 --> 01:04:14.720
So it seems to me that this is really valuable for people with applications.

01:04:14.720 --> 01:04:16.720
You got a web app or some CLI app.

01:04:16.720 --> 01:04:17.720
That's great.

01:04:17.720 --> 01:04:24.720
It also seems like it'd be really valuable for people creating packages that are really popular that other people use.

01:04:24.720 --> 01:04:25.720
Right.

01:04:25.720 --> 01:04:26.720
Right.

01:04:26.720 --> 01:04:30.720
If I was Sebastian creating FastAPI, it might be worth running this a time or two on FastAPI.

01:04:30.720 --> 01:04:31.720
I think they are actually using it on FastAPI.

01:04:31.720 --> 01:04:31.720
Are they?

01:04:31.720 --> 01:04:32.720
Okay.

01:04:32.720 --> 01:04:33.720
No, it's Pydantic.

01:04:33.720 --> 01:04:34.720
I think they are using it in Pydantic.

01:04:34.720 --> 01:04:35.720
And I think our other bigger, I mean, there's a lot of users.

01:04:35.720 --> 01:04:36.720
I'm trying to think of the big ones.

01:04:36.720 --> 01:04:37.720
I think the other ones that are...

01:04:37.720 --> 01:04:37.720
Yeah.

01:04:37.720 --> 01:04:37.720
urllib3.

01:04:37.720 --> 01:04:50.720
There was a feature that they came to us and they said we used Memray to track down where

01:04:50.720 --> 01:04:52.720
memory was being spent in a new version of urllib3.

01:04:52.720 --> 01:04:56.720
And they said that they would not have been able to release the new feature that they wanted

01:04:56.720 --> 01:05:00.720
if they hadn't been able to get the memory under control and that we helped them do it very quickly.

01:05:00.720 --> 01:05:01.720
That is awesome.

01:05:01.720 --> 01:05:02.720
Yeah.

01:05:02.720 --> 01:05:07.720
Like all the ORMs, I'm sure that they're doing a lot of like, read this cursor and put this

01:05:07.720 --> 01:05:08.720
stuff in the list.

01:05:08.720 --> 01:05:11.720
We're going to, you know, like there's probably a lot of low hanging fruit actually.

01:05:11.720 --> 01:05:16.720
And the reason this comes to mind for me is we can run it on our code and make it faster.

01:05:16.720 --> 01:05:21.720
But if somebody who's got a popular library, like the ones you all mentioned, can find some

01:05:21.720 --> 01:05:27.720
problem, like the multiplicative improvement across everybody's app, across all the different

01:05:27.720 --> 01:05:32.720
programs and the libraries that use those, it's a huge, huge benefit, I would think.

01:05:32.720 --> 01:05:40.720
We are also very lucky because we have a wonderful community and we are using GitHub Discussions.

01:05:40.720 --> 01:05:45.720
A lot of people probably don't know that that is a thing, but we have in the Memray repo,

01:05:45.720 --> 01:05:48.720
a discussion for feedback.

01:05:48.720 --> 01:05:53.720
And there are a lot of people, like library maintainers in the Python ecosystem,

01:05:53.720 --> 01:05:55.720
that have used Memray successfully.

01:05:55.720 --> 01:05:56.720
And they tell us about that.

01:05:56.720 --> 01:06:02.720
And it's quite, quite cool to see like how many problems have been solved by Memray.

01:06:02.720 --> 01:06:03.720
Some of them super challenging.

01:06:03.720 --> 01:06:04.720
Yeah.

01:06:04.720 --> 01:06:07.720
I've got to say, I didn't know that discussions existed until we enabled it on this repo.

01:06:07.720 --> 01:06:09.720
So I'm learning things every day.

01:06:09.720 --> 01:06:10.720
Here you are.

01:06:10.720 --> 01:06:11.720
Absolutely.

01:06:11.720 --> 01:06:16.720
Maybe just a quick question to wrap up the conversation here is Brnega Boer out there

01:06:16.720 --> 01:06:19.720
asked, does Memray support Python 3.12 yet?

01:06:19.720 --> 01:06:20.720
Not yet, is the short answer.

01:06:20.720 --> 01:06:27.720
We're at the moment blocked on that by Cython 0.29 not supporting 3.12 yet.

01:06:27.720 --> 01:06:32.720
We need to get that sorted before we can even build on 3.12 to start testing on 3.12.

01:06:32.720 --> 01:06:35.720
Do you have to build on 3.12 to analyze 3.12 applications?

01:06:35.720 --> 01:06:36.720
Yes.

01:06:36.720 --> 01:06:37.720
Yes.

01:06:37.720 --> 01:06:38.720
Okay.

01:06:38.720 --> 01:06:40.720
Because this runs in the application itself.

01:06:40.720 --> 01:06:42.720
So this is not something that exists outside.

01:06:42.720 --> 01:06:44.720
This is something that runs within the application.

01:06:44.720 --> 01:06:45.720
It's like inside.

01:06:45.720 --> 01:06:46.720
Yeah.

01:06:46.720 --> 01:06:46.720
Yeah.

01:06:46.720 --> 01:06:49.720
So you need to run your app in 3.12 to run Memray on 3.12.

01:06:49.720 --> 01:06:50.720
Yes.

01:06:50.720 --> 01:06:53.720
That's the difference between this and PyStack, which we were speaking about last time.

01:06:53.720 --> 01:06:58.720
PyStack can attach to a 3.12 process from a 3.11 process or something like that.

01:06:58.720 --> 01:06:59.720
But Memray can't.

01:06:59.720 --> 01:07:00.720
Okay.

01:07:00.720 --> 01:07:01.720
Well, good to know.

01:07:01.720 --> 01:07:02.720
All right, guys.

01:07:02.720 --> 01:07:03.720
Thank you for coming back.

01:07:03.720 --> 01:07:05.720
And for taking the extra time to tell people about this.

01:07:05.720 --> 01:07:10.720
But mostly, you know, thanks to you all and thanks to Bloomberg for these two apps, Memray

01:07:10.720 --> 01:07:11.720
and PyStack.

01:07:11.720 --> 01:07:16.720
They're both the kind of thing that looks like it takes an insane amount of understanding

01:07:16.720 --> 01:07:21.720
the internals of CPython and how code runs and how operating systems work.

01:07:21.720 --> 01:07:25.720
And you've done it for all of us, so we could just run it and benefit, not have to worry about

01:07:25.720 --> 01:07:26.720
it that much.

01:07:26.720 --> 01:07:27.720
You have no idea.

01:07:27.720 --> 01:07:31.720
I will add linkers because we didn't even have the time to go there.

01:07:31.720 --> 01:07:41.720
But Memray uses quite a lot of dark linker magic to be able to activate itself in the middle of nowhere, even if you didn't prepare for that.

01:07:41.720 --> 01:07:45.720
A lot of memory profilers require you to modify how you run your program.

01:07:45.720 --> 01:07:50.720
Memray can magically activate itself, which allows it to attach itself to a running process.

01:07:50.720 --> 01:07:53.720
But yeah, for another time maybe.

01:07:53.720 --> 01:07:57.720
I wrote some of the craziest code of my life last week in support of Memray.

01:07:57.720 --> 01:07:59.720
You have no idea how wild it can get.

01:07:59.720 --> 01:08:02.720
It seems intense and even that's not enough.

01:08:02.720 --> 01:08:03.720
Okay, awesome.

01:08:03.720 --> 01:08:04.720
Again, thank you.

01:08:04.720 --> 01:08:06.720
This is an awesome project.

01:08:06.720 --> 01:08:18.720
People should certainly check it out, and I kind of want to encourage library and package authors out there to say, you know, if you've got a popular package and you think it might benefit from this, just give it a quick run and see if there's some easy wins that would help everyone.

01:08:18.720 --> 01:08:19.720
Absolutely.

01:08:19.720 --> 01:08:22.720
Well, and I just want to add a thank you very much for inviting us again.

01:08:22.720 --> 01:08:26.720
We are super thankful for being here and always very happy to talk with you.

01:08:26.720 --> 01:08:27.720
Thanks, Pablo.

01:08:27.720 --> 01:08:27.720
Seconded.

01:08:27.720 --> 01:08:28.720
Yeah.

01:08:28.720 --> 01:08:29.720
Thanks, Matt.

01:08:29.720 --> 01:08:30.720
Bye, you guys.

01:08:30.720 --> 01:08:31.720
Thanks everyone for listening.

01:08:31.720 --> 01:08:31.720
Bye.

01:08:31.720 --> 01:08:32.720
Thank you.

01:08:36.720 --> 01:08:37.720
Thank you to our sponsors.

01:08:37.720 --> 01:08:38.720
Be sure to check out what they're offering.

01:08:38.720 --> 01:08:40.720
It really helps support the show.

01:08:40.720 --> 01:08:45.720
The folks over at JetBrains encourage you to get work done with PyCharm.

01:08:45.720 --> 01:08:56.720
PyCharm Professional understands complex projects across multiple languages and technologies, so you can stay productive while you're writing Python code and other code like HTML or SQL.

01:08:56.720 --> 01:09:01.720
Download your free trial at talkpython.fm/done-with-pycharm.

01:09:01.720 --> 01:09:05.720
InfluxData encourages you to try InfluxDB.

01:09:05.720 --> 01:09:12.720
InfluxDB is a database purpose-built for handling time series data at a massive scale for real-time analytics.

01:09:12.720 --> 01:09:16.720
Try it for free at talkpython.fm/influxDB.

01:09:16.720 --> 01:09:18.720
Want to level up your Python?

01:09:18.720 --> 01:09:22.720
We have one of the largest catalogs of Python video courses over at Talk Python.

01:09:22.720 --> 01:09:27.720
Our content ranges from true beginners to deeply advanced topics like memory and async.

01:09:27.720 --> 01:09:30.720
And best of all, there's not a subscription in sight.

01:09:30.720 --> 01:09:33.720
Check it out for yourself at training.talkpython.fm.

01:09:33.720 --> 01:09:37.720
Be sure to subscribe to the show, open your favorite podcast app, and search for Python.

01:09:37.720 --> 01:09:39.720
We should be right at the top.

01:09:39.720 --> 01:09:48.720
You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:09:48.720 --> 01:09:51.720
We're live streaming most of our recordings these days.

01:09:51.720 --> 01:09:59.720
If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:09:59.720 --> 01:10:01.720
This is your host, Michael Kennedy.

01:10:01.720 --> 01:10:02.720
Thanks so much for listening.

01:10:02.720 --> 01:10:03.720
I really appreciate it.

01:10:03.720 --> 01:10:04.720
Now get out there and write some Python code.
