
#274: Profiling data science code with FIL Transcript

Recorded on Wednesday, Jul 8, 2020.

00:00 Do you write data science code? Do you struggle to load large amounts of data, or wonder which parts of your code use the most memory? Maybe you just want to get by with smaller compute resources: servers, RAM, and so on. If so, this episode is for you. Itamar Turner-Trauring, creator of the Python data science memory profiler Fil, is here to talk about memory usage and data science. This is Talk Python to Me, Episode 274, recorded July 8, 2020.

00:39 Welcome to Talk Python to Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talk and follow the show on Twitter via @talkpython.

This episode is brought to you by Linode and us. Do you want to learn Python, but you can't bear to subscribe to yet another service? At Talk Python Training, we hate subscriptions too. That's why our course bundle gives you full access to the entire library of courses for one fair price. That's right, with the course bundle you save 70% off the full price of our courses, and you own them all forever. That includes courses published at the time of the purchase, as well as courses released within about a year of the bundle. So stop subscribing and start learning at talkpython.fm/everything.

Itamar, welcome to Talk Python to Me. Great to be here. Yeah, it's great to have you here. I'm excited to talk about Python and memory. Yeah, me too. Yeah, I think it's something that doesn't really get as much coverage as it deserves in the Python space. You know, if you're a Java developer or a .NET developer, people go on and on about optimizing the GC and tweaking this thing or that thing, or your code or algorithms, for memory management. If you're a C developer, you're constantly in fear of memory leaks and memory management. And in Python, we get to just kind of coast. Or not. So my motivation for getting into this was doing some scientific computing with basically a giant pile of images that we had to extract information from. I initially just focused on getting it working. And then one day I said, okay, we're running this on these cloud computers, and it's taking, you know, 18 hours to process the data. Most of the CPUs are idle because we're using so much memory. I wonder if this is a problem.
And so I did some math and talked to management about the revenue. It turns out we were going to spend something like 70% of our expected revenue just on cloud computing, given my first implementation, which wouldn't have left anything over. Were they excited about that, or were they not so excited? I didn't mention this. I went and optimized it, and then I sent an email to my manager saying, look at the great work I did. Exactly. But

03:08 yeah, that's very, very cool. And so reducing the memory meant that you could use a lot more CPUs, because that was the bottleneck. Originally, we had this cloud VM that was mostly sitting idle, because you just need so much RAM for each of the threads or processes. Right. You can get a high-memory version of a cloud computer, but still, there is that trade-off, right, if you want to take full advantage of the CPUs there. And obviously, less memory is better. It might also just mean fewer cloud computers to manage. Yeah. And if you look at the usage of your own computer, much of the time you're using 1% of the CPU, just sitting there, and your RAM, on a lot of computers, is going to be something like three quarters full, 75% full. Basically, proportionally, RAM is much more expensive than compute, and so you don't have as much of it to spare. Memory tends to be the constrained resource, and the failure mode is that you run out of memory. Right, right. If you run out of CPU, it just goes slower. Yeah, running out of memory is much worse. Interesting. Yeah. Well, it's going to be really fun to dig into. I think it's an interesting angle of the Python ecosystem: people don't spend that much time obsessing about memory. But it's important, and it's interesting, and we're going to spend some time obsessing about it for the next hour or so, for sure. Before we do, let's get into your story, though. How did you get into programming and Python? I got into programming back in the mid 90s. My parents had this business creating multimedia CD-ROMs, which was exciting new technology in the mid 1990s, and so I ended up doing coding for them. I got into Python a few years later, when I discovered Zope, this

05:00 web framework, which at the time was really huge in the Python world. You'd go to Python conferences and they'd have a whole track on it. And then I just stuck around and ended up using Python for lots of things, like distributed computing, working on Twisted, and scientific computing. What do you do day to day? I've been doing training on Docker packaging for Python, hoping to eventually teach some things about Python memory. Then I have some products, and I do a little consulting on the side. So yeah. Very cool. Is this training in person? Is it online? What is it like?

05:40 Originally, this was in-person training. I was supposed to have an open enrollment class right after PyCon in Pittsburgh, for example. And nowadays, it's over Zoom. Because, yeah. Yeah, because the world is crazy. It's absolutely crazy. Yeah. Okay. Well, cool. It's a lot of fun. I did that for about 10 years and really enjoyed my time doing in-person training. Luckily, there were no pandemics

06:09 with other things, but not too much. Yeah, we did some stuff over, I think, GoToMeeting or GoToWebinar. There was no Zoom, so that's what we were using. It was pretty good, actually. Yeah, it's not a bad story. Yeah. All right. So, speaking of obsessing over Python memory, let's get started with a little bit of an overview of how Python memory works. I feel like Python memory lives a little bit in between the C++ world, where it's very explicit, and the Java and .NET GC world, where it's not even deterministic. What's the story? So this actually depends on which Python interpreter you're using. If you're using PyPy, it's basically like Java or .NET. If you're using CPython, which most people do, it's a little bit different. The basic idea is that every Python object has a reference counter. When you add a reference to an object, the counter is incremented by one; when you remove a reference, it's decremented. So when you append your object to a list, that's an extra reference. If you destroy the list, that reference count goes down. If the reference count goes down to zero, the object is not being used by anyone; there are no references to it, so it can immediately be freed up and deallocated. The problem with reference counting is that it doesn't cover all cases. If you have a set of circular references, the objects will never hit a reference count of zero. So if you take a list and then append it to itself, then because it has a reference to itself, its reference count is never going to hit zero, even if you've removed all the other references. So in addition to reference counting, Python also has a garbage collection system which, every so often (I think it's based on how many bytecodes are run), will go and look for objects that are in this little loop by themselves but not being used in actual code, and get rid of them too. Right.
And I think the GC is also generational, like the main ones in, say, Java and .NET, as well. Yeah, and I don't quite remember how this works. So, you know, maybe a more practical example might be if you're studying some kind of graph theory type object, like a network of things, or a network of relationships among people, or something like that, where it doesn't even have to be one thing pointing back at itself. It could be that A points at B, B points at C and D, D points at F, and F points back at A. If you can make a circle following that chain, reference counting breaks. Yeah, and then you fall back on GC, garbage collection. Right. But I would say, for the most part, just knowing the GC is there to catch that edge case is really all most people need to know, right? Because the primary story is this reference counting story. What do you think? Unless you're using PyPy, because then there's no reference counting; it's only garbage collection. Yeah, but I'm thinking most people are running CPython, maybe using some data science libraries, especially in the context of the tool that we're going to talk about. It feels like it's definitely on the data science side of things, and in that world, the CPython world, it's probably reference counting that you care the most about. Yeah. And I mean, just a fairly high-level understanding: as long as something is referring to your object, it will exist. If the references go away, it will either immediately or eventually disappear and get deallocated. That's pretty much all you need to know the vast majority of the time. Yep, and the vast majority of the time that's enough. But not always. Not always. So we're going to talk about a project that you started called Fil, F-I-L, that is about profiling memory allocations for data pipeline scenarios in particular. It's optimized for that, although I suspect you could use it for a lot of different things.
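Editor's note: the reference counting and cycle collection described in this segment can be observed directly in CPython. The following is a minimal sketch, not anything shown on the show; the `Node` class is a hypothetical stand-in, and the counts are CPython-specific (PyPy has no reference counts). As a side note, CPython's collector is triggered by allocation counts (see `gc.get_threshold()`), which is why the sketch calls `gc.collect()` explicitly instead of waiting.

```python
import gc
import sys


class Node:
    """Tiny object that can point at another object (hypothetical name)."""

    def __init__(self):
        self.other = None


# sys.getrefcount includes one temporary reference for its own argument,
# so only the differences between readings are meaningful.
data = Node()
before = sys.getrefcount(data)
holder = [data]                      # putting the object in a list adds a reference
after_append = sys.getrefcount(data)
del holder                           # destroying the list removes that reference
after_del = sys.getrefcount(data)
print(before, after_append, after_del)

# A reference cycle: a points at b, and b points back at a.
gc.disable()                         # keep the collector from running on its own
gc.collect()                         # clear out any garbage that already exists
a, b = Node(), Node()
a.other, b.other = b, a
del a, b                             # no names left, but neither refcount is zero
found = gc.collect()                 # the cycle detector finds and frees both
gc.enable()
print("unreachable objects found by gc:", found)
```

Reference counting alone never frees `a` and `b` here, because each keeps the other's count above zero; only the explicit collection pass reclaims them.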
Let's start the story by just talking about some memory challenges, I guess we could call them. So you wrote a cool blog post

10:00 called Clinging to memory: how Python function calls can increase your memory usage. Yeah, that's pretty interesting. Do you want to tell us the general idea here? So this is something I encountered in the real world, so it can impact you. It's more of an issue in the kind of applications that are processing large amounts of data, where one object might be four gigabytes of RAM. If objects live slightly longer and they're, you know, a dictionary with three entries, and there's only one dictionary, I don't really care how long it lives. Whether you're using 2.7 or 2.701 megabytes of working memory, nobody cares. Yeah, yeah. When you have an array that's four gigabytes, or 20 gigabytes, it can have a very significant impact if that array lives even slightly longer than it needs to. So the idea is this: you have a function f, you create a large array in it, and then you pass that array to another function, g, that you're calling. If you have a local variable inside of f, the parent function still refers to that array. Like the parameter that accepted the data, for example? Yeah. That reference within the function call is a reference, which means the reference count is not going to hit zero. Even if g uses that array and then throws it away and doesn't care about it anymore, the parent function still has a reference to that array. And so you can end up in these situations where, if you read the code, you know that you are never going to use this data again; there is no way you can use it. But from Python's perspective, because there's a local variable in the function frame that refers to that object, the object is going to persist until that function returns, or throws an exception and exits. Right, because everything that was loaded up in that function got defined.
So here are all the variables of the function, and with reference counting they're still pointing to things until those variables go away, which is when the function returns. Yeah. You can imagine, if you go into pdb, you can actually travel up and down the stack. You can go up to the parent function and see the local variables; they're still there. You can still go in the debugger, go up two frames to the caller, and you'll still see the local variable pointing at your large object. And so you can restructure your code in various ways to deal with this. The way I ended up actually doing it was basically copying an idiom from C++, where you have an object whose only job is to own another object. You end up with only one reference to the large object that you care about, which is from inside the owner object. Then you pass the owner object around, and when you know that you don't need that data anymore, you tell the owner object, clear your contents, and then that one reference goes away and the memory is freed. So it's sort of an interesting situation where, every once in a while, you actually have to fall back on the manual memory management techniques that you have to use all the time in languages like C or C++. Right. You know, what's interesting is I see examples of code like this, and then you'll see other people lamenting the fact that code is written this way. They'll say, you should never write code this way; it's not necessary in Python, because it has automatic memory management. Or, you should never, halfway through, randomly set a variable to None and then keep going. Why would you ever do that? You don't need to do that, right, if you're not going to use it again. Oh, except when that was costing you an extra gig of memory. All of a sudden this kind of non-standard pattern turns out to be really valuable, right?
It's the difference between it works and it doesn't work, or it's $1,000 versus $200 of cloud compute, or whatever, right?
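Editor's note: the caller-keeps-it-alive problem and the owner-object workaround described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the blog post: `BigData`, `Owner`, and the `f_*`/`g_*` functions are hypothetical names, the payload is kept small, and the immediate freeing shown relies on CPython's reference counting.

```python
import weakref


class BigData:
    """Stand-in for a multi-gigabyte array (hypothetical; kept small here)."""

    def __init__(self, size):
        self.payload = bytearray(size)


class Owner:
    """Holds the only long-lived reference to an object so it can be dropped on demand."""

    def __init__(self, obj):
        self.obj = obj

    def clear(self):
        self.obj = None  # drop the one reference; CPython frees the object now


events = []  # records whether the data was still alive after g was done with it


def g_plain(data):
    probe = weakref.ref(data)   # a weak reference does not keep the object alive
    del data                    # g is finished with the data...
    events.append(("plain", probe() is not None))  # ...but f's local still holds it


def f_plain():
    data = BigData(1_000_000)
    g_plain(data)               # `data` stays referenced until f_plain returns


def g_owned(owner):
    probe = weakref.ref(owner.obj)
    # ... work with owner.obj here ...
    owner.clear()               # tell the owner to release its contents
    events.append(("owned", probe() is not None))


def f_owned():
    owner = Owner(BigData(1_000_000))
    g_owned(owner)              # this frame only references the owner, not the data


f_plain()
f_owned()
print(events)
```

In the plain version the data survives until the parent function returns; in the owned version it is released as soon as the callee says it is done, even though the parent frame is still on the stack.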

13:45 This portion of Talk Python to Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the next-generation network, Linode delivers the performance that you expect at a price that you don't. Get started on Linode today with a $20 credit, and you get access to native SSD storage, a 40 gigabit network, industry-leading processors, their revamped Cloud Manager, root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode. When creating a new Linode account, you'll automatically get $20 credit for your next project. Oh, and one last thing: they're hiring. Go to slash careers to find out more. Let them know that we sent you.

14:44 Yeah, I had never done scientific computing before this job I was at a couple of years ago. It was an interesting experience, because the domain is different: you have different constraints and different goals, and some of the ways you write

15:00 software are different. Okay. And unless you're doing large-scale data processing day to day, most of the time in Python you just don't think about any of these things. You might have to worry about memory leaks, but much of the time that's a different set of problems. You don't think about the fact that an object being alive for five more milliseconds might cost you another hundred thousand dollars if you're scaling up. Yeah, for sure. It's interesting. Another solution that you proposed... well, you proposed three solutions. One is this ownership story. One was maybe only applicable to very limited, small functions: you could just have no local variables and basically chain one function call into another. Yeah. The intermediate one, though, seems quite possibly reasonable as well, which is to reuse the local variable. So you load up some data, and then you're going to make some changes, which will copy the data. Instead of having data1, data2, data3, you say data equals load, data equals modify the data, data equals modify the data again. That way, at least as you go through these steps, after each one you release the memory from the prior step, potentially. Yeah. And one of the things about data processing applications is that they often have this sort of idiom where you're doing a series of steps. This is where keeping old copies of the data around tends to end up cumulatively being very expensive in terms of memory, because with a series of steps, once you've done step one, you don't really care about the initial input. Once you've done step two, you don't care about the output of step one. So, yeah, explicitly overwriting the previous step is another way to do this. I could see somebody looking at this in a code review and going, why are you doing this? These variables mean different things.
One should be initial data, the other should be, you know, grouped by state, and the third should be some other thing. You're naming these wrong, you know what I mean? That's what I was hinting at: sometimes you need to break the rules to break through to a better outcome. Yeah. In general, pretty much every best practice is very situation-specific. Yeah, that's a really good point. A lot of times when you hear advice like that, it's spoken as if it were absolute. But there's an implicit context, right? Like we said, when you don't really care about memory and that kind of stuff, you just go and write the code. But that probably means, implicitly, what I care about is readability, what I care about is maintainability, and I just want to optimize it to be as clean and pure as possible. Which is fine. But if pure doesn't work, and not clean totally works, forget the clean. We don't care anymore. I want it to work; that's more important. Functioning is primary here. Yeah. And then there are places like MicroPython, where you're running on little embedded devices with very little RAM, and some of the problems you have with large data processing translate down to very small programs. That's an interesting example, for sure. Because, again, you didn't care about that extra meg of RAM, but all of a sudden you only have half a meg, and now you really care about it. I do want to throw out something from Philip Guo over at python If you want to understand a lot of these relationships and how objects refer back to each other, he's got a really cool visualization. I think when you're over there, there's a checkbox at the bottom: under the way it renders objects, I think you have to flip it from inline primitives to render all objects on the heap, like Java and Python do anyway.
If you want to show that off or visualize that, that's a really cool, quick one. Also, you might want to observe reference counting without changing the reference count. You might want to say, how do I know if there's a reference to this? You can't store it in a variable, point at it, and then ask, because you've now changed it, right? Have you done anything with weak references, weakref? I'm not sure I've ended up using them in scientific computing, but I've definitely used them in some places, like asynchronous programming, sometimes. Yeah. Yeah, you can use them for caches that can kind of auto-expire and stuff as well. But they're really good for this: I can create a weak reference to an object, and then you can ask how many things point at this. Even if you know something points at it, knowing whether that's one or two might give you a different understanding, right? You're like, oh, I thought there was only one pointer. Why are there two pointers to this thing? Where did that second one come from? So you can ask interesting questions without changing the reference count with weak references. It's really easy. Yeah. And there's an API, gc.get_referrers, that gives you the objects that refer to an object, but then, yeah, you'll have added the current function frame as an additional reference, and you have to discount it. Right, right. There's also sys.getsizeof. What's the story with getsizeof? So the function call thing is sort of just an example of places where automatic memory management gets in your way. But there are more fundamental limits or problems you run

20:00 into when using Python in memory-intensive situations, which you need to understand. One of them is just that Python objects use a surprising amount of memory for the information that they store. If you look at the implementation of the CPython interpreter, every object has, in addition to whatever data you need to actually store for the object itself, some overhead. On a 64-bit machine, which is most of them these days, it has a pointer to the class, or the C type for the class, so that's eight bytes. And then it has the reference count, so that's another eight bytes. Then, I think, if you have objects that support garbage collection, there's even more. And sys.getsizeof is a nice utility that will tell you how many bytes an object uses. I don't think that actually traverses the object tree, right? Like if this thing is a list, and the list points at things, and those point at other things. Yeah, I think it's just the immediate thing that the variable points at. Right, yeah, yeah.
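Editor's note: the point that `sys.getsizeof` reports only the immediate object can be checked with a short sketch. The `deep_sizeof` helper below is a hypothetical illustration that handles only a few built-in container types; it is a rough estimate, not an exact accounting, and the byte counts it produces vary by Python version and platform.

```python
import sys

inner = list(range(100_000))
outer = [1, inner]

shallow_inner = sys.getsizeof(inner)   # the list header plus its array of pointers
shallow_outer = sys.getsizeof(outer)   # header plus two pointers, nothing deeper


def deep_sizeof(obj, seen=None):
    """Rough recursive size for nested lists, tuples, sets, and dicts."""
    seen = set() if seen is None else seen
    if id(obj) in seen:                # count each shared object only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


print("shallow size of outer:", shallow_outer)
print("shallow size of inner:", shallow_inner)
print("deep size of outer:   ", deep_sizeof(outer))
```

The outer list reports a tiny size even though it contains the large list, because only its own two pointers are counted.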

21:06 And if you check how much memory an integer uses, like the number one, it takes 28 bytes. If you think about how you represent numbers in memory, unless you have really large numbers, where you obviously need more, 64 bits will get you some really big numbers, and that's only eight bytes. But you're actually using 28 bytes for every integer. There are some optimizations; I think it reuses the lowest thousand or 10,000 integers. But in general, it's 28 bytes per integer. So if you have a list of a million integers, I did the math, I think it was 35: a list of a million integers is 35 megabytes of RAM. If you allocated that in a C array, it would be eight megabytes of RAM. So you're using four and a half times as much memory just because you're using Python objects. Another example we were talking about is the character A. In C, the character A would be four bytes or something like that. If you're using UTF-8, you can probably get it down to one byte. Yeah, you could definitely make it smaller if you do it right. In Python, it's 50. Yeah, and sys.getsizeof also shows some interesting stuff. If I give it a list of, like, a million items, it'll say the size is 800,000. It's not quite a million; maybe it's 100,000 items. I think it's 100,000. But if I give it a list which has the number one and also contains, within that list, that list of 100,000 items, the size is 72. So yeah, you've got to be real careful. It doesn't tell you the whole story. Yeah, exactly. But it gives you a sense of, oh, the letter A is 50, and the number one is 28. The memory that we use per representation of data in Python is fairly expensive, I think, is the takeaway, right? Yeah. So one common place where people hit this is when you're reading in some data.
And then you're creating, like, a dict or a list per thing, for, like, rows of data from a CSV or something. You're turning it into, here's a list, and then there's a dictionary with a bunch of keys for each one, or an object for each entry. Compared to the information you're storing, you end up with a huge amount of overhead from creating all those different Python objects. And so one situation where you end up with Python running out of memory is if you're doing data processing and you just have 10 gigabytes of data loaded; that's going to be a lot of memory. But sometimes it's not actually that much data if you store it on disk, or if you store it in an appropriate, more compact object. It's a lot of data only because you create a lot of Python objects, and so it's using, like, five times as much memory as the actual information requires. Right. So maybe you load it into NumPy, or pandas, or something like that, rather than into a native Python dictionary or something. Yeah. So think about a Python list which has a bunch of Python integers in it. Each of those Python integers takes 28 bytes of RAM. A NumPy array basically has the same Python overhead, but only once, at the beginning, where it says, I am an array, and I store 64-bit integers. And then the storage isn't a generic pointer to a generic Python object. Right, it's only eight bytes per entry. Yeah, yeah. And so the information that was 35 megabytes in a Python list will be eight megabytes in a NumPy array. Now, moving to some of these libraries that support it more efficiently certainly makes a lot of sense in the data science world.
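Editor's note: the rough arithmetic from this segment is 1,000,000 integers x 28 bytes, plus 1,000,000 list pointers x 8 bytes, which is about 36 MB, versus 1,000,000 x 8 bytes = 8 MB for packed storage. The sketch below checks that on a live interpreter. It uses the standard library's `array` module as a stand-in for a NumPy array so that it has no third-party dependency; that substitution is this note's assumption, not something from the conversation, and exact byte counts vary slightly by Python version.

```python
import sys
from array import array

n = 1_000_000
values = list(range(n))

# A list stores pointers; each element is a separate Python int object
# that carries its own type pointer and reference count.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A typed array stores the raw 64-bit numbers back to back and pays the
# Python object overhead only once, for the array object itself.
packed = array("q", values)            # "q" means signed 64-bit integers
packed_bytes = sys.getsizeof(packed)

print(f"list of ints:   {list_bytes / 1e6:.1f} MB")
print(f"packed array:   {packed_bytes / 1e6:.1f} MB")
print(f"overhead ratio: {list_bytes / packed_bytes:.1f}x")
```

A NumPy `int64` array follows the same layout principle: one header, then eight bytes per entry.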
But something else that makes a lot of sense, and that I think people may be overlooking, is using different algorithms, or different ways of processing. One really simple case: I need to load or compute a bunch of stuff and give it back as a collection. So I'm going to create a list, fill it up, and return it, right? That loads, you know, maybe a million items into memory, with all the cost and overhead of that, and then you hand it over to be processed, and off it goes.

25:00 Alternatively, use a yield instead of a list and just write a generator, and you process the items one at a time. Because probably what you're going to do when you get that list back is go through the list one at a time, right? And that uses one millionth of the memory, or something to that effect, right? It only loads one item in memory at a time, and not all of them. There are things like that you can do as well, if processing them one at a time, in order, makes sense. If you need to seek around and ask how the third one compares to the first one, then forget it. Yeah. The three basic techniques usually are, first, batching, and streaming with generators is sort of batching with a batch size of one. Yeah, yeah. And then there's compression, where you have the same data semantically, but with less overhead. So switching from Python lists to NumPy arrays is, in some sense, compression. If you know your numbers are only going to go up to 100, you can use an 8-bit NumPy array, and then you've cut your memory by, like, 80% at no cost, because you have the exact same information. And then the final technique is indexing, where you need to load only some of the data, so you arrange your data so that you only need to load that part. So if you have one file for every month of the year, you just load July's file, and then you don't have to worry about the data for the other months. Yeah, yeah, very cool summary. So that's the picture. That's the memory story. Those are some of the challenges you might hit, and some of the potential solutions that you might come up with. But at some point, you might just need to know: okay, this is about as good as it's going to get, but I still need to understand the memory better. Or, I'm running out of memory; why, and where? Or maybe you want to take the lazy approach. Maybe you want to start from: well, I know I have this problem of using too much memory.
I know one of these things that these guys talked about will possibly solve it. But where should I focus my attention? Right? I've got 1,000 lines of Python, and maybe only three need to be changed. Which are the three? So you probably want to profile it somehow to answer the question: where is the memory coming from? What's the problem? It's very difficult to optimize something if you can't measure it. Take the example we gave with functions keeping local variables, keeping things alive. Now that I know it's a problem I've encountered, I might be able to look for it. But at the time, I believe it was something like an extra 10 megabytes of RAM or something, and I don't think I ever would have spotted it just reading the code. Yeah, because it looks perfect. It's clean, it's readable. It's optimized exactly for the scenario that you optimize for most of the time. So it doesn't look broken. Yeah. If you want to understand why something is using too many resources, you need to measure it. I built a memory profiler called Fil, F-I-L, which is designed to solve this problem, because I had tried the other tools available and decided they weren't sufficient. Yeah. So Fil, I think, is really interesting. And at first, before it connected for me, I was like, well, we already have some pretty interesting ones. I mean, you've got the built-in cProfile, though I think that only does CPU profiling, not memory. We have memory_profiler, which will do memory profiling. Yeah.
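Editor's note: the streaming technique from this segment (yield items one at a time instead of returning a full list) can be compared against the list approach with a small measurement. This is a minimal sketch under stated assumptions: the record shape and function names are hypothetical, and the measurement uses the standard library's `tracemalloc`, which tracks Python-level allocations only. It is not Fil and not how Fil measures memory.

```python
import tracemalloc


def load_as_list(n):
    """Build every record up front and return the whole collection."""
    return [{"id": i, "value": i * 2} for i in range(n)]


def load_as_stream(n):
    """Yield one record at a time, so only the current record is alive."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def total_and_peak(make_records, n):
    """Sum the values while recording the peak traced allocation in bytes."""
    tracemalloc.start()
    total = sum(record["value"] for record in make_records(n))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak


list_total, list_peak = total_and_peak(load_as_list, 100_000)
stream_total, stream_peak = total_and_peak(load_as_stream, 100_000)
print(f"list peak:   {list_peak:,} bytes")
print(f"stream peak: {stream_peak:,} bytes")
```

Both versions compute the same total; the difference is that the list version holds every record at once, while the generator version holds one. This only works when the consumer can process records in order, as noted in the conversation.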

28:16 Yeah, we have Austin. Are you familiar with Austin? I've not used it. I've used Pyinstrument, and I know about py-spy. They're all sampling profilers, right? Right. And Austin's pretty interesting as well. But the way that you laid it out with Fil is that a lot of these profilers are either general purpose, or they're built around the idea of working on servers and long-lived processes that do short amounts of work many, many times, right, like a web server or something like that. And that's a pretty different scenario from: I have a script, I need to run it once, in order, and then look at where the memory comes from. Right? Yeah. So memory_profiler is the tool I ended up using when I was trying to reduce memory usage. memory_profiler will do this thing where, if you run a function, it says this function added N kilobytes of memory, or 100 megabytes of memory, or whatever. And if you're trying to find a memory leak, this is actually pretty useful. You can say, I called this function and now my memory is higher. Why? What happened? So you can figure out that this function is where your memory is leaking. But the thing that I was trying to do, and what the data processing applications you mentioned are trying to do, is reduce peak memory. The idea is that you're running this process: it's going to load in some data, it's going to process it, then it's going to write it out, and it's going to exit. And the peak memory usage is what determines how much hardware you need, or virtual hardware. It doesn't matter if 99% of the time it's only using 100 megabytes, if 1% of the time you need 60 gigabytes of RAM. It's that peak moment in time that you need to provision for. Yeah, yeah, the high-water mark. So yeah,

30:00 if you build a dam, you figure out the highest level you're going to get. And the thing about memory_profiler is that you can run it on a function, and it'll say this line of code added zero megabytes of RAM: measured before and measured after are the same, so no memory was added. It's

30:19 great, right? But it may be that it allocated 20 gigabytes of RAM did something and then deallocate it. And so you have to memory profiler like recursively, go through your whole codebase function by function, until you find that one line of code that's making things and so you can use it to figure out the peak memory and but it is a very manual tedious process. Yeah, and it's, and once your code base is hard enough, excuse me, it can become quite difficult. And another big distinction between servers and data pipelines is how much you care about memory leaks. As long as it's a small memory leak, like if you're doing like a process that runs for an hour and a leak the hundred k like after like an hour, just exit if you have for leaking 100 k an hour, but your process, you have like 10 processes and they're running for a year 100 k may not be a problem. But like there's some threshold where for a server, it could accumulate your server crashes. And for a batch process so long, it's not impacting the peak, you don't care, I imagine you leak only one kilobyte of memory. But it's in the context of a web request, and you're getting 100,000 web requests an hour, all of a sudden, your servers toast, right? Whereas if you call the function once, and you leak, a kilobyte, and you're doing like a top to bottom run at once data pipeline, who cares? Right, it's, it's a lost in the void there. So I think also, just the focus of what you care about, is really different. You don't generally have these huge spikes in server type application. You can if you're doing like reporting or other weird stuff, but like standard data driven stuff, it's it's pretty flatline. Yeah. And it turns out that if you think about it, a memory leak, a tool that can find peak memory can also find memory leaks. Because if you have a memory leak, peak memory is always like right now. Yeah, it was around for a while. 
And eventually your memory usage is overwhelmed by the leak. And then you dump the memory at that point, and that moment is peak memory. So a tool that can find peak memory can deal with leaks, but a tool that deals with leaks can't necessarily help you with peak memory. It's actually a more general concept.
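
To make the before/after blind spot concrete, here is a minimal sketch using the standard library's `tracemalloc`. The function name `transient_spike` and the 50 MB size are made up for illustration, and `tracemalloc` only sees allocations made through Python's own memory APIs, so this demonstrates the idea rather than how Fil or memory_profiler work internally.

```python
import tracemalloc


def transient_spike(n_bytes):
    """Allocate a large buffer, use it briefly, and free it before returning."""
    buf = bytearray(n_bytes)
    checksum = buf[0] + buf[-1]
    del buf  # the memory is released before the function returns
    return checksum


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
transient_spike(50 * 1024 * 1024)  # hypothetical 50 MB temporary allocation
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# A before/after comparison sees almost nothing; the peak tells the real story.
net_growth = after - before
print(f"net growth: {net_growth} bytes, peak: {peak} bytes")
```

The net growth is close to zero even though the process briefly needed the full 50 MB, which is exactly the case where a line-by-line before/after measurement reports that no memory was added.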

32:33 Talk Python to Me is partially supported by our training courses. How does your team keep their Python skills sharp? How do you make sure new hires get started fast and learn the Pythonic way? If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up. At Talk Python Training, we have enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting and monitoring, or ditch that unused subscription for our course bundles, which include all the courses, and you pay about the same price as a subscription, once. For details, visit training dot talk slash business or just email sales at talk

33:17 Another thing I like to do is relate quantum mechanics back to programming ideas. And I think they're really relevant in both profiling and debugging. The idea I'm thinking of is the observer effect: that by observing some phenomenon, you might actually change it, right? Maybe the tool you're using to measure it actually makes some difference. Or in quantum mechanics, just insane, bizarre observer-effect things happen that, again, seem like they shouldn't, but they do. One of the challenges I see around profiling, especially with instrumenting-style profilers, is you run it because your code is too slow and you want to understand the performance. So you apply the profiler to it. Now it's 10 times slower, or 20 times slower, but not evenly, right? Like, if it's in a really tight loop, that part slows down more than if you're calling a C function, where you're not technically profiling that part, so it might not really slow down at all. So you might exaggerate different parts of it as well. And it sounds to me like Fil doesn't have much of this observer problem. Yeah. So the observer problems tend to be worse in CPU profiling, because, as you said, the act of profiling can change how fast the process runs, or which parts of the code run faster. So cProfile suffers from this, because it's adding overhead per Python function call. And so code that has a lot of Python function calls will be slower than code that has fewer Python function calls, even if the actual runtime is the same. So that adds overhead. And the solution to overhead in CPU profilers is sampling: only, like, a thousand times a second, you see what's running right now. And I believe Austin works that way, and py-spy, I think, right?

35:00 It's more like a helicopter parent: what are you doing? What are you doing? What are you doing? Instead of actually walking along every step, it's just constantly asking. Yeah, and so then it gets a chance to run at full speed when it's not being asked. Yeah, the impact is quite minimal. And because slower functions will show up more often, even though you're just peeking every once in a while, it statistically converges, and you get an overview of where the time is being spent. It isn't exactly right, but it's close enough that it doesn't matter that it's not exact. Memory sampling might work well for someone with a memory leak. Because with a memory leak, eventually all your memory usage is this one function being called over and over. So if you only check some of the time, eventually you'll catch it. But if you care about the peak, maybe you don't have to capture all the allocations, but you may have one specific allocation that's like 20 gigabytes that's causing the peak. And if your sampling doesn't catch it, then the profiling is useless. And so, effectively, one way or another, you have to track every memory allocation if you actually want to find the peak memory. And so, whereas sampling is a superior approach for CPU, if you care about the high-water mark or peak memory, instrumentation is often the only way to go if you have uneven allocation patterns, which is the case in data processing applications, right? Yeah. And it sounds like maybe a 50% speed hit is what the docs say; that doesn't sound too bad. Yeah, I mean, it varies. It's slower in some cases and faster in others, is what, like, if you run pipes down? Yeah. It's not like 1,000% or something like that, right? Yeah. And, basically, once your profiler is slow enough, people just don't use it, because they don't have the patience. Yeah.
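
The difference between sampling and tracking every allocation can be illustrated with a toy simulation. This is only a sketch under simple assumptions: the event list, the sizes (in megabytes), and the function names are invented, and a real sampling profiler samples by time or bytes rather than by event count.

```python
def peak_by_tracking(events):
    """Follow every allocation (+) and free (-) and record the high-water mark."""
    current = peak = 0
    for delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def peak_by_sampling(events, every):
    """Only look at memory usage after every `every`-th event."""
    current = peak = 0
    for i, delta in enumerate(events, start=1):
        current += delta
        if i % every == 0:
            peak = max(peak, current)
    return peak


# Hypothetical allocation pattern in MB: one short-lived 20 GB spike.
events = [10, 10, 20_000, -20_000, 10, 10, 10, 10]

tracked = peak_by_tracking(events)
sampled = peak_by_sampling(events, every=4)
print(tracked, sampled)
```

With this pattern the sampled view never sees the spike, because the large buffer is freed before the next check, while tracking every event reports it. A steady leak, by contrast, would eventually show up in either approach.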
So a lot of the effort I put in... like, the basic idea of what it does is not that sophisticated. Basically, you intercept all the allocations, you keep track, and then whenever you hit a new peak, you store a copy of that. You know, that's the peak. It's just, yeah, every bit of overhead there takes work, right? Absolutely. So one of the challenges is that the reason you're using the profiler is probably because you have a lot of data, and you built it in some small scenario, and then you run it in the real scenario, and it's actually not doing as well as you'd hoped. That's exactly when you need to be able to run it with the profiler. And you need it to work fast, I guess is what I'm saying, to really use it in real scenarios. Yeah, and another thing I've done to handle that... and this is a new project, so this is all work in progress. But I know I've gotten this one success story of someone saying that within minutes they found a memory issue they wouldn't have found otherwise. So I know it's useful for some people, and other people have bugs. But another feature that I've added: the worst-case scenario for running out of memory is your program crashes. And this can be as bad as your computer just freezing altogether, which is not uncommon. Like, everything becomes so utterly slow that, yes, if you left it alone for a day and came back, it would barely have gotten started. Or it just crashes. You can do a core dump, and in theory it has the information you want, but in practice... that's a whole other level right there. Yeah. Or no, it actually does not have the information you want. So another feature I've added is that Fil makes some attempts to handle out-of-memory crashes. So if you run out of memory, it'll say, like, okay, you just got a failed allocation.
So I'm going to try to deallocate all the large allocations that I know about, just to free up some memory. And it has this emergency stash, like 16 megabytes, that it just allocates up front, and when it breaks the glass, it deallocates that memory, so there's a bit more room. It lets it go and then starts tearing stuff down as hard as it can. Yeah. And then it tries to dump a report of, like, this was the memory usage when you ran out. And it won't always work, and I suspect it needs a bunch more work, like a bunch of optimization so that dumping the report from Fil takes less memory. But the idea, my goal at least, is that when you run out of memory, instead of just a crash, you'll actually get some feedback that will help you diagnose it. Yeah, that's really, really cool. I don't know how CPython... cProfile, excuse me, I don't know exactly how deep its reach is. But in cProfile, if I'm trying to look at, say, data science stuff, and I'm calling a library and it's using its internal malloc and its internal C stuff to manage the memory down at the C layer, I don't know that cProfile will check that, you know, if it's doing crazy Fortran stuff or other allocations. Who knows. cProfile, I mean, it's getting your CPU, but... yeah, sorry, memory_profiler, the one that does memory.

40:00 Yeah, so Python actually has a memory profiler, tracemalloc. But it only knows about allocations made through Python's memory APIs. So if you're using an arbitrary C++ library, it won't know about it. Which is common in the data science world, right? That's exactly where a lot of the action is. Yeah, yeah. memory_profiler has a bunch of different ways it can work. But the most general way it works is, at the beginning of a line of code and at the end of a line of code, it checks how much memory the process is using. And so it'll work with any allocation. But it has the other downsides that we talked about earlier. So the reason I was using memory_profiler was because it can actually catch any allocation from any C library,

40:44 painfully, for purposes of reducing memory usage, for sure. And so my goal with Fil was to not be tied just to Python code allocations, and to be able to generically support anything that any third-party library is using. Which is somewhat tricky, the way it's implemented, because there's, I don't know, a dozen different ways you can allocate memory in a program, and more if you add support for Windows. So yeah, there's malloc. And then there's mmap. And then there's posix_memalign. And C++ has aligned_alloc. And Linux has memfd_create, where you can create files in memory. And with mmap you can sort of load files into the memory of your process, and then the memory usage isn't even clear, because the operating system will cleverly load and unload stuff from disk on demand. And so it is affecting how much memory you use, but then the OS will sort of optimize it for you, so it's not clear how to measure it. So if you want to track everything, there's a lot of ways, and I don't do all of them quite yet. But I've been sort of adding them one by one and hope to cover the

41:56 vast majority of cases pretty soon. Yeah, but you cover some of these at least already? Yeah. Yeah, it covers basic mmap usage, malloc, calloc, realloc, the standard APIs. I added aligned_alloc, which is C++, apparently, at least in some cases. Fortran: I've never done anything with Fortran, I just know that it's the thing that scientific computing uses. And so I said, okay, I'm going to figure out if Fortran is covered by this. And it turns out that traditionally Fortran never actually had memory allocation; you would just write some code, and you'd say, I'm going to have this array, and that's all you ever got. But modern Fortran, from 1990 onwards, has explicit allocation, and Fil can at least capture that if you use GCC's Fortran compiler. And so the idea is, you should be able to just take arbitrary data processing or scientific computing code, and it will figure out those allocations. It won't tell you which line of Fortran or which line of C is responsible, because, like, there are tools that do that, but the performance overhead is immense. But it will tell you which lines of Python were responsible, and much of the time that's sufficient, right? And as a Python developer, really, that's kind of the answer you want. You don't want to know that this internal part of NumPy did it; you just want to know, I called, you know, load CSV on pandas or something, and that's where the memory is. Yeah, or something, right? You want to see the boundary into that library, because that's what you control. You're not going to go rewrite pandas or NumPy. Yeah. And so yeah, the goal of Fil is to tell you where in your Python code the memory usage was, and to tell you that in a very easy-to-understand way, which was another one of my goals. Yeah. So do you want to tell people about, and maybe describe, the flame graphs that they can see and explore? Yeah.
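
The bookkeeping described a few minutes earlier (intercept every allocation, keep a running total, and store a copy whenever a new peak is reached), combined with attributing memory to the Python line that triggered it, might look roughly like the toy below. Everything here is hypothetical: the `PeakTracker` class, the fake addresses, and the `file:line` call-site strings are illustrative only, and Fil does this work in native code rather than in Python.

```python
class PeakTracker:
    """Toy bookkeeping: track live allocations, snapshot them at each new peak."""

    def __init__(self):
        self.live = {}  # address -> (size, Python call site)
        self.current = 0
        self.peak = 0
        self.peak_by_callsite = {}

    def on_alloc(self, address, size, callsite):
        self.live[address] = (size, callsite)
        self.current += size
        if self.current > self.peak:
            # New high-water mark: remember who was holding memory right now.
            self.peak = self.current
            self.peak_by_callsite = self._summarize()

    def on_free(self, address):
        size, _ = self.live.pop(address)
        self.current -= size

    def _summarize(self):
        totals = {}
        for size, callsite in self.live.values():
            totals[callsite] = totals.get(callsite, 0) + size
        return totals


tracker = PeakTracker()
tracker.on_alloc(0x1000, 100, "load.py:10")
tracker.on_alloc(0x2000, 5000, "transform.py:42")  # short-lived spike
tracker.on_free(0x2000)
tracker.on_alloc(0x3000, 200, "report.py:7")
print(tracker.peak, tracker.peak_by_callsite)
```

Note that the report reflects the moment of the peak, not the end of the run: `report.py:7` is absent even though it is the most recent allocation. Re-summarizing on every new peak, as this sketch does, would be far too slow for a real tool, which is part of why keeping the overhead down takes work.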
And maybe we can link to one in the show notes. So, flame graphs: I think Brendan Gregg came up with the idea. And yeah, it's sort of showing you, you know, at any point your program has a call stack: function f calls function g, which calls function h. That's sort of a stack. And so you can draw these bars where the wider they are, the more of the resource they're using. Brendan Gregg used this for CPU; I'm using it for memory. And so if you have a bar that's 100% of the width of the screen, this function, or the functions it calls, is using all your memory. If it's narrower, it's using less memory. And then I've arranged it in a way that actually includes the source code, so what you're reading looks like a stack trace. It looks like something threw an exception, and you're just reading it, but the width of the bar shows you which lines of code were responsible for how much memory, cumulatively. I also added some stuff... it's built on a Rust library called Inferno, which is great, which does much of the heavy lifting, but I added a feature to Inferno where the wider the bar, the more memory it's using, the redder it is. And so the idea is you just look at the graph and you can just see: where it's red is where... that's the, that's

45:00 what you've got to focus on, right? Yeah. And so your eye just naturally focuses on the expensive parts of the code. And then what you're reading is, et cetera. And these are cool: you can embed these into web pages, and then you can hover over them and click and zoom into the functions and really explore it quick and easy, right? Yeah. Brendan Gregg originally wrote the Perl scripts that convert data into these SVGs, and then the Inferno library ported that to Rust, and I'm using that. So they did much of the work; I'm just building on top of it, mostly adding some small features. It's nice, this whole UI for exploring. And to use it is super simple. Like, if you were going to run python, space, your app.py with its arguments, you would just replace python with fil-profile, space, run. And that's it, right, and you get this output. Yeah, my goal was also no options. People don't run memory profiling every day; it's not a tool you want to tweak and customize to your own personal needs, or that you want to spend a lot of time learning. So another of my goals is that it should just work. So at the moment it has one command-line option, and you don't really need to set it or think about it. And then the output is an HTML page that has the graphs embedded and has some explanations. And so the goal is, as much as possible, to make it transparent and easy to use. And I have some further ideas of how to improve the UX, which I haven't gotten to yet. Nice. So as a data scientist or a scientific computing person who is not necessarily a programmer, I could just drop in here: pip install Fil, fil-profile run my thing where normally I would just say python. And that's all I've really got to know. And then I get a web page. It opens the web page automatically.
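
The flame graph idea described above (one bar per call stack, with width proportional to memory) can be sketched as follows. Flame graph tools such as Brendan Gregg's scripts and Inferno take "folded" input lines, where frames are joined by semicolons and followed by a number; the sample data and function names here are invented, and this is not a claim about how Fil passes its data to Inferno.

```python
from collections import defaultdict


def fold_stacks(allocations):
    """Sum bytes per unique call stack and emit 'f;g;h bytes' lines."""
    totals = defaultdict(int)
    for stack, size in allocations:
        totals[";".join(stack)] += size
    return [f"{stack} {size}" for stack, size in sorted(totals.items())]


def cumulative_width(allocations, frame):
    """Bytes attributed to a frame, including everything it calls."""
    return sum(size for stack, size in allocations if frame in stack)


# Hypothetical allocations: (call stack from outermost to innermost, bytes).
allocations = [
    (("main", "load_data", "read_csv"), 800),
    (("main", "load_data", "read_csv"), 200),
    (("main", "summarize"), 50),
]

folded = fold_stacks(allocations)
for line in folded:
    print(line)
```

In the rendered graph, `main` would span the full width (1050 bytes), `load_data` would sit on top of it covering most of that width (1000 bytes), and `summarize` would be a thin sliver, which is why the eye lands on the expensive path first.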

46:47 So you don't even have to know if you're going to run out. And yeah, the goal is: you run it, it pops up a web page, you read the web page, and you have the answer. Yeah, what's using it, where the memory is going. You spoke about one of the cool features being the out-of-memory catch and analysis. And you've got to do a slightly different thing on the command line to make that work, right? Yeah. The issue is, and this is a thing I can probably fix eventually, it's just sort of a limit of my implementation: the code that generates the report right now is in Python. And if you've just run out of memory, you can't go back into Python at that point. Yeah. So if you run out of memory, the experience isn't quite as nice. Eventually, if it reaches the point where I'm not iterating on it as quickly, I might rewrite that in Rust. And at that point it might be feasible to actually have a fully nice UI for crashes. Right. Okay, cool. Now also, currently it runs on POSIX, Linux and macOS only, right? Yeah. I'm not sure it'd run on anything other than those. Like, if you ran this on FreeBSD, my guess is... yeah. Yeah, I don't think data scientists are using much FreeBSD. And macOS was added fairly recently. And someday I would like to add Windows, but there's a lot of dealing with linkers and fairly low-level details that I don't know as much about on Windows. So it should be possible. I've seen things that make me think that it is possible. It's just a chunk of work I haven't done yet, because it's hard. Sorry. Yeah, you've either got to get it working, or

48:32 you'll be disappointed. And that's a lot of work? So yeah. Yeah, I'm sure it is. So I actually think that maybe you don't have to worry too much about Windows. And that's not to say that people don't use Windows. Windows is used by, like, half of Python developers, and it's probably pretty heavy in the data science world as well. But, you know, Windows 10 now has Windows Subsystem for Linux, and version 2 is quite nice. So it's very possible you can just point people at that: you know, you have to use Windows Subsystem for Linux. It would probably work, because it's all APIs that I would expect are emulated fairly faithfully. Yeah. I think it's just

49:10 an Ubuntu virtual machine. So I don't think you have to do anything, is my impression. Well, the original one was rather more sophisticated. Like, there was something about translating syscalls, I don't know. But with version 2, there's a decent chance it'll work just fine. And so yeah, I'll put a link to Chris Moffitt's article on using WSL to build a Python development environment on Windows. And maybe that'll help people in general, and maybe this will work. I don't know, give it a try. Cool. And then also, you know, it's one thing to just say, well, too bad, that didn't work.

49:43 It's a lot better to say, and here are some ideas for making it better. So you have a couple of recommendations for data scientists on how to be more efficient with their code and their memory. So I talked earlier about batching, indexing and compression, and actually gave a

50:00 was supposed to give a talk at PyCon about that this year. I mean, there's a recording of it; I didn't give it live. And there's a series of articles here that talk about those ideas and then show how to apply them in NumPy and how to apply them in pandas. And I've started writing some articles about Python-level issues, like

50:20 what we talked about with function calls, and just ways to structure code to reduce memory usage. So there's a bunch of articles there already, and I'm adding more over time, with the techniques you need, once you figure out where the problem is, to reduce the memory usage. Right, right. Yeah, I just saw your video. I didn't realize it was there, so I haven't watched it yet. I'll put a link to it in the show notes so people can watch your virtual PyCon talk. Yeah.

50:45 I've been going to PyCon for a very long time, and so it's very sad not being able to see friends at least once a year. I know, PyCon is like my geek holiday. You know, you just get out there and hang out with a lot of friends that I otherwise only interact with online, and it's really special. It's too bad it didn't happen this year. Yeah. Someday. Yeah, someday it'll be back. Like everything. All right. Well, these are really interesting ideas. I think covering them in general was good. And Fil is a cool project, so I think it'll help some people out there who are having challenges. Maybe their code is using too much memory, and it swaps and becomes insanely slow. Or they just couldn't process the data they wanted, because it didn't work. So they can hit it with this, use some of your recommendations, and maybe unlock some answers. Yeah. I should add, this is a very new project. And so, like, I know one person for whom it worked great, but I also know one person for whom it just wildly misreported the memory usage. Okay. He's hoping to send me a reproducer later this week so I can fix it. Also, if it doesn't work, I very much encourage you to file a bug report and let me know. I'm happy to do a screen-sharing session with people, because I want this to be a tool that works for people. And so if it's not working, I want to help. And it's at an early enough stage that I expect there's still a bunch of major issues, even if it does actually work in some cases. So please try it. It might just work. And if it doesn't, please let me know, and I'll do my best to help. Yeah, very cool. And speaking of which, you know, people have been asking me recently: hey, I'm looking for an open source project to contribute to; do you have any recommendations on ones I might look at or consider contributing to? What's the story there? Are you looking for people who might participate? I would be happy to accept contributions.
Some parts of it... there's a lot of fun stuff in there. Like, in terms of low-level systems programming, there's a bunch of Rust and a bunch of C code and poking into the internals of CPython. If that is the thing that interests you, there's a bunch of work there. There's also a bunch of UI things that could be done. Like, if you think about profiling, the real usage pattern should really be: profile this program, try to fix it, and then say, profile this again and show me the difference. And then you can have a visualization of the differences. That is my eventual goal, to have a user experience that's not just what I have now, but actually shows you if things are better or worse, and where. And so if people are interested in that sort of UX kind of work, it's there. What about building tutorials and stuff like that? Yeah, I mean, in general, I'm excited to see people contribute. But it's also, at the same time,

53:34 some low-level stuff, right? You will hit these places where it's like, I'm poking into the internals and causing slight memory leaks internally in CPython, for optimization purposes, things like that.

53:49 Because you want to be able to refer to pointers... like, there's a bunch of work in order to not have a lot of overhead when you record a new allocation. And you want to be able to keep a pointer address in the Python interpreter as a persistent key, which means you have to make sure things aren't garbage collected. Yeah, so that makes sense. Yeah, I can imagine at the low level this is a beast. Yeah, the debugging can be tricky. But it's a lot of fun. And I find it sort of a therapeutic project, because it's tricky and difficult, but it's also a closed universe. Let's say you're doing web development or distributed systems: you're talking to remote services, you spin up five processes, and you're dependent on a whole external world to make anything work these days. Whereas this is sort of, it's a program, it runs on your computer, it reads some data and writes some data. There's no outside world. Yeah, that's cool. So you can just stay focused on the problem at hand and not the fact that, like, GitHub is down, or whatever. Yeah, I've been there. All right, before I let you out of here, though, let me ask you the final few questions. If you're going to write some Python code, what editor do you use? I use Spacemacs,

55:00 which is a configuration of Emacs that makes Emacs a lot more like a modern IDE. Nice. Okay. It makes the Emacs experience, like, jump 20 years forward, just by installing and configuring the right packages. Cool. And a notable PyPI package? pip install... is it Fil or fil-profile? What do I pip install? It's filprofiler, no dash. Right. So it ends in F-I-L-E-R. Yeah, that's an obvious one. What's another one that maybe you've come across recently, and you're like, oh, this is really cool, people should know about it? Ah... I guess I'd mention Austin, though I don't know quite as much about it. But py-spy is another one; it's another sampling profiler. And it's another kind of systems-programming package, where it's doing these interesting things in Rust. It looks at the memory layout of your Python program, parses out the data structures, and reads things out. So it's another sort of very intense systems programming, which ideally is all hidden behind the scenes, and it just gives you some really useful results. All right, that's a good one. Yeah, I'll have to check it out and try that one. All right, final call to action. People want to get started with Fil; what do they do? Go to pythonspeed.com/products/profiler. I may have the URL wrong. I should probably get a shorter URL. Excuse me: Google F-I-L, space, profiler. Or you can put a link in the show notes. Yeah, I'll definitely have a link in the show notes, no doubt. Yeah, Googling Fil, space, profiler works for me. Or you can go to pythonspeed.com, and that links to it and other stuff I've written. All right, very cool. I'll also include the link to your virtual PyCon talk as well, so people can check that out. Cool. All right. Thanks for having me. This has been another episode of Talk Python to Me. Our guest on this episode was Itamar Turner-Trauring, and it's been brought to you by Linode and us over at Talk Python Training.
Start your next Python project on Linode's state-of-the-art cloud service. Just visit talk slash linode, L-I-N-O-D-E. You'll automatically get a $20 credit when you create a new account. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show: open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play, and the direct RSS feed at slash RSS on talk. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
