
#312: Python Apps that Scale to Billions of Users Transcript

Recorded on Thursday, Apr 8, 2021.

00:00 How do you build Python applications that can handle literally billions of requests? It certainly has been done to great success at places like YouTube, handling a million requests a second, and Instagram, as well as internal pricing APIs at places like PayPal and other banks. While Python can be fast at some operations and slow at others, it's generally not so much about the language's raw performance as it is about building an architecture for that scale. That's why it's great to have Julien Danjou on this show. We'll dive into his book 'The Hacker's Guide to Scaling Python', as well as some of the performance work he's been doing over at Datadog. This is Talk Python To Me, Episode 312, recorded April 8, 2021.

00:52 Welcome to Talk Python, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm. And follow the show on Twitter via @talkpython. This episode is brought to you by 45Drives and us over at Talk Python Training. Please check out what we're offering during those segments; it really helps support the show. Julien, welcome to Talk Python To Me.

01:19 Thank you. It's great to have you here. We've got a bunch of fun stuff to talk about. It's really interesting to think about how we go about building software at scale. And one of the things that, I don't know how you feel about it, but reading your book, I feel like you must have some opinions on this: when I go to a website that is clearly not a small little company, it's obviously a large company with money to put behind it and professional developers and stuff, and you click on it, and it takes four seconds for every page load. How is it possible that you're building this software that way, when this is the face of your business? And sometimes they decide to fix it with front-end frameworks, so then you get a quick splash of a box with a little UI, and then it says 'loading' for four seconds, which to me feels no better. So I don't know, I feel like building scalable software is really important, and still people are getting it wrong quite often.

02:11 Yeah. I mean, there are a lot of things you want to do when you do that, like writing proper code, for sure. But you also want to be able to understand where the bottleneck might be, and that's not the easy part. Writing code and fixing bugs, we all know how to do that. But if I ask you to optimize, well, that's one of the things I usually use as an example when I talk about profiling: if I were to ask you tomorrow, I want you to tell me which part of your code is using 20% of the CPU, you really don't know. You can guess, and you can probably make a good guess most of the time, but for real, you don't know. You have no clue until you actually look at the data, use a profiler or any tool that will give you this information.

02:59 Yeah, we're really bad at using our intuition for those things. I remember the most extreme example I ever heard of this: I was working on this project that was doing a huge amount of math, wavelet decomposition, kind of like Fourier analysis, but I think kind of worse. And I thought, okay, this is too slow, it must be in all this complicated math area, and I don't understand the math very well, and I don't want to change it, but it's got to be here, right? It's slow. And I put it into the profiler, and it turned out we were spending 80% of our time just finding the index of an element in a list. Yeah. Which is not a little

03:35 insane.
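(For illustration, here is a hypothetical reconstruction of that kind of hotspot. The sizes and function names are made up, but it shows cProfile pointing at a repeated list.index() scan, the sort of thing the dictionary fix mentioned a bit later replaces with a constant-time lookup.)

    # Hypothetical reconstruction of the anecdote: profile a loop that keeps
    # calling list.index(), and cProfile reports most of the time spent there.
    import cProfile

    def find_positions(values, queries):
        # O(n) scan per query; building {value: index} once and doing dict
        # lookups instead is the "switch to a dictionary" fix described below.
        return [values.index(q) for q in queries]

    values = list(range(50_000))
    queries = list(range(0, 50_000, 5))

    cProfile.run("find_positions(values, queries)", sort="cumulative")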

03:36 Yeah. My favorite programming quote is from Donald Knuth, which is 'premature optimization is the root of all evil.' Like, yeah, what do you know? Nothing, really. I quote it every week or so now.

03:48 Yeah, it's fantastic. It's fantastic. In my case, we switched it to a dictionary, and it went five times faster, and that was it. It was incredibly easy to fix, but understanding that that was where the problem was, I would have never guessed. So yeah, it's hard to understand. And we're gonna talk about finding these challenges, and also some of the design patterns. You've written a really cool book called 'The Hacker's Guide to Scaling Python', and we're gonna dive into some of the ideas you cover there. We'll also talk about some of your work at Datadog, where you're doing some of the profiling stuff, not necessarily for you internally, although I'm sure there's some, but also for many other people. You guys basically have profiling as a service and, you know, runtime analysis as a service, which is great, and we'll get into that. But before we do, let's start with your story. How did you get into programming in Python?

04:32 Oh, that's a good question. So I actually started, like, 15 years ago or so. I actually started with Perl as my first programming language, you know, a scripting language, like we used to call them at least a few years ago. I liked Perl, but I wanted to learn object-oriented programming, and I never understood object-oriented programming with Perl. It was so weird for me, probably because I was young. And I don't know, somebody talked to me about Python. I bought the book, like the

04:32 O'Reilly book about Python. And I kept it around for a year or so, because I had no project at all, like, no idea. Most of my job back then was to be a sysadmin, so not really anything to do with Python. And at some point I was working on Debian, the Linux distribution, and I was like, oh, I need to do something, like a new project, and I'm going to do that with Python. And I started to learn Python this way, with my project on one side and the book on the other side. I was like, that's amazing, I love it. And I never stopped doing Python after that.

05:30 Yeah, that's fantastic. It feels like it very much was an 'Automate the Boring Stuff' type of introduction: there are these little problems, and Bash is too small or too limited to solve them, so what else could I use? And Python was a good fit for that.

05:44 Yeah, that's a great way. I mean, I have had a lot of people coming to me over the years being like, I want to contribute to a project, I want to start something in Python, what should I do? And I'm like, I don't know, what's the problem you want to solve, right? Finding a boring thing you want to automate is the best idea you can have. If it's an open source project that exists already, great, good for you, it's even better. But I mean, just write a script or whatever you want to do, start hacking and learning. That's the best way: scratch your own itch.

06:12 Yeah, absolutely. It's so easy to think, well, I want to build this great big thing, but we all have these little problems that need solving, and it's good to start small and practice small and build up. I find it really valuable. People often ask me, oh, I want to get started, what should I do? Should I build a website, or maybe a machine learning thing? And I'm like, whoa. Yes, you definitely want to get there, but you're really, really just starting. Don't kill yourself by trying to take on too much at once. So yeah, it sounds like it worked well for you. How about now? What are you doing day to day? I hinted at Datadog.

06:41 Yeah, so I've been doing Python ever since. For about the next ten years after learning Python, I was working on OpenStack, which is a huge Python project implementing an open cloud system where you can host your own AWS, basically. Everything is in Python there. So I worked on a very large project, the largest Python project, I think, which is OpenStack, for a few years. And then I decided to go for a challenge, and I was looking into building a profiling team, building a profiler, a continuous profiler, which means you would not profile your script on your laptop, but you would profile your application running on your production system for real. It's not something I think anyone did before in Python, at least. So I wanted to do that, and that's what I started to do, like, two years ago, and I'm still doing it.

07:29 That's really interesting, because normally you have this quantum mechanics problem with profilers and debuggers, especially profilers like the line-by-line ones, where it runs at one speed normally, then you hit it with something like cProfile and it's five times slower, or whatever it turns out to be, and you're like, whoa, this is a lot slower. Hopefully it gives you just a factor of slowness over it, like if it says it spent 20% here and 40% there, hopefully that's still true at normal speed. But sometimes it really depends, right? Like if you're calling a function that goes out of your system, and that's 20%, and then you're doing a really tight loop with lots of code, the profiler will introduce more overhead in your tight loop part than it will in the external system, where it adds basically zero overhead. And so that's a big challenge of understanding profiling results in general, and it's a really big reason to not just run the profiler constantly in production, right?

08:28 Yeah, exactly. And people do that, I mean, if you have the right profiler. We can dig a bit into the way cProfile works: it's going to intercept everything. It's what we call a deterministic profiler, where if you run the same program twice, you will get the same profile for sure, because it's intercepting all the function calls that you have. So if you have a ton of function calls, it makes things, like you were saying, five times slower, for sure, at least. Yeah, and it'll inject little bits of bytecode at the beginning and end of every function, all sorts of stuff, and it actually changes what happens, right? Yeah, exactly, so it can change the timing it gets. I mean, it's a good solution to get a ballpark estimate of what's going on, and it gives you pretty good results. Usually it's a good tool; I used it a lot over the years, and it always gave me good information. But you can't use it in production because it's too slow. It's also not providing fine-grained information: it gives you the wall time that you use, but not, say, the CPU time for each of your threads, etc. So the information is a rough overall picture. It's probably not streaming either, right? It runs, then I think it gives you the answer. Exactly, it's not some sort of real-time stream of what's happening. So, I mean, one of the cases I was mentioning previously: if you can recreate the problem in a one-minute script or something, you know it's slow and it should take only 40 seconds, you can run cProfile around it on your laptop and say, okay, I'm going to trim out this piece. But if you want to see what's happening in production, with a real workload, for real, and, like you were saying, stream the data to see in real time what's going on, cProfile doesn't fit. And also, any deterministic profiler, which tries to catch everything your program does, will not work with good performance. So you have to take another approach, which is what most profilers for continuous profiling do, which is statistical profiling, where you actually sample your program and you look at what it does most of the time. So it's not a true representation, it's not the reality 100%, it's a good statistical approach of what your program is doing most of the time. I see, is that more the sampling style of profiler, where it's like, every 200 milliseconds: what are you doing now? What are you doing now? Exactly. Like a really annoying young child, what are you doing now? And it's gonna miss some things, right? If there's a function you call and it's really quick, it's like, well, you never called that function as far as the profiler is concerned, because it just didn't line up. But if you do it enough over time, you'll get to see a good picture. Exactly. You don't care about the stuff that happens really fast; what you care about is the stuff that happens really slow, and those are going to show up pretty largely in these sorts of samples. Exactly. With cProfile you will see those very small function calls, because it catches everything, but in reality, for the purpose of optimizing your program, you actually don't care. You don't see them statistically, but that's because they're not important.
So that's not what you want to optimize; that's not where your problem lies, probably. It's the big outliers, the ones that you see often in your profile. The one you see, like, 80% of the time when the profiler asks your program 'what are you doing?', that unfortunate function being called, that's the one you want to look at. Yeah.
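(A rough sketch of the statistical approach Julien describes: periodically ask CPython for every thread's current stack instead of intercepting every call. The interval, counters, and workload below are illustrative only, not Datadog's actual implementation.)

    # Sample every thread's stack on an interval; frequently seen stacks
    # dominate the counter, rarely seen ones may never show up at all.
    import collections
    import sys
    import threading
    import time
    import traceback

    samples = collections.Counter()

    def sample_stacks(interval=0.01, duration=2.0):
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            for thread_id, frame in sys._current_frames().items():
                samples["".join(traceback.format_stack(frame))] += 1
            time.sleep(interval)

    sampler = threading.Thread(target=sample_stacks, daemon=True)
    sampler.start()
    sum(i * i for i in range(20_000_000))   # some CPU-bound work to observe
    sampler.join()

    for stack, count in samples.most_common(2):
        print(f"{count} samples:\n{stack}")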

11:38 So I think that fits quite well with the production style. I know I was going to ask you about your book, but we're sort of down in this profiling story, and that's fine. You know, I've used Datadog's tools for error handling, like exceptions, 'let me know when there's an error' type thing. So I have that set up on the Talk Python podcast site and the Talk Python Training courses site. And of course, when you turn it on, you get all these errors that are happening in the site but nobody's complained about, so you didn't realize there was some edge case or whatever. It is really fantastic. But something I've never looked at is the real-time profiling stuff that you're working on these days. So maybe, I have no idea what this is like, I can imagine maybe what it's like, but give me a sense: what kind of stuff do I get out of it?

12:21 Sure. Yeah. So the first thing you'll get is two profiles. You get flame graphs, essentially, which are these kinds of charts that look like flames, usually, because they are orange and red and going up and down, the height being the depth of your stack trace, and the width being the percent of time or resources that you use. Usually it's time you're going to measure. For example, we measure wall time, which shows whether your function is using a lot of wall time, whether it's waiting for something: waiting for a socket to be read, for a lock to be acquired. But another profile we gather is how much CPU you're actually using. So if you want to know if your program is CPU bound, you will see which function is actually using the most CPU in your program. Right, because I could go to my hosting provider, and I could check a box and say, no, I don't want to pay $20 a month, I want to pay $50 a month to make this go two and a half times faster. If I'm CPU bound, that might actually work. But if I'm not, it probably has no effect, or a small effect, right? Exactly. This portion of Talk Python To Me is brought to you by 45Drives. 45Drives offers the only enterprise data storage servers powered by open source. They build their solutions with off-the-shelf hardware and use software-defined open source designs that are unmatched in price and flexibility. The open source solutions 45Drives uses are powerful, robust, and completely supported from end to end. And best of all, they come with zero software licensing fees and no vendor lock-in. 45Drives offers servers ranging from 4 to 60 bays and can guide your organization through any sized data storage challenge. Check out what they have to offer over at talkpython.fm/45drives. If you get in touch with them and say you heard about their offer from us, you'll get a chance to win a custom front plate. So visit talkpython.fm/45drives or just click the link in your podcast player. So knowing that answer would be really helpful: can I scale this vertically, or do I have to change something else? Yeah, that's the value of profiling. Most of our users, when they come to us, are like, we saved thousands of dollars because we actually understood where the bottleneck was, and we were able to downsize our deployment because we optimized this function, or we understood that this was blocked by this IO or whatever. When you understand all of that with profiling, whatever the language is, by the way, it being Python or Java or anything, you actually save a lot. We have terrific stories of our customers or internal users saving thousands of dollars just because they were able to understand what was going on in their program, and scaling up was not the solution; optimizing the right function was the solution. So you'll get CPU and wall time profiles, and we also do memory profiling. So you will see all of the memory allocations that are done by Python, which is kind of tied to the CPU usage: the more objects you allocate, and when I say allocate, I mean even if they don't stay around, if you create a new string or a new object or whatever, even for a few seconds or milliseconds, it costs memory. You have to call 'malloc()' under the hood, you have to allocate memory, which takes time.
So you will see that. If you create objects that are never used, for example, you might want to see that. And what we shipped two weeks ago is a heap profiler, where you actually see a sample of your heap, like the memory you use, in real time, and what has been allocated on the heap. And can you tell me how many of each type of object I have? Like, you've got 20 megs in lists, you've got 10 megs in strings? No. I mean, in theory, yes; in practice, no. And I'm actually fighting upstream with the CPython folks to be able to do that. So there's a limitation in CPython right now? Technically, we can't really do that. I'm able to give you the line number, the file, the function name and the thread, and I also look at the memory size, but I wish I could know the class name, like you would get for Java, and I want to add that for Python, so maybe next year. But if you have a memory leak, for example, which is quite common, right, where you keep adding more objects on top of each other, at some point your memory grows forever, and you don't know where the objects come from. With such a tool, a profiler, you're able to see which stack trace keeps adding more and more and more memory forever. It won't give you the solution to your problem, but it will tell you where to look, which usually is still pretty good. Yeah, that's like 90% of the problem. Can you talk a little bit about the internals of how this works? I'm guessing it's not using cProfile? No, not directly. Is it using other open source things to

16:56 sort of put this service together, or is it not? So everything is open source, if you want to look at it; it's on our Datadog repository on GitHub. The way it works for the CPU and wall time profiler is pretty easy, a lot of people know about it: you can actually ask

CPython to give you the list of running threads. So if you do that 100 times per second, you get the list of running threads, and you can get the stack trace, like the function name and line number, and over time get a pretty good picture of what your program's threads are doing most of the time. So it works, it's pretty easy. Then there are a few tricks to get the CPU time, etc., using different APIs, but that's most of it. And for memory, there's actually a good thing that has been done by a friend, Victor Stinner, who is one of the CPython core developers; he's done a great amount of performance improvement, like really important stuff. Yeah. And one of the things he did, it was a long time ago, is to add this module tracemalloc, which we don't use. I mean, I actually built on top of it at some point, but we don't use it anymore; we built a lightweight version of it. But it opened the memory API of CPython, where you can actually plug your own memory allocator into CPython. And that's what we do with our profiler: we replace the memory allocator with, well, a tiny wrapper that catches every allocation and then does profiling on top of it. Right, exactly. So when it says allocate this, you say, record that this was allocated, and then allocate it, right, something like that. Yeah. Is this the thing you were talking about on the screen, that's ddtrace-py, or is it? Yeah, exactly. You have the profiling directory in there with all the goodies, so you can take a look at the way it works internally. Yeah, I mean, the way we built it is to be easy to ship and deploy: you don't require any extra permission. There are a lot of different ways of doing profiling using, for example, Linux capabilities; there are a lot of things that are external and not necessarily portable outside Linux. But the problem is that most of them require extra permission, like being root, or using APIs that need extra permission, which is not great. I mean, the resolution may be better technically, on some points, compared to what we do there, but they are very complicated to deploy. So that was the driver, I think, for writing this. Right, so a simple pip install, plug in a line or two, and off you go. Right, exactly. I mean, it's pretty simple. And for exporting the data, we use the pprof format from Google, which is pretty standard. So you can actually use this profiler even if you're not a Datadog customer: if you want to give it a try, you can export the data to a pprof file and look at it, without all the analytics that we provide and the fancy flame graphs with their rainbow colors, but you can use the pprof tooling, which is pretty okay. Oh, interesting. So you can get basically the raw data out of it just by using it directly; it's just that you guys provide the nice gathering. Yeah, exactly, you have to store the files yourself. Exactly. We provide the streaming, and we provide a ton of the analysis. But if you are curious and want to take a look at how it works and what it can provide, it's a good way to do it too.
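(Julien mentions building on Victor Stinner's tracemalloc before moving to a custom allocator hook. As a taste of the idea, here is a minimal standard-library sketch; the fake leak and the frame count are made up, and Datadog's profiler works at a lower level than this.)

    # Track which lines allocated the memory currently held, via tracemalloc.
    import tracemalloc

    tracemalloc.start(10)                    # keep up to 10 frames per allocation

    leaky = []
    for _ in range(100_000):
        leaky.append("x" * 100)              # simulate memory that keeps growing

    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:3]:
        print(stat)                          # top allocation sites by size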
Alright. I also do want to dig into your scaling book, which is what we're going to spend a lot of time on. Sure. One final question: can I diff profiles, like from one version to another? Because one of the things that drives me crazy is, yeah, I've done a bunch of recording, I got my numbers, and then I'm gonna make a change. Is it better? Is it worse? What has gotten better? What has gotten worse? Is there a way to compare? Yeah, that's something we are building at Datadog on our backend side, to be able to track all your releases and tell you what is going faster versus slower, and which functions or methods are the culprit of your slowness, or whatever. So yeah, I mean, that's definitely something we want to do. Yeah, that'd be so neat, because you maybe take a whole week or two, and you sit down and you make your code fast, and you get it all polished, and then over time it kind of degrades, right, as people add new stuff to it, and they don't necessarily do so thinking about performance. So it'd be cool to say, okay, here's how it's degraded, and we can just focus our energy on making this part better again. I think that'd be great. Yeah. All right. Well, tell us about your book. I find it fascinating. I kind of gave it a bit of an introduction, the idea of scaling, but the official title is 'The Hacker's Guide to Scaling Python', and the subtitle is about building apps that scale to millions of users, billions of requests, billions of whatever, I guess. But most apps don't do this, so I think a lot of people would be interested to hear about it, right? I mean, most of them don't do this, and so many of us don't really need to do that. I wrote that book, I think, four years ago now, because I was working, like I said, on OpenStack, where we actually tried to scale things to a point where there would be apps running on thousands of nodes,

21:48 right. And maybe any individual app is not scaling to that level, but you guys, basically being the platform as a service, in aggregate have a huge amount of load put on it. Right, exactly. Okay. And one of the reasons I wrote the book is that a lot of people, outside Python or around Python, keep saying Python is slow, you can't do anything meaningful with Python, it's so slow you have to switch to Go. Yeah, that's the first thing you hear, if I understand this: Python is slow, so you have to switch to Go. Right, I hear this all the time. Yeah, exactly. So at a conference, in the OpenStack project, somebody rewrote one of the pieces of OpenStack software in Go, because it was faster, and I was like, nope. I mean, the Python architecture you use is slow, that's why the program is slow; it's nothing related to Python, and there's no need to switch to Go. It's not the language, it's the architecture that makes the difference. So that's what kind of motivated me at the beginning to write that book, to be able to share everything I learned over the years building part of OpenStack, learning what works and doesn't work when scaling Python, and to stop people switching to Go for bad reasons. There are good reasons for Go, for sure, but

23:02 yeah, sometimes, no. Exactly, not just because of that. Well, you know, another example of this is people switching to Node.js because it could handle more connections. And the reason it can handle more connections is because it's written in an asynchronous way, a non-blocking way. And so if you write blocking Python, it doesn't matter; if you write blocking C, it's not going to scale as well as if you write non-blocking Python, right? And so, you know, things like asyncio, and those types of ASGI servers and whatnot, can automatically sort of put you back in the game compared to those systems. The magic, the 'magic' in quotes, of Node was that they made you do that from the start. They're like, oh, you can't call a synchronous function, so the only way you do this is you write crazy callbacks, until better ways, like promises and futures, and then async and await, got into the language. But Node forced people to go down this pattern that allowed for scale, so then you can say, oh look, our apps scale really well. And I think a lot of times people start with the easy way, which makes a lot of sense in Python, but that's not necessarily the scalable way. So yeah, start one way, but as you identify these problems, you know, maybe bring in some of the ideas of your book, right?

24:08 Yeah, totally. I mean, one of the first things I like to say about that is: Python is not fast or slow. Saying 'English is slow' or 'English is fast' doesn't make any sense; you have people speaking English very fast or not. It's the same with Python. Now, CPython, the VM, okay, it's not the best VM. Actually, I think it's far from being the best virtual machine out there. If you look at the state of the art of V8 for JavaScript, or Graal or whatever for Java, or the JVM itself, it's pretty great nowadays. And if you compare that to CPython, I mean, CPython is really looking bad, I think. But there are other upsides, which give you good things when you use Python, and are good reasons to keep using Python and the CPython VM. So I think it's a trade-off, and people are not always putting the right weight at the right place when making that trade-off.

25:03 Yeah, I agree. One trade-off might be: oh, you could write it in, let's say, Rust or something, for example, to make it go faster. But then you're giving up the ability for people to come with just a very partial understanding of Python itself and still be really productive, right? People don't come to Rust or Java with very partial understandings and end up super productive; they just don't. You've got to take a big bite of all the computer science ideas there, whereas Python's core is so simple and clean, and I think that's part of the magic. But some of the patterns of that simple world don't always make sense, right? I do like that you pointed out that not everyone needs highly scalable apps, because it's really cool to hear, oh, they're doing this thing at Instagram, right? Like, Instagram turned off the garbage collector, and now they're getting better memory reuse across the web workers, so maybe we should do that too. It's like, well, hold on now. How much are you spending on infrastructure? Can you afford just 20 more dollars and not have to deal with this, ever? I mean, they run their own version of CPython, a fork where they turn off the garbage collector, right? Do you really need to go that far? No. So I kind of put that out there just as a heads up for people before they dive in. Because it's kind of like design patterns: I feel like when you learn some of this stuff, you're like, oh, let's just put all of this into place, and then you can end up with a more complicated system that didn't really need all those things put together at once. Maybe there's no app that actually incorporates every single idea that you've mentioned here; they're all good ideas in their context, but not necessarily all at once. You wouldn't order everything on a menu and put it all on one plate and then try to eat it.

26:36 Right. Especially because, for example, the other thing people usually do is: you write a program, okay, it's not fast enough. Let's not say it's slow, it's not fast enough for you. You're like, okay, I want to make it faster. So if you can parallelize things, you're like, okay, I could run this in parallel, let's use threads. Alright, it's easy: there's the threading API, there's the concurrent.futures API in Python, it's pretty easy to do. But it adds so much complexity to your program that you have to be sure it's really worth it. Because now you're entering the world of concurrency, and when you enter it, I mean, you have to use locks, you have to be sure your program doesn't have side effects between threads at a bad time or anything. It adds so much complexity that it's actually very hard to make this kind of program right and to make sure it works, and there are so many new edge cases you're adding by adding concurrency, be it threads or anything else. You have to be sure it's worth it. And for a lot of people out there, it's really not worth it: you could have a pretty simple application with just one process, or a couple of processes behind Gunicorn using a few workers, and be fine forever. These mechanisms to try to optimize, like I was saying, premature optimization is the root of all evil: don't do it unless you are sure, and you actually know why it's slow, you know what to optimize, which might be a good use for a profiler or not, depending on what you're trying to optimize. But make sure that you understand the trade-offs you are making. I saw so many people rushing into threads or anything like that, writing code that is invalid, and then it crashes in production because of race conditions, etc. And it takes them months, years to get things right again, because it's very complex, and writing multithreaded code is not something humans do very well. So yeah, if you can afford to not do it, don't do it.
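(For reference, a minimal sketch of the concurrent.futures API Julien mentions. The work and the shared counter are made up; the point is that the pool itself is easy, while any shared state still needs the locking he warns about.)

    from concurrent.futures import ThreadPoolExecutor
    import threading

    processed = 0
    lock = threading.Lock()

    def handle(item):
        global processed
        result = item * item          # stand-in for the real work, often I/O
        with lock:                    # protect the shared counter across threads
            processed += 1
        return result

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(handle, range(100)))

    print(processed, sum(results))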

28:25 Well, I think going back to the earlier discussion about profiling, either in production or just with cProfile: measure first, right? Because then you know so much better what can be applied to solve that problem. If the slow thing is that you're waiting on the database, well, you sure don't need more threads to worry about then, right? You might consider caching, that could be an option; you might consider indexes to make the database faster. But you could really easily introduce lots of complexity into the system by applying the wrong fix, and not really get a whole lot better. Yeah. All right, let's talk about scaling. I think the definition of scaling is really interesting, because a lot of people hear that and think, I want an app that scales. Like, man, YouTube is so awesome, it scales to a million requests a second or whatever, I want that. And then they have their app running, and they click the link and it takes three seconds, or they run the function and it takes three seconds. Well, if that app scaled well, it would mean it could run in three seconds for 100 people just as it runs in three seconds for one person. It doesn't necessarily mean faster. So there's this distinction I think people need to make between high-performance, fast, quickly responding code, and scaling, meaning it doesn't degrade as it takes on more users, right? Maybe you can riff on that a little bit.

29:47 Yeah, there are two dimensions, basically, which is, like we were saying, what is handling more users, which is more in parallel, let's say, and what is being faster, like having the page load faster. So those are two different things. If you want to really optimize one particular use case, like a page being loaded or whatever, I mean, you can't really scale that one request across multiple nodes, that's very complicated. So to make loading a page or a REST API call or anything like that faster, you really want to profile that part of the code to be sure,

30:17 yeah. And that's a case where profiling locally with cProfile might actually be really good, right? Like, one request is actually quite slow; you could learn a lot about that, running it in a profiler. And adding the horizontal scalability stuff might actually make it a tiny bit slower for each individual request, but allow many more of them to happen. So you've got to figure out which part you're working on, right?

30:38 Yeah, keep in mind, if you run cProfile on your laptop it's going to be different from cProfile on AWS or anywhere else you run, because your database is going to be different, the latency is going to be different. So it's hard to reproduce on your developer laptop the same conditions that you have in production on the same system. I mean, it's really a good way to get to 80% of the job, but in some cases it's great to have continuous profiling on your production system; that gives you a good way to optimize your code and to make sure that this dimension of being faster is covered. Then the dimension of, wow, let's scale to all of the users and still have the three-second load for everyone, that's another problem, and that's where you actually don't need a profiler, but you need a good architecture for your program and your code, and to be able to spawn a new process, a fresh new node, new anything that can process things in parallel for you. That's horizontal scaling, and I think with a good architecture there, you can do it with Python, with any programming language, honestly. You can do it with Python; there's no real reason to switch to any other language if you know what you're doing, right?

31:44 The architecture makes such an important difference there. Alright, it'd be fun to go through a couple of the chapters of your book and just talk about some of the big ideas there. You kind of build your way up to larger systems, right? You start out talking about what scaling is, but the next thing you really focus on is: how do I scale to take full advantage of my current computer? Like, the one I'm recording on here is my Mac Mini, and it has four cores; over there I have my sim racing setup, and it has 16 cores. Let's suppose I'm running on that one: I run my Python code over there, and I create a bunch of threads, and it does a bunch of Python things. There's a good chance it's using 1/16 of that CPU, right?

32:22 Yeah, exactly. I mean, people who start with Python usually hit that issue pretty soon, where you want to run multiple threads in parallel, for example, to make sure your code is faster. Which, outside Python, is a proper way to scale, because running threads allows you to run another execution thread of your program on your CPU. I mean, threads were not used that much 20 years ago, because every computer had only one core, right? I mean, your personal computer. Right, a bunch of them with only one core, and nobody cared about threads. Now everybody has 16 cores in their pockets and it's like, whoa, we should use our threads. Right, exactly. Yeah. So that's where it started: like 10 years ago, we were seeing more and more people being interested in using threads in Python, because, well, I have this computation and I could split it in two to go faster, so I'm spinning up two threads. And then, well, if you do that in Python, it doesn't work very well, because of this global interpreter lock, the GIL, which actually makes sure that your Python code plays nice across multiple threads. Every thread running Python code, executing bytecode, has to acquire this lock, and the others have to wait until it's finished or until it gets interrupted. Which means you can only have one thread running Python code at a time on the Python VM. Your threads can wait or do other things which are not Python related, which is what a lot of C extensions, like NumPy or others you may be using, do very well: they release the GIL and do things which are not Python but are still useful for you. But if your program is 100% Python and you don't use any kind of C extension or anything, then all your threads hit this giant bottleneck, which is the GIL, which blocks every thread. I think my record is like 1.6 cores used with a ton of threads in a Python program. I never managed to use two cores with a single Python program and a lot of threads; it's very hard to get even two cores being useful. And when you have a 32 or 64 core machine, that's a pretty big waste of resources.
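(A small sketch of the effect Julien describes, assuming a made-up CPU-bound function: threads stay roughly serial under the GIL, while processes can actually use several cores. Exact numbers will vary by machine.)

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        total = 0
        for i in range(n):
            total += i * i            # pure-Python work that never releases the GIL
        return total

    def timed(pool_cls):
        start = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(burn, [3_000_000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threads:  ", timed(ThreadPoolExecutor))    # roughly serial under the GIL
        print("processes:", timed(ProcessPoolExecutor))   # can spread over four cores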

34:38 Yeah. So this is super interesting, and people often see this as Python imposing a threading restriction. What's really interesting is the GIL is really about protecting the memory allocation and cleanup, incrementing and decrementing that ref count, so you don't have to take a lock every time you touch an object or assign a variable, which would make it really slow. It'd be more scalable, but it would be slower even in that single-threaded use case, right?

35:04 Yeah, there have been experiments to do that. That's essentially what you have in Java: there's this kind of monitor, I think they call it, where you have a lock per object, and it works well for them, but I don't know the details. For Python, there have been a few experiments to do that, and yeah, it makes everything very, very much slower, unfortunately. So it's not a good option to go down that road. And I mean, if you look at the history of the GIL, there was a project called the 'Gilectomy' a few years ago to remove the GIL; there have been plenty of experiments to try to get rid of it. The other problem is that if we ever do that, at some point it will break the language. A lot of us rely on the language as it is, because in Python, when you add an item to a list, it is thread safe by definition, because of the GIL, for sure. But if we start saying, well, each time you want to add an item to a list, you need to use a lock to do that, then either we do it implicitly, which is very slow, or you do it explicitly as a programmer, which is going to be very tedious for sure, and it's not going to be compatible with the Python we know right now. Which is not a good option. So we're stuck. Yeah. Well, have you been tracking the work on PEP 554, multiple subinterpreters, that Eric Snow has been doing? Yeah, a little bit. I think that offers some really interesting opportunities there. Yeah, I think it's a different approach. It's a mix, like a trade-off between multithreading and multiprocessing. Yeah, it's like a blend, like a half and half of them. Yeah. And I think it's the most promising thing we have right now, because I don't think the GIL is going to go away anytime soon, unless somebody really takes it on as a giant project. But there's nobody, unfortunately, outside or inside the Python community; no company is going to sponsor any kind of effort like that. A lot of the Python upstream work, from what I see, is done by people willing to do that in their free time. Some are sponsored, for sure, by companies, but a lot of them are not, and there's nobody like a giant big tech company trying to push something like that forward. So subinterpreters are probably the next best thing we'll get. I think it is as well, because I just don't see the GIL going away unless we were to say we're going to give up reference counting. And if you give up reference counting, and then you add, like, a JIT, I mean, that's a really different change. And you think of all the trouble of just changing strings from Python 2 to Python 3. Yeah, right, it's crazy. And we're not finished yet; I still have to maintain a lot of Python 2 code, to be honest. Yeah,

37:35 I'm not ready to do Python 4 yet. So yeah, I don't think we're ready for that either. I think subinterpreters are interesting because they take the way in which you do multiprocessing, which does work for scaling out, right, where you do message passing and each process owns its own data structures and stuff like that, and they just say, well, you don't have to create processes to make that happen. So it's faster, basically.

38:01 Yeah, and I think one of the problems with multiprocessing is also serializing the data between the different processes. I think Stack Overflow is filled with people complaining about being unable to pickle the data between multiple processes, which is very true. So I hope that having subinterpreters will solve part of that: not only the pain of having to serialize everything, but also, in terms of performance, you don't have to serialize and deserialize everything every time you want to pass something to a subprocess, which can be a very huge cost. Yeah.
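(A sketch of that serialization cost, with a made-up 50 MB payload: anything crossing the process boundary gets pickled and unpickled, and for large objects that can rival the work itself.)

    import pickle
    import time
    from multiprocessing import Pool

    def size_of(payload):
        return len(payload)

    if __name__ == "__main__":
        payload = b"x" * 50_000_000             # ~50 MB shipped to the worker

        start = time.perf_counter()
        with Pool(processes=1) as pool:
            pool.apply(size_of, (payload,))     # payload pickled, sent, unpickled
        print("via subprocess:", time.perf_counter() - start)

        start = time.perf_counter()
        pickle.dumps(payload)                   # pickling alone is a large share
        print("pickle only:   ", time.perf_counter() - start)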

38:35 So, in danger of not making it all the way through the other topics, let me ask you a couple of other real quick questions, or comments, and let you call out a couple of things. One: this CPU scaling is a problem, except for when it's not. Sometimes it's not a problem at all. The example I'm thinking of: if I'm writing an API, if I'm writing a website, we need those things, and the way that you host those is you go to a server, or you use a platform as a service which does this for you, and you run it in something like uWSGI or Gunicorn or something. And what they do immediately is say, well, we're not really going to run it in the main process; the main process is going to watch the other ones. I want to create four or eight or ten copies of your app running all at the same time, and it will send the requests off to these different processes. And then all of a sudden, hey, if you have fewer than ten cores, you're scaling again.

39:21 Yeah, so that's why, I mean, threads are great for things like IO, etc. But if you really want to scale across CPUs and cores, threads are not the right solution, and it's way better to use multiple processes. So either Gunicorn, which is a very good solution for web apps, or an alternative to that, or a framework like Celery for doing jobs, for example, which out of the box spawns worker processes to handle all of your tasks on multiple CPUs. Usually, if you don't use any kind of asyncio-like framework or Tornado or anything like that, you only have one process running one task at a time, and you can spawn even more processes than you have cores. If you have 16 cores, you can start, I don't know, 100 processes, if you have enough memory, for sure. And memory is really not a problem unless you're doing something memory-hungry, but for a REST API it's really not a problem; you're not using gigabytes of memory per process. So yeah, it's fine to spin up a lot of Gunicorn workers. Yeah, it depends on that, for sure. So two things that I ran across that were interesting in this chapter were 'futurist' and 'cotyledon', and I'm not sure how you say that second one. But can you tell people about these two little helper library packages? Yeah, sure. Futurist, actually, is a tiny wrapper around concurrent.futures, which you might know, in Python. It does a few things that are not there, like the ability to have statistics about your pool of threads, or your pool of anything, if you use processes, which gives you a pretty good idea of what's going on. A lot of applications are like, I can scale to 32 threads or 64, and you have a setting, usually, to tune that, and you don't really know as a user, or even as a developer, how many threads you're supposed to start to handle your workload. You're just typing a number randomly and seeing if it works or not. And I think statistics around that are pretty useful. There are some features, if I remember correctly, where you can actually control the backlog. Like, usually you have a pool of threads or processes, or a pool of anything, trying to handle your tasks, but the backlog in front of it can grow forever. So I think the ability to control your backlog, as in, okay, I have enough tasks in the queue, you have to do something, I'm not going to take any more for now, that's a pattern you see a lot in queue systems. People want a general queue system and they think, there's the queue, I'm going to take things out of it and process them, and they don't think about governing the two sides of the queue. So the queue can grow forever, which in theory is great, but in practice you don't have infinite resources to store the queue and eventually process it. So you want to be able to reject work.
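(A tiny sketch of that idea of bounding the backlog, using only the standard library; the maxsize and task counts are arbitrary. Once the queue is full, new work is rejected explicitly instead of growing without bound, which relates to the back pressure discussion that follows.)

    import queue

    backlog = queue.Queue(maxsize=100)

    def submit(task):
        try:
            backlog.put_nowait(task)
            return True
        except queue.Full:
            return False            # shed load instead of queueing forever

    accepted = sum(submit(i) for i in range(250))
    print(f"accepted {accepted} of 250 tasks, rejected {250 - accepted}")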

42:02 Talk Python To Me is partially supported by our training courses. Do you want to learn Python, but you can't bear to subscribe to yet another service? At Talk Python Training, we hate subscriptions too. That's why our course bundle gives you full access to the entire library of courses for one fair price. That's right: with the course bundle, you save 70% off the full price of our courses, and you own them all forever. That includes courses published at the time of the purchase, as well as courses released within about a year of the bundle. So stop subscribing and start learning at talkpython.fm/everything. One of the big complaints, or criticisms, I guess I should say, of these async systems is that they don't provide back pressure. A bunch of work comes into the front and it piles into asyncio, which then piles massively on top of the database, and then the database dies, and there's no place further back, before the dying of the database, where it kind of slows down. And so this is something that will allow you to do that for threading and multiprocessing, right?

43:04 Yeah, exactly. And that is one of the other chapters of the book, which is titled designing for failure. You could write another book on that. When you write your application, usually you write something in a very optimistic way, because you are in a good mood and everything's going to work, right? Well, you test with small data and a few clients. Right, exactly. And the more you scale, the more you add threads, the more you add processes, the more you add nodes on your network, you're going to use Kubernetes to spawn hundreds of nodes running a version of your application, and the more likely it is to fail: somebody is going to unplug a cable somewhere, anything can happen, for sure. And you're not designing for that; usually you're designing in a very optimistic way, because most of the time it works. But if you really want to go to scale, you want it to work even in extreme conditions, like when the weather is really rainy. It's a lot of work. So that's why I was saying at the beginning, it's a trade-off. Even with threads, you have to wonder, what happens when I can't start a new thread anymore because my system is out of resources, right? Which is pretty rare nowadays, you have a lot of threads and memory available, but on a very limited-resource system or whatever, what do you do under that? Yeah, threads pre-allocate a lot of memory, like stack space and stuff. Yeah. Have you heard of Locust, at locust.io? Have you seen this thing? No, I don't think so. Yeah, so speaking of back pressure and just knowing what your system can take, this thing is super, super cool. It's a Python load testing tool that even allows you to do distributed load. But what you do that's really interesting is you create a class in Python, and you give it tasks, and those tasks have URLs. And then you can say, well, this one I want 70% of the time, the user is going to go there and then they're going to go here, and then I want you to add a delay, like they click around maybe every 10 seconds or so, randomly around that time. And it's just a really good tool for people who are like, oh, I didn't test it with enough users, because it's just me. Something like this would be a really good option, I think. Yeah, and that's a good way to gather data for profiling afterwards. That's pretty good, if I were to do something like that. It looks delicious. Oh, yeah. Right, because you want to profile a realistic scenario. So instead of hitting it with one person, you hit it with, like, 100, and then get the profiling results of that. Okay, that's a good idea. Yeah. That's really the good thing with continuous profiling: you are able to see for real what happens. But if you know how to reproduce it, that's also a valuable option. Yeah. Okay, very interesting. Alright, so CPU scaling is interesting. And then Python around 3.4, 3.5 came out with this really interesting idea of asyncio, and especially async and await, that make asyncio so much better to work with. And what's interesting is that has nothing to do with threading; it doesn't suffer almost any of the problems of the GIL and so on, because it's all about waiting on things, and when you're waiting, you usually release the GIL.
Yeah, so threads are a good solution for IO when you can't actually use something like asyncio. Because, let's be honest, if you use your own library which was designed five years ago, it's not designed to be async, right? Or you're using an old ORM, and the old ORM doesn't have an async version of it, and it's either rewrite it with a new one, or use the old, non-async thing, right? Something like that. Maybe threads? I don't know. Yeah, usually. I mean, it's a good bad example. It's a good example technically, but usually the problem is people writing bad queries; better queries and an index will probably solve that most of the time. Yeah, exactly. But in theory, you're right, technically it's a good example. And yeah, I mean, event loops like asyncio, it's magic. Because, like you were saying, it's the Node thing that brought that back to life, where it had been used and around for the last 40 years, I don't know. And suddenly everybody's like, wow, this is amazing. Anyway, asyncio is great, it's written in Python, so it's pretty easy to use, and it has progressed a lot over the last couple of years. A few years ago everybody was using Flask or Django, which is still true, but there are now a lot of better alternatives in this sense, like Starlette and FastAPI, that you can use to build an API or a website based on asyncio. Yeah, this whole asyncio world has really flourished since you wrote the book, hasn't it?

47:18 Yeah. When I was writing the book, there was nothing; it was very early. I remember it was like, I want to use Redis, or, like you were saying, a database, and there's nothing. All this very low-level stuff, it's not going to be like, yeah, I can use it; it's going to take me hours of work to get anything close to what I would do with the synchronous version. So nowadays, yeah, it's totally better. For almost everything you want to do you can get async drivers, and everything is available. There are sometimes several versions of the same library, because a lot of them don't agree on how to do it, but it gives you a choice, which is great. And it's a very, very good way to not use threads. You use it as a concurrency framework where you can have multiple tasks in a single process, running not at the same time in our space-time dimension, but being paused and resumed later. So you still have to take care, and you actually have locks in asyncio, but it's still a little less of a problem and an issue than with threads. And you're not going to use more than one CPU, for sure, it's designed that way, but you will be able, maybe more easily because you have less overhead than with threads, to use 100% of your CPU, to max out your CPU resource. And then when you've done that with one Python process, you just start a new one using Celery, Gunicorn, whatever, or cotyledon, which you were mentioning, which is a good tool to do that. It's actually able to spawn multiple processes and manage them for you. When you have a daemon and you want to do a lot of work, the Celery model with a queue is a pretty good example, where you have multiple workers and each worker does things in the background. If you're not using a framework such as Celery, then cotyledon is a good small library to do that, where you write a class and each class gets spawned as a process, basically, and managed by a master process, like you would do with uWSGI or Gunicorn, handling crashes and restarting the process. That's a lot of work to do; you can certainly do that yourself, but cotyledon does that work for you out of the box. Yeah, that's a cool way to create those subprocesses and stuff. But yeah, I think asyncio

47:18 has a lot of promise and it's really coming alive. It's really been growing. SQLAlchemy just released —

49:40 Yeah,

49:41 their 1.4 version, which — so SQLAlchemy now, as of just a couple of weeks ago, supports 'await session.query' kinds of things. Not exactly that syntax, they've slightly adjusted it, but almost — pretty cool. All right. And then one of the things that you talk about with scaling that I agree is super important is statelessness. I suspect going from one node to two is harder than going from two to ten in terms of scaling, right? As soon as you say, okay, this is going to run in two places, that means it has to be stateless in its communication and all these things. If you're just putting stuff in memory and sharing pointers, and kind of storing an in-memory session for somebody, then putting that in two places is really fraught.
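For reference, the SQLAlchemy 1.4 asyncio support mentioned above looks roughly like this — the connection string and the Course model are made up for illustration:

```python
from sqlalchemy import Boolean, Column, Integer, String, select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Course(Base):
    # Hypothetical model, just for illustration.
    __tablename__ = "courses"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    published = Column(Boolean)

# Hypothetical async connection string (needs the asyncpg driver installed).
engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/courses")

async def published_courses():
    async with AsyncSession(engine) as session:
        result = await session.execute(select(Course).where(Course.published))
        return result.scalars().all()
```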

50:24 Yeah, it's really the thing where I like to say that if you start using multiple processes, you're probably ready to run on multiple nodes over a network. Because with multiple threads you're always inside the same program, so it's tempting to share state between your different threads, and then you get concurrency issues and you need locks, etc. A lot of people go down that road being, I don't know, maybe a bit naive, thinking it's all still just one program. But if you're ready to go to the step where you're okay with splitting your work into multiple processes, which might have to communicate between themselves for sure, then they can start by communicating on the same host. After that you just add a network in between the processes, and you can scale to multiple nodes, and then to whatever number of nodes you want. Then you get latency and connectivity issues that you don't have running on a single host processing your data — nobody unplugs an invisible cable inside one machine. But if you're ready for the network failures, which will happen for sure, between your different processes, then you can scale pretty easily onto different nodes. But as you were saying, you have to switch your pattern when you write your program, which means being as stateless as possible — which is why I wrote a chapter on functional programming. Because, well, I love functional programming, I love Lisp, and I would do Lisp if it was more popular, but I have to do Python — and Python is not a bad Lisp. Functional programming gives you a pretty good way of writing code, and a good mindset I would say, to write code that avoids side effects. That makes your program stateless most of the time, which makes it very easy to scale.

51:59 Right. The more statelessness you can have, the easier it's going to scale. And you can get it down to the point where maybe the state is stored in a Redis server that's shared between them, or even in a database — a really common example is to just put it in a database, right? So on the training site that I have, when people come in, the only piece of state that is shared is who is logged in. And when a request comes back, it goes back to the database: okay, who is this person actually? Do they have a course? Can they access this course? All those things are asked every time. And my first impression of writing code like that was: if I have to query the database on every request to get back all the information about whatever it is I care about for this request, it's going to be so slow — except it's not really, it works really well, and it definitely lets you scale better. Yeah, that's pretty interesting. Okay. So stateless programming, which means something like functional programming. I want to call out that example of remove-last-item that you have on the screen here, from the first pages of the book. Yeah,

52:57 I think it will give people a sense of what you're talking about. Yeah, exactly — what I was trying to explain there is a pure versus a non-pure function, where the non-pure function has a side effect on the argument you pass. In functional programming, if you've never done it, the idea is pretty simple: you can imagine all your functions as black boxes — you put something in, you get something out, and you can't rely on the thing you put inside anymore, you only use what comes back out. So when you don't write a pure function, you might pass a list, for example, modify it, and not return anything, because you actually modified the list you passed as an argument — which is not functional at all, because you're mutating your input. Maybe like —

52:57 list.sort would be an example, right? Exactly, yeah — the thing you're calling sort on is changing the list itself. Yeah, and that's a trade-off, because list.sort is usually faster than calling sorted on the list, but it's not functional. Whereas if you call sorted, or if you return the list minus the last item from a function that removes the last item, then it's functional: you're not returning the same list, you're creating a new list with the last item removed, and it's stateless. You can throw away what you passed as the input — you don't care anymore, you have something new coming out the other side. If you design your whole program like that, it's pretty easy to imagine having a large input of data provided to a queue, a worker taking it, doing whatever it needs to do, and then putting the result into whatever queue or database you want. And that's the basis of anything that scales: being able to do that, being able to push work into asynchronous tasks in the background.
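As a rough sketch of the difference being described (the remove_last_item function here is in the spirit of the book's example, not copied from it):

```python
numbers = [3, 1, 2]

# Stateful, in-place: list.sort() mutates its argument and returns None.
numbers.sort()

# Functional, stateless: sorted() leaves its input alone and returns a new list.
ranked = sorted([3, 1, 2])

def remove_last_item(items):
    # Pure function: builds a new list instead of mutating the argument.
    return items[:-1]

original = [1, 2, 3]
shorter = remove_last_item(original)
print(original)  # [1, 2, 3] -- unchanged
print(shorter)   # [1, 2]
```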

54:32 Yeah, I think list.sort versus sorted of a list is like the perfect comparison there. All right. You touched on queues, and I think queues have this possibility to just allow incredible scale, right? Instead of every request trying to answer the question, or do the task it's meant to do, entirely on its own, all it has to do is start a task and say: hey, that's been kicked off — and put it into something like RabbitMQ, Celery, RQ with Redis, something like that. Some other thing is going to pull it out of there and get to it, and eventually it gets done, right?
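As a hedged sketch of that hand-off pattern with Celery — the broker URL and the task body are assumptions for illustration, not anything from the show:

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def generate_report(report_id):
    # The slow work runs later, in a worker process that pulls from the queue.
    ...

# In the web request handler you only enqueue and return immediately:
#   generate_report.delay(42)
```

For a single-host setup, Python's built-in multiprocessing.Queue gives you the same pattern without any external broker, which is exactly the point made next.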

55:06 Yeah, exactly. It really depends on what you're doing and what you're trying to solve with your application or library, but as a general thing it's a pretty good way to architect a program. Like, if you're building a REST API, which is what people do most of the time now — you can definitely process a request right away, and if you know it's going to take less than a second, okay, it's fine, do it right away. But if you know it's going to take 10 or 20 seconds, it's very impractical for a client to keep the connection open for 30 seconds, for good and bad reasons. The problem is that right now almost anyone is going to think it's broken. Yeah, even if it technically would have worked — it's been 10 seconds, something's wrong, this is not okay, it's just not the right response. Yeah. And the connection can be killed: if you need 20 seconds to do the work and the connection dies at 18 seconds, you've lost your time, the client has to retry, they repost the same payload, and you reprocess it for another 20 seconds — you're actually losing time. So it's much better to take the input, put it in a queue, reply with a 200 — okay, I got the payload, I'm going to take care of it — and then notify the client with a webhook, or tell them the result will be available at this address, whatever mechanism you can use to make it asynchronous. Building this kind of system, with workers taking messages from the queue, processing them, and putting the results somewhere else, is a really good way to scale your application. And you can start small — there's a Queue in Python's multiprocessing module, so you don't have to deploy RabbitMQ or anything right away. If you know your program doesn't need it, you can even just have a background thread and a list. Yeah, exactly. You can start with something very simple; you don't have to use a huge framework if you know the pattern and you know it applies to what you're doing. If you know, for example, that one host, one node, one computer will be enough forever for your program, then you don't need to deploy a network-based queue system like Redis or RabbitMQ; you can use Python itself with a multiprocessing Queue, and that will solve your problem perfectly. Yeah, that's a great example — multiprocessing has an actual Queue data structure that properly shares work, with notifications and everything, across those processes. A worker process can just say, I'm going to block until I get something from the queue, and as soon as you put something in, it picks it up and goes; otherwise it just chills in the background. Very nice. All right, moving on: designing for failure. That's a good one. The thing that comes to mind at the extreme end of this — when I talked about scalability I said YouTube and a million requests a second — here it's Chaos Monkey and Netflix. You have to design for that. Like I was saying, some people try to write their code with a very optimistic mindset, like everything's going to be fine.
And they don't really care about errors and exceptions, where you actually want to write proper exception handling — proper classes of exceptions, proper handling in your program — and make sure that when something fails, for example when you use Redis and a Redis library, you're aware of what can go wrong. And that's not very obvious, honestly, because it's often not really well documented: you read the API of a Redis library and see, okay, it takes this type as an argument and returns that type, but you don't know which exception is going to be raised. So sometimes you have to see it with your own eyes in production — oh, when it's broken it raises ConnectionError. Okay, now I know, I need to fix it.
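For instance, with the redis-py client that failure shows up as a specific exception class you can catch deliberately rather than with a bare except — a minimal sketch, assuming a local Redis and a hypothetical fallback:

```python
import redis

client = redis.Redis(host="localhost", port=6379)

def cached_value(key):
    try:
        return client.get(key)
    except redis.exceptions.ConnectionError:
        # Redis is unreachable: log it, fall back, or re-raise --
        # the right reaction depends on the application.
        return None
```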

58:40 The tricky part of that is not necessarily seeing the exception and knowing about it — it's: now what? Like, when I get the connection error, it means I can't talk to the database, it's overloaded, or it's rebooting because it's being patched. But then what happens, right? How do I not just go: well, there was an error, tell the Datadog people there is an error so we know, and then crash for the user? What do you do beyond that?

59:02 Yeah. And the answer is — I mean, it's not obvious, it really depends on what you're doing. Like, if you are in a REST API and your database connection is broken, you can't connect to the database. What are you going to do, retry? For how long? How many times, for how many seconds, because the other guy is waiting on the other side of the line, right? So you can't do that for 10 seconds, that's too long. So you retry a few times, and then what — do you return a 500 error and crash, or do you return something that says "retry later"? You have to think about all of that: when to tell the client to retry, whether they can retry, or whether you just crash with an exception sometimes. And there are so many failure modes — most of the time network errors, but maybe the disk is full, or whatever — so you can't think of everything at the beginning for sure. So you have to have a good error-reporting system, and then you fix it and redeploy. I totally agree, the reporting system is hugely valuable, and notifications as well, because if you don't look at the

59:57 reporting system, the log fills up with errors and nobody looks at it for a week. But are you a fan of the retry decorators? You know what I'm talking about — the ones where you can say: here's an exponential backoff, just retry five times, first after a second, then five seconds, then ten seconds. What do you think?

01:00:12 I'm the author of tenacity, which is one of the most widely used ones. Yeah, okay, so that answers that question — you're a fan. Exactly, okay, cool, I am a fan. And it solves 80% of the problem. I mean, it's still up to you to know how to retry, but it's a very, very good pattern to use, and tenacity provides it as a decorator. That's not always the best approach if you want to have different strategies — like, this function should be retried a different number of times depending on who the caller is — but most of the time it's good enough. Most of the time it's better to use it in a naive way, where you just retry five times over five seconds or whatever, than to do nothing. But it's also not a silver bullet. I sometimes see people using it like, "if anything goes wrong, I'm just going to retry" — please use proper exception types, retry on the right thing and for the right reason, not for everything. Right, like maybe retry on a connection timeout, but some other thing that crashes, like an authorization failure — that's never going to get better by retrying. Exactly, exactly. But sometimes you see people writing code that retries on any exception, whatever is raised, which is really not a good idea: it's fine for a network error, but like we were saying, if it's an authentication failure you don't want to retry. So be careful with that. But if you know that you mostly get an IOError because the network is down now and then, and it's fine to retry, then it's a really, really good way to design easily for this kind of thing. It doesn't solve everything, though. If you have a large job, for example, that you know takes 10 minutes to compute, this kind of retry isn't going to save you, because in a framework like Celery, if your job fails after five minutes for whatever reason, it's just going to put it back into the queue and retry later, and you've lost the first five minutes of work. Yeah. And you can end up in those poison-message scenarios where it tries, it fails, it goes back, it tries and fails, it goes back and tries — and then it's not so great.
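A minimal sketch of that pattern with tenacity — retrying only a specific exception class, with exponential backoff, then giving up; the fetch_prices function is hypothetical:

```python
from tenacity import (retry, retry_if_exception_type,
                      stop_after_attempt, wait_exponential)

@retry(
    retry=retry_if_exception_type(ConnectionError),   # not "any exception"
    wait=wait_exponential(multiplier=1, max=10),      # 1s, 2s, 4s... capped at 10s
    stop=stop_after_attempt(5),                       # then give up and re-raise
)
def fetch_prices():
    ...  # a call that sometimes fails with ConnectionError
```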

01:02:09 Yeah. All right, we've just got a little bit of time, so let's talk about deployment. In your book you talk about deploying on a platform as a service, a PaaS like Heroku. There are also plain VMs, and these days we have Docker and Kubernetes. I mean, honestly, it's not simple to know what to do as somebody who's a newcomer, I think.

01:02:26 Yeah — and I think Heroku was already around when I wrote the book, and nowadays they're still there and pretty widely used, because it's a good solution. The thing is, deploying a Python application — for myself, I'm a pretty good Python programmer, but outside of Python, infrastructure and Kubernetes, I barely know anything. It's a full-time job, and it's not my job, it's somebody else's job. I could learn it for sure, I could become an expert in Kubernetes and deployment, and it's fine to do that if you want to. But for deploying a Python application, I'd rather use some kind of platform as a service like Heroku, where the container and Kubernetes approach to deploying, and finding ways to scale, is not my responsibility — I can outsource it to somebody who knows how to do that. And there are plenty of options. I think in the book I wrote about Heroku and OpenShift, but Amazon, Microsoft, Google and others must all have something like that too. Yeah, yeah. And honestly, if you don't really know yet whether your application is going to need to scale, and you don't want to spend a lot of time on infrastructure and learning — whether it's Docker or whatever — you can easily spin up your application on top of a Heroku and then click a button to have two nodes, three nodes, four nodes, ten nodes. The platform is expensive, but that's another issue. Yeah, platforms as a service often trade complete ease of use for maybe two things: one is cost, and the other is flexibility, right? You kind of have to fit their way — like, the database is their managed database service, and if you don't like that, well, you've still got to use the managed service. Things like that are somewhat fixed. But yeah, I think it's really good for a lot of people.

01:04:11 Yeah, exactly. I mean, it covers 90% of the market, right? Most people would be glad to start with that, even if it's not a perfect fit. Like, you're starting your company, you're doing a small project, and maybe one day you'll be the next big thing and have to scale — but by then money will help solve the problem, you'll have plenty of money to solve it. Until then you don't have a lot of time, and a PaaS is actually pretty cheap compared to the time you would spend learning the ropes of Kubernetes. I mean, secure deployment at scale with Kubernetes — I'm sure it's quite a bit more complicated than writing a simple Flask application. So it's a trade-off, and I think it's a pretty good one. If you reach the point of saying, okay, I can't run this on my laptop anymore, I need to run it somewhere, then using a platform like that is a pretty good trade-off.

01:04:53 Yeah. And I think it's so easy to dream big and think, oh, I'm going to have to scale this — so if I'm going to deploy it, what's it going to be like when I get the first 100,000 users? You should be so lucky to have that problem, right? So many things get built and they just stagnate, or they don't go anywhere — and part of the reason they stagnate is that you're not adding features fast enough, because you spent so much time building complicated architectures for some case in the future. When in reality, you could just pay ten times as much for two weeks and then move to something else; you could buy yourself that time for $500. Instead you could spend months building something that's going to support some insane future that doesn't exist. A lot of people would be better off to just move forward, then evolve, and realize it's not forever — it's a path towards where you're going to be. Yep. And then learn marketing to help people find the

project.

01:05:47 That is the problem. Yes, that's the hard part. Now, that might be my next book — doing marketing to get people onto your project so you have something to scale. Yeah, I'll definitely read that book.

01:05:57 All right. All right, we're,

01:05:58 we've got some more topics we could cover, but we're down to just one I think we have a little time to touch on, because it's like the magic sauce. For databases, the magic sauce is indexes; for many other things, the magic sauce is caching, right? If this thing is really slow — I'll give you an example from Talk Python, or maybe even better from Python Bytes, that's a more interesting example. The RSS feed for that show: it's limited now because we've got too many episodes, but for a while it was made of 200 episodes, and each episode is like five pages of markdown. In order to render that RSS feed on demand, I've got to go to the database, query 200 things, render the markdown for each of them, and then put that into an XML document and return it. And that is not super fast. But you know what — if I take the result I would have returned, save it in the database, and just regenerate it once a minute, then it's fine, right? It's like magic: it goes from one second to one millisecond, and you just get so much scale from it.
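A rough sketch of that once-a-minute regeneration idea — the render_full_feed stub is hypothetical, and the real site stores the result in the database rather than in memory:

```python
import time

def render_full_feed():
    # Stand-in for the slow part: query 200 episodes, render markdown, build XML.
    return "<rss>...</rss>"

_cache = {"xml": None, "generated_at": 0.0}

def rss_feed():
    # Serve the cached document; only re-render at most once a minute.
    if _cache["xml"] is None or time.time() - _cache["generated_at"] > 60:
        _cache["xml"] = render_full_feed()
        _cache["generated_at"] = time.time()
    return _cache["xml"]
```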

01:06:56 Yeah, that's exactly it — it's a pretty good pattern for when you have to optimize. It's more about the performance dimension, when you want to be faster, not necessarily about scaling to a number of users, even if the two are sometimes correlated. Like if you have 200 people requesting your RSS feed at the same time and you have to render it 200 times, that's pretty wasteful. So caching is a pretty good pattern, and there's nothing actually very specific to Python there. In that chapter of the book I talk about how to use memcached or Redis, which are good solutions if you want to cache over the network, but you can start by caching locally in your own process, like memoizing. Right, like a Python dictionary is a really good cache for certain things. Exactly. And in Python 3 there's the lru_cache decorator, which is really nice, and the cachetools library has a lot of different algorithms if you want to cache locally in your own Python program. Like if you know you're going to call this method hundreds of times and the result is going to be the same, just cache it if it's expensive to compute. And why "expensive"? It might be expensive in CPU, it might be expensive for the database, or sometimes the expense is the network: you're requesting data over the network and it's very far away, or it's a very slow or very unreliable system. So a caching system is a pretty good solution to avoid that — which is also linked to the design-for-failure we talked about before, right? If you're consuming third-party services, you can't necessarily depend on their uptime, their reliability, their response time, all those things. I'll give you another example of expensive. When you go to our courses, we've got 12 video servers throughout the world, and we want to serve you the video from the one closest to you.

01:08:48 So we have a service that we call that takes your IP address, figures out where you are, and then chooses a video server for you so you get the best response time. That costs a little tiny bit of money each time, but with enough requests it would be, you know, well into the hundreds of dollars per month of "where is this person" API charges. So we just cache it: if this IP address is from this city or this country, we put that in our database. First we check the database — do we know where this IP address is? No? Then go to the service. Otherwise, just get it from the database. It's both faster, and it literally doesn't cost as much — it's less expensive in the most direct meaning of that word.
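The purely in-process version of that idea is what functools.lru_cache (mentioned above) gives you for free — here with a hypothetical stub standing in for the paid geo-IP service:

```python
import functools

def expensive_geo_lookup(ip_address):
    # Stand-in for the paid "where is this person" web service call.
    return "US"

@functools.lru_cache(maxsize=4096)
def country_for_ip(ip_address):
    # Repeated lookups for the same IP are answered from the in-process cache.
    return expensive_geo_lookup(ip_address)
```

The database-backed version described on the site works the same way, just with a cache that survives restarts and is shared between processes.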

01:09:26 And then you're hitting the first and biggest issue in computer science, which is cache invalidation — which in your case is fine, because an IP address doesn't move to another country very often. It can change — not very often, but it can. Yeah. So for our example, what I did is: it's in MongoDB, so I set up an index that removes the entry from the database after six months. Which is fine, but arbitrary, right? Yes, totally, and it could be wrong for a while. Exactly. But the failure case is slower streaming, with buffering potentially — it's not a complete failure, not completely the wrong answer. Right, so for us it's acceptable. Yeah, exactly, it's a bit of a trade-off, which is totally fine in your case. And a lot of what you do when you want to scale is trade-offs. Sometimes you don't get things exactly right, but it's fine — it's just not the best experience for the user in that case, and you can live with it. I think it's a change of mindset: you go from "I'm writing a Python program which has to be perfect and work 100% of the time" to, when you want to scale, accepting that if it works fine for 80% of the people, and is suboptimal some small percentage of the time, that might not be optimal but it's fine. A lot of doing things at scale is changing that mindset about what has to be always true. And sometimes it really isn't — I mean, if you had a way to be notified that an IP address changed country, you could invalidate your cache and make it truly reliable; for a few seconds it might not be up to date, but that would be close to perfection. Yeah, but you don't have that system, so you do what you did, which is a good trade-off. It's pragmatic — you have to be very pragmatic when you do things at scale. Yeah. And also kind of design for failure: what's the worst case if that goes wrong? Right — it's streaming from halfway around the world or something. Whereas other things, like if the database goes down, you've got to deal with that entirely differently. Yeah, that's a hard one to fix; I didn't really know what to do there. Well, caching could help even there — you could, for example, cache the old version and reply to the client with it: it might be an older version, sorry, this is what we have while the system is down. I mean, I don't know what you're building, obviously, you have to know the use case, but caching could be a solution there too. Right, that's a good idea. Because you might be able to say: I'm going to go to Redis, and if it's not there, then I'll go to the database — and many of the requests that come in might never need to go to the database again. If the database is down, we just serve what's in the cache, and it could go on for a while until there's some write operation; as long as it's read-only, it might not matter. Which is what services like Cloudflare do for the web, for example: we do caching, it protects you, and if you're down, we just show the page as it looked a few seconds ago until you're back up, and nobody will notice. Yeah, interesting. You can apply that.
And the thing you always have to keep in mind when you do caching is being able to invalidate your cache. Like if you're caching a database and something changes in the database, you have to have some callback mechanism where the database can say to your cache: by the way, this changed, you need to update it. If you can't do that, you have to put an arbitrary timestamp on it — like you did: six months is six months, which is fine for such a use case. But for a lot of things — like your RSS feed — it would go wrong pretty quickly if you were caching for six months.

01:12:43 Yeah, that would be bad — all of a sudden there'd be 24 new episodes appearing at once or something. Yeah. So this is where you write that cron job that just restarts Redis once an hour and you'll be fine — no, just kidding. You're right, this cache invalidation really is tricky, because if you check the database every time, then you're not really caching anymore, right? You might as well not have the cache. So yeah, it's super tricky, but it's definitely a bit of the magic sauce. All right, I think there's plenty more we could cover in the book and about scaling and architecture, and it'd be really fun, but we're way over time. So we should probably just wrap it up with the couple of questions I always ask at the end of the show. Julien, if you're going to write some Python code, what editor

01:13:22 do you use? Emacs — I've been an Emacs user for the last 10 years, I think, and I still have commit access to Emacs itself. Oh, cool. You did say you love Lisp, so you get to have your editor powered by Lisp. Exactly, yeah — I wrote a lot of Emacs Lisp 10 years ago. Yeah, cool. And then a notable PyPI package? If you want, you can shout out tenacity, which we covered, or something else if you'd like. Tenacity, and daiquiri, which I love. Daiquiri is a tiny wrapper around the logging system. The reason I wrote it is that I never remember how to configure the logging system in Python — I do a lot of logging, but I never remember how to configure it to work the way I want. So daiquiri does that. It's pretty easy to use, it has a functional approach a bit like tenacity in its design, and it takes like two lines to get a logging setup that works out of the box with colors and so on.
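If I have the daiquiri usage right, the two-line setup being described looks roughly like this — treat the exact calls as an approximation rather than the definitive API:

```python
import logging
import daiquiri

daiquiri.setup(level=logging.INFO)      # sensible defaults, colored output
logger = daiquiri.getLogger(__name__)
logger.info("logging configured in two lines")
```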

01:14:11 Oh, fantastic. Yeah, I always forget how to set up the logging system too, and I end up using something else so I don't have to remember it. Fantastic. All right. So thank you for being here, thank you for talking about this and covering all these ideas. They're fascinating, and they always involve some trade-offs, right — when should I use this thing or that thing? But if people want to get started: one, where do they find your book? Two, what advice in general do you have for them going down this path?

01:14:37 You can find my book at 'scaling-python.com' if you want to take a look — it's a pretty good read, I think. It'll give you the right mindset to understand the trade-offs you might need to make in your program, and I think that's what it boils down to: what are you ready to change, how do you design your program, and what is going to be your real use case? Why do you want to scale, and are you going to scale for real, or are you just thinking that you'll need to scale in the future? Make the right trade-offs, and don't over-complicate things, because you're just going to shoot yourself in the foot by doing that.

01:15:08 Yeah, it's super tricky. I would just add to that: if the thing you think you might need to scale in the future is a web app or web API, use Locust to actually measure it. And if it's something more local, like a data science workload or something computationally heavy, run cProfile against it. Just measure, however you go about your measuring. Fantastic. All right. Thank you, Julien. It's been great to chat with you and share these ideas. Great book.
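For the local measurement case, a minimal cProfile run looks like this — slow_function is just a placeholder for whatever you're measuring:

```python
import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(1_000_000))

# Profile the call, save the stats, and print the top offenders
# sorted by cumulative time.
cProfile.run("slow_function()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```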

01:15:32 Thank you, Michael. Yep. Bye.

01:15:35 This has been another episode of Talk Python To Me. Our guest in this episode was Julien Danjou, and it's been brought to you by '45Drives' and us over at Talk Python Training. Solve your storage challenges with hardware powered by open source — check out 45Drives' storage servers at

'talkpython.fm/45drives' and skip the vendor lock-in and software licensing fees. To level up your Python, we have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async, and best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show: open your favorite podcast app and search for Python — we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days, so if you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
