Learn Python with Talk Python's 270 hours of courses

#479: Designing Effective Load Tests for Your Python App Transcript

Recorded on Thursday, Aug 8, 2024.

00:00 You're about to launch your new app or API, or even just a big refactor of your current project.

00:04 Will it stand up and deliver when you put it in production? Or will it wither and collapse?

00:09 How do you know? Well, you would test that, of course. We have Anthony Shaw back on the podcast

00:14 to dive into a wide range of tools and techniques for performance and load testing of web apps.

00:20 This is Talk Python To Me, episode 479, recorded August 8th, 2024.

00:25 Are you ready for your host, please?

00:29 You're listening to Michael Kennedy on Talk Python to Me.

00:32 Live from Portland, Oregon, and this segment was made with Python.

00:35 Welcome to Talk Python to Me, a weekly podcast on Python.

00:42 This is your host, Michael Kennedy.

00:44 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:49 both accounts over at Fosstodon.org, and keep up with the show and listen to over nine years of episodes at talkpython.fm.

00:57 If you want to be part of our live episodes, you can find the live streams over on YouTube.

01:02 Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming shows.

01:08 This episode is brought to you by Sentry.

01:10 Don't let those errors go unnoticed.

01:12 Use Sentry like we do here at Talk Python.

01:14 Sign up at talkpython.fm/Sentry.

01:17 And it's brought to you by WorkOS.

01:19 If you're building a B2B SaaS app at some point, your customers will start asking for enterprise features like SAML authentication,

01:26 SCIM provisioning, audit logs, and fine-grained authorization.

01:30 WorkOS helps you ship enterprise features on day one without slowing down your core product development.

01:37 Find out more at talkpython.fm/WorkOS.

01:41 Anthony, welcome back to Talk Python to me again, again and again.

01:46 So awesome to have you back.

01:48 Always good.

01:48 Yeah, yeah.

01:49 Great to be here.

01:49 Good to catch up.

01:51 Good to have you on the show.

01:52 I know we've got a really interesting topic to talk about.

01:56 Performance, load testing.

01:58 How do you know if your website or your API is going to work when you ship it to the world in a real world way?

02:05 Not just how many requests per second can it take, but a real use case, as I know.

02:10 You're going to tell us all about, so that's going to be awesome.

02:12 Before we do, you know, just a quick introduction to who you are, and maybe, for the 98% of the world who already knows you, what have you been up to?

02:20 Yeah, so I am the Python advocacy lead at Microsoft.

02:24 I do a bunch of open source work, maintain some projects and stuff like that.

02:29 Wrote a book on the Python compiler, called CPython Internals.

02:33 But these days, I'm mostly known as the person that created VS Code Pets, which was a bit of fun, but has become the most popular piece of software I've ever written.

02:45 So that's been interesting.

02:46 It's now got over a million active users.

02:48 I'm not sure how that works.

02:50 Careful what you create.

02:51 You might get known for it, you know?

02:53 Yeah, I know.

02:54 Yeah, it's interesting when you go to conferences and stuff now and I'm like, oh, I work on this project and this project.

02:59 And then you'd mention the pets thing and they're like, oh, you're the pets person.

03:03 Oh, you're the developer.

03:04 I'm like, you spent a year writing a deep book on the internals of CPython and its runtime.

03:10 I don't know who you are, but this pets thing is killer, man.

03:14 Yeah, there's a cat that runs around in VS Code.

03:16 So that's cool.

03:16 Can it be a dog as well?

03:18 What kind of pets can we have?

03:19 Oh, there's everything.

03:20 Yeah, it can be a dog.

03:22 There's like chickens and turtles and snakes and everything you can think of.

03:28 It's a pretty active repository as well.

03:29 We get a lot of feature requests for new shades of fur and new pets and new behaviors and stuff like that.

03:36 So, yeah, if you haven't checked it out, then check out the VS Code pets extension for VS Code.

03:41 Yeah, I installed it for a while.

03:43 I had to uninstall it.

03:44 It was too much.

03:45 If you're one of those people that likes having distractions, then it's helpful.

03:49 If you find it hard to have a little thing running around whilst you're trying to code, then it might be a bit much.

03:55 A little bit like power mode.

03:56 Oh, yeah.

03:57 I don't know.

03:58 Are you familiar with the power mode?

04:00 I used the one in JetBrains.

04:01 Yeah, I used the power mode for JetBrains when that was around.

04:04 It was pretty cool.

04:05 Yeah, if you want maximum distraction for your work.

04:09 Yeah, it reminds me of Unreal Tournament, the sort of power modes for that.

04:13 That's right.

04:14 That's awesome.

04:15 Well, let's start off with some stories.

04:19 So, I think everyone has a sense of, like, why you want your website or your API or your microservice mix of API and website, however you combine these things, to work well.

04:32 But it's always fun to share some stories.

04:34 I know you got some.

04:36 Yeah, so we want to talk about load testing.

04:37 And I think a fun thing to do with load testing is to reflect on times where people haven't done it the right way and it's caused a really public disaster.

04:48 But the analogy that I use is the halftime problem, which I'm English, and so this is mostly a soccer thing.

04:57 But we've got really big soccer games.

05:00 And it would be the same, I guess, with the Super Bowl.

05:02 Yeah, I was just thinking the American Super Bowl, football Super Bowl.

05:05 It's got to be.

05:06 Yeah, but the difference is that in soccer, at 45 minutes, you've got a 15-minute break.

05:11 Whereas in the Super Bowl, every 15 minutes, you've got a 15-minute break.

05:14 Well, and it's just loaded with commercials.

05:17 So, you've got, like, every five minutes, there's, like, oh, here's two more commercial breaks.

05:21 Where soccer is a flowing game, there's, like, no break until halftime.

05:25 That's the difference, I think.

05:26 So, what happens with a big game, so if it's, like, an FA Cup final or, you know, a Champions League final or something, then you basically

05:35 have 100 million people all stop watching the TV at the same time for 15 minutes, go to the

05:42 kitchen, turn on the kettle, make a cup of tea, go to the bathroom.

05:46 And so, the electricity providers actually have to plan for this because the kettle uses, like,

05:53 a couple of kilowatts of energy.

05:55 Especially in the UK where it's 240.

05:57 Yeah, exactly.

05:58 Yeah.

05:59 And all of a sudden, you've got, like, tens of millions of people all switching on their two kilowatt kettles at the same time.

06:05 So, the electricity providers basically have to plan around the soccer games so that they get this massive spike in load in their grid.

06:14 So, they actually do look at the sports schedules to basically plan around this.

06:18 Especially if there's something like a World Cup or something like that.

06:20 And this is kind of like a load testing thing where you kind of need to plan ahead and think about what the load on the system is going to be.

06:28 And is it a spike in traffic or is it, like, distributed?

06:31 And I've definitely seen a couple of occasions where it's gone a bit wrong.

06:38 So, here in Australia, we had a census maybe eight years ago.

06:43 And normally, it's a paper census.

06:45 You fill in the form, say who you are, what you do.

06:47 But this time, they wanted to do it online as well.

06:50 And they really encouraged people to use the online version.

06:52 And they set up the system, set up the website, and said, okay, everyone can fill in the census.

06:58 There's only 20-something million people in Australia.

07:01 It's not a very big population.

07:02 But on the last night before the census was due, it crashed.

07:06 Because everybody logged on at the last minute and tried to fill it in.

07:10 And they hadn't tested it properly.

07:12 And it just wasn't ready for the load.

07:14 And then the postmortem, they said, oh, we did the load testing.

07:17 And we tested it.

07:18 And it seemed fine.

07:19 But we expected everybody to fill in the census over the six months they had to do it.

07:24 We didn't think that people would leave it to the last minute, which is more like, well, what did you expect to happen?

07:29 This is human nature that you're going to put it off until the last possible moment.

07:33 Yeah, I can only see two spikes.

07:35 A small spike at the beginning and a huge spike at the end.

07:38 And like nothing in the middle.

07:39 Yeah, exactly.

07:39 So they had to kind of delay the deadline and then provision new infrastructure and stuff like that.

07:45 So yeah, it was interesting.

07:46 But it kind of highlighted how you need to think about load testing.

07:49 Yeah, that's nuts.

07:51 And I guess the mode before, the style before was kind of a distributed queuing system using paper envelopes where the queuing literally was physical.

08:01 And then it would be processed at the rate at which the system can handle it, right?

08:05 Which, you know, however fast you can get them in based on paper.

08:09 But then you try to turn it into an interactive system.

08:12 And oh, no.

08:12 Yeah, yeah, exactly.

08:13 You know, I have another example to share.

08:16 Like we've seen websites that, you know, fall down like this.

08:20 It fell down.

08:21 And when something has to be done, I think it's probably really tricky to communicate.

08:25 Let's say 10 million people tried to fill it out that day and it crashed on them.

08:30 Well, how do they know that the deadline's extended and how do they come back, right?

08:33 Like there's, it creates this cascading chain of really challenging problems all of a sudden to deal with it.

08:39 You know, we had the healthcare Obamacare stuff in the U.S. where soon as that thing opened, it just completely died.

08:45 And for weeks, people couldn't get insurance, which was bad.

08:47 But that's not the story I want to share.

08:49 I want to share a different one.

08:50 There was this app, this person who was frustrated about, they're an artist, like a photographer, I believe.

08:56 And they're frustrated about LLMs and diffusion models taking everyone's art and generating new art from it.

09:03 And kind of, you know, I guess it's up to people's opinion whether or not it counts as stealing or copyright theft or it's fair use or whatever.

09:10 But this person wasn't a fan of that.

09:12 So they came up with a web app based on serverless that would look at stuff and say, this is real art.

09:18 This is AI generated art.

09:20 And it's not exactly the same, but they ended up with a $96,000 Vercel bill.

09:27 Because they didn't really take into account how much it's going to cost per request to handle this on their serverless.

09:36 It was like 20 cents a user.

09:37 And everyone's like, you know, this is right at the height of the LLM boom and all the AI art boom and stuff.

09:44 And people just swarm to it.

09:45 Yeah, yeah.

09:46 So that's a different kind of problem, right?

09:48 Yeah, that's the cost issue.

09:49 Yeah, there are definitely other ones to keep in mind.

09:52 And we mentioned like, you know, the census and the Obamacare thing.

09:55 And you might be listening and thinking, well, you know, my website is not going to get, you know, 15 million people logging in all at the same time.

10:03 So like, is this really an issue that I need to handle?

10:05 And like a few months ago, I was doing a load test of an AI app.

10:10 So like a chat app and designed the load test and ran it with 10 users and it had issues.

10:16 So like, this doesn't need to be a tens of millions of users problem.

10:20 And the issue that it uncovered was like, oh, there's a rate limit on the number of tokens you can send to the LLM.

10:26 And by default, it's a really small number.

10:28 And if you have 10 people asking questions simultaneously, then that gets exhausted really quickly.

10:34 And it just throttles you.

10:36 And then like the app just says, oh, rate limit error.

10:39 And you just wouldn't have noticed because if you're a single user clicking around, typing in, you know, questions, or even if you've written some fancy like UI testing or something, you're like, oh, we've tested it.

10:50 And it works great.

10:51 But, you know, you run 10 people using it at the same time.

10:54 And that's where you start to notice problems.

10:55 I've more than once considered putting some kind of LLM AI augment, you know, rag sort of thing in front of Talk Python because, God, at this point, eight, nine years of full transcripts, human corrected with all sorts of, you know, pretty highly accurate stuff.

11:11 It would be awesome to have people have conversations about it.

11:13 But I don't know, just the workload.

11:15 I can see so many people coming to it and go, that's neat.

11:18 I'm going to ask it about my homework.

11:19 It has nothing to do with it.

11:21 Just using it as like a free AI to just, instead of bothering to download something or use ChatGPT, I'll just use yours.

11:28 And I'm like, eh, probably not worthwhile.

11:30 All the trouble.

11:31 Yeah, you can set them up so that they only answer questions about your own stuff.

11:35 They don't act as a fancy front end for ChatGPT.

11:39 We've got plenty of demos like that, that you give it your own internal documents and it only answers questions about those.

11:45 And you can set it up so that if it doesn't, if it can't figure out a reliable answer, it tells you.

11:51 It says, I can't.

11:52 Yeah, that's great.

11:52 Rather than just making something up.

11:55 Yeah, that's really nice.

11:56 Yeah, I forgot how closely you guys work with OpenAI and all them, right?

12:01 Yeah.

12:01 I mean, the compute in Azure to run and train all that stuff is, it's out of control.

12:07 Yeah, the GPUs alone are insane.

12:09 Yeah, it's, you know, I talked to Mark Russinovich, a bit of a diversion, but about some of the hardware and some of the data center stuff.

12:16 And it's like, we're going to create distributed GPUs that connect back over fiber to the actual hardware.

12:23 So you can scale your GPUs and CPUs independently.

12:26 There's just all kinds of crazy, interesting stuff there.

12:28 But that's a side, a bit of a side note, a bit of a side note.

12:32 All right.

12:32 So let's talk about basically what do you want to do to design a load test?

12:37 Like it's more than just let me write a loop and see how many requests I can make, you know, while true, call, you know, use requests and call this or call this endpoint or something, right?

12:48 Yeah, so there's, I guess, what I normally highlight with load testing is that the wrong way to start is to try and work out how many requests per second your server can handle.

13:00 That's interesting for throughput, but it's not a good measure of real user traffic.

13:06 And the main reason is that every user is unique.

13:10 They do different things.

13:11 They follow different paths.

13:13 They put in different data.

13:15 And also when you're running like a benchmark, then it's just it's not waiting between requests.

13:21 It's just trying to hammer as much as it can.

13:23 So real users pause and read and click and wait.

13:28 And there's normally latency.

13:30 They don't request the same page over and over and over, right?

13:34 They request a blend of pages.

13:36 Here they go to the homepage and they do a search and they explore what they find in the search and they go back and do another search and then they go to check out or whatever, right?

13:44 And as you're pointing out, they don't just hold down control or command R and just flicker the screen as hard as they can.

13:50 They they're reading and they're interacting.

13:52 And so what you're saying is if you really want to say how many actual users, not just kind of a throughput number, but how many people using the app do you possibly expect it could keep working or right?

14:05 You got to factor all these things in, right?

14:06 Yeah.

14:07 So you got to factor in the randomness of where they go, what they type in the weights between each click or each API request.

14:16 And then something else that's really important is that most modern applications is not just the initial HTTP request.

14:23 It's the 95 additional requests to all the scripts and the resources and the Ajax calls and everything like that.

14:32 So if it's if it's a browser based application, then typically, you know, it's not just the initial request.

14:38 It's everything that happens after it.

14:39 And I've definitely seen times where people have done a load test and said, oh, yeah, the website runs great.

14:44 And then in the app, they had like a JavaScript poller that would like refresh the data or something every minute.

14:51 And then they didn't test that.

14:53 And then you get thousands of users who leave the tab in their browser and it's just polling in the background.

14:59 So you've basically got this continuous stream of traffic to a special API that does like polling.

15:05 And they hadn't load tested that and that produced a huge like spike in load and that caused the issue.

15:09 It's like 75 percent of the workload is the polling.

15:12 Yeah, exactly.

15:13 What's your browser?

15:14 What's your browser tab story?

15:16 Are you a person that just has tons and tons of tabs open?

15:18 I don't know.

15:19 I try and clean them up as much as possible.

15:21 Yeah.

15:21 And have like a couple open maybe.

15:24 And there's some researching something.

15:26 And then when I'm finished, I just do close all tabs.

15:28 Yeah, that's me as well.

15:29 I think there's a good number of people that just have like 50 tabs open.

15:33 They just leave them.

15:34 And think about what that does for your website.

15:36 If you've got some kind of timer based deal, right?

15:39 You got to consider those people that have just left it.

15:41 Yeah.

15:43 This portion of Talk Python to me is brought to you by Sentry.

15:46 Code breaks.

15:47 It's a fact of life.

15:48 With Sentry, you can fix it faster.

15:50 As I've told you all before, we use Sentry on many of our apps and APIs here at Talk Python.

15:56 I recently used Sentry to help me track down one of the weirdest bugs I've run into in a long time.

16:02 Here's what happened.

16:03 When signing up for our mailing list, it would crash under a non-common execution path, like situations where someone was already subscribed or entered an invalid email address or something like this.

16:15 The bizarre part was that our logging of that unusual condition itself was crashing.

16:21 How is it possible for our log to crash?

16:24 It's basically a glorified print statement.

16:26 Well, Sentry to the rescue.

16:29 I'm looking at the crash report right now, and I see way more information than you'd expect to find in any log statement.

16:35 And because it's production, debuggers are out of the question.

16:38 I see the traceback, of course, but also the browser version, client OS, server OS, server OS version, whether it's production or QA, the email and name of the person signing up.

16:49 That's the person who actually experienced the crash.

16:52 Dictionaries of data on the call stack and so much more.

16:54 What was the problem?

17:05 I initialized the logger with the string "info" for the level rather than the enumeration .INFO, which was an integer-based enum.

17:05 So the logging statement would crash saying that I could not use less than or equal to between strings and ints.

17:12 Crazy town.

17:13 But with Sentry, I captured it, fixed it, and I even helped the user who experienced that crash.

17:19 Don't fly blind.

17:21 Fix code faster with Sentry.

17:22 Create your Sentry account now at talkpython.fm/Sentry.

17:26 And if you sign up with the code TALKPYTHON, all capital, no spaces, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features.

17:39 You did mention that the CSS and the, you talked about the AJAX, but I think also things like CSS images, JavaScript files, not JavaScript execution, but just getting the files, right?

17:51 Some of these frameworks are a couple hundred megs.

17:53 Like if you're not somehow distributing that through a CDN, that could be worth considering as well, I think.

17:59 Yeah, it is definitely.

18:00 And so in Python, for example, you've got in Django, there's an extension called white noise.

18:06 So in Django, you've got static content.

18:09 There is a really easy extension you can install because normally you have to kind of configure static and say, okay, my static files are here.

18:16 And then you have to set up Nginx or whichever web server you're doing to serve that static content directly.

18:22 So as a bit of a workaround, people often install this white noise extension that basically uses Python as their static server.

18:31 So like every time you request a CSS or whatever, Python actually is the thing that reads it from disks and serves it back, which is great for like development.

18:39 But like you should never use that in production because Python is not a very good CDN.

18:44 So yeah, that's kind of with load testing, you would just test the endpoint and say, oh, yeah, it works great.

18:51 But then you actually run it in a browser.

18:52 And if you're using something like WhiteNoise, it's actually creating 10 times, 20 times more load than you expected because it's pulling in all this like CSS and JavaScript and images and stuff like that.
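For reference, the WhiteNoise setup being described is typically just a couple of lines in Django settings. Here's a minimal sketch, assuming Django and the whitenoise package are installed; the middleware path follows WhiteNoise's documentation, and everything else is a placeholder:

```python
# settings.py (sketch): WhiteNoise serves static files straight from Python.
# Convenient for development, but as discussed, every CSS/JS/image request
# now hits your Python process instead of Nginx or a CDN.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",  # add right after SecurityMiddleware
    # ... the rest of your middleware ...
]

STATIC_URL = "static/"
STATIC_ROOT = BASE_DIR / "staticfiles"  # where collectstatic gathers files
```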

19:03 Yeah, it can be out of control.

19:04 I ran into an issue with my website where, I don't know, I got tired of being too careful with Font Awesome fonts.

19:12 And I had like missed one.

19:14 I'm like, I'll just put the whole thing.

19:15 I'll just put the whole CSS file in.

19:17 I'm sure it'll be fine.

19:19 It wasn't fine.

19:19 Like the web app is running at like 10% CPU usage, and Nginx, trying to serve up all the JavaScript,

19:26 is at like 80.

19:26 I'm like, what is it doing?

19:28 Like, oh, it was serving up a megabyte of font stuff.

19:32 Every request.

19:33 This is not good.

19:34 So yeah, in terms of projects that I've worked on in the past, I've worked on load testing some like big campaigns and stuff like that.

19:43 Particularly around sports events and some television.

19:46 Back when people used to watch live television instead of streaming at all, you know, they'd have like a murder mystery or something on a big soap opera.

19:54 And so I'd like load test the application so that when everyone wants to find out who murdered the vicar's wife or whatever, then they'd all kind of click on the website at the same time and just trying to load test it.

20:05 So some of the things I'd seen were trying to make sure you simulate browser load correctly, trying to distribute traffic in a realistic way.

20:14 And then spikes in traffic are basically a different problem.

20:18 So the thing we talked about, like the halftime problem where you've got everybody turning on the kettle at the same time, that's a special type of load test.

20:25 It's a predictable spike as opposed to just out of the blue spike, right?

20:30 That one is predictable, which is really nice.

20:31 And then you get things like seasonal traffic.

20:34 So for a lot of e-commerce applications and stuff like that, you would expect in the lead up to Christmas or in the Black Friday sale in the US, for example, like Cyber Monday, you know, you'd expect like a spike in traffic for those.

20:46 But it will be distributed over a day or two.

20:48 So you want to be able to like properly assess those.

20:52 Another spike that I think you and I can both relate to is the driver of the day, Formula One.

20:58 Oh, right.

20:58 You've got 10 laps left within the race.

21:01 Everyone now go to this thing and press this button and you have five minutes to do it or something, right?

21:07 Like that's going to be a mega spike.

21:09 Yeah, there can be some big, big loads of traffic.

21:11 Sudden spikes are really hard to handle because you get bottlenecks in the network.

21:17 You get bottlenecks in the database.

21:18 You get bottlenecks in the web server.

21:20 So those are kind of a unique type of problem to test for.

21:24 If you're looking more like waves of traffic, so like, oh, you know, traffic would build up during the weekday or the type of application I have actually gets busier at the weekend or in the evenings.

21:35 Then those are kind of the ones where you test ramped traffic.

21:38 So all these load testing tools have a ramp-up time or ramp configuration where, in Locust, for example, you say how many concurrent users do you want to get to?

21:48 So let's say that's like a thousand or something.

21:50 And then how much do you want to ramp up?

21:52 And you generally should always use a ramp because if you've got a thousand users, unless you've put them all in a room and said, okay, everybody click on the button at the same time.

22:02 Like that's not going to happen.

22:04 People, people log in gradually over a period of time.

22:07 And if you don't use ramping and load testing tools, you actually create, you kind of end up simulating stuff that's not realistic.

22:14 So you'd see like a massive spike in response times as the application gets backed up.

22:20 But if it's not really realistic that you get those types of spikes, then don't, don't simulate it that way.
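As a rough illustration of the ramp being described: Locust takes --users and --spawn-rate on the command line, and for finer control you can define a load shape in the locustfile. A minimal sketch, assuming locust is installed; the numbers are made up:

```python
# locustfile.py (sketch): ramp slowly to a target user count, hold, then stop.
from locust import LoadTestShape

class SlowRamp(LoadTestShape):
    target_users = 1000   # where we eventually want to get to
    ramp_seconds = 600    # take ten minutes to get there

    def tick(self):
        run_time = self.get_run_time()
        if run_time > self.ramp_seconds + 300:   # hold for 5 minutes, then end the test
            return None
        users = min(self.target_users,
                    int(self.target_users * run_time / self.ramp_seconds))
        return (max(users, 1), 10)               # (user count, spawn rate per second)
```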

22:24 We're going to talk about it a little bit, but there's a lot of different layers of caching as well.

22:29 And if the app has just come to life, many of those layers of caching are not set up and warmed up.

22:35 So that's a big deal.

22:37 And it could be as simple as the web app parsing the Jinja or Chameleon or Django template, right?

22:44 That's the very first time that can be slow.

22:45 But then if you set it up right in production, it will not parse it on the second, third and fourth request, right?

22:52 So if we talk about Locust as an example, you've got it on the screen, I think it's my favorite load testing tool.

22:57 Locust is so good.

22:59 It's really flexible and you define the user flows in Python code.

23:04 So you write a class that represents a user and then you program in the steps that they would take.

23:09 So like, oh, they start on the homepage, then they click on this page, then they click on this page.

23:15 And then you can program in like the pauses and the randomness and stuff like that.

23:20 And actually set up different types of personas or different types of users.

23:24 I think this is a great way of designing a load test because you're thinking about what would the user do rather than like what's the throughput.

23:31 So if you set up a Locust test where they start off at a point in your site, if your site has a login, how many of your users would log in?

23:39 Because I think it's important to start off.

23:42 If this is a new application, it's really hard to know.

23:44 So you're going to have to come up with an educated guess or test like a range of parameters.

23:49 But if this isn't a website that you've had running for, you know, a year or two, and you've got like a history of traffic, then you can look at it and say, okay, how many people on the homepage are logged in?

23:59 Because there's a big difference.

24:00 Because when you talk about caching and like CDNs and stuff like that, you know, you can cache the rendered template, but you generally wouldn't cache the whole rendered template if they're logged in.

24:11 Because at the top, it might say, you know, hello, Anthony, and you wouldn't want to cache that so that when Mike clicks on the website, it says hello, Anthony.

24:18 And he's like, who the hell's that?

24:19 That would be very disturbing, wouldn't it?

24:21 So there's a balance between caching and serving up dynamic content.

24:26 And then also if you're presenting, you know, lists of products or someone's got shopping cart or something like that, obviously that's unique to them.

24:32 So you're kind of trying to simulate this as best as possible.

24:36 Otherwise, you just create these really optimistic load tests where you're like, oh, we tested the homepage and it can handle 50,000 users.

24:44 But in practice, if a thousand of those log in, then the whole thing falls apart because all of a sudden you're not using the cached version of the homepage.

24:51 Right.

24:52 You know, another example that you brought up is, let's just take Talk Python, right?

24:56 It's got coming up on 500 episodes.

24:58 If you test and suppose I put really aggressive caching, like on the database results that come back from a single episode for the page and then just render that out of memory, basically, at least just out of the objects that are in memory.

25:10 If you do your test to just hit the one, like let's just randomly pick episode 300, hit that one a bunch of times.

25:16 But in reality, people are hitting all four or 500 and it's not all cached the same.

25:21 Or maybe there's so much memory that each one has to hold that it like putting all 500 in memory and the cache like runs it out of RAM on the server.

25:28 All sorts of stuff that happens as you get this kind of dynamic mix, right?

25:32 You want to introduce randomness.

25:34 So for example, on the login, if you've got a login flow or if you're clicking on a particular product, then try not to hard code which one it is.

25:44 If you've got caching, you might not even see the caching either.

25:47 Like databases do a lot of caching.

25:49 So like if you're always pulling the same thing from the database, the database server is probably going to cache that.

25:55 So, you know, it's going to be a lot faster.

25:57 So if you can randomize the like the inputs or the pages that you click on or the flows that people take, then that's great.

26:04 Locust actually has, like, an extra parameter when you define a task, which is a decorator on a function.

26:09 You can have like how often this happens.

26:12 So you can have some tasks which happen more sort of frequently than others.

26:16 So you can say, OK, you know, five times more people go to the homepage and then every so often somebody does a search.

26:23 But then when you do the search, when you want to low test the search, you know, you want to randomize that a bit more than just always searching for the same thing.

26:31 Right, right, right, right.

26:32 Yeah.

26:33 And just so for people who haven't seen Locust, you create a class, you give it a couple of functions.

26:38 And then, for example, to test the homepage, you just literally put the task decorator on the function.

26:43 And you say self.client.get("/"), then maybe self.client.get some additional assets, or not.

26:48 And then you can even hard code into the class what the domain is, or localhost, or a port or whatever.

26:54 And that's it.

26:55 And you can assign a weight to these tasks, like homepage five times more likely than about.

27:00 So just put task(5) instead of task by itself.

27:03 Right.

27:03 It's incredible.

27:04 Yeah, it's really, really helpful.
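To make that concrete, here is a minimal sketch of the kind of locustfile being described, run with "locust -f locustfile.py". It assumes locust is installed, and the URLs, weights, and wait times are placeholders rather than anything from a real site:

```python
import random
from locust import HttpUser, task, between

class BrowsingUser(HttpUser):
    host = "http://localhost:8000"   # or pass --host on the command line
    wait_time = between(2, 8)        # real users pause between clicks

    @task(5)                         # weighted: homepage hit 5x more often
    def homepage(self):
        self.client.get("/")

    @task(1)
    def search(self):
        # randomize inputs so caches don't make the test look too good
        term = random.choice(["python", "locust", "load testing", "asyncio"])
        self.client.get("/search", params={"q": term})

    @task(2)
    def product_page(self):
        product_id = random.randint(1, 500)   # spread traffic across many pages
        self.client.get(f"/products/{product_id}")
```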

27:05 And then there's a library.

27:07 There's a couple of things you need to keep in mind as well.

27:09 And like if the user logs in and then it probably creates some sort of session, whether that's like a cookie or a token or something like that.

27:17 So you need to store that somewhere in the class so that subsequent requests use the same login session.

27:23 And then also, you know, frameworks like Django and Flask have got cross-site request forgery like protection.

27:31 So they generate these CSRF tokens in the forms as well.

27:35 So there's normally like a bit involved in getting the cookie or the session ID, getting the CSRF value.

27:42 And then like, say, if you're submitting forms or you're like doing a search or something, you need to code a bit in Locust to work around the security controls.

27:52 Right.

27:53 For example, you might do a request to the page that has the form and then pull back the CSRF token from a cookie and then use that as part of the form submission data.

28:03 Otherwise, it might just say invalid.
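A hedged sketch of that flow in a Locust task, using Django's default cookie, field, and header names as the example; your framework's names may differ:

```python
from locust import HttpUser, task, between

class FormUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def submit_search(self):
        # 1. Request the page that renders the form so the CSRF cookie gets set
        self.client.get("/search/")
        token = self.client.cookies.get("csrftoken")

        # 2. Send the token back with the form data (and often as a header too)
        self.client.post(
            "/search/",
            data={"q": "python", "csrfmiddlewaretoken": token},
            headers={"X-CSRFToken": token},
        )
```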

28:07 This portion of Talk Python is brought to you by WorkOS.

28:10 If you're building a B2B SaaS app, at some point, your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, audit logs and fine-grained authorization.

28:20 That's where WorkOS comes in with easy to use APIs that help you ship enterprise features on day one without slowing down your core product development.

28:30 Today, some of the fastest growing startups in the world are powered by WorkOS, including ones you probably know, like Perplexity, Vercel and Webflow.

28:38 WorkOS also provides a generous free tier of up to one million monthly active users for AuthKit, making it the perfect authentication layer for growing companies.

28:48 It comes standard with useful features like RBAC, MFA and bot protection.

28:53 If you're currently looking to build SSO for your first enterprise customer, you should consider using WorkOS.

28:59 Integrate in minutes and start shipping enterprise plans today.

29:04 Just visit talkpython.fm/WorkOS.

29:06 The link is in your podcast player show notes.

29:08 Thank you to WorkOS for supporting the show.

29:11 You know, another one that would be really tricky would be like Turnstile or reCAPTCHA.

29:17 You're probably basically not getting that.

29:19 It's not worth it.

29:20 Yeah, you're stuck there.

29:22 You've got to fill in those ones.

29:23 Just maybe turn it off real quick for your test and then turn it back on.

29:27 I don't know.

29:27 I mean, you could do it, turn it off in development or something like that, right?

29:30 Yeah.

29:30 So when I look at this, this Locust stuff, I see behind the scenes something like requests or HTTPX,

29:38 where all it does is pull back the string of the HTML, or maybe even it just gets the head and actually throws away the content.

29:45 I bet it streams it back.

29:46 But what I don't imagine it does is it doesn't parse the HTML, realize that it's a view front end,

29:54 execute the JavaScript that then has three API calls to more stuff on the back on the server, right?

29:59 If it's a rich front end app, it probably doesn't treat it the same if we write it just like this, right?

30:05 Yeah.

30:05 So the other one I wanted to highlight is an extension to Locust so you can connect it with Playwright.

30:11 At this thing?

30:12 It does HTML parsing.

30:14 That's more like looking at the HTML to see if like a particular, you know, like a beautiful soup or responses was the framework for this.

30:23 Right.

30:24 Make sure timeout does not appear in the text or something like that, right?

30:27 Yeah.

30:27 Make sure the page doesn't have error in big letters.

30:30 That's one thing you can do is checking that the content of the page actually contains the right thing.

30:34 And then Playwright is a UI, like a web testing tool.

30:39 Playwright works really well with pytest.

30:42 So I recommend using Playwright anyway, if you're running a web application, because you can write pytest tests in Playwright.

30:50 Even better is if you want to get started with playwright, it has a code generator.

30:55 So when you pip install it, you can run it in the code gen mode.

30:58 It pops up a browser and then you just go on your website and just click around and then do what you would do normally.

31:05 Type in, you know, fill in the forms, click on the buttons.

31:07 And then whilst you're doing that in another tab, it actually generates all the Python code for the pytests.

31:13 So it basically generates the pytest test code in a separate window automatically as you're clicking around in the browser.

31:20 So like in terms of writing UI tests is quite difficult because often you have to be like, OK, how do I find the button?

31:26 How do I click on the right button?

31:28 How do I find the form?

31:29 Especially with JavaScript, because it's not, you know, often things don't have a specific ID or you've got to figure out what selectors to use and stuff like that.

31:37 So this makes it a lot easier because you can use the code gen.
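The generated output lands as ordinary pytest-playwright code, roughly along these lines. This assumes pytest-playwright is installed ("pip install pytest-playwright" and "playwright install"), and the URL and selectors here are placeholders, the sort of thing codegen would record for you:

```python
# Generate your own version with: playwright codegen http://localhost:8000
from playwright.sync_api import Page, expect

def test_homepage_search(page: Page):
    page.goto("http://localhost:8000/")
    page.get_by_role("link", name="Search").click()
    page.get_by_placeholder("Search").fill("load testing")
    page.get_by_role("button", name="Go").click()
    # check the page actually contains the right thing, not just a 200 status
    expect(page.get_by_role("heading", name="Results")).to_be_visible()
```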

31:39 So that's a browser test.

31:41 So with a load test normally in Locust, you're just making HTTP requests, but you're not actually rendering the page or running the AJAX or the JavaScript code.

31:50 Whereas with Playwright, when you run Playwright, it's actually spinning up a browser and then driving the browser from Python.

31:56 So you can plug Locust and Playwright together.

31:58 There's a Playwright extension for Locust.

32:00 So you can say, OK, each load test user, I actually want that to be a browser.

32:06 And so when you say I want to test 100 concurrent users, it actually spins up 100 browsers.

32:14 In a very, well, Playwright's actually really interesting how it works, but like there's a headless mode for browsers these days.

32:21 So it doesn't actually run 100 windows like you can't see them.

32:24 I can't use my computer while this is running.

32:26 It's just overwhelmed by.

32:28 But yeah, I don't recommend running 10,000 concurrent on a single laptop because it's going to run pretty slowly.

32:35 It's going to have a bad time.

32:36 But that is really important to actually test.

32:39 So that will test it.

32:40 All the stuff that we've been talking about.

32:42 It'll test.

32:42 If there's polling, it'll test that.

32:44 If the CSS is not delivered over a CDN, it's going to go get that potentially white noise or wherever it's coming from.

32:52 It's going to do all the things.

32:54 Yeah, the challenge with it is that, you know, like you mentioned, running a browser, even if it's just a tab, uses a lot of resources.

33:01 Whereas making a HTTP request using a request or something, you know, it doesn't really need anything.

33:06 So you can quite happily in Locust, you know, make 100,000 requests and your local machine that you're testing on will be fine.

33:15 It will.

33:16 So, yeah, you can actually get load testing as a service.

33:20 And the reason you'd probably want to use that is if you're testing a scenario where your local dev environment or your test environment just isn't big enough, where you need to distribute it, then you need more horsepower, basically, to go and run all the requests.

33:36 Especially for something like Playwright, you know, where you need a lot of resources for the browser.

33:42 Yeah, the little screenshot they have for Locust says 21,400 users are currently accessing the website.

33:48 It's like, that's a lot of browser instances.

33:50 That's a lot of browser instances.

33:52 That looks like an API test, which would be a lot easier to do.

33:56 APIs are easy.

33:57 Yeah, so we, there's a service called Azure Load Testing.

34:00 There is other options as well.

34:02 I'm sure AWS has one and Google probably has one as well.

34:06 Azure Load Testing, I know.

34:07 And we are launching Locust support for that.

34:14 At the moment, it supports JMeter tests.

34:14 But yeah, we're going to be launching Locust support for that.

34:17 I think by the time this episode comes out, it will probably be in public preview.

34:21 So I've been using that and testing that.

34:23 And the reason you would use it is, like I said, if you can run the Locust test locally on your machine and it runs great and brilliant, but, you know, you can ask us to spin up 50, 100, or even more instances running in parallel.

34:37 And so if you want to do a really large scale test, then you can just basically get that from a cloud service provider like Azure.

34:44 Yeah, awesome.

34:45 I didn't realize you guys were bringing that online.

34:46 That's cool.

34:47 Yeah, nobody knows about us.

34:48 Yeah.

34:49 Can you do distributed?

34:51 I think you can do some distributed stuff with the Locust even, can't you?

34:54 Just, yeah.

34:54 Yeah, you can.

34:56 But you've got to have your own infrastructure, right?

34:57 A lot of it uses SSH and you've got to, like, basically it kind of works over the shell.

35:02 Yeah.

35:02 So, yeah, we've kind of got our own distribution and, like, coordination system and stuff like that.

35:07 And then also when you're running the test, your output, I guess, is what is the response time of the pages?

35:15 And, like, what does that look like?

35:17 And I think what confuses a lot of people is the first time they use Locust or JMeter or one of the other tools, they get all these percentiles back.

35:25 And they're like, oh, the 90th percentile response time is this and the 95th is this and the 99th is that.

35:31 And they're like, okay, well, if I go back to my high school math, I think I can remember what percentiles are, but, like, which one matters?

35:38 Because normally there's a massive difference between the 90th and the 99th percentiles with load testing.

35:44 And, you know, you might say, oh, well, the 99th is like 10 seconds, but the 90th is like 300 milliseconds.

35:51 So, you know, was that a good output or a bad output?

35:54 I'm not really sure how to interpret the results.

35:56 So, yeah, you'll see in the UI for Locust, it gives you percentiles.

36:03 All the other load testing tools are very similar.

36:03 It's basically like a distribution of the response times for those particular pages.

36:09 And what you're trying to understand is, like, what is the expected response time?

36:14 So, you know, if it's a bell curve, then what's the center point of that?

36:17 Because the 99th is interesting, but often, like, it's because the cache was warming up or, you know, there was like one user that took 10 seconds.

36:26 The cache expired and then it got recreated right then or something.

36:30 Yeah, exactly.

36:31 So for 99th percentile, you know, if it's 10 seconds, you might have one user that took 10 seconds and 99 users that took, you know, a couple of hundred milliseconds.

36:40 So it's, you know, do you want to, you know, factor for that one user or do you want to focus on the other, on the bulk of the group?
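A tiny made-up example of why that happens: one slow outlier drags the 99th percentile way up while the median barely moves.

```python
import statistics

# nine ~250 ms responses and one 10-second outlier (all numbers invented)
response_times_ms = [220, 240, 250, 260, 270, 280, 290, 300, 310, 10_000]

quantiles = statistics.quantiles(response_times_ms, n=100)
print("p50:", statistics.median(response_times_ms), "ms")  # stays around 275 ms
print("p90:", quantiles[89], "ms")
print("p99:", quantiles[98], "ms")                          # dominated by the one slow request
```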

36:48 Yeah.

36:48 And if you're like my daughter, you'll just say that the Internet's broken.

36:50 Yeah.

36:51 It could be, or it could be that YouTube is slow for some odd reason for you for a minute.

36:56 Like, it's not necessarily the entire Internet that is the fault here.

36:59 Yeah.

37:05 Well, most users, I think, if it took 10 seconds to respond, would just start clicking the refresh button.

37:05 Yeah, exactly.

37:08 So just generate more load.

37:10 Yeah.

37:10 You might have to program that into your load test.

37:12 If it takes longer than five seconds, then issue another three requests.

37:16 Yeah.

37:16 That's a really good point, actually.
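If you did want to model that impatience, it's only a few lines in a Locust task; a playful sketch with made-up thresholds:

```python
import time
from locust import HttpUser, task, between

class ImpatientUser(HttpUser):
    wait_time = between(2, 6)

    @task
    def homepage(self):
        start = time.monotonic()
        self.client.get("/")
        if time.monotonic() - start > 5:   # felt slow to the "user"
            for _ in range(3):             # so they hammer the refresh button
                self.client.get("/")
```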

37:17 Let's see.

37:18 Are there any pictures of the reports here?

37:21 Let's see.

37:22 Yeah.

37:22 You got these, you got these nice graphs that shows you response times.

37:26 Yeah.

37:27 They're not just graphs.

37:28 They're live graphs.

37:28 You can kind of see it like flowing as it's testing, right?

37:32 As it's ramping up or whatever.

37:33 Yeah, definitely.

37:35 So what we do with the load testing services is we've got that graph.

37:39 But then you can also say, I also want to see how many queries a second were happening on the database.

37:44 What was the memory usage of the web applications?

37:47 Like how many, like what was the pod size if you're using Kubernetes?

37:51 Like stuff like that.

37:53 I've got a couple of demos where I've got like parameter-based load testing, where when you're setting up a Kubernetes environment, you're like, oh, I'm not sure how much memory I should allocate the containers or like how many of them I should have in a pod.

38:06 So you can basically just use something like GitHub Actions to give a matrix and say, okay, I want to test these configurations.

38:13 Let's see what happens with half a gig of RAM per container or two gigs of RAM per container.

38:18 Let's do two, four, eight in a cluster.

38:21 And it will run the same load tests for every configuration.

38:23 And then you can just compare all the graphs and say, okay, we don't want to over-allocate infrastructure because, you know, people often go a bit nuts.

38:31 They're like, oh, how many instances in a cluster do we need on the front end?

38:34 And they allocate like 16.

38:36 But then when it's actually running, 15 of them are idle.

38:39 So it's kind of just over-provisioning.

38:41 So load testing can be a great way of not just like planning for spikes in traffic or being able to like cater for it.

38:49 But actually the other way around, which is you run a load test and you realize you could probably actually do with less infrastructure.

38:55 So like things like memory as well, like memory is expensive when you're buying it from a service provider.

39:01 And, you know, CPUs and stuff like that or the number of instances.

39:04 And actually, let's see what happens if we turn some of those things down.

39:08 Will that impact the response times?

39:10 Can we get away with it basically?

39:12 Because in a lot of cases, you can actually spend a lot less money and get the same performance or even just a negligible difference.

39:19 Or maybe you identify somewhere where you could add some level of caching.

39:23 Yeah.

39:23 And then all of a sudden you get, you can preview, you get 10x the load per worker process.

39:28 You know, we could, we could have a smaller machine or smaller cluster or whatever.

39:32 Yeah, exactly.

39:33 So one, one thing I want, would like maybe to talk to the folks about is just how do you interpret these graphs?

39:39 So you have like a request per second.

39:42 And then you talked about, say, the 95th percentile response time and this ramping up.

39:47 And usually when you look at these graphs, it's really obvious, like, yeah, we can add more users.

39:51 But here is where we start to suffer consequences if we add any more users than this.

39:57 Doesn't always just completely fall over.

39:59 It just behaves worse until it does.

40:01 It's kind of a bit like a feedback system for anyone who studies that.

40:05 When you configure a ramp in Locust, you're saying, okay, how many users per second do we want to add?

40:11 And you start off with a slow ramp is my suggestion.

40:16 So like, you know, every 10 seconds, add one user.

40:19 And would like be a really slow way of doing it.

40:21 And then what you're looking at is the response times for each page, or you can get an average of everything.

40:28 So if you're starting off with a load test that just tests one page, you slowly ramp up the users.

40:34 Then you're looking at the response time graph.

40:36 Often when you run a load test where it starts, the response times spike up at the beginning.

40:42 Because, you know, the service was probably asleep.

40:45 The database server was probably asleep.

40:47 You know, it needs to like kick a few things into action to get it responding.

40:51 So you often see like a spike at the beginning.

40:53 That's fine.

40:54 Don't worry about that.

40:58 As long as it's a short spike and it doesn't go on for hours.

40:58 But we know once everything's warmed up, then you should see a stable graph.

41:02 So the response times should stick around the same level, even as you add more and more users to a point.

41:09 So it's basically looking at the two graphs, which is the response time and the number of users,

41:14 and trying to understand how many users it needs to get to before that response time starts going up.

41:22 And you basically know that's where you've introduced some sort of bottleneck.

41:26 Something's reaching its limit.

41:27 Yeah.

41:28 That's where someone, you know, people are all of a sudden sitting in a queue somewhere.

41:33 The hard part is actually figuring out where that queue is.

41:36 If you don't have the time, but you've got the money, you can just throw more infrastructure at it.

41:39 But often you can fix some of those queues by looking at where, what are we caching and where?

41:45 Right. Look at the CPU usage of the various aspects.

41:48 What's the database CPU load?

41:51 What's the web process?

41:52 If the web bit is kind of chill, but the database is at 100%, maybe you need better indexes or something.

41:59 Yeah, because often you look at the infrastructure on the back end.

42:02 And even though the response times are going up, so there's a bottleneck, the CPU is maybe still at 60%.

42:07 And the memory might only still be at 50%.

42:09 So you're like, okay, you know, more CPUs isn't necessarily the issue, but things are getting stuck somewhere.

42:15 And that might be that, you know, each page, actually, I remember one load test I did where each, it was a PHP application using like some framework.

42:25 And the performance like was horrible and we couldn't figure out why.

42:28 And we put on some debugging tools and realized that every single page ran 200 SQL queries.

42:34 Because of the way they've written it.

42:36 It was like, oh, that'd be why.

42:38 Because, you know, you're looking at the CPU and memory of the actual web servers and they're fine.

42:43 Like the web servers are just like basically just sitting there continually waiting for the database server to respond.

42:49 So, you know, you look at the resource usage and you might be thinking, well, why is it getting slower?

42:54 But often it's because you're making calls to the database or a backend API or something.

42:59 And it's just sitting there idle waiting for the responses.

43:02 So I know you've mentioned this on the show a few times, but like the N+1 issues and stuff like that, for ORMs in particular, is where you get those types of scale issues where you should not be seeing that many SQL queries.

43:14 It's so easy to do because you write the same code, get the things and then loop over some and interact with some aspect of them.

43:21 And if you don't eagerly do the query at the first part, each one of those is a separate query.

43:26 The more you get back, the worse it is.
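For anyone who hasn't hit it, here's the shape of the problem in Django ORM terms. The Order and Customer models are hypothetical, and SQLAlchemy has the same pattern with its selectinload and joinedload options:

```python
# Assumes hypothetical models along the lines of:
#   class Order(models.Model):
#       customer = models.ForeignKey(Customer, related_name="orders", ...)

# N+1: one query for the orders, then one extra query per order for .customer
for order in Order.objects.all():
    print(order.customer.name)

# Fix: eager-load the related rows up front, so it's a single joined query
for order in Order.objects.select_related("customer"):
    print(order.customer.name)

# For reverse or many-to-many relations, prefetch_related does the same job;
# len() on the prefetched queryset uses the cache instead of another query
for customer in Customer.objects.prefetch_related("orders"):
    print(customer.name, len(customer.orders.all()))
```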

43:28 Yeah, exactly.

43:28 And I mentioned like an AI app, like with this LLM RAG app that we did a load test on with like 10 users.

43:35 And like the CPU and memory were fine on the front end because it's not really doing anything.

43:38 It's just calling the LLM.

43:39 But it hit like a token limit, like really, really quickly.

43:43 And because we were capturing that and tracing it, then we could see like that's what the bottleneck was, that it was getting rate limited on the on the back end.

43:50 One of the challenges that people can run into, this is really important, is testing with way less data than you're going to have in production.

43:58 I've got 10 entries in the database because that's what I bothered to test type in while I was playing with the app.

44:04 But I've got a million in production.

44:06 Stuff like that, right?

44:07 I've got three products and there's a million in production.

44:11 I've seen this a few times where people have done load testing and they're like, oh, on the all users page or on the list products page, like it runs super fast because they've deployed the dev environment and it's got like no products in the database.

44:24 It doesn't matter if you have indexes because there's only three things, just return them all.

44:28 Yeah.

44:29 Or it's good.

44:29 Yeah.

44:29 And you don't see things like the N plus one because, you know, there's only like a couple of products, if any at all.

44:34 So you often want to seed the application with as much fake data as possible.

44:40 And the library that I absolutely love for this is called Mimesis, which is a fake data generator, kind of similar to Faker, but it's got a lot more support for like different languages and environments and stuff like that.

44:53 So if you wanted to say, okay, let's create a hundred thousand users and you need to generate names and addresses and locations and stuff like that, you can do that using Mimesis really easily.

45:04 And also if you want to do like test like different cultures or locales.

45:10 So, you know, not just testing like English names, but testing all sorts of different countries and stuff like that or different phone numbers, then yeah, you can use Mimesis to do that.

45:19 Yeah.

45:19 It's got all kinds of different things.

45:21 You get credit cards and phone numbers and all kinds of stuff.

45:25 Yeah.

45:25 I need a hundred thousand Brazilian phone numbers and it will just give you them in exactly the right format.

45:31 Yeah.

45:31 So then you save those to your database once.

45:33 Yeah.

45:34 And then you can run your tests and see if you have your N+1 problem, or your indexes don't fit into memory, or whatever the problem might be.

45:42 Right.

45:42 Yeah.

45:42 So for Django or for SQLAlchemy, you can do the, like these load and dump commands.

45:49 So what I kind of recommend is to keep it fast is to use Mimesis to generate a seed data file, even if that's like JSON or something, and then just do like a bulk load in the test environment.

46:01 And then you can reset it every time.

46:02 So you can basically just do a rollback and then just reset and do a bulk load.
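A minimal sketch of generating that kind of seed file with Mimesis, assuming a recent mimesis release with the Locale enum; the locale and field names here are just examples:

```python
import json
from mimesis import Person, Address
from mimesis.locales import Locale

person = Person(Locale.PT_BR)     # e.g. Brazilian names and phone numbers
address = Address(Locale.PT_BR)

users = [
    {
        "name": person.full_name(),
        "email": person.email(),
        "phone": person.telephone(),
        "city": address.city(),
    }
    for _ in range(100_000)
]

# Write a seed file once, then bulk load it before each test run
with open("seed_users.json", "w") as f:
    json.dump(users, f)
```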

46:06 If you're using like a document database, then they've got similar tools for that.

46:10 So if you're using Mongo, then you could just do a bulk load from like a test command, basically.

46:15 Right.

46:15 Just load it up with some fake data, do a mongodump.

46:18 And then whenever you're ready to reset it, just do a mongorestore --drop and it'll...

46:22 Yeah, exactly.

46:23 From scratch.

46:23 Yes.

46:24 Yep.

46:24 And you can reuse those for your integration tests as well.

46:26 If you're writing, if you've got integration tests with Django or Flask, then often you need a database fixture and you want to seed that with some information, then you can just reuse the same data.

46:35 But also do transactions, right?

46:37 Like put a, put a transactions that's always rolled back.

46:40 So once you load it up, you're not, you're not breaking it.

46:43 But that's tricky if the code itself calls commit, I suppose.

46:45 Yeah.

46:46 Some idea.

46:48 No, it changed it.

46:49 Darn it.

46:49 Yeah.

46:49 There was a question in the chat.

46:51 How does load testing change when you have non-deterministic processes, e.g. LLM queries, which is a really good question.

46:57 And I think it's kind of related to this where you've got a, you want to introduce like an element of randomness.

47:03 But we talked about like a, you know, a user search page, or if you've got like a chat feature or something, then you want to kind of vary the question, especially if you've got any kind of caching.

47:14 So that is tricky though, because at the moment, like the LLM calls, depending on which model you're using and how it's set up and stuff like that, like, but they can take, you know, a second to 10 seconds to get a response.

47:27 Easy.

47:27 Yeah.

47:28 So, you know, you need to kind of factor that in.

47:30 It's like, how is your app, how does your app handle sitting and waiting for 10 seconds before it gets a response back?

47:37 And often you'll find that you'll max out the number of, like, threads you've got, or the number of workers in, like, Gunicorn or Uvicorn or something like that, because they're all just sitting there waiting.

47:47 That's a place where async and await would be pretty awesome.

47:50 Yeah.

47:50 Right.

47:51 Because you can sort of let the thread go and just, you'll get back to it.

47:55 Also, maybe you just need a different architecture.

47:58 You know, maybe it's not just more caching or more hardware.

48:00 It's like, we're going to put a pending request in the database and we're going to push that off to somewhere.

48:06 And then we'll just set up like some kind of JavaScript thing to check if it's done and then pull down the answer or, you know, something that's not necessarily blocking potentially.

48:14 I think in a way it's great for, like you mentioned, great for that.

48:17 Actually, there was an announcement pretty recently that you can run Uvicorn now, like as a production server.

48:23 Yeah.

48:25 Without being wrapped in Gunicorn.

48:25 Yeah.

48:26 That's awesome.

48:27 For this kind of scenario we just talked about where you're waiting on backend calls and stuff like that, use the async version of those libraries.

48:33 Like if it's OpenAI, there's an async version of the OpenAI SDK.

48:37 Use that.

48:38 And then Uvicorn will quite happily run hundreds and hundreds of those requests that are just waiting on backend calls.
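(A minimal sketch of that, assuming the openai 1.x SDK and FastAPI; the route and model name are just examples.)

```python
# While the slow LLM call is awaited, the event loop keeps serving other requests,
# so a single Uvicorn worker can hold many of these in flight at once.
from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

@app.post("/chat")
async def chat(question: str):
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
    )
    return {"answer": resp.choices[0].message.content}
```

You can then serve it with Uvicorn directly, for example `uvicorn app:app --workers 2`, without a Gunicorn wrapper.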

48:46 Absolutely.

48:47 So one thing I want to throw in here, this is coming from me, not from you, but I want to get your thoughts on it.

48:53 So I recently set up this thing called Uptime Kuma.

48:56 Are you familiar with Uptime Kuma?

48:58 No.

48:59 So this is an open source, self-hosted Uptime monitoring tool.

49:03 And just run it in Docker.

49:06 Off it goes.

49:06 And one of the things that's nice about this is, well, over on Talk Python.

49:11 Now, if you go to the bottom, it's got a server status and you can come in here, you can see the website's behaving well.

49:17 The RSS feed is behaving well, the courses site, but also the mobile API, all these things.

49:22 But the reason I think this is interesting is, I mean, status, like, is it working or not?

49:26 It's not really that relevant.

49:27 That's like an operational thing.

49:28 But what's cool about it is if you go and dig into the thing, you go to the dashboard, like, let's say, you could even group them, right?

49:36 So like, let's do this.

49:38 I'll do the API.

49:39 So for like the mobile apps API or the mobile app for Talk Python courses, you can actually see over time what it's seen in response time for like days.

49:50 So with your load tests, you're saying, let me just hammer it and see how it responds to that, right?

49:57 And if you point something like this, or there's many other tools like this, but this is a nice one.

50:01 You can just say, just keep a record and let me go back and look for, you know, for 24 hours.

50:06 How has the load looked?

50:08 And you can see there's some weird spike there.

50:09 I probably did a deployment right around then or something.

50:12 But in general, you know, 40, 50 milliseconds, 35, 81, right?

50:17 It's handling it.

50:18 Because if the thing was overwhelmed, then these tests would also suffer the latency that everyone else is suffering, you know?

50:25 And this is like every 30 seconds.

50:27 Yeah, that's really cool.

50:28 I do want to give a shout out to OpenTelemetry.

50:30 I hope more people find out about this. Every year, more and more people find out about it and they're like, oh, this solves a lot of my problems.

50:37 It's a CNCF project.

50:42 So it's a big open source standard.

50:44 It basically is a standard for instrumentation, for observability.

50:49 So you kind of plug these instrumentation libraries into your app.

50:54 And there's ones for FastAPI, Django, Flask, and for the backends, like Mongo, like SQLAlchemy, stuff like that.

51:02 And you basically just install these packages and it will just start to capture all this information about, oh, there's a request and it had these parameters and it took this long.

51:12 And so in Python, like with a very small amount of code and a few packages, you can instrument all this stuff in your application.

51:19 And then the other thing that OpenTelemetry has is kind of like a pluggable exporter.

51:25 So wherever you want to send all that data to, you can kind of pick and choose like which platform you want to send it to.

51:33 And there are some like local ones you can use as well.

51:36 Like you showed this tool for uptime.

51:38 There's some Docker containers for that.

51:40 So like you just spin up a Docker container and it will just take all that telemetry data and give it to you in a GUI.

51:46 So you can see like all the traces, how long they took, what calls it made on the backend, what queries it was running.

51:52 Like OpenTelemetry is a brilliant way of doing that.
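(As a rough sketch of that wiring for a FastAPI app, using the official opentelemetry-sdk, opentelemetry-exporter-otlp, and opentelemetry-instrumentation-fastapi packages; the collector endpoint is a placeholder.)

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# The exporter is pluggable: point it at a local collector container or a hosted platform.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # every request now produces a trace

@app.get("/ping")
async def ping():
    return {"ok": True}
```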

51:55 And it's not just Python.

51:56 Like if you've got some components of your application that are written in other languages, like odds are that it is also supported.

52:04 So it's kind of like a framework, I guess, for capturing data.

52:07 And when you're doing load tests, this is how we do postmortems, or even just look at the stats and see where things went slow.

52:16 And I've got some videos and stuff of demos I've done with this where I've got like applications.

52:21 I've done a load test on them and then I can see, oh, there was a spike in load.

52:25 Let's go back and look at the data to see what caused that and which calls was it?

52:30 You know, what was the resources, the memory usage?

52:32 And often like, you know, was it the database or was it an API call?

52:36 And why did it take so long?

52:38 What were the special parameters, stuff like that?

52:40 So you.

52:40 What were we waiting on?

52:41 Was it the actual web app or was it a database?

52:43 But like you can retroactively know that, right?

52:46 Yeah.

52:46 And OpenTelemetry is a great, great way of doing that.

52:48 Yeah.

52:48 That's super cool.

52:49 That's way more holistic than my "just give me a graph of response time."

52:53 Yeah.

52:53 You know, that's still cool.

52:54 It's still cool.

52:56 It's, you know, 20 minutes to set it up in the Docker cluster.

52:59 Yeah.

52:59 Awesome.

53:00 Well, let's talk about, I know we're basically out of time, Anthony.

53:03 Let's just talk really quickly about one thing that I think is a little bit tricky.

53:07 And maybe just to get your thoughts on, you know, my Vercel example of cost is sort of

53:13 in this space.

53:13 And that's serverless.

53:15 What are your thoughts on serverless?

53:17 Like you have way more control when you have a VM or you've got a Kubernetes cluster or whatever

53:22 it is you're working with.

53:23 Right.

53:23 I don't know.

53:24 I always kind of see serverless.

53:25 It's a really cool idea, but it's more of a commercial concept than a technical one.

53:30 Yeah.

53:31 I'm thinking more about just that you don't control the warmup.

53:34 Yeah.

53:34 You don't control the machine.

53:36 You know, there's just a lot of stuff that's black box to you.

53:39 I think most of the platforms and most of the providers have got a, like a range of options.

53:45 Like I can speak to Azure Functions, which is like our serverless Python engine.

53:51 You can use that as like just pay per request, but you know, it then has certain optimizations

53:56 so that it has like a warm startup time.

53:58 So, you know, if you haven't made any requests for a certain number

54:02 of hours, then the application will fall asleep.

54:04 So it's not using resources and we're not charging you money.

54:07 But if you don't want to have that, if you want the startup time to be super fast, then

54:10 you can use a different service level basically and have like faster startups.

54:15 You can have like pre-allocated resources, which then becomes not serverless, but it's

54:20 the same framework.

54:21 It's the same architecture and Lambda is the same.

54:23 AWS Lambda is the same.

54:25 You can kind of set it up in those different options.

54:27 Things kind of get tricky when you get into Kubernetes because you've got real infrastructure

in the background.

54:33 And, you know, when you look at Kubernetes clusters or Kubernetes as a service or however you're

54:38 setting it up, the first question is like, how many VMs do I need to provision and how

54:42 big do they need to be?

54:43 So yes, you're kind of building like a serverless abstraction on top of it, but you've got real

54:50 infrastructure in the background.

54:51 That's actually really hard because in a lot of cases, you've just got tons of idle infrastructure.

54:56 So I think load testing is a good way of looking at trying to right-size what you've got

55:01 provisioned.

55:02 In many cases, scaling down some pods and scaling up others.

55:06 I don't mess with Kubernetes.

55:06 It's too much for me.

55:08 I don't need all that.

55:09 I don't need all that stuff.

55:11 I do use Docker though, which is really, really nice.

55:13 And I know a lot of these tools we talked about support running the Docker or working with Docker

55:17 and so on.

55:18 Yeah.

55:18 All right.

55:19 Well, hopefully people have some really concrete ideas and tools that they can use like Locust.io,

55:25 which we're both huge fans of.

55:27 Playwright, same deal, but also some of the philosophy and ideas behind it, which is super

55:32 important too.

55:32 So much appreciated you coming on the show to share that and just the chance to catch up.

55:36 How about you give us a quick wrap up, important takeaways for people who want to go out and

55:41 test their stuff now?

55:42 Yeah.

55:42 So I think step one is to look at your application and understand what your users are likely to

55:48 do.

55:48 So if you want to design a load test, start simple and maybe even start with Playwright

55:52 because you can just spin up the browser recorder, click around in the website

55:56 and, you know, simulate what a user would be doing.

55:59 Stitch that together with Locust and test 10 users.

56:02 Don't go nuts.

56:03 Start off with a small number, simple test, and you will uncover the things that need optimizing.

56:09 I don't think I've ever encountered an application that just ran really efficiently the first time

56:13 and then just keep working on that.

56:15 Otherwise, you often end up trying to optimize things that are not going to get touched or don't really make a difference

56:21 when you actually test it.

56:23 So yeah, start simple.

56:24 I recommend using Locust and Playwright if you want, or you can just write a simple Locust

56:29 test and then put the instrumentation in the backend so that you can see not just the response

56:34 times, but as much data as possible on what's happening and how

56:39 long it's taking.

56:40 And I'll share like a couple of links of some simple dashboards you can use with OpenTelemetry,

56:46 which will capture that data locally or in the cloud and show you, you know, a trace

56:50 of every request.

56:51 Awesome.

56:52 And do that with real data.

56:54 Don't do it with three entries.

56:55 Yeah.

56:57 It's not going to mean what you think it means if you do it with only

57:00 three entries.

57:00 Yeah.

57:00 All right.

57:01 Well, always great to have you on the show.

57:03 Thanks for being here, Anthony.

57:04 Catch you later.

57:04 Great to be back.

57:05 Thanks so much.

57:06 Yeah.

57:06 Bye.

57:06 This has been another episode of Talk Python to Me.

57:10 Thank you to our sponsors.

57:12 Be sure to check out what they're offering.

57:13 It really helps support the show.

57:15 Take some stress out of your life.

57:17 Get notified immediately about errors and performance issues in your web or mobile applications with

57:22 Sentry.

57:22 Just visit talkpython.fm/Sentry and get started for free.

57:27 And be sure to use the promo code talkpython, all one word.

57:31 This episode is brought to you by WorkOS.

57:34 If you're building a B2B SaaS app, at some point, your customers will start asking for

57:39 enterprise features like SAML authentication, SCIM provisioning, audit logs, and fine-grained

57:44 authorization.

57:45 WorkOS helps ship enterprise features on day one without slowing down your core product development.

57:51 Find out more at talkpython.fm/workos.

57:55 Want to level up your Python?

57:56 We have one of the largest catalogs of Python video courses over at Talk Python.

58:01 Our content ranges from true beginners to deeply advanced topics like memory and async.

58:05 And best of all, there's not a subscription in sight.

58:08 Check it out for yourself at training.talkpython.fm.

58:11 Be sure to subscribe to the show.

58:13 Open your favorite podcast app and search for Python.

58:16 We should be right at the top.

58:17 You can also find the iTunes feed at slash iTunes, the Google Play feed at /play,

58:22 and the direct RSS feed at slash RSS on talkpython.fm.

58:26 We're live streaming most of our recordings these days.

58:29 If you want to be part of the show and have your comments featured on the air,

58:33 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

58:37 This is your host, Michael Kennedy.

58:39 Thanks so much for listening.

58:41 I really appreciate it.

58:42 Now get out there and write some Python code.

58:44 I'll see you next time.
