WEBVTT

00:00:00.001 --> 00:00:04.900
You're about to launch your new app or API, or even just a big refactor of your current project.

00:00:04.900 --> 00:00:09.460
Will it stand up and deliver when you put it in production? Or will it wither and collapse?

00:00:09.460 --> 00:00:14.900
How do you know? Well, you would test that, of course. We have Anthony Shaw back on the podcast

00:00:14.900 --> 00:00:20.060
to dive into a wide range of tools and techniques for performance and load testing of web apps.

00:00:20.060 --> 00:00:25.860
This is Talk Python To Me, episode 479, recorded August 8th, 2024.

00:00:25.860 --> 00:00:28.600
Are you ready for your host, please?

00:00:29.100 --> 00:00:32.300
You're listening to Michael Kennedy on Talk Python To Me.

00:00:32.300 --> 00:00:35.980
Live from Portland, Oregon, and this segment was made with Python.

00:00:35.980 --> 00:00:42.060
Welcome to Talk Python To Me, a weekly podcast on Python.

00:00:42.060 --> 00:00:44.280
This is your host, Michael Kennedy.

00:00:44.280 --> 00:00:49.640
Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

00:00:49.640 --> 00:00:57.540
both accounts over at fosstodon.org, and keep up with the show and listen to over nine years of episodes at talkpython.fm.

00:00:57.700 --> 00:01:02.140
If you want to be part of our live episodes, you can find the live streams over on YouTube.

00:01:02.140 --> 00:01:08.360
Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming shows.

00:01:08.360 --> 00:01:10.380
This episode is brought to you by Sentry.

00:01:10.380 --> 00:01:12.160
Don't let those errors go unnoticed.

00:01:12.160 --> 00:01:14.020
Use Sentry like we do here at Talk Python.

00:01:14.420 --> 00:01:17.380
Sign up at talkpython.fm/sentry.

00:01:17.380 --> 00:01:19.660
And it's brought to you by WorkOS.

00:01:19.660 --> 00:01:26.960
If you're building a B2B SaaS app at some point, your customers will start asking for enterprise features like SAML authentication,

00:01:26.960 --> 00:01:30.900
SCIM provisioning, audit logs, and fine-grained authorization.

00:01:30.900 --> 00:01:36.840
WorkOS helps ship enterprise features on day one without slowing down your core product development.

00:01:37.300 --> 00:01:40.720
Find out more at talkpython.fm/workos.

00:01:41.580 --> 00:01:46.900
Anthony, welcome back to Talk Python To Me again, again and again.

00:01:46.900 --> 00:01:48.060
So awesome to have you back.

00:01:48.060 --> 00:01:48.760
Always good.

00:01:48.760 --> 00:01:49.320
Yeah, yeah.

00:01:49.320 --> 00:01:49.940
Great to be here.

00:01:49.940 --> 00:01:51.020
Good to catch up.

00:01:51.020 --> 00:01:52.340
Good to have you on the show.

00:01:52.340 --> 00:01:56.580
I know we've got a really interesting topic to talk about.

00:01:56.580 --> 00:01:58.340
Performance, load testing.

00:01:58.760 --> 00:02:05.680
How do you know if your website or your API is going to work when you ship it to the world in a real world way?

00:02:05.680 --> 00:02:10.480
Not just how many requests per second can it take, but a real use case, as I know.

00:02:10.480 --> 00:02:12.960
You're going to tell us all about, so that's going to be awesome.

00:02:12.960 --> 00:02:20.320
Before we do, you know, just a quick introduction to who you are and maybe for most 98% of the world who knows you, what have you been up to?

00:02:20.320 --> 00:02:24.860
Yeah, so I am the Python advocacy lead at Microsoft.

00:02:24.860 --> 00:02:29.760
I do a bunch of open source work, maintain some projects and stuff like that.

00:02:29.760 --> 00:02:33.960
Wrote a book on Python, the Python compiler called CPython Internals.

00:02:33.960 --> 00:02:45.040
But these days, I'm mostly known as the person that created VS Code Pets, which was a bit of fun, but has become the most popular piece of software I've ever written.

00:02:45.040 --> 00:02:46.940
So that's been interesting.

00:02:46.940 --> 00:02:48.900
It's now got over a million active users.

00:02:48.900 --> 00:02:50.800
I'm not sure how that works.

00:02:50.800 --> 00:02:51.620
Careful what you create.

00:02:51.620 --> 00:02:53.040
You might get known for it, you know?

00:02:53.040 --> 00:02:54.000
Yeah, I know.

00:02:54.380 --> 00:02:59.860
Yeah, it's interesting when you go to conferences and stuff now and I'm like, oh, I work on this project and this project.

00:02:59.860 --> 00:03:03.300
And then you'd mention the pets thing and they're like, oh, you're the pets person.

00:03:03.300 --> 00:03:04.460
Oh, you're the developer.

00:03:04.460 --> 00:03:10.920
I'm like, you spent a year writing a deep book on the internals of CPython and its runtime.

00:03:10.920 --> 00:03:14.180
I don't know who you are, but this pets thing is killer, man.

00:03:14.180 --> 00:03:16.000
Yeah, there's a cat that runs around in VS Code.

00:03:16.000 --> 00:03:16.800
So that's cool.

00:03:16.800 --> 00:03:18.780
Can it be a dog as well?

00:03:18.780 --> 00:03:19.880
What kind of pets can we have?

00:03:19.880 --> 00:03:20.600
Oh, there's everything.

00:03:20.980 --> 00:03:22.360
Yeah, it can be a dog.

00:03:22.360 --> 00:03:28.040
There's like chickens and turtles and snakes and everything you can think of.

00:03:28.040 --> 00:03:29.860
It's a pretty active repository as well.

00:03:29.860 --> 00:03:36.080
We get a lot of feature requests for new shades of fur and new pets and new behaviors and stuff like that.

00:03:36.240 --> 00:03:41.040
So, yeah, if you haven't checked it out, then check out the VS Code pets extension for VS Code.

00:03:41.040 --> 00:03:43.820
Yeah, I installed it for a while.

00:03:43.820 --> 00:03:44.780
I had to uninstall it.

00:03:44.780 --> 00:03:45.720
It was too much.

00:03:45.720 --> 00:03:49.440
If you're one of those people that likes having distractions, then it's helpful.

00:03:49.440 --> 00:03:55.460
If you find it hard to have a little thing running around whilst you're trying to code, then it might be a bit much.

00:03:55.460 --> 00:03:56.860
A little bit like power mode.

00:03:56.860 --> 00:03:57.460
Oh, yeah.

00:03:57.460 --> 00:03:58.020
I don't know.

00:03:58.020 --> 00:03:59.860
Are you familiar with the power mode?

00:04:00.020 --> 00:04:01.560
I used the one in JetBrains.

00:04:01.560 --> 00:04:04.760
Yeah, I used the power mode for JetBrains when that was around.

00:04:04.760 --> 00:04:05.500
It was pretty cool.

00:04:05.500 --> 00:04:09.000
Yeah, if you want maximum distraction for your work.

00:04:09.000 --> 00:04:13.040
Yeah, it reminds me of Unreal Tournament, the sort of power modes for that.

00:04:13.040 --> 00:04:14.540
That's right.

00:04:14.540 --> 00:04:15.140
That's awesome.

00:04:15.560 --> 00:04:19.200
Well, let's start off with some stories.

00:04:19.200 --> 00:04:32.340
So, I think everyone has a sense of, like, why do you want your website or your API or your microservice thing of API and website, however you combine these things, to work well and understand.

00:04:32.340 --> 00:04:34.900
But it's always fun to share some stories.

00:04:34.900 --> 00:04:36.040
I know you got some.

00:04:36.040 --> 00:04:37.700
Yeah, so we want to talk about load testing.

00:04:37.700 --> 00:04:48.680
And I think a fun thing to do with load testing is to reflect on times where people haven't done it the right way and it's caused a really public disaster.

00:04:48.680 --> 00:04:57.400
But the analogy that I use is the halftime problem, which I'm English, and so this is mostly a soccer thing.

00:04:57.400 --> 00:05:00.040
But we've got really big soccer games.

00:05:00.040 --> 00:05:02.260
And it would be the same, I guess, with the Super Bowl.

00:05:02.260 --> 00:05:05.420
Yeah, I was just thinking the American Super Bowl, football Super Bowl.

00:05:05.700 --> 00:05:06.320
It's got to be.

00:05:06.320 --> 00:05:11.620
Yeah, but the difference is that in soccer, at 45 minutes, you've got a 15-minute break.

00:05:11.620 --> 00:05:14.700
Whereas in the Super Bowl, every 15 minutes, you've got a 15-minute break.

00:05:14.700 --> 00:05:17.980
Well, and it's just loaded with commercials.

00:05:17.980 --> 00:05:21.360
So, you've got, like, every five minutes, there's, like, oh, here's two more commercial breaks.

00:05:21.360 --> 00:05:25.380
Where soccer is a flowing game, there's, like, no break until halftime.

00:05:25.380 --> 00:05:26.560
That's the difference, I think.

00:05:26.560 --> 00:05:35.360
So, what happens with a big game, so if it's, like, an FA Cup final or, like, a, you know, like, a Champions League final or something, then you basically

00:05:35.360 --> 00:05:42.740
have 100 million people all stop watching the TV at the same time for 15 minutes, go to the

00:05:42.740 --> 00:05:46.760
kitchen, turn on the kettle, make a cup of tea, go to the bathroom.

00:05:46.760 --> 00:05:53.240
And so, the electricity providers actually have to plan for this because the kettle uses, like,

00:05:53.240 --> 00:05:55.260
a couple of kilowatts of energy.

00:05:55.260 --> 00:05:57.620
Especially in the UK where it's 240.

00:05:57.940 --> 00:05:58.620
Yeah, exactly.

00:05:58.620 --> 00:05:59.120
Yeah.

00:05:59.120 --> 00:06:05.720
And all of a sudden, you've got, like, tens of millions of people all switching on their two kilowatt kettles at the same time.

00:06:05.820 --> 00:06:14.320
So, the electricity providers basically have to plan around the soccer games so that they get this massive spike in load in their grid.

00:06:14.320 --> 00:06:18.540
So, they actually do look at the sports schedules to basically plan around this.

00:06:18.540 --> 00:06:20.880
Especially if there's something like a World Cup or something like that.

00:06:20.880 --> 00:06:27.840
And this is kind of like a load testing thing where you kind of need to plan ahead and think about what the load on the system is going to be.

00:06:28.200 --> 00:06:31.980
And is it a spike in traffic or is it, like, distributed?

00:06:31.980 --> 00:06:38.600
And I've definitely seen a couple of occasions where it's gone a bit wrong.

00:06:38.600 --> 00:06:43.360
So, here in Australia, we had a census maybe eight years ago.

00:06:43.360 --> 00:06:45.400
And normally, it's a paper census.

00:06:45.400 --> 00:06:47.900
You fill in the form, say who you are, what you do.

00:06:47.900 --> 00:06:50.240
But this time, they wanted to do it online as well.

00:06:50.240 --> 00:06:52.760
And they really encouraged people to use the online version.

00:06:52.760 --> 00:06:58.060
And they set up the system, set up the website, and said, okay, everyone can fill in the census.

00:06:58.540 --> 00:07:01.260
There's only 20-something million people in Australia.

00:07:01.260 --> 00:07:02.540
It's not a very big population.

00:07:02.540 --> 00:07:06.680
But on the last night before the census was due, it crashed.

00:07:06.680 --> 00:07:10.360
Because everybody logged on at the last minute and tried to fill it in.

00:07:10.360 --> 00:07:12.500
And they hadn't tested it properly.

00:07:12.500 --> 00:07:14.160
And it just wasn't ready for the load.

00:07:14.160 --> 00:07:17.320
And then in the postmortem, they said, oh, we did the load testing.

00:07:17.320 --> 00:07:18.580
And we tested it.

00:07:18.580 --> 00:07:19.340
And it seemed fine.

00:07:19.340 --> 00:07:24.580
But we expected everybody to fill in the census over the six months they had to do it.

00:07:24.580 --> 00:07:26.860
We didn't think that people would leave it to the last minute,

00:07:26.980 --> 00:07:29.820
which is more like, well, what did you expect to happen?

00:07:29.820 --> 00:07:33.940
This is human nature that you're going to put it off until the last possible moment.

00:07:33.940 --> 00:07:35.400
Yeah, I can only see two spikes.

00:07:35.400 --> 00:07:38.120
A small spike at the beginning and a huge spike at the end.

00:07:38.120 --> 00:07:39.260
And like nothing in the middle.

00:07:39.260 --> 00:07:39.980
Yeah, exactly.

00:07:39.980 --> 00:07:45.340
So they had to kind of delay the deadline and then provision new infrastructure and stuff like that.

00:07:45.340 --> 00:07:46.800
So yeah, it was interesting.

00:07:46.800 --> 00:07:49.820
But it kind of highlighted how you need to think about load testing.

00:07:49.820 --> 00:07:51.200
Yeah, that's nuts.

00:07:51.680 --> 00:08:01.180
And I guess the mode before, the style before was kind of a distributed queuing system using paper envelopes where the queuing literally was physical.

00:08:01.180 --> 00:08:05.600
And then it would be processed at the rate at which the system can handle it, right?

00:08:05.600 --> 00:08:09.040
Which, you know, however fast you can get them in based on paper.

00:08:09.620 --> 00:08:12.320
But then you try to turn it into an interactive system.

00:08:12.320 --> 00:08:12.840
And oh, no.

00:08:12.840 --> 00:08:13.640
Yeah, yeah, exactly.

00:08:13.640 --> 00:08:16.880
You know, I have another example to share.

00:08:16.880 --> 00:08:20.360
Like we've seen websites that, you know, fall down like this.

00:08:20.360 --> 00:08:21.240
It fell down.

00:08:21.240 --> 00:08:25.740
And when something has to be done, I think it's probably really tricky to communicate.

00:08:25.740 --> 00:08:30.140
Let's say 10 million people tried to fill it out that day and it crashed on them.

00:08:30.220 --> 00:08:33.800
Well, how do they know that the deadline's extended and how do they come back, right?

00:08:33.800 --> 00:08:39.080
Like there's, it creates this cascading chain of really challenging problems all of a sudden to deal with it.

00:08:39.080 --> 00:08:45.160
You know, we had the healthcare Obamacare stuff in the U.S. where as soon as that thing opened, it just completely died.

00:08:45.160 --> 00:08:47.540
And for weeks, people couldn't get insurance, which was bad.

00:08:47.540 --> 00:08:49.220
But that's not the story I want to share.

00:08:49.220 --> 00:08:50.360
I want to share a different one.

00:08:50.360 --> 00:08:56.640
There was this app, this person who was frustrated about, they're an artist, like a photographer, I believe.

00:08:56.640 --> 00:09:03.020
And they're frustrated about LLMs and diffusion models taking everyone's art and generating new art from it.

00:09:03.020 --> 00:09:10.260
And kind of, you know, I guess it's up to people's opinion whether or not it counts as stealing or copyright theft or it's fair use or whatever.

00:09:10.260 --> 00:09:12.820
But this person wasn't a fan of that.

00:09:12.820 --> 00:09:18.480
So they came up with a web app based on serverless that would look at stuff and say, this is real art.

00:09:18.480 --> 00:09:20.320
This is AI generated art.

00:09:20.680 --> 00:09:27.060
And it's not exactly the same, but they ended up with a $96,000 Vercel bill.

00:09:27.060 --> 00:09:36.080
Because they didn't really take into account how much it's going to cost per request to handle this on their serverless.

00:09:36.080 --> 00:09:37.460
It was like 20 cents a user.

00:09:37.460 --> 00:09:44.280
And everyone's like, you know, this is right at the height of the LLM boom and all the AI art boom and stuff.

00:09:44.280 --> 00:09:45.580
And people just swarm to it.

00:09:45.580 --> 00:09:46.100
Yeah, yeah.

00:09:46.100 --> 00:09:48.280
So that's a different kind of problem, right?

00:09:48.280 --> 00:09:49.640
Yeah, that's the cost issue.

00:09:49.960 --> 00:09:52.260
Yeah, there are definitely other ones to keep in mind.

00:09:52.260 --> 00:09:55.860
And we mentioned like, you know, the census and the Obamacare thing.

00:09:55.860 --> 00:10:03.120
And you might be listening and thinking, well, you know, my website is not going to get, you know, 15 million people logging in all at the same time.

00:10:03.120 --> 00:10:05.980
So like, is this really an issue that I need to handle?

00:10:05.980 --> 00:10:10.820
And like a few months ago, I was doing a load test of an AI app.

00:10:10.820 --> 00:10:16.600
So like a chat app and designed the load test and ran it with 10 users and it had issues.

00:10:16.600 --> 00:10:20.020
So like, this doesn't need to be a tens of millions of users problem.

00:10:20.020 --> 00:10:26.380
And the issue that it uncovered was like, oh, there's a rate limit on the number of tokens you can send to the LLM.

00:10:26.380 --> 00:10:28.220
And by default, it's a really small number.

00:10:28.220 --> 00:10:34.260
And if you have 10 people asking questions simultaneously, then that gets exhausted really quickly.

00:10:34.720 --> 00:10:36.200
And it just throttles you.

00:10:36.200 --> 00:10:39.380
And then like the app just says, oh, rate limit error.

00:10:39.380 --> 00:10:50.260
And you just wouldn't have noticed because if you're a single user clicking around, typing in, you know, questions, or even if you've written some fancy like UI testing or something, you're like, oh, we've tested it.

00:10:50.300 --> 00:10:51.160
And it works great.

00:10:51.160 --> 00:10:54.280
But, you know, you run 10 people using it at the same time.

00:10:54.280 --> 00:10:55.760
And that's where you start to notice problems.
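
NOTE Editor's sketch, not from the episode: a minimal illustration of why a per-minute token limit that a single tester never hits can be exhausted by ten simultaneous users. The 10,000 tokens per minute limit and the 1,500 tokens per question are hypothetical numbers chosen for illustration, not any provider's real defaults.
```python
def requests_served(users, tokens_per_request, tokens_per_minute_limit):
    """Return (served, throttled) when `users` each send one request in the same minute."""
    budget = tokens_per_minute_limit
    served = throttled = 0
    for _ in range(users):
        if budget >= tokens_per_request:
            budget -= tokens_per_request
            served += 1
        else:
            throttled += 1  # the app would surface a rate limit error here
    return served, throttled
# Hypothetical numbers: 1,500 tokens per question against a 10,000 tokens/minute limit.
print(requests_served(1, 1500, 10_000))   # (1, 0): a single tester never sees the limit
print(requests_served(10, 1500, 10_000))  # (6, 4): ten simultaneous users exhaust it
```
A real load test would discover the limit by sending traffic; this only shows the arithmetic behind the failure.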

00:10:55.760 --> 00:11:11.320
I've more than once considered putting some kind of LLM AI augment, you know, RAG sort of thing in front of Talk Python because, God, at this point, eight, nine years of full transcripts, human corrected with all sorts of, you know, pretty highly accurate stuff.

00:11:11.320 --> 00:11:13.360
It would be awesome to have people have conversations about it.

00:11:13.360 --> 00:11:15.500
But I don't know, just the workload.

00:11:15.500 --> 00:11:18.080
I can see so many people coming to it and go, that's neat.

00:11:18.080 --> 00:11:19.440
I'm going to ask it about my homework.

00:11:19.840 --> 00:11:21.000
It has nothing to do with it.

00:11:21.000 --> 00:11:28.560
Just using it as like a free AI to just, instead of bothering to download something or use ChatGPT, I'll just use yours.

00:11:28.560 --> 00:11:30.660
And I'm like, eh, probably not worthwhile.

00:11:30.660 --> 00:11:31.240
All the trouble.

00:11:31.240 --> 00:11:35.580
Yeah, you can set them up so that they only answer questions about your own stuff.

00:11:35.580 --> 00:11:39.460
They don't act as a fancy front end for ChatGPT.

00:11:39.460 --> 00:11:45.740
We've got plenty of demos like that, that you give it your own internal documents and it only answers questions about those.

00:11:45.740 --> 00:11:51.060
And you can set it up so that if it doesn't, if it can't figure out a reliable answer, it tells you.

00:11:51.060 --> 00:11:52.280
It says, I can't.

00:11:52.280 --> 00:11:52.700
Yeah, that's great.

00:11:52.700 --> 00:11:54.580
Rather than just making something up.

00:11:55.840 --> 00:11:56.960
Yeah, that's really nice.

00:11:56.960 --> 00:12:01.520
Yeah, I forgot how closely you guys work with OpenAI and all them, right?

00:12:01.520 --> 00:12:01.920
Yeah.

00:12:01.920 --> 00:12:07.100
I mean, the compute in Azure to run and train all that stuff is, it's out of control.

00:12:07.280 --> 00:12:09.480
Yeah, the GPUs alone are insane.

00:12:09.480 --> 00:12:16.660
Yeah, it's, you know, I talked to Mark Russinovich, a bit of a diversion, but about some of the hardware and some of the data center stuff.

00:12:16.660 --> 00:12:23.120
And it's like, we're going to create distributed GPUs that connect back over fiber to the actual hardware.

00:12:23.120 --> 00:12:26.280
So you can scale your GPUs and CPUs independently.

00:12:26.280 --> 00:12:28.660
There's just all kinds of crazy, interesting stuff there.

00:12:28.960 --> 00:12:32.100
But that's a side, a bit of a side note, a bit of a side note.

00:12:32.100 --> 00:12:32.560
All right.

00:12:32.560 --> 00:12:37.060
So let's talk about basically what do you want to do to design a load test?

00:12:37.060 --> 00:12:48.460
Like it's more than just let me write a loop and see how many requests I can make, you know, while true, call, you know, use requests and call this or call this endpoint or something, right?

00:12:48.460 --> 00:13:00.180
Yeah, so there's, I guess, what I normally highlight with load testing is that the wrong way to start is to try and work out how many requests per second your server can handle.

00:13:00.180 --> 00:13:06.000
That's interesting for throughput, but it's not a good measure of real user traffic.

00:13:06.000 --> 00:13:10.080
And the main reason is that every user is unique.

00:13:10.080 --> 00:13:11.820
They do different things.

00:13:11.820 --> 00:13:13.120
They follow different paths.

00:13:13.120 --> 00:13:14.660
They put in different data.

00:13:15.000 --> 00:13:21.320
And also when you're running like a benchmark, then it's just it's not waiting between requests.

00:13:21.320 --> 00:13:23.960
It's just trying to hammer as much as it can.

00:13:23.960 --> 00:13:28.880
So real users pause and read and click and wait.

00:13:28.880 --> 00:13:30.260
And there's normally latency.

00:13:30.260 --> 00:13:34.800
They don't request the same page over and over and over, right?

00:13:34.800 --> 00:13:36.960
They request a blend of pages.

00:13:36.960 --> 00:13:44.520
Here they go to the homepage and they do a search and they explore what they find in the search and they go back and do another search and then they go to check out or whatever, right?

00:13:44.760 --> 00:13:50.500
And as you're pointing out, they don't just hold down control or command R and just flicker the screen as hard as they can.

00:13:50.500 --> 00:13:52.480
They're reading and they're interacting.

00:13:52.480 --> 00:14:04.920
And so what you're saying is if you really want to say how many actual users, not just kind of a throughput number, but how many people using the app could you possibly expect it to keep working for, right?

00:14:05.000 --> 00:14:06.740
You got to factor all these things in, right?

00:14:06.740 --> 00:14:07.140
Yeah.

00:14:07.140 --> 00:14:16.520
So you got to factor in the randomness of where they go, what they type in, the waits between each click or each API request.
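
NOTE Editor's sketch, not from the episode: one way to model the points above, assuming a user picks pages from a weighted mix and pauses a random "think time" between clicks. The page paths, weights, and wait range are hypothetical; a load testing tool would issue real HTTP requests where this only records the plan.
```python
import random
# Hypothetical page mix: weights are relative and not measured from a real site.
PAGES = {"/": 5, "/search": 3, "/product": 4, "/checkout": 1}
def simulate_session(steps, min_wait, max_wait, rng):
    """Return a list of (page, think_time_seconds) pairs for one simulated user."""
    paths = list(PAGES)
    weights = list(PAGES.values())
    session = []
    for _ in range(steps):
        page = rng.choices(paths, weights=weights, k=1)[0]  # users follow different paths
        wait = rng.uniform(min_wait, max_wait)              # users pause to read and click
        session.append((page, wait))
    return session
session = simulate_session(steps=5, min_wait=1.0, max_wait=5.0, rng=random.Random(42))
for page, wait in session:
    print(f"{page:10s} then wait {wait:.1f}s")
```
Seeding the generator makes one run repeatable; giving each simulated user a different seed gives each one a different journey.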

00:14:16.520 --> 00:14:23.960
And then something else that's really important is that with most modern applications, it's not just the initial HTTP request.

00:14:23.960 --> 00:14:32.560
It's the 95 additional requests to all the scripts and the resources and the Ajax calls and everything like that.

00:14:32.560 --> 00:14:38.160
So if it's a browser-based application, then typically, you know, it's not just the initial request.

00:14:38.160 --> 00:14:39.680
It's everything that happens after it.

00:14:39.900 --> 00:14:44.900
And I've definitely seen times where people have done a load test and said, oh, yeah, the website runs great.

00:14:44.900 --> 00:14:51.020
And then in the app, they had like a JavaScript poller that would like refresh the data or something every minute.

00:14:51.020 --> 00:14:53.300
And then they didn't test that.

00:14:53.300 --> 00:14:59.160
And then you get thousands of users who leave the tab in their browser and it's just polling in the background.

00:14:59.160 --> 00:15:05.420
So you've basically got this continuous stream of traffic to a special API that does like polling.

00:15:05.640 --> 00:15:09.940
And they hadn't load tested that and that produced a huge like spike in load and that caused the issue.

00:15:09.940 --> 00:15:12.920
It's like 75 percent of the workload is the polling.

00:15:12.920 --> 00:15:13.760
Yeah, exactly.
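
NOTE Editor's sketch, not from the episode: the arithmetic behind background polling, assuming every open tab polls once per interval. The tab count, interval, and interactive request rate are hypothetical numbers picked so that polling comes out at the 75 percent mentioned above.
```python
def polling_requests_per_second(open_tabs, poll_interval_seconds):
    """Steady background request rate from tabs that poll on a timer."""
    return open_tabs / poll_interval_seconds
def polling_share(polling_rps, interactive_rps):
    """Fraction of total traffic that is background polling."""
    return polling_rps / (polling_rps + interactive_rps)
# Hypothetical numbers: 4,500 tabs left open, each polling once a minute.
background = polling_requests_per_second(open_tabs=4_500, poll_interval_seconds=60)
print(background)                                       # 75.0 requests per second
print(polling_share(background, interactive_rps=25.0))  # 0.75
```
The point is that idle tabs generate load even when nobody is clicking, so the polling endpoint belongs in the test plan.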

00:15:13.760 --> 00:15:14.700
What's your browser?

00:15:14.700 --> 00:15:16.080
What's your browser tab story?

00:15:16.080 --> 00:15:18.780
Are you a person that just has tons and tons of tabs open?

00:15:18.780 --> 00:15:19.760
I don't know.

00:15:19.760 --> 00:15:21.420
I try and clean them up as much as possible.

00:15:21.420 --> 00:15:21.840
Yeah.

00:15:21.840 --> 00:15:24.620
And have like a couple open maybe.

00:15:24.620 --> 00:15:26.260
And if I'm researching something.

00:15:26.260 --> 00:15:28.840
And then when I'm finished, I just do close all tabs.

00:15:28.840 --> 00:15:29.780
Yeah, that's me as well.

00:15:29.780 --> 00:15:33.700
I think there's a good number of people that just have like 50 tabs open.

00:15:33.700 --> 00:15:34.360
They just leave them.

00:15:34.660 --> 00:15:36.220
And think about what that does for your website.

00:15:36.220 --> 00:15:39.380
If you've got some kind of timer based deal, right?

00:15:39.380 --> 00:15:41.640
You got to consider those people that have just left it.

00:15:41.640 --> 00:15:41.980
Yeah.

00:15:43.420 --> 00:15:46.380
This portion of Talk Python To Me is brought to you by Sentry.

00:15:46.380 --> 00:15:47.500
Code breaks.

00:15:47.500 --> 00:15:48.800
It's a fact of life.

00:15:48.800 --> 00:15:50.940
With Sentry, you can fix it faster.

00:15:50.940 --> 00:15:56.660
As I've told you all before, we use Sentry on many of our apps and APIs here at Talk Python.

00:15:56.980 --> 00:16:02.360
I recently used Sentry to help me track down one of the weirdest bugs I've run into in a long time.

00:16:02.360 --> 00:16:03.320
Here's what happened.

00:16:03.320 --> 00:16:15.080
When signing up for our mailing list, it would crash under a non-common execution path, like situations where someone was already subscribed or entered an invalid email address or something like this.

00:16:15.440 --> 00:16:21.080
The bizarre part was that our logging of that unusual condition itself was crashing.

00:16:21.080 --> 00:16:24.280
How is it possible for our log to crash?

00:16:24.280 --> 00:16:26.860
It's basically a glorified print statement.

00:16:26.860 --> 00:16:28.560
Well, Sentry to the rescue.

00:16:29.000 --> 00:16:35.020
I'm looking at the crash report right now, and I see way more information than you'd expect to find in any log statement.

00:16:35.020 --> 00:16:38.060
And because it's production, debuggers are out of the question.

00:16:38.060 --> 00:16:49.960
I see the traceback, of course, but also the browser version, client OS, server OS, server OS version, whether it's production or QA, the email and name of the person signing up.

00:16:49.960 --> 00:16:52.000
That's the person who actually experienced the crash.

00:16:52.000 --> 00:16:54.820
Dictionaries of data on the call stack and so much more.

00:16:54.820 --> 00:16:55.760
What was the problem?

00:16:56.380 --> 00:17:05.480
I initialized the logger with the string info for the level rather than the enumeration dot info, which was an integer based enum.

00:17:05.480 --> 00:17:12.060
So the logging statement would crash saying that I could not use less than or equal to between strings and ints.

00:17:12.060 --> 00:17:13.460
Crazy town.

00:17:13.460 --> 00:17:19.640
But with Sentry, I captured it, fixed it, and I even helped the user who experienced that crash.

00:17:19.640 --> 00:17:21.080
Don't fly blind.

00:17:21.080 --> 00:17:22.760
Fix code faster with Sentry.

00:17:22.760 --> 00:17:26.780
Create your Sentry account now at talkpython.fm/sentry.

00:17:26.780 --> 00:17:39.120
And if you sign up with the code TALKPYTHON, all capital, no spaces, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features.

00:17:39.840 --> 00:17:51.680
You did mention that the CSS and the, you talked about the AJAX, but I think also things like CSS, images, JavaScript files, not JavaScript execution, but just getting the files, right?

00:17:51.680 --> 00:17:53.500
Some of these frameworks are a couple hundred megs.

00:17:53.500 --> 00:17:59.480
Like if you're not somehow distributing that through a CDN, that could be worth considering as well, I think.

00:17:59.700 --> 00:18:00.480
Yeah, it is definitely.

00:18:00.480 --> 00:18:06.620
And so in Python, for example, you've got in Django, there's an extension called WhiteNoise.

00:18:06.620 --> 00:18:09.440
So in Django, you've got static content.

00:18:09.440 --> 00:18:16.760
There is a really easy extension you can install because normally you have to kind of configure static and say, okay, my static files are here.

00:18:16.760 --> 00:18:22.720
And then you have to set up Nginx or whichever web server you're doing to serve that static content directly.

00:18:22.720 --> 00:18:31.340
So as a bit of a workaround, people often install this WhiteNoise extension that basically uses Python as their static server.

00:18:31.340 --> 00:18:39.280
So like every time you request a CSS or whatever, Python actually is the thing that reads it from disk and serves it back, which is great for like development.

00:18:39.280 --> 00:18:44.520
But like you should never use that in production because Python is not a very good CDN.

00:18:44.860 --> 00:18:51.100
So yeah, that's kind of with load testing, you would just test the endpoint and say, oh, yeah, it works great.

00:18:51.100 --> 00:18:52.520
But then you actually run it in a browser.

00:18:52.520 --> 00:19:03.140
And if you're using something like WhiteNoise, it's actually creating 10 times, 20 times more load than you expected because it's pulling in all this like CSS and JavaScript and images and stuff like that.
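
NOTE Editor's sketch, not from the episode: a rough way to see how one page view fans out into follow-on requests, by counting the scripts, stylesheets, and images referenced in the HTML. The markup is a hypothetical page; if the application server also serves static files, each of these URLs is an extra request it has to handle.
```python
from html.parser import HTMLParser
class AssetCounter(HTMLParser):
    """Collect URLs a browser would fetch after the initial HTML response."""
    def __init__(self):
        super().__init__()
        self.assets = []
    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img") and attributes.get("src"):
            self.assets.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.assets.append(attributes["href"])
# Hypothetical page markup, used only for illustration.
PAGE = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
</head><body>
<img src="/static/logo.png"><img src="/static/hero.jpg">
</body></html>
"""
counter = AssetCounter()
counter.feed(PAGE)
print(len(counter.assets), "follow-on requests:", counter.assets)
```
This ignores fonts, AJAX calls, and anything loaded by scripts, so a real browser would typically make more requests than this count.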

00:19:03.140 --> 00:19:04.440
Yeah, it can be out of control.

00:19:04.440 --> 00:19:12.760
I ran into an issue with my website where, I don't know, I got tired of being too careful with Font Awesome fonts.

00:19:12.960 --> 00:19:14.440
And I had like missed one.

00:19:14.440 --> 00:19:15.480
I'm like, I'll just put the whole thing.

00:19:15.480 --> 00:19:17.900
I'll just put the whole CSS file in.

00:19:17.900 --> 00:19:19.180
I'm sure it'll be fine.

00:19:19.180 --> 00:19:19.980
It wasn't fine.

00:19:19.980 --> 00:19:26.040
Like the web app is running at like 10% CPU usage and Nginx, trying to serve up all the JavaScript,

00:19:26.040 --> 00:19:26.940
is at like 80.

00:19:26.940 --> 00:19:28.220
I'm like, what is it doing?

00:19:28.220 --> 00:19:32.860
Like, oh, it was serving up a megabyte's worth of font stuff.

00:19:32.860 --> 00:19:33.820
Every request.

00:19:33.820 --> 00:19:34.560
This is not good.

00:19:34.560 --> 00:19:42.400
So yeah, in terms of projects that I've worked on in the past, I've worked on load testing some like big campaigns and stuff like that.

00:19:43.080 --> 00:19:46.060
Particularly around sports events, also some television.

00:19:46.060 --> 00:19:53.480
Back when people used to watch live television instead of streaming at all, you know, they'd have like a murder mystery or something on a big soap opera.

00:19:54.040 --> 00:20:05.500
And so I'd like load test the application so that when everyone wants to find out who murdered the vicar's wife or whatever, then they'd all kind of click on the website at the same time and just trying to load test it.

00:20:05.580 --> 00:20:14.460
So some of the things I'd seen were trying to make sure you simulate browser load correctly, trying to distribute traffic in a realistic way.

00:20:14.460 --> 00:20:18.320
And then spikes in traffic are basically a different problem.

00:20:18.320 --> 00:20:25.720
So the thing we talked about, like the halftime problem where you've got everybody turning on the kettle at the same time, that's a special type of load test.

00:20:25.720 --> 00:20:30.020
It's a predictable spike as opposed to just out of the blue spike, right?

00:20:30.020 --> 00:20:31.980
That one is predictable, which is really nice.

00:20:31.980 --> 00:20:33.960
And then you get things like seasonal traffic.

00:20:34.200 --> 00:20:46.420
So for a lot of e-commerce applications and stuff like that, you would expect in the lead up to Christmas or in the Black Friday sale in the US, for example, like Cyber Monday, you know, you'd expect like a spike in traffic for those.

00:20:46.420 --> 00:20:48.660
But it will be distributed over a day or two.

00:20:48.660 --> 00:20:52.560
So you want to be able to like properly assess those.

00:20:52.560 --> 00:20:58.260
Another spike that I think you and I can both relate to is the driver of the day, Formula One.

00:20:58.260 --> 00:20:58.860
Oh, right.

00:20:58.860 --> 00:21:01.320
You've got 10 laps left within the race.

00:21:01.480 --> 00:21:07.480
Everyone now go to this thing and press this button and you have five minutes to do it or something, right?

00:21:07.480 --> 00:21:09.180
Like that's going to be a mega spike.

00:21:09.180 --> 00:21:11.660
Yeah, there can be some big, big loads in traffic.

00:21:11.660 --> 00:21:17.240
Sudden spikes are really hard to handle because you get bottlenecks in the network.

00:21:17.240 --> 00:21:18.740
You get bottlenecks in the database.

00:21:18.740 --> 00:21:20.520
You get bottlenecks in the web server.

00:21:20.880 --> 00:21:24.380
So those are kind of a unique type of problem to test for.

00:21:24.380 --> 00:21:35.100
If you're looking more like waves of traffic, so like, oh, you know, traffic would build up during the weekday or the type of application I have actually gets busier at the weekend or in the evenings.

00:21:35.220 --> 00:21:38.720
Then those are kind of the ones where you test ramped traffic.

00:21:38.720 --> 00:21:48.520
So all these load testing tools have a ramp-up time or ramp configuration where, in Locust, for example, you say, how many concurrent users do you want to get to?

00:21:48.680 --> 00:21:50.700
So let's say that's like a thousand or something.

00:21:50.700 --> 00:21:52.760
And then how much do you want to ramp up?

00:21:52.760 --> 00:22:02.560
And you generally should always use a ramp because if you've got a thousand users, unless you've put them all in a room and said, okay, everybody click on the button at the same time.

00:22:02.980 --> 00:22:04.560
Like that's not going to happen.

00:22:04.560 --> 00:22:07.620
People, people log in gradually over a period of time.

00:22:07.620 --> 00:22:14.980
And if you don't use ramping in load testing tools, you actually create, you kind of end up simulating stuff that's not realistic.

00:22:14.980 --> 00:22:20.440
So you'd see like a massive spike in response times as the application gets backed up.

00:22:20.440 --> 00:22:24.940
But if it's not really realistic that you get those types of spikes, then don't, don't simulate it that way.
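
NOTE
A minimal sketch of the linear ramp described above, assuming a target user count and a spawn rate in users per second (the two values Locust asks for; on its command line they are --users and --spawn-rate). The function name users_at and the numbers are illustrative and not part of Locust.
```python
def users_at(elapsed_s: float, target_users: int, spawn_rate: float) -> int:
    """Simulated users active after elapsed_s seconds of a linear ramp."""
    if elapsed_s <= 0:
        return 0
    return min(target_users, int(elapsed_s * spawn_rate))
# Ramp to 1,000 users at 10 new users per second, so full load arrives at 100 s.
for t in (0, 30, 100, 200):
    print(t, users_at(t, target_users=1000, spawn_rate=10))
```
A spike test would skip the ramp and start at the target, which is only worth simulating when real traffic actually arrives that way.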

00:22:24.940 --> 00:22:29.120
We're going to talk about it a little bit, but there's a lot of different layers of caching as well.

00:22:29.120 --> 00:22:35.840
And if the app has just come to life, many of those layers of caching are not set up and warmed up.

00:22:35.840 --> 00:22:37.140
So that's a big deal.

00:22:37.140 --> 00:22:44.220
And it could be as simple as the web app parsing the Jinja or Chameleon or Django template, right?

00:22:44.220 --> 00:22:45.940
That's the very first time that can be slow.

00:22:45.940 --> 00:22:52.040
But then if you set it up right in production, it will not parse at the second, third and fourth request, right?

00:22:52.040 --> 00:22:57.620
So if we talk about Locust as an example, you've got on the screen, I think it's my favorite load testing tool.

00:22:57.620 --> 00:22:59.000
Locust is so good.

00:22:59.000 --> 00:23:04.340
It's really flexible and you define the user flows in Python code.

00:23:04.340 --> 00:23:09.960
So you write a class that represents a user and then you program in the steps that they would take.

00:23:09.960 --> 00:23:15.060
So like, oh, they start on the homepage, then they click on this page, then they click on this page.

00:23:15.060 --> 00:23:20.340
And then you can program in like the pauses and the randomness and stuff like that.

00:23:20.340 --> 00:23:24.340
And actually set up different types of personas or different types of users.

00:23:24.340 --> 00:23:31.300
I think this is a great way of designing a load test because you're thinking about what would the user do rather than like what's the throughput.

00:23:31.300 --> 00:23:39.640
So if you set up a Locust test where they start off at a point in your site, if your site has a login, how many of your users would log in?

00:23:39.640 --> 00:23:42.100
Because I think it's important to start off.

00:23:42.100 --> 00:23:44.980
If this is a new application, it's really hard to know.

00:23:44.980 --> 00:23:49.400
So you're going to have to come up with an educated guess or test like a range of parameters.

00:23:49.400 --> 00:23:59.020
But if this is a website that you've had running for, you know, a year or two, and you've got like a history of traffic, then you can look at it and say, okay, how many people on the homepage are logged in?

00:23:59.020 --> 00:24:00.340
Because there's a big difference.

00:24:00.340 --> 00:24:11.600
Because when you talk about caching and like CDNs and stuff like that, you know, you can cache the rendered template, but you generally wouldn't cache the whole rendered template if they're logged in.

00:24:11.740 --> 00:24:18.720
Because at the top, it might say, you know, hello, Anthony, and you wouldn't want to cache that so that when Mike clicks on the website, it says hello, Anthony.

00:24:18.720 --> 00:24:19.700
And he's like, who the hell's that?

00:24:19.700 --> 00:24:21.320
That would be very disturbing, wouldn't it?

00:24:21.320 --> 00:24:26.380
So there's a balance between caching and serving up dynamic content.

00:24:26.380 --> 00:24:32.840
And then also if you're presenting, you know, lists of products or someone's got shopping cart or something like that, obviously that's unique to them.

00:24:32.840 --> 00:24:36.520
So you're kind of trying to simulate this as best as possible.

00:24:36.720 --> 00:24:44.140
Otherwise, you just create these really optimistic load tests where you're like, oh, we tested the homepage and it can handle 50,000 users.

00:24:44.140 --> 00:24:51.440
But in practice, if a thousand of those log in, then the whole thing falls apart because all of a sudden you're not using the cached version of the homepage.

00:24:51.440 --> 00:24:52.000
Right.

00:24:52.000 --> 00:24:56.260
You know, another example that you brought up is, let's just take Talk Python, right?

00:24:56.260 --> 00:24:58.380
It's got coming up on 500 episodes.

00:24:58.500 --> 00:25:10.700
If you test and suppose I put really aggressive caching, like on the database results that come back from a single episode for the page and then just render that out of memory, basically, at least just out of the objects that are in memory.

00:25:10.840 --> 00:25:16.240
If you do your test to just hit the one, like let's just randomly pick episode 300, hit that one a bunch of times.

00:25:16.240 --> 00:25:21.780
But in reality, people are hitting all four or 500 and it's not all cached the same.

00:25:21.780 --> 00:25:28.640
Or maybe there's so much memory that each one has to hold that putting all 500 in memory in the cache, like, runs it out of RAM on the server.

00:25:28.640 --> 00:25:32.640
All sorts of stuff that happens as you get this kind of dynamic mix, right?

00:25:32.800 --> 00:25:34.700
You want to introduce randomness.

00:25:34.700 --> 00:25:44.300
So for example, on the login, if you've got a login flow or if you're clicking on a particular product, then try not to hard code which one it is.

00:25:44.300 --> 00:25:47.380
If you've got caching, you might not even see the caching either.

00:25:47.380 --> 00:25:49.740
Like databases do a lot of caching.

00:25:49.740 --> 00:25:55.320
So like if you're always pulling the same thing from the database, the database server is probably going to cache that.

00:25:55.320 --> 00:25:57.360
So, you know, it's going to be a lot faster.

00:25:57.900 --> 00:26:04.640
So if you can randomize the like the inputs or the pages that you click on or the flows that people take, then that's great.

00:26:04.640 --> 00:26:09.700
Locust actually has like an extra argument when you define a task, which is a decorator on a function.

00:26:09.700 --> 00:26:12.200
You can have like how often this happens.

00:26:12.200 --> 00:26:16.440
So you can have some tasks which happen more sort of frequently than others.

00:26:16.440 --> 00:26:22.840
So you can say, OK, you know, five times more people go to the homepage and then every so often somebody does a search.

00:26:23.160 --> 00:26:31.160
But then when you do the search, when you want to load test the search, you know, you want to randomize that a bit more than just always searching for the same thing.
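
NOTE
A rough sketch of the weighting and randomization idea in plain Python, not Locust itself. In Locust the weight would be the argument to the task decorator; here TASK_WEIGHTS, SEARCH_TERMS and next_action are made-up names that only illustrate the 5-to-1 mix and the randomized search input.
```python
import random
from collections import Counter
# Hypothetical weights: the homepage is hit five times as often as search.
TASK_WEIGHTS = {"homepage": 5, "search": 1}
SEARCH_TERMS = ["python", "asyncio", "django", "load testing"]  # made-up inputs
def next_action(rng: random.Random) -> tuple[str, str]:
    """Pick a task by weight, and a random query when the task is a search."""
    task = rng.choices(list(TASK_WEIGHTS), weights=list(TASK_WEIGHTS.values()))[0]
    if task == "search":
        return task, "/search?q=" + rng.choice(SEARCH_TERMS).replace(" ", "+")
    return task, "/"
rng = random.Random(42)  # seeded only so the sketch is repeatable
counts = Counter(next_action(rng)[0] for _ in range(6000))
print(counts)  # roughly five homepage hits per search
```
Varying the search term matters because repeating one query mostly measures whatever cache sits in front of the search.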

00:26:31.160 --> 00:26:32.580
Right, right, right, right.

00:26:32.580 --> 00:26:33.000
Yeah.

00:26:33.000 --> 00:26:38.360
And just so for people who haven't seen Locust, you create a class, you give it a couple of functions.

00:26:38.360 --> 00:26:43.300
And then, for example, to test the homepage, you just literally put the task decorator on the function.

00:26:43.300 --> 00:26:48.780
And you say self.client.get("/"), then self.client.get for maybe some additional assets or not.

00:26:48.780 --> 00:26:54.600
And then you can even hard code into the class what the domain is or localhost or a port or whatever.

00:26:54.600 --> 00:26:55.580
And that's it.

00:26:55.580 --> 00:27:00.400
And you just you can assign a weight to these tasks like homepage five times more likely than about.

00:27:00.400 --> 00:27:03.300
So just put task of five instead of task by itself.

00:27:03.300 --> 00:27:03.540
Right.

00:27:03.540 --> 00:27:04.260
It's incredible.

00:27:04.260 --> 00:27:05.520
Yeah, it's really, really helpful.

00:27:05.520 --> 00:27:07.260
And then there's a library.

00:27:07.260 --> 00:27:09.560
There's a couple of things you need to keep in mind as well.

00:27:09.560 --> 00:27:17.380
And like if the user logs in and then it probably creates some sort of session, whether that's like a cookie or a token or something like that.

00:27:17.380 --> 00:27:23.480
So you need to store that somewhere in the class so that subsequent requests use the same login session.

00:27:23.480 --> 00:27:31.600
And then also, you know, frameworks like Django and Flask have got cross-site request forgery like protection.

00:27:31.600 --> 00:27:35.480
So they generate these CSRF tokens in the forms as well.

00:27:35.480 --> 00:27:42.120
So there's normally like a bit involved in getting the cookie or the session ID, getting the CSRF value.

00:27:42.120 --> 00:27:52.680
And then like, say, if you're submitting forms or you're like doing a search or something, you need to code a bit in Locust to work around the security controls.

00:27:52.680 --> 00:27:53.040
Right.

00:27:53.040 --> 00:28:03.100
For example, you might do a request to the page that has the form and then pull back the CSRF token from a cookie and then use that as part of the form submission data.

00:28:03.440 --> 00:28:05.440
Otherwise, it might just say invalid.
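
NOTE
A small sketch of the CSRF step described above, assuming Django's default names (a cookie called csrftoken and a form field called csrfmiddlewaretoken). The helper csrf_form_data and the sample header are hypothetical; other frameworks use different names, so treat this as an illustration only.
```python
from http.cookies import SimpleCookie
def csrf_form_data(set_cookie_header: str, fields: dict[str, str]) -> dict[str, str]:
    """Copy the CSRF cookie value into the form data, Django-style."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    token = cookie["csrftoken"].value  # Django's default CSRF cookie name
    return {**fields, "csrfmiddlewaretoken": token}
# Made-up header, as it might come back from a GET of the page with the form.
header = "csrftoken=abc123; Path=/; SameSite=Lax"
print(csrf_form_data(header, {"q": "locust"}))
```
In a load test the same idea applies per simulated user: fetch the form page first, then reuse that user's token and session cookie on the POST.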

00:28:07.120 --> 00:28:10.140
This portion of Talk Python is brought to you by WorkOS.

00:28:10.140 --> 00:28:20.840
If you're building a B2B SaaS app, at some point, your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, audit logs and fine-grained authorization.

00:28:20.840 --> 00:28:29.980
That's where WorkOS comes in with easy to use APIs that help you ship enterprise features on day one without slowing down your core product development.

00:28:30.200 --> 00:28:38.360
Today, some of the fastest growing startups in the world are powered by WorkOS, including ones you probably know, like Perplexity, Vercel and Webflow.

00:28:38.360 --> 00:28:48.060
WorkOS also provides a generous free tier of up to one million monthly active users for AuthKit, making it the perfect authentication layer for growing companies.

00:28:48.420 --> 00:28:53.460
It comes standard with useful features like RBAC, MFA and bot protection.

00:28:53.460 --> 00:28:59.460
If you're currently looking to build SSO for your first enterprise customer, you should consider using WorkOS.

00:28:59.460 --> 00:29:03.660
Integrate in minutes and start shipping enterprise plans today.

00:29:04.000 --> 00:29:06.780
Just visit talkpython.fm/workos.

00:29:06.780 --> 00:29:08.920
The link is in your podcast player show notes.

00:29:08.920 --> 00:29:11.160
Thank you to WorkOS for supporting the show.

00:29:11.160 --> 00:29:17.120
You know, another one that would be really tricky would be like turnstile or reCAPTCHA.

00:29:17.120 --> 00:29:19.440
You're probably basically not getting that.

00:29:19.440 --> 00:29:20.220
It's not worth it.

00:29:20.220 --> 00:29:22.320
Yeah, you're stuck there.

00:29:22.320 --> 00:29:23.600
You've got to fill in those ones.

00:29:23.600 --> 00:29:27.340
Just maybe turn it off real quick for your test and then turn it back on.

00:29:27.340 --> 00:29:27.800
I don't know.

00:29:27.800 --> 00:29:30.420
I mean, you could do it, turn it off in development or something like that, right?

00:29:30.420 --> 00:29:30.840
Yeah.

00:29:30.840 --> 00:29:38.180
So when I look at this, this Locust stuff, I see behind the scenes something like requests or HTTPX,

00:29:38.180 --> 00:29:45.860
where all it does is pull back the string of the HTML, or maybe even it just gets the head and actually throws away the content.

00:29:45.860 --> 00:29:46.920
I bet it streams it back.

00:29:46.920 --> 00:29:54.100
But what I don't imagine it does is it doesn't parse the HTML, realize that it's a Vue front end,

00:29:54.100 --> 00:29:59.580
execute the JavaScript that then has three API calls to more stuff on the back on the server, right?

00:29:59.580 --> 00:30:05.020
If it's a rich front end app, it probably doesn't treat it the same if we write it just like this, right?

00:30:05.020 --> 00:30:05.420
Yeah.

00:30:05.420 --> 00:30:11.920
So the other one I wanted to highlight is an extension to Locust so you can connect it with Playwright.

00:30:11.920 --> 00:30:12.440
At this thing?

00:30:12.440 --> 00:30:14.020
It does HTML parsing.

00:30:14.020 --> 00:30:23.420
That's more like looking at the HTML to see if like a particular, you know, like a Beautiful Soup or responses was the framework for this.

00:30:23.420 --> 00:30:23.860
Right.

00:30:24.000 --> 00:30:27.160
Make sure timeout does not appear in the text or something like that, right?

00:30:27.160 --> 00:30:27.560
Yeah.

00:30:27.560 --> 00:30:30.180
Make sure the page doesn't have error in big letters.

00:30:30.180 --> 00:30:34.300
That's one thing you can do is checking that the content of the page actually contains the right thing.
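
NOTE
A tiny sketch of that content check, assuming you have the status code and body text of a response in hand. The function name page_looks_ok and the two marker words are illustrative; in Locust a check like this would typically sit inside a request made with catch_response=True so the request can be marked as failed.
```python
def page_looks_ok(status_code: int, body: str) -> bool:
    """Treat a 200 whose body shows an error or timeout as a failure."""
    text = body.lower()
    return status_code == 200 and "error" not in text and "timeout" not in text
print(page_looks_ok(200, "<h1>Episode 300</h1>"))
print(page_looks_ok(200, "<h1>ERROR</h1> upstream timeout"))
```
Without a check like this, an error page served with a 200 status would count as a fast, successful response.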

00:30:34.300 --> 00:30:39.160
And then Playwright is a UI, like a web testing tool.

00:30:39.160 --> 00:30:42.360
Playwright works really well with pytest.

00:30:42.660 --> 00:30:50.480
So I recommend using Playwright anyway, if you're running a web application, because you can write pytest tests in Playwright.

00:30:50.480 --> 00:30:55.180
Even better is if you want to get started with Playwright, it has a code generator.

00:30:55.180 --> 00:30:58.820
So when you pip install it, you can run it in the code gen mode.

00:30:58.820 --> 00:31:05.020
It pops up a browser and then you just go on your website and just click around and then do what you would do normally.

00:31:05.020 --> 00:31:07.820
Type in, you know, fill in the forms, click on the buttons.

00:31:07.820 --> 00:31:13.260
And then whilst you're doing that in another tab, it actually generates all the Python code for the pytests.

00:31:13.260 --> 00:31:20.100
So it basically generates the pytest test code in a separate window automatically as you're clicking around in the browser.

00:31:20.100 --> 00:31:26.440
So, like, writing UI tests is quite difficult because often you have to be like, OK, how do I find the button?

00:31:26.440 --> 00:31:28.240
How do I click on the right button?

00:31:28.240 --> 00:31:29.580
How do I find the form?

00:31:29.580 --> 00:31:37.160
Especially with JavaScript, because it's not, you know, often things don't have a specific ID or you've got to figure out what selectors to use and stuff like that.

00:31:37.160 --> 00:31:39.860
So this makes it a lot easier because you can use the code gen.

00:31:39.860 --> 00:31:41.500
So that's a browser test.

00:31:41.500 --> 00:31:50.000
So with a load test normally in Locust, you're just making HTTP requests, but you're not actually rendering the page or running the AJAX or the JavaScript code.

00:31:50.000 --> 00:31:56.000
Whereas with Playwright, when you run Playwright, it's actually spinning up a browser and then driving the browser from Python.

00:31:56.000 --> 00:31:58.440
So you can plug Locust and Playwright together.

00:31:58.440 --> 00:32:00.940
There's a Playwright extension for Locust.

00:32:00.940 --> 00:32:06.560
So you can say, OK, each load test user, I actually want that to be a browser.

00:32:06.560 --> 00:32:12.920
And so when you say I want to test 100 concurrent users, it actually spins up 100 browsers.

00:32:14.300 --> 00:32:21.740
In a very, well, Playwright's actually really interesting how it works, but like there's a headless mode for browsers these days.

00:32:21.740 --> 00:32:24.960
So it doesn't actually run 100 windows like you can't see them.

00:32:24.960 --> 00:32:26.980
I can't use my computer while this is running.

00:32:26.980 --> 00:32:28.340
It's just overwhelmed by.

00:32:28.340 --> 00:32:35.060
But yeah, I don't recommend running 10,000 concurrents on a single laptop because it's going to run pretty slowly.

00:32:35.060 --> 00:32:36.260
It's going to have a bad time.

00:32:36.320 --> 00:32:39.740
But that is really important to actually test.

00:32:39.740 --> 00:32:40.720
So that will test it.

00:32:40.720 --> 00:32:42.300
All the stuff that we've been talking about.

00:32:42.300 --> 00:32:42.800
It'll test.

00:32:42.800 --> 00:32:44.700
If there's polling, it'll test that.

00:32:44.700 --> 00:32:52.560
If the CSS is not delivered over a CDN, it's going to go get that, potentially from WhiteNoise or wherever it's coming from.

00:32:52.560 --> 00:32:54.480
It's going to do all the things.

00:32:54.760 --> 00:33:01.680
Yeah, the challenge with it is that, you know, like you mentioned, running a browser, even if it's just a tab, uses a lot of resources.

00:33:01.680 --> 00:33:06.880
Whereas making an HTTP request using requests or something, you know, it doesn't really need anything.

00:33:06.880 --> 00:33:15.700
So you can quite happily in Locust, you know, make 100,000 requests and your local machine that you're testing on will be fine.

00:33:15.700 --> 00:33:16.220
It will.

00:33:16.220 --> 00:33:20.980
So, yeah, you can actually get load testing as a service.

00:33:20.980 --> 00:33:36.860
And the reason you'd probably want to use that is if you're testing a scenario where your local dev environment or your test environment just isn't big enough, where you need to distribute it, then you need more horsepower, basically, to go and run all the requests.

00:33:36.860 --> 00:33:42.420
Especially for something like Playwright, you know, where you need a lot of resources for the browser.

00:33:42.420 --> 00:33:48.920
Yeah, the little screenshot they have for Locust says 21,400 users are currently accessing the website.

00:33:48.920 --> 00:33:50.740
It's like, that's a lot of browser instances.

00:33:50.740 --> 00:33:52.160
That's a lot of browser instances.

00:33:52.160 --> 00:33:56.540
That looks like an API test, which would be a lot easier to do.

00:33:56.540 --> 00:33:57.420
APIs are easy.

00:33:57.420 --> 00:34:00.300
Yeah, so we, there's a service called Azure Load Testing.

00:34:00.300 --> 00:34:02.140
There are other options as well.

00:34:02.140 --> 00:34:06.400
I'm sure AWS has one and Google probably has one as well.

00:34:06.400 --> 00:34:07.760
Azure Load Testing, I know.

00:34:07.760 --> 00:34:11.860
And we are launching Locust support for that.

00:34:12.060 --> 00:34:14.120
At the moment, it supports JMeter tests.

00:34:14.120 --> 00:34:17.240
But yeah, we're going to be launching Locust support for that.

00:34:17.240 --> 00:34:21.220
I think by the time this episode comes out, it will probably be in public preview.

00:34:21.220 --> 00:34:23.760
So I've been using that and testing that.

00:34:23.760 --> 00:34:37.520
And the reason you would use it is, like I said, you can run the Locust test locally on your machine and it runs great and brilliant, but, you know, you can ask us to spin up 50, 100, or even more instances running in parallel.

00:34:37.580 --> 00:34:44.500
And so if you want to do a really large scale test, then you can just basically get that from a cloud service provider like Azure.

00:34:44.500 --> 00:34:45.100
Yeah, awesome.

00:34:45.100 --> 00:34:46.920
I didn't realize you guys were bringing that online.

00:34:46.920 --> 00:34:47.300
That's cool.

00:34:47.300 --> 00:34:48.360
Yeah, nobody knows about us.

00:34:48.360 --> 00:34:49.360
Yeah.

00:34:49.740 --> 00:34:51.000
Can you do distributed?

00:34:51.000 --> 00:34:54.140
I think you can do some distributed stuff with the Locust even, can't you?

00:34:54.140 --> 00:34:54.920
Just, yeah.

00:34:54.920 --> 00:34:56.140
Yeah, you can.

00:34:56.140 --> 00:34:57.840
But you've got to have your own infrastructure, right?

00:34:57.840 --> 00:35:02.460
A lot of it uses SSH and you've got to, like, basically it kind of works over the shell.

00:35:02.460 --> 00:35:02.800
Yeah.

00:35:02.880 --> 00:35:07.600
So, yeah, we've kind of got our own distribution and, like, coordination system and stuff like that.

00:35:07.600 --> 00:35:15.220
And then also when you're running the test, your output, I guess, is what is the response time of the pages?

00:35:15.220 --> 00:35:17.380
And, like, what does that look like?

00:35:17.600 --> 00:35:25.340
And I think what confuses a lot of people is the first time they use Locust or JMeter or one of the other tools, they get all these percentiles back.

00:35:25.340 --> 00:35:31.180
And they're like, oh, the 90th percentile response time is this and the 95th is this and the 99th is that.

00:35:31.180 --> 00:35:38.580
And they're like, okay, well, if I go back to my high school math, I think I can remember what percentiles are, but, like, which one matters?

00:35:38.580 --> 00:35:44.960
Because normally there's a massive difference between the 90th and the 99th percentiles with load testing.

00:35:44.960 --> 00:35:51.860
And, you know, you might say, oh, well, the 99th is like 10 seconds, but the 90th is like 300 milliseconds.

00:35:51.860 --> 00:35:54.680
So, you know, was that a good output or a bad output?

00:35:54.680 --> 00:35:56.940
I'm not really sure how to interpret the results.

00:35:56.940 --> 00:36:01.000
So, yeah, you'll see in the UI for Locust, it gives you percentiles.

00:36:01.000 --> 00:36:03.060
All the other load testing tools are very similar.

00:36:03.060 --> 00:36:09.240
It's basically like a distribution of the response times for those particular pages.

00:36:09.240 --> 00:36:14.180
And what you're trying to understand is, like, what is the expected response time?

00:36:14.180 --> 00:36:17.940
So, you know, if it's a bell curve, then what's the center point of that?

00:36:17.940 --> 00:36:26.200
Because the 99th is interesting, but often, like, it's because the cache was warming up or, you know, there was like one user that took 10 seconds.

00:36:26.200 --> 00:36:30.580
The cache expired and then it got recreated right then or something.

00:36:30.580 --> 00:36:31.480
Yeah, exactly.

00:36:31.660 --> 00:36:40.340
So for 99th percentile, you know, if it's 10 seconds, you might have one user that took 10 seconds and 99 users that took, you know, a couple of hundred milliseconds.

00:36:40.340 --> 00:36:48.100
So it's, you know, do you want to, you know, factor for that one user or do you want to focus on the other, on the bulk of the group?
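
NOTE
A short sketch of that exact scenario using Python's statistics module: 99 requests at 200 ms and one at 10 seconds. The sample data is made up, and statistics.quantiles interpolates between samples, so a load testing tool's own percentile figures may differ slightly.
```python
import statistics
# Made-up sample: 99 requests at 200 ms and one slow request at 10 s.
response_times = [0.2] * 99 + [10.0]
cuts = statistics.quantiles(response_times, n=100)  # 99 cut points: p1..p99
p50, p90, p99 = cuts[49], cuts[89], cuts[98]
mean = statistics.mean(response_times)
print(f"p50={p50:.3f}s p90={p90:.3f}s p99={p99:.3f}s mean={mean:.3f}s")
```
The median and 90th percentile describe the bulk of users, while the 99th is dominated by the single outlier, which is why the two numbers can look so far apart.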

00:36:48.100 --> 00:36:48.300
Yeah.

00:36:48.300 --> 00:36:50.600
And if you're like my daughter, you'll just say that the Internet's broken.

00:36:50.600 --> 00:36:51.500
Yeah.

00:36:51.500 --> 00:36:56.780
It could be, or it could be that YouTube is slow for some odd reason for you for a minute.

00:36:56.780 --> 00:36:59.500
Like, it's not necessarily the entire Internet that is the fault here.

00:36:59.560 --> 00:36:59.760
Yeah.

00:36:59.760 --> 00:37:05.380
Well, most users, I think if it took 10 seconds to respond, would just be clicking the refresh button.

00:37:05.380 --> 00:37:08.420
Yeah, exactly.

00:37:08.420 --> 00:37:10.160
So just generate more load.

00:37:10.160 --> 00:37:10.560
Yeah.

00:37:10.560 --> 00:37:12.340
You might have to program that into your load test.

00:37:12.340 --> 00:37:16.100
If it takes longer than five seconds, then issue another three requests.

00:37:16.100 --> 00:37:16.420
Yeah.

00:37:16.420 --> 00:37:17.440
That's a really good point, actually.

00:37:17.440 --> 00:37:18.540
Let's see.

00:37:18.640 --> 00:37:21.700
Are there any pictures of the reports here?

00:37:21.700 --> 00:37:22.320
Let's see.

00:37:22.320 --> 00:37:22.840
Yeah.

00:37:22.840 --> 00:37:26.700
You got these, you got these nice graphs that show you response times.

00:37:26.700 --> 00:37:27.180
Yeah.

00:37:27.180 --> 00:37:28.160
They're not just graphs.

00:37:28.160 --> 00:37:28.820
They're live graphs.

00:37:28.820 --> 00:37:32.160
You can kind of see it like flowing as it's testing, right?

00:37:32.160 --> 00:37:33.960
As it's ramping up or whatever.

00:37:33.960 --> 00:37:34.780
Yeah, definitely.

00:37:35.140 --> 00:37:39.020
So what we do with the load testing services is we've got that graph.

00:37:39.020 --> 00:37:44.940
But then you can also say, I also want to see how many queries a second were happening on the database.

00:37:44.940 --> 00:37:47.580
What was the memory usage of the web applications?

00:37:47.580 --> 00:37:51.700
Like how many, like what was the pod size if you're using Kubernetes?

00:37:51.700 --> 00:37:52.960
Like stuff like that.

00:37:53.180 --> 00:38:06.980
I've got a couple of demos where I've got like parameter-based load testing, where when you're setting up a Kubernetes environment, you're like, oh, I'm not sure how much memory I should allocate the containers or like how many of them I should have in a pod.

00:38:06.980 --> 00:38:13.180
So you can basically just use something like GitHub Actions to give a matrix and say, okay, I want to test these configurations.

00:38:13.180 --> 00:38:18.060
Let's see what happens with half a gig of RAM per container or two gigs of RAM per container.

00:38:18.060 --> 00:38:20.840
Let's do two, four, eight in a cluster.

00:38:21.180 --> 00:38:23.540
And it will run the same load tests for every configuration.
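
NOTE
A sketch of the configuration matrix just described, written as plain Python rather than a GitHub Actions workflow. The values come from the example above (half a gig or two gigs per container; two, four or eight instances); the variable names are illustrative.
```python
from itertools import product
memory_gb = [0.5, 2]   # RAM per container, from the example above
replicas = [2, 4, 8]   # instances in the cluster
# Every combination gets the same load test, so the graphs are comparable.
configs = [{"memory_gb": m, "replicas": r} for m, r in product(memory_gb, replicas)]
for cfg in configs:
    print(cfg)
```
A CI matrix expands to the same cross-product, one job per combination, with the load test held constant across all of them.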

00:38:23.540 --> 00:38:31.260
And then you can just compare all the graphs and say, okay, we don't want to over-allocate infrastructure because, you know, people often go a bit nuts.

00:38:31.260 --> 00:38:34.580
They're like, oh, how many instances in a cluster do we need on the front end?

00:38:34.580 --> 00:38:36.420
And they allocate like 16.

00:38:36.420 --> 00:38:39.260
But then when it's actually running, 15 of them are idle.

00:38:39.260 --> 00:38:41.960
So it's kind of just over-provisioning.

00:38:41.960 --> 00:38:49.140
So load testing can be a great way of not just like planning for spikes in traffic or being able to like cater for it.

00:38:49.260 --> 00:38:55.220
But actually the other way around, which is you run a load test and you realize you could probably actually do with less infrastructure.

00:38:55.220 --> 00:39:01.480
So like things like memory as well, like memory is expensive when you're buying it from a service provider.

00:39:01.920 --> 00:39:04.940
And, you know, CPUs and stuff like that or the number of instances.

00:39:04.940 --> 00:39:08.420
And actually, let's see what happens if we turn some of those things down.

00:39:08.420 --> 00:39:10.880
Will that impact the response times?

00:39:10.880 --> 00:39:12.720
Can we get away with it basically?

00:39:12.720 --> 00:39:19.080
Because in a lot of cases, you can actually spend a lot less money and get the same performance or even just a negligible difference.

00:39:19.320 --> 00:39:23.200
Or maybe you identify somewhere where you could add some level of caching.

00:39:23.200 --> 00:39:23.620
Yeah.

00:39:23.620 --> 00:39:28.640
And then all of a sudden you get, you can preview, you get 10x the load per worker process.

00:39:28.640 --> 00:39:32.380
You know, we could, we could have a smaller machine or smaller cluster or whatever.

00:39:32.380 --> 00:39:33.220
Yeah, exactly.

00:39:33.500 --> 00:39:39.840
So one, one thing I want, would like maybe to talk to the folks about is just how do you interpret these graphs?

00:39:39.840 --> 00:39:41.840
So you have like a request per second.

00:39:42.080 --> 00:39:47.480
And then you talked about, say, the 95th percentile response time and this ramping up.

00:39:47.480 --> 00:39:51.660
And usually when you look at these graphs, it's really obvious, like, yeah, we can add more users.

00:39:51.660 --> 00:39:57.080
But here is where we start to suffer consequences if we add any more users than this.

00:39:57.080 --> 00:39:59.280
Doesn't always just completely fall over.

00:39:59.280 --> 00:40:01.620
It just behaves worse until it does.

00:40:01.620 --> 00:40:05.800
It's kind of a bit like a feedback system for anyone who studies that.

00:40:05.800 --> 00:40:11.880
When you configure a ramp in Locust, you're saying, okay, how many users per second do we want to add?

00:40:11.880 --> 00:40:16.500
And you start off with a slow ramp is my suggestion.

00:40:16.500 --> 00:40:19.780
So like, you know, every 10 seconds, add one user.

00:40:19.780 --> 00:40:21.800
That would, like, be a really slow way of doing it.

00:40:21.800 --> 00:40:28.580
And then what you're looking at is the response times for each page, or you can get an average of everything.

00:40:28.580 --> 00:40:34.240
So if you're starting off with a load test that just tests one page, you slowly ramp up the users.

00:40:34.240 --> 00:40:36.640
Then you're looking at the response time graph.

00:40:36.920 --> 00:40:42.220
Often when you run a load test where it starts, the response times spike up at the beginning.

00:40:42.640 --> 00:40:45.400
Because, you know, the service was probably asleep.

00:40:45.400 --> 00:40:47.520
The database server was probably asleep.

00:40:47.520 --> 00:40:51.400
You know, it needs to like kick a few things into action to get it responding.

00:40:51.400 --> 00:40:53.640
So you often see like a spike at the beginning.

00:40:53.640 --> 00:40:54.400
That's fine.

00:40:54.400 --> 00:40:55.220
Don't worry about that.

00:40:55.220 --> 00:40:58.300
As long as it's a short spike and it doesn't go on for hours.

00:40:58.400 --> 00:41:02.760
But, you know, once everything's warmed up, then you should see a stable graph.

00:41:02.760 --> 00:41:09.420
So the response times should stick around the same level, even as you add more and more users to a point.

00:41:09.420 --> 00:41:14.900
So it's basically looking at the two graphs, which is the response time and the number of users,

00:41:14.900 --> 00:41:21.840
and trying to understand how many users does it need to get to before that response time line starts going up.

00:41:22.080 --> 00:41:26.380
And you basically know that's where you've introduced some sort of bottleneck.
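
NOTE
One possible way to read the two graphs programmatically, as a hedged sketch: skip the warm-up spike, take a baseline from the first steady samples, and report the first user count where response time exceeds a multiple of that baseline. The function saturation_point, the factor of 2 and the sample ramp are all assumptions for illustration.
```python
from statistics import median
from typing import Optional
def saturation_point(samples: list[tuple[int, float]], factor: float = 2.0, warmup: int = 1) -> Optional[int]:
    """Return the first user count whose response time exceeds factor x baseline.
    samples are (users, response_time_s) pairs in ramp order; the first `warmup`
    samples are skipped so a cold-start spike does not count.
    """
    steady = samples[warmup:]
    if len(steady) < 3:
        return None
    baseline = median(rt for _, rt in steady[:3])
    for users, rt in steady:
        if rt > factor * baseline:
            return users
    return None
# Made-up ramp: a cold-start spike, a flat stretch, then a queue building up.
ramp = [(10, 1.50), (20, 0.21), (30, 0.20), (40, 0.22), (50, 0.23), (60, 0.55), (70, 1.40)]
print(saturation_point(ramp))
```
This only says where the bottleneck appears, not what it is; finding the queue still means looking at the database, web server and network metrics.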

00:41:26.380 --> 00:41:27.780
Something's reaching its limit.

00:41:27.780 --> 00:41:28.080
Yeah.

00:41:28.080 --> 00:41:33.380
That's where someone, you know, people are all of a sudden sitting in a queue somewhere.

00:41:33.380 --> 00:41:36.060
The hard part is actually figuring out where that queue is.

00:41:36.060 --> 00:41:39.780
If you don't have the time, but you've got the money, you can just throw more infrastructure at it.

00:41:39.780 --> 00:41:45.360
But often you can fix some of those queues by looking at where, what are we caching and where?

00:41:45.480 --> 00:41:48.560
Right. Look at the CPU usage of the various aspects.

00:41:48.560 --> 00:41:51.020
What's the database CPU load?

00:41:51.020 --> 00:41:52.800
What's the web process?

00:41:52.800 --> 00:41:59.340
If the web bit is kind of chill, but the database is at 100%, maybe you need better indexes or something.

00:41:59.340 --> 00:42:02.220
Yeah, because often you look at the infrastructure on the back end.

00:42:02.220 --> 00:42:07.400
And even though the response times are going up, so there's a bottleneck, the CPU is maybe still at 60%.

00:42:07.400 --> 00:42:09.680
And the memory might only still be at 50%.

00:42:09.680 --> 00:42:15.640
So you're like, okay, you know, more CPUs isn't necessarily the issue, but things are getting stuck somewhere.

00:42:15.640 --> 00:42:25.000
And that might be that, you know, each page, actually, I remember one load test I did where it was a PHP application using like some framework.

00:42:25.000 --> 00:42:28.600
And the performance like was horrible and we couldn't figure out why.

00:42:28.600 --> 00:42:34.460
And we put on some debugging tools and realized that every single page ran 200 SQL queries.

00:42:34.460 --> 00:42:36.580
Because of the way they've written it.

00:42:36.700 --> 00:42:38.100
It was like, oh, that'd be why.

00:42:38.100 --> 00:42:43.120
Because, you know, you're looking at the CPU and memory of the actual web servers and they're fine.

00:42:43.120 --> 00:42:49.100
Like the web servers are just like basically just sitting there continually waiting for the database server to respond.

00:42:49.100 --> 00:42:54.380
So, you know, you look at the resource usage and you might be thinking, well, why is it getting slower?

00:42:54.380 --> 00:42:59.600
But often it's because you're making calls to the database or a backend API or something.

00:42:59.600 --> 00:43:02.260
And it's just sitting there idle waiting for the responses.

00:43:02.600 --> 00:43:14.620
So I know you've mentioned this on the show a few times, but the N plus one issues and stuff like that, for ORMs in particular, are where you get those types of scale where you should not be seeing that many SQL queries.

00:43:14.620 --> 00:43:21.460
It's so easy to do because you write the same code, get the things and then loop over some and interact with some aspect of them.

00:43:21.460 --> 00:43:26.780
And if you don't eagerly do the query at the first part, each one of those is a separate query.

00:43:26.780 --> 00:43:28.240
The more you get back, the worse it is.
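
NOTE
A small sketch of the N plus one pattern using only the standard library sqlite3 module rather than an ORM, so the queries are visible.
The author/book tables and the function names are hypothetical; an ORM's lazy loading would issue the same kind of per-row queries behind the scenes.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(i, f"author-{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO book (author_id, title) VALUES (?, ?)", [(i, f"book-{i}") for i in range(1, 51)])
conn.commit()
statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement SQLite executes
def lazy_titles():
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()  # 1 query
    rows = []
    for author_id, name in authors:
        # one extra query per author: this is the "+ N"
        title = conn.execute("SELECT title FROM book WHERE author_id = ?", (author_id,)).fetchone()[0]
        rows.append((name, title))
    return rows
def eager_titles():
    # a single JOIN fetches the same data up front
    return conn.execute("SELECT a.name, b.title FROM author a JOIN book b ON b.author_id = a.id ORDER BY a.id").fetchall()
statements.clear()
lazy = lazy_titles()
lazy_count = len(statements)
statements.clear()
eager = eager_titles()
eager_count = len(statements)
print(f"lazy: {lazy_count} statements, eager: {eager_count} statement(s)")
```
With only a handful of rows the two versions feel identical, which is why the problem tends to show up only against production-sized data.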

00:43:28.240 --> 00:43:28.920
Yeah, exactly.

00:43:28.920 --> 00:43:35.080
And I mentioned like an AI app, like with this LLM RAG app that we did a load test on with like 10 users.

00:43:35.080 --> 00:43:38.700
And like the CPU and memory were fine on the front end because it's not really doing anything.

00:43:38.700 --> 00:43:39.980
It's just calling the LLM.

00:43:39.980 --> 00:43:43.280
But it hit like a token limit, like really, really quickly.

00:43:43.700 --> 00:43:50.800
And because we were capturing that and tracing it, then we could see like that's what the bottleneck was, that it was getting rate limited on the back end.

00:43:50.800 --> 00:43:58.900
One of the challenges that people can run into, this is really important, is testing with way less data than you're going to have in production.

00:43:58.900 --> 00:44:04.900
I've got 10 entries in the database because that's what I bothered to type in while I was playing with the app.

00:44:04.900 --> 00:44:06.780
But I've got a million in production.

00:44:06.780 --> 00:44:07.800
Stuff like that, right?

00:44:07.800 --> 00:44:11.000
I've got three products and there's a million in production.

00:44:11.000 --> 00:44:24.380
I've seen this a few times where people have done load testing and they're like, oh, on the all users page or on the list products page, like it runs super fast because they've deployed the dev environment and it's got like no products in the database.

00:44:24.380 --> 00:44:28.820
It doesn't matter if you have indexes because there's only three things, just return them all.

00:44:28.820 --> 00:44:29.100
Yeah.

00:44:29.100 --> 00:44:29.680
Or it's good.

00:44:29.680 --> 00:44:29.840
Yeah.

00:44:29.840 --> 00:44:34.880
And you don't see things like the N plus one because, you know, there's only like a couple of products, if any at all.

00:44:34.880 --> 00:44:40.120
So you often want to seed the application with as much fake data as possible.

00:44:40.540 --> 00:44:53.780
And the library that I absolutely love for this is called Mimesis, which is a fake data generator, kind of similar to Faker, but it's got a lot more support for like different languages and environments and stuff like that.

00:44:53.780 --> 00:45:03.800
So if you wanted to say, okay, let's create a hundred thousand users and you need to generate names and addresses and locations and stuff like that, you can do that using Mimesis really easily.

00:45:04.280 --> 00:45:10.440
And also if you want to test different cultures or locales.

00:45:10.440 --> 00:45:19.220
So, you know, not just testing like English names, but testing all sorts of different countries and stuff like that or different phone numbers, then yeah, you can use Mimesis to do that.

00:45:19.220 --> 00:45:19.440
Yeah.

00:45:19.440 --> 00:45:21.420
It's got all kinds of different things.

00:45:21.420 --> 00:45:25.120
You get credit cards and phone numbers and all kinds of stuff.

00:45:25.120 --> 00:45:25.360
Yeah.

00:45:25.440 --> 00:45:31.220
I need a hundred thousand Brazilian phone numbers and it will just give you them in exactly the right format.

00:45:31.220 --> 00:45:31.840
Yeah.

00:45:31.840 --> 00:45:33.860
So then you save those to your database once.

00:45:33.860 --> 00:45:34.240
Yeah.

00:45:34.280 --> 00:45:42.380
And then you can run your tests and see if you have your N plus one problem or your indexes don't fit into memory or whatever the problem might be.

00:45:42.380 --> 00:45:42.580
Right.

00:45:42.580 --> 00:45:42.880
Yeah.

00:45:42.880 --> 00:45:49.000
So for Django or for SQLAlchemy, you can do the, like these load and dump commands.

00:45:49.000 --> 00:46:01.040
So what I kind of recommend, to keep it fast, is to use Mimesis to generate a seed data file, even if that's like JSON or something, and then just do like a bulk load in the test environment.

00:46:01.040 --> 00:46:02.780
And then you can reset it every time.

00:46:02.780 --> 00:46:06.380
So you can basically just do a rollback and then just reset and do a bulk load.
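
NOTE
A rough sketch of the seed-file-then-bulk-load workflow, assuming SQLite as a stand-in for the test database.
The random module is used here only as a placeholder generator; in practice a library such as Mimesis would supply realistic names and phone numbers.
The function names, table layout, and name lists are hypothetical.
```python
import json
import random
import sqlite3
import tempfile
from pathlib import Path
FIRST = ["Ana", "Bruno", "Carla", "Diego"]  # placeholder values, not real generated data
LAST = ["Silva", "Souza", "Costa", "Lima"]
def write_seed_file(path, count, seed=42):
    """Write `count` fake users to a JSON file; a fixed seed keeps the file reproducible."""
    rng = random.Random(seed)
    users = [
        {"id": i, "name": f"{rng.choice(FIRST)} {rng.choice(LAST)}", "phone": f"555-{rng.randrange(10**7):07d}"}
        for i in range(1, count + 1)
    ]
    path.write_text(json.dumps(users))
def reset_and_load(conn, path):
    """Drop the table and bulk load the seed file so every run starts from the same state."""
    users = json.loads(path.read_text())
    with conn:  # one transaction for the whole load
        conn.execute("DROP TABLE IF EXISTS users")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
        conn.executemany("INSERT INTO users VALUES (:id, :name, :phone)", users)
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
seed_path = Path(tempfile.mkdtemp()) / "seed_users.json"
write_seed_file(seed_path, 100_000)
conn = sqlite3.connect(":memory:")
print(reset_and_load(conn, seed_path))
```
Generating the file once and reloading it is usually much quicker than regenerating fake data before every run, and the same file can back integration test fixtures.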

00:46:06.380 --> 00:46:10.700
If you're using like a document database, then they've got similar tools for that.

00:46:10.700 --> 00:46:15.360
So if you're using Mongo, then you could just do a bulk load from like a test command, basically.

00:46:15.360 --> 00:46:15.700
Right.

00:46:15.700 --> 00:46:18.480
Just load it up with some fake data, do a mongodump.

00:46:18.480 --> 00:46:22.600
And then whenever you're ready to reset it, just do a mongorestore --drop and it'll.

00:46:22.600 --> 00:46:23.240
Yeah, exactly.

00:46:23.240 --> 00:46:23.880
From scratch.

00:46:23.880 --> 00:46:24.120
Yes.

00:46:24.120 --> 00:46:24.460
Yep.

00:46:24.460 --> 00:46:26.580
And you can reuse those for your integration tests as well.

00:46:26.580 --> 00:46:35.640
If you're writing, if you've got integration tests with Django or Flask, then often you need a database fixture and you want to seed that with some information, then you can just reuse the same data.

00:46:35.640 --> 00:46:37.200
But also do transactions, right?

00:46:37.200 --> 00:46:40.020
Like put it in a transaction that's always rolled back.

00:46:40.020 --> 00:46:42.920
So once you load it up, you're not, you're not breaking it.

00:46:43.140 --> 00:46:45.820
But that's tricky if the code itself calls commit, I suppose.
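
NOTE
A minimal sketch of the always-rolled-back transaction idea, assuming SQLite and the standard library sqlite3 module.
The helper name run_isolated and the products table are hypothetical. As noted above, this only works when the code under test never calls commit itself.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [(f"product-{i}",) for i in range(1000)])
conn.commit()  # the seeded state is the committed baseline
def count(conn):
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
def run_isolated(conn, test_body):
    """Run test_body, then roll back whatever it changed."""
    try:
        test_body(conn)
    finally:
        conn.rollback()
def destructive_test(conn):
    conn.execute("DELETE FROM products WHERE id <= 500")
    assert count(conn) == 500  # the change is visible inside the open transaction
run_isolated(conn, destructive_test)
print(count(conn))
```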

00:46:45.820 --> 00:46:46.560
Yeah.

00:46:46.560 --> 00:46:48.020
Same idea.

00:46:48.020 --> 00:46:49.120
No, it changed it.

00:46:49.120 --> 00:46:49.440
Darn it.

00:46:49.440 --> 00:46:49.660
Yeah.

00:46:49.660 --> 00:46:51.160
There was a question in the chat.

00:46:51.160 --> 00:46:57.860
How does load testing change when you have non-deterministic processes, e.g. LLM queries, which is a really good question.

00:46:57.860 --> 00:47:03.660
And I think it's kind of related to this where you want to introduce like an element of randomness.

00:47:03.660 --> 00:47:14.560
But we talked about like a, you know, a user search page, or if you've got like a chat feature or something, then you want to kind of vary the question, especially if you've got any kind of caching.

00:47:14.560 --> 00:47:27.320
So that is tricky though, because at the moment, the LLM calls, depending on which model you're using and how it's set up and stuff like that, can take, you know, a second to 10 seconds to get a response.

00:47:27.320 --> 00:47:27.760
Easy.

00:47:27.860 --> 00:47:28.000
Yeah.

00:47:28.000 --> 00:47:30.300
So, you know, you need to kind of factor that in.

00:47:30.300 --> 00:47:37.440
It's like, how does your app handle sitting and waiting for 10 seconds before it gets a response back?

00:47:37.440 --> 00:47:47.880
And often you'll find that you'll max out the number of like threads you've got, or the number of workers in like Gunicorn or Uvicorn or something like that, because they're all just sitting there waiting.

00:47:47.880 --> 00:47:50.580
That's a place where async and await would be pretty awesome.

00:47:50.580 --> 00:47:50.800
Yeah.

00:47:50.800 --> 00:47:51.080
Right.

00:47:51.080 --> 00:47:55.280
Because you can sort of let the thread go and just, you'll get back to it.

00:47:55.280 --> 00:47:57.680
Also, maybe you just need a different architecture.

00:47:58.140 --> 00:48:00.980
You know, maybe it's not just more caching or more hardware.

00:48:00.980 --> 00:48:06.560
It's like, so we're going to put the, there's a pending request in the database and we're going to push that off to somewhere.

00:48:06.560 --> 00:48:14.040
And then we'll just set up like some kind of JavaScript thing to check if it's done and then pull down the answer or, you know, something that's not necessarily blocking potentially.
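
NOTE
A toy sketch of the pending-request pattern described above, assuming an in-memory dict in place of a database table and a thread in place of a real background worker.
The names submit, status, and slow_backend are hypothetical; the polling loop stands in for the client-side script that checks whether the answer is ready.
```python
import threading
import time
import uuid
jobs = {}  # in-memory stand-in for a "pending requests" table
jobs_lock = threading.Lock()
def slow_backend(prompt):
    time.sleep(0.2)  # simulates a slow LLM or API call
    return prompt.upper()
def _worker(job_id, prompt):
    result = slow_backend(prompt)
    with jobs_lock:
        jobs[job_id] = {"status": "done", "result": result}
def submit(prompt):
    """Record a pending job and return immediately with its id."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"status": "pending", "result": None}
    threading.Thread(target=_worker, args=(job_id, prompt), daemon=True).start()
    return job_id
def status(job_id):
    """What a polling endpoint would return to the browser."""
    with jobs_lock:
        return dict(jobs[job_id])
job_id = submit("hello")
first = status(job_id)  # returned right away, before the slow call finishes
while status(job_id)["status"] != "done":
    time.sleep(0.05)  # the client would poll on a timer like this
final = status(job_id)
print(first["status"], final)
```
The web request that calls submit returns straight away, so no worker sits blocked for the full duration of the backend call.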

00:48:14.040 --> 00:48:17.600
I think async and await is great for that, like you mentioned.

00:48:17.600 --> 00:48:23.440
Actually, there was an announcement pretty recently that you can run Uvicorn now, like as a production.

00:48:23.440 --> 00:48:23.980
Yeah.

00:48:23.980 --> 00:48:25.960
Without being wrapped through Gunicorn.

00:48:25.960 --> 00:48:26.340
Yeah.

00:48:26.340 --> 00:48:27.040
That's awesome.

00:48:27.160 --> 00:48:33.420
For this kind of scenario we just talked about where you're waiting on backend calls and stuff like that, use the async version of those libraries.

00:48:33.420 --> 00:48:37.140
Like if it's OpenAI, there's an async version of the OpenAI SDK.

00:48:37.140 --> 00:48:38.220
Use that.

00:48:38.220 --> 00:48:46.900
And then at least like Uvicorn is going to just have, it will quite happily run like hundreds and hundreds of those requests that are just waiting on backend calls.
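
NOTE
A small sketch of why async helps when requests mostly wait on a backend, using only asyncio from the standard library.
fake_llm_call is a hypothetical stand-in that sleeps instead of calling a real service; a real app would await an async SDK client at that point.
```python
import asyncio
import time
async def fake_llm_call(prompt, delay=0.2):
    """Stand-in for a slow backend call."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking a worker
    return f"answer to {prompt}"
async def handle_many(n):
    # all n calls wait concurrently on a single thread
    return await asyncio.gather(*(fake_llm_call(f"question {i}") for i in range(n)))
start = time.perf_counter()
answers = asyncio.run(handle_many(200))
elapsed = time.perf_counter() - start
print(f"{len(answers)} calls in {elapsed:.2f}s")
```
Run one after another, 200 calls of 0.2 seconds each would take about 40 seconds; awaited concurrently they finish in roughly the time of a single call.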

00:48:46.900 --> 00:48:47.300
Absolutely.

00:48:47.300 --> 00:48:53.140
So one thing I want to throw in here, this is coming from me, not from you, but I want to get your thoughts on it.

00:48:53.540 --> 00:48:56.800
So I recently set up this thing called Uptime Kuma.

00:48:56.800 --> 00:48:58.300
Are you familiar with Uptime Kuma?

00:48:58.300 --> 00:48:59.160
No.

00:48:59.360 --> 00:49:03.860
So this is an open source, self-hosted Uptime monitoring tool.

00:49:03.860 --> 00:49:05.840
And just run it in Docker.

00:49:06.000 --> 00:49:06.480
Off it goes.

00:49:06.480 --> 00:49:11.980
And one of the things that's nice about this is, well, I, over on Talk Python.

00:49:11.980 --> 00:49:17.660
Now, if you go to the bottom, it's got a server status and you can come in here, you can see the website's behaving well.

00:49:17.760 --> 00:49:22.940
The RSS feed is behaving well, the courses site, but also the mobile API, all these things.

00:49:22.940 --> 00:49:26.580
But the reason I think this is, I mean, status, like, is it working or not?

00:49:26.580 --> 00:49:27.520
It's not really that relevant.

00:49:27.520 --> 00:49:28.760
That's like an operational thing.

00:49:28.760 --> 00:49:36.760
But what's cool about it is if you go and dig into the thing, you go to the dashboard, like, let's say, you could even group them, right?

00:49:36.760 --> 00:49:38.920
So like, let's do this.

00:49:38.920 --> 00:49:39.660
I'll do the API.

00:49:39.660 --> 00:49:50.080
So for like the mobile apps API or the mobile app for Talk Python courses, you can actually see over time what it's seen in response time for like days.

00:49:50.080 --> 00:49:57.440
So with your load tests, you're saying, let me just hammer it and see how long, how it responds to that, right?

00:49:57.440 --> 00:50:01.180
And if you point something like this, or there's many other tools like this, but this is a nice one.

00:50:01.180 --> 00:50:06.660
You can just say, just keep a record and let me go back and look for, you know, for 24 hours.

00:50:06.660 --> 00:50:08.160
How has the load looked?

00:50:08.320 --> 00:50:09.780
And you can see there's some weird spike there.

00:50:09.780 --> 00:50:12.760
It probably did like a deployment, like right around then or something.

00:50:12.760 --> 00:50:17.240
But in general, you know, 40, 50 milliseconds, 35, 81, right?

00:50:17.240 --> 00:50:18.520
It's handling it.

00:50:18.520 --> 00:50:25.440
Because if the thing was overwhelmed, then these tests would also suffer the latency that everyone else is suffering, you know?

00:50:25.440 --> 00:50:27.100
And this is like every 30 seconds.

00:50:27.100 --> 00:50:28.000
Yeah, that's really cool.

00:50:28.000 --> 00:50:30.500
I do want to give a shout out to OpenTelemetry.

00:50:30.500 --> 00:50:37.800
I hope more people, this, every year, more and more people find out about this and they're like, oh, this solves a lot of my problems.

00:50:37.800 --> 00:50:42.240
It's a CNCF project.

00:50:42.240 --> 00:50:44.340
So it's a big open source standard.

00:50:44.340 --> 00:50:49.820
It's basically a standard for instrumentation and observability.

00:50:49.820 --> 00:50:54.320
So you kind of plug these instrumentation libraries into your app.

00:50:54.320 --> 00:51:02.740
And there's ones for FastAPI, Django, Flask, like the backends, like Mongo, like SQLAlchemy, stuff like that.

00:51:02.920 --> 00:51:12.060
And you basically just install these packages and it will just start to capture all this information about, oh, there's a request and it had these parameters and it took this long.

00:51:12.060 --> 00:51:19.700
And so in Python, like with a very small amount of code and a few packages, you can instrument all this stuff in your application.

00:51:19.700 --> 00:51:24.960
And then the other thing that OpenTelemetry does is it will have, it's kind of like a pluggable exporter.

00:51:25.500 --> 00:51:33.040
So wherever you want to send all that data to, you can kind of pick and choose like which platform you want to send it to.

00:51:33.040 --> 00:51:36.240
And there are some like local ones you can use as well.

00:51:36.240 --> 00:51:38.360
Like you showed this tool for uptime.

00:51:38.360 --> 00:51:40.840
There's some Docker containers for that.

00:51:40.840 --> 00:51:46.180
So like you just spin up a Docker container and it will just take all that telemetry data and give it to you in a GUI.

00:51:46.240 --> 00:51:52.660
So you can see like all the traces, how long they took, what calls it made on the backend, what queries it was running.

00:51:52.660 --> 00:51:55.400
Like OpenTelemetry is a brilliant way of doing that.

00:51:55.400 --> 00:51:56.600
And it's not just Python.

00:51:56.600 --> 00:52:04.020
Like if you've got some components of your application that are written in other languages, like odds are that it is also supported.

00:52:04.020 --> 00:52:07.660
So it's kind of like a framework, I guess, for capturing data.

00:52:07.660 --> 00:52:16.220
And when you're doing load tests, this is how we do like postmortems to figure out or even just looking at the stats and seeing like where did things go slow.

00:52:16.220 --> 00:52:21.780
And I've got some videos and stuff of demos I've done with this where I've got like applications.

00:52:21.780 --> 00:52:25.040
I've done a load test on them and then I can see, oh, there was a spike in load.

00:52:25.040 --> 00:52:30.720
Let's go back and look at the data to see what caused that and which calls was it?

00:52:30.720 --> 00:52:32.720
You know, what was the resources, the memory usage?

00:52:32.720 --> 00:52:36.860
And often like, you know, was it the database or was it an API call?

00:52:36.860 --> 00:52:38.540
And why did it take so long?

00:52:38.540 --> 00:52:40.380
What were the special parameters, stuff like that?

00:52:40.380 --> 00:52:40.700
So you.

00:52:40.700 --> 00:52:41.940
What were we waiting on?

00:52:41.940 --> 00:52:43.800
Was it the actual web app or was it a database?

00:52:43.800 --> 00:52:45.920
But like you can retroactively know that, right?

00:52:46.080 --> 00:52:46.240
Yeah.

00:52:46.240 --> 00:52:48.340
And OpenTelemetry is a great, great way of doing that.

00:52:48.340 --> 00:52:48.620
Yeah.

00:52:48.620 --> 00:52:49.380
That's super cool.

00:52:49.380 --> 00:52:53.240
That's way more holistic than my just give me a graph of response time.

00:52:53.240 --> 00:52:53.580
Yeah.

00:52:53.580 --> 00:52:54.640
You know, that's still cool.

00:52:54.640 --> 00:52:56.340
It's still cool.

00:52:56.340 --> 00:52:59.260
It's, you know, 20 minutes to set it up in the Docker cluster.

00:52:59.260 --> 00:52:59.640
Yeah.

00:52:59.640 --> 00:53:00.020
Awesome.

00:53:00.020 --> 00:53:03.080
Well, let's talk about, I know we're basically out of time, Anthony.

00:53:03.080 --> 00:53:07.620
Let's just talk really quickly about one thing that I think is a little bit tricky.

00:53:07.620 --> 00:53:13.080
And maybe just to get your thoughts on, you know, my Vercel example of cost is sort of

00:53:13.080 --> 00:53:13.740
in this space.

00:53:13.740 --> 00:53:15.440
And that's serverless.

00:53:15.440 --> 00:53:17.020
What are your thoughts on serverless?

00:53:17.020 --> 00:53:22.460
Like you have way more control when you have a VM or you've got a Kubernetes cluster or whatever

00:53:22.460 --> 00:53:23.340
it is you're working with.

00:53:23.340 --> 00:53:23.560
Right.

00:53:23.680 --> 00:53:24.040
I don't know.

00:53:24.040 --> 00:53:25.620
I always kind of see serverless.

00:53:25.620 --> 00:53:30.160
It's a really cool idea, but it's more of a commercial concept than a technical one.

00:53:30.160 --> 00:53:31.240
Yeah.

00:53:31.240 --> 00:53:34.300
I'm thinking more about just the, you don't control the warmup.

00:53:34.300 --> 00:53:34.620
Yeah.

00:53:34.620 --> 00:53:36.040
You don't control the machine.

00:53:36.040 --> 00:53:39.100
You know, there's just a lot of stuff that's black box to you.

00:53:39.100 --> 00:53:45.540
I think most of the platforms and most of the providers have got a, like a range of options.

00:53:45.540 --> 00:53:50.840
Like, I can speak to Azure Functions, which is like our serverless Python engine.

00:53:51.160 --> 00:53:56.360
You can use that as like just pay per request, but you know, it then has certain optimizations

00:53:56.360 --> 00:53:58.460
so that it has like a warm startup time.

00:53:58.460 --> 00:54:02.840
So, you know, if it's, you know, if you haven't made any requests for, you know, a certain number

00:54:02.840 --> 00:54:04.700
of hours, then the application will fall asleep.

00:54:04.700 --> 00:54:07.060
So it's not using resources and we're not charging you money.

00:54:07.060 --> 00:54:10.740
But if you don't want to have that, if you want the startup time to be super fast, then

00:54:10.740 --> 00:54:15.480
you can use a different service level basically and have like faster startups.

00:54:15.480 --> 00:54:20.620
You can have like pre-allocated resources, which then becomes not serverless, but it's

00:54:20.620 --> 00:54:21.360
the same framework.

00:54:21.360 --> 00:54:23.520
It's the same architecture and Lambda is the same.

00:54:23.520 --> 00:54:25.000
AWS Lambda is the same.

00:54:25.000 --> 00:54:27.620
You can kind of set it up in those different options.

00:54:27.620 --> 00:54:32.560
Things kind of get tricky when you get into Kubernetes because you've got real infrastructure

00:54:32.560 --> 00:54:33.460
on the background.

00:54:33.460 --> 00:54:38.720
And, you know, when you look at Kubernetes clusters or Kubernetes as a service or however you're

00:54:38.720 --> 00:54:42.680
setting it up, the first question is like, how many VMs do I need to provision and how

00:54:42.680 --> 00:54:43.560
big do they need to be?

00:54:43.900 --> 00:54:50.060
So yes, you're kind of building like a serverless abstraction on top of it, but you've got real

00:54:50.060 --> 00:54:51.160
infrastructure in the background.

00:54:51.160 --> 00:54:56.000
That's actually really hard because in a lot of cases, you've just got tons of idle infrastructure.

00:54:56.000 --> 00:55:01.720
So I think load testing is a good way of looking at like trying to right size what you've got

00:55:01.720 --> 00:55:02.360
provisioned.

00:55:02.360 --> 00:55:05.940
In many cases, scaling down some bits and scaling up others.

00:55:06.120 --> 00:55:06.900
I don't mess with Kubernetes.

00:55:06.900 --> 00:55:08.180
It's too much for me.

00:55:08.180 --> 00:55:09.420
I don't need all that.

00:55:09.420 --> 00:55:11.080
I don't need all that stuff.

00:55:11.080 --> 00:55:13.240
I do use Docker though, which is really, really nice.

00:55:13.240 --> 00:55:17.540
And I know a lot of these tools we talked about support running in Docker or working with Docker

00:55:17.540 --> 00:55:18.340
and so on.

00:55:18.340 --> 00:55:18.720
Yeah.

00:55:18.960 --> 00:55:19.300
All right.

00:55:19.300 --> 00:55:25.500
Well, hopefully people have some really concrete ideas and tools that they can use like Locust.io,

00:55:25.500 --> 00:55:27.120
which we're both huge fans of.

00:55:27.120 --> 00:55:32.260
Playwright, same deal, but also some of the philosophy and ideas behind it, which is super

00:55:32.260 --> 00:55:32.760
important too.

00:55:32.760 --> 00:55:36.380
So much appreciated you coming on the show to share that and just the chance to catch up.

00:55:36.380 --> 00:55:41.640
How about you give us a quick wrap up, important takeaways for people who want to go out and

00:55:41.640 --> 00:55:42.320
test their stuff now?

00:55:42.320 --> 00:55:42.580
Yeah.

00:55:42.580 --> 00:55:48.200
So I think step one is to look at your application and understand what your users are likely to

00:55:48.200 --> 00:55:48.420
do.

00:55:48.740 --> 00:55:52.760
So if you want to design a load test, start simple and maybe even start with Playwright

00:55:52.760 --> 00:55:56.920
because you can just, you just spin up the browser recorder, click around in the website

00:55:56.920 --> 00:55:59.680
and, you know, simulate what a user would be doing.

00:55:59.680 --> 00:56:02.860
Stitch that together with Locust and test 10 users.

00:56:02.860 --> 00:56:03.580
Don't go nuts.

00:56:03.580 --> 00:56:09.100
Start off with a small number, simple test, and you will uncover the things that need optimizing.

00:56:09.100 --> 00:56:13.720
I don't think I've ever encountered an application that just ran really efficiently the first time

00:56:13.720 --> 00:56:15.720
and then just keep working on that.

00:56:15.720 --> 00:56:18.520
So yeah, instead, because often you would end up trying to,

00:56:18.520 --> 00:56:21.900
kind of optimize things that are not going to get touched or don't really make a difference

00:56:21.900 --> 00:56:23.320
when you actually test it.

00:56:23.320 --> 00:56:24.580
So yeah, start simple.

00:56:24.580 --> 00:56:29.040
I recommend using Locust and Playwright if you want, or you can just write a simple Locust

00:56:29.040 --> 00:56:34.420
test and then put the instrumentation in the backend so that you can see not just the response

00:56:34.420 --> 00:56:39.580
times, but you can see as much data as possible on what's happening and how

00:56:39.580 --> 00:56:40.300
long it's taking.

00:56:40.300 --> 00:56:46.200
And I'll share like a couple of links of some simple dashboards you can use with OpenTelemetry,

00:56:46.200 --> 00:56:50.820
where it will capture that data locally or in the cloud and show you, you know, a trace

00:56:50.820 --> 00:56:51.860
of every request.

00:56:51.860 --> 00:56:52.420
Awesome.

00:56:52.420 --> 00:56:54.480
And do that with real data.

00:56:54.480 --> 00:56:55.680
Don't do it with three entries.

00:56:55.680 --> 00:56:56.240
Yeah.

00:56:57.100 --> 00:57:00.100
It's not going to, not going to mean what you think it means if you do it with only

00:57:00.100 --> 00:57:00.660
three entries.

00:57:00.660 --> 00:57:00.960
Yeah.

00:57:00.960 --> 00:57:01.320
All right.

00:57:01.320 --> 00:57:03.000
Well, always great to have you on the show.

00:57:03.000 --> 00:57:04.180
Thanks for being here, Anthony.

00:57:04.180 --> 00:57:04.900
Catch you later.

00:57:04.900 --> 00:57:05.480
Great to be back.

00:57:05.480 --> 00:57:06.100
Thanks so much.

00:57:06.160 --> 00:57:06.340
Yeah.

00:57:06.340 --> 00:57:06.760
Bye.

00:57:06.760 --> 00:57:10.400
This has been another episode of Talk Python To Me.

00:57:10.400 --> 00:57:12.240
Thank you to our sponsors.

00:57:12.240 --> 00:57:13.840
Be sure to check out what they're offering.

00:57:13.840 --> 00:57:15.260
It really helps support the show.

00:57:15.260 --> 00:57:17.060
Take some stress out of your life.

00:57:17.060 --> 00:57:22.540
Get notified immediately about errors and performance issues in your web or mobile applications with

00:57:22.540 --> 00:57:22.840
Sentry.

00:57:22.840 --> 00:57:27.840
Just visit talkpython.fm/sentry and get started for free.

00:57:27.840 --> 00:57:31.420
And be sure to use the promo code talkpython, all one word.

00:57:31.420 --> 00:57:34.320
This episode is brought to you by WorkOS.

00:57:34.520 --> 00:57:39.040
If you're building a B2B SaaS app, at some point, your customers will start asking for

00:57:39.040 --> 00:57:44.540
enterprise features like SAML authentication, SCIM provisioning, audit logs, and fine-grained

00:57:44.540 --> 00:57:45.260
authorization.

00:57:45.260 --> 00:57:51.200
WorkOS helps ship enterprise features on day one without slowing down your core product development.

00:57:51.200 --> 00:57:55.080
Find out more at talkpython.fm/workos.

00:57:55.080 --> 00:57:56.580
Want to level up your Python?

00:57:56.580 --> 00:58:00.600
We have one of the largest catalogs of Python video courses over at Talk Python.

00:58:01.060 --> 00:58:05.800
Our content ranges from true beginners to deeply advanced topics like memory and async.

00:58:05.800 --> 00:58:08.460
And best of all, there's not a subscription in sight.

00:58:08.460 --> 00:58:11.360
Check it out for yourself at training.talkpython.fm.

00:58:11.360 --> 00:58:13.480
Be sure to subscribe to the show.

00:58:13.480 --> 00:58:16.260
Open your favorite podcast app and search for Python.

00:58:16.260 --> 00:58:17.560
We should be right at the top.

00:58:17.560 --> 00:58:22.740
You can also find the iTunes feed at /itunes, the Google Play feed at /play,

00:58:22.740 --> 00:58:26.940
and the direct RSS feed at /rss on talkpython.fm.

00:58:26.940 --> 00:58:29.900
We're live streaming most of our recordings these days.

00:58:29.900 --> 00:58:33.300
If you want to be part of the show and have your comments featured on the air,

00:58:33.300 --> 00:58:37.680
be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

00:58:37.680 --> 00:58:39.780
This is your host, Michael Kennedy.

00:58:39.780 --> 00:58:41.080
Thanks so much for listening.

00:58:41.080 --> 00:58:42.240
I really appreciate it.

00:58:42.480 --> 00:58:44.140
Now get out there and write some Python code.

00:58:44.140 --> 00:59:05.020
I'll see you next time.

