
#352: Running Python in Production Transcript

Recorded on Wednesday, Jan 12, 2022.

00:00 Do we talk about running Python in production enough? I can tell you, the Talk Python infrastructure (courses, podcasts, APIs, and so on) gets a fair amount of traffic, but it looks nothing like what Google's or Instagram's or [insert big tech name here]'s deployments look like. Yet mostly we hear about interesting feats of engineering at massive scale that, while impressive, are often outside the world of most Python devs. On this episode, we have three great guests who do think we should talk more about small to medium sized Python deployments: Emily Morehouse, Hynek, and Glyph. I think you'll enjoy the conversation. They each bring their own interesting perspectives. This is Talk Python to Me, episode 352, recorded January 12, 2022.

00:58 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm. And follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:24 This episode is brought to you by SignalWire and us over at Talk Python Training. Please check out what we're offering during our segments. It really helps support the show.

01:33 Emily, Hynek, Glyph, welcome to Talk Python to Me. So good to have you here on the show. And I'm super excited about talking about running Python in production. And I'm glad that it's more than just one of you, because I think everyone has their own perspective and their own context and whatnot. Do you work for a small company or a large company? Are there many people on the team or just a couple? I think that really influences what you might cover, some of the decisions you might make, and so on. But before we get to that, let's just start off with a little bit of background. Emily, it's been a while since you've been on the show. Maybe tell people a bit about yourself. Yeah.

02:11 So I'm Emily Morehouse. I am the director of engineering at Cuttlesoft. We are a digital product development company, so we touch on anything from web, mobile, IoT, to DevOps. We really touch the whole stack and get to work with a lot of different industries. And then I also am a Python core developer and the PyCon chair for this year.

02:34 Yeah.

02:34 You just happen to grab that role right when Covid hit and conferences got insanely complicated and uncertain. How do you juggle all that?

02:46 It was a really big bummer, I think, because PyCon is really such an important piece of our community and getting to see people in person and connect on a regular basis. But of course, we wanted to do the thing that was going to keep everyone the most safe. So kind of last minute going online and then doing a full online conference and then still hoping that things will settle down this year so that in person is something that feels very safe for us to do. But I was grateful I actually ended up staying on for this third year as chair just so I could have that sort of like last hurrah.

03:20 Yeah, I hope so. Awesome. Well, thanks for all your hard work on it, even if we didn't get to meet in Pittsburgh. Glyph, you want to go next?

03:26 I'm Glyph. I am probably best known for Twisted. I have worked on a variety of production Python environments in my career. I am currently in the process of exploring some options for something independent, all of which are secret and I can't really say anything about here yet. Most recently I was at Pilot.com. They are a fantastic company and I very much recommend people work there, and we also ran a ton of Python in production over there. I'm sure that folks in the Python podcasting space have probably heard me occasionally before, so I won't belabor my introduction.

04:01 Yeah, sounds good. Well, good to have you here again. And welcome, Hynek. Welcome.

04:04 Yes, thank you for having me. I think this is more or less my first Python podcast in my life.

04:10 Well, it's awesome to have you. You do so much writing and you have such an influence through conferences and presentations. You should definitely be here. So it's long overdue. Welcome.

04:21 It's great to hear, because I don't think about it like that myself. But I work for a small hosting company. We are like a traditional web hoster. I've been working there for now like 14 years, which is a very long time, which is like 1000 years in Silicon Valley time. Everything I do, I do in Python, basically, there. From web services to simple web pages to proxy applications, you name it, we use Python for it.

04:47 Fantastic. Do you, like, control VMs and provision VMs and stuff and that kind of thing, or what type of work again?

04:56 We don't sell VMs, we don't sell anything where you can have root on it. So we are selling a platform, basically mostly PHP.

05:06 It is what it is.

05:07 Yeah, sure. At least you can control it all. Yeah, well, fantastic. So the reason we're all here today, actually, is it's your fault, Hynek.

05:15 So you wrote this article back about a year ago.

05:19 Check the dates. Yeah, I know.

05:21 We've been talking about getting the four of us together for a while, haven't we? It's been a bit of a journey, so we didn't mean for it to be that long, but I don't think there's anything that is dated about this, so it's totally fine.

05:34 What I meant is that it was like right before Corona, I was writing this article on my phone while listening to a podcast in a van, going to a Husky farm in Finland.

05:46 A Husky farm, like, the dogs?

05:49 Yeah, I did a Husky Safari, basically. Like riding Husky sleds mushing.

05:53 Yeah, that sounds interesting.

05:56 It was. It was my last trip before Corona, which makes it kind of funny because I talk about conferences later on.

06:04 That kind of got put on hold, but, well, we can replace it with things like this in other conversations. So in this article, titled "Python in Production", of course we'll link to it in the show notes, you said you were missing a key part from the public Python discourse and would like to help change that. Basically, we should have more conversations about running in production. So let me see if I get this right. You're a huge proponent of microservices. Everything should just be a bunch of small microservices and everyone else is doing it wrong. I'm just kidding. You actually take a different view than that. But we're going to go through some of the themes in your article, some of the other ones out there, and then also just all of us talk about what we're doing in production, some of the considerations and trade-offs. So a lot of fun there. Let's talk about cloud computing first. I'll pick one of these out of here, and then we can touch on the whole microservice-monolith thing as well. Glyph, let's pick you. We have this trade-off, right? We could go and create a VM, set up our own system, and it's totally portable; it could run in a data center, it could run in the cloud or whatever. Or we could go all in with Lambda and other very specialized services tied to AWS or to Azure or whatever. What are your thoughts on this whole running-in-the-cloud story?

07:22 There's as many dimensions to that question as there are services in AWS, which is to say, like, a higher order than aleph-null infinity.

07:33 Exactly.

07:33 The way that I look at that question is you have to look at each service kind of one at a time. And the way that I would generally suggest people decide whether or not it's a good idea to adopt a particular service has to do with the way that their operational footprint works in terms of overhead. So, for example, should you run your own Postgres or should you use RDS? Almost always you should just use RDS, because the operational footprint of that is: Amazon configures and manages the tremendous complexity of operating a relational data store in production. And when you need to set that up for development, because part of production is having a development environment that matches closely enough that it works well and that you can test things and know for sure that you're getting adequate test coverage, you can just run a Postgres Docker container, and functionally, for your infrastructure, it's probably the same. You rarely need to do deep configuration that is meaningful to your application between your local development Postgres container and your RDS instance. As that slider moves further and further towards "well, actually, our data store configuration is really part of our application; well, actually, we need to tune things very closely", even running your own data store might make sense. That's a pretty unusual circumstance, but you can look at that for other aspects of your platform as well. Routing, load balancing, caching. Is that part of your app? Do you care about it in development? If it breaks in an interesting way, is that going to take you down? Look at each of those issues for each cloud service.

09:14 How expensive is it going to be for you to replicate it in development? And how accurate is that reflection going to be of what's in production? And always try to pick the answer that is the lowest overhead for your particular team configuration. That said, the tricky part is this changes as your team grows and as your service gets more complex. One of the things that I imagine I'm going to touch on a few times is that probably the most interesting Python in production experience I had was running Pilot.com's application, which started in a very different place than it ended up, and I'm very happy with the way that went. We started with an incredibly super macro service, just one big Python container and nothing else, and moved towards picking up application load balancers and Lambdas and all kinds of other stuff that eventually became part of that infrastructure as the team grew and we had more of a dedicated infrastructure, like operational, footprint that we could actually manage because it was staffed.
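
For reference, here is a minimal sketch of that dev/prod parity idea: the application only sees a database URL, so a throwaway local Postgres container and an RDS instance look the same to it. The variable name, credentials, and SQLAlchemy usage are illustrative assumptions, not anything from the episode.

```python
import os

from sqlalchemy import create_engine

# Locally, DATABASE_URL might point at a throwaway container started with:
#   docker run -e POSTGRES_PASSWORD=dev -p 5432:5432 postgres:14
# In production, the same variable carries the RDS endpoint instead.
DATABASE_URL = os.environ.get(
    "DATABASE_URL",
    "postgresql://postgres:dev@localhost:5432/postgres",
)

# The rest of the app is identical in both environments.
engine = create_engine(DATABASE_URL)
```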

10:14 Yes, that makes a lot of sense. It definitely changes as your team changes. Emily, what do you think? What's your view on this? How much of the cloud should you bite off, and Where's the trade offs?

10:24 I agree with a lot of the things that Glyph said. I think that once you get to the point where you're adding complexity to your local development environment, that's usually a red flag for me, and you have to be doing something that gives you a lot of value. So, for example, we'll use something like Firestore as an offline database for mobile applications. And whether you're using that or you're using CouchDB, whatever choice you're making there, you're going to have some sort of added complexity locally. So, yeah, I totally agree that, like, RDS is a really easy swap, and whether it's RDS on AWS or moving to GCP or Heroku or whatnot, your portability between those different ecosystems is going to be pretty straightforward. But if you want to rewrite your Lambda functions to use Cloud Run instead, it's not just a drag and drop sort of thing, right?

11:16 You don't just change that connection string and now it's all good again.

11:19 Yeah. And I think it's going to depend on where the company is at. Again, for us, we're often looking at what our client needs. So if they're a client that doesn't have a technical team, that they just kind of want to let this run on its own and they don't want to manage it. That's going to impact whether we choose Heroku or AWS. Really looking at where you're at now and what sort of support you need in the future is a big question to ask for that decision.

11:47 Which way does that gauge go for you? If they are kind of hands off and not super technical, do you give them Heroku or what do you give them?

11:54 Yes.

11:54 If they're non-technical, we give them Heroku. If they are technical...

11:58 Even then that kind of depends. A lot of times it's AWS or GCP, just kind of depending on what other pieces of the ecosystem that they need. Or since we work with a lot of existing tech companies, a lot of times they say, hey, we are a GCP company or an AWS company. And then we just say, cool, we can do it.

12:15 Right.

12:15 And if they're already there, you might as well just keep going with that, right?

12:18 Yeah, definitely.

12:19 Yeah, absolutely.

12:20 But I will say that 95% of the time, anything that's like web application or API based, it's going to be in a Docker container anyway. Just to give us that portability.

12:30 Sure. Hynek, thoughts?

12:32 I have very little firsthand experience with cloud services because we use none.

12:38 It would be kind of odd for a cloud hosting company to use somebody else's cloud, although I know that people are doing it. It totally happens. People are more or less reselling the big clouds.

12:48 It's true.

12:49 Wrapper. Yeah.

12:50 We run our own hardware in our own cages in a shared data center, like, military grade. I once almost got tasered because I took a photo from the outside. Security came running immediately.

13:03 So the one thing we run outside is Sentry, because that makes sense. And I didn't want to run it anymore myself. But there's also the thing that Europeans in general and Germans in particular are not very excited when you put their data on other people's servers, particularly of US companies. So often it is also like a competitive advantage for us to just say our data literally does not leave Berlin.

13:30 Right. That's certainly something that here in the US, it's easy to kind of forget about. Right.

13:37 We think about it as just the big clouds, right? These are big cloud companies, but really these are US cloud companies, right? And if you're in Europe or somewhere else, that's another angle to think about, right?

13:47 I have maybe one thing to add. As a user, I don't care that US-East-1 is down.

13:57 But you will know. You'll still know.

13:58 Yeah.

13:58 This is what I see happening, right? People rely so much on cloud services that half of the Internet goes down, and that's what many people are talking about, for sure.

14:09 So maybe we can wrap this up real quick, but I do want to add just one thing. A listener out there in the audience says, "I prefer my $5 Digital Ocean VM and just my MongoDB, Postgres, Python." So how often do you think about, especially, I guess, you, Emily, how often do you think about price? If you look at the price for AWS, it can add up. And for some companies, it doesn't matter, right? Just having infrastructure is great. But for others, maybe they don't have revenue yet or they're not profitable and they're really trying to squeeze by. Where do you land on, like, we could save you some money, we could do this for $10, but we're going to do it for $500 because it's the right architecture to run this distributed thing?

14:52 What are your thoughts? What do you tell your customers?

14:54 Sure. The cost benefit often comes down to picking an ecosystem that's going to be super stable for them, where we can say, hey, yeah, this is going to cost you a few hundred dollars a month, but if you need somebody to step in and start to manage your system for you, a couple of developer hours will easily outweigh the costs of the cloud hosting. So that kind of puts it in perspective. A lot of these clients are already people who have spent multiple thousands, tens to hundreds of thousands of dollars actually building their software in the first place. So as long as you're on a much lower order of magnitude, you're typically okay.

15:33 Yeah, that's true, especially if you're not technical and you've got to hire someone in to kind of fix it, then any sort of failure is a big problem.

15:43 This portion of Talk Python to Me is brought to you by SignalWire. Let's kick this off with a question. Do you need to add multiparty video calls to your website or app? I'm talking about live video conference rooms that host 500 active participants, run in the browser, and work within your existing stack, and even support 1080p without devouring the bandwidth and CPU on your users' devices. SignalWire offers the APIs, the SDKs, and edge networks around the world for building real-time voice and video communication apps with less than 50 milliseconds of latency. Their core products use WebSockets to deliver 300% lower latency than APIs built on REST, making them ideal for apps where every millisecond of responsiveness makes a difference. Now, you may wonder how they get 500 active participants in a browser-based app. Most current approaches use a limited but more economical approach called SFU, or selective forwarding units, which leaves the work of mixing and decoding all those video and audio streams of every participant to each user's device. Browser-based apps built on SFU struggle to support more than 20 interactive participants, so SignalWire mixes all the video and audio feeds on the server and distributes a single, unified stream back to every participant. So you can build things like live streaming fitness studios where instructors demonstrate every move from multiple angles, or even live shopping apps that highlight the charisma of the presenter and the products they're pitching at the same time. SignalWire comes from the team behind FreeSWITCH, the open source telecom infrastructure toolkit used by Amazon, Zoom, and tens of thousands more to build mass scale telecom products. Sign up for your free account at talkpython.fm/signalwire and be sure to mention "Talk Python to Me" to receive an extra 5,000 video minutes. That's talkpython.fm/signalwire, and mention "Talk Python to Me" for all those credits.

17:34 One thing that I'd love to add onto that is, well, two things. First of all, on the notion of price and, related, which technology you should use: I think the question you constantly need to be asking is really never about one side of the equation. Never ask about features or price alone. You always want to be looking at a price-performance ratio, and that performance shouldn't necessarily, in fact, usually should not be metrics like gigabytes or throughput or anything like that. You should be looking at: do you need what the product offers, and what is it going to save you time on? So when Emily says if it's going to save you development time, that is gold. You always want to err on the side of saving development time. Developers are very expensive, we're very finicky, and anything that you develop, you also need to maintain. So one of the big benefits of cloud services is, think of that price not just in terms of development costs, but are you going to need to maintain it? And on the flip side, do you need this thing at all? Quite like Hynek was talking about, just not using cloud at all, and yet he successfully develops and deploys many services in production and they all seem fine. And so I think that there's often kind of this tool obsession where we look at features, features, features, and assume that features equal benefits. But features are only benefits to you if you need them. If you don't need them, they're additional cost. You have to learn stuff about every single one of those features. The time your developers spend learning the surface of the AWS API is a cost you have to think about. And so it's quite often just better to not use the cloud, because it's cheaper to not figure it all out if you already know how expensive it's going to be on your own infrastructure.

19:15 Or as Emily pointed out, if it's in a Docker container, you may run it in the cloud now, but it's fairly portable and not super tied to it. Paul Everett, out in the audience, has a quick question. He says: we talk about pinning dependencies to control change, but when we talk about cloud computing, we say let things like AWS handle it. That's kind of a dependency on the compatibility of each one of those services you take on. What do you think about the stability of those APIs and those services over time? I think Azure maybe changes a little more than AWS. Things come and go, but still, it's something to consider, right? It's another bit of churn you've got to deal with, I guess. Yeah.

19:53 And I think it depends on how quickly you're jumping on new services.

19:58 So we actually started using ECS right after it was released, and it's come a long way since then. Both the product and our proficiency have come a long way. So I think taking any sort of new service with a heavy dose of skepticism and making sure it's something that is really stable, that's going to fulfill the need, is an important thing to look at.

20:18 Yeah, that's a really good point. All right, let's move on. We touched on this enough, I think. Let's talk about microservices. I was joking with Hynek about that at the beginning. So we have the spectrum. We could have just one codebase, one process that runs in uWSGI or Gunicorn or whatever. Or we could have a little Flask API that does user management and login, another part that does the catalog or whatever; we have a bunch of these little services, these microservices, and then put them all together. I have thoughts on this, but what are your thoughts?

20:51 My opinion at this point is, I think, quite public: I think you should start with a monolith first, for the simple reason that microservices come with a lot of adjacent complexity, which we basically just talked about with cloud services, right? Like, you cannot have microservices without service discovery. You cannot have microservices without tracing, because every error becomes, I think it was called a distributed murder mystery, trying to find the fault. There are things that people don't think about, like retries need to be capped, otherwise you're going to get exponential growth and you denial-of-service yourself, and stuff like that. So, I mean, it's generally a good idea to have as few moving parts as possible, which someone already said about cloud computing, because more moving parts are always harder to make reliable, right? Martin Fowler calls it the microservice premium, which I like. And again, it's a trade-off. Is this premium worth it to you? I think you need a lot more experience to make this trade-off than many people think. That's my experience.
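
As a side note on that retry point, here is a minimal sketch of capped retries with exponential backoff and jitter; the function name, exception type, and defaults are illustrative assumptions, not from the episode.

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.2, max_delay=5.0):
    """Call fn(), retrying on connection errors with capped backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up instead of retrying forever
            # Exponential backoff with a hard ceiling, plus jitter so many
            # callers don't all retry in lockstep and DoS the service.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```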

22:05 Yeah. It probably starts simple, with just two or three pieces, and then you end up with 20 and you're like, how did we get here? Yeah.

22:12 And they don't really know why, right? Because people want boundaries. I cannot speak for huge teams, because one of the peculiarities of my job is that our team is very small with a lot of responsibility. But there are big teams that have boundaries that do not come with a network partition. So it's a trade-off. And I think that most people need more experience to actually make this trade-off.

22:32 Yeah, I very much agree with that. Particularly the people who tend to be asking this question are often asking it because they're at a small company, they have a small team. They saw a really cool white paper or a presentation by somebody at Google or Netflix, and they're like, wow, all this stuff about microservices sounds great. Fault isolation and distributed tracing and use whatever language you want. And I'll get back to that one in a second. And again, like I said before, you will need to think about everything in terms of cost benefit and not just benefit. And the folks at Google and Netflix are talking about 10,000 person teams. And like, how do you manage complexity at that scale? How do you deal with the problems that come with that type of organization?

23:24 Even the shape of this debate has long bothered me, because of the terms that we're using, right? Do we want monoliths or microservices?

23:33 Asking that question is like saying, well, when we have a wheel, do we want it to be like an 18-wheeler truck wheel, or do we want it to be like one of those wheels that comes on a Micro Machines car? And the answer is, I don't know, what are you putting on the vehicle that this wheel is going to be attached to? It really depends. The question of how big you want your service to be, which is, I think, a better question than do you want a microservice or a monolith, comes down to: what is the surface area to volume ratio of your team? It takes about twelve people to run a microservice. I would say, like, a dozen people per service is roughly what you should be thinking about. Plus, there's the fixed overhead of a microservice architecture, where you need service discovery and tracing and logging and, like, a whole bunch of other stuff, which you probably need in today's modern fancy cloud environment. If you're going to be able to leverage services like Honeycomb and CloudWatch and just manage all your logs and not have to deal with that yourself, then you're talking about maybe a six person team that can just do infra. And if you are at a three person startup and thinking that sounds like a lot of people: you want a monolith. You want one piece of code you can maintain. Because the benefit of having something like a very fine grained microservice architecture is that you can say, we're going to have each team on its little piece, and we're going to move a lot of the complexity of orchestrating from code to configuration, because our operations team can deal with the configuration and manage their process around testing configurations. But if you're a small development team, you want code you can write unit tests for. You want code you can run and understand the way that you run and understand everything else. And if you push everything into YAML files and INI files and cloud APIs, what does your development environment look like anymore? How do you test those changes?

25:25 Right. Even developing on your own machine becomes tricky.

25:28 Exactly. Yeah.

25:29 I think the big takeaway is that the team needs to match what microservices are optimized for.

25:36 A really good point, Emily. I suspect the people you suggest use Heroku probably are not running ten microservices over there, are they?

25:44 No, definitely not. I think one of the things that we typically have built in is a certain amount of ability to scale independently. And so we typically do have two different workers on Heroku: we're going to have a Django application and a Celery task worker, and then we know that we can scale those two independently, but that's where we really see the division of load. So that's the lens that I look at monolith versus microservices with: do I need to scale them independently?

26:15 Yeah. And I certainly think having some kind of "we're going to kick off the long running work over there so it's not happening as part of a web request" certainly makes sense. Even things just like sending a group of people an email often takes too long if the group is large enough, before the request times out and wrecks your servers and all sorts of stuff. Makes sense.
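
To make that concrete, here is a minimal sketch of the web-plus-worker split Emily and Michael are describing, using Celery; the broker URL and the two helper functions are hypothetical stand-ins.

```python
from celery import Celery

# Hypothetical broker; on Heroku this might be a Redis add-on URL instead.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def email_group(group_id):
    # Runs on the separately scaled Celery worker, not in the web process,
    # so a large recipient list can't time out a user's request.
    for address in fetch_addresses(group_id):  # hypothetical helper
        send_one_email(address)                # hypothetical helper

# In the web request handler, enqueue and return immediately:
#   email_group.delay(group_id)
```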

26:35 I think developer experience is a really big one. We have had the pleasure of working with another very large tech company, like multiple hundreds of employees, and they had a monolithic Rails app, and it was absolute hell to work with as a developer, because you constantly have hundreds of PRs open at a time. You open a PR and within 20 minutes you have a merge conflict with somebody else. And really, at that point, it makes sense from a developer perspective, too, to be able to divide it up and say, okay, this is your area of expertise, this is your area of expertise, and at least set up those partitions. But you can do that in a monolith as well. It just takes a little bit of awareness and slicing the problem a little differently.

27:18 Yeah. So let me throw out an idea that I just thought of; I'm not sure I'm advocating for this or even that it's a good idea. One of the problems you have with these big monoliths, like you were touching on, Emily, is a lot of people are changing the same bits of code, and they're kind of all over the place. So microservices are one way to solve that problem. What about trying to package up some of the functionality of what our application does into Python packages that we can then install and use in the main app? Is that a good idea?

27:54 I mean, it's interesting, but it's also another level of complexity, right?

27:57 Yeah. You've got to run your own private package server or something.

28:00 I would actually say that that is a prerequisite to micro services.

28:04 Specifically, one of the things that people often forget about the whole service architecture thing, and this comes down to, if you remember, I said I was going to talk about the "choose your own language" kind of thinking before. One of the things that people often choose microservice architecture for, badly, is this idea that they want to experiment with different programming languages because they want to use the right tool for the right job. And number one, just the cognitive overhead of jumping between Pascal and OCaml and Python and Rust on your back end is way higher than most people think. But even forgetting about the sort of human cognitive overhead, your service architecture, your service fabric, needs to be logging and recording metrics and dealing with load balancers and dealing with data stores in a consistent way. And that task by itself is complex enough that you probably need a library, which means every supported language in your environment needs to have packages installed that are maintained by your infrastructure team and not by your application teams. And that means before you can even think about microservices, you have to be able to split your workflow into multiple different package repositories, different source control, different teams, different CI. You need to be able to do that first before you can reasonably split things across multiple actual services. That doesn't necessarily mean you need to not have a quote-unquote monolith, because you can put a monorepo into that kind of multiple library workflow, and that's also fine. But you do need to be able to have multiple work streams going that end up in the same service package.

29:37 This portion of Talk Python To Me is brought to you by Tonic.ai. Creating quality test data for developers is a complex, never-ending chore that eats into valuable engineering resources. Random data doesn't do it, and production data is not safe or legal for developers to use. What if you could mimic your entire production database to create a realistic data set with zero sensitive data? Tonic.ai does exactly that. With Tonic, you can generate fake data that looks, acts, and behaves like production data because it's made from production data. Using their universal data connectors and a flexible API, Tonic integrates seamlessly into your existing pipelines and allows you to shape and size your data to the scale, realism, and degree of privacy that you need. Their platform offers advanced subsetting, secure de-identification, and ML-driven data synthesis to create targeted test data for all of your pre-production environments. Your newly mimicked data sets are safe to share with developers, QA, data scientists, and, heck, even distributed teams around the world. Shorten development cycles, eliminate the need for cumbersome data pipeline work, and mathematically guarantee the privacy of your data with Tonic.ai. Check out their service right now at talkpython.fm/tonic, or just click the link in your podcast player's show notes. Be sure to use our link, talkpython.fm/tonic, so they know you heard about them from us.

31:08 Yeah, absolutely. Another thing is versioning across the services, right? The definitions, like, if you use SQLAlchemy, the user shape has to match across all the parts that talk to the database that might touch a user object or something like that. So I guess I'll wrap it up with a quick thought: for me, microservices feel like I'm trading code and developer complexity for operational and DevOps complexity. And I personally feel much more comfortable managing and working with code complexity than infrastructure complexity. But if your team and your organization is all about managing infrastructure complexity and DevOps and you have a bunch of junior devs, maybe microservices make sense. I don't know, it kind of depends. But my vote is for the monolith side, because I'd rather manage the software complexity. Let's talk about security. I mean, we've had some crazy stuff; I think it's log4jmemes.com. So security is clearly on the "we always have to think about it" list, and the recent Log4j thing is obviously a Java and not Python thing, but it's the one thing that makes me nervous about running stuff in production, honestly. We've got stuff like StatusCake or various other uptime things that you can fire up that will tell you if your site is down and send you notifications and things like that. But the security one is a little bit scary. So how do you approach thinking about this? Hynek, maybe you go first. Hosting other people's code is like an extra level, like meta security.

32:39 Yeah. The one flag I want to wave here is defense in depth, which is something that's very dear to my heart and which could be a bit more popular, because every significant attack nowadays is a multi-stage one. It doesn't matter if it's, like, owning your Chrome or owning a server; it's usually multiple stages. So you should make it as hard as possible for the attackers, even once they have entered your infrastructure at some point. So for me, it means that I treat our own network, which is very private, you need a VPN to get in and everything, as if it has intruders inside it. And that's our standard. So you should hash your passwords; as the maintainer of argon2-cffi, of course, I'm using Argon2, but use whatever. We use TLS even in private networks, because I know that cloud providers have virtual LANs, which are also often encrypted, but still, it's just another layer. You cannot have enough layers to protect yourself at this point, because if someone intrudes, you don't want them to sniff passwords out of your traffic and things like that. The list goes on.
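
For reference, a minimal sketch of that password-hashing advice using argon2-cffi, the library Hynek maintains; the example password is obviously made up.

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()  # sensible Argon2id defaults out of the box

# Store only the hash, never the password itself.
stored_hash = ph.hash("correct horse battery staple")

def check_login(candidate: str) -> bool:
    try:
        ph.verify(stored_hash, candidate)
        return True
    except VerifyMismatchError:
        return False
```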

33:45 Yeah, it definitely does. Emily, security thoughts?

33:48 Yes. Luckily this isn't one that we've necessarily had to worry too much about. I think the worst thing that's happened to us is a DDoS attack, and that's about it. So for me, it's definitely staying on top of dependency management, keeping things up to date, which is not necessarily always the easiest thing. Like the Python 2 to 3 transition, or upgrading Django, it's sometimes a little bit more complex than you want it to be. But I think getting those security updates and making sure that you're on an LTS version is really important.

34:17 LTS being long term support. Yeah, absolutely. So, I was picking on the Log4j thing, and this is going to be something we live with for a long time; the consequences of it are going to be a nightmare. For those who don't know, the Log4j problem is: if you can basically get a server to print a message containing any of your input, like "this URL was invalid" or "this email tried to log in and failed", you can own the computer, which is really bad, so it has to be fixed straight away. But one of the problems that made this harder was that the fixed version of Log4j required a newer runtime, like Java 8. So if you were running Java 7, you had to not just upgrade your library, but then upgrade your whole runtime, which might be problematic. So, Emily, that makes me think that one of the good rules to go by is don't let your frameworks get too far out of date. You don't have to be on the latest Django, but don't stay on Django 1 when Django 4 is about to be released. Or don't stay on Python 3.6 when you could be on a newer one, or something like that. How do you all feel about that?

35:33 Yeah. And I think, looking at it from an open source maintainer's perspective, making sure that you have the ability to hotfix previous versions and be very clear about which versions you're supporting and which ones you're not. That way you make it easy for a user to know, like, I'm not going to get the security update and I need to upgrade, and have that done ahead of time to stay on top of things.

35:56 Yes. And Glyph, old frameworks. What do you think?

35:59 I don't think I can find this right away, but I remember one of my more popular tweets was one of those two-buttons memes where the guy can't choose. And on one side you've got "get owned because your dependencies are out of date and you have no way to immediately update them", and on the other, "get owned because you're automatically updating from an upstream that you don't know if you can trust or not."

36:21 So the way that I sort of split that difference in practice is: you really want to make sure that it's not just about regularly upgrading, because you can always say, like, oh yeah, we'll regularly upgrade, but you've only got a fixed budget for security, right? Like, it's possible to spend all of your time spinning your wheels trying to increase the cost for attackers along every possible axis.

36:47 Yeah. But eventually you've got to ship features and deliver stuff. That's exactly it, thinking about Hynek's suggestion of defense in depth.

36:55 The way that I like to think about that is you want to raise the bar to a certain minimal level in a bunch of different areas and then not get too obsessed with getting that last 5% of security on each possible axis.

37:07 So for dependencies, the way that I think about that is: have the very widely available automation that already does this stuff, like Dependabot. Make sure you are getting those PRs automatically. Pin everything so that you're never dealing with a library upgrade in the middle of feature development; your library upgrade work should always be "I am upgrading this library now." And 90% of the time, if you have that set up, you've got a build that is running on every PR that builds your whole dependency structure, with everything pinned, and hashes pinned and everything, and that is also regularly receiving these PR updates. Most attackers who are doing things with supply chain attacks aren't all that clever, and so they will just end up trying to pop your CI, and you'll see that; you'll get some kind of error. You'll probably notice. Not necessarily, attackers can be very sophisticated, but you want to have everything, every library, running in your nicely isolated CI environment on your GitHub Actions or whatever first. And again, you want those changes to all be: same code with your old dependency, same code with new dependency. And every so often you'll get one very expensive upgrade that you really have to do and you have to make time for. If you're not upgrading all the super easy, almost free ones, like just hit the green button on the PR that's passing, if you're not doing that all the time, then when you do get to those big upgrades, you will be upgrading 50 dependencies at a time.

38:38 Yeah, it's going to be rough. Yeah.

38:39 You really need to think about the cost to you as a defender, and the way that you reduce that cost on dependency upgrades is spending just a little bit of time every week tending that automation and looking for those upgrades that are really going to take some development work, and there are fewer of them than you might think. It quite often feels like a really big task because most people get stuck in this "oh no, it's time to do the dependency upgrades this quarter" and you have, like, a seven week project where you're trying to figure out which dependency is the one that's making all your tests fail, because you're changing 50 lines in your requirements.txt all at once if you don't do that. And by the time you're upgrading the one library that really does have a big breaking API change, it's not actually that hard, as long as everything else is already up to date.

39:23 Yeah. And usually there's one or two libraries that are massive, like Django or Flask, and then it's all the little dependencies that probably require no effort on your part. One other thing that I switched to, you talked about Dependabot, which is really great, and I have Dependabot turned on for my stuff. But I started using pip-compile, where basically you give it an in-file and it will build your requirements.txt. And it'll even say, like, this thing is here because you installed Flask; that's why this dependency is here, for example. And I really like having that, because each week I'll go and I'll run it and it'll tell me what the new requirements are. I'm actively aware that's happening; I'll sort of process it and go with it. What do you all think? Emily, are you using Dependabot or something else?

40:09 Yeah, we do use Dependabot. I think another interesting, integral piece of being able to upgrade your dependencies is having tests that give you the confidence that you can. So do you feel confident to say, yes, our upgraded dependencies passed our test suite, therefore we can upgrade? Or is there something that is going to require somebody else to go in and look at it? But yeah, I think we've come a really long way in terms of dependency management. I do not miss the giant requirements.txt files that had all the dependencies of dependencies of dependencies all the way down, and you didn't know what you installed versus what another library pulled in. I also like things like pip-compile because it lets you have a little bit more control over the child dependencies, versus relying on a top level library like Django to specify their own requirements in a way that works for your organization.

41:00 Yeah, absolutely. So I also use Dependabot, like I said, and if you go and run pip-tools and then you commit that back to the repo, Dependabot will notice that that's been upgraded and it will automatically close the PR. So there's kind of a nice match between them as well. Yeah.
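
For reference, a minimal sketch of that pip-compile workflow; the package and the pinned versions below are purely illustrative.

```
$ cat requirements.in        # only your direct dependencies
flask

$ pip-compile requirements.in
$ cat requirements.txt       # fully pinned, with the "why" annotated
click==8.0.3            # via flask
flask==2.0.2            # via -r requirements.in
itsdangerous==2.0.1     # via flask
jinja2==3.0.3           # via flask
markupsafe==2.0.1       # via jinja2
werkzeug==2.0.2         # via flask
```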

41:15 The answers to all these questions are really so much simpler than they used to be five or six years ago. I've been using requires.io for probably almost a decade, and Dependabot is much better, largely because it defaults to just sending you those individual upgrades, and you can really tune how much stuff it's going to try to do at once. So, yeah, mostly the answer on this is: make something that builds your Docker container, or whatever your fully realized application artifact is, run it all the time, and have part of that process be freezing your dependencies. Pin everything; you need to make sure that everything is pinned so that you get very reliable, repeatable builds. But having done that, you can really just bask in the cornucopia of tools that we have available to do this now that make it all pretty easy.

42:01 Yeah, it's definitely getting easier and easier. Let's talk about performance a little bit. Actually, maybe I'll show this just really quickly: I'll throw out there that at snyk.io, they've got some stuff where you can put in a package or something like that. I don't remember exactly where you go, but you can basically put in a Python package and it will give you a security report for it. I don't know how accurate that turns out to be for everything, but it's something. But let's talk about performance. And one thing I'd like to touch on is actually...

42:30 Can we just say one more thing about security before we move on? Because I had mentioned encryption and not trusting your local network at the beginning, and there's one other thing I wanted to mention, speaking of tools that are much better than they used to be. One popular idiom is to have a production mode that has all your encryption on and a development mode where you just turn it all off for convenience. And I'm a big fan of setting up, like, an entry in your hosts file for each developer and having some infrastructure for provisioning certificates for individual developer machines, so that encryption is just always on. Nobody ever makes a connection without TLS on it. Even your local API stubs still have some kind of TLS on them, because it's actually not all that hard, particularly with Let's Encrypt. You get a couple of DNS plugins and you can easily vend those certificates to your dev team. It takes a couple of days of work at most. And having done that, not only are you more secure, because you just don't even have that switch anymore to accidentally be sending everything in plain text, but also you spot a surprising number of configuration issues, and you get to see how your for-real certificates work while you're doing development. And that can really help a lot of developers understand what's going on with the somewhat tricky world of HTTPS.
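
As one concrete flavor of that, here is a minimal sketch of a dev server that only ever speaks TLS, assuming an ASGI app served by Uvicorn; the hostname and cert paths are hypothetical, standing in for the per-developer hosts-file entry and provisioned certificates Glyph describes.

```python
import uvicorn

async def app(scope, receive, send):
    # Trivial ASGI app; the point is the TLS config below, not the app.
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello over TLS\n"})

if __name__ == "__main__":
    # No "plain HTTP in dev" switch to forget: even local development
    # serves HTTPS with a real certificate for a hosts-file hostname.
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8443,
        ssl_certfile="certs/alice.dev.example.com.pem",     # hypothetical path
        ssl_keyfile="certs/alice.dev.example.com-key.pem",  # hypothetical path
    )
```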

43:47 It's just running a little bit closer to production.

43:49 Yeah.

43:49 All right, let's talk about performance. And I'll put this up on the screen because I think it's a fun thing. Have you all seen Locust.io?

43:56 Yeah, we use it regularly for load testing.

43:58 Could you maybe tell people about it real quick? I've used it once or twice and it's fantastic.

44:02 Yeah, definitely. So it's a tool that allows you to essentially write a script that will emulate requests to your server, and then you can give it a variety of, wow, mom brain with no sleep makes me forget words, but basically just different parameters that you can specify. So it makes it really easy to say, here is the approximate randomized behavior for a single user, and then scale it up to hundreds of thousands of users and see how your server handles it.

44:30 Right. You give it a Python script, which is really interesting, right? You create a class and you say, here's a typical user of this type, and it does different things. Like, it'll call the index page, it will call the about page, it'll go do a search. And you can say things like, there's going to be a certain amount of delay between pages. And then you can, as you said, scale that up and say, I want 700 regular users and 50 admin users, and let them go crawl around on the site and do what they do, right? Yeah, super cool. And the real time dashboard is neat. So if you want to know about your performance, you could use this. But how do you all think about performance for the apps you're running or delivering? What's fast enough? What's just wasting your time chasing the last millisecond of response time?
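
For reference, a minimal locustfile along the lines Michael describes; the paths, task weights, and host are made-up examples.

```python
from locust import HttpUser, task, between

class RegularUser(HttpUser):
    wait_time = between(1, 5)  # seconds of "think time" between actions

    @task(3)
    def index(self):
        self.client.get("/")

    @task(1)
    def about(self):
        self.client.get("/about")

    @task(2)
    def search(self):
        self.client.get("/search", params={"q": "python"})

# Run with something like:
#   locust -f locustfile.py --host https://staging.example.com
# then dial the user count up in the real-time web dashboard.
```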

45:18 So I'm going to make the bold statement that for the vast majority of developers, Python is fast enough.

45:25 Oh, yeah, absolutely.

45:26 It is fast enough to saturate a database. And once your database is saturated, you have different problems.

45:32 There's definitely things for which it is a problem. Like, you cannot saturate an LDAP server, for example. I know that because I tried, and that's why it's one of our Go services.

45:42 It would be nice if Python could be faster, and I'm very excited about Guido's performance task force at Microsoft. It's great. I feel like the indifference that the no-GIL movement has been shown kind of shows that nobody really thinks that intensively about it anymore. Like, for most people, it's fine. It's not good, it's not great, it's fine. Instagram is printing money with it.

46:08 They absolutely are. Yeah. You can go and put your site into PageSpeed Insights and see what you get. And I think once you get down to several milliseconds of response time per page, there's not a whole lot you can do to make it faster. There's not a lot of benefit to doing that work, right?

46:29 It's mostly tuning the database with the right indexes and caching. Those are the two things, and everything else is like the last 10% or something.

46:36 Yeah, absolutely. So here's what I wanted to definitely emphasize and hear your thoughts on. I feel like there are so many times I go to some page on some site, and it doesn't necessarily have to be Python, it's just some database-backed page, and you go there and it's sitting on what seems like one of their primary pages, and it's like 5 seconds until it responds, or several seconds even.

46:57 What is this thing doing? And I know they don't have indexes in their database. They can't.

47:04 So I just want to put out a plea: please use some database profiling features, please look at your queries, and please put in an index. It's so incredibly easy. Any of you all want to rant on that with me?
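
For what it's worth, a minimal sketch of how cheap that plea is to act on in, say, Django; the model and fields are hypothetical.

```python
from django.db import models

class Order(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    status = models.CharField(max_length=20, db_index=True)  # single-column index
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        # Composite index matching a hot query such as
        # Order.objects.filter(user=u).order_by("-created_at")
        indexes = [models.Index(fields=["user", "created_at"])]

# Then apply it:
#   python manage.py makemigrations && python manage.py migrate
```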

47:17 I need to bring up one tool before we talk about this, and it's called pgMustard.

47:22 Okay.

47:23 I've never been able to fully understand EXPLAIN statements. Like, I learned it many times and then I forgot it again. And pgMustard is amazing. You just take an EXPLAIN, you copy-paste it into a web app, and it tells you exactly what's going wrong.

47:38 Oh, nice.

47:38 And I've shaved, like, I had one query, which was very big, from a financial system, and I think 66% of the query runtime I was able to shave off, just because there was an index but it was set up wrong. It was amazing.

47:52 Yeah. So here's a really beautiful visual thing, and it's also a bit like a profiler. So it says on this part of the query statement, you spent 0.4% of the time, and on this part you spent 58% of the time; right here's an index scan.

48:05 That's the best part. It doesn't just tell you what's wrong, but "do this, make this index", things like that. Okay.
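
For reference, a minimal sketch of getting a plan you could paste into a tool like pgMustard, assuming psycopg2 and a hypothetical orders table.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

with conn, conn.cursor() as cur:
    # EXPLAIN ANALYZE actually executes the query, so point it at a dev
    # copy or a replica rather than a busy production primary.
    cur.execute(
        "EXPLAIN (ANALYZE, BUFFERS) "
        "SELECT * FROM orders WHERE user_id = %s ORDER BY created_at DESC",
        (42,),
    )
    for (line,) in cur.fetchall():
        print(line)  # paste this plan into pgMustard or read it yourself
```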

48:13 I've never seen this before. This is fantastic. Emily, thoughts on indexes? Join my plea.

48:19 Yeah. So I think that piece of it is definitely very important, like making sure that you're doing those sorts of basic things to get you to that 80% to 90% of your performance potential. But I would also argue that these days, in a vast majority of applications, you're going to have a front end decoupled from your back end. And I would argue that making sure that you're implementing best practices for user experience on your front end is going to give you so much more payoff than trying to optimize that API call by a few milliseconds.

48:51 So I had pulled up PageSpeed Insights before, right? It's at pagespeed.web.dev, from Google, I believe. And what's really interesting about that is, if you go and put your site into there, it doesn't always feel fantastic to do it, by the way. So if you go and you put your site in here, you at some point might get a good number. I'm getting 100 out of 100 on the Talk Python Training site right now, but that's because I spent three days addressing every little thing. Like, this image is not sized right, this JavaScript is not bundled with that JavaScript, this element is being resized by the browser, you should make it the same size by default, and just all of these little things. And it wasn't even about the server response time, which was always pretty low. It's about all the other stuff. It's, how does it feel to the user to get to the page, rather than what is the server's HTML response time? That's what you're talking about, right? That kind of stuff.

49:50 Yes, definitely.

49:52 Awesome. All right.

49:53 So I have so many feelings about this. This is probably the number two topic for me, behind packaging.

49:59 Okay.

49:59 Those of you not listening on the live stream, if this next part sounds choppy, it's because I talked for an hour and a half and Michael had to edit it down.

50:08 We just cut them off. We just had to cut him off when he lost his voice.

50:11 But seriously, the thing that I think is most important is, quite often, the parts of your application that end up being the performance issues come once you're past this step. Now, you all have been mostly talking about websites, except for Hynek mentioning LDAP, which is an interesting one.

50:32 I have also scaled LDAP. So the two major applications that I've dealt with performance issues on are an internal calendar service that I maintained at a large company, which one you can figure out by reading my Wikipedia page, and Pilot.com's internal bookkeeping automation. So, number one, the calendar service was an API with no front end. The front end was maintained by the client teams, which were not even doing web stuff. And the way that we had to performance test that involved standing up our own custom load generation tool and running it as kind of a qualification process for our deployments. And the reason that I bring up that one is it's interesting because we had to figure out what our actual interesting load patterns were. We couldn't use any of these standard HTTP load generation things, because we needed very specific data. And that often ends up being the problem that you're facing. On that service in particular, we had performance problems that arose because we added too many indexes. So we were having problems on the write side of the equation. Every time you add an index, you're optimizing reads at the expense of writes. And usually it's like a lot of read performance for a little write performance, but eventually it does add up.

51:50 It does. Yeah.

51:51 Kivo, out in the audience, says, yeah, you've got to be careful adding too many indexes as well. There are two parts to that, right, Glyph? One part is, when you write something, the indexes have to be computed for that thing that's going in. And the other is, more indexes mean more stuff in memory. And so another really important aspect is, does the totality of your indexes reside in memory, or does it have to get paged out, right? And so you would hit both of those problems by having too many indexes.

52:19 Right.

52:20 So you need to be ready to measure things, because, that old chestnut about premature optimization, you don't really know what the hot spots are until you run them. But there are two things about that. One, the tools for doing this are not great.

52:36 The one that I really wish were good, and just isn't, is Speed Center, which Twisted and PyPy used to monitor their performance over time, like what the performance of each revision of the code is. And there's nothing like GitHub Actions or Travis CI, there's no sort of leader in that field, that will just tell you, hey, your performance regressed by 10% on this commit. And that is the tool which I desperately want to exist. And so most of the things that I do are trying to approximate that. Part of that is making sure you have metrics in production that are telling you, so that you notice when things are slow; you don't want to be having users telling you. Even if you're getting bad performance metrics out of your load testing tool and that's a surprise to you, that means you're probably not instrumenting enough in prod to know, oh, users are seeing some slowness here. Because you're also going to get things where your database is doing great, everything seems like it's super fast, but your queries are actually really slow, they're just all running in memory. And then you hit the cliff where suddenly you're hitting the disk, and now everything is much, much slower, and none of your code changed, and your data only grew by like 10%. And being able to spot stuff like that means you have to be looking at perf in prod. You can't do synthetic tests for everything. And particularly if you have a large site with a lot of users, it's very easy to miss if your 95th percentile is falling off a cliff, right? You have to be looking at your quartiles and all of these different things, not just average performance. And the second part of that is the custom data generation for your synthetic tests. So, for example, the calendar service had those issues that I just mentioned, and Pilot's service had this issue where most of the performance stuff was not the database. It was talking to APIs to pull in financial transactions and analyze them. And it was those APIs being slow, us being silly and not talking to those APIs in parallel, data volumes just being huge, like thousands and thousands and thousands of transactions in a single call. And you have to know that that's going to happen, and you have to be able to, on demand, add new types of data to your test suite or your performance test suite. And that performance tool where you write Python code looks like a great way to do that. I actually had never used that one.
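
As a tiny illustration of the percentile point, with fabricated request durations: the mean can look alarming or fine while hiding what your slowest users actually experience.

```python
import statistics

# Fabricated request durations in milliseconds, with one slow outlier.
durations_ms = [12, 15, 14, 13, 16, 12, 11, 14, 13, 950]

cuts = statistics.quantiles(durations_ms, n=100)  # 99 percentile cut points

print("mean:", statistics.mean(durations_ms))  # 107.0 -- skewed by the outlier
print("p50: ", cuts[49])                       # 13.5  -- the typical request is fine
print("p99: ", cuts[98])                       # ~847  -- the tail users are suffering
```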

54:54 This thing is glorious.

54:55 Yes. But was it Locust?

54:57 Locust.io? Yes. And it has the dynamic sort of graphs and dashboards that show you as it ramps up and as you change it. I think that might be the right thing, because you basically structure with Python how it hammers on the server, which is pretty neat. Yes.

55:10 And the one thing I'd say about that is, you will need to write custom stuff. Don't just assume you can add a couple of indexes; you should just add a couple of indexes at first if performance is not your primary concern. But having done that, you have to know you're going to need to think about perf and, like, write code to monitor it.

55:29 Absolutely. All right, I think we're just about out of time, although we have barely scratched the surface of all the stuff we could talk about. Let's close out with this question; it sounds like maybe you've thought about this. Do any of you have CI performance checks or failures or anything like that? We have "run our tests, the build doesn't pass if the tests don't pass." But do you have something like that for performance?

55:54 That's the thing I want to exist, and I've never managed it. I've done things that approximate it within tests.

56:00 But never gotten it really working. Nothing like that either, Emily?

56:04 No. I mean, I think the closest approximation that we get is we focus a lot on Cypress tests for the front end, so actually going through and working with the application. And I think the closest thing that we could get at this point is just setting our max timeout in our HTTP service, bumping that down and saying, if anything's taking over 10 seconds to respond when we run the tests, then it should fail. But no, we don't do any regular performance testing.

56:32 Yeah, we don't either. But we all kind of agree that one would be nice to have, I think. There's a lot to set up, though, right? You've got to have enough data in the database for it to be meaningful, and that's tricky to do in CI. It would be cool, though.

56:45 I think Glyph makes a really good point, though. The load testing and the manual testing of performance is great, especially when it's a prerequisite to launch. But there's no way that you're going to be able to replicate everything in production, and the best thing that you can do is monitor prod as closely as you can.

57:00 Yeah, absolutely. Some of that real time monitoring. Fantastic. All right, well, thank you all for being here. This has been super interesting to chat about. Before you get out of here, I'll cut it down to just one of the final two questions so that we don't take too long. If you're going to write some Python code, what editor do you use? Let's go clockwise. Hynek, how about you? What are you writing code with these days?

57:20 Well, I have a long history of editors. I've used almost all of them at some point. I usually stopped using one because I got crippled, like Emacs pinky. Nowadays, I usually use either Vim in a console or VS Code.

57:33 Right on. Glyph?

57:34 There's an implicit thing in this question where it sounds like you're recommending the thing that you use. I want to be clear that I'm not doing that.

57:42 Don't do as I do.

57:43 Yeah, I use Emacs, but with about ten megabytes of custom Elisp, which I'm never going to share with anyone.

57:51 Fantastic.

57:51 You shouldn't use it. Just use VS Code.

57:53 Awesome.

57:53 Emily? VS Code all day.

57:55 Okay.

57:56 Big thanks to Brett Cannon, who sat me down at a PyCon and forced me to use it because the first time I used it, I hated it and I went back to Sublime. But yeah, I don't think there's anything that competes these days.

58:06 Yeah. He's such a good ambassador for that, so it's good to have him working on it. All right, thank you all for being on the podcast. It's been great to chat about this stuff and really insightful to get your experience.

58:16 Thank you so much for having us.

58:17 Awesome. Thanks for having us.

58:18 Yeah. You bet. Bye.

58:21 This has been another episode of Talk Python to Me. Thank you to our sponsors; be sure to check out what they're offering, it really helps support the show. Add high performance, multiparty video calls to any app or website with SignalWire. Visit talkpython.fm/signalwire and mention that you came from Talk Python to Me to get started and grab those free credits. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python; we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

59:14 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.
