#421: Python at Netflix Transcript

Recorded on Thursday, Jun 8, 2023.

00:00 When you think of Netflix as a technology company, you probably imagine them as cloud innovators.

00:06 They were one of the first companies to go all in on a massive scale for cloud computing, as well as throwing that pesky chaos monkey into those servers.

00:14 But they have become a hive of amazing Python activity from their CDN, their demand predictions and failover, security, machine learning, executable notebooks, and lots more.

00:25 The Python at play is super interesting.

00:28 And on this episode, we have Zoran Simic and Amjith Ramanujam on the show to give us this rare look inside.

00:35 This is "Talk Python to Me," episode 421, recorded June 8th, 2023.

00:41 (upbeat music)

00:44 Welcome to "Talk Python to Me," a weekly podcast on Python.

00:57 This is your host, Michael Kennedy.

00:58 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org.

01:06 Be careful with impersonating accounts on other instances, there are many.

01:10 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:15 We've started streaming most of our episodes live on YouTube.

01:19 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:27 This episode is brought to you by JetBrains, who encourage you to get work done with PyCharm.

01:33 Download your free trial of PyCharm Professional at talkpython.fm/done-with-pycharm.

01:40 And it's brought to you by InfluxDB.

01:42 InfluxDB is the database purpose-built for handling time series data at a massive scale for real-time analytics.

01:49 Try them for free at talkpython.fm/influxDB.

01:52 Hey, Zoran.

01:55 Hey, Amjith.

01:56 - Hello, Michael.

01:56 Welcome to Talk Python to Me, you guys.

01:58 It's excellent to have you here.

02:00 - Thank you very much.

02:01 I'm a big fan, so it's very nice to be on the show, actually.

02:04 - Awesome, yeah.

02:05 We've got to meet a couple times at PyCon, which is honestly one of my favorite purposes of PyCon is to meet people and just hang out and have those experiences, you know?

02:14 - Yeah, absolutely.

02:15 - Yeah, and nice to have you on the show, Zoran.

02:18 - Yeah, I'm a big fan as well.

02:20 - Thank you very much.

02:20 That's very kind of both of you.

02:22 So we're gonna talk about a pretty awesome tech company, I think Netflix, you both work at Netflix, and people who are watching the video, you're coming to us from the Netflix headquarters, which I've got the chance to be there for like some Python stuff going on there before as well.

02:37 Got cool posters and like sort of movie studio feel.

02:40 So that's the backdrop you both have going on, which is excellent.

02:44 - Yeah, yeah.

02:45 It's pretty nice to work at Netflix.

02:49 It's a very good company.

02:50 I'm very happy.

02:51 - A lot of Python we're gonna learn.

02:52 - Yes, yeah.

02:53 We do use a lot of Python, yeah.

02:56 - Excellent, so we're gonna talk about Python and Netflix, a wide ranging sort of survey of a lot of projects you all have created, how you're using it, some other ones that both of you personally created, either tied to or not tied to Netflix, but I think people are gonna really enjoy this look inside what you all got going on.

03:13 Before we get to that though, let's start with your stories.

03:16 Quick introduction, how'd you get here working on Python?

03:20 Zoran, you wanna go first?

03:21 - Yeah, so I was hooked on programming ever since I saw my first computer, at 13, in middle school.

03:28 It was an Amstrad CPC.

03:30 Right.

03:30 I was, yeah, that was the thing I wanted to do.

03:33 So, yeah, I started programming as a hobby at first.

03:36 And fun fact, way back then, later on in high school, one of my math teachers told me, Hey, do something real.

03:43 Don't, don't do programming.

03:44 It's like a dead end.

03:45 You know, you won't be able to find a job.

03:49 Did they tell you things like these drag and drop visual tools are going to replace all the programmers and all like the low code of the eighties and nineties, maybe?

04:00 Yeah, back then, I guess it was very, well, didn't seem that obvious.

04:06 Yeah.

04:07 And then, yeah, I decided to go computer science anyway, because that's what I wanted to do.

04:13 And then I spent the vast majority of my career in a language that is not very well known or used, I think: Eiffel.

04:21 So I spent more than a decade doing Eiffel mostly.

04:25 And then I discovered Python once I joined LinkedIn in 2011.

04:29 And that's when I kind of, well, got hooked and decided to do more and more things Python.

04:36 And now at Netflix, even more so, trying to support Python across the board.

04:41 Yeah.

04:41 You were kind of doing meta Python in the sense that your team does a lot of stuff to facilitate other people doing Python too, right?

04:49 Exactly.

04:50 Yes.

04:50 Yeah.

04:51 That's our, that's our team at Netflix.

04:53 Like we enable other Python developers to be more productive by building tools or building the infrastructure necessary to ship their code faster or build their products sooner, things like that.

05:04 Yeah.

05:04 Cool.

05:05 How about you Amjith?

05:06 Oh, I got introduced to programming in high school.

05:10 We had like one hour of a computer lab every week.

05:13 I got to learn GW-BASIC, that was my first language.

05:17 It was fantastic.

05:18 I still have fond memories of like trying to draw circles on the screen.

05:22 And then I went to college, I learned C and C++.

05:25 I liked those, but then after I got a job, I wanted to learn, you know, how to be a better programmer and somebody mentioned, you know, oh, functional programming is the bee's knees, you should actually, you know, if you learn how to do functional programming, your general programming will get better and the best language to learn functional programming is Haskell.

05:41 And so I took a book called Learn You a Haskell.

05:44 And then I went through like the first few chapters and, and it was mind blowing.

05:47 It was like a really fantastic language.

05:49 And, and I got first introduced to a concept of REPL and like trying out like little snippets in the interpreter and getting answers and it was fantastic.

05:57 And I got introduced to list comprehensions in Haskell and it was just mind blowing.

06:02 It's like, you know, without having to write a, write like five lines of for loop, you could just, it's a single line thing.

06:08 And I quickly realized that, you know, you can't find actual jobs writing Haskell or at least, you know, not, not, not in a good way.

06:17 So, so I figured out like, what's a language that has list comprehension that is actually employable, you know, that, that I could find jobs in.

06:24 That's how I found Python because I came to Python because of list comprehension.

06:28 Oh, awesome.

06:29 Yeah.

06:29 Okay.

06:30 Learn you a Haskell for great good, a beginner's guide.

06:33 Is that the book?

06:33 That is the book.

06:34 Yeah.

06:35 And it's actually still available online for free that anybody could read, I'm fairly certain.

06:40 And I actually bought like a paper copy of the book.

06:42 It's a good book.

06:44 It's a fun one to go through.

06:45 - Yeah, it looks like it's really got a playful nature to it.

06:48 - Yeah, exactly.

06:49 - Yeah.

06:50 You know, your thoughts about list comprehensions really connect with me as well.

06:55 I guess my first exposure to something like that was LINQ, L-I-N-Q, in C#, which is, honestly, I think it's better than Python list comprehensions.

07:04 I wish Python had just a little bit more.

07:06 - Nice. - A little bit.

07:07 Just one or two things more.

07:09 For example, wouldn't it be nice in a list comprehension if you could specify a sort?

07:14 'Cause I find myself often doing a list comprehension and then sorting the thing in the end afterwards.

07:19 But if you could just say order by and give it an expression like you would to pass a lambda over to a, you know.

07:26 So there's room for more.

07:27 What PEP do I need to write to get sort in a list comprehension?

07:30 I don't know, but I want it anyway.
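As a minimal sketch of what is possible today without an "order by" clause: the filtering can live in a generator expression that is handed straight to the built-in sorted(), with the ordering given as a key function. The shows list and its field names are made up for illustration.

```python
# Hypothetical sample data, invented for this sketch.
shows = [
    {"title": "Show C", "year": 2019, "rating": 7.1},
    {"title": "Show A", "year": 2022, "rating": 8.4},
    {"title": "Show B", "year": 2021, "rating": 6.2},
]

# Filter with a generator expression, then order with sorted() and a key lambda.
recent = sorted(
    (s for s in shows if s["year"] >= 2020),
    key=lambda s: s["rating"],
    reverse=True,  # highest rating first
)

# Project to titles in a second, ordinary list comprehension.
titles = [s["title"] for s in recent]
print(titles)
```

This is still two steps rather than one expression, which is the gap being described: the comprehension itself has no place to put the ordering.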

07:32 Yeah. So I really think that that's a cool language feature.

07:36 And you know, it's also one of the areas where they're applying some of these speed-ups in the Faster CPython work that's coming. List comprehensions, for isolation purposes, in Python 3 are basically hidden function calls with their own stack frame and variables that you don't actually see, right.

07:53 You don't write it, but that's kind of the execution level.

07:56 And now they're inlining those to make them a little bit faster.

08:00 Yeah, I think the faster Python team is doing like a fantastic job.

08:04 There was a talk that I attended at PyCon, not this year, but the previous year, where they introduced switch case, how they were doing the case statements. It's not exactly switch case, but, you know, coming from C and C++, I knew what switch cases are.

08:16 And when I saw what, what is possible with the pattern matching, like structural pattern matching in Python, it's like take switch case and then like turn it up to 11 and that's what this is.

08:25 And you're right.

08:26 I mean, there is always more that can be done, but I think it's going in a great direction, I think it's fantastic.

08:31 - Yeah, let's talk about that.

08:33 I mean, we're going to dive into the details of Netflix and stuff, but just, you know, this whole Python 3.11, 3.12, these are really big performance improvements coming along.

08:45 - Yeah.

08:46 - Are you able yet to take advantage of those at Netflix?

08:48 And is that making a big difference?

08:49 You know, like, are you guys still running, you know, 3.8 or are you more closer to the cutting edge in terms of releases?

08:56 - So I think one of the advantages here at Netflix is that every team has the freedom to choose the tools that they need to use.

09:03 And it's great and also kind of painful for central teams because now, you know, there is like a bifurcation of all kinds of different versions out there.

09:10 But where I'm going with this is that every team is allowed to choose what is what they need to use in order to get their job done.

09:17 And so my previous team, we were at the cutting edge, like we used 3.11, or we still use 3.11 in the projects that we built, and the services that we use.

09:25 And it is a nice boost, like we could certainly see.

09:28 So for instance, there is like a periodic job that runs, and it's like a sort of a cron job that runs every five minutes or so.

09:35 And we had put in like so much optimization so that it will actually finish within the five minutes because we were doing a lot of data crunching and so forth.

09:42 And just so we don't like stack up the cron tasks.

09:45 But when we switched from, I think from, like, we did jump from 3.9 to 3.11 directly.

09:50 We did not like go to 3.10.

09:52 But then when we jumped, it felt like, you know, things that were taking like four minutes, we're now finishing in like two minutes.

09:58 And it was like a huge improvement that you could see.

10:02 And like, it felt very rewarding to see that.

10:04 So yeah, absolutely.

10:05 So every team gets to choose what they want to use.

10:08 And our job as a central Python team that Zoran and I are currently part of is to try and enable people to use that, use whatever is the latest that is available.

10:17 So, you know, whatever internal tools that we have, we have to make sure that it actually gets exercised in the latest Python version that got released and make sure that everything is building and deploying as they are supposed to do and so on.

10:29 - Okay, excellent.

10:30 That's pretty cool, that story of speeding up your Cron jobs.

10:33 That's non-trivial, and it probably wasn't a lot of work to move from 3.9 to 3.11.

10:39 I know my upgrade path was rebuild some virtual environments on the server, and now we're good to go.

10:45 - Exactly, yeah.

10:46 - So, Zoran, anything you want to add about that?

10:49 3.11, faster CPython side?

10:51 - Oh yeah, absolutely, it's so faster.

10:55 So much faster.

10:56 Yeah.

10:56 The main issue on when upgrading is the lack of wheels, if you're like stuck on older libraries, but we do have a, like a few numbers, like the most used right now is Python 3.10 across the board, right?

11:08 It will depend on the team right now.

11:10 Everybody is upgrading at their own pace and 3.11 is starting to grow a bit.

11:15 But yeah, most used right now is 3.10 statically.

11:19 You should look at it.

11:20 - Honestly, that sounds really quite good for a company the size of Netflix and how much Python you're doing.

11:27 That's pretty close to pushing the envelope.

11:29 - Yeah, there are still some teams that are sort of stuck on 3.8 or 3.7, I wanna say, simply because they provide a platform that allows data scientists to write their code and they have this pre-built image with all of the necessary libraries pre-installed in there.

11:47 And so they have like a pretty tight control over which libraries will get upgraded on what cadence and so on.

11:53 And so for them, I think they have, they're still, you know, running on 3.7.

11:57 And I'm sure when they switch to 3.10 or 3.11, it's going to be like a screaming fast improvement.

12:03 So looking forward to that migration to happen.

12:06 Yeah, excellent.

12:07 This number is very static, right?

12:09 It's a number of like short pythons across repos.

12:13 But yeah, dynamically, right?

12:15 Like you may have lots of instances who still run on 3.7, and they will massively move to a, so that team is moving from 3.7 to 3.10, for example.

12:24 - Right, yeah.

12:25 - Yeah, so upgrade paths.

12:26 - This portion of Talk Python to Me is brought to you by JetBrains and PyCharm.

12:33 Are you a data scientist or a web developer looking to take your projects to the next level?

12:37 Well, I have the perfect tool for you, PyCharm.

12:40 PyCharm is a powerful integrated development environment that empowers developers and data scientists like us to write clean and efficient code with ease.

12:50 Whether you're analyzing complex data sets or building dynamic web applications, PyCharm has got you covered.

12:56 With its intuitive interface and robust features, you can boost your productivity and bring your ideas to life faster than ever before.

13:03 For data scientists, PyCharm offers seamless integration with popular libraries like NumPy, Pandas, and Matplotlib.

13:09 You can explore, visualize, and manipulate data effortlessly, unlocking valuable insights with just a few lines of code.

13:16 And for us web developers, PyCharm provides a rich set of tools to streamline your workflow.

13:21 From intelligent code completion to advanced debugging capabilities, PyCharm helps you write clean, scalable code that powers stunning web applications.

13:30 Plus, PyCharm's support for popular frameworks like Django, FastAPI, and React makes it a breeze to build and deploy your web projects.

13:38 It's time to say goodbye to tedious configuration and hello to rapid development.

13:43 But wait, there's more.

13:45 With PyCharm, you get even more advanced features like remote development, database integration, and version control, ensuring your projects stay organized and secure.

13:53 So whether you're diving into data science or shaping the future of the web, PyCharm is your go-to tool.

13:59 Join me and try PyCharm today.

14:01 Just visit talkpython.fm/done-with-pycharm, links in your show notes, and experience the power of PyCharm firsthand for three months free.

14:12 PyCharm, it's how I get work done.

14:14 Let's start by talking about kind of the broad story of Python at Netflix.

14:24 Maybe we could start with what you all do day to day in terms of what's your role, 'cause you kind of support other people's Python as I hinted before.

14:33 So maybe we can get a sense of what you all do day to day and then, Amjith, you wrote a nice blog article that's a big, broad, pure survey of how Python's being used in all these different places.

14:44 So maybe start with what you all do day to day on your, on your team, and then we'll go into that.

14:47 Yeah, sure thing.

14:48 I've been with Netflix for about six years now.

14:51 And previously I was on a different team and we were doing failovers, which means, you know, if Netflix ever goes down in one of the AWS regions, we are the team that gets paged in and we go and move all the traffic from that region to the other two regions that we run in.

15:07 So that's what I was doing up until like February of this year.

15:10 And let me just take a step back real quick with you.

15:13 Netflix is kind of all in on AWS, right?

15:16 Like there's been a lot of stories about how you all have set loose the chaos monkey into your data centers and how you worked on failover from AWS regions.

15:26 And so I don't know if you all are the largest users of AWS, but certainly one of the more interesting, complicated deployments out there, right?

15:35 Yeah, so I think we were the earliest adopters of cloud computing when AWS first came out.

15:40 And so AWS has used us as the poster child for, you know, see, big companies can run in the cloud, and you don't have to be on prem. And so we think of them as partners, not so much as, you know, like this client-owner relationship or anything like that. So we consider AWS as our business partners. And yes, we are all in on AWS. And Chaos Monkey, even now, yes, it functions in AWS, like it goes around and, just inside our VPC, it does terminate instances occasionally, or not occasionally, like once every day, one instance every day on every service.

16:15 That is so wild. I mean, obviously, you don't want to set it loose on other people's AWS instances, right? But yeah, that's a really interesting way to force developers and infrastructure folks to think about what happens if your server dies. It may seem like the cloud's fault, right? But it's just like, okay, there's a Linux machine running and that thing died. It could have been running anywhere. It happened to be in AWS. But it forces them to think about outages: it's not an eventuality, this will happen. And so you plan for it.

16:47 Yeah, it's even more than just the idea of like, it will happen, so we plan for it. It's more like, you know, it's a way of building software where you need to build software that's resilient and has enough fallbacks built in.

16:59 So for instance, if you are not able to reach the database, do you have a cache in front that can sort of, you know, keep the thing going for the few network calls that are failing to reach the database?

17:09 Those are like basic common things, paradigms that have become commonplace nowadays in software development where, you know, building fallbacks automatically is like standard practice these days.

17:19 But when Chaos Monkey was created, which was about 10 years ago, these were new concepts that people were not using. And it was assumed that once you have a server and you put your software on the server and you run it, it's basically done. Until you do the next deploy, which takes another month or so to refresh that server, refresh that code. But that all changed once we went to the cloud, where we started doing deployments on a daily basis, or maybe even an hourly basis, and things like that. And so when you are doing that, when you are shutting down one server with the old version and bringing up the new server with a new version, how are you going to make sure that the connections are not going to fall?

17:54 And how are you going to make sure that the network continuity continues and so forth?

17:58 So yeah, Chaos Monkey was just introduced as a way to ensure that people are building software in a way that is resilient.

18:04 And this is just a way to sort of test that on an ongoing basis.
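The cache-in-front-of-the-database fallback mentioned above can be sketched in a few lines. This is an illustrative toy, not Netflix code: CachedReader, DatabaseUnavailable, and flaky_fetch are hypothetical names, and a real service would also need cache expiry and size limits.

```python
class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) database client when a call fails."""


class CachedReader:
    """Read through to the database, falling back to the last known value."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that talks to the database
        self._cache = {}     # last successfully fetched value per key

    def get(self, key):
        try:
            value = self._fetch(key)
        except DatabaseUnavailable:
            if key in self._cache:
                return self._cache[key]  # serve the stale value, keep going
            raise                        # nothing cached: surface the failure
        self._cache[key] = value
        return value


# Simulate a database that answers once and then goes down.
calls = {"count": 0}

def flaky_fetch(key):
    calls["count"] += 1
    if calls["count"] > 1:
        raise DatabaseUnavailable(key)
    return f"profile-for-{key}"

reader = CachedReader(flaky_fetch)
first = reader.get("user-1")   # fetched from the "database"
second = reader.get("user-1")  # database is down, served from the cache
```

If an instance is terminated at random every day, a path like the except branch gets exercised regularly instead of only during a real outage.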

18:09 Yeah, it's quite an operational challenge.

18:11 I mean, I don't recall seeing Netflix saying, our scheduled maintenance is coming up on Sunday, we'll be down for five hours.

18:17 Not acceptable is it?

18:19 It just makes you laugh to even think about it.

18:21 Especially not on a Sunday.

18:23 I've even seen government sites, I can't remember which government it was, saying that the website was closed, like the website had business hours.

18:31 That's a different deal.

18:32 Like, you came at night, like, "Oh, you can't come here right now." It's like, "What? It's the web. I don't understand what's going on." All right. So let's go through this blog post that you wrote here, entitled just Python at Netflix on the Netflix technology blog.

18:44 - Technology blog.

18:45 Yeah, so you wrote this in preparation of PyCon.

18:48 This is PyCon 2023?

18:50 - No, this was 2019 actually.

18:52 So this is old by at least two or three years now.

18:56 - Okay, yeah, you had pointed out before we press record that some of these projects mentioned here that used to be internal things are now also open source.

19:03 So there's a little more access to these than the blog posts might indicate.

19:07 - Yeah, some of the things that are mentioned here, yes, they have been open source since then.

19:11 So specifically the one that I remember right now is Metaflow, which is an infrastructure, it's like a platform orchestration infrastructure framework that is used by our machine learning organization where scientists would try and build their model or they use existing models from like XGBoost or like tons of other Python libraries.

19:33 And their interest and their expertise lies in crafting those models, training those models and building the correct algorithm to do the predictions and so on.

19:44 They are not so interested in making sure that enough compute is available to run these models, or they're not interested in making sure that the plumbing works, or this model's data is now going to the next step of this algorithm, or even getting it deployed and making it available in the production environment.

20:00 So all of that abstraction is taken care of by Metaflow.

20:04 So Metaflow is a project that was mentioned here, and that allows you to make it easy for machine learning folks to get their system running and as well as deploying it out to production.

20:15 And that is now open sourced and available for folks to use.

20:19 And I think some other companies have actually adopted it as well.

20:22 So, yeah.

20:23 - It kind of operates like DevOps automation for machine learning.

20:29 So the people creating the models, the data scientists, don't have to also be DevOps people.

20:34 - Right, it's slightly more than DevOps as well because it also does the pipelining work to make it possible for someone to, you know, bring the data from this database and load it in, all of that work is already taken care of, or at least there are libraries that are built into Metaflow that makes it possible to bring those in.

20:50 And then it allows you to also do orchestration.

20:53 So for instance, machine learning models typically happen in multi-steps and multi-stages.

20:58 And so the data gets processed by this function, and then it gets moved on to this other function, and then it gets moved on to this other thing and so forth.

21:04 And so it does the plumbing to make sure that the data can flow through this topology and actually produce results and so on.
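The multi-step idea described above, where each function's output becomes the next function's input, can be sketched in plain Python. This is deliberately not Metaflow's API; the step names and the toy "model" (a mean of squares) are assumptions made up for illustration, and a real orchestrator adds scheduling, compute, and data storage between the steps.

```python
def load(rows):
    """Step 1: drop missing values from the raw input."""
    return [r for r in rows if r is not None]


def featurize(rows):
    """Step 2: turn each value into a (value, squared value) pair."""
    return [(x, x * x) for x in rows]


def train(features):
    """Step 3: stand-in for training; returns the mean of the squared values."""
    return sum(sq for _, sq in features) / len(features)


def run_pipeline(data, steps):
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


result = run_pipeline([1, None, 2, 3], [load, featurize, train])
```

The person writing load, featurize, and train only thinks about the data; the plumbing in run_pipeline is the part a framework takes over.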

21:10 - Yeah, you probably have enough data that that's a lot of data to move, so.

21:14 (laughing)

21:16 All right, a quick question from the audience before we dive into the topics here.

21:19 Diego asks, "On such a big platform with so many software engineers with different coding practices, do you all get together and follow some set norms by Netflix, or is it more team by team basis?" - It is very much a team by team basis.

21:32 So each team has their style and the areas that they focus on.

21:35 So for instance, like machine learning engineers are not going to care too much about how do I make this production grade super heavily fortified or whatever?

21:45 And security engineers might be focusing on completely different things.

21:48 So it is different.

21:49 But at the same time, I do want to mention that there are certain norms that are common across the entire company where, you know, so for instance, Chaos Monkey is one of those things where since Netflix operates in a way where, you know, every team is given the freedom to choose and operate the way they see fit, there is no edict that can come from a VP or a president that says, like you must write code in this way, like that doesn't happen.

22:13 And so what that means is, how are you going to enforce, like, you know, you have to write resilient software, or how are you going to make sure that your software will continue to run if one of the servers out of the hundred servers has gone down?

22:24 And so there is not a good way to enforce that.

22:26 And Chaos Monkey was created as a way to enforce that, which is, yes, we're not going to be able to tell you how to write software, but this particular service that exists, it's going to go around killing servers.

22:36 And so you better make sure that your software is actually resilient to servers going down.

22:41 So that's a way in which we influence people to write the--to produce the right outcome without telling them how to do it.

22:48 - I see. So sort of, you agree on a common principle of design for failure and design for resiliency, and then it's up to people how to make that happen.

22:58 - Yes, and also, we have the concept of paved paths, or paved road, which is we have certain libraries that are made to operate within our infrastructure.

23:08 So there is an internal discovery tool, and there is an internal metrics collection tool, and there is an internal, you know, like a failure recovery tool and so forth.

23:15 And these libraries that are provided in these languages, they make it really that simple to just integrate with these services.

23:24 And so it makes it the obvious choice for people to start using those libraries rather than, you know, paving their own path, for instance.

23:30 So we try and make it as easy as possible to do the right thing.

23:34 And so people generally fall into that paved road solutions that we have.

23:38 Excellent.

23:39 And we try to make it also now, especially as a central Python team, to promote good practices, right?

23:46 Like, you should have a pipeline, you should choose a release strategy, you should have tests, and we help.

23:53 If you don't, we can help you set that up and choose a good relevant release strategy for you.

23:59 Excellent. Yeah, that's really good.

24:01 So let's dive into this blog post.

24:03 Now it was written by Amjith, but Zoran, please jump in as well as we talk about it.

24:08 So the first one is related to bandwidth.

24:12 To somewhat like delivering the content.

24:15 And there's some interesting articles and stuff that says how much of the internet's bandwidth does Netflix use?

24:21 And I don't know how accurate this is, but maybe give us a sense of like, you got to have a lot of traffic, right?

24:26 Yes.

24:27 So I think when I first joined Netflix, I was told that we use about one third of all of internet's bandwidth, but that was back in 2017.

24:35 So things have changed quite a bit since then.

24:38 Our use of bandwidth is slightly interesting in the sense, the actual, when somebody goes to their website and they're browsing around, all of that data is served directly from AWS servers.

24:50 And so we have servers running in AWS that does the search functionality, the thumbs up, the thumbs down, you're selecting something and reading the review or looking at related things and whatnot.

25:01 But as soon as they click on the play button on a particular video, the actual video file itself is not streaming from AWS, but instead it's coming from a CDN called Open Connect.

25:12 And this is a proprietary thing that we built where we ship these CDNs to various internet exchanges that are already filled with the right videos and they get populated with the correct videos that are getting released overnight or on a regular basis.

25:30 The reason we do that is because we want the videos to stream from the closest possible place for the end user.

25:36 And so when a end user in Florida clicks on it, it's coming from an internet exchange that is located in Florida.

25:42 And that's why you don't see a lot of buffering when videos are playing from Netflix is because there's, you know, it's inside their, their network to a large extent, that's our open connect team.

25:51 And that's, that's what they do.

25:53 And yeah.

25:53 Yeah.

25:54 That's, CDNs are awesome.

25:56 And they really are just, they're kind of a bit of magic performance dust you can sprinkle on sites.

26:03 That works for CSS and JavaScript and stuff, but when it comes to large content, then it makes all the difference.

26:12 So in the blog post you write, let's see, yeah, it says, "Various software systems are needed to design, build, and operate the CDN infrastructure, and a big part of them are written in Python.

26:23 The network devices that underlie a large portion of it are mostly managed by Python," and so on.

26:28 Give us a sense of where Python fits in this Open Connect CDN that you all run.

26:33 Sure. Yeah. So the CDNs themselves run like high performance code to stream the video.

26:38 Obviously that software is not written in Python.

26:41 But all the software that orchestrates and makes sure that these CDNs are remaining healthy, getting metrics out of them, as well as managing them and forecasting what sort of videos are going to be going into these CDNs and so forth,

26:54 those are all orchestrated using Python applications.

26:57 So these are all internal tools.

26:59 There's like an OC tools team.

27:00 OC stands for the Open Connect, which is the name of the CDN.

27:03 And OC tools team is the one that builds that.

27:05 And they use quite a lot of Python for not just tracking our CDNs, but also for projecting, you know, which videos and what shapes they should be going into.

27:14 So for instance, like to give you a quick example, like if we are launching, let's say like Stranger Things, like the newest season, we know for a fact that these videos are going to be, you know, they're either going to be streamed like 90% of the time from television, like a 4k definition television, or people are going to be watching on their iPhone. So all these videos get encoded in different formats, like for, for different resolutions. And how much do we put into the CDNs and how do we get them prepared?

27:40 Do we need like multiple copies so that multiple streams can be read without having to, to have contention and so on. Things like those kinds of projections, those are all done using Python applications. Yeah.

27:50 You probably can't put every version of every video at every location all the time, right?

27:56 I don't know how much that is, but that's a large amount of video content, large load of files.

28:00 You probably got to predict, right?

28:02 These we can fall back to, you know, letting them stream from some higher upstream thing, but then it'll get cached after it gets viewed a little bit.

28:10 But these we're pre-loading, right?

28:12 Yeah, yeah. Actually, Zoran used to work in the team that did all the encoding in different shapes and sizes.

28:19 and they use quite a bit of Python as well, he'd be able to tell you more about that stuff.

28:23 Yeah, did you just have like a huge office, like a whole building full of GPUs just going the whole time?

28:30 Encoding is a lot of work. Yeah, tell us about this.

28:32 Yeah, encoding is a lot of work.

28:34 That was my original start here and we do a lot of Python as well.

28:38 And yeah, to sum it up, we kind of try and scour, scavenge as many instances as we can put our hands on.

28:45 So if we have any, say, AWS reservations, that it so happens that nobody's using right now, we come and grab them and spawn our workers dynamically on it as much as we can.

28:58 - Interesting, almost like grid computing, like SETI@home.

29:02 - Yeah, exactly.

29:03 - Like SETI@home, yeah.

29:04 - And if we do have something high priority and there aren't enough workers lying around, then we can go and get some on the spot market or grab more reservations if need be. So the encoding is basically this: we take these big master files, the originals, and we encode them for every single variation where it makes sense, like for this TV, for that phone, for an Android phone, an iOS phone.

29:33 What is the product of all the different resolutions and different platforms?

29:36 How many video files do you have to make for how many formats do you have to have for one movie?

29:42 Do you know?

29:43 That changes per need.

29:44 And, you know, we kind of keep fine tuning how we want the smallest files with the best quality.

29:51 Right.

29:51 So that keeps evolving.

29:53 And sometimes we re-encode the full catalog because now we have like a better way of encoding, say, anime things versus, you know, action movies versus like, it gets to us.

30:04 I see.

30:04 You might choose a different encoder for a cartoon like thing versus the planet earth type of.

30:10 Yes.

30:11 Yeah.

30:11 Okay.

30:11 Yeah.

30:12 Yeah.

30:12 And all of this, basically by way of a product of all of this ends up on OpenConnect.

30:17 I mean S3, but also OpenConnect.

30:20 Yep. Excellent.

30:21 One thing in there that is mentioned is from my team, a very interesting project called VMAF.

30:26 So that is written in Python, it's machine learning.

30:28 And once you have encoded, right, like let's say you're trying a new way of encoding to make the files even smaller, right?

30:36 You want to know during, while you're researching, right?

30:40 you want to know, did you come up with a very good, better encoder than before?

30:44 So VMAF is like a little bot that will look at encoded new file and give it a human-like score, like what quality would the human assess this to be?

30:56 And it has to be, you know, basically excellent quality, get a high score, I think 90 out of a hundred, roughly, to pass.

31:06 And then this is better, right?

31:07 Like we have a smaller file, but the quality is still excellent and perceptibly it's as good as before, but just a slightly smaller.

31:14 Then we could decide and re-encode the full catalog.
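The decision described here, adopting a new encode only when it is smaller and still scores as excellent, can be sketched in a few lines of Python. This is a minimal illustration, not Netflix's pipeline or the VMAF API: the `Encode` class, `should_adopt`, and the sample numbers are hypothetical, and the pass mark of 90 is only the rough figure quoted in the conversation.

```python
from dataclasses import dataclass

# Assumed pass mark: the speaker says "I think 90 out of a hundred, roughly".
QUALITY_THRESHOLD = 90.0


@dataclass
class Encode:
    """Hypothetical record for one encoded file."""
    name: str
    size_bytes: int
    quality_score: float  # human-like quality score on a 0-100 scale


def should_adopt(candidate, baseline, threshold=QUALITY_THRESHOLD):
    """Adopt the candidate only if it passes the quality bar and is smaller."""
    passes_quality = candidate.quality_score >= threshold
    is_smaller = candidate.size_bytes < baseline.size_bytes
    return passes_quality and is_smaller


# Made-up sample data for illustration only.
baseline = Encode("current-encoder", 1_000_000, 94.0)
smaller_and_good = Encode("new-encoder-a", 900_000, 93.5)
smaller_but_poor = Encode("new-encoder-b", 700_000, 82.0)

decision_a = should_adopt(smaller_and_good, baseline)
decision_b = should_adopt(smaller_but_poor, baseline)
```

In this sketch the quality check comes first: a file that is much smaller but falls below the bar is rejected no matter how much space it would save.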

31:17 I see. That's really interesting.

31:20 So what you're telling me is you have an AI that you just make watch Netflix movies all the time.

31:25 All the time.

31:26 All the time.

31:27 And we have other AIs that watch the whole catalog, for example, and find where text appears, say.

31:33 you know, so that when we put subtitles, we can move them up or down, you know, to not put text on text and all kinds of metadata, like, where can we find landscapes? Where does Brad Pitt show up? Things like that. Incredible. I had no idea. People are always full of a lot of surprises.

This portion of Talk Python to Me is brought to you by InfluxData, the makers of InfluxDB. InfluxDB is a database purpose built for handling time series data at a massive scale for real-time analytics. Developers can ingest, store, and analyze all types of time series data, metrics, events, and traces in a single platform. So, dear listener, let me ask you a question.

32:14 How would boundless cardinality and lightning-fast SQL queries impact the way that you develop real-time applications? InfluxDB processes large time series datasets and provides low-latency SQL queries, making it the go-to choice for developers building real-time applications and seeking crucial insights. For developer efficiency, InfluxDB helps you create IoT analytics and cloud applications using timestamped data rapidly and at scale. It's designed to ingest billions of data points in real time with unlimited cardinality. InfluxDB streamlines building once and deploying across various products and environments from the edge on premise and to the cloud. Try it for free at talkpython.fm/influxdb. The link is in your podcast player show notes. Thanks to Influx Data for supporting the show.

33:07 And I think the VMAF software that's written in Python, I believe that is open source, right Zoran?

33:12 It is. It is open source. Yes.

33:13 And I think it's Emmy award-winning software. I did not know that software could win Emmy awards before this one. It apparently won an Emmy award for something in videography or something. Probably. Yeah. Wow. Yeah. That's awesome. All right. The next major section is demand engineering. Yeah. This is kind of like DevOps type stuff, right? Keeping things running, capacity planning. Yes, that is exactly right. Yeah. That was the team that I was in previously.

33:43 And the regional failovers are what I mentioned, where you could move traffic out of one of the AWS regions into the other two regions.

33:49 So we run in three separate AWS regions, and any time any of those regions is having a difficulty, we can easily move the traffic to the other two regions without users even noticing that there was a glitch or any kind of issue there.
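As a rough illustration of the arithmetic behind moving traffic out of one region into the other two, here is a minimal Python sketch. It is not how Netflix's failover tooling works: the `evacuate` function, the region names, and the proportional split are all assumptions made for the example, and a real failover also involves scaling up the receiving regions and changing DNS.

```python
def evacuate(traffic, failing_region, fraction=1.0):
    """Move `fraction` of one region's traffic onto the remaining regions.

    The moved traffic is split in proportion to what each healthy region
    already serves. That split is an assumption; other policies are possible.
    """
    if failing_region not in traffic:
        raise KeyError(failing_region)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    healthy = {r: load for r, load in traffic.items() if r != failing_region}
    if not healthy:
        raise ValueError("no healthy region to receive traffic")

    moved = traffic[failing_region] * fraction
    healthy_total = sum(healthy.values())

    result = {failing_region: traffic[failing_region] - moved}
    for region, load in healthy.items():
        # Fall back to an even split if the healthy regions carry no load yet.
        share = load / healthy_total if healthy_total else 1 / len(healthy)
        result[region] = load + moved * share
    return result


# Hypothetical region names and request rates.
traffic = {"region-a": 100.0, "region-b": 60.0, "region-c": 40.0}
after = evacuate(traffic, "region-a")
```

The total load is conserved, so the sketch also shows why the receiving regions need spare capacity before the shift happens.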

34:02 - How long does it take?

34:03 If you say you've got to move 50% of the traffic out of US East, Virginia, to somewhere else, is that hours, minutes?

34:13 - So the fastest we have done is, So on average, it takes about seven minutes to do all of that.

34:19 And that was our target.

34:20 So when I first joined, I was given that as a target.

34:22 It used to be around 45 minutes at the time.

34:24 And we built some, you know, interesting things to make it possible to run it inside seven minutes.

34:29 But the fastest we've done is like around five minutes in like an emergency where, you know, oh God, the entire region is tanked and people in the US are not happy about this.

34:38 Let's, let's move as fast as we can.

34:40 We can do it in five minutes.

34:41 Doesn't happen often, but you know, when it happens, especially, you know, when AWS Virginia goes down because a quarter of the internet stops working.

34:50 Sure.

34:52 But it's not just AWS that goes down.

34:54 Sometimes we shoot ourselves in the foot.

34:58 One of the interesting things to make sure that we release software that is safe is we do something called regionally staggered releases.

35:05 And so when a new software or when a new version gets released, since it's like hundreds of microservices that are running inside of Netflix to make it all possible, every service will deploy.

35:15 and when they start to deploy, they deploy it into a single region, wait about like five to ten minutes to make sure that nothing bad has happened, and then they proceed to the next one and then the next one.

35:24 And so when they release it to the first region, they can either, if they find out that it's bad, they can either quickly roll it back, or we could just evacuate out of that region, because we can do that in like under seven minutes.

35:36 And so if the rollback takes longer than seven minutes, then a call will be made by the core team, which will say, "Let's evacuate out.

35:43 We haven't figured out what the problem is."

35:45 So and then, you know, we evacuate and then we'll debug, you know, oh, which service did a release and what do we need to roll back and so on.

35:53 Because there are like hundreds of services that are releasing at the same time.

35:57 So it's like quickly trying to identify which service that we need to roll back can sometimes be tricky.

36:02 So we have used failovers for that as well.
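The regionally staggered release described above (deploy to one region, wait, check, then either continue, roll back, or evacuate) can be sketched as a small control loop. This is only an illustration under stated assumptions: `staggered_release`, its callback parameters, and the region names are hypothetical, and the seven-minute figure is simply the evacuation time quoted in the conversation.

```python
EVACUATION_MINUTES = 7  # evacuation time quoted in the conversation


def staggered_release(regions, deploy, is_healthy, rollback_minutes,
                      wait=lambda: None):
    """Deploy region by region and stop at the first unhealthy region.

    `deploy`, `is_healthy` and `wait` are injected callables so the flow can
    be exercised without real infrastructure. Returns a log of actions taken.
    """
    log = []
    for region in regions:
        deploy(region)
        log.append(f"deployed:{region}")
        wait()  # in the conversation: roughly five to ten minutes
        if not is_healthy(region):
            # If rolling back is slower than evacuating, evacuate first
            # and debug afterwards.
            if rollback_minutes > EVACUATION_MINUTES:
                log.append(f"evacuate:{region}")
            else:
                log.append(f"rollback:{region}")
            return log
    log.append("done")
    return log


regions = ["region-a", "region-b", "region-c"]  # hypothetical names

clean_log = staggered_release(
    regions, deploy=lambda r: None, is_healthy=lambda r: True,
    rollback_minutes=3)

slow_rollback_log = staggered_release(
    regions, deploy=lambda r: None, is_healthy=lambda r: r != "region-b",
    rollback_minutes=15)

fast_rollback_log = staggered_release(
    regions, deploy=lambda r: None, is_healthy=lambda r: r != "region-a",
    rollback_minutes=3)
```

Because the release stops at the first unhealthy region, later regions never receive the bad version, which is the point of staggering.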

36:04 Yeah, so it's not just AWS's fault.

36:06 Yeah, sure.

36:07 And I don't mean to pick on AWS, because all these data centers go down.

36:11 The difference is when AWS goes down, it's like the internet goes down, you know, it's like the observability of it.

36:17 So why?

36:18 Cause so much runs on there.

36:20 It's like that with Cloudflare: when they go down too, you're like, oh, I see, everything's broken.

36:24 Okay.

36:24 Yeah.

36:25 And when, when sites go down in production, even for places way smaller than Netflix, it's really stressful and you might make it worse by trying to fix it.

36:34 So the ability to just go, let's buy ourselves some time to figure this out and just get everyone out, and then we're going to look at it and then we can bring them back.

36:41 That's pretty cool.

36:42 You did write an article called how Netflix does failovers in seven minutes flat, which I'll put in the show notes so people can read more about that if they want.

36:50 - Thanks.

36:51 - So this demand engineering side, talk about obviously tools are primarily built in Python there.

36:58 You got some NumPy and SciPy and even the bpython shell.

37:02 Tell us about some of the Python stuff going on here.

37:04 - Before I joined Netflix, like when I actually first started learning Python, I loved the REPL, but I always felt like the REPL did not have auto-completion in it.

37:13 And bpython is an alternate REPL for Python that provides you with, like, auto-completion and syntax highlighting and all that stuff.

37:21 So I am a huge fan of bpython.

37:24 One of the things that we have done, like, demand engineering specifically, is, you know, we get paged and we have to go in and try and rescue our traffic out of that region into the other two regions.

37:34 And sometimes our software itself will not work because if an entire region is down, let's say it's because of a network connectivity issue or something, then the things that we call out to in order to make these, you know, changes to scale up the other regions and like evacuate and make DNS changes or whatever, that itself might be broken.

37:52 And when that's broken, we literally SSH into the box and open up a Python shell and do whatever we need to do.

38:01 that has not happened in like the last four years, I would say, but six years ago, yeah, that was a thing that we used to do.

38:07 And I wanted to call out bpython specifically in this particular case because it was so much more useful than trying to remember, "Oh, I remember I wrote this function. What is it?" Instead of opening my IDE to try to find out what that function is, I just import the module and then I type "module."

38:21 And it lists all the functions for me.

38:23 And I could invoke it, and yeah, it's such a time saver.
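Roughly the same information that completion shows after typing a module name and a dot can be obtained in a plain Python session with the standard `inspect` module. The sketch below is only a non-interactive approximation, not how bpython itself is implemented; `public_functions` is a hypothetical helper, and the standard library's `json` module stands in for the speaker's own module.

```python
import inspect
import json


def public_functions(module):
    """Return the names of the public functions a module exposes."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


# `json` is just a stand-in for "the module I wrote".
names = public_functions(json)

# Once the name is known, the function can be looked up and invoked.
dumps = getattr(json, "dumps")
encoded = dumps({"region": "region-a"})
```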

38:26 Yeah, Python REPL is cool, but it leaves a lot to be desired in terms of history or even if you want to edit a function that is five lines long, it's hard to go through.

38:39 >> It becomes cumbersome.

38:39 >> Another one is ptpython. I'm also a fan of that one.

38:44 >> Yes.

38:44 >> They're the same category.

38:46 >> Yeah. Prompt Toolkit, the one that powers ptpython, was written by Jonathan Slenders actually, and it's a fantastic library.

38:54 Kudos to Jonathan for doing that.

38:57 It's a fantastic library.

38:58 Yeah.

38:59 Awesome.

38:59 So are you, you got a particular enhancement there for your, your REPL?

39:05 I'm not like that big of a user of REPL.

39:07 In the terminal, we do like, you know, ask questions for generating new projects, et cetera.

39:12 I'm much more of a PyCharm user myself.

39:14 Like I go in there over there.

39:16 As you bring that up, you know, one of the really nice Python REPLs is the, what I guess it's called probably the Python console in PyCharm, right?

39:23 Because if you go to that and you get the Python REPL, but you get PyCharm's auto-complete and type consistency, and it automatically modifies the path to import your project.

39:33 So yeah, you got one in there.

39:34 - Yeah.

39:35 - That one's yours, huh?

39:36 All right, let's see the core team, alerting and statistical work.

39:42 What's this one about?

39:43 - Core team is our frontline SRE.

39:44 So demand team is like building tools that the core team will leverage to get us out of trouble.

39:50 So core team is the one that anytime there is, like they monitor a lot of metrics, not just streaming metrics, but also things like error rates between services that are happening and how many requests are successfully coming back and so forth.

40:04 They obviously use Python to kind of keep tabs on, like obviously a person can't be sitting in front of a dashboard, just monitoring it themselves.

40:11 And so they use quite a bit of Python to analyze the data from all of the hundreds of microservices and between them, the inter-process communication that actually happens and the metrics that come through and so forth.

40:21 So they use Python for alerting.

40:23 And so actually they use the monitoring, the next section that's right there is monitoring, alerting, and auto-remediation.

40:30 We have an internal observability organization that has built our own time series database that's not in Python, but it's open source, called Atlas.

40:39 And that uses, that collects all of the time series data from all of these services, and then they try and do alerting and remediation, auto-remediation.

40:48 So when a particular alert condition is met, you can run a small Python script inside of a framework called Winston, that's again internal, that allows you to do more complicated things.

40:59 So for instance, if you have like this one bad instance in like this collection of 20 instances, instead of a user going and terminating that instance, you can now automate that by writing a script that says, you know, automatically restart that instance or just kill it, and so on.
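The "one bad instance out of 20" case could be scripted along the following lines. This is a minimal sketch, not the Winston or Atlas API (both are only named in the conversation): `find_bad_instances`, `remediate`, the thresholds, and the instance IDs are hypothetical, and the `terminate` callable stands in for whatever action the real framework would take.

```python
from statistics import median


def find_bad_instances(error_rates, factor=5.0, floor=0.01):
    """Flag instances whose error rate is far above the group's median.

    `factor` and `floor` are made-up thresholds for illustration.
    """
    typical = median(error_rates.values())
    limit = max(typical * factor, floor)
    return sorted(i for i, rate in error_rates.items() if rate > limit)


def remediate(error_rates, terminate):
    """Call `terminate` on every outlier and return the IDs acted upon."""
    bad = find_bad_instances(error_rates)
    for instance_id in bad:
        terminate(instance_id)
    return bad


# Twenty hypothetical instances, one of which is misbehaving.
rates = {f"i-{n:02d}": 0.002 for n in range(20)}
rates["i-07"] = 0.4

terminated = []
acted_on = remediate(rates, terminated.append)
```

Comparing against the median of the group rather than a fixed number means a cluster-wide problem does not cause every instance to be terminated at once.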

41:15 That's our--

41:16 Cool, that's part of the auto remediation of it.

41:18 And it says it's built on Gunicorn, Flask, and Flask-RESTPlus.

41:24 I'm familiar with the first batch, but Flask-RESTPlus, this is an extension for Flask that adds support for quickly building REST APIs.

41:33 Okay, interesting.

41:34 Because Flask itself already does REST.

41:36 So Flask-RESTPlus, I think, provides things like Swagger endpoints automatically, so you could try it out in the browser and so on.

41:44 I have not used Flask-RESTPlus myself, but that team uses it quite a bit.

41:48 - Yeah, cool.

41:49 Probably some of the, some similarities to like what FastAPI kind of brings in addition to standard Flask, I'd imagine.

41:56 - Exactly, yeah, yeah.

41:57 - We use more FastAPI nowadays.

42:00 - Yes. - Oh yeah?

42:01 - Yeah, we're using quite a bit of FastAPI in most of our internal tools actually.

42:05 - Yeah, just from reading through this article, it sounds like there's a lot of APIs and just a lot of connectivity through, there's probably a lot of JSON going around Netflix.

42:14 - Yes, yeah, so some of the heavier data stuff or like high streaming services, like that are in the streaming path are all typically written in Java.

42:24 And they use for enterprise communication, they use gRPC and that uses Protobuf to communicate and so forth.

42:30 But most of our internal tools that are written in Python, either use JSON directly, or sometimes they need to talk to a gRPC service.

42:38 And so they use Python gRPC to get the work done.

42:41 - Cool.

42:42 Maybe we'll have some time to come back to gRPC.

42:44 I'm not sure.

42:45 We got a lot of things to talk about here.

42:47 - Yeah, we don't have to go through every section here.

42:49 - No, I know, there's just so many interesting angles, right?

42:53 And so the next one here is information security, which obviously, if you just put anything on the internet and just tail the log of it, within minutes, you'll see a request for wpadmin.php.

43:05 Like it's already just constantly being, people are just after it, right?

43:12 One of the things you have here that looks interesting is Security Monkey, written in Python, which is, I guess, like Chaos Monkey, but.

43:20 - It is kind of like Chaos Monkey.

43:22 I think this project may have been archived or it's not actively in development.

43:28 It tries to scan our infrastructure for unsafe practices.

43:31 That's like an umbrella term to try to add like whatever is like good practices that should exist from the security standpoint.

43:39 - Yeah, okay, so people can check it out.

43:41 Maybe it's not totally active anymore, but they can take it as inspiration, right?

43:45 - Yeah.

43:46 Like back in 2019, it was one of our most active projects that have happened.

43:49 (laughing)

43:50 2023 is a different world.

43:52 - It is a different world.

43:53 And one of the areas in which 2023 is a different world is really the AI/ML side.

43:59 And you all are doing a lot of stuff with personalization algorithms, recommendation engines, machine learning.

44:06 And you talked about Metaflow, which is now available.

44:09 - Yeah, the personalization one, I think we've just mentioned a bunch of things that we use from the open-source world here.

44:14 So I think XGBoost is a library that does machine learning.

44:19 So personally, I am not in this field.

44:22 So I just went and interviewed the team and asked them to give me a blurb.

44:26 So I wouldn't be able to talk in detail about any of the personalization stuff here.

44:30 But this is just a showcase of how much this team relies on Python and the open-source ecosystem that comes with Python in general.

44:40 So they're heavy users of pandas, TensorFlow, and PyTorch and so on.

44:45 Yeah.

44:46 So, Zoran, let me ask you, both of you: your team supports Python developers and Python applications indirectly in that way, but is it different to support the data scientists than it is to support, say, software developers?

45:00 Like, do you have to think about that differently? How so?

45:03 Yes, yes, we do have like a team that is dedicated to supporting all the data scientists.

45:08 And we're like the team that supports the team that supports data science.

45:11 And shit.

45:12 Right now.

45:13 So, yeah, we're definitely like now in 2023, you know, betting more on Python.

45:19 Before, Python was more like, if it makes sense for you because of freedom and responsibility, if it makes sense to use Python in your team, you use Python, you know, and now we're trying to provide basically like a better paved path.

45:31 This is me and Amjith with this new team that we started.

45:35 And we're trying to kind of enhance this paved path better and better for all these teams.

45:40 And we, you know, it's hard to know all the specifics in every single team, but we're trying to provide them with as good practices and automation as possible.

45:51 So I think you asked, like, how is it different supporting one versus the other?

45:55 I think we built, so when we first started the team, we met with 10 different organizations inside of Netflix to find out how they use Python, and we found that there were some commonalities, but the way, for instance, algorithms engineering uses Python is very different from the way a SRE team uses Python, and it's very, very different from how our animation studio uses Python.

46:16 So our VFX animation uses Python in a way where once they start...

46:22 This is apparently common in all of the movie industry, which is once they start a particular project, whatever they have chosen at the start of that project, they will stick to it until that project is completed.

46:32 So if that movie takes two years to finish, you cannot upgrade anything inside of that particular hermetically sealed environment, development environment that you have.

46:42 So that is very different from like another, like a machine learning person who's interested in like, you know, I just want to write my algorithm.

46:48 Like I don't care about how pip works or like how I pip install.

46:52 Like I don't want to worry about like virtual environments and things like that.

46:55 Whereas a person who is writing internal tools, they want to own the entire tool chain.

47:00 It's like, I not only want to maintain virtual environment, I also want this thing to work with a front-end that is written in React.

47:07 And so I would like you to be able to make it possible to do NPM and pip to coexist and live together.

47:14 That's not a hard thing to do, but it's one of those things where it's like, if I'm trying to solve a problem, let's say I'm bringing in Python dependency locking as a mechanism to help these web developers, because they don't want to automatically upgrade any time they build their system and suddenly break in production.

47:32 Now, that might be completely useless for someone who's working in machine learning.

47:36 And so they're like, "Why are you solving that problem?

47:38 You bringing locking to packaging doesn't help me in any way.

47:43 Why are you wasting your time?" And so we had to sort of build personas for various ways in which Python is used inside of Netflix so that when we are working on a particular feature, we can tell them, "We are now targeting this persona.

47:55 We are working towards making life easy for animation engineers.

47:59 So if it doesn't work for you, that's fine.

48:01 You know, that's fine. We will get to you.

48:03 It's just that our persona that we're targeting right now is not yours."

48:06 So that's how it's different, I'd say.

48:08 Yeah.

48:10 Data scientists have a lot less legacy code that's just still cranking along because a lot of times once they get, they discover an insight, they don't need to run it again, right?

48:18 Or the algorithms are changing so fast, they can just say, well, now we're using large language models instead of whatever, you know?

48:26 Yeah. There you go. Yeah.

48:27 Yeah. Whereas once you get a web app running, you might not touch that thing if it doesn't need touching. Right.

48:32 Exactly, stability is what you need there.

48:36 So anything else you want to call out out of this article before we move on?

48:40 We don't have time left, honestly, but No, no, I think this was a great article, but yeah, a few things.

48:48 But with regard to this, let's just leave people with this idea that we only touched on a small part of what is laid out here and all the projects and all the ways in which it's being used. So certainly check out the article, just called Python at Netflix. I'll put it in the show notes.

49:03 It's hard to cover it all in just one hour.

49:05 It sure is. It sure is. So let's maybe talk for a minute here about this project that you're involved with, Zoran, called Portable Python.

49:15 You know, I not long ago had Nathaniel Smith on to talk about PEP 711, distributing Python binaries and maybe treating like CPython runtimes as wheels almost.

49:27 And you guys also have a way that you've been using for a while internally to package up Python into something that can run as well called portable Python, which is open source.

49:37 You want to talk a bit about that?

49:38 Yes, that is indeed PEP 711.

49:41 I discovered it by listening to your podcast.

49:44 Right around Python, I think, yes, it would be very interesting to see if we could partner up once this is.

49:50 So Portable Python is, we want to provide Python, of course, to all Python developers inside, right?

49:57 Like you can always grab your own Python via all kinds of ways, right?

50:01 pyenv, Docker image, et cetera.

50:03 But we also provide builds of Python inside to be used internally.

50:08 So Portable Python is trying to solve just that.

50:11 Well, one particular issue, how do you go and distribute Python on laptops?

50:16 So the end goal is we want to provide a tarball, just like that PEP says, like a wheel, a tarball that you can download and drop somewhere, typically in a user's own folder, tilde slash, you know, myPythons, and we want it to work from there.

50:34 So you could use pyenv for that, but with pyenv, you need to wait for it to build.

50:39 And we want to basically build it ahead of time and as soon as it's available and, you know, make it available internally.

50:46 So what Portable Python is designed to do is to do such a build, which we call portable, and drop it in Artifactory, and then our tooling can just go fetch that real quick, unzip, and it's ready to go.

50:58 So your tooling, the Portable Python tooling basically says I'm on this platform.

51:03 So I'm on macOS and it's Apple Silicon.

51:06 So here's the, and they want this version of Python.

51:09 So that means this binary, let's go grab it.

51:12 Right.

51:12 Right.

51:13 So portable Python is invoked by our building machinery.

51:16 There is one worker each for macOS x86, macOS M1, Linux x86, and Linux ARM64.

51:25 And there's some other internal tooling that kind of detects that a new open source version is available, using Portable Python.

51:33 So Portable Python can report you what is the latest, 3.11, for example, by looking at the ftp.python.org, basically.

51:41 Okay, so the latest is 3.11.3.

51:43 Let's see, do we have it internally? No.

51:45 Okay, let's kick off a build.

51:46 So we kick off one build for M1, one build for Linux, etc.

51:51 And with Portable Python, with its configuration, we say we want OpenSSL, that version, we want SQLite, that version, and Portable Python goes ahead and does the build and produces a tarball. We take that tarball and publish it.
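The "what is the latest 3.11, do we have it, if not kick off builds" flow can be sketched in plain Python. This is an illustration only, not Portable Python's code or command line: the function names, the target labels, and the version lists are hypothetical, and the sketch assumes plain final-release version strings such as "3.11.3" (no alphas, betas, or release candidates).

```python
# Hypothetical labels for the four build workers mentioned in the conversation.
TARGETS = ("macos-x86_64", "macos-arm64", "linux-x86_64", "linux-arm64")


def parse(version):
    """Turn '3.11.3' into (3, 11, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_in_family(versions, family):
    """Return the newest version within a family such as '3.11', or None."""
    prefix = parse(family)
    candidates = [v for v in versions if parse(v)[:len(prefix)] == prefix]
    return max(candidates, key=parse) if candidates else None


def builds_to_start(upstream, internal, family, targets=TARGETS):
    """List (version, target) builds needed to catch up with upstream."""
    latest = latest_in_family(upstream, family)
    if latest is None or latest in internal:
        return []
    return [(latest, target) for target in targets]


# Made-up inputs: what upstream offers versus what is already published.
upstream = ["3.10.11", "3.11.0", "3.11.2", "3.11.3"]
internal = {"3.10.11", "3.11.2"}

latest = latest_in_family(upstream, "3.11")
pending = builds_to_start(upstream, internal, "3.11")
```

Comparing parsed tuples rather than strings matters once patch numbers reach two digits, since "3.11.10" sorts before "3.11.9" as text.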

52:05 That's interesting.

52:06 So you can control some of the internals as well, like the OpenSSL version and SQLite version, maybe a bit more carefully.

52:13 Yes.

52:13 Yes.

52:13 And since it's written in Python, it's also able to inspect any Python. You could run Portable Python's inspect with the path to an installation, and it will tell you: okay, it has OpenSSL, that version; SQLite, that version; does it use, like, Homebrew libraries, shared libraries, or what.

52:32 it can report on that.

52:34 And, oh yeah, it generates a thing that I find very important, like a little file that says, it's called manifest.yaml.

52:42 So every time it builds anything, it generates that manifest.yaml where it says, well, I did a build with --LTO optimization--

52:51 like it says everything that was used to inform what the build had, and which worker it ran on, what time, what was the platform, like a little bit of metadata, which sometimes you could even see things like what C compiler optimization flags were enabled when you created it, for example, right?
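To give a feel for what such a build manifest might contain, here is a small sketch that renders one as YAML text using only the standard library. The field names, values, and layout are assumptions for illustration and are not the actual format of Portable Python's manifest.yaml. Values are written with `json.dumps`, since JSON strings and numbers are also valid YAML scalars.

```python
import json


def render_manifest(fields):
    """Render a flat mapping (with optional lists) as simple YAML text."""
    lines = []
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical metadata of the kind described: what was built, with which
# options, on which worker, and when.
manifest_text = render_manifest({
    "python_version": "3.11.3",
    "platform": "linux-x86_64",
    "built_at": "2023-06-08T12:00:00Z",
    "worker": "build-worker-1",
    "configure_args": ["--enable-optimizations", "--with-lto"],
    "openssl": "3.0.8",
    "sqlite": "3.40.1",
})
```

Keeping this file next to the build output means the question of how a given binary was built can be answered later without rerunning anything.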

53:08 Yes.

53:09 Okay.

53:09 And there is one additional thing.

53:11 So portable Python does not install Python on your system for you.

53:15 So it, it is a builder, so it builds them and produces tarballs that can be used in a standalone manner.

53:21 And so if you want to bring Python onto your system, you just download the tarball from our internal artifact storage and then expand it.

53:29 And that we have another tool that automatically does that.

53:32 And so when somebody bootstraps a brand new Python project and they say, I would like to use 3.11.3, or 3.11.4, which I think got released yesterday, then we will already have a binary ready for them in our internal Artifactory.

53:47 And when they run their build for the very first time, it will bring the appropriate Python version that they have specified in either pyproject.toml or in their tox.ini or somewhere.

53:57 and it will bring that appropriate Python, install it or expand it in a known location, and it will use that for building their project and so forth.

54:05 So it's a way to make it easy for people to not have to manage their Python on their laptop individually.
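Picking the right prebuilt tarball for a laptop comes down to mapping the operating system and CPU architecture to an artifact name. The sketch below shows one way to do that with the standard `platform` module; the naming scheme, `tarball_name`, and the alias tables are hypothetical and do not describe Netflix's internal tool or artifact layout.

```python
import platform

# Hypothetical normalisation tables for the four platforms mentioned.
OS_ALIASES = {"darwin": "macos", "linux": "linux"}
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def tarball_name(version, system=None, machine=None):
    """Build an artifact name such as 'cpython-3.11.3-macos-arm64.tar.gz'.

    `system` and `machine` default to the current host, as reported by
    platform.system() and platform.machine().
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    os_name = OS_ALIASES.get(system)
    arch = ARCH_ALIASES.get(machine)
    if os_name is None or arch is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return f"cpython-{version}-{os_name}-{arch}.tar.gz"


apple_silicon = tarball_name("3.11.3", system="Darwin", machine="arm64")
linux_arm = tarball_name("3.11.3", system="Linux", machine="aarch64")
```

Once the name is known, downloading the file and unpacking it with the standard `tarfile` module into a known location is all that "installing" requires in this model.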

54:12 And also, this can build Python with a specific prefix.

54:16 So on servers, on our internal servers, what we do is we install Python in a specific location.

54:21 Like, we always put it inside, let's say, for example, /app/python, for example.

54:26 It will build it in a way that makes it easy for a Debian package to be built.

54:29 And when you install the Debian package, it will put the Python in a specific location.

54:33 And also, it has other benefits, such as it tries to make the Python binary as small as possible, because we're trying to deploy it out to like hundreds of thousands or 100,000 servers.

54:45 So we would try to reduce the amount that we need to put on that server.

54:50 The final product that Zoran checked yesterday, I believe, was only 50 megabytes, compared to what pyenv and other things produce, which was 200 megabytes.

55:00 It does it by a few tricks, like it removes the test folder, because, you know, once you have built it, like, you know, having the test folder as part of your final artifact makes no sense.

55:08 That was like a hundred megabytes savings right there.

55:10 So things like that, some optimizations that we do that is custom for our work.
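The test-folder trick mentioned above can be illustrated with a short script that removes directories named `test` from an installation tree and reports how many bytes that saved. This is a sketch of the idea, not Portable Python's implementation: `strip_dirs` and `tree_size` are hypothetical helpers, and the demonstration runs against a tiny fake tree in a temporary directory rather than a real Python build.

```python
import shutil
import tempfile
from pathlib import Path


def tree_size(root):
    """Total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def strip_dirs(root, names=("test",)):
    """Delete every directory under `root` whose name is in `names`.

    Returns the number of bytes removed.
    """
    saved = 0
    # Materialise the listing first, because the tree changes while we delete.
    for path in sorted(Path(root).rglob("*")):
        if path.name in names and path.is_dir():
            saved += tree_size(path)
            shutil.rmtree(path)
    return saved


# Demonstration on a throwaway directory that mimics lib/python3.11/.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "lib" / "python3.11"
    (lib / "test").mkdir(parents=True)
    (lib / "os.py").write_bytes(b"x" * 10)
    (lib / "test" / "test_os.py").write_bytes(b"x" * 1000)

    before = tree_size(tmp)
    saved = strip_dirs(tmp)
    after = tree_size(tmp)
    kept_stdlib = (lib / "os.py").exists()
    removed_tests = not (lib / "test").exists()
```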

55:14 Yeah, that's a really interesting system.

55:16 I think there's increasing momentum around having some kind of tool that is outside of Python for managing Python, right?

55:25 So far, primarily what we've had is things like pip, pipx. Zoran, you have a project called pickley. It's all about like, okay, you have Python, now how do you go forward?

55:35 But I think a lot of people are realizing like, wait, that assumption that I have Python, now what?

55:39 Is not a great assumption, right?

55:41 And so people are starting to look at tools like rustup, which actually is kind of like pip, but it also brings Rust over.

55:48 Yeah, so we're gonna see something there, I think.

55:50 I don't know what it is, but it'll be interesting.

55:52 Yeah. Did you see the one Rye?

55:54 Rye is the package manager that Armin wrote.

55:57 Yeah, from Armin Ronacher.

56:00 Yeah.

56:00 That brings Python for you.

56:02 His inspiration is from rustup, apparently.

56:05 So Rye is actually written in Rust.

56:07 And it does all the things that Poetry and PDM and other package managers do.

56:12 But in addition to that, it also brings Python for you.

56:15 And it's using a different Python distribution, called standalone Python or something, that you already had a link for. I forgot the name, but it brings Python from there and expands it onto your system.

56:25 Yeah, python-build-standalone, that's the project that it uses.

56:28 Yeah, I've heard of that. I haven't done anything with it, but it looks interesting.

56:31 Yeah.

56:32 All right, I think we have time. We're getting short on time here.

56:34 I think we have time for one more really quick thing, something that you're participating in, Amjith.

56:39 I'm sure, and I don't know if you are as well, but command line database clients with autocomplete and syntax highlighting.

56:45 Tell us about this. This looks cool.

56:47 Yeah, this is just my personal project that I wrote before.

56:51 This was a while back, but the idea is, at the time I was learning Postgres and I was using psql to do this.

57:01 And every time I, I come to like a table, I'd be like, you know, Oh, what columns were there?

57:06 I forgot the exact name of the column and I tried to find it and so forth.

57:10 And so finally, you know, I just, I broke down and decided to write like a shell for, for Postgres called PGCLI that uses actually from toolkit, like the same toolkit that's used by PT Python.

57:21 - I was going to say, it looks a lot like PT Python.

57:24 It's got that Emacs mode.

57:25 - Yep.

57:26 - You've got autocomplete for basically the whole SQL language, but also the database schema that you're connected to, right?

57:32 - Yes, that is correct.

57:33 So it reads the tables and the columns in that database, and then it tries to autocomplete as part of the SQL segment.

57:39 So after a WHERE clause, it'll only suggest columns, and after a FROM clause, it'll only suggest tables, and so on.

57:45 Wow.
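
The context-sensitive completion described here can be sketched in plain Python. This is a minimal illustration only, not pgcli's actual implementation: the SCHEMA dictionary, the suggest function, and the tables_in helper are hypothetical names, and a real client would read the schema from the connected database and use a proper SQL parser rather than splitting on whitespace.

```python
import re

# Hypothetical schema; a real client would query the connected database.
SCHEMA = {
    "users": ["id", "name", "email"],
    "orders": ["id", "user_id", "total"],
}

KEYWORDS = ["SELECT", "FROM", "WHERE", "JOIN", "ORDER BY", "GROUP BY"]


def tables_in(text: str) -> list[str]:
    """Return known tables that appear after FROM or JOIN in the text."""
    found = re.findall(r"\b(?:from|join)\s+(\w+)", text, flags=re.IGNORECASE)
    return [name for name in found if name in SCHEMA]


def suggest(text_before_cursor: str) -> list[str]:
    """Return completion candidates based on the last FROM or WHERE typed."""
    words = text_before_cursor.split()
    # The last word is a prefix only if the text does not end in whitespace.
    typing = bool(words) and not text_before_cursor[-1].isspace()
    prefix = words.pop() if typing else ""

    last_keyword = next(
        (w.upper() for w in reversed(words) if w.upper() in ("FROM", "WHERE")),
        None,
    )
    if last_keyword == "FROM":
        candidates = sorted(SCHEMA)  # only tables
    elif last_keyword == "WHERE":
        tables = tables_in(text_before_cursor) or list(SCHEMA)
        candidates = sorted({col for t in tables for col in SCHEMA[t]})  # only columns
    else:
        candidates = KEYWORDS
    return [c for c in candidates if c.lower().startswith(prefix.lower())]
```

Calling suggest("SELECT * FROM ") offers only table names, while suggest("SELECT * FROM users WHERE na") narrows to the columns of users that start with "na".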

57:46 So after pgcli, people wanted something for MySQL.

57:50 So I created mycli, and then Microsoft came over and said, like, we would like to fork pgcli to make one for Microsoft SQL Server.

57:58 So they did that themselves.

58:00 They took the pgcli source code and then they created that.

58:03 And then another person created litecli, which is for SQLite.

58:07 And yeah.

58:08 And there's other things now.

58:09 iredis is a Redis client that's similar to these things, and there are a lot more friendly shells for databases in general.

58:17 - Excellent.

58:18 All right, this looks really cool, I think.

58:20 - Yeah, this has got nothing to do with Netflix.

58:22 It's mostly just like, hey, it's my personal project, and, you know, just what I do in my free time sort of a thing.

58:28 - Yeah.

58:28 Well, it looks really helpful for people because talking to databases just in your terminal, it can be tricky, right?

58:35 And having auto-complete helps, not so much for, you know, the SELECT and WHERE, people get that pretty quick, but the database schema understanding keeps you in your flow pretty well.

58:45 Right.

58:45 Yeah.

58:45 Again, inspired by bpython. I actually took inspiration from them.

58:49 Yeah.

58:49 Excellent.

58:50 All right.

58:50 Well, that'll be in the show notes as well.

58:52 Guys, I think that is it for time that we have today.

58:55 So I'm going to have to wrap it up with the final two questions here and recommendations.

58:59 Let's start with a PyPI project.

59:02 Not necessarily the most popular one, but something that you're like, Oh, this is awesome.

59:05 People should know about it.

59:06 Zoran, got a recommendation for folks?

59:08 I'm going to say Pickley, go check out Pickley.

59:11 Pickley, okay, so give us the elevator pitch on Pickley.

59:14 It's a CLI tool that allows you to install other CLI tools, very similar to pipx in that sense.

59:21 The main difference being that if you pickley install poetry, every time you run Poetry, it will keep itself up to date in the background.

59:29 So it will keep self-upgrading by default.

59:32 You can tell it also not to do that, but that's its main useful thing.

59:35 - Cool, so when you launch it, basically you're launching like a shim that says, "Run this," and then in the background checks for an update, and when it exits, if there's an update, just updates it.

59:44 - Yes, you can take a look at the little shell script, shell wrapper that it wraps it with.

59:49 - Yes.
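
The shim idea can be sketched roughly as follows. This is a hedged illustration in Python, not Pickley's actual wrapper (which, as mentioned, is a small shell script): the function names, the stamp file, and the once-a-day interval are all assumptions made for the example.

```python
import subprocess
import time
from pathlib import Path

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # assumed: check at most once a day


def is_check_due(stamp: Path, now: float, interval: float = CHECK_INTERVAL_SECONDS) -> bool:
    """True if the stamp file is missing or older than the interval."""
    try:
        return now - stamp.stat().st_mtime >= interval
    except FileNotFoundError:
        return True


def run_with_background_upgrade(tool_cmd, upgrade_cmd, stamp: Path) -> int:
    """Run the tool; if a check is due, start the upgrade command without waiting."""
    if is_check_due(stamp, time.time()):
        stamp.parent.mkdir(parents=True, exist_ok=True)
        stamp.touch()  # record that a check was started
        # Fire and forget: the tool below does not wait for the upgrade.
        subprocess.Popen(
            upgrade_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    # Run the real tool in the foreground and hand back its exit code.
    return subprocess.call(tool_cmd)
```

The stamp file keeps the wrapper cheap: most invocations only pay for one stat call before running the real tool.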

59:49 - All right, Pickley, awesome.

59:50 Amjith?

59:51 - Oh, I guess I could plug bpython again.

59:54 Like, good design aesthetics. I think, yeah, it's an overall better shell than the Python shell.

59:59 - Yeah.

59:59 - Oh, actually, pdb++, that's the one that I would actually recommend.

01:00:03 So if you ever use pdb and you wish that pdb had auto-completion, it's pdbpp on PyPI.

01:00:10 You don't have to change your thing at all.

01:00:13 All you have to do is pip install pdbpp, and then any time you do a breakpoint and it stops you there, you can do like, you know, variable dot, and it'll give you auto-completion.

01:00:23 And yeah, I don't know, I'm a huge fan of auto-completion.
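
As a small sketch of what "you don't have to change your thing" means in practice: the code below uses only the built-in breakpoint() call and makes no reference to pdb++ at all. The average function is a made-up example; the assumption is that pdbpp has been installed into the same environment, in which case the prompt that opens should be pdb++ rather than the standard pdb.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(values)
    # Execution pauses here in whichever debugger is active. With pdbpp
    # installed, typing `values.` and pressing Tab at the prompt should
    # offer completions such as `values.append`.
    breakpoint()
    return total / len(values)
```

Calling average([1, 2, 3]) from a script or the REPL drops into the debugger at the marked line; typing c continues and returns the result.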

01:00:26 Yeah, I was gonna say, you and I are kindred spirits.

01:00:29 I am all about the auto-completion.

01:00:31 I'm like, this tool is broken if it doesn't give me auto-complete.

01:00:33 Because it sends you into the documentation, you'll be like, oh, I need to create one of these client libraries.

01:00:39 What does it take?

01:00:40 Oh, *args, **kwargs.

01:00:42 Great.

01:00:42 Now what am I supposed to do?

01:00:43 Right?

01:00:44 Like, you know, auto-complete really makes you more productive.
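
The complaint about *args and **kwargs can be made concrete with the standard inspect module. This is an illustrative sketch; connect_opaque and connect_explicit are hypothetical functions, not part of any library discussed here.

```python
import inspect


def connect_opaque(*args, **kwargs):
    """Accepts anything; the signature says nothing about what is valid."""
    return dict(kwargs)


def connect_explicit(host: str, *, port: int = 5432, timeout: float = 10.0):
    """Named, typed parameters give editors and REPLs something to complete."""
    return {"host": host, "port": port, "timeout": timeout}


# What a completion tool has to work with in each case.
print(inspect.signature(connect_opaque))    # (*args, **kwargs)
print(inspect.signature(connect_explicit))  # (host: str, *, port: int = 5432, timeout: float = 10.0)
```

With the explicit version, an editor can list host, port, and timeout as you type; with the opaque version, the only option is to go read the documentation.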

01:00:49 All right.

01:00:49 And then, if you're gonna write some Python code, what editor, if you're not in the REPL, are you using?

01:00:55 Oh, for me, it's PyCharm.

01:00:57 PyCharm, mostly, Sublime Text, and Vim if I'm SSH'ing somewhere.

01:01:02 Excellent. And Amjith?

01:01:04 Vim all the way.

01:01:05 You know, even if I don't know how to quit it, I can restart my computer.

01:01:08 [laughter]

01:01:11 That is the endless source of jokes, you know, like, I saw a picture of a laptop, and it was just smashed to pieces.

01:01:20 And it said, "Finally figured out how to quit Vim." [laughter]

01:01:25 >> For the longest time, actually, I had :q as a way to quit out of pgcli because I, by instinct, just kept hitting :q and, yeah.

01:01:34 >> That's amazing.

01:01:35 All right, you guys.

01:01:36 Well, it's been great to have you on the show.

01:01:39 Thanks for being here.

01:01:40 Thanks for giving us this look at what you're all doing over at Netflix and in your personal projects.

01:01:45 >> Yeah, thank you, Michael.

01:01:46 I just would like to mention that we have a lot of jobs at Netflix that require Python.

01:01:51 So if you are at all interested, please go to jobs.netflix.com and type in Python and you should get all of the Python job openings that are available.

01:01:58 There's a wide variety.

01:01:59 If you want to do infrastructure stuff, there's that.

01:02:02 If you want to do data science, there's that, right?

01:02:04 Like, a lot of cool stuff.

01:02:05 Yes, absolutely.

01:02:06 All right.

01:02:07 Have a great day, guys.

01:02:08 Thank you.

01:02:09 Bye.

01:02:10 Bye.

01:02:11 This has been another episode of Talk Python to Me.

01:02:14 Thank you to our sponsors.

01:02:15 Be sure to check out what they're offering.

01:02:17 It really helps support the show.

01:02:20 The folks over at JetBrains encourage you to get work done with PyCharm.

01:02:24 PyCharm Professional understands complex projects across multiple languages and technologies, so you can stay productive while you're writing Python code and other code like HTML or SQL.

01:02:36 Download your free trial at talkpython.fm/donewithpycharm.

01:02:41 InfluxData encourages you to try InfluxDB.

01:02:45 InfluxDB is a database purpose-built for handling time series data at a massive scale for real-time analytics.

01:02:51 Try it for free at talkpython.fm/influxdb.

01:02:53 Want to level up your Python?

01:02:57 We have one of the largest catalogs of Python video courses over at Talk Python.

01:03:01 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:03:06 And best of all, there's not a subscription in sight.

01:03:09 Check it out for yourself at training.talkpython.fm.

01:03:12 Be sure to subscribe to the show.

01:03:13 Open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, the Google Play feed at /play, and the Direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:03:42 Now, get out there and write some Python code.

01:03:44 [MUSIC]
