Transcript for Episode #60:
Scaling Python to 1000's of cores with Ufora
You've heard me talk previously about scaling Python and Python performance on this show. But on this episode I'm bringing you a very interesting project pushing the upper bound of Python performance for a certain class of applications.
You'll meet Braxton McKee from Ufora. They have developed an entirely new Python runtime that is focused on horizontally scaling Python applications across thousands of CPU cores and even GPUs. They describe it as "compiled, automatically parallel Python for data science".
Let's dig into it on Talk Python To Me, episode 60, recorded May 2nd, 2016.
Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
This episode is brought to you by Snap CI and Opbeat, thank them for supporting this show on Twitter via @snap_ci and @opbeat.
1:26 Michael: Braxton, welcome to the show.
1:27 Braxton: Thanks Michael, I'm excited to be here.
1:29 Michael: Yeah, we are going to take Python to a whole other level of scalability here, with computational Python, which is where Python has actually struggled the most with parallelism, so I'm really excited to talk about that.
1:40 Braxton: Well, it's a complicated topic that I've spent a lot of years on, and it's actually really fun, so I think we'll have a lot of good stuff to talk about.
1:49 Michael: Yeah, I don't know how many people have heard of your project, but I'm really excited to share it with them because it sounds super promising. But before we get into all of this, let's start at the beginning: what's your story? How did you get into Python and programming?
1:59 Braxton: Well, I've been programming for pretty much my entire life. I think I started when I was eight years old, and I'm 36 now, so it's kind of the way I think. I studied math in school, but writing code has always been the way I approach problems: when I was doing a math assignment and didn't understand what was going on, I would usually go write some kind of program to try to understand what was happening, do a numeric integral by hand or whatever. When I left college I started working in the hedge fund industry, which was a great experience as a young person; I got to write a lot of code and solve really interesting problems. It was really there that my desire to work on tools came about, because I was consistently frustrated with how much effort it took to actually implement these solutions. I would talk with my colleagues about some computation we wanted to do, and we could describe in principle what it was in, you know, five minutes, and then they would ask, "All right, how long is this going to take?" and the answer was "Come back to me in three months," because getting things to actually scale up is really challenging.
3:08 I started writing Python code after I left the hedge fund industry, which was in 2008. Python really got big in 2005, 2006, 2007, and it came to my attention when I was doing some speech recognition work. I spent a lot of time doing the same thing that most people who are frustrated with Python's performance do, which is to figure out the little bit of the program that's slow, stick that in C++, write wrappers for it, and then try to do all the rest of your work in Python. A lot of the things I've been interested in recently, and I think a lot of the pretty awesome projects out there for speeding up Python, are essentially trying to address that same problem: how can we get more and more of what you're doing to live in Python? You picked Python for a reason, and you should be able to just use the tool the way you want without paying a performance penalty.
4:02 Michael: Yeah. You know, I've noticed there seems to be more and more attention given to the speed of Python in the industry. The NumPy guys have been around for a while, and the PyPy guys have been around for a while, really working on this issue, but it seems like more and more people are trying different angles of approach. In Python 3.6 they have some really interesting stuff to speed up method invocation, which is one of the slower bits, and there's the Pyjion stuff coming from Microsoft. There's a lot of really interesting work, and yours kind of lands in that space as well, right?
4:37 Braxton: Yeah, absolutely. I think what's going on is that Python is a really lovely language to program in, but as an implementer, it's got a lot of weird corner cases, and it's a very hard thing for computers to reason about. The core of what makes it possible for systems to speed up code is their ability to reason about the code you've written. If you look at a C++ compiler, it does all of these fancy tricks inside to transform the code into something faster, and it can do that because it can make really strong assumptions about what you're doing and apply them to the machine code it generates. That's really hard to do in Python because Python gives you so much flexibility. So I think the reason you're seeing so many different approaches is that there are a million different ways to look at the problem. You can look at what people have written and start to identify particular things you want to speed up, or, and I think this is the more common case, you can start to put some constraints around what you're going to tell people is going to be fast. As soon as you put constraints in place, you give these optimizers something to work with. I bet if you looked at the internals of all of these different projects, each one would have a different way of looking at Python and picking a subset of it, saying "this is the subset we are going to make fast." Take NumPy: if you look at its roots, the whole point of why it's fast is that you can really just stick float an... right, it's not trying to be fast for arbitrary Python, it's trying to make Python fast for matrix-algebra kinds of things, and adding that constraint is what allows it to be fast.
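A minimal sketch of the point about constraints, using the standard library's `array` module as a stand-in for a NumPy array (an assumption made so the example needs no third-party packages). A typed `array('d')` only holds C doubles, so code working on it knows every element's type in advance, while a list can hold anything.

```python
from array import array

def try_append(container, value):
    """Return True if the container accepted the value, False if it refused."""
    try:
        container.append(value)
        return True
    except TypeError:
        return False

# A list places no constraint on its contents, so code that loops over it
# must be ready for any type at every step.
anything = [1.0, 2.0, 3.0]

# array('d') stores every element as a C double: the type is fixed up front,
# which is the kind of constraint an optimizer can exploit.
doubles = array("d", [1.0, 2.0, 3.0])

print(try_append(anything, "four"))  # True: lists accept any object
print(try_append(doubles, "four"))   # False: a str is not a double
print(doubles.typecode, len(doubles))
```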
6:15 Michael: Yeah, that's interesting. I think a lot of the things I mentioned do add constraints: PyPy sort of gave up C module integration in order to get its speed, whereas with the Pyjion stuff from Microsoft, they are trying to make sure they're 100% compatible, but that puts a different kind of pressure on them and makes it harder. And the NumPy stuff, like you said, is focused basically on matrix algebra and matrix multiplication, yeah?
6:39 Braxton: Yeah, absolutely. I have a huge amount of respect for what the PyPy guys are doing; they're trying to solve a really brutal problem, which is to make anything written in Python orders of magnitude faster. For most of my work, I've focused on a slightly more restricted problem, which is to make Python programs written in a purely functional style faster. Our original goal was not just to get a compiler to be faster, but also to get code to be able to scale: to be able to write simple, idiomatic Python and get something that can use thousands of cores effectively, without the programmer having to really think about how that's happening. One of the crucial ways you can do that is to say, okay, if you make a list of integers, you can't change it again. That seems like a pretty trivial change, but it's the basis for an enormous number of different optimizations we can do inside the program. If you look at systems like Hadoop and Spark, they have had as a core tenet this idea that data sets are immutable and you make transformations on those data sets, and that immutability is sort of the key to their being able to do things in a distributed context. We can take that same idea and apply it not just at the outermost level of the operations we're allowing, but let it filter all the way through the program, and apply that same constraint not just to make things distributed, but also to make the compiled machine code faster. If I make a list of integers, I can now assume it's going to be that list of integers forever, because you can't change it, and that lets me, as the compiler, generate much faster machine code, because I can start to make really strong assumptions about what's going on inside it.
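To make the immutability argument concrete, here is a small sketch. It is not how Ufora's compiler works; it only shows, with the standard library, that a tuple's contents can be relied on forever (so a result derived from it can be cached), while a list cannot.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def total(values: tuple) -> int:
    # Safe to cache: a tuple can never change after it is built, so the
    # sum computed for it stays correct forever.
    return sum(values)

prices = (3, 1, 4, 1, 5)
print(total(prices))  # computed once
print(total(prices))  # served from the cache

# "Changing" immutable data means building a new value by transformation,
# the same way Hadoop and Spark derive new data sets from old ones.
doubled = tuple(2 * p for p in prices)
print(total(doubled))

# A list is unhashable, so lru_cache refuses it: nothing guarantees the
# list will still hold the same contents on the next call.
try:
    total([3, 1, 4])
except TypeError as exc:
    print("refused:", exc)
```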
8:28 Michael: Okay, that's awesome, and you said a few things that I'm sure have piqued people's interest: Python, thousands of cores, and compiling the code down to machine instructions. So maybe we can take a step back for a minute: tell us about your project, Ufora, and what you guys are up to there, and then we can dig into the details.
8:43 Braxton: Yeah, absolutely. Ufora is a platform for executing Python code at scale. The basic idea is that you should be able to write simple, idiomatic Python that expresses what you want and get the same performance you'd get not just if you had rewritten the code in C, but if you had rewritten it in C and used threads and message passing to build a parallel implementation. You should be able to do this directly from the Python source code, without making any modifications to it, by having the compiler really see everything going on inside your code and use that to generate the fast form. We've been working on this for about five years. We have a totally separate runtime and compiler framework that we've developed to do this, and we actually think of languages almost like front ends: we built a VM, and I mean VM in the sense of the Java virtual machine or Microsoft's CLR, that has enough structure to be able to reason about programs in this context and make them fast. Then what we've basically built is a cross compiler from Python down into that virtual machine. We are planning on doing other languages at some point in the future, but there is so much goodness in Python that we really wanted to start there, because that was where our roots were. The project is 100% open source, so you can just go get it on GitHub. The Python front end has really only been around for about nine months; it's still a work in progress, but it's evolving pretty rapidly, and we've been able to get cases where you can take relatively naive-looking Python programs, drop them into this thing, and get them to run pretty efficiently on thousands of cores. Our biggest installations have gone up to six or seven thousand cores running on Amazon Web Services, and actually being able to use those efficiently without really having to think about it is pretty fun.
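The "front end" idea can be illustrated with Python's own `ast` module. This hedged sketch is not Ufora's cross compiler; it only shows the first step any such front end needs: turning source text into a syntax tree and locating constructs, here list comprehensions, that a parallelizing back end might care about. The sample function `normalize` is hypothetical.

```python
import ast

SOURCE = '''
def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]
'''

class ComprehensionFinder(ast.NodeVisitor):
    """Record the line number of every list comprehension, the kind of
    construct whose elements a back end could compute independently."""

    def __init__(self):
        self.lines = []

    def visit_ListComp(self, node):
        self.lines.append(node.lineno)
        self.generic_visit(node)  # keep walking nested expressions

tree = ast.parse(SOURCE)  # front end: source text -> syntax tree
finder = ComprehensionFinder()
finder.visit(tree)
print(finder.lines)
```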
10:44 Michael: Yeah, that's really amazing. Distributed, sort of grid computing type frameworks are not totally new, but I think two of the things you've told me are super interesting. Maybe the first and most interesting is that you don't have to write special distributed code, right? You just write regular code and the system reasons about how to parallelize it, is that correct?
11:08 Braxton: Right, absolutely, and that comes, as I mentioned, from this idea that we put some constraints on the language. The idea is that it's immutable Python: you make a list and you don't modify the list; you make a new list by writing a list comprehension that scans over the old one and transforms it. Because of that constraint, the system can see, okay, you're doing all of these operations on these data structures, and I can see you're calling this function now and you're going to call this other function next, and it can reason about the flow of data inside of that and say, these things are independent of each other, so I can schedule them separately if I want to. That freedom gives the runtime the ability to solve the problem of what the most efficient way to lay things out is. The end result is that if you say, here's a list comprehension, or I've got a divide-and-conquer algorithm where I take something, scan over it, do some computation, cut it in half, and then recurse, it can see that that's what you're doing and parallelize it. And it turns out that if you look at regular people's code, there are all kinds of opportunities for parallelism that they probably weren't thinking about, and would only have thought about if they really hit a performance bottleneck. But it's in there: if you're computing the... of two time series, you end up having three different dot products that you need to do, and they can all be done independently of each other. If you write it out the naive way, our compiler can actually see that parallelism, and if those tasks are big enough, it will split them into three and actually run them on separate threads for you, without you having to explicitly say that.

And, you know, in my thinking there has been a continual movement in computing to raise the level of abstraction at which we work. We used to write in assembler, I guess 40 years ago when I was born, and then they came up with C and Fortran, and that really made it possible to write more complicated programs because more and more of the details were abstracted away. If you look at Python, Python is the natural extension of that idea for single-threaded computing, as far as it can go: you can be so concise and clear in Python because so many low-level details have been abstracted away from you. But we're still not there in parallel computing; people still spend a huge amount of time writing custom code to do parallel compute. I think the reason technologies like Hadoop and Spark have been so successful is that within certain problem domains they can make a lot of that complexity go away, and I wanted to take that a step further and say, let's not just do it for specific patterns of computing, like map reduce jobs or whatever, but for all of your code. Everything you write should have enough structure in it for us to make it fast.
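A sketch of the two patterns mentioned, under stated assumptions: the elided quantity is taken to be a correlation (Pearson correlation needs exactly three dot products on centered data), and the dot product itself is written in divide-and-conquer style. The parallelism is spelled out by hand with `concurrent.futures`; the claim in the interview is that a compiler could find it automatically. In CPython, threads won't speed up this pure-Python arithmetic because of the GIL, so treat this as a picture of the independence, not a benchmark.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def dot(xs, ys):
    """Divide-and-conquer dot product: split in half, recurse, add the halves.
    The two recursive calls share no data, so a runtime could evaluate them
    on different cores."""
    if len(xs) <= 4:  # small enough: just loop
        return sum(x * y for x, y in zip(xs, ys))
    mid = len(xs) // 2
    return dot(xs[:mid], ys[:mid]) + dot(xs[mid:], ys[mid:])

def correlation(xs, ys):
    """Pearson correlation of two equal-length series: three independent
    dot products on the mean-centered data."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    # None of the three dot products depends on another, so each is
    # submitted as a separate task.
    with ThreadPoolExecutor(max_workers=3) as pool:
        sxy = pool.submit(dot, xc, yc)
        sxx = pool.submit(dot, xc, xc)
        syy = pool.submit(dot, yc, yc)
        return sxy.result() / math.sqrt(sxx.result() * syy.result())

print(correlation([1, 2, 3], [2, 4, 6]))
print(correlation([1, 2, 3], [3, 2, 1]))
```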
13:57 Michael: That's great. Maybe you could tell folks who are not familiar with Spark and Hadoop how they work, and then what you mean by taking it a little further, to general code, with yours, because I'm sure a lot of people know about Hadoop, but not everyone.
14:10 Braxton: Sure. Hadoop came out of Yahoo as the technology they developed to index the web. The core of it is that you take your data set and split it across a bunch of machines, and you're given two basic primitives, map and reduce. The basic idea is that if you think hard enough, you can fit any big parallel compute job into that pattern. Map basically lets you take every element in your data set and run a little function on it to get a new element, and reduce lets you take collections of elements and jam them together. For a lot of big data parallel applications this turns out to be a very natural way to express things, and the reason these technologies have been successful is that if you can figure out how to fit what you're doing into map reduce, it's actually a very easy way to program, and the infrastructure of the Hadoop ecosystem can give you all these benefits. Hadoop is very fault tolerant: you can turn the power off on any one of the machines, or on a rack of machines running Hadoop, and because of the programming model, Hadoop's central scheduler, which makes decisions about what to do, can react to that and still ensure your calculation completes, because it can reschedule the map and reduce jobs that were running on that machine on another machine.
So if you work within this framework, you suddenly make a bunch of problems that are endemic to parallel computing go away. The problem is that map and reduce, and the other primitives the community added to those systems, are still relatively clumsy ways of describing computations. There are a bunch of things that fit really naturally into them, like PageRank and a lot of the things people do with web-scale data, but scientific computing, numerical computing, things where the calculations don't fall naturally into this very embarrassingly parallel structure, don't fit that well, and it can take some real thinking to fit your application into that model. So when I talk about wanting to do this for arbitrary code, what I really mean is that you should be able to write code the way you think about solving your problem, and the infrastructure should be able to generate the parallel implementation from your code. That's as opposed to you saying, look, here's the place in my code where there are a million little tasks, each of which could be done in parallel, go do them in parallel, which is essentially what you're doing when you break things down to fit the map reduce model, and which I think is what 90% of people doing large-scale computing right now are doing, at least the ones I'm talking to.
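A toy, single-process version of the map reduce pattern, written against the standard library only; the input "splits" are hypothetical stand-ins for chunks held by different machines. `run_with_retry` is an illustrative helper, not a Hadoop API: it mimics rescheduling a failed map task, which is safe because each task only reads its own unchanging input.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each string stands in for the chunk of the data set
# that one machine holds.
splits = ["the quick brown fox", "the lazy dog", "the fox"]

def mapper(split):
    """Map: turn one chunk into partial word counts, independently of other chunks."""
    return Counter(split.split())

def reducer(left, right):
    """Reduce: merge two partial results (the order of merging doesn't matter)."""
    return left + right

def run_with_retry(task, arg, attempts=3):
    """Illustrative stand-in for rescheduling: a map task only reads its own
    input, so running it again after a failure gives the same answer."""
    for _ in range(attempts):
        try:
            return task(arg)
        except OSError:  # e.g. the machine running the task went away
            continue
    raise RuntimeError("task failed on every attempt")

partials = [run_with_retry(mapper, s) for s in splits]
counts = reduce(reducer, partials, Counter())
print(counts.most_common(2))
```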
16:53 Michael: Yeah, you really have to change the way you approach problems to fit into map reduce, right?
16:57 Braxton: Yeah, and a lot of people are not comfortable with that. My favorite example is nested parallelism. Imagine you have a data set with some tree structure to it: you've got a bunch of companies, and for each company you've got a bunch of transactions, so now you've got two layers to your hierarchy. If you say you want to do something in parallel first over the companies, and then, within each company, in parallel again over all of the transactions, that's a very unnatural thing to say in map reduce, because which thing are you mapping over? Are you mapping over the companies, or over the records? You need to kind of jury-rig it to make it work. But if I told you to do that in Python, you would just write two loops; it would be totally obvious to you how to do it, and the only reason you would ever go to map reduce land is because you really needed the performance. At the end of the day, this was in fact my original frustration almost ten years ago, when I left the hedge fund industry thinking about this stuff and asking myself: why is it so hard for the system to see that I have two loops, that there is parallelism in both of them, and just make it parallel? Why am I having to rewrite everything to fit the model that's been given to me? That turns out to be an easy thing to say and a hard thing to implement, and that's what we've been doing.
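The nested-parallelism example as "just two loops", in a hedged sketch with made-up data. Only the outer loop over companies is handed to a thread pool here; the inner loop over transactions is an ordinary generator expression, but it is equally independent, and that two-level independence is what a parallelizing runtime could exploit. `score` is a hypothetical placeholder for per-transaction work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: company -> list of transaction amounts.
transactions = {
    "acme": [120.0, -40.0, 15.5],
    "globex": [300.0, 25.0],
    "initech": [-10.0, 5.0, 5.0, 60.0],
}

def score(amount):
    """Placeholder per-transaction work (here just the absolute value)."""
    return abs(amount)

def company_total(amounts):
    # Inner level: each transaction is scored independently of the others.
    return sum(score(a) for a in amounts)

# Outer level: each company is processed independently of the others.
with ThreadPoolExecutor() as pool:
    totals = dict(zip(transactions, pool.map(company_total, transactions.values())))

print(totals)
```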
18:15 Michael: Yeah, that's really amazing, so maybe it's worth digging into that a little bit. You said that in the beginning you thought you would have to use a language other than Python, right?
18:25 Braxton: Yeah, and this has been a really interesting evolution for us. That immutability idea, the idea that if I make a data structure I can't change it, is at the heart of pretty much any system that scales, and the reason is that it gets rid of all these concurrency issues. If a data element changes on one machine and you allow that in a distributed system, you have to make sure the write operation propagates to everybody else who needs it, and that can create really serious problems. If you don't handle it correctly, you end up with programs that are unstable or really slow because there's a huge amount of locking. But Python as a language is a very mutable thing: when you make a class, you start with an empty class and push methods into it, and the way people build up lists is to make an empty list and keep appending things to it. So when I was first coming to this problem, I thought, okay, I want everything that's good in Python, this ability to just pass functions around and not think about typing, all of the good things about Python's object model, but I don't want that mutability, and my first thought was that this was going to be completely impossible to do with vanilla Python. I also thought there were a bunch of features in other, purely functional languages, things like OCaml, that are really nice features Python doesn't have, and we might actually need those things once we take away the mutability.
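The contrast drawn here between the common append-as-you-go style and a transformation-based style fits in a few lines. Both functions are illustrative; the second builds an immutable tuple in one expression, so no half-built intermediate state ever exists.

```python
def squares_mutable(n):
    # Mutable style: start empty and keep appending (idiomatic, everyday Python).
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def squares_immutable(n):
    # Immutable style: describe the result as a transformation of its input
    # and freeze it in a tuple. Each element depends only on its own i, so
    # in principle every element could be computed independently.
    return tuple(i * i for i in range(n))

print(squares_mutable(5))
print(squares_immutable(5))
```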
19:57 Michael: Yes, that's interesting, because Python is super mutable. It's not just that the data itself is mutable; the structures that make up the execution of your program, like the classes and functions themselves, are mutable too, right? So this is a really hard problem to solve with Python.
20:14 Braxton: If you actually look at good Python programs, they go through two phases: one phase where they set everything up, and then another phase where they don't change things. The worst bugs I've ever seen in Python programs are the ones where people change the methods on classes while their program is running. Now you've got some class with some function on it, and the meaning of that function changes as the program runs; it's impossible for a human being to reason about those programs. So in large-scale code, people don't do that. In fact, the bigger people's programs get, the more closely they adhere to the immutable style, because that's what makes it possible for human beings to reason about the code. And if a human being can't reason about your program, the chances that a computer can reason about it are pretty low, at least right now. Maybe in two or three more years that won't be true, but they haven't figured that one out yet.
21:07 Michael: That's right.
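One way to get the "set everything up, then stop changing it" discipline in plain Python is a frozen dataclass. This is a minimal sketch with a hypothetical `ModelConfig`; it makes instance attributes read-only after construction, though it does not stop anyone from monkey-patching classes themselves.

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical settings for some computation.
    learning_rate: float
    iterations: int

# Phase 1: set everything up; mutation is fine here.
settings = {}
settings["learning_rate"] = 0.01
settings["iterations"] = 100

# Phase 2: freeze, then only read.
config = ModelConfig(**settings)

try:
    config.iterations = 5  # an attempted change while "running"
except FrozenInstanceError:
    print("config is read-only after setup")
```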
Continuous delivery isn't just a buzzword, it's a shift in productivity that will help your whole team become more efficient. With Snap CI's continuous delivery tool you can test, debug, and deploy your code quickly and reliably. Get your product into the hands of your users faster, and deploy from just about anywhere at any time.
And did you know that Thoughtworks literally wrote the book on continuous integration and continuous delivery? Connect Snap to your GitHub repo and they will build and run your first pipeline automagically.
Thank Snap CI for sponsoring this show by trying them for free at snap.ci/talkpython
22:06 Braxton: Yeah, with immutability, my initial reaction was that it was going to be impossible and would need new language constructs, but after we built the language out I realized, wow, this is basically Python; it didn't actually change that much. What we ended up with is that our language, which is called Fora, is now basically bitcode: we map Python down into Fora, and Fora is now the bitcode for all of this other stuff. It's a language for describing scalable parallel scripting, without needing to be a language that you would program in every day, because we have that: we have Python. So it's actually been a fairly nice transition for us. The reason I chose to do this came out of a conversation I had with someone who has done a lot of work with modified Python. If you had talked to me five years ago, I would have said, look, no one wants a language with weird, messed-up semantics; they want Python to be Python, and if you're going to give them something different, it should be its own very pure thing. But I talked to a guy named Mike Dubno, who was the CTO of Goldman Sachs for many years. He's a godfather of new-language work on Wall Street: the team that worked for him at Goldman in the 1980s has gone to all the other financial institutions in New York, and they've all reimplemented really interesting large-scale computation systems, usually with a language component to them.
23:32 Michael: Yeah, there is a lot of interesting language work happening on Wall Street, a friend of mine just got hired at a startup to build a new language, to express ideas just slightly differently, you know, and so that's cool.
23:43 Braxton: Yeah, and I bet if you traced it back, it would trace back to Mike Dubno at some level. He was the one who convinced me that Python with slightly modified semantics is something people are totally comfortable with. His evidence was that he built a system at Bank of America with hundreds of millions of lines of Python code. It's a big graph calculation thing: you make one node that represents the price of oil and another node that represents some trades some trader needs to have repriced, and you describe these things in Python. They can run on different machines from each other, but they kind of look like they're on one machine, and there are some rules about how that needs to work; it's not vanilla Python. His point was that Python programmers are totally happy having some constraints around their work as long as it's mostly Python, and his evidence was that there were hundreds of millions of lines of code written against this thing. I thought that was pretty good evidence, and that convinced me that-
24:41 Michael: That's good evidence, yeah.
24:41 Braxton: Right, that I could make a goal of this idea of restricted, immutable Python, and that it was something people could work in.
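A toy version of the dependency-graph idea: one node for the oil price, one for a trade that depends on it. This is a hypothetical, single-machine sketch; the class, trade size, and prices are invented for illustration and say nothing about how the Bank of America system was built.

```python
class Node:
    """A value in a dependency graph: either an input, or a function of
    other nodes. (Hypothetical illustration only.)"""

    def __init__(self, name, compute=None, inputs=(), value=None):
        self.name = name
        self.compute = compute
        self.inputs = tuple(inputs)
        self.value = value

    def evaluate(self):
        if self.compute is None:  # input node: just return its value
            return self.value
        # Derived node: evaluate each input, then apply this node's function.
        # The inputs don't depend on each other, so they could live elsewhere.
        return self.compute(*(node.evaluate() for node in self.inputs))

oil_price = Node("oil_price", value=80.0)
# Hypothetical trade: 1,000 barrels bought at 75.0.
trade_pnl = Node("trade_pnl", compute=lambda price: (price - 75.0) * 1000,
                 inputs=[oil_price])

print(trade_pnl.evaluate())
oil_price.value = 70.0  # new market data arrives; the trade is repriced
print(trade_pnl.evaluate())
```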
24:50 Michael: Yeah, and you were able to basically take those ideas and map them to your custom runtime right?
24:56 Braxton: Yeah, and it actually didn't even take that long. Most of the work has been around libraries and..., but we had the core mapping from the Python object model and runtime down into Fora done in just a few weeks, mostly because the object models are really not that different. Most of the pain is around all the little idiosyncrasies of Python: if you divide two integers in Python 2 you get an int, but if you divide them in Python 3 you get a float. Making sure all those little details work perfectly tends to be a lot of work, but that's not the heart of the intellectual problem, right? That's the detail work of getting a system that can run existing Python code and get the same answers you'd get in the regular Python interpreter.
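The Python 2 versus Python 3 division difference mentioned above, shown concretely. The outputs in the comments are Python 3 behavior.

```python
# Python 3: "/" is true division and always returns a float.
print(7 / 2)    # 3.5
print(6 / 3)    # 2.0 (still a float, even when the result is whole)

# "//" is floor division: for ints it gives the Python 2 style integer
# result, rounding toward negative infinity.
print(7 // 2)   # 3
print(-7 // 2)  # -4

# Python 2 code can opt into the Python 3 behavior with:
#   from __future__ import division
```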
25:40 Michael: That totally makes sense. There are a bunch of questions I have. The first one is: what kinds of problems are you guys solving, and what kinds of problems do you see other people solving with this system? Where are you focused on applying it?
25:54 Braxton: Sure. For the most part we are interested in data science and machine learning. Because of the immutable nature of this version of Python, it's really not a good fit for... programming tasks; I wouldn't write a real-time transaction processing system in it, because it's designed to take really big data sets and do really big computations on them efficiently. You can think of it as optimized for throughput of calculations, not for latency: anything you stick into this thing is going to take at least half a second, but the goal is to be able to use 10,000 cores to take something that might take five hours down to five seconds, or something like that. We're obviously good at solving all of the standard data-parallel things: if what you're saying is, I've got a bunch of data on Amazon S3 and I want to pull it into memory and parse it and slice it the way you would in pandas, all that kind of stuff works pretty well. The goal of the platform is to make it so you can write regular, idiomatic pandas code and just have it scale; we haven't done all of the functions yet, but we're working towards that. I think the place where you really see differentiation is when you have a more complicated algorithm, where you've got, as I said before, some kind of structure that doesn't fit nicely into map reduce. I'll give you an example of one project that we did.
One of our customers does Bayesian modeling of retail transactions. They have all of these transactions from different..., different vendors, and they build models to try to predict whether people are going to be good customers for these vendors in the future. They're trying to ask questions like: what does the fact that I can tell you go to Dunkin' Donuts every single morning for coffee say about your purchases at Home Depot? So they're interested in relationships across vendors, and you end up with a structure of observations where you can group the data by person or by vendor. You have variables that have to do with how much people care about a particular vendor, because the way people interact with amazon.com is very different from the way they interact with Lowe's, and you also have information about the individual person. This ends up wanting to suck about a hundred gigabytes of data into memory, keep it there, and then pass over that data in different orders, depending on whether you're updating the variables about the individuals or the variables about the vendors. That is very easy to describe in Python: you just have two different loops, one in one direction and one in the other, and you alternate back and forth between them as you update the model. This would have been very hard to express as pure map reduce, but the whole project is a couple of hundred lines of Python code, which is pretty nice.
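A hedged sketch of "two loops in different directions". The customer's actual Bayesian model isn't described, so this substitutes a much simpler additive model, amount ≈ person effect + vendor effect, fit by alternating passes (a technique often called backfitting). The transactions are made up. Within each pass, every person (or every vendor) is updated independently, which is where a runtime could parallelize.

```python
from collections import defaultdict

# Hypothetical transactions: (person, vendor, amount).
transactions = [
    ("ann", "coffee", 5.0), ("ann", "hardware", 45.0),
    ("bob", "coffee", 7.0), ("bob", "hardware", 47.0),
    ("cid", "coffee", 4.0),
]

def alternating_fit(rows, sweeps=50):
    """Fit amount ~ person_effect + vendor_effect by alternating two passes:
    one over the data grouped by person, one over it grouped by vendor.
    Only the sums person + vendor are pinned down, not each effect alone."""
    by_person = defaultdict(list)
    by_vendor = defaultdict(list)
    for p, v, amt in rows:
        by_person[p].append((v, amt))
        by_vendor[v].append((p, amt))

    person = {}
    vendor = defaultdict(float)  # vendor effects start at zero
    for _ in range(sweeps):
        # Pass 1 (by person): each person's update is independent of the others.
        person = {p: sum(a - vendor[v] for v, a in txs) / len(txs)
                  for p, txs in by_person.items()}
        # Pass 2 (by vendor): each vendor's update is independent of the others.
        vendor = {v: sum(a - person[p] for p, a in txs) / len(txs)
                  for v, txs in by_vendor.items()}
    return person, vendor

person, vendor = alternating_fit(transactions)
for p, v, amt in transactions:
    print(p, v, amt, round(person[p] + vendor[v], 6))
```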
28:43 Michael: And that's all distributed and whatnot, right, across these different cores and machines?
28:49 Braxton: Exactly. The reason it's only a couple of hundred lines of code is that all of the painful stuff, making it fast, getting the numeric likelihood calculation to be fast, doing gradient descent minimization, all of that, is handled by the infrastructure as opposed to being explicitly described by the user. The workflow is literally that you run a command-line tool that boots machines in AWS. We're big fans of spot instances in Amazon's infrastructure; if you're not familiar with them, they basically let you bid against the market price for compute. If the market price goes above your bid, you don't get the machines, but if it's below, you do, and it's usually around 10% of the regular cost of the machine, which means I can get 1,000 cores for something like $10, which is a pretty insane price for that level of hardware.
29:43 Michael: That's great, and spot instances are really good for when you want to spin up some stuff, do some work for 30 minutes and throw them away. It's less good for like running them indefinitely as a web server, right?
29:52 Braxton: Yeah, exactly, although there are people figuring out that if you spread your requests over enough different zones in Amazon, the chances that the price spikes in more than two or three of those zones are relatively low. I know there are people running real-time ad bidding networks entirely on spot, and the pricing is so cheap that it's feasible, but it takes a lot of careful design to make it work. For the kind of analytics workloads we're talking about, though, it's pretty perfect, because usually what you're hoping to do as a programmer is tune your algorithm until you like it at a small scale, then fire it off and have as many machines as are required to get the calculation done in an amount of time that lets you look at the answer and solve whatever business objective you have.
30:41 And so spot is perfect for that, because you can literally just say, "all right, how many machines do I need to get this done in half an hour?" You look at the workload, you look at the throughput, you divide, and it will just happen, and then you are not committed to holding onto that hardware for any longer than you need it. And at least in our case, because the back end infrastructure is fault tolerant, if the market price happens to change and you lose the machines, you can just go and raise your bid price or wait for the price to drop again and continue where you left off. You don't really lose anything from having the volatility of the pricing shut everything down.
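The back-of-the-envelope sizing described here (look at the workload, look at the throughput, divide) fits in a few lines of Python. This is a minimal illustration, not part of Ufora: the function name `machines_needed` and all the numbers are hypothetical, and it assumes the job parallelizes perfectly with no scheduling overhead.

```python
import math


def machines_needed(total_work_units: float,
                    units_per_core_per_second: float,
                    cores_per_machine: int,
                    deadline_seconds: float) -> int:
    """Smallest number of machines that finishes the workload before the
    deadline, assuming perfect parallelism and no scheduling overhead."""
    # Throughput of one whole machine, in work units per second.
    machine_throughput = units_per_core_per_second * cores_per_machine
    # Work a single machine can complete before the deadline.
    work_per_machine = machine_throughput * deadline_seconds
    # Round up: a fractional machine still means booting a whole one.
    return math.ceil(total_work_units / work_per_machine)


# Hypothetical workload: 1.8 billion records, 1,000 records/s per core,
# 32-core machines, and a half-hour (1,800 s) deadline.
print(machines_needed(1.8e9, 1_000, 32, 1_800))  # → 32
```

Rounding up matters: 31.25 machines' worth of work still needs 32 machines to meet the deadline.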
31:16 Michael: Yeah, that's really interesting from a distributed computing perspective. It sounds to me like, with the restriction to sort of read-only types of operations, not every package out there is going to be suitable, so can I go to your system and say pip install something and start working that way? Are there packages that will run, or does it have to kind of all be from scratch?
31:38 Braxton: It's certainly the case that if you do it from scratch you can control everything and you have the highest likelihood that it works. We adopted this hybrid approach where we said we are going to have two kinds of code: there is going to be code that we understand how to translate, and then there is going to be code where we know there is no hope of translating it, mostly because it's written in C. Like, if you look at the core of Numpy and Pandas, they are all written in C, there is nothing even to analyze there, so you don't need to do anything to make them fast, but you would need to do something in order to make them parallel.
32:09 So what we have is a library translation approach. We've rewritten the core of Numpy and Pandas in pure Python, and then whenever we see that you are using those libraries we replace your calls to the C versions of those libraries with calls to the pure Python versions, and that allows our infrastructure to see into the library and make it scale out. The downside is that we obviously have to translate any libraries that are written in C that way back into Python. It's not as much work as you'd think, because part of the reason those libraries are so complicated is in fact because they are written in C. Like, Numpy has to have separate implementations of every routine for floats and doubles and ints and everything else, but if you just write it as Python it's obviously a lot simpler, and then the compiler is responsible for generating all of those specializations.
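To illustrate why one pure Python routine can be simpler than its C counterpart, here is a hypothetical running-sum function written once in plain Python. It is not Ufora's translated Numpy code; it only shows that the same source works for ints, floats, and exact fractions, leaving a specializing compiler (assumed here, as described above) to generate the typed variants.

```python
from fractions import Fraction


def cumsum(values):
    """Running total, written once in plain Python.

    A C implementation typically carries one loop per element type
    (int32, int64, float32, float64, ...). This version works for any
    type that supports +, and a specializing compiler could derive the
    typed variants from this single source.
    """
    out = []
    for v in values:
        # First element starts the total; later ones add to the previous total.
        out.append(v if not out else out[-1] + v)
    return out


print(cumsum([1, 2, 3]))              # ints
print(cumsum([0.5, 0.25]))            # floats
print(cumsum([Fraction(1, 3)] * 3))   # exact rationals
```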
33:04 As a user this means that, you know, you are better off if your algorithm either uses really vanilla stuff that we've already translated, or if you are willing to write the algorithm in regular Python yourself. So most of the work where we've had really good success scaling has been in these cases where people wrote their own algorithm anyway because it didn't work with something off the shelf, but our vision over time is that we could get most of the major algorithms, like scikit-learn stuff, Pandas, Numpy, all of the core stuff that people use, translated back into pure Python in a way that works with the compiler. And it's pretty easy to do these translations, so, you know, one of the things that I'd say to the community is if you are using this system and you run into a function that is not there, go do the translation; in most cases it's pretty easy, and you can contribute it back, and it's a nice way of extending the system. We are working on that kind of thing all the time, and we just tend to do the functions as we run into them, as we need them for our other work.
This episode is brought to you by Opbeat. Opbeat is application monitoring for developers, it's performance monitoring, error logging, release tracking, and workflow in one simple product. Opbeat is integrated with your codebase and makes monitoring and debugging of your production apps much faster and your code better.
Opbeat is free for an unlimited number of users and starting today, December 1st Opbeat is announcing that their Flask support is graduating from beta to a full commercial product. Visit opbeat.com/flask to get started today.
34:57 Braxton: We also have plans, which should be released later this summer, for letting you run stuff out of process in regular Python, so that if you don't want to translate it, you can say look, here is some crazy model that somebody wrote, you can see I am never going to translate it, I don't want to run this model in parallel, I want to run thousands of copies of it. The distinction is I don't need this model itself to scale, I just need to have lots of different versions of it running on different machines. In that case we can let you run that out of process, basically using the same kind of thing that you would do if you were solving this problem by hand, which is multiprocessing, and this is basically the same approach that PySpark has taken, which is to say you chunk the problem up into smaller chunks yourself, we'll ship each slice of the problem to a different Python interpreter on a different machine, and it will just run in vanilla Python the way it runs. The upside of that approach is that it makes it possible to access all of this content that may never fit nicely into one of these models. The downside of that approach is that you end up with all of the headaches that are usually associated with things like PySpark, which is like, if that process runs out of memory and crashes, you don't have any idea what happened. PySpark can't know, we won't really be able to know why that happened, and then the responsibility is back on the user of those systems to figure out, well, why did it run out of memory, what do I need to do to get it set, stuff that we can make go away when we can actually see the source code itself and translate it. So I think it's kind of a necessary step to getting everything working for a lot of use cases.
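Here is a minimal sketch of the "run thousands of copies" pattern on a single machine, using the standard library's `multiprocessing.Pool`. `legacy_model` is a hypothetical stand-in for an opaque model, and shipping slices to other machines, which the planned Ufora feature would handle, is out of scope.

```python
from multiprocessing import Pool


def legacy_model(params):
    """Hypothetical stand-in for an opaque model we don't want to translate.
    Each call is independent and only needs its own parameters."""
    growth, years = params
    value = 100.0
    for _ in range(years):
        value *= 1.0 + growth
    return value


def run_sweep(param_grid, processes=4, chunksize=8):
    """Run many independent copies of the model in separate interpreter
    processes. `chunksize` hands each worker a slice of the parameter grid
    per task instead of one item at a time."""
    with Pool(processes=processes) as pool:
        return pool.map(legacy_model, param_grid, chunksize=chunksize)


if __name__ == "__main__":
    # 50 growth rates from 1% to 50%, each simulated over 10 years.
    grid = [(g / 100, 10) for g in range(1, 51)]
    results = run_sweep(grid)
    print(len(results), round(max(results), 2))
```

As the conversation notes, this buys parallelism across copies but not visibility into them: if a worker process dies, the pool can only report that it failed, not why.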
36:34 Michael: Yes, absolutely. Because you don't want to try to translate the long tail of all these libraries and stuff. So, talk to me about how you share memory. For example, if I've got a list in memory and, as far as my program is concerned, it has like a billion objects in it, what does that look like to the system really?
36:55 Braxton: Sure. This is a super interesting question. So, the idea is we put it in chunks, each chunk being a reasonably small amount of memory, like 50 or 100 MB, and then these chunks are scattered throughout memory. So imagine that you had a billion strings and each string is on average a kilobyte, so you've got a terabyte of stuff in that list, but the strings are different lengths, right, so some chunks might have 50 000 records and some other chunks might have 5000 records, and some might have 500 000 records, depending on how your data actually looks. So the system chunks it up, and then as your program is running, it thinks about this sort of at the page level. It says ok, I can see your program is running, and depending on what it's doing, it needs different blocks of the data to be located together on the same machine.
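The chunking idea, splitting by memory size rather than by record count, can be sketched like this. The function name, the byte limit, and the greedy policy are illustrative assumptions; the episode only says chunks are roughly 50 to 100 MB.

```python
from typing import List


def chunk_by_bytes(records: List[str], max_chunk_bytes: int) -> List[List[str]]:
    """Split records into consecutive chunks of at most `max_chunk_bytes`
    (UTF-8 encoded size). Chunks hold similar amounts of memory but very
    different numbers of records. A record larger than the limit gets a
    chunk of its own."""
    chunks: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for rec in records:
        size = len(rec.encode("utf-8"))
        # Close the current chunk if this record would push it over the limit.
        if current and current_bytes + size > max_chunk_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(rec)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Short and long strings end up with very different records-per-chunk.
data = ["x" * 10] * 100 + ["y" * 1_000] * 10
print([len(c) for c in chunk_by_bytes(data, 2_000)])  # → [101, 2, 2, 2, 2, 1]
```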
37:49 So if you do a really simple thing, like imagine you say ok, let's make a list comprehension: I'll scan over my list of strings and take the length of each one. Well, that's a really simple operation, I don't need anything more than each individual string once to do that, so it doesn't have to move any data around, it just literally goes to every chunk and says ok, take that chunk and apply the length function to it, and now you get back a bunch of integers. But you could do something more complicated. You could say all right, imagine that these strings are log messages, and the first tens of them are from one machine and the second tens are from another machine. You might say ok, I am actually going to go figure out the indices of machines, and I want to do something where I am scanning through them and looking at them at different time stamps across all the different machines, and so now you actually have a more complicated relationship. You need different chunks: you might need the first 100 000 strings and strings 500 million to 503 million, you might need those two blocks together because you are actually doing something with the two of them.
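The two access patterns can be contrasted with some simple bookkeeping. One is a per-chunk map that never moves data. The other is a lookup of which chunks a set of global record indices touches, meaning which chunks would need to be co-located. This is a hypothetical sketch, not Ufora's scheduler.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_starts(chunks):
    """Global index of the first record in each chunk."""
    return [0] + list(accumulate(len(c) for c in chunks))[:-1]


def chunk_of(index, starts):
    """Which chunk holds the record at global position `index`."""
    return bisect_right(starts, index) - 1


def lengths_per_chunk(chunks):
    """Easy case: each chunk is processed where it already lives, so no data moves."""
    return [[len(s) for s in chunk] for chunk in chunks]


def chunks_needed(indices, starts):
    """Harder case: a computation reading these records needs every one of
    these chunks on the same machine at the same time."""
    return {chunk_of(i, starts) for i in indices}


chunks = [["a", "bb"], ["ccc"], ["dddd", "e", "ff"]]
starts = chunk_starts(chunks)             # [0, 2, 3]
print(lengths_per_chunk(chunks))          # [[1, 2], [3], [4, 1, 2]]
print(chunks_needed([0, 1, 4], starts))   # {0, 2}
```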
38:51 Michael: Right, one partition might be really evenly distributed across the machines, and you ask the question the right way and it just parallelizes perfectly, but you might cross-cut it really badly, right?
39:01 Braxton: There are two problems here, right. One of them is, did you write something where I can get enough data in memory to solve your problem without having to load a lot of stuff, and then the other problem is which things need to be together. So what I was just describing is a case where, if you just naively put the chunks on different machines, it's very easy to ask a query where the chunks are in the wrong place, and our system can handle that, right. It will see, like, oh, you are accessing these two chunks together because you are using both of them. That problem it solves well, because it can actually see that co-location and break the problem down into basically an active working set of pages, and then we shuttle everything around in memory to meet the requirements of the problem. There is this second issue, which is like, imagine you said look, I've got a terabyte of strings, I am going to start indexing randomly into that terabyte of strings and you are not going to be able to predict anything about what I am going to do, I am just going to grab them randomly in sequence. You know, there is kind of no way to make that fast. What you will end up doing is repeatedly waiting on the network while the system goes and fetches string 1 million and string 99 million and then whatever, it's going around and grabbing, and this is just because you've written an algorithm that doesn't have any 40:15 to it whatsoever, and so there isn't really a good solution around that other than trying to minimize the amount of network latency. But the way I try to articulate it to people is to think about your program so that in your mind you could break it down into sets of Python function calls that use a couple of gigs of data at a time, and it will figure out how to get those gigs of data to the right place. It will see what that structure is, basically by actually running your program and seeing where you have cache misses.
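A small simulation makes the locality argument concrete. It counts the misses of a least-recently-used page cache for a sequential scan and for random indexing. The page and cache sizes, and treating each miss as one network fetch, are assumptions for illustration; Ufora's actual paging policy isn't described in the episode.

```python
import random
from collections import OrderedDict


def count_misses(page_accesses, cache_pages):
    """Count fetches made by an LRU cache that holds `cache_pages` pages.
    Each miss stands in for a round trip over the network."""
    cache = OrderedDict()
    misses = 0
    for page in page_accesses:
        if page in cache:
            cache.move_to_end(page)          # mark as most recently used
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict least recently used
    return misses


N_RECORDS = 200_000
RECORDS_PER_PAGE = 1_000   # 200 pages in total
CACHE_PAGES = 20           # only 10% of the data fits locally

# Sequential scan: consecutive records share a page, so each page is fetched once.
sequential = (i // RECORDS_PER_PAGE for i in range(N_RECORDS))
# Random indexing: roughly 90% of accesses land on a page that isn't cached.
rng = random.Random(0)
scattered = (rng.randrange(N_RECORDS) // RECORDS_PER_PAGE for _ in range(N_RECORDS))

print("sequential misses:", count_misses(sequential, CACHE_PAGES))
print("random misses:    ", count_misses(scattered, CACHE_PAGES))
```

The sequential scan misses once per page (200 times). The random pattern misses on the order of 180,000 times for the same amount of data, which is the "repeatedly waiting on the network" case.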
40:49 Michael: Yeah, that's pretty interesting, because one of the things you can do to make your code much more high performance, especially around parallelism but even in general, is to think about cache misses on the CPU cache, right, like locality of data, things like that, and that's running on my own machine, but it's interesting that that also applies in the distributed sense for you guys.
41:13 Braxton: Yeah, in many senses it's exactly the same problem, it's just that it's a much much much worse problem. Like, when you cache miss on L1 cache in your CPU, you are doing something and the data is not present, and it's like 100 CPU cycles to go and fetch that data from main memory, which is much slower than what your CPU is used to, but that's still incredibly fast. Like, if you have a job running on one computer on a network and it says hey, I need this 50 MB of data on another machine, go get that for me, if you have 10 Gb Ethernet that means you can move like 1 GB per second on that network between two machines, so that's like a 20th of a second that you are going to have to wait to get 50 MB. That's 4 orders of magnitude, 5, I don't even know, some stupidly larger amount of time to wait for that data to be present. So it's the same basic problem, it's just that you're way more sensitive to the cache locality issue when you are operating in a distributed context, and it's one of the reasons why the map reduce / Spark model has been so successful. Like, you are giving the framework a very clear idea of what the cache locality is going to be: when you say take this function and run it on every element of this data set, what you are implicitly saying is copy the data for this function to every single machine before you do the job, and now you'll never be waiting for any data. And we are saying look, you can take that even a step further, you can infer which data actually needs to go where for arbitrarily complicated patterns, and make sure that you are not waiting on cache misses, but it all boils down to trying to fit the program into a model where threads are not waiting on data, where you are getting the most out of the CPU that you possibly can.
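The latency comparison can be checked with rough arithmetic. The 3 GHz clock is an assumption, and 10 Gb/s works out to about 1.25 GB/s before protocol overhead, slightly better than the "1 GB per second" figure used in conversation. The calculation compares one main-memory miss with fetching a whole 50 MB chunk, so it measures how long a single stall lasts, not cost per byte.

```python
# Order-of-magnitude comparison; clock speed and link rate are assumptions.
CPU_HZ = 3e9                            # assumed 3 GHz core
ram_miss_s = 100 / CPU_HZ               # ~100 cycles to reach main memory

LINK_BITS_PER_S = 10e9                  # 10 Gb/s Ethernet
link_bytes_per_s = LINK_BITS_PER_S / 8  # ~1.25 GB/s, ignoring protocol overhead
CHUNK_BYTES = 50e6                      # one 50 MB chunk
network_fetch_s = CHUNK_BYTES / link_bytes_per_s

print(f"main-memory miss: {ram_miss_s * 1e9:.0f} ns")
print(f"50 MB over 10 Gb/s: {network_fetch_s * 1e3:.0f} ms")
print(f"ratio: {network_fetch_s / ram_miss_s:.1e}")
```

Under these assumptions the fetch takes about 40 ms against roughly 33 ns for the memory miss, a ratio of around 1.2 million, or about six orders of magnitude.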
43:00 Michael: Ok, yeah, that makes a lot of sense. So, let me ask you a little bit about your business model, Ufora is open source, you can go to github.com/ufora and check it out, right, but obviously you guys are doing this as a job so what's the story there?
43:17 Braxton: Sure, so we are basically a data science and engineering consultancy at this point. We deploy Ufora as part of our work for our clients. We open sourced it because we wanted the technology to get into as many people's hands as possible. You know, I didn't leave the hedgefund industry in order to purely just make a pile of money, right, I wanted to build stuff that people can use, that would enable the world to do interesting things, so we put it out there, and we are hoping that other people will pick it up. But our business model mostly revolves around us actually doing data science and engineering work for our clients, and so in some cases the Ufora platform is really only 25 or 30 percent of the solution, because you have algorithmic work that needs to be done regardless of what platform you are doing it in, and then you have data integration work that needs to be done making all of these various databases 44:16 to each other.
44:16 And so the infrastructure is sort of part of it. I think over time we're anticipating building out additional products and services on top of the Ufora platform and selling them as standalone applications, but infrastructure like this really wants to be open sourced. It gets very hard to charge for it successfully, because you are asking people to take infrastructure and then build a lot of things on top of it, and nobody likes to do that if they don't actually control the infrastructure. And it's also one of those things that really benefits from community involvement. Again, like this idea of library work: we as a small company are not going to be able to port every library, but if I put this infrastructure out there and it's useful for people, then each marginal developer can look at it and say hey, there is some piece of content in Numpy or some crazy scikit-learn algorithm that I am missing, it's not that much work for me to go port that one thing and contribute it back. And that spirit of everybody working together is the kind of thing that makes these big platform systems actually run, which is, I think again, why you've seen so much of this kind of platform infrastructure in open source.
45:29 Michael: That sounds really nice. I think the one thing I could see that you guys could directly charge for is if you actually managed the servers and you kind of sold it as computation as a service or something like this.
45:43 Braxton: I thought about that, but it's a little like, there is a company called PiCloud that 45:48 a bunch of venture funding-
45:51 Michael: Yeah, I met some of those guys, yeah.
45:51 Braxton: Yeah, and their stuff is super awesome, and I think they ultimately shut that down because they didn't feel like they could generate the kind of return that they wanted. And you know, the Databricks guys are doing Spark hosted as a service, but then Google came out with Spark hosted as a service, and Google's cost to be able to provide that infrastructure is super low. So I looked at it, and it's a nice idea, but I think that you just run into this problem where if you are successful, Amazon and Google will run the same thing and their cost advantage will eat your lunch. I think in enterprise land there is a lot more value in solving the problem. Most of the people listening to this podcast are probably big users of open source software, and a lot of them will pick this kind of technology up to use personally, but when you bring this kind of technology into the enterprise you need all these additional support services that regular people don't need: it needs to integrate with some crazy Microsoft legacy product from ten years ago, it needs to be able to talk to all kinds of funny databases that startups don't have, as an example. I think that a lot of the monetization in open source happens around those kinds of problems.
47:04 Michael: Yeah, I think you're right, enterprise definitely has a lot of its own special challenges let's say.
47:08 Braxton: Right. And those challenges, they don't affect adoption, they affect how big organizations like 47:16 and lock this kind of technology down, and so it makes sense to basically build products and services around that and to try and make the core infrastructure as widely used as possible in the community, so that it's as good as it can be.
47:30 Michael: Yeah, ok, great. So, you said you have some interesting things going on and sort of what's coming as well, and one of them is that you've recently added IPython notebook integration, right?
47:39 Braxton: Yeah, so this has been on our to do list for a while. It turns out it's actually not that much work, you just need to make sure that you understand where to get the source code, because instead of living on disk like most Python modules 47:54 in memory in these specialized IPython notebook cells. We also did some work around getting feedback from the cluster, so you can fire something off now and it will tell you, while the thing is running, you are using 500 cores, and you can see that number go up and down as it parallelizes your computation. Because you can write something that is naturally single threaded, and you'll get feedback saying hey, this thing that you wrote is really only using one core because you've got some loop that can't be parallelized, something like that. And this is, I think, a place where we are planning on doing more, giving more feedback, more like profiling tools and stuff like that.
48:29 So as it's running your calculation you can see, ok, it's really spending a lot of time inside these functions, these places are thrashing because they're using too much data, or whatever. So I think there is a lot that we can do there to help give more feedback to people about what's going on in their calculations. This is a common problem in any distributed programming context, which is like, when you are working in a regular Python interpreter you can just put in print statements and everything is good, but when you are working across hundreds of machines, even if you could get all the print statements you would be drowning in print statements, you wouldn't be able to get anything productive out of it, and it can be really hard for people to understand, ok, in a distributed context, why is this slow or fast. So I think there is a lot to do to give people more clarity.
49:16 And I think the other thing that we are working on right now that is super interesting is we just finished doing something where we make an estimate of how long your computation is going to run the fora 49:31. So what we do is, as we are running your computation, we are constantly breaking it up into little pieces, and we will actually look at what functions you are calling and what values are on the stack. So, you know, I can notice if you call f with the number 10 it takes one second, and if you call it with the number 1000 it takes 100 seconds, and I could build a little model for 49:49 all your function calls about how long they take. This means that on subsequent runs it can actually make a projection for how long the different pieces of the computation are going to take, and this is useful in two ways. It's useful for the scheduler, because now it can say ok, well, this super long computation, I better schedule that first because that will improve the overall runtime of the computation; it gives it a sense of how to spread the calculations across the system better. But it's also useful in the long run, where I should be able to give you accurate estimates of how long your calculation is going to take from the beginning, and then actually make recommendations to you about how much hardware you should use, which would move us towards a more automated way of using the cloud where, instead of booting machines and thinking of them as your 50:35, you just literally fire the thing off and think purely in terms of cost.
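One way to sketch the runtime model and the "schedule the long computation first" idea is in two steps. First, fit a power law to observed (input size, seconds) pairs in log-log space. Second, assign the predicted tasks longest-first to the least-loaded worker, which is the classic LPT heuristic. Both the power-law form and LPT are assumptions chosen for illustration; the episode doesn't say which model or scheduler Ufora uses.

```python
import heapq
import math


def fit_power_law(observations):
    """Least-squares fit of seconds ≈ c * n**k in log-log space.
    `observations` is a list of (input size n, measured seconds) pairs
    with at least two distinct sizes."""
    xs = [math.log(n) for n, _ in observations]
    ys = [math.log(t) for _, t in observations]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    k = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    c = math.exp(y_bar - k * x_bar)
    return c, k


def schedule_longest_first(predicted_seconds, workers):
    """Longest-processing-time-first: give each task, longest first, to the
    currently least-loaded worker. Returns the predicted makespan in seconds."""
    loads = [0.0] * workers          # a list of zeros is already a valid min-heap
    for secs in sorted(predicted_seconds.values(), reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + secs)
    return max(loads)


# The f(10) -> 1 s, f(1000) -> 100 s example from the conversation.
c, k = fit_power_law([(10, 1.0), (1000, 100.0)])
tasks = {f"f({n})": c * n ** k for n in (10, 100, 1000, 2000)}
print(f"k = {k:.2f}, makespan on 2 workers = {schedule_longest_first(tasks, 2):.0f} s")
```

With those two data points the fit gives k = 1 (runtime grows linearly) and c = 0.1, so f(2000) is predicted at about 200 seconds and is scheduled first.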
50:40 Michael: That's great, so you'd have some machine that would just be sort of in charge of orchestrating this, and you say go run this job, and it would go all right, that's 100 spot instances at this price, or this many cores, and go, something like that?
50:53 Braxton: Yeah, exactly. Like, it could actually come back to you and say I think this is going to take 100 000 51:01 compute seconds and that it's parallelizable up to 1000 cores, so this is how much it's going to cost, and you could think of it totally that way and then really completely abstract away the actual machines. But you can't really do that until you have a good estimate, because the problem with computer programs is that you can keep adding zeros and quickly get to runtimes that are astronomical, so you don't want to do that naively, because then you'll end up with some system that just says hey, I am just going to run on a million cores, and it'll send you an astronomical Amazon bill, which you really don't want. So actually getting a good estimate is pretty important if you are going to make that feature work. But that model also has incredibly useful features, because it's essentially a profile of your code. It can tell you hey, by the way, the reason why your program is taking so long is that when you are calling this function it's taking 3 000 compute seconds, and you think oh, I should go look at that and maybe see why it is that it's so slow.
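The cost estimate described here, with a guard against a runaway bill, might look like the sketch below. It assumes perfect parallelism and billing per core-second at a flat, hypothetical price. Real cloud billing is coarser (per instance, with minimum increments and spot fluctuation).

```python
def estimate_job(compute_seconds, max_parallel_cores, price_per_core_hour, budget):
    """Turn a predicted workload into (wall-clock seconds, dollar cost),
    refusing to launch when the estimate exceeds the budget.
    Assumes perfect parallelism and billing per core-second at a flat price."""
    wall_clock_s = compute_seconds / max_parallel_cores
    cost = compute_seconds / 3600 * price_per_core_hour
    if cost > budget:
        raise ValueError(f"estimated ${cost:,.2f} exceeds budget ${budget:,.2f}")
    return wall_clock_s, cost


# 100,000 compute seconds, parallelizable up to 1,000 cores, at a
# hypothetical $0.036 per core-hour.
wall, cost = estimate_job(100_000, 1_000, 0.036, budget=10.0)
print(f"{wall:.0f} s wall clock, ${cost:.2f}")
```

Under these assumptions the dollar cost doesn't depend on the core count. Adding cores only shortens the wall-clock time, so the estimate of total compute seconds is what protects you from the astronomical bill.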
51:56 Michael: Yeah, that's really cool. And so far we have been talking about running this on CPUs on dedicated VMs in the cloud. But if you are going to do computational stuff, like pure math type stuff, some of the fastest hardware you can get hold of are actually graphics processing units, GPUs, right?
52:12 Braxton: Yeah, absolutely, I think that that's the thing that's driving the current 52:20 right now. In fact, these graphics processors aren't just good for playing video games, they are good for doing general math, and in many cases they can do several hundred times more math operations than a regular CPU can. The biggest issue with them is that they are really challenging to program. Unlike your CPU, where each of the cores on your machine can do a completely different thing at once and you can just program as if they are totally independent of each other, the threads on a GPU all have to do exactly the same thing simultaneously. They can kind of try and hide that from you a little bit, but as soon as you do something the GPU doesn't like, it suddenly gets hundreds of times slower and there is no benefit to using it, and it's really hard to write code 53:05. And one of the things we are spending a lot of time on at Ufora is thinking about how to get Python code to run natively on GPUs and to solve these programming problems. The idea is that, in the same way that we are able to make programs fast, scaling out on a cluster by actually running them and learning from the way they are behaving, like which functions are slow and how they are accessing data, we can apply those same techniques to solve some of the problems that people have.
53:33 So, as an example, we can identify places where threads are in fact doing something together in lock step, and say hey, you know, we've noticed that this is a good piece of code to run on a GPU, and schedule that automatically. And then there are more aggressive program transformations you can do: you can detect that if you modify the program slightly you would end up with something that actually would run efficiently on a GPU. I was reading some blog post by 54:03 where he pointed out that if you move an array from local memory to shared memory in a GPU, suddenly it speeds up 20 times, which is an enormous performance difference, right, and I didn't even remember, while I was reading it, what the difference between local and shared memory is. My point is that this is the kind of thing that an optimizer that actually had a statistical model of your program would be perfectly capable of doing in an automated way, which would free you from the burden of trying to figure that out, and in some cases it might do optimizations that no programmer had ever thought of, because it's such a hard thing to optimize and they didn't realize wow, if I move this little thing over it's going to be faster. So, we are just getting started with this, but I think there is an enormous amount that we can do, and you should expect to see more commits related to that coming out over the summer.
54:50 Michael: Yeah, I think that's really amazing, and just for those of you out there who don't know, you can go to Amazon AWS EC2 and say I would like to get a GPU cluster with Nvidia or ATI Radeon or whatever type of graphics cards, right?
55:05 Braxton: Yeah, so what is even better, you can do this with 55:08, so you can go and get machines that have 4 Teslas on them and you can pay like 25 cents an hour for that, and so you can get a hundred machines with 4 Teslas on them for 25 dollars an hour. Which is just a stupid amount of computing hardware. The GPU instance prices on Amazon fluctuate a little bit, because I think there are some researchers doing a whole lot of deep learning research on there, so it's actually kind of funny, sometimes the smaller GPU instances actually cost a lot more than larger ones because there is pressure on pricing for those. But it is amazing, and a lot of our work is focused on this idea that not only should it be easy to get those machines, but it should be easy to use a hundred GPUs without having to think about it, right. There is actually some great software out there right now for using a single GPU effectively, but what happens when you now say ok, well, I have data on one GPU and it wants to talk to data on a completely different GPU on a completely different machine, right? That's the thing that I am trying to make totally transparent.
56:13 Michael: That's a really awesome problem. Obviously the GPUs are fast, and this would make it possible for you to get answers to your questions sooner, but would it actually make it cheaper as well, to answer your questions in general?
56:26 Braxton: No question that there is a whole host of problems where, if you move them from CPU to GPU, your total cost of computation goes down by a couple of orders of magnitude. So actually there is an example, this ma 56:39 calculation I was doing on the retail stuff for one of our customers. One of the reasons we are pushing into the GPU space is that we currently use about 40 Amazon machines, the really big instances, to get their calculations done in, I think, about an hour, but you do this very frequently to update the model when your data comes in, and, you know, that costs them $10 every time they run it, and if they do that 4 times a day every day for a long time, that starts to become real money again. We have estimated that it would be about 5% of the cost if we can get that calculation to run as efficiently as we think it could using GPUs. Now the thing with that is, as soon as you do that, it's not clear that people will pocket the savings; they may just reinvest it, and look, that's great if they want to do it.
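Here is the "real money" point in numbers. It uses the $10 per run and 5% figures from the conversation and assumes 4 runs a day, 365 days a year.

```python
RUNS_PER_DAY = 4
DAYS_PER_YEAR = 365
CPU_COST_PER_RUN = 10.0   # dollars per run on ~40 large CPU instances
GPU_FRACTION = 0.05       # estimated GPU cost as a share of the CPU cost

cpu_per_year = CPU_COST_PER_RUN * RUNS_PER_DAY * DAYS_PER_YEAR
gpu_per_year = cpu_per_year * GPU_FRACTION
print(f"CPU: ${cpu_per_year:,.0f}/year")
print(f"GPU: ${gpu_per_year:,.0f}/year ({1 / GPU_FRACTION:.0f}x cheaper)")
```

That comes to about $14,600 a year on CPUs versus roughly $730 on GPUs. Paying 5% of the cost is the same as being 20 times cheaper.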
57:28 Michael: Yeah, that's interesting. So 20 times cheaper, when you say 5%?
57:32 Braxton: Yeah. Something like that.
57:33 Michael: Yeah, that's awesome.
57:34 Braxton: I mean, it's extremely dependent on the problem. Sometimes it's only like two times faster, sometimes it's slower, and sometimes, like look at the neural network stuff that people are doing, it's at least 100 times faster or cheaper, and the neural network people all think in terms of electricity cost, right, they literally say how many training cycles they can do per kilowatt hour.
57:53 Michael: Wow, ok, that's really amazing. So, is there a way with your system to sort of extrapolate and estimate how much something would cost? Is there a way to say I have these 3 problems, if I were to give each one this much data, how much would each one of these cost, because I can only afford to answer one?
58:12 Braxton: Well, so that's one of the things that we are hoping to answer with some of this extrapolated runtime stuff, this ability to predict what the runtime is actually going to be and how much compute power you need. So the idea would be that you would run all three problems at several smaller scales so the system could see how the runtime of all the little functions was changing as you are changing the problem size, and then it would be able to extrapolate from there. At the end of the day, though, it all depends on how accurate you want your estimate to be. If you really want a perfect estimate, you should actually do that experiment yourself and make a decision, and honestly the way I usually do this is by doing that process by hand: I'll run it with some smaller input and keep adding zeros until it becomes slow, and then I'll try to understand why it's changing. But yeah, in principle, that's totally a direction that we are moving in, where you can literally just get a price estimate for each of the three things, and it might be plus or minus two times or whatever, depending on how accurate the estimates are.
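The "keep adding zeros" approach can be automated roughly like this. Time a few runs at growing sizes, estimate the scaling exponent from the two largest runs, extrapolate to the target size, and report a price band. The timings, the price, and the ±2x band are illustrative assumptions, not measurements from the episode.

```python
import math


def extrapolate_seconds(runs, target_n):
    """Extrapolate runtime to `target_n` from timed runs at growing sizes,
    using the scaling exponent between the two largest runs."""
    (n1, t1), (n2, t2) = sorted(runs)[-2:]
    k = math.log(t2 / t1) / math.log(n2 / n1)
    return t2 * (target_n / n2) ** k


def price_band(seconds, price_per_core_hour, slack=2.0):
    """Point estimate plus a multiplicative error band (e.g. ±2x)."""
    cost = seconds / 3600 * price_per_core_hour
    return cost / slack, cost, cost * slack


# Hypothetical timings for three problems, each run at small sizes first.
problems = {
    "A (linear)":    [(10_000, 2.0), (100_000, 20.0), (1_000_000, 200.0)],
    "B (quadratic)": [(10_000, 0.1), (100_000, 10.0)],
    "C (n log n)":   [(10_000, 1.0), (100_000, 12.5), (1_000_000, 150.0)],
}
TARGET_N = 10_000_000
PRICE = 0.036  # hypothetical dollars per core-hour

for name, runs in problems.items():
    secs = extrapolate_seconds(runs, TARGET_N)
    low, mid, high = price_band(secs, PRICE)
    print(f"{name}: ~{secs:,.0f} s, ${low:,.2f} to ${high:,.2f} (point ${mid:,.2f})")
```

Using only the two largest runs favors the regime closest to the target size. That is also why the n log n case comes out slightly high: its local exponent keeps shrinking as n grows.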
59:19 Michael: Yeah. Ok, that sounds great. So, I guess the last thing we have time to cover is how you get started. I know it's on github, but there are a lot of moving pieces with distributed computing, so can you just walk me through going from nothing until I have an answer, what that looks like?
59:35 Braxton: Yeah, absolutely. So we publish Docker images with the software backend every time we do a release. If you want to run this on your local machine, you pip install the front end, which is called PyFora, and that's just vanilla Python code that has the thing that takes your Python code and sends it to the server. And then you need to get some nodes that can actually run work, and you've got two options. One of them is to run it just on your local machine, and doing this gets you the benefit of using the cores on your local machine, so it's not trivial. To do that you just do docker run ufora/service:latest, which pulls the latest version of the Ufora service and runs it. And if you want to run this on Amazon, which is the way I do everything, we have a little command line utility called PyFora AWS, and that thing can just boot instances. You have to have your AWS credentials exposed in the environment the same way you would have if you were using Boto to interact with AWS.
60:35 But if you do that, it knows how to start machines, stop them, and it will make sure everything is configured on there correctly. You do need to make some decisions about the security model. Personally I prefer to boot machines and then use SSH tunneling, so then I am connecting to a server that looks like it's on localhost and SSH is taking care of all of the security, but there are other ways to do it that are documented in PyFora AWS. The basic idea is that you use AWS or you use Docker locally to get one of these machines going, and that gives you an IP address that you can talk to. In your Python program you connect to that IP address, you get a little connection object, and then any code that you want to execute in Ufora you just put inside of a with block that references that connection. So you say "with my connection:" and then anything inside of there, when the Python interpreter gets to it, instead of executing it, it will pick up that code and the objects it uses, ship them over to the server, and the server will execute them.
61:40 Now that we have IPython notebook integration, I am going to be publishing a bunch of example IPython notebooks, so you should be able to go to our website and pull down a few examples of those things in operation. But the basic point is that if you are already a user of Amazon, you can get up and running in the amount of time it takes to boot an AWS instance and pip install PyFora.
62:04 Michael: Wow, that sounds really cool. Nice work on that, and props for using the context manager and the with block.
62:10 Braxton: It turns out it's a really nice way of doing it. You have to do some clever trickery under the hood to make that work, especially when it comes time to propagate exceptions, because if you produce an exception on the server, you've got to move that state back over and then rebuild the appropriate stack trace objects on the client, which is kind of an interesting thing to do. But the end result is a really nice integration, and it gives you the ability to pick and choose where you want to use the technology. It means that if there are parts of your code that are never going to be parallelized, you don't want to move them; if you are touching files or reading things off of the internet or whatever, that stuff can all stay on your local box, and just the heavy compute stuff happens remotely. It also gives you a nice way of deciding which objects are going to live remotely and which objects are going to live locally. Think about the example we were talking about earlier, where we have a list of a billion strings. Inside a with block you can do whatever you want with that list of a billion strings. In your local Python interpreter you will end up with a reference to that list, a kind of proxy object, and you can't do anything with it locally, because you obviously can't bring a terabyte of strings back into your local Python process without crashing it or waiting for an hour. But you can pass it back into a subsequent with block and do something there, cut it down to something smaller if you want to pull back a slice of it, or whatever. So it makes for a really nice way of describing which things are going to be in your local process and which things are going to be remote.
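The two ideas in that answer, rebuilding a server-side exception on the client and handing back a proxy instead of a huge result, can be sketched in a few lines. This is a hypothetical, in-process illustration, not Ufora's implementation: `ToyServer`, `RemoteRef`, `RemoteError` and `remote_call` are made-up names, and the network transport and serialization are omitted so the bookkeeping stays visible.

```python
import itertools
import traceback


class RemoteError(Exception):
    """Hypothetical client-side exception carrying the server's traceback text."""

    def __init__(self, type_name, message, remote_traceback):
        super().__init__(f"{type_name}: {message}")
        self.type_name = type_name
        self.remote_traceback = remote_traceback


class RemoteRef:
    """Hypothetical proxy: a handle to an object that stays on the server."""

    def __init__(self, ref_id):
        self.ref_id = ref_id

    def __repr__(self):
        return f"<RemoteRef {self.ref_id}>"


class ToyServer:
    """Hypothetical server that keeps results and hands out handles to them."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def call(self, func, *args):
        # Swap any proxies back for the real server-side objects.
        real_args = [self._objects[a.ref_id] if isinstance(a, RemoteRef) else a
                     for a in args]
        try:
            result = func(*real_args)
        except Exception as exc:
            # Package the failure as plain data so it could cross the wire.
            return ("error", type(exc).__name__, str(exc), traceback.format_exc())
        ref_id = next(self._ids)
        self._objects[ref_id] = result  # the result itself never leaves the server
        return ("ok", RemoteRef(ref_id))

    def fetch(self, ref):
        # Explicitly pull a (hopefully small) object back to the client.
        return self._objects[ref.ref_id]


def remote_call(server, func, *args):
    """Client side: return a proxy, or re-raise the remote failure locally."""
    status, *rest = server.call(func, *args)
    if status == "error":
        raise RemoteError(*rest)
    return rest[0]


if __name__ == "__main__":
    server = ToyServer()
    big = remote_call(server, lambda n: [str(i) for i in range(n)], 100_000)
    head = remote_call(server, lambda xs: xs[:3], big)  # slicing stays remote
    print(server.fetch(head))  # prints ['0', '1', '2']
    try:
        remote_call(server, lambda xs: xs[10**9], big)
    except RemoteError as err:
        print(err)  # prints IndexError: list index out of range
```

The client only ever holds `RemoteRef` handles until it explicitly fetches something small, which mirrors the "list of a billion strings" workflow described above.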
63:48 Michael: That sounds awesome. This is really cool; if you've got big data processing, it sounds like people should check it out. I think we are going to have to leave it there, we are just about out of time. Let me ask you a couple of questions to wrap things up. I always ask my guests what their favorite PyPI package is. There are 80 thousand of them out there, and we all get experience with different ones that we'd want to tell people about. What's yours?
64:12 Braxton: Well, I'd hate to not be super interesting on this, but I have to say I love Pandas. I think Wes did a great job. I would argue that the resurgence of Python in the financial services community is basically due to the existence of Pandas, pulling people out of the R mindset and into Python. So if you are not familiar with it, check it out, but probably everybody listening to this knows about it.
64:34 Michael: Yeah, Pandas is very cool. And, how about an editor, what do you open up if you are going to write some code?
64:38 Braxton: I use Sublime. I even paid for it, although I didn't have the energy to actually paste in the key, so it keeps asking me for it. But I did pay for it; I think they did a great job with that editor.
64:50 Michael: Yes, Sublime is great. All right, so final call to action: what should people do to get started with Ufora?
64:56 Braxton: Go and check us out on GitHub and try running some code. We try to make it as easy as possible to get started, and as I said before, we are going to be posting a bunch of IPython notebooks with examples. And then please give us feedback: tell us what libraries you want ported and what problems you are running into using it. And honestly, if you run into some NumPy function that we didn't get to yet, take a crack at implementing it yourself. It's a pretty straightforward model; send us a pull request, we'd love to include it.
65:28 Michael: All right, excellent. I think this is a great project you're working on and I am happy to share it with everyone, so thanks for being on the show, Braxton.
65:34 Braxton: Thank you so much for having me, it's a pleasure to talk about this stuff as always.
This has been another episode of Talk Python To Me. Today's guest was Braxton McKee and this episode has been sponsored by Snap CI and Opbeat. Thank you guys for supporting the show!
Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from GitHub, all in your browser with debugging, Docker, and parallelism included. Try them for free at snap.ci/talkpython
Opbeat is mission control for your Python web applications. Keep an eye on errors, performance, profiling, and more in your Django and Flask web apps. Tell them thanks for supporting the show on Twitter, where they are @opbeat
Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.
You can find the links from the show at talkpython.fm/episodes/show/60. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.
Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.
This is your host, Michael Kennedy. Thanks for listening!
Smixx, take us out of here.