#239: Bayesian foundations Transcript
00:00 In this episode, we'll dive deep into one of the foundations of modern data science,
00:03 Bayesian algorithms and Bayesian thinking. Join me along with guest Max Sklar as we look at the
00:09 algorithmic side of data science. This is Talk Python to Me, episode 239, recorded November 10th, 2019.
00:29 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,
00:34 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.
00:39 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter
00:44 via @talkpython. This episode is brought to you by Linode and Tidelift. Please check out what
00:50 they're offering during their segments. It really helps support the show. Max, welcome to Talk Python
00:55 to Me. Thanks for having me, Michael. It's very great to be on. It's great to have you on as well.
00:59 You've been on Python Bytes before, but never Talk Python to Me.
01:02 That was a lot of fun. Someone actually reached out to me on Twitter the other day saying,
01:06 hey, I saw you on Python Bytes. So that was really exciting.
01:09 Right on, right on. That's super cool.
01:11 I heard you on Python Bytes. I always say saw you when it's really heard you, but anyway.
01:16 It's all good. So now they can say they saw you on Talk Python to Me as well.
01:20 Now, we're going to talk about some of the foundational ideas behind data science,
01:26 machine learning. That's going to be a lot of fun. But before we get to them, let's set the stage and
01:31 give people a sense of where you're coming from. How did you get into programming in Python?
01:34 That is a really interesting question because I think I started in Python a very long time ago,
01:40 like 10 years ago maybe. I was working on kind of a side project called stickymap.com. The website's
01:47 still up. It barely works. But it was basically my senior project as an
01:53 undergrad. I started this in 2005. And what it was, you know, Google Maps
01:59 had just come out with their API where you could include a Google map on your site.
02:05 And so I was like, okay, this is cool. What can I do with this? Let's add markers all over the map
02:10 and it could be user generated. We would call them emojis now. And people could leave little messages
02:15 and little locations and things like that. This was before there was Foursquare, which is where I
02:19 worked, which is location intelligence. This was just me messing around, trying to make something cool
02:24 and being inspired by the whole host of like, you know, social media startups that were happening at
02:30 the time. And I was using, what was I using at the time? I was using PHP and MySQL to put that
02:38 together. I knew nothing about web development. So I went to the Barnes and Noble. I got that book,
02:42 PHP, MySQL. I got it. But then sometime around like 2008, 2009, I realized, you know, a lot of people
02:48 were talking about Python at work. And I realized like, sometimes I need, this is kind of when I was
02:52 winding down on the project, but I realized, you know, I had all this data and I realized I needed
02:59 a way to like clean the data. I needed a way to like write good scripts that would clear up certain
03:05 like if I have a flat file of like, here's the latitude and longitude, they're separated by
03:11 tabs. And here's a, you know, here's some text that someone wrote that needs to be cleaned up,
03:16 et cetera, et cetera. Yeah, I could write some scripts in PHP or Java, believe it or not, which I
03:23 knew at the time. So at first
03:28 I was trying to do it in PHP and Java, which is a really bad idea.
03:34 Yeah. Especially PHP sounds tricky. Yeah. Yes, yes, yes. And then I was like, well,
03:39 I'm just learning this Python. I need something. So let me try to do it with Python. And it worked
03:44 really well. And then I had, you know, to deal a lot more with CSVs and stuff like that, tab-separated
03:52 files. And it really was just a way to like save time at work. And it was like a trick to say,
03:57 hey, that thing that you're doing manually, I can do that in like 10 minutes. Well, not 10 minutes,
04:03 maybe a couple hours to write a script, versus it taking you like one week. Like I saw someone
04:08 at work trying to change something manually. And so this is all a very long time ago. So I don't
04:12 remember exactly what it was, but it was kind of like a good trick to save time. And it had nothing
04:17 to do with data science or machine learning at the time. It was more like writing scripts to clean up
04:20 files. Well, that's perfect for Python, right? Like it's really one of the things that's super good at.
04:25 It's so easy to read CSV files, JSON files, XML, whatever, right? They're all like a
04:31 handful of lines of code and you know, magic happens.
04:34 Yeah. The one thing that I was really impressed with was how easy it was at the time. Later, when I
04:39 wanted to use more complicated Python packages in like 2012, 2013, I realized, oh, actually,
04:45 some of these packages are complicated to install. But I was so impressed with how easy it was
04:50 to just import the CSV package and just be like, okay, now we understand your CSV. If you have some
04:57 stuff in quotes, no problem. If you want to clean up the quotes, no problem. Like it was all just like,
05:01 it just happened very fast.
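A minimal sketch of the kind of cleanup script Max describes: reading a tab-separated file of latitude, longitude, and user-written text with Python's standard-library `csv` module, which takes care of quoted fields and doubled quotes. The sample data and the `load_markers` helper are hypothetical, not code from StickyMap.

```python
import csv
import io

# Hypothetical tab-separated export: latitude, longitude, and a user-written note.
RAW = 'lat\tlng\tnote\n40.7411\t-73.9897\t"  Great coffee, ""best"" in the area  "\n40.7484\t-73.9857\tview from the top\n'

def load_markers(f):
    """Parse tab-separated rows; the csv module handles quoting and escaped quotes."""
    reader = csv.DictReader(f, delimiter="\t")
    markers = []
    for row in reader:
        markers.append({
            "lat": float(row["lat"]),
            "lng": float(row["lng"]),
            # Clean up the free text: trim it and collapse runs of whitespace.
            "note": " ".join(row["note"].split()),
        })
    return markers

markers = load_markers(io.StringIO(RAW))
print(markers[0]["note"])  # Great coffee, "best" in the area
```

For a real file, you would open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_markers`. The `newline=""` argument is what the `csv` documentation recommends so that quoted fields containing line breaks parse correctly.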
05:02 Yeah. You don't have to statically link to some library or add a reference to some other thing or
05:07 none of that, right? It's all good. It's all right there.
05:09 Yeah. I mean, that was, those were the days when like, I was still programming in C++ for work. So
05:15 you could imagine what, how big of a jump that was. I mean, that seems so ancient. I used to have to
05:21 program in C++ for the Palm Pilot. That was my first job out of school, which is crazy.
05:26 Oh, wow. That sounds interesting. Yeah.
05:28 Yeah.
05:28 Yeah. Coming from C++, I think people have two different reactions. One, like, wow, this is so
05:33 easy. I can't believe I did this in so few lines. Or this is cheating. It's not real programming.
05:40 It's not for me, you know? But I think even the people who say, oh, this is not for me,
05:46 eventually find their way over. They're pulled in.
05:48 I never had a phase where it was like, oh, this is not for me. But I did have a phase where it was like,
05:54 I don't see, this is just another language. And I don't see why it's better or worse than any other.
05:59 I think that's the phase that you go through when you learn any new language where it's like, okay,
06:02 I see all the features. I don't see what this brings me. It was only through doing those specific
06:07 projects where it was like, aha, no one could have convinced me.
06:10 Yeah. Also, you know, if you come from another language, right, if you come from C++, you come
06:14 from Java, whatever, you know how to solve problems super well in that language. And you're comfortable.
06:20 And when you sit down to work, you say File, New Project, File, New File, and start typing.
06:26 And it's like, okay, well, what do I want to do? I want to call this website or talk to this database.
06:30 I'm going to create this and I'll do this. And bam, like, you can just do it. You don't have to just
06:35 pound on every little step. Like, how do I run the code? How do I use another library?
06:41 What libraries are there? You know, that transition is always
06:46 tricky. And it takes a while before you get over that and you feel like, okay,
06:52 I really actually do like it over here. I'm going to put the effort into learning it properly. Because
06:57 I don't care how amazing it is, you're still going to feel incompetent at first.
07:03 The switching costs are so tough. And that's why they say, oh, if you're going to build a new
07:07 product, it has to be like 10x better than the one that exists or something like that. I don't know
07:11 if that's, you know, literally true, but like it's true with languages too, because it's really hard to
07:18 like pick up a new language and everyone's busy at work and busy doing all the tasks they need to do
07:21 every day. For me, frankly, it was helpful to take that "time off," in quotes, when I was going to
07:27 grad school, time off from working full-time as a software engineer to actually pick some of this
07:33 stuff up. Absolutely. All right. So you had mentioned earlier that you do stuff at Foursquare and it
07:38 sounds like your early programming experience with StickyMap is not that different from Foursquare,
07:44 honestly. Tell people about what you do. Maybe, I'm pretty sure everyone knows what Foursquare is,
07:48 what you guys do, but tell them what you do there. People might not be aware of where Foursquare is
07:54 today. You know, Foursquare is kind of known as that quirky check-in app, the app for finding good places
08:01 to go and eat with your friends, you know, sharing where you are. And that's where we were in 2011,
08:08 when I joined, up until, you know, a few years ago. But ultimately, you know, the company kind of
08:14 pivoted business models and sort of said, Hey, we have this really cool technology that we built for the
08:20 consumer apps, which is called Pilgrim, which essentially takes the data from your phone and
08:25 translates that into stops. You know, you'd stopped at Starbucks in the morning, and then you stopped at
08:30 this other place, and then you stopped at work, et cetera, et cetera. And then, you know, that goes into,
08:35 that finds use cases like, you know, across the apposphere, I don't even know what to call it,
08:41 but many apps would like that technology. And so we have this panel and, you know, so for a few years,
08:47 I was working on a product at Foursquare called Attribution, where companies, our clients would say,
08:53 Hey, we want to know if our ads are working, our ads across the internet, not just on Foursquare.
08:57 And we would say, well, we could tell you whether your ads are actually causing people to go into your
09:03 stores more than they otherwise would. And I worked on that for a few years, which is a really cool
09:09 problem to solve, a really cool data science problem to solve, because it's a causality problem.
09:12 It's not just, you know, you can't just say, well, the people who saw the ads visited 10% more,
09:18 because maybe you targeted people who would have visited 10% more.
09:21 Exactly. I'm targeting my demographic, so they better visit more. I got it wrong.
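The selection problem Max describes can be made concrete with a small, fully hypothetical calculation. There are two audience segments with different baseline visit rates, and the ads go mostly to the segment that already visits more. The ad itself has no causal effect at all. The segment names, rates, and sizes below are invented for illustration and say nothing about how Foursquare's Attribution product actually measured lift.

```python
from fractions import Fraction

# Hypothetical audience: baseline visit rate, share shown the ad, and segment size.
SEGMENTS = {
    "target_demo": {"visit_rate": Fraction(20, 100), "exposed_share": Fraction(80, 100), "size": 1000},
    "everyone_else": {"visit_rate": Fraction(10, 100), "exposed_share": Fraction(20, 100), "size": 1000},
}

def visit_rate(exposed):
    """Overall visit rate among exposed (or unexposed) people when the ad has zero effect."""
    visits = Fraction(0)
    people = Fraction(0)
    for seg in SEGMENTS.values():
        share = seg["exposed_share"] if exposed else 1 - seg["exposed_share"]
        n = seg["size"] * share
        people += n
        visits += n * seg["visit_rate"]  # same rate whether or not they saw the ad
    return visits / people

naive_lift = visit_rate(True) / visit_rate(False) - 1
print(f"exposed: {float(visit_rate(True)):.0%}, unexposed: {float(visit_rate(False)):.0%}")  # exposed: 18%, unexposed: 12%
print(f"naive lift: {float(naive_lift):.0%}")  # naive lift: 50%
```

The raw comparison reports a 50% lift even though, within each segment, exposed and unexposed people visit at exactly the same rate. That is why a causal measurement needs a comparison group that resembles the exposed group, not just the raw difference between people who did and didn't see the ad.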
09:26 That industry is a struggle, because the people that you're selling to often don't have the
09:31 backgrounds to understand the difference, and sometimes don't have the incentives to understand
09:36 the difference. But we did the best we could. And so that led to kind of an acquisition that
09:42 Foursquare did earlier this year of Placed, which was an attribution company owned by Snap,
09:49 but they sold it to us through this big deal. You can read about it online.
09:54 Giant tech company trade.
09:56 Yeah. And so I had left Foursquare in the interim, but then I recently went back to work with the
10:05 founder, Dennis Crowley, and just kind of building new apps and trying to build cool apps based on
10:10 location technology, which is really why I got into Foursquare, why I got into StickyMap,
10:15 and I'm just having so much fun. So that's, and we have some products coming along the way where
10:21 it's not enterprise. It's not, you know, measuring ads. It's not ad retargeting. It's just
10:26 building cool stuff for people. And I, I don't know how long this will last, but I couldn't be happier.
10:32 Sounds really fun. I'm sure Squarespace is, sorry, Foursquare.
10:36 You're not the first fan. Squarespace is around here. Foursquare is in New York where you are.
10:43 Now, I'm sure that that's a great place to be, and they're doing a lot of stuff. They use
10:48 something like Scala. There's some functional programming language that's used primarily there,
10:52 right? Is it Scala?
10:53 Yeah, it's primarily Scala. I've actually done a lot of data science and machine learning in Scala. And
10:57 sometimes I'm kind of envious of Python because there are better tools in Python. And we do some of
11:03 our, we do some of our initial testing on data sets in Python sometimes, but there is a lot of momentum
11:10 to go with Scala because all of our backend jobs are written in Scala. And so we often have to
11:15 translate it into Scala, which has good tools, but not as good as Python.
11:19 Yeah. Yeah. So I was going to ask, what's the Python story there? Do you guys get to do much
11:24 Python there?
11:25 Yeah. So I have done, if I can take you back in the, to the olden days of 2014, if that's,
11:33 if that's allowed, because one of the things that I did at Foursquare that I'm pretty proud of
11:38 is building a sentiment model, which is trying to take a Foursquare tip, which were like three
11:45 sentences that people wrote in Foursquare on the Foursquare City Guide app. And that gets surfaced
11:50 later. They were sort of comparable to Yelp reviews, except they're short and helpful and not as
11:57 negative. What we want to do is we want to take those tips and try to come up with the rating of
12:01 the venue because we have this one to 10 rating that every venue receives. And so using the likes
12:07 and dislikes explicitly wasn't good enough because there were so many people who would just click like
12:12 very casually. And so we realized at some point, Hey, we have a labeled training set here. We can say,
12:19 Hey, the person who explicitly liked a place and also left a text tip, that is a label of positive.
12:25 And someone who explicitly disliked a place, that's a label of negative. And someone who left the
12:29 middle option, which we called a meh or a mixed review, their tip is probably mixed. And so we have
12:35 this tremendous data set on tips and that allowed us to build a model, a pretty good model. And it
12:41 wasn't very sophisticated. It was multi-logistic regression based on sparse data, which was like
12:46 what phrases are included in the tip. Right. Trying to understand the sentiment of the actual words,
12:53 right? Yeah. There was logistic regression available in Python at the time, which is great,
12:58 but I wanted something a little custom, which is now available in Python. But back then it was kind
13:03 of hard to find these packages and not just that there, even when there were packages, sometimes
13:08 it's difficult to say, okay, is this working? How do I test what's going on under the hood? So
13:13 I decided to build my own in Python, which was a multi-logistic regression, which means we're trying to find
13:20 out three categories like positive review, negative review, or mixed review based on the label data.
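Here is a rough sketch of the labeling idea and the sparse phrase features. It assumes each record is a `(vote, tip_text)` pair, where the vote is `"like"`, `"meh"`, or `"dislike"`. The names `VOTE_TO_LABEL`, `phrases`, and `make_examples` are hypothetical. Unigrams plus bigrams are just one plausible reading of "phrases"; the episode doesn't say how Foursquare tokenized tips.

```python
import re

# Hypothetical mapping from an explicit vote to a training label.
VOTE_TO_LABEL = {"like": "positive", "meh": "mixed", "dislike": "negative"}

def phrases(text, max_n=2):
    """Return the set of word n-grams (here unigrams and bigrams) that occur in a tip."""
    words = re.findall(r"[a-z']+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            found.add(" ".join(words[i:i + n]))
    return found

def make_examples(rated_tips):
    """Keep tips whose author also voted; the vote becomes the label, the phrases the features."""
    return [
        (phrases(tip), VOTE_TO_LABEL[vote])
        for vote, tip in rated_tips
        if vote in VOTE_TO_LABEL and tip
    ]

examples = make_examples([
    ("like", "Great tacos, great service"),
    ("dislike", "Slow service"),
    (None, "Tip with no explicit vote"),  # no label, so it's skipped
])
```

Storing each tip as the set of phrases it contains is what makes the data sparse. Every phrase not in the set is implicitly zero, so nothing needs to be stored for the tens of thousands of phrases a given tip doesn't mention.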
13:27 And we were going to have a sparse data set, which means it's not like there are 20 words that we're
13:34 looking for. No, there are like tens of thousands. I don't know the exact number, tens of thousands,
13:39 hundreds of thousands of phrases that we're going to look for. And for most of the tips, most of the
13:43 phrases are going to be zero. Didn't see it, didn't see it, didn't see it. But every once in a while,
13:47 you're going to have a one: saw it. So that's when you have that matrix where most of
13:50 them are zero, that's sparse. And then thirdly, we wanted to use elastic net, which meant that
13:56 most of the weights are going to be set to exactly zero. So when we store our model,
14:01 most words, it's going to say, hey, these words aren't sentiment. So we're just going to,
14:06 these don't really affect it. We want to have it exactly zero, whereas what a traditional logistic
14:10 regression would do is say, okay, we are going to come up with the optimal weights,
14:18 but everything will be close to zero, not exactly zero. And so you have to store all of them, even the ones
14:21 that are like 0.0001. So that's a problem too. So I actually built that kind of open source and put that on my
14:29 GitHub on base pi back in 2014. I don't think anyone uses it, but it was a lot of fun. I used Cython to make it
14:34 go really fast. It's kind of a problem at Foursquare because it's the only thing that
14:39 runs in Python. And every once in a while, someone asks me like, what's this doing here?
14:42 Exactly. How do I run this? I don't know. This doesn't fit into our world, right?
14:45 Yeah.
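To show why elastic net produces exact zeros, here is a small pure-Python sketch. It trains multinomial (softmax) logistic regression by proximal gradient descent with the elastic-net penalty `l1 * |w| + (l2 / 2) * w**2`. This is not Max's library. The hyperparameters, the toy tips, and the lack of an intercept term are all simplifying assumptions. The key step is `soft_threshold`: after each gradient step, any weight whose magnitude falls below `lr * l1` is set to exactly zero and dropped from storage. A pure L2 (ridge) penalty, by contrast, would only shrink weights toward zero.

```python
import math
from collections import defaultdict

CLASSES = ("positive", "mixed", "negative")

def predict_proba(weights, phrases):
    """Softmax over classes; only the phrases present in a tip contribute (sparse input)."""
    scores = {c: sum(weights[c].get(p, 0.0) for p in phrases) for c in CLASSES}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def soft_threshold(w, t):
    """L1 proximal step: shrink w toward zero by t, snapping small weights to exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def train(examples, l1=0.01, l2=0.01, lr=0.5, epochs=200):
    """Proximal gradient descent for softmax regression with penalty l1*|w| + (l2/2)*w**2."""
    vocab = sorted({p for phrases, _ in examples for p in phrases})
    weights = {c: {} for c in CLASSES}  # sparse storage: a missing key means weight 0
    n = len(examples)
    for _ in range(epochs):
        grads = {c: defaultdict(float) for c in CLASSES}
        for phrases, label in examples:
            probs = predict_proba(weights, phrases)
            for c in CLASSES:
                err = probs[c] - (1.0 if c == label else 0.0)
                for p in phrases:
                    grads[c][p] += err / n
        for c in CLASSES:
            for p in vocab:
                w = weights[c].get(p, 0.0) - lr * grads[c][p]      # plain gradient step
                w = soft_threshold(w, lr * l1) / (1.0 + lr * l2)   # elastic-net proximal step
                if w == 0.0:
                    weights[c].pop(p, None)  # exactly zero: don't store it at all
                else:
                    weights[c][p] = w
    return weights

# Hypothetical toy tips: "coffee" and "service" carry no sentiment signal.
TIPS = [
    ({"great", "coffee"}, "positive"), ({"great", "service"}, "positive"),
    ({"okay", "coffee"}, "mixed"), ({"okay", "service"}, "mixed"),
    ({"terrible", "coffee"}, "negative"), ({"terrible", "service"}, "negative"),
]
model = train(TIPS)
print("coffee" in model["positive"])  # False
probs = predict_proba(model, {"great"})
print(max(probs, key=probs.get))  # positive
```

In this toy data, "coffee" and "service" appear equally often in every class, so their gradients cancel and the threshold keeps them at exactly zero. They never take up space in the stored model, which is the property Max wanted for the tens of thousands of non-sentiment phrases.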
14:45 Cool. All right. Well, Foursquare sounds really fun. Another thing that you do
14:50 that I know you from, I don't know you through the Foursquare work that you're doing. I know you
14:54 through your podcast, The Local Maximum, which is pretty cool. You had me on back on episode 73.
15:00 So thanks for that. That was cool.
15:01 That is our most downloaded episode right now.
15:04 Really? Wow. Awesome.
15:06 Yeah.
15:06 That's super cool to hear.
15:07 Yeah.
15:07 More relevant for today's conversation, though, would be episode 78, which is all about Bayesian
15:14 thinking and Bayesian analysis and those types of things. So people can check that out for a more
15:20 high level, less technical, more philosophical view, I think, on what we're going to talk about
15:26 if they want to go deeper, right?
15:27 Absolutely. You could also ask me questions directly because I ramble a little bit in that, but I cover
15:31 some pretty cool ideas, some pretty deep ideas there that I've been thinking about for many years.
15:37 Yeah, for sure. So maybe tell people just really quickly what The Local Maximum is, just to give you
15:42 a chance to tell them about it.
15:43 Yeah. So I started this podcast about a year and a half ago in 2018.
15:48 And it started with, you know, I started basically interviewing my friends at Foursquare being like,
15:53 hey, this person's working on something cool, that person's working on something cool, but they never
15:57 get to tell their story. So why not let these engineers tell their story about what they're
16:02 working on? And since then, I've kind of expanded it to cover, you know, current events and interesting
16:09 topics in math and machine learning that people can kind of apply to their everyday life. Some episodes
16:14 get more technical, but I kind of want to bring it back to the more general audience that it's like,
16:18 hey, my guests and I, we have this expertise. We don't just want to talk amongst ourselves. We want
16:23 to actually engage with the current events, engage with the tech news and try to think, okay, how do we
16:29 apply these ideas? And so that's sort of the direction that I've been going in. And it's been a lot of fun.
16:36 I've expanded beyond tech several times. I've had a few historians on, I've had a few journalists on.
16:42 That's cool. I like the intersection of tech and those things as well. Yeah, it's pretty nice.
16:45 This portion of Talk Python to Me is brought to you by Linode. Are you looking for hosting that's fast,
16:53 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at
16:57 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated
17:04 server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or
17:09 where your users are, there's a data center for you. Whether you want to run a Python web app,
17:13 host a private Git server, or just a file server, you'll get native SSDs on all the machines,
17:19 a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back
17:26 guarantee. Need a little help with your infrastructure? They even offer professional
17:30 services to help you with architecture, migrations, and more. Do you want a dedicated server for free
17:34 for the next four months? Just visit talkpython.fm/Linode. Let's talk about general data science
17:43 before we get into the Bayesian stuff. So I think one of the misconceptions in general is that you have to be
17:52 a mathematician or be very good at math to be a programmer. I think that's a false statement.
17:59 To be a programmer.
18:00 Yes, yes. Software developer. Straight up, I built the checkout page on this e-commerce site,
18:05 for example.
18:06 I would agree. I think you need some abstract thinking. You can't escape letters and stuff and
18:12 variables, but, compared to data science, you don't need,
18:18 well, you don't need much algebra, maybe a little bit, but you don't really need calculus, and
18:23 you don't need linear algebra and geometry. Yeah. Sometimes if you're a UI engineer, you might need a
18:29 little geometry.
18:29 I mean, there's certain parts that you need that kind of stuff. Like video game development, for example,
18:33 everything is about multiplying something by a matrix, right? You put all your stuff on the screen,
18:39 even arrange it and rotate it by multiplying by matrices. There's some stuff happening there you
18:44 got to know about, but generally speaking, you don't. However, I feel like in data science,
18:48 you do get a little bit closer to statistics and you do need to maybe understand some of these
18:55 algorithms. And I think that's where we can focus our conversation for this show is like,
19:01 what do we need to know in general? And then the idea of Bayesian inference, Bayes' theorem, and things like that.
19:06 What do we need to know if I wanted to go into, say, data science? Because like I said, I don't really
19:12 think you need to know that to do, you know, connecting to a database and saving a
19:16 user. You absolutely need logical thinking, but not stats. But for data science, what do you
19:23 think you need to know?
19:24 Well, for data science, it really depends on what you're doing and how far down the rabbit hole you
19:29 really want to go. You don't necessarily need all of the philosophical background that I talk about.
19:34 I just love thinking about it. And it sort of helps me focus my thoughts when I do work on it
19:41 to kind of go back and think about the first principles. So I get a lot of value out of that,
19:46 but maybe not everyone does. There is sort of a surface-level data science, or machine learning,
19:53 that you can get away with. If you want to do simple things, which is like, hey, I want to
19:58 understand the idea that I have a training set, you know what a training set is, and this is what I want
20:04 to predict. And here is roughly my mathematical function of how I know whether I'm predicting it well
20:12 or not, but it could be something simple like the square distance, but already you're introducing
20:16 some math there. And basically, I'm going to take a look at some libraries and I'm going to
20:23 see if something works out of the box and gives me what I need. And if you do it that way,
20:28 you need a little bit of understanding, but you don't need everything that like I would say kind of a
20:33 true data science or machine learning engineer needs. But if you want to go deeper and kind of
20:39 make it your profession, I would say you need kind of a background in calculus and linear algebra.
20:45 And again, like, look, if I went back to grad school and I like if I went to a linear algebra
20:52 final and I took it right now, would I be able to get every question right? Probably not. But I know
20:57 the basics and I have a great understanding of how it works. And if I look at the equations, I can kind of
21:03 break it down, you know, maybe with a little help from Google and all that.
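As a small illustration of where calculus comes in, here is a sketch that takes the squared-distance loss Max mentioned (mean squared error) and minimizes it by gradient descent for a one-variable linear model. The toy data, learning rate, and `fit_line` helper are hypothetical. In practice you would reach for a library, but the two gradient lines are exactly the partial derivatives of the loss.

```python
def mse(predictions, targets):
    """Mean squared error: the average squared distance between prediction and truth."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b (the calculus part)
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
print(mse([w * x + b for x in xs], ys))
```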
21:06 I think there's a danger of using these libraries to make predictions and other stuff when you're
21:13 like, well, the data goes in here to this function and then I call it and then out comes the answer.
21:18 Maybe there's some conditionality versus independence requirement that you didn't understand and it's
21:24 not met or, you know, whatever, right?
21:26 That's why I said it's really surface level and you can get away with it sometimes, but
21:30 only for so long. And I think understanding where these things go wrong outside the, you know, when you
21:37 take these black box functions requires both kind of a theoretical understanding of how they work and
21:43 then also just like experience of seeing things going wrong in the past.
21:46 Yeah. That experience sounds hard to get, but I guess it's just experience, right?
21:51 You just, you got to get out there and do it.
21:52 Right. Well, here's a good example. One time I was trying to predict how likely someone is to
21:57 visit a store. This was part of working on Foursquare's attribution product, right? And
22:03 someone was using a random forest algorithm, or maybe it was just a simple decision tree. I'm not sure,
22:09 but basically it creates a tree structure and puts people into buckets and determines whether or not,
22:18 you know, and for each bucket, it says, okay, in this bucket, everyone visited and in this bucket,
22:21 everyone didn't, or maybe this bucket is 90, 10 and this bucket is 10, 90. And so I can give good
22:26 predictions on the probability someone will visit based on where they fall on the leaves of the tree.
22:32 And we were using it and something just wasn't making sense to me. Somehow the numbers were just,
22:39 something was wrong. And then I said, okay, let's make, let's make more leaves. And then I made more
22:44 leaves. Like I made, I made the tree deeper, right? And then they're like, see, when you make the tree
22:49 deeper, it gets better. That makes sense because it's, it's more fine graining. I'm like, yeah,
22:54 but something doesn't make sense. It shouldn't be getting this good. And then I realized what was
22:59 happening: some of the leaves had nobody who visited.
23:04 That makes a lot of sense because most days you don't visit any particular chain.
23:08 And when it went to zero and then it saw someone visit, well, the log likelihood loss,
23:15 it basically predicted 0% for an event that did happen. And so when you do log likelihood
23:22 loss, or negative log likelihood loss, the score is the negative log of that probability. So essentially you
23:27 should be penalized infinitely for that, because there was no smoothing. But the framework we were using,
23:33 which I think was Spark or something like that, and it was probably some library in Spark. I probably
23:40 shouldn't throw Spark under the bus. It was probably some library or something that was changing
23:43 that infinity to a zero. So the thing that was infinitely bad, it was saying was infinitely good.
23:49 It was the worst thing. And that took, oh God, that took us so long to figure out. Like it's embarrassing
23:56 how long that one took to figure out, but that's, that's a good example of when experience will get
24:02 you in something. I don't think I've ever talked about this one publicly.
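The leaf problem fits in a few lines. A leaf that saw 0 visits out of 50 in training predicts a 0% visit probability. When someone in that leaf does visit, the negative log-likelihood for that person is infinite. Additive (Laplace-style) smoothing is one common fix, though the episode doesn't say which fix the team used. `buggy_sanitize` is a hypothetical stand-in for whatever library step turned the infinity into a zero; it is not a real Spark API.

```python
import math

def leaf_probability(visits, total, alpha=0.0):
    """P(visit) for a tree leaf from training counts; alpha > 0 adds additive smoothing."""
    return (visits + alpha) / (total + 2 * alpha)

def neg_log_likelihood(p, visited):
    """Log-likelihood loss for one person; predicting 0% for something that happens costs inf."""
    q = p if visited else 1.0 - p
    return math.inf if q == 0.0 else -math.log(q)

def buggy_sanitize(loss):
    """Hypothetical stand-in for the bug: a non-finite loss silently becomes 0."""
    return loss if math.isfinite(loss) else 0.0

# A leaf where nobody visited in training (0 of 50), and then someone in that leaf visits.
raw = leaf_probability(0, 50)
smoothed = leaf_probability(0, 50, alpha=1.0)

print(neg_log_likelihood(raw, visited=True))                  # inf
print(buggy_sanitize(neg_log_likelihood(raw, visited=True)))  # 0.0, the worst case now looks perfect
print(round(neg_log_likelihood(smoothed, visited=True), 2))   # 3.95, large but finite
```

With smoothing, the empty leaf predicts 1/52 instead of 0. The surprise visit is still penalized heavily, but the penalty is finite, so deeper trees can no longer look better than they are.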
24:05 Yeah. Well, you just got to know that, you know, that's not what we're expecting, right?
24:10 Yeah. But you know, theoretically, Hey, if I more fine grained my tree, if I, you know,
24:15 make my groups smaller, maybe it works better. But I was like something, I was like, something's not
24:21 right. It's working a little too good. There was nothing specifically that got me, but it was just
24:26 like, there's probably a lot of stuff out there that people are actually taking actions on and
24:30 spending money on, but it's like that, right? Yeah. Yeah. So let's see. So we talked about
24:35 some of the math stuff. If you really want to understand the algorithms, you know, statistics,
24:39 calculus, linear algebra, you obviously need calculus to understand like real statistics,
24:44 right? Continuous statistics and stuff. What else though? Like, do you need to know machine learning?
24:50 What kind of algorithms do you need to know? Like what, what in the computer science-y side of things
24:55 do you think you got to know? The bread and butter of the data scientists that I work with is machine
25:00 learning algorithms. So I think that is very helpful to know. And I think that, you know, some of the
25:06 basic algorithms in machine learning are good to know, which is like the K nearest neighbor,
25:10 K means, logistic regression, decision trees, and then some kind of random forest algorithm,
25:17 whether it's just random forest, which is a mixture of trees, or gradient boosted trees, which we've had a lot
25:22 of luck with. And then a lot of this deep learning stuff is, well, neural networks is one of them.
25:29 Maybe you don't need to be an expert in neural networks, but it's certainly one to be aware of.
25:33 And based on these neural networks, deep learning is becoming very popular. And I've been hearing and
25:40 kind of looking into reading about deep learning for many years, but I have to say, I haven't actually
25:45 implemented one of these algorithms myself. But I just interviewed a guy on my show, Mark Ryan,
25:51 and he came out with a book called Deep Learning with Structured Data, which means, hey,
25:57 this doesn't just work for images or audio recognition; you could actually use it for
26:01 regular marketing data, the kind of data you'd use everything else for. So I was like, all right, that's interesting. Maybe
26:06 I'll work on that now. But I don't think at this point you need to know
26:11 deep learning to be a good data scientist or machine learning engineer. I think the basics are really
26:16 good to know, because in many problems, you know, the basics will get you very far. And there's a lot
26:20 less that can go wrong.
26:22 Yeah, a lot of those algorithms you talked about as well, like K-Nearest Neighbor and so on.
26:26 There are several books that seem to cover all of those. I can't think of any off the top of my
26:30 head, but I feel like I've looked through a couple and they all seem to have like, here are the main
26:33 algorithms you need to know to kind of learn data science. So not too hard to pick them up.
26:38 Bishop's book, the one that I read for grad school, which is already 10 years old,
26:42 certainly had all that stuff. That was very deep on math. I can send you a link if you want.
26:46 Sure.
26:46 I think kind of any intro book to machine learning will have all of that stuff.
26:50 And basically, it's not in order of like hard to easy. It's just sort of, hey,
26:55 these are things that have helped in the past and that statisticians and machine learning engineers
27:02 have relied on in the past to get started and it's worked for them. So maybe it'll work for you.
27:06 Cool. Well, a lot of machine learning and data science is about making predictions. We have some
27:12 data. What does that tell us about the future, right?
27:16 Right.
27:16 That's where the Bayesian inference comes from in that world, right?
27:20 Yeah. It's trying to form beliefs, which could be a belief about something that already happened that
27:26 you don't know about, but you'll find out in the future or be affected by in the future, or it could
27:30 be a belief about something that will happen in the future. So something that either will happen in
27:35 the future or you'll learn about in the future. But Bayesian inference is more about, you know,
27:39 forming beliefs and I kind of call it like it's a quantification of the scientific method. So in the
27:47 basic form, the Bayes rule is very easy. You start with your current beliefs and you codify that in a
27:53 special mathematical way. And then you say, okay, here's some new data I received on this topic. And then it
27:59 gives you a framework to update your beliefs within the same framework that you began with.
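The "update your beliefs within the same framework" idea can be illustrated with a conjugate prior. This is a minimal sketch, assuming a yes/no outcome (say, "was this review positive?") and a Beta prior, in which case the posterior is again a Beta distribution. The class name `BetaBelief` is hypothetical, not from any library mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown success rate, expressed as Beta(alpha, beta)."""

    alpha: float  # pseudo-count of successes seen so far
    beta: float   # pseudo-count of failures seen so far

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Beta prior + binomial data -> Beta posterior: just add the counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Expected success rate under the current belief.
        return self.alpha / (self.alpha + self.beta)


prior = BetaBelief(1.0, 1.0)  # flat prior: every rate equally plausible
posterior = prior.update(successes=7, failures=3)
print(posterior, round(posterior.mean, 3))
```

Because the posterior has the same form as the prior, today's posterior can serve as tomorrow's prior: updating with 4 successes and 1 failure, then 3 and 2, lands on the same belief as one update with 7 and 3.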
28:04 Right. And so like an example that you gave would be say a fire alarm, right?
28:09 We know from like life experience that most fire alarms are false alarms. You know, one example is
28:16 what is your prior belief that there is a fire right now without seeing the alarm? The alarm is the data.
28:23 The prior is what's the probability that, you know, my building is on fire and I need to
28:29 get the F out right now. You know, it's very low actually. Yeah. I mean,
28:34 yeah, for most of us, it hasn't really happened in our life. Maybe we've seen one or two fires,
28:39 but they weren't that big of a deal. I'm sure there are some people in the audience who have seen
28:44 bad fires and for them, maybe their prior is a little higher.
28:47 I once in my entire life have had to escape a fire.
28:49 Yeah.
28:50 Only once, right?
28:51 Were you in like real danger or?
28:53 Oh yeah, probably. It was a car and the car actually caught on fire.
28:57 Oh yeah. That sounds pretty bad.
28:58 It had been worked on by some mechanics and they put it back together wrong. It like shot
29:02 oil over something and it caught fire. And so we're like, Oh, the car's on fire. We should get out of
29:05 it.
29:06 Yeah. But yeah, sitting in your building at work, your prior is going to be much lower than in a car that
29:11 you just worked on. So when the alarm goes off, okay, that's your data. The data is that we received
29:18 an alarm today. And so then you have to think about, okay, I still have two hypotheses, right? Hypothesis one
29:26 is that there is a fire and I have to escape. And hypothesis two is that there is no fire.
29:32 And so once you hear the alarm, you still have those two hypotheses. One is that the alarm is
29:37 going off and there's a real fire. And two is that there is no fire, but this is a false alarm.
29:42 And so what ends up happening is that there's a significant probability of a false alarm.
29:48 So at the beginning, there is a very low probability of a fire. After you hear the alarm,
29:54 there's still a pretty low probability of a fire, because the probability of a false alarm still overwhelms
29:58 that. Now I'm not saying that you should ignore fire alarms all the time, because in that case,
30:03 the action that you take is important regardless of the belief. So,
30:10 you know, hey, there is a very low cost to at least checking into it or leaving the
30:17 building if you have a fire alarm, but there's a very high consequence of failure. So high.
30:21 Exactly. Exactly. But in terms of just forming beliefs, which is a good reason not to panic,
30:26 you shouldn't put a lot of probability on the idea that there's definitely a fire.
30:31 Okay. Yeah. So that's basically Bayesian inference, right? I know how likely a fire is. All
30:37 of a sudden, I have this piece of data: the fire alarm is going off. I have a set, a space of hypotheses
30:43 that could apply, try to figure out which hypothesis, start testing and figure out which one is the right
30:50 one. Maybe. Yeah. So you take your prior. So let's say there's like a, I don't know, one in
30:56 100,000 chance that there's a fire in the building today and a 99,999 in 100,000 chance there isn't.
31:03 Then you take that, that's your prior. Then you multiply it by your likelihood, which answers the question:
31:10 What is the likelihood of seeing the data given that the hypothesis is true? So what's the likelihood
31:17 that the alarm would go off if there is a fire? Maybe that's pretty high. Maybe that's close to one
31:21 or a little bit lower than one. And then on the second hypothesis that there's no fire,
31:26 what's the likelihood of a false alarm today, which could actually be pretty high. Could be like one
31:32 in a thousand or even one in a hundred in some buildings. And then you multiply those together and
31:36 then you get an unnormalized posterior; rescale it so the two numbers add up to one, and that is your answer. So it's really just multiplication.
31:40 Yeah. It's like simple fractions once you have all the pieces, right? So it's a pretty simple
31:45 algorithm. It's very hard to describe through audio, but it's much better visually if you want to check it
31:51 out. I've been struggling to describe it through audio for, you know, for the last year and a half,
31:55 but I do the best I can.
31:57 This is like describing code. You can only take it so far.
32:00 Yeah.
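Since the fire-alarm calculation is hard to follow by ear, here it is written out. This sketch plugs in the illustrative numbers from the conversation (a 1-in-100,000 prior and a 1-in-1,000 false-alarm rate); the 0.99 chance that a real fire triggers the alarm is an assumed stand-in for "close to one".

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over a discrete set of hypotheses."""
    # Unnormalized posterior: prior times the likelihood of the observed data.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to one.
    return {h: p / total for h, p in unnormalized.items()}


priors = {"fire": 1 / 100_000, "no fire": 99_999 / 100_000}
# P(alarm | hypothesis): 0.99 is an assumed value for "close to one".
likelihood_of_alarm = {"fire": 0.99, "no fire": 1 / 1_000}
post = posterior(priors, likelihood_of_alarm)
print(f"P(fire | alarm) = {post['fire']:.4f}")
```

Even after the alarm, the posterior probability of a fire comes out at roughly 1%, because in this setup false alarms are about a hundred times more common than fires.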
32:00 This portion of Talk Python to Me is brought to you by Tidelift. Tidelift is the first managed
32:07 open source subscription, giving you commercial support and maintenance for the open source
32:12 dependencies you use to build your applications. And with Tidelift, you not only get more dependable
32:17 software, but you pay the maintainers of the exact packages you're using, which means your software
32:22 will keep getting better. The Tidelift subscription covers millions of open source projects across
32:27 Python, JavaScript, Java, PHP, Ruby, .NET, and more. And the subscription includes security updates,
32:33 licensing verification and indemnification, maintenance and code improvements, package selection,
32:38 and version guidance, roadmap input, and tooling and cloud integration. The bottom line is you get the
32:44 capabilities you'd expect and require from commercial software. But now for all the key open source
32:50 software you depend upon. Just visit talkpython.fm/Tidelift to get started today.
32:56 This comes from a reverend, Reverend Bayes, who came up with this idea in the 1700s, but for a long time,
33:07 it wasn't really respected, right? And then it actually found some pretty powerful,
33:12 it solved some pretty powerful problems that matter a lot to people recently.
33:17 Yeah. I mean, I can't go through the whole, do the whole history justice in just a few minutes, but
33:22 I'll try to give my highlights, which was this reverend who was sort of, he was a, you know,
33:28 he was into theology and he was also into mathematics. So he was probably like pondering big questions and
33:34 he wrote down notes and he was trying to figure out the validity of various arguments.
33:39 His notes were found after he died, so he never published them. And so this was taken up by
33:46 Pierre-Simon Laplace, a more well-known mathematician, who kind of formalized it. But when the basis of
33:52 statistical thinking was built in the late 19th, early 20th century,
34:00 it really went in a more frequentist direction where it's like, no, a probability is actually a
34:08 fraction of a repeatable experiment that kind of like over time, what fraction does it, does it end up
34:15 as? And so they consider probability as sort of a, an objective property of the system. So for example,
34:22 a die roll, well, each side is one sixth. That's like kind of an objective property of the,
34:27 of the die. Whereas Bayesian statistics is sort of based on belief. And because belief kind of
34:33 seemed unscientific and the frequentists had very good methods for coming up with, with answers and
34:40 more, more objective ways of doing it, they sort of had the upper hand. But as kind of the focus got into
34:48 more complex issues and we had the rise of computers and that sort of thing, and the rise of more data and
34:55 that sort of thing, Bayesian inference started taking a bigger and bigger role until now, I think most
35:02 machine learning engineers and most data scientists think as Bayesians. As for
35:07 some examples in history, most people are probably aware of Alan Turing at Bletchley Park, along with
35:14 many other people, you know, building these machines that broke the German codes during World War II.
35:19 There's a whole movie about it.
35:21 Right. That's trying to break the Enigma machine and the Enigma code. And that, those were some
35:26 important problems to solve, but also highly challenging.
35:31 Yeah. And so they incorporated a form of Bayes rule into this. Well, what are my relative beliefs
35:37 as to the setting of the machine? Because, you know, the machine could have had quadrillions of settings and
35:42 they were trying to distinguish between which settings were likely and which were not.
35:48 But after the war, that stuff was classified. So nobody could say, oh yeah, Bayesian inference was
35:55 used in that problem. And one interesting application that I found, even when it wasn't accepted by academia
36:00 for many years, was life insurance. Because if the actuaries
36:07 get the answer wrong as to how likely people are to live and die, then the company is on the hook for lots and
36:13 lots of money or like the continuation of their company if they get it wrong.
36:17 And so-
36:18 Right. Right. Or how likely is it to flood here?
36:20 How likely is it for there to be a hurricane that wipes this part of the world off the map?
36:25 Right.
36:25 And a lot of these were one-off problems. You know, one problem is, you know, what's the
36:29 likelihood of two commercial planes flying into each other? It hadn't happened, but they wanted to
36:34 estimate the probability of that. And you can't do repeated experiments on that. So they really had to
36:38 use priors, which was sort of like expert data. And then, you know, more recently, as we had the rise of
36:45 kind of machine learning algorithms and big data, you know, Bayesian methods have become more and more
36:52 relevant. But also a big problem was, you know, the problems that we just mentioned, which are, you
36:58 know, fire alarms and figuring out whether or not you have a disease and things like that. That's the
37:02 two hypothesis problem. But a lot of times you have an infinite space, you have an infinite hypothesis
37:08 problem that you're trying to determine between an infinite set of possible hypotheses. And that becomes
37:14 very difficult to do, becomes extremely difficult without a computer, even with a computer becomes
37:19 difficult to do. And so, you know, there's been a lot of research into how do you search that space
37:24 of hypotheses to find the ones that are most likely. And so if you've heard the term Markov chain Monte
37:29 Carlo, that is the most common algorithm used for that purpose. And there is even current research
37:36 into making that faster and finding the hypothesis you want more quickly. Andrew Gelman at
37:41 Columbia has a lot of stuff out about this. And he has a newer thing called
37:48 NUTS, the No-U-Turn Sampler, which is based on a very complicated version of MCMC.
37:55 And so that's what's used in a Python framework called PyMC3 to come up with your
38:02 most likely hypothesis very, very quickly.
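NUTS itself is too involved for a short example, but the basic MCMC idea (wander through the hypothesis space, visiting each hypothesis in proportion to its posterior probability) can be sketched with plain random-walk Metropolis, a much simpler algorithm than the sampler PyMC3 uses. The coin-bias problem (7 heads, 3 tails), the uniform prior, the step size, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(p: float, heads: int, tails: int) -> float:
    # Uniform prior on (0, 1): zero probability (log = -inf) outside that range.
    if not 0.0 < p < 1.0:
        return -math.inf
    # Binomial log-likelihood; the constant binomial coefficient is dropped
    # because Metropolis only ever looks at ratios of posterior values.
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads: int, tails: int, n_samples: int = 20_000,
               step: float = 0.1, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, heads, tails)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby hypothesis with a symmetric random-walk step.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis(heads=7, tails=3)[1_000:]  # drop the first 1,000 as burn-in
print(sum(samples) / len(samples))  # should land near the exact Beta(8, 4) mean, 8/12
```

This toy problem has an exact answer to check against; the point of MCMC is that the same accept/reject loop still works when the hypothesis space is too large or too irregular to integrate directly.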
38:04 So let's take this over to the Python world. Yeah. Like, yeah, there's a lot of stuff that works
38:10 with it. And obviously, like you said, the machine learning deep down uses some of these techniques,
38:15 but this PyMC3 library is pretty interesting. Let's talk about it. So its subtitle is probabilistic
38:23 programming in Python.
38:25 If I could start with some alternatives, which I have used: I've been diving into
38:30 reading about PyMC3, but I haven't used it personally. So even when I was doing things in 2014,
38:36 just on my own, basically without libraries, I was able to use Python very, very easily to
38:42 kind of put in these equations for Bayesian inference on whether it's multi-logistic regression,
38:50 or another one I did was Dirichlet prior calculator, which if I can kind of describe that, it's sort
38:56 of thinking, well, how, what should I believe about a place before I've seen any reviews? Should I
39:00 believe it's good? Should I believe it's bad? You know, if I have very few reviews, what should I
39:04 believe about it? Which was an important question to ask for something like Foursquare City Guide in
39:10 many cases, because we didn't have a lot of data. And so that was a good application of Bayesian
39:15 inference. And I was able to just use the equations straight up and kind of from first principles,
39:21 apply algorithms directly in Python. And it actually was not that hard to do because when searching the
39:29 space, there was a single global maximum; I didn't have to worry about local maxima in these
39:34 equations. So it was just hill climbing. Hey, I'm going to start with this hypothesis in this
39:39 n-dimensional space, and I'm going to find the gradient, I'm going to go a little higher,
39:43 a little higher, a little higher. Gradient ascent is what I described, although it's usually called
39:47 gradient descent. So that's sort of an easy one to understand. Then if you want to do MCMC directly,
39:54 because you have some space that you want to search, and you have the equations of the probability
40:00 on each of the points in that space, I used emcee, which is spelled E-M-C-E-E, which is a simple
40:09 program that only does MCMC. And so I had a lot of success with that when I wanted to do some one-off
40:18 sampling of, you know, non-standard probability distributions. So those are ones that I've actually
40:24 used and had success with in the past. But PyMC3 seems to be like the full, you know, we do
40:31 everything sort of a thing. And basically, what you do is you program probabilistically. So you say,
40:38 hey, I imagine that this is how the data is generated. So I'm just going to basically put
40:44 that in code. And then I'm going to let you, the algorithm work backwards and tell me what the
40:50 parameters originally were. So if I could do a specific here, let's say I'm doing logistic regression,
40:57 which is like, every item has a score, or, you know, in the case that I was working on,
41:02 every word has a score, the scores get added up, that's then a real number, then it's transformed
41:08 using a sigmoid into a number between zero and one. And that's the probability that's a positive review.
41:13 And so basically, you'll just say, hey, I have this vector that describes the words this has,
41:20 then I'm going to add these parameters, which I'm not going to tell you what they are.
41:24 And then I'm going to get this result. And then I'm going to give you the final data set at the end.
41:28 And it kind of works backwards and tells you, okay, this is what I think the parameters were.
41:33 And what's really interesting about something like PyMC3, which I would like to use in the future is
41:40 when you do a linear regression or logistic regression, in kind of standard practice,
41:44 you get one model at the end, right? This is the model that we think is best. And this is the model
41:50 that has the highest probability. And this is the model that we're going to use. Great. You know,
41:55 that works for a lot of cases. But what PyMC3 does is that instead of picking a model at the end,
42:02 it says, well, we still don't know exactly which model produced this data. But because we have the
42:08 data set, we have a better idea of which models are now more likely and less likely. So we now have
42:13 a probability distribution over models. And we're going to let you pull from that.
42:17 So it kind of gives you a better sense of what the uncertainty is over the model. So
41:21 for example, if you have a word in your data set, let's say the word delicious,
42:28 we know it's a positive word. But let's say for some reason, there's not a lot of data on it,
42:32 then it can say, well, I don't really know what the weight of delicious should be.
42:38 It's being used at rock concerts. We don't know why. What does it mean?
42:41 Yeah, yeah, yeah. And so we're going to give you a few possible models. And, you know, and you can
42:45 keep sampling from that. And you'll see that the variance of that
42:52 weight is going to be very high, because we just don't have a
42:56 lot of data on it. And that's something that standard regressions just don't do.
43:00 That's pretty cool. And the way you work with it is, you basically code out the model in a
43:06 really nice Python language API. You kind of say, well, this, I think it's a linear model,
43:13 I think it's this type of thing. And then like you said, it'll go back and solve it for you. That's
43:18 pretty awesome. I think it's nice.
43:19 Right. A good thing to think about it is in terms of just a standard linear regression, like,
43:24 what's the easiest example I can think of? Try to find someone's weight from their height,
43:29 for example. And so you think there might be an optimal coefficient on there given the data.
43:36 But if you use PyMC3, it will say, no, we don't know exactly what the coefficient is given your data.
43:40 You don't have a lot of data, but we're going to give you several possibilities. We're going to give
43:44 you a probability distribution over it. And as I say on the Local Maximum, you shouldn't make everything
43:50 probabilistic because there is a cost in that. But oftentimes you can, by considering something to be,
43:56 rather than considering one single truth by considering multiple truths probabilistically,
44:00 you can unlock a lot of value. In this case, you can kind of determine your variance a little better.
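PyMC3 would get there by sampling, but for the simplest height-to-weight case the same idea also has a closed form. This sketch assumes a line through the origin (weight = w × height, no intercept, to keep the algebra short), Gaussian noise with a known standard deviation, and a Normal(0, prior_sd²) prior on w; the data values are made up. It returns a distribution over w (its mean and standard deviation) rather than a single best coefficient.

```python
import math


def slope_posterior(xs, ys, noise_sd=5.0, prior_sd=100.0):
    """Posterior (mean, sd) of w in y = w * x + Normal(0, noise_sd**2), with w ~ Normal(0, prior_sd**2)."""
    # Precisions (inverse variances) add: prior precision plus information from the data.
    precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd**2) / precision
    return mean, math.sqrt(1.0 / precision)


heights_m = [1.6, 1.7, 1.8]  # made-up example data
weights_kg = [55.0, 65.0, 75.0]
few = slope_posterior(heights_m, weights_kg)
many = slope_posterior(heights_m * 10, weights_kg * 10)  # same pattern, ten times the data
print(few, many)
```

With ten times the data the posterior standard deviation shrinks by roughly a factor of √10, which is exactly the kind of uncertainty estimate a single best-fit coefficient leaves out.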
44:05 Yeah, that's super cool. I hadn't really thought about it. And like I said, the API is super clean
44:10 for doing this. So it's great. Yeah.
44:12 Where does this Bayesian inference, like, where do you see this solving problems today? Where do you see
44:18 like stuff going? What's the world look like now?
44:21 I've been using it to solve problems basically as soon as I started working as a machine learning engineer
44:26 at Foursquare, basically using Bayes' rule as kind of my first principles whenever I approach a problem.
44:32 And it's never driven me in the wrong direction. So I think it's one of those timeless things that
44:38 you can always use. For me, especially after working with our attribution product a lot,
44:44 I think that the future is trying to figure out causality a lot better. And I think that's where
44:51 some of these more sophisticated ideas come in. Because it's one thing to say, this variable is
44:56 correlated with that and I can have a model. But it's like, well, what's the probability that this
45:00 variable, changing this variable actually causes this other variable to change? In the case of ads,
45:06 where you could see where it's going to unlock a lot of value for companies where, you know,
45:10 there might be a lot of investment in this, is what is the probability that this ad affects
45:16 someone's likelihood to visit my place or to buy something from me more generally? Or what is my
45:23 probability distribution over that? And so can I estimate that? And I think that that whole industry
45:31 of online ads is, it's very frustrating for an engineer because it's so inefficient. And there's so
45:37 many people in there that don't know what they're doing. And it could be very frustrating at times.
45:40 But I think that means also that there's a lot of opportunity to like unlock value if you have
45:46 a lot of patience. Sure. Well, so much of it is just they looked for this keyword, so they must be
45:51 interested, right? It doesn't take very much into account. Yeah, but the question is, okay, maybe they
45:56 look for that keyword and now they're going to buy it no matter what I do. So don't send them the ad,
46:00 send the ad to someone who didn't search the keyword. Or maybe they need that extra push and that extra
46:04 push is very valuable. It's hard to know unless you measure it. And when you measure it, you don't get a
46:10 whole lot of data. So you really, it really has to be a Bayesian model. Whoever uses these Bayesian
46:17 models is going to get way ahead. But right now it goes through several layers. I kept saying when we
46:23 were working on this problem and people weren't getting what we were doing, I was like, I wish the
46:29 people who are writing the check for these ads could get in touch with us because I know they care.
46:33 But, you know, oftentimes you're working through sales and someone on the other side.
46:39 It was just too many layers between, right?
46:41 Yeah.
46:41 Yeah, for sure.
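One simple way to get a probability distribution over an ad's effect from little data is a Beta-Binomial comparison between people who saw the ad and a randomized holdout group. This is an illustrative sketch only, not how Foursquare's attribution product works: it assumes uniform priors, treats each person's visit as an independent yes/no outcome, and estimates P(visit rate with ad > visit rate without) by Monte Carlo. The function name and the counts are hypothetical.

```python
import random


def prob_ad_helps(visits_ad, n_ad, visits_holdout, n_holdout, draws=20_000, seed=0):
    """Monte Carlo estimate of P(visit rate with ad > visit rate without ad)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Uniform Beta(1, 1) prior + binomial data -> Beta(1 + hits, 1 + misses) posterior.
        rate_ad = rng.betavariate(1 + visits_ad, 1 + n_ad - visits_ad)
        rate_holdout = rng.betavariate(1 + visits_holdout, 1 + n_holdout - visits_holdout)
        wins += rate_ad > rate_holdout
    return wins / draws


print(prob_ad_helps(12, 200, 7, 200))  # hypothetical counts
```

The answer is a degree of belief rather than a yes/no verdict, which is what makes small samples usable; the causal reading rests on the holdout group being randomized.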
46:42 Earlier, you spoke about having your code go fast and you talked about Cython.
46:48 Oh yeah.
46:48 What's your experience with Cython?
46:49 I used that for the multi-logistic regression. And all I can say is it took a little getting used
46:57 to, but, you know, I got an order of magnitude speed up, which we needed to launch that thing
47:02 in our one-off Python job at Foursquare. So it took only a few hours versus all day. So it was kind
47:12 of a helpful tool to get that thing launched. And I haven't used it too much since, but I kind of keep
47:18 that in the back of my mind as a part of my toolkit.
47:21 Yeah. It's great to have in the toolkit. I feel like it doesn't get that much love, but
47:25 I know people talk about Python speed and, oh, it's fast here. It's slow there.
47:30 Yeah.
47:31 First people just think it's slow because it's not compiled, but then you're like, oh,
47:34 but what about the C extensions? You go, actually, yeah, that's actually faster than Java or something
47:39 like that. So interesting.
47:40 Yeah. I've also had a big speed up just by taking, you know, a dictionary or matrix I was using and then
47:47 using NumPy instead of the, or NumPy, I don't know how you pronounce it, but instead of using-
47:53 I go with NumPy, but yeah.
47:54 Okay.
47:54 NumPy instead of the standard, like, you know, Python tools, you could also get a big speed
48:01 up there.
48:01 Yeah, for sure. And that's pushing it down into the C layer, right?
48:04 Yeah.
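As a small illustration of the NumPy point, here is the same computation (summing word weights for many reviews, as in the word-score model discussed earlier) done with nested Python loops and with a single NumPy matrix-vector product. The array sizes are arbitrary, and the actual speedup depends on your machine, so no timing numbers are claimed here.

```python
import numpy as np


def scores_loop(counts, weights):
    # Pure-Python nested loop: one multiply-add per (review, word) pair.
    return [sum(c * w for c, w in zip(row, weights)) for row in counts]


def scores_numpy(counts, weights):
    # Same arithmetic as one matrix-vector product executed in compiled code.
    return np.asarray(counts, dtype=float) @ np.asarray(weights, dtype=float)


rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(1_000, 500))  # word counts per review
weights = rng.normal(size=500)                  # one weight per word
fast = scores_numpy(counts, weights)
slow = scores_loop(counts.tolist(), weights.tolist())
print(np.allclose(fast, slow))
```

Wrapping each call in `timeit` is an easy way to see the gap on your own hardware.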
48:04 But a lot of times you have your algorithm in Python, and one option is to go write that C
48:10 layer because you're like, well, we kind of need it. So here we go down the rabbit hole of writing
48:13 C code instead of Python. But Cython is sweet, right? Especially the latest one, you can just put
48:18 the regular type annotations, the Python 3 type annotations.
48:21 Oh, yeah.
48:21 On the types. And then, you know, magic happens.
48:24 I definitely, I just started with Python and it was like, you know, we're in this,
48:28 these three functions 90% of the time, just fix that.
48:31 It's usually the slow part is like really focused. Most of your code, it doesn't even matter what
48:35 happens to it, right? It's just, there's like that little bit where you loop around a lot
48:39 and that matters.
48:40 Yeah.
48:41 Yeah.
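Finding that small hot spot is what a profiler is for. Here is a minimal sketch with the standard library's `cProfile` and `pstats` modules; `slow_part` and `fast_part` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def fast_part():
    return sum(range(1_000))


def slow_part():
    # The kind of tight loop where most of the time tends to go.
    total = 0
    for i in range(300_000):
        total += i * i
    return total


def main():
    for _ in range(5):
        fast_part()
        slow_part()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and show the top entries; slow_part should dominate.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report without touching the code.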
48:41 It's funny how we over optimize and you can't escape it. Like even when I'm creating,
48:46 you know, I see like a bunch of doubles. I'm like, oh, but these are only one and zero. Can
48:50 we like change them to Boolean? But like in the end, it doesn't care. It doesn't matter.
48:54 For most of the code, it really has no effect.
48:56 For sure.
48:57 Except in that one targeted place.
48:58 Yeah. So the trick is to use the tools to find it, right?
49:01 Yeah.
49:02 Like cProfile or something like that. The other major thing, you know, one thing you can do to
49:07 speed up stuff like this, these algorithms is just to say, well, I wrote it.
49:11 I wrote it in Python or I use this data structure and maybe if I rewrote it differently or I wrote
49:17 it in C or I applied Cython, it'll go faster. But it could be that you're speeding up the execution
49:23 of a bad algorithm. And if you had a better algorithm, it might go a hundred times faster
49:28 or something, right? Like, so how do you think about that with your problems?
49:31 That's what I did for the, back in 2014 with the Dirichlet prior calculator. And that was an
49:38 interesting problem to solve because to recap on that, it's one of the use cases we had.
49:44 Okay. What's my prior on a venue before I've gotten any reviews? What's my prior on a restaurant
49:49 before I've gotten any reviews? And I'm using the experience of the data on all the other restaurants
49:53 I've seen. So we know what the variance is. And let me try to come up with an equation that can
49:59 calculate that value from the data. And it turned out there were some algorithms available,
50:04 but as I dug into the math, I noticed that there was like a math trick that I could make use of.
50:12 In other words, it was something like the log of the same number
50:17 being taken over and over again. And it's like, okay, just store how many times we took the
50:23 log. And then when I dug into the math, they kind of combined into one term and multiply that together.
50:28 So essentially I used a bunch of factoring and refactoring, whether you think of it as factoring
50:33 code or factoring math to get kind of an exponential speed up in that algorithm. And so that's why I
50:41 published a paper on it. I was very proud of that. It was a, it was very satisfying thing to do.
50:45 It might not have mattered in terms of our product, but I think a lot of people used it though,
50:49 to be like, I want rather than just taking an average of what I've seen in the past. No,
50:53 I want to do something that is based on good principles. And so I want to use the Dirichlet
51:00 prior calculator. And so some people have used that. My Python code is online. And the algorithm has
51:07 proven very fast and like almost instantaneous. Basically, as soon as you load all the data in,
51:13 it gives you the answer, which I like. Now, my next step to that is to use PyMC3,
51:19 rather than giving you an answer, it should give you a probability distribution over answers.
51:22 Yeah, that's right.
51:23 I haven't done that yet. Didn't know about that. Yeah. Didn't know about that at the time. I think
51:26 my speed up would still apply.
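Max doesn't spell out the trick, so what follows is only a sketch of the general idea as described (the same logarithms recur, so count them once and multiply), applied to the part of the Dirichlet-multinomial log-likelihood that depends on the prior parameters alpha. It relies on the identity log Γ(a + n) - log Γ(a) = log(a) + log(a + 1) + ... + log(a + n - 1): instead of summing those logs separately for every venue, it counts how many venues have more than j reviews in each category and multiplies. The function names and toy data are hypothetical, and this is not presented as the exact algorithm from his paper.

```python
import math
from collections import Counter


def loglik_naive(alpha, samples):
    """Alpha-dependent part of the Dirichlet-multinomial log-likelihood, term by term."""
    total_alpha = sum(alpha)
    ll = 0.0
    for counts in samples:  # one count vector (e.g. per rating category) per venue
        for a_k, n_k in zip(alpha, counts):
            ll += sum(math.log(a_k + j) for j in range(n_k))
        ll -= sum(math.log(total_alpha + j) for j in range(sum(counts)))
    return ll


def exceed_counts(values):
    """out[j] = how many of the values are strictly greater than j."""
    hist = Counter(values)
    out = [0] * max(values, default=0)
    running = 0
    for j in range(len(out) - 1, -1, -1):
        running += hist.get(j + 1, 0)
        out[j] = running
    return out


def loglik_counted(alpha, samples):
    """Same quantity, but each log(a + j) is computed once and multiplied by its count."""
    total_alpha = sum(alpha)
    ll = 0.0
    for k, a_k in enumerate(alpha):
        for j, c in enumerate(exceed_counts([s[k] for s in samples])):
            ll += c * math.log(a_k + j)
    for j, c in enumerate(exceed_counts([sum(s) for s in samples])):
        ll -= c * math.log(total_alpha + j)
    return ll


alpha = [1.0, 0.5, 2.0]
venues = [[3, 1, 0], [0, 2, 2], [5, 0, 1]]  # hypothetical review counts per category
print(loglik_naive(alpha, venues), loglik_counted(alpha, venues))
```

The counted version computes at most (largest count) logarithms per category no matter how many venues there are, and since the counts don't depend on alpha, they can be built once and reused at every step of an optimizer.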
51:28 Yeah, that's cool. Well, that definitely takes it up a notch. What about learning more about
51:32 Bayesian analysis and inference and like, where should people go for more resources?
51:37 Oh, okay. Well, a kind of a history book that I read that I really like on Bayesian inference
51:42 is one called The Theory That Would Not Die by Sharon Bertsch McGrayne, a few years old, but it's really good
51:50 if you're interested in the history on that. I have a book about PyMC3, kind of a tech book that does go
51:56 into the basics of Bayesian inference that has a really good title. It's called Bayesian Analysis
52:02 with Python. Oh, yeah.
52:04 Yeah, yeah. So that's a good one to look at. And then I have a bunch of episodes on my show
52:10 that are related to Bayesian analysis. So episode zero and one on my show were basically just starting
52:18 out trying to describe Bayes' rule to everyone. I sort of attempted to do the description in episode
52:24 zero. And then in episode one, I applied it to the news story that was happening that day,
52:28 which was kind of the fire alarm at the bigger scale, which was everyone in Hawaii getting this
52:33 message that there's an ICBM missile coming their way because of a mistake someone made.
52:39 And then-
52:41 Yeah, because of some terrible UI decision on like the tooling.
52:45 Yeah, is that what it was?
52:46 Yeah, yeah.
52:47 Yeah.
52:47 There was some analysis about what had happened and not probabilistically, but there was some,
52:52 there's some really old crummy UI and they have to press some button to like acknowledge a test.
52:58 Or treat it as real and somehow they look like almost identical or there's some weird thing
53:03 about the UI that had like tricked the operator into saying, oh, it's real.
53:07 Yeah, yeah. And then another couple episodes I want to highlight are episodes 21 and 22.
53:13 Episode 21 is on the philosophy of probability. In 22, we talk about the problem
53:18 of p-hacking, which is when people try their experiments over and over and until they get
53:24 something that works with p-values, which is a frequentist idea, which works if you're using
53:29 it properly. But the problem is most people don't. And then we did an episode, I think it
53:33 was 65 on probability, how to estimate the probability of something that's never happened. And then
53:39 78, the one that you mentioned, which was on the history of Bayes and a little more philosophy.
53:45 So I've talked about that a lot. You could probably go to localmaxradio.com or
53:49 localmaxradio.com/archive and find the ones that you want.
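The p-hacking problem is easy to see in a simulation. This sketch assumes a fair coin (so there is truly no effect), tests it with a normal-approximation z-test at the 0.05 level, and compares one honest test with "keep running experiments until one comes out significant", capped at 20 tries. The numbers of flips, tries, and simulations are arbitrary choices.

```python
import math
import random


def p_value_fair_coin(heads: int, flips: int) -> float:
    # Two-sided z-test of "the coin is fair", using the normal approximation.
    z = (heads - flips / 2) / math.sqrt(flips / 4)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(max_tries: int, sims: int = 2_000, flips: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        for _ in range(max_tries):
            heads = bin(rng.getrandbits(flips)).count("1")  # `flips` fair coin flips at once
            if p_value_fair_coin(heads, flips) < 0.05:
                hits += 1  # declared "significant" even though nothing is going on
                break
    return hits / sims


print(false_positive_rate(max_tries=1), false_positive_rate(max_tries=20))
```

With these settings a single test is wrong roughly 5% of the time, as advertised, while retrying until something works flags a perfectly fair coin most of the time.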
53:52 That's really cool. So yeah, I guess we'll leave it there for now. That's quite interesting. And yeah,
53:58 it gives us a look into some of the algorithms and math we got to know for our data science.
54:02 Now, before you get out of here, though, I got the two questions I always ask everyone.
54:06 You're going to write some Python code. What editor do you use?
54:09 I just use Sublime or TextMate also on Mac. But I'm sure I could do something a little better
54:16 than that. I just picked one and never really looked back.
54:19 Sounds good. And then notable PyPI package?
54:23 Notable.
54:24 Maybe not the most popular, but like, oh, you should totally know about this. I mean,
54:28 you already threw out there PyMC3, if you want to claim that one, or if there's something else. Yeah,
54:32 pick that.
54:33 Yeah. Well, I have BayesPy, which is the one that's like in GitHub slash max slash BayesPy,
54:40 which has all the stuff I talked about. It's not actively developed, but it does have my kind of
54:44 one-off algorithms, which if you're in the market for multinomial models or Dirichlet,
54:52 or you want some kind of interesting new way to do multi-logistic regression, you could certainly give
55:00 that a try. But most people probably want to use kind of the standard toolings. Yeah. Why don't I go
55:06 with that? Why don't I go with the one I wrote a long time ago?
55:09 Yeah. Right on. Sounds good. All right. Final call to action. People are excited about this stuff.
55:13 What do you tell them? What do they do?
55:15 Check out the books I mentioned and check out my website, localmaxradio.com. And also subscribe to the Local Maximum. It should be on all of your podcatchers.
55:26 If it's not on one, please let me know. But it should be on all of your podcatchers.
55:31 localmaxradio.com. It's just every week. And we have a lot of fun. So definitely check it out.
55:35 Yeah, it's cool. You spend a lot of time talking about these types of things.
55:37 Super. All right. Well, Max, thanks for being on the show.
55:40 Michael, thank you so much. I really enjoyed this conversation.
55:43 Yeah, same here. Bye-bye.
55:44 Bye.
55:45 This has been another episode of Talk Python to Me. Our guest on this episode was Max Sklar,
55:50 and it's been brought to you by Linode and Tidelift. Linode is your go-to hosting for whatever you're
55:56 building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E.
56:02 If you run an open source project, Tidelift wants to help you get paid for keeping it going strong.
56:08 Just visit talkpython.fm/Tidelift, search for your package, and get started today.
56:14 Want to level up your Python? If you're just getting started, try my Python Jumpstart by
56:19 Building 10 Apps course. Or if you're looking for something more advanced, check out our new
56:23 async course that digs into all the different types of async programming you can do in Python.
56:28 And of course, if you're interested in more than one of these, be sure to check out our
56:32 Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show.
56:37 Open your favorite podcatcher and search for Python. We should be right at the top.
56:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
56:46 and the direct RSS feed at /rss on talkpython.fm.
56:50 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.
56:54 Now get out there and write some Python code.
56:56 We'll see you next time.