
#239: Bayesian foundations Transcript

Recorded on Thursday, Oct 10, 2019.

00:00 In this episode, we'll dive deep into one of the foundations of modern data science,

00:03 Bayesian algorithms and Bayesian thinking. Join me along with guest Max Sklar as we look at the

00:09 algorithmic side of data science. This is Talk Python to Me, episode 239, recorded November 10th, 2019.

00:29 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:34 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:39 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter

00:44 via at Talk Python. This episode is brought to you by Linode and Tidelift. Please check out what

00:50 they're offering during their segments. It really helps support the show. Max, welcome to Talk Python

00:55 to Me. Thanks for having me, Michael. It's very great to be on. It's great to have you on as well.

00:59 You've been on Python Bytes before, but never Talk Python to Me.

01:02 That was a lot of fun. Someone actually reached out to me on Twitter the other day saying,

01:06 hey, I saw you on Python Bytes. So that was really exciting.

01:09 Right on, right on. That's super cool.

01:11 I heard you on Python Bytes. I always say saw you when it's really heard you, but anyway.

01:16 It's all good. So now they can say they saw you on Talk Python to Me as well.

01:20 Now, we're going to talk about some of the foundational ideas behind data science,

01:26 machine learning. That's going to be a lot of fun. But before we get to them, let's set the stage and

01:31 give people a sense of where you're coming from. How do you get into programming in Python?

01:34 That is a really interesting question because I think I started in Python a very long time ago,

01:40 like 10 years ago maybe. I was working on kind of a side project called stickymap.com. The website's

01:47 still up. It barely works. But it was basically my senior project as an

01:53 undergrad. I started this in 2005. And what it was, was, you know, Google Maps

01:59 had just come out with their API where you can like, you know, include a Google map on your site.

02:05 And so I was like, okay, this is cool. What can I do with this? Let's add markers all over the map

02:10 and it could be user generated. We would call them emojis now. And people could leave little messages

02:15 and little locations and things like that. This was before there was Foursquare, which is where I

02:19 worked, which is location intelligence. This was just me messing around, trying to make something cool

02:24 and being inspired by the whole host of like, you know, social media startups that were happening at

02:30 the time. And I was using, what was I using at the time? I was using PHP and MySQL to put that

02:38 together. I knew nothing about web development. So I went to the Barnes and Noble. I got that book,

02:42 PHP, MySQL. I got it. But then sometime around like 2008, 2009, I realized, you know, a lot of people

02:48 were talking about Python at work. And I realized like, sometimes I need, this is kind of when I was

02:52 winding down on the project, but I realized, you know, I had all this data and I realized I needed

02:59 a way to, like, clean the data. I needed a way to write good scripts that would clean up certain things,

03:05 like if I have a flat file of, like, here's the latitude and longitude, they're separated by

03:11 tabs. And here's a, you know, here's some text that someone wrote that needs to be cleaned up,

03:16 et cetera, et cetera. Yeah, I figured I could write some scripts in the languages I

03:23 knew at the time, which were PHP and Java, believe it or not.

03:28 I was trying to do it in PHP and Java, which was a really bad idea.

03:34 Yeah. Especially PHP sounds tricky. Yeah. Yes, yes, yes. And then I was like, well,

03:39 I'm just learning this Python. I need something. So let me try to do it with Python. And it worked

03:44 really well. And then I had, you know, to deal a lot more with CSVs and stuff like that, tab-separated

03:52 files. And it really was just a way to, like, save time at work. And it was like a trick to say,

03:57 hey, that thing that you're doing manually, I can do that in, well, not 10 minutes,

04:03 maybe a couple of hours, by writing a script. And doing it by hand is going to take you like one week. Like I saw someone

04:08 at work trying to change something manually. And so this is all a very long time ago. So I don't

04:12 remember exactly what it was, but it was kind of like a good trick to save time. And it had nothing

04:17 to do with data science or machine learning at the time. It was more like writing scripts to clean up

04:20 files. Well, that's perfect for Python, right? Like it's really one of the things that's super good at.

04:25 It's so easy to read CSV files, JSON files, XML, whatever, right? It's just they're all like a

04:31 handful of lines of code and you know, magic happens.

04:34 Yeah. The one thing that I was really impressed with was, like, how easy it was at the time. Now, when I

04:39 wanted to do more complicated Python packages in like 2012, 2013, I realized, oh, actually,

04:45 some of these packages are complicated to install. But like, I was so impressed with how easy it was

04:50 to just import the CSV package and just be like, okay, now we understand your CSV. If you have some

04:57 stuff in quotes, no problem. If you want to clean up the quotes, no problem. Like it was all just like,

05:01 it just happened very fast.

05:02 Yeah. You don't have to statically link to some library or add a reference to some other thing or

05:07 none of that, right? It's all good. It's all right there.

05:09 Yeah. I mean, that was, those were the days when like, I was still programming in C++ for work. So

05:15 you could imagine what, how big of a jump that was. I mean, that seems so ancient. I used to have to

05:21 program in C++ for the Palm Pilot. That was my first job out of school, which is crazy.

05:26 Oh, wow. That sounds interesting. Yeah.

05:28 Yeah.

05:28 Yeah. Coming from C++, I think people have two different reactions. One, like, wow, this is so

05:33 easy. I can't believe I did this in so few lines. Or this is cheating. It's not real programming.

05:40 It's not for me, you know? But I think even the people who disagree, who go, oh, this is not for me,

05:46 eventually find their way over. They're pulled in.

05:48 I never had a phase where it was like, oh, this is not for me. But I did have a phase where it was like,

05:54 I don't see, this is just another language. And I don't see why it's better or worse than any other.

05:59 I think that's the phase that you go through when you learn any new language where it's like, okay,

06:02 I see all the features. I don't see what this brings me. It was only through doing those specific

06:07 projects where it was like, aha, no one could have convinced me.

06:10 Yeah. Also, you know, if you come from another language, right, if you come from C++, you come

06:14 from Java, whatever, you know how to solve problems super well in that language. And you're comfortable.

06:20 And when you sit down to work, you say, File, New Project and File, New File, start typing.

06:26 And it's like, okay, well, what do I want to do? I want to call this website or talk to this database.

06:30 I'm going to create this and I'll do this. And bam, like, you can just do it. You don't have to just

06:35 pound on every little step. Like, how do I run the code? How do I use another library?

06:41 What libraries are there? You know, it's just that transition is always

06:46 tricky. And it takes a while before you, you get over that and you feel like, okay,

06:52 I really actually do like it over here. I'm going to put the effort into learn it properly because

06:57 I don't care how amazing it is. You're still going to feel incompetent at first.

07:03 The switching costs are so tough. And that's why they say, oh, if you're going to build a new

07:07 product, it has to be like 10 X better than the one that exists or something like that. I don't know

07:11 if that's, you know, literally true, but like it's true with languages too, because it's really hard to

07:18 like pick up a new language and everyone's busy at work and busy doing all the tasks they need to do

06:57 every day. For me, frankly, it was helpful to take that "time off" (in quotes) when I was going to

07:27 grad school, time off from working full-time as a software engineer to actually pick some of this

07:33 stuff up. Absolutely. All right. So you had mentioned earlier that you do stuff at Foursquare and it

07:38 sounds like your early programming experience with sticky maps is not that different than Foursquare,

07:44 honestly. Tell people about what you do. Maybe, I'm pretty sure everyone knows what Foursquare is,

07:48 what you guys do, but tell them what you do there. People might not be aware of where Foursquare is

07:54 today. You know, Foursquare is kind of known as that quirky check-in app, the find-good-places-

08:01 to-go-and-eat-with-your-friends app, you know, share where you are. And that's where we were from 2011,

08:08 when I joined, up to, you know, a few years ago. But ultimately, you know, the company kind of

08:14 pivoted business models and sort of said, Hey, we have this really cool technology that we built for the

08:20 consumer apps, which is called Pilgrim, which essentially takes the data from your phone and

08:25 translates that into stops. You know, you'd stopped at Starbucks in the morning, and then you stopped at

08:30 this other place, and then you stopped at work, et cetera, et cetera. And then, you know, that

08:35 finds use cases, you know, across the app-o-sphere, I don't even know what to call it,

08:41 but many apps would like that technology. And so we have this panel and, you know, so for a few years,

08:47 I was working on a product at Foursquare called Attribution, where companies, our clients would say,

08:53 Hey, we want to know if our ads are working, our ads across the internet, not just on Foursquare.

08:57 And we would say, well, we could tell you whether your ads are actually causing people to go into your

09:03 stores more than they otherwise would. And I worked on that for a few years, which is a really cool

09:09 problem to solve, a really cool data science problem to solve, because it's a causality problem.

09:12 It's not just, you know, you can't just say, well, the people who saw the ads visited 10% more,

09:18 because maybe you targeted people who would have visited 10% more.

09:21 Exactly. I'm targeting my demographic, so they better visit more. I got it wrong.

09:26 That industry is a struggle, because the people that you're selling to often don't have the

09:31 backgrounds to understand the difference, and sometimes don't have the incentives to understand

09:36 the difference. But we did the best we could. And so that led to kind of an acquisition that

09:42 Foursquare did earlier this year of Placed, which was an attribution company owned by Snap,

09:49 but they sold it to us through this big deal. You can read about it online.

09:54 Giant tech company trade.

09:56 Yeah. And so I had left Foursquare in the interim, but then I recently went back to work with the

10:05 founder, Dennis Crowley, and just kind of building new apps and trying to build cool apps based on

10:10 location technology, which is really why I got into Foursquare, why I got into Sticky Map,

10:15 and I'm just having so much fun. So that's, and we have some products coming along the way where

10:21 it's not enterprise. It's not, you know, measuring ads. It's not ad retargeting. It's just

10:26 building cool stuff for people. And I, I don't know how long this will last, but I couldn't be happier.

10:32 Sounds really fun. I'm sure Squarespace is... sorry, Foursquare.

10:36 You're not the first. Squarespace is around here, too. Foursquare is in New York, where you are.

10:43 Now, I'm sure that that's a great place to be, and they're doing a lot of stuff. They use

10:48 something like Scala. There's some functional programming language that's primary there,

10:52 right? Is it Scala?

10:53 Yeah, it's primarily Scala. I've actually done a lot of data science and machine learning in Scala. And

10:57 sometimes I'm kind of envious of Python because there's better tools in Python. And we do some of

11:03 our, we do some of our initial testing on data sets in Python sometimes, but there is a lot of momentum

11:10 to go with Scala because all of our backend jobs are written in Scala. And so we often have to

11:15 translate it into Scala, which has good tools, but not as good as Python.

11:19 Yeah. Yeah. So I was going to ask, what's the Python story there? Do you guys get to do much

11:24 Python there?

11:25 Yeah. So I have. If I can take you back to the olden days of 2014, if that's,

11:33 if that's allowed, because one of the things that I did at Foursquare that I'm pretty proud of

11:38 is building a sentiment model, which is trying to take a Foursquare tip, which were like three

11:45 sentences that people wrote in Foursquare on the Foursquare City Guide app. And that gets surfaced

11:50 later. It was sort of compared to the Yelp reviews, but except they're short and helpful and not as

11:57 negative. What we want to do is we want to take those tips and try to come up with the rating of

12:01 the venue because we have this one to 10 rating that every venue receives. And so using the likes

12:07 and dislikes explicitly wasn't good enough because there were so many people who would just click like

12:12 very casually. And so we realized at some point, Hey, we have a labeled training set here. We can say,

12:19 Hey, the person who explicitly liked a place and also left a text tip, that is a label of positive.

12:25 And someone who explicitly disliked a place, that's a label of negative. And someone who left the

12:29 middle option, which we called a meh or a mixed review, their tip is probably mixed. And so we have

12:35 this tremendous data set on tips and that allowed us to build a model, a pretty good model. And it

12:41 wasn't very sophisticated. It was multi-logistic regression based on sparse data, which was like

12:46 what phrases are included in the tip. Right. Trying to understand the sentiment of the actual words,

12:53 right? Yeah. There was logistic regression available in Python at the time, which is great,

12:58 but I wanted something a little custom, which is now available in Python. But back then it was kind

13:03 of hard to find these packages. And not just that; even when there were packages, sometimes

13:08 it's difficult to say, okay, is this working? How do I test what's going on under the hood?

13:13 So I decided to build my own in Python. Multi-logistic regression means we're trying to pick between

13:20 three categories, like positive review, negative review, or mixed review, based on the labeled data.

13:27 And we were going to have a sparse data set, which means it's not like there are 20 words that we're

13:34 looking for. No, there are like tens of thousands. I don't know the exact number, tens of thousands,

13:39 hundreds of thousands of phrases that we're going to look for. And for most of the tips, most of the

13:43 phrases are going to be zero. Didn't see it, didn't see it, didn't see it. But every once in a while,

13:47 you're going to have a one: did see it. So that's when you have that matrix where most of

13:50 them are zero, that's sparse. And then thirdly, we wanted to use elastic net, which meant that

13:56 most of the weights are going to be set to exactly zero. So when we store our model,

14:01 most words, it's going to say, hey, these words aren't sentiment. So we're just going to,

14:06 these don't really affect it. We want to have it exactly zero, except what a traditional logistic

14:10 regression would do is say, okay, we are going to come up with the optimal weights,

14:18 but everything will be merely close to zero. And so you have to store all of it. You have to store the

14:21 like 0.0001. So that's a problem too. So I actually built that kind of open source and put that on my

14:29 GitHub, in BayesPy, back in 2014. I don't think anyone uses it, but it was a lot of fun. I used Cython to make it

14:34 go really fast. It's kind of a problem at Foursquare because it's the only thing that

14:39 runs in Python. And every once in a while, someone asks me like, what's this doing here?

14:42 Exactly. How do I run this? I don't know. This doesn't fit into our world, right?

14:45 Yeah.
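For readers who want to see the shape of that model, here's a minimal scikit-learn sketch of the same idea: multinomial logistic regression over sparse phrase counts with an elastic-net penalty that drives most weights to exactly zero. It's purely illustrative, with made-up tips, and is not the original Foursquare code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled tips: the label comes from the explicit like/dislike/meh.
tips = ["great coffee and friendly staff", "terrible service", "decent but slow"]
labels = ["positive", "negative", "mixed"]

# Sparse bag-of-phrases features: mostly zeros, a few ones.
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(tips)

# The elastic-net penalty (saga solver) zeroes out non-sentiment phrases entirely.
clf = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5,
    multi_class="multinomial", max_iter=5000,
).fit(X, labels)

# Fraction of coefficients that are exactly 0.0, so the stored model stays small.
print((clf.coef_ == 0).mean())
```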

14:45 Cool. All right. Well, Foursquare sounds really fun. Another thing that you do

14:50 that I know you from, I don't know you through the Foursquare work that you're doing. I know you

14:54 through your podcast, The Local Maximum, which is pretty cool. You had me on back on episode 73.

15:00 So thanks for that. That was cool.

15:01 That is our most downloaded episode right now.

15:04 Really? Wow. Awesome.

15:06 Yeah.

15:06 That's super cool to hear.

15:07 Yeah.

15:07 More relevant for today's conversation, though, would be episode 78, which is all about Bayesian

15:14 thinking and Bayesian analysis and those types of things. So people can check that out for a more

15:20 high level, less technical, more philosophical view, I think, on what we're going to talk about

15:26 if they want to go deeper, right?

15:27 Absolutely. You could also ask me questions directly because I ramble a little bit in that, but I cover

15:31 some pretty cool ideas, some pretty deep ideas there that I've been thinking about for many years.

15:37 Yeah, for sure. So maybe tell people just really quickly what The Local Maximum is, just to give you

15:42 a chance to tell them about it.

15:43 Yeah. So I started this podcast about a year and a half ago in 2018.

15:48 And it started with, you know, I started basically interviewing my friends at Foursquare being like,

15:53 hey, this person's working on something cool, that person's working on something cool, but they never

15:57 get to tell their story. So why not let these engineers tell their story about what they're

16:02 working on? And since then, I've kind of expanded it to cover, you know, current events and interesting

16:09 topics in math and machine learning that people can kind of apply to their everyday life. Some episodes

16:14 get more technical, but I kind of want to bring it back to the more general audience that it's like,

16:18 hey, my guests and I, we have this expertise. We don't just want to talk amongst ourselves. We want

16:23 to actually engage with the current events, engage with the tech news and try to think, okay, how do we

16:29 apply these ideas? And so that's sort of the direction that I've been going in. And it's been a lot of fun.

16:36 I've expanded beyond tech several times. I've had a few historians on, I've had a few journalists on.

16:42 That's cool. I like the intersection of tech and those things as well. Yeah, it's pretty nice.

16:45 This portion of Talk Python to me is brought to you by Linode. Are you looking for hosting that's fast,

16:53 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at

16:57 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated

17:04 server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or

17:09 where your users are, there's a data center for you. Whether you want to run a Python web app,

17:13 host a private Git server, or just a file server, you'll get native SSDs on all the machines,

17:19 a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back

17:26 guarantee. Need a little help with your infrastructure? They even offer professional

17:30 services to help you with architecture, migrations, and more. Do you want a dedicated server for free

17:34 for the next four months? Just visit talkpython.fm/Linode. Let's talk about general data science

17:43 before we get into the Bayesian stuff. So I think one of the misconceptions in general is that you have to be

17:52 a mathematician or be very good at math to be a programmer. I think that's a false statement.

17:59 To be a programmer.

18:00 Yes, yes. Software developer. Straight up, I built the checkout page on this e-commerce site,

18:05 for example.

18:06 I would agree. I think you need some abstract thinking. You can't escape letters and stuff and

18:12 variables. But you don't need, well, to compare with data science: you don't need

18:18 algebra, or maybe just a little bit, and you don't really need calculus,

18:23 linear algebra, or geometry. Yeah. Well, if you're a UI engineer, you might need a

18:29 little geometry.

18:29 I mean, there's certain parts that you need that kind of stuff. Like video game development, for example,

18:33 everything is about multiplying something by a matrix, right? You put all your stuff on the screen,

18:39 even arrange it and rotate it by multiplying by matrices. There's some stuff happening there you

18:44 got to know about, but generally speaking, you don't. However, I feel like in data science,

18:48 you do get a little bit closer to statistics and you do need to maybe understand some of these

18:55 algorithms. And I think that's where we can focus our conversation for this show is like,

19:01 what do we need to know in general? And then the idea of Bayes' theorem, Bayesian thinking, and things like that.

19:06 What do we need to know if I wanted to go into say data science? Because like I said, I don't really

19:12 think you need to know that to do, like, you know, connecting to a database and saving a

19:16 user. And you absolutely need logical thinking, but not like stats, but for data science, what do you

19:23 think you need to know?

19:24 Well, for data science, it really depends on what you're doing and how far down the rabbit hole you

19:29 really want to go. You don't necessarily need all of the philosophical background that I talk about.

19:34 I just love thinking about it. And it sort of helps me focus my thoughts when I do work on it

19:41 to kind of go back and think about the first principles. So I get a lot of value out of that,

19:46 but maybe not everyone does. There is sort of a surface level data science that or machine learning

19:53 that you can get away with. If you want to do simple things, which is like, hey, I want to

19:58 understand the idea that I have a training set, you know what a training set is, and this is what I want

20:04 to predict. And here is roughly my mathematical function of how I know whether I'm predicting it well

20:12 or not, but it could be something simple like the square distance, but already you're introducing

20:16 some math there. And basically, I'm going to take a look at some libraries and I'm going to

20:23 see if something works out of the box and gives me what I need. And if you do it that way,

20:28 you need a little bit of understanding, but you don't need everything that like I would say kind of a

20:33 true data science or machine learning engineer needs. But if you want to go deeper and kind of

20:39 make it your profession, I would say you need kind of a background in calculus and linear algebra.

20:45 And again, like, look, if I went back to grad school and I like if I went to a linear algebra

20:52 final and I took it right now, would I be able to get every question right? Probably not. But I know

20:57 the basics and I have a great understanding of how it works. And if I look at the equations, I can kind of

21:03 break it down, you know, maybe with a little help from Google and all that.
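As a concrete version of the "square distance" scorecard mentioned a moment ago, here's a minimal sketch. Mean squared error is one common form of it, and the numbers are made up:

```python
import numpy as np

# Mean squared error: the average of the squared gaps between what the model
# predicted and what the training set says actually happened.
def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mse([3.0, 5.0, 2.0], [2.5, 5.5, 2.0]))  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.167
```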

21:06 I think there's a danger of using these libraries to make predictions and other stuff when you're

21:13 like, well, the data goes in here to this function and then I call it and then out comes the answer.

21:18 Maybe there's some conditionality versus independence requirement that you didn't understand and it's

21:24 not met or, you know, whatever, right?

21:26 That's why I said it's really surface level and you can get away with it sometimes, but

21:30 only for so long. And I think understanding where these things go wrong, you know, when you

21:37 take these black-box functions, requires both kind of a theoretical understanding of how they work and

21:43 then also just like experience of seeing things going wrong in the past.

21:46 Yeah. That experience sounds hard to get, but that's how experience works, right?

21:51 You just got to get out there and do it.

21:52 Right. Well, here's a good example. One time I was trying to predict how likely someone is to

21:57 visit a store. This was part of working on Foursquare's attribution product, right? And

22:03 someone was using a random forest algorithm, or maybe it was just a simple decision tree. I'm not sure,

22:09 but basically it creates a tree structure and puts people into buckets and determines whether or not,

22:18 you know, and for each bucket, it says, okay, in this bucket, everyone visited and in this bucket,

22:21 everyone didn't, or maybe this bucket is 90, 10 and this bucket is 10, 90. And so I can give good

22:26 predictions on the probability someone will visit based on where they fall on the leaves of the tree.

22:32 And we were using it and something just wasn't making sense to me. Somehow the numbers were just,

22:39 something was wrong. And then I said, okay, let's make more leaves. And then I made more

22:44 leaves. I made the tree deeper, right? And then they're like, see, when you make the tree

22:49 deeper, it gets better. That makes sense because it's more fine-grained. I'm like, yeah,

22:54 but something doesn't make sense. It shouldn't be getting this good. And then I realized what was

22:59 happening: some of the leaves had nobody who visited.

23:04 That makes a lot of sense because most days you don't visit any particular chain.

23:08 And when it went to zero and then it saw someone visited, well, the log likelihood loss,

23:15 it had basically predicted 0% for an event that did happen. And so when you do log likelihood

23:22 loss or negative log likelihood loss, the score is like the negative log of that. So essentially you

23:27 should be penalized infinitely for that because there was no smoothing. But the language we were using,

23:33 which I think was spark or something like that. And it was probably some library and spark. I probably

23:40 shouldn't throw a spark under the bus. It was probably some library or something was changing

23:43 that infinity to a zero. So the thing that was infinitely bad, it was saying was infinitely good.

23:49 And so the worst thing. And that took, oh God, that took us so long to figure out. Like it's embarrassing

23:56 how long that one took to figure out, but that's, that's a good example of when experience will get

24:02 you in something. I don't think I've ever talked about this one publicly.

24:05 Yeah. Well, you just got to know that, you know, that's not what we're expecting, right?

24:10 Yeah. But you know, theoretically, hey, if I make my tree more fine-grained, if I, you know,

24:15 make my groups smaller, maybe it works better. But I was like, something's not

24:21 right. It's working a little too well. There was nothing specific that tipped me off, but it was just

24:26 like, there's probably a lot of stuff out there that people are actually taking action on and

24:30 spending money on, but it's like that, right? Yeah. Yeah. So let's see. So we talked about

24:35 some of the math stuff. If you really want to understand the algorithms, you know, statistics,

24:39 calculus, linear algebra, you obviously need calculus to understand like real statistics,

24:44 right? Continuous statistics and stuff. What else though? Like, do you need to know machine learning?

24:50 What kind of algorithms do you need to know? Like what, what in the computer science-y side of things

24:55 do you think you got to know? Bread and butter of the data scientists that I work with is machine

25:00 learning algorithms. So I think that is very helpful to know. And I think that, you know, some of the

25:06 basic algorithms in machine learning are good to know, which is like the K nearest neighbor,

25:10 K means, logistic regression, decision trees, and then some kind of random forest algorithm,

25:17 whether it's just random forest, which is a mixture of trees, or gradient boosted trees, which we've had a lot

25:22 of luck with. And then there's a lot of this deep learning stuff; well, neural networks is one of them.

25:29 Maybe you don't need to be an expert in neural networks, but it's certainly one to be aware of.

25:33 And based on these neural networks, deep learning is becoming very popular. And I've been hearing and

25:40 kind of looking into reading about deep learning for many years, but I have to say, I haven't actually

25:45 implemented one of these algorithms myself. But I just interviewed a guy on my show, Mark Ryan,

25:51 and he came out with a book called Deep Learning with Structured Data, which means, hey, you don't

25:57 just, this doesn't just work for like images or audio recognition, you could actually use it for

26:01 regular marketing data, like use everything else for. So I was like, all right, that's interesting. Maybe

26:06 I'll work on that now. But I don't think at this point you need to know

26:11 deep learning to be a good data scientist or machine learning engineer. I think the basics are really

26:16 good to know, because in many problems, you know, the basics will get you very far. And there's a lot

26:20 less that can go wrong.
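As a rough illustration of that bread-and-butter toolkit, here's a small scikit-learn sketch running several of the basics just mentioned side by side. The data set is synthetic and every setting is a default, so it's a starting point rather than a recipe (K-means is unsupervised, so it's left out here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A made-up binary classification problem, just to have something to fit.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosted trees": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    print(f"{name}: {score:.3f}")
```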

26:22 Yeah, a lot of those algorithms you talked about as well, like K-Nearest Neighbor and so on.

26:26 There are several books that seem to cover all of those. I can't think of any off the top of my

26:30 head, but I feel like I've looked through a couple and they all seem to have like, here are the main

26:33 algorithms you need to know to kind of learn data science. So not too hard to pick them up.

26:38 The classic is Bishop, the book that I read for grad school, but that's already 10 years old. It

26:42 certainly had all that stuff. That was very deep on the math. I can send you a link if you want.

26:46 Sure.

26:46 I think kind of any intro book to machine learning will have all of that stuff.

26:50 And basically, it's not in order of like hard to easy. It's just sort of, hey,

26:55 these are things that have helped in the past and that statisticians and machine learning engineers

27:02 have relied on in the past to get started and it's worked for them. So maybe it'll work for you.

27:06 Cool. Well, a lot of machine learning and data science is about making predictions. We have some

27:12 data. What does that tell us about the future, right?

27:16 Right.

27:16 That's where the Bayesian inference comes from in that world, right?

27:20 Yeah. It's trying to form beliefs, which could be a belief about something that already happened that

27:26 you don't know about, but you'll find out in the future or be affected by in the future, or it could

27:30 be a belief about something that will happen in the future. So something that either will happen in

27:35 the future or you'll learn about in the future. But Bayesian inference is more about, you know,

27:39 forming beliefs and I kind of call it like it's a quantification of the scientific method. So in the

27:47 basic form, Bayes' rule is very easy. You start with your current beliefs and you codify that in a

27:53 special mathematical way. And then you say, okay, here's some new data I received on this topic. And then it

27:59 gives you a framework to update your beliefs within the same framework that you've began with.

28:04 Right. And so like an example that you gave would be say a fire alarm, right?

28:09 We know from like life experience that most fire alarms are false alarms. You know, one example is

28:16 what is your prior belief that there is a fire right now without seeing the alarm? The alarm is the data.

28:23 The prior is what's the probability that, you know, my building is on fire and I need to

28:29 get the F out right now. You know, it's very low actually. Yeah. I mean,

28:34 yeah, for most of us, it hasn't really happened in our life. Maybe we've seen one or two fires,

28:39 but they weren't that big of a deal. I'm sure there are some people in the audience who have seen

28:44 bad fires and for them, maybe their prior is a little higher.

28:47 I once in my entire life have had to escape a fire.

28:49 Yeah.

28:50 Only once, right?

28:51 Were you in like real danger or?

28:53 Oh yeah, probably. It was a car and the car actually caught on fire.

28:57 Oh yeah. That sounds pretty bad.

28:58 It had been worked on by some mechanics and they put it back together wrong. It like shot

29:02 oil over something and it caught fire. And so we're like, Oh, the car's on fire. We should get out of

29:05 it.

29:06 Yeah. But yeah, sitting in your building at work, your prior is going to be much lower than in a car that

29:11 you just worked on. So when the alarm goes off, okay, that's your data. The data is that we received

29:18 an alarm today. And so then you have to think about, okay, I still have two hypotheses, right? Hypothesis one

29:26 is that there is a fire and I have to escape. And hypothesis two is that there is no fire.

29:32 And so once you hear the alarm, you still have those two hypotheses. One is that the alarm is

29:37 going off and there's a real fire. And two is that there is no fire, but this is a false alarm.

29:42 And so what ends up happening is that because there's a significant probability of a false alarm.

29:48 So at the beginning, there is a very low probability of a fire. After you hear the alarm,

29:54 there's still a pretty low probability of a fire, but the probability of a false alarm still overwhelms

29:58 that. Now I'm not saying that you should ignore fire alarms all the time, but because in that case,

30:03 that's a case where the action that you take is important regardless of the belief. So,

30:10 you know, hey, there is a very low cost to checking into it, at least checking into it or leaving the

30:17 building, if you have a fire alarm, but there's a very high consequence of failure. So high.

30:21 Exactly. Exactly. But in terms of just forming beliefs, which is a good reason not to panic,

30:26 you shouldn't put a lot of probability on the idea that there's definitely a fire.

30:31 Okay. Yeah. So that's basically Bayesian inference, right? I know how likely a fire is. All

30:37 of a sudden, I have this piece of data, the alarm. I have a space of hypotheses

30:43 that could apply, and I try to figure out which one is the right

30:50 one. Maybe. Yeah. So you take your prior. So let's say there's, I don't know, a one in

30:56 a hundred thousand chance that there's a fire in the building today, and a 99,999 in a hundred thousand chance there isn't.

31:03 Then you take that, that's your prior. Then you multiply it by your likelihood, which is okay.

31:10 What is the likelihood of seeing the data given that the hypothesis is true? So what's the likelihood

31:17 that the alarm would go off if there is a fire? Maybe that's pretty high. Maybe that's close to one

31:21 or a little bit lower than one. And then on the second hypothesis that there's no fire,

31:26 what's the likelihood of a false alarm today, which could actually be pretty high. Could be like one

31:32 in a thousand or even one in a hundred in some buildings. And then you multiply those together and

31:36 then you get an unnormalized posterior and that is your answer. So it's really just multiplication.

31:40 Yeah. It's like simple fractions once you have all the pieces, right? So it's a pretty simple

31:45 algorithm. It's very hard to describe through audio, but it's much better visually if you want to check it

31:51 out. I've been struggling to describe it through audio for, you know, for the last year and a half,

31:55 but I do the best I can.

31:57 This is like describing code. You can only take it so precisely.

32:00 Yeah.
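For anyone who would rather see it than hear it, here's the fire-alarm arithmetic as a few lines of Python. The prior and the likelihoods are the illustrative numbers from the conversation, not real fire statistics:

```python
# Prior: how likely is a fire before hearing anything?
prior_fire = 1 / 100_000
prior_no_fire = 1 - prior_fire

# Likelihoods: how likely is an alarm under each hypothesis?
p_alarm_given_fire = 0.95          # assumed: the alarm almost always triggers in a fire
p_alarm_given_no_fire = 1 / 1_000  # assumed false-alarm rate for the building

# Unnormalized posterior: prior times likelihood, one number per hypothesis.
post_fire = prior_fire * p_alarm_given_fire
post_no_fire = prior_no_fire * p_alarm_given_no_fire

# Normalize so the two hypotheses sum to one.
p_fire = post_fire / (post_fire + post_no_fire)
print(f"P(fire | alarm) = {p_fire:.4f}")  # ~0.0094: higher than before, still low
```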

32:00 This portion of Talk Python to Me is brought to you by Tidelift. Tidelift is the first managed

32:07 open source subscription, giving you commercial support and maintenance for the open source

32:12 dependencies you use to build your applications. And with Tidelift, you not only get more dependable

32:17 software, but you pay the maintainers of the exact packages you're using, which means your software

32:22 will keep getting better. The Tidelift subscription covers millions of open source projects across

32:27 Python, JavaScript, Java, PHP, Ruby, .NET, and more. And the subscription includes security updates,

32:33 licensing verification and indemnification, maintenance and code improvements, package selection,

32:38 and version guidance, roadmap input, and tooling and cloud integration. The bottom line is you get the

32:44 capabilities you'd expect and require from commercial software. But now for all the key open source

32:50 software you depend upon. Just visit talkpython.fm/Tidelift to get started today.

32:56 This comes from a reverend, Reverend Bayes, who came up with this idea in the 1700s, but for a long time,

33:07 it wasn't really respected, right? And then, recently,

33:12 it solved some pretty powerful problems that matter a lot to people.

33:17 Yeah. I mean, I can't do the whole history justice in just a few minutes, but

33:22 I'll try to give my highlights. There was this reverend who was, you know,

33:28 he was into theology and he was also into mathematics. So he was probably like pondering big questions and

33:34 he wrote down notes and he was trying to figure out the validity of various arguments.

33:39 His notes were found after he died, so he never published it. And so this was taken up by

33:46 Pierre Laplace, who was a more well-known mathematician, and kind of formalized it. But when the basis of

33:52 statistical thinking was built in the late 19th, early 20th century,

34:00 it really went in a more frequentist direction where it's like, no, a probability is actually a

34:08 fraction of a repeatable experiment; kind of like, over time, what fraction does it end up

34:15 as? And so they consider probability as sort of an objective property of the system. So for example,

34:22 a die roll: well, each side is one sixth. That's kind of an objective property

34:27 of the die. Whereas Bayesian statistics is sort of based on belief. And because belief kind of

34:33 seemed unscientific, and the frequentists had very good methods for coming up with answers and

34:40 more objective ways of doing it, they sort of had the upper hand. But as kind of the focus got into

34:48 more complex issues and we had the rise of computers and that sort of thing, and the rise of more data and

34:55 that sort of thing, Bayesian inference started taking a bigger and bigger role until now, I think most

35:02 machine learning engineers and most data scientists think as Bayesians. And so, to give

35:07 some examples in history, most people are probably aware of Alan Turing at Bletchley Park, along with

35:14 many other people, you know, building these machines that broke the German codes during World War II.

35:19 There's a whole movie about it.

35:21 Right. That's trying to break the Enigma machine and the Enigma code. And those were some

35:26 important problems to solve, but also highly challenging.

35:31 Yeah. And so they incorporated a form of Bayes rule into this. Well, what are my relative beliefs

35:37 as to the setting of the machine? Because, you know, the machine could have had quadrillions of settings and

35:42 they're trying to distinguish which settings are likely and which are not.

35:48 But after the war, that stuff was classified. So nobody could say, oh yeah, Bayesian inference was

35:55 used in that problem. And one interesting application that I found, even as it wasn't accepted by academia

36:00 for many years, was life insurance. Because if the actuaries

36:07 get the answer wrong as to how likely people are to live and die, then the company is on the hook for lots and

36:13 lots of money, or even its own survival.

36:17 And so-

36:18 Right. Right. Or how likely is it to flood here?

36:20 How likely is it for there to be a hurricane that wipes this part of the world off the map?

36:25 Right.

36:25 And a lot of these were one-off problems. You know, one problem is, you know, what's the

36:29 likelihood of two commercial planes flying into each other? It hadn't happened, but they wanted to

36:34 estimate the probability of that. And you can't do repeated experiments on that. So they really had to

36:38 use priors, which were sort of like expert data. And then, you know, more recently, as we had the rise of

36:45 kind of machine learning algorithms and big data, you know, Bayesian methods have become more and more

36:52 relevant. But also a big problem was, you know, the problems that we just mentioned, which are, you

36:58 know, fire alarms and figuring out whether or not you have a disease and things like that. That's the

37:02 two-hypothesis problem. But a lot of times you have an infinite space, an infinite-hypothesis

37:08 problem, where you're trying to decide between an infinite set of possible hypotheses. And that becomes

37:14 extremely difficult to do without a computer; even with a computer it becomes

37:19 difficult. And so, you know, there's been a lot of research into how do you search that space

37:24 of hypotheses to find the ones that are most likely. And so if you've heard the term Markov chain Monte

37:29 Carlo, that is the most common algorithm used for that purpose. And there is even current research

37:36 into making that faster and finding the hypothesis you want more quickly. Andrew Gelman at

37:41 Columbia has a lot of stuff out about this. And he has a newer thing called

37:48 NUTS, the No-U-Turn Sampler, which is based on a more sophisticated version of MCMC.

37:55 And so that's what's used in a framework that Python has called PyMC3 to come up with your

38:02 most likely hypothesis very, very quickly.
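To give a feel for the family of algorithms being described, here's a bare-bones Metropolis sampler, the simplest member of the MCMC family. It's a toy one-dimensional example with a stand-in distribution; NUTS and PyMC3 are far more sophisticated:

```python
import numpy as np

def unnormalized_posterior(theta):
    # Stand-in bell curve centered at 2.0; a real problem would put
    # prior-times-likelihood here.
    return np.exp(-0.5 * (theta - 2.0) ** 2)

rng = np.random.default_rng(0)
theta, samples = 0.0, []
for _ in range(10_000):
    proposal = theta + rng.normal(0.0, 0.5)  # wander to a nearby hypothesis
    accept_ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
    if rng.random() < accept_ratio:          # always take uphill moves,
        theta = proposal                     # sometimes take downhill ones
    samples.append(theta)

print(np.mean(samples))  # ≈ 2.0: samples concentrate on likely hypotheses
```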

38:04 So let's take this over to the Python world. Yeah. Like, yeah, there's a lot of stuff that works

38:10 with it. And obviously, like you said, the machine learning deep down uses some of these techniques,

38:15 but this PyMC3 library is pretty interesting. Let's talk about it. So its subtitle is probabilistic

38:23 programming in Python.

38:25 If I could start with some alternatives, which I've used because I haven't, I've been diving into

38:30 reading about PyMC3, but I haven't used it personally. So even when I was doing things in 2014,

38:36 just on my own, basically without libraries, I was able to use Python very, very easily to

38:42 kind of put in these equations for Bayesian inference on whether it's multi-logistic regression,

38:50 or another one I did was Dirichlet prior calculator, which if I can kind of describe that, it's sort

38:56 of thinking, well, how, what should I believe about a place before I've seen any reviews? Should I

39:00 believe it's good? Should I believe it's bad? You know, if I have very few reviews, what should I

39:04 believe about it? Which was an important question to ask for something like four square city guide in

39:10 many cases, because we didn't have a lot of data. And so that was a good application of Bayesian

39:15 inference. And I was able to just use the equations straight up and kind of from first principles,

39:21 apply algorithms directly in Python. And it actually was not that hard to do because when searching the

39:29 space, there was a single global maximum, didn't have to worry about the local maximum in these

39:34 equations. So it was just a hill climbing. Hey, I'm going to start with this hypothesis in this

39:39 n dimensional space, and I'm going to find the gradient, I'm going to go a little higher,

39:43 a little higher, a little higher gradient ascent is what I described, although it's usually called

39:47 gradient descent. So that's sort of an easy one to understand. Then if you want to do MCMC directly,

39:54 because you have some space that you want to search, and you have the equations of the probability

40:00 on each of the points in that space, I used pi MC, which is spelled E M C E E, which is a simple

40:09 program that only does MCMC. And so I had a lot of success with that when I wanted to do some one off

40:18 sampling of, you know, non standard probability distributions. So those are ones that I've actually

40:24 used and had success with in the past. But pi MC three seems to be like the full, you know, we do

40:31 everything sort of a thing. And basically, what you do is you program probabilistically. So you say,

40:38 hey, I imagine that this is how the data is generated. So I'm just going to basically put

40:44 that in code. And then I'm going to let you, the algorithm work backwards and tell me what the

40:50 parameters originally were. So if I could do a specific here, let's say I'm doing logistic regression,

40:57 which is like, every item has a score, or, you know, in the case that I was working on,

41:02 every word has a score, the scores get added up, that's then a real number, then it's transformed

41:08 using a sigmoid into a number between zero and one. And that's the probability that's a positive review.

41:13 And so basically, you'll just say, hey, I have this vector that describes the words this has,

41:20 then I'm going to add these parameters, which I'm not going to tell you what they are.

41:24 And then I'm going to get this result. And then I'm going to give you the final data set at the end.

41:28 And it kind of works backwards and tells you, okay, this is what I think the parameters were.

41:33 And what's really interesting about something like PyMC3, which I would like to use in the future, is

41:40 when you do a linear regression or logistic regression, in kind of standard practice,

41:44 you get one model at the end, right? This is the model that we think is best. And this is the model

41:50 that has the highest probability. And this is the model that we're going to use. Great. You know,

41:55 that works for a lot of cases. But what PyMC3 does is, instead of picking a model at the end,

42:02 it says, well, we still don't know exactly which model produced this data. But because we have the

42:08 data set, we have a better idea of which models are now more likely and less likely. So we now have

42:13 a probability distribution over models. And we're going to let you pull from that.

42:17 So it kind of gives you a better sense of what the uncertainty is over the model. So

42:21 for example, if you have a word in your data set, let's say the word's "delicious", and it's positive,

42:28 we know it's a positive word. But let's say for some reason, there's not a lot of data on it,

42:32 then it can say, well, I don't really know what the weight of delicious should be.

42:38 It's being used at rock concerts. We don't know why. What does it mean?

42:41 Yeah, yeah, yeah. And so we're going to give you a few possible models. And, you know, and you can

42:45 keep sampling from that. And you'll see that the deviation, the discrepancy, the variance

42:52 of that weight is going to be very high, because we just don't have a

42:56 lot of data on it. And that's something that standard regressions just don't do.
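Here's roughly what that looks like in code: a minimal, assumed sketch of a Bayesian logistic regression in PyMC3 over made-up bag-of-words data. The priors, sizes, and sampler settings are all illustrative, not the original Foursquare model:

```python
import numpy as np
import pymc3 as pm

# Fake data: 500 tips, 20 candidate words, binary like/dislike labels.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.1, size=(500, 20)).astype("float64")
true_w = rng.laplace(0.0, 1.0, size=20)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_w))))

with pm.Model():
    # Laplace priors pull most word weights toward zero (an elastic-net flavor).
    w = pm.Laplace("w", mu=0.0, b=1.0, shape=X.shape[1])
    b = pm.Normal("b", mu=0.0, sigma=1.0)
    # Scores get summed, then squashed through a sigmoid into a probability.
    p = pm.math.sigmoid(pm.math.dot(X, w) + b)
    pm.Bernoulli("obs", p=p, observed=y)
    # NUTS works backwards from the data to a distribution over weights.
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)

# Not one best model, but samples of plausible ones:
print(trace["w"].mean(axis=0))  # posterior mean weight per word
print(trace["w"].std(axis=0))   # a large std flags a word with little data
```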

43:00 That's pretty cool. And the way you work with it is, you basically code out the model in a

43:06 really nice Python-language API. You kind of say, well, I think it's a linear model,

43:13 I think it's this type of thing. And then like you said, it'll go back and solve it for you. That's

43:18 pretty awesome. I think it's nice.

43:19 Right. A good way to think about it is in terms of just a standard linear regression, like,

43:24 what's the easiest example I can think of? Try to find someone's weight from their height,

43:29 for example. And so you think there might be an optimal coefficient on there given the data.

43:36 But if you use PyMC3, it will say, no, we don't know exactly what the coefficient is given your data.

43:40 You don't have a lot of data, but we're going to give you several possibilities. We're going to give

43:44 you a probability distribution over it. And as I say on The Local Maximum, you shouldn't make everything

43:50 probabilistic, because there is a cost in that. But oftentimes,

43:56 rather than considering one single truth, by considering multiple truths probabilistically,

44:00 you can unlock a lot of value. In this case, you can kind of determine your variance a little better.

44:05 Yeah, that's super cool. I hadn't really thought about it. And like I said, the API is super clean

44:10 for doing this. So it's great. Yeah.

44:12 Where does this Bayesian inference, like, where do you see this solving problems today? Where do you see

44:18 like stuff going? What's the world look like now?

44:21 I've been using it to solve problems basically ever since I started working as a machine learning engineer

44:26 at Foursquare, basically using Bayes' rule as kind of my first principles whenever I approach a problem.

44:32 And it's never driven me in the wrong direction. So I think it's one of those timeless things that

44:38 you can always use. For me, especially after working with our attribution product a lot,

44:44 I think that the future is trying to figure out causality a lot better. And I think that's where

44:51 some of these more sophisticated ideas come in. Because it's one thing to say, this variable is

44:56 correlated with that, and I can have a model. But it's like, well, what's the probability that

45:00 changing this variable actually causes this other variable to change? In the case of ads,

45:06 where you can see it's going to unlock a lot of value for companies and where, you know,

45:10 there might be a lot of investment in this, the question is: what is the probability that this ad affects

45:16 someone's likelihood to visit my place or to buy something from me more generally? Or what is my

45:23 probability distribution over that? And so can I estimate that? And I think that that whole industry

45:31 of online ads is, it's very frustrating for an engineer because it's so inefficient. And there's so

45:37 many people in there that don't know what they're doing. And it could be very frustrating at times.

45:40 But I think that means also that there's a lot of opportunity to like unlock value if you have

45:46 a lot of patience. Sure. Well, so much of it is just they looked for this keyword, so they must be

45:51 interested, right? It doesn't take very much into account. Yeah, but the question is, okay, maybe they

45:56 look for that keyword and now they're going to buy it no matter what I do. So don't send them the ad,

46:00 send the ad to someone who didn't search the keyword. Or maybe they need that extra push and that extra

46:04 push is very valuable. It's hard to know unless you measure it. And when you measure it, you don't get a

46:10 whole lot of data. So you really, it really has to be a Bayesian model. Whoever uses these Bayesian

46:17 models is going to get way ahead. But right now it goes through several layers. I kept saying when we

46:23 were working on this problem and people weren't getting what we were doing, I was like, I wish the

46:29 people who are writing the check for these ads could get in touch with us because I know they care.

46:33 But, you know, oftentimes you're working through sales and someone on the other side.

46:39 It was just too many layers between, right?

46:41 Yeah.

46:41 Yeah, for sure.

46:42 Earlier, you spoke about having your code go fast and you talked about Cython.

46:48 Oh yeah.

46:48 What's your experience with Cython?

46:49 I used that for the multi-logistic regression. And all I can say is it took a little getting used

46:57 to, but, you know, I got an order of magnitude speed up, which we needed to launch that thing

47:02 in our one-off Python job at Foursquare. So it took only a few hours versus all day. So it was kind

47:12 of a helpful tool to get that thing launched. And I haven't used it too much since, but I kind of keep

47:18 that in the back of my mind as a part of my toolkit.

47:21 Yeah. It's great to have in the toolkit. I feel like it doesn't get that much love, but

47:25 I know people talk about Python speed and, oh, it's fast here. It's slow there.

47:30 Yeah.

47:31 First people just think it's slow because it's not compiled, but then you're like, oh,

47:34 but wait, what about the C extensions? You go, actually, yeah, that's faster than Java or something

47:39 like that. So interesting.

47:40 Yeah. I've also had a big speed up just by taking, you know, a dictionary or matrix I was using and then

47:47 using NumPy instead, or "num-pee", I don't know how you pronounce it, but instead of using-

47:53 I go with "num-pie", but yeah.

47:54 Okay.

47:54 NumPy instead of the standard, like, you know, Python tools, you could also get a big speed

48:01 up there.

48:01 Yeah, for sure. And that's pushing it down into the C layer, right?

48:04 Yeah.
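As a quick illustration of that kind of win, here's a toy comparison. The timings are machine-dependent, but the second version hands the whole loop to NumPy's C layer in one call:

```python
import time
import numpy as np

n = 2_000_000
values = [0.5] * n                       # plain Python list

t0 = time.perf_counter()
total_py = sum(v * v for v in values)    # interpreted, element by element
t1 = time.perf_counter()

w = np.full(n, 0.5)
t2 = time.perf_counter()
total_np = float(np.dot(w, w))           # one vectorized call into C
t3 = time.perf_counter()

print(total_py, f"{t1 - t0:.3f}s")
print(total_np, f"{t3 - t2:.3f}s")       # typically an order of magnitude faster
```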

48:04 But a lot of times you have your algorithm in Python, and one option is to go write that C

48:10 layer because you're like, well, we kind of need it. So here we go down the rabbit hole of writing

48:13 C code instead of Python. But Cython is sweet, right? Especially the latest one, you can just put

48:18 the regular type annotations, the Python three type annotations.

48:21 Oh, yeah.

48:21 On the types. And then, you know, magic happens.
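For the curious, here's a tiny sketch of that annotation style. The function is hypothetical; the point is that ordinary Python 3 annotations with the cython types are enough for Cython to generate typed C, while the file still runs as plain Python:

```python
# hotspot.py: compile with Cython (for example, `cythonize -i hotspot.py`),
# or run unchanged under CPython, where the annotations are just hints.
import cython

def dot(xs: cython.double[:], ws: cython.double[:]) -> cython.double:
    # With the types declared, Cython turns this into a tight C loop.
    total: cython.double = 0.0
    i: cython.int
    for i in range(xs.shape[0]):
        total += xs[i] * ws[i]
    return total
```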

48:24 I definitely did. I just started with Python and it was like, you know, we're in

48:28 these three functions 90% of the time, just fix that.

48:31 Usually the slow part is, like, really focused. Most of your code, it doesn't even matter what

48:35 happens to it, right? It's just, there's like that little bit where you loop around a lot

48:39 and that matters.

48:40 Yeah.

48:41 Yeah.

48:41 It's funny how we over optimize and you can't escape it. Like even when I'm creating,

48:46 you know, I see like a bunch of doubles. I'm like, oh, but these are only one and zero. Can

48:50 we like change them to Boolean? But like in the end, it doesn't care. It doesn't matter.

48:54 For most of the code, it really has no effect.

48:56 For sure.

48:57 Except in that one targeted place.

48:58 Yeah. So the trick is to use the tools to find it, right?

49:01 Yeah.

49:02 Like cProfile or something like that. The other major thing, you know, one thing you can do to

49:07 speed up stuff like this, these algorithms is just to say, well, I wrote it.

49:11 I wrote it in Python or I use this data structure and maybe if I rewrote it differently or I wrote

49:17 it in C or I applied Cython, it'll go faster. But it could be that you're speeding up the execution

49:23 of a bad algorithm. And if you had a better algorithm, it might go a hundred times faster

49:28 or something, right? Like, so how do you think about that with your problems?

49:31 That's what I did back in 2014 with the Dirichlet prior calculator. And that was an

49:38 interesting problem to solve because, to recap, it was one of the use cases we had.

49:44 Okay. What's my prior on a venue before I've gotten any reviews? What's my prior on a restaurant

49:49 before I've gotten any reviews? And I'm using the experience of the data on all the other restaurants

49:53 I've seen. So we know what the variance is. And let me try to come up with an equation that can

49:59 calculate that value from the data. And it turned out there were some algorithms available,

50:04 but as I dug into the math, I noticed that there was like a math trick that I could make use of.

50:12 In other words, it was something like logs of the same number

50:17 were being taken over and over again. And it's like, okay, just store how many times we took the

50:23 log. And then when I dug into the math, those kind of combined into one term you could multiply together.

50:28 So essentially I used a bunch of factoring and refactoring, whether you think of it as factoring

50:33 code or factoring math to get kind of an exponential speed up in that algorithm. And so that's why I

50:41 published a paper on it. I was very proud of that. It was a very satisfying thing to do.
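
The flavor of that trick, in toy form (this is an illustration of the idea, not the published algorithm): when the same value appears in a sum of logs many times, count the repeats and take each log once:

    import math
    from collections import Counter

    def slow_log_sum(values):
        # one log call per element, even for repeated values
        return sum(math.log(v) for v in values)

    def fast_log_sum(values):
        # group identical values, then one log call per distinct value
        counts = Counter(values)
        return sum(n * math.log(v) for v, n in counts.items())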

50:45 It might not have mattered in terms of our product, but I think a lot of people used it, though,

50:49 to be like, rather than just taking an average of what I've seen in the past, no,

50:53 I want to do something that is based on good principles. And so I want to use the Dirichlet

51:00 prior calculator. And so some people have used that. It's my Python code online. And the algorithm has

51:07 proven very fast and like almost instantaneous. Basically, as soon as you load all the data in,

51:13 it gives you the answer, which I like. Now, my next step to that is to use PyMC3,

51:19 rather than giving you an answer, it should give you a probability distribution over answers.

51:22 Yeah, that's right.

51:23 I haven't done that yet. Didn't know about that at the time. I think

51:26 my speed up would still apply.
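
A minimal sketch of that PyMC3 step (the data and priors are invented, and this isn't the BayesPy code): instead of one best answer for a rate, you sample a posterior distribution over it:

    import pymc3 as pm

    observed = [1, 0, 1, 1, 0, 1, 1, 1]  # made-up thumbs-up/down reviews

    with pm.Model():
        rate = pm.Beta("rate", alpha=1.0, beta=1.0)     # prior on the rating
        pm.Bernoulli("obs", p=rate, observed=observed)  # likelihood of the data
        trace = pm.sample(2000, tune=1000)              # draw posterior samples

    # The posterior mean acts like a smoothed average, and the spread of
    # the samples is the uncertainty around it.
    print(trace["rate"].mean())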

51:28 Yeah, that's cool. Well, that definitely takes it up a notch. What about learning more about

51:32 Bayesian analysis and inference and like, where should people go for more resources?

51:37 Oh, okay. Well, a kind of a history book that I read that I really like on Bayesian inference

51:42 is one called The Theory That Would Not Die by Sharon Bertsch McGrayne, a few years old, but it's really good

51:50 if you're interested in the history on that. I have a book about PyMC3, kind of a tech book that does go

51:56 into the basics of Bayesian inference that has a really good title. It's called Bayesian Analysis

52:02 with Python. Oh, yeah.

52:04 Yeah, yeah. So that's a good one to look at. And then I have a bunch of episodes on my show

52:10 that are related to Bayesian analysis. So episodes zero and one on my show were basically just starting

52:18 out trying to describe Bayes' rule to everyone. I sort of attempted to do the description in episode

52:24 zero. And then in episode one, I applied it to the news story that was happening that day,

52:28 which was kind of the fire alarm at the bigger scale, which was everyone in Hawaii getting this

52:33 message that there's an ICBM coming their way because of a mistake someone made.

52:39 And then-

52:41 Yeah, because of some terrible UI decision on like the tooling.

52:45 Yeah, is that what it was?

52:46 Yeah, yeah.

52:47 Yeah.

52:47 There was some analysis about what had happened, not probabilistically, but there's

52:52 some really old, crummy UI where they have to press some button to, like, acknowledge a test

52:58 or treat it as real, and somehow they look almost identical, or there's some weird thing

53:03 about the UI that had, like, tricked the operator into saying, oh, it's real.
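
As a worked Bayes' rule example in the spirit of that story (every number below is invented): even a fairly reliable alarm can leave the posterior probability of a real attack under one percent when the prior is tiny:

    # P(attack | alarm) via Bayes' rule, with made-up numbers.
    prior = 1e-6                    # P(attack) before any alarm
    p_alarm_if_attack = 0.99        # alarm fires given a real attack
    p_alarm_if_no_attack = 1e-4     # false-alarm rate

    evidence = (p_alarm_if_attack * prior
                + p_alarm_if_no_attack * (1 - prior))
    posterior = p_alarm_if_attack * prior / evidence
    print(posterior)                # about 0.0098, still under 1%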

53:07 Yeah, yeah. And then another couple of episodes I want to highlight are 21 and 22.

53:13 Episode 21 is sort of the philosophy of probability, and in 22, we talk about the problem

53:18 of p-hacking, which is when people try their experiments over and over until they get

53:24 something that works with p-values, which is a frequentist idea, which works if you're using

53:29 it properly. But the problem is most people don't. And then we did an episode, I think it

53:33 was 65 on probability, how to estimate the probability of something that's never happened. And then

53:39 78, the one that you mentioned, which was on the history of Bayes and a little more philosophy.

53:45 So I've talked about that a lot. You could probably go to localmaxradio.com or

53:49 localmaxradio.com/archive and find the ones that you want.
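
A toy simulation of that p-hacking loop (all numbers made up): each experiment is a fair coin, so any "effect" is pure noise, yet retrying until one run clears p < 0.05 makes false discoveries routine:

    import random

    def one_experiment(n=100):
        # fair-coin flips under the null; "significant" if the head count
        # lands outside the rough 95% band around n/2
        heads = sum(random.random() < 0.5 for _ in range(n))
        return abs(heads - n / 2) > 1.96 * (n ** 0.5) / 2

    tries_per_claim = 20
    hits = sum(
        any(one_experiment() for _ in range(tries_per_claim))
        for _ in range(1000)
    )
    print(hits / 1000)  # roughly 0.6, far above the nominal 0.05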

53:52 That's really cool. So yeah, I guess we'll leave it there for now. That's quite interesting. And yeah,

53:58 it gives us a look into some of the algorithms and math we've got to know for our data science.

54:02 Now, before you get out of here, though, I got the two questions I always ask everyone.

54:06 You're going to write some Python code. What editor do you use?

54:09 I just use Sublime or TextMate also on Mac. But I'm sure I could do something a little better

54:16 than that. I just picked one and never really looked back.

54:19 Sounds good. And then notable PyPI package?

54:23 Notable.

54:24 Maybe not the most popular, but like, oh, you should totally know about this. I mean,

54:28 you already threw out there PyMC3, if you want to claim that one, or if there's something else. Yeah,

54:32 pick that.

54:33 Yeah. Well, I have BayesPy, which is the one that's like in GitHub slash max slash BayesPy,

54:40 which has all the stuff I talked about. It's not actively developed, but it does have my kind of

54:44 one-off algorithms. So if you're in the market for multinomial models or Dirichlet,

54:52 or you want some kind of interesting new way to do multi-logistic regression, you could certainly give

55:00 that a try. But most people probably want to use kind of the standard tooling. Yeah. Why don't I go

55:06 with that? Why don't I go with the one I wrote a long time ago?

55:09 Yeah. Right on. Sounds good. All right. Final call to action. People are excited about this stuff.

55:13 What do you tell them? What do they do?

55:15 Check out the books I mentioned and check out my website, localmaxradio.com. And also subscribe to the Local Maximum. It should be on all of your podcatchers.

55:26 If it's not on one, please let me know.

55:31 localmaxradio.com. It's just every week. And we have a lot of fun. So definitely check it out.

55:35 Yeah, it's cool. You spend a lot of time talking about these types of things.

55:37 Super. All right. Well, Max, thanks for being on the show.

55:40 Michael, thank you so much. I really enjoyed this conversation.

55:43 Yeah, same here. Bye-bye.

55:44 Bye.

55:45 This has been another episode of Talk Python to Me. Our guest on this episode was Max Sklar,

55:50 and it's been brought to you by Linode and Tidelift. Linode is your go-to hosting for whatever you're

55:56 building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E.

56:02 If you run an open source project, Tidelift wants to help you get paid for keeping it going strong.

56:08 Just visit talkpython.fm/Tidelift, search for your package, and get started today.

56:14 Want to level up your Python? If you're just getting started, try my Python Jumpstart by

56:19 Building 10 Apps course. Or if you're looking for something more advanced, check out our new

56:23 async course that digs into all the different types of async programming you can do in Python.

56:28 And of course, if you're interested in more than one of these, be sure to check out our

56:32 Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show.

56:37 Open your favorite podcatcher and search for Python. We should be right at the top.

56:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

56:46 and the direct RSS feed at /rss on talkpython.fm.

56:50 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

56:54 Now get out there and write some Python code.

56:56 We'll see you next time.

