Bayesian foundations

Episode #239, published Sat, Nov 23, 2019, recorded Thu, Oct 10, 2019

In this episode, we'll dive into one of the foundations of modern data science: Bayesian algorithms and Bayesian thinking. Join me along with guest Max Sklar as we look at the algorithmic side of data science.

Episode Deep Dive

Guest Introduction and Background

Max Sklar is a seasoned Python developer, data scientist, and machine learning engineer. His career runs from writing C++ for the Palm Pilot in his first job out of school to years at Foursquare doing Bayesian-driven data science and machine learning in Python and Scala. Max is also the host of the Local Maximum podcast, where he explores topics in data science, machine learning, tech trends, and beyond. In this episode, he shares insights on Bayesian thinking, discusses how it has shaped his work at Foursquare and elsewhere, and highlights examples of Python usage in data-driven products and research.

What to Know If You're New to Python

Below are a few basics and resources that will help you follow the conversation about Bayesian inference, data science, and Python’s usage throughout the episode:

Installing Python: Make sure you have a recent version (3.x) installed so you can work with modern Python features.
pip and Virtual Environments: Knowing how to install libraries (e.g., pip install pymc3) and keep your dependencies isolated is crucial for data science projects.
Reading and Cleaning Data: The built-in csv module or tools like pandas can help you quickly import, clean, and explore data.
PyMC3 Documentation: Familiarity with a probabilistic programming library such as PyMC3 can help you see Bayesian inference in action.
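
As a rough illustration of the kind of cleanup script Max describes in the episode (tab-separated latitude, longitude, and free text), here is a minimal sketch using only the standard library. The sample data, the column names, and the `clean_rows` helper are assumptions made up for this example, not anything from Max's actual scripts.

```python
import csv
import io

# Hypothetical tab-separated data: latitude, longitude, and a free-text note.
raw = (
    "lat\tlon\tnote\n"
    '40.7243\t-73.9973\t"  Great  coffee here "\n'
    "40.7411\t-73.9897\tquiet park\n"
)


def clean_rows(text):
    """Parse tab-separated text and normalize whitespace in the note field."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            # Collapse runs of whitespace and trim both ends.
            "note": " ".join(row["note"].split()),
        })
    return rows


rows = clean_rows(raw)
for row in rows:
    print(row)
```

The csv module takes care of the quoted field, so the cleanup step only has to deal with stray whitespace.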

Key Points and Takeaways

Bayesian Inference as a Core Foundation: Bayesian methods form the bedrock of modern data science tasks, from calculating probabilities of events (e.g., spam detection, disease diagnosis) to interpreting customer behavior. Max highlights how Bayesian thinking models “beliefs” in a mathematical way and updates those beliefs when new data arrives, making it a powerful approach for everything from ads attribution to location-based services.
- Tools & Links:
  - PyMC3: Python library for Bayesian modeling and probabilistic machine learning
  - Local Maximum Podcast: Max’s podcast discussing Bayesian methods, data science, and more
Real-World Bayesian Example: Fire Alarms. A simple yet illustrative example is using prior knowledge (fires are rare) and updating that with the data (the alarm is sounding). Because false alarms happen regularly, Bayesian reasoning reveals that the posterior probability of an actual fire is still often low, though the risk is high enough to act on. This example frames how prior probabilities combine with observed data to guide decisions.
- Tools & Links:
  - Bayes' Theorem Overview: General reference to the core mathematical formula
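
The fire-alarm reasoning can be worked through with Bayes' theorem directly. The sketch below is only an illustration: the three probabilities are made-up assumptions chosen to show the effect, not figures from the episode, and the `posterior` helper is a hypothetical name.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a yes/no hypothesis H given one piece of evidence."""
    numerator = prior * p_evidence_given_h
    # Total probability of seeing the evidence, whether or not H is true.
    evidence = numerator + (1.0 - prior) * p_evidence_given_not_h
    return numerator / evidence


# Assumed numbers, for illustration only.
p_fire = 0.001              # prior: fires are rare
p_alarm_given_fire = 0.99   # the alarm almost always sounds in a real fire
p_alarm_given_no_fire = 0.02  # false alarms happen regularly

p_fire_given_alarm = posterior(p_fire, p_alarm_given_fire, p_alarm_given_no_fire)
print(f"P(fire | alarm) = {p_fire_given_alarm:.3f}")
```

Under these assumed numbers the posterior comes out at roughly 5%: far higher than the 0.1% prior, so worth acting on, yet still much more likely to be a false alarm than a fire.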
Practical Use at Foursquare: Max used Bayesian principles to power key data products at Foursquare, including sentiment models and ads attribution. Whether it was inferring a restaurant’s quality from minimal user reviews or determining whether an ad influenced a store visit, Bayesian updates proved more nuanced and powerful than naive approaches.
- Tools & Links:
  - Foursquare: Location intelligence, originally known for “check-in” consumer app
Logistic Regression and Sentiment Modeling: Logistic regression is a fundamental classification technique, and Max implemented a multi-logistic regression system (often with a Bayesian twist) to assign sentiments to short Foursquare user tips. Words like “delicious” need numeric weights to classify positivity vs. negativity, but limited data or context can introduce uncertainty, an area where Bayesian modeling really shines.
- Tools & Links:
  - scikit-learn Logistic Regression: Common Python library for regression models
  - spaCy: Though not discussed in detail here, it’s a popular NLP library that can also be integrated
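
To make the idea of per-word weights concrete, here is a minimal sketch of how a three-class (positive, negative, mixed) logistic model scores a tip once it has been trained. It is not Max's implementation: the weights are hypothetical values invented for the example, single words stand in for phrases, and training is left out entirely. Words missing from the table have a weight of exactly zero, which is why a sparse model only needs to store the few words that matter.

```python
import math

CLASSES = ("positive", "negative", "mixed")

# Hypothetical learned weights; any word not listed has weight exactly zero.
WEIGHTS = {
    "delicious": {"positive": 2.0, "negative": -1.5, "mixed": -0.2},
    "rude": {"positive": -1.8, "negative": 2.2, "mixed": 0.1},
    "okay": {"positive": -0.3, "negative": -0.1, "mixed": 1.2},
}


def classify(tip):
    """Return class probabilities for a tip using a softmax over summed weights."""
    scores = {c: 0.0 for c in CLASSES}
    for word in tip.lower().split():
        for c, w in WEIGHTS.get(word.strip(".,!?"), {}).items():
            scores[c] += w
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


probs = classify("The tacos were delicious!")
print(max(probs, key=probs.get), probs)
```

A tip containing none of the known words gets equal probability for all three classes, which is the model's way of saying it has no evidence either way.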
Probabilistic Programming and PyMC3: Beyond simpler “point estimate” methods, PyMC3 lets you create a probability distribution over model parameters rather than a single best guess. Max talks about how this approach acknowledges uncertainty in data or limited observations. It also helps in analyzing confidence intervals and model variance effectively.
- Tools & Links:
  - PyMC3 Docs: Official documentation and examples
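
The difference between a point estimate and a distribution over a parameter can be shown without PyMC3. The sketch below swaps in a simple conjugate Beta model and standard-library sampling; it does not use the PyMC3 API. It estimates how likely visitors are to like a venue after only three reviews. The uniform Beta(1, 1) prior, the review counts, and the helper names are assumptions for illustration.

```python
import random


def beta_posterior(likes, dislikes, prior_a=1.0, prior_b=1.0):
    """Conjugate update: a Beta prior plus like/dislike counts gives a Beta posterior."""
    return prior_a + likes, prior_b + dislikes


def summarize(a, b, draws=20000, seed=0):
    """Posterior mean plus an approximate 95% interval from sorted samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    mean = a / (a + b)
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return mean, low, high


likes, dislikes = 3, 0  # assumed data: three likes, no dislikes
point_estimate = likes / (likes + dislikes)  # naive estimate: 100% like it
a, b = beta_posterior(likes, dislikes)
mean, low, high = summarize(a, b)
print(f"naive: {point_estimate:.2f}, posterior mean: {mean:.2f}, "
      f"95% interval: ({low:.2f}, {high:.2f})")
```

The naive estimate claims certainty after three reviews, while the posterior keeps a wide interval that narrows only as more reviews arrive.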
The Challenge of Causality: Much of the discussion touches on ads attribution. Determining whether an action truly “caused” a user’s visit is more complex than mere correlation. Bayesian thinking plus careful experimentation can help uncover these cause-and-effect relationships with more rigor than naive or purely frequentist approaches.
- Tools & Links:
  - Local Maximum Podcast #78: Max’s deeper discussion on Bayesian history and philosophical underpinnings
Random Forests and Debugging Surprising Results: Max shares a story about working on random forests and discovering a bug where a zero probability in log-likelihood became a “perfect score” due to a software glitch. This highlights how deep knowledge of statistics, combined with debugging experience, is crucial for spotting anomalies in machine learning outputs.
- Tools & Links:
  - Spark MLlib: Spark’s machine learning library (cited in passing during the conversation)
  - Random Forest Overview: scikit-learn documentation
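
The notes above do not spell out the exact bug, so the following is only a sketch of the general pitfall. A log-likelihood score has to treat "the model gave probability zero to something that actually happened" as the worst possible result, never as a value to skip or misread. The `log_likelihood` function is a hypothetical helper written for this example.

```python
import math


def log_likelihood(predicted_probs, outcomes):
    """Sum of log-probabilities the model assigned to what actually happened.

    predicted_probs: predicted probability that each event happens.
    outcomes: booleans saying whether each event actually happened.
    """
    total = 0.0
    for p, happened in zip(predicted_probs, outcomes):
        p_observed = p if happened else 1.0 - p
        if p_observed <= 0.0:
            # The model ruled out something that occurred: worst possible score.
            # math.log(0.0) would raise ValueError, so handle it explicitly.
            return float("-inf")
        total += math.log(p_observed)
    return total


good = log_likelihood([0.9, 0.2], [True, False])
broken = log_likelihood([0.9, 0.0], [True, True])
print(good, broken)
```

Higher (closer to zero) is better here, so a result of negative infinity should rank below every other model; a pipeline that reports it as a top score is a sign that something upstream is mishandling it.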
Python Performance: Cython and NumPy. Python’s speed can be boosted significantly by using libraries like NumPy (for vectorized operations) or translating hot spots into Cython. Max’s experience saw an order-of-magnitude speedup, turning a multi-hour logistic regression job into something that completed in minutes.
- Tools & Links:
  - Cython: Write Python-like code that compiles to C
  - NumPy: Fundamental package for scientific computing
Building Tools from Scratch vs. Using Libraries: Before PyMC3 was mature, Max implemented custom Bayesian scripts in Python for tasks like Dirichlet calculations and logistic regression with L1/L2 regularization. This underscores how quickly the ecosystem has matured; the barrier to advanced Bayesian modeling is much lower today.
- Tools & Links:
  - BayesPy (Max’s repo): Custom bayesian algorithms used at Foursquare (not to be confused with bayespy.org)
  - Dirichlet Distribution Overview: Key distribution in Bayesian statistics
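
As a small illustration of what a "Dirichlet calculation" can look like (not code from Max's repo), the sketch below updates a Dirichlet prior over three sentiment categories with observed counts and returns the posterior mean. The symmetric prior of one pseudo-count per category, the counts, and the function name are assumptions for this example.

```python
def dirichlet_posterior_mean(counts, prior):
    """Posterior mean of category probabilities under a Dirichlet prior.

    The Dirichlet is conjugate to the categorical distribution, so the
    update simply adds observed counts to the prior pseudo-counts.
    """
    posterior = {k: prior[k] + counts.get(k, 0) for k in prior}
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}


# Assumed prior: one pseudo-count per category (no initial preference).
prior = {"positive": 1.0, "negative": 1.0, "mixed": 1.0}
# Assumed data: only two tips so far, both positive.
counts = {"positive": 2}

means = dirichlet_posterior_mean(counts, prior)
print(means)
```

With only two observations, the unseen categories keep some probability instead of dropping to zero, which is the practical benefit of the prior when data is scarce.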
How Bayesian Thinking Impacts Broader Data Science: A recurring theme is the advantage of Bayesian methods beyond typical machine learning “just fit it” mindsets. Bayesian updating can model uncertainty directly, incorporate priors, and adapt more gracefully as new data arrives. This versatility makes it well-suited to real-world, changing data situations.
- Tools & Links:
  - Alan Turing’s Work on Enigma: Mentioned historically as an example of Bayesian ideas (though largely classified at the time)

Interesting Quotes and Stories

“I never had a phase where it was like, oh, Python’s not for me. But I did have a phase where it was just another language. It wasn’t until I started writing some quick scripts that it really clicked.” - Max Sklar

“One time I was trying to predict how likely someone is to visit a store… We used random forests, but got these perfect scores. It turned out that a false alarm for zero probability was being taken as infinitely good. That’s the kind of debugging you don’t forget!” - Max Sklar

Key Definitions and Terms

Bayesian Inference: A statistical method where we begin with a prior belief about an event and update that belief as new evidence arrives, producing a posterior belief.
Logistic Regression: A classification algorithm that uses a logistic (sigmoid) function to convert linear combinations of features into probabilities between 0 and 1.
Random Forest: An ensemble machine learning model using multiple decision trees to improve predictive performance and reduce overfitting.
Dirichlet Distribution: Often used in Bayesian statistics to represent probabilities of probabilities (e.g., how likely each category is, prior to seeing data).
PyMC3: A Python library for probabilistic programming, enabling Bayesian model definition and inference with modern sampling algorithms.
Cython: A superset of Python that allows the user to add type declarations and compile Python to C, greatly speeding up computational sections.

Learning Resources

Below are courses that can help you dive deeper into Python or enhance data science skills.

Python for Absolute Beginners: Ideal if you’re completely new to Python and want a solid foundation in the language’s core features and concepts.
Data Science Jumpstart with 10 Projects: Learn practical data science techniques by walking through real, hands-on projects in Python.
MongoDB with Async Python: If you’re interested in modern data layer techniques, this course shows you how to work with NoSQL and async Python.
Python Data Visualization: A robust introduction to creating plots and visuals in Python, often a crucial part of explaining Bayesian or other statistical results.

Overall Takeaway

Bayesian inference offers an elegant, mathematically grounded way to handle uncertainty, update beliefs, and guide decision-making in real-world data scenarios. Python’s ecosystem, from scikit-learn to PyMC3, enables developers and data scientists to tackle these Bayesian methods at multiple levels, whether they’re building from scratch or relying on powerful libraries. Max’s experiences at Foursquare and beyond demonstrate how Bayesian foundations can power everything from sentiment analysis to ads attribution. By embracing both the underlying theory and pragmatic tools, you can build models that adapt to new information and better reflect the real-world probabilities your organization cares about.

Links from the show

Max on Twitter: @maxsklar
Max's podcast on Bayesian Thinking: localmaxradio.com
Bayes' Theorem: wikipedia.org
Simple MCMC sampling with Python: github.com
PyMC3 package - Probabilistic Programming in Python: pymc.io
Episode #239 deep-dive: talkpython.fm/239
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 In this episode, we'll dive deep into one of the foundations of modern data science,

00:03 Bayesian algorithms and Bayesian thinking. Join me along with guest Max Sklar as we look at the

00:09 algorithmic side of data science. This is Talk Python To Me, episode 239, recorded November 10th, 2019.

00:29 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:34 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:39 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter

00:44 via @talkpython. This episode is brought to you by Linode and Tidelift. Please check out what

00:50 they're offering during their segments. It really helps support the show. Max, welcome to Talk Python

00:55 to Me. Thanks for having me, Michael. It's very great to be on. It's great to have you on as well.

00:59 You've been on Python Bytes before, but never Talk Python To Me.

01:02 That was a lot of fun. I actually got someone reached out to me on Twitter the other day saying,

01:06 hey, I saw you on Python Bytes. So that was really exciting.

01:09 Right on, right on. That's super cool.

01:11 I heard you on Python Bytes. I always say saw you when it's really heard you, but anyway.

01:16 It's all good. So now they can say they saw you on Talk Python To Me as well.

01:20 Now, we're going to talk about some of the foundational ideas behind data science,

01:26 machine learning. That's going to be a lot of fun. But before we get to them, let's set the stage and

01:31 give people a sense of where you're coming from. How did you get into programming and Python?

01:34 That is a really interesting question because I think I started in Python a very long time ago,

01:40 like 10 years ago maybe. I was working on kind of a side project called stickymap.com. The website's

01:47 still up. It barely works. But it was basically a, it was, it was like my senior project as an

01:53 undergrad. So I really, I started this in 2005. And what it was, was it was, you know, Google Maps

01:59 had just come out with their API where you can like, you know, include a Google map on your site.

02:05 And so I was like, okay, this is cool. What can I do with this? Let's add markers all over the map

02:10 and it could be user generated. We would call them emojis now. And people could leave little messages

02:15 and little locations and things like that. This was before there was Foursquare, which is where I

02:19 worked, which is location intelligence. This was just me messing around, trying to make something cool

02:24 and being inspired by the whole host of like, you know, social media startups that were happening at

02:30 the time. And I was using, what was I using at the time? I was using PHP and MySQL to put that

02:38 together. I knew nothing about web development. So I went to the Barnes and Noble. I got that book,

02:42 PHP, MySQL. I got it. But then sometime around like 2008, 2009, I realized, you know, a lot of people

02:48 were talking about Python at work. And I realized like, sometimes I need, this is kind of when I was

02:52 winding down on the project, but I realized, you know, I had all this data and I realized I needed

02:59 a way to like clean the data. I needed a way to like write good scripts that would clear up certain

03:05 like if I have a flat file of like, here's the latitude and longitude, they're separated by

03:11 tabs. And here's a, you know, here's some text that someone wrote that needs to be cleaned up,

03:16 et cetera, et cetera. Yeah, I can write some scripts in like Python or Java, believe it or not, which I

03:23 knew at the time, but then, or sorry, a PHP or Python, which I knew at the time, but like, wait, wait,

03:28 not job. Sorry. I was trying to do it in PHP and Java, which is really bad idea.

03:34 Yeah. Especially PHP sounds tricky. Yeah. Yes, yes, yes. And then I was like, well,

03:39 I'm just learning this Python. I need something. So let me try to do it with Python. And it worked

03:44 really well. And then I had, you know, to deal a lot more with CSVs and stuff like that tab separated

03:52 files. And it really was just a way to like save time at work. And it was like a trick to say,

03:57 hey, that thing that you're doing manually, I can do that in like 10 minutes. And it's not 10 minutes,

04:03 maybe a couple hours and write a script. And it's going to take you like one week. Like I saw someone

04:08 at work trying to change something manually. And so this is all a very long time ago. So I don't

04:12 remember exactly what it was, but it was kind of like a good trick to save time. And it had nothing

04:17 to do with data science or machine learning at the time. It was more like writing scripts to clean up

04:20 files. Well, that's perfect for Python, right? Like it's really one of the things that's super good at.

04:25 It's so easy to read CSV files, JavaScript files, XML, whatever, right? It's just they're all like a

04:31 handful of lines of code and you know, magic happens.

04:34 Yeah. The one thing that I was really impressed with was like, how easy at the time now, when I

04:39 wanted to do more complicated Python packages in like 2012, 2013, I realized, oh, actually,

04:45 some of these packages are complicated to install. But like, I was so impressed with how easy it was

04:50 to just import the CSV package and just be like, okay, now we understand your CSV. If you have some

04:57 stuff in quotes, no problem. If you want to clean up the quotes, no problem. Like it was all just like,

05:01 it just happened very fast.

05:02 Yeah. You don't have to statically link to some library or add a reference to some other thing or

05:07 none of that, right? It's all good. It's all right there.

05:09 Yeah. I mean, that was, those were the days when like, I was still programming in C++ for work. So

05:15 you could imagine what, how big of a jump that was. I mean, that seems so ancient. I used to have to

05:21 program in C++ for the Palm Pilot. That was my first job out of school, which is crazy.

05:26 Oh, wow. That sounds interesting. Yeah.

05:28 Yeah.

05:28 Yeah. Coming from C++, I think people have two different reactions. One, like, wow, this is so

05:33 easy. I can't believe I did this in so few lines. Or this is cheating. It's not real programming.

05:40 It's not for me, you know? But I think people go, who even disagree, like, oh, this is not for me,

05:46 eventually like find their way over. They're pulled in.

05:48 I never had a phase where it was like, oh, this is not for me. But I did have a phase where it was like,

05:54 I don't see, this is just another language. And I don't see why it's better or worse than any other.

05:59 I think that's the phase that you go through when you learn any new language where it's like, okay,

06:02 I see all the features. I don't see what this brings me. It was only through doing those specific

06:07 projects where it was like, aha, no one could have convinced me.

06:10 Yeah. Also, you know, if you come from another language, right, if you come from C++, you come

06:14 from Java, whatever, you know how to solve problems super well in that language. And you're comfortable.

06:20 And when you sit down to work, you say, file a new project and file, new files, start typing.

06:26 And it's like, okay, well, what do I want to do? I want to call this website or talk to this database.

06:30 I'm going to create this and I'll do this. And bam, like, you can just do it. You don't have to just

06:35 pound on every little step. Like, how do I run the code? How do I use another library?

06:41 What libraries are there? Is there like, there's every, you know, it's just that transition is always

06:46 tricky. And it takes a while before you, you get over that and you feel like, okay,

06:52 I really actually do like it over here. I'm going to put the effort into learn it properly because

06:57 I don't care how amazing it is. You're still going to feel incompetent at first.

07:03 The switching costs are so tough. And that's why they say, oh, if you're going to build a new

07:07 product, it has to be like 10 X better than the one that exists or something like that. I don't know

07:11 if that's, you know, literally true, but like it's true with languages too, because it's really hard to

07:18 like pick up a new language and everyone's busy at work and busy doing all the tasks they need to do

07:21 every day. For me, frankly, it was helpful to take that time off in quotes, time off. When I was going to

07:27 grad school, time off from working full-time as a software engineer to actually pick some of this

07:33 stuff up. Absolutely. All right. So you had mentioned earlier that you do stuff at Foursquare and it

07:38 sounds like your early programming experience with Stickymap is not that different than Foursquare,

07:44 honestly. Tell people about what you do. Maybe, I'm pretty sure everyone knows what Foursquare is,

07:48 what you guys do, but tell them what you do there. People might not be aware of where Foursquare is

07:54 today. You know, there is Foursquare is kind of known as that quirky check-in app, find good places

08:01 to go with your friends and eat app, you know, share where you are. And that's where we were in 2011,

08:08 where, when I joined up to, you know, a few years ago, but ultimately, you know, the company kind of

08:14 pivoted business models and sort of said, Hey, we have this really cool technology that we built for the

08:20 consumer apps, which is called Pilgrim, which essentially takes the data from your phone and

08:25 translates that into stops. You know, you'd stopped at Starbucks in the morning, and then you stopped at

08:30 this other place, and then you stopped at work, et cetera, et cetera. And then, you know, that goes into,

08:35 that finds use cases like, you know, across the apposphere, I don't even know what to call it,

08:41 but many apps would like that technology. And so we have this panel and, you know, so for a few years,

08:47 I was working on a product at Foursquare called Attribution, where companies, our clients would say,

08:53 Hey, we want to know if our ads are working, our ads across the internet, not just on Foursquare.

08:57 And we would say, well, we could tell you whether your ads are actually causing people to go into your

09:03 stores more than they otherwise would. And I worked on that for a few years, which is a really cool

09:09 problem to solve, a really cool data science problem to solve, because it's a causality problem.

09:12 It's not just, you know, you can't just say, well, the people who saw the ads visited 10% more,

09:18 because maybe you targeted people who would have visited 10% more.

09:21 Exactly. I'm targeting my demographic, so they better visit more. I got it wrong.

09:26 That industry is a struggle, because the people that you're selling to often don't have the

09:31 backgrounds to understand the difference, and sometimes don't have the incentives to understand

09:36 the difference. But we did the best we could. And so that led to kind of an acquisition that

09:42 Foursquare did earlier this year of Placed, which was an attribution company owned by Snap,

09:49 but they sold it to us through this big deal. You can read about it online.

09:54 Giant tech company trade.

09:56 Yeah. And so I had left Foursquare in the interim, but then I recently went back to work with the

10:05 founder, Dennis Crowley, and just kind of building new apps and trying to build cool apps based on

10:10 location technology, which is really why I got into Foursquare, why I got into Stickymap,

10:15 and I'm just having so much fun. So that's, and we have some products coming along the way where

10:21 it's not enterprise. It's not, you know, measuring ads. It's not ad retargeting. It's just

10:26 building cool stuff for people. And I don't know how long this will last, but I couldn't be happier.

10:32 Sounds really fun. I'm sure Squarespace is, sorry, Foursquare.

10:36 You're not the first fan. Squarespace is around here. Foursquare is in New York where you are.

10:43 Now, I'm sure that that's a great place to be, and they're doing a lot of stuff. They used

10:48 something like Scala. There's some functional programming language that's primarily used there,

10:52 right? Is it Scala?

10:53 Yeah, it's primarily Scala. I've actually done a lot of data science and machine learning in Scala. And

10:57 sometimes I'm kind of envious of Python because there's better tools in Python. And we do some of

11:03 our, we do some of our initial testing on data sets in Python sometimes, but there is a lot of momentum

11:10 to go with Scala because all of our backend jobs are written in Scala. And so we often have to

11:15 translate it into Scala, which has good tools, but not as good as Python.

11:19 Yeah. Yeah. So I was going to ask, what's the Python story there? Do you guys get to do much

11:24 Python there?

11:25 Yeah. So I have done, if I can take you back in the, to the olden days of 2014, if that's,

11:33 if that's allowed, because one of the things that I did at Foursquare that I'm pretty proud of

11:38 is building a sentiment model, which is trying to take a Foursquare tip, which were like three

11:45 sentences that people wrote in Foursquare on the Foursquare City Guide app. And that gets surfaced

11:50 later. It was sort of compared to the Yelp reviews, but except they're short and helpful and not as

11:57 negative. What we want to do is we want to take those tips and try to come up with the rating of

12:01 the venue because we have this one to 10 rating that every venue receives. And so using the likes

12:07 and dislikes explicitly wasn't good enough because there were so many people who would just click like

12:12 very casually. And so we realized at some point, Hey, we have a labeled training set here. We can say,

12:19 Hey, the person who explicitly liked a place and also left a text tip, that is a label of positive.

12:25 And someone who explicitly disliked a place, that's a label of negative. And someone who left the

12:29 middle option, which we called a meh or a mixed review, their tip is probably mixed. And so we have

12:35 this tremendous data set on tips and that allowed us to build a model, a pretty good model. And it

12:41 wasn't very sophisticated. It was multi-logistic regression based on sparse data, which was like

12:46 what phrases are included in the tip. Right. Trying to understand the sentiment of the actual words,

12:53 right? Yeah. There was logistic regression available in Python at the time, which is great,

12:58 but I wanted something a little custom, which is now available in Python. But back then it was kind

13:03 of hard to find these packages and not just that there, even when there were packages, sometimes

13:08 it's difficult to say, okay, is this working? How do I test what's going on under the hood? It's not very,

13:13 so I decided to build my own in Python, which was a multi-logistic regression means we're trying to find

13:20 out three categories like positive review, negative review, or mixed review based on the labeled data.

13:27 And we were going to have a sparse data set, which means it's not like there are 20 words that we're

13:34 looking for. No, there are like tens of thousands. I don't know the exact number, tens of thousands,

13:39 hundreds of thousands of phrases that we're going to look for. And for most of the tips, most of the

13:43 phrases are going to be zero. Didn't see it, didn't see it, didn't see it. But every once in a while,

13:47 you're going to have a one, didn't see it. So that's when you have that matrix where most of

13:50 them are zero, that's sparse. And then thirdly, we wanted to use elastic net, which meant that

13:56 most of the weights are going to be set to exactly zero. So when we store our model,

14:01 most words, it's going to say, hey, these words aren't sentiment. So we're just going to,

14:06 these don't really affect it. We want to have it exactly zero, except what a traditional logistic

14:10 regression would do is it would say, okay, we are going to come up with the optimal,

14:18 but everything will be close to zero. And so you have to kind of store it. You have to store the

14:21 like 0.0001. So that's a problem too. So I actually built that kind of open source and put that on my

14:29 GitHub on BayesPy back in 2014. I don't think anyone uses it, but it was a lot of fun. I used Cython to make it

14:34 go really fast. It's kind of a problem at Foursquare because it's the only thing that

14:39 runs in Python. And every once in a while, someone asks me like, what's this doing here?

14:42 Exactly. How do I run this? I don't know. This doesn't fit to our world, right?

14:45 Yeah.

14:45 Cool. All right. Well, Foursquare sounds really fun. Another thing that you do

14:50 that I know you from, I don't know you through the Foursquare work that you're doing. I know you

14:54 through your podcast, The Local Maximum, which is pretty cool. You had me on back on episode 73.

15:00 So thanks for that. That was cool.

15:01 That is our most downloaded episode right now.

15:04 Really? Wow. Awesome.

15:06 Yeah.

15:06 That's super cool to hear.

15:07 Yeah.

15:07 More relevant for today's conversation, though, would be episode 78, which is all about Bayesian

15:14 thinking and Bayesian analysis and those types of things. So people can check that out for a more

15:20 high level, less technical, more philosophical view, I think, on what we're going to talk about

15:26 if they want to go deeper, right?

15:27 Absolutely. You could also ask me questions directly because I ramble a little bit in that, but I cover

15:31 some pretty cool ideas, some pretty deep ideas there that I've been thinking about for many years.

15:37 Yeah, for sure. So maybe tell people just really quickly what The Local Maximum is, just to give you

15:42 a chance to tell them about it.

15:43 Yeah. So I started this podcast about a year and a half ago in 2018.

15:48 And it started with, you know, I started basically interviewing my friends at Foursquare being like,

15:53 hey, this person's working on something cool, that person's working on something cool, but they never

15:57 get to tell their story. So why not let these engineers tell their story about what they're

16:02 working on? And since then, I've kind of expanded it to cover, you know, current events and interesting

16:09 topics in math and machine learning that people can kind of apply to their everyday life. Some episodes

16:14 get more technical, but I kind of want to bring it back to the more general audience that it's like,

16:18 hey, my guests and I, we have this expertise. We don't just want to talk amongst ourselves. We want

16:23 to actually engage with the current events, engage with the tech news and try to think, okay, how do we

16:29 apply these ideas? And so that's sort of the direction that I've been going in. And it's been a lot of fun.

16:36 I've expanded beyond tech several times. I've had a few historians on, I've had a few journalists on.

16:42 That's cool. I like the intersection of tech and those things as well. Yeah, it's pretty nice.

16:45 This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast,

16:53 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at

16:57 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated

17:04 server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or

17:09 where your users are, there's a data center for you. Whether you want to run a Python web app,

17:13 host a private Git server, or just a file server, you'll get native SSDs on all the machines,

17:19 a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back

17:26 guarantee. Need a little help with your infrastructure? They even offer professional

17:30 services to help you with architecture, migrations, and more. Do you want a dedicated server for free

17:34 for the next four months? Just visit talkpython.fm/Linode. Let's talk about general data science

17:43 before we get into the Bayesian stuff. So I think one of the misconceptions in general is that you have to be

17:52 a mathematician or be very good at math to be a programmer. I think that's a false statement.

17:59 To be a programmer.

18:00 Yes, yes. Software developer. Straight up, I built the checkout page on this e-commerce site,

18:05 for example.

18:06 I would agree. I think you need some abstract thinking. You can't escape letters and stuff and

18:12 variables, but you don't need, well, in the case of data science to compare, like you don't need,

18:18 you don't need algebra or you don't need maybe a little bit, but you don't really need calculus and

18:23 you don't need geometry, linear algebra and geometry. Yeah. Sometimes as a UI engineer, you might need a

18:29 little geometry.

18:29 I mean, there's certain parts that you need that kind of stuff. Like video game development, for example,

18:33 everything is about multiplying something by a matrix, right? You put all your stuff on the screen,

18:39 even arrange it and rotate it by multiplying by matrices. There's some stuff happening there you

18:44 got to know about, but generally speaking, you don't. However, I feel like in data science,

18:48 you do get a little bit closer to statistics and you do need to maybe understand some of these

18:55 algorithms. And I think that's where we can focus our conversation for this show is like,

19:01 what do we need to know in general? And then the idea of Bayes' theorem and things like that.

19:06 What do we need to know if I wanted to go into say data science? Because like I said, I don't really

19:12 think you need to know that to do like, you know, connecting to a database and like saving a

19:16 user. And you absolutely need logical thinking, but not like stats, but for data science, what do you

19:23 think you need to know?

19:24 Well, for data science, it really depends on what you're doing and how far down the rabbit hole you

19:29 really want to go. You don't necessarily need all of the philosophical background that I talk about.

19:34 I just love thinking about it. And it sort of helps me focus my thoughts when I do work on it

19:41 to kind of go back and think about the first principles. So I get a lot of value out of that,

19:46 but maybe not everyone does. There is sort of a surface level data science that or machine learning

19:53 that you can get away with. If you want to do simple things, which is like, hey, I want to

19:58 understand the idea that I have a training set, you know what a training set is, and this is what I want

20:04 to predict. And here is roughly my mathematical function of how I know whether I'm predicting it well

20:12 or not, but it could be something simple like the square distance, but already you're introducing

20:16 some math there. And basically, I'm going to take a look at some libraries and I'm going to

20:23 see if something works out of the box and gives me what I need. And if you do it that way,

20:28 you need a little bit of understanding, but you don't need everything that like I would say kind of a

20:33 true data science or machine learning engineer needs. But if you want to go deeper and kind of

20:39 make it your profession, I would say you need kind of a background in calculus and linear algebra.

20:45 And again, like, look, if I went back to grad school and I like if I went to a linear algebra

20:52 final and I took it right now, would I be able to get every question right? Probably not. But I know

20:57 the basics and I have a great understanding of how it works. And if I look at the equations, I can kind of

21:03 break it down, you know, maybe with a little help from Google and all that.
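The "square distance" scoring mentioned above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not anything from the episode: `mean_squared_error`, `model_a`, and `model_b` are hypothetical names, and the two "models" are just hard-coded prediction lists.

```python
def mean_squared_error(predictions, targets):
    """Average squared distance between predictions and the true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up held-out targets and the predictions of two hypothetical models.
targets = [3.0, 5.0, 7.0]
model_a = [2.5, 5.0, 7.5]
model_b = [4.0, 4.0, 9.0]

mse_a = mean_squared_error(model_a, targets)
mse_b = mean_squared_error(model_b, targets)

# The lower score marks the model that predicts this data better.
print(mse_a, mse_b)
```

A lower score means the predictions sit closer to the data, which is the "how do I know whether I'm predicting it well" function in its simplest form.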

21:06 I think there's a danger of using these libraries to make predictions and other stuff when you're

21:13 like, well, the data goes in here to this function and then I call it and then out comes the answer.

21:18 Maybe there's some conditionality versus independence requirement that you didn't understand and it's

21:24 not met or, you know, whatever, right?

21:26 That's why I said it's really surface level and you can get away with it sometimes, but

21:30 only for so long. And I think understanding where these things go wrong outside the, you know, when you

21:37 take these black box functions requires both kind of a theoretical understanding of how they work and

21:43 then also just like experience of seeing things going wrong in the past.

21:46 Yeah. That experience sounds hard to get, but it seems like I'm an experience, right?

21:51 You just, you got to get out there and do it.

21:52 Right. Well, here's a good example. One time I was trying to predict how likely someone is to

21:57 visit a store. This was part of working on Foursquare's attribution product, right? And

22:03 someone was using random forest algorithm, or maybe it was just a simple decision tree. I'm not sure,

22:09 but basically it creates a tree structure and puts people into buckets and determines whether or not,

22:18 you know, and for each bucket, it says, okay, in this bucket, everyone visited and in this bucket,

22:21 everyone didn't, or maybe this bucket is 90, 10 and this bucket is 10, 90. And so I can give good

22:26 predictions on the probability someone will visit based on where they fall on the leaves of the tree.

22:32 And we were using it and something just wasn't making sense to me. Somehow the numbers were just,

22:39 something was wrong. And then I said, okay, let's make, let's make more leaves. And then I made more

22:44 leaves. Like I made, I made the tree deeper, right? And then they're like, see, when you make the tree

22:49 deeper, it gets better. That makes sense because it's, it's more fine graining. I'm like, yeah,

22:54 but something doesn't make sense. It shouldn't be getting this good. And then as I realized what was

22:59 happening, what was it, what was happening was some of the leaves had nobody visited in this leaf.

23:04 That makes a lot of sense because most days you don't visit any particular chain.

23:08 And when it went to zero and then it saw someone visited, well, the log likelihood loss,

23:15 it basically predicted 0% for an event that did happen. And so log, when you do log likelihood

23:22 loss or negative log likelihood loss, the score is like the negative log of that. So essentially you

23:27 should be penalized infinitely for that because there was no smoothing. But the language we were using,

23:33 which I think was Spark or something like that. And it was probably some library in Spark. I probably

23:40 shouldn't throw Spark under the bus. It was probably some library or something was changing

23:43 that infinity to a zero. So the thing that was infinitely bad, it was saying was infinitely good.

23:49 And so the worst thing. And that took, oh God, that took us so long to figure out. Like it's embarrassing

23:56 how long that one took to figure out, but that's, that's a good example of when experience will get

24:02 you in something. I don't think I've ever talked about this one publicly.
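The failure described above can be reproduced in a small sketch. Everything here is assumed for illustration: the leaf counts are made up, the function names are hypothetical, and the add-one pseudo-counts are just one common smoothing choice, not necessarily what the team ended up using.

```python
import math


def leaf_probability(visits, total, pseudo_visits=0.0, pseudo_non_visits=0.0):
    """Estimated visit probability for one leaf, optionally smoothed with pseudo-counts."""
    return (visits + pseudo_visits) / (total + pseudo_visits + pseudo_non_visits)


def negative_log_likelihood(p, visited):
    """Loss for one observed outcome under predicted visit probability p."""
    q = p if visited else 1.0 - p
    if q == 0.0:
        return math.inf  # the model was certain and turned out wrong
    return -math.log(q)


# A deep leaf where none of the 40 training users visited (made-up counts).
raw_p = leaf_probability(visits=0, total=40)
smoothed_p = leaf_probability(visits=0, total=40, pseudo_visits=1.0, pseudo_non_visits=1.0)

# A held-out user falls into that leaf and did visit.
raw_loss = negative_log_likelihood(raw_p, visited=True)
smoothed_loss = negative_log_likelihood(smoothed_p, visited=True)

# The bug from the story: an infinite penalty silently replaced by zero.
buggy_loss = 0.0 if math.isinf(raw_loss) else raw_loss

print(raw_loss, smoothed_loss, buggy_loss)
```

Without smoothing the loss is infinite, with smoothing it is large but finite, and with the infinity-to-zero conversion the worst possible prediction scores as a perfect one. That is why a deeper tree appeared to keep improving.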

24:05 Yeah. Well, you just got to know that, you know, that's not what we're expecting, right?

24:10 Yeah. But you know, theoretically, Hey, if I more fine grained my tree, if I, you know,

24:15 make my groups smaller, maybe it works better. But I was like something, I was like, something's not

24:21 right. It's working a little too good. There was nothing specifically that got me, but it was just

24:26 like, there's probably a lot of stuff out there. That's actually people are taking actions on and

24:30 spending money on, but it's, it's like that, right? Yeah. Yeah. So let's see. So we talked about

24:35 some of the math stuff. If you really want to understand the algorithms, you know, statistics,

24:39 calculus, linear algebra, you obviously need calculus to understand like real statistics,

24:44 right? Continuous statistics and stuff. What else though? Like, do you need to know machine learning?

24:50 What kind of algorithms do you need to know? Like what, what in the computer science-y side of things

24:55 do you think you got to know? Bread and butter of the data scientists that I work with is machine

25:00 learning algorithms. So I think that is very helpful to know. And I think that, you know, some of the

25:06 basic algorithms in machine learning are good to know, which is like the K nearest neighbor,

25:10 K means, logistic regression, decision trees, and then some kind of random forest algorithm,

25:17 whether it's just random forest, which is a mixture of trees or gradient boosted trees we've had a lot

25:22 of luck with. And then a lot of this deep learning stuff is, well, neural networks is one of them.

25:29 Maybe you don't need to be an expert in neural networks, but it's certainly one to be aware of.

25:33 And based on these neural networks, deep learning is becoming very popular. And I've been hearing and

25:40 kind of looking into reading about deep learning for many years, but I have to say, I haven't actually

25:45 implemented one of these algorithms myself. But I just interviewed a guy on my show, Mark Ryan,

25:51 and he came out with a book called machine learning for structured data, which means, hey, you don't

25:57 just, this doesn't just work for like images or audio recognition, you could actually use it for

26:01 regular marketing data, like use everything else for. So I was like, all right, that's interesting. Maybe

26:06 I'll work on that now. But I don't think at this point, you need to know machine learning to be a good

26:11 or deep learning to be a good data scientist or machine learning engineer. I think the basics are really

26:16 good to know, because in many problems, you know, the basics will get you very far. And there's a lot

26:20 less that can go wrong.

26:22 Yeah, a lot of those algorithms you talked about as well, like K-Nearest Neighbor and so on.

26:26 There are several books that seem to cover all of those. I can't think of any off the top of my

26:30 head, but I feel like I've looked through a couple and they all seem to have like, here are the main

26:33 algorithms you need to know to kind of learn data science. So not too hard to pick them up.

26:38 Last name's Bishop, the book that I read for grad school, but that's already 10 years old,

26:42 certainly had all that stuff. That was very deep on math. I can send you a link if you want.

26:46 Sure.

26:46 I think kind of any intro book to machine learning will have all of that stuff.

26:50 And basically, it's not in order of like hard to easy. It's just sort of, hey,

26:55 these are things that have helped in the past and that statisticians and machine learning engineers

27:02 have relied on in the past to get started and it's worked for them. So maybe it'll work for you.

27:06 Cool. Well, a lot of machine learning and data science is about making predictions. We have some

27:12 data. What does that tell us about the future, right?

27:16 Right.

27:16 That's where the Bayesian inference comes from in that world, right?

27:20 Yeah. It's trying to form beliefs, which could be a belief about something that already happened that

27:26 you don't know about, but you'll find out in the future or be affected by in the future, or it could

27:30 be a belief about something that will happen in the future. So something that either will happen in

27:35 the future or you'll learn about in the future. But Bayesian inference is more about, you know,

27:39 forming beliefs and I kind of call it like it's a quantification of the scientific method. So in the

27:47 basic form, the Bayes rule is very easy. You start with your current beliefs and you codify that in a

27:53 special mathematical way. And then you say, okay, here's some new data I received on this topic. And then it

27:59 gives you a framework to update your beliefs within the same framework that you've began with.

28:04 Right. And so like an example that you gave would be say a fire alarm, right?

28:09 We know from like life experience that most fire alarms are false alarms. You know, one example is

28:16 what is your prior belief that there is a fire right now without seeing the alarm? The alarm is the data.

28:23 The prior is what's the probability that, you know, my building is on fire and I need to

28:29 get the F out right now. You know, it's very low actually. Yeah. I mean,

28:34 yeah, for most of us, it hasn't really happened in our life. Maybe we've seen one or two fires,

28:39 but they weren't that big of a deal. I'm sure there are some people in the audience who have seen

28:44 bad fires and for them, maybe their prior is a little higher.

28:47 I once in my entire life have had to escape a fire.

28:49 Yeah.

28:50 Only once, right?

28:51 Were you in like real danger or?

28:53 Oh yeah, probably. It was a car and the car actually caught on fire.

28:57 Oh yeah. That sounds pretty bad.

28:58 It had been worked on by some mechanics and they put it back together wrong. It like shot

29:02 oil over something and it caught fire. And so we're like, Oh, the car's on fire. We should get out of

29:05 it.

29:06 Yeah. But yeah, sitting in your building at work, your prior is going to be much lower than in a car that

29:11 you just worked on. So when the alarm goes off, okay, that's your data. The data is that we received

29:18 an alarm today. And so then you have to think about, okay, I still have two hypotheses, right? Hypothesis one

29:26 is that there is a fire and I have to escape. And hypothesis two is that there is no fire.

29:32 And so once you hear the alarm, you still have those two hypotheses. One is that the alarm is

29:37 going off and there's a real fire. And two is that there is no fire, but this is a false alarm.

29:42 And so what ends up happening is that because there's a significant probability of a false alarm.

29:48 So at the beginning, there is a very low probability of a fire. After you hear the alarm,

29:54 there's still a pretty low probability of a fire, but the probability of a false alarm still overwhelms

29:58 that. Now I'm not saying that you should ignore fire alarms all the time, but because in that case,

30:03 that's a, that's a case where the action that you take is important regardless of the belief. So,

30:10 you know, Hey, there is a very low cost to checking into it, at least checking into it or leaving the

30:17 building in, if you have a fire alarm, but there's a very high consequence of failure. So high.

30:21 Exactly. Exactly. But in terms of just forming beliefs, which is a good reason not to panic,

30:26 you shouldn't put a lot of probability on the idea that there's definitely a fire.

30:31 Okay. Yeah. So that's basically Bayesian inference, right? I know how likely a fire is. I have all

30:37 of a sudden, I have this piece of data that now there is a fire. I have a set, a space of hypotheses

30:43 that could apply, try to figure out which hypothesis, start testing and figure out which one is the right

30:50 one. Maybe. Yeah. So you take your prior. So let's say there's like a, I don't know, one in

30:56 a hundred thousand chance that there's a fire in the building today and a 99,999 in a hundred thousand chance there isn't.

31:03 Then you take that, that's your prior. Then you multiply it by your likelihood, which is okay.

31:10 What is the likelihood of seeing the data given that the hypothesis is true? So what's the likelihood

31:17 that the alarm would go off if there is a fire? Maybe that's pretty high. Maybe that's close to one

31:21 or a little bit lower than one. And then on the second hypothesis that there's no fire,

31:26 what's the likelihood of a false alarm today, which could actually be pretty high. Could be like one

31:32 in a thousand or even one in a hundred in some buildings. And then you multiply those together and

31:36 then you get an unnormalized posterior and that is your answer. So it's really just multiplication.
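The multiplication just described can be written out as a short sketch. The prior of one in a hundred thousand and the one-in-a-hundred false alarm rate come from the conversation; the 0.95 chance that the alarm sounds during a real fire is an assumed stand-in for "close to one", and the final division by the total is the normalization step that turns the unnormalized posterior into probabilities.

```python
def posterior(priors, likelihoods):
    """Multiply prior by likelihood for each hypothesis, then normalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Prior belief before hearing anything.
priors = {"fire": 1 / 100_000, "no fire": 99_999 / 100_000}

# P(alarm goes off | hypothesis). The 0.95 is an assumed value.
likelihoods = {"fire": 0.95, "no fire": 1 / 100}

result = posterior(priors, likelihoods)
print(f"P(fire | alarm) = {result['fire']:.5f}")
```

Under these assumed numbers the alarm raises the probability of a fire by roughly a factor of a hundred, yet it stays well below one percent, which matches the point that a false alarm remains the far more likely explanation.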

31:40 Yeah. It's like simple fractions once you have all the pieces, right? So it's a pretty simple

31:45 algorithm. It's very hard to describe through audio, but it's much better visually if you want to check it

31:51 out. I've been struggling to describe it through audio for, you know, for the last year and a half,

31:55 but I do the best I can.

31:57 This is like describing code. You can only take it so precisely.

32:00 Yeah.

32:56 This comes from a reverend, Reverend Bayes, who came up with this idea in the 1700s, but for a long time,

33:07 it wasn't really respected, right? And then it actually found some pretty powerful,

33:12 it solved some pretty powerful problems that matters a lot to people recently.

33:17 Yeah. I mean, I can't go through the whole, do the whole history justice in just a few minutes, but

33:22 I'll try to give my highlights, which was this reverend who was sort of, he was a, you know,

33:28 he was into theology and he was also into mathematics. So he was probably like pondering big questions and

33:34 he wrote down notes and he was trying to figure out the validity of various arguments.

33:39 His notes were found after he died, so he'd never published that. And so this was taken by

33:46 Pierre Laplace, who was a more well-known mathematician and kind of formalized. But when the basis of

33:52 statistical thinking was built in the late 20th, early 19th century, or late 19th, early 20th century,

34:00 it really went in a more frequentist direction where it's like, no, a probability is actually a

34:08 fraction of a repeatable experiment that kind of like over time, what fraction does it, does it end up

34:15 as? And so they consider probability as sort of a, an objective property of the system. So for example,

34:22 a dice flip, well, each side is one sixth. That's like kind of an objective property of the,

34:27 of the die. Whereas, no, Bayesian statistics is sort of based on belief. And because belief kind of

34:33 seemed unscientific and the frequentists had very good methods for coming up with, with answers and

34:40 more, more objective ways of doing it, they sort of had the upper hand. But as kind of the focus got into

34:48 more complex issues and we had the rise of computers and that sort of thing, and the rise of more data and

34:55 that sort of thing, Bayesian inference started taking a bigger and bigger role until now, I think most

35:02 machine learning engineers and most data scientists think as a Bayesian. And so it's like

35:07 some examples in history, most people are probably aware of Alan Turing at Bletchley Park, along with

35:14 many other people, you know, building these machines that broke the German codes during World War II.

35:19 There's a whole movie about it.

35:21 Right. That's trying to break the Enigma machine and the Enigma code. And that, those were some

35:26 important problems to solve, but also highly challenging.

35:31 Yeah. And so they incorporated a form of Bayes rule into this. Well, what are my relative beliefs

35:37 as to the setting of the machine? Because, you know, the machine could have had quadrillions of settings and

35:42 they're trying to distinguish between which one is likely to have and which one's not likely to have.

35:48 But after the war, that stuff was classified. So nobody could say, oh yeah, Bayesian inference was

35:55 used in that problem. And one interesting application that I found, even as it wasn't accepted by academia

36:00 for many years, was life insurance. Because they're kind of on the hook for determining if the actuaries

36:07 get the answer wrong as to how likely people are to live and die, then they're on the hook for lots and

36:13 lots of money or like the continuation of their company if they get it wrong.

36:17 And so-

36:18 Right. Right. Or how likely is it to flood here?

36:20 How likely is it for there to be a hurricane that wipes this part of the world off the map?

36:25 Right.

36:25 And a lot of these were one-off problems. You know, one problem is, you know, what's the

36:29 likelihood of two commercial planes flying into each other? It hadn't happened, but they wanted to

36:34 estimate the probability of that. And you can't do repeated experiments on that. So they really had to

36:38 use priors, which was sort of like expert data. And then, you know, more recently, as we had the rise of

36:45 kind of machine learning algorithms and big data, you know, Bayesian methods have become more and more

36:52 relevant. But also a big problem was, you know, the problems that we just mentioned, which are, you

36:58 know, fire alarms and figuring out whether or not you have a disease and things like that. That's the

37:02 two hypothesis problem. But a lot of times you have an infinite space, you have an infinite hypothesis

37:08 problem that you're trying to determine between an infinite set of possible hypotheses. And that becomes

37:14 very difficult to do, becomes extremely difficult without a computer, even with a computer becomes

37:19 difficult to do. And so, you know, there's been a lot of research into how do you search that space

37:24 of hypotheses to find the ones that are most likely. And so if you've heard the term Markov chain Monte

37:29 Carlo, that is the most common algorithm used. And for that purpose, there is even current research

37:36 into that, to making that faster and finding the hypothesis you want more quickly. Andrew Gelman at

37:41 Columbia has some, a lot of stuff out about this. And he has like a new thing that's called like

37:48 NUTS, which is like the No-U-Turn Sampler, which is based off a very complicated version of MCMC.

37:55 And so that's what's used in a framework that Python has called PyMC3 to come up with your

38:02 most likely hypothesis very, very quickly.

38:04 So let's take this over to the Python world. Yeah. Like, yeah, there's a lot of stuff that works

38:10 with it. And obviously, like you said, the machine learning deep down uses some of these techniques,

38:15 but this PyMC3 library is pretty interesting. Let's talk about it. So its subtitle is probabilistic

38:23 programming in Python.

38:25 If I could start with some alternatives, which I've used because I haven't, I've been diving into

38:30 reading about PyMC3, but I haven't used it personally. So even when I was doing things in 2014,

38:36 just on my own, basically without libraries, I was able to use Python very, very easily to

38:42 kind of put in these equations for Bayesian inference on whether it's multi-logistic regression,

38:50 or another one I did was Dirichlet prior calculator, which if I can kind of describe that, it's sort

38:56 of thinking, well, how, what should I believe about a place before I've seen any reviews? Should I

39:00 believe it's good? Should I believe it's bad? You know, if I have very few reviews, what should I

39:04 believe about it? Which was an important question to ask for something like Foursquare City Guide in

39:10 many cases, because we didn't have a lot of data. And so that was a good application of Bayesian

39:15 inference. And I was able to just use the equations straight up and kind of from first principles,

39:21 apply algorithms directly in Python. And it actually was not that hard to do because when searching the

39:29 space, there was a single global maximum, didn't have to worry about the local maximum in these

39:34 equations. So it was just a hill climbing. Hey, I'm going to start with this hypothesis in this

39:39 n dimensional space, and I'm going to find the gradient, I'm going to go a little higher,

39:43 a little higher, a little higher, a little higher. Gradient ascent is what I described, although it's usually called

39:47 gradient descent. So that's sort of an easy one to understand. Then if you want to do MCMC directly,

39:54 because you have some space that you want to search, and you have the equations of the probability

40:00 on each of the points in that space, I used emcee, which is spelled E-M-C-E-E, which is a simple

40:09 program that only does MCMC. And so I had a lot of success with that when I wanted to do some one off

40:18 sampling of, you know, non standard probability distributions. So those are ones that I've actually

40:24 used and had success with in the past. But PyMC3 seems to be like the full, you know, we do

40:31 everything sort of a thing. And basically, what you do is you program probabilistically. So you say,

40:38 hey, I imagine that this is how the data is generated. So I'm just going to basically put

40:44 that in code. And then I'm going to let you, the algorithm work backwards and tell me what the

40:50 parameters originally were. So if I could do a specific here, let's say I'm doing logistic regression,

40:57 which is like, every item has a score, or, you know, in the case that I was working on,

41:02 every word has a score, the scores get added up, that's then a real number, then it's transformed

41:08 using a sigmoid into a number between zero and one. And that's the probability that's a positive review.

41:13 And so basically, you'll just say, hey, I have this vector that describes the words this has,

41:20 then I'm going to add these parameters, which I'm not going to tell you what they are.

41:24 And then I'm going to get this result. And then I'm going to give you the final data set at the end.

41:28 And it kind of works backwards and tells you, okay, this is what I think the parameters were.
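The word-score model and the hill-climbing idea from earlier in this answer can be combined in one small sketch. It is an illustration under stated assumptions, not Max's actual code: the reviews are made up, `fit_word_scores` is a hypothetical helper, and a Gaussian prior on each weight is assumed so that the log posterior has a single global maximum for gradient ascent to find.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability between zero and one."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_word_scores(reviews, labels, vocab, steps=2000, rate=0.1, prior_precision=1.0):
    """Gradient ascent on the log posterior of a bag-of-words logistic regression."""
    weights = {w: 0.0 for w in vocab}
    for _ in range(steps):
        # The assumed Gaussian prior pulls every weight toward zero.
        grad = {w: -prior_precision * weights[w] for w in vocab}
        for words, label in zip(reviews, labels):
            # Add up the word scores, then turn the total into a probability.
            p = sigmoid(sum(weights[w] for w in words))
            for w in words:
                grad[w] += label - p
        # Step uphill along the gradient.
        for w in vocab:
            weights[w] += rate * grad[w]
    return weights


# Made-up reviews: 1 means positive, 0 means negative.
reviews = [["delicious", "friendly"], ["delicious"], ["slow", "rude"],
           ["slow"], ["friendly"], ["rude", "slow"]]
labels = [1, 1, 0, 0, 1, 0]
vocab = sorted({w for review in reviews for w in review})

weights = fit_word_scores(reviews, labels, vocab)
p_positive = sigmoid(weights["delicious"] + weights["friendly"])
print(weights, p_positive)
```

This returns one best set of word scores, the single-model answer. The sampling approach discussed next returns a distribution over such scores instead.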

41:33 And what's really interesting about something like PyMC3, which I would like to use in the future is

41:40 when you do a linear regression or logistic regression, in kind of standard practice,

41:44 you get one model at the end, right? This is the model that we think is best. And this is the model

41:50 that has the highest probability. And this is the model that we're going to use. Great. You know,

41:55 that that works for a lot of cases. But what PyMC3 does is that instead of picking a model at the end,

42:02 it says, well, we still don't know exactly which model produced this data. But because we have the

42:08 data set, we have a better idea of which models are now more likely and less likely. So we now have

42:13 a probability distribution over models. And we're going to let you pull from that.

42:17 So it kind of gives you a better sense of what the uncertainty is over the model. So

42:21 for example, if you have a word in your data set, let's say the word's delicious, and

42:28 we know it's a positive word. But let's say for some reason, there's not a lot of data on it,

42:32 then it can say, well, I don't really know what the weight of delicious should be.

42:38 It's being used at rock concerts. We don't know why. What does it mean?

42:41 Yeah, yeah, yeah. And so we're going to give you a few possible models. And, you know, and you can

42:45 keep sampling from that. And you'll see that the deviation, the discrepancy, the variance of that

42:52 model is going to be very high of that weight is going to be very high, because we just don't have a

42:56 lot of data on it. And that's something that standard regressions just don't do.

43:00 That's pretty cool. And the way you work with it is, you basically code out the model in like a

43:06 really nice Python language API. You kind of say, well, this, I think it's a linear model,

43:13 I think it's this type of thing. And then like you said, it'll go back and solve it for you. That's

43:18 pretty awesome. I think it's nice.

43:19 Right. A good thing to think about it is in terms of just a standard linear regression, like,

43:24 what's the easiest example I can think of? Try to find someone's weight from their height,

43:29 for example. And so you think there might be an optimal coefficient on there given the data.

43:36 But if you use PyMC3, it will say, no, we don't know exactly what the coefficient is given your data.

43:40 You don't have a lot of data, but we're going to give you several possibilities. We're going to give

43:44 you a probability distribution over it. And as I say on the Local Maximum, you shouldn't make everything

43:50 probabilistic because there is a cost in that. But oftentimes you can, by considering something to be,

43:56 rather than considering one single truth by considering multiple truths probabilistically,

44:00 you can unlock a lot of value. In this case, you can kind of determine your variance a little better.
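To make "a probability distribution over the coefficient" concrete, here is a minimal sketch of the height and weight example. It does not use PyMC3 or its No-U-Turn Sampler. It swaps in a hand-written random-walk Metropolis sampler, the simplest form of MCMC, so that it runs with only the standard library. The data, the noise level, and the prior width are all made up for illustration.

```python
import math
import random
import statistics

# Made-up data for illustration: heights in cm, weights in kg.
heights = [152.0, 160.0, 168.0, 175.0, 183.0]
weights = [51.0, 58.0, 66.0, 70.0, 82.0]

# Center both variables so the model needs only a slope.
h_mean = statistics.mean(heights)
w_mean = statistics.mean(weights)
xs = [h - h_mean for h in heights]
ys = [w - w_mean for w in weights]

NOISE_SD = 4.0  # assumed spread of weight around the line, in kg
PRIOR_SD = 5.0  # assumed broad Gaussian prior on the slope


def log_posterior(slope):
    """Unnormalized log posterior: log prior plus log likelihood."""
    log_prior = -0.5 * (slope / PRIOR_SD) ** 2
    log_like = sum(-0.5 * ((y - slope * x) / NOISE_SD) ** 2 for x, y in zip(xs, ys))
    return log_prior + log_like


def metropolis(log_prob, start, steps, step_size, rng):
    """Random-walk Metropolis: propose a nearby value, accept by posterior ratio."""
    samples = []
    current = start
    current_lp = log_prob(current)
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Always accept uphill moves; accept downhill moves with probability equal to the ratio.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)
samples = metropolis(log_posterior, start=0.0, steps=20_000, step_size=0.3, rng=rng)
kept = samples[2_000:]  # drop the early steps taken while climbing from the start
posterior_mean = statistics.mean(kept)
posterior_sd = statistics.stdev(kept)
print(f"slope: {posterior_mean:.2f} +/- {posterior_sd:.2f} kg per cm")
```

The spread of the kept samples is the uncertainty a single best-fit coefficient would hide. With only five data points it is fairly wide, and it narrows as more data is added.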

44:05 Yeah, that's super cool. I hadn't really thought about it. And like I said, the API is super clean

44:10 for doing this. So it's great. Yeah.

44:12 Where does this Bayesian inference, like, where do you see this solving problems today? Where do you see

44:18 like stuff going? What's the world look like now?

44:21 I've been using it to solve problems basically as soon as I started working as a machine learning engineer

44:26 at Foursquare, basically using Bayes' rule as kind of my first principles whenever I approach a problem.

44:32 And it's never driven me in the wrong direction. So I think it's one of those timeless things that

44:38 you can always use. For me, especially after working with our attribution product a lot,

44:44 I think that the future is trying to figure out causality a lot better. And I think that's where

44:51 some of these more sophisticated ideas come in. Because it's one thing to say, this variable is

44:56 correlated with that and I can have a model. But it's like, well, what's the probability that this

45:00 variable, changing this variable actually causes this other variable to change? In the case of ads,

45:06 where you could see where it's going to unlock a lot of value for companies where, you know,

45:10 there might be a lot of investment in this, is what is the probability that this ad affects

45:16 someone's likelihood to visit my place or to buy something from me more generally? Or what is my

45:23 probability distribution over that? And so can I estimate that? And I think that that whole industry

45:31 of online ads is, it's very frustrating for an engineer because it's so inefficient. And there's so

45:37 many people in there that don't know what they're doing. And it could be very frustrating at times.

45:40 But I think that means also that there's a lot of opportunity to like unlock value if you have

45:46 a lot of patience. Sure. Well, so much of it is just they looked for this keyword, so they must be

45:51 interested, right? It doesn't take very much into account. Yeah, but the question is, okay, maybe they

45:56 look for that keyword and now they're going to buy it no matter what I do. So don't send them the ad,

46:00 send the ad to someone who didn't search the keyword. Or maybe they need that extra push and that extra

46:04 push is very valuable. It's hard to know unless you measure it. And you measure it, you don't get a

46:10 whole lot of data. So you really, it really has to be a Bayesian model. Whoever uses these Bayesian

46:17 models is going to get way ahead. But right now it goes through several layers. I kept saying when we

46:23 were working on this problem and people weren't getting what we were doing, I was like, I wish the

46:29 people who are writing the check for these ads could get in touch with us because I know they care.

46:33 But, you know, oftentimes you're working through sales and someone on the other side.

46:39 It was just too many layers between, right?

46:41 Yeah.

46:41 Yeah, for sure.

46:42 Earlier, you spoke about having your code go fast and you talked about Cython.

46:48 Oh yeah.

46:48 What's your experience with Cython?

46:49 I used that for the multi-logistic regression. And all I can say is it took a little getting used

46:57 to, but, you know, I got an order of magnitude speed up, which we needed to launch that thing

47:02 in our one-off Python job at Foursquare. So it took only a few hours versus all day. So it was kind

47:12 of a helpful tool to get that thing launched. And I haven't used it too much since, but I kind of keep

47:18 that in the back of my mind as a part of my toolkit.

47:21 Yeah. It's great to have in the toolkit. I feel like it doesn't get that much love, but

47:25 I know people talk about Python speed and, oh, it's fast here. It's slow there.

47:30 Yeah.

47:31 First people just think it's slow because it's not compiled, but then you're like, oh,

47:34 but what about the C extensions? You go, actually, yeah, that's actually faster than Java or something

47:39 like that. So interesting.

47:40 Yeah. I've also had a big speed up just by taking, you know, a dictionary or matrix I was using and then

47:47 using NumPy instead of the, or NumPy, I don't know how you pronounce it, but instead of using-

47:53 I go with NumPy, but yeah.

47:54 Okay.

47:54 NumPy instead of the standard, like, you know, Python tools, you could also get a big speed

48:01 up there.

48:01 Yeah, for sure. And that's pushing it down into the C layer, right?

48:04 Yeah.
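
As a rough illustration of pushing work "down into the C layer," here is the same dot product written as a plain Python loop and as a single NumPy call. The array size and function names are arbitrary choices for this sketch, and the actual speed-up will vary by machine and workload.

```python
import timeit

import numpy as np

def dot_pure_python(xs, ys):
    """Dot product with an explicit Python-level loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def dot_numpy(xs, ys):
    """Same computation, but the loop runs inside NumPy's compiled code."""
    return float(np.dot(xs, ys))

rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)
a_list, b_list = a.tolist(), b.tolist()

slow = dot_pure_python(a_list, b_list)
fast = dot_numpy(a, b)

# Time a few repetitions of each version; the numbers are machine-dependent.
t_python = timeit.timeit(lambda: dot_pure_python(a_list, b_list), number=5)
t_numpy = timeit.timeit(lambda: dot_numpy(a, b), number=5)
print(f"pure Python: {t_python:.4f}s, NumPy: {t_numpy:.4f}s")
```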

48:04 But a lot of times you have your algorithm in Python, and one option is to go write that C

48:10 layer because you're like, well, we kind of need it. So here we go down the rabbit hole of writing

48:13 C code instead of Python. But Cython is sweet, right? Especially the latest one, you can just put

48:18 the regular type annotations, the Python 3 type annotations.

48:21 Oh, yeah.

48:21 On the types. And then, you know, magic happens.

48:24 I definitely, I just started with Python and it was like, you know, we're in this,

48:28 these three functions 90% of the time, just fix that.

48:31 It's usually the slow part is like really focused. Most of your code, it doesn't even matter what

48:35 happens to it, right? It's just, there's like that little bit where you loop around a lot

48:39 and that matters.

48:40 Yeah.

48:41 Yeah.

48:41 It's funny how we over optimize and you can't escape it. Like even when I'm creating,

48:46 you know, I see like a bunch of doubles. I'm like, oh, but these are only one and zero. Can

48:50 we like change them to Boolean? But like in the end, it doesn't care. It doesn't matter.

48:54 For most of the code, it really has no effect.

48:56 For sure.

48:57 Except in that one targeted place.

48:58 Yeah. So the trick is to use the tools to find it, right?

49:01 Yeah.
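
One way to "use the tools to find it" is Python's built-in `cProfile` module. The sketch below profiles a toy program in which one function does nearly all the work; `hot_loop`, `cheap_setup`, and `main` are hypothetical names invented for the example.

```python
import cProfile
import io
import pstats

def hot_loop(n):
    """The targeted place where nearly all the time goes."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    """Code whose speed barely matters."""
    return list(range(10))

def main():
    cheap_setup()
    return hot_loop(200_000)

# Profile a single call to main().
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and time, so the loop that deserves Cython or NumPy treatment stands out from the code that does not.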

49:02 Like cProfile or something like that. The other major thing, you know, one thing you can do to

49:07 speed up stuff like this, these algorithms is just to say, well, I wrote it.

49:11 I wrote it in Python or I use this data structure and maybe if I rewrote it differently or I wrote

49:17 it in C or I applied Cython, it'll go faster. But it could be that you're speeding up the execution

49:23 of a bad algorithm. And if you had a better algorithm, it might go a hundred times faster

49:28 or something, right? Like, so how do you think about that with your problems?

49:31 That's what I did for the, back in 2014 with the Dirichlet prior calculator. And that was an

49:38 interesting problem to solve because to recap on that, it's one of the use cases we had.

49:44 Okay. What's my prior on a venue before I've gotten any reviews? What's my prior on a restaurant

49:49 before I've gotten any reviews? And I'm using the experience of the data on all the other restaurants

49:53 I've seen. So we know what the variance is. And let me try to come up with an equation that can

49:59 calculate that value from the data. And it turned out there were some algorithms available,

50:04 but as I dug into the math, I noticed that there was like a math trick that I could make use of.

50:12 In other words, it was something like certain logs were being taken of the same number,

50:17 were being taken over and over again. And it's like, okay, just store how many times we took the

50:23 log. And then when I dug into the math, they kind of combined into one term and multiply that together.

50:28 So essentially I used a bunch of factoring and refactoring, whether you think of it as factoring

50:33 code or factoring math to get kind of an exponential speed up in that algorithm. And so that's why I

50:41 published a paper on it. I was very proud of that. It was a, it was very satisfying thing to do.

50:45 It might not have mattered in terms of our product, but I think a lot of people used it though,

50:49 to be like, I want rather than just taking an average of what I've seen in the past. No,

50:53 I want to do something that is based on good principles. And so I want to use the Dirichlet

51:00 prior calculator. And so some people have used that. It's my Python code online. And the algorithm has

51:07 proven very fast and like almost instantaneous. Basically, as soon as you load all the data in,

51:13 it gives you the answer, which I like. Now, my next step to that is to use PyMC3,

51:19 rather than giving you an answer, it should give you a probability distribution over answers.

51:22 Yeah, that's right.

51:23 I haven't done that yet. Didn't know about that. Yeah. Didn't know about that at the time. I think

51:26 my speed up would still apply.
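
The "store how many times we took the log" idea can be sketched roughly as follows. This is not the code from the paper or from BayesPy. It is a minimal illustration, assuming the repeated terms have the form log(alpha + j), which is what appears when a Dirichlet-multinomial log-likelihood is expanded, since log Γ(alpha + n) − log Γ(alpha) equals the sum of log(alpha + j) for j from 0 to n − 1. The function names and the sample counts are hypothetical.

```python
import math
from collections import Counter

def loglik_term_naive(counts, alpha):
    """Sum log(alpha + j) for j = 0..n-1, for every observed count n.

    The same log(alpha + j) is recomputed once per observation.
    """
    total = 0.0
    for n in counts:
        for j in range(n):
            total += math.log(alpha + j)
    return total

def build_tail_counts(counts):
    """Return tail where tail[j] = number of observations with count > j.

    This depends only on the data, so it is built once and reused
    for every candidate value of alpha.
    """
    if not counts:
        return []
    hist = Counter(counts)
    max_n = max(counts)
    tail = [0] * max_n
    running = 0
    for j in range(max_n - 1, -1, -1):
        running += hist.get(j + 1, 0)
        tail[j] = running
    return tail

def loglik_term_fast(tail, alpha):
    """Same sum as the naive version, taking each distinct log only once."""
    return sum(t * math.log(alpha + j) for j, t in enumerate(tail) if t)

# Hypothetical counts for one category across five venues.
counts = [3, 1, 0, 3, 2]
tail = build_tail_counts(counts)

alpha = 0.7
naive = loglik_term_naive(counts, alpha)
fast = loglik_term_fast(tail, alpha)
print(naive, fast)
```

In an optimizer that evaluates many values of alpha, the naive version costs time proportional to the total of all counts on every evaluation, while the precomputed version costs time proportional to the largest count.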

51:28 Yeah, that's cool. Well, that definitely takes it up a notch. What about learning more about

51:32 Bayesian analysis and inference and like, where should people go for more resources?

51:37 Oh, okay. Well, a kind of a history book that I read that I really like on Bayesian inference

51:42 is one called The Theory That Would Not Die by Sharon Bertsch McGrayne, a few years old, but it's really good

51:50 if you're interested in the history on that. I have a book about PyMC3, kind of a tech book that does go

51:56 into the basics of Bayesian inference that has a really good title. It's called Bayesian Analysis

52:02 with Python. Oh, yeah.

52:04 Yeah, yeah. So that's a good one to look at. And then I have a bunch of episodes on my show

52:10 that are related to Bayesian analysis. So episode zero and one on my show were basically just starting

52:18 out trying to describe Bayes' rule to everyone. I sort of attempted to do the description in episode

52:24 zero. And then in episode one, I applied it to the news story that was happening that day,

52:28 which was kind of the fire alarm at the bigger scale, which was everyone in Hawaii getting this

52:33 message that there's an ICBM missile coming their way because of a mistake someone made.

52:39 And then-

52:41 Yeah, because of some terrible UI decision on like the tooling.

52:45 Yeah, is that what it was?

52:46 Yeah, yeah.

52:47 Yeah.

52:47 There was some analysis about what had happened and not probabilistically, but there was some,

52:52 there's some really old crummy UI and they have to press some button to like acknowledge a test.

52:58 Or treat it as real and somehow they look like almost identical or there's some weird thing

53:03 about the UI that had like tricked the operator into saying, oh, it's real.

53:07 Yeah, yeah. And then another couple episodes I want to highlight is episode 21 and 22,

53:13 which is sort of kind of 21 is the philosophy of probability. In 22, we talk about the problem

53:18 of p-hacking, which is when people try their experiments over and over and until they get

53:24 something that works with p-values, which is a frequentist idea, which works if you're using

53:29 it properly. But the problem is most people don't. And then we did an episode, I think it

53:33 was 65 on probability, how to estimate the probability of something that's never happened. And then

53:39 78, the one that you mentioned, which was on the history of Bayes and a little more philosophy.

53:45 So I've talked about that a lot. You could probably go to localmaxradio.com or

53:49 localmaxradio.com slash archive and find the ones that you want.

53:52 That's really cool. So yeah, I guess we'll leave it there for now. That's quite interesting. And yeah,

53:58 it gives us a look into some of the algorithms and math we got to know for our data science.

54:02 Now, before you get out of here, though, I got the two questions I always ask everyone.

54:06 You're going to write some Python code. What editor do you use?

54:09 I just use Sublime or TextMate also on Mac. But I'm sure I could do something a little better

54:16 than that. I just picked one and never really looked back.

54:19 Sounds good. And then notable PyPI package?

54:23 Notable.

54:24 Maybe not the most popular, but like, oh, you should totally know about this. I mean,

54:28 you already threw out there PyMC3, if you want to claim that one, or if there's something else. Yeah,

54:32 pick that.

54:33 Yeah. Well, I have BayesPy, which is the one that's like in GitHub slash maxsklar slash BayesPy,

54:40 which has all the stuff I talked about. It's not actively developed, but it does have my kind of

54:44 one-off algorithms, which if you're in the market for multinomial models or Dirichlet,

54:52 or you want some kind of interesting new way to do multi-logistic regression, you could certainly give

55:00 that a try. But most people probably want to use kind of the standard toolings. Yeah. Why don't I go

55:06 with that? Why don't I go with the one I wrote a long time ago?

55:09 Yeah. Right on. Sounds good. All right. Final call to action. People are excited about this stuff.

55:13 What do you tell them? What do they do?

55:15 Check out the books I mentioned and check out my website, localmaxradio.com. And also subscribe to the Local Maximum. It should be on all of your podcatchers.

55:26 If it's not on one, please let me know. But it should be on all of your podcatchers.

55:31 localmaxradio.com. It's just every week. And we have a lot of fun. So definitely check it out.

55:35 Yeah, it's cool. You spend a lot of time talking about these types of things.

55:37 Super. All right. Well, Max, thanks for being on the show.

55:40 Michael, thank you so much. I really enjoyed this conversation.

55:43 Yeah, same here. Bye-bye.

55:44 Bye.

55:45 This has been another episode of Talk Python To Me. Our guest on this episode was Max Sklar,

55:50 and it's been brought to you by Linode and Tidelift. Linode is your go-to hosting for whatever you're

55:56 building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E.

56:02 If you run an open source project, Tidelift wants to help you get paid for keeping it going strong.

56:08 Just visit talkpython.fm/Tidelift, search for your package, and get started today.

56:14 Want to level up your Python? If you're just getting started, try my Python Jumpstart by

56:19 Building 10 Apps course. Or if you're looking for something more advanced, check out our new

56:23 async course that digs into all the different types of async programming you can do in Python.

56:28 And of course, if you're interested in more than one of these, be sure to check out our

56:32 Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show.

56:37 Open your favorite podcatcher and search for Python. We should be right at the top.

56:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

56:46 and the direct RSS feed at /rss on talkpython.fm.

56:50 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

56:54 Now get out there and write some Python code.

56:56 We'll see you next time.
