WEBVTT

00:00:00.260 --> 00:00:04.480
Traditionally, when we've depended upon software to make a decision with real-world implications,

00:00:04.480 --> 00:00:06.260
that software was deterministic.

00:00:06.260 --> 00:00:10.940
It had some inputs, a few if statements, and we could point to the exact line of code where

00:00:10.940 --> 00:00:14.520
the decision was made, and the same inputs would lead to the same decisions.

00:00:14.520 --> 00:00:19.180
Nowadays, with the rise of machine learning and neural networks, this is much more blurry.

00:00:19.180 --> 00:00:20.620
How did the model decide?

00:00:20.620 --> 00:00:24.960
Have the model's inputs drifted so the decisions are outside what it was designed for?

00:00:24.960 --> 00:00:28.840
These are just some of the questions discussed with our guest, Andrew Clark, on this,

00:00:29.120 --> 00:00:34.120
episode 261 of Talk Python to Me, recorded April 17th, 2020.

00:00:34.120 --> 00:00:52.140
Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

00:00:52.140 --> 00:00:53.600
ecosystem, and the personalities.

00:00:53.600 --> 00:00:55.540
This is your host, Michael Kennedy.

00:00:55.540 --> 00:00:57.680
Follow me on Twitter, where I'm @mkennedy.

00:00:57.860 --> 00:01:02.780
Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter

00:01:02.780 --> 00:01:03.840
via @talkpython.

00:01:03.840 --> 00:01:07.700
This episode is sponsored by Linode and Reuven Lerner.

00:01:07.700 --> 00:01:09.880
Please check out what they're both offering during their segments.

00:01:09.880 --> 00:01:11.200
It really helps support the show.

00:01:11.200 --> 00:01:13.620
Andrew Clark, welcome to Talk Python to Me.

00:01:13.620 --> 00:01:14.200
Hi, Michael.

00:01:14.200 --> 00:01:15.460
Glad to be here.

00:01:15.460 --> 00:01:17.240
It's always a pleasure listening to your podcast.

00:01:17.240 --> 00:01:19.180
I'm really excited to get to talk to you today.

00:01:19.180 --> 00:01:20.260
Well, thank you so much.

00:01:20.260 --> 00:01:22.320
And I'm excited to talk to you as well.

00:01:22.920 --> 00:01:24.900
It's going to be really fun to talk about machine learning.

00:01:24.900 --> 00:01:31.780
And I think of all of the programming type of stuff out there, more than most, machine

00:01:31.780 --> 00:01:35.760
learning is truly a black box for a lot of people.

00:01:35.760 --> 00:01:40.100
The decisions it makes are sort of unclear.

00:01:40.100 --> 00:01:41.840
It's sort of statistical.

00:01:42.260 --> 00:01:47.820
To me, it feels a little bit like quantum mechanics over Newtonian physics or whatever.

00:01:47.820 --> 00:01:53.020
It's like, well, I know there's rules that guide it, but it's kind of just it does its

00:01:53.020 --> 00:01:55.380
thing and you can't really know for sure.

00:01:55.380 --> 00:01:59.980
But at the same time, the world is beginning to rely more and more on machine learning, right?

00:01:59.980 --> 00:02:00.880
Yes, definitely.

00:02:01.400 --> 00:02:05.320
And there's definitely different levels of the complexity of it on how much is deterministic

00:02:05.320 --> 00:02:07.880
and how much of it really is quantum mechanics.

00:02:07.880 --> 00:02:09.520
Nice.

00:02:09.520 --> 00:02:10.420
All right.

00:02:10.420 --> 00:02:12.800
Now, before we get into that, though, of course, let's start with your story.

00:02:12.800 --> 00:02:14.560
How did you get into programming and into Python?

00:02:14.560 --> 00:02:15.360
Yeah, definitely.

00:02:15.360 --> 00:02:17.760
In my undergrad, I was actually an accounting major.

00:02:17.760 --> 00:02:22.820
And then about halfway through, I realized accounting is a great career, but it's really boring for

00:02:22.820 --> 00:02:23.040
me.

00:02:23.040 --> 00:02:26.060
So I started trying to just get into programming and stuff.

00:02:26.060 --> 00:02:27.980
Big data was like the craze when I was in college.

00:02:27.980 --> 00:02:30.880
Everybody was talking about big data and Hadoop and all that kind of thing.

00:02:31.400 --> 00:02:35.740
So I really started wanting to learn more about it and worked through Codecademy and Learn

00:02:35.740 --> 00:02:38.900
Python the Hard Way and some of those other books that are around.

00:02:38.900 --> 00:02:42.120
Tried a bunch of different things to see what worked for me, and Learn Python the Hard Way

00:02:42.120 --> 00:02:43.040
is what really seemed to stick.

00:02:43.040 --> 00:02:47.780
And then really started just trying to find little side projects of little things to do

00:02:47.780 --> 00:02:49.520
to actually build the programming.

00:02:49.520 --> 00:02:53.400
So I tried doing the top down, like here's the theory, more computer science-y approach.

00:02:53.400 --> 00:02:54.460
Didn't really work for me.

00:02:54.460 --> 00:02:56.580
It's more the bottom up of like, I'm trying to solve a problem.

00:02:56.580 --> 00:02:58.720
I want to move this file to this location.

00:02:58.960 --> 00:03:01.900
And I want to like orient this image in my Excel file.

00:03:01.900 --> 00:03:04.420
There's like different things like that to help learning how to program.

00:03:04.420 --> 00:03:08.840
Do you think that programming is often taught in like in reverse in the wrong way?

00:03:08.840 --> 00:03:09.060
Right.

00:03:09.060 --> 00:03:10.040
It's like very high level.

00:03:10.040 --> 00:03:11.840
And then we're going to work our way down and down.

00:03:11.840 --> 00:03:13.320
And like, here's the concepts.

00:03:13.320 --> 00:03:14.600
And then here's the theory.

00:03:14.600 --> 00:03:18.200
And instead of just like, here's a couple of problems and a few things you need to know

00:03:18.200 --> 00:03:21.520
and a whole bunch of stuff that we're not even going to tell you exists.

00:03:21.520 --> 00:03:21.740
Right.

00:03:21.740 --> 00:03:23.060
Like you don't even know what a heap is.

00:03:23.060 --> 00:03:23.540
I don't care.

00:03:23.540 --> 00:03:27.520
We're going to just make this work until you like eventually care what like, you know,

00:03:27.520 --> 00:03:28.620
memory allocation is.

00:03:28.620 --> 00:03:28.940
Right.

00:03:28.940 --> 00:03:34.260
At least I found, especially in the academic world, it felt sort of flipped on its head.

00:03:34.260 --> 00:03:35.420
I definitely think so.

00:03:35.420 --> 00:03:37.120
A lot of careers are built that way.

00:03:37.120 --> 00:03:39.480
It's like you need to have some sort of concrete sense of why does this matter?

00:03:39.480 --> 00:03:40.300
Why do I care?

00:03:40.300 --> 00:03:43.440
And like being able to get your hands dirty and then go into the theory.

00:03:43.440 --> 00:03:44.940
So you really want to do both in tandem.

00:03:45.280 --> 00:03:48.940
But really starting with the hands-on has really helped me with that approach.

00:03:48.940 --> 00:03:53.300
I know some people can do it the top down theory way, but I definitely think computer science

00:03:53.300 --> 00:03:56.660
could have a lot better programmers faster if they did it from the bottom up.

00:03:56.660 --> 00:03:57.280
Yeah, I agree.

00:03:57.280 --> 00:04:02.360
And computer science or programming in general is nice in that it has that possibility.

00:04:02.360 --> 00:04:02.660
Right.

00:04:02.660 --> 00:04:03.360
Like it's harder.

00:04:03.360 --> 00:04:05.260
You know, we joked about quantum mechanics.

00:04:05.260 --> 00:04:09.220
It's harder to just like, well, let's get your hands dirty with some quantum mechanics.

00:04:09.220 --> 00:04:13.420
And then we'll talk about like wave particle duality, like whatever.

00:04:13.620 --> 00:04:18.960
Like it just doesn't lend itself to that kind of presentation or that kind of like analysis,

00:04:18.960 --> 00:04:20.080
I don't think.

00:04:20.080 --> 00:04:21.680
But programming does, right?

00:04:21.680 --> 00:04:23.300
You could say, we're going to take this little API.

00:04:23.300 --> 00:04:24.320
I'm going to build a graph.

00:04:24.320 --> 00:04:25.800
And how cool is that?

00:04:25.800 --> 00:04:27.200
Now let's figure out what the heck we did.

00:04:27.200 --> 00:04:27.700
Exactly.

00:04:27.700 --> 00:04:28.260
Exactly.

00:04:28.260 --> 00:04:33.460
So that's one of the beauties about programming, especially Python with its ease of starting

00:04:33.460 --> 00:04:35.380
up and just getting something running on your computer.

00:04:35.380 --> 00:04:38.720
You can really start dealing with problems you wouldn't be able to do in some of these fields

00:04:38.720 --> 00:04:39.660
until you had a postdoc.

00:04:39.660 --> 00:04:41.380
You can't even start talking about quantum.

00:04:41.380 --> 00:04:42.660
Exactly.

00:04:42.900 --> 00:04:47.260
So back to your story, you were talking about like, this was the way that you found worked

00:04:47.260 --> 00:04:48.560
for you to get rolling.

00:04:48.560 --> 00:04:49.260
Yeah.

00:04:49.260 --> 00:04:52.140
And it really worked well with the different internships I was doing at the time.

00:04:52.140 --> 00:04:54.400
So I was working in financial audit.

00:04:54.400 --> 00:05:00.080
And during that time was really when auditors were starting to think about using data and

00:05:00.080 --> 00:05:03.080
using more than just normally in auditing, you do random sampling.

00:05:03.080 --> 00:05:08.320
So you'll look at a bunch of transactions from like accounts payable for like, these are bills

00:05:08.320 --> 00:05:09.000
the company paid.

00:05:09.000 --> 00:05:11.380
And being able to find that there's a lot of times there's duplicates.

00:05:11.380 --> 00:05:14.180
Some people are paying invoices twice and things like that.

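NOTE
Editor's sketch, not part of the recording: one way the duplicate-payment check described above might look in Python.
The record layout (payment_id, vendor, invoice_no, amount) and the sample data are hypothetical, chosen only for illustration.
Real audit tests would likely need fuzzier matching (for example, near-identical invoice numbers).
```python
from collections import defaultdict
# Hypothetical accounts-payable records; field names are illustrative only.
payments = [
    {"payment_id": 1, "vendor": "Acme", "invoice_no": "INV-100", "amount": 250.00},
    {"payment_id": 2, "vendor": "Globex", "invoice_no": "INV-200", "amount": 75.50},
    {"payment_id": 3, "vendor": "Acme", "invoice_no": "INV-100", "amount": 250.00},
    {"payment_id": 4, "vendor": "Acme", "invoice_no": "INV-101", "amount": 250.00},
]
def find_duplicate_payments(records):
    """Group payments by (vendor, invoice number, amount) and keep groups paid more than once."""
    groups = defaultdict(list)
    for record in records:
        key = (record["vendor"], record["invoice_no"], record["amount"])
        groups[key].append(record["payment_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
# Checks every transaction rather than a random sample.
duplicates = find_duplicate_payments(payments)
print(duplicates)
```
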
00:05:14.180 --> 00:05:18.020
So being able to use programming to start solving just audit problems to speed up the process for

00:05:18.020 --> 00:05:20.320
my company and just make my life easier, honestly.

00:05:20.840 --> 00:05:25.340
It was really how I started getting very excited about Python besides just like very small little

00:05:25.340 --> 00:05:26.380
utils on my computer.

00:05:26.380 --> 00:05:29.880
So like seeing what can we really do with this and how can we solve business problems

00:05:29.880 --> 00:05:30.220
with it?

00:05:30.220 --> 00:05:32.120
So I really came out of that pragmatic way.

00:05:32.120 --> 00:05:32.700
That's cool.

00:05:33.080 --> 00:05:38.440
Did you find that your co-workers, other people trying to solve the same problems, were trying

00:05:38.440 --> 00:05:41.640
to solve it with Excel and you just had like this power that they didn't have?

00:05:41.640 --> 00:05:42.520
Definitely.

00:05:42.520 --> 00:05:43.260
Definitely.

00:05:43.260 --> 00:05:46.440
Excel and some like specific analytics tools.

00:05:46.440 --> 00:05:51.540
Like there's a couple out there for auditors, like CaseWare IDEA and ACL and things.

00:05:51.540 --> 00:05:53.280
And they're just not as easy to use.

00:05:53.280 --> 00:05:56.180
They trump Excel, but they have a massive learning curve themselves.

00:05:56.180 --> 00:05:58.400
So it's like being able to do something like Python.

00:05:58.960 --> 00:06:03.820
With both camps of auditors, either Excel or ACL auditors, I was able to

00:06:03.820 --> 00:06:05.860
just run circles around them with testing.

00:06:05.860 --> 00:06:07.340
That's pretty cool.

00:06:07.340 --> 00:06:10.180
Well, that's a really interesting and practical way to get into it.

00:06:10.180 --> 00:06:17.000
I've recently realized or come to learn that so many people seem to be coming into Python

00:06:17.000 --> 00:06:20.440
and especially the data science side of Python from economics.

00:06:20.440 --> 00:06:25.160
Yeah, there's a big move these days from people traditionally in economics would be using

00:06:25.160 --> 00:06:27.920
like Matplotlib and no, sorry, Matlab.

00:06:28.540 --> 00:06:29.600
Python's on the brain here.

00:06:29.600 --> 00:06:32.500
Matlab and moving in to do more quantitative analysis.

00:06:32.500 --> 00:06:35.480
And now really Python has kind of taken over in economics.

00:06:35.480 --> 00:06:38.780
And there's a ton of people that are just coming over to data science, because

00:06:38.780 --> 00:06:41.240
you do those types of modeling for econometrics and stuff.

00:06:41.240 --> 00:06:45.040
So there's definitely a big surge of economists turned data scientists these days.

00:06:45.040 --> 00:06:45.720
Yeah, cool.

00:06:45.720 --> 00:06:47.420
What are you doing day to day now?

00:06:47.420 --> 00:06:51.360
Like you started out doing auditing and getting into Python that way.

00:06:51.360 --> 00:06:52.100
Now what?

00:06:52.100 --> 00:06:57.580
Well, it's been a really crazy ride, but I'm now doing a startup called Monitaur that works on

00:06:57.580 --> 00:06:58.440
machine learning assurance.

00:06:58.440 --> 00:07:00.560
Like how do we provide assurance around machine learning models?

00:07:00.560 --> 00:07:02.740
So I'm the co-founder and CTO with them.

00:07:02.740 --> 00:07:07.740
So my day to day right now, since we're currently in the midst of Techstars Boston, is very much

00:07:07.740 --> 00:07:12.240
just like meeting with different investors, meeting with other startup founders, meeting with

00:07:12.240 --> 00:07:16.140
potential clients, and running different parts of the business.

00:07:16.140 --> 00:07:18.220
So wearing a lot of hats right now.

00:07:18.320 --> 00:07:22.280
And then I run the whole R&D side and figure out like, what are the things we want to build?

00:07:22.280 --> 00:07:24.560
And then I can start working with the engineering team to execute.

00:07:24.560 --> 00:07:30.480
Do you find that you're doing all these things that sort of a tech background wholly unprepared

00:07:30.480 --> 00:07:30.840
you for?

00:07:30.840 --> 00:07:32.260
There's definitely some of that.

00:07:32.260 --> 00:07:35.400
Some of it, like my accounting degree is definitely coming in handy.

00:07:35.400 --> 00:07:35.940
Yeah.

00:07:35.940 --> 00:07:39.620
A lot of the business side things, but there's definitely a lot of moving pieces.

00:07:39.620 --> 00:07:43.040
And it's more than just strictly like being a good Python programmer.

00:07:43.040 --> 00:07:46.340
So it's a little more than I was expecting on that side.

00:07:46.340 --> 00:07:51.920
Well, I do feel like a lot of people, and this is like my former self included, is like,

00:07:51.920 --> 00:07:53.640
if you build it, they will come.

00:07:53.640 --> 00:07:59.340
All we need is like really good tech and like a compelling project or technology or product.

00:07:59.720 --> 00:08:01.760
And then we'll figure out the rest of the stuff.

00:08:01.760 --> 00:08:06.680
We'll figure out the accounting, the marketing, the user growth, like all that stuff.

00:08:06.680 --> 00:08:07.560
And it's like, no, no, no, no.

00:08:07.560 --> 00:08:11.500
Like the product and the programming is table stakes.

00:08:11.500 --> 00:08:14.280
All this other stuff is what it actually takes to make it work, right?

00:08:14.280 --> 00:08:14.800
Definitely.

00:08:14.800 --> 00:08:15.380
Definitely.

00:08:15.380 --> 00:08:19.660
You need to have innovative tech, but that's only a component of it.

00:08:19.660 --> 00:08:21.840
Like that's like one fourth of the problem.

00:08:21.840 --> 00:08:27.100
And, you know, I think for myself and a lot of first-time startup individuals

00:08:27.100 --> 00:08:30.760
as well, you come into that thinking like, hey, tech is going to be king when there's

00:08:30.760 --> 00:08:33.040
actually so many other moving pieces like you're mentioning.

00:08:33.040 --> 00:08:36.960
So it's actually an eye opening experience, but I really respect founders more.

00:08:36.960 --> 00:08:38.500
Yeah, same.

00:08:38.500 --> 00:08:44.860
Well, I do find it really interesting and almost disheartening when you realize like I have

00:08:44.860 --> 00:08:47.940
all this amazing programming skill and I know I can build this.

00:08:47.940 --> 00:08:51.600
If I had two months and I'm really focused, we can build this thing.

00:08:51.600 --> 00:08:55.620
And you realize like, actually, that's just a small part of like having to build it.

00:08:55.620 --> 00:08:56.680
Like this is where it starts.

00:08:56.740 --> 00:08:58.720
This is not like basically it's ready to roll.

00:08:58.720 --> 00:08:59.520
So I don't know.

00:08:59.520 --> 00:09:02.360
It's something I've been thinking about over the last few years.

00:09:02.360 --> 00:09:08.500
And it's not the way I think a lot of people perceive it starting something like this.

00:09:08.500 --> 00:09:10.820
If you have like, we're going to get a cool programming team together.

00:09:10.820 --> 00:09:12.020
We're going to build this thing and it'd be great.

00:09:12.020 --> 00:09:13.920
It's like, okay, well, that gets you started.

00:09:13.920 --> 00:09:14.540
But then what?

00:09:14.540 --> 00:09:14.800
Right?

00:09:14.800 --> 00:09:15.520
Exactly.

00:09:15.520 --> 00:09:19.420
And the other problem you have with a startup is like you can't even do, okay, well, give

00:09:19.420 --> 00:09:20.620
me two months and we can build it.

00:09:20.620 --> 00:09:24.220
Like with all the other competing business things and stuff, you can't even get that straight

00:09:24.220 --> 00:09:27.620
two months of just building because you have all the investor obligations and things.

00:09:27.620 --> 00:09:29.460
So it's definitely a lot of things to juggle.

00:09:29.460 --> 00:09:32.400
So you get really good at time management and priority juggling.

00:09:32.400 --> 00:09:34.240
But it's exciting, right?

00:09:34.240 --> 00:09:34.860
It's a lot of fun.

00:09:34.860 --> 00:09:35.140
Yeah.

00:09:35.140 --> 00:09:36.440
Oh, it's fantastic.

00:09:36.440 --> 00:09:38.140
And being part of Techstars is really wonderful.

00:09:38.140 --> 00:09:39.920
I wouldn't trade it for anything.

00:09:39.920 --> 00:09:42.560
It's just definitely eye opening and a great learning experience.

00:09:42.560 --> 00:09:47.460
Yeah, I've had some people on the show a while back who had gone through Techstars, but maybe

00:09:47.460 --> 00:09:49.480
folks haven't listened to that show or whatever.

00:09:49.480 --> 00:09:57.940
So probably the most popular, well-known something like that is probably Y Combinator, though I

00:09:57.940 --> 00:09:59.580
don't know that it's exactly the same thing.

00:09:59.580 --> 00:10:04.540
But tell folks about what Techstars is a little bit just so they have a sense of what you're

00:10:04.540 --> 00:10:05.520
working on or what you're doing.

00:10:05.520 --> 00:10:08.040
So Techstars is an accelerator.

00:10:08.040 --> 00:10:09.640
There's like accelerators and incubators.

00:10:09.640 --> 00:10:13.120
I didn't really know there was much of a difference, but being part of Techstars, the

00:10:13.120 --> 00:10:15.160
accelerators are really very mentor heavy.

00:10:15.160 --> 00:10:18.300
Like the first couple of weeks of Techstars, you have a ton of different mentors that you

00:10:18.300 --> 00:10:23.380
meet with, like five a day or more, and trying to find who's going to help guide the vision

00:10:23.380 --> 00:10:23.980
of your company.

00:10:23.980 --> 00:10:25.740
How do you get the right people around the table?

00:10:25.740 --> 00:10:27.340
What's your company vision?

00:10:27.340 --> 00:10:28.180
Your mission statement?

00:10:28.180 --> 00:10:29.420
How are you going to do fundraising?

00:10:29.420 --> 00:10:30.660
What's your marketing strategy?

00:10:30.660 --> 00:10:31.480
What's your go-to-market?

00:10:31.480 --> 00:10:32.640
How do you get your first few customers?

00:10:32.640 --> 00:10:33.820
How do you keep those customers?

00:10:33.820 --> 00:10:38.840
It's very focused on the execution and how do you make a successful business that will last?

00:10:39.100 --> 00:10:41.120
So it's a lot of very business-y things.

00:10:41.120 --> 00:10:47.040
For a tech-run startup company thing, it's very, very focused on the non-tech aspects,

00:10:47.040 --> 00:10:51.040
the business aspects, the how do we differentiate ourselves from the crowd and things.

00:10:51.040 --> 00:10:55.260
So there's seminars, mentor meetings, there's just lots of moving pieces with that.

00:10:55.260 --> 00:10:57.100
But it's a very, very good thing.

00:10:57.100 --> 00:11:02.200
And it helps in about three months to take your company from a cool idea and you have a product

00:11:02.200 --> 00:11:06.620
to really being able to execute your vision and have better staying power.

00:11:06.620 --> 00:11:11.360
Yeah, well, it sounds like it's a lot of the stuff to sort of build that structure and support

00:11:11.360 --> 00:11:14.480
of what we were just talking about, of like what you don't get if you know how to write

00:11:14.480 --> 00:11:15.820
code and create a product.

00:11:15.820 --> 00:11:16.140
Right.

00:11:16.140 --> 00:11:17.380
It's exactly right.

00:11:17.380 --> 00:11:21.120
Because like Techstars, 10 companies are accepted out of over 2,000 applicants.

00:11:21.120 --> 00:11:22.760
So it's like you already have a product.

00:11:22.760 --> 00:11:24.720
You already have good tech to make it across the bar.

00:11:24.720 --> 00:11:27.760
They're like, okay, everybody knows what they're doing, how to build tech.

00:11:27.760 --> 00:11:30.060
Now we've got to teach you guys how to actually run a company.

00:11:30.480 --> 00:11:33.480
So it's a very good way of approaching it.

00:11:33.480 --> 00:11:36.400
It's not like you all sit in a room with a bunch of group think and programmers making

00:11:36.400 --> 00:11:37.460
it more geeky.

00:11:37.460 --> 00:11:40.840
You think like, okay, you might even need to scale back the complexity of this a little

00:11:40.840 --> 00:11:42.720
bit so we can actually market this to the user.

00:11:42.720 --> 00:11:45.800
It's a very, very good program for small companies to go through.

00:11:48.300 --> 00:11:50.980
This portion of Talk Python to Me is brought to you by Linode.

00:11:50.980 --> 00:11:55.100
Whether you're working on a personal project or managing your enterprise's infrastructure,

00:11:55.100 --> 00:11:59.640
Linode has the pricing, support, and scale that you need to take your project to the next

00:11:59.640 --> 00:11:59.920
level.

00:11:59.920 --> 00:12:05.640
With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise

00:12:05.640 --> 00:12:11.620
grade hardware, S3 compatible storage, and the next generation network, Linode delivers

00:12:11.620 --> 00:12:14.580
the performance that you expect at a price that you don't.

00:12:14.580 --> 00:12:18.280
Get started on Linode today with a $20 credit and you get access

00:12:18.280 --> 00:12:23.840
to native SSD storage, a 40 gigabit network, industry leading processors, their revamped

00:12:23.840 --> 00:12:29.340
cloud manager, cloud.linode.com, root access to your server, along with their newest API and

00:12:29.340 --> 00:12:30.640
a Python CLI.

00:12:30.640 --> 00:12:36.720
Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically

00:12:36.720 --> 00:12:38.660
get $20 credit for your next project.

00:12:38.660 --> 00:12:40.740
Oh, and one last thing, they're hiring.

00:12:40.740 --> 00:12:43.840
Go to linode.com/careers to find out more.

00:12:43.840 --> 00:12:45.160
Let them know that we sent you.

00:12:47.300 --> 00:12:51.460
Well, I want to talk a little bit more about that at the end and how it's probably changed

00:12:51.460 --> 00:12:53.920
as the world has changed with COVID-19 and all that.

00:12:53.920 --> 00:12:57.080
But let's talk about machine learning and stuff like that first.

00:12:57.080 --> 00:12:57.780
Yeah, definitely.

00:12:57.780 --> 00:13:04.380
We talked a little bit about machine learning and I joke that like, at least to my very limited

00:13:04.380 --> 00:13:09.560
experience, it feels a little bit quantum mechanics-y in that same sense.

00:13:09.560 --> 00:13:14.240
But let's tell people, I guess, what is machine learning?

00:13:14.240 --> 00:13:18.820
Obviously, we have a bit of a sense, but like, let's try to make it a little bit more concrete

00:13:18.820 --> 00:13:23.920
for people who maybe it's still like a buzzword and what makes up machine learning, I guess.

00:13:23.920 --> 00:13:24.860
Definitely.

00:13:25.400 --> 00:13:29.560
So I'll start by saying there's a lot of terminology where people kind of interchange

00:13:29.560 --> 00:13:31.820
together, like AI, machine learning, deep learning.

00:13:31.820 --> 00:13:37.380
So if you think of three big circles, AI is just a broad range of trying to make machines

00:13:37.380 --> 00:13:39.180
to copy human tasks.

00:13:39.680 --> 00:13:42.100
So AI doesn't have to exactly be modeling.

00:13:42.100 --> 00:13:46.240
It can also be rule-based systems and things that try and mimic human intelligence.

00:13:46.240 --> 00:13:50.220
So it's like a broad field of trying to do something that copies human behavior.

00:13:50.220 --> 00:13:50.580
Right.

00:13:50.580 --> 00:13:53.300
Computers making decisions, kind of.

00:13:53.300 --> 00:13:54.180
Right.

00:13:54.180 --> 00:13:54.600
Exactly.

00:13:54.600 --> 00:13:56.180
Trying to make decisions like a human.

00:13:56.180 --> 00:13:58.860
So there's a lot of different components in that.

00:13:58.860 --> 00:14:02.080
And it's been a field that people have been talking about for a long time and you've had

00:14:02.080 --> 00:14:03.600
movies about it for a long time.

00:14:03.600 --> 00:14:05.100
So it's just a broad field.

00:14:05.100 --> 00:14:08.620
There are many components that are part of it, because you have some things like expert systems

00:14:08.620 --> 00:14:12.600
and some sort of rule-based engines that actually do a pretty good job of mimicking that when

00:14:12.600 --> 00:14:13.780
there's not actually any modeling.

00:14:13.780 --> 00:14:18.780
So they have a limited use because like self-driving cars, if you think about that, you can't build

00:14:18.780 --> 00:14:22.440
enough rules to specify every time a car should turn, there's too much going on.

00:14:22.440 --> 00:14:24.340
You have to use something more complex than that.

00:14:24.340 --> 00:14:25.940
AI by itself doesn't mean modeling.

00:14:25.940 --> 00:14:31.560
Machine learning is types of statistical models that learn from data to make decisions.

00:14:31.560 --> 00:14:36.300
So that is just a broad field of modeling that is what most people think about when you

00:14:36.300 --> 00:14:37.080
think about machine learning.

00:14:37.080 --> 00:14:38.700
And it's just the modeling aspect.

00:14:38.700 --> 00:14:44.060
And we can go into the classes of models you have, like supervised, which is basically

00:14:44.060 --> 00:14:48.020
you can have a model, look at a bunch of data that has a decision.

00:14:48.020 --> 00:14:52.980
So like if you look at a bunch of pictures of cake and it says it will be cake, and then

00:14:52.980 --> 00:14:56.060
you look at another picture that's a bunch of raw ingredients and it'll say it's not cake.

00:14:56.060 --> 00:14:59.840
You know, you can train an algorithm to look at those two pictures and make the correct

00:14:59.840 --> 00:15:01.800
decision by optimizing a function.

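NOTE
Editor's sketch, not part of the recording: a tiny supervised-learning example in plain Python, loosely following the cake / not-cake description above.
The two numeric features per picture (a frosting score and a roundness score) and all values are hypothetical stand-ins for real image data.
The function being optimized here is the logistic (log) loss, minimized with simple stochastic gradient descent; the speaker does not name a specific algorithm.
```python
import math
# Hypothetical labeled examples: (frosting_score, roundness_score) and a label, 1 = cake, 0 = not cake.
examples = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((1.0, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.0, 0.3), 0),
]
def predict(weights, bias, features):
    """Return the model's estimated probability that the example is cake."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
def train(labeled_examples, learning_rate=0.5, epochs=500):
    """Fit a logistic regression by nudging the weights after every labeled example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in labeled_examples:
            error = predict(weights, bias, features) - label  # feedback: how wrong was the guess?
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias
weights, bias = train(examples)
print(predict(weights, bias, (0.9, 0.9)))  # a new cake-like picture
print(predict(weights, bias, (0.1, 0.1)))  # a new picture of raw ingredients
```
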
00:15:01.800 --> 00:15:02.100
Right.

00:15:02.180 --> 00:15:06.380
Where you have, you already know the answer and you kind of tell it like a kid.

00:15:06.380 --> 00:15:07.140
That's right.

00:15:07.140 --> 00:15:08.140
No, that's wrong.

00:15:08.140 --> 00:15:08.620
Right.

00:15:08.620 --> 00:15:09.360
Oh, good job.

00:15:09.360 --> 00:15:10.000
That was right.

00:15:10.000 --> 00:15:10.920
You know, and so on.

00:15:10.920 --> 00:15:11.100
Yeah.

00:15:11.100 --> 00:15:14.740
And it takes that feedback sort of iteratively and like evolves itself.

00:15:14.740 --> 00:15:15.220
Exactly.

00:15:15.220 --> 00:15:16.740
And it's extremely, extremely dumb.

00:15:17.200 --> 00:15:21.680
So you have to just have a lot of training data that is balanced and meaning you have

00:15:21.680 --> 00:15:24.500
the same amount of pie / not-pie type scenarios.

00:15:24.500 --> 00:15:29.400
So it knows how to learn from that and have very catered data sets that don't generalize

00:15:29.400 --> 00:15:29.820
very well.

00:15:29.820 --> 00:15:34.080
On the far other extreme side of things, you have completely unsupervised learning, which

00:15:34.080 --> 00:15:38.980
is basically let's throw some data at this algorithm and see what patterns emerge, see

00:15:38.980 --> 00:15:39.700
what groups happen.

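NOTE
Editor's sketch, not part of the recording: a minimal unsupervised example, assuming one-dimensional data and two groups.
It uses k-means clustering, which is one common grouping technique and not one the speaker names; the data points and starting centers are hypothetical.
```python
# Hypothetical unlabeled measurements; no "right answers" are supplied to the algorithm.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
def k_means_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-averaging each center."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```
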
00:15:39.700 --> 00:15:44.920
So it's very good to use like for very early on exploratory analysis of like I have a bunch

00:15:44.920 --> 00:15:45.160
of data.

00:15:45.260 --> 00:15:46.340
I have no idea what to do with this.

00:15:46.340 --> 00:15:47.640
Let's see if there's anything interesting here.

00:15:47.640 --> 00:15:52.380
But in between the supervised and unsupervised, you'll have semi-supervised

00:15:52.380 --> 00:15:58.000
learning and reinforcement learning, which basically what they do is it's a combination

00:15:58.000 --> 00:16:02.800
of those two where you have not quite explicit labels on everything, but it's not completely

00:16:02.800 --> 00:16:05.900
just sending an algorithm out in the unknown to figure out what's happening.

00:16:05.900 --> 00:16:10.460
And that's where you've seen like a lot of AlphaGo and a lot of really cool problems

00:16:10.460 --> 00:16:13.480
have been done with reinforcement learning and semi-supervised.

00:16:13.480 --> 00:16:13.840
Okay.

00:16:13.980 --> 00:16:15.580
And there's a lot going on in that space as well.

00:16:15.580 --> 00:16:15.820
Yeah.

00:16:15.820 --> 00:16:20.620
Well, I think that's a pretty good overview. And the unsupervised side,

00:16:20.620 --> 00:16:26.780
that's probably the realm of where people start to think, you know, computers are just learning

00:16:26.780 --> 00:16:27.500
on their own.

00:16:27.500 --> 00:16:32.760
And what if they get too smart and they just, you know, take over or whatever.

00:16:32.760 --> 00:16:38.000
But that's kind of, I would say that's probably the almost the science fiction side of things,

00:16:38.000 --> 00:16:39.300
but it's also real.

00:16:39.640 --> 00:16:43.700
Obviously people are doing it, but that's probably what people think of when they think

00:16:43.700 --> 00:16:48.220
of AI is like there's data, the computer looked at it and then it understood it and learned

00:16:48.220 --> 00:16:50.740
it and found things that humans maybe wouldn't have known about.

00:16:50.740 --> 00:16:51.100
Definitely.

00:16:51.100 --> 00:16:54.020
Some of the continuous learning stuff, which is a subset of that.

00:16:54.020 --> 00:16:54.320
Yeah.

00:16:54.320 --> 00:16:58.060
And then the innermost part, deep learning, is basically a subset of machine learning,

00:16:58.060 --> 00:17:01.080
which is looking at trying to copy how your brain synapses work.

00:17:01.160 --> 00:17:04.340
And it's just a type of machine learning where you think of the convolutional neural networks

00:17:04.340 --> 00:17:05.120
and things like that.

00:17:05.120 --> 00:17:08.860
So there's like the three big tiers and the different parts that fit in between those.

00:17:08.860 --> 00:17:09.140
Yeah.

00:17:09.140 --> 00:17:13.620
And when I was talking about quantum mechanics analogies, I was thinking of deep learning.

00:17:13.620 --> 00:17:14.400
Exactly.

00:17:14.400 --> 00:17:15.020
Exactly.

00:17:15.020 --> 00:17:16.740
And like deep learning is extremely valuable.

00:17:16.740 --> 00:17:20.860
A lot of the really cool research problems you're seeing people develop, OpenAI and all

00:17:20.860 --> 00:17:24.300
of these people are using that for doing reinforcement learning with deep learning and things

00:17:24.300 --> 00:17:29.180
like that, but that is not what a lot of companies are using today in regulated industries or even

00:17:29.180 --> 00:17:30.660
in most of America.

00:17:30.660 --> 00:17:35.220
You know, a lot of companies are using more traditional supervised learning and are just now getting

00:17:35.220 --> 00:17:37.540
past linear regression in a lot of instances.

00:17:37.540 --> 00:17:41.620
The deep learning stuff is the science fiction side; people are using it, but it's not very widely deployed

00:17:41.620 --> 00:17:44.160
yet because there's a lot of problems with the deployments.

00:17:44.160 --> 00:17:44.460
Yeah.

00:17:44.460 --> 00:17:49.880
And probably even just knowing with certainty that it's right and that it's reproducible.

00:17:49.880 --> 00:17:50.400
Completely.

00:17:50.400 --> 00:17:50.820
Yeah.

00:17:50.820 --> 00:17:51.280
Yeah.

00:17:51.280 --> 00:17:56.180
I think, you know, machine learning and AI are such buzzwords, right?

00:17:56.180 --> 00:18:01.680
A lot of people seem to think, well, even if just an if statement is in there, like,

00:18:01.680 --> 00:18:03.440
well, the computer decided, so that's AI.

00:18:03.440 --> 00:18:07.380
It's like, well, that's not really what that term means, but that's okay.

00:18:07.380 --> 00:18:10.200
I guess the computer decided, yes, you're right.

00:18:10.200 --> 00:18:11.940
But that's just code.

00:18:11.940 --> 00:18:14.320
I mean, code's been having if statements for a super long time.

00:18:14.320 --> 00:18:19.460
I was thinking back to something in the British parliament where they were upset that there

00:18:19.460 --> 00:18:25.560
was some airline, which is bad, that was looking at when people would book tickets.

00:18:25.560 --> 00:18:30.480
If the people had the same last name, they would put them in different seats away from

00:18:30.480 --> 00:18:33.720
each other and then charge them a fee to put them back next to each other.

00:18:33.720 --> 00:18:39.200
And they were saying that it's this AI system that is causing this abuse.

00:18:39.200 --> 00:18:40.740
This is not AI.

00:18:40.740 --> 00:18:41.920
This is just an if statement.

00:18:41.920 --> 00:18:43.040
If the names are the same.

00:18:43.040 --> 00:18:43.440
Exactly.

00:18:43.440 --> 00:18:46.000
They're going to be split apart and offered the upgrade to get back together.

00:18:46.000 --> 00:18:46.560
Exactly.

00:18:46.560 --> 00:18:46.960
Yeah.

00:18:46.960 --> 00:18:52.940
So I think it's interesting to think about how companies can apply machine learning and maybe

00:18:52.940 --> 00:18:55.220
the hesitancy to do so.

00:18:55.220 --> 00:18:55.620
Right.

00:18:55.620 --> 00:19:02.180
Like if I'm applying for a mortgage and the company is using, you know, some deep learning

00:19:02.180 --> 00:19:07.000
algorithm to decide whether I'm a good fit to provide a mortgage to.

00:19:07.000 --> 00:19:07.340
Right.

00:19:07.860 --> 00:19:10.940
Some states in the United States have like walkaway laws, right?

00:19:10.940 --> 00:19:12.460
You just decide like, I don't want to pay anymore.

00:19:12.460 --> 00:19:16.460
And you can have the house even if it's like in crappy condition and not worth what it was.

00:19:16.460 --> 00:19:16.760
Right.

00:19:16.760 --> 00:19:17.800
So it's a big risk.

00:19:17.800 --> 00:19:20.380
So they want to get that answer right.

00:19:20.380 --> 00:19:21.200
Yeah.

00:19:21.200 --> 00:19:21.720
Exactly.

00:19:21.720 --> 00:19:28.720
But at the same time, I'm sure there's regulations and stuff saying you can't just randomly assign

00:19:28.720 --> 00:19:34.340
or not assign or have some super biased algorithm just making this decision and say, well,

00:19:34.340 --> 00:19:38.180
it's not our fault the computer did this, you know, unjust thing or whatever.

00:19:38.180 --> 00:19:38.520
Right.

00:19:38.520 --> 00:19:39.020
Exactly.

00:19:39.020 --> 00:19:41.960
And that's what a lot of insurance companies, for instance, are struggling with right now

00:19:41.960 --> 00:19:49.460
is there are laws in the US on like fairness and you can't do gender discrimination or those

00:19:49.460 --> 00:19:51.500
sorts of protected-class discrimination.

00:19:51.500 --> 00:19:57.420
But there's no comprehensive, overarching regulation around machine learning right now.

00:19:57.420 --> 00:20:01.320
So a lot of companies are kind of in the dark on like regulators don't really like machine

00:20:01.320 --> 00:20:01.580
learning.

00:20:01.580 --> 00:20:03.280
They don't understand it.

00:20:03.360 --> 00:20:07.100
They don't like the non-deterministic aspects. They like rule sets.

00:20:07.100 --> 00:20:07.320
Right.

00:20:07.320 --> 00:20:08.680
So if this happens, we'll do this.

00:20:08.680 --> 00:20:12.780
They're very comfortable with having those sorts of rules, but having this kind of nebulous

00:20:12.780 --> 00:20:13.320
modeling.

00:20:13.320 --> 00:20:14.180
Exactly.

00:20:14.180 --> 00:20:22.560
It's easy to do a code audit and say, OK, this is the part where they tested like their income

00:20:22.560 --> 00:20:25.480
to credit card debt ratio.

00:20:25.520 --> 00:20:28.100
And it was above the threshold you all have set.

00:20:28.100 --> 00:20:29.160
And so you said no.

00:20:29.160 --> 00:20:31.780
And so it's because they have too much debt.

00:20:31.780 --> 00:20:37.720
And in fact, I can set a break point and I can step down and look at the code, like making

00:20:37.720 --> 00:20:40.000
those decisions and comparing those values.

00:20:40.000 --> 00:20:41.700
And then here's what it does.

00:20:41.700 --> 00:20:41.980
Right.

00:20:41.980 --> 00:20:46.480
And as far as I know, like break points and deep learning don't really mean a lot.
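
NOTE
A minimal sketch of the kind of rule-based check described above, where an auditor can point to the exact line that made the decision. The function name decide_loan and the 0.40 threshold are hypothetical, chosen only for illustration.
```python
# Hypothetical rule-based check; the threshold value is an assumption.
MAX_DEBT_TO_INCOME = 0.40
def decide_loan(annual_income: float, card_debt: float) -> tuple[bool, str]:
    """Return (approved, reason) from one explicit, auditable rule."""
    if annual_income <= 0:
        return False, "no verifiable income"
    ratio = card_debt / annual_income
    # This is the exact line an auditor (or a breakpoint) can point to.
    if ratio > MAX_DEBT_TO_INCOME:
        return False, f"debt-to-income ratio {ratio:.2f} exceeds {MAX_DEBT_TO_INCOME:.2f}"
    return True, f"debt-to-income ratio {ratio:.2f} is within {MAX_DEBT_TO_INCOME:.2f}"
```
Calling decide_loan(50_000, 30_000) returns (False, "debt-to-income ratio 0.60 exceeds 0.40"), and the same inputs always give the same answer.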

00:20:46.480 --> 00:20:47.380
Yeah.

00:20:47.380 --> 00:20:52.000
Because a lot of times if you use like a support vector machine algorithm, you won't have a

00:20:52.000 --> 00:20:56.200
linear decision boundary, meaning that sometimes if you have income over this amount and this

00:20:56.200 --> 00:20:57.860
other thing, we'll give you a loan.

00:20:57.860 --> 00:20:58.860
And other times we won't.

00:20:58.860 --> 00:21:00.560
It depends on what some of the other inputs are.

00:21:01.240 --> 00:21:05.400
So that's where regulators get very nervous because you can't have that strict rule based

00:21:05.400 --> 00:21:05.820
system.
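
NOTE
A toy illustration of a non-linear decision boundary, assuming a kernel-style score like the one a support vector machine evaluates at prediction time. The support vectors, weights, GAMMA, and BIAS below are hand-picked assumptions, not a trained model.
```python
import math
# Hand-picked "support vectors": ((income in $1000s, debt in $1000s), label, weight).
# These numbers are illustrative assumptions, not the output of any training run.
SUPPORT_VECTORS = [
    ((80.0, 5.0), +1, 1.0),
    ((80.0, 40.0), -1, 1.0),
    ((30.0, 2.0), -1, 1.0),
]
GAMMA = 0.002
BIAS = 0.0
def rbf(a, b):
    """Radial basis function kernel: similarity decays with squared distance."""
    return math.exp(-GAMMA * sum((x - y) ** 2 for x, y in zip(a, b)))
def decision_score(point):
    """Weighted sum of kernel similarities to each support vector, plus a bias."""
    return sum(w * label * rbf(sv, point) for sv, label, w in SUPPORT_VECTORS) + BIAS
def approve(income_k, debt_k):
    return decision_score((income_k, debt_k)) > 0
```
Here approve(80, 5) is True while approve(80, 40) is False: the same income leads to different outcomes depending on the other input, so no single income threshold describes the rule.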

00:21:05.820 --> 00:21:06.140
Yeah.

00:21:06.140 --> 00:21:11.920
But at the same time, this is an amazing technology that could make, you know, make those decisions

00:21:11.920 --> 00:21:12.760
more accurate.

00:21:12.760 --> 00:21:14.800
And it could be better.

00:21:14.800 --> 00:21:15.080
Right.

00:21:15.080 --> 00:21:18.220
It could actually understand like, yeah, OK, so you do have a lot of credit card debt, but

00:21:18.220 --> 00:21:22.580
you're doing this other thing that really shows that you are a good borrower.

00:21:22.580 --> 00:21:23.460
Right.

00:21:23.460 --> 00:21:23.740
Exactly.

00:21:23.740 --> 00:21:25.820
Their code would have completely ignored that.

00:21:25.820 --> 00:21:29.520
But the machine learning is like, no, no, actually those types of people are good.

00:21:29.920 --> 00:21:31.120
You want to loan to them.

00:21:31.120 --> 00:21:31.520
Right.

00:21:31.520 --> 00:21:32.800
You want to lend to them.

00:21:32.800 --> 00:21:33.720
Exactly.

00:21:33.720 --> 00:21:35.320
Because it could be good to have this.

00:21:35.320 --> 00:21:35.460
Right.

00:21:35.460 --> 00:21:36.460
It's not just a negative.

00:21:36.460 --> 00:21:37.960
It's just you just don't know.

00:21:37.960 --> 00:21:39.080
That's a part of the challenge.

00:21:39.080 --> 00:21:39.640
Exactly.

00:21:39.640 --> 00:21:44.520
Because you could do more targeted rates for consumers so then they can

00:21:44.520 --> 00:21:45.300
have lower rates.

00:21:45.300 --> 00:21:48.380
The insurance company will have lower default rates.

00:21:48.380 --> 00:21:49.580
Basically, everybody wins.

00:21:49.580 --> 00:21:55.620
The problem for executives and regulators is: how do you have assurance

00:21:55.620 --> 00:21:58.100
and trust that the system is doing what you want it to do and you're not going to end up

00:21:58.100 --> 00:22:02.960
on the front page soon like the Apple credit card thing a few months ago with, oh, you were

00:22:02.960 --> 00:22:05.160
really discriminatory or you messed this up.

00:22:05.160 --> 00:22:10.460
So the problem is being able to provide that assurance and trust over algorithms to allow

00:22:10.460 --> 00:22:11.300
them to be deployed.

00:22:11.300 --> 00:22:15.100
And that's kind of what the industry is struggling with right now because no one really knows quite

00:22:15.100 --> 00:22:17.200
how to have that with these non-deterministic qualities.

00:22:17.200 --> 00:22:20.220
Like how do you get that assurance around the model doing what it's supposed to be doing?

00:22:20.220 --> 00:22:22.960
So that's where there's a lot of work right now on the different machine learning monitoring

00:22:22.960 --> 00:22:23.540
you can be doing.

00:22:23.880 --> 00:22:27.760
Do you need a third party to monitor those sorts of things so you can provide that trust

00:22:27.760 --> 00:22:31.560
and risk mitigation to be able to deploy the algorithms that are beneficial for both the

00:22:31.560 --> 00:22:32.560
company and the consumer?

00:22:32.560 --> 00:22:32.980
Sure.

00:22:32.980 --> 00:22:38.620
And, you know, we're talking about America because we both live here and whatnot.

00:22:38.620 --> 00:22:44.420
But I think actually Europe probably has even stricter rules around accountability and traceability.

00:22:44.420 --> 00:22:44.960
Definitely.

00:22:45.120 --> 00:22:47.800
That's my sense from my cursory experience of both.

00:22:47.800 --> 00:22:48.480
Definitely.

00:22:48.480 --> 00:22:52.500
And Europe's actually doing a better job of first regulating it right now and also providing

00:22:52.500 --> 00:22:54.180
guidance around the regulations.

00:22:54.180 --> 00:23:00.940
So GDPR, that's the General Data Protection Regulation in Europe, has some regulations

00:23:00.940 --> 00:23:04.980
around, like, if a decision about a consumer has been fully automated, such as the example you

00:23:04.980 --> 00:23:09.020
just gave, the consumer needs to know and they can request the logic on how that decision

00:23:09.020 --> 00:23:09.460
was made.

00:23:09.720 --> 00:23:14.260
So there's a lot of user consumer protections around machine learning algorithm use in Europe.

00:23:14.260 --> 00:23:16.360
And there's also a lot more guidance.

00:23:16.360 --> 00:23:20.560
Europe is a little behind on that; they're now creating a white paper on guidance around how would

00:23:20.560 --> 00:23:24.020
you make sure the algorithms are doing what they should be doing to help the companies.

00:23:24.020 --> 00:23:30.940
But in Britain, the UK's Information Commissioner's Office has released two different white paper

00:23:30.940 --> 00:23:33.440
drafts on how you do explainable AI.

00:23:33.440 --> 00:23:37.420
How do you make sure you have the assurance around your algorithms and actually doing some prescriptive

00:23:37.420 --> 00:23:39.220
recommendations on how to do that correctly.

00:23:39.600 --> 00:23:44.980
So like Europe is really ahead of the US right now on first regulating AI and then also helping

00:23:44.980 --> 00:23:48.660
companies and consumers understand it and find ways so they can still use it.

00:23:48.660 --> 00:23:50.480
US hasn't really addressed those yet.

00:23:50.480 --> 00:23:50.780
Yeah.

00:23:50.780 --> 00:23:51.220
Yeah.

00:23:51.220 --> 00:23:54.060
And it was exactly that situation that you were talking about.

00:23:54.060 --> 00:24:02.640
If a computer has made a completely automated decision, the consumer can request why, right?

00:24:02.640 --> 00:24:03.060
Exactly.

00:24:03.060 --> 00:24:07.740
And thinking about this entire conversation, that is the core thing that I think is at

00:24:07.740 --> 00:24:12.920
the crux of the problem here that made me like interested in this because the deep learning stuff

00:24:12.920 --> 00:24:20.820
and the ability to get a computer to understand the nuances that we don't even know exist out there and make

00:24:20.820 --> 00:24:21.660
better decisions.

00:24:21.660 --> 00:24:24.160
I think that that's, that's really powerful.

00:24:24.160 --> 00:24:27.060
And as a company, you would definitely want to employ that.

00:24:27.120 --> 00:24:34.200
If it's required that you say, well, it's because your credit to debt ratio was too far out of bounds.

00:24:34.200 --> 00:24:42.480
Like, how are you going to get a deep learning, like poorest sort of thing going on and go, well, it said this.

00:24:42.480 --> 00:24:45.160
It's like, well, the weights of the nodes were this.

00:24:45.220 --> 00:24:48.540
And so it said, no, that's not satisfying or even meaningful.

00:24:48.540 --> 00:24:48.900
Right.

00:24:48.900 --> 00:24:51.580
Like, I don't even know how, how do we get there?

00:24:51.580 --> 00:24:57.320
Maybe we get, maybe ultimately people would be willing to accept, like, I don't even know.

00:24:57.320 --> 00:25:03.500
Maybe there's something like, okay, it's made these decisions, and we had these false positives and we had these false negatives.

00:25:03.840 --> 00:25:07.800
Like, almost like an A/B testing or clinical trial type of thing.

00:25:07.800 --> 00:25:12.280
Like, we're going to let 10% of the people through on our old analysis anyway.

00:25:12.280 --> 00:25:17.200
And then compare that to how the machine learning did on both, like, the positive and negative.

00:25:17.200 --> 00:25:17.900
I don't know.

00:25:17.900 --> 00:25:21.920
But like, how do you see that there's a possible way to get that answer?

00:25:21.920 --> 00:25:22.300
Yeah.

00:25:22.300 --> 00:25:23.360
There's a couple approaches.

00:25:23.360 --> 00:25:32.800
One of the main ones has been a lot of my research the past couple of years, which is you can provide assurance around the implementations because like a lot of the points that you just mentioned, we don't have that ability with a human.

00:25:32.800 --> 00:25:35.240
Do you usually understand why a loan officer gave that loan, either?

00:25:35.240 --> 00:25:40.780
Like, the type of understanding some people are asking from algorithms doesn't really exist with humans either.

00:25:40.780 --> 00:25:45.620
If you ask a human two weeks later why they made that exact decision, they're not going to say the same thing that they were thinking at that time.

00:25:45.620 --> 00:25:50.940
So you want to provide a trust and an audit trail and then transparency around an algorithm.

00:25:50.940 --> 00:25:58.080
Basically give it a history and show that it's been making reliable decisions and it's operating within the acceptable bounds for the inputs and the outputs.
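
NOTE
A minimal sketch of checking that a model is operating within acceptable bounds for its inputs, as described above. The feature names and ranges in TRAINING_BOUNDS are assumptions made for illustration, not values from the episode.
```python
# Hypothetical ranges the model was built for; the values are assumptions.
TRAINING_BOUNDS = {"fico": (300, 850), "income": (10_000, 500_000), "age": (18, 95)}
def out_of_bounds(features: dict) -> list:
    """Return the names of features that are missing or outside the expected range."""
    flagged = []
    for name, (low, high) in TRAINING_BOUNDS.items():
        value = features.get(name)
        if value is None or not (low <= value <= high):
            flagged.append(name)
    return flagged
```
A request such as {"fico": 700, "income": 2_000_000, "age": 40} comes back as ["income"], which could be logged alongside the decision as part of its audit trail.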

00:25:58.580 --> 00:26:02.760
So being able to provide this holistic business understanding and process understanding is very huge.

00:26:02.760 --> 00:26:07.460
It's not really as much of a tech problem as it is a business problem and a process problem.

00:26:07.460 --> 00:26:12.600
But you also want to be able to show: this is what the algorithm saw when it made this decision.

00:26:13.260 --> 00:26:21.180
So even for a deep learning algorithm, say you are taking in FICO score and you're taking in age and you're taking in zip code and things like that to make this decision.

00:26:21.180 --> 00:26:26.100
Some of those are protected classes, but you're taking in information about a consumer.

00:26:26.100 --> 00:26:28.600
You don't need to see like what neural network layer said something.

00:26:28.700 --> 00:26:38.800
It's like based on these features, because your FICO score is above 600 and because your zip code was in this high income area, we're going to approve this loan or not approve this loan.

00:26:38.800 --> 00:26:41.400
So that you can see with some research.

00:26:41.400 --> 00:26:44.480
Anchors is a great library that was open sourced a few years ago.

00:26:44.480 --> 00:26:51.500
There's a Python implementation, and Seldon's Alibi is a really, really good library that has a production-grade implementation.

00:26:51.720 --> 00:26:54.880
It can help you see what the algorithm saw when it made this decision.

00:26:54.880 --> 00:27:00.780
So you can start addressing that inside the whole bigger process of providing business and process understanding.
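
NOTE
A simplified sketch of the idea behind anchor-style explanations: find a small set of feature values that, when held fixed, keeps the prediction the same however the other features vary. This is not the Anchors algorithm itself or the Alibi API; toy_model, FEATURE_VALUES, and the exhaustive search are assumptions made for illustration.
```python
from itertools import combinations, product
# Toy feature grid and model; both are illustrative assumptions.
FEATURE_VALUES = {
    "fico": [550, 650, 750],
    "income": [30, 60, 90],
    "zip_band": ["low", "high"],
}
def toy_model(row: dict) -> bool:
    """Stand-in for an opaque model: approve when FICO is 600+ in a high-income zip band."""
    return row["fico"] >= 600 and row["zip_band"] == "high"
def precision(instance: dict, fixed: tuple) -> float:
    """Share of perturbed rows that keep the instance's prediction while `fixed` features are held."""
    free = [name for name in FEATURE_VALUES if name not in fixed]
    target = toy_model(instance)
    hits = total = 0
    # Enumerate every combination of values for the features that are allowed to vary.
    for values in product(*(FEATURE_VALUES[name] for name in free)):
        row = dict(instance, **dict(zip(free, values)))
        hits += toy_model(row) == target
        total += 1
    return hits / total
def find_anchor(instance: dict, threshold: float = 0.95) -> tuple:
    """Return the smallest feature set whose fixed values keep the prediction stable."""
    names = list(FEATURE_VALUES)
    for size in range(len(names) + 1):
        for fixed in combinations(names, size):
            if precision(instance, fixed) >= threshold:
                return fixed
    return tuple(names)
```
For the approved applicant {"fico": 650, "income": 60, "zip_band": "high"}, find_anchor returns ("fico", "zip_band"): the decision keyed off those two features and income did not matter, which is the kind of statement a consumer can follow without seeing any network layers.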

00:27:00.780 --> 00:27:01.260
I see.

00:27:01.260 --> 00:27:12.700
So there's all these different facets or features that come in and you can say you can't exactly say, you know, here's how the flow went through the code and here's the if statement.

00:27:12.700 --> 00:27:17.000
But you can say it keyed off of these three things.

00:27:17.120 --> 00:27:24.360
It keyed off the fact that you have three cars leased and you have this credit card or whatever.

00:27:24.360 --> 00:27:24.600
Right.

00:27:24.600 --> 00:27:28.420
Like it said, those are the things that like sort of helped it decide.

00:27:28.420 --> 00:27:29.000
Made a decision.

00:27:29.000 --> 00:27:29.700
Exactly.

00:27:29.700 --> 00:27:30.140
Exactly.

00:27:30.140 --> 00:27:39.480
So when you put that in tandem with understanding the whole process, providing the ability to go back and verify, you can start getting more assurance around the implementation and start getting comfortable with it.

00:27:39.480 --> 00:27:40.500
Because that's the other problem.

00:27:40.500 --> 00:27:42.980
A lot of times with these algorithms, it's like it makes a decision.

00:27:43.220 --> 00:27:47.360
And as a company, like how in the world if a customer asks, why did you make this decision about me six months ago?

00:27:47.360 --> 00:27:55.360
How are they going to go back and get the exact log file, make sure you have the exact model version, be able to rerun that specific model version and see all these types of information?

00:27:55.360 --> 00:28:02.960
It's not strictly just a tech problem of let me see which neural network layer was activated and which neuron.

00:28:03.100 --> 00:28:07.440
Like it's more process based and there's a lot of components to it.

00:28:07.440 --> 00:28:15.040
So that's where a lot of times we see right now in the data science community, people trying to solve this problem by building better explainability, by understanding that neural network better.

00:28:15.040 --> 00:28:19.240
But that's not really addressing the consumer's ask of, like, why did this decision happen?

00:28:19.240 --> 00:28:21.700
How do I know that you just didn't arbitrarily make this?

00:28:21.700 --> 00:28:22.960
How do we know it's not biased?

00:28:22.960 --> 00:28:29.100
Those sorts of more fundamental issues aren't something you can just address with a better neural network explainability tool.

00:28:29.100 --> 00:28:42.420
Yeah. Do you think there's somewhere out in the future, the possibility of like a meta network in the sense that like there is a neural network that looks at how the neural network is working and then tries to explain how it works?

00:28:42.420 --> 00:28:46.840
Like use AI to, like, get it to sort of answer what the other AI did.

00:28:46.840 --> 00:28:49.060
There's definitely some things like that currently in research.

00:28:49.060 --> 00:28:54.220
There's a whole bunch of different good explainability libraries out there and people that are addressing specifically those types of problems.

00:28:54.860 --> 00:29:00.860
And Monitaur is doing that as well, except more on the business process side and the basic understanding about the model.

00:29:00.860 --> 00:29:03.560
Really, to get consumers comfortable,

00:29:03.560 --> 00:29:10.200
it's going to be understanding and being able to prove why something was done, which is more than just specific deep learning interpretability.

00:29:10.200 --> 00:29:17.440
Because a lot of times these loan models and stuff are still like we're moving from a loan officer to trying to do a very basic machine learning model.

00:29:17.440 --> 00:29:19.940
Like they haven't even gotten to the complexity of deep learning.

00:29:20.340 --> 00:29:21.720
So it's like incremental improvements.

00:29:21.720 --> 00:29:22.080
Right.

00:29:22.080 --> 00:29:27.440
You talked about your origins being in the big data era and all that.

00:29:27.440 --> 00:29:34.860
Is there some sort of statistical way that people might become, you think, comfortable with things like deep learning, making these decisions?

00:29:34.860 --> 00:29:41.600
Like, so for example, you've got all of the records of the mortgages in the US.

00:29:42.080 --> 00:29:57.520
If you were able to take your algorithm and run it across those as if it was trying to make the decision and then you have the actual data, kind of like the supervised learning you talked about going through and saying, OK, we're going to apply it to all of this.

00:29:57.520 --> 00:30:01.720
Maybe we don't share that data back with the company, but they give us the model.

00:30:01.720 --> 00:30:02.960
We run it through all of it.

00:30:03.020 --> 00:30:09.240
And it's within bounds of what we've deemed to be fair to the community or to the country.

00:30:09.240 --> 00:30:09.800
Definitely.

00:30:09.800 --> 00:30:12.520
So those are definitely some tests we can do.

00:30:12.520 --> 00:30:24.800
And ideally, if you're building a loan algorithm, you're going to want to see what those historical statistics are to make sure that your model's, say, classification percentages are in line with what's expected for the general population.

00:30:25.000 --> 00:30:34.220
So, for instance, if you're doing an algorithm, at certain FICO score bands you're going to have X amount of acceptance.

00:30:34.220 --> 00:30:40.740
So if you start seeing your model starts really not accepting people that are in a certain range, like we can definitely start raising a flag.

00:30:40.740 --> 00:30:42.120
There's concept drift going on.

00:30:42.120 --> 00:30:44.060
So there's definitely tests you can do around there.
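
NOTE
A minimal sketch of the acceptance-rate check described above: compare recent approval rates per FICO band against historical expectations and raise a flag when they diverge. The band edges, expected rates, and tolerance are assumptions for illustration.
```python
from collections import defaultdict
# Hypothetical historical acceptance rates per FICO band; the numbers are assumptions.
EXPECTED_ACCEPTANCE = {"300-579": 0.10, "580-669": 0.40, "670-739": 0.70, "740-850": 0.90}
def fico_band(score: int) -> str:
    """Map a FICO score to the band whose range contains it."""
    for band in EXPECTED_ACCEPTANCE:
        low, high = (int(part) for part in band.split("-"))
        if low <= score <= high:
            return band
    raise ValueError(f"FICO score out of range: {score}")
def drifting_bands(decisions, tolerance: float = 0.15) -> dict:
    """decisions is an iterable of (fico_score, approved) pairs from recent traffic."""
    counts = defaultdict(lambda: [0, 0])  # band: [approved, total]
    for score, approved in decisions:
        tally = counts[fico_band(score)]
        tally[0] += bool(approved)
        tally[1] += 1
    flagged = {}
    for band, (approved, total) in counts.items():
        observed = approved / total
        # Flag bands whose observed rate strays too far from the historical rate.
        if abs(observed - EXPECTED_ACCEPTANCE[band]) > tolerance:
            flagged[band] = observed
    return flagged
```
If the 740-850 band is expected to be approved about 90% of the time but recent traffic shows 20%, that band is returned with its observed rate, which is the flag-raising moment described in the conversation.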

00:30:44.060 --> 00:30:46.580
There's a Fisher exact test that lets you check.

00:30:46.580 --> 00:30:52.040
Say if you have age, you don't really want to be using age as an indicator, or gender, for instance.

00:30:52.040 --> 00:30:53.460
But in some instances, you have to.

00:30:53.460 --> 00:30:59.940
So you can run a test to see if it's ever statistically significant that that one variable is negatively influencing the outcome.

00:30:59.940 --> 00:31:10.240
There are definitely ways you can make sure that the algorithm, based on the example you said with the loans, this is the normal amount of acceptance and non-acceptance that we've had over the past X amount of years in America for loans.

00:31:10.240 --> 00:31:12.320
And that's kind of what we think is OK.

00:31:12.320 --> 00:31:15.260
There are definitely tests you can do, such as the Fisher exact test.

00:31:15.260 --> 00:31:22.680
It's a statistical test, and there are others that you can do around making sure an algorithm isn't biased and it's staying within the percentage that you want for acceptance.
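
NOTE
A small standard-library sketch of Fisher's exact test on a 2x2 table, for example group A versus group B against approved versus denied. The function name is hypothetical; SciPy users would more likely reach for scipy.stats.fisher_exact.
```python
from math import comb
def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    def table_probability(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is at least as unlikely as the observed one.
    return sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
```
With 3 approved and 1 denied in one group and 1 approved and 3 denied in the other, fisher_exact_two_sided(3, 1, 1, 3) evaluates to 34/70, roughly 0.486, so a sample that small gives no statistically significant evidence that group membership affects the outcome.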

00:31:22.680 --> 00:31:29.300
So there's a lot of tests that companies can do there, some that we're implementing, some that other people are implementing and a lot of things that people can do.

00:31:29.460 --> 00:31:39.200
But really, for the public to really start accepting that machine learning is OK, I think there really needs to be some sort of more government regulation that's at least even like rubber stamping that this is OK.

00:31:39.200 --> 00:31:40.900
And that we've looked at this algorithm.

00:31:40.900 --> 00:31:48.180
There needs to be, like, third-party assurance and third-party audits of algorithms, and you don't even have to share your code if it's a proprietary thing.

00:31:48.400 --> 00:31:56.000
But just the understanding that this is doing what it should be doing and it's not discriminatory can be very important for people to be able to trust AI systems.

00:31:56.000 --> 00:31:57.420
I think it's going to be super important.

00:31:57.420 --> 00:32:14.820
And when I threw that idea out there, what I was thinking of was the XPRIZE stuff done around health care and breast cancer, where instead of you getting all the data, you submitted a Docker image with your algorithm that was then run and it couldn't communicate out.

00:32:14.820 --> 00:32:22.180
It was run against the data and then you got just the trained model out of it and then you could basically go from there.

00:32:22.180 --> 00:32:29.900
So like there was this sort of arbitrage of your model meets the data, but you don't ever see the data and no one else sees your model kind of thing.

00:32:29.900 --> 00:32:31.120
Definitely.

00:32:31.120 --> 00:32:33.700
It's just somebody needs to facilitate that.

00:32:33.700 --> 00:32:34.720
That's a trusted party.

00:32:34.720 --> 00:32:38.060
So like some sort of government regulation to enable that kind of thing.

00:32:38.060 --> 00:32:48.960
But those sorts of processes will definitely start allowing people to trust the system and allow state regulators and things to be able to be signing off on systems and being comfortable to let the public enjoy better insurance policies.

00:32:48.960 --> 00:32:49.720
Right.

00:32:49.720 --> 00:32:51.820
But we need to have some sort of assurance to get there.

00:32:54.060 --> 00:33:00.100
This episode of Talk Python to Me is brought to you by me, Reuven Lerner, and Weekly Python Exercise.

00:33:00.100 --> 00:33:04.260
You want to do more with less code or just write more idiomatic Python?

00:33:04.260 --> 00:33:06.900
It won't happen on its own or even from a course.

00:33:06.900 --> 00:33:08.760
Practice is the only way.

00:33:08.760 --> 00:33:13.480
Now in its fourth year, Weekly Python Exercise makes you a more fluent developer.

00:33:13.480 --> 00:33:20.500
Between pytest tests, our private forum, and live office hours, your Python will improve one week at a time.

00:33:21.100 --> 00:33:23.900
Developers of all levels rave about Weekly Python Exercise.

00:33:23.900 --> 00:33:29.080
Get free samples and see our schedule at talkpython.fm/exercise.

00:33:29.080 --> 00:33:40.780
So you talked a little bit about some of the Python libraries that are out there to help people understand how these models are making decisions and understand a little bit better.

00:33:40.780 --> 00:33:48.380
Some other things that are probably, like we could talk about some of the other things you're doing and, you know, some things that might matter.

00:33:48.440 --> 00:33:57.600
It's like you talked about the company that if I asked them six months later, why did you decide this thing, right?

00:33:57.600 --> 00:34:00.740
They probably still have all of my form fields I filled out.

00:34:00.740 --> 00:34:02.900
Like my credit history is X.

00:34:02.900 --> 00:34:05.860
My average income is whatever.

00:34:05.860 --> 00:34:07.340
I've been in the job for this long.

00:34:08.180 --> 00:34:13.180
But how would they go back and run that against the same code, right?

00:34:13.180 --> 00:34:16.580
So maybe what they would do is they would go and say, well, let me just ask the system again.

00:34:16.580 --> 00:34:18.380
What would the answer be now and why?

00:34:18.380 --> 00:34:21.880
But they may have completely rolled out a new version of code, right?

00:34:21.880 --> 00:34:23.340
It might not even do the same thing.

00:34:23.480 --> 00:34:33.400
And I'm sure they don't have like retroactive versions of like the entire infrastructure of the insurance company around to go back and run it exactly.

00:34:33.400 --> 00:34:35.900
Maybe they do, but probably not in a lot of companies.

00:34:35.900 --> 00:34:40.680
Like the ability to do almost version control, but for production, right?

00:34:40.680 --> 00:34:51.240
So knowing how the model changes over time, like what else do you guys need to look at or keep track of to be able to like give that kind of, you know, why did this happen?

00:34:51.240 --> 00:34:51.880
Answer.

00:34:51.880 --> 00:34:52.380
Definitely.

00:34:52.380 --> 00:34:57.760
And that's the key part is summarizing all information and being able to replay that exact algorithm with the decision.

00:34:57.760 --> 00:35:02.040
So to be able to do that, you really have to have exact feature inputs.

00:35:02.040 --> 00:35:11.960
Like what exactly did the user say; the exact model version, so like the model object file, the pickle file, something like that; the exact production Python code.

00:35:11.960 --> 00:35:16.820
Then you have to have the actual version of the library used in the same environment.

00:35:16.820 --> 00:35:25.280
So there's a lot of things going on there, and the amount of logging that you have to have in place, and have it be easily accessible and not tampered with, and be able to recreate that environment.
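
NOTE
A minimal sketch of what one logged decision might capture so it can be replayed later: the exact inputs, a hash of the model file, the Python version, and the library versions. The function names and record layout are assumptions; the hash chain is one simple way to make tampering detectable, not a description of any particular product.
```python
import hashlib
import json
import platform
from importlib import metadata
def package_versions(names):
    """Look up installed versions; packages that are not installed are recorded as None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
def decision_record(features, prediction, model_bytes, packages, previous_hash=""):
    """Build one log entry holding what a later replay would need."""
    record = {
        "features": features,
        "prediction": prediction,
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        "python_version": platform.python_version(),
        "packages": package_versions(packages),
        "previous_hash": previous_hash,  # links entries into a chain
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    return record
def is_untampered(record):
    """Recompute the hash over everything except record_hash and compare."""
    body = {k: v for k, v in record.items() if k != "record_hash"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == record["record_hash"]
```
Passing each entry's record_hash as the next entry's previous_hash means that editing an old record changes its hash and breaks the chain from that point on, which an auditor can detect provided the hashes are also kept somewhere the editor cannot reach.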

00:35:25.400 --> 00:35:26.700
That's a hard technical problem.

00:35:26.700 --> 00:35:35.760
A lot of companies don't have that, in addition to just having some sort of interpretability for the decision and then the metrics and monitoring around it.

00:35:35.760 --> 00:35:36.520
That's a big ask.

00:35:36.520 --> 00:35:39.720
And that's where a lot of companies are struggling when they start hitting these regulatory audits.

00:35:39.720 --> 00:35:40.020
Sure.

00:35:40.020 --> 00:35:52.200
Well, I can imagine it's really tricky to say, all right, well, when our app started up back then and we went back and we looked in the logs and it said this version and this version, this version of the library, but maybe somebody forgot to log it.

00:35:52.200 --> 00:35:54.540
Oh yeah, there's a dependency library that actually matters.

00:35:54.540 --> 00:36:04.560
Or we were running on this version of Python and its implementation of, you know, floating point slightly changed.

00:36:04.560 --> 00:36:06.560
And it had some effect or I, you know, I don't know.

00:36:06.560 --> 00:36:06.760
Right.

00:36:06.760 --> 00:36:08.320
Like there's just all these things.

00:36:08.320 --> 00:36:17.520
And it seems like there's probably some analogies here to the reproducibility challenges and solutions that science even has.

00:36:17.520 --> 00:36:18.020
Exactly.

00:36:18.180 --> 00:36:23.040
It's along the exact same lines, but it's even more exacerbated and more difficult to solve.

00:36:23.040 --> 00:36:27.140
Because if you think about science, it's like you're trying to reproduce the results of one paper.

00:36:27.140 --> 00:36:30.880
You know, if it's a biological experiment, it might be a little harder to recreate those conditions.

00:36:30.880 --> 00:36:36.360
But in computer science, like you should be able to save the seed, send a Docker file with everything and rerun your algorithm results.
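
NOTE
A tiny sketch of the "save the seed" point above: a generator seeded explicitly gives the same draws on every run. The function simulate_scores is hypothetical and stands in for any step that relies on randomness.
```python
import random
def simulate_scores(seed: int, count: int = 5) -> list:
    """Draw pseudo-random scores from a private generator seeded explicitly."""
    rng = random.Random(seed)  # private generator, no hidden global state
    return [round(rng.random(), 6) for _ in range(count)]
```
Two calls with the same seed return identical lists, so recording the seed next to the code and environment is what lets a result be rerun later.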

00:36:36.360 --> 00:36:36.620
Right.

00:36:37.060 --> 00:36:45.120
But people are even having a hard time doing that because the amount of process and forethought into how you're going to be packaging things, how you're going to be setting things up, it's a lot to deal with.

00:36:45.120 --> 00:36:48.520
But like reproducibility is just a major crisis in science right now, too.

00:36:48.620 --> 00:36:56.000
But it's really hitting the corporate environment really hard to be able to provide that kind of ability around the algorithms to be able to answer questions.

00:36:56.000 --> 00:37:00.600
Because if like you want to implement these things, people are going to want to have audits and they're going to want to understand why something was done.

00:37:00.600 --> 00:37:04.780
So exactly what's happening in the scientific community, except even on a larger scale.

00:37:04.780 --> 00:37:10.600
Now, two things you talk about when you look at Monitaur's website, talking about what you all do there.

00:37:10.600 --> 00:37:13.660
One is counterfactuals.

00:37:13.660 --> 00:37:19.020
And the other is detecting when model and feature drift occur.

00:37:19.020 --> 00:37:20.900
Do you want to address those problems a little?

00:37:20.900 --> 00:37:21.580
Yeah, definitely.

00:37:21.580 --> 00:37:27.500
So counterfactuals is basically what we've just described is the re-performance of an exact transaction.

00:37:27.500 --> 00:37:30.720
So we record all those things, the versioning of all the files.

00:37:30.840 --> 00:37:43.040
So when you go back six months from now, six years from now, and go and select on a specific transaction, if it's a tabular transaction, we can actually hit a button and we will pull all that stuff up in a Docker container and rerun it.

00:37:43.040 --> 00:37:48.840
So that's what we're calling counterfactuals is the ability to go back and re-perform a transaction and then perform what-if analysis.

00:37:48.840 --> 00:37:53.180
Say like one of the variables said your income was $200,000.

00:37:53.180 --> 00:37:59.160
And if you want to change it to $150,000, you can go do that and rerun the transaction as well off of that old version.

00:37:59.240 --> 00:38:03.540
So it allows you to do the sensitivity analysis if a consumer is like, well, what if my income was slightly different?

00:38:03.540 --> 00:38:08.880
But also re-perform for audit tracing that it's doing exactly what it said it was going to do.
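
NOTE A minimal sketch of the replay and what-if idea described here, not the actual product implementation. The model rule, the record layout, and the names score_v1, MODELS, and replay are all invented for illustration; the incomes are the ones used in the conversation.
```python
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class LoanApplication:
    income: float
    debt: float
def score_v1(app):
    """Stand-in for a frozen, versioned model; the rule is invented."""
    return "approve" if app.debt / app.income < 0.4 else "reject"
# Registry of model versions, so an old decision runs against the old model.
MODELS = {"v1": score_v1}
# A recorded transaction: the model version, the inputs, and the decision made.
record = {
    "model_version": "v1",
    "inputs": LoanApplication(income=200_000, debt=70_000),
    "decision": "approve",
}
def replay(record, **changes):
    """Rerun a recorded transaction, optionally overriding some inputs."""
    model = MODELS[record["model_version"]]
    inputs = replace(record["inputs"], **changes)
    return model(inputs)
# Audit check: rerunning the unchanged inputs gives the recorded decision.
same_as_recorded = replay(record) == record["decision"]
# What-if analysis: same transaction, income lowered to 150,000.
what_if = replay(record, income=150_000)
print(same_as_recorded, what_if)
```
Keeping the inputs immutable means the what-if run cannot accidentally alter the audit record.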

00:38:08.880 --> 00:38:09.140
Sure.

00:38:09.140 --> 00:38:09.540
Okay.

00:38:09.540 --> 00:38:11.520
And then model drift.

00:38:11.520 --> 00:38:11.780
Yeah.

00:38:11.780 --> 00:38:15.560
Model drift is exactly like what we were talking about a few minutes ago with you have loans.

00:38:15.560 --> 00:38:18.620
Normally 60% of loans are rejected, 40% are accepted.

00:38:18.620 --> 00:38:21.300
And that's kind of the average for this risk class.

00:38:21.300 --> 00:38:27.980
Model drift detection will allow you to see in the Monitaur platform when your model has started to drift out of those bounds.

00:38:27.980 --> 00:38:34.940
If you say, I'm okay if the model is rejecting between 50% and 75% of loans.

00:38:34.940 --> 00:38:40.540
And if we start slipping to 80% of loans are now being rejected, we're going to throw you alerts and say, hey, your model has drifted.

00:38:40.540 --> 00:38:41.660
Something is wrong here.
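
NOTE A minimal sketch of the bounds check described here, assuming decisions are logged as plain strings. The 50% and 75% bounds and the 60% and 80% rejection rates come from the conversation; the function names are hypothetical.
```python
def rejection_rate(decisions):
    """Fraction of decisions that were rejections."""
    return sum(1 for d in decisions if d == "reject") / len(decisions)
def check_model_drift(decisions, lower=0.50, upper=0.75):
    """Flag drift when the rejection rate leaves the agreed bounds."""
    rate = rejection_rate(decisions)
    return {"rate": rate, "drifted": not (lower <= rate <= upper)}
# Typical period: 60% rejected, inside the bounds.
normal = ["reject"] * 60 + ["accept"] * 40
# Later period: 80% rejected, outside the bounds, so an alert should fire.
shifted = ["reject"] * 80 + ["accept"] * 20
normal_report = check_model_drift(normal)
shifted_report = check_model_drift(shifted)
print(normal_report)
print(shifted_report)
```
In practice the check would run over a rolling window of recent decisions rather than a fixed list.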

00:38:41.660 --> 00:38:41.820
I see.

00:38:41.820 --> 00:38:42.460
Yeah.

00:38:42.520 --> 00:38:52.200
So it's not necessarily detecting that the model is making some variation in its predictions, but it's saying you've set these bounds.

00:38:52.200 --> 00:38:54.880
And if it's outside of these bounds, like something is wrong.

00:38:54.880 --> 00:38:56.120
Let us know.

00:38:56.480 --> 00:38:56.760
Exactly.

00:38:56.760 --> 00:39:01.260
And that will be saying that the algorithm is making kind of a drift in what it's supposed to be predicting.

00:39:01.260 --> 00:39:02.400
Same with features.

00:39:02.400 --> 00:39:10.540
There's a lot of times, there's a great paper a couple of years ago by Google called the high debt or high credit card debt or something in machine learning implementations.

00:39:10.780 --> 00:39:20.000
The name's not quite right there, but it's a very popular paper on technical debt, where basically a lot of times when you have an algorithm, a lot of the code around it is where your issues are going to happen.

00:39:20.000 --> 00:39:33.200
So if you have a model that's used to having features between 1 and 10 with a mean of 5, and you start having a drift up, that can be affecting the model's outputs, but you can't really detect it until it's too late.

00:39:33.200 --> 00:39:49.920
So what Monitaur does is we allow you to look at the feature drift and see, like, if I'm expecting this feature within 1 to 2 standard deviations of a mean of 5, and you start getting higher numbers out of there, we'll know, like, hey, there's feature drift, which means your model is not performing in the same environment that it was built for.

00:39:50.040 --> 00:39:55.280
So we'll be able to know that, hey, you need to go look at this, the situation your model's built for has changed.

00:39:55.280 --> 00:40:03.100
Because a lot of times when these models start misbehaving, it's not because the model code has changed, quote unquote, it's because the environment that it was built for has changed.

00:40:03.100 --> 00:40:04.460
You're no longer in that same environment.
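
NOTE A minimal sketch of the feature drift check described here, assuming one numeric feature with a training mean of 5. The sample values, the two-standard-deviation band, and the names fit_baseline and share_outside are illustrative assumptions, not the product's method.
```python
from statistics import mean, stdev
def fit_baseline(values):
    """Summarize the feature as seen at training time."""
    return {"mean": mean(values), "std": stdev(values)}
def share_outside(values, baseline, k=2.0):
    """Fraction of values more than k standard deviations from the mean."""
    low = baseline["mean"] - k * baseline["std"]
    high = baseline["mean"] + k * baseline["std"]
    return sum(1 for v in values if not (low <= v <= high)) / len(values)
# Training data: values between 1 and 10, centered on 5.
training = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]
baseline = fit_baseline(training)
# Production data that has drifted upward.
production = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10]
training_share = share_outside(training, baseline)
production_share = share_outside(production, baseline)
print(baseline["mean"], training_share, production_share)
```
A large share of out-of-band values says nothing about the model code; it says the inputs no longer look like the training data.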

00:40:04.460 --> 00:40:09.620
I see. And that's one of the real challenges of the whole ML story, right?

00:40:09.620 --> 00:40:18.200
Is the model's good at answering the question the model was intended to answer, but it may be completely inappropriate for something else, right?

00:40:18.300 --> 00:40:27.400
Like, self-driving cars work great in Arizona, but you put them in the snow and they can't see the road anymore because they don't know to look at snowy roads or whatever, right?

00:40:27.560 --> 00:40:30.280
Exactly. Because algorithms are extremely, extremely dumb.

00:40:30.280 --> 00:40:36.260
Like, people are trying to make them better with this transfer learning and some of the semi-supervised learning and things.

00:40:36.260 --> 00:40:39.660
But when it comes down to the root of it, there is no thinking going on here.

00:40:39.660 --> 00:40:42.840
It doesn't matter how accurate it is at detecting cancer in radiology images.

00:40:42.840 --> 00:40:46.780
There is no thinking. It's a dumb algorithm that's made for a specific set of circumstances.

00:40:46.780 --> 00:40:49.720
It can be fantastic as long as those circumstances don't change.

00:40:50.000 --> 00:40:55.840
But that's where the key problem happens in production is those circumstances no longer hold true, so your model starts performing badly.

00:40:55.840 --> 00:40:59.600
Your model is doing the exact same thing it was trained to do, just the inputs are different.

00:40:59.600 --> 00:41:13.540
I see. So one of the things that might be important is to keep track of the input range and document that in the early days and then keep checking in production that it still holds true.

00:41:13.780 --> 00:41:14.180
Exactly.

00:41:14.180 --> 00:41:15.340
Exactly.

00:41:15.340 --> 00:41:19.000
And that's where we can start testing for bias and things like that as well.

00:41:19.000 --> 00:41:28.420
Seeing with one specific variable, such as we mentioned, gender variable, if that starts being a key influencer for a decision, we also know something is up there.

00:41:28.420 --> 00:41:30.400
So you can start doing proactive bias monitoring.

00:41:30.400 --> 00:41:39.180
Interesting. So maybe you don't actually want to take gender into account, but it's somewhere in the algorithm or somewhere in the data.

00:41:39.640 --> 00:41:44.400
And you can test whether or not it actually seems to be influencing.

00:41:44.400 --> 00:41:46.640
Like you can detect bias in a sense.
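
NOTE A minimal sketch of one way to test whether a feature is influencing decisions: change only that feature and count how often the decision flips. Both models, the records, and the name flip_rate are invented for illustration; real bias audits use richer methods than this.
```python
def flip_rate(model, records, feature, values):
    """Share of records whose decision changes when only `feature` changes."""
    changed = 0
    for record in records:
        outcomes = {model({**record, feature: v}) for v in values}
        if len(outcomes) > 1:
            changed += 1
    return changed / len(records)
def fair_model(r):
    """Ignores gender entirely."""
    return r["income"] >= 50_000
def leaky_model(r):
    """Applies a different income threshold by gender (deliberately biased)."""
    threshold = 50_000 if r["gender"] == "m" else 60_000
    return r["income"] >= threshold
records = [
    {"income": 40_000, "gender": "f"},
    {"income": 55_000, "gender": "f"},
    {"income": 55_000, "gender": "m"},
    {"income": 90_000, "gender": "m"},
]
fair_rate = flip_rate(fair_model, records, "gender", ["f", "m"])
leaky_rate = flip_rate(leaky_model, records, "gender", ["f", "m"])
print(fair_rate, leaky_rate)
```
A nonzero flip rate on a feature that should not matter is a signal to investigate.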

00:41:46.640 --> 00:41:48.380
Exactly. You choose whatever feature.

00:41:48.380 --> 00:41:52.160
Like if you're building an algorithm, you should try and never have gender or something like that in there.

00:41:52.160 --> 00:41:56.760
But occasionally, such as you think of cancer screening, you're going to have to include that because that's a key component.

00:41:57.040 --> 00:42:00.200
I would argue you should never include gender if you're doing a credit card application.

00:42:00.200 --> 00:42:05.800
But if you're doing something more fundamental, like cancer screening or something, you still need to be able to have those sorts of things.

00:42:05.800 --> 00:42:13.280
Or like if it's an image recognition algorithm, you're going to have to include race just because different skin tones may affect the results of the algorithm.

00:42:13.280 --> 00:42:16.060
Doesn't mean we're trying to be biased in any way.

00:42:16.200 --> 00:42:18.680
But that's why you want to have these controls to make sure bias doesn't occur.

00:42:18.680 --> 00:42:23.540
We've had a lot of radiology implementations that won't work as well on certain individuals.

00:42:23.540 --> 00:42:27.120
So like there's all these things that machine learning can improve everybody's lives.

00:42:27.120 --> 00:42:32.140
We just need to have the right safeguards in place because almost every single company has deployed machine learning.

00:42:32.140 --> 00:42:33.680
None of them are trying to be discriminatory.

00:42:33.680 --> 00:42:35.000
That's not the purpose.

00:42:35.000 --> 00:42:37.620
It's just that there are these things that will happen that they're not aware of.

00:42:37.620 --> 00:42:43.580
So it's just making sure you have a controlled environment to make sure that doesn't happen or being able to catch it when it does happen so you can fix it.

00:42:43.580 --> 00:42:44.680
Yeah, that's pretty awesome.

00:42:44.680 --> 00:42:56.740
What else should people be thinking about that they either need to be doing, logging, trying to use libraries to help with, especially in production with their models?

00:42:56.740 --> 00:43:01.620
Definitely like the structure of how you do your models is very important.

00:43:01.820 --> 00:43:11.480
I'm a huge advocate of using microservices and Docker containers and trying, especially for these complicated deployments, to split things into as many services as possible.

00:43:11.480 --> 00:43:17.300
So sometimes you want to have your algorithm strictly in one container, and then it will interact with your logic in a different container.

00:43:17.300 --> 00:43:23.940
Because having everything combined into one area is when you can start having that technical debt and things build up.

00:43:23.940 --> 00:43:26.640
And it's very hard to figure out what's broken and where is it broken.

00:43:26.980 --> 00:43:32.940
So being able to keep things as separate as possible in complex deployments really helps to figure out the root cause.

00:43:32.940 --> 00:43:33.880
Yeah, that's interesting.

00:43:33.880 --> 00:43:40.560
Because if you have it kind of mixed in to your app, you probably deployed 10 versions of your app.

00:43:40.560 --> 00:43:50.180
Did any of them actually affect the model or was that just like to change some other aspect of an API or some aspect of the website or something like that that is just,

00:43:50.180 --> 00:43:51.680
well, we changed how users log in.

00:43:51.680 --> 00:43:53.120
They can now log in with Google.

00:43:53.260 --> 00:43:56.780
But that didn't affect the machine learning model that we're deploying.

00:43:56.780 --> 00:44:01.700
But if you have it as a separate API endpoint that's its own thing, then you know.

00:44:01.700 --> 00:44:02.500
Exactly.

00:44:02.500 --> 00:44:08.140
It's the same thing as in science: how do you get rid of the confounding variables and do as much of a controlled test as possible.

00:44:08.140 --> 00:44:12.900
So the more you can have those things that you know what's changing, you'll know what to fix when something breaks.

00:44:13.300 --> 00:44:19.140
So having those sorts of architectures and really coming in with document, document, document and auditing because I'm an ex-auditor.

00:44:19.140 --> 00:44:21.280
If it wasn't documented, it doesn't exist.

00:44:21.280 --> 00:44:24.100
Not really true, but that's how auditors look at things.

00:44:24.100 --> 00:44:29.960
But that's extremely important to do when you're working in regulated industries or you're working in areas when you're building these models.

00:44:30.140 --> 00:44:31.660
You need to document all your assumptions.

00:44:31.660 --> 00:44:32.620
Where's your data coming from?

00:44:32.620 --> 00:44:36.820
All of this, the plain businessy things that, frankly, data scientists hate to do.

00:44:36.820 --> 00:44:38.500
I hate to do it as well.

00:44:38.500 --> 00:44:47.020
But you need to have that kind of thing for being able to show other people, different stakeholders, and also, frankly, even cover yourself if something comes back later.

00:44:47.020 --> 00:44:51.040
Like, here's why we did something, and helping you remember if you look at this code six months from now.

00:44:51.040 --> 00:44:57.320
So just the documentation, having this sort of planned way that you're building an algorithm instead of just agileing.

00:44:57.320 --> 00:45:03.240
Agile is great, but you have to have an overarching plan for some of these things instead of just MVP it until it hits production.

00:45:03.240 --> 00:45:03.860
Sure.

00:45:03.860 --> 00:45:09.680
And while it also sounds like a little bit of the guidance is also just good computer science.

00:45:09.680 --> 00:45:10.560
Right, exactly.

00:45:10.560 --> 00:45:17.400
It's a very big problem in this space is data scientists are not normally good software engineers.

00:45:17.400 --> 00:45:21.900
So a lot of the great software engineering practices haven't really translated into machine learning code.

00:45:21.900 --> 00:45:25.440
There's becoming a big trend towards that with machine learning engineers and things.

00:45:25.440 --> 00:45:27.680
So we're definitely trending in the right direction.

00:45:27.680 --> 00:45:33.720
But still, there's a lot of models that get deployed that don't have the engineering rigor that is common in the Python community.

00:45:33.720 --> 00:45:39.020
Like, there's certain ways we do things with CI/CD, with unit tests and stuff, and a lot of machine learning code doesn't have those things.

00:45:39.020 --> 00:45:43.100
Sometimes you can't really, in a non-deterministic outcome, it's hard to unit test.

00:45:43.100 --> 00:45:49.120
But there's a lot of the processes of good engineering that we can apply to help make everybody's lives easier and better AI deployments.

00:45:49.120 --> 00:45:49.460
Sure.

00:45:49.460 --> 00:45:51.540
I was thinking about that as well in testing.

00:45:51.540 --> 00:45:55.580
And it just seems like that's one of the areas that is tricky to test.

00:45:55.580 --> 00:46:05.440
Because it's not like, well, if I set the user to none, and then I ask to access this page, I want to make sure it does a redirect over to the login page.

00:46:05.500 --> 00:46:09.100
If I set the user to be logged in, and I ask the same thing, I want them to go to their account page.

00:46:09.100 --> 00:46:11.780
And so it's super easy to write that test.

00:46:11.780 --> 00:46:14.060
User's none, call the test, check the outcome.

00:46:14.060 --> 00:46:15.940
User's this, call the test.

00:46:15.940 --> 00:46:18.900
What about testing these ML models?

00:46:18.900 --> 00:46:21.720
It's definitely a challenge, and it's a developing art.

00:46:21.720 --> 00:46:25.840
And it's more of an art than a science at the moment.

00:46:26.360 --> 00:46:35.080
But one of the ways you could do it is kind of like how we talk about ranges, is you'll say, like, to unit test a machine learning algorithm, like, hey, we're expecting a range between this and this, and we're acceptable.

00:46:35.080 --> 00:46:36.220
These are acceptable ranges.

00:46:36.220 --> 00:46:38.260
And you just use it there.

00:46:38.260 --> 00:46:42.060
You won't know 100% that it's working, because you would have to look at it over time to see if there's drift.

00:46:42.060 --> 00:46:53.220
But you'll be able to say, like, hey, if it's a regression problem, meaning I have an output between 1 and 10, and it's supposed to be a 5, if I'm hitting an 8 or a 2, that means something's probably not quite right.

00:46:53.220 --> 00:46:55.460
You can put some sort of ranges for your unit test.

00:46:55.520 --> 00:47:02.780
So it can't be quite the deterministic, like, I know exactly this test is failing or not, but at least can give you some assurance and some heads up for your curve.
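
NOTE A minimal sketch of a range-based unit test as described here, using the standard unittest module. The stand-in model predict_risk, its formula, and the chosen bounds are invented; the 1 to 10 scale with an expected value near 5 follows the example in the conversation.
```python
import unittest
def predict_risk(features):
    """Stand-in model: returns a score on a 1-10 scale (invented formula)."""
    utilization = min(max(features["utilization"], 0.0), 1.0)
    return 1 + 9 * utilization
class RiskRangeTest(unittest.TestCase):
    def test_typical_customer_scores_near_five(self):
        # Assert an acceptable range around 5 rather than an exact value.
        score = predict_risk({"utilization": 0.45})
        self.assertGreater(score, 3.0)
        self.assertLess(score, 7.0)
    def test_output_stays_on_scale(self):
        # Even odd inputs must produce a score on the 1-10 scale.
        for u in (-1.0, 0.0, 0.5, 1.0, 2.0):
            score = predict_risk({"utilization": u})
            self.assertTrue(1.0 <= score <= 10.0)
# Run the suite in-process so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RiskRangeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```
A test like this will not catch slow drift, but it does catch a model that starts returning an 8 or a 2 where a 5 is expected.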

00:47:02.780 --> 00:47:03.360
Sure.

00:47:03.360 --> 00:47:08.520
So maybe it feels a lot like testing scientific things as well, right?

00:47:08.520 --> 00:47:13.340
You can't say, if I call the, you know, estimate, whatever, right?

00:47:13.340 --> 00:47:20.480
And I give it these inputs, all floating point numbers, I can't say it's going to be equal to 2, or it's going to be false.

00:47:20.480 --> 00:47:22.860
It's probably some number that comes out.

00:47:22.960 --> 00:47:31.180
And you're willing to allow a slight variation, because, like, the algorithm might evolve, and it might actually be more accurate if it's a little, you know, if it's slightly better.

00:47:31.180 --> 00:47:35.700
But it's got to basically be within a hundredth of this other number, right?

00:47:35.700 --> 00:47:37.280
So you've got to do, like, range.

00:47:37.940 --> 00:47:40.740
Like, it's in this range, and it's not too far out.

00:47:40.740 --> 00:47:42.400
I guess it's probably similar.
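
NOTE A minimal sketch of the "within a hundredth of this other number" comparison, using math.isclose from the standard library. The estimate function and the reference value are placeholders for a real numeric routine.
```python
import math
def estimate(x, y):
    """Placeholder numeric routine: length of the hypotenuse."""
    return math.sqrt(x * x + y * y)
reference = 5.0
result = estimate(3.0, 4.0001)
# Exact equality fails for floating point results like this one...
exactly_equal = result == reference
# ...so compare with an absolute tolerance of one hundredth instead.
close_enough = math.isclose(result, reference, abs_tol=0.01)
# A result that is too far out is still rejected.
too_far = math.isclose(estimate(3.0, 4.1), reference, abs_tol=0.01)
print(exactly_equal, close_enough, too_far)
```
The tolerance is a design decision: loose enough to allow a slightly better algorithm, tight enough to catch a real change in behavior.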

00:47:42.400 --> 00:47:57.280
What do you think about, like, Hypothesis or other sort of, like, automatic property-based testing where you could say, give me some integers here in this range and give me one of these values and sort of give me a whole bunch of examples and let's test those.

00:47:57.480 --> 00:48:02.000
That's definitely a very good, sensitivity analysis is a very good way to do that.

00:48:02.000 --> 00:48:04.320
So, like, ideally, you'd want to do that kind of testing.

00:48:04.320 --> 00:48:05.240
So it's better.

00:48:05.240 --> 00:48:06.580
You can't really unit test as well.

00:48:06.580 --> 00:48:07.460
Like, we talked about that.

00:48:07.520 --> 00:48:17.620
But sensitivity analysis testing is fantastic to do, which is, like, here are the different scenarios, here are the different users we would get, and run a bunch of them with slight variations through your model and see if it performs as it should.

00:48:17.620 --> 00:48:22.040
So that is definitely a very good way to test a model, and you should never deploy a model without doing that.
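
NOTE A minimal sketch of sensitivity analysis as described here: run many slightly varied scenarios through the model and check that a property holds. It uses plain loops rather than a property-based testing library; the approve rule, the grid of values, and the property itself are illustrative assumptions.
```python
from itertools import product
def approve(income, debt):
    """Stand-in model with an invented rule."""
    return debt / income < 0.4
# Grid of scenarios: different incomes combined with different debts.
incomes = range(30_000, 200_001, 10_000)
debts = range(0, 100_001, 10_000)
violations = []
for income, debt in product(incomes, debts):
    # Property: a slightly higher income should never turn an
    # approval into a rejection when nothing else changes.
    if approve(income, debt) and not approve(income + 1_000, debt):
        violations.append((income, debt))
scenario_count = len(incomes) * len(debts)
print(f"checked {scenario_count} scenarios, {len(violations)} violations")
```
Any entry in violations is a concrete scenario to investigate before the model is deployed.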

00:48:22.040 --> 00:48:29.040
The other test you can kind of do is you can't really unit test machine learning code too accurately, but you can unit test data.

00:48:29.040 --> 00:48:33.680
So there's a couple good libraries out there that will help you unit test data right now.

00:48:33.680 --> 00:48:39.100
I think Marbles might be one in Python, and I think there's Great Expectations, where you check, like, is the schema right?

00:48:39.100 --> 00:48:45.560
So that's really huge because, like, this is supposed to be an int, or is this supposed to be a float, or, like, a float between these numbers and things.

00:48:45.560 --> 00:48:47.540
So you can really check the input data too.

00:48:47.540 --> 00:48:52.560
So there are some tests you can do a little more accurately around your data in addition to the sensitivity analysis.

00:48:52.560 --> 00:48:53.040
Yeah.

00:48:53.040 --> 00:48:54.080
Great expectations.

00:48:54.080 --> 00:48:54.600
It's cool.

00:48:54.600 --> 00:49:03.500
It's got things like expect column values to be not null, or expect column values to be unique, or expect them to be, you know,

00:49:03.520 --> 00:49:04.960
between such and such values.

00:49:04.960 --> 00:49:05.740
That's pretty cool.
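
NOTE A minimal sketch of unit testing data in plain Python. These helpers only mirror the kinds of checks mentioned here (not null, unique, between values); they are hypothetical functions written for illustration and are not the Great Expectations API.
```python
def expect_not_null(rows, column):
    """Every row has a non-missing value in the column."""
    return all(row.get(column) is not None for row in rows)
def expect_unique(rows, column):
    """No value appears twice in the column."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))
def expect_between(rows, column, low, high):
    """Every value is numeric and inside the closed range [low, high]."""
    return all(
        isinstance(row.get(column), (int, float)) and low <= row[column] <= high
        for row in rows
    )
rows = [
    {"id": 1, "age": 34, "income": 52_000.0},
    {"id": 2, "age": 51, "income": 88_500.0},
    {"id": 3, "age": None, "income": 61_000.0},  # missing age on purpose
]
checks = {
    "id is unique": expect_unique(rows, "id"),
    "age is not null": expect_not_null(rows, "age"),
    "income in range": expect_between(rows, "income", 0, 1_000_000),
}
for name, passed in checks.items():
    print(f"{name}: {'ok' if passed else 'FAILED'}")
```
Running checks like these on every incoming batch catches schema and range problems before they reach the model.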

00:49:05.740 --> 00:49:06.080
Yeah.

00:49:06.080 --> 00:49:07.320
So it's a very cool library.

00:49:07.320 --> 00:49:08.800
Very good to do that kind of testing.

00:49:08.800 --> 00:49:09.140
Yeah.

00:49:09.140 --> 00:49:15.260
I suppose if you have some sort of, you talked about unexpected situations as, like, it's, you know,

00:49:15.260 --> 00:49:22.560
my analogy was it's built for dry, sunny roads, not snowy roads, and automatic car driving or automated driving.

00:49:22.560 --> 00:49:30.680
But similarly, it's probably not built to take a null or none in a spot where it expected a number, right?

00:49:30.680 --> 00:49:31.920
What's it going to do with that, right?

00:49:32.040 --> 00:49:37.000
And that's, it's great to do the looking for the, using a Taleb example, black swans.

00:49:37.000 --> 00:49:40.160
Like, it's good to test your model for when something bad is going to happen.

00:49:40.160 --> 00:49:43.960
Like, even just to put the correct try/except-type statements around it.

00:49:43.960 --> 00:49:47.340
For like, here is something crazy that should never happen, but let's see what happens anyway.

00:49:47.340 --> 00:49:48.480
So we can handle that.

00:49:48.600 --> 00:49:53.000
So we don't become the next news story with this company really screwed up type scenario.

00:49:53.000 --> 00:49:58.280
So it's good to do those types of tests that are just like, we hopefully never have this, but what happens if?

00:49:58.280 --> 00:50:02.500
So we can build the type of exceptions and logic around those sorts of crazy scenarios.
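
NOTE A minimal sketch of guarding a model call against inputs that should never happen, such as None where a number is expected. The predict function, its formula, and the error messages are invented for illustration.
```python
import math
def predict(income):
    """Stand-in model: score capped at 10 (invented formula)."""
    return min(10.0, income / 20_000)
def safe_predict(raw_income):
    """Return (score, error); never let bad input crash the service."""
    try:
        income = float(raw_income)
    except (TypeError, ValueError):
        # None raises TypeError, a non-numeric string raises ValueError.
        return None, f"income is not a number: {raw_income!r}"
    if not math.isfinite(income) or income < 0:
        return None, f"income is out of range: {income!r}"
    return predict(income), None
# Include the "crazy" cases deliberately to see what happens.
for value in (80_000, None, "abc", float("nan"), -5):
    print(repr(value), safe_predict(value))
```
Returning an explicit error instead of a score lets the calling service decide how to handle the case, for example by routing it to manual review.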

00:50:02.500 --> 00:50:03.220
Sounds good.

00:50:03.480 --> 00:50:10.580
Sounds like a lot of things that people can go figure out, you know, find some of these libraries, try to add some of these techniques.

00:50:10.580 --> 00:50:13.120
I mean, you guys over at Monitaur are doing this.

00:50:13.120 --> 00:50:15.040
You're not quite out of Techstars yet.

00:50:15.040 --> 00:50:16.740
You know, people can check you out.

00:50:16.740 --> 00:50:23.240
What else would you recommend that folks maybe look into to make this practical in their environments?

00:50:23.400 --> 00:50:27.120
So definitely those two. Great Expectations is great for, like, unit testing your data.

00:50:27.120 --> 00:50:30.260
Alibi is a library that has an Anchors implementation.

00:50:30.260 --> 00:50:32.300
I was talking about Anchors, an explainability library.

00:50:32.300 --> 00:50:35.820
It basically gives you if statements on why a transaction was done.

00:50:35.820 --> 00:50:39.340
It's kind of a 2.0 from LIME, which is a very popular explanation library.

00:50:39.340 --> 00:50:40.360
Those are some good ones.

00:50:40.360 --> 00:50:42.480
SHAP is a great library as well.

00:50:42.480 --> 00:50:47.720
It's game theory based, giving you values for how your algorithm is interpreting a decision.

00:50:47.720 --> 00:50:48.280
That's cool.

00:50:48.280 --> 00:50:48.600
Yeah.

00:50:48.600 --> 00:50:51.140
So it's a very, very cool, cool library.

00:50:51.140 --> 00:50:53.040
And then it's just really starting to apply.

00:50:53.040 --> 00:50:56.480
So that's all of the like explainability type scenarios.

00:50:56.480 --> 00:51:00.340
Pymetrics has a very cool library that does some bias detection.

00:51:00.340 --> 00:51:02.720
That's something that's really cool to check out.

00:51:02.720 --> 00:51:06.680
Aequitas is a bias and fairness audit toolkit that is really cool.

00:51:06.680 --> 00:51:07.980
And that's out of the University of Chicago.

00:51:07.980 --> 00:51:12.400
So those are some good ways to get started on how to provide more assurance around your models.

00:51:12.400 --> 00:51:16.580
And then just really, you want to be doing Docker, you want to be doing best practices on your engineering.

00:51:16.580 --> 00:51:17.040
Yeah.

00:51:17.040 --> 00:51:18.640
So some of the good places to start.

00:51:18.640 --> 00:51:19.340
Yeah, absolutely.

00:51:19.460 --> 00:51:30.320
You know, something that came to mind while you're talking, there's this thing called missingno, missing N-O, which is a Python library for visualizing missing numbers and data.

00:51:30.320 --> 00:51:32.640
And so you can give it interesting.

00:51:32.900 --> 00:51:35.260
I think pandas DataFrames or something like that.

00:51:35.260 --> 00:51:42.200
And it gives you these graphs with like continuous lines if it's everything's good or like little marks where there's like bad data stuff.

00:51:42.200 --> 00:51:49.480
It's a pretty cool library for like trying to just like throw your data set at it and visually quickly identify if something's busted.

00:51:49.480 --> 00:51:50.280
Well, that's very cool.

00:51:50.280 --> 00:51:51.540
I have not heard that one.

00:51:51.540 --> 00:51:52.360
I'm going to go check that out.

00:51:52.360 --> 00:51:53.260
Yeah, absolutely.

00:51:53.500 --> 00:51:54.080
Yeah, absolutely.

00:51:54.080 --> 00:51:55.020
Absolutely.

00:51:55.020 --> 00:51:55.400
Okay.

00:51:55.400 --> 00:51:57.780
One final machine learning question, I guess.

00:51:57.780 --> 00:51:58.760
Keep it positive.

00:51:58.760 --> 00:52:01.140
Like what are the opportunities for this?

00:52:01.140 --> 00:52:09.400
You talked about how many companies are kind of at the rudimentary stage of this modeling and predictability.

00:52:09.920 --> 00:52:14.260
And there's obviously a lot more way to go with the challenges that the whole episode basically has been about.

00:52:14.260 --> 00:52:16.160
But what do you see as the big opportunities?

00:52:16.160 --> 00:52:17.560
You get this.

00:52:17.560 --> 00:52:19.340
You get these key components, right?

00:52:19.340 --> 00:52:25.460
You're going to be able to start having companies providing cheaper insurance, better medical screening.

00:52:25.460 --> 00:52:27.060
I mean, self-driving cars.

00:52:27.060 --> 00:52:30.900
These are all the types of things that are really going to help society and humanity when we get them right.

00:52:30.900 --> 00:52:33.960
There are just a few things, like what we talked about, to get ironed out.

00:52:33.960 --> 00:52:35.940
But really, like the future is very bright.

00:52:35.940 --> 00:52:37.540
Machine learning can do some great things.

00:52:38.000 --> 00:52:40.620
And like together as a community, we'll be able to make that happen.

00:52:40.620 --> 00:52:41.400
Yeah, awesome.

00:52:41.400 --> 00:52:50.460
And, you know, so you talked also at the beginning about your Techstars experience and you guys are going through and launching Monitaur through Techstars.

00:52:50.460 --> 00:52:51.280
And that's awesome.

00:52:51.280 --> 00:52:55.700
But I suspect that a bit of a change has happened.

00:52:55.700 --> 00:53:00.160
Like Techstars, to me, these incubators and accelerators are all about like coming together.

00:53:00.160 --> 00:53:02.740
It's almost like a hackathon, but for a company.

00:53:02.740 --> 00:53:07.900
You come together with all these people and it's just like this intense period where you're all together and working with mentors.

00:53:07.900 --> 00:53:09.740
And co-founders and creating.

00:53:09.740 --> 00:53:12.600
And we've all been told to not do that.

00:53:12.600 --> 00:53:14.120
So how's that work?

00:53:14.120 --> 00:53:18.140
Well, Techstars did a very good job of switching to online.

00:53:18.140 --> 00:53:19.600
So still all the same things happen.

00:53:19.600 --> 00:53:23.900
They even started setting up some like water cooler sessions and stuff for people to kind of just chat informally.

00:53:24.620 --> 00:53:28.860
And so they moved to online to Zoom very, very early on and did a very smooth transition.

00:53:28.860 --> 00:53:29.980
So that was very good.

00:53:29.980 --> 00:53:31.880
They did a great job of handling it.

00:53:31.880 --> 00:53:37.120
We're having a virtual demo day in two weeks, end of April, which is not ideal.

00:53:37.120 --> 00:53:40.880
But we're going to also have a live one in September as well, I believe.

00:53:40.880 --> 00:53:48.900
So they're really going above and beyond to help with the transition for companies and doing a bunch of extra like extra classes around fundraising during COVID and things like that.

00:53:48.900 --> 00:53:54.860
So Techstars has done a great job transitioning and they had all the technology in place to make it a pretty smooth transition.

00:53:54.860 --> 00:53:59.260
So even though it's not the same, like the quality is still very much there and we're still getting a lot out of it.

00:53:59.260 --> 00:54:00.100
Yeah, that's good.

00:54:00.100 --> 00:54:01.200
I'm glad that it's still going.

00:54:01.200 --> 00:54:03.160
I mean, it's got to be rough.

00:54:03.160 --> 00:54:04.680
How did you get into it?

00:54:04.680 --> 00:54:11.440
You talked about, I don't remember the exact numbers, but it sounded like a 100-to-1 applicant-to-acceptance ratio or something like that.

00:54:11.440 --> 00:54:19.680
Like if people out there are listening, they're like, I would love to go through something like Techstars, maybe ideally in a year when it's back to its in-person one.

00:54:19.680 --> 00:54:21.100
But even not, right?

00:54:21.100 --> 00:54:22.200
I'm sure it'd be super helpful.

00:54:22.200 --> 00:54:23.700
What was that experience like?

00:54:23.700 --> 00:54:27.000
How did you decide to get going and how'd you get in there?

00:54:27.000 --> 00:54:34.520
So we did some networking around from the Boston area, kind of like meeting some of the founders in the area, different VC firms, kind of like early socializing.

00:54:34.880 --> 00:54:36.160
Like who we are, what we're about.

00:54:36.160 --> 00:54:37.780
So we kind of got our name around.

00:54:37.780 --> 00:54:46.280
So starting like the socialization with the, there's Techstars in different cities, kind of like getting kind of the community of founders and startups, kind of get yourself known.

00:54:46.280 --> 00:54:48.060
And then it's the application process.

00:54:48.060 --> 00:54:49.840
You just need to really rock that.

00:54:49.840 --> 00:54:53.960
And then just being part of the community and doing a good application and have a compelling story.

00:54:54.520 --> 00:55:02.600
But definitely the pre-networking before the application to kind of like start showing who you are, do as many meetups as you can, like get your name out there.

00:55:02.600 --> 00:55:06.260
So if someone has heard of you when they, when the application comes across, it's definitely helpful.

00:55:06.260 --> 00:55:08.480
Yeah, I'm sure they, oh, I've heard of them.

00:55:08.480 --> 00:55:09.800
They actually were doing something pretty cool.

00:55:09.800 --> 00:55:16.580
I talked to Andrew or something like that kind of stuff goes really far, farther than it seems like it should necessarily just like on the surface.

00:55:16.580 --> 00:55:17.500
But yeah, absolutely.

00:55:17.500 --> 00:55:22.440
There are definitely companies that got into this year's Techstars program that had, that were completely cold.

00:55:22.440 --> 00:55:25.120
So it's not like, it's just based on the merit.

00:55:25.120 --> 00:55:28.920
We happen to be in the area and had the ability to network with people before.

00:55:28.920 --> 00:55:34.320
But honestly, like if you're in Techstars or not in Techstars, you should be doing what we were doing anyway, socializing your company.

00:55:34.320 --> 00:55:38.580
So like they are very, very sure they're a fair company.

00:55:38.580 --> 00:55:40.980
Several of our companies are female only led.

00:55:40.980 --> 00:55:46.880
So it's like they're doing a great job of being a fair and inclusive place that's trying to be as non-biased as possible with acceptance.

00:55:47.140 --> 00:55:48.040
Yeah, that's excellent.

00:55:48.040 --> 00:55:57.140
I suspect that machine learning in particular and data science type of companies in general are probably pretty hot commodities at the moment.

00:55:57.140 --> 00:56:04.520
And you have an above average chance of getting into these type of things, just looking in from the outside.

00:56:04.520 --> 00:56:09.080
So even a lot of listeners who particularly care about this episode, maybe they've got a good chance.

00:56:09.080 --> 00:56:09.680
Definitely.

00:56:09.680 --> 00:56:10.540
And let me know.

00:56:10.540 --> 00:56:11.740
I'd love to hear your idea as well.

00:56:11.740 --> 00:56:12.140
Awesome.

00:56:12.140 --> 00:56:12.820
Feel free to reach out.

00:56:12.820 --> 00:56:14.000
I'm still connected in the space.

00:56:14.000 --> 00:56:15.260
So let me know.

00:56:15.260 --> 00:56:19.300
But definitely innovative AI ML companies are definitely big right now.

00:56:19.300 --> 00:56:24.800
It's just making sure you have something fully baked and are solving a real business problem and not just a tech problem.

00:56:24.800 --> 00:56:25.080
Yeah.

00:56:25.080 --> 00:56:25.780
Very cool.

00:56:25.780 --> 00:56:26.560
All right.

00:56:26.560 --> 00:56:29.480
Well, I think we're just about out of time there, Andrew.

00:56:29.480 --> 00:56:31.640
So I'll hit you with the final two questions.

00:56:31.640 --> 00:56:32.320
Sounds good.

00:56:32.320 --> 00:56:32.680
Yeah.

00:56:32.680 --> 00:56:35.900
If you're going to write some Python code these days, what editor do you use?

00:56:35.900 --> 00:56:36.540
Ah, yes.

00:56:36.540 --> 00:56:37.520
The famous question.

00:56:37.520 --> 00:56:40.540
When I'm writing real Python code, I'm using Visual Studio.

00:56:40.540 --> 00:56:44.140
When I'm doing exploratory data analysis stuff, I can't beat Jupyter.

00:56:44.140 --> 00:56:44.480
Yeah.

00:56:44.480 --> 00:56:45.860
Those are my two go-tos.

00:56:45.860 --> 00:56:50.500
I used to be on Atom with Vim for my real coding stuff.

00:56:50.500 --> 00:56:54.900
But the newer versions of Visual Studio are just so good, I can't pass it up anymore.

00:56:54.900 --> 00:56:59.780
Even though I liked being a rebel and doing completely open source, like Vim type stuff.

00:56:59.780 --> 00:57:00.540
Yeah.

00:57:00.540 --> 00:57:06.160
Well, and then Atom and Visual Studio Code have such similar origins, right?

00:57:06.200 --> 00:57:07.220
They're both Electron.

00:57:07.220 --> 00:57:08.580
Atom came from GitHub.

00:57:08.580 --> 00:57:11.920
Microsoft now overseeing GitHub.

00:57:11.920 --> 00:57:18.680
I think there's a lot of zen between Visual Studio Code and Atom these days.

00:57:18.680 --> 00:57:20.640
So it's not that wild, right?

00:57:20.640 --> 00:57:21.700
It's not that different.

00:57:21.700 --> 00:57:22.220
That's cool.

00:57:22.220 --> 00:57:26.060
And then you already mentioned some really good Python packages and libraries out there.

00:57:26.060 --> 00:57:28.340
But other notable ones maybe worth throwing out there?

00:57:28.340 --> 00:57:28.760
Yeah.

00:57:28.760 --> 00:57:29.640
NetworkX.

00:57:29.640 --> 00:57:33.580
I really love NetworkX for doing graph theory and things like that.

00:57:33.580 --> 00:57:39.820
And it's a very good connection with like Pandas and the Python dictionaries to be able to have different nodes and edges and things.

00:57:39.820 --> 00:57:46.820
So if you're doing any type of network-based work, which I've done at some previous companies and PhD program, it's a very good library to have.

00:57:46.820 --> 00:57:49.780
My favorite machine learning library has to be scikit-learn.

00:57:49.780 --> 00:57:51.500
It's just so easy to use, intuitive.

00:57:51.500 --> 00:57:57.080
The documentation, like I've never seen a Python library with such good documentation and so many good examples.

00:57:57.080 --> 00:57:57.420
Yeah.

00:57:57.500 --> 00:58:02.200
And then Alibi, I really like as well, for that Anchors implementation I was talking about.

00:58:02.200 --> 00:58:04.260
So those are probably my three at the moment.

00:58:04.260 --> 00:58:05.320
But they change weekly.

00:58:05.320 --> 00:58:07.680
Yeah.

00:58:07.680 --> 00:58:08.280
Those are great.

00:58:08.280 --> 00:58:08.640
Awesome.

00:58:08.640 --> 00:58:10.280
And then final call to action.

00:58:10.280 --> 00:58:15.820
People want to put their machine learning models into production or they have them and they want them to be better.

00:58:15.820 --> 00:58:17.960
Maybe what can they do in general?

00:58:17.960 --> 00:58:20.180
And then also how do they learn about what you guys are up to?

00:58:20.180 --> 00:58:20.640
Definitely.

00:58:20.640 --> 00:58:22.580
Monitaur.ai.

00:58:22.960 --> 00:58:23.960
Come check us out.

00:58:23.960 --> 00:58:28.820
If you're in a regulated industry and you're wanting to deploy a machine learning model, we offer services to help you do that.

00:58:28.820 --> 00:58:41.420
And also the machine learning record platform that will allow you to have that audit trail and the counterfactuals and all the things we talked about in this episode, so you can get past the auditors and regulators and move those models to production.

00:58:41.420 --> 00:58:41.780
Yeah.

00:58:41.780 --> 00:58:42.220
Cool.

00:58:42.220 --> 00:58:42.820
All right.

00:58:42.820 --> 00:58:48.560
Well, it's been really fun to talk about machine learning and watching how it goes, verifying what it does.

00:58:48.880 --> 00:59:00.500
And, you know, I think these, like you said, these are the things that are going to need to be in place before it can really sort of serve the greater public or the greater companies out there that need to make use of it.

00:59:00.500 --> 00:59:00.680
Right.

00:59:00.680 --> 00:59:02.240
So great topic to cover.

00:59:02.240 --> 00:59:02.580
Thanks.

00:59:02.580 --> 00:59:03.300
Thank you so much.

00:59:03.300 --> 00:59:04.300
Been great talking to you.

00:59:04.300 --> 00:59:04.920
Yeah, you bet.

00:59:04.920 --> 00:59:05.240
Bye bye.

00:59:05.240 --> 00:59:05.420
Bye.

00:59:05.420 --> 00:59:09.260
This has been another episode of Talk Python to Me.

00:59:09.260 --> 00:59:16.320
Our guest on this episode was Andrew Clark, and it's been brought to you by Linode and Reuven Lerner's weekly Python exercise.

00:59:17.020 --> 00:59:21.080
Start your next Python project on Linode's state-of-the-art cloud service.

00:59:21.080 --> 00:59:25.400
Just visit talkpython.fm/Linode, L-I-N-O-D-E.

00:59:25.400 --> 00:59:28.700
You'll automatically get a $20 credit when you create a new account.

00:59:28.700 --> 00:59:35.600
Learn Python using deliberate practice every week with Reuven Lerner's weekly Python exercise.

00:59:35.600 --> 00:59:39.000
Just visit talkpython.fm/exercise.

00:59:39.000 --> 00:59:41.080
Want to level up your Python?

00:59:41.420 --> 00:59:45.920
If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

00:59:45.920 --> 00:59:54.080
Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.

00:59:54.080 --> 00:59:58.740
And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

00:59:58.740 --> 01:00:00.620
It's like a subscription that never expires.

01:00:00.620 --> 01:00:02.780
Be sure to subscribe to the show.

01:00:02.780 --> 01:00:05.180
Open your favorite podcatcher and search for Python.

01:00:05.180 --> 01:00:06.400
We should be right at the top.

01:00:06.400 --> 01:00:15.380
You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:00:15.940 --> 01:00:17.480
This is your host, Michael Kennedy.

01:00:17.480 --> 01:00:18.980
Thanks so much for listening.

01:00:18.980 --> 01:00:20.020
I really appreciate it.

01:00:20.020 --> 01:00:21.760
Now get out there and write some Python code.
