#261: Monitoring and auditing machine learning Transcript
00:00 Traditionally, when we've depended upon software to make a decision with real-world implications,
00:04 that software was deterministic.
00:06 It had some inputs, a few if statements, and we could point to the exact line of code where
00:10 the decision was made, and the same inputs would lead to the same decisions.
00:14 Nowadays, with the rise of machine learning and neural networks, this is much more blurry.
00:19 How did the model decide?
00:20 Have the model's inputs drifted so far that its decisions fall outside what it was designed for?
00:24 These are just some of the questions discussed with our guest, Andrew Clark, on this,
00:29 episode 261 of Talk Python to Me, recorded April 17th, 2020.
00:34 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the
00:52 ecosystem, and the personalities.
00:53 This is your host, Michael Kennedy.
00:55 Follow me on Twitter, where I'm @mkennedy.
00:57 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter
01:02 via @talkpython.
01:03 This episode is sponsored by Linode and Reuven Lerner.
01:07 Please check out what they're both offering during their segments.
01:09 It really helps support the show.
01:11 Andrew Clark, welcome to Talk Python to Me.
01:13 Hi, Michael.
01:14 Glad to be here.
01:15 It's always a pleasure listening to your podcast.
01:17 I'm really excited to get to talk to you today.
01:19 Well, thank you so much.
01:20 And I'm excited to talk to you as well.
01:22 It's going to be really fun to talk about machine learning.
01:24 And I think of all of the programming type of stuff out there, more than most, machine
01:31 learning is truly a black box for a lot of people.
01:35 The decisions it makes are sort of unclear.
01:40 It's sort of statistical.
01:42 To me, it feels a little bit like quantum mechanics over Newtonian physics or whatever.
01:47 It's like, well, I know there's rules that guide it, but it's kind of just it does its
01:53 thing and you can't really know for sure.
01:55 But at the same time, the world is beginning to rely more and more on machine learning, right?
01:59 Yes, definitely.
02:01 And there's definitely different levels of the complexity of it on how much is deterministic
02:05 and how much of it really is quantum mechanics.
02:07 Nice.
02:09 All right.
02:10 Now, before we get into that, though, of course, let's start with your story.
02:12 How did you get into programming and into Python?
02:14 Yeah, definitely.
02:15 In my undergrad, I was actually an accounting major.
02:17 And then about halfway through, I realized accounting is a great career, but it's really boring for
02:22 me.
02:23 So I started trying to just get into programming and stuff.
02:26 Big data was like the craze when I was in college.
02:27 Everybody was talking about big data and Hadoop and all that kind of thing.
02:31 So I really started wanting to learn more about it and worked through Codecademy and Learn
02:35 Python the Hard Way and some of those other books that are around.
02:38 I tried a bunch of different things to see what worked for me, and Learn Python the Hard Way
02:42 is what really seemed to stick.
02:43 And then really started just trying to find little side projects of little things to do
02:47 to actually build up the programming skills.
02:49 So I tried doing the top down, like here's the theory, more computer science-y approach.
02:53 Didn't really work for me.
02:54 It's more the bottom up of like, I'm trying to solve a problem.
02:56 I want to move this file to this location.
02:58 And I want to like orient this image in my Excel file.
03:01 There's like different things like that to help learning how to program.
03:04 Do you think that programming is often taught in reverse, in the wrong way?
03:08 Right.
03:09 It's like very high level.
03:10 And then we're going to work our way down and down.
03:11 And like, here's the concepts.
03:13 And then here's the theory.
03:14 And instead of just like, here's a couple of problems and a few things you need to know
03:18 and a whole bunch of stuff that we're not even going to tell you exists.
03:21 Right.
03:21 Like you don't even know what a heap is.
03:23 I don't care.
03:23 We're going to just make this work until you like eventually care what like, you know,
03:27 memory allocation is.
03:28 Right.
03:28 At least I found, especially in the academic world, I felt sort of flipped on its head.
03:34 I definitely think so.
03:35 A lot of careers are built that way.
03:37 It's like you need to have some sort of concrete sense of why does this matter?
03:39 Why do I care?
03:40 And like being able to get your hands dirty and then go into the theory.
03:43 So you really want to do both in tandem.
03:45 But really starting with the hands on is really helped me with that approach.
03:48 I know some people can do it the top down theory way, but I definitely think computer science
03:53 could have a lot better programmers faster if they did it from the bottom up.
03:56 Yeah, I agree.
03:57 And computer science or programming in general is nice in that it has that possibility.
04:02 Right.
04:02 Like it's harder.
04:03 You know, we joked about quantum mechanics.
04:05 It's harder to just like, well, let's get your hands dirty with some quantum mechanics.
04:09 And then we'll talk about like wave particle duality, like whatever.
04:13 Like it just doesn't lend itself to that kind of presentation or that kind of like analysis,
04:18 I don't think.
04:20 But programming does, right?
04:21 You could say, we're going to take this little API.
04:23 I'm going to build a graph.
04:24 And how cool is that?
04:25 Now let's figure out what the heck we did.
04:27 Exactly.
04:27 Exactly.
04:28 So that's one of the beauties about programming, especially Python with its ease of starting
04:33 up and just getting something running on your computer.
04:35 You can really start dealing with problems you wouldn't be able to do in some of these fields
04:38 until you had a postdoc.
04:39 You can't even start talking about quantum.
04:41 Exactly.
04:42 So back to your story, you were talking about like, this was the way that you found worked
04:47 for you to get rolling.
04:48 Yeah.
04:49 And it really worked well with the different internships I was doing at the time.
04:52 So I was working in financial audit.
04:54 And during that time was really when auditors were starting to think about using data and
05:00 using more than just samples; normally in auditing, you do random sampling.
05:03 So you'll look at a bunch of transactions from like accounts payable for like, these are bills
05:08 the company paid.
05:09 And being able to find that there's a lot of times there's duplicates.
05:11 Some people are paying invoices twice and things like that.
05:14 So being able to use programming to start solving just audit problems to speed up the process for
05:18 my company and just make my life easier, honestly.
05:20 It was really how I started getting very excited about Python besides just like very small little
05:25 utils on my computer.
05:26 So like seeing what can we really do with this and how can we solve business problems
05:29 with it?
05:30 So I really came out of that pragmatic way.
05:32 That's cool.
05:33 Did you find that your co-workers, other people trying to solve the same problems, were trying
05:38 to solve it with Excel and you just had like this power that they didn't have?
05:41 Definitely.
05:42 Definitely.
05:43 Excel and some like specific analytics tools.
05:46 Like there's a couple out there for auditors, like CaseWare IDEA and ACL and things.
05:51 And they're just not as easy to use.
05:53 They trump Excel, but they have a massive learning curve themselves.
05:56 So it's like being able to do something like Python.
05:58 With both camps of different types of auditors, either the Excel or the ACL auditors, I was able to
06:03 just run circles around them with testing.
06:05 That's pretty cool.
06:07 Well, that's a really interesting and practical way to get into it.
06:10 I've recently realized or come to learn that so many people seem to be coming into Python
06:17 and especially the data science side of Python from economics.
06:20 Yeah, there's a big move these days. People traditionally in economics would be using
06:25 like Matplotlib and, no, sorry, MATLAB.
06:28 Python's on the brain here.
06:29 MATLAB, and moving in to do more quantitative analysis.
06:32 And now really Python has kind of taken over in economics.
06:35 And there's a ton of people that are just coming over to data science from
06:38 doing those types of modeling for econometrics and stuff.
06:41 So there's definitely a big surge of economists turned data scientists these days.
06:45 Yeah, cool.
06:45 What are you doing day to day now?
06:47 Like you started out doing auditing and getting into Python that way.
06:51 Now what?
06:52 Well, it's been a really crazy ride, but I'm now doing a startup called Monitaur that works on
06:57 machine learning assurance.
06:58 Like how do we provide assurance around machine learning models?
07:00 So I'm the co-founder and CTO with them.
07:02 So my day to day right now, we're currently in the midst of Techstars Boston, so it's very much
07:07 just like meeting with different investors, meeting with other startup founders, meeting with
07:12 potential clients, and running different parts of the business.
07:16 So wearing a lot of hats right now.
07:18 And then I run all the R&D-type side and figuring out, like, what are the things we want to build?
07:22 And then I can start working with the engineering team to execute.
07:24 Do you find that you're doing all these things that sort of a tech background left you wholly unprepared
07:30 for?
07:30 There's definitely some of that.
07:32 Some of it, like my accounting degree is definitely coming in handy.
07:35 Yeah.
07:35 A lot of the business side things, but there's definitely a lot of moving pieces.
07:39 And it's more than just strictly like being a good Python programmer.
07:43 So it's a little more than I was expecting on that side.
07:46 Well, I do feel like a lot of people, and this is like my former self included, is like,
07:51 if you build it, they will come.
07:53 All we need is like really good tech and like a compelling project or technology or product.
07:59 And then we'll figure out the rest of the stuff.
08:01 We'll figure out the accounting, the marketing, the user growth, like all that stuff.
08:06 And it's like, no, no, no, no.
08:07 Like the product and the programming is table stakes.
08:11 All this other stuff is what it actually takes to make it work, right?
08:14 Definitely.
08:14 Definitely.
08:15 You need to have innovative tech, but that's only a component of it.
08:19 Like that's like one fourth of the problem.
08:21 And, you know, for I think a lot of myself and I think a lot of first time startup individuals
08:27 as well, you come into that thinking like, hey, tech is going to be king when there's
08:30 actually so many other moving pieces like you're mentioning.
08:33 So it's actually an eye opening experience, but I really respect founders more.
08:36 Yeah, same.
08:38 Well, I do find it really interesting and almost disheartening when you realize like I have
08:44 all this amazing programming skill and I know I can build this.
08:47 If I had two months and I'm really focused, we can build this thing.
08:51 And you realize like, actually, that's just a small part of like having to build it.
08:55 Like this is where it starts.
08:56 This is not like basically it's ready to roll.
08:58 So I don't know.
08:59 It's something I've been thinking about over the last few years.
09:02 And it's not the way I think a lot of people perceive it starting something like this.
09:08 If you have like, we're going to get a cool programming team together.
09:10 We're going to build this thing and it'd be great.
09:12 It's like, okay, well, that gets you started.
09:13 But then what?
09:14 Right?
09:14 Exactly.
09:15 And the other problem you have with a startup is like you can't even do, okay, give
09:19 me two months and we can build it.
09:20 Like with all the other competing business things and stuff, you can't even get that straight
09:24 two months of just building because you have all the investor obligations and things.
09:27 So it's definitely a lot of things to juggle.
09:29 So you get really good at time management and priority juggling.
09:32 But it's exciting, right?
09:34 It's a lot of fun.
09:34 Yeah.
09:35 Oh, it's fantastic.
09:36 And being part of Techstars is really wonderful.
09:38 I wouldn't trade it for anything.
09:39 It's just definitely eye opening and a great learning experience.
09:42 Yeah, I've had some people on the show a while back who had gone through Techstars, but maybe
09:47 folks haven't listened to that show or whatever.
09:49 So the most popular, well-known thing like that is probably Y Combinator, though I
09:57 don't know that it's exactly the same thing.
09:59 But tell folks about what Techstars is a little bit just so they have a sense of what you're
10:04 working on or what you're doing.
10:05 So Techstars is an accelerator.
10:08 There's like accelerators and incubators.
10:09 I didn't really know there was much of a difference, but being part of Techstars, the
10:13 accelerators are really very mentor heavy.
10:15 Like the first couple of weeks of Techstars, you have a ton of different mentors that you
10:18 meet with, like five a day or more, and trying to find who's going to help guide the vision
10:23 of your company.
10:25 How do you get the right people around the table?
10:25 What's your company vision?
10:27 Your mission statement?
10:28 How are you going to do fundraising?
10:29 What's your marketing strategy?
10:30 What's your go-to-market?
10:31 How do you get your first few customers?
10:32 How do you keep those customers?
10:33 It's very focused on the execution and how do you make a successful business that will last?
10:39 So it's a lot of very business-y things.
10:41 For a tech-run startup company thing, it's very, very focused on the non-tech aspects,
10:47 the business aspects, the how do we differentiate ourselves from the crowd and things.
10:51 So there's seminars, mentor meetings, there's just lots of moving pieces with that.
10:55 But it's a very, very good thing.
10:57 And it helps in about three months to take your company from a cool idea and you have a product
11:02 to really being able to execute your vision and have better staying power.
11:06 Yeah, well, it sounds like it's a lot of the stuff to sort of build that structure and support
11:11 of what we were just talking about, of like what you don't get if you know how to write
11:14 code and create a product.
11:15 Right.
11:16 It's exactly right.
11:17 Because like Techstars, 10 companies are accepted out of over 2,000 applicants.
11:21 So it's like you already have a product.
11:22 You already have good tech to make it across the bar.
11:24 They're like, okay, everybody knows what they're doing, how to build tech.
11:27 Now we've got to teach you guys how to actually run a company.
11:30 So it's a very good way of approaching it.
11:33 It's not like you all sit in a room with a bunch of group think and programmers making
11:36 it more geeky.
11:37 They get you thinking like, okay, you might even need to scale back the complexity of this a little
11:40 bit so we can actually market this to the user.
11:42 It's a very, very good program for small companies to go through.
11:48 This portion of Talk Python to Me is brought to you by Linode.
11:50 Whether you're working on a personal project or managing your enterprise's infrastructure,
11:55 Linode has the pricing, support, and scale that you need to take your project to the next
11:59 level.
11:59 With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise
12:05 grade hardware, S3 compatible storage, and the next generation network, Linode delivers
12:11 the performance that you expect at a price that you don't.
12:14 Get started on Linode today with a $20 credit and you get access to native SSD storage,
12:18 a 40 gigabit network, industry leading processors, their revamped
12:23 cloud manager, cloud.linode.com, root access to your server, along with their newest API and
12:29 a Python CLI.
12:30 Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically
12:36 get $20 credit for your next project.
12:38 Oh, and one last thing, they're hiring.
12:40 Go to linode.com/careers to find out more.
12:43 Let them know that we sent you.
12:47 Well, I want to talk a little bit more about that at the end and how it's probably changed
12:51 as the world has changed with COVID-19 and all that.
12:53 But let's talk about machine learning and stuff like that first.
12:57 Yeah, definitely.
12:57 We talked a little bit about machine learning and I joke that like, at least to my very limited
13:04 experience, it feels a little bit quantum mechanics-y in that same sense.
13:09 But let's tell people, I guess, what is machine learning?
13:14 Obviously, we have a bit of a sense, but like, let's try to make it a little bit more concrete
13:18 for people who maybe it's still like a buzzword and what makes up machine learning, I guess.
13:23 Definitely.
13:25 So I'll start by saying there's a lot of terminology that people kind of use interchangeably,
13:29 like AI, machine learning, deep learning.
13:31 So if you think of three big circles, AI is just a broad range of trying to make machines
13:37 to copy human tasks.
13:39 So AI doesn't have to exactly be modeling.
13:42 It can also be rule-based systems and things that try and mimic human intelligence.
13:46 So it's like a broad field of trying to do something that copies human behavior.
13:50 Right.
13:50 Computers making decisions, kind of.
13:53 Right.
13:54 Exactly.
13:54 Trying to make decisions like a human.
13:56 So there's a lot of different components in that.
13:58 And it's been a field that people have been talking about for a long time and you've had
14:02 movies about it for a long time.
14:03 So it's just a broad field.
14:05 There are very many components that are a part of it, because you have some things like expert systems
14:08 and some sort of rule-based engines that actually do a pretty good job of mimicking that when
14:12 there's not actually any modeling.
14:13 So they have a limited use because like self-driving cars, if you think about that, you can't build
14:18 enough rules to specify every time a car should turn, there's too much going on.
14:22 You have to use something more complex than that.
14:24 So AI by itself doesn't mean modeling.
14:25 Machine learning is types of statistical models that learn from data to make decisions.
14:31 So that is just a broad field of modeling that is what most people think about when you
14:36 think about machine learning.
14:37 And it's just the modeling aspect.
14:38 And we can go into the classes of models you have, like supervised, which is basically
14:44 you can have a model look at a bunch of data that has a decision.
14:48 So like if you look at a bunch of pictures of cake and it says it will be cake, and then
14:52 you look at another picture that's a bunch of raw ingredients and it'll say it's not cake.
14:56 You know, you can train an algorithm to look at those two pictures and make the correct
14:59 decision by optimizing a function.
15:01 Right.
15:02 Where you have, you already know the answer and you kind of tell it like a kid.
15:06 That's right.
15:07 No, that's wrong.
15:08 Right.
15:08 Oh, good job.
15:09 That was right.
15:10 You know, and so on.
15:10 Yeah.
15:11 And it takes that feedback sort of iteratively and like evolves itself.
15:14 Exactly.
15:15 And it's extremely, extremely dumb.
15:17 So you have to just have a lot of training data that is balanced, meaning you have
15:21 the same amount of pie/not-pie type scenarios.
15:24 So it knows how to learn from that and have very catered data sets that don't generalize
15:29 very well.
15:29 On the far other extreme side of things, you have completely unsupervised learning, which
15:34 is basically let's throw some data at this algorithm and see what patterns emerge, see
15:38 what groups happen.
15:39 So it's very good to use like for very early on exploratory analysis of like I have a bunch
15:44 of data.
15:45 I have no idea what to do with this.
15:46 Let's see if there's anything interesting here.
15:47 But in between the supervised and unsupervised, you'll have like semi-
15:52 supervised learning and reinforcement learning, which basically what they do is it's a combination
15:58 of those two where you have not quite explicit labels on everything, but it's not completely
16:02 just sending an algorithm out in the unknown to figure out what's happening.
16:05 And that's where you've seen like a lot of AlphaGo and a lot of really cool problems
16:10 have been done with reinforcement learning and semi supervised.
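To make the supervised/unsupervised split described above concrete, here is a minimal sketch using scikit-learn. The library choice, the toy data, and the models are illustrative assumptions, not anything discussed in the episode:

```python
# Minimal sketch: supervised vs. unsupervised learning with scikit-learn.
# The data and model choices are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Labeled data: features X plus a known answer y (think "cake" / "not cake").
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Supervised: the model learns from examples where the decision is known.
clf = RandomForestClassifier(random_state=0).fit(X, y)
print("supervised predictions:", clf.predict(X[:3]))

# Unsupervised: no labels at all -- just "see what groups emerge".
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("unsupervised cluster assignments:", clusters[:3])
```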
16:13 Okay.
16:13 And there's a lot going on in that space as well.
16:15 Yeah.
16:15 Well, I think that's a pretty good overview and unsupervised side.
16:20 That's probably the realm of where people start to think, you know, computers are just learning
16:26 on their own.
16:27 And what if they get too smart and they just, you know, take over or whatever.
16:32 But that's kind of, I would say that's probably the almost the science fiction side of things,
16:38 but it's also real.
16:39 Obviously people are doing it, but that's probably what people think of when they think
16:43 of AI is like there's data, the computer looked at it and then it understood it and learned
16:48 it and found things that humans maybe wouldn't have known about.
16:50 Definitely.
16:51 Some of the continuous learning stuff, which is a subset of that.
16:54 Yeah.
16:54 And then the very minor part, deep learning, is basically a subset of machine learning,
16:58 which is looking at trying to copy how your brain synapses work.
17:01 And it's just a type of machine learning where you think of the convolutional neural networks
17:04 and things like that.
17:05 So there's like the three big tiers and the different parts that fit in between those.
17:08 Yeah.
17:09 And when I was talking about quantum mechanics analogies, I was thinking of deep learning.
17:13 Exactly.
17:14 Exactly.
17:15 And like deep learning is extremely valuable.
17:16 A lot of the really cool research problems you're seeing people develop, OpenAI and all
17:20 of these people are using that for doing reinforcement learning with deep learning and things
17:24 like that is not what a lot of companies are using today in regulated industries or even
17:29 in most of America.
17:30 You know, a lot of companies are using more traditional supervised and just even getting
17:35 past linear regression in a lot of instances.
17:37 The deep learning stuff is what the science fiction people are thinking of, but it's not very widely deployed
17:41 yet because there's a lot of problems with the deployments.
17:44 Yeah.
17:44 And probably even just knowing with certainty that it's right and that it's reproducible.
17:49 Completely.
17:50 Yeah.
17:50 Yeah.
17:51 I think that the, you know, it's such a buzzword machine learning and AI that, right.
17:56 If a lot of people seem to think, well, even if just an if statement is in there, like,
18:01 well, the computer decided, so that's AI.
18:03 It's like, well, that's not really what that term means, but that's okay.
18:07 I guess the computer decided, yes, you're right.
18:10 But that's just code.
18:11 I mean, code's been having if statements for a super long time.
18:14 I was thinking back to something in the British parliament where they were upset that there
18:19 was some airline, which is bad, where the airline was looking at when people would book tickets.
18:25 If the people had the same last name, they would put them in different seats away from
18:30 each other and then charge them a fee to put them back next to each other.
18:33 And they were saying that it's this AI system that is causing this abuse.
18:39 This is not AI.
18:40 This is just an if statement.
18:41 If the names are the same.
18:43 Exactly.
18:43 They're going to split them apart and offer them the upgrade to get back together.
18:46 Exactly.
18:46 Yeah.
18:46 So I think it's interesting to think about how companies can apply machine learning and maybe
18:52 the hesitancy to do so.
18:55 Right.
18:55 Like if if I'm applying for a mortgage and the company is using, you know, some deep learning
19:02 algorithm to decide whether I'm a good fit to provide a mortgage to.
19:07 Right.
19:07 Some states in the United States have like walkaway laws, right?
19:10 You just decide like, I don't want to pay anymore.
19:12 And you can have the house even if it's like in crappy condition and not worth what it was.
19:16 Right.
19:16 So it's a big risk.
19:17 So they want to get that answer right.
19:20 Yeah.
19:21 Exactly.
19:21 But at the same time, I'm sure there's regulations and stuff saying you can't just randomly assign
19:28 or not assign or have some super biased algorithm just making this decision and say, well,
19:34 it's not our fault the computer did this, you know, unjust thing or whatever.
19:38 Right.
19:38 Exactly.
19:39 And that's what a lot of insurance companies, for instance, are struggling with right now
19:41 is there are laws in the US on like fairness and you can't do gender discrimination or those
19:49 sorts of different protected classes discrimination.
19:51 But there's not a very hierarchical full regulation around machine learning right now.
19:57 So a lot of companies are kind of in the dark on like regulators don't really like machine
20:01 learning.
20:01 They don't understand it.
20:03 They don't like the non-deterministic aspects. They like rule sets.
20:07 Right.
20:07 So if this happens, we'll do this.
20:08 They're very comfortable with having those sorts of rules, but having this kind of nebulous
20:12 modeling.
20:13 Exactly.
20:14 It's easy to do a code audit and say, OK, this is the part where they tested like their income
20:22 to credit card debt ratio.
20:25 And it was above the threshold you all have said.
20:28 And so you said no.
20:29 And so it's because they have too much debt.
20:31 And in fact, I can set a break point and I can step down and look at the code, like making
20:37 those decisions and comparing those values.
20:40 And then here's what it does.
20:41 Right.
20:41 And as far as I know, like break points and deep learning don't really mean a lot.
20:46 Yeah.
20:47 Because a lot of times if you use like a support vector machine algorithm, you won't have a
20:52 linear decision boundary, meaning that sometimes if you have income over this amount and this
20:56 other thing, we will give you a loan.
20:57 And other times we won't.
20:58 It depends on what some of the other inputs are.
21:01 So that's where regulators get very nervous because you can't have that strict rule based
21:05 system.
21:05 Yeah.
21:06 But at the same time, this is an amazing technology that could make, you know, make those decisions
21:11 more accurate.
21:12 And it could be better.
21:14 Right.
21:15 It could actually understand like, yeah, OK, so you do have a lot of credit card debt, but
21:18 you're doing this other thing that really shows that you are a good borrower.
21:22 Right.
21:23 Exactly.
21:23 Their code would have completely ignored.
21:25 But the machine learning is like, no, no, that actually those type of people are good.
21:29 You want to loan to them.
21:31 Right.
21:31 You want to lend to them.
21:32 Exactly.
21:33 Because it could be good to have this.
21:35 Right.
21:35 It's not just a negative.
21:36 It's just you just don't know.
21:37 That's a part of the challenge.
21:39 Exactly.
21:39 Because you could be able to do more targeted rates for consumers, so then they can
21:44 have lower rates.
21:45 The insurance company will have lower default rates.
21:48 Basically, everybody wins.
21:49 The problem for executives and regulators is how do you have assurance
21:55 and trust that the system is doing what you want it to do and you're not going to end up
21:58 on the front page soon like the Apple credit card thing a few months ago with, oh, you were
22:02 really discriminatory or you messed this up.
22:05 So the problem is being able to provide that assurance and trust over algorithms to allow
22:10 them to be deployed.
22:11 And that's kind of what the industry is struggling with right now because no one really knows quite
22:15 how to have that with these non-deterministic qualities.
22:17 Like how do you get that assurance around the model doing what it's supposed to be doing?
22:20 So that's where there's a lot of work right now on the different machine learning monitoring
22:22 you can be doing.
22:23 Do you need a third party to monitor those sorts of things so you can provide that trust
22:27 and risk mitigation to be able to deploy the algorithms that are beneficial for both the
22:31 company and the consumer?
22:32 Sure.
22:32 And, you know, we're talking about America because we both live here and whatnot.
22:38 But I think actually Europe probably has even stricter rules around accountability and traceability.
22:44 Definitely.
22:45 That's a guess from my cursory experience of both.
22:47 Definitely.
22:48 And Europe's actually doing a better job of first regulating it right now and also providing
22:52 guidance around the regulations.
22:54 So GDPR, that's the General Data Protection Regulation in Europe, has some regulations
23:00 around like if a consumer's decision has been made fully automated, such as the example you
23:04 just gave, the consumer needs to know and they can request the logic on how that decision
23:09 was made.
23:09 So there's a lot of user consumer protections around machine learning algorithm use in Europe.
23:14 And there's also a lot more guidance.
23:16 The EU is a little behind on that one; they're creating a white paper now on guidance around how would
23:20 you make sure the algorithms are doing what they should be doing to help the companies.
23:24 But Britain, the UK's Information Commissioner's Office, has released two different white paper
23:30 drafts on how you do explainable AI.
23:33 How do you make sure you have the assurance around your algorithms and actually doing some prescriptive
23:37 recommendations on how to do that correctly.
23:39 So like Europe is really ahead of the US right now on first regulating AI and then also helping
23:44 companies and consumers understand it and find ways so they can still use it.
23:48 US hasn't really addressed those yet.
23:50 Yeah.
23:50 Yeah.
23:51 And it was exactly that situation that you were talking about.
23:54 If a computer has made a completely automated decision, the consumer can request why, right?
24:02 Exactly.
24:03 And that was like thinking around this entire conversation, like that is the core thing that I think is at
24:07 the crux of the problem here that made me like interested in this because the deep learning stuff
24:12 and the ability to get a computer to understand the nuances that we don't even know exist out there and make
24:20 better decisions.
24:21 I think that that's, that's really powerful.
24:24 And as a company, you would definitely want to employ that.
24:27 If it's required that you say, well, it's because your credit to debt ratio was too far out of bounds.
24:34 Like, how are you going to take a deep learning sort of thing and go, well, it said this?
24:42 It's like, well, the weights of the nodes were this.
24:45 And so it said, no, that's not satisfying or even meaningful.
24:48 Right.
24:48 Like, I don't even know how, how do we get there?
24:51 Maybe we get, maybe ultimately people would be willing to accept, like, I don't even know.
24:57 Maybe there's something like, okay, it, it's made these decisions and it, we had these false positives and we had these false negatives.
25:03 Like, almost like an AB testing or clinical trial type of thing.
25:07 Like, we're going to let 10% of the people through on our old analysis anyway.
25:12 And then compared to how the machine learning did on both like the positive and negative.
25:17 I don't know.
25:17 But like, how do you see that there's a possible way to get that answer?
25:21 Yeah.
25:22 There's a couple approaches.
25:23 One of the main ones has been a lot of my research the past couple of years, which is you can provide assurance around the implementations because like a lot of the points that you just mentioned, we don't have that ability with a human.
25:32 And do you usually understand why a loan officer gave that loan, either?
25:35 Like, the type of understanding some people are asking for from algorithms doesn't really exist with humans either.
25:40 If you ask a human two weeks later why you made that exact decision, they're not going to say the same thing that they were thinking at that time.
25:45 So you want to provide a trust and an audit trail and then transparency around an algorithm.
25:50 Basically give it a history and show that it's been making reliable decisions and it's operating within the acceptable bounds for the inputs and the outputs.
25:58 So being able to provide this holistic business understanding and process understanding is very huge.
26:02 It's not really as much of a tech problem as it is a business problem and a process problem.
26:07 But also be able to provide the ability of this is what the algorithm saw when it made this decision.
26:13 So for even a deep learning algorithm, say you are taking in FICO score and you're taking an age and you're taking in zip code and things like that to make this decision.
26:21 Some of those are protected classes, but you're taking in information about a consumer to understand.
26:26 You don't need to see like what neural network layer said something.
26:28 It's like, based on these features, because your FICO score is above 600 and because your zip code was in this high income area, we're going to approve this loan or not approve this loan.
26:38 So that's something you can see with some research.
26:41 Anchors is a great library that was open sourced a few years ago.
26:44 There's a Python implementation, and Seldon Alibi is a library that's really, really good and has a production grade implementation.
26:51 It can help you see what did the algorithm see when it made this decision.
26:54 So you can start addressing that inside the whole bigger process of providing business and process understanding.
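As a rough illustration of the kind of explanation being described, here is a sketch using the Anchors approach via Seldon's open-source Alibi library. The feature names, data, and model are made up, and exact API details may differ between Alibi versions, so treat this as a hedged sketch rather than a recipe:

```python
# Hypothetical sketch of anchor explanations with Seldon's Alibi library.
# Feature names, data, and the model are illustrative assumptions; check the
# Alibi documentation for the exact API of your installed version.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from alibi.explainers import AnchorTabular

feature_names = ["fico_score", "age", "zip_income_index", "debt_ratio"]
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0).astype(int)  # toy labels

model = GradientBoostingClassifier().fit(X_train, y_train)

explainer = AnchorTabular(model.predict, feature_names)
explainer.fit(X_train)

# "What did the algorithm see when it made this decision?"
applicant = X_train[0]
explanation = explainer.explain(applicant, threshold=0.95)
print("decision:", model.predict(applicant.reshape(1, -1))[0])
print("anchor (human-readable rule):", " AND ".join(explanation.anchor))
print("precision:", explanation.precision)
```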
27:00 I see.
27:01 So there's all these different facets or features that come in and you can say you can't exactly say, you know, here's how the flow went through the code and here's the if statement.
27:12 But you can say it keyed off of these three things.
27:17 It keyed off the fact that you have three cars leased and you have this credit card or whatever.
27:24 Right.
27:24 Like it said, those are the things that like sort of helped it decide.
27:28 Made a decision.
27:29 Exactly.
27:29 Exactly.
27:30 So when you put that in tandem with understanding the whole process, providing the ability to go back and verify, you can start getting more assurance around the implementation and start getting comfortable with it.
27:39 Because that's the other problem.
27:40 A lot of times with these algorithms is like it makes a decision.
27:43 And as a company, like how in the world if a customer asks, why did you make this decision about me six months ago?
27:47 How are they going to go back and get the exact log file, make sure you have the exact model version, be able to rerun that specific model version and see all these types of information?
27:55 It's not strictly just a tech problem of let me see which neural network layer was activated and which neuron.
28:03 Like it's more process based and there's a lot of components to it.
28:07 So that's where a lot of times we see right now in the data science community, people trying to solve this problem by building better explainability, by understanding that neural network better.
28:15 But that's not really addressing the consumer's ask on, like, why did this decision happen?
28:19 How do I know that you just didn't arbitrarily make this?
28:21 How do we know it's not biased?
28:22 Those sorts of more fundamental issues aren't something you can just address by better neural network explainability tool.
28:29 Yeah. Do you think there's somewhere out in the future, the possibility of like a meta network in the sense that like there is a neural network that looks at how the neural network is working and then tries to explain how it works?
28:42 Like use AI to like get it, get it to sort of answer what the other AI did.
28:46 There's definitely some things like that currently in research.
28:49 There's a whole bunch of different good explainability libraries out there and people that are addressing specifically those types of problems.
28:54 And that's what Monitaur is doing as well, except for more of the business process side and the basic understanding about the model.
29:00 It's really to get the consumers comfortable.
29:03 It's going to be understanding and being able to prove why something was done, which is more than just specific deep learning interpretability.
29:10 Because a lot of times these loan models and stuff are still like we're moving from a loan officer to trying to do a very basic machine learning model.
29:17 Like they haven't even gotten to complexity of deep learning.
29:20 So it's like incremental improvements.
29:21 Right.
29:22 You talked about your origins being in the big data era and all that.
29:27 Is there some sort of statistical way that people might become, you think, comfortable with things like deep learning, making these decisions?
29:34 Like, so for example, you've got all of the records of the mortgages in the US.
29:42 If you were able to take your algorithm and run it across those as if it was trying to make the decision and then you have the actual data, kind of like the supervised learning you talked about going through and saying, OK, we're going to apply it to all of this.
29:57 Maybe we don't share that data back with the company, but they give us the model.
30:01 We run it through all of it.
30:03 And it's within bounds of what we've deemed to be fair to the community or to the country.
30:09 Definitely.
30:09 So that's those are some definitely tests we can do.
30:12 And ideally, if you're building a loan algorithm, you're going to want to see what those historical statistics are to make sure that our model is doing, say, classification percentages in line with what's expected for the general population.
30:25 So, for instance, if you're doing an algorithm, basically at certain FICO score bands, you're going to have X amount of acceptance.
30:34 So if you start seeing your model starts really not accepting people that are in a certain range, like we can definitely start raising a flag.
30:40 There's concept drift going on.
30:42 So there's definitely tests you can do around there.
30:44 There's a Fisher exact test that lets you check.
30:46 Say if you have age, you don't really want to be using age as an indicator, or gender, for instance.
30:52 But in some instances, you have to.
30:53 So you can run a test to see if it's ever statistically significant that that one variable is negatively influencing the outcome.
30:59 There are definitely ways you can make sure that the algorithm, based on the example you said with the loans, matches the amount of acceptance versus non-acceptance that we've normally had over the past X amount of years in America for loans.
31:10 And that's kind of what we think is OK.
31:12 There are definitely tests you can do, such as the Fisher exact test.
31:15 It's a statistical test, and there are others you can do, around making sure an algorithm isn't biased and it's staying within the percentage that you're wanting it to for acceptance.
31:22 So there's a lot of tests that companies can do there, some that we're implementing, some that other people are implementing and a lot of things that people can do.
31:29 But really, for the public to really start accepting that machine learning is OK, I think there really needs to be some sort of more government regulation that's at least even like rubber stamping that this is OK.
31:39 And that we've looked at this algorithm.
31:40 There needs to be like third party assurance and third party audits of algorithms, and you don't even have to share your code if it's a proprietary thing.
31:48 But just the understanding that this is doing what it should be doing and it's not discriminatory can be very important for people to be able to trust AI systems.
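A small sketch of the Fisher exact test mentioned above, using SciPy. The contingency table counts are invented purely for illustration:

```python
# Sketch: test whether a protected attribute is associated with loan outcomes.
# The counts below are invented for illustration.
from scipy.stats import fisher_exact

#                approved  rejected
# group A           320       180
# group B           260       240
table = [[320, 180],
         [260, 240]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p-value = {p_value:.4f}")

# A small p-value flags a statistically significant association between group
# membership and the decision -- a signal to investigate for bias.
if p_value < 0.05:
    print("Flag for review: outcome rates differ significantly between groups.")
```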
31:56 I think it's going to be super important.
31:57 And when I threw that idea out there, what I was thinking of was the XPRIZE stuff done around health care and breast cancer, where instead of you getting all the data, you submitted a Docker image with your algorithm that was then run and it couldn't communicate out.
32:14 It was run against the data and then you got just the trained model out of it and then you could basically go from there.
32:22 So like there was this sort of arbitrage of your model meets the data, but you don't ever see the data and no one else sees your model kind of thing.
32:29 Definitely.
32:31 It's just somebody needs to facilitate that.
32:33 That's a trusted party.
32:34 So like some sort of government regulation to enable that kind of thing.
32:38 But those sorts of processes will definitely start allowing people to trust the system and allow state regulators and things to be able to be signing off on systems and being comfortable to let the public enjoy better insurance policies.
32:48 Right.
32:49 But we need to have some sort of assurance to get there.
32:54 This episode of Talk Python to Me is brought to you by me, Reuven Lerner, and Weekly Python Exercise.
33:00 You want to do more with less code or just write more idiomatic Python?
33:04 It won't happen on its own or even from a course.
33:06 Practice is the only way.
33:08 Now in its fourth year, Weekly Python Exercise makes you a more fluent developer.
33:13 Between pytest tests, our private forum, and live office hours, your Python will improve one week at a time.
33:21 Developers of all levels rave about Weekly Python Exercise.
33:23 Get free samples and see our schedule at talkpython.fm/exercise.
33:29 So you talked a little bit about some of the Python libraries that are out there to help people understand how these models are making decisions and understand a little bit better.
33:40 Some other things that are probably, like we could talk about some of the other things you're doing and, you know, some things that might matter.
33:48 It's like you talked about the company that if I asked them six months later, why did you decide this thing, right?
33:57 They probably still have all of my form fields I filled out.
34:00 Like my credit history is X.
34:02 My average income is whatever.
34:05 I've been in the job for this long.
34:08 But how would they go back and run that against the same code, right?
34:13 So maybe what they would do is they would go and say, well, let me just ask the system again.
34:16 What would the answer be now and why?
34:18 But they may have completely rolled out a new version of code, right?
34:21 It might not even do the same thing.
34:23 And I'm sure they don't have like retroactive versions of like the entire infrastructure of the insurance company around to go back and run it exactly.
34:33 Maybe they do, but probably not in a lot of companies.
34:35 Like the ability to almost a version control, but like production, right?
34:40 So knowing how the model changes over time, like what else do you guys need to look at or keep track of to be able to like give that kind of, you know, why did this happen?
34:51 Answer.
34:51 Definitely.
34:52 And that's the key part is summarizing all information and being able to replay that exact algorithm with the decision.
34:57 So to be able to do that, you really have to have exact feature inputs.
35:02 Like what exactly did the user say, the exact model version, you need to know the, so like the model object file, the pickle file, something like that, the exact production Python code.
35:11 Then you have to have the actual version of the library used in the same environment.
35:16 So there's a lot going on there in the amount of logging you have to have in place, having it easily accessible and non-tampered-with, and being able to recreate that environment.
35:25 That's a hard technical problem.
35:26 A lot of companies don't have that, in addition to just having some sort of interpretability for the decision and then the metrics and monitoring around it.
35:35 That's a big ask.
35:36 And that's where a lot of companies are struggling when they start hitting these regulatory audits.
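As one hypothetical way to capture the kind of record being described here, a sketch of logging enough context at prediction time to answer an audit question later. The field names and JSON-lines storage are assumptions, not Monitaur's actual format:

```python
# Hypothetical decision log: capture enough at prediction time to replay later.
# Field names and the JSON-lines storage are illustrative assumptions only.
import hashlib
import json
import sys
import time

import sklearn  # example of pinning one dependency's version in the record


def log_decision(model_path, model_version, features, prediction,
                 log_file="decisions.jsonl"):
    with open(model_path, "rb") as f:
        model_hash = hashlib.sha256(f.read()).hexdigest()  # tamper evidence

    record = {
        "timestamp": time.time(),
        "model_version": model_version,
        "model_artifact_path": model_path,
        "model_artifact_sha256": model_hash,
        "python_version": sys.version,
        "library_versions": {"scikit-learn": sklearn.__version__},
        "feature_inputs": features,  # exactly what the user submitted, in order
        "prediction": prediction,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
```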
35:39 Sure.
35:40 Well, I can imagine it's really tricky to say, all right, well, when our app started up back then and we went back and we looked in the logs and it said this version and this version, this version of the library, but maybe somebody forgot the log.
35:52 Oh yeah, there's a dependency library that actually matters.
35:54 And it was that version where we were running on this version of Python and its implementation of, you know, floating points slightly, slightly changed.
36:04 And it had some effect or I, you know, I don't know.
36:06 Right.
36:06 Like there's just all these things.
36:08 And it seems like there's probably some analogies here to the reproducibility challenges and solutions that science even has.
36:17 Exactly.
36:18 It's along the exact same lines, but it's even more exacerbated and more difficult to solve.
36:23 Because if you think about science, it's like you're trying to reproduce the results of one paper.
36:27 You know, if it's a biological experiment, it might be a little harder to recreate those conditions.
36:30 But in computer science, like you should be able to save the seed, send a Docker file with everything and rerun your algorithm results.
36:36 Right.
36:37 But people are even having a hard time to do that because the amount of process and forethought into how you're going to be packaging things, how you're going to be setting things up, it's a lot to deal with.
36:45 But like reproducibility is just a major crisis in science right now, too.
36:48 But it's really hitting the corporate environment really hard to be able to provide that kind of ability around the algorithms to be able to answer questions.
36:56 Because if like you want to implement these things, people are going to want to have audits and they're going to want to understand why something was done.
37:00 So exactly what's happening in the scientific community, except even on a larger scale.
37:04 Now, two things you talk about when you look at Monitaur's website, talking about what you all do there.
37:10 One is counterfactuals.
37:13 And the other is detecting when model and feature drift occur.
37:19 Do you want to address those problems a little?
37:20 Yeah, definitely.
37:21 So counterfactuals is basically what we've just described: the re-performance of an exact transaction.
37:27 So we record all those things, the versioning of all the files.
37:30 So when you go back six months from now, six years from now, and go and select on a specific transaction, if it's a tabular transaction, we can actually hit a button and we will pull all that stuff up in a Docker container and rerun it.
37:43 So that's what we're calling counterfactuals is the ability to go back and re-perform a transaction and then perform what-if analysis.
37:48 Say like one of the variables said your income was $200,000.
37:53 And if you want to change it to $150,000, you can go do that and rerun the transaction as well off of that old version.
37:59 So it allows you to do the sensitivity analysis if a consumer is like, well, what if my income was slightly different?
38:03 But also re-perform for audit tracing that it's doing exactly what it said it was going to do.
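A toy sketch of that re-performance plus what-if step, assuming a logged record that holds the original feature inputs in order and a path to the exact pickled model artifact, with matching library versions installed (in practice that environment would be rebuilt in a container). This is not Monitaur's implementation, just an illustration:

```python
# Hypothetical replay of a logged decision, plus a what-if variation.
import json
import pickle


def replay(record, overrides=None):
    with open(record["model_artifact_path"], "rb") as f:
        model = pickle.load(f)  # the exact model object used at decision time

    features = dict(record["feature_inputs"])
    if overrides:                              # e.g. {"income": 150000}
        features.update(overrides)

    # Feature order matters; the record preserved the original input order.
    X = [list(features.values())]
    return model.predict(X)[0]


record = json.loads(open("decisions.jsonl").readline())
print("original decision:", replay(record))
print("what-if decision:", replay(record, overrides={"income": 150000}))
```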
38:08 Sure.
38:09 Okay.
38:09 And then model drift.
38:11 Yeah.
38:11 Model drift is exactly like what we were talking about a few minutes ago with you have loans.
38:15 Normally 60% of loans are rejected, 40% are accepted.
38:18 And that's kind of the average for this risk class.
38:21 Model drift will allow to see in a monitor platform when your model has started to drift out of those bounds.
38:27 If you say, I'm okay if the model's in between 75% and 50% classification of rejection, of "you don't get a loan."
38:34 And if we start slipping to 80% of loans are now being rejected, we're going to throw you alerts and say, hey, your model has drifted.
38:40 Something is wrong here.
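A minimal sketch of that kind of bounds check on the model's output rate; the window size and bounds are invented for illustration and this is not the Monitaur platform's code:

```python
# Sketch: alert when the rejection rate drifts outside agreed bounds.
# Window size and bounds are illustrative assumptions.
from collections import deque


class PredictionDriftMonitor:
    def __init__(self, lower=0.50, upper=0.75, window=1000):
        self.lower, self.upper = lower, upper
        self.recent = deque(maxlen=window)   # 1 = rejected, 0 = approved

    def record(self, rejected):
        self.recent.append(1 if rejected else 0)
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if not (self.lower <= rate <= self.upper):
                print(f"ALERT: rejection rate {rate:.0%} is outside "
                      f"[{self.lower:.0%}, {self.upper:.0%}] -- model drift?")


monitor = PredictionDriftMonitor()
# monitor.record(rejected=True)  # called once per scored loan application
```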
38:41 I see.
38:41 Yeah.
38:42 So it's not necessarily detecting that the model is making some variation in its predictions, but it's saying you've set these bounds.
38:52 And if it's outside of these bounds, like something is wrong.
38:54 Let us know.
38:56 Exactly.
38:56 And that will be saying that the algorithm is making kind of a drift in what it's supposed to be predicting.
39:01 Same with features.
39:02 There's a great paper from a couple of years ago by Google called something like "The High-Interest Credit Card of Technical Debt" in machine learning implementations.
39:10 The name's not quite right there, but it's a very popular paper on technical debt, where basically a lot of times when you have an algorithm, a lot of the code around it is where your issues are going to happen.
39:20 So if you have a model that's used to having features between 1 and 10 with a mean of 5, and you start having a drift up, that can be affecting the model's outputs, but you can't really detect it until it's too late.
39:33 So what Monitaur does is we allow you to look at the feature drift and see, like, if I'm expecting this feature to be within 1 to 2 standard deviations of a mean of 5, and you start getting higher numbers out of there, we'll know, like, hey, there's feature drift, which means your model is not performing in the same environment that it was built for.
39:50 So we'll be able to know that, hey, you need to go look at this, the situation your model's built for has changed.
39:55 Because a lot of times when these models start misbehaving, it's not because the model code has changed, quote unquote, it's because the environment that it was built for has changed.
40:03 You're no longer in that same environment.
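And a correspondingly small sketch of the feature-drift side: compare recent input values against the mean and standard deviation seen at training time. The thresholds and data are illustrative assumptions:

```python
# Sketch: flag feature drift when recent inputs move away from the
# training-time distribution. Thresholds and data are assumptions.
import numpy as np


def feature_drifted(train_values, recent_values, max_z=2.0):
    """True if the recent mean sits more than `max_z` training-time
    standard deviations away from the training mean."""
    train_mean, train_std = np.mean(train_values), np.std(train_values)
    z = abs(np.mean(recent_values) - train_mean) / train_std
    return z > max_z


# Trained on a feature centered around 5...
train = np.random.default_rng(0).normal(loc=5, scale=1.5, size=10_000)
# ...but production traffic has started creeping upward.
recent = np.random.default_rng(1).normal(loc=9, scale=1.5, size=500)

if feature_drifted(train, recent):
    print("ALERT: feature drift -- the model is running outside the "
          "environment it was built for.")
```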
40:04 I see. And that's one of the real challenges of the whole ML story, right?
40:09 Is the model's good at answering the question the model was intended to answer, but it may be completely inappropriate for something else, right?
40:18 Like, self-driving cars work great in Arizona, but you put them in the snow and they can't see the road anymore because they don't know to look at snowy roads or whatever, right?
40:27 Exactly. Because algorithms are extremely, extremely dumb.
40:30 Like, people are trying to make them better with this transfer learning and some of the semi-supervised learning and things.
40:36 But when it comes down to the root of it, there is no thinking going on here.
40:39 It doesn't matter how accurate it is at detecting cancer and radiology images.
40:42 There is no thinking. It's a dumb algorithm that's made for a specific set of circumstances.
40:46 It can be fantastic as long as those circumstances don't change.
40:50 But that's where the key problem happens in production is those circumstances no longer hold true, so your model starts performing badly.
40:55 Your model is doing the exact same thing it was trained to do, just the inputs are different.
40:59 I see. So one of the things that might be important is to keep track of the input range and document that in the early days and then keep checking in production that it still holds true.
41:13 Exactly.
41:14 Exactly.
41:15 And that's where we can start testing for bias and things like that as well.
41:19 Seeing with one specific variable, such as we mentioned, gender variable, if that starts being a key influencer for a decision, we also know something is up there.
41:28 So you can start doing proactive bias monitoring.
41:30 Interesting. So maybe you don't actually want to take gender into account, but it's somewhere in the algorithm or somewhere in the data.
41:39 And you can test whether or not it actually seems to be influencing.
41:44 Like you can detect bias in a sense.
41:46 Exactly. You choose whatever feature.
41:48 Like if you're building an algorithm, you should try and never have gender or something like that in there.
41:52 But occasionally, such as you think of cancer screening, you're going to have to include that because that's a key component.
41:57 I would argue you should never include gender if you're doing a credit card application.
42:00 But if you're doing something more fundamental, like cancer screening or something, you still need to be able to have those sorts of things.
42:05 Or like if it's an image recognition algorithm, you're going to have to include race just because different skin tones may affect the results of the algorithm.
42:13 Doesn't mean we're trying to be any sort of bias.
42:16 But that's why you want to have these controls to make sure bias doesn't occur.
42:18 We've had a lot of radiology implementations that won't work as well on certain individuals.
42:23 So like there's all these things that machine learning can improve everybody's lives.
42:27 We just need to have the right safeguards in place because, of almost every single company that's deployed machine learning,
42:32 none of them are trying to be discriminatory.
42:33 That's not the purpose.
42:35 It's just that there are these things that will happen that they're not aware of.
42:37 So it's just making sure you have a controlled environment to make sure that doesn't happen or being able to catch it when it does happen so you can fix it.
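One hedged way to implement that kind of proactive check is to measure how much a protected attribute actually influences the model, for example with scikit-learn's permutation importance. The data, the column index, and the alert threshold are made up; this is one possible approach, not necessarily the one used in practice:

```python
# Sketch: check whether a protected attribute (e.g. a gender flag) is a key
# influencer of the model's decisions, using permutation importance.
# Data, column index, and threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
X[:, 4] = rng.integers(0, 2, size=2000)       # column 4: protected attribute
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy outcome, independent of it

model = GradientBoostingClassifier().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

protected_influence = result.importances_mean[4]
print(f"protected attribute importance: {protected_influence:.4f}")
if protected_influence > 0.01:                # alert threshold (assumption)
    print("ALERT: protected attribute is materially influencing decisions.")
```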
42:43 Yeah, that's pretty awesome.
42:44 What else should people be thinking about that they either need to be doing, logging, trying to use libraries to help with, especially in production with their models?
42:56 Definitely like the structure of how you do your models is very important.
43:01 I'm a huge advocate of using microservices and Docker containers and trying to, especially for these complicated deployments, make as many things separate services as possible.
43:11 So it's just like sometimes you want to have your algorithm strictly in a container, and then it will interact with your logic in a different container.
43:17 Because sometimes when you have everything combined into one area is when you can start having that technical debt and things build up.
43:23 And it's very hard to figure out what's broken and where it's broken.
43:26 So being able to keep things as separate as possible in complex deployments really helps to figure out the root cause.
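A minimal sketch of that separation: the model served behind its own small HTTP endpoint, with the business logic living in a different service. The framework (Flask), the artifact name, and the payload shape are assumptions for illustration:

```python
# Hypothetical model-only microservice: the model lives in its own container
# behind a tiny API, and business logic calls it from a separate service.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:       # the versioned model artifact
    model = pickle.load(f)

MODEL_VERSION = "2020-04-17-r3"          # hypothetical version tag


@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]     # e.g. [[620, 34, 0.42]]
    prediction = model.predict(features).tolist()
    # Echo the version back so callers can log exactly what answered them.
    return jsonify({"prediction": prediction, "model_version": MODEL_VERSION})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```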
43:32 Yeah, that's interesting.
43:33 Because if you have it kind of mixed in to your app, you probably deployed 10 versions of your app.
43:40 Did any of them actually affect the model or was that just like to change some other aspect of an API or some aspect of the website or something like that that is just,
43:50 well, we changed how users log in.
43:51 They can now log in with Google.
43:53 But that didn't affect the machine learning model that we're deploying.
43:56 But if you have it as a separate API endpoint that's its own thing, then you know.
44:01 Exactly.
44:02 It's the same thing from like science is how do you get rid of the confounding variables and do as much of a control test as possible.
44:08 So the more you can have those things that you know what's changing, you'll know what to fix when something breaks.
44:13 So having those sorts of architectures and really coming in with document, document, document and auditing because I'm an ex-auditor.
44:19 If it wasn't documented, it doesn't exist.
44:21 Not really true, but that's how auditors look at things.
44:24 But that's extremely important to do when you're working in regulated industries, or in other areas where you're building these models.
44:30 You need to document all your assumptions.
44:31 Where's your data coming from?
44:32 All of this, the plain businessy things that, frankly, data scientists hate to do.
44:36 I hate to do it as well.
44:38 But you need to have that kind of thing for being able to show other people, different stakeholders, and also, frankly, even cover yourself if something comes back later.
44:47 Like, here's why we did something, and it helps you remember when you look at this code six months later.
44:51 So just the documentation, having a planned way that you're building an algorithm instead of just agile-ing it.
44:57 Agile is great, but you have to have an overarching plan for some of these things instead of just MVP it until it hits production.
45:03 Sure.
45:03 And it also sounds like a little bit of the guidance is just good computer science.
45:09 Right, exactly.
45:10 A very big problem in this space is that data scientists are not normally good software engineers.
45:17 So a lot of the great software engineering practices haven't really translated into machine learning code.
45:21 There's becoming a big trend towards that with machine learning engineers and things.
45:25 So we're definitely trending in the right direction.
45:27 But still, there's a lot of models that get deployed that don't have the engineering rigor that is common in the Python community.
45:33 Like, there's certain ways we do things with CICD, with unit tests and stuff, and a lot of machine learning code doesn't have those things.
45:39 Sometimes you can't really, because with a non-deterministic outcome, it's hard to unit test.
45:43 But there's a lot of the processes of good engineering that we can apply to help make everybody's lives easier and better AI deployments.
45:49 Sure.
45:49 I was thinking about that as well in testing.
45:51 And it just seems like that's one of the areas that is tricky to test.
45:55 Because it's not like, well, if I set the user to none, and then I ask to access this page, I want to make sure it does a redirect over to the login page.
46:05 If I set the user to be logged in, and I ask the same thing, I want them to go to their account page.
46:09 And so it's super easy to write that test.
46:11 User's none, call the test, check the outcome.
46:14 User's this, call the test.
46:15 What about testing these ML models?
46:18 It's definitely a challenge, and it's a developing art.
46:21 It's more of an art than a science at the moment.
46:26 But one of the ways you can do it is kind of like how we talked about ranges: to unit test a machine learning algorithm, you'll say, hey, we're expecting an output between this and this.
46:35 These are acceptable ranges.
46:36 And you just test against that.
46:38 You won't know if it's 100% working, because you'd have to look at it over time to see if there's drift.
46:42 But you'll be able to say, like, hey, if it's a regression problem, meaning I have an output between 1 and 10, and it's supposed to be a 5, if I'm hitting an 8 or a 2, that means something's probably not quite right.
46:53 You can put some sort of ranges for your unit test.
46:55 So it can't be quite as deterministic as, I know exactly whether this test is failing or not, but at least it can give you some assurance and some early warning.
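A minimal sketch of that range-style unit test might look like this; the model file, the representative input, and the 3-to-7 acceptance band around an expected 5 are all assumptions for illustration.

```python
# Range-based test for a regression-style model; file name, input, and the
# acceptance band are placeholders, not from the episode.
import joblib


def test_prediction_stays_in_acceptable_range():
    model = joblib.load("model.joblib")
    known_case = [[42.0, 0.3, 7.5]]  # a representative input we understand well
    prediction = float(model.predict(known_case)[0])
    # Expecting roughly a 5 on a 1-to-10 scale; an 8 or a 2 suggests something drifted.
    assert 3.0 <= prediction <= 7.0
```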
47:02 Sure.
47:03 So maybe it feels a lot like testing scientific things as well, right?
47:08 You can't say, if I call the, you know, estimate, whatever, right?
47:13 And I give it these inputs, all floating point numbers, I can't say it's going to be equal to 2, or it's going to be false.
47:20 It's probably some number that comes out.
47:22 And you're willing to allow a slight variation, because, like, the algorithm might evolve, and it might actually be more accurate if it's a little, you know, if it's slightly better.
47:31 But it's got to basically be within a hundredth of this other number, right?
47:35 So you've got to do, like, range.
47:37 Like, it's in this range, and it's not too far out.
47:40 I guess it's probably similar.
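For the "within a hundredth" style of check, pytest's `approx` expresses exactly that tolerance; the `estimate` function below is just a stand-in for whatever numeric call the real model exposes.

```python
# Tolerance-based assertion with pytest.approx; estimate() is a placeholder.
import pytest


def estimate(x: float) -> float:
    return x * 0.5 + 1.2  # stand-in for the real estimator


def test_estimate_is_within_a_hundredth():
    # Exact float equality is too strict for an evolving model; an absolute
    # tolerance of 0.01 expresses "within a hundredth of the reference value".
    assert estimate(3.0) == pytest.approx(2.7, abs=0.01)
```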
47:42 What do you think about, like, a hypothesis or other sort of, like, automatic property-based testing where you could say, give me some integers here in this range and give me one of these values and sort of give me a whole bunch of examples and let's test those.
47:57 Definitely, sensitivity analysis is a very good way to do that.
48:02 So, like, ideally, you'd want to do that kind of testing.
48:04 So it's better.
48:05 You can't really unit test as well,
48:06 like we talked about.
48:07 But sensitivity analysis testing is fantastic to do, which is, like, here are the different scenarios, here are the different users we would get, and run a bunch of them with slight variations through your model and see if it performs as it should.
48:17 So that is definitely a very good way to test a model, and you should never deploy a model without doing that.
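Property-based testing with Hypothesis is one way to automate that kind of sensitivity run; in the sketch below, the scoring function and the input ranges are made up purely for illustration.

```python
# Property-based sensitivity testing with Hypothesis; the scoring function
# and ranges are assumptions, not a real model.
from hypothesis import given, strategies as st


def score_applicant(income: float, debt_ratio: float) -> float:
    # Stand-in for the deployed model's scoring call.
    return max(0.0, min(10.0, income / 20_000 - debt_ratio * 5))


@given(
    income=st.floats(min_value=10_000, max_value=200_000, allow_nan=False),
    debt_ratio=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
)
def test_scores_stay_in_designed_range(income, debt_ratio):
    # Whatever plausible applicant Hypothesis generates, the score must stay in bounds.
    assert 0.0 <= score_applicant(income, debt_ratio) <= 10.0
```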
48:22 The other test you can kind of do is you can't really unit test machine learning code too accurately, but you can unit test data.
48:29 So there's a couple good libraries out there that will help you unit test data right now.
48:33 I think marbles might be one in Python, and there's Great Expectations, which checks things like, is the schema right?
48:39 So that's really huge, because is this supposed to be an int, or a float, or a float between these numbers, and so on.
48:45 So you can really check the input data too.
48:47 So there are some tests you can do a little more accurately around your data in addition to the sensitivity analysis.
48:52 Yeah.
48:53 Great expectations.
48:54 It's cool.
48:54 It's got things like expect column values to be not null, or expect column values to be unique, or expect them to be, you know,
49:03 between such and such values.
49:04 That's pretty cool.
49:05 Yeah.
49:06 So it's a very cool library.
49:07 Very good to do that kind of testing.
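Those data expectations look roughly like this with Great Expectations' classic pandas-backed API (the exact API has shifted across versions, so treat this as illustrative); the columns and bounds here are made-up examples.

```python
# Unit-testing data with Great Expectations' classic pandas API (illustrative).
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.DataFrame({
    "age": [34, 45, 29],
    "income": [52_000.0, 61_500.0, 48_250.0],
}))

df.expect_column_values_to_not_be_null("income")
df.expect_column_values_to_be_between("age", min_value=18, max_value=110)

results = df.validate()  # collects pass/fail results for every expectation
print(results)
```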
49:08 Yeah.
49:09 I suppose, you talked about unexpected situations, like, you know,
49:15 my analogy was it's built for dry, sunny roads, not snowy roads, in automated driving.
49:22 But similarly, it's probably not built to take a null or None in a spot where it expected a number, right?
49:30 What's it going to do with that, right?
49:32 And it's great to look for, to use a Taleb example, the black swans.
49:37 Like, it's good to test your model for when something bad is going to happen.
49:40 Like, even just to put the correct try/except type statements around it.
49:43 For like, here is something crazy that should never happen, but let's see what happens anyway.
49:47 So we can handle that.
49:48 So we don't become the next news story in a "this company really screwed up" type scenario.
49:53 So it's good to do those types of tests that are just like, we hopefully never have this, but what happens if?
49:58 So we can build the type of exceptions and logic around those sorts of crazy scenarios.
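A defensive wrapper around the model call is one way to express that; in this sketch the feature checks and the fall-back behavior are assumptions about what "handle it gracefully" means for a given service.

```python
# Defensive handling around a model call; checks and fallback are assumptions.
import logging
import math
from typing import Optional, Sequence

logger = logging.getLogger("model_service")


def safe_predict(model, features: Sequence[float]) -> Optional[float]:
    # Reject black-swan inputs up front: empty rows, missing values, NaNs.
    if not features or any(
        f is None or (isinstance(f, float) and math.isnan(f)) for f in features
    ):
        logger.warning("Refusing to score: missing or NaN feature in %r", features)
        return None
    try:
        return float(model.predict([list(features)])[0])
    except Exception:
        # Log and return a sentinel instead of letting a crazy input crash the service.
        logger.exception("Prediction failed for %r", features)
        return None
```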
50:02 Sounds good.
50:03 Sounds like a lot of things that people can go figure out, you know, find some of these libraries, try to add some of these techniques.
50:10 I mean, you guys over at Monitaur are doing this.
50:13 You're not quite out of Techstars yet.
50:15 You know, people can check you out.
50:16 What else would you recommend that folks maybe look into to make this practical in their environments?
50:23 So definitely those; Great Expectations is great for unit testing your data.
50:27 Alibi is a library that has an anchors implementation.
50:30 That's the explainability technique I was talking about, anchors.
50:32 It basically gives you if statements explaining why a decision was made on a transaction.
50:35 It's kind of a 2.0 of LIME, which is a very popular explainability library.
50:39 Those are some good ones.
50:40 SHAP is a great library as well.
50:42 It's game theory based, giving you values for how your algorithm is weighing features in a decision.
50:47 That's cool.
50:48 Yeah.
50:48 So it's a very, very cool, cool library.
50:51 And then it's really about starting to apply those.
50:53 So those are all the explainability type scenarios.
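To make the SHAP idea concrete, a hedged sketch for a tree-based model might look like the following; the dataset and classifier are just convenient scikit-learn stand-ins, not anything from the episode.

```python
# SHAP explanation for a tree model; data and model are illustrative stand-ins.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])
# Each SHAP value is a feature's game-theoretic contribution to one prediction.
shap.summary_plot(shap_values, X.iloc[:100])
```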
50:56 Pymetrics has a very cool library that does some bias detection.
51:00 That's something that's really cool to check out.
51:02 Aequitas is a bias and fairness audit toolkit that is really cool.
51:06 And that's out of the University of Chicago.
51:07 So those are some good ways to get started on how to provide more assurance around your models.
51:12 And then just really, you want to be doing Docker, you want to be doing best practices on your engineering.
51:16 Yeah.
51:17 So some of the good places to start.
51:18 Yeah, absolutely.
51:19 You know, something that came to mind while you're talking, there's this thing called missingno, missing N-O, which is a Python library for visualizing missing data.
51:30 And so you can give it,
51:32 I think, Pandas data frames or something like that.
51:35 And it gives you these graphs with continuous lines if everything's good, or little marks where there's bad data.
51:42 It's a pretty cool library for like trying to just like throw your data set at it and visually quickly identify if something's busted.
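The whole workflow is only a couple of lines; in this sketch the CSV path is a placeholder for your own training data.

```python
# Quick missingno visualization of gaps in a dataset; the path is a placeholder.
import matplotlib.pyplot as plt
import missingno as msno
import pandas as pd

df = pd.read_csv("training_data.csv")
msno.matrix(df)  # solid columns mean complete data; gaps mark missing values
plt.show()
```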
51:49 Well, that's very cool.
51:50 I have not heard that one.
51:51 I'm going to go check that out.
51:52 Yeah, absolutely.
51:53 Yeah, absolutely.
51:54 Absolutely.
51:55 Okay.
51:55 One final machine learning question, I guess.
51:57 Keep it positive.
51:58 Like what are the opportunities for this?
52:01 You talked about how many companies are kind of at the rudimentary stage of this modeling and predictability.
52:09 And there's obviously a lot further to go, with the challenges that the whole episode basically has been about.
52:14 But what do you see as the big opportunities?
52:16 You get this.
52:17 You get these key components right,
52:19 and you're going to be able to start having companies providing cheaper insurance, better medical screening.
52:25 I mean, self-driving cars.
52:30 These are all the types of things that are really going to help society and humanity when we get them right.
52:33 There are just a few things, like we talked about, to get ironed out.
52:33 But really, like the future is very bright.
52:35 Machine learning can do some great things.
52:38 And like together as a community, we'll be able to make that happen.
52:40 Yeah, awesome.
52:41 And, you know, so you talked also at the beginning about your Techstars experience, and you guys are going through and launching Monitaur through Techstars.
52:50 And that's awesome.
52:51 But I suspect that a bit of a change has happened.
52:55 Like Techstars, to me, these incubators and accelerators are all about coming together.
53:00 It's almost like a hackathon, but for a company.
53:02 You come together with all these people and it's just like this intense period where you're all together and working with mentors.
53:07 And co-founders and creating.
53:09 And we've all been told to not do that.
53:12 So how's that work?
53:14 Well, Techstars did a very good job of switching to online.
53:18 So still all the same things happen.
53:19 They even started setting up some like water cooler sessions and stuff for people to kind of just chat informally.
53:24 And so they moved to online to Zoom very, very early on and did a very smooth transition.
53:28 So that was very good.
53:29 They did a great job of handling it.
53:31 We're having a virtual demo day in two weeks, end of April, which is not ideal.
53:37 But we're going to also have a live one in September as well, I believe.
53:40 So they're really going above and beyond to help with the transition for companies, and doing a bunch of extra classes around fundraising during COVID and things like that.
53:48 So Techstars has done a great job transitioning and they had all the technology in place to make it a pretty smooth transition.
53:54 So even though it's not the same, like the quality is still very much there and we're still getting a lot out of it.
53:59 Yeah, that's good.
54:00 I'm glad that it's still going.
54:01 I mean, it won't be the same.
54:03 How did you get into it?
54:04 You talked about, I don't remember the exact numbers, but it sounded like a 100-to-1 applicant-to-acceptance ratio or something like that.
54:11 Like if people out there are listening, they're like, I would love to go through something like Techstars, maybe ideally in a year when it's back to its in-person one.
54:19 But even not, right?
54:21 I'm sure it'd be super helpful.
54:22 What was that experience like?
54:23 How do you decide to get going and how'd you get in there?
54:27 So we did some networking around the Boston area, kind of like meeting some of the founders in the area, different VC firms, kind of like early socializing.
54:34 Like who we are, what we're about.
54:36 So we kind of got our name around.
54:37 So starting that socialization, there's Techstars in different cities, getting into the community of founders and startups, getting yourself known.
54:46 And then it's the application process.
54:48 You just need to really rock that.
54:49 And then just being part of the community and doing a good application and have a compelling story.
54:54 But definitely the pre-networking before the application to kind of like start showing who you are, do as many meetups as you can, like get your name out there.
55:02 So if someone has heard of you when they, when the application comes across, it's definitely helpful.
55:06 Yeah, I'm sure they, oh, I've heard of them.
55:08 They actually were doing something pretty cool.
55:09 I talked to Andrew or something like that kind of stuff goes really far, farther than it seems like it should necessarily just like on the surface.
55:16 But yeah, absolutely.
55:17 There are definitely companies that got into this year's Techstars program that came in completely cold.
55:22 So it's not like that's required; it's based on the merit.
55:25 We happened to be in the area and had the ability to network with people beforehand.
55:28 But honestly, like if you're in Techstars or not in Techstars, you should be doing what we were doing anyway, socializing your company.
55:34 So they are very, very much a fair company.
55:38 Several of the companies in our cohort are led entirely by women.
55:40 So it's like they're doing a great job of being a fair and inclusive place that's trying to be as non-biased as possible with acceptance.
55:47 Yeah, that's excellent.
55:48 I suspect that machine learning in particular and data science type of companies in general are probably pretty hot commodities at the moment.
55:57 And you have an above average chance of getting into these type of things, just looking in from the outside.
56:04 So even a lot of listeners who particularly care about this episode, maybe they've got a good chance.
56:09 Definitely.
56:09 And let me know.
56:10 I'd love to hear your idea as well.
56:11 Awesome.
56:12 Feel free to reach out.
56:12 I'm still connected in the space.
56:14 So let me know.
56:15 But innovative AI and ML companies are definitely big right now.
56:19 It's just making sure you have something fully baked and are solving a real business problem and not just a tech problem.
56:24 Yeah.
56:25 Very cool.
56:25 All right.
56:26 Well, I think we're just about out of time there, Andrew.
56:29 So I'll hit you with the final two questions.
56:31 Sounds good.
56:32 Yeah.
56:32 If you're going to write some Python code these days, what editor do you use?
56:35 Ah, yes.
56:36 The famous question.
56:37 When I'm writing real Python code, I'm using Visual Studio Code.
56:40 When I'm doing exploratory data analysis stuff, I can't beat Jupyter.
56:44 Yeah.
56:44 Those are my two go-tos.
56:45 I used to be on Atom with Vim for my real coding stuff.
56:50 But the newer versions of Visual Studio Code are just so good, I can't pass it up anymore.
56:54 Even though I liked being a rebel and doing completely open source, like Vim type stuff.
56:59 Yeah.
57:00 Well, and then Atom and Visual Studio Code have such similar origins, right?
57:06 They're both Electron.
57:07 Atom came from GitHub.
57:08 Microsoft now overseeing GitHub.
57:11 I think there's a lot of overlap between Visual Studio Code and Atom these days.
57:18 So it's not that wild, right?
57:20 It's not that different.
57:21 That's cool.
57:22 And then you already mentioned some really good Python packages and libraries out there.
57:26 But other notable ones maybe worth throwing out there?
57:28 Yeah.
57:28 NetworkX.
57:29 I really love NetworkX for doing graph theory and things like that.
57:33 And it connects very well with Pandas and Python dictionaries, to be able to have different nodes and edges and things.
57:39 So if you're doing any type of network-based work, which I've done at some previous companies and in my PhD program, it's a very good library to have.
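That Pandas connection is essentially an edge list going straight into a graph; the node names in this small sketch are made up for illustration.

```python
# Building a graph from a pandas edge list with NetworkX; node names are made up.
import networkx as nx
import pandas as pd

edges = pd.DataFrame({
    "source": ["accounts", "payments", "payments", "fraud"],
    "target": ["payments", "ledger", "fraud", "alerts"],
})
G = nx.from_pandas_edgelist(edges, source="source", target="target")

print(nx.shortest_path(G, "accounts", "alerts"))  # ['accounts', 'payments', 'fraud', 'alerts']
print(dict(G.degree()))                           # how connected each node is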
57:46 My favorite machine learning library has to be scikit-learn.
57:49 It's just so easy to use, intuitive.
57:51 The documentation, like I've never seen a Python library with such good documentation and so many good examples.
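That ease of use comes from the uniform fit/predict workflow; this particular pipeline is just an illustrative example, not one from the episode.

```python
# The short, uniform scikit-learn workflow: pipeline, fit, score.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # held-out accuracy
```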
57:57 Yeah.
57:57 And then Alibi I really like as well, for that anchors implementation I was talking about.
58:02 So those are probably my three at the moment.
58:04 But the change is weekly.
58:05 Yeah.
58:07 Those are great.
58:08 Awesome.
58:08 And then final call to action.
58:10 People want to put their machine learning models into production or they have them and they want them to be better.
58:15 Maybe what can they do in general?
58:17 And then also how do they learn about what you guys are up to?
58:20 Definitely.
58:20 Monitaur.ai.
58:22 Come check us out.
58:23 If you're in a regulated industry and you're wanting to deploy a machine learning model, we offer services to help you do that.
58:28 And also the machine learning record platform that will allow you to have that audit trail and the counterfactuals and all the things we talked about in this episode, to allow you to get past the auditors and regulators and be able to move those models to production.
58:41 Yeah.
58:41 Cool.
58:42 All right.
58:42 Well, it's been really fun to talk about machine learning and watching how it goes, verifying what it does.
58:48 And, you know, I think, like you said, these are the things that are going to need to be in place before it can really serve the greater public, or the companies out there that need to make use of it.
59:00 Right.
59:00 So great topic to cover.
59:02 Thanks.
59:02 Thank you so much.
59:03 Been great talking to you.
59:04 Yeah, you bet.
59:04 Bye bye.
59:05 Bye.
59:05 This has been another episode of Talk Python to Me.
59:09 Our guest on this episode was Andrew Clark, and it's been brought to you by Linode and Reuven Lerner's weekly Python exercise.
59:17 Start your next Python project on Linode's state-of-the-art cloud service.
59:21 Just visit talkpython.fm/Linode, L-I-N-O-D-E.
59:25 You'll automatically get a $20 credit when you create a new account.
59:28 Learn Python using deliberate practice every week with Reuven Lerner's weekly Python exercise.
59:35 Just visit talkpython.fm/exercise.
59:39 Want to level up your Python?
59:41 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.
59:45 Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.
59:54 And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.
59:58 It's like a subscription that never expires.
01:00:00 Be sure to subscribe to the show.
01:00:02 Open your favorite podcatcher and search for Python.
01:00:05 We should be right at the top.
01:00:06 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.
01:00:15 This is your host, Michael Kennedy.
01:00:17 Thanks so much for listening.
01:00:18 I really appreciate it.
01:00:20 Now get out there and write some Python code.
01:00:21 Thank you.