Learn Python with Talk Python's 270 hours of courses

#351: Machine Learning Ethics and Laws Panel Transcript

Recorded on Friday, Dec 17, 2021.

00:00 The world of AI is changing fast, and the AI-ML space is a bit out of the ordinary for software developers.

00:05 Typically in software, we can prove that given a certain situation, the code will always behave the same.

00:11 We can point to where and why a decision is made.

00:14 ML isn't like that. We set it up and it takes on a life of its own.

00:17 Regulators and governments are starting to step in and make rules over AI.

00:22 The EU is one of the first to do so.

00:24 That's why it's great to have Ines Montani and Katharine Jarmul, both awesome data scientists and EU residents,

00:29 here to give us an overview of the coming regulations and other benefits and pitfalls of the AI-ML space.

00:35 This is Talk Python to Me, episode 351, recorded December 17th, 2021.

00:54 Welcome to Talk Python to Me, a weekly podcast on Python.

00:58 This is your host, Michael Kennedy.

00:59 Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm.

01:05 And follow the show on Twitter via at Talk Python.

01:09 This episode is brought to you by Sentry and Signal Wire.

01:13 Please check out what they're offering during their segments.

01:15 It really helps support the show.

01:16 Catherine Ines, welcome to Talk Python to Me.

01:20 Hey, it's great to be back.

01:22 Hi, Michael.

01:22 Yeah, it's great.

01:23 Great to have you here.

01:25 You've been here a bunch of times.

01:26 Catherine, have I had you on before?

01:27 Yeah, I think so.

01:28 A while ago now.

01:29 I think so as well, but it's been a really long time, hasn't it?

01:32 Yeah, yeah.

01:33 It's great to have you back.

01:34 This is a very Berlin-focused podcast today.

01:37 Just unrelated.

01:40 Both of you happen to be there, so that's really cool.

01:42 Thank you for taking time out of your evening to be part of the show.

01:45 Of course.

01:46 Yeah.

01:46 All right.

01:47 Well, we're going to talk about machine learning, some of the rules and regulations coming around

01:54 there, especially in Europe.

01:55 We're going to talk about fairness.

01:58 We're going to talk even a little bit about interesting indirect implications like Google,

02:03 or not Google, GitHub Copilot, and these types of things.

02:07 We'll sort of go through the whole ML space and talk about some of these ideas.

02:11 But you both are doing very interesting things in the machine learning space.

02:15 Let's sort of just get a little bit of your background.

02:17 Catherine, tell people a bit about what you're doing these days.

02:20 Yeah, I'm here in Berlin and focused on how do we think about data privacy and data security

02:26 concerns in machine learning.

02:27 So for the past five years, I've worked in the space of how do we think about problems

02:32 like anonymization, differential privacy, as well as how do we think about solutions like

02:37 encrypted learning and building ways that we can learn from encrypted data.

02:41 So it's been really fun and I'm excited.

02:44 But first, first to publicly announce here that I'll be joining ThoughtWorks in January.

02:50 That's awesome.

02:51 Yeah.

02:51 As a principal data scientist, their focus exactly on this problem, which they've been

02:57 noticing here in Europe is how do we think about data privacy and data security problems

03:02 when we think about machine learning.

03:04 It's a growing concern.

03:05 So it should be pretty exciting.

03:07 Yeah.

03:07 A company like ThoughtWorks is one of these companies that works with other companies a lot

03:13 and sort of this consulting side of things.

03:15 And I feel like you can have a pretty wide ranging impact through those channels.

03:20 Yeah.

03:20 Yeah.

03:21 Do you think that being in Germany, there's more focus on these kinds of things in Europe,

03:27 but especially in Germany, it seems like, compared to, say, the US?

03:32 Does the US have more of a YOLO attitude towards privacy and machine learning stuff?

03:39 Yeah.

03:39 Yeah.

03:39 I mean, I think just from a regulatory aspect, since we saw a passage of the GDPR, which is

03:45 the big European privacy law in 2018 that it went into effect, we definitely saw kind of

03:51 a growing trend here in Europe.

03:53 Overall, I would say like actually France and the Netherlands have done quite a lot of good

03:59 work, even Ireland at questioning, let's say larger tech usage.

04:04 But the activism, I would say the on-the-ground activism here in Germany from the Chaos Computer

04:10 Club and other types of activists that are here has been quite strong, which is exciting

04:16 to see.

04:16 And therefore, I think it kind of ends up being in the headlines, maybe a bit more internationally.

04:22 Yeah.

04:23 Also actually a fun fact that the story I always like to tell Americans is that like,

04:27 if you go on Google Street View here in Berlin, it's an awesome, I don't know, time travel

04:31 because the data, it's like over 10 years old now.

04:34 So you can be there.

04:35 And Berlin's heavily gentrified right now.

04:36 So you can really say, wow, how did my neighborhood look 10, 12 years ago?

04:40 Because Google did it once.

04:41 They never came back because everyone wanted their buildings pixelated.

04:45 And they were like, okay, fuck this.

04:47 Germany's so difficult.

04:48 We're never going to send our cars to here again.

04:50 So, you know, I definitely encourage you to use Google Street View in Berlin.

04:54 It's really fun from like the historical perspective.

04:56 How funny.

04:57 Yeah.

04:58 So you can go and basically say, I want my house fuzzed out.

05:02 So you can't see details about my personal residence.

05:05 Yeah.

05:05 And a lot of buildings will look like that.

05:06 Yeah.

05:06 Yeah.

05:07 Yeah.

05:07 If I went to my place on Google Maps, you can see it evolve over time.

05:11 Like, oh, that's when I still have that other car before it broke down or crashed or whatever.

05:17 And I can sort of judge how old the pictures are by, you know, is it what season is it?

05:22 What's in the driveway?

05:24 Or what's the porch look like?

05:26 What kind of, you know, chairs do we have?

05:28 There's all sorts of detail.

05:29 Like, none of it's obscured, right?

05:32 There's a fun fact that some researchers worked on in the U.S. of could they do the census just via Google Street View?

05:40 And they found there was a heavy correlation between census data and the makes of cars that people had in their driveway.

05:46 Oh, my goodness.

05:48 Wow.

05:48 It's an interesting paper.

05:50 Yeah.

05:50 I think, actually, Timnit Gebru might have been on that paper as well.

05:54 The kind of very well-known machine learning ethics researcher who's now running her own organization in the space.

06:00 Anyways, it's a really cool paper.

06:02 I'll see if I can find it and send it to you for the show notes, Michael.

06:04 Yeah.

06:05 Yeah, put it in the show notes.

06:06 Awesome.

06:06 All right.

06:07 Well, congratulations on the ThoughtWorks thing.

06:09 That's really cool.

06:09 Ines, how about you tell people about yourself as well?

06:12 It's been almost a year, I think, maybe since I had you on Talk Python.

06:16 Yeah, I think it was the year in review.

06:18 I was in Australia at the time.

06:20 It was summer.

06:21 Exactly.

06:22 Now I'm in Berlin.

06:23 Now it's winter.

06:24 Cool.

06:24 Yeah.

06:25 I'm still the co-founder of Explosion.

06:26 We're probably most known for our open source library spaCy, which is an open source library for natural language processing.

06:33 And one of the main things people do with our stack is build NLP and machine learning applications.
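For listeners who haven't tried it, here is a minimal sketch of what spaCy usage looks like. It assumes the small English pipeline en_core_web_sm has been downloaded separately, and the example sentence is just an illustration, not something from the episode.

```python
# Minimal spaCy sketch: load a small English pipeline and print the named entities it finds.
# Assumes the model was installed first with: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Explosion is a software company based in Berlin.")

for ent in doc.ents:
    # Each entity's text and its predicted label (e.g. ORG, GPE)
    print(ent.text, ent.label_)
```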

06:39 We also published an annotation tool called Prodigy, which allows creating training data for machine learning models.

06:44 And all our work and everything we do is very focused on running things on your own hardware, data privacy.

06:50 And that's also something that's very important and something that we see our own users and customers do.

06:55 So people want to train their own models and actually think about how do I create my data?

07:00 What do I do to make my model good?

07:02 What do I do to make my application work?

07:04 And so this all ties in quite well with other questions about like, okay, what should I do?

07:09 How should I reason about my data?

07:12 Which we also see as a very, very important point.

07:14 And I actually think this can even prevent a lot of problems that we see.

07:17 If you actually just look at your data, think about what do I want my system and my pipeline to do?

07:23 How do I break down my problem?

07:25 And that's kind of, that's exactly what, yeah, the tools we're building, hopefully helping people to do.

07:30 Yeah, fantastic.

07:31 So you have spaCy and then you have some, that's the open source thing.

07:34 You also have some products on top of that, right?

07:36 Yeah, exactly.

07:37 So we have Prodigy.

07:38 If you scroll down a bit here, you'll see it on the page.

07:41 Yeah, exactly.

07:42 Prodigy, that's an annotation tool.

07:44 And we're currently working on an extension product for it that is a bit more like a SaaS cloud tool, but has a private cloud aspect.

07:52 So you can bring your own hardware, you can run your code, your data on your own cloud resources.

07:57 So no data, nothing has to leave your service.

08:00 And that's something that people already find very appealing about Prodigy.

08:03 You can just download it, run it.

08:04 It doesn't send anything across the internet.

08:09 And yeah, that's also what we're going to keep doing.

08:11 Yeah, I love that you all are embracing that because there's such, you know, we'll get into this later.

08:16 Not the first topic, but it's related.

08:18 So I'll just talk a bit about it.

08:19 I really like that you're not sending the people's data back because if they're going to trust your tools,

08:25 they need to know that you're not sharing data that is either part of their competitive advantage

08:31 or they have to protect for privacy reasons.

08:33 I recently started, got into the GitHub Copilot trial and I installed that and it said, or preview, whatever it's called.

08:42 And they said, oh, you just have to accept this agreement where it says, if you generate code from us, we're going to get some analytics.

08:51 I'm like, all right, that's fine.

08:52 Whatever.

08:52 I ask it how to connect to SQLAlchemy because I forgot, and it'll just tell me.

08:55 Oh, and if you make any edits, we're going to send those back.

08:58 I'm like, wait a minute.

09:00 What if one of the edits has put my AWS access key in there?

09:03 Because it needs to be there, you know, for a little thing that I'm not going to publish, but it's still going back.

09:08 Right.

09:09 So there's a lot of things and I just uninstalled.

09:11 I'm like, you know what?

09:12 No, this is, this is just too much risk for too little benefit for me in my world.

09:17 Yeah.

09:17 I think we also see a lot of it is also kind of pointless.

09:20 I think there used to be this idea that like, oh, you can just like collect all the data.

09:25 And then at some point you can do some magical AI with it.

09:28 And I think for years, this used to be the classic pitch of every startup.

09:31 Like, I don't know, it's almost used to be what we work pitched in some way.

09:34 It's usually like, oh, we do X and then we collect data and then it's dot, dot, dot.

09:39 And then it's AI and then it's profit.

09:41 And that used to be how people would like map out their business.

09:45 And I think this has changed a bit, but I think you can still see some of the leftovers where,

09:49 you know, companies are like, oh, we might as well get as much data as possible because

09:53 maybe there's something we can do with it.

09:55 And we've always seen working in the field that like, nah, I don't want, I don't want

09:59 new annotations.

10:00 Yeah.

10:00 Like, there's literally no advantage I get from that.

10:04 So I might as well set up the tools so that you can keep them.

10:07 That's perfect.

10:08 And Catherine, Mr. Hypermagnetic says, hi, I thought kjamistan was a hip new tech stack.

10:14 No, it's my company name.

10:17 So yeah, yeah, yeah, yeah.

10:19 It's like a good inside joke that lives on many, many decades later.

10:24 I love it.

10:25 All right.

10:25 Well, let's kick things off with some regulation and then we can go and talk more thing about

10:31 maybe some other laws.

10:33 We could talk about things like copilot and other stuff.

10:35 But I did say this was a European focused, at least kickoff to the show here.

10:42 So I think one of the biggest tech stories out of the EU in the last couple of years has got

10:48 to be GDPR, right?

10:49 Like that's, I still heard just yesterday people talking about, well, this because of GDPR and

10:55 because we're doing that.

10:56 And this company is not right.

10:57 There's still just so much awareness of privacy because of GDPR.

11:01 And I do think there are some benefits and I think there are some drawbacks to it, but

11:06 it certainly is making a difference, right?

11:08 And so now there's something that might be sort of an equivalent on the AI side of things,

11:15 right?

11:15 I mean, not exactly the same, but that kind of regulation.

11:18 Yeah, yeah.

11:19 It's interesting because it's been a work in progress for some years, like the GDPR was.

11:24 So the initial talks for the GDPR started, I think, in like 2014, 2015, didn't get written

11:30 until 2016, went into effect in 2018 and still a topic of conversation.

11:35 Now, many years later, some pluses, some minuses, right?

11:40 We can talk about how GDPR was intended versus how we've seen it rolled out across the web,

11:45 which is quite different than what was intended, obviously.

11:48 I think that's always a problem with a lot of regulations and the EU in general.

11:53 You see, like, you know, I'm very pro-regulation and I think the idea of GDPR is great, but of course,

11:58 you know, once a large organization like the EU rolls things out, it can kind of go a bit

12:04 wrong here and there.

12:05 Let me set a little foundation about why I sort of said, I think there are some negatives.

12:09 I think the privacy stuff is all great.

12:12 I think the ability to export your data is great.

12:14 The ability to have it erased, to know what's being done with it.

12:17 These are all good things.

12:18 I feel, though, that it disproportionately was difficult for small businesses, but was

12:24 aimed at places like Facebook and Google and stuff.

12:27 So, for example, for like my courses and stuff, I had to stop doing anything else for almost

12:33 two weeks and rewrite a whole bunch of different things to comply.

12:36 And to the best of my knowledge, I'm doing it right.

12:40 But who knows?

12:41 Whereas Facebook didn't shut down for two weeks.

12:44 You know, they had a small team who did it, right?

12:46 Yeah.

12:47 Well, no, they had quite a bit of internal engineering work.

12:51 Only as a percentage.

12:52 Only as a percentage of total employees.

12:54 I mean, small.

12:54 But they actually had to shut down several products that are no longer available in Europe

12:59 that are available in other jurisdictions.

13:01 And also, when we look at who's been fined, it's been predominantly the FANGs and other

13:06 large operators.

13:07 I do think that the enforcement is focused on the FANG side of things.

13:12 Yeah.

13:12 Which is fair.

13:13 It's like, you know, which is basically what most folks said when it went into enforcement

13:17 is like, yes, we believe these are things that everybody should be doing to better look

13:22 after the security of sensitive data, regardless of, you know, the provenance, so to speak.

13:27 But also, we intend to employ this legislation to look after these problems, right?

13:34 And everything that Max Schrems has been doing, he's based here in Germany, and he's been

13:40 filing quite a lot of amazing lawsuits against a variety of the FANGs and been getting some

13:46 interesting rulings, let's say, from the European courts.

13:50 Yeah, good.

13:50 Ines, how was the GDPR for you?

13:52 Before we get to the next law, at Explosion, was it a big deal?

13:56 Not so much, because I think actually already our standards were such that we weren't really

14:01 doing anything that was violating or intended to violate what then became GDPR.

14:06 I think it was just, you know, we had to go through some things to make, I don't know,

14:10 we've always intended to not have any cookies on our sites.

14:14 So we were good in the first place.

14:16 And then, you know, it's actually a lot of work to make sure that nothing you use tries

14:21 to sneak some cookies in there.

14:22 And then you're like, ah, I used the wrong URL here.

14:25 Now I have all these YouTube cookies again.

14:27 But in general, you know, we were already, even before it came up or before we, you know,

14:34 GDPR really came out, we realized that, oh, we're actually quite compliant or like, at least

14:38 we already aim to be compliant.

14:40 We didn't have to do very much.

14:41 I think I was too, in terms of principle, but not exactly in practice.

14:47 There were those types of things.

14:48 Like, for example, I had the Disqus comment section at the bottom of all the pages.

14:53 And then I realized they were putting DoubleClick cookies and Facebook cookies and all

15:00 sorts of stuff.

15:00 I'm like, wait a minute.

15:01 I don't want people who come to my page to get that, but I'm not trying to use it.

15:05 It's like this cascading chain.

15:06 And yeah, like embedding YouTube videos.

15:09 We go to a lot of work to make sure that it doesn't actually embed YouTube.

15:13 It does a thing that then allows you to get to the YouTube without that.

15:16 It's like that kind of stuff, right?

15:18 But still, I think it's good.

15:19 I think it's pretty good in general.

15:21 But let's talk about machine learning.

15:23 AI stuff.

15:24 So I pulled out some highlights here.

15:27 Let me maybe throw them out and you all can react to them.

15:30 So we've got this article on techmonitor.ai called, The EU's leaked AI regulation is ambitious, but disappointingly vague.

15:40 New rules for AI in Europe are simultaneously bold and underwhelming.

15:44 I think they interviewed different people who probably have those opinions.

15:49 As you could see through the article, it's not necessarily the same person

15:52 who holds both of those at once.

15:53 But this was leaked on April 15th of this year.

15:57 And I think seven days later or something, the actual thing was published.

16:00 So it's not so much about the leak, just that the article kind of covers the details, right?

16:05 This is still not unknown, is it?

16:07 No, no, no, no.

16:08 The full text is available.

16:09 And there's been a lot of good kind of deeper analysis from a variety of perspectives.

16:15 This portion of Talk Python To Me is brought to you by Sentry.

16:21 How would you like to remove a little stress from your life?

16:23 Do you worry that users may be encountering errors, slowdowns, or crashes with your app right now?

16:29 Would you even know it until they sent you that support email?

16:32 How much better would it be to have the error or performance details immediately sent to you,

16:37 including the call stack and values of local variables and the active user recorded in the report?

16:43 With Sentry, this is not only possible, it's simple.

16:46 In fact, we use Sentry on all the Talk Python web properties.

16:50 We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email.

16:56 And that was a great email to write back.

16:58 Hey, we already saw your error and have already rolled out the fix.

17:02 Imagine their surprise.

17:03 Surprise and delight your users.

17:05 Create your Sentry account at talkpython.fm/sentry.

17:09 And if you sign up with the code talkpython, all one word, it's good for two free months of Sentry's business plan,

17:16 which will give you up to 20 times as many monthly events as well as other features.

17:21 Create better software, delight your users, and support the podcast.

17:25 Visit talkpython.fm/sentry and use the coupon code talkpython.

17:30 Catherine, you want to give us a quick summary of what the goal of this is?

17:36 Yeah, so I think, I mean, when I first got wind that this was going to be happening,

17:42 I was talking with some folks at the Bundesministerium des Innern, which is basically the German interior ministry here.

17:52 So you could think of like the, if we had a, in the U.S., sorry, U.S.-centric, in the U.S., if we had like an office of Homeland Security and the interior,

18:02 and they were all together, and they like also did like FTC-like things, that's what it would be.

18:07 Anyways.

18:08 And they have a group called the Datenethikkommission, which is the Data Ethics Commission, and they had built several large reports on thinking and analyzing about the risk of algorithmic-based systems and algorithmic-based decision-making,

18:21 which has been a topic of conversation, obviously, for a long time.

18:25 Eventually, all of that, what I found out was that they were talking then with other groups in the EU about forming a regulation like this.

18:33 And if anybody wants to read the German Datenethikkommission report, which also is available in English,

18:40 you can see that a lot of the ideas are kind of taken and transferred there,

18:44 which is basically like when we think about AI systems, can we analyze the level of risk that they would have, let's say, in use in society?

18:55 So you can think of very high risk being like bombing people and very low risk.

19:00 Bombing people like drones or self-flying planes.

19:04 Absolutely.

19:05 I mean, we have drones that bomb people.

19:07 Yes, it's true.

19:08 That's the thing that happens in the world.

19:09 That is the thing.

19:10 But what is less common is that you just send the drone out and say, go find the, quote, bad people and take care of them, right?

19:17 There's still usually a person somewhere that makes a decision.

19:21 And so I don't think we want a world where we just send out robots to take care of stuff and just tell them to go.

19:27 Some people want that because it can be very nice to like, you know, absolve yourself of that responsibility.

19:34 Because if you're the one pressing the button, you have to answer for that.

19:38 And you have to take accountability.

19:40 If the machine did that, well, I mean, it's kind of like this problem of,

19:44 okay, whose fault is it if the self-driving car kills?

19:47 Yeah.

19:48 Yeah.

19:49 There's a really great psychological theorem around that too called the moral crumple zone,

19:54 which is basically talks about how the nearest human to an automated system gets blamed.

20:01 So, yeah.

20:03 So, you know, it's like, well, why didn't you do something?

20:06 I don't know.

20:06 The computer said yes.

20:08 So, you know, it's interesting psychology that we use to like judge people.

20:14 You should have done something.

20:15 Yeah.

20:16 And I do think actually in the self-driving car example, I think a lot of people would actually say,

20:19 yeah, the developer who built that system that made the decision to drive forward and not stop is to blame.

20:26 So, you know, that does check out.

20:28 Yeah.

20:28 It could be.

20:28 I really like the crumple zone analogy.

20:30 I don't know if I'm receiving it correctly, but, you know, if you're in a car crash, like the radiator is going to get smashed straight away.

20:37 Like that's the first thing when it caves in.

20:39 But the driver back in the middle might be the one who did it in the software world.

20:44 Maybe the equivalent is, yeah, the developer made that choice, but they made that choice because the CEO and the manager said, we're optimizing for this and we don't care.

20:54 We want to ship faster or we want to make sure this scenario is perfect.

20:58 And they're like, you know what?

20:59 That's going to have a problem when it gets snowy and we can't see this line.

21:02 Like, you know what?

21:02 This is what we're aiming for.

21:04 And like they don't necessarily make them do it, but they say this is really where you got to go.

21:09 So crumple zone, I like it.

21:10 Yeah.

21:11 Yeah.

21:11 And actually, I think just to make it clear, the crumple zone was like that the driver would get blamed rather than the company that produced the software.

21:19 And so like, or the operator of the radiology machine would get blamed, like rather than the producers, because you kind of create this like trust, inherent trust, like, well, you know, like they're building a self-driving car.

21:33 Clearly, it's not their fault.

21:34 It's the driver's fault.

21:35 Why weren't you paying attention or something like this?

21:37 Right.

21:38 Yeah.

21:38 Yeah.

21:38 Yeah.

21:39 Absolutely.

21:40 So really quickly, I just want to say none of us are lawyers.

21:45 So don't take any of this advice and go do legal things.

21:49 Talk to real lawyers.

21:50 But I do want to talk about the law.

21:52 And so one of the things the article points out is these rules apply to EU companies and those that operate in the EU.

21:58 And then, way more broadly for tech companies, those that impact EU citizens.

22:03 Right.

22:04 So if you have a website and EU citizens use it or you have an app and EU citizens use your API and it makes decisions, probably this applies to you as well.

22:12 I'm guessing.

22:13 Yeah.

22:13 We'll see in practice how it gets rolled out.

22:16 But yeah, it's always about the case law afterwards.

22:19 But in theory, yes.

22:21 Yeah, yeah, yeah.

22:22 And it's about it's mainly about the, you know, documenting risk is, I guess, what I would say, like documenting and addressing risk.

22:29 So one interesting thing about it, and I'd be curious to hear both of y'all's thoughts around it, is kind of bringing to the forefront the idea of auditing AI systems and what should be done to better audit and document problems in automated systems like AI systems or machine learning systems.

22:49 That I find quite interesting.

22:51 Would be curious to hear y'all's take.

22:53 Yeah.

22:53 I mean, I think even fundamentally, I think that's also something that's pointed out in the article is that there's already this problem of like, how do you even, you know, define what, when AI system starts, where it ends, what is the system?

23:05 Is it the, just the component?

23:07 Is it the model by itself?

23:09 The same model can be used in all kinds of different use cases.

23:12 And some of them can be bad and some of them can be good.

23:14 So it has to be the larger component.

23:17 But then, you know, where does AI really start? You could have a rule-based system that does the exact same thing.

23:22 Is that, like, exempt, even if the outcomes are pretty much identical?

23:26 I think that's already where it gets pretty problematic.

23:29 And where I think you also probably see a lot of people being able to get away with things.

23:34 Yeah.

23:34 The law seems to try to characterize the behavior of the software rather than the implementation or the technology of it.

23:42 Right.

23:42 It doesn't say, you know, when a neural network comes up with an outcome or something along those lines.

23:49 And they talked about how the idea is to try to make it more long-lived and more broadly applicable.

23:55 But also, you know, that could result in ways to sneak around it.

24:00 You know, technically it's still doing what the law is trying to cover.

24:04 But, you know, somehow we built software that doesn't quite line up.

24:07 Yeah.

24:07 To get back to this auditing question, I do think this is, it's definitely a very interesting part of it.

24:12 And I do think that it also shows that stuff like interpretability will become much more relevant and interesting, even more relevant going forward.

24:20 Because I do think if, you know, we end up with laws like that, companies will have to be able to either just explain internally why did my system make a certain decision.

24:30 Or maybe even this has to be communicated to the user or to, you know, to the citizen in a similar way of how with GDPR you can request your data.

24:40 Maybe you can be able to request more information on why a certain outcome was produced for you or which features of you the system is using.

24:50 And I think that all that does require.

24:52 The feature seems completely doable.

24:54 It seems entirely reasonable to say, well, we used your age, your gender and your income and your education level to make this decision.
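As a rough sketch of that "which features did we use" reporting, here's one way to get at it with scikit-learn's permutation importance on a toy model. The dataset and feature names are invented for illustration, and this gives global feature influence rather than a true explanation of any single decision.

```python
# Sketch: report which input features a trained model leans on most.
# The dataset and feature names are invented for illustration only.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
feature_names = ["age", "income", "education_level", "account_tenure"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Shuffle each feature in turn and measure how much the score drops.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```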

25:01 I think maybe more tricky is why did it make a decision?

25:07 I mean, you two know better than I do, but it seems really, I can sit down and read code and say, well, there's an if statement that if this number is greater than this, we do that.

25:15 But it's really hard to do that with neural networks, right?

25:18 Yeah.

25:18 And machine learning gets tricky because machine learning is always code plus data.

25:21 And yeah, so it's kind of, it's this software 2.0 idea where even, you know, testing a system is much more difficult.

25:27 Like if you're just writing straightforward Python code, you can then write a test and you can say, oh, if this goes in, this should come out.

25:33 And if that's true, then yeah, you have a green test and then exactly.

25:36 Exactly.
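For anyone picturing it, that "this goes in, this should come out" style of test looks something like the sketch below; normalize_whitespace is a made-up example function, but the shape is the plain pytest pattern being contrasted with ML evaluation.

```python
# The deterministic kind of test described above: fixed input, one expected output, every time.
# `normalize_whitespace` is a made-up example function for illustration.
def normalize_whitespace(text: str) -> str:
    return " ".join(text.split())


def test_normalize_whitespace():
    # Given this input, exactly this output should come out.
    assert normalize_whitespace("  hello   world \n") == "hello world"
```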

25:37 But I mean, it kind of is a testament to the fact that we don't really test machine learning systems today very well.

25:44 Like we have very, you know, early in the whole ML upside of the equation.

25:50 And I think one of the things, so first off is a lot of these audits, people think they're going to be self-assessments.

25:57 So leave that as a question mark of how a self-assessment is going to work at any type of scale.

26:03 But then also they put forward, one thing that I really liked about this law is they actually put forward things that people should be testing for, like security of the models, like privacy of the models, like the interpretability of the models and so forth.

26:16 And I would say that most places that are throwing machine learning into production today do not test for any of those things before it's done.

26:25 No, I think it's mostly like, oh, some accuracy score and then maybe some feedback from a human and then off you go.

26:31 Does it seem like a reasonable outcome?

26:33 Okay, good.

26:33 It's working.

26:34 Yeah.

26:34 And I think that there's also a lot of, you know, we see this kind of disconnect if you go from academia to industry where like, you know, you can't like academia works differently.

26:42 You have very different objectives and that's great.

26:44 You're trying to explore new algorithms.

26:46 You're trying to solve problems that are really hard and then see what works best on them and, you know, rank different technical solutions.

26:53 In industry, you're actually shipping something that affects people.

26:56 And yeah, if you just apply the exact same metrics, that's just not enough.

27:00 What do you think about testing by coming up with scenarios that should pass or not pass?

27:06 For example, if you're testing some kind of algorithm that says yes or no on a mortgage for a house or something like that, just say, look, okay, I would expect that a single mom would still be able to get a mortgage.

27:21 So let's, you know, have a single mom apply to the model and see what you come up with these scenarios.

27:26 If it fails any of these, it's unfair and try to give it examples.

27:31 Is it possible?

27:31 I don't want to call it cute.

27:32 Like, you know, it's a very idealistic view and that's all very nice, but I do think, I think I personally, I see two problems with this.

27:39 One is that a lot of, you know, AI systems are usually quite different.

27:42 Not everything is as straightforward as like, oh, I have a pipeline here that predicts whether you should get a mortgage.

27:46 It's often like lots of different components.

27:48 Every company tries to solve very, very different problems.

27:51 So you can't like easily develop a framework where, you know, you have one input and one output.

27:55 Usually predicting the thing is one tiny part of the much larger application.

28:00 And then also, okay, if you have like, you know, it seems like, oh, should someone get a mortgage?

28:05 Should a private company give you a mortgage or not?

28:07 I think a lot of, you know, companies would find that, oh, maybe it's up to us whether, you know, you get a mortgage or not.

28:13 There's no, you know, there's no general framework for, and, you know, maybe with a mortgage, it's a bit different.

28:18 But there are so many applications where, yes, you can say it's really unfair, but it's still, you know, within the realms of like what a company would argue is, you know, up to their discretion.

28:29 And I'm not defending that.

28:30 I'm just saying it's very, very difficult to say, oh, you're being unfair.

28:33 It's not as straightforward as my naive example.
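To make Michael's scenario idea concrete anyway, a behavioral probe could look something like the sketch below. approve_mortgage is a hypothetical stand-in for whatever pipeline actually makes the call, and as Ines notes, real systems are rarely shaped this neatly.

```python
# Sketch of a scenario-style check written as a plain pytest test.
# `approve_mortgage` is a hypothetical stand-in; imagine a trained model behind it.
def approve_mortgage(applicant: dict) -> bool:
    """Placeholder decision function for illustration."""
    return applicant["income"] >= 40_000 and applicant["credit_score"] >= 650


def test_single_parent_scenario():
    # Two applicants identical except for family status should get the same answer.
    base = {"income": 55_000, "credit_score": 700, "dependents": 0, "single_parent": False}
    single_parent = {**base, "dependents": 2, "single_parent": True}
    assert approve_mortgage(base) == approve_mortgage(single_parent)
```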

28:36 At least in the U.S., like there's actually laws for this.

28:40 So like equal treatment or disparate treatment, we would say.

28:43 So like there's an actual mathematical, it's a statistical relationship, like 70/30 or 80/20.

28:50 I think that you can show that there's disparate treatment.

28:53 So for example, if you could prove that there's that much of a difference if you're a single mother, let's say, versus other groups and you actually have a legal court case, you can take the bank to court and you can sue them.

29:04 So there's some precedence for equal treatment, at least in some countries and some jurisdictions.

29:10 And I think, but from thinking about the mathematical problem of fairness, I mean, in all of the research that a lot of really amazing, intelligent researchers have done is they've shown that fairness definitions and the choice of fairness and how you define it can actually be mathematically diametrically opposed.

29:28 So depending on what definition you choose, and there's a whole bunch.

29:32 So Arvind Narayanan and his group in the U.S. have been doing a ton of research on this.

29:37 There's a bunch of folks that have been doing research for more than a decade on this.

29:41 All the FAT ML people, the fairness, accountability and transparency in ML conferences that run every year have been doing, again, nearly two decades of research on this stuff.

29:51 But it's not a solved problem, even if, let's say, you choose a fairness definition mathematically, you measure the model, you have met that requirement.

30:01 It doesn't mean that actually what you're trying to show in the world or what you're trying to do in the world from like how we humans would define fairness is what you've met.

30:10 Right.

30:11 So, yeah.

30:12 Yeah.

30:12 Statistics and intuition are not necessarily the same for sure.

30:16 Yeah.
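As a toy illustration of that tension, the sketch below scores the same invented predictions two ways: a disparate impact ratio in the spirit of the four-fifths (80%) rule Catherine's figure refers to, and per-group true positive rates (equal opportunity). The numbers are made up; the point is only that one check can pass while the other fails.

```python
# Toy example: the same predictions pass one fairness check and fail another.
# All numbers are invented for illustration.
import numpy as np

# Outcomes (1 = should be approved) for two groups with different base rates,
# and a classifier that happens to predict them perfectly.
true_a = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # group A: 80% qualified
true_b = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])  # group B: 50% qualified
pred_a, pred_b = true_a.copy(), true_b.copy()

# Disparate impact ratio (demographic parity flavor): approval rate of B over A.
di_ratio = pred_b.mean() / pred_a.mean()
print(f"disparate impact ratio: {di_ratio:.2f} (often flagged if below 0.8)")

# Equal opportunity: true positive rate among the genuinely qualified, per group.
tpr_a = pred_a[true_a == 1].mean()
tpr_b = pred_b[true_b == 1].mean()
print(f"TPR group A: {tpr_a:.2f}, TPR group B: {tpr_b:.2f}")
```

Here a perfectly accurate model satisfies equal opportunity (both true positive rates are 1.0) yet fails the disparate impact check, simply because the groups' base rates differ, which is the kind of incompatibility the fairness researchers mentioned above have formalized.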

30:16 Catherine, you had an interesting comment about fairness is not the only metric before we started recording.

30:23 Oh, yeah.

30:24 You want to share that?

30:24 Yeah.

30:25 I mean, the question that I like to ask people is, let's say you're building a computer vision system to bomb people, to identify people and bomb the right targets.

30:37 If you said it performed fairly, let's say, in relation to gender or in relation to skin color, to the darkness of your skin color, would that be an ethical or fair system?

30:48 Yeah.

30:49 It's hard.

30:50 It's not.

30:50 It's certainly not an easy answer.

30:52 Yeah.

30:52 Yeah.

30:52 So there's more to it.

30:54 So one of the things that's interesting about this law is that it talks about high risk AI systems and it refers to those pretty frequently through there.

31:04 So high risk AI systems include those used to manipulate human behavior, conduct social scoring or for indiscriminate surveillance.

31:13 Those are actually banned in the EU, according to this law.

31:17 Right.

31:18 I mean, I guess by reading it, you can tell who this was written for and who, you know, they had in mind when they wrote that.

31:27 I think it's quite, you know, it's quite clear what, what types of applications and what types of companies.

31:32 Yeah.

31:32 Yeah.

31:32 The social scoring stuff is really creepy, but yeah.

31:35 Indiscriminate surveillance also.

31:37 And then it also talks about how special authorization will be required for remote biometric identification.

31:46 This is, I'm guessing, types of biometric identification that you don't actively participate in.

31:52 Right.

31:52 You don't put your fingerprint on something, but you just happen to be there.

31:56 They call out specifically facial recognition, but I've also heard things like gait, the way that you walk and weird stuff like that.

32:03 So it's not banned, but special authorization will be required.

32:06 Oh yeah.

32:07 You're typing too, right?

32:08 Yeah.

32:08 Even your, your typing pattern is quite, or is more unique than you think.

32:13 Yeah.

32:13 Yeah.

32:14 Yeah.

32:14 Yeah.

32:14 Yeah.

32:14 Actually, even, even more than sort of old school fingerprinting.

32:17 I'm always like amazed at what can be done to like uniquely identify you on the internet,

32:22 even without having any personally identifiable information.

32:28 This portion of Talk Python to me is brought to you by SignalWire.

32:31 Let's kick this off with a question.

32:33 Do you need to add multi-party video calls to your website or app?

32:36 I'm talking about live video conference rooms that host 500 active participants, run in the browser, and work within your existing stack, and even support 1080p without devouring the bandwidth and CPU on your users' devices.

32:49 SignalWire offers the APIs, the SDKs, and edge networks around the world for building the realest of real-time voice and video communication apps with less than 50 milliseconds of latency.

32:59 Their core products use WebSockets to deliver 300% lower latency than APIs built on REST, making them ideal for apps where every millisecond of responsiveness makes a difference.

33:10 Now, you may wonder how they get 500 active participants in a browser-based app.

33:14 Most current approaches use a limited but more economical approach called SFU, or Selective Forwarding Units,

33:20 which leaves the work of mixing and decoding all those video and audio streams of every participant to each user's device.

33:26 Browser-based apps built on SFU struggle to support more than 20 interactive participants.

33:32 So SignalWire mixes all the video and audio feeds on the server and distributes a single unified stream back to every participant.

33:39 So you can build things like live streaming fitness studios where instructors demonstrate every move from multiple angles,

33:45 or even live shopping apps that highlight the charisma of the presenter and the charisma of the products they're pitching at the same time.

33:52 SignalWire comes from the team behind FreeSwitch, the open-source telecom infrastructure toolkit used by Amazon, Zoom,

33:59 and tens of thousands of more to build mass-scale telecom products.

34:03 So sign up for your free account at talkpython.fm/signalwire, and be sure to mention Talk Python to me to receive an extra 5,000 video minutes.

34:12 That's talkpython.fm/signalwire, and mention Talk Python to me for all those credits.

34:16 Another thing that stood out to me I thought was fun is that people have to be told when they're interacting with an AI system.

34:27 So you have to explicitly say, hey, this thing that you're talking to here, this one, you're not talking to a human right now, you're talking to a machine.

34:36 Yeah, we'll see if it gets rolled out like cookies.

34:41 It's like a big blurb of text that says there may or may not be automated systems that you interact with on this product or something.

34:49 Yeah.

34:50 Yeah, I think it's almost like, I don't know, like a classic disclaimer, because I think the way it's written probably makes people think more about conversational AI,

34:57 but I do think this also covers everything else.

35:00 And if you use some component that does some arbitrary predictions in order to, I don't know, this can go like 20 levels deep,

35:07 then you need this disclaimer on it.

35:08 And then, unfortunately, I think the unavoidable side effect of that will be that, okay, people are less likely to notice it,

35:14 because it will have to be on everything.

35:16 Like, you know, even really small features you might want to have on your website that do use AI in some form or another.

35:22 It seems totally reasonable.

35:24 Maybe unnecessary, but certainly reasonable.

35:26 But yeah, you're right.

35:28 It's going to be like the cookie notices.

35:30 You know, so if you go to say on Netflix, and you go to watch a movie, well, that list of recommended for you, do you have to like, okay, this?

35:39 I'm not sure if I can find it.

35:41 It will just be in Netflix's, like, terms and conditions.

35:43 When you accept those terms and conditions, which most people probably don't read, you will accept that yes, everything you're interacting with is AI.

35:49 By the way, here's a very long 20 page document about how we may or may not use automated systems in your use of this website.

35:57 Yeah, exactly.

35:58 Exactly.

35:59 Exactly.

36:01 Have fun.

36:03 Let me know how the reading goes.

36:05 I said I thought there was a lot of stuff that came out of the GDPR that was pretty good.

36:10 This website may use cookies.

36:12 That to me is the worst.

36:14 It's the worst.

36:15 It's like, do you want to be able to log in?

36:17 Or do you want to not be able to log in?

36:20 Well, I want to be able to log in.

36:21 Okay, we've got to use cookies.

36:22 So I actually got this thing called I Don't Care About Cookies as a browser extension.

36:28 And if it sees that, it tries to agree to it automatically.

36:31 Like, on every site I visit.

36:32 Just to cut down on the cookie pop-ups.

36:34 Okay.

36:35 Okay.

36:35 By the way, this was by no means the intention of the law.

36:39 Just to make it clear to everyone.

36:41 It's important to bring that up.

36:43 Because also, you know, especially I think, yeah, from the European perspective, you know,

36:46 I'm genuinely a fan of GDPR.

36:48 And then often, yeah, people go like, oh, it's all these cookie pop-ups.

36:51 You know, it's like, yeah, no, that wasn't, that's not GDPR.

36:54 That's just.

36:58 Didn't the cookie pop-ups predate the GDPR?

36:58 It was mainly, so some people did it before, but it was deeply rolled out for GDPR because

37:04 there's all this compliance.

37:10 Now, a tip for folks: somebody got sued.

37:16 It was Google or Facebook, because it was too hard to just do the least possible.

37:16 So now if you're, if you haven't installed this extension, there's usually a big button

37:22 that says legitimate interest.

37:24 And you can just press that one.

37:25 It's the least amount.

37:26 Yes.

37:27 Does it usually now involve two clicks rather than one?

37:30 Yeah.

37:30 I wonder if this extension does it, because I think as far as I know, it's also, if you

37:35 get to a set, if it offers you to go to a settings page, everything has to be unchecked

37:39 by default.

37:39 So it's actually, yeah, it's actually quite convenient.

37:42 You just go to the button that's not accept all, and then you accept whatever is there, and

37:48 then you get nothing.

37:49 It might be this big and the same color as the page.

37:53 So just be like really looking hard.

37:55 You can only see it if there's a little, like, a shadow.

37:57 Which just talks about, I don't know if you've talked about this on the podcast recently,

38:02 but I've been reading a lot about dark patterns and like dark patterns and privacy are like

38:07 in a very deep love relationship on the internet of like, no, you really want to give us all

38:14 your data.

38:14 You're going to be so sad if you don't.

38:17 Yeah.

38:17 Yeah.

38:18 The dark patterns and the lack of privacy.

38:20 Yeah.

38:20 Yeah.

38:21 Yeah.

38:21 Absolutely.

38:21 The same color as foreground and background that ties into another compliance.

38:25 thing, which is accessibility.

38:26 And at least if like in the US, you can get sued for having a not accessible website.

38:31 Even companies will at least not do this.

38:34 Even if it's only if they don't care about anyone accessing their website and all they

38:37 care about is not getting sued.

38:38 You won't have buttons with the same foreground and background color anymore.

38:42 Yeah, indeed.

38:43 And I'm okay with my, I'm okay with cookies, little clicker thing, because I also have

38:49 like network level tracking blocking.

38:52 So, so that if then they say, sure, well, fine.

38:56 Here's your Facebook cookie.

38:57 It's like, no, it's blocked.

38:58 So like, it's my weird setup.

39:00 So Vincent out in the audience says, just to mention Rasa, a Python tool for open source

39:06 chatbots took the effort for writing down some ethical principles of good design.

39:11 And one of those is that it lists a conversational assistant should identify itself as one.

39:17 Hi, Vincent.

39:18 Yeah.

39:18 No, Rasa, definitely.

39:19 Rasa is a great open source library as well.

39:21 We're kind of, you know, friends with spaCy.

39:24 It's kind of the same ecosystem.

39:25 No, I think it's, yeah, I think this is a good principle.

39:28 Actually, often when I use like a, like a chat, a bot or whatever, and I'm not sure it's

39:32 a chat bot, they're like a lot of, you know, things and ways of things you can write to

39:36 check if it's a human or not, because they're like, you know, there's certain things where

39:40 like, usually these things are quite bad at like resolving references.

39:42 So, you know, if you use like a pronoun or like you, to refer to something you previously

39:47 said or a person or something like, there are a lot of things that often these things are

39:51 quite bad at.

39:52 And if you, if you vague enough, a human will always get it, but like a machine might not.

39:57 So if you use your text processing ML skills and experience for good use, that's right.

40:04 Yeah.

40:04 No, because there was a, there was a case where I was like, this person, this, this agent is

40:08 so incompetent.

40:09 It must be like, it must be like a machine.

40:11 And then it's actually, it would be a pretty good chat bot because you know, the chat bot passed

40:15 as an incompetent human, but no, it turned out it was just an incompetent human.

40:19 Oh no.

40:20 Yeah, that's true.

40:22 The chat bots are very bad at, at retaining, like building up a state of the conversation.

40:27 It's like, they see the message and then they respond to it.

40:30 It's like you ask a question and then it does, did you say, what exactly is this about?

40:36 Well, I said it was about this above.

40:38 So what do you think it's, you know, it's, it's just those kinds of tie-ins that they don't

40:42 carry on.

40:42 Yeah.

40:43 These systems are getting, getting better at this, but there's just some like, you know,

40:47 if you really try to be as vague as possible, you can like trick them and then you find

40:51 out if it's a human or not.

40:52 Yeah, exactly.

40:53 So let's see some more things about the law is over here.

40:57 The military, AI in the military is exempt.

41:00 So that's not a surprise.

41:02 I mean, there's probably top secret stuff.

41:05 How are you going to submit that?

41:06 I don't know.

41:07 Yeah.

41:07 Yeah.

41:08 But then it's like, oh, and you know, a lot of the worst things that happen, happen in

41:11 this context.

41:12 So it's like, you know, as like, you know, as a little anecdote, for example, we've for

41:16 a long time, we've always had, we've had exposure.

41:17 We've had this policy that we do not sell to organizations who are primarily engaged in

41:23 government, military or intelligence, national security, because, and our reasoning for that

41:28 has always been that, well, in the free market, you have a lot of other ways that companies

41:33 and applications can be regulated by regulations like this, but also just by market pressures

41:38 and by things just being more public.

41:40 All of these things you do not have if the work is military, intelligence, or certain government

41:45 work.

41:46 So we see that as very problematic because you have absolutely no idea what the software

41:50 is used for, and there's absolutely no way to regulate it ever.

41:53 And then we'd say, okay, that's not what we want to sell our software to.

41:58 But the other use cases, you know, it's not, some government things are fine.

42:01 You know, we'd happily sell to the IRS and equivalents or, you know, federal reserves.

42:07 There are a lot of things that are like not terrible that are government adjacent or just

42:11 a lot of research labs as well.

42:12 But yeah, military, that's quite obvious.

42:14 Yeah.

42:14 Well, and then you think like how many companies that work on machine learning today that focus

42:19 on selling explicitly into the military.

42:23 And it's like, well, are they exempt?

42:25 Because just, you know, basically, for instance, is Palantir exempt from this?

42:30 Interesting, right?

42:31 Because yeah, the law would otherwise apply to them, but sort of indirectly.

42:36 So you're asking about the transitive property, basically.

42:39 Yeah.

42:40 Yeah.

42:41 Like, well, it's only in used in military use, so it's probably okay.

42:45 It's probably exempt or whatever.

42:47 Yeah.

42:47 Yeah.

42:48 Well, I guess if you can make the case that it's, you know, it's classified, that that's

42:52 probably what companies like that will have, you know, the means.

42:54 They would make sure that every project they're taking on, it's classified in some way.

42:59 And then, you know, they get a brand bad.

43:00 Yeah.

43:01 That's probably true.

43:02 All right.

43:02 Another thing that I thought was interesting.

43:04 So all the stuff we talked about so far is sort of just laying out the details.

43:09 On the imprecision and subjectivity side, one of the quotes was, one area that raised eyebrows is part of the report, which reads,

43:17 AI systems designed or used in a manner that exploits information or prediction about a person

43:23 or group of persons in order to target their vulnerabilities or special circumstances,

43:27 causing a person to behave or form an opinion or take a decision to their detriment.

43:32 Yeah.

43:33 That sounds like a lot of big tech, honestly, like a lot of the social networks,

43:37 maybe they even suggest maybe that's even like Amazon shopping recommendations, right?

43:42 Encourage you to buy something that you don't need or whatever.

43:46 What do you think about that?

43:47 Yeah.

43:47 I mean, I guess it's quite vague and it's like, okay, how do you define, you know,

43:51 we have to wait for like the actual cases to come up and someone making the case that like,

43:55 oh, I don't know, my wife divorced me because X happened and that was, you know,

44:01 the outcome and it's clearly that company's fault and then someone can decide whether that's true or not.

44:06 And then, you know, it's quite, and, you know, of course, these are not cases that this was like designed for or written for, but like there's,

44:12 you know, it is vague to this extent where like, yes, this would probably be a legit case that a judge has to decide over and maybe the person would win.

44:19 Wouldn't it be great if they gave examples?

44:21 I didn't want to accept the cookies.

44:25 So I'm suing under the new law.

44:25 Exactly.

44:26 Yeah.

44:27 That made me feel bad.

44:28 Yeah.

44:28 But I mean, I think some of it is like really feel like the conversation here.

44:34 And I'd be curious about your opinion on the conversation in the U.S.

44:37 is around kind of the political ad manipulation and the amount, let's say,

44:43 when we think about topics like disinformation and misinformation, the amount of kind of algorithmic use of, let's say, opinion pieces to kind of push particular agendas.

44:54 And when I read this, I'm guessing that's like one of the things they had in mind.

44:59 Misinformation and fake news and all that kind of stuff is what popped to mind when I saw this.

45:07 Yeah.

45:07 And I was also thinking of recommendation systems and like even, I mean, I don't know,

45:12 not even fake news, but like, okay, you can, you know, manipulate people into, you know,

45:17 joining certain groups.

45:18 Yeah, exactly.

45:19 Yeah.

45:20 You're a relatively just stable, normal person.

45:23 Yeah.

45:24 And then, then you, you read some posts, they suggest you join a group.

45:29 Three months later, you'll, you know, you're in the wilderness training with a gun or something.

45:34 Like, it's just, it's so easy to like send people down these, these holes, I think, you know,

45:39 on a much more relatable note, I would say, even though I really love YouTube, you know,

45:44 one of the sort of sayings, I think I heard it somewhere.

45:47 I don't know where it came from, so I can't attribute it, but: you're never extreme enough

45:51 for YouTube.

45:52 If you watch three YouTube videos on like some topic, like let's suppose your washer

45:57 broke.

45:57 And so you need to figure out how does my dishwasher work?

46:00 And so you watch several videos to try to fix it.

46:03 Well, your feed is full of dishwasher stuff.

46:05 And if you watch a few more, it's nothing but dishwashers.

46:07 There's a lot of other videos besides dishwasher.

46:10 So any little, like, it's almost like the butterfly effect, the chaos theory effect of like, I veered

46:15 a little bit this way and then you just, you end up down that, that channel.

46:19 One of the interesting things I think about that is I've been talking with a few folks

46:23 that like where, you know, a friend's, you know, family has been kind of like radicalized

46:29 around some of the topics that let's say are very radical online right now.

46:33 And they're like, I just don't know how it happens.

46:35 And it's kind of like, well, the internet that they're experiencing is incredibly different

46:41 than the internet that you're experiencing.

46:43 And so it's kind of like when we think about lockdown or where the internet is going to

46:48 be like a major source of people's life and then their internet is just a completely different

46:54 experience than yours based off of some related search terms across maybe four or five different

47:00 sites that have been linked via cookies or other types of information.

47:04 I mean, it's like, yeah, you can say, well, I have this experience, but if your entire world

47:12 online was different, maybe you wouldn't have the same experience.

47:15 I think it would be very hard to say what, how you would think and feel if your entire information

47:21 experience was completely different.

47:23 Don't make me think about weird alternate realities of myself.

47:26 What if just one decision was made differently?

47:28 What world would you be in?

47:30 You know, it could be really different, honestly.

47:32 No, I mean, you wouldn't even necessarily know when I think, but I think that's also kind

47:35 of a problem where I like, in that sense, I do like that it's relatively vague.

47:39 And I think laws can be vague because, you know, you don't know what's going to happen.

47:42 And you might have people who are in a situation where they don't necessarily feel like, oh,

47:45 I've been like, you know, tricked or like treated badly here.

47:49 And maybe, you know, the outcome, maybe the outcomes of the behavior are bad, but,

47:54 you know, maybe what the platform did wasn't necessarily illegal.

47:57 Like that's also the problem.

47:58 It's like, you know, a lot of the content you can watch on YouTube is legal,

48:03 and it's your right as, like, you know, a free citizen, especially, you know,

48:07 in the US, where people take this even more seriously to some degree than people in Europe.

48:13 Like you can, you can watch, like, anti-vax videos all day, and that's your right.

48:17 And nobody can keep you from that.

48:19 It's not good for you.

48:19 You can do it.

48:20 No, it's not good for you, but, like, you know, otherwise I think, yeah, with

48:25 terms that are maybe, you know, less vague in that respect, it would be much harder to actually,

48:31 you know, go after cases where, yes, the platform is clearly to blame or the platform

48:36 should be regulated, which is obviously what they had in mind.

48:39 Right.

48:39 Absolutely.

48:40 I really liked the part here.

48:42 That's like exploit information, target vulnerabilities, because it's kind of like, okay, I know these,

48:49 you know, I mean, what we saw with Cambridge Analytica and then a bunch of the targeted

48:52 stuff after that was like, we can figure out exactly how to target undecided voters of these

48:59 different racial groups in these counties.

49:01 And we can like feed them as many Facebook ads as possible.

49:04 And it's just like, wow.

49:06 Okay.

49:06 I don't think people realize that that was so easy to put together and do given like a fairly

49:13 small amount of information about a person.

49:15 So, I mean, and, and it's not personal information, right?

49:20 Because usually it's what we would call, you know, profiles of individuals.

49:24 So you fit a profile because, you know, you like these three brands on Facebook and you live in these

49:31 districts and you're this age and this race, or you report this race, or we, we can infer your race

49:37 because of these other things that you've liked.

49:39 It means, you know, it adds a lot of information that I don't think most people know that,

49:44 that you can get that specific in the advertising world.
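To make that concrete: here is a tiny, purely hypothetical Python sketch of profile-based targeting. The brands, districts, and age bands are made up; the point is only that a few coarse, "non-personal" signals are enough to carve out a very narrow audience.

```python
# Toy illustration (hypothetical data): how a handful of coarse,
# "non-personal" signals can still carve out a very specific audience.
users = [
    {"id": 1, "likes": {"BrandA", "BrandB", "BrandC"}, "district": "D7", "age_band": "45-54"},
    {"id": 2, "likes": {"BrandA"}, "district": "D2", "age_band": "18-24"},
    {"id": 3, "likes": {"BrandA", "BrandB", "BrandC"}, "district": "D7", "age_band": "45-54"},
]

# A target "segment" defined purely from profile attributes, not names or emails.
segment = {
    "likes": {"BrandA", "BrandB", "BrandC"},
    "district": "D7",
    "age_band": "45-54",
}

def matches(user, segment):
    """A user matches if they share every liked brand and both coarse attributes."""
    return (
        segment["likes"] <= user["likes"]
        and user["district"] == segment["district"]
        and user["age_band"] == segment["age_band"]
    )

targeted = [u["id"] for u in users if matches(u, segment)]
print(targeted)  # [1, 3] -- a very narrow audience from a few broad signals
```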

49:48 Yeah.

49:48 How do you ladies feel about the whole FLoC thing?

49:52 Oh yeah.

49:52 That Chrome was, Chrome was doing to replace cookies.

49:55 I mean, we wouldn't even have to have those little buttons or my add in.

49:58 It'd be a good world.

49:59 Oh yeah.

50:01 Oh yeah.

50:02 Oh yeah.

50:02 There's been a lot of important writing about FLoC and vulnerabilities in the design.

50:08 Just so people, if they, sorry, if they don't know.

50:10 Yeah.

50:10 I'm sorry.

50:10 Yeah.

50:11 Yeah.

50:11 Federated learning of cohorts.

50:13 Learning of cohorts.

50:14 I believe.

50:14 Yeah.

50:15 Yeah.

50:15 Yeah.

50:15 Yeah.

50:16 And federated learning is essentially a tool that can be privacy preserving, but doesn't

50:22 have to be.

50:22 And it basically means that the data stays on device and the things that we send to a centralized

50:29 location or several centralized locations are usually gradient updates.

50:34 So these are small updates to the model.

50:36 And the model is then shared amongst participants and the process repeats.
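As a rough illustration of that loop, here is a minimal federated-averaging sketch in plain NumPy. The linear model, the synthetic client data, and the learning rate are all assumptions for the example; the point is that only the small model updates, never the raw data, leave each client.

```python
import numpy as np

# Minimal sketch of the federated-averaging idea described above (illustrative only):
# each client keeps its data locally, computes a gradient update for a shared
# linear model, and only that update leaves the device. The server averages the
# updates, applies them to the global model, and broadcasts the result back.

rng = np.random.default_rng(0)
global_weights = np.zeros(3)                      # shared model: y ≈ x @ w

# Hypothetical on-device data for three clients (never sent to the server).
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]

def local_gradient(weights, x, y):
    """Gradient of mean squared error for a linear model, computed on-device."""
    residual = x @ weights - y
    return x.T @ residual / len(y)

for round_ in range(10):                          # a few communication rounds
    updates = [local_gradient(global_weights, x, y) for x, y in clients]
    avg_update = np.mean(updates, axis=0)         # the server only sees the updates
    global_weights -= 0.1 * avg_update            # apply the averaged step, then re-broadcast

print(global_weights)
```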

50:40 The exact design of how FLoC was rolled out and is rolled out is, I think, not fully clear.

50:48 And in general, I'm a fan of some parts of federated learning, but there's a lot of loopholes in

50:54 FLoC's design that would still allow people to both reverse engineer the models and also

51:01 to fingerprint people.

51:03 Yeah.

51:04 Yeah.

51:04 So if you take your cohort plus your browser fingerprint and you combine the two, it becomes

51:10 fairly easy to re-identify individuals.
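A toy sketch of that re-identification worry, with made-up cohort IDs and fingerprints: a coarse cohort alone leaves you in a crowd, but intersecting it with a browser fingerprint can shrink that crowd down to what is effectively one device.

```python
# Toy sketch (hypothetical values): a coarse cohort ID hides you in a crowd,
# but intersecting it with a browser fingerprint shrinks that crowd sharply --
# which is the re-identification concern described above.
visitors = [
    {"cohort": 1354, "fingerprint": "fp-aaa111"},
    {"cohort": 1354, "fingerprint": "fp-bbb222"},
    {"cohort": 1354, "fingerprint": "fp-aaa111"},   # same device seen twice
    {"cohort": 2077, "fingerprint": "fp-ccc333"},
]

def anonymity_set(visitors, cohort=None, fingerprint=None):
    """Return the visit records indistinguishable from the target's signals."""
    return [
        v for v in visitors
        if (cohort is None or v["cohort"] == cohort)
        and (fingerprint is None or v["fingerprint"] == fingerprint)
    ]

print(len(anonymity_set(visitors, cohort=1354)))                           # 3 records in the crowd
print(len(anonymity_set(visitors, cohort=1354, fingerprint="fp-aaa111")))  # 2 -- one device, seen twice
```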

51:13 So.

51:13 Yeah.

51:13 And I think the more underlying problem is also that, well, you know, are you going to trust

51:17 something that comes out of Google that's marketed as like, oh, it will like, you know,

51:21 preserve your privacy and be like really great for you and the internet.

51:25 And I mean, that's, that's just like screams of like red flags.

51:29 Yeah.

51:29 You know, to me, it feels like, it feels like we've been presented a false dichotomy.

51:36 We could either have this creepy cookie world or, because we must still have tracking, we

51:43 could have this FLoC.

51:44 It's like, well, or we could just not have tracking.

51:45 Like that's also a possible future.

51:47 We don't have to have tracking.

51:49 And here's a better tracking mechanism.

51:51 We can just not have tracking.

51:52 How about that?

51:53 I was reading a wonderful article about IE6.

51:57 Okay.

51:58 I don't know.

51:58 That's some, that's some.

52:00 Young children.

52:01 I'm sorry.

52:02 I'm sorry.

52:03 I'm bringing up ancient history, but there was a browser once and it was called Internet Explorer

52:08 6.

52:08 It was the bane of every web developer's existence for a long time.

52:12 But one thing that I didn't know about it until recently is it actually had privacy standards

52:18 built into the browser.

52:19 You could set up certain privacy preferences and it would like block cookies and websites

52:24 and stuff for you automatically.

52:26 Like there was this whole standard called P3P that the W3C put together around, like,

52:36 everybody's going to have your locally stored privacy preferences.

52:40 And then when you browse the web, it's just going to automatically block stuff and all

52:43 this stuff.

52:44 And I was like, we figured this out during IE6.

52:47 What happened?

52:48 So yeah, just let you know a little bit of history.

52:52 Look up P3P.

52:53 So yeah.

52:54 Absolutely.

52:55 All right.

52:56 I guess to close out the FLoC thing, the thing that scares me about this is if I really wanted

53:00 I could open up a private window and I could go, I could even potentially fire up a VPN

53:06 on a different location and go visit a place.

53:08 And when I show up there, no matter how creepy the tracking at that place happens to be, I am basically

53:14 an unknown to that location.

53:16 Whereas this stuff, if your browser constantly puts you into this category, well, you show up

53:20 already in that category.

53:22 There's really no way to sort of have a fresh start, I guess.

53:26 All right.

53:26 So one thing, Ines, maybe you could speak to this since you're right in the middle of it,

53:31 they talk about how one of the things that's not mentioned in here is, you know, how does it,

53:37 basically they say the regulation does little to incentivize or support EU innovation and

53:41 entrepreneurship in this space.

53:43 There's nothing in here and this law that specifically is to promote EU-based

53:50 ML companies.

53:52 I guess, I think.

53:53 Well, does it even belong there or is it, or is it okay?

53:55 Or what do you think?

53:56 I don't know.

53:56 I was actually, I was a bit confused by that.

53:58 It does remind me of like, well, in general, for a long time, a lot of people have said,

54:01 oh, the EU is like a bad place for startups.

54:04 I think actually regulation is a big part of that, which is sort of, you know, goes full circle.

54:08 Like a lot of people find that, well, the EU is more difficult.

54:12 You have to stick to all of these rules and people actually enforce them and you're less

54:16 free and you can't be whatever the fuck you want.

54:18 So you should go to the US where people are a bit more like chill.

54:21 And it's a bit more common to like, I don't know, ask for forgiveness later and just like,

54:26 so I think, I think that is definitely kind of a mentality that people have.

54:29 So I'm like, I'm not sure, honestly, I'm not sure what like, incentivizing

54:35 EU entrepreneurship could be.

54:38 I actually, I mean, for me personally, like it was a very, for me, it was a very conscious decision for us to start a company in Berlin.

54:44 And the EU was like a big part in that.

54:47 I know that like, maybe, you know, I'm not the typical entrepreneur and I'm,

54:50 we're doing things quite differently with our company as well.

54:52 We're not like, you know, your typical startup, but being in the EU was actually very attractive

54:57 to us.

54:58 And even, you know, recently, as we sold some shares in the company, it was incredibly

55:02 important to us to stay a German company and be a company paying taxes to the country that

55:07 we actually incorporated in and not just become a US company.

55:10 Yeah.

55:10 But I know that's not necessarily true for like, everyone.

55:14 But are you maximizing shareholder value?

55:16 No, but I mean, yeah.

55:18 That leads to so many wrongs, that short-sighted thinking. I think that's, that's

55:23 great that you have principles about this stuff.

55:25 Yeah.

55:25 But I mean, you know, capitalism is going to capitalism, like, you know, and

55:29 then I say that as like, you know, someone who is also participating in capitalism.

55:33 Sure.

55:34 And yeah, it's like, I mean, I don't know.

55:36 I do think, you know, Europe is becoming more attractive as a location for companies.

55:41 I think Berlin is becoming more attractive as a location to be based in and start a company.

55:46 But it is also true that there are a lot of more general things that make it harder to actually

55:53 run a business here, especially if you directly compare it to the US.

55:56 And yes, a lot of that is also the bureaucracy.

55:59 It is a lot of the, you know, structures not being, you know, as developed.

56:04 It's also, yeah, if you are looking to get investment in your company, it often makes a

56:09 lot more sense to look in the US for that, which then causes other difficulties.

56:14 If you, you know, especially a young company and you don't, you know, you can't have as

56:17 many demands, like in our case, we could be like, okay, here's, here's what we want to do.

56:21 If you're not in that position, you can't do that.

56:23 And so I, I agree with like the problems here, but I don't know how this law and this proposal.

56:28 Yeah. Well, what was it supposed to do? Right?

56:31 Yeah. I mean, say, oh, you're sort of, you're exempt from some of the things.

56:34 If you are like, I don't know, a startup coming to the EU or like...

56:40 Here's how it advantages the EU. Like all companies that are not EU based have to follow this law

56:46 and no rules for the EU based ones.

56:47 No. Yeah.

56:48 Of course that wouldn't be with the principles of it, but right.

56:50 Yeah. I don't know what it would do.

56:51 A company moving to the EU, you would, like, get, you know, you only have to follow

56:54 half of these things. And then, you know, then everyone's, like, back to, like,

56:58 I don't know. Yeah. Having their, like, mailbox companies all over Europe.

57:02 Yeah. Exactly. I guess one other thing I would just want to touch on with this law,

57:07 speaking of what is absent. And this also surprises me a little bit is that there's

57:12 nothing in here about climate change and model training and sort of the cost of operation

57:19 of these things. Does that surprise you? Would it belong here? What do you all think?

57:24 I mean, is it a high risk? That was my question when I saw it wasn't in there at all,

57:30 not even lightly mentioned. I was like, how many carbon emissions do we have to get to until it's

57:37 high risk? But evidently the thinking was about the human side of high risk. Although climate change is also

57:42 a high risk to humans. Maybe it's too many steps removed.

57:47 Is the AI going to kill me eventually or tomorrow?

57:51 Exactly. Is it armed, or is it just, like, 30 years from now when it floods or something? Yeah. Yeah.

57:59 And so, yeah, I think like I was definitely curious to see that they didn't include it,

58:04 despite all of the kind of work here from the Greens and other parties like them for,

58:09 you know, climate change awareness when we talk about what is a risk, right? Obviously,

58:14 there's a huge risk for the entire world, right? Yeah.

58:18 I guess it also seems like maybe it was too difficult to like, you know, implement in terms of how do we

58:23 police that? Like, would this then imply, I don't know, would AWS have to report to the EU about like,

58:29 who's using what compute? Yeah.

58:32 And, or I don't know, report if the compute exceeds a certain limit, so that then you can be audited.

58:37 Like, these could all be potential implications, which again then tie into other privacy concerns.

58:42 Yeah. Because yeah, I would, you know, I wouldn't necessarily want like, you know, AWS to sniff around

58:48 my compute. Yeah.

58:50 But maybe they have to, if the EU needs it.

58:52 Like, wait a minute. We just had to reveal that this company did $2 million worth of GPU training.

58:59 And we thought they were just a little small company. What's going on, right? Like some,

59:02 something like that could come out. But you know, I don't know. Something I had in mind is maybe if you

59:08 create ML models for European citizens, those models must be trained with renewable energy or something to

59:15 that effect, right? That you don't have to report it, but that has to be the case. I don't know.

59:20 Yeah. But I mean, I don't know. It's interesting. It's an interesting question because the thing is,

59:24 if you had like, you know, too many restrictions around this, this would encourage people to like,

59:28 I don't know, train less, which then in turn is quite bad. I think actually what's quite important is that,

59:34 like, if you are developing these systems, you should, you know, you should care

59:39 about like what you're training. You shouldn't like, you know, constantly train these large language

59:44 models for no reason, just so you can, you know, say, oh, look at my, my model that's bigger than yours.

59:49 But it is, you know, on a smaller scale, it's very important to keep training your model, keep

59:54 collecting data and to keep improving it. And to also train models that are really specific for your

01:00:00 problems and not just find something that you download off the internet or that, you know,

01:00:04 someone gives you via an API that kind of sort of does what you need. And then you're like, ah,

01:00:09 that's good enough because that's how you end up with a lot more problems. Like being able to,

01:00:14 like creating data and being able to train a system for your really, really specific use case.

01:00:18 That's an advantage. That's not like, you know, a disadvantage that you're trying to avoid.

01:00:23 Yeah, that's a really good point. It could be absolutely in contrast with some of the other

01:00:29 things. Like it has to be fair, but if it uses too much training, that's going to go over the

01:00:35 other one. So let's do less training. It's kind of close enough, even if it ends up a bit unfair, right?

01:00:39 Yeah. And then that, that encourages people to use, I don't know, just like some arbitrary API

01:00:44 that they can find, which again is also, you know, not great or like, yeah, I don't know. I think the

01:00:49 the bigger takeaway or the very important takeaway from these really large language models, in my

01:00:54 opinion, is it's not necessarily that like, whoa, if we just make it bigger and bigger, we can get a

01:00:58 system that then can actually be pretty good at pretty much everything, considering it's never learned

01:01:03 anything about any of these things. I know that many people are

01:01:07 still seeing it this way. I think the more reasonable takeaway is: if a model that was just

01:01:13 trained on, like, tons of text can do pretty good things with stuff it's never seen before, how well could a

01:01:19 much smaller, more specific system do if we actually, you know, trained it on a very,

01:01:24 very small subset of only what we want to do? And that would be more efficient. And, you know,

01:01:28 I think people, we should stop like, you know, hoping that there'll be one model that can magically do,

01:01:33 I don't know, your arbitrary, like, you know, accounting task and also, I don't know, decide whether

01:01:39 Michael should get a mortgage or not. Like, I think that's, that's kind of this weird idea. It's like,

01:01:43 you want a specific system that requires training. I think training is good.

01:01:47 Yeah. Put a good word in with the mortgage AI for me, will you?

01:01:51 Catherine, I think you want to have a quick comment on this and maybe we should wrap it up after that.

01:01:56 Yeah. I mean, I guess I was just going to reference the Opportunities and Risks of Foundation Models,

01:02:02 which I think touches on some of these things. So it's this mega paper and it's exactly, well,

01:02:08 some of the sections are about exactly this problem of like, why do we believe that we need to have these

01:02:14 foundational models with these large, extremely large, even larger than the last largest models

01:02:21 to do all of the things, which also has all these other implications, environmental factors being one

01:02:26 of them. Because obviously when you train one of these models, it's like driving your car around for like 10

01:02:31 years or something like this. So, you know, there's big implications. And I think the point of,

01:02:37 can you build a smaller targeted model to do the same thing? And then the other point of,

01:02:41 if we need these big models, are there ways for us to hook in and do small bits of training rather than

01:02:48 to retrain from the start, from the very beginning? I mean, these are like the hard problems that I

01:02:53 think need solving, not maybe not always building a better recommendation machine. So yeah, if you're

01:02:59 looking for a problem, solve some of these problems.

01:03:02 Yeah. Fantastic. This is a big article. The PDF is published. People can check it out. We'll link to

01:03:07 it in the show notes.

01:03:08 Yeah. Also, actually, a good point on the foundation models. Sorry, no, I just wanted to say,

01:03:12 sorry, I've been referring to these as language models. I've been trying to train myself to use the

01:03:16 like, you know, more explicit term because I think foundation models are a much better way

01:03:21 to express this. And I'm so happy this term was introduced because it finally solves

01:03:24 a lot of these problems of everything being a model that I think causes a lot of confusion

01:03:29 when talking about machine learning. Yeah. Excellent. Haley, in the audience, asks: what does climate change have to do with this? The reason I brought it up,

01:03:36 one, is because Europe seems to be leading at least in the consensus side of things,

01:03:42 trying to address climate change. I feel like there's a lot of citizens there where it's

01:03:46 on their mind and they want the government to do something about it and stuff.

01:03:50 The governments do a lot there. So as a law, I thought, you know, maybe it would touch on that

01:03:56 because Catherine, you pointed out some crazy number. You want to like, just reemphasize

01:04:00 the cost of some of these things. It's not just like, oh, well, it's like leaving a few lights on.

01:04:05 It's a lot. It's a whole lot. No, it's just huge. Yeah. And they keep getting bigger. I forget who

01:04:10 released the newest one. I don't know if you remember Ines, but it's like they keep getting

01:04:16 bigger and bigger. So some of these have like billions and billions and billions of parameters.

01:04:20 they sometimes have extremely large amounts of data, either as external reference or in the model

01:04:27 itself. And yeah, Timnit Gebru's paper that essentially she was basically fired from Google

01:04:34 for researching was around the, or one of the parts of the paper was around how much carbon emissions come

01:04:41 from training these models. They've only gotten bigger since that paper. And yeah, it's just,

01:04:47 I may have the statistic wrong, but it is almost as bad as driving a car around, you know,

01:04:54 with the motor on every day, you know, with your normal commute for like 10 plus years to just train

01:05:00 one model. And it's really absurd because some of these models were just trained to prove that we can

01:05:06 train them. And so it's like, yeah.

01:05:08 Yeah. Yeah. And often the artifact isn't even as useful, where it's like, okay,

01:05:12 with a lot of the BERT models, we can at least, I think it's good that we just reuse these weights.

01:05:16 And I think often in practice, that's what's done, you know, some, you take some of these weights that

01:05:21 someone else has trained or use these embeddings, and then you train something else on top of that.

01:05:26 Like transfer learning or something like that.

01:05:27 Yeah. Or even just like you use these embeddings to initialize your model, and then you train different

01:05:32 components using these embeddings. And that's, that is efficient, but it also means

01:05:38 that, okay, we're kind of stuck with a lot of these, like, you know, artifacts that are getting,

01:05:42 you know, stale all the time.
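For anyone who wants to see that reuse pattern spelled out, here is a minimal PyTorch sketch; the vocabulary size, the random stand-in for "pretrained" weights, and the classifier head are all assumptions for the example. Borrowed embeddings are loaded and frozen, and only a small task-specific component is trained on top, rather than training everything from scratch.

```python
import torch
import torch.nn as nn

# Minimal sketch of the reuse pattern described above (illustrative only):
# load embedding weights somebody else already trained, freeze them, and
# train a small task-specific head on top instead of training from scratch.

vocab_size, embed_dim, num_classes = 1000, 50, 2
pretrained = torch.randn(vocab_size, embed_dim)   # stand-in for downloaded weights

class SmallClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # freeze=True keeps the borrowed embeddings fixed; only the head trains
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        pooled = self.embed(token_ids).mean(dim=1)  # average the tokens per document
        return self.head(pooled)

model = SmallClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 12))      # fake batch: 8 docs, 12 tokens each
labels = torch.randint(0, num_classes, (8,))

loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
optimizer.step()                                    # updates the head, not the embeddings
```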

01:05:44 Yeah. So the comment in the audience was, I could train one on my laptop and use electricity, like,

01:05:49 true, but it's like 50,000 laptops. I mean, it's a much different thing.

01:05:54 I mean, it's a lot more. Exactly. No. And I think training on a laptop is great. Like,

01:05:56 for example, we recently did some work to be able to hook into the Accelerate library on the new M1

01:06:02 MacBooks, which makes things a lot faster in spaCy. And that was quite cool to see. And we want to do a

01:06:08 bit more there because like, oh, you can really, you know, if we optimize this further, you can actually train a model on your

01:06:13 MacBook. And this can be really accurate. And you don't necessarily need like all this computer power.
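This is not the spaCy/Accelerate work described here, but as a rough illustration of "train it on your laptop": a recent PyTorch build can run a small training loop on the Apple-silicon GPU via the "mps" backend and fall back to the CPU elsewhere. The tiny model and random data below are placeholders.

```python
import torch
import torch.nn as nn

# Illustrative only -- not the spaCy/Accelerate integration mentioned above.
# A recent PyTorch build can use the Apple-silicon GPU via the "mps" backend,
# falling back to CPU elsewhere, which is often plenty for small, targeted models.

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(256, 100, device=device)          # stand-in training batch
y = torch.randint(0, 2, (256,), device=device)    # stand-in labels

for epoch in range(5):                            # a tiny local training loop
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

print(f"trained on {device}, final loss {loss.item():.3f}")
```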

01:06:19 Yeah.

01:06:20 And yep. So training on your laptop is good.

01:06:22 It is, if you can do it. But a lot of the ones that we're actually talking about use,

01:06:26 like, these huge, huge models that take a lot. So you can say you don't really care about climate

01:06:31 change or whatever. But if you do, the ML training side has a pretty significant impact. And I was

01:06:39 unsure whether or not we'd see it in there. But yeah, I guess it makes sense that it's not there. Who knows?

01:06:43 It's also, they said, this is the foundation for potentially future AI laws in Europe.

01:06:50 Yeah. And I also appreciate that, okay, they didn't want to tie, you know, tie everything together.

01:06:54 Like I can't even, I think from a political perspective, if you are proposing this pretty

01:06:58 bold framework for regulation, tying it into too many other topics can easily, I don't know,

01:07:04 distract from, like, the core point that they want to make. So I think it might've actually been, like,

01:07:09 you know, an astute decision.

01:07:10 Yep. Absolutely. All right, ladies, this has been a fantastic conversation. I've learned a lot and

01:07:17 really enjoyed having you here. Now, before we get out of here, maybe since there's two of you

01:07:20 and we're sort of over time, I'll just ask you one question, to the two of you. So if you're going to write

01:07:25 some Python code, what editor are you using these days? Catherine, you go first.

01:07:28 I'm still in Vim. Am I old? I think I'm old now.

01:07:32 I still use Vim.

01:07:34 Oh, you're classic. Come on.

01:07:35 No, I'm quite boring. Visual Studio Code. I've been using that for years. It's very nice. I think,

01:07:47 I think it's probably the most common answer you get. And it's quite, yeah.

01:07:50 It certainly has been the last year.

01:07:52 I think using Vim is a lot edgier and cooler. I wish, you know, maybe for that reason alone, I should just like, you know.

01:07:57 She edits code without even a window. It just appears on this black surface.

01:08:02 I mean, my microphone, he programs that way. And I'm like, okay, if it makes you happy.

01:08:08 Yeah, yeah, you do. Awesome.

01:08:10 Some people just like to suffer. That's okay.

01:08:13 Oh, damn.

01:08:15 All right.

01:08:17 I'm sorry. No offense. Like, I don't know. No offense. This was a joke. No offense to anyone who's

01:08:21 programming in Vim. I know lots of great people who do that. I don't want to get any hate messages.

01:08:25 Please. Don't email me. All right. All right. Well, Catherine Ines, thanks for coming back on the show and sharing your thoughts.

01:08:34 Yeah. Thanks for having me. Yeah. Thanks for having me.

01:08:37 Yeah. Bye.

01:08:37 Ciao.

01:08:37 This has been another episode of Talk Python to Me. Thank you to our sponsors.

01:08:43 Be sure to check out what they're offering. It really helps support the show.

01:08:46 Take some stress out of your life. Get notified immediately about errors and performance issues

01:08:52 in your web or mobile applications with Sentry. Just visit talkpython.fm/sentry and get started for

01:08:59 free. And be sure to use the promo code talkpython, all one word. Add high-performance,

01:09:05 multi-party video calls to any app or website with SignalWire. Visit talkpython.fm/signalwire

01:09:11 and mention that you came from Talk Python to me to get started and grab those free credits.

01:09:15 Want to level up your Python? We have one of the largest catalogs of Python video courses over at

01:09:21 Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async.

01:09:27 And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.

01:09:33 Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:09:49 We're live streaming most of our recordings these days. If you want to be part of the show and have

01:09:53 your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:00 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.

01:10:14 I'll see you next time. Bye.
