#89: A conversation with the Chief Data Scientist of the United States Transcript
00:00 In this special episode, you'll meet DJ Patil, the current Chief Data Scientist of the United States.
00:05 You'll hear his thoughts on data at the level of the United States government,
00:09 and look back on his term over the past few years.
00:12 This is Talk Python to Me, Episode 89.
00:15 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:46 This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy.
00:50 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
00:57 This episode has been sponsored by Rollbar and GoCD.
01:02 Thank them both for supporting the podcast by checking out what they're offering during their segments.
01:07 Hey, Jonathan. Welcome back to Talk Python.
01:11 Hey, man. Thanks so much for having me. It's great to be back.
01:13 Yeah, absolutely. You're actually here twice this month.
01:16 So later this month, we're going to do our top 10 data science stories of 2016, and that's super fun.
01:21 We already recorded it, and I'm looking forward to sharing it with everyone.
01:24 But we actually have a really special opportunity here.
01:27 And you talked to me a few weeks ago and said, hey, look, I have this great opportunity.
01:30 Could I do a co-host show on Talk Python?
01:35 And I'll just let you tell everyone what it is because it's really great.
01:38 You told me what it was. I'm like, yes, you should do this.
01:40 What do you got in store for us?
01:41 Yeah, yeah, absolutely.
01:43 So thank you, by the way, and thanks to the audience for letting me elbow into this week's episode.
01:50 But basically, we got invited.
01:51 So I co-host a podcast called Partially Derivative, which is a kind of a nerdy data science and data podcast.
01:58 And we got invited to the White House to go interview the U.S. chief data scientist, DJ Patil,
02:05 who's part of the Office of Science and Technology Policy, which is where the CTO, Megan Smith, works.
02:11 And so all of the cool stuff that government has been doing during the Obama administration to get technology and data and data science into government,
02:19 we just kind of wanted to talk to him about it and say, like, how's it been going?
02:22 You know, basically, like, do DJ's exit interview as the U.S. chief data scientist.
02:27 And it was awesome.
02:29 I mean, we got to go to the actual White House.
02:32 And we just wanted to share what we learned from DJ and the conversation that we had with as many tech people as possible.
02:39 And so I thought, well, I know a guy who gets to talk to a lot of really fantastic tech people every week.
02:45 And so I'm really glad this worked out.
02:46 Yeah, I'm really glad as well.
02:48 And, you know, doing a show live from the White House is pretty amazing for a podcast, I think, especially for a tech podcast.
02:55 DJ is the first chief data scientist in the United States, right?
02:58 That's right.
02:59 He's the first person to ever hold that position, which is very cool.
03:03 I think he really understands the significance of that position.
03:06 So it was really fun to hear him reflect on it.
03:08 Well, I really enjoyed the interview that you did with him.
03:11 And so without further ado, take it to the White House.
03:13 All right.
03:14 Let's go to the White House.
03:15 All right.
03:16 So I am here in the actual White House with the actual chief data scientist of the United States.
03:23 DJ, thank you so much for being on the show.
03:25 My pleasure.
03:25 Awesome.
03:26 All right.
03:26 So we should probably start first things first.
03:28 I think most of the audience will be familiar with who you are and your work.
03:31 But just in case they're not, what does the chief data scientist of the United States do?
03:35 Like, what's your gig?
03:36 Well, the simplest way to put it is actually the mission the president gave us.
03:42 And it's something that's sort of phenomenal in itself is that how does a constitutional law professor get so focused on data and technology?
03:50 And the mission he gave us is to responsibly unleash the power of data to benefit all Americans in return on America's investment in data.
04:00 And the components that I think are critical there: responsibly, something that both you and Chris and Vidya have talked about extensively, is what does responsibly mean with respect to data, algorithms, technology?
04:13 And then what does it mean to benefit all Americans?
04:16 You know, just because we have technology, you know, and then people have access to certain systems and solutions doesn't mean everyone in the country does.
04:26 So how do you make that happen?
04:28 And our belief and theory of the case is that if we do it here, everybody around the world will benefit as a result.
04:34 Yeah.
04:34 I mean, fortunately, that's a nice, small, manageable mandate that, you know, has no serious, long, broad reaching implications or consequences.
04:42 But I mean, you've been a part of this from the beginning. Obviously, you're the first chief data scientist.
04:47 So how's that mission that the president gave you been playing out over your time here?
04:55 Well, I actually think that the first chief data scientist was really Washington.
05:00 He's a cartographer.
05:02 That's true.
05:03 That's true.
05:04 And if you look through the arc of history, you know, even Lincoln did basically Euclid's principles of mathematics from first, like we've had a lot of presidents who've been deeply, deeply mathematical or analytical in just the way they operate, the way they think.
05:22 I think what's true in the case specifically for this president, when you walk into the Oval, you don't see dishware.
05:29 You don't see, you know, little tchotchkes on the wall. He actually has the submissions of the real original patents for things like the telegraph and the gear cutter.
05:42 And the reason for that is if you think through the arc of our entire history as a country from the founding of the institution, what's really there is that in every case data and technology have been a force multiplier.
05:56 It has really transformed our ability to move as a society.
06:01 And we're seeing that next wave of transformation take place right now.
06:05 So what does that look like over this arc? Just in this time period alone, just in the last couple of years, we have major movements on precision medicine, the idea of creating tailored treatments for health purposes.
06:20 You have the Affordable Care Act that kind of doubles down in a way that people don't always realize.
06:29 One of those components is that you can't be denied coverage because of a preexisting condition.
06:34 When you get to the genome, every one of us has a preexisting condition.
06:38 It's called being human.
06:40 So there's kind of like these fundamental kind of things that are entwined.
06:43 Cancer Moonshot.
06:44 All the aspects of cancer are fundamentally based on being able to move data, collect it, store it, use it responsibly, and then act on it extremely fast, get you to the right treatment.
06:54 So the right care, smarter, faster, better, those things.
06:58 And criminal justice.
07:00 We have the Data-Driven Justice program and the Police Data Initiative, both working on different sides, one working to create transparency for police departments with citizens, the other to find early intervention techniques.
07:11 These are just a few across every single aspect that we have.
07:17 There is data at almost every level of conversation, whether that's how do you think about getting kids into school, whether that's national security,
07:28 whether that's a weather forecast.
07:30 It's in every single thing.
07:32 Every single thing, as it's supposed to be, has data in its DNA, which basically implies that for everyone that's listening out there, we have good job security.
07:39 That's true.
07:41 That's true.
07:42 It's a good time to be a data person.
07:44 And you touched on something interesting just there.
07:46 I mean, there's these sort of specific programs and initiatives that you talked about, the precision medicine and kind of data driven policing.
07:54 But then you also talked about this kind of general capacity building inside government.
07:58 I mean, this administration has used data or kind of embedded data into different agencies like no other administration before.
08:06 And can you talk a little bit about the play between those two things, both kind of building capacity, looking at big issues like government transparency and accountability and how data informs that?
08:16 And then we can talk about some of these specific examples as well.
08:20 But I'd be interested in this general idea about growing the data capacity of government.
08:25 And that's been, I think, a real revolution here in D.C.
08:28 Right.
08:28 Well, it's actually one of the more fascinating things is that, you know, there's this narrative going around that Silicon Valley has got to come to D.C. to save it.
08:38 But people forget, though, where has all the investment in data originally come from?
08:42 You know, it's the government, whether it's the census or any of the other types of things.
08:49 DoD has funded some of the greatest advancements in data.
08:53 So has the Department of Energy, so has NIH.
08:56 All these programs, whether it's CERN and large-scale atomic, you know, understanding of forces of nature, to just the operational aspects of health care.
09:09 It is in everything.
09:11 The part that I think is unique, and why there's a data scientist: why do we need a chief data scientist when we have
09:19 economists and statisticians?
09:21 There's a chief statistician, by the way.
09:23 There is basically a chief economist, Jason Furman, who's the head of the Council of Economic Advisers.
09:28 So why do we need a chief data scientist?
09:30 It's interesting that the statisticians and economists don't always talk to each other.
09:37 And it's not a Bayesian thing.
09:39 It's like we're in different verticals or different silos.
09:43 And then the other aspect is that an increasing amount of data is coming from outside the federal government.
09:49 There's more.
09:50 So how do you bring that data together?
09:52 How do you really rethink the way data is being used?
09:56 That's the component the role is there for.
09:59 And that's the shift that we're seeing: people who are coming into these new roles in the federal government have much more of that data science ethos, which says, well, there's lots of ways we can be clever to solve this problem.
10:12 I may not actually have the data.
10:15 I know where to go get the data.
10:16 Oh, by the way, the data is incredibly messy and not in the right format.
10:19 So I'm going to figure out how to get it together.
10:21 We're going to try a hypothesis.
10:23 We're going to test.
10:23 We're going to iterate.
10:24 We're going to try lots of different things.
10:26 Most of what we do at the end of the day is literally giving them freedom of space to do their job because they have the ideas.
10:34 They know what to do.
10:35 We just got to give them a runway.
10:36 Yeah.
10:37 And I think you're in a unique position to recognize that kind of that technology professional mentality, right?
10:45 Like you come out of Silicon Valley.
10:46 Well, I know that you worked in government before.
10:49 You've been in Silicon Valley and now you're back in government again, kind of trying to bridge that gap between these two communities that sometimes can be a little bit distant.
10:57 Like even the idea that Silicon Valley needs to come in and save government.
11:01 I know sometimes people bristle at that here in D.C.
11:05 And then I know at the same time, people in the tech community kind of bristle at the way that government does things because they don't really understand it.
11:12 So how do you see that relationship getting stronger?
11:18 How are you bridging that gap?
11:19 Because it really sounds like that's a big part of what you're describing here is data inside government, data outside government, kind of adopting that mentality and bringing that capacity to the work that government's already doing.
11:29 But at the same time, not forgetting that the foundation of a lot of these technologies and data driven approaches came out of government in the first place.
11:35 So how are you kind of making that marriage stronger?
11:37 Yeah.
11:38 So the easiest way to think about this is think through all the different podcasts that you've done where you said, wow, we need to work on this.
11:47 Or you call it a story and you're like, geez, how did that happen?
11:50 That's crazy.
11:51 Like there's so many different problems.
11:53 The biggest issue is we don't know how to help.
11:57 Like we say, I know how to do that.
11:58 We know how to raise our hand and say, I can help on that.
12:00 But there's no door.
12:02 There actually often is a door.
12:04 It's just not well marked or it's hidden or obfuscated in layers of bureaucracy.
12:10 So what we really tried to do is try to figure out how to show everyone the door.
12:14 The other part that's there is really just helping people exchange, like create a common language.
12:23 And one of the most powerful common languages is data.
12:26 At the end of the day, you're able to talk about things, share things and kind of see the different approaches.
12:32 When we don't use data as a weapon against each other, we're actually using it to have a conversation.
12:36 It changes the tenor of the whole nature of how we're actually in discussion.
12:41 We make it a discussion.
12:42 So the final part there, I think that's most important is if we say that this is the mission, the Secretary of Defense has a great way of saying it.
12:52 It's like there's nothing greater than waking up knowing you're part of something bigger.
12:56 And when you're part of a mission, it's awesome.
12:59 I mean, you've had a chance to experience it.
13:00 Others have, like people who have worked around these problems.
13:03 And the part there that is unbelievable is just
13:08 when you get a chance to do it. And it's just giving people doors.
13:15 Yeah.
13:16 And actually, that's a good point to bring up, because I know a big part of what you've been doing here is inviting more technology professionals and data science professionals into the government.
13:27 And I know that actually speaking of the Secretary of Defense, you were recently acknowledged for a pretty significant award.
13:33 I think the highest award that a civilian can receive from the DoD.
13:39 Is that right?
13:40 That's right.
13:41 I know you probably don't want to talk about it too much.
13:44 Where's your medal?
13:45 I know.
13:47 I can't.
13:48 I can coin check you, but I can't.
13:51 Exactly.
13:52 I can't medal check you.
13:52 You can't.
13:53 You can't.
13:53 That's what I'm about.
13:57 You'll probably lose at that.
14:00 But it is true that I.
14:02 It was an incredible honor to be able to receive an award, even a medal, which is kind of a weird, surreal thing.
14:12 But the number one thing that I.
14:15 You don't come to this job trying to win medals.
14:19 What you're trying to do is find an avenue to add value.
14:23 And the number one thing I tell data scientists all the time is you should be in a role where you are ridiculously, overwhelmingly impactful.
14:35 If you're not,
14:36 figure out how to be ridiculously impactful or go find another place where you can be.
14:43 Because the ability to singularly be a force multiplier, not like a 1x, a 2x, a 3x, but like a 10x, 100x force multiplier on any problem, has never been more true.
14:57 We've got easy access to technology.
15:01 Data ubiquity is there.
15:03 And more than anything else, the ability to have a combination of some data with domain expertise allows a different type of integration that we just have not seen in recent times.
15:18 Yeah.
15:19 And so.
15:20 But it sounds like that's already some of these programs. Just to give people examples as they're kind of thinking about their new career in government after they hear this, and how excited and inspired they'll be to, you know, come join the digital service.
15:32 Like what's the.
15:33 I mean, we mentioned precision medicine.
15:35 Maybe let's focus on that.
15:36 What did that look like from a data scientist's perspective?
15:39 So at the highest level for precision medicine,
15:42 precision medicine is first going to get at least one million Americans to contribute their data and donate it to the National Institutes of Health so that researchers can work with it.
15:55 Who are those researchers?
15:56 So I'd love to see there just being generic data scientists who are training on it.
16:00 Now, we have this kid, Nathan Han.
16:02 He came to the White House Science Fair a couple of years back.
16:05 And here's a kid literally 16, 17 years old, and he's just interested in machine learning algorithms.
16:12 So he goes to dbGaP,
16:13 and he starts to play with some data and he's like, look, let me see what there is with cancer sites and all this stuff.
16:17 His algorithms on machine learning are up there with the best.
16:20 You know, we have stories of that all the time.
16:24 We have this woman out of Kentucky.
16:26 She's working on renal failure.
16:28 She's building an artificial kidney.
16:30 It's so good.
16:31 It's so good.
16:31 University of Kentucky gave her a whole lab.
16:33 She's 17 this year.
16:35 Like you have this amazing, like America, more than any place in the world, has an unbelievable arsenal of talent.
16:46 It's everywhere.
16:46 What we have to do is connect it with the problem.
16:50 So if we're able to just open up a little bit of that data through the precision medicine initiative and say, hey, come in and work with it.
16:56 What might we find?
16:57 We talk about finding signatures of Vioxx.
17:00 Why did somebody have to look?
17:01 Why didn't that signal just emerge?
17:04 What happens if you start to apply the machine learning algorithm, the feature set selection? Interesting things just happen.
17:08 As you get into that problem, you start to work with more and more sensitive data.
17:13 You have to get vetted and you have to work on the data and maybe eventually the data is even air gapped or, you know, in some type of sandbox environment.
17:20 We're working through all those security aspects as it develops.
17:24 But I would love to see people do that and find a force multiplier change.
17:28 Today, by the end of today, 100 people will die on our highways and roads.
17:34 And that number is going up.
17:36 Why?
17:37 Better fuel efficiency?
17:39 Is it cars are driving faster?
17:43 Is it distracted driving?
17:45 We don't know.
17:45 But what we do know is that's a pristine data set.
17:48 And what happens if we just say, hey, America, why don't you solve it?
17:53 When you bring the full force of the United States of America to a problem, you will make that problem break.
18:02 Like you will find a solution to break it.
18:04 Like it is unbelievable how amazing we are as a country when we decide to do something.
18:09 Yeah.
18:10 And I think that you've mentioned two examples.
18:12 I know you've talked about kind of criminal justice and open data and policing.
18:17 It sounds like there's nothing but opportunities for people who are in the public and just want to engage, even prior to thinking about maybe coming and joining the digital service or being a data scientist inside government.
18:28 How can people get involved in these various kind of open data initiatives that are part of tackling these really big society level problems?
18:37 Yeah.
18:38 So there's a whole lot of great ways to get involved.
18:40 And a lot of times people think, oh, you've got to come to the White House to serve.
18:43 No, you can serve in your local neighborhood.
18:46 You know, there's a police department.
18:48 There's an education department.
18:50 There's somebody in the local city or your town or your community, however you want to define it.
18:58 County.
18:58 All of them need help.
19:01 And your ability to look and play with some data and get involved is really powerful.
19:06 That could be through something like Code for America.
19:09 It could also be through something around a cancer program or some other type of area that just needs these skill sets.
19:16 It could be education, looking at some of that data.
19:19 There's all these competitions around. When I say competitions,
19:23 I just want to do something like that.
19:26 Because nobody actually knows the answer.
19:28 Get some ideas.
19:28 Jump in there.
19:29 If you want to get to the full federal level, there are the U.S. Digital Service
19:33 and the different digital services in the Department of Defense or VA, Veterans Affairs, or even the Department of Transportation.
19:42 All those places are open and waiting for help.
19:45 But the bigger thing is if you want to see a change happen to a community, you've got to jump in.
19:50 Yeah.
19:51 And so I think, OK, so people can make changes on a very micro level.
19:54 People could, of course, get involved.
19:56 The digital services will continue into the next administration.
20:01 And then how do you think the – and kind of at a macro level, I know that your organization has been really out in front about data and ethics like we touched on.
20:14 And I think that this is something that – it's a very challenging problem because on the one hand, it's a policy problem.
20:20 So it does seem to come from government.
20:23 It's really different to most technology innovation or, you know, maybe not if you have kind of a DARPA perspective, but different to most technology innovation.
20:31 This came government first.
20:33 This wasn't something that came out of industry that was a practice that was later adopted inside government.
20:37 This was really the Office of Science and Technology Policy out in front, really the White House leading the charge on this.
20:44 And I'm just – I'm curious, A, kind of how you see that permeating across data sciences and industry.
20:51 And then, B, how you see that being carried forward into the future.
20:54 Like what are the implications for this idea about ethics and data science and artificial intelligence and machine learning?
21:01 Like what does that mean for us?
21:03 Yeah.
21:03 So the person who's led from the front on this is the president.
21:07 This has been a forefront issue for the president.
21:10 And where does it really stem from?
21:13 It's that force multiplier.
21:15 Is the force multiplier good or is it harmful?
21:18 Does it – what are the edge cases?
21:19 And one of the things that I have taken away personally from this job is when you're building a company, as you guys are, you always get to say, well, that's an edge case.
21:27 When you're here, those edge cases have names.
21:30 They're names like Sally, Giselle, Juan, Ricardo, whatever.
21:36 They all have names.
21:37 And what's the impact for them?
21:40 So when data is being used, how is it being used and what are the implications?
21:44 So the very first big data report that John Podesta led, that emphasized the need for thinking along this direction.
21:51 We've had since then three other data reports.
21:54 And the latest one was an AI report.
21:56 All of them, every single time we go out and talk to people, this is what's actually on their mind.
22:00 It's not about the sentient being that's going to emerge and figure out how to become the robo-pocalypse.
22:07 It's actually, well, who's getting harmed and what's there?
22:11 And how do I know that I can trust this for my kids or my kids' kids?
22:16 The place where this gets impactful, and you guys have talked a lot about this, is the black box of algorithms.
22:22 The ability to know, is this data okay or not?
22:25 There was a great aspect the other day where somebody pointed out, well, if we got self-driving cars and we got good data, bad data coming in,
22:33 what are the implications when an algorithm can't recognize even African-American faces?
22:40 Does that mean self-driving cars have a decision disparity when this comes to race?
22:46 And is that a data science?
22:48 Whose problem is that?
22:49 Do we just get to say, oh, sorry, that didn't have good training data?
22:55 That's not acceptable when you're putting something out there that may harm the public.
23:00 You wouldn't want a drug maker suddenly saying, oops, sorry, we didn't take into account the fact that hipster data scientists
23:10 with beards living in Austin aren't in the training data set.
23:14 Present company excluded.
23:17 Sure, sure, sure.
23:18 This portion of Talk Python to Me has been brought to you by Rollbar.
23:36 One of the frustrating things about being a developer is dealing with errors, relying on users to report errors, digging through log files, trying to debug issues, or a million alerts just flooding your inbox and ruining your day.
23:47 With Rollbar's full stack error monitoring, you'll get the context, insights, and control that you need to find and fix bugs faster.
23:55 It's easy to install.
23:56 You can start tracking production errors and deployments in eight minutes or even less.
24:01 Rollbar works with all the major languages and frameworks, including the Python ones such as Django, Flask, Pyramid, as well as Ruby, JavaScript, Node, iOS, and Android.
24:10 You can integrate Rollbar into your existing workflow, send error alerts to Slack or HipChat, or even automatically create issues in Jira, Pivotal Tracker, and a whole bunch more.
24:19 Rollbar has put together a special offer for Talk Python to Me listeners.
24:23 Visit rollbar.com slash Talk Python to Me, sign up, and get the bootstrap plan free for 90 days.
24:29 That's 300,000 errors tracked all for free.
24:31 But hey, just between you and me, I really hope you don't encounter that many errors.
24:35 Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch, and more.
24:41 Give Rollbar a try today.
24:43 Go to rollbar.com slash Talk Python to Me.
24:45 Zak Kohane did this great research project on sudden cardiac death syndrome.
24:58 Turns out African-American males have been being given too high a false positive reading on this thing.
25:06 Why is that the case?
25:07 Turns out genetics and genomics are substantially more complicated than we thought.
25:12 But also there are not enough healthy African-American males in those clinical trials.
25:17 But if you look even more, there's not even really women in a lot of clinical trials, let alone talk about ethnicity of women in trials.
25:28 So as we're working with data, you kind of go back to the source and say, well, what bias is there?
25:33 And all of these other things.
25:34 And then the other one that I think we have to confront and everyone has to start really asking these questions is what does it mean when somebody just slaps a label on something that says data science verified?
25:47 Just because it's great for sales and marketing.
25:50 And as data scientists, all of us need to take a very serious look at that and say, does that meet our bar?
25:57 Because those people, when they sell something, they're representing it on behalf of the community.
26:02 And as a community, I'd like to think that we're better than that.
26:06 Yeah.
26:06 So I mean, I think there's this idea that, of course, it really speaks to the need for data scientists in policymaking positions or near policymakers to be able to inform at a high level what steps we should be taking to verify that the data that we're using to make these decisions is not biased or that our processes aren't biased.
26:27 But then at the same time, you're talking about this individual responsibility.
26:31 And where do you feel like that's going to come from?
26:34 I mean, is it situations like this where those of us in the community who have sort of seen this in action can be advocating for it or those who are listening to this podcast can accept it and do some thinking?
26:45 But maybe beyond that, as a kind of a practice or as an industry, how do we – is that something we can institute?
26:55 Like, what do you think?
26:56 Are there changing policies, changing education, certifying?
26:59 I mean, what do you think?
27:00 Well, if we're not careful, we're going to get regulation.
27:02 And the regulation will come in usually in the form of legislation.
27:06 And that has good and bad effects.
27:09 It's very tough to get that kind of legislation correct when it's a very fast-changing technical landscape.
27:17 Extremely hard to do.
27:19 So what I would like to see, and I think the place where we first are and that we have called for from the White House is that every training program, data science, economics, computer science, whatever, you've got to be trained in two things.
27:31 As not electives, but as two core principles.
27:33 What does ethics look like?
27:36 And what does security look like?
27:37 Because if you're learning about databases and you don't know what overflow is, that's crazy in this day and age.
27:43 In the same way, if you don't know about training bias and the ethical implications as you're building something, that can't be a slap-on additive elective kind of thing.
27:54 It's got to be intrinsic to every course.
27:56 So if you don't have that in the training program you're in right now, you need to demand it because you're getting a subpar training course.
28:02 And it's not going to prepare you for the real world.
28:04 The other aspect is as we're interviewing people, as we're talking to people, we should all ask an ethics question.
28:10 An ethics question could be just very simply, we're going to pretend we're doing an interview here and say, Jonathan, thanks for coming in.
28:17 You happen to be building an algorithm and we're really focused on this because we're building job matching.
28:22 And we're not supposed to use race.
28:25 You got this amazing data set and because you're the all-star data scientist and you have a podcast, you just happen to look at all the data and say, hey, I think I found the ability to bypass race or create a proxy of race.
28:39 What do you do?
28:40 Yeah, these are interesting questions, right?
28:43 Because I think it's the kind of thing that as you could imagine, I think sometimes as technical people, we get really focused on the solution to the problem.
28:49 And we put that ahead of everything else.
28:51 And we're not really thinking about the implications of our work.
28:54 And we just go, hey, I found the solution.
28:56 I solved the Rubik's Cube.
28:58 I'm done.
28:58 That's right.
28:59 But I think we don't say this all.
29:01 We don't have a conversation.
29:02 Right.
29:03 Like all the people that are in the cadre of data scientists who have recognized this problem.
29:09 What's the one commonality that we all have other than generally going to bars and having beer talking about these problems is we talk about the problems.
29:18 We talk about these.
29:20 We don't just talk about, oh, how did you do that?
29:23 We always kind of look back and we're like, well, is that what we should do?
29:27 What about this?
29:28 What about these other implications?
29:30 We care about the edge cases.
29:32 We care about the longer term implications of what we're building.
29:36 When you create something with data or an algorithm, that's equally important as if you were creating an artwork, you're creating a bridge or building a building.
29:45 We have to take that responsibility.
29:48 And it's not because the responsibility will be imposed on us.
29:51 It's because if we are going to be a massive force multiplier in the world, you have to accept the responsibility that comes with that.
29:59 Yeah.
30:00 Well, because we're kind of at the stage where it's a little bit the Wild West right now in terms of how data and data science works.
30:08 And we're, I think, still in that window where there aren't really set mechanisms for how people collaborate in teams.
30:17 There aren't set mechanisms for how data science models are deployed or checked or QA'd, and kind of all the stuff you have in other technical disciplines.
30:25 It doesn't exist here yet.
30:26 And so I think ultimately we get to decide this generation of data scientists get to decide how we would like history to look back or how we would like our industry to evolve going forward.
30:38 And even more so, I think history will judge us.
30:42 It will either judge us kindly or harshly depending on what comes out of these products and the implications of how not only our society uses these products, but other societies that could even be repressive use our products.
30:57 And what are the implications of that?
30:59 All of that falls on us for the implications of these things.
31:03 And we just have to get ready to drive that world because the time is now for us to do that.
31:10 But let's actually, you know, one of the things I think given your audience, I'd actually be really curious to hear what the audience has to say on this.
31:19 I think one of the things that we haven't heard sufficiently, because the data science community is just early, is what does the data science community think?
31:27 What would be the most powerful mechanisms to move the needle on this?
31:32 Should we have more of the institutional review boards and very fixated regimented process?
31:37 Should we have a very laissez faire?
31:38 Where do we stand?
31:40 That's, you know, that's one where I think the community has an incredible opportunity to stand up and say this is what we believe.
31:45 And I have no idea what the broad consensus of data scientists is thinking, except for the ones that we've had a lot of interactions with through our very traditional White House processes.
31:58 And that makes sense.
32:00 And I think that there I would wonder how much people are even thinking about it.
32:05 It does seem like they're good because in other industries, there are roadmaps in the legal profession, the medical profession.
32:12 Like these are also kind of knowledge based, highly technical professions that have established some concept of ethics and mechanisms for either encouraging or forcing people to adhere to whatever the community standards are.
32:24 But as of yet, we don't know.
32:27 Is that the appropriate way to think about it for data science?
32:30 I don't know.
32:31 Well, the ethics, if you look at bioethics, biomedical ethics or physicians, any of the physicians who were at the very early front end of that space in terms of ethics getting implemented across that, they're still practicing.
32:46 That's how recent these things are.
32:49 So for us to get ahead of that, that's going to be critical.
32:52 And we're going to have an equal opportunity to drive more of medicine in this way through data driven approaches than people even recognize.
33:00 So the analogies could be even more similar than we appreciate.
33:04 And same with law.
33:06 So speaking of the kind of implications for the broader community, now that, you know, now that this administration is coming to a close and obviously you're very passionate about where the community should be going.
33:16 I feel like I have to ask, what are you doing after this, DJ?
33:19 How are you going to be?
33:20 How's your leadership role in the data science community going to transition?
33:24 Right.
33:24 Well, as much as you can talk about it, as much as I can talk about.
33:27 So I feel like I should insert somewhere a Chris Albon is my cousin joke.
33:31 I'll visit my own long lost relatives out on the border.
33:37 He'll appreciate that.
33:38 Having not been able to be here in person.
33:39 That's right.
33:40 It'll at least give him a laugh.
33:42 Chris, we miss you.
33:46 Most of the time.
33:47 Most of the time.
33:48 Most of the time.
33:49 You know.
33:49 But for me, the biggest thing will be to take a nap.
33:56 I'll be taking off the tie.
33:57 And the thing that I'm excited about, the biggest thing, is the wave of data scientists that are in the federal structure.
34:06 A lot of people are questioning what's going to happen.
34:08 Is this going to collapse?
34:09 The federal civil servants are going to just carry this forward.
34:14 There are chief data scientists or chief data officers or some type of analytics leader, data leader in more than 24 of the federal agencies.
34:25 And they're going to carry on the mission.
34:26 So I feel very good about the progress we've made.
34:30 That's not at all to say the mission is done, and the community is going to have to continue to champion this.
34:36 For me personally, I'm a big believer that there's a big difference between experience and wisdom.
34:43 And you go from experience to wisdom through reflection.
34:47 Reflection is sitting down thinking.
34:50 It's talking to people, having these kind of conversations, writing, all those different things.
34:56 And so I'll go through some period of reflection to try to distill as much as I can from this really very, very unique experience.
35:04 And then I think I'm really going to be excited to get back to building.
35:08 I think the powerful thing about all data scientists is we're makers at heart.
35:13 And if we've taken anything that's theirs, your ability to be creative and create something novel and do something that's unique that nobody saw, like your work on ISIS, you know, using Twitter data.
35:28 Those type of things you just see, like people seeing the world through a lens that people hadn't seen before.
35:34 It's kind of like when you see a photograph and you're like, wow, I never saw the world that way.
35:38 We have that unique ability to do that with data.
35:40 How does that resonate as a true product or something that somebody else can do?
35:45 When we're building, we're learning in different ways.
35:48 And I'm really excited to get back to a different form of building.
35:52 This one has been largely policy-based.
35:55 And I'll be looking forward to getting my hands dirty and hands back on keys and trying to figure out what that looks like in some form or another.
36:02 All right.
36:04 Well, we look forward to some writing, perhaps, to some reflection, and then ultimately to spitballing on how to build the right model for whatever it is, whatever problem you'd like to build to try and solve.
36:16 So that's cool.
36:18 That sounds like it'll be – and a well-deserved rest after – I mean, I know around here at the White House, I was joking just last night that it's not uncommon to find a fair number of workaholics around here.
36:28 You guys all push pretty hard because everything is important at this level.
36:31 There's always a crisis.
36:32 And there's always an opportunity to do more.
36:35 Yeah.
36:36 So when you're balancing those, you can't let the urgent get in the way of the important.
36:40 And there's this fundamental thing.
36:43 It's actually – there's a great one.
36:44 So I'll just share this kind of card with you.
36:47 There's this – the president gives us these cards.
36:50 And this one says, everything we do needs to be infused with a sense of possibility.
36:54 We are not scared of the future.
36:56 Everything we do needs to be infused with a sense of possibility.
36:59 We are not scared of the future.
37:01 So there's also the other analogy card that he gives us, which is: remarkable things happen in the last quarter.
37:07 And for all the sports people, you don't need to – you can imagine your favorite game where you've seen that.
37:16 So don't think that we're done.
37:19 Yeah.
37:19 Okay.
37:19 So there's more to come here in the last quarter.
37:23 But then also even going further past that.
37:27 I mean, you mentioned something just a moment ago about the way that kind of data and analytics has been really embedded in government.
37:35 And so the mission continues.
37:36 And there's – I'm not – you know, again, I know that there's kind of a lot going on right now.
37:42 And so there's maybe not a ton to say.
37:44 But I bet a lot of people listening are thinking to themselves, well, what happens in the next administration?
37:48 It's an administration from the other party.
37:51 So it's a big transition.
37:52 But then at the same time, I know, you know, from a technology perspective, that this is a largely bipartisan issue.
37:59 I think that you actually served in George W. Bush's administration, if that's right.
38:04 So, you know, you've seen this from the perspective of a civil servant.
38:08 I don't know.
38:10 What happens in the change as we move from one administration to the next?
38:14 And what are the implications for people who might be thinking about getting involved?
38:17 Well, the biggest thing to think of as an administration is it's a baton.
38:22 It's like a baton race.
38:23 And so your job is to hand the baton off to the next team while they are sprinting equally as fast as you are.
38:30 No baton drops are acceptable because that's national security.
38:36 That's people getting hurt.
38:38 That's people – you know, there's a lot of services that people critically depend on.
38:42 So we have to make sure the baton is well passed.
38:45 So – and that is what the president has really emphasized is that just as the Bush administration transitioned to the Obama presidency, that was such a clean handoff.
38:57 We have to do that as well.
38:58 And we're all team America.
39:01 We're team USA here.
39:02 So we have to do that.
39:04 And you never want to bet against the country.
39:07 That's not who we are.
39:09 The other part there that's the case is these problems – cancer doesn't care what religion you are, what political party you are, what socioeconomic class you are.
39:20 It is a problem of species.
39:23 Zika, Ebola, these are problems of a species.
39:27 Climate change is as well.
39:28 Obviously debated.
39:30 The science is clear.
39:32 And I'm sure as more and more people actually take a look at it as they start to shift to thinking about these things, more people will say that is obvious.
39:41 When the people start looking at these other sets of problems around criminal justice at the local level, these are not federal problems inherently.
39:49 They're local.
39:50 And if you look at Governor Bevin, who is the governor of Kentucky, he knows that there's a giant gaping budget gap that is being caused by the local criminal justice system.
40:02 And for those that are out there that don't know, we're talking like $20 billion out of the U.S.
40:08 That's like basically how much we're paying for jails.
40:12 And who are we paying for?
40:14 There are more like – there's more than 10 million people, like basically 11 million people going through our 1,300 jails.
40:24 And it's crazy.
40:26 Like you think about those numbers.
40:28 95% of them never will go to prison.
40:30 These are local jails.
40:32 And they're staying there for an average of 23 days.
40:34 So 11.3 million people going through 3,100 jails.
40:40 I said my number slightly differently there.
40:41 But 11.3 million through 3,100 jails.
40:45 That's insanity.
40:47 And who are those people?
40:50 You look at some of our jail facilities.
40:52 Cook County Jail, 92 acres of a single-site jail in Illinois.
40:58 That has one-third mentally ill.
41:01 We were just in Las Vegas the other day, and they were talking about assaults on officers.
41:05 And when they thought about it, they all thought, oh, it's gang-related violence.
41:09 When they actually looked at the data, it's all mental illness.
41:12 So why are we sending mentally ill to jail?
41:15 Why not train officers?
41:17 Miami-Dade, Florida did this.
41:18 They trained officers in intervention.
41:20 And what happens?
41:21 Oh, gee.
41:23 It turns out that if you spend a million bucks to train officers and dispatch in crisis intervention,
41:29 you can save more than $10 million in the jails.
41:34 And you can close the jail, which is the more important measure.
41:38 Now, think about that with respect to the opioid crisis.
41:42 So these ideas of what it means to use data in these clever ways.
41:47 And how do you do that?
41:48 Take your data from your portion of your criminal justice system.
41:53 Move it over to the health care system.
41:55 Not super sophisticated.
41:57 Just move it over and look at who are the people that are constantly cycling.
42:00 How many jail days are they going through?
42:03 How many dollars are they doing?
42:04 And then ask, next time the police see them, why don't they take them instead of jail,
42:08 put them into this other treatment plan?
42:10 Let's take them directly to treatment.
42:12 Those are the ideas.
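As a rough illustration of the "not super sophisticated" matching described here, the sketch below totals bookings, jail days, and dollars per person and keeps the people who also appear in a health care list. All record layouts, identifiers, and costs are hypothetical; real data sharing between these systems would need proper identity matching and privacy safeguards.

```python
from collections import defaultdict
from datetime import date

# Hypothetical jail bookings: (person_id, booked, released, daily_cost_usd).
BOOKINGS = [
    ("p1", date(2016, 1, 3), date(2016, 1, 10), 100.0),
    ("p1", date(2016, 3, 1), date(2016, 3, 21), 100.0),
    ("p1", date(2016, 7, 5), date(2016, 7, 8), 100.0),
    ("p2", date(2016, 2, 1), date(2016, 2, 3), 100.0),
    ("p3", date(2016, 5, 1), date(2016, 5, 2), 100.0),
    ("p3", date(2016, 9, 9), date(2016, 9, 19), 100.0),
]

# Hypothetical IDs of people also known to the health care system.
HEALTH_CLIENTS = {"p1", "p2"}


def summarize_bookings(bookings):
    """Total bookings, jail days, and cost for each person."""
    summary = defaultdict(lambda: {"bookings": 0, "jail_days": 0, "cost": 0.0})
    for person_id, booked, released, daily_cost in bookings:
        days = (released - booked).days
        row = summary[person_id]
        row["bookings"] += 1
        row["jail_days"] += days
        row["cost"] += days * daily_cost
    return dict(summary)


def frequent_utilizers(bookings, health_clients, min_bookings=2):
    """People who keep cycling through jail and also appear in health data."""
    summary = summarize_bookings(bookings)
    matches = [
        (person_id, stats)
        for person_id, stats in summary.items()
        if stats["bookings"] >= min_bookings and person_id in health_clients
    ]
    # Most jail days first, so the heaviest cycling is reviewed first.
    return sorted(matches, key=lambda item: item[1]["jail_days"], reverse=True)


for person_id, stats in frequent_utilizers(BOOKINGS, HEALTH_CLIENTS):
    print(person_id, stats)
```

The `min_bookings` threshold is an arbitrary choice for the sketch; a real program would set it together with the people running the jail and the treatment services.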
42:13 This portion of Talk Python to Me is brought to you by GoCD from ThoughtWorks.
42:33 GoCD is the on-premise, open-source, continuous delivery server.
42:38 With GoCD's comprehensive pipelining model, you can model complex workflows for multiple teams with ease.
42:44 And GoCD's value stream map lets you track changes from commit to deployment at a glance.
42:50 GoCD's real power is in the visibility it provides over your end-to-end workflow.
42:55 You get complete control of and visibility into your deployments across multiple teams.
43:00 Say goodbye to release day panic and hello to consistent, predictable deliveries.
43:05 Commercial support and enterprise add-ons, including disaster recovery, are available.
43:09 To learn more about GoCD, visit talkpython.fm/gocd for a free download.
43:15 That's talkpython.fm/gocd.
43:18 Check them out.
43:20 It helps support the show.
43:28 Miami-Dade did that in one year alone.
43:30 They're making those crazy cost savings.
43:31 How do you make that happen at scale?
43:33 That's where the data science comes in.
43:35 Because Miami can do it.
43:37 How about Florida and other portions of Florida?
43:39 How about, you know, Louisville?
43:41 How about, you know, Boston?
43:43 How about somewhere in, you know, Seattle?
43:46 When that common platform is there, that's where we're going to see that change.
43:51 So it sounds like at an individual level that there's opportunities to find important applications
43:58 for data like this that really start bottom up.
44:01 Like they really do start people getting engaged and being active in their communities and working
44:05 on data that's local to their communities.
44:07 And then when interesting solutions are discovered, then that's where at a federal level, you can
44:13 be thinking, oh, how can we now be another force multiplier to take that kind of solution and
44:18 see where else it might be applied or bring people together or orchestrate policy so that
44:23 it enables this at some kind of national scale?
44:25 Well, we think of it as scout and scale.
44:27 Okay.
44:27 So if somebody's doing it great over here, we say, hey, everybody else, look at this.
44:33 And the White House gives you an incredible bully pulpit to say, hey, here's how we can
44:38 scale that.
44:38 But Data-Driven Justice and the Police Data Initiative, these two kind of programs in this space, they
44:44 don't really have a White House, like, no data is coming to the federal government.
44:49 They're all local and they're trying to say, hey, here's what works for us.
44:53 Here's what works for us.
44:55 You know, we talk about A/B experiments.
44:56 We forget in healthcare, each one of us is an A/B experiment of life.
45:02 The question is, what were the original questions?
45:06 What were those hypotheses?
45:07 Was it local environment?
45:08 Was it genetics?
45:09 All these other things.
45:10 Same way, each city, each township, each community is an A/B experiment across the country.
45:16 But when we use big data techniques, we're able to abstract and say, hey, here are the
45:20 common features that we believe lead to this.
45:23 That creates a hypothesis that then we can test with policy.
45:28 Okay.
45:28 So then you can basically test this hypothesis in multiple places and
45:34 then see whether or not the features that we assume actually lead to a reduction
45:38 in the cost of, say, local incarceration are actually the features that are common from
45:44 city to city, you know, when we take away everything else that might be
45:49 relevant to the problem.
45:49 Like location, for example, geographical location.
45:52 That's absolutely correct.
45:53 And one of the things that you start to see as you start asking these questions is you
45:57 realize that other people just haven't had time to ask a question or technical expertise
46:03 at helping them to ask the question or legacy systems that prevent them from asking the question.
46:07 So in the case of saying, hey, your officer assaults, when your officer has been assaulted,
46:13 who's, why is that happening?
46:16 They say, huh, you know, we had a thing, but we've actually never checked.
46:20 Now, is it the police officer's fault or the police department's fault that they haven't
46:24 had time to do that?
46:25 These officers are so massively overloaded.
46:29 It's unbelievable because we're asking them to do more and more and more.
46:32 And just to kind of go to another example here, because this is one that I think is important.
46:37 It's like, we talk about police officers and we forget what is data doing for the officer.
46:43 And the team at University of Chicago did a really cool set of research projects with a town down in the south.
46:52 And what they did was they looked at the data and they sort of said, what is causing officers to use excessive force?
46:58 What are those features?
46:59 And so right away, it's a signal and noise problem because there's a very small number of officers that are actually using excessive force.
47:06 So then you've got to kind of separate that out and you start looking at that data.
47:10 And the first set of features is super obvious.
47:12 You have a history of traffic accidents and, you know, the usual kind of things.
47:16 Then suddenly a couple of features emerge in the middle.
47:19 Number one, oh, look, you responded to multiple suicide calls.
47:24 Oh, you responded to domestic violence where children were present.
47:29 So what does a good data scientist do in this situation?
47:32 They don't just try to extrapolate.
47:33 They go talk to the officers and they go follow along with the officers.
47:38 So what happens in a suicide?
47:39 Suicides are physically messy.
47:41 It's just a gory situation.
47:45 It's very uncomfortable.
47:46 But also there are human emotions that are highly supercharged in the families.
47:52 Everything is just a high emotional thing.
47:55 Same thing with domestic violence, especially when a child is present.
47:58 So what's happening?
48:00 Dispatch, at the end of that, says, you're done.
48:04 Well, get back on the beat.
48:05 So now you pull some kid over with a broken taillight and they're flippant with you.
48:10 You just came off this highly emotionally charged thing.
48:13 None of us are good enough to go from that context shift.
48:16 So why is the dispatch system not thinking about this ahead or anticipating this?
48:23 Because the data is obvious.
48:24 Give the officer some time to decompress.
48:27 Give them some chance to come back to being normal and human.
48:31 That's a failure of data rather than an opportunity of where data is being used in the way it could be to help the officers.
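The dispatch idea can be sketched as a simple rule: after an officer clears a high-stress call, hold them back from routine calls for a short window. The call categories, the 30-minute cooldown, and the function name below are assumptions for illustration only, not details of any real dispatch system or of the research mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical categories and window; real values would come from the
# department and from talking with the officers themselves.
HIGH_STRESS_CALLS = {"suicide", "domestic_violence_child_present"}
COOLDOWN = timedelta(minutes=30)


def available_for_routine_call(call_history, now):
    """Return False if the officer cleared a high-stress call too recently.

    call_history is a list of (call_type, cleared_at) tuples.
    """
    for call_type, cleared_at in call_history:
        elapsed = now - cleared_at
        if call_type in HIGH_STRESS_CALLS and timedelta(0) <= elapsed < COOLDOWN:
            return False
    return True


now = datetime(2016, 12, 1, 14, 0)
recent = [("suicide", datetime(2016, 12, 1, 13, 45))]
earlier = [("suicide", datetime(2016, 12, 1, 13, 0))]
print(available_for_routine_call(recent, now))   # cleared 15 minutes ago
print(available_for_routine_call(earlier, now))  # cleared an hour ago
```

A fixed window is the simplest possible policy; the point is only that the dispatch system already holds the data needed to apply one.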
48:40 And it sounds like these are things that I mean, given.
48:43 And of course, you know, things are always obvious after the fact.
48:47 But that sounds like a relatively straightforward discovery that somebody made once somebody looked at it.
48:54 Playing just with a little bit of it.
48:56 We're talking just a few months of effort.
48:58 We're not talking like, you know, some whiz-bang thing.
49:02 Even Data-Driven Justice, this idea of moving the data around from one system to another.
49:05 The first portion of this, we're talking like passing spreadsheets.
49:09 You know, we're not talking like crazy super infrastructure.
49:14 This basic level stuff gets you very close to the problem.
49:18 When you start working with the people, you will see a very different angle of the problem.
49:24 Yeah. And so it sounds like it's kind of right there.
49:26 It's right in front of us.
49:28 And there are problems to be solved, like real human lives in the balance problems that we can be out.
49:36 We can go out and be solving.
49:38 Yeah.
49:38 Imagine you did this.
49:40 We always talk about a data set.
49:43 We don't talk about the people behind the data set.
49:46 And the thing that I have taken away from this job, more than anything, is people are greater than data.
49:50 We all know that intrinsically.
49:52 But if you remember and you have the people that you have in your mind when you're working on this, you'll have a different approach.
49:59 All right.
50:00 Well, I think that that's actually a fantastic place to end it.
50:02 That's a nice reminder for the audience.
50:04 And thank you so much, DJ.
50:06 This has been a really fantastic interview.
50:07 We really appreciate you coming on the show.
50:09 And thanks for everything that you and your team have done for both the data science industry and for the country.
50:15 Yeah. Thank you, guys.
50:16 Thanks. It's been fun, and I'm looking forward to seeing what the community can do.
50:20 I'm really excited for everything.
50:21 This has been another episode of Talk Python to Me.
50:26 Today's guest was DJ Patil.
50:29 And this episode was guest hosted by Jonathan Morgan.
50:33 Thank you both for bringing us an excellent conversation.
50:36 And thank you to Rollbar and GoCD for sponsoring this episode.
50:40 Rollbar takes the pain out of errors.
50:43 They give you the context and insight you need to quickly locate errors that might have otherwise gone unnoticed until your users complained to you, of course.
50:50 As Talk Python to Me listeners, you can track a ridiculous number of errors for free.
50:54 Just go to rollbar.com/talkpythontome to get started.
50:58 GoCD is the on-premise, open-source, continuous delivery server.
51:03 Want to improve your deployment workflow, but keep your code and builds in house?
51:07 Check out GoCD at talkpython.fm/gocd and take control over your process.
51:13 Are you or a colleague trying to learn Python?
51:16 Have you tried books and videos that just left you bored by covering topics point by point?
51:20 Well, check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.
51:29 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.
51:36 Be sure to subscribe to the show.
51:39 Open your favorite podcatcher and search for Python.
51:41 We should be right at the top.
51:43 You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.
51:52 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.
51:57 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.
52:04 You can browse his tracks he has for sale on iTunes and listen to the full length version of the theme song.
52:09 This is your host, Michael Kennedy.
52:11 Thanks so much for listening.
52:12 I really appreciate it.
52:14 Smix, let's get out of here.
52:16 Stating with my voice, there's no norm that I can feel within.
52:20 Haven't been sleeping, I've been using lots of rest.
52:23 I'll pass the mic back to who rocked it best.
52:26 First developers.
52:27 First developers.
52:28 First developers.