#89: A conversation with the Chief Data Scientist of the United States Transcript
00:00 In this special episode, you'll meet DJ Patil, the current Chief Data Scientist of the United States. You'll hear his thoughts on data at the level of the United States government and look back on his term over the past few years. This is Talk Python to Me, Episode 89.
00:19 developer in many senses of the word because I make these applications, vowels and use these verbs to make this music constructed. To think when I'm coding another software design, in both cases, it's about design patterns, anyone can get the job done. It's the execution of
00:38 interest. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode has been sponsored by Rollbar and GoCD. Thank them both for supporting the podcast by checking out what they're offering during their segments. Hey, Jonathan, welcome back to Talk Python. Hey, man,
01:12 thanks so much for having me. It's great to be back.
01:14 Yeah, absolutely. You're actually here twice this month. Later this month, we're going to do our top 10 data science stories of 2016, and that's super fun. We already recorded it, and I'm looking forward to sharing it with everyone. But we actually have a really special opportunity here. You talked to me a few weeks ago and said, hey, look, I have this great opportunity. Could I co-host a show on Talk Python? And I'll just let you tell me what it is, because it's really great. When you told me what it was, I was like, yes, you should do this. What do you have in store for us?
01:42 Yeah, absolutely. So thank you, by the way, and thanks to the audience for letting me elbow into this week's episode. Basically, we got invited. I co-host a podcast called Partially Derivative, which is a kind of nerdy data science and data podcast. And we got invited to the White House to go interview the US Chief Data Scientist, DJ Patil, who's part of the Office of Science and Technology Policy, which is where the CTO, Megan Smith, works. With all of the cool stuff that government has been doing during the Obama administration to get technology and data and data science into government, we just kind of wanted to talk to him about it and say, how's it been going? You know, basically do DJ's exit interview as the US Chief Data Scientist. And it was awesome. I mean, we got to go to the actual White House. We wanted to share what we learned from DJ, and the conversation that we had, with as many tech people as possible. And so I thought, well, I know a guy who gets to talk to a lot of really fantastic tech people every week. So I'm really glad this worked out.
02:46 Yeah, I'm really glad as well. And, you know, doing a show live from the White House is pretty amazing for a podcast, I think, especially for a tech podcast. DJ is the first Chief Data Scientist of the United States, right?
02:59 That's right. He's the first person to ever hold that position, which is very cool. I think he really understands the significance of that position. So it was really fun to hear him reflect
03:08 on it. Well, I really enjoyed the interview that you did with him. And so without further ado, take it to the White House. All right, let's go to the White House. All right. So
03:16 I am here in the actual White House with the actual chief data scientist of the United States. DJ, thank you so much for being on the show. My
03:25 pleasure. Awesome. All right. So we
03:26 should probably start, first things first. I think most of the audience will be familiar with who you are and your work, but just in case they're not: what does the Chief Data Scientist of the United States do? Like, what's your gig?
03:36 Well, the simplest way to put it is actually the mission the president gave us. And it's something that's sort of a phenomenon in itself: how does a constitutional law professor get so focused on data and technology? The mission he gave us is to responsibly unleash the power of data to benefit all Americans and return on America's investment in data. And the components, I think, are critical. There's "responsibly," something that all of you, I mean, both you and Chris and Vidya, have talked about extensively: what does "responsibly" mean with respect to data, algorithms, technology? And then what does it mean to benefit all Americans? Just because we have technology now, and people have access to certain systems and solutions, doesn't mean everyone in the country does. So how do you make that happen? And our belief, our theory of the case, is that if we do it here, everybody around the world will benefit as a result.
04:34 Yeah, I mean, fortunately, that's a nice, small, manageable mandate that, you know, has no serious, long, broad reaching implications or consequences. But I mean, so you've been you've been a part of this from the beginning, obviously, you're the first chief data scientist. So how, how have you? How have those how's that mission that the President gave you? How's that been playing out over your time here?
04:55 Well, I actually think that the first chief data scientist was really Washington, as a cartographer.
05:02 that's true, right? If
05:04 you look through the arc of history, you know, even Lincoln did, basically Euclid principles of mathematics from for Switzer. Like, we've had a lot of presidents who've been deeply, deeply mathematical or analytical in just the way they operate, the way they think. I think what's true in the case specifically for this president: when you walk into the Oval, you don't see dishware, you don't see, you know, little cottage tchotchkes on the wall. He actually has the submissions of the real original patents for things like the telegraph and the gear cutter. And the reason for that is, if you think of the real arc of our entire history as a country, from the founding of the institution, what's really there is that in every case, data and technology have been a force multiplier. They have really transformed our ability to move as a society. And we're seeing that next wave of transformation take place right now. So what does that look like over the arc of just this time period? Just in this time period alone, just in the last couple of years, we have major movements on precision medicine, the idea of tailored treatments for health purposes. You have the Affordable Care Act, which kind of doubles down in a way that people don't always realize. One of those components is that you can't be denied coverage because of a pre-existing condition. When you get to the genome, every one of us has a pre-existing condition. It's called being human. So there are these fundamental kinds of things. The Cancer Moonshot: all the aspects of cancer are fundamentally based on being able to move data, collect it, store it, use it responsibly, and then act on it extremely fast to get you to the right treatment. So the right care, smarter, faster, better, those things.
In criminal justice, we have the Data-Driven Justice program and the Police Data Initiative, both working on different sides: one working to create transparency for police departments with systems, the other to find early intervention techniques. All of these are just a few. Across every single aspect that we have, there is data at almost every level of conversation, whether that's how do you think about getting kids into school, whether that's national security, whether that's a weather forecast. It's in every single thing, as it's supposed to be, as it is supposed to be with DNA, which basically implies that for everyone that's listening out there, we have good job security.
07:41 That's true. That's true. It's
07:41 a it's a good time to be a data person. But and you touched on something interesting, just there. I mean, there's these sort of specific programs and initiatives that you talked about the precision medicine, and kind of data driven policing. But then you've also you also talked about this kind of general capacity building inside government. I mean, this this administration has used data or kind of embedded data into different agencies, like, like no other administration before? And can you talk a little bit about the play between those two things, both kind of building capacity, looking at big issues, like government transparency, and accountability, and how data informs that but then also, and then we can talk about some of these specific examples as well. But I'd be interested in how you this kind of this general idea about growing the data capacity of government, and that's been a real, I think, a real revolution here in DC?
08:28 Well, actually, one of the more fascinating things is that, you know, there's this narrative going around that Silicon Valley's got to come to DC to save it. But people forget: where did all the investment in data originally come from? You know, it's the government, whether it's the census or any of the other types of things. DoD has funded some of the greatest advancements in data. So has the Department of Energy, so has NIH, all these programs, whether it's CERN and big, large-scale atomics, you know, understanding of forces of nature, to just the operational aspects of health care. It's in everything. The part that I think is unique is why there's a data scientist. Why do we need a chief data scientist when we have economists and statisticians? There's a chief statistician, by the way, and there is basically a chief economist, Jason Furman, who's the head of the Council of Economic Advisers. So why do we need a chief data scientist? It's actually interesting that the statisticians and economists don't always talk to each other. And it's not a Bayesian thing. It's that we're in different verticals or different silos. And then there's the aspect that an increasing amount of data is coming from outside the federal government. So how do you bring that data together? How do you really rethink the way data is being used? That's the component that the role is there for. And that's the shift that we're seeing: people who are coming into these new roles in the federal government have much more of that data science ethos, which says, well, there are lots of ways we can be clever to solve this problem. I may not actually have the data, but I know where to go get the data. Oh, by the way, the data is incredibly messy and not in the right format, so I'm going to figure out how to get it together. We're going to try a hypothesis, we're going to test, we're going to iterate, we're going to try lots of different things. 
Most of what we do at the end of the day, is literally giving them freedom of space to do their job, because they have the ideas they know what to do. We just got to give them a runway.
10:36 Yeah, I think you're in a unique position to recognize that kind of that technology professional mentality, right? Like you come out of Silicon Valley. Well, I know that you worked in government before you've been in Silicon Valley. And now you're back in government, again, kind of trying to bridge that gap between these two communities that sometimes can be a little bit distant, like the like, even the the idea that Silicon Valley needs to come in and save government. I know, sometimes, people bristle at that here in DC. And then I know at the same time people in the tech community kind of bristle at the way that government does things because they don't really understand it. And so how do you how do you see that? How do you see that relationship getting stronger? How do you How are you bridging that gap? Because it really sounds like that's a big part of what you're describing here is data inside government data outside government kind of adopting that mentality and bringing that capacity to the work that government's already doing. But at the same time, not forgetting that the foundation of a lot of these technologies and data driven approaches came out of government in the first place. So how are you kind of making that marriage stronger?
11:37 Yeah. So the easiest way to think about this is through all the different podcasts that you've done, where you said, wow, we need to work on this, or you called out a story and you're like, geez, how did that happen? That's crazy. There are so many different problems. The biggest issue is we don't know how to help. We say, I know how to do that; we know how to raise our hand and say, I could help on that. But there's no door. Actually, there often is a door; it's just not well marked, or it's hidden or obfuscated in layers of bureaucracy. So what we've really tried to do is figure out how to show everyone the door. The other part that's there is really just helping people create a common language. And one of the most powerful common languages is data. At the end of the day, you're able to talk about things, share things, and kind of see the different approaches. When we don't use data as a weapon against each other, and we're actually using it to have a conversation, it changes the tenor of the conversation, the whole nature of how we're actually in discussion. We make it a discussion. And the final part, which I think is most important: if we say that this is the mission, the Secretary of Defense has a great way of saying it, that there's nothing greater than waking up knowing you're part of something bigger. And when you're part of the mission, it's awesome. I mean, you've had a chance to experience that, as have other people who've worked on these problems. And the part there that is unbelievable is just when you get a chance to do it. And that's it: it's just giving people doors.
13:16 Yeah. And actually, that's a good point to bring up, because I know a big part of what you've been doing here is inviting more technology professionals, and data science professionals, into the government. And actually, speaking of the Secretary of Defense, you were recently acknowledged with a pretty significant award, I think the highest award that a civilian can receive from the DoD. Is that right?
13:40 That's right. Yeah, but where's your medal?
13:48 I can't, I can check you
13:50 but I can't, I
13:52 can't. But in all seriousness, it is true. It was an incredible honor to be able to receive an award, even a medal, which is kind of a weird, surreal thing. But the number one thing is, you don't come into this job trying to win medals. What you're trying to do is find an avenue to add value. And the number one thing I tell data scientists all the time is: you should be in a role where you are ridiculously, overwhelmingly impactful. If you're not, figure out how to be ridiculously impactful, or go find another place where you can be. Because the ability to singularly be a force multiplier, not like a 1x, 2x, 3x, but like a 10x, 100x force multiplier on any problem, has never been more true. It's gotten easier to have access to technology. Data ubiquity is there. And more than anything else, the ability to combine some data with domain expertise allows a different type of integration that we just have not seen in recent times. Yeah. And
15:19 then and so but it sounds like that's already some of these programs just to give people examples for the as they're kind of thinking about their, their new career in government after they hear this and, and how excited and inspired they'll be to, you know, to come join the digital service. Like, what's the, I mean, we mentioned precision medicine, maybe let's focus on that. What What would that look like from a data scientist perspective?
15:39 So at the highest level, precision medicine is first going to get at least 1 million Americans to contribute their data and donate it to the National Institutes of Health so that researchers can work with it. Who are those researchers? I'd love to see just generic data scientists turning on it. Now, we have this kid, Nathan Han, who came to the White House Science Fair a couple of years back. Here's a kid, literally 16, 17 years old, and he's just interested in machine learning algorithms. So he goes to dbGaP, and he starts to play with some data. And he's like, look, let me see what there is, with cancer sites and all this stuff. His machine learning algorithms are up there with the best. You know, we have stories like that all the time. We have this woman out of Kentucky. She's working on renal failure; she's building an artificial kidney. It's so good, the University of Kentucky gave her a whole lab. She's 17 this year. America, more than any place in the world, has an unbelievable arsenal of talent. It's everywhere. What we have to do is connect it with the problems. So if we're able to just open up a little bit of that data through the Precision Medicine Initiative and say, hey, come in and work with it, what might we find? We talked about finding signatures of Vioxx. Why did somebody have to look? What if that signal just emerged? What happens when you start to apply the machine learning algorithm, the feature set selection? Interesting things just happen. As you get into that problem, you start to work with more and more sensitive data; you have to get vetted, and you have to work on the data. And maybe eventually, the data is even air-gapped or, you know, in some type of sandbox environment. We're working through all those security aspects as it develops. But I would love to see people do that and find a force-multiplier change today. 
By the end of today, 100 people will die on our highways and roads. And that number is going up. Why? Is it better fuel efficiency? Is it that cars are driving faster? Is it distracted driving? We don't know. But what we do know is there's this pristine data set. And what happens if we just say, hey, America, why don't you solve it? When you bring the full force of the United States of America to a problem, you will make that problem break. You will find a solution to break it. It is unbelievable how amazing we are as a country when we decide to do something.
18:10 Yeah. And I think that you've mentioned two examples. I know you've talked about criminal justice and open data and policing, it sounds like there's just so there's nothing but opportunities for people who are in the public and just want to engage even kind of prior to thinking about maybe coming in joining the digital service or being a data scientist inside government. What How can people get involved in, in these various kind of open data initiatives that that are part of tackling these really big society level problems?
18:37 Yeah. So there are a whole lot of great ways to get involved. A lot of times people think, oh, you've got to come to the White House to serve. No, you can serve in your local neighborhood. You know, there's a police department, there's an education department, there's somebody in the local city or your town or your community, however you want to define it, or a county. All of them need help, and your ability to look at and play with some data and get involved is really powerful. That can be through something like Code for America. It can also be through a cancer program or some other type of area where they just need these skill sets. It could be education, looking at some of that data. There are all these competitions around, and when I say competitions, I really just mean a hackathon or people playing with data, because nobody actually knows the answer. Get some ideas, jump in there. If you want to get to the full federal level, there are the US Digital Service and the different digital services in the Department of Defense or VA, Veterans Affairs, or even the Department of Transportation. All those places are open and waiting for help. But the bigger thing is, if you want to see a change happen in your community, you've got to jump in.
19:51 Yeah, and so I think, okay, people can make changes on a very micro level. People could of course get involved; the digital services will continue into the next administration. And then at a macro level, I know that your organization has been really out in front about data and ethics, like we touched on. And I think this is a very challenging problem, because on the one hand, it's a policy problem. So it does seem to come from government, maybe differently from most technology innovation, or, you know, maybe not, if you have kind of a DARPA perspective. But different from most technology innovation, this came government first. This wasn't something that came out of industry, a practice that was later adopted inside government. This was really the Office of Science and Technology Policy out in front, really the White House leading the charge on this. And I'm curious, (a) how you see that permeating across data science as an industry, and then (b) how you see that being carried forward into the future. Like, what are the implications for this idea about ethics and data science and artificial intelligence and machine learning? What does that mean for us? Yeah.
21:03 So the person who's led from the front on this is the president, and this has been a forefront issue for the President. And where does it really stem from? Is that force multiplier a force multiplier for good, or is it harmful? What are the edge cases? One of the things that I have taken away personally from this job is, when you're building a company, as you guys are, you always get to say, well, that's an edge case. When you're here, those edge cases have names. Their names are like Sally, Joe, Zell, Juan, Ricardo, whatever. They all have names, and what's the impact for them? So when data is being used, how is it being used? And what are the implications? The very first big data report, which John Podesta led, emphasized the need for thinking along this direction. We've had three other data reports since then, and the latest one was an AI report. In all of them, every single time we go out and talk to people, this is what's actually on their mind. It's not about the sentient being that's going to emerge and, you know, figure out how to become the robot apocalypse. It's actually: well, who's getting harmed? And what's there? And how do I know that I can trust this for my kids or my kids' kids? The place where this gets impactful, and you guys have talked a lot about this, is the black box of algorithms, the ability to know: is this data okay or not? There was a great aspect the other day where somebody pointed out, well, if we've got self-driving cars, and we've got good data and bad data coming in, what are the implications when an algorithm can't even recognize African American faces? Does that mean self-driving cars have a decision disparity when it comes to race? And is that a data set problem? Whose problem is that? Do we just get to say, oh, sorry, that, you know, didn't have good training data? That's not acceptable when you're putting something out there that may harm the public. 
You wouldn't want a drugmaker suddenly saying, oops, sorry, we didn't take into account the fact that hipster data scientists with beards living in Austin, aren't in the training dataset. present company excluded.
23:32 This portion of Talk Python to Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors: relying on users to report errors, digging through log files trying to debug issues, or a million alerts just flooding your inbox and ruining your day. With Rollbar's full-stack error monitoring, you get the context, insights, and control that you need to find and fix bugs faster. It's easy to install. You can start tracking production errors and deployments in eight minutes, or even less. Rollbar works with all the major languages and frameworks, including the Python ones such as Django, Flask, and Pyramid, as well as Ruby, JavaScript, Node, iOS, and Android. You can integrate Rollbar into your existing workflow: send error alerts to Slack or HipChat, or even automatically create issues in JIRA and Pivotal Tracker, and a whole bunch more. Rollbar has put together a special offer for Talk Python to Me listeners. Visit rollbar.com/talkpythontome, sign up, and get the Bootstrap plan free for 90 days. That's 300,000 errors tracked, all for free. But hey, just between you and me, I really hope you don't encounter that many errors. Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch, and more. Give Rollbar a try today. Go to rollbar.com/talkpythontome.
24:53 Sacco hottie did this great research project on sudden cardiac death syndrome, and it turns out African American men have been given too high a false positive reading on the same. Why is that the case? It turns out genetics and genomics are substantially more complicated than we thought. But also, there are not enough healthy African American males in those clinical trials. And if you look even more, there aren't even really women in a lot of clinical trials, let alone talking about the ethnicity of women in trials. So as we're working with data, you kind of go back to the source and say, well, what bias is there, and all these other things. And then the other one that I think we have to confront, and everyone has to start really asking these questions, is: what does it mean when somebody just slaps a label on something that says "data science verified," just because it's great for sales and marketing? As data scientists, all of us need to take a very serious look at that and say, does that meet our bar? Because those people, when they sell something, are representing on behalf of the community. And as a community, I'd like to think that we're better than that. Yeah. So I mean, I think
26:07 there's this idea that, of course, it really speaks to the need for data scientists in, almost, policymaking positions, or near policymakers, to be able to inform at a high level what steps we should be taking to verify that the data we're using to make these decisions is not biased, or that our processes aren't biased. But then at the same time, you're talking about this individual responsibility. And where do you feel like that's going to come from? I mean, is it situations like this, where those of us in the community who have sort of seen this in action can be advocating for it, or those who are listening to this podcast can accept it and do some thinking? But maybe beyond that, as a practice or as an industry, is this something we can institute? Like, what do you think: is it changing policies, changing education? Well, fine. I
26:59 mean, what do you think?
27:00 Well, if we're not careful, we're going to get regulation. And the regulation will come in the usual form of legislation. And that has good and bad effects. It's very tough to get that kind of legislation correct when it's a very fast-changing technical landscape. Extremely hard to do. So what I would like to see, and the place where we first started, what we have called for from the White House, is that in every training program, data science, economics, computer science, wherever, you've got to be trained in two things, not as electives but as two core principles: what does ethics look like, and what does security look like? Because if you're learning about databases and you don't know what an overflow is, that's crazy in this day and age. In the same way, if you don't know about training bias and the ethical implications as you're building something, that can't be a slap-on, additive, elective kind of thing. It's got to be intrinsic to every course. So if you don't have that in the training program you're in right now, you need to demand it, because you're getting a subpar training course, and it's not going to prepare you for the real world. The other aspect is, as we're interviewing people, as we're talking to people, we should all ask an ethics question. An ethics question could be this, very simply. We'll pretend we're doing an interview here. I say: Jonathan, thanks for coming in. You happen to be building an algorithm, and we're really focused on this because we're building job matching, and we're not supposed to use race. You've got this amazing data set. And because you're the all-star data scientist, and you have a podcast, you just happen to look at all the data and say, hey, I think I found a way to bypass race, or a proxy for race. What do you do?
28:41 Yeah, these are interesting questions, right? Because I think it's the kind of thing that as you could, you could imagine, I think sometimes as technical people, we get really focused on the solution to the problem. And we put that ahead of everything else. And we're not really thinking about the implications of our work. And we just go, Hey, I found the solution. I solved the Rubik's Cube. I'm done. That's right. But I think we don't we don't
29:01 have a conversation, right? Like, all the people that are in the cadre of data scientists who have recognized this problem: what's the one commonality that we all have, other than generally going to bars and having a beer talking about the problem? It's that we talk about the problems. We don't just talk about, oh, how did you do that? We always kind of look back and say, well, is that what we should do? What about this? What about these other implications? We care about the edge cases; we care about the longer-term implications of what we're building. When you create something with data or an algorithm, that's equally as important as if you were creating an artwork, or creating a bridge, or building a building. We have to take that responsibility. And it's not because the responsibility is imposed on us. It's because if we are going to be a massive force multiplier in the world, we have to accept the responsibility that comes with that.
30:00 Because we're at the stage where it's a little bit the Wild West right now, in terms of how data and data science work. And I think we're still in that window where there aren't really set mechanisms for how people collaborate in teams, there aren't set mechanisms for how data science models are deployed, or checked, or QA'd. All the stuff you have in other technical disciplines doesn't exist here yet. And so I think, ultimately, we get to decide. This generation of data scientists gets to decide how we would like history to look back, or how we would like our industry to evolve going forward. And even more so, I think history will judge us. It'll judge us either kindly or harshly, depending on what comes out of these products and the implications of how not only our society uses these products, but how other societies, which could even be repressive, use our products, and what the implications of that are. All of that falls on us. And we just have to get ready to drive that world, because the time is now for us to do that. But that's actually, you know, one
31:12 of the things, I think, given your audience, I've actually been really curious to hear what the audience has to say on this. I think one of the things that we haven't heard sufficiently, just because the data science community is early, is: what does the data science community think? What would be the most powerful mechanisms to move the needle on this? Should we have more institutional review boards and a very fixed, regimented process? Should we be very laissez-faire? Where do we stand? That's one where I think the community has an incredible opportunity to stand up and say, this is what we believe. And I have no idea what the broad consensus of data scientists is, except for the ones that we've had a lot of interactions with, or through very traditional White House processes.
31:59 And that makes sense. And I would wonder how much people are even thinking about it. It does seem like they could, because in other industries there are roadmaps. In the legal profession, the medical profession, these are also knowledge-based, highly technical professions that have established some concept of ethics, and mechanisms for either encouraging or forcing people to adhere to whatever the community standards are. But as of yet, we don't know: is that the appropriate way to think about it for data science? I don't know.
32:31 Well, the ethics: if you look at bioethics, medical ethics, or physicians, any of that, there are physicians who were at the very early front end of that space, in terms of ethics getting implemented, who are still practicing. That's how recent these things are. So for us to get ahead of that, that's going to be critical. And we're going to have an equal opportunity to drive more of medicine through data-driven approaches than people even recognize. So the analogies could be even more similar than we appreciate. And same with law.
33:06 So, speaking of the implications for the broader community: now that this administration is coming to a close, and obviously you're very passionate about where the community should be going, I feel like I have to ask, what are you doing after this, DJ?
33:19 How are you going?
33:20 Yeah.
33:21 How's your leadership role in the data
33:23 science community going to transition? Right?
33:24 Well, as much as you can talk about. As much as I can talk about. So I feel like I should insert somewhere the "Chris Albon is my cousin" joke. I'll put my own long lost relatives out of the out on the border, appreciate that
33:38 having not been able to be here in person.
33:41 Please give him
33:44 Chris, we miss you.
33:47 Most of the time.
33:49 But for me, the biggest thing will be to take a nap. I'll be taking off the tie. And the thing that I'm excited about, the biggest thing, is the wave of data scientists that are in the federal structure. A lot of people are questioning what's going to happen. Is it going to collapse? It's the federal civil servants who are going to carry this forward. There are chief data scientists, or chief data officers, or some type of analytics leader or data leader, in more than 24 of the federal agencies, and they're going to carry on the mission. So I feel very good about the progress we made. That doesn't mean at all that the mission is done, and the community is going to have to continue to champion this. For me personally, I'm a big believer that there's a big difference between experience and wisdom. And you go from experience to wisdom through reflection. Reflection is sitting down, thinking, talking to people, having these kinds of conversations, writing, all those different things. And so we'll go through some period of reflection to try to distill as much as I can from this really very, very unique experience. And then I think I'm really going to be excited to get back to building. I think the powerful thing about all data scientists is we're makers at heart. If we've taken away anything, it's your ability to be creative, and create something novel, and do something unique that nobody saw, like your work on ISIS, you know, using Twitter data. Those types of things, you just see people seeing the world through a lens that people hadn't seen before. It's kind of like when you see a photograph and you're like, wow, I never saw the world that way. We have that unique ability to do that with data. How does that resonate as a true product, or something that somebody else can do? When we're building, we're learning in different ways.
I'm really excited to get back to a different form of building. This one has been largely policy-based. And I'll be looking forward to getting my hands dirty, getting hands back on keys, and trying to figure out what that looks like in some form or another.
36:03 All right, all right. All right. Well, we
36:04 look forward to some writing, perhaps some reflection, and then ultimately to spitballing on how to build the right model for whatever it is you'd like to build to try and solve. So that's cool. That sounds like it'll be a well-deserved rest. I mean, I know around here at the White House, I was joking just last night that it's not uncommon to find a fair number of workaholics around here. You guys all push pretty hard, because everything is important that this law,
36:31 there's always a crisis. And there's always an opportunity to do more. Yeah. So when you're balancing those, you can't let the urgent get in the way of the important. And there's this fundamental thing, actually there's a great one, so I'll just share this card with you. The President gives us these cards. And this one says: everything we do needs to be infused with a sense of possibility; we are not scared of the future. Everything we do needs to be infused with a sense of possibility. We are not scared of the future. So there's also the other analogy he gives us, which is that remarkable things happen in the last quarter. And for all the sports people, you can imagine your favorite game where you've seen that. So don't think that we're done.
37:19 Yeah. Okay. So
37:19 there's more to come here in the last quarter. But then also, even going further past that, you mentioned something just a moment ago about the way that data and analytics has been really embedded in government. And so the mission continues. And again, I know that there's a lot going on right now, so there's maybe not a ton to say, but I bet a lot of people listening are thinking to themselves, well, what happens in the next administration? It's an administration from the other party, so it's a big transition. But then at the same time, I know, from a technology perspective, that this is a largely bipartisan issue. I think that you actually served in George W. Bush's administration. That's right. So you've seen this from the perspective of a civil servant. What happens in the change as we move from one administration to the next? And what are the implications for people who might be thinking about getting involved?
38:18 Well, the biggest thing to think of with an administration is, it's a baton. It's like a baton race. And so your job is to hand the baton off to the next team while they are sprinting equally as fast as you are. No baton drops are acceptable, because that's national security. That's people getting hurt. You know, there are a lot of services that people critically depend on. So we have to make sure the baton is well passed. And that is what the President has really emphasized: just as the Bush administration transitioned to the Obama presidency, and that was such a clean handoff, we have to do that as well. And we're all Team America. We're Team USA here. So we have to do that. And you never want to bet against the country. That's not who we are. The other part that's the case is these problems. Cancer doesn't care what religion you are, what political party you are, what socioeconomic class you are. It is a problem of the species. Zika, Ebola: these are problems of a species. Climate change is, well, obviously debated, but the science is clear. And I'm sure as more and more people actually take a look at it, as they start to shift to thinking about these things, more people will say that is obvious. Then people start looking at these other sets of problems around criminal justice at the local level. These are not federal problems; inherently they're local. And if you look at Governor Bevin, who is the governor of Kentucky, he knows that there's a giant, gaping budget gap that is being caused by the local criminal justice system. And for those out there that don't know, we're talking like $20 billion in the US. That's basically how much we're paying for jails. And who are we paying for? There are more than 10 million people, basically 11 million people, going through our 13,000 1300 jails. And it's crazy. You think about those numbers.
95% of them never will go to prison. These are local jails, and they're staying there for, on average, 23 days. So 11.3 million people going to 3,200 jails. I said my numbers slightly differently there, but 11.7 point 3 million to 3100 jails. That's insanity. And who are those people? You look at some of our jail facilities: Cook County Jail, 92 acres of a single-site jail in Illinois, has one third mentally ill. We were just in Las Vegas the other day, and they're talking about assaults on officers. And they all thought, oh, it's gang-related violence. When they actually looked at the data, it's all mental illness. So why are we sending the mentally ill to jail? Why not train officers? Miami-Dade, Florida, did this. They trained officers in crisis intervention. And what happens? Oh, gee, it turns out that if you spend a million bucks to train officers and dispatch in crisis intervention, you can save more than $10 million in the jails. And you can close a jail, which is the more important measure. Now think about that with respect to the opioid crisis. So these ideas of what it means to use data in these clever ways, how do you do that? Take your data from your portion of your criminal justice system and move it over to the healthcare system. Not super sophisticated, just move it over and look at who the people are that are constantly cycling. How many jail days are they going through? How many dollars are they costing? And then ask, next time the police see them, why take them to jail? Put them into this other treatment plan. Let's take them directly to treatment. Those are the ideas.
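[Editor's sketch] As a rough illustration of the record-matching idea described in this answer, the following Python counts bookings and jail days per person and flags people who cycle through jail repeatedly and also appear in a health-system list. This is a minimal sketch under stated assumptions, not anything the Data-Driven Justice initiative is known to use: the record layout, the `min_bookings` threshold, the function names, and all sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical jail booking records: (person_id, booked, released).
bookings = [
    ("p1", date(2016, 1, 3), date(2016, 1, 20)),
    ("p1", date(2016, 3, 1), date(2016, 3, 25)),
    ("p1", date(2016, 6, 10), date(2016, 6, 30)),
    ("p2", date(2016, 2, 5), date(2016, 2, 7)),
    ("p3", date(2016, 4, 1), date(2016, 4, 24)),
    ("p3", date(2016, 9, 2), date(2016, 9, 20)),
]

# Hypothetical set of person IDs known to the local health system.
health_clients = {"p1", "p4"}


def summarize(bookings):
    """Return {person_id: (number_of_bookings, total_jail_days)}."""
    counts = defaultdict(int)
    days = defaultdict(int)
    for person_id, booked, released in bookings:
        counts[person_id] += 1
        days[person_id] += (released - booked).days
    return {pid: (counts[pid], days[pid]) for pid in counts}


def frequent_cyclers(bookings, health_clients, min_bookings=3):
    """People with at least min_bookings stays who also appear in health data."""
    summary = summarize(bookings)
    return sorted(
        pid
        for pid, (n_bookings, _) in summary.items()
        if n_bookings >= min_bookings and pid in health_clients
    )


if __name__ == "__main__":
    print(summarize(bookings))
    print(frequent_cyclers(bookings, health_clients))
```

In practice the hard part is matching the same person across two systems that do not share an identifier; the sketch assumes that matching has already been done.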
42:28 This portion of Talk Python To Me is brought to you by GoCD from ThoughtWorks. GoCD is the on-premise, open source continuous delivery server. With GoCD's comprehensive pipeline modeling, you can model complex workflows for multiple teams with ease, and GoCD's Value Stream Map lets you track changes from commit to deployment at a glance. GoCD's real power is in the visibility it provides over your end-to-end workflow. You get complete control of and visibility into your deployments across multiple teams. Say goodbye to release day panic and hello to consistent, predictable deliveries. Commercial support and enterprise add-ons, including disaster recovery, are available. To learn more about GoCD, visit talkpython.fm/gocd for a free download. That's talkpython.fm/gocd. Check them out. It helps support the show.
43:28 Miami-Dade did that on their own, and they're making those crazy cost savings. How do you make that happen at scale? That's where the data science comes in. Because Miami can do it. How about other portions of Florida? How about, you know, Louisville? How about Boston? How about somewhere in, you know, Seattle? When that common platform is there, that's where we're going to see that change.
43:51 So it sounds like, at an individual level, there are opportunities to find important applications for data like this that really start bottom up. They really do start with people getting engaged and being active in their communities and working on data that's local to their communities. And then when interesting solutions are discovered, that's where at a federal level you can be thinking, how can we now be a force multiplier to take that kind of solution and see where else it might be applied, or bring people together, or orchestrate policy so that it enables this at some kind of national scale?
44:25 Well, we think of it as scout and scale. Okay, so somebody's doing it great over here; we say, hey, everybody else, look at this. And the White House gives you an incredible bully pulpit to say, hey, here's how we can scale that. But Data-Driven Justice and the Police Data Initiative, these two programs in the space, they don't really have a White House, like, no data has come to the federal government. They're all local, and they're trying to say, hey, here's what works for us. Here's what sort of works for us. You know, we talk about A/B experiments. We forget that in health care, each one of us is an A/B experiment of life. The question is, of what? What was the original question?
45:05 What
45:06 were those hypotheses? Was it local environment? Was it genetics, all these other things? In the same way, each city, each township, each community is an A/B experiment across the country. But when we use big data techniques, we're able to abstract and say, hey, here are the common features that we believe lead to this. That creates a hypothesis that we can then test with policy.
45:27 Okay, so then you can basically test this hypothesis in multiple places, and then see whether or not the features that we assume lead to a reduction in the cost of, say, local incarceration are actually the features that are common from city to city when we take away everything else that might be relevant to the problem, like geographic location, for example, right?
45:52 That's absolutely correct. And one of the things that you start to see, as you start asking these questions, is you realize that other people just haven't had time to ask the question, or the technical expertise to help them ask the question, or they have legacy systems that prevent them from asking the question. So in the case of an OB, saying, hey, when your officer has been assaulted, why is it happening? They say, hmm, you know, we had a fate, but we've actually never checked. Now, is it the police officers' fault, or the police department's fault, that they haven't had time to do that? These officers are so massively overloaded, it's unbelievable, because we're asking them to do more and more and more. And just to go to another example here, because this is one that I think is important: we talk about police officers, and we forget what data is doing for the officer. The team at the University of Chicago did a really cool set of research projects with a town down in south. What they did was they looked at the data and said, what is causing officers to use excessive force? What are those features? And so right away, it's a signal and noise problem, because there's a very small number of officers that are actually using excessive force. So then you've got to separate that out. And they start looking at that data. The first set of features is super obvious: you have a history of traffic accidents, and, you know, the usual kinds of things. Then suddenly, a couple of features emerge in the middle. Number one: oh, look, you responded to multiple suicide calls. Oh, you responded to domestic violence where children are present. So what does a good data scientist do in this situation? They don't just try to extrapolate. They go talk to the officers, and they go follow along with the officer.
So what happens in a suicide? Suicides are physically messy. It's just a gory situation; it's very uncomfortable. But also, there are human emotions that are highly supercharged. There are families. It's a highly emotional thing. Same thing with domestic violence, especially when a child is present. So what's happening? Dispatch says at the end of that: you're done, well, get back on the beat. So now you pull some kid over with a broken taillight, and they're flippant with you. You just came off this highly emotionally charged thing. None of us are good enough to make that context shift. So why is the dispatch system not thinking about this ahead, or anticipating this? Because the data is obvious: give the officer some time to decompress, give them some chance to come back to being normal and human. That's a failure of data, rather than an opportunity where data is being used in the way it could be to help the officers. Yeah, that make
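[Editor's sketch] The dispatch idea described above, giving an officer time to decompress after an emotionally charged call before sending them to a routine one, can be sketched as a simple rule. This is only an illustration under stated assumptions: the call-type labels, the 30-minute window, and the function names are hypothetical and do not describe any real dispatch system or the University of Chicago team's work.

```python
from datetime import datetime, timedelta

# Hypothetical labels for call types treated as high-stress. The transcript
# names suicide calls and domestic violence calls where children are present.
HIGH_STRESS = {"suicide", "domestic_violence_child_present"}

# Assumed decompression window; a real value would be a policy decision.
COOLDOWN = timedelta(minutes=30)


def available_at(last_call_type, last_call_end):
    """Earliest time the officer should be sent to a routine call."""
    if last_call_type in HIGH_STRESS:
        return last_call_end + COOLDOWN
    return last_call_end


def can_dispatch(last_call_type, last_call_end, now):
    """True if the officer's decompression window (if any) has elapsed."""
    return now >= available_at(last_call_type, last_call_end)


if __name__ == "__main__":
    ended = datetime(2016, 12, 1, 14, 0)
    # Ten minutes after a suicide call: still inside the cooldown window.
    print(can_dispatch("suicide", ended, ended + timedelta(minutes=10)))
    # Ten minutes after a traffic stop: no cooldown applies.
    print(can_dispatch("traffic_stop", ended, ended + timedelta(minutes=10)))
```

A fixed window is the simplest possible rule; the point in the conversation is only that the dispatch system already has the data needed to apply something like it.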
48:40 And it sounds like these are things that, I mean, of course, things are always obvious after the fact. But that sounds like a relatively straightforward discovery that somebody made once. It was, once somebody looked at it, plain,
48:54 just with a little bit of it. We're talking just a few months of effort. We're not talking, you know, some whiz-bang thing. Even Data-Driven Justice, this idea of moving the data around from one system to another, the first portion of this, we're talking like passing spreadsheets. We're not talking crazy, super infrastructure. This basic-level stuff gets you very close to the problem. When you start working with the people, you will see a very different angle of the problem. Yeah.
49:24 And so it sounds like it's right there. It's right in front of us. And there are problems to be solved, real human-lives-in-the-balance problems, that we can go out and be solving.
49:38 Yeah. Imagine you did this. We always talk about the data set. We don't talk about the people behind the data set. And the thing that I have taken away more than anything is people are greater than data. We all know that intrinsically. But if you remember and you have the people that you have in your mind when you're working on this, you'll have a different approach.
49:58 All right. Well, I think that that's actually a fantastic place to end it. It's a nice reminder for the audience. And thank you so much, DJ, this has been a really fantastic interview. We really appreciate you coming on the show. And thanks for everything that you and your team have done for both the data science industry and for the country. Yeah,
50:15 thank you guys. Thanks, it's been fun, and I'm looking forward to seeing what the community can do. I'm really excited for everything.
50:23 This has been another episode of Talk Python To Me. Today's guest was DJ Patil, and this episode was guest hosted by Jonathan Morgan. Thank you both for bringing us an excellent conversation. And thank you to Rollbar and GoCD for sponsoring this episode. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate errors that might have otherwise gone unnoticed until your users complain to you. Of course, as Talk Python To Me listeners, you can track a ridiculous number of errors for free. Just go to rollbar.com/talkpythontome to get started. GoCD is the on-premise, open source continuous delivery server. It will improve your deployment workflow but keep your code and builds in house. Check out GoCD at talkpython.fm/gocd and take control over your process. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. And be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. Our theme music is Developers, Developers, Developers by Cory Smith, who goes by Smixx. Cory just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music. You can browse the tracks he has for sale on iTunes and listen to the full-length version of the theme song. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Smixx, let's get out of here.
52:17 Dealing with my boys.
52:20 Having been sleeping. I've been using lots of rest. I've got the mic back.
52:27 Developers,
52:33 developers, developers