#286: Python and ML at NASA Jet Propulsion Laboratory (JPL) Transcript
00:00 NASA's Jet Propulsion Laboratory's (JPL) primary function is the construction and operation of planetary robotic spacecraft, though it also conducts Earth-orbit and astronomy missions, and it's responsible for operating NASA's Deep Space Network. On this episode, you'll meet Chris Mattmann. He's the division manager for Artificial Intelligence, Analytics and Innovation at NASA's JPL, and he's JPL's first principal scientist in the area of data science. We cover a wide range of topics and dive into how Python and open source are growing in the space exploration field. And he answers the question of whether he thinks we'll have Python running on robots and rovers in space. This is Talk Python To Me, Episode 286, recorded August 14, 2020.
00:59 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and monday.com. Please check out what they're offering during their segments; it really helps support the show. Chris, welcome to Talk Python To Me.
01:26 Hey, Michael, thanks for having me. It's great to be here. And hi to your listeners.
01:30 Well, I almost don't know where to start. There are so many things that you have done in Python and the open source space that I think are going to be really fun to talk about. But I think let's focus mostly on space and JPL and that, and then we'll get to what I think will be a surprisingly large and impressive list of interesting things that you've done. So we'll start our conversation there. But before we actually get into that, let's just start with how you got into programming and Python.
01:58 Yeah, well, it's a long story. So I came from the Java world. Rewind the clock maybe, I don't know, 15 or 20 years ago. I grew up in a trailer in Santa Clarita, it's about an hour north of LA. And let's see, my childhood was super interesting. But anyways, I became a teenager. I like to tell people I think I had my first longer-than-five-minute conversation with my father when I was a teenager, when, you know, my brother was out hanging out with the ladies and being an extrovert, I was an introvert. He asked me to read the paper with him, the local paper, and so that was nice; I think I had a longer than five minute conversation with him at that point, you know. And so yeah, I went from there. I played sports in high school. I went to Saugus High. I was five-nine up until my sophomore year and stayed that height throughout; everyone else got taller, so they got to play football, I didn't, so I had to try to do something else. You know, I had a 4.6 GPA; the only reason I didn't have a 5.0 on a four scale was they didn't have honors football. So I decided to go into computers at the time. And so I went to USC. I couldn't afford it, I'm still paying it off right now. And when I was there at USC, I was sitting in the computer lab one night, in my sophomore year, and I needed money and a job. It was like midnight, and an email came through from a place called JPL, not JBL, the headphone place, JPL, the Jet Propulsion Laboratory. And a real nice gentleman, Dr. Rob Raskin, was looking for us computer people to help Earth scientists understand earthquake data and other things. And so I went for an interview. I'd never been for one before; I had interned at a company called iWon.com, that was my only other experience, basically building video games in Java, Java applets, for the 35 to 55 year old demographic of people. It was like online poker games during that era. So I mean, that was nice in that way, I was on the west side, you know, LA, I got to be near the Miracle Mile, I learned where UCLA was; as a USC person, you really need to learn that so you can really feel the rivalry. But it was a beautiful area over there by mid-Wilshire. Anyways, JPL was a nice change. It was closer to USC and where I lived at the time; JPL is maybe a 15-20 minute drive into northeast LA from downtown and USC. And so yeah, I got a gig at JPL, and I was like a computer programmer. I was doing Perl, PHP, you know, other stuff, building websites and databases, and MySQL for scientists. And so I did that for maybe, I want to say, as an academic part-timer and eventually as an employee, for three or four years, and then I got sucked into the real hardcore Java community. I was working with folks on technology projects for databases. I even worked on a project where JPL was doing work with the National Cancer Institute; we were basically putting together data for cancer detection, because a lot of the stuff we did for remote sensing could be applied to that. And so yeah, I mean, Java was big at the time, and my trick was trying to figure out how, instead of using C and C++ to build science missions, and I eventually started working on those, I worked on an Earth science mission called the Orbiting Carbon Observatory, and I had to figure out how to use Java. I wanted to use Java, I refused to use C++. Not that I didn't know it,
05:06 right. But
05:07 I was like, God, you know, go ahead, guys.
05:09 You see, I've gone through these same stages, not with Java, but with .NET. Just like, you know what, I know I could do this in C++, but I'm really over all the pain and the hoops and the page faults and all the stuff that I just don't care about anymore. You know,
05:23 Oh, my God. Same thing. I had done Nachos for operating systems, that was the Berkeley, sorry, Stanford, little tutorial project on how to do it. I learned multiprogramming and memory management, but who the hell wants to do that regularly? And so I was like, Java does this for me, it's being shoved down my throat, let me accept and assimilate. So yeah, my big thing that I cut my teeth on and became kind of known for at JPL was that I forced us to basically use Java to implement a ground data system for OCO, the Orbiting Carbon Observatory, in 2005. And my trick there was, okay, all the prior Earth science missions took in, say, over 10 years, 10 gigabytes of data. And they ran, in terms of their daily processing, on the order of maybe tens of jobs per day to produce that 10 gigabyte record over, you know, 10 years. And so OCO was basically, okay, we're gonna take you into the realm of a 10,000 jobs per day daily workload, and then it was going to generate 150 terabytes of data in the first three months. And so I actually looked at the C++ system that we had built before to do this, and I was like, you know, it was tied to a database. It was tied to, I don't want to say Oracle, but I think it was Postgres; it couldn't run without a database running, all the configuration was in a database, it was single processor, everything else. And I was like, this needs to be completely rewritten. And at the time, I had hung around at USC to get a master's degree, and I had a really inspirational professor there during my master's, Dr. Nenad Medvidović, who ended up becoming my PhD advisor, who got me into research, and I hung around for the PhD as well. And so in my PhD that I was doing at USC, I took a search engine class, and I got really into this thing called Nutch, N-U-T-C-H, right? And it was a creation of a guy named Doug Cutting, who was the guy who created Lucene, and eventually things like Hadoop and that whole ecosystem. And my professor at the time wanted us to do our final projects in Nutch. And so mine was a really simple syndication or RSS parser. And by the way, for your listeners, I'm going to get to the Python part, I just have to tell you this so you can make fun of me, because I started out in big, big hardcore Java. So I got into Nutch, I built the RSS parser for news feeds, that was my final project in my search engines class during my PhD. And I contributed it to Nutch, and I got involved in Apache, the Apache Software Foundation, where Nutch had just moved to. And I started talking to Doug and all the other developers, and I became friends with them, and I became a Nutch committer. The funny part was, my academic cousin was at UC Irvine, because it's all in the academia, there's like, who's your advisor, that's like your dad, who's your advisor's advisor, that's your grandpa, you know, you've got cousins and uncles on the academic side. One of my cousins was Justin Erenkrantz, who was the president of Apache, and my academic uncle was Roy Fielding, who was the founder of Apache, and so I had this sort of Apache connection without even
08:16 As well as REST, the whole REST idea. Yeah, the
08:18 REST architecture, yeah, Roy's famous for that. So UC Irvine during the mid-90s, which is like the decade before I was doing it, was like the place to do software, you know, big software, GUI and multi-architecture development and architecture and stuff like that. Dick Taylor ran the group, and anyways, he had a bunch of all-stars: he had the guy who invented ArgoUML, the guy who invented WebDAV, the guy who invented REST, the guy who did the component and connector architecture styles. So those are my ancestors academically. So yeah, so I'm doing search engines and whatever, and I've got this connection anyways to Justin and Roy, who are telling me, get involved in open source. And I'm like, cool, okay, I could do that. And so I started contributing to Nutch, and that eventually became Hadoop. Yeah. And then Hadoop became Spark. And so I'm in the ecosystem, I'm playing, you know, I'm contributing, I got into search engines, that really became a passion, because I was building these big data systems at JPL for mission science. I was like, we need to use all this Java stuff and all this ecosystem, and we need to scale and do all that. So the funny part was, here's the Python, long-winded answer, you asked me, how did I get into Python? In about 2009, after I had led teams and done this myself, built three mission ground systems on Java, and we proved all the naysayers wrong that said, you can't do it, you know, it's gotta be C++, we did a ground system in Java. I got tired of Java. And Python was getting shoved down my throat then, and that was like, everybody has, at some level in their career, one or two or three eras of their career. And my second era beyond early programming was really Java. And so then my third era was Python. I mean, I started to get involved, I went away from missions, I got involved in technology development, and I started going into government technology development. In the early 2010-2012 timeframe, during the Obama administration, there was the Big Data Initiative, and they funded a bunch of programs, hundred million dollar investments and things in big data. And I was toe to toe in some of these programs, standing alongside people like Peter Wang, Travis Oliphant, and I started to learn. In fact, I even funded them during a program called DARPA Memex to grow what was Continuum Analytics at the time, a smaller company out of Enthought, into what is now Anaconda. And so I started talking, and then Peter's sitting there telling me, oh Chris, screw Java, you know; I was the Java guy, you know, sitting amongst all the Python people. I feel
10:34 I can just hear Peter saying that as well. Yeah. What are you doing? You're messing around here?
10:40 Yeah. You know, Peter's telling me about, okay, and he's like, oh, you know, Numba and all this. And I was like, you guys have even got your own foundation, NumFOCUS. Andy Terrell and I became friends; I got Andy to come talk at ApacheCon, which was great. They never invited me to PyData or anything, I don't know, you know, whatever, that's on them. But yeah, so I got involved in that. And so I said, I can do this one more time, I can go deep and learn. And so really, probably circa 2013, the big thing for me was, I had created Tika, we'll talk about that later, but that was my big thing in Java besides Hadoop and all this. And so in 2013-2014, we ported that to Python, and I did that with a guy named Brian Wilson at JPL. And that was my, I can go deep and do Python and deliver something of big value to the Python community. And so yeah, around then, that's how I got involved in Python, that's how I'm there. Now today, I'm doing machine learning and other stuff. And anyways, I don't want to dominate, but yes, that's kind of the answer.
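For readers curious what the Python port of Tika looks like in practice, here is a minimal sketch of typical tika-python usage; the file name is an illustrative placeholder, and the library shells out to a local Java Tika server behind the scenes:

```python
# pip install tika  (the Python port of Apache Tika discussed above)
# On first use it downloads and starts a local Tika server (Java) in the background.
from tika import parser

# "report.pdf" is a hypothetical input file; Tika handles PDFs, Office docs,
# and over a thousand other formats.
parsed = parser.from_file("report.pdf")

print(parsed["metadata"])               # extracted metadata (author, content type, ...)
print((parsed["content"] or "")[:500])  # first 500 characters of extracted text
```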
11:33 Well, two things. First, to get a call from JPL out of the blue in your undergrad or master's degree program, saying, hey, why don't you just drop in and do some work on, like, cutting-edge space stuff just down the street, that is incredible, right, to get such an opportunity. And I think that's really neat.
11:51 It's a big time opportunity, Michael. And for me, the thing I like to tell people is, I've learned a lot at JPL in my 20 years there, and a lot of people look at me and go, God, you've been here for 20 years? I say, yeah, I'm just sort of entering mid-career; 20 years at JPL is mid-career.
12:05 For me, that's like, you want to run a mission? You've got that, like, that's a 20-year commitment, a 15-year commitment sometimes.
12:10 Oh, dude, you're exactly right. And so at JPL, I tell people this, you can see the people that are going to be there for five years, and you can see some of the people that are going to be there. And by the way, we like those five-year people too, we'll get whatever we can, we can talk about that later, out of whoever, you know; our mission is space. And it hit me in 2003-2004. The big thing was the Spirit and Opportunity twin rovers. You know, I mean, the first three or four years at JPL, it's awesome, and you're just like, but you're young, and you don't know and appreciate space and everything else. And so I was like, yeah, maybe I'll go work at a startup after this. And it hit me with the MER rovers, the Mars Exploration Rovers, Spirit and Opportunity. They sent them and they landed, and I saw the landing. I stayed up at night, I watched NASA TV, it was just my wife and I at the time in our new house; we had bought our first house, and I bought my first 55-inch TV that, if you compare it to the TVs, the thin ones now, was like as big as my living room. And I'm like, yeah, well
13:03 Has its own cooling unit.
13:05 Its own cooling unit, oh yeah. And so we're sitting there, we're watching it, and then Arnold Schwarzenegger comes out, he's the governor of California, and he's shaking hands with my friends, some of my friends who I worked with on some of this, and I'm like, I know them. This is amazing. And everybody, and then it's like, the JPL, yeah, you know, when we land stuff like that. And so yeah, that's when I was like, oh God, I work there.
13:27 That's awesome. Yeah.
13:28 And I knew I was gonna stay there, you know?
13:30 Yeah. Yeah, really cool.
13:34 This portion of Talk Python To Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the next generation network, Linode delivers the performance that you expect at a price that you don't. Get started on Linode today with a $20 credit, and you get access to native SSD storage, a 40 gigabit network, industry-leading processors, their revamped Cloud Manager at cloud.linode.com, root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode when creating a new Linode account, and you'll automatically get $20 credit for your next project. Oh, and one last thing, they're hiring; go to linode.com/careers to find out more, and let them know that we sent you. Number two, you talked about going deep in Python after being a Java person. And I think it's really interesting to learn a language that has all these nuances and patterns, and it takes a while to master, I think Java does, and then you come to Python. And it's a language where, you know, one of the jokes is, hey, I learned Python, it was a great weekend, or something like that. And yet, I've been doing Python for a long time, 8-10 hours a day, and I'm still learning Python, right? So I think there's this really interesting distinction between, I learned something, and I really learned it, you know what I mean? Like, I really built something meaningful in it, rather than, yeah, I know how to do loops really well now.
15:10 Oh my God. And that's how it starts. I mean, the loops are, okay, we're gonna start with those, we're gonna start with the basic constructs in our mind. But yes, absolutely. Like, I tell people this, you know, only recently do I tell people, you know, you've got the three levels, beginner, medium and expert. And I tell people, I say I only recently became an expert in Python; it took me a while, because to me, to become an expert, you need to go deep. And people still tell me today, Chris, your Python code looks like Java, dammit. It's not PEP 8, you know, certified, or whatever the hell the PEP is, or set to run Black on a pre-commit hook for you. Well, you know, I use a lot of camel case still, you know, in naming stuff, you know, I don't use the underscore, you know, whatever. I say, shut up, it works. And to be honest, here's my trick, though, and maybe you'll agree with me on this, Michael, this is true in any software development language, even though I don't do this anymore: my PhD is in software engineering and software architecture, I studied software development. And so for me, I tell people, you haven't succeeded when you can just publish something to open source and say, oh, I did it, I built it, but I'm still the only guy or gal maintaining it. You've succeeded when not only have you put it out into open source, but you've convinced somebody else that it's worth their time to build on your library. Yeah. And to do something with it. And to me, you've only achieved that level of mastery when you've become a master of the programming language as well, but also when some of that mastery is knowing where to bend and break, how to configure, what sensible defaults are, and how to componentize something in a way that not everyone will love, but, you know, you can get at least a few people to love. And that's how you succeed.
16:50 Yeah, I totally agree. And so much of, I built this thing the right way, the right way often is somebody's perception of their context and their use case. Like, we built it the right way because we're Google and we have a million requests a second, versus we built it the right way because we're a startup and we have 100 users, but we're growing; those two things should not look the same, probably, you know what I mean? Like
17:14 Absolutely, best captured in the multi-dimensionality problem when you talk today about big data, and they talk about the five V's. And so for a long time, people thought big data just meant volume, yeah, or velocity. But the reality is, like what you just said, it's a perfect context: like, hey, variety matters, veracity matters, value, your value stream matters. And so that's the only thing, and at Apache, we used to have these huge debates. Yeah, I got on the board of Apache, I had hung around for a while, they suckered me into doing it, and I did it for five years, and yeah, it was a great experience. But I used to sit there in these discussions with, like, the big data companies, for whom Apache is a great place to independently kind of have a DMZ, build software together without wearing their company hat, and to achieve some general framework consensus that doesn't disrupt the value stream for people downstream who want to do stuff commercially. And I'm very supportive of Apache's mission and what they do, and a lot of open source foundations; I became friends with a lot of the founders of those during that time. And so the thing I used to see, though, is like, you'd get the LinkedIn people who made Kafka initially and donated it to Apache, and they'd have their way, like you said, it's like the Google way. It's like, well, this could only be good if you test it on a million computers, or we're not going to accept this dude's or gal's patch from, you know, this other country, because, you know, they tested it on their laptop, and it doesn't pass our massive, scalable test. But I'm like, yeah, but it adds a feature, just do it in a branch, isn't that what source code control is for? Get the person the value for contributing, don't let it sit in a ticket system forever, because then people go away. And your real goal is to capture everybody's interest and contribution in the moment that they're interested, whether they've got the time in their free time or their company is paying them to do it. And time is the thing that, you learn, is really the last precious resource. And yeah, that's it, time, we're never gonna get that back. You can get a lot of other things back, but you're not gonna get time back. And that's the key to open source.
19:08 Yeah. Jason Fried from the Basecamp, 37signals crew, has a great saying that inspiration is perishable, right? Like, if you're currently super excited to add this feature to that thing, but then maybe you work on it, and then the PR just sits there and gets ignored, like, you're done with that project. Like, you're not done, right, but you could have really done a lot of interesting stuff if they'd captured that two-week period, or whatever, where you were on fire about it.
19:34 Yeah, I've got a good story real quick that I'll share with you and your listeners. It's the Nutch project. So Nutch was dead post-Hadoop; a lot of projects were dead post-Hadoop. And the reason for that was that, basically, when Hadoop came, it was the new hotness and everyone wanted to go work on distributed systems. They're like, oh, this is what Google did, MapReduce, and now it's open source, boom, and they all went to that. And they all left us in Nutch, and I was one of the people holding the bag after that. And we had, in our JIRA and our whatever system, I can't even remember, I think we were still using Review Board at the time, but we were using Review Board and JIRA, and we probably had 100 patches from people that were just sitting there, some had been sitting for two or three years. And basically, because the interest and the community of committers that could actually merge the patches left, they just sat there. And so we had one gentleman, I won't say his name, but he had contributed probably 50 patches, like, still trying to get something in; he was running a web crawler company in the UK. And finally, I was just like, you know what, because we had these standards in Nutch that had been imposed on us by the Dougs and whoever, and Doug's an amazing guy, and I get it, I get why we were doing it. But what I did is, you know, the guy reached out to me, the guy in the UK, and he's like, are you ever gonna merge my patches? Because this is BS, and you really make me hate open source. And so what I did is I said, you know what, guys, I know we have all these rules and blah, blah, blah, but none of you are around, none of you are doing anything anymore, here's what I'm gonna do. And I just merged all of his patches and figured out how to do it and got it in there. That project is still alive today because of that, because, yeah, we got that guy interested, he pulled in a couple of other people that he was working with, and he's like, oh God, the floodgates are open, we can develop again. And then I just let them take it; I haven't done anything in Nutch in years, but the project's alive today because of that. And so you've got to capture it; it absolutely is a scarce resource, like you said. Motivation, what was that RSA Animate video about, like, purpose and motivation? That's such a great way to capture it, the same thing. So
21:32 Absolutely. So these days, you're over at JPL, and you have some really interesting things going on there. You're the first principal scientist in the area of data science. What's the story there?
21:45 So JPL has this thing called the principal designation, which is basically for somebody that's normally been there, like, 50 years. And so, you know, I'm just joking, no one kill me for that, please. But it's somebody who's been there for a long time. And usually, our principals are, we've got the founder of hyperspectral, there's a guy, Rob Green, you could argue he's the founder of the field of hyperspectral science; we've got people who explore, you know, we had a guy who used to be the project scientist for the Square Kilometre Array, a huge billion-dollar international project of ground-based sensing, looking at the cosmos and answering the tough questions,
22:16 That one has so much interesting stuff in terms of how much data it has. I had some of the folks from Australia on there to talk about that. It's kind of mind-blowing, like, you can't put it on hard drives, there's so much data, those types of problems. So yeah,
22:28 Oh, my God, 700 terabits per second. In the 2010 to 2015 timeframe, or 2016 even, that's all anyone ever wanted to hear from me, because I had some peripheral involvement in that; they're like, talk about that, you know, yeah. But we've got the guy that was the project scientist for that at JPL. So I mean, those are our principals, usually. And so yeah, in 2014, they gave me that title, because they realized that data science was becoming something where, basically, we were developing a maturity, a skill set, and a capability, and JPL actually needed to go triple down and quadruple down on that. And so what it means, yeah, is that all the experiences I have, talking to people like your great podcast, Michael, and others, I'd sit there and talk; I needed to talk and evangelize that at JPL. And so that was the recognition. I was an individual contributor still then; I basically would just tell people, here's all the stuff with data science, it's science, it's math. It was around that time I wrote a paper in Nature called A Vision for Data Science, and, you know, people like me don't normally get papers in Nature. Yeah, that was a big deal. And basically, I was thinking a lot about data science, and, you'll like this, I had this sort of dichotomy in education at the time. I'm a PhD computer science, software engineering person who, after about a decade at JPL, learned hyperspectral remote sensing, why western US water matters, cared about the cosmos and the SKA, and I even thought at times about getting more master's degrees, but kids, mortgages and other things, you know, other interests got in the way, which is important. But I was sitting there thinking, how many of me are there at JPL? It took me being there for 10 years. And, you know, any software engineer that we hire, it's five years at JPL before they learn the lingo, and it's really hard unless they just live in it. What we were seeing at the time was an emergence of PhD atmospheric scientists or PhD computational biologists or whatever, who learned Python, believe it or not, could write code, understood what logistic regression was, and whatever. And we had this emerging class of them as data scientists; they wanted to share their code, they wanted to work with software engineers. Yeah. As opposed to, and you know, it's gonna sound ageist in a way, but it really isn't, as opposed to sort of the generation before, who didn't want to share their data, who wanted nine-month publication moratoriums, who wanted to file a patent before making code open source, and things like that. And there's still the evolution of those folks into the new generation today. But I was looking at that sort of, call it, supply chain in the education community for data science, and I was asking myself, what's better? Is it the Python person highly skilled in a deep discipline domain, whose software engineering code isn't that great, but if we pair them with a master's or PhD-level software engineer, they could clean that up, and then over time they'll learn it? Because, like you said, everyone starts out with Python, but it doesn't mean anyone's going to contribute to their code, you know. And so I was actually seeing more, on the PhD atmospheric science side, people in Python being more useful in data science. And so that's really one of the questions I was asking in that Nature paper.
And it's one of the things that I still don't have an answer to today, but we've seen different, I'd say, momentums, and it's not just at JPL, in how we source the talent. And the same is true today with AI, AI engineers and things like that.
25:42 Yeah. Well, I think one of the really interesting questions is, do we need more software developers, or do we need more experts with software development capabilities, right? You hear the politicians and policymakers go on and on, like, we need to teach coding because we have all these coding skill gaps and whatnot. And I think often at that level, it kind of gets portrayed as, what we need is more computer science graduates, right? But my theory is, what we need is amazing biologists, physicists, doctors, lawyers, who can take whatever they do and really amplify it with a little bit of code. And I think that's why Python is so powerful: Python is one of these special languages where you can be effective with a very partial understanding of even what it is or what it does. You don't even have to know how to create a function, and you can be useful in Python.
26:31 100%, 100% agree. That was the conclusion of my Nature paper at the time, and I'm trying to be diplomatic, but let me say something controversial, maybe generate clicks. Yeah, I completely agree with you, and it's heresy in my community, where I originally come from, but I don't read Transactions on Software Engineering anymore. I'm sorry, I don't only stay in the software engineering community and computer science. And so for me, I've noticed the same in my direct experience, in both building big software projects for big national and international things and sourcing over hundreds of people at JPL, and in consulting roles and other things: I've come to the same conclusion. I completely agree with you.
27:07 Yeah, yeah. And that's not to discount computer science degrees. I think there's a real important role for good software developers with all the practices in place there. But I don't think we need 10 times as many of those; I think the value would be better if we brought everyone sort of into that camp, rather than growing it as an isolated camp. So yeah,
27:27 Totally. And you can relate it back to the Twitter hashtag campaign, the learn-to-code one, and why it generated so much, I think, heat on both sides of the political aisle, you know. One of the challenges with that, you know, let's talk about this, there's gonna be a big AI skill gap, and people ask me about this all the time, related to AI ethics and other things. So anyways, for your listeners, after all the software development in Java and big data and whatever, now I'm just the guy that keeps reinventing myself; I got into AI, you know, and ML. The last five years, I've been doing that pretty much a lot. And so yeah, I'm gonna make a bold statement that's been said before, so I've got air cover: in the next years, 2 million truckers are going to be displaced, because really smart cars are here, and smart trucks are here, and they will drive, and the pandemic especially is accelerating some of these things. And so as that happens, the learn-to-code thing is like, oh, we've got to take all the truckers and make them software engineers. I think I've developed a happy medium between that and what you and I just said. We're not going to make them software engineers; let's glean SME knowledge from them, they understand the business, they understand the value stream, pair them with domain-discipline, you know, non-computer-scientist people, say people that want to model the weather aspects of that, the computer vision aspects of that, the supply chain aspects of that, that have that plus some Python code. And I think those truckers still have jobs. Yeah. If you tell the truckers that the chasm is becoming a Java or hardcore Python developer, the chasm's too big. But yeah, if you tell them, your job is to sit here, click on this tool, we'll capture your domain knowledge and labels, and you're going to interact with really high-powered people, we could do that,
29:02 You know? Sure. Well, I think also the difference is, do you tell them, your past 15 years of experience has no value, go back to zero and learn programming and then go work for a marketing company, versus, we're gonna take what you're already really good at, what you actually have a ton of experience in and are uniquely qualified for, and we're gonna teach you a little code; the change is, you're gonna build the robots instead of, you know, be replaced by a robot.
29:27 That's exactly right. So that's the win on all sides, and that's the way it's gonna move forward. And I see some progress on that, besides the initial, you know, everyone thinks Twitter is real life, or, you know, everything you read in the news is real life, until you live real life and you talk with people. And so anyways, I've seen the people that are making progress in that area doing what you just said.
29:46 I think that's really great. And just for people who are maybe not aware, like, you pulled out this trucker thing; that's maybe an odd example, but truck driver is the biggest single job category for men in the United States, probably many places in the world, but I only know the data for the United States, right? But that's significant.
30:05 Absolutely. And it's a big industry. Actually, truck drivers, you know, some people look at this like they look at, you know, law enforcement, and, I mean, especially in the latest news and all this, this is a big deal, and everyone is like, oh God, I didn't know law enforcement made XYZ money; a lot of folks think of it as sort of unskilled labor, things like that, but they actually are well compensated for what they do. And truck drivers, besides being that statistic, like you said, I learned something new every day, one thing I do know related to that is that they are also well remunerated for their services and things like that. And so the challenge, the other economic challenge, is to say, hey, truck driver, like you said, your 15 years of skills are gone, and oh, by the way, the maturity that you developed in those skills to achieve that salary, you know, and things like that, is also gone. That's a bad lose on all sides. Yeah,
30:51 It absolutely is. But I do think there's a more positive path forward, so hopefully we can go that path. This portion of Talk Python To Me is sponsored by monday.com. monday.com is an online platform that powers over 100,000 teams' daily work. It's an easy to use, flexible and visual teamwork platform, beautifully designed to manage any team, organization or online process. Now, for most of us, we missed our chance to build the first apps ever in the mobile app stores; it was a once in a lifetime opportunity, but it's one that's coming around again. monday.com is launching their marketplace and running a contest for the best new apps featured right from the get-go. Want to be one of the first in the monday.com Apps Marketplace? Start building today. They're even giving away $184,000 in prizes, including three Teslas, ten MacBooks, and more. Build your idea for an app and get in front of hundreds of thousands of users on day one. Start building today by visiting monday.com/python, or just click the link in your podcast player's show notes. So let's talk about Python at JPL. I think there are some interesting angles, especially around some of the remote stuff. A lot of the things you guys do work with, like rovers, you talked about Spirit; just to have a conversation with those things is, like, we complain about latency, you know, like, that website was slow, or I was playing this game and it was hard because there was 200 milliseconds of latency. There are different kinds of latency out in space, right, when the speed of light is not enough. So taking some of the smarts and putting it on, like, rovers and other stuff, some of this AI work that you're doing, it sounds like it might have some legs.
32:32 Hey, I hope so, and we think it does, too. So Michael, basically the work that we're doing, for your listeners, we have a project that we've been investigating now. So let's fast forward the clock. So rovers nowadays, the last one that landed on the planet, I won't say the last one that we shipped, because we just shipped one, which we'll talk about, right
32:46 just a couple of weeks ago or something, right?
32:48 We did pandemic shipping and launching of rockets and rovers, the new fad. But yes, pre-pandemic, in 2012, we shipped the Mars Science Laboratory, or the Curiosity rover. And that one, so Spirit and Opportunity, just to size them for your listeners, they're about the size, like if you have kids, of one of those cars that you push, maybe, or something like that, or maybe like a Power Wheels, big-wheel type of thing. That's the size of Spirit and Opportunity. The MSL rover is about the size of a small car, like a Volkswagen, you know, Bug. If you came to JPL, and it was open, and we'd love to have you someday, and things like that, you could walk into our Building 180 and see a full-scale model of it to really get a feel for it. But that's the size of rover that we're talking about now, that's sort of the modern class of them. And so 2020 is Perseverance, the one we just launched; it's the same size. So we've got MSL still operating; Spirit and Opportunity, you know, aren't anymore, because they were solar powered. MSL is powered basically by nuclear, it uses an RTG power source and things like that, so it doesn't have to worry about solar panels. So it can go for quite a while, and it has been. So it's a great test,
33:56 basically, as long as it mechanically is still functioning,
33:59 Right? Absolutely. And so challenges with mechanical functioning are like, hey, we learned a lot about the wheels for, like, a car-size thing as we drove over rocks and it tore the wheels up, you know, and so we learned a lot about them. One quick update in 2020 is that the wheels have little Homer Simpson speed holes, or not speed holes, but holes to prevent having just track and tread that dies by catching on everything. And that's just one thing we learned amongst other things; we've got smart engineers at JPL. But MSL is a great platform to test stuff out on. However, let's talk about AI and ML. I'm gonna dispel some myths and rumors. So MSL and space assets and others, they all need, right, we've got to do computing, we need a processor and a board and things like that. They're running off of an old, what, is that the latest GPU, probably, like an NVIDIA 2080, something like that? Yeah, everybody thinks that, and I know you're being facetious, and that's why I like the snark, it's awesome. But no, and that's the challenge, everybody thinks that, and it's not. It's running off of a RAD750, which is a BGA chip that's about as powerful as a PowerPC chip, an iPhone 1 processor. And why, real quick, why, right? When we crash something in the government, we've got a congressional inquiry that we have to respond to. When the commercial companies do it, and we love the commercial companies, we're partnering with them now, they don't, right? I mean, not to say that it doesn't ruin their value stream or their reputation or things like that, but they've got a little bit more flexibility to do testing and stuff like that than we do. Yeah. And so we are risk averse by profile and definition. And so because of that, we will only use things that are what we call radiation hardened, which means that when it gets up there into space, space and cosmic radiation do weird things to your hardware; they flip the bits, and that's like the easy stuff they do, they do a lot of other nasty stuff. And so you've got to make sure that the hardware works in space. And so because of that, the technology, the Gartner life cycle for what we can use, you know, for that is really behind. And so this big, potentially smart, you know, and it is smart, they did great things on MSL, and they're gonna do even greater things on 2020, is running off of an old processor. So all the AI and ML is human in the loop, even more so coupled with the fact that you alluded to: yeah, hey, bandwidth, latency, you think that's an issue; the light time from Earth to Mars is eight minutes roundtrip. So anything you send to Mars, you've got to wait eight minutes to figure out what the heck happened, or even what happened, for your report back. Now, it doesn't all have to be synchronous, there are asynchronous ways, there are ways to kind of achieve some advantages and queue things up, but it's still eight minutes, basically. And so because of that, there's a great video on YouTube, by the way, for your listeners, if you haven't seen it, it's called the Seven Minutes of Terror. It's really kind of closer to eight. Yeah,
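As a rough sanity check on the light-time point: the one-way signal delay is just distance divided by the speed of light, and because the Earth-Mars distance varies a lot over the orbital cycle, the delay varies too, from a few minutes to over twenty minutes one way. The distances below are approximate, illustrative values, not figures from the conversation:

```python
# Back-of-the-envelope one-way light time between Earth and Mars.
SPEED_OF_LIGHT_KM_S = 299_792  # km/s

def one_way_light_time_min(distance_km: float) -> float:
    """Signal travel time in minutes for a given Earth-Mars distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60

# Approximate distances; the real value depends on where both planets are in their orbits.
for label, d_km in [("closest approach", 54.6e6),
                    ("average distance", 225e6),
                    ("farthest apart", 401e6)]:
    print(f"{label}: ~{one_way_light_time_min(d_km):.1f} minutes one way")
```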
36:42 yeah, that's a great one. Yeah,
36:44 Yeah, that's for the entry, descent and landing. When they landed MSL, Curiosity, they had to use a big sky crane instead of the typical big balloon, wrapping the rover in a balloon and letting it bounce, which is the way they did it before; it was so big, they had to have this elaborate sky crane thing. And in that seven minutes, when you go into entry, descent and landing, there are seven minutes before you know, hey, what the heck happened. And all this stuff had to happen, you know, autonomous landing, things like that, which is great. But yeah, normally, eight minutes. And so if I told you today that the Mars surface operations people use about 200 images a day that are taken from the rover, from its Navcams, which are cams by the wheels, and its Mastcam, which is the big head that takes the selfies, you know, and other things that you see, with its arm; if I told you that today they only use 200 images to plan what to do for rover operations the next day, you'd understand why we're bandwidth limited, why we're limited on what we can process on the rover versus sucking them down to the ground and making decisions. What if I told you tomorrow we'll get close to that NVIDIA chip, maybe not exactly, but there's an effort called High Performance Spaceflight Computing to build a multi-core, GPU-like chip that is radiation hardened. It's a big government project that already has an emulator that they're making. And we also today have the Mars Helicopter on Perseverance, which is a little drone that went along with it, and it is running a Qualcomm Snapdragon, which is a GPU-like chip. And why, when it's not fully radiation hardened and all this? We've tested it and whatever, but it doesn't have the years and years of testing. Why are we doing that? Because it's a technology demonstration, and we have a bigger, like, the mission is still successful even if the Mars heli, which we call Ingenuity, is not successful.
38:23 Right? And I suspect that for a little drone helicopter thing, the highest risk is not the hardware getting messed up from radiation, it's, like, it got caught in the wind and it just crashed, and, well, that was nice, but it's upside down, so there it goes. Or, there are just so many, it's probably just considered mostly expendable: let's try this, it's small, and it'll be cool.
38:42 Exactly, dude. And all the dads in the world just knew what we were talking about, from the drones they were given a couple of Christmases ago. But yes, like, your drone is gonna flip upside down, it's gonna do, you know, all the nasty stuff that you don't think about, because driving a drone is hard. But apparently, you know, JPL is pretty good at all this stuff, so we have a really good feeling this will be successful. But yes, so those will have GPUs, ish, and we can do some stuff. So, a couple of things that your listeners and you might care about that we're doing. We have a task to do killer apps, given the new GPU environment for future rovers. One of them we call drive-by science: we take Google's Show and Tell TensorFlow image captioning model, which, I don't mean to just jump into tech speak, but it basically uses a labeler for object labeling, which is a head that, you know, could be ResNet-50, it could be VGG-16, which is, given an image, what objects are in it, and then it uses,
39:31 That's a rock, that's sand, that's a ledge, that kind of stuff, right? And don't go over the edge, by the way.
39:38 Bingo. And that's the second part, which is the language model, where we use an RNN or an LSTM, a recurrent neural network or a long short-term memory network, to basically learn the language surrounding those labels. Ah, this rock is close, bedrock is far away, there are three objects on this planar surface, and things like that. Those are all language properties that an LSTM and an RNN, when you hook it up to basically an object detection network, a CNN, a convolutional neural network, can learn, so that, given an image, I can emit a human-language sentence about it, a caption about it. Now, what's the value of that? The value of that is that text is cheap; it's a lot smaller than images. And so instead of 200 images today on that very, very, very thin pipe, I can send you a million captions, if I can run the image captioning model on the rover, if I can run it on a GPU, or if I can run it on a heli, a drone asset, things like that. So yeah, that's drive-by science: that's, make the rover not miss science when it's driving, and we win back, you know, those eight minutes. Yeah. And so that's very helpful for people tomorrow. The other is also another image recognition problem. It's called energy-aware optimal auto-navigation, you know, say that five times fast. But the idea is that with the rover's motor, hey, even though it's RTG-powered and things like that, it still has to sleep for a while and recharge, you know, and stuff, and so we want to be efficient with its time. And so when we do that, we want to know if, in the distance, we're going to drive over sand or we're going to drive over rock, because if it's sand, just like your car, it's going to have to expend more energy from the motor than if it's rocks, where the wheels are going to catch better. And so we do image recognition on that using a similar network, and then we develop a motor power profile from that, based on what we're seeing. Yeah.
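To make the "CNN head plus LSTM language model" idea concrete, here is a heavily simplified Keras sketch of that kind of encoder-decoder captioner. It is not JPL's drive-by science model; the vocabulary size, caption length, and the merge-style wiring are assumptions for illustration (the actual Show and Tell model feeds image features into the LSTM a bit differently), and weights=None keeps it runnable offline with untrained weights:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 5000        # hypothetical caption vocabulary size
MAX_CAPTION_LEN = 20     # hypothetical maximum caption length

# Encoder: a CNN "head" (ResNet-50 here) turns an image into a feature vector.
# weights=None keeps this runnable offline; a real model would use pretrained weights.
cnn = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                     weights=None, input_shape=(224, 224, 3))
image_in = layers.Input(shape=(224, 224, 3))
img_feats = layers.Dense(256, activation="relu")(cnn(image_in))

# Decoder: an LSTM language model over the caption-so-far, merged with the
# image features, predicts the next word of the caption.
caption_in = layers.Input(shape=(MAX_CAPTION_LEN,))
seq = layers.LSTM(256)(layers.Embedding(VOCAB_SIZE, 256)(caption_in))
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(layers.add([img_feats, seq]))

model = tf.keras.Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```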
41:26 Because if you say, I want to go inspect that canyon, go there, it would be great if you could get there twice as fast.
41:33 That's exactly right. And so today, we've got car-size rovers because that's what we've got; not 50 million instruments, not 50, but we have 20 instruments or so. It's a big laboratory that we want to do experiments on, you know, and stuff. In the future, with Mars Sample Return and other profiles and missions, which we'll talk about, MSR has something called the fetch rover. What 2020, the current Perseverance rover, is going to do is, as it's drilling cores and analyzing stuff, it's going to drop tubes on Mars, and a future mission, MSR, Mars Sample Return, has the fetch rover, which is a much smaller rover that's going to have to drive way, way farther, if successful. And now this is a collaboration; it'll
42:11 go around and pick up these tubes, and then return them to us,
42:14 Bingo. And it's gonna take the tubes, it's gonna take them to a spot, drop the tubes off, launch the tubes up into space to an asset, and then that asset goes back, you know, on the journey back to Earth, and we get samples from Mars. Amazing mission. Yeah, we have a lot of the technology we didn't have 10 years ago when they envisioned this; it will be successful, and I'm going to go out on a limb there. But yes, those rovers, like the fetch rover, they need energy-aware optimal auto-navigation. It's going to be a much smaller rover, possibly RC-car size, maybe not that small, but maybe like a bigger RC-car type of thing, or maybe two of them. At that size of rover, it can drive much farther. And so energy-aware optimal auto-navigation is real big, and so is drive-by science, even though it's not going to have all the instruments on it; visual, optical things like that, all the deep learning models that are built terrestrially today, we can leverage those. So
43:05 That's really cool. And the nuclear power versus solar power seems like something that's almost required if you're going to burn that much energy running high-end GPU-ish things, as you put it, right? You don't want to burn up all your solar energy for the day trying to figure out where you should drive.
43:22 Yep, absolutely. And so the trade on that is going to be how they pack that energy profile in there. But there's a lot of spark, again, with all the advancements that are happening in the smart car and other industries and things like this, for how to do power in an efficient way with different cell, you know, cell technologies and stuff like that. Now that tech is here today, which is good, because we're in Phase B of that. There's A, B, C, D, you know, E is when you're operating the mission, and pre-Phase A is planning; we're already in B, and we're partnering with the Europeans, they're now building the rover on MSR, we're working together. And so yeah, we're gonna need all that.
44:01 Yeah. It sounds like a really cool project. Do you see Python running on Mars potentially, or would this be something lower level?
44:08 I think Python could run on Mars. They told me 10 years ago Java could never run on Earth to power a ground data system, and we figured it out; it did. Yeah, it did. Python will run on Mars, I guarantee it. So Python today, if we take the emulator stuff that we're doing, and also the stuff we're doing on the Qualcomm Snapdragon, Python is running in those environments, because we're using TensorFlow Lite. The hardest part, the magic, is like, oh, we've got all this deep learning, you know, bring it onto this embedded processor, and then everything breaks, right? Your model, you've got to quantize the weights, you know, you don't have the same floating point units, wow, what a pain. And, you know, Google knows this, and they're making a lot of investments to make this easier with the ecosystem. So Facebook and PyTorch, all the places, NVIDIA, they're figuring out they need to make that ecosystem process a lot easier for deployment, because everyone just assumed capacious everything and infinite everything. That's fine, that's where the value stream for these products came from,
45:03 Especially when they come from the cloud, where it just scales up, and yeah, we'll just throw machines at it.
45:08 That's right. And so yeah, that's the biggest challenge right now. But yes, today Python is running on the HPSC emulator with TensorFlow Lite; we've got it working, we're running these models. Basically, instead of using a ResNet head and stuff like that, we're using, like, mini versions, mini models of that, like tiny ResNets and other things that have quantized, you know, weights to trade accuracy. But the real thing that this is driving, and this is a big research area I'm in right now, is: what's the least amount of labeled data, how quantized can the weights and things like that be and still achieve results, and what are the limits of learning? What accuracy do you need, that you can trade, that you can give some of it up, and still achieve the results you're looking for? You'd be surprised at some of the results; like, you don't need these insane accuracies, like better-than-human results, you just need really good results and things like that.
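A minimal sketch of the kind of workflow being described: take a small Keras model and run it through the TensorFlow Lite converter with post-training quantization. The MobileNetV2 backbone, input size, class count, and output file name here are placeholders standing in for the trimmed-down "tiny ResNet"-style models mentioned above, not the actual flight models:

```python
import tensorflow as tf

# Placeholder model: a small Keras classifier standing in for a trimmed-down
# backbone (the sizes and class count are made up for illustration).
model = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                          weights=None, classes=10)

# Convert to TensorFlow Lite with default post-training quantization,
# trading a little accuracy for a much smaller, embedded-friendly model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("rover_model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Quantized model size: {len(tflite_model) / 1024:.0f} KB")
```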
45:54 Yeah, especially if what you're trying to accomplish doesn't have to be completely accurate. If it can just do better, it's a win.
46:01 That's right.
46:04 Talk Python To Me is partially supported by our training courses. How does your team keep their Python skills sharp? How do you make sure new hires get started fast and learn the Pythonic way? If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up. At Talk Python Training, we have enterprise tiers for all of our courses. Get just the one course you need for your team, with full reporting and monitoring, or ditch that unused subscription for our course bundles, which include all the courses and you pay about the same price as a subscription, once. For details, visit training.talkpython.fm/business, or just email sales@talkpython.fm. Let me ask you just really quickly about what else you see Python doing at JPL that you think is interesting, that you want to share, and then I'd like to talk about just a couple of other things that you were involved in that I think were interesting.
47:00 So I think Python has real deep depth in data science at JPL. Jupyter, you know, Jupyter notebooks, is now the lingua franca of the way that we share data science with the people I've got in my division. So I lead the department now, so they've even taken away all my keys to tech and I'm not supposed to do anything; that's why I do these interviews with you, and still do the open source, so I can do some tech, you know, and be involved. But my people that are doing the tech now, again, like, when they show me something, when they show our stakeholders something, business people, things, they jump into Jupyter. And that used to be
47:31 It's not like a PowerPoint picture of what they drew, and then they talk about the code; like, here is the live thing, right?
47:38 They do. It's the live thing and they show it, you know. We have demonstrations to partners and various industries and things, we've got the space demonstrations, and they're jumping straight into that; they're showing these amazing visualizations, tabular statistics, and they've got the data story. And so one thing I tell all the people is, you've got to become visual storytellers nowadays, you've got to be able to communicate, and Jupyter is perfect for that: pandas, the whole ecosystem, Matplotlib even, but also Plotly and how you can embed, okay, some of these things. Anyways, all that ecosystem for communicating data science is big. And then there's such a connection between Python and the cloud community and whatever. The funny joke back in the Peter Wang and Travis and Andy Terrell and I days was, you still need us Java people, because all your Python does is write thin clients to our Java servers. Yeah, that's not totally true today; it hasn't been sort of fully supplanted, I would say there are still some cases where that's still true, but Python has come a long way in its ability to support big processing natively in Python, especially if you're doing deep learning, but even for more practical workloads and non-GPU workloads and stuff, Python has come a long way. And so there's a very, very deep, I would say, infection of Python into JPL, not just in the specialized domains and disciplines, engineering and science, but we see it in IT. That's where I live now; I used to be the deputy CTO, but now I'm division manager for AI, and I report to the CIO and manage these people. But Python has now infected into the business, it's infected into even some of our programmatic directorates. It's like you said, you know, you've got the managers, the solar system exploration managers, that will jump into a Jupyter notebook to show you something, right, some of them. And so that's really told me that it's become really material, you know, and again, we're not having the Python-is-good battles anymore. And that's why I also believe Python will make it onto the rovers and into space, because at that level, if it's sort of infected into that domain, all areas of your business, it really is there. And so, I'm sorry, Java community, I love you, I still do stuff with Tika every now and then, but it's been a while; I do Python now. So
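The kind of notebook cell being described is usually just a few lines; here is a tiny, made-up example (the numbers are placeholders, not mission data) of the pandas-plus-plotting style that gets shown to stakeholders:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily image-downlink counts, purely for illustration.
df = pd.DataFrame({
    "sol": range(1, 8),
    "images_downlinked": [180, 210, 195, 240, 160, 205, 230],
})

print(df.describe())  # quick tabular statistics for the discussion
df.plot(x="sol", y="images_downlinked", marker="o",
        title="Images downlinked per sol (illustrative)")
plt.show()
```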
49:40 You know, I had this really weird experience long ago when I learned Python. I'd come from C#, which is C++, then C#, then Python, with JavaScript sprinkled in, and I think C# is a pretty good equivalent to Java in terms of how the language looks and works; there's obviously a battle between those two things, right? But compared to Python, those are sort of coming from the same space. When I went to Python, it was really weird to me that whitespace mattered, that there wasn't a curly-brace type thing. I didn't miss the semicolons, that was fine, but an if statement didn't have parentheses around the condition, and I was like, gosh, that is weird. Almost every language I'd worked with, from Scheme to C++ to C, they were all like that; to me, that's what a language meant. My paradigm was, all these structural symbols must be part of a programming language, because every one I'd worked with that was a "real" language was like that. And so I came to Python, and it took a week or so to get comfortable, and then I was like, I'm kind of okay with this now; that's fine. The editors are super smart, so they kind of build the structure for me: I hit colon and Enter, it auto-indents, this is great. And then I went back to work on some project I was still working on, and I'm like, why are these symbols here? I used to think they had to be here, and now I'm typing all this junk, and it's literally not necessary. Why am I doing this? You could have the statements without the parentheses just as well. There are just so many things. And, like you talked about with not doing much Java anymore, there's just this comfortableness of working in this space that was unexpected to me, I guess.
51:18 A hundred percent, same thing. And the real evangelist for me, I want to give credit to him, was a guy named Sean Kelly, who was a member of the Plone Foundation board. Sean has actually been a contract engineer for us at JPL for a long time. For your listeners, JPL had Larry Wall, the inventor of Perl; he was actually in my old section when I was there, but I was young and I didn't know him. Anyway, we had Sean Kelly, who was one of the founders of Plone, which I'd say is still developed, but other CMSes have come along; Plone was the thing back then, and it became a foundation for a while. Sean was heavily involved in Python and the PSF and stuff like that. And his thing for me was, "Chris, you've got to get on this." He had been telling me for years, you've got to get on the Python drug. And he said the same thing: you won't miss all the stuff you just said, Michael, like the semicolons. So then in the beginning I'm still co-developing a lot in Java and Python, and I'm going back and forth, and I'm just like, oh my God, Java is so bloated, I can't stand this anymore; this is just a waste, these stupid curly braces. And then also, one thing Python does is it almost makes you think more executive-like, in a way. I think about it as bullets: the reason the indentation is there is that it's like bullets; it organizes your thoughts in a more natural way, whereas the structure in Java and other more verbose languages is imposed by you having to sort of enumerate it. But this is life. I mean, over the years, what we have in an IDE now didn't exist before. And actually, the way I learned to program, and maybe you and I still think these are valuable skills, is I still go into vi and instrument the code. I don't even use debuggers, because they really sucked back in the day when I learned; gdb was there and other things, but the debuggers weren't that great. And so, guess what, people: instrumenting in any language, printing out the values of your variables, works independent of any debugger. People look at me nowadays and they're just like, oh God, you still do that? And I'm like, yeah, and it works, and I still manage to be hyper-efficient with some of these basic constructs. But again, the tooling, the IDEs and all of that, eventually supplanted it. I think Python is a natural evolution of language. Everyone tells me Julia is the next thing; my programming language wonks at JPL, who are some of the most amazing people in the world, are all "Julia, Julia, Julia," and I'm like, it's great. It came out of the DARPA XDATA program; I was a part of that, and I know all the people at MIT who made it. It's fantastic, it's growing. But it is not Python. Python has now become not just scientifically cool; it's enterprise. I really think it's the successor to Java in many ways. And yeah, Julia may be the next thing, but by then I don't plan to be programming. Sorry.
53:50 You know, I may be sitting on a beach somewhere by then. Being on a beach, that's the goal. Perfect. All right, two other things I want to talk about; we're getting short on time here. But just really quickly: a couple of years ago, I read this book called The Panama Papers, and it was rocking the world. There were so many people who had been doing shady things through offshore companies and whatnot, and there was, I guess, someone on the inside who dumped gigs and gigs of data that exposed a bunch of folks. I don't remember the details well enough to go into them, but it was a big deal, and some of your projects were involved in the discovery of that, right? Tell us about that.
54:30 Yeah, so the key was Tika, and that was a really interesting time. We were right in the middle of the DARPA Memex program, which was to build the next generation of search. At that point I was almost exclusively in technology development; it was before I had moved into IT. I was finishing out my career in engineering and science, leading a team of real rock stars, the best in the world: some of them built Siri out at Apple and are building its future things right now, some of them have been bought by Apple for huge valuations in talent buys, people that just went in and changed the world in search. Every time I worked with these DARPA programs, which is why I love DARPA so much, and NASA and DARPA work well together, I'd look around the room and just fanboy geek out on all the people that are there. Yeah, I
55:11 worked on some DARPA projects as well, and had the same feeling.
55:14 Yeah, that's so awesome. We should talk about that offline. But so yeah, I'm sitting there in Memex, and in Memex my big goal was to build out Tika, to evolve Tika for the next generation. Not full AI like today, although we're now in the process of really putting AI into Tika and its constructs, but it was the first step into AI beyond just the statistical information retrieval that Tika does. So what is Tika? It's the digital babel fish. The way I describe it is that Tika is just like the Babel fish from The Hitchhiker's Guide to the Galaxy: you put it to your ear and you can understand any language. With Tika, you give it any type of file, any file format that exists on the internet that we know of, 1400-plus file types and more, and Tika will extract the text, extract metadata from the file, and tell you information about the language of the file, which is basically everything you need to do something with the data in the file. It incorporates all of the major third-party parsing libraries that are free and open source to get that information out, and it uses all the standard metadata models and this and that. And so what we were doing during Memex is evolving Tika to support the non-standard types of content, beyond the easy text and the other things. When you get an image, instead of just getting metadata out, get out who's in there: do object recognition and tell me the people, places, things, dates, times and other things that are in those images, videos and multimedia formats. That was the goal of Memex. And so we were significantly building out Tika at the time, and there was so much action going on in the open source community. Well, we had this guy, Matthew Caruana Galizia, show up, and I'm like, who is this guy? He had built node-tika; we had a bunch of people building Tika interfaces into other programming languages, and he did that, he contributed it, and we started talking to him. And he starts asking us these questions and telling us he's part of the ICIJ, the International Consortium of Investigative Journalists. And we're like, cool, we start looking it up, and then boom, the Panama Papers drops, and we're like, holy sh..., they used Tika. That's why that guy was asking us all these questions. And so what they did is, yeah, they got this data dump that was leaked from a company called Mossack Fonseca, which basically showed that the heads of state of various countries and famous actors, your favorite ones, like mine, Hermione, Emma Watson, all had money in shady ways in these offshore accounts; they had their wealth there, and of course that has all sorts of ethical and other implications. So that was in these data files, which were leaked off of a content management site, 11 terabytes of data. And the way you deal with that is you do a big ETL, extract, transform and load, process and do data forensics, using tools like Tika,
57:42 and literally, it's like: all right, tell me the important stuff about these gigs of files, and then we'll go through that,
57:47 Bingo: what are the people, places, things, connections and other stuff. And if you look at the Wikipedia page, it was one of the fastest Wikipedia pages I've ever seen made. When that story came out, it was a huge Wikipedia page, and about midway down I even got a link to my page on Wikipedia from it. They say probably the key technology in doing this was Tika. And by the way, those journalists and others won the Pulitzer Prize in 2017. So I tell people, hey, I contributed to the Pulitzer Prize. Yeah, that was a big, big deal. Some people only know me for that, and I had a very small hand in it. But then again, I helped invent Hadoop, and we found out many years later that that's what the NSA used to build Accumulo. I mean, I don't really stress about all this stuff. People ask me, are you ethically concerned? My goal is to change the world and build the software that everyone uses. You can't control for that; what you can control for is that the world got better somehow, in some way. You can't control the bad actors and how they use it. But the sum total of everything at the end, I'm proud of.
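A rough sketch of the Tika-based extraction workflow described above, using the tika-python bindings (assuming the package is installed and a Java runtime is available; the file name is just an illustrative placeholder):

```python
from tika import parser, language  # pip install tika; requires a Java runtime

# Hand Tika any file it understands and get back text plus metadata.
parsed = parser.from_file("leaked_document.pdf")   # placeholder file name

print(parsed["metadata"])                 # e.g. content type, authors, dates
print((parsed["content"] or "")[:500])    # first 500 characters of extracted text

# Language identification is a separate helper in the same package.
print(language.from_file("leaked_document.pdf"))
```

In an ETL pipeline like the one described, a loop over millions of such files would feed the extracted text and metadata into a search index or a graph of people, places, and organizations.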
58:48 Yeah, well, this is definitely a plus-one for the good guys on the Panama Papers. That's really cool. In the bit of time that's left, maybe tell people about your book, your TensorFlow book.
58:57 Yeah. So right now I'm in this sort of era where I'm deep into AI. About a year and a half ago... I tell myself every year and a half to two, not that I get bored, but my wife lets me do some fun project, and she helps; she's amazing with our three children, 11, five and three, boy, boy, girl, right in the thick of things, energetic ages. Yes, energetic, and suffering through this pandemic like everyone else, but we're getting through it. So about a year and a half ago, I was reading, and I'm a Manning author, I wrote Tika in Action 10 years ago, so I get access to the books and they tell me about them. And there was this book on machine learning, and it was TensorFlow; it was Machine Learning with TensorFlow. And I said, you know what, I want to learn what my people are talking about. There were heavy debates between TensorFlow and PyTorch and whatever, and there hadn't been a Manning PyTorch book, but there was a TensorFlow one. So I said, cool, let me read it. And when I'm reading it, I'm like, hey, I've got to go deep on each one of these chapters. There's like a suggestion at the end; on the CNN chapter, there's a bullet like, oh, and you could build a facial recognition system, but we're not going to do it. The author of the first edition, a guy named Nishant Shukla, UCLA PhD in computer vision, he's at a startup now, he's doing great things, and I communicated and corresponded with him during the reading of the book and also the development of this one. He kind of throws it out there: oh yeah, you could build a facial recognition system, so go explore that, go look at VGGFace. So I go explore it, and the data doesn't exist anymore; you can't find the model anywhere, and I have to rebuild all this stuff from scratch and get all the celebrity images. VGGFace is basically a celebrity image recognition model. Actually, Google just came out with a product that's basically the commercial version of it, but it's this sort of seminal model from 2015 that does facial recognition using CNNs. And so I start to go build it, and like seven weeks later, staying up at night, multiple hours, I'm like, God, this is like a graduate-level assignment, if you want to work hard. It's a lot of work. So my wife sees me doing this over like a nine-month timespan, and I'm like, oh my God, AI is real, I know what Elon's talking about, and this and that. At the end of nine months I had basically a ton of Jupyter notebooks, a ton of datasets, and I had built a ton of examples: everywhere he threw a bullet suggestion at the end of a chapter where he'd do something, I implemented basically a new chapter and a half for it. And so that's Machine Learning with TensorFlow, Second Edition. That's my book. It uses TensorFlow, but I will say it's TensorFlow and friends. Some people have told me it's not fully updated to the latest TensorFlow 2.x or whatever. So what I did is my typical Chris thing: I got the head of AI at Google, Scott Penberthy, a buddy of mine, to write the foreword. And in the foreword he talks about, look, in the time it took Chris to read this book, we released 20 versions of TensorFlow 1.x.
The point being, you can't chase the version. All of the material knowledge of doing data cleaning and preparation, building these models, evaluating them, none of that changes if the train step changes, or if you use declarative programming versus imperative, or if you don't use placeholders anymore. Those are implementation details that can be swapped out in a few lines of code. Anyway, the book is now 450, almost 500 pages. It's going to be released in a month; I just finished all of the chapters, Michael. It just went through its three-thirds review and it's about to go to production. Awesome. And just by the way, for people listening, due to the time travel of podcasting, it should be just about out, or just released a week or two ago. It's going to be great; please check it out. It really extends and supplants the first edition of the book, and thanks to Nishant for writing that one, and so forth. It really is a book on machine learning and deep learning and things like that and how to do it. It could be a textbook, but it's written in my, you know, dad type of style, jokey and funny. It's got a lot of my personality in there. I hope you guys like it. But that's the book. Yep.
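Not from the book itself, but as a hedged illustration of the point that the workflow (prepare data, define a model, train, evaluate) outlives any particular API version: a minimal TensorFlow 2 / Keras CNN on MNIST might look like this, with arbitrary layer sizes chosen only for the example:

```python
import tensorflow as tf

# Load and scale the data: add a channel dimension and normalize to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A tiny convolutional network; whether the train step runs eagerly or as a
# graph is an implementation detail hidden behind compile() and fit().
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, batch_size=128)
model.evaluate(x_test, y_test)
```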
01:02:44 Awesome. Yeah, well, people should definitely check it out, especially if they're interested in TensorFlow; it sounds like a great one. All right, well, I think we're just about out of time, even though I know we've only scratched the surface. So let me ask you the final two questions before I let you out of here. You're going to write some Python code: what editor do you use?
01:03:00 I'm old school, so I will jump into Emacs, or I'll jump into Jupyter. I lean toward Jupyter to start, because I'm doing a lot of exploration, and then if I'm going to write a script, I'm jumping into Emacs. Okay, cool, cool. And then a notable PyPI package, not necessarily the most popular thing, but something you found where you thought, oh, this is really cool, people should know about it? Oh, that's a good one. Near and dear to my heart right now... let me complain about something first, and then I'll have done the negative and the positive. God, why does the Yahoo stock quote thing have to change every few years? I was using ystockquote after the old one broke, and then all of a sudden that broke, and now I've got to use yfinance, and it changes just ever so slightly. So that bugs me, but thanks to the guy who wrote yfinance and keeps updating it, because I think it's Yahoo's upstream stuff that breaks. I mess around in finance and quant stuff like that, just for fun. So anyway, that's the one that kind of bugs me. One that I really like is tqdm. I just love tqdm. It's basically progress bars: instrumentation over iterators with nice progress bars. Amazing progress bars, beautiful Jupyter progress bars, beautiful command-line progress bars, all the stuff that ends up as lame progress bars if you just try to do it with print statements yourself. So check out tqdm.
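For anyone who hasn't used it, a minimal tqdm example: wrap any iterable and get a live progress bar for free:

```python
from tqdm import tqdm  # pip install tqdm

# Wrapping the iterable is all it takes; tqdm prints a live progress bar
# with rate and ETA instead of hand-rolled print statements.
total = 0
for value in tqdm(range(100_000), desc="Summing"):
    total += value

print(total)
```

In a notebook, `from tqdm.notebook import tqdm` gives the same API with a widget-based bar.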
01:04:18 Yeah, all right, very, very cool. Good recommendation. And a final call to action for people; I guess I'll give you two angles here. One, people want to get started and do more TensorFlow, what do they do? Or maybe they're interested in space and JPL and they want to get closer to you guys in some way, so talk about that as well.
01:04:34 You want to get started with TensorFlow? Spend 90% of your time not using TensorFlow or any ML toolkit, and create a clean dataset in pandas. Make sure you have clean labels, make sure it's a tabular structure. That's the biggest mistake, or not even a mistake, but challenge, that I see people having: they're using imbalanced classes, it's ugly data that hasn't been cleaned. Machine learning today still requires clean data. If you do that, you can bring it into TensorFlow, PyTorch, anything. Just clean your frickin' data, spend the time doing it, and get a good dataset. How do you get started at JPL and NASA? Look at our open source and other code. We have a big open source library on GitHub; look at some of our projects, read our press releases, and then connect with me and connect with a lot of our people on LinkedIn. The best way to do it is to come not just saying you have XYZ skills, but having actually read about a project, like you've done the research, Michael, so you know about some of this stuff and you have specific areas where you want to contribute. JPL has an amazing internship program, 800 people a year; we took 450 even during the pandemic, virtually, and it's still going great. It's a great way to come in from school and join like I did. And then we also hire full time, and JPL Jobs is the best place to take a look for that. Fantastic. Oh, that's awesome.
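A small sketch of the kind of pandas cleanup being recommended; the file and column names are hypothetical, and downsampling is just one of several ways to handle imbalanced classes:

```python
import pandas as pd

# Hypothetical tabular dataset with a text label column.
df = pd.read_csv("observations.csv")

# Drop incomplete rows and normalize the labels before any modeling.
df = df.dropna()
df["label"] = df["label"].str.strip().str.lower()

# Inspect class balance; badly skewed classes are a common pitfall.
print(df["label"].value_counts(normalize=True))

# One simple remedy: downsample every class to the size of the smallest one.
min_count = df["label"].value_counts().min()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(min_count, random_state=0))
)
balanced.to_csv("observations_clean.csv", index=False)
```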
01:05:46 Well, thank you so much for being on the show, Chris. It's been great to talk with you.
01:05:49 Thanks for having me, Michael. I really appreciate it.
01:05:51 Yeah, you bet. Bye bye.
01:05:55 This has been another episode of Talk Python to Me. Our guest in this episode was Chris Mattmann, and it's been brought to you by Linode and monday.com. Start your next Python project on Linode's state-of-the-art cloud service. Just visit talkpython.fm/linode, L-I-N-O-D-E; you'll automatically get a $20 credit when you create a new account. Build your idea for an app and get it in front of hundreds of thousands of users on day one. Start building today at the monday.com marketplace by visiting monday.com/python. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle; it's like a subscription that never expires. Be sure to subscribe to the show: open your favorite podcatcher and search for Python; we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening; I really appreciate it. Get out there and write some Python code.