
#286: Python and ML at NASA Jet Propulsion Laboratory (JPL) Transcript

Recorded on Friday, Aug 14, 2020.

00:00 NASA's Jet Propulsion Laboratory's (JPL) primary function is the construction and operation of planetary robotic spacecraft, though it also conducts Earth-orbit and astronomy missions, and it's responsible for operating NASA's Deep Space Network. On this episode, you'll meet Chris Mattmann. He's the division manager for artificial intelligence, analytics, and innovation at NASA's JPL, and he's JPL's first principal scientist in the area of data science. We cover a wide range of topics and dive into how Python and open source are growing in the space exploration field. And he answers the question of whether he thinks we'll have Python running on robots and rovers in space. This is Talk Python To Me, Episode 286, recorded August 14, 2020.

00:59 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and monday.com. Please check out what they're offering during their segments; it really helps support the show. Chris, welcome to Talk Python To Me.

01:26 Hey, Michael, thanks for having me. It's great to be here. And hi to your listeners.

01:30 Well, I almost don't know where to start. There are so many things that you have done in Python and in the open source space that I think are gonna be really fun to talk about. But I think let's focus mostly on space and JPL and that, and then we'll get to what I think will be a surprisingly large and impressive list of interesting things that you've done. So we'll start our conversation there. But before we actually get into that, let's just start with how you got into programming and Python.

01:58 Yeah, well, it's a long story. So I came from the Java world; rewind the clock maybe, I don't know, 15 or 20 years ago. I grew up in a trailer in Santa Clarita, it's about an hour north of LA. And let's see, my childhood was super interesting. But anyways, I became a teenager. I like to tell people I think I had my first longer-than-five-minute conversation with my father when I was a teenager. You know, my brother was out hanging out with the ladies and being an extrovert; I was an introvert. He asked me to read the paper with him, the local paper, and so that was nice. I think I had a longer than five minute conversation with him at that point, you know. And so yeah, so I went from there. I played sports in high school, I went to Saugus High. I was five nine up until my sophomore year and stayed that height throughout; everyone else got taller. So they got to play football, I didn't, so I tried to do something else. I had a 4.6 GPA; the only reason I didn't have a 5.0 on a four scale was they didn't have honors football. So I decided to go into computers. And so I went to USC. I couldn't afford it, I'm still paying it off right now. And when I was there at USC, I was sitting in the computer lab one night, it was my sophomore year, and I needed money and a job. It was like midnight, and an email came through from a place called JPL, not JBL, the headphone place, JPL, Jet Propulsion Laboratory. And a real nice gentleman, Dr. Rob Raskin, was looking for us computer people to help Earth scientists understand earthquake data and other things. And so I went for an interview. I'd never been for one before; I'd interned at a company called iWon.com, that was my only other experience, basically building video games in Java, Java applets, for the 35 to 55 year old demographic, like online poker games during that era. So I mean, that was nice; that way I was on the west side of LA, I got to be near the Miracle Mile, I learned where UCLA was, as a USC person you really need to learn that so you can really feel the rivalry. But it was a beautiful area over there by mid-Wilshire. Anyways, JPL was a nice change; it was closer to USC and where I lived at the time. JPL is maybe a 15-20 minute drive into northeast LA from downtown and USC. And so yeah, I got a gig at JPL, and I was like a computer programmer. I was doing Perl, PHP, you know, other stuff, building websites and databases and MySQL for scientists. And I did that for maybe, I want to say, as an academic part-timer and eventually as an employee, for three or four years, and then I got sucked into the real hardcore Java community. I was working with folks on technology projects for databases. I even worked on a project JPL was doing with the National Cancer Institute; we were basically putting together data for cancer detection, because a lot of the stuff we did for remote sensing could be applied to that. And so yeah, I mean, Java was big at the time, and my trick was trying to figure out how, instead of using C and C++ to build science missions, which I eventually started working on, I worked on an Earth science mission called the Orbiting Carbon Observatory, and I had to figure out how to use Java. I wanted to use Java, I refused to use C++. Not that I didn't know it,

05:06 right. But

05:07 I was like God, you know, go ahead, guys, you

05:09 See, I've gone through these same stages, not with Java, but with .NET. Just like, you know what, I know I could do this in C++, but I'm really over all the pain and the hoops and the page faults and all the stuff that I just don't care about anymore. You know,

05:23 Oh, my God, same thing. I had done Nachos for operating systems, that was the Berkeley, sorry, Stanford, little tutorial project on how to do it. I learned multiprogramming and memory management, but who the hell wants to do that regularly? And so I was like, Java does this for me, it's being shoved down my throat, let me accept and assimilate. So yeah, my big thing that I cut my teeth on and became kind of known for at JPL was that I pushed us to basically use Java to implement a ground data system for OCO, the Orbiting Carbon Observatory, in 2005. And my trick there was, okay, all the prior Earth science missions, say, took in 10 gigabytes of data over 10 years, and they ran, in terms of their daily processing, maybe tens of jobs per day to produce that 10 gigabyte record over, you know, 10 years. And OCO was basically, okay, we're gonna take you into the realm of a 10,000 jobs per day daily workload, and it was going to generate 150 terabytes of data in the first three months. And so I actually looked at the C++ system that we had built before to do this, and I was like, you know, it was tied to a database; I won't say Oracle, but I think it was tied to Postgres. It couldn't run without a database running, all the configuration was in a database, it was single processor, everything else. And I was like, this needs to be completely rewritten. And at the time, I had hung around at USC to get a master's degree, and I had a really inspirational professor there during my master's, Dr. Nenad Medvidović, who ended up becoming my PhD advisor, who got me into research, and I hung around for the PhD as well. And so in my PhD that I was doing at USC, I took a search engine class, and I got really into this thing called Nutch, N-U-T-C-H, right? And it was a creation of a guy named Doug Cutting, who was the guy who created Lucene, and eventually things like Hadoop and that whole ecosystem. And my professor at the time wanted us to do our final projects in Nutch. And so mine was a really simple syndication, or RSS, parser. And by the way, for your listeners, I'm going to get to the Python part, I just have to tell you this so you can make fun of me, because I started out in big, big hardcore Java. So I got into Nutch, I built the RSS parser for feeds, for news feeds; that was my final project in my search engines class during my PhD. And I contributed it to Nutch, and I got involved in Apache, the Apache Software Foundation, where Nutch had just moved. And I started talking to Doug and all the other developers, I became friends with them, and I became a Nutch committer. The funny part was my academic cousin at UC Irvine, because in academia it's all, who's your advisor, he's like your dad, and your advisor's advisor is like your grandpa; you know, you've got cousins and uncles on the academic side. One of my cousins was Justin Erenkrantz, who was the president of Apache, and my academic uncle was Roy Fielding, who was a co-founder of Apache, and so I had this sort of Apache connection without even

08:16 As well as REST, the whole REST idea.

08:18 Yeah, the REST architecture, Roy's famous for that. So UC Irvine during the mid-90s, which is like the decade before I was doing it, was like the place to do software, you know, big software, GUI and multi-architecture development and architecture and stuff like that. Dick Taylor ran the group, and anyways, he had a bunch of all-stars: he had the guy who invented ArgoUML, the guy who invented WebDAV, the guy who invented REST, the guy who did the component-and-connector architecture styles. So those are my ancestors academically. So yeah, I'm doing search engines and whatever, and I've got this connection anyways to Justin and Roy, who are telling me, get involved in open source. And I'm like, cool, okay, I could do that. And so I started contributing to Nutch, and that eventually became Hadoop. Yeah. And then Hadoop became Spark. And so I'm in the ecosystem, I'm playing, you know, I'm contributing, I got into search engines, that really became a passion, because I was building these big data systems at JPL for mission science. I was like, we need to use all this Java stuff and all this ecosystem, and we need to scale and do all that. So the funny part was, here's the Python, long-winded answer: you asked me, how did I get into Python? In about 2009, after I had led teams and done this myself, built three mission ground systems on Java, and we proved all the naysayers wrong that said we couldn't do it, you know, that it's gotta be C++, we did a ground system on Java. I got tired of Java. And Python was getting shoved down my throat then. Everybody has, at some level, one or two or three eras in their career, and my second era beyond early programming was really Java. And so my third era was Python. I mean, I started to get involved, I went away from missions, I got involved in technology development, and I started going into government technology development in the early 2010-2012 timeframe. During the Obama administration, there was the big data initiative, and they funded a bunch of programs, hundred-million-dollar investments in things in big data. And I was toe to toe in some of these programs, standing alongside people like Peter Wang and Travis Oliphant, and I started to learn. In fact, I even funded them during a program called DARPA Memex, to grow what was Continuum Analytics at the time, a smaller company out of Enthought, into what is now Anaconda. And so I started talking, and then Peter's sitting there telling me, oh Chris, screw Java, you know; I was the Java guy, you know, sitting amongst all the Python people.

10:34 I feel like I can just hear Peter saying that as well. Yeah. What are you doing? Why are you messing around here?

10:40 Yeah. You know, Peter's telling me about, you know, Numba and all this. And I was like, you guys have got your own foundation, NumFOCUS. Andy Terrell and I became friends, I got Andy to come talk at ApacheCon, which was great. They never invited me to PyData or anything, I don't know, you know, whatever. But yeah, so I got involved in that. And so I said, I can do this one more time, I can go deep and learn. And really, probably circa 2013, the big thing for me was, I created Tika. We'll talk about that later, but that was my big thing in Java besides Hadoop and all this. And in 2013-2014 we ported that to Python, and I did that with a guy named Brian Wilson at JPL. And that was my, I can go deep and do Python and deliver something of big value to the Python community. And so yeah, around then, that's how I got involved in Python, and that's how I'm here now. Today I'm doing machine learning and other stuff. And anyways, I don't want to dominate, but yes, that's kind of the answer.

11:33 Well, two things. First, to get a call from JPL out of the blue in your undergrad or master's degree program, saying, hey, why don't you just drop in and do some work on, like, cutting-edge space stuff just down the street; that is incredible, right, to get such an opportunity. And I think that's really neat.

11:51 It's a big-time opportunity, Michael. And for me, the thing I like to tell people is, I've learned a lot at JPL in my 20 years there, and a lot of people look at me and go, God, you've been here for 20 years? I say, yeah, I'm just sort of entering mid-career. 20 years at JPL is mid-career.

12:05 For me, that's like, you want to run a mission? You get that that's a 20-year commitment, a 15-year commitment sometimes.

12:10 Oh dude, you're exactly right. And so at JPL, I tell people this, you can see the people that are going to be there for five years, and you can see some of the people that are going to be there. And by the way, we like those five-year people too, we'll get whatever we can, we can talk about that later, out of whoever, you know; our mission is space. And it hit me in 2003, 2004. The big thing was the Spirit and Opportunity twin rovers. You know, I mean, the first three or four years at JPL, it's awesome, but you're young and you don't know and appreciate space and everything else. And so I was like, yeah, maybe I'll go work at a startup after this. And it hit me with the MER rovers, the Mars Exploration Rovers, Spirit and Opportunity. They sent them and they landed, and I saw the landing. I stayed up at night, I watched NASA TV; it was just my wife and I at the time in our new house, we'd bought our first house, and I'd bought my first 55-inch TV that, if you compare it to the thin TVs now, was like as big as my living room. And I'm like, yeah, well

13:03 Has its own cooling unit.

13:05 Its own cooling unit in it, oh yeah. And so we're sitting there, we're watching it, and then Arnold Schwarzenegger comes out, he's the governor of California, and he's shaking hands with my friends, some of my friends who I worked with on some of this, and I'm like, I know them. This is amazing. And everybody, and then it's like, that's JPL, yeah, you know, when we land stuff like that. And so yeah, that's when I was like, oh God, I work there.

13:27 That's awesome. Yeah.

13:28 And I knew I was gonna stay there, you know?

13:30 Yeah. Yeah, really cool.

13:34 This portion of Talk Python To Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the next generation network, Linode delivers the performance that you expect, at a price that you don't. Get started on Linode today with a $20 credit, and you get access to native SSD storage, a 40 gigabit network, industry-leading processors, their revamped Cloud Manager at cloud.linode.com, root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode when creating a new Linode account, and you'll automatically get $20 credit for your next project. Oh, and one last thing, they're hiring. Go to linode.com/careers to find out more, and let them know that we sent you. Then number two, you talked about going deep in Python after being a Java guy. And I think it's really interesting to learn a language that has all these nuances and patterns, and it takes a while to master, and I think Java does. And then you come to Python, and it's a language where, you know, one of the jokes is, hey, I learned Python, it was a great weekend, or something like that. And yet I've been doing Python for a long time, 8-10 hours a day, and I'm still learning Python, right? So I think there's this really interesting distinction between "I learned something" and "I learned it", you know what I mean? Like, I really built something meaningful in it, rather than, yeah, I know how to do loops really well now.

15:10 Oh my god. And that's how it starts. I mean, the loops are okay, we're gonna start with those, we're gonna start with the basic constructs in our mind. But yes, absolutely. Like, I tell people this, you know, only recently do I tell people, you've got the three levels: beginner, medium, and expert. And I tell people, I say I only recently became an expert in Python; it took me a while, because to me, to become an expert, you need to go deep. And people still tell me today, Chris, your Python code looks like Java, dammit. It's not PEP 8, you know, certified, or whatever the hell the PEP is, or run through Black on a pre-commit hook for you. Well, you know, I use a lot of camelCase still, you know, in naming stuff, I don't use the underscore, you know, whatever. I say, shut up, it works. And to be honest, so here's my trick though, and maybe you'll agree with me on this, Michael, this is true in any software development language, even though I don't do this anymore; my PhD is in software engineering and software architecture, I studied software development. And so for me, I tell people, you've succeeded not when you just publish something to open source and say, oh, I did it, I built it, but I'm still the only guy or gal maintaining it. You've succeeded when not only have you put it out into open source, but you've convinced somebody else that it's worth their time to build on your library, yeah, and to do something with it. And to me, you've only achieved that level of mastery when you've become a master of the programming language as well, but also when some of that mastery is knowing where to bend and break, how to configure, what sensible defaults are, and how to componentize something in a way that not everyone will love, but, you know, you can get at least a few people to love. And that's how you succeed.
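As a small aside on that PEP 8 naming point, here is a tiny, hypothetical Python snippet showing the same function with the camelCase "Java accent" and again with the snake_case names PEP 8 recommends. Note that a formatter like Black, run from a pre-commit hook, fixes layout but leaves naming alone, so the camelCase habit survives it; the function and mission IDs below are made up for illustration.

```python
# Hypothetical example: same behavior, two naming styles.

def getRoverName(missionId):  # camelCase: the "Java accent" joked about above
    return {"msl": "Curiosity", "m2020": "Perseverance"}.get(missionId, "unknown")

def get_rover_name(mission_id):  # snake_case: what PEP 8 recommends for functions and variables
    return {"msl": "Curiosity", "m2020": "Perseverance"}.get(mission_id, "unknown")

print(get_rover_name("m2020"))  # -> "Perseverance"
```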

16:50 Yeah, I totally agree. And so much of "I built this thing the right way", the right way often is somebody's perception of their context and their use case. Like, we built it the right way because we're Google and we have a million requests a second, versus we built it the right way because we're a startup and we have 100 users but we're growing. Those two things should not look the same, probably, you know what I mean? Like

17:14 Absolutely, it's best captured in the multi-dimensionality problem when you talk today about big data, and they talk about the five V's. For a long time, people thought big data just meant volume, yeah, or velocity. But the reality is like what you just said, it's about context: hey, variety matters, veracity matters, value, your value stream matters. And at Apache, we used to have these huge debates. Yeah, I got on the board of Apache; I hung around long enough that they suckered me into doing it, and I did it for five years, and yeah, it was a great experience. But I used to sit there in these discussions with, like, the big data companies, who Apache is a great place for, to independently kind of have a DMZ to build software together without wearing their company hat and to achieve some general framework consensus that doesn't disrupt the value stream for people downstream who want to do stuff commercially. And I'm very supportive of Apache's mission and what they do, and a lot of open source foundations; I became friends with a lot of the founders of those during that time. And so the thing I used to sit in, though, is like, you'd get the LinkedIn people, who made Kafka initially and donated it to Apache, and they'd have their way, like you said, it's like the Google way. It's like, well, this could only be good if you test it on a million computers, or, we're not going to accept this dude's or gal's patch from, you know, this other country, because, you know, they test it on their laptop and it doesn't pass our massive, scalable test. But I'm like, yeah, but it adds a feature, just do it in a branch, isn't that what source code control is for? Get the person the value for contributing, don't let it sit in a ticket system forever, because then people go away. And your real goal is to capture everybody's interest and contribution in the moment that they're interested; either they've got the time in their free time or their company's paying them to do it. And time is the thing, you learn that it's really the last precious resource. And yeah, that's it, time; we're never gonna get that back. You can get a lot of other things back, but you're not gonna get time back. And that's the key to open source.

19:08 Yeah. Jason Fried from Basecamp, the 37signals crew, has a great saying that inspiration is perishable, right? Like, if you're currently super excited to add this feature to that thing, but then maybe you work on it and the PR just sits there and gets ignored, like, you're done with that product. Like, you're not done, right? But you could have really done a lot of interesting stuff if they'd captured that two-week period, or whatever, where you were on fire about it.

19:34 Yeah, I've got a good story real quick that I'll share with you and your listeners. It's the Nutch project. So Nutch was dead post-Hadoop; a lot of projects were dead post-Hadoop. And the reason for that was that basically, when Hadoop came, it was the new hotness and everyone went to go work on distributed systems. They're like, oh, this is what Google did with MapReduce and now it's open source, boom, and they all went to that. And they all left us in Nutch, and I was one of the people holding the bag after that. And we had, in our JIRA and our whatever system, I can't even remember, I think we were still using Review Board at the time, but we were using Review Board and JIRA, and we probably had 100 patches from people that were just sitting there; some had been sitting for two or three years. And basically, because the interest and the community of committers that could actually merge the patches left, they just sat there. And so we had one gentleman, I won't say his name, but he had been contributing probably 50 patches, like, still trying to get something in; he was running a web crawler company in the UK. And finally, I was just like, you know what, because we had these standards in Nutch that had been imposed on us by the Dougs and whoever, and Doug's an amazing guy, and I get it, I get why we were doing it. But what I did is, you know, the guy reached out to me, the guy in the UK, and he's like, are you ever gonna merge my patches? Because this is BS, and you really make me hate open source. And so what I did is I said, you know what, guys, I know we have all these rules and blah, blah, blah, but none of you are around, none of you are doing anything anymore, here's what I'm gonna do. And I just merged all of his patches and figured out how to do it and get it in there. That project is still alive today because of that, because, yeah, we got that guy interested, he pulled in a couple of other people that he was working with, and he's like, oh God, the floodgates are open, we can develop again. And then I just let them take it over; I haven't done anything in Nutch in years, but the project's alive today because of that. And so you've got to capture it; time absolutely is a scarce resource, like you said. What was that RSA Animate video about, like, purpose and motivation? That's such a great way to capture it, it's the same thing. So

21:32 Absolutely. So these days, you're over at JPL, and you have some really interesting things going on there. You're the first principal scientist in the area of data science. What's the story there?

21:45 So JPL has this thing called the Principal designation, which is basically for somebody that's normally been there, like, 50 years. And, you know, I'm just joking, no one kill me for that, please, but somebody who's been there a long time. And usually our principals are, we've got the founder of hyperspectral there, a guy, Rob Green, you could argue he's the founder of the field of hyperspectral science; we've got people who explore, you know, we had a guy who used to be the project scientist for the Square Kilometre Array, a huge billion-dollar international project of ground-based sensing, looking at the cosmos and answering the tough questions,

22:16 That one has so much interesting stuff in terms of how much data it has; I had some of the folks from Australia on the show to talk about that. It's kind of a "you can't even put it on hard drives, there's so much data" type of problem. So yeah,

22:28 Oh, my God, 700 terabits per second. In the 2010 to 2015 timeframe, or 2016 even, that's all anyone ever wanted to hear from me, because I had some peripheral involvement in that; they're like, talk about that, you know. But we've got the guy that was the project scientist for that at JPL. So I mean, those are our principals, usually. And so yeah, in 2014 they gave me that title, because they realized that data science was becoming something where we were developing a maturity, a skill set, and a capability, and JPL actually needed to go triple down and quadruple down on that. And so what it means, yeah, is that all the experiences I have, talking to people like your great podcast, Michael, and others, I'd sit there and talk, I needed to talk and evangelize that at JPL. And so that was the recognition. I was an individual contributor still then; I basically would just tell people, here's all the stuff with data science, it's science, it's math. Around that time I wrote a paper in Nature called "A vision for data science", and, you know, people like me don't normally get papers in Nature. Yeah, that was a big deal. And basically I was thinking a lot about data science, and, you'll like this, I had this sort of dichotomy in education at the time. I'm a PhD computer science, software engineering person who, after about a decade at JPL, learned hyperspectral remote sensing, why western US water matters, cared about the cosmos and the SKA, and I even thought at times about getting master's degrees, but kids, mortgages, and other things, you know, other interests got in the way, which is important. But I was sitting there thinking, how many of me does JPL get? It took me being there for 10 years. And, you know, any software engineer that we hire, it's five years at JPL before they learn the lingo, and it's really hard unless they live in it. What we were seeing at the time was an emergence of PhD atmospheric scientists or PhD computational biologists or whatever, who had learned Python, believe it or not, could write code, understood what logistic regression was, and whatever. And we had this emerging class of them as data scientists. They wanted to share their code, they wanted to work with software engineers. As opposed to, and you know, it's gonna sound ageist in a way but it really isn't, as opposed to sort of the generation before, who didn't want to share their data, who wanted nine-month publication moratoriums, who wanted to file a patent before making their code open source, and things like that. And there's still the evolution of those folks into the new generation today. But I was looking at that sort of whole supply chain in the education community for data science, and I was asking myself, what's better? Is it the Python person highly skilled in a deep discipline domain, whose software engineering code isn't that great, but if we pair them with a master's or PhD level software engineer, they could clean that up, and then over time they'll learn it? Because like you said, everyone starts out with Python, but it doesn't mean anyone's going to contribute to their code, you know. And so I was actually seeing more of the PhD atmospheric science side of people in Python being more useful in data science. And so that's really one of the questions I was asking in that Nature paper.
And it's one of the things that I still don't have an answer to today, but we've seen different, I'd say, momentums, and it's not just at JPL, in how we source the talent. And the same is true today with AI, AI engineers, and things like that.

25:42 Yeah. Well, I think one of the really interesting questions is, do we need more software developers, or do we need more experts with software development capabilities, right? You hear the politicians and policymakers go on and on, like, we need to teach coding because we have all these coding skill gaps and whatnot. And I think often at that level it kind of gets portrayed as, what we need is more computer science graduates, right? But my theory is, what we need is amazing biologists, physicists, doctors, lawyers, who can take whatever they do and really amplify it with a little bit of code. And I think that's why Python is so powerful: Python is one of these special languages where you can be effective with a very partial understanding of even what it is or what it does. You don't even have to know how to create a function, and you can be useful in Python.

26:31 100%, 100% agree. That was the conclusion of my Nature paper at the time. And I'm trying to be diplomatic, but let me say something controversial, maybe generate clicks. Yeah, I completely agree with you, and it's heresy in my community, where I originally come from, but I don't read Transactions on Software Engineering anymore. I'm sorry, I don't stay only in the software engineering and computer science community. And so for me, I've noticed the same in my direct experience, both building big software projects for big national and international things and sourcing over hundreds of people at JPL, and in consulting roles and other things. I've come to the same conclusion. I completely agree with you.

27:07 Yeah, yeah. And that's not to discount computer science degrees. I think there's a real important role for good software developers with all the practices in place there. But I don't think we need 10 times as many of those; I think the value would be better if we brought everyone sort of into that camp, rather than growing that one isolated camp. So yeah,

27:27 Totally. And you can relate it back to the Twitter hashtag campaign, the "learn to code" one, and why it generated so much, I think, hate on both sides of the political aisle, you know. One of the challenges with that, you know, let's talk about this, there's gonna be a big AI skill gap, and people ask me about this all the time related to AI ethics and other things. So anyways, for your listeners, after all the software development in Java and big data and whatever, now I'm just the guy that keeps reinventing myself; I got into AI and ML, and the last five years I've been doing that pretty much a lot. And so yeah, I'm gonna make a bold statement that's been said before, so I've got air cover: in the next year, 2 million truckers are going to be displaced, because really smart cars are here, and smart trucks are here, and they will drive, and especially the pandemic is accelerating some of these things. And so as that happens, the learn-to-code thing is like, oh, we've got to take all the truckers and make them software engineers. I think I've developed a happy medium between that and what you and I just said: we're not going to make them software engineers, let's glean SME knowledge from them, they understand the business, they understand the value stream, and pair them with domain-discipline, you know, non-computer scientists, but say people that want to model the weather aspects of that, the computer vision aspects of that, the supply chain aspects of that, who have that plus some Python code. And I think those truckers still have jobs. Yeah. If you tell the truckers that the chasm is becoming a Java or hardcore Python developer, the chasm's too big. But yeah, if you tell them, your job is to sit here, click on this tool, we'll capture your domain knowledge and labels, and you're going to interact with really high-powered people, we could do that,

29:02 You know? Sure. Well, I think also the difference is, do you tell them, your past 15 years of experience has no value, go back to zero, learn programming, and then go work for a marketing company? Versus, we're gonna take what you're already really good at, what you actually have a ton of experience in and are uniquely qualified for, and we're gonna teach you a little code, so the change is you're gonna build the robots instead of, you know, be replaced by a robot.

29:27 That's exactly right. So that's the win on all sides, and that's the way it's gonna move forward. And I see some progress on that, besides the initial, you know, everyone thinks Twitter is real life, or, you know, everything you read on the news is real life, until you live real life and you talk with people. And so anyways, I've seen the people that are making progress in that area doing what you just said.

29:46 I think that's really great. And just for people who maybe are not aware, you pulled out this trucker thing; that stat may be a little dated, but truck driver is the biggest single job category for men in the United States. Probably many places in the world, but I only know the data for the United States, right? But that's significant.

30:05 Absolutely. And it's a big industry. Actually, truck drivers, you know, some people look at this like they learn about, you know, law enforcement, I mean, especially in the latest news and all this, this is a big deal, and everybody's like, oh God, I didn't know law enforcement made XYZ money. A lot of folks think of it as sort of unskilled labor, things like that, but they actually are well compensated for what they do. And truck drivers, besides being that statistic like you said, and I learn something new every day, one thing I do know related to that is that they are also well remunerated for their services and things like that. And so the other economic challenge is to say, hey, truck driver, like you said, your 15 years of skills are gone, and oh, by the way, that maturity that you developed in those skills to achieve that salary, you know, and things like that, is also gone. That's a bad loss on all sides. Yeah,

30:51 It absolutely is. But I do think there's a more positive path forward, so hopefully we can go that path. This portion of Talk Python To Me is sponsored by monday.com. monday.com is an online platform that powers over 100,000 teams' daily work. It's an easy to use, flexible, and visual teamwork platform, beautifully designed to manage any team, organization, or online process. Now, for most of us, we missed our chance to build the first apps ever in the mobile app stores. It was a once in a lifetime opportunity, but it's one that's coming around again. monday.com is launching their marketplace and running a contest for the best new apps featured right from the get-go. Want to be one of the first in the monday.com apps marketplace? Start building today. They're even giving away $184,000 in prizes, including three Teslas, ten MacBooks, and more. Build your idea for an app and get in front of hundreds of thousands of users on day one. Start building today by visiting monday.com/python, or just click the link in your podcast player's show notes. So let's talk about Python at JPL. I think there are some interesting angles, especially around some of the remote stuff. A lot of the things you guys work with, like rovers, you talked about Spirit; just to have a conversation with those things, we complain about latency, you know, like, that website was slow, or I was playing this game and it was hard because there was 200 milliseconds of latency. There's a different kind of latency out in space, right, when the speed of light is not enough. So taking some of the smarts and putting it on, like, rovers and other stuff, some of this AI work that you're doing, it sounds like it might have some legs.

32:32 Hey, I hope so, and we think it does too. So Michael, basically, the work that we're doing, for your listeners, we have a project that we've been investigating now. So let's fast forward the clock. So rovers nowadays, the last one that landed on the planet, and I won't say the last one that we shipped, because we just shipped one, which we'll talk about, called,

32:46 Right, just a couple of weeks ago or something, right?

32:48 We did pandemic shipping and launching of rockets and rovers, the new fad. But yes, pre-pandemic, in 2012 we shipped the Mars Science Laboratory, the Curiosity rover. So Spirit and Opportunity, just to size them for your listeners: if you have kids, it's about the size of one of those cars that you push, or maybe like a Power Wheels, Big Wheel type of thing. That's the size of Spirit and Opportunity. The MSL rover is about the size of a small car, like a Volkswagen Bug. If you came to JPL, and it was open, and we'd love to have you some day, you could walk into our building 180 and see a full-scale model of it to really get a feel for it. But that's the size of rover that we're talking about now, that's sort of the modern class of them. And so Mars 2020 is Perseverance, the one we just launched; it's the same size. So we've got MSL still operating; Spirit and Opportunity, you know, aren't anymore, because they were solar powered. MSL is nuclear powered, it uses an RTG power source and things like that, so it doesn't have to worry about solar panels, so it can go for quite a while and has been. So it's a great test,

33:56 basically, as long as it mechanically is still functioning,

33:59 Right? Absolutely. And so challenges with mechanical functioning are like, hey, we learned a lot about the wheels for a car-size thing as we drove over rocks and it tore the wheels up, you know, and things, and so we learned a lot about them. One quick update on 2020 is that the wheels have little Homer Simpson speed holes, or not speed holes, but holes to prevent having just track and tread that dies by catching on everything. And that's just one thing we learned amongst other things; we've got smart engineers at JPL. But MSL is a great platform to test stuff out on. However, let's talk about AI and ML. I'm gonna dispel some myths and rumors. So MSL and space assets and others, they all need, right, we've got to do computing, we need a processor and a board and things like that. They're running off of, what, the latest GPU probably, like an Nvidia 2080 or something like that? Yeah, everybody thinks that, and I know you're being facetious, and that's why I like the snark, it's awesome. But no, and that's the challenge. Everybody thinks that, and it's not; it's running off of a RAD750, which is basically a chip that's about as powerful as a PowerPC chip, an iPhone 1 processor. And why, real quick? Why, right? When we crash something in the government, we've got a congressional inquiry that we have to respond to. When the commercial companies do it, and we love the commercial companies, we're partnering with them now, they don't, right? Not to say that it doesn't hurt their value stream or their reputation or things like that, but they've got a little bit more flexibility to do testing and stuff like that than we do. Yeah. And so we are risk averse by profile and definition. And because of that, we will only use things that are what we call radiation hardened, which means that when it gets up there into space, space and cosmic radiation do weird things to your hardware. They flip the bits, and that's like the easy stuff they do; they do a lot of other nasty stuff. And so you've got to make sure that the hardware works in space. And because of that, the technology, the Gartner lifecycle for what we can use for that, is way behind. And so this big, potentially smart rover, and it is smart, they did great things on MSL, and they're gonna do even greater on 2020, is running off of an old processor. So all the AI and ML is human-in-the-loop, even more so coupled with the fact that you alluded to: hey, bandwidth, latency, you think that's an issue; the light time from Earth to Mars is eight minutes round trip. So anything you send to Mars, you've got to wait eight minutes to figure out what the heck happened, or even what happened for your report back. You know, it doesn't all have to be synchronous; there are asynchronous ways, there are ways to kind of achieve some advantages and queue things up. But it's still eight minutes, basically. And so because of that, there's a great video on YouTube, by the way, for your listeners, if you haven't seen it, it's called the Seven Minutes of Terror. It's really kind of closer to eight. Yeah,

36:42 yeah, that's a great one. Yeah,

36:44 Yeah, that's for the entry, descent, and landing. When they landed MSL, Curiosity, they had to use a big sky crane, instead of the typical big balloon, wrap the rover in a balloon and let it bounce, which is the way they did it before; it was so big they had to have this elaborate sky crane thing. And in that seven minutes, when you go into entry, descent, and landing, there's seven minutes before you know, hey, what the heck happened, and all this stuff has to happen, you know, autonomous landing, things like that, which is great. But yeah, normally, eight minutes. And so if I told you today that the Mars surface operations people use about 200 images a day that are taken from the rover, from its Navcams, which are cameras by the wheels, and its Mastcam, which is the big head that takes the selfies, you know, and other things that you see with its arm; if I told you that today they only use 200 images to plan what to do for rover operations the next day, you'd understand why we're bandwidth limited. We're limited on what we can process on the rover versus sucking the images down to the ground and making decisions. What if I told you that tomorrow we'll get close to that Nvidia chip? Maybe not exactly, but there's an effort called High Performance Spaceflight Computing to build a multi-core, GPU-like chip that is radiation hardened. It's a big government project that they're working on that has an emulator already. And we also today have the Mars helicopter on Perseverance, which is a little drone that went along with it that, if successful, is running a Qualcomm Snapdragon, which is a GPU-like chip. And why, when it's not fully radiation hardened and all this? We've tested it and whatever, but it doesn't have the years and years of testing. Why are we doing that? Because it's a technology demonstration, and the bigger mission is still successful even if the Mars heli, which we call Ingenuity, is not successful.

38:23 Right. And I suspect that the risk to a little drone helicopter thing, the highest risk is not the hardware getting messed up from radiation; it's like, it got caught in the wind and it just crashed, and, well, that was nice, but it's upside down, so there it goes. Or, there are just so many; it's probably just considered mostly expendable: let's try this, it's small, and it'll be cool.

38:42 Exactly, dude. And all the dads in the world just knew what we were talking about from the drones they were given a couple of Christmases ago. But yes, like, your drone is gonna flip upside down, it's gonna do, you know, all the nasty stuff that you don't think about, because driving a drone is hard. But apparently, you know, JPL is pretty good at all this stuff, so we have a really good feeling this will be successful. But yes, so those will have GPUs-ish, and we can do some stuff. So a couple of things that your listeners and you might care about that we're doing. We have a task to do killer apps, given the new GPU environment for future rovers. One of them we call drive-by science: we take Google's Show and Tell TensorFlow image captioning model, and I don't mean to just jump into tech speak, but it basically uses a labeler for object labeling, which is a head that, you know, could be ResNet-50, it could be VGG-16, which is, given an image, what objects are in it, and then it uses it,

39:31 That's a rock, that's sand, that's a ledge, that kind of stuff, right? Bingo. And don't go over the edge, by the way,

39:38 Then that's the second part, which is the language model, where we use an RNN or an LSTM, a recurrent neural network or a long short-term memory network, to basically learn the language surrounding those labels. Ah, this rock is close, bedrock is far away, there are three objects on this planar surface, and things like that. Those are all language properties, and an LSTM and an RNN, when you hook it to basically an object detection network, or a CNN, a convolutional neural network, then, given an image, I can emit a human language sentence about it, a caption about it. Now, what's the value of that? The value of that is that text is cheap; it's a lot smaller than images. And so instead of 200 images today on that very, very, very thin pipe, I can send you a million captions, if I can run the image captioning model on the rover, if I can run it on a GPU, or if I can run it on a heli, that kind of drone asset. So yeah, that's drive-by science: make the rover not miss science when it's driving, and we get away from, you know, those eight minutes. And so that's very helpful tomorrow for people. The other is another image recognition problem; it's called energy-aware optimal autonomous navigation, you know, say that five times fast. But the idea is that with the rover's motor, hey, even though it's RTG and things like that, it's still got to sleep for a while and recharge, you know, and stuff, and so we want to be efficient with its time. And so when we do that, we want to know if in the distance we're going to drive over sand or we're going to drive over rock, because if it's sand, just like your car, it's going to have to expend more energy from the motor than if it's rock, where the wheels are going to catch better. And so we do image recognition with that, using a similar network, and then we develop a motor power profile from that, based on what we're seeing. Yeah,
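For readers who want to see roughly what that CNN-plus-LSTM captioning idea looks like in Python, here is a minimal Keras sketch. It's a simplified merge-style variant of the image-captioning pattern described above, not the exact Show and Tell architecture and certainly not JPL's flight code; the vocabulary size, caption length, and layer sizes are illustrative assumptions.

```python
# Minimal sketch of an image-captioning model: a CNN "head" for objects
# plus an LSTM language model, merged to predict the next caption word.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # assumed caption vocabulary ("rock", "sand", "bedrock", ...)
MAX_LEN = 20        # assumed maximum caption length, in words
EMBED_DIM = 256

# Encoder: a pretrained ResNet-50 turns a navcam-style image into a feature vector.
cnn = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                     input_shape=(224, 224, 3))
cnn.trainable = False

image_in = layers.Input(shape=(224, 224, 3))
image_feat = layers.Dense(EMBED_DIM, activation="relu")(cnn(image_in))

# Decoder: an LSTM learns the language surrounding the detected objects.
caption_in = layers.Input(shape=(MAX_LEN,), dtype="int32")
word_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
caption_feat = layers.LSTM(EMBED_DIM)(word_emb)

# Merge image and partial-caption features and predict the next word.
merged = layers.add([image_feat, caption_feat])
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

captioner = Model(inputs=[image_in, caption_in], outputs=next_word)
captioner.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training pairs images with human-written captions; at inference time the model
# emits a caption word by word, and that text is far cheaper to downlink than
# the image it describes.
```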

41:26 because if you say, I want to go inspect that Canyon, go there, it would be great if you could get there twice as fast.

41:33 That's exactly right. And so today we've got car-size rovers, because that's what we've got to fit the instruments on, not 50, but we have 20 instruments or so; it's a big laboratory that we want to do experiments with, you know, and stuff. In the future, with Mars Sample Return and other profiles and missions, which we'll talk about, MSR has something called the fetch rover. What 2020, the current Perseverance rover, is going to do as it's drilling cores and analyzing stuff is drop tubes on Mars, so that a future mission, MSR, Mars Sample Return, which has the fetch rover, a much smaller rover, is going to have to drive way, way farther, if successful. And now this is a collaboration.

42:11 It'll go around and pick up these tubes, and then return them to us,

42:14 Bingo. And it's gonna take the tubes, it's gonna take them to a spot, drop the tubes off, launch the tubes up into space to an asset, and then that asset goes, you know, on the journey back to Earth, and we get samples from Mars. Amazing mission. Yeah, we have a lot of the technology we didn't have 10 years ago when they envisioned this; it will be successful, and I'm going to go out there on a limb on that. But yes, those rovers, like the fetch rover, they're going to need energy-aware optimal autonomous navigation. It's going to be a much smaller rover, possibly RC car size, maybe not that small, but maybe like a bigger RC car type of thing, or maybe two of them; at that size, the rover can drive much farther. And so energy-aware optimal autonomous navigation is real big, and so is drive-by science, even though it's not going to have all the instruments on it, visual, optical things like that; all the deep learning models that are built terrestrially today, we can leverage those. So
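Here is a minimal Python sketch of the energy-aware navigation idea: classify the terrain ahead from an image patch, convert the class probabilities into a rough expected motor-power cost, and pick the cheapest candidate route. The terrain classes, wattage numbers, and the tiny untrained network are all illustrative assumptions, not the actual flight algorithm.

```python
# Illustrative sketch: terrain classification feeding an energy-aware route choice.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

TERRAIN_CLASSES = ["bedrock", "loose_rock", "sand"]                      # assumed labels
POWER_COST_WATTS = {"bedrock": 40.0, "loose_rock": 55.0, "sand": 90.0}   # made-up costs

# A small CNN over grayscale image patches; a real system would use a pruned,
# quantized network trained on labeled terrain imagery.
terrain_net = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(len(TERRAIN_CLASSES), activation="softmax"),
])

def expected_drive_power(patch: np.ndarray) -> float:
    """Expected motor power (watts) to cross one 64x64 terrain patch."""
    probs = terrain_net(patch[np.newaxis, ..., np.newaxis], training=False).numpy()[0]
    return float(sum(p * POWER_COST_WATTS[c] for p, c in zip(probs, TERRAIN_CLASSES)))

def cheapest_route(routes):
    """Pick the candidate route (name -> list of patches) with the lowest energy cost."""
    return min(routes, key=lambda name: sum(expected_drive_power(p) for p in routes[name]))
```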

43:05 That's really cool. And the nuclear power versus solar power seems like something that's almost required if you're going to burn that much energy running high-end "GPUs-ish", as you put it, right? You don't want to burn up all your solar energy for the day trying to figure out where you should drive.

43:22 Yep, absolutely. And so the trade on that is going to be how they pack that energy profile in there. But there's a lot of spark, again, with all the advancements that are happening in the smart car and other industries and things like this, for how to do power in an efficient way with different cell, you know, cell technologies and stuff like that. Now that tech is here today, which is good, because we're in Phase B of that; there's A, B, C, D, you know, and E is when you're operating the mission, plus pre-Phase A planning. We're already in B, we're partnering with the Europeans, they're building the rover on MSR, we're working together. And so yeah, we're gonna need all that.

44:01 Yeah. It sounds like a really cool project. Do you see Python running on Mars potentially, or would this be something lower level?

44:08 I think Python could run on Mars. They told me 10 years ago Java could never run on Earth to power a ground data system, and we figured it out; it did. Yeah, it did. Python will run on Mars, I guarantee it. So Python today, if we take the emulator stuff that we're doing, and also the stuff we're doing on the Qualcomm Snapdragon, Python is running in those environments, because we're using TensorFlow Lite. The hardest part is the magic of, like, oh, we've got all this deep learning, you know, bring it onto this embedded processor, and then everything breaks, right? Your model, you've got to quantize the weights, you know, you don't have the same floating point units. Wow, what a pain. And, you know, Google knows this, and they're making a lot of investments to make this easier with the ecosystem. So Facebook and PyTorch, all the places, or Nvidia, they're figuring out they need to make that ecosystem process a lot easier for deployment, because everyone's just assumed capacious everything and infinite everything. That's fine, that's where the value stream for these products came from,

45:03 Especially when they come from the cloud, where it just scales up, and yeah, we'll just throw machines at it.

45:08 That's right. And so yeah, that's the biggest challenge right now. But yes, today Python is running on the HPSC emulator with TensorFlow Lite. We've got it working, we're running these models; basically, instead of using a ResNet head and stuff like that, we're using mini versions, mini models of that, like tiny ResNet and other things that have quantized, you know, weights to trade accuracy. But the real thing that this is driving, and this is a big research area I'm in right now, is: what's the least amount of labeled data? How do you quantize weights and things like that and still achieve results? What are the limits of learning? What accuracy do you need, that you can trade, that you can give some up, and still achieve the results you're looking for? You'd be surprised at some of the results; you don't need these insane, better-than-human accuracies, you just need really good results and things like that.
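For the curious, the kind of conversion step being described looks roughly like this in Python with TensorFlow Lite: take a trained Keras model, quantize the weights, and run it through the lightweight interpreter on the target. The stand-in MobileNetV2 model and file name are assumptions for illustration; a flight deployment involves far more validation than this.

```python
# Minimal sketch: shrink a trained model with post-training weight quantization.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")  # stand-in for a "tiny ResNet"

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_bytes = converter.convert()

with open("terrain_model_quantized.tflite", "wb") as f:
    f.write(tflite_bytes)

# On the embedded target (a Snapdragon-class board, or an HPSC-style emulator),
# the quantized model runs through the small TFLite interpreter, trading a bit
# of accuracy for a much smaller memory and compute footprint.
interpreter = tf.lite.Interpreter(model_path="terrain_model_quantized.tflite")
interpreter.allocate_tensors()
```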

45:54 Yeah, especially if what you're trying to accomplish doesn't have to be completely accurate. If it can just do better, it's a win.

46:01 That's right.

46:04 Talk Python To Me is partially supported by our training courses. How does your team keep their Python skills sharp? How do you make sure new hires get started fast and learn the Pythonic way? If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up. At Talk Python Training, we have enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting and monitoring, or ditch that unused subscription for our course bundles, which include all the courses, and you pay about the same price as a subscription, once. For details, visit training.talkpython.fm/business or just email sales@talkpython.fm. Let me ask you really quickly about what else you see Python doing at JPL that you think is interesting and want to share, and then I'd like to talk about a couple of other things that you were involved in that I think were interesting.

47:00 So I think Python has real deep depth in data science at JPL. Jupyter, you know, Jupyter notebooks, is now the lingua franca of the way that we share data science with the people I've got in my division. So I lead the department now, so they even took away all my keys to tech, and I'm not supposed to do anything. That's why I do these interviews with you, and still do the open source, so I can do some tech, you know, and be involved. But my people that are doing the tech now, again, when they show me something, or they show our stakeholders something, business people, things, they jump into Jupyter. And that used to be

47:31 It's not like a PowerPoint picture of what they drew, and then they could talk about the code; it's like, here is the live thing, right?

47:38 They do. There's the live thing and they show it, you know. We have demonstrations to partners in various industries and things, we've got the space demonstrations, and they're jumping straight into that; they're showing these amazing visualizations, tabular statistics, and they've got the data story. And so one thing I tell all the people is, you've got to become visual storytellers nowadays, you've got to be able to communicate, and Jupyter is perfect for that: pandas, the whole ecosystem, matplotlib even, but also Plotly, and how you can embed, like, okay, some of these things. Anyways, all that ecosystem for communicating data science is big. And then there's such a connection between Python and the cloud community and whatever. The funny joke back in the Peter Wang and Travis and Andy Terrell and I days was, you still need us Java people, because all your Python does is be thin clients to our Java servers. Yeah, that's not totally true today. It hasn't been sort of fully supplanted, I would say there are still some cases where that's still true, but Python has come a long way in its ability to support big processing natively in Python, especially if you're doing deep learning, but even for more practical workloads and non-GPU workloads and stuff. And so there's a very, very deep, I would say, infection of Python into JPL, not just in the specialized domains and disciplines, engineering and science, but we see it in IT, that's where I live now. I used to be the deputy CTO, but now I'm division manager for AI, and I report to the CIO and manage these people. But Python has now infected into business, it's infected into even some of our programmatic directorates; it's like you said, you know, you've got the managers, the solar system exploration managers, that will jump into a Jupyter notebook to show you something, right, some of them. And so that's really told me that it's become really material, you know, and again, we're not having the "is Python good" battles anymore. And that's why I also believe Python will make it onto the rovers and into space, because at that level, if it's sort of infected into that domain, all areas of your business, it really is there. And so I'm sorry, Java community, I love you, I still do stuff with Tika every now and then, but it's been a while; I do Python now. So
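As a tiny illustration of that "visual storytelling" workflow, here is the kind of cell you might drop into a Jupyter notebook: a pandas DataFrame rendered as an interactive Plotly chart. The dataset is invented for the example.

```python
# Invented data, real workflow: DataFrame in, interactive chart out, inside a notebook.
import pandas as pd
import plotly.express as px

downlink = pd.DataFrame({
    "sol": [101, 102, 103, 104, 105],
    "images_downlinked": [180, 205, 190, 220, 210],
    "captions_generated": [0, 0, 5000, 7500, 9000],
})

fig = px.line(downlink, x="sol", y=["images_downlinked", "captions_generated"],
              title="Illustrative downlink volume per sol")
fig.show()  # renders inline in a Jupyter notebook
```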

49:40 You know, I had this really weird experience long ago, when I learned Python. I'd come from C#, which was C++ and then C#, with JavaScript sprinkled in, and I think C# is a pretty good equivalent to Java in terms of how the language looks and works. There's an eternal battle between those two, right, but compared to Python, they're coming from the same space. When I went to Python, it was really weird to me that whitespace mattered, that there wasn't a curly brace type thing. I didn't miss the semicolons, that was fine. But an if statement didn't have parentheses around the condition, and I was like, gosh, that is weird. Almost every language I'd worked with, from Scheme to C++ to C, they were all like that. To me, that's what a language meant, you know. My paradigm was, all these structural symbols must be part of a programming language, because every one I'd worked with that was a real language had them. And so I came to Python, and it took a week or so to get comfortable, and then I was like, okay, I'm kind of okay with this now. The editors are super smart, so they kind of build the structure for me. I hit colon and enter, it auto-indents, this is great. And then I went back to work on some project I was still maintaining, and I'm like, why are these symbols here? I used to think they had to be here. Now I'm typing all this junk, and it's literally not necessary. Why am I doing this? You could have the statements without the parentheses just as well. There are just so many things. And, like you talked about with Java, I don't do that much of that world anymore, and there's just this comfortableness to working in this space. It was unexpected to me, I guess.
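To make Michael's point concrete, here is a minimal sketch, with made-up values, of how a Python conditional reads: no parentheses around the condition, no curly braces, no semicolons; the colon and indentation carry the structure.

```python
# The block structure is just the colon plus indentation.
temperature_c = 21

if temperature_c > 30:
    status = "hot"
elif temperature_c > 15:
    status = "mild"
else:
    status = "cold"

print(status)  # -> "mild"
```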

51:18 Hundred percent, same thing. And the real evangelist for me, I want to give credit to him, was a guy named Sean Kelly, who was a member of the Plone Foundation board. Sean has actually been a contractor engineer for us at JPL for a long time. For your listeners, JPL also had Larry Wall, the inventor of Perl; he was in my old section when I was there at the time, but I was young and I didn't know him. But we had Sean Kelly, who was one of the founders of Plone, which, I'd say, is still developed; other CMSes have come along, but Plone was the thing, and it became a foundation for a while. And Sean was heavily involved in Python and the PSF and stuff like that. And so his thing for me was, Chris, you've got to get on this. He had been telling me for years, you've got to get on the Python drug. And it was the same thing, so I got on it, and he's like, you won't miss all the stuff you just said, Michael, like you don't miss the semicolons. So then I'm still co-developing a lot in Java and Python at the beginning, going back and forth, and I'm just like, oh my god, it is so bloated, I can't stand this anymore. This is just a waste, these stupid curly braces. And one thing Python does is it almost makes you think more executive-like, in a way. The way I think about it is, it's bullets. The reason the tabs are there is it's like bullets: indent this, and it organizes your thoughts in a more natural way, whereas the structure in Java and other more verbose languages is imposed through you having to sort of enumerate it. But this is life. Over the years, what we have in an IDE now didn't exist before. And actually, the way I learned to program, and maybe you and I still think this is a valuable skill, I still go into vi and instrument the code. I don't even use debuggers, because they really sucked back in the day when I learned; I mean, gdb was there and other things, but the debuggers weren't that great. And guess what, instrumentation in any language, printing out the values of your variables, works independent of any debugger. People look at me nowadays and they're just like, oh god, you still do that? And I'm like, yeah, and it works, and I still manage to be hyper-efficient. So some of these basic constructs, again, the tooling, the IDEs and all of that, eventually supplanted them. I think Python is a natural evolution of language. Everyone tells me Julia is the next thing; actually my PL people at JPL, my programming language wonks, who are just amazing, some of the best in the world, they're all Julia, Julia, Julia. And I'm like, it's great. It came out of the DARPA XDATA program, I was a part of that, I know all the people at MIT that made it, fantastic, it's growing. But it is not Python. Python has now become, it's not just scientific cool, it's enterprise. And Python is that. I really think it's the successor to Java in many ways. And yeah, Julia may be the next thing, but by then I don't plan to be programming, sorry.
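As a concrete, entirely hypothetical illustration of the print-based instrumentation Chris described a moment ago, here is the kind of thing that works in any language, independent of a debugger; the function and values are invented for the sketch.

```python
# A hypothetical function, instrumented the old-school way: print the inputs
# and intermediate values instead of reaching for a debugger.
def normalize(values):
    print(f"normalize: got {len(values)} values, first few = {values[:3]}")
    total = sum(values)
    print(f"normalize: total = {total}")
    if total == 0:
        print("normalize: total is zero, returning input unchanged")
        return values
    result = [v / total for v in values]
    print(f"normalize: result sums to {sum(result)}")
    return result

normalize([2, 3, 5])
```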

53:50 You know, I may be sitting on a beach somewhere. Mmm, being on a beach, that's the goal. Perfect. All right, two other things I want to talk about; we're getting short on time here. Just really quickly: a couple of years ago, I read this book about the Panama Papers, and it was rocking the world. There were so many people who had been doing shady things through offshore companies and whatnot, and there was, I guess, someone on the inside that dumped gigs and gigs of data that exposed a bunch of folks. I don't remember the details well enough to go into them, but it was a big deal, and some of your projects were involved in sort of the discovery of that, right? Tell us about that.

54:30 Yeah, so the key was Tika, and that was a really interesting time. We were right in the middle of the DARPA Memex program, which was to build the next generation of search. At that point I was mostly, exclusively, in technology development; it was before I had moved into IT. I was finishing out my career in engineering and science, leading a team of real rock stars, the best in the world, some of whom are building, and have built, Siri out at Apple and their future things right now, some of whom have been bought by Apple for huge valuations, talent buys, people that just went in and changed the world in search. Every time I work with these DARPA programs, which is why I love DARPA so much, and NASA and DARPA work well together, I look around the room and I'm like, oh god, you know, I just fanboy and geek out on all the people that are there.

55:11 Yeah, I worked on some DARPA projects as well, and had the same feeling.

55:14 Yeah, that's so awesome, and we should talk about that offline. But yeah, so I'm sitting there in Memex, and in Memex my big goal was to build out Tika, to evolve Tika for the next generation. Not full AI like today, although we're now in the process of really putting AI into Tika and its constructs, but it was like the first step into AI beyond just statistical information retrieval, which is what Tika does. So what is Tika? It's the digital Babel fish. The way I describe it is, Tika is just like the Babel fish from The Hitchhiker's Guide to the Galaxy: you put it to your ear and you can understand any language. With Tika, you give it any type of file format, any type of file that exists on the internet that we know of, 1400-plus file types and more, and Tika will extract out the text, extract out metadata from the file, and tell you information about the language of the file, which is basically everything you need to do something with the data in the file. And it incorporates all of the major third-party parsing libraries that are free and open source to get that information out, and it uses all the standard metadata models and this and that. So what we were doing during Memex is we were evolving Tika to support the non-standard types of content, beyond the easy text and the other things. When you get an image, instead of just getting metadata out, get out who's in there, do object recognition and tell me the people, places, things, dates, times, and other things that are in those images, videos, and multimedia formats. That was the goal of Memex. And so we were significantly building out Tika at the time; there was so much action going on in the open source community. Well, we had this guy, Matthew Caruana Galizia, show up, and I'm like, oh, you know, who is this guy? He had built node-tika; we had a bunch of people building Tika interfaces into other programming languages, and he did that, he contributed it, and we started talking to him. And he starts asking us these questions and telling us he's part of the ICIJ, the International Consortium of Investigative Journalists. And we're like, cool, we start looking it up, and then boom, the Panama Papers drops, and we're like, holy sh-, they used Tika. That's why that guy was asking us all these questions, you know. And so what they did is, yeah, they got this data dump that was leaked from a company called Mossack Fonseca, which basically said, yeah, the heads of state of various countries and famous actors, your favorite ones like mine, Hermione, Emma Watson, they all had money, you know, in shady ways, in these offshore accounts, and they had their wealth there, and of course that has all sorts of ethical and other implications. So that was in these data files, which were leaked off of a content management site, 11 terabytes of data. And the way that you deal with that is you do a big ETL, extract, transform, and load, process and do data forensics using tools like Tika.
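For listeners who want to try this from Python, Chris's tika-python package wraps Apache Tika; here is a minimal sketch, assuming Java is installed, with a hypothetical file path (the library spins up a local Tika server on first use).

```python
# pip install tika   (needs Java; a local Tika server is started on first use)
from tika import parser, language

# Hypothetical path, standing in for one document out of a large dump.
path = "documents/example.pdf"

parsed = parser.from_file(path)
print(parsed["metadata"])                 # file type, authors, dates, ...
print((parsed["content"] or "")[:500])    # first chunk of the extracted text
print(language.from_file(path))           # detected language code, e.g. "en"
```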

57:42 And literally, it's like, it told me the important stuff about these gigs of files, and then we'll go through that.

57:47 Bingo: what are the people, places, things, connections, and other stuff in there. And if you look at the Wikipedia page, it was one of the fastest Wikipedia pages I've ever seen made. When that story came out, it was like the hugest Wikipedia page, and about midway down through it, I got a link to my page on Wikipedia from this. They say probably the key technology, you know, in doing this was Tika. And by the way, those journalists and others won the Pulitzer Prize in 2017. So I tell people, hey, I contributed to the Pulitzer Prize. And so yeah, that was a big, big deal. Some people only know me for that, you know; I had a very small hand in it. But then again, I helped invent Hadoop, and we found out many years later that that's what the NSA used to build Accumulo. I mean, I don't really worry about all this stuff. People ask me, are you ethically concerned? My goal is to change the world and build the software that everyone uses. You can't control for that; what you can control for is that the world got better somehow, in some way. You can't control the bad actors and how they use it. But I think the sum total of everything, at the end, I'm proud of.

58:48 Yeah, well, this is definitely a check, you know, a plus one for the good guys on the Panama Papers. That's really cool. In the time we have left, maybe tell people about your book, your TensorFlow book.

58:57 Yeah. So right now I'm in this era where I'm deep into AI. About a year and a half ago, and I tell myself every year and a half to two years, not that I get bored, but my wife lets me, you know, do some fun project, and she helps, she's amazing, with our three children, 11, five, and three, boy, boy, girl, right in the thick of things and energetic ages there. Yes, energetic, and suffering from this pandemic like everyone else, but we're getting through it. But yeah, about a year and a half ago, I was reading, and I'm a Manning author, so I wrote Tika in Action 10 years ago, and so I get access to the books and they tell me about the books. And there was this book on machine learning, it was TensorFlow, it was Machine Learning with TensorFlow. And I said, you know what, I want to learn what my people are talking about. There were heavy debates between TensorFlow and PyTorch and whatever, and there hadn't been a Manning PyTorch book, but there was a TensorFlow one. So I said, cool, let me read it. And when I'm reading it, I'm like, hey, I've got to go deep. In each one of these chapters there's a suggestion at the end; on the CNN chapter there's like a bullet, like, oh, and you could build a facial recognition system, but we're not going to do it. You know, the author of the first edition, a guy named Nishant Shukla, UCLA PhD in computer vision, he's at a startup now, he's doing great things, and I communicated and corresponded with him during the reading of the book and also the development of this one. He kind of throws it out there: oh yeah, you could build a facial recognition system, so go explore that, go look at VGG Face. So I go explore it, and I'm like, the data doesn't exist anymore, you can't find this model anywhere, and I have to rebuild all this stuff from scratch and get all the celebrity images. VGG Face is basically a celebrity facial recognition model. Actually, Google just came out with a product that's basically the commercial version of it, but it's this sort of seminal model from 2015 that does facial recognition using CNNs. And so I start to go build it, and like seven weeks later, staying up at night, multiple hours, I'm like, god, this is like a graduate-level program assignment, if you want to work hard; it's a lot of work. So my wife sees me doing this over like a nine-month timespan, and I'm like, oh my god, AI is real, I know what Elon's talking about, you know, this and that. At the end of nine months I had basically a ton of Jupyter notebooks, a ton of datasets, a ton of end-to-end examples. Everywhere that he threw out a bullet suggestion at the end of a chapter, I implemented it, basically a new chapter and a half for each one. And so that's Machine Learning with TensorFlow, Second Edition. That's my book. It uses TensorFlow, but I will say it's TensorFlow and friends. Some people have told me it's not fully updated to the latest TensorFlow 2.x or whatever. So what I did is my typical Chris thing: I got the head of AI at Google, Scott Penberthy, a buddy of mine, to write the foreword. And in the foreword he talks about how, look, in the time it took Chris to read this book, we released 20 versions of TensorFlow 1.x.
And by then 2.x had started as well. You can't chase the version; all of the material knowledge of doing data cleaning and preparation, building these models, evaluating them, none of that changes if the train step changes, or if you use declarative programming versus imperative, or if you don't use placeholders anymore. Those are implementation details that can be swapped out in a few lines of code. But what I will say is, it's now 450, almost 500 pages. It's going to be released in a month; I just finished all of the chapters, Michael, it just went through the three-thirds review, and it's about to go to production. Awesome. And just by the way, for people listening, due to the time travel of podcasting, this will be just about the time it should be out, or just released a week or two ago. It's going to be great, please check it out. It really adds to and supplants the first edition of the book, and thanks to Nishant for doing that one, and so forth. It really is a book on machine learning, and deep learning, and things like that, and how to do it. It could be a textbook, but it's written in my, you know, dad type of style, jokey and funny. It's got a lot of my personality in there. I hope you guys like it. But that's the book. Yep.
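To ground the 1.x-versus-2.x point, here is a minimal, hypothetical sketch of the modern TensorFlow 2.x workflow: no sessions or placeholders, just build, compile, and fit on a made-up, already-cleaned dataset.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a cleaned, labeled tabular dataset.
X = np.random.rand(200, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("int32")

# TF 2.x style: define the model imperatively, then compile and fit.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.evaluate(X, y, verbose=0))  # [loss, accuracy] on the toy data
```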

01:02:44 Awesome. Yeah, well, people should definitely check it out, especially if they're interested in TensorFlow. It sounds like a great one. All right, well, I think we're just about out of time, even though I know we've only scratched the surface. So let me ask you the final two questions before I let you out of here. If you're going to write some Python code, what editor do you use?

01:03:00 I'm old school, so I'll jump into Emacs, or I'll jump into Jupyter. I usually go to Jupyter to start, because I'm doing a lot of exploration, and then if I'm going to write a script, I'm jumping into Emacs. Okay, cool, cool. And then a notable PyPI package? Not necessarily the most popular thing, but something you found where you're like, oh, this is really cool, people should know about X. Oh, that's a good one. A couple are near and dear to my heart right now; let me complain about something first, so I'll do the negative and then the positive. God, why does the Yahoo stock quote thing have to change like every few years? I was using ystockquote after the old one broke, then all of a sudden that broke, and now I've got to use, you know, yfinance, and it changes ever so slightly each time. So that bugs me, but thanks to the guy who wrote yfinance and keeps updating it, because I think Yahoo's upstream stuff keeps breaking. I mess around in finance and quant stuff like that, just for fun, so yeah, that one kind of bugs me. One that I really like is tqdm. Yeah, I just love tqdm. It's basically progress bars, instrumentation over iterators: nice progress bars, amazing progress bars, beautiful Jupyter progress bars, beautiful command-line progress bars, all the stuff where you'd make lame progress bars if you just tried to do it with print statements yourself. Use tqdm.
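A minimal sketch of the tqdm pattern Chris describes, wrapping an iterable to get a live progress bar; the sleep call just stands in for real per-item work.

```python
from time import sleep
from tqdm import tqdm

# Wrap any iterable; tqdm renders a live progress bar in the terminal
# or in a Jupyter notebook instead of hand-rolled print statements.
for item in tqdm(range(100), desc="Processing files"):
    sleep(0.01)  # stand-in for the real per-item work
```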

01:04:18 Yeah, all right, very, very cool. Good recommendation. And final call to action for people. I guess I'll give you two angles here. One, people want to get started and do more TensorFlow, what do they do? Or maybe they're interested in space and JPL and they want to get closer to you guys in some way; speak to that as well.

01:04:34 If you want to get started with TensorFlow, spend 90% of your time not using TensorFlow or any ML toolkit, and create a clean dataset in pandas. Make sure you have clean labels, make sure it's a tabular structure. That's the biggest mistake, or not even a mistake but challenge, that I see people hitting: they're using imbalanced classes, it's ugly data that hasn't been cleaned. Machine learning today still requires clean data. And if you do that, you can bring it into TensorFlow, PyTorch, anything. Just clean your frickin' data, spend the time doing it, and get a good dataset; a quick sketch of that kind of cleanup is below. And then, how do you get started at JPL and NASA? Look at our open source and other code; we have a big open source library on GitHub. Look at some of our projects, read our press releases, and then connect with me, connect with a lot of our people, we're on LinkedIn, connect with us. The best way to do it is to come not just saying you have XYZ skills, but having actually read about a project, like you've done the research, Michael, and you know about some of this stuff, and you have specific areas where you want to contribute. JPL has an amazing internship program, 800 people a year, and we took 450 even during the pandemic, virtually, and it's still going. It's a great way to come in from school and join like I did. And then we also hire full time, and JPL Jobs is the best place to take a look for that. Fantastic. Oh, that's awesome.
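As a concrete illustration of Chris's advice, here is a minimal pandas cleanup pass before any model touches the data; the file, column names, and labels are hypothetical.

```python
import pandas as pd

# Hypothetical raw file and column names, just to illustrate the cleanup steps.
df = pd.read_csv("raw_observations.csv")

# Drop rows with missing features or labels so the model only sees complete examples.
df = df.dropna(subset=["feature_a", "feature_b", "label"])

# Normalize messy label strings into a consistent set of classes.
df["label"] = df["label"].str.strip().str.lower()

# Check for class imbalance before any training.
print(df["label"].value_counts(normalize=True))

# Keep a tidy tabular structure: features plus one clean label column.
clean = df[["feature_a", "feature_b", "label"]]
clean.to_csv("clean_dataset.csv", index=False)
```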

01:05:46 Well, thank you so much for being on the show, Chris. It's been great to talk to you.

01:05:49 Thanks for having me, Michael. I really appreciate it.

01:05:51 Yeah, you bet. Bye bye.

01:05:55 This has been another episode of Talk Python to Me. Our guest in this episode was Chris Mattmann, and it's been brought to you by Linode and monday.com. Start your next Python project on Linode's state-of-the-art cloud service. Just visit talk python.fm slash linode, L-I-N-O-D-E, and you'll automatically get a $20 credit when you create a new account. Build your idea for an app and get it in front of hundreds of thousands of users on day one: start building today at the monday.com marketplace by visiting monday.com slash Python. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle; it's like a subscription that never expires. Be sure to subscribe to the show: open your favorite podcatcher and search for Python, and we should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play, and the direct RSS feed at slash RSS on talk python.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Get out there and write some Python code.
