WEBVTT

00:00:00.001 --> 00:00:03.740
Machine learning and data science are full of best practices and important workflows.

00:00:03.740 --> 00:00:06.300
Can we extrapolate these to our broader lives?

00:00:06.300 --> 00:00:11.600
Eugene Yan and I give it a shot on this slightly more philosophical episode of Talk Python To Me.

00:00:11.600 --> 00:00:16.560
This is episode 309, recorded March 19th, 2021.

00:00:16.560 --> 00:00:36.200
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:00:36.200 --> 00:00:38.000
This is your host, Michael Kennedy.

00:00:38.000 --> 00:00:40.200
Follow me on Twitter where I'm @mkennedy.

00:00:40.200 --> 00:00:43.920
And keep up with the show and listen to past episodes at talkpython.fm.

00:00:44.040 --> 00:00:47.080
And follow the show on Twitter via @talkpython.

00:00:47.080 --> 00:00:50.180
This episode is brought to you by Retool and Linode.

00:00:50.180 --> 00:00:52.240
Please check out what they're offering during their segments.

00:00:52.240 --> 00:00:53.500
It really helps support the show.

00:00:53.500 --> 00:00:58.280
We'll be giving away five tickets to attend PyCon US 2021.

00:00:58.280 --> 00:01:02.300
This conference is one of the primary sources of funding for the PSF.

00:01:02.300 --> 00:01:05.660
And it's going to be held May 14th to 15th online.

00:01:05.660 --> 00:01:09.400
And because it's online this year, it's open to anyone around the world.

00:01:09.400 --> 00:01:15.600
So we decided to run a contest to help people, especially those who have never been part of PyCon before, attend it this year.

00:01:15.600 --> 00:01:23.640
Just visit talkpython.fm/pycon2021 and enter your email address, and you'll be in the running for an individual PyCon ticket.

00:01:23.640 --> 00:01:25.380
Compliments of Talk Python.

00:01:25.380 --> 00:01:28.300
These normally sell for about $100 each.

00:01:28.640 --> 00:01:35.820
And if you're certain you want to go, I encourage you to visit the PyCon website, get a ticket, and that money will go to support the PSF and the Python community.

00:01:35.820 --> 00:01:37.720
Congratulations to Ron Lee.

00:01:37.720 --> 00:01:40.260
He won number three of the five tickets that were given away.

00:01:40.260 --> 00:01:41.620
And there's still more chances to win.

00:01:41.620 --> 00:01:45.940
If you want to be in this drawing, just visit talkpython.fm/pycon2021.

00:01:45.940 --> 00:01:47.800
Enter your email address.

00:01:47.940 --> 00:01:49.920
You'll be in the running to win a ticket.

00:01:49.920 --> 00:01:52.040
Now let's get on to that interview.

00:01:52.040 --> 00:01:54.340
Eugene, welcome to Talk Python To Me.

00:01:54.340 --> 00:01:54.960
Thank you.

00:01:54.960 --> 00:01:55.380
Yeah.

00:01:55.380 --> 00:01:56.480
It's great to have you here.

00:01:56.480 --> 00:02:02.860
I love getting down into the details of programming and writing code and working with APIs and building amazing things.

00:02:02.860 --> 00:02:08.140
But it's also really interesting to sort of step back and take a big picture view of the world of the software.

00:02:08.140 --> 00:02:14.000
And you wrote a really interesting article about applying some of the lessons you might learn from code back to your life.

00:02:14.000 --> 00:02:15.500
And I really enjoyed the idea of it.

00:02:15.500 --> 00:02:17.360
So I'm looking forward to talking to you about it on this show.

00:02:17.760 --> 00:02:17.940
Thank you.

00:02:17.940 --> 00:02:19.260
Happy to chat more about it as well.

00:02:19.260 --> 00:02:19.680
Yeah.

00:02:19.680 --> 00:02:19.880
Yeah.

00:02:19.880 --> 00:02:20.840
It's going to be super fun.

00:02:20.840 --> 00:02:24.800
Now, before we get into the details of all that, you know, let's start with your story.

00:02:24.800 --> 00:02:26.320
How did you get into programming and Python?

00:02:26.320 --> 00:02:27.680
My degree is in psychology.

00:02:27.680 --> 00:02:36.820
So since then, I've been really interested in understanding, you know, how people behave, why they think the way they think, and how information changes their perceptions and behavior.

00:02:36.820 --> 00:02:37.840
You know, to do this,

00:02:37.840 --> 00:02:41.700
I used to run experiments and analyze the data in SPSS and Excel.

00:02:41.700 --> 00:02:44.340
But eventually, the data just got bigger and bigger.

00:02:44.340 --> 00:02:45.380
So I moved on to R.

00:02:45.380 --> 00:02:46.640
And now I'm using Python.

00:02:46.900 --> 00:02:48.280
So that's how it happened.

00:02:48.280 --> 00:02:54.020
I mainly use Python because I have problems to solve that require me to process data with Python.

00:02:54.020 --> 00:02:54.940
Yeah, really cool.

00:02:54.940 --> 00:03:03.040
One of my very first programming jobs was at this incredibly cool place where it started out as a research lab and then it spun out of the university to a startup.

00:03:03.040 --> 00:03:12.900
And the whole premise of what they did was to use eye tracking, not some phone thing, but like where you're looking to understand how people solve problems and how they think and so on.

00:03:13.000 --> 00:03:17.300
And it was mostly a bunch of PhD cognitive science folks.

00:03:17.300 --> 00:03:19.960
And they would work in MATLAB and Excel and all that stuff.

00:03:19.960 --> 00:03:25.260
And I would help write software that would take that stuff and turn it into products and turn it into automation and whatnot.

00:03:25.260 --> 00:03:27.000
And it's a really interesting world.

00:03:27.000 --> 00:03:33.320
And there's a lot more opportunities for code and solving problems with code, especially on the data analysis side in psychology.

00:03:33.320 --> 00:03:34.220
Fully agree.

00:03:34.480 --> 00:03:36.260
Than you might first think, right?

00:03:36.260 --> 00:03:38.300
You think, oh, psychology, that's talking to people on the couch.

00:03:38.300 --> 00:03:40.040
Like sometimes, but not always.

00:03:40.040 --> 00:03:41.500
Not a lot of the time.

00:03:41.500 --> 00:03:42.040
You're right.

00:03:42.040 --> 00:03:43.700
A lot of it is running experiments.

00:03:43.700 --> 00:03:47.300
And like, I'm sure they collected data about eye tracking and you have to process the data.

00:03:47.300 --> 00:03:49.480
And that happens in real life as well.

00:03:49.480 --> 00:03:53.000
We run experiments, A/B testing, and we have to analyze the data.

00:03:53.360 --> 00:03:54.300
Yeah, that was it.

00:03:54.300 --> 00:03:55.840
We would run tons of experiments.

00:03:55.840 --> 00:03:58.840
We would have maybe a week where 50 people came in.

00:03:58.840 --> 00:04:03.860
We'd have like one-way mirrors and all sorts of recording equipment and like analyze that.

00:04:03.860 --> 00:04:06.260
They'd even put ads, I think, in Craigslist.

00:04:06.260 --> 00:04:07.180
All sorts of places.

00:04:07.180 --> 00:04:10.500
Like, hey, we need somebody to come like surf on these websites for half an hour.

00:04:10.500 --> 00:04:11.560
We'll pay you 50 bucks.

00:04:11.560 --> 00:04:12.620
Can you come do that during lunch?

00:04:12.620 --> 00:04:13.680
Be like, yeah, I'll do that.

00:04:13.680 --> 00:04:14.600
That's really cool.

00:04:14.600 --> 00:04:15.720
Yeah, it was really fun.

00:04:15.720 --> 00:04:17.620
But it was really neat to do the programming and stuff there.

00:04:17.620 --> 00:04:22.200
So tell me a little bit more about that transition, because it must have been a little bit challenging, right?

00:04:22.240 --> 00:04:28.000
There's not a lot in your traditional education of psychology that teaches you programming.

00:04:28.000 --> 00:04:29.620
Maybe a little bit, but not a ton, right?

00:04:29.620 --> 00:04:34.680
No, actually, in my traditional education, for psychology, we use SPSS and R.

00:04:34.680 --> 00:04:37.180
SPSS is this proprietary IBM product.

00:04:37.180 --> 00:04:38.460
So nothing about it.

00:04:38.460 --> 00:04:43.440
I did know a bit about the statistics and how to work with data, but that's about it.

00:04:43.440 --> 00:04:50.960
So I think back then, when I first started getting my first job, I had to learn a lot of Python and SQL on my own.

00:04:51.120 --> 00:04:53.700
And, you know, that was the time when Coursera was available.

00:04:53.700 --> 00:04:55.380
And Coursera is a lifesaver.

00:04:55.380 --> 00:04:58.900
If you ask me, I learned all my Python and data science stuff from Coursera.

00:04:58.900 --> 00:05:06.440
So it was after work, spending one or two hours doing the courses and doing the lessons, the hands-on exercises were amazing.

00:05:06.440 --> 00:05:15.720
And nowadays, you have so many Python resources available to help you quickly pick things up, get something working, iterate on it, fix the bugs, and, you know, have something that you can play with.

00:05:15.760 --> 00:05:17.900
And it just makes it so fun to learn Python now.

00:05:17.900 --> 00:05:18.320
Yeah.

00:05:18.600 --> 00:05:20.920
There's so much to know in programming these days.

00:05:20.920 --> 00:05:23.940
But at the same time, there's so many resources out there to help you.

00:05:23.940 --> 00:05:25.980
It's really a huge benefit.

00:05:25.980 --> 00:05:31.660
I remember when MOOCs came on, as they called them, like these courses with many, many, many people in them.

00:05:31.660 --> 00:05:33.500
And it was such a revolution at the time.

00:05:33.500 --> 00:05:36.040
And now it's just one of the million options, you know?

00:05:36.040 --> 00:05:37.020
Yeah, I agree.

00:05:37.020 --> 00:05:39.680
I think MOOCs are really the great equalizer.

00:05:39.680 --> 00:05:41.120
I mean, education is the great equalizer.

00:05:41.120 --> 00:05:47.840
And MOOCs, by making it freely available across the internet with great educators, can just teach anyone and are not confined to a classroom.

00:05:47.840 --> 00:05:48.780
I think that was amazing.

00:05:48.780 --> 00:05:54.720
And that was what helped me transition from SPSS and R to Python and Spark and machine learning.

00:05:54.720 --> 00:05:55.160
Yeah.

00:05:55.160 --> 00:05:56.440
It opened that door for me.

00:05:56.440 --> 00:05:57.020
Yeah, yeah.

00:05:57.020 --> 00:06:00.400
Instead of studying neurons, you're studying neural networks now.

00:06:00.400 --> 00:06:00.820
Yeah.

00:06:00.820 --> 00:06:01.460
Nice.

00:06:01.480 --> 00:06:02.200
So how about now?

00:06:02.200 --> 00:06:03.140
What do you do day to day?

00:06:03.140 --> 00:06:05.060
You just made a big life change.

00:06:05.060 --> 00:06:08.800
I was reading that you recently moved from Singapore to Seattle.

00:06:08.800 --> 00:06:11.840
That's a big change, even though not a whole lot in between those two things.

00:06:11.840 --> 00:06:13.140
If you draw a line, just a lot of water.

00:06:13.140 --> 00:06:15.840
But yeah, it's still a big change in terms of jobs and whatnot.

00:06:15.840 --> 00:06:16.740
Tell us about that.

00:06:16.740 --> 00:06:23.980
So I think towards the end of 2019, or maybe in the middle of 2019, my wife and I were thinking of, you know, stepping out of our comfort zone.

00:06:23.980 --> 00:06:28.140
I don't know how many of your listeners have been to Singapore, but it's a beautiful place.

00:06:28.140 --> 00:06:31.240
It's a very comfortable, tiny island, amazing weather.

00:06:31.620 --> 00:06:33.120
But we thought, you know, we like to travel.

00:06:33.120 --> 00:06:35.600
So we thought, hey, you know, let's try to live somewhere else for a bit.

00:06:35.600 --> 00:06:41.860
So we looked at a couple of places, a couple of tech hubs, Seattle, San Francisco, Berlin, Shanghai.

00:06:42.400 --> 00:06:46.340
And it so happens that I got an offer from Amazon.

00:06:46.340 --> 00:06:46.860
Nice.

00:06:46.860 --> 00:06:47.120
Yeah.

00:06:47.360 --> 00:06:48.220
So that's where I am now.

00:06:48.220 --> 00:06:49.680
I'm an applied scientist at Amazon.

00:06:49.680 --> 00:06:53.360
And so what I do, I'm part of the Kindle team.

00:06:53.360 --> 00:06:57.500
So what I do in my day-to-day job is I try to help people read more.

00:06:57.500 --> 00:07:00.840
And we try to do this by helping them find books that they need to find.

00:07:00.840 --> 00:07:08.320
So what I do is I work on recommendation engines, try to help people, you know, as you're finding a book or based on what you have read to the extent that you have read it.

00:07:08.520 --> 00:07:10.100
We know this because we have Kindle data.

00:07:10.100 --> 00:07:11.540
We know how much of a book you have completed.

00:07:11.540 --> 00:07:14.680
We can recommend you new books that you might be interested in.

00:07:14.680 --> 00:07:20.060
Or based on what you browse on the website, we can update our recommendations in real time and recommend you new books as well.

00:07:20.060 --> 00:07:22.340
So that's what I do in my day-to-day job.

00:07:22.340 --> 00:07:22.740
Yeah.

00:07:22.740 --> 00:07:23.820
Well, that sounds really fun.

00:07:23.820 --> 00:07:28.420
I actually really appreciate that from Amazon, specifically around the Kindle, to be honest.

00:07:28.420 --> 00:07:31.020
Like a lot of stores are like, oh, you might also like that.

00:07:31.020 --> 00:07:32.240
I'm like, no, I don't also like that.

00:07:32.240 --> 00:07:33.260
I really don't care.

00:07:33.260 --> 00:07:37.500
You know, there's just so many things where I'm shopping online or whatever, and it just doesn't make sense.

00:07:37.500 --> 00:07:41.460
But specifically for Kindle, I'll be reading a book and it'll say you should also check out these other ones.

00:07:41.460 --> 00:07:44.420
Like a lot of times, like my next book is out of that list.

00:07:44.420 --> 00:07:45.160
I really love it.

00:07:45.160 --> 00:07:47.440
So can't go browse a bookstore these days.

00:07:47.440 --> 00:07:49.820
I mean, we have a fantastic bookstore called Powell's.

00:07:49.820 --> 00:07:54.300
It's like one of the largest bookstores around here in Portland, but you can't even go to it anymore.

00:07:54.300 --> 00:07:55.380
So, yeah, it's pretty cool.

00:07:55.380 --> 00:07:56.440
Thanks for that good list there.

00:07:56.440 --> 00:07:57.300
Yeah, you and your folks.

00:07:57.300 --> 00:07:58.420
I'm happy to hear that.

00:07:58.420 --> 00:08:01.760
And this is a great place where what I'm doing is very aligned with what I do,

00:08:01.860 --> 00:08:07.280
which is trying to help people, very aligned with my values, which is trying to help people learn more by reading more books.

00:08:07.280 --> 00:08:08.100
So that's great.

00:08:08.100 --> 00:08:08.980
Yeah, absolutely.

00:08:08.980 --> 00:08:10.980
And you talk a lot about writing.

00:08:10.980 --> 00:08:12.940
And well, I guess not just talk.

00:08:12.940 --> 00:08:16.780
You also put your time and energy where your words are, right?

00:08:16.780 --> 00:08:18.020
You do a lot of writing.

00:08:18.020 --> 00:08:23.680
And the reason that I wanted to talk to you about having you on the show, like I said, is one of these articles that you wrote.

00:08:23.680 --> 00:08:26.000
But there's a whole bunch of them that you have.

00:08:26.000 --> 00:08:33.860
So there's a couple of people who are really successful in the tech field that I can think of who really make writing an important part of what they're doing.

00:08:33.860 --> 00:08:38.140
Like I had Jesse Jiryu Davis on the podcast at the time.

00:08:38.140 --> 00:08:38.740
He's from MongoDB.

00:08:38.740 --> 00:08:39.620
I think he still is.

00:08:39.620 --> 00:08:43.720
Anyway, he talks about like all these design patterns of technical writing, right?

00:08:43.720 --> 00:08:47.340
Like here's the how to, here's the first and like all these really interesting things.

00:08:47.340 --> 00:08:56.880
And I kind of get that vibe from you as well, that like you've got these really strong ideas about writing and how it reinforces your programming side of the world.

00:08:56.880 --> 00:08:58.240
What's your thoughts on that?

00:08:58.560 --> 00:09:02.940
Well, I guess maybe just to share a bit of backstory on why I started writing so much.

00:09:02.940 --> 00:09:08.100
I think a couple of years ago, I was interviewing a couple of mentors, informal mentors, right?

00:09:08.100 --> 00:09:13.100
They didn't know I was getting them to be a mentor, but I would just reach out to people who are two to three steps ahead of me.

00:09:13.100 --> 00:09:15.880
Heads of data science, CTOs, lead data scientists.

00:09:15.880 --> 00:09:17.740
I would ask them the same question.

00:09:17.740 --> 00:09:20.240
What makes an effective data scientist?

00:09:20.660 --> 00:09:29.300
And I would say, you know, is it understanding the business domain deeply or is it PhD research level skills, ninja hacking skills, Python or C++ or Java?

00:09:29.300 --> 00:09:34.420
And a lot of them said, actually, you know, all that is really important, but there's one thing that you're missing out.

00:09:34.420 --> 00:09:37.020
And this one thing is transferable across your entire career.

00:09:37.020 --> 00:09:39.260
And that one thing, they said, is communication.

00:09:39.260 --> 00:09:41.360
At that point in time, I didn't believe it.

00:09:41.360 --> 00:09:41.920
I was immature.

00:09:41.920 --> 00:09:45.840
I didn't think it made sense, but so many of these people mentioned it.

00:09:45.840 --> 00:09:50.640
I owed it to them to try it and update it on my program.

00:09:50.640 --> 00:09:57.500
So for one year, I said, I'm just going to volunteer for whatever writing opportunity, whatever speaking opportunity the company has.

00:09:57.500 --> 00:09:59.860
So I started writing about my side projects.

00:09:59.860 --> 00:10:05.280
I started writing an internal company newsletter to share about what our team, our data science team was doing.

00:10:05.280 --> 00:10:07.420
And I started speaking at conferences and meetups.

00:10:07.420 --> 00:10:09.140
And so that's why I started writing.

00:10:09.140 --> 00:10:16.400
And since then, I found that writing actually helps me learn a lot because I would write about something and I realized, hey, you know, I don't know anything about this thing I'm writing about.

00:10:16.400 --> 00:10:17.300
Yeah, exactly.

00:10:17.300 --> 00:10:23.520
You think, you know, you've learned enough to sort of have some thoughts or, you know, in the programming space, you can maybe get something to work.

00:10:23.520 --> 00:10:24.000
Right.

00:10:24.000 --> 00:10:29.340
But if you've got to explain it, all of a sudden, it's not enough to know, well, there's two ways and I'll just use that way.

00:10:29.340 --> 00:10:30.920
You've got to know, well, here's the two ways.

00:10:30.920 --> 00:10:32.040
What are the trade-offs?

00:10:32.040 --> 00:10:33.420
When should I use which?

00:10:33.420 --> 00:10:38.480
Like you've got to dive into these sorts of details when you're either writing or speaking or presenting about it.

00:10:38.540 --> 00:10:43.200
And it just forces you another level down in your understanding and depth, right?

00:10:43.200 --> 00:10:43.820
Exactly.

00:10:43.820 --> 00:10:45.080
Writing is really difficult.

00:10:45.080 --> 00:10:48.620
And truth be told, I think a lot of people say, you know, you must really love writing.

00:10:48.620 --> 00:10:51.880
No, actually, I don't really love writing, which is a bit weird.

00:10:51.880 --> 00:10:56.040
I love learning and I love sharing about what I've learned online.

00:10:56.040 --> 00:10:58.040
And writing is my vehicle to let me do that.

00:10:58.040 --> 00:10:59.780
So that's why I started writing.

00:10:59.920 --> 00:11:03.200
So why do I talk a lot about writing at work or writing in general?

00:11:03.200 --> 00:11:05.320
There's this question I ask myself a lot.

00:11:05.320 --> 00:11:08.300
I used to ask myself this question and now a lot of people ask me this question.

00:11:08.300 --> 00:11:13.320
Hey, you know, as a data scientist, is your job to write code or is your job to write documents?

00:11:13.320 --> 00:11:17.460
And I used to think my job was to write code, to build systems, to help customers.

00:11:17.460 --> 00:11:21.960
But then as I'm doing this more, I find that, hey, you know, a lot of times before writing code,

00:11:21.960 --> 00:11:25.440
I need to spend a lot of time thinking, researching and designing.

00:11:25.440 --> 00:11:28.980
And the medium of that work is writing documents.

00:11:29.520 --> 00:11:32.240
That's why I encourage a lot of people, you know, it's just like what you said.

00:11:32.240 --> 00:11:37.700
Do I serve my recommendations via Redis cache or via Lambda or API Gateway or whatever?

00:11:37.700 --> 00:11:40.760
And I'm thinking through all these designs and I need to make trade-offs.

00:11:40.760 --> 00:11:45.680
And writing in a document the pros and cons and the rationale for the decision forces me to do that.

00:11:45.680 --> 00:11:50.700
So that when we start implementing things, we don't do things the wrong way,

00:11:50.700 --> 00:11:51.820
which is really expensive, right?

00:11:51.820 --> 00:11:52.860
Implementation is expensive.

00:11:52.860 --> 00:11:56.300
So that's why, and I find that it has really helped me a lot.

00:11:56.300 --> 00:11:59.000
I don't feel that enough people talk about it.

00:11:59.120 --> 00:12:01.060
And I want to encourage people to write a lot.

00:12:01.060 --> 00:12:02.580
Well, yeah, I agree.

00:12:02.580 --> 00:12:09.180
And it seems to me like one of the fundamental jobs of a data scientist, at least one branch of data science,

00:12:09.180 --> 00:12:14.720
is to take the raw information, think about it, and then communicate what it means, right?

00:12:14.720 --> 00:12:18.140
And that seems to go really hand in hand with what you're saying.

00:12:18.420 --> 00:12:18.780
Exactly.

00:12:18.780 --> 00:12:22.720
I think I was doing this one year of writing and speaking practice, right?

00:12:22.720 --> 00:12:23.120
At work.

00:12:23.120 --> 00:12:31.540
And then in the past, I had people come up to me and say that, you know, Eugene, every time I have a meeting with the data scientist on your team for half an hour,

00:12:31.540 --> 00:12:33.340
and then I walk out not knowing anything.

00:12:33.340 --> 00:12:34.640
I don't know what they spoke about.

00:12:34.880 --> 00:12:41.200
And that's because I think, by and large, most people, you know, we tend to use jargon like AUC, ROC, distributed, all that.

00:12:41.200 --> 00:12:43.360
And we assume that business people will know it.

00:12:43.360 --> 00:12:45.240
But I made this mistake as well.

00:12:45.240 --> 00:12:50.240
And then one day, I had a great boss who pulled me aside and said, you know, Eugene, the way you're communicating, no one understands you.

00:12:50.240 --> 00:12:51.660
And I asked, what do you mean?

00:12:51.740 --> 00:12:53.560
And he gave me very good feedback.

00:12:53.560 --> 00:12:54.520
He was a great boss.

00:12:54.520 --> 00:12:57.740
And I started changing how I communicated things.

00:12:57.740 --> 00:12:59.940
And that made me a lot more effective at work.

00:12:59.940 --> 00:13:02.220
So that's how I got started on that.

00:13:02.220 --> 00:13:03.000
Oh, fantastic.

00:13:03.000 --> 00:13:05.800
Well, I really like what you've done with the writing and stuff.

00:13:05.800 --> 00:13:12.380
And so maybe let's spend some time diving into the one that I think is probably the centerpiece of what we'll talk about.

00:13:12.380 --> 00:13:21.720
We'll touch on a few other ones because you do a really good job in your writing of not just putting down your thoughts, but bringing other people's ideas and influences in there.

00:13:21.720 --> 00:13:22.780
So you have a lot of quotes.

00:13:22.780 --> 00:13:26.260
You have a lot of references to other things and so on.

00:13:26.260 --> 00:13:28.480
So I really like the way, the style.

00:13:28.480 --> 00:13:33.000
So let's talk about your article, What Machine Learning Can Teach Us About Life:

00:13:33.000 --> 00:13:33.820
Seven Lessons.

00:13:33.820 --> 00:13:34.960
Why did you write this?

00:13:34.960 --> 00:13:39.300
Because when I was writing some of my previous articles, okay, this is an odd thing.

00:13:39.300 --> 00:13:41.640
I know that my audience are machine learning practitioners.

00:13:42.140 --> 00:13:47.680
And, you know, sometimes I want to write about things that I know will not interest them.

00:13:47.680 --> 00:13:49.620
So this is one of those.

00:13:49.620 --> 00:13:51.300
I like to write about life lessons, right?

00:13:51.300 --> 00:13:54.220
And I know it will not interest them, but I think it's really important.

00:13:54.220 --> 00:13:55.380
I want to write about that anyway.

00:13:55.380 --> 00:14:00.700
So, you know, in order to sneak it in, in this case, machine learning is really just a Trojan horse.

00:14:00.700 --> 00:14:01.200
Yeah.

00:14:01.200 --> 00:14:05.860
Machine learning is a Trojan horse where I sneak in these life lessons that I found really helpful for me.

00:14:05.860 --> 00:14:09.080
And I just, upon some reflection, I just want to write about it.

00:14:09.080 --> 00:14:09.800
And that's it.

00:14:09.800 --> 00:14:11.100
That's how this article came about.

00:14:11.900 --> 00:14:13.380
I love these Trojan horse ideas.

00:14:13.380 --> 00:14:14.520
Like, oh, I'm going to teach you.

00:14:14.520 --> 00:14:15.660
I'll do something fun.

00:14:15.660 --> 00:14:16.840
It'll actually have a lesson, right?

00:14:16.840 --> 00:14:17.140
Yeah.

00:14:17.140 --> 00:14:18.060
And now the secret's out.

00:14:18.060 --> 00:14:18.360
Yeah.

00:14:18.360 --> 00:14:19.940
I haven't actually shared that with anyone yet.

00:14:19.940 --> 00:14:21.260
Now they're going to know.

00:14:21.260 --> 00:14:21.800
That's right.

00:14:24.180 --> 00:14:26.680
This portion of Talk Python To Me is brought to you by Retool.

00:14:26.680 --> 00:14:30.820
Do you really need a full dev team to build that simple internal app at your company?

00:14:30.820 --> 00:14:33.120
I'm talking about those back office apps.

00:14:33.120 --> 00:14:36.360
The tool your customer service team uses to access your database.

00:14:36.360 --> 00:14:39.500
That S3 uploader you built last year for the marketing team.

00:14:39.740 --> 00:14:47.840
The quick admin panel that lets you monitor key KPIs or maybe even the tool your data science team hacked together so they could provide custom ad spend insights.

00:14:47.840 --> 00:14:51.700
Literally every type of business relies on these internal tools.

00:14:51.940 --> 00:14:57.780
But not many engineers love building these tools, let alone get excited about maintaining or supporting them over time.

00:14:57.780 --> 00:15:02.240
They eventually fall into the please don't touch it, it's working category of apps.

00:15:02.240 --> 00:15:04.220
And here's where Retool comes in.

00:15:04.220 --> 00:15:10.720
Companies like DoorDash, Brex, Plaid, and even Amazon use Retool to build internal tools super fast.

00:15:11.260 --> 00:15:13.900
The idea is that almost all internal tools look the same.

00:15:13.900 --> 00:15:15.060
Forms over data.

00:15:15.060 --> 00:15:18.800
They're made up of tables, drop-downs, buttons, text input, and so on.

00:15:18.800 --> 00:15:27.600
Retool gives you a point, click, and drag and drop interface that makes it super simple to build internal UIs like this in hours, not days.

00:15:27.600 --> 00:15:30.420
Retool can connect to any database or API.

00:15:30.420 --> 00:15:32.260
Want to pull data from Postgres?

00:15:32.260 --> 00:15:35.020
Just write a SQL query and drag the table onto your canvas.

00:15:35.020 --> 00:15:39.380
Want to search across those fields? Add a search input bar and update your query.

00:15:39.660 --> 00:15:41.440
Save it, share it, super easy.

00:15:41.440 --> 00:15:45.320
Retool is built by engineers, explicitly for engineers.

00:15:45.320 --> 00:15:50.540
It can be set up to run on-prem in about 15 minutes using Docker, Kubernetes, or Heroku.

00:15:50.540 --> 00:15:52.380
Get started with Retool today.

00:15:52.380 --> 00:15:58.240
Just visit talkpython.fm/retool or click the Retool link in your podcast player show notes.

00:15:58.240 --> 00:16:01.960
So let's go through the seven lessons.

00:16:01.960 --> 00:16:04.640
The first one here is data cleaning.

00:16:04.640 --> 00:16:06.180
Assess what you consume.

00:16:06.180 --> 00:16:10.840
And I'm a big fan of this idea as a life lesson as well.

00:16:10.840 --> 00:16:11.960
So what's the story?

00:16:11.960 --> 00:16:17.120
Wait, before you get into that, like every one of these lessons, you start by saying, okay, here's the machine learning meaning.

00:16:17.120 --> 00:16:20.020
And then here's the sort of follow-on lesson, right?

00:16:20.020 --> 00:16:21.920
So what's the machine learning lesson of data cleaning?

00:16:22.180 --> 00:16:31.720
I think the machine learning lesson of data cleaning is most machine learning practitioners would know when you use noisy data, your machine learning model is going to be noisy and it's just not going to work.

00:16:31.720 --> 00:16:34.180
I think that's this cliche, garbage in, garbage out.

00:16:34.180 --> 00:16:36.580
In machine learning world, that is absolutely true.

00:16:36.580 --> 00:16:36.920
Yeah.

00:16:37.180 --> 00:16:45.820
And cleaning the data itself is actually most of the work in terms of training your model, cleaning the data, refining it, making the model be able to learn from it.

00:16:45.820 --> 00:16:47.800
So that's really important in the machine learning world.

00:16:47.800 --> 00:16:48.700
Yeah, absolutely.

00:16:48.700 --> 00:16:50.900
You have a really interesting quote in here.

00:16:50.900 --> 00:16:53.700
And you also have some really fun pictures, which we can talk about.

00:16:53.700 --> 00:16:54.800
Where is it in here?

00:16:54.900 --> 00:16:59.280
It was Randy Au who shares, data cleaning isn't the grunt work.

00:16:59.280 --> 00:17:00.980
It is the work, right?

00:17:00.980 --> 00:17:08.820
I think so many of these things in a lot of these machine learning and like scientific programming sides, you have these amazing libraries, right?

00:17:08.820 --> 00:17:11.560
Like you can pip install TensorFlow or whatever.

00:17:11.560 --> 00:17:14.400
And then you just feed the data, you know, you got your data frame, you feed it over, boom.

00:17:14.400 --> 00:17:15.820
And like magic happens, right?

00:17:15.820 --> 00:17:17.020
You've got to get that data.

00:17:17.020 --> 00:17:18.260
You've got to like format the data.

00:17:18.260 --> 00:17:19.240
You've got to convert the data.

00:17:19.240 --> 00:17:22.280
Like that's, you've got to understand that it's all correct, right?

00:17:22.320 --> 00:17:28.240
There's cool libraries, like Great Expectations, that are sort of like unit tests for your data to make sure you don't feed in bad data and all that.

00:17:28.240 --> 00:17:30.320
But yeah, I mean, I agree with that statement a lot.

00:17:30.320 --> 00:17:30.860
That's pretty neat.

00:17:30.860 --> 00:17:31.320
Thank you.

00:17:31.320 --> 00:17:34.180
And I linked to Randy Au's post, which is a very good post.

00:17:34.180 --> 00:17:42.300
I think he wrote that, of course, I might be putting words into his mouth, but I think he wrote that because a lot of people think that data preparation is not sexy.

00:17:42.300 --> 00:17:43.720
Data cleaning is just grunt work.

00:17:43.720 --> 00:17:47.400
And he's trying to remind people, no, actually it is the work.

00:17:47.400 --> 00:17:51.680
It is what actually makes a big difference in your analysis outcomes, in your machine learning outcomes.

00:17:51.900 --> 00:17:52.520
And I fully agree.

00:17:52.520 --> 00:17:53.460
Yeah, absolutely.

00:17:53.460 --> 00:17:53.880
All right.

00:17:53.880 --> 00:17:57.680
So life lesson from this one, what's the parallel life lesson you were trying to draw?

00:17:57.680 --> 00:18:02.800
Well, the main life lesson that I was trying to draw is actually the one below that image.

00:18:02.800 --> 00:18:03.380
Okay.

00:18:03.380 --> 00:18:09.120
So you have a really fantastic image and you talk a little bit about food first before you get into what I think is more important.

00:18:09.120 --> 00:18:10.760
Although food is not unimportant.

00:18:10.900 --> 00:18:16.820
But we've all seen these horrible pictures of like the 50s and 60s of doctors recommending cigarettes.

00:18:16.820 --> 00:18:19.260
Like my doctor recommends Camel, not Marlboro.

00:18:19.260 --> 00:18:20.820
You're like, oh my God, what is this?

00:18:20.820 --> 00:18:26.640
But there's like this really similar one for sugar of like all these like, here's like, how do you, you don't get overweight.

00:18:26.640 --> 00:18:27.360
You eat your sugar.

00:18:27.360 --> 00:18:28.500
So you don't eat fatty food.

00:18:28.500 --> 00:18:29.100
Oh my gosh.

00:18:29.620 --> 00:18:32.780
Anyway, so yeah, that's one is food, which is really interesting.

00:18:32.780 --> 00:18:34.680
But then more importantly is news, right?

00:18:34.680 --> 00:18:35.360
And information.

00:18:35.740 --> 00:18:38.460
So if you could just scroll up to the image, just in this image, right?

00:18:38.460 --> 00:18:43.320
We see that food is important and bad food actually makes you bad, makes you unhealthy.

00:18:43.320 --> 00:18:47.800
But in this image, this image is both bad food and bad information.

00:18:47.800 --> 00:18:52.120
There's so much bad information out there on Twitter, on Facebook or social media.

00:18:52.120 --> 00:18:55.340
And you really have to be careful about what you consume, right?

00:18:55.340 --> 00:18:56.040
Misinformation.

00:18:56.040 --> 00:19:04.320
I mean, it is really easy to consume, you know, these small 200-character tweets or, you know, a small empty calorie.

00:19:04.500 --> 00:19:06.900
I like to call them empty calories on social media.

00:19:06.900 --> 00:19:09.160
You know, there's empty calorie info bites, you call them.

00:19:09.160 --> 00:19:09.580
That's great.

00:19:09.580 --> 00:19:10.100
Yeah.

00:19:10.100 --> 00:19:14.960
There's some influencers who post really short things and, you know, they just go viral.

00:19:14.960 --> 00:19:19.320
And if you consume a lot of that, it actually, you think about, hey, did that actually change my life?

00:19:19.320 --> 00:19:21.060
No, it's actually kind of empty calories.

00:19:21.060 --> 00:19:24.340
And a lot of good writers, David Perell writes about this.

00:19:24.340 --> 00:19:28.780
It's like, you want your content to be very niche, very deep, very high in nutrition.

00:19:28.780 --> 00:19:30.700
So I think that's the same thing.

00:19:30.700 --> 00:19:33.900
When you consume light content, you don't actually gain a lot.

00:19:34.040 --> 00:19:35.140
It can be actually downright toxic.

00:19:35.140 --> 00:19:36.280
A lot of times it's light content.

00:19:36.280 --> 00:19:39.980
It's just one statement, you know, trying to sway information, sway the public.

00:19:39.980 --> 00:19:42.540
So I think for your own sake, curate what you consume.

00:19:42.540 --> 00:19:43.260
I agree.

00:19:43.260 --> 00:19:44.980
I mean, it's so important.

00:19:44.980 --> 00:19:46.800
And there's so many knock-on effects, right?

00:19:46.800 --> 00:19:49.920
And it also has some interesting machine learning tie-ins.

00:19:49.920 --> 00:19:52.460
I would say there are some recommendation engine tie-ins.

00:19:52.460 --> 00:19:53.340
Yes, definitely.

00:19:53.340 --> 00:19:58.020
I think maybe it's, I'm not 100% sure about the attribution, but I think Tony Robbins said,

00:19:58.080 --> 00:20:00.260
you've got to be the guardian of your own mind, right?

00:20:00.260 --> 00:20:05.480
You've got to consciously decide what you let in, what thoughts you let influence you and

00:20:05.480 --> 00:20:07.820
what ones you just eject and say, this doesn't matter.

00:20:07.820 --> 00:20:11.000
Because what you think can really affect you.

00:20:11.000 --> 00:20:15.360
But I was thinking even more, like if I go onto YouTube and I start to consume something

00:20:15.360 --> 00:20:17.040
that's a little kind of crummy, but whatever.

00:20:17.040 --> 00:20:21.120
The very next thing is you're never intense enough for YouTube.

00:20:21.120 --> 00:20:25.100
If you watch three videos on one topic, like it's like you need a hundred more of these

00:20:25.100 --> 00:20:25.760
videos, right?

00:20:25.760 --> 00:20:29.760
It's just, with a lot of social media, and I don't use Facebook enough, but I can imagine

00:20:29.760 --> 00:20:30.300
it's similar.

00:20:30.300 --> 00:20:35.520
Like as you start to trend even a little bit in one way or the other, it just throws you

00:20:35.520 --> 00:20:40.240
a huge rope and tries to pull you hard for the sake of engagement down that path.

00:20:40.240 --> 00:20:43.360
And so small curves these days seem to matter a lot more.

00:20:43.360 --> 00:20:48.020
Like if you used to like grab, I don't know, some crummy newspaper, like a rumor newspaper

00:20:48.020 --> 00:20:49.580
at the grocery store and read it.

00:20:49.640 --> 00:20:51.520
Then you went back and you read the New York Times and you got home.

00:20:51.520 --> 00:20:54.520
The New York Times wouldn't just stop showing you the important news.

00:20:54.520 --> 00:20:58.960
It wouldn't show you all sorts of junk because you read the, but nowadays that's what happens.

00:20:58.960 --> 00:20:59.600
It's crazy.

00:20:59.600 --> 00:21:00.240
It is.

00:21:00.240 --> 00:21:03.360
I think a lot of it is because of just how machine learning works.

00:21:03.360 --> 00:21:07.720
If you click on something and you read something, it thinks that you like that thing and it recommends

00:21:07.720 --> 00:21:12.720
you more of that thing, which sometimes it's just a, you misclick or something and sometimes

00:21:12.720 --> 00:21:13.140
it's a mistake.

00:21:13.140 --> 00:21:17.820
And that's why machine learning and social media can sort of polarize people, which is not what

00:21:17.820 --> 00:21:18.340
we want to do.
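
NOTE
Editor's sketch, not from the episode: a toy version of the click feedback loop described above.
It assumes a deliberately naive recommender that always shows the most-clicked topic and a user who clicks whatever is shown.
Real recommendation systems are far more complex; the topic names are hypothetical.
```python
from collections import Counter
def recommend(clicks, topics):
    """Greedy recommender: show the topic with the most recorded clicks (ties go to the earlier topic)."""
    return max(topics, key=lambda topic: clicks[topic])
topics = ["python", "cooking", "gossip"]
# The only signal so far is a single accidental click on "gossip".
clicks = Counter({"python": 0, "cooking": 0, "gossip": 1})
feed = []
for _ in range(5):
    shown = recommend(clicks, topics)
    feed.append(shown)
    clicks[shown] += 1  # the user clicks what is shown, which reinforces the signal
print(feed)
```
Under these assumptions one misclick is enough to fill the whole feed with a single topic, which is the polarizing effect the speakers describe.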

00:21:18.620 --> 00:21:22.180
So it really takes, you need to be conscious about how you're affected by it.

00:21:22.180 --> 00:21:22.380
Right.

00:21:22.380 --> 00:21:25.580
Obviously your article is not about this, but we're going to have to reckon with this as

00:21:25.580 --> 00:21:26.200
a society.

00:21:26.200 --> 00:21:26.880
Definitely.

00:21:26.880 --> 00:21:27.480
Yes.

00:21:27.480 --> 00:21:27.940
Period.

00:21:27.940 --> 00:21:28.660
In a big way.

00:21:28.660 --> 00:21:34.380
But let's go on to lesson number two, low versus high signal data and seeking to disconfirm

00:21:34.380 --> 00:21:35.040
and update.

00:21:35.040 --> 00:21:36.140
Tell us about this one.

00:21:36.140 --> 00:21:38.300
So maybe I'll start with the machine learning aspect for it.

00:21:38.300 --> 00:21:41.180
I think what's shown here is a support vector machine.

00:21:41.180 --> 00:21:43.920
So, you know, you're trying to separate the blue dots from the red dots.

00:21:44.260 --> 00:21:47.820
And, you know, on the leftmost image, I mean, it's very easy to separate, right?

00:21:47.820 --> 00:21:49.320
So you can see the margin is very wide.

00:21:49.320 --> 00:21:53.200
The margin is the distance from the dashed line to the solid line.

00:21:53.200 --> 00:21:58.980
On the middle image, all of a sudden we introduce a new red dot and the margin becomes very narrow.

00:21:59.360 --> 00:22:02.260
So, you know, your certainty is a lot less and you start to think, hey, you know, maybe

00:22:02.260 --> 00:22:02.920
I'm less certain.

00:22:02.920 --> 00:22:06.640
And then on the right, and then you start to collect more information, more data points

00:22:06.640 --> 00:22:07.620
around that.

00:22:07.620 --> 00:22:09.800
And then all of a sudden your margin becomes a curved margin.

00:22:09.800 --> 00:22:15.780
So what you thought was true and you were very certain about it on the left side of the

00:22:15.780 --> 00:22:18.340
image now suddenly changes to the right.
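
NOTE
Editor's sketch, not from the episode: a one-dimensional toy version of the margin idea described above.
It assumes every blue point lies to the left of every red point, so the widest-margin boundary is the midpoint of the gap.
The point values are hypothetical, and the curved boundary in the third image is not covered here.
```python
def max_margin_1d(blue, red):
    """Hard-margin separator for 1D points, assuming all blue points lie left of all red points.
    Returns (boundary, margin): the midpoint of the gap and half the gap width."""
    left, right = max(blue), min(red)
    if left >= right:
        raise ValueError("classes are not separable by a single threshold")
    return (left + right) / 2, (right - left) / 2
blue = [1.0, 2.0, 3.0]
red = [9.0, 10.0, 11.0]
# Well-separated classes give a wide margin.
print(max_margin_1d(blue, red))          # (6.0, 3.0)
# One new red point close to the blue cluster shrinks the margin sharply.
print(max_margin_1d(blue, red + [4.0]))  # (3.5, 0.5)
```
A single disconfirming point moves the boundary and shrinks the margin from 3.0 to 0.5, which mirrors the drop in certainty between the first two images.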

00:22:18.340 --> 00:22:18.740
Yeah.

00:22:18.740 --> 00:22:22.680
You get a little bit more information and you're like, oh, this is not the dividing line or

00:22:22.680 --> 00:22:23.600
the distinction at all.

00:22:23.600 --> 00:22:25.740
It's totally more nuanced or whatever.

00:22:25.740 --> 00:22:26.000
Yeah.

00:22:26.200 --> 00:22:26.560
Exactly.

00:22:26.560 --> 00:22:30.440
So I know I shouldn't be referring to the image because this is a podcast, but I think

00:22:30.440 --> 00:22:34.660
the image is really powerful in terms of how in machine learning, you change your decision

00:22:34.660 --> 00:22:35.180
boundaries.

00:22:35.180 --> 00:22:37.700
And in real life, this is the same.

00:22:37.700 --> 00:22:41.900
So I think Jeff Bezos has a very powerful quote that says, I think something like this

00:22:41.900 --> 00:22:45.740
that says that, I actually didn't put it here, but I'm just reminded of it that says that,

00:22:45.740 --> 00:22:51.900
hey, you know, when data disagrees with anecdotes, he tends to prefer the anecdotes because

00:22:51.900 --> 00:22:55.500
a lot of times it sort of means that maybe you're measuring things wrongly.

00:22:55.500 --> 00:22:57.520
So data is very easy to collect.

00:22:57.520 --> 00:23:01.940
We have a lot of data, but whereas anecdotes is one or two of those feedback points that

00:23:01.940 --> 00:23:05.560
disagrees and we sort of need to jump into an anecdote, collect more data around it.

00:23:05.560 --> 00:23:07.040
So I think this is the same thing.

00:23:07.040 --> 00:23:10.940
And when you're asking for feedback, often people give you good feedback, you know, you're

00:23:10.940 --> 00:23:13.660
doing great, continuing what you're doing, you're doing fantastic.

00:23:13.660 --> 00:23:17.660
And when people give you bad feedback, dig into it, right?

00:23:17.660 --> 00:23:19.640
That is a gift for you to improve.

00:23:19.640 --> 00:23:20.460
Dig into it.

00:23:20.460 --> 00:23:22.640
Ask them, hey, you know, I love what you just said.

00:23:22.680 --> 00:23:23.960
Can you give me more detail?

00:23:23.960 --> 00:23:28.900
And so it's giving you more detail, more information on how you should be thinking, how you should

00:23:28.900 --> 00:23:31.960
be designing, how your code should change in your code reviews.

00:23:31.960 --> 00:23:33.160
And it helps you grow.

00:23:33.160 --> 00:23:33.620
Yeah.

00:23:33.620 --> 00:23:37.640
I think especially in the US, people are very uncomfortable with negative feedback, both

00:23:37.640 --> 00:23:38.720
giving and receiving it.

00:23:38.720 --> 00:23:41.120
If it's given in the right way, it can be very valuable.

00:23:41.120 --> 00:23:45.120
I mean, maybe listening to like what people put on your YouTube video, that might not be

00:23:45.120 --> 00:23:46.000
all that constructive.

00:23:46.000 --> 00:23:49.040
There's a lot of weird, just people with issues at scale.

00:23:49.040 --> 00:23:50.220
But I totally agree.

00:23:50.220 --> 00:23:52.640
And, you know, in some more closed space, right?

00:23:52.640 --> 00:23:54.020
Like a code review or something, right?

00:23:54.020 --> 00:23:56.320
That's definitely an opportunity to learn something.

00:23:56.320 --> 00:23:59.800
Even if the person is wrong, it's still an opportunity to learn their perspective.

00:23:59.800 --> 00:24:00.300
Definitely.

00:24:00.300 --> 00:24:01.520
And learn from it, right?

00:24:01.520 --> 00:24:04.740
So one of the quotes you have in the section is by Karl Popper.

00:24:04.740 --> 00:24:09.060
True ignorance is not the absence of knowledge, but the refusal to acquire it.

00:24:09.060 --> 00:24:12.380
And I think that also goes hand in hand with the polarization and stuff you talked about

00:24:12.380 --> 00:24:12.720
before.

00:24:12.720 --> 00:24:13.260
I agree.

00:24:13.260 --> 00:24:15.420
I think that I adopt the growth mindset.

00:24:15.420 --> 00:24:17.260
I think that people can change, can grow.

00:24:17.260 --> 00:24:18.640
And that's what's necessary, right?

00:24:18.640 --> 00:24:19.560
In our industry, right?

00:24:19.560 --> 00:24:20.880
Things change so fast.

00:24:20.880 --> 00:24:21.220
Yeah.

00:24:21.220 --> 00:24:24.720
So we have to try to keep up with the times in terms of the technology.

00:24:24.720 --> 00:24:28.520
I think fundamentally, we should focus on the problems, but don't neglect how technology

00:24:28.520 --> 00:24:29.380
is changing as well.

00:24:29.380 --> 00:24:29.860
For sure.

00:24:29.860 --> 00:24:30.340
All right.

00:24:30.340 --> 00:24:32.540
Number three, explore, exploit.

00:24:32.540 --> 00:24:34.960
Balance for the greater long-term reward.

00:24:34.960 --> 00:24:36.780
It has to do with reinforcement learning, right?

00:24:36.780 --> 00:24:37.100
Yeah.

00:24:37.300 --> 00:24:41.280
So in reinforcement learning, well, you can imagine that at a start, you don't know anything

00:24:41.280 --> 00:24:42.960
about the state of the world, right?

00:24:42.960 --> 00:24:46.000
Let's say you have two, this is an example for me.

00:24:46.000 --> 00:24:48.240
Let's say you have a restaurant that you love.

00:24:48.240 --> 00:24:49.980
Let's say you first landed in Seattle.

00:24:49.980 --> 00:24:51.960
When I first landed in Seattle, I didn't know where to eat.

00:24:51.960 --> 00:24:52.980
I didn't know what was good.

00:24:52.980 --> 00:24:56.500
And I would explore many different restaurants, many different takeouts.

00:24:56.500 --> 00:25:01.040
And after exploring, after maybe I explored 10 of them, I found that, hey, you know, this

00:25:01.040 --> 00:25:01.620
place is great.

00:25:01.620 --> 00:25:02.260
It's cheap.

00:25:02.260 --> 00:25:02.800
It's nearby.

00:25:03.080 --> 00:25:03.780
The food is solid.

00:25:03.780 --> 00:25:06.440
And I would just exploit it all the time.

00:25:06.440 --> 00:25:06.940
Exactly.

00:25:06.940 --> 00:25:11.640
Like in my neighborhood, there's maybe two or three Thai food restaurants that I always

00:25:11.640 --> 00:25:11.980
go to.

00:25:11.980 --> 00:25:14.820
And if somebody says, hey, let's get some Thai food, I'm not going to go, well, there's

00:25:14.820 --> 00:25:15.980
another one we haven't tried yet.

00:25:15.980 --> 00:25:17.480
Let's go to, it's just like, nope, that one's good.

00:25:17.480 --> 00:25:18.060
We're going here.

00:25:18.060 --> 00:25:18.600
You know what I mean?

00:25:18.600 --> 00:25:20.020
I am exactly like that.

00:25:20.020 --> 00:25:21.600
I'm such a lazy thinker.

00:25:21.600 --> 00:25:21.880
Yeah.

00:25:21.980 --> 00:25:24.320
And I just like, this is really good.

00:25:24.320 --> 00:25:25.560
It fits my appetite.

00:25:25.560 --> 00:25:26.420
It fits my taste.

00:25:26.420 --> 00:25:27.440
And it's cheap.

00:25:27.440 --> 00:25:28.840
Why do we have to try something new?

00:25:28.840 --> 00:25:31.520
But thankfully, my wife is an explorer.

00:25:31.520 --> 00:25:36.420
So while I'm exploiting, she encourages me, you know, let's explore these new things.

00:25:36.420 --> 00:25:40.740
And sometimes I find treasure gems that I would never have tried.

00:25:40.740 --> 00:25:44.640
But because she's encouraging me to explore, I found it.

00:25:44.640 --> 00:25:44.840
Yeah.

00:25:44.840 --> 00:25:46.220
So that's a balance.

00:25:46.220 --> 00:25:49.740
So at the start of your career or at the start when you're trying to solve a new problem,

00:25:49.740 --> 00:25:50.600
explore.

00:25:50.600 --> 00:25:52.540
Take some time to explore as much as you can.

00:25:52.540 --> 00:25:54.260
But then once you find it, exploit.

00:25:54.260 --> 00:25:58.240
But you know, as you're exploiting, don't forget to also be exploring a bit.
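
NOTE
Editor's sketch, not from the episode: the restaurant story above expressed as an epsilon-greedy bandit, one common explore/exploit strategy.
The restaurant names, quality scores, bounded noise, and the 10% exploration rate are all hypothetical choices for illustration.
```python
import random
# Hypothetical restaurants and quality scores, chosen only for illustration.
TRUE_QUALITY = {"thai_place": 0.9, "burger_joint": 0.5, "food_truck": 0.3}
def epsilon_greedy(true_quality, epsilon=0.1, visits=1000, seed=0):
    """Return how often each option was picked by an epsilon-greedy eater."""
    rng = random.Random(seed)
    names = list(true_quality)
    counts = {name: 0 for name in names}
    totals = {name: 0.0 for name in names}
    for _ in range(visits):
        untried = [name for name in names if counts[name] == 0]
        if untried:
            choice = untried[0]  # try everything at least once
        elif rng.random() < epsilon:
            choice = rng.choice(names)  # explore: pick any option at random
        else:
            # exploit: pick the best average experience so far
            choice = max(names, key=lambda name: totals[name] / counts[name])
        # Each meal is the true quality plus a little bounded noise.
        reward = true_quality[choice] + rng.uniform(-0.1, 0.1)
        counts[choice] += 1
        totals[choice] += reward
    return counts
counts = epsilon_greedy(TRUE_QUALITY)
print(counts)
```
Most visits go to the best option, but the small exploration rate keeps sampling the others, which is the "keep exploring a bit" advice in code form.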

00:25:58.240 --> 00:25:58.600
Yeah.

00:25:58.600 --> 00:26:03.840
So you tie this back to careers a lot about basically, as you sort of touched on, continuous

00:26:03.840 --> 00:26:09.900
learning and, you know, don't get too comfortable and just go with it, don't just fall all the

00:26:09.900 --> 00:26:11.480
way into the exploiting side.

00:26:11.480 --> 00:26:13.720
Like I went to college, I got my engineering degree.

00:26:13.720 --> 00:26:15.540
Why do I need to learn Jupyter?

00:26:15.940 --> 00:26:19.340
I'm just going to keep using MATLAB or Excel and we're just going to keep working on this

00:26:19.340 --> 00:26:20.600
building or bridge or whatever.

00:26:20.600 --> 00:26:20.860
Right.

00:26:20.860 --> 00:26:22.460
Like I think it's easy to do that.

00:26:22.460 --> 00:26:26.680
I spent the time and the money and worked hard and got good grades and a four year degree.

00:26:26.680 --> 00:26:29.140
Why do I need, I'm done with tests and learning.

00:26:29.140 --> 00:26:29.840
I can just stop.

00:26:29.840 --> 00:26:30.060
Right.

00:26:30.060 --> 00:26:30.580
Exactly.

00:26:30.580 --> 00:26:34.260
I was just going to say this exact same, a very similar example.

00:26:34.260 --> 00:26:36.420
I studied SPSS and R in college.

00:26:36.420 --> 00:26:37.860
Why do I need to learn Python?

00:26:37.860 --> 00:26:38.380
Right.

00:26:38.380 --> 00:26:39.780
Well, you know, Python has a lot of benefits.

00:26:39.780 --> 00:26:40.500
It's a lot faster.

00:26:40.500 --> 00:26:45.100
And then, you know, now that I learned Python, I know SQL, why do I need to learn Spark?

00:26:45.420 --> 00:26:50.020
But, you know, if you explore a bit, it can really help you make your work a lot more

00:26:50.020 --> 00:26:50.420
effective.

00:26:50.420 --> 00:26:52.740
Maybe I know decision trees and linear regressions.

00:26:52.740 --> 00:26:54.600
Do I really need neural networks?

00:26:54.600 --> 00:26:58.600
But, you know, it's always taking some time to explore about, explore what might be new.

00:26:58.600 --> 00:27:02.000
Sometimes some of this exploration doesn't work out and that's fine.

00:27:02.000 --> 00:27:05.700
If you think of it from a learning perspective, you're learning a lot, but take the time to

00:27:05.700 --> 00:27:07.080
sample around every now and then.

00:27:07.080 --> 00:27:07.960
Yeah, absolutely.

00:27:07.960 --> 00:27:11.380
You have a couple of great quotes in here from two people.

00:27:11.380 --> 00:27:12.920
I find both of them very interesting.

00:27:12.920 --> 00:27:17.100
Naval, Silicon Valley guy that just goes by Naval, N-A-V-A-L.

00:27:17.100 --> 00:27:23.240
And he has a really interesting tweetstorm that turned into a podcast series and some interesting

00:27:23.240 --> 00:27:23.580
thinking.

00:27:23.580 --> 00:27:24.360
Are you familiar with this?

00:27:24.360 --> 00:27:24.940
Oh, yes.

00:27:24.940 --> 00:27:25.740
I love that.

00:27:25.740 --> 00:27:26.940
Yeah, I do too, actually.

00:27:26.940 --> 00:27:28.500
And maybe reading your articles and stuff.

00:27:28.500 --> 00:27:29.060
It reminded me of that.

00:27:29.060 --> 00:27:33.480
But he says your goal in life is to find out the people who need you the most, to find

00:27:33.480 --> 00:27:37.780
out the businesses that need you the most, to find the projects and the art that need

00:27:37.780 --> 00:27:38.260
you the most.

00:27:38.260 --> 00:27:39.440
There's something out there for you.

00:27:39.440 --> 00:27:45.720
And then Matthew McConaughey talks about when he was going, he was in law school and decided

00:27:45.720 --> 00:27:49.320
to go into film school, which is obviously a big career switch, somebody.

00:27:49.320 --> 00:27:50.680
And he's like, no, no, I have to do this.

00:27:50.680 --> 00:27:50.920
Right.

00:27:50.920 --> 00:27:52.480
And he was afraid of what his dad would say.

00:27:52.480 --> 00:27:57.360
And instead of saying you've thrown away your career or whatever, he just said, don't

00:27:57.360 --> 00:27:57.900
half-ass it.

00:27:57.900 --> 00:27:59.940
Like, if you're going to do this, you better go do it.

00:27:59.940 --> 00:28:00.140
Right.

00:28:00.140 --> 00:28:04.620
So that's sort of the, Naval's is the explore, McConaughey's, maybe it's the exploit.

00:28:04.620 --> 00:28:06.260
Like once you're in it, go in it full.

00:28:06.260 --> 00:28:06.940
Exactly.

00:28:06.940 --> 00:28:11.340
That being said, I mean, a lot of us, maybe we are a couple of years in our careers, but

00:28:11.340 --> 00:28:15.720
I think what Naval tries to remind us is, you might not be very happy with what you're doing right

00:28:15.720 --> 00:28:19.360
now, or you might love what you're doing right now, but there's always something that

00:28:19.360 --> 00:28:21.060
suits you specifically.

00:28:21.060 --> 00:28:25.640
So some people might love research and they might not fit into a startup environment.

00:28:25.640 --> 00:28:31.280
Or some people in a, for example, maybe a big innovation lab, they need some of their

00:28:31.280 --> 00:28:34.180
data scientists to really focus on research and they might not fit in there.

00:28:34.180 --> 00:28:37.680
If the data scientist is more about iterating fast and, you know, shipping fast to customers.

00:28:37.680 --> 00:28:39.740
So there's always something better for you.

00:28:39.740 --> 00:28:40.880
So keep exploring.

00:28:41.240 --> 00:28:43.960
And once you found it, exploit it like Matthew McConaughey.

00:28:43.960 --> 00:28:45.000
He didn't half-ass it.

00:28:45.000 --> 00:28:47.360
He ran all in and he's doing fantastically well.

00:28:47.360 --> 00:28:47.780
Yeah.

00:28:47.780 --> 00:28:49.080
It worked out okay for him.

00:28:49.080 --> 00:28:49.320
Yeah.

00:28:49.320 --> 00:28:49.740
Yeah.

00:28:49.740 --> 00:28:51.480
Nice.

00:28:51.480 --> 00:28:51.880
All right.

00:28:51.880 --> 00:28:52.640
Transfer learning.

00:28:52.640 --> 00:28:54.460
Books and papers are cheat codes.

00:28:54.460 --> 00:28:55.000
Yeah.

00:28:55.160 --> 00:29:00.080
So in machine learning, I think a couple of years ago, there was this thing called transfer

00:29:00.080 --> 00:29:03.160
learning, which didn't make quite a splash, but I thought it was a real breakthrough.

00:29:03.160 --> 00:29:08.820
So what it means is that there's this competition called ImageNet, where you try to classify images

00:29:08.820 --> 00:29:10.760
into thousands of categories, right?

00:29:10.760 --> 00:29:11.900
I think 1,000 categories.

00:29:12.760 --> 00:29:18.400
And those big companies like Google, Microsoft, they would train huge models to classify this.

00:29:18.400 --> 00:29:19.900
And this would be deep neural networks.

00:29:19.900 --> 00:29:24.300
And what people found is that you can take what they have trained, this huge model with

00:29:24.300 --> 00:29:29.580
all the weights and parameters, and you can just chop off the last layer and put your own

00:29:29.580 --> 00:29:29.880
model.

00:29:29.880 --> 00:29:31.900
Maybe you're trying to classify cat versus dog.

00:29:31.900 --> 00:29:36.580
You can use that model and then classify cat versus dog and put in your own data and just

00:29:36.580 --> 00:29:37.480
update the model.

00:29:37.480 --> 00:29:39.760
And it would work fantastically well.

00:29:40.160 --> 00:29:43.360
And orders of magnitude better than if you had to train from scratch.
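
NOTE
Editor's sketch, not from the episode: a toy version of "chop off the last layer and put your own" as described above.
The frozen function below is a hypothetical stand-in for a network body pretrained on something like ImageNet; only a new logistic-regression head is trained.
The data points, labels, and hyperparameters are made up for illustration.
```python
import math
def pretrained_features(x):
    """Stand-in for a frozen pretrained network body (hypothetical, fixed weights)."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]
def train_head(data, labels, lr=0.5, epochs=200):
    """Train only a new logistic-regression 'last layer' on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    feats = [pretrained_features(x) for x in data]  # the body is never updated
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = 1 / (1 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            g = p - y  # gradient of the log loss with respect to the logit
            w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
            b -= lr * g
    return w, b
def predict(x, w, b):
    """Classify one input using the frozen body plus the trained head."""
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
data = [(2.0, 1.0), (1.5, 2.0), (-1.0, -2.0), (-2.0, -0.5)]
labels = [1, 1, 0, 0]  # 1 = "cat", 0 = "dog"
w, b = train_head(data, labels)
print([predict(x, w, b) for x in data])
```
Because the body is reused as-is, only three numbers are learned here; in real transfer learning the reused body holds millions of pretrained weights.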

00:29:43.360 --> 00:29:43.820
Wow.

00:29:43.820 --> 00:29:44.260
Yeah.

00:29:44.260 --> 00:29:45.340
I think that's a cheat code.

00:29:45.340 --> 00:29:47.860
I use that when I first heard about it, I use that cheat code.

00:29:47.860 --> 00:29:48.300
Okay.

00:29:48.300 --> 00:29:52.280
Since I've heard about it, I've only used transfer learning for work as much as I can.

00:29:52.280 --> 00:29:53.600
If there's a transfer learning model.

00:29:53.600 --> 00:29:53.900
Yeah.

00:29:53.900 --> 00:29:58.460
So the idea is you, instead of starting with a completely blank set of weights in your model

00:29:58.460 --> 00:30:00.720
and just feeding data and going, no, that's right.

00:30:00.720 --> 00:30:01.220
That's not right.

00:30:01.220 --> 00:30:01.700
That's a dog.

00:30:01.700 --> 00:30:02.220
That's not a cat.

00:30:02.220 --> 00:30:03.000
No, that's yes.

00:30:03.000 --> 00:30:03.600
That one is a cat.

00:30:03.600 --> 00:30:04.040
Good job.

00:30:04.040 --> 00:30:08.180
You can use kind of a vague one to automate some of that driving.

00:30:08.180 --> 00:30:09.060
Is that kind of the idea?

00:30:09.160 --> 00:30:11.620
Like you give it a little bit of knowledge, but not too much.

00:30:11.620 --> 00:30:13.120
And then you keep still teaching it.

00:30:13.120 --> 00:30:18.520
Is that this model is able to distinguish a thousand different cats and dogs and hamburgers

00:30:18.520 --> 00:30:19.360
and cars.

00:30:19.360 --> 00:30:19.840
Yeah.

00:30:19.840 --> 00:30:23.560
And you're just taking all that knowledge that's in there and you're just fine-tuning

00:30:23.560 --> 00:30:26.080
it for your specific use case of cat versus dog.

00:30:26.080 --> 00:30:28.400
And it really cuts out so much effort.

00:30:28.400 --> 00:30:29.220
Yeah, absolutely.

00:30:29.220 --> 00:30:30.000
Very, very cool.

00:30:30.120 --> 00:30:34.880
So in this one, the life lesson here is, we were touching on this before, that people feel

00:30:34.880 --> 00:30:37.420
like a lot of times they've gotten, they've got their degree, they've studied, they've

00:30:37.420 --> 00:30:37.880
worked hard.

00:30:37.880 --> 00:30:40.460
Like they just want to have fun and live their life and not just keep going.

00:30:40.460 --> 00:30:46.500
But like the way to think of education, formalized education, let's say, even I would say college

00:30:46.500 --> 00:30:49.000
is this generalized pre-training, right?

00:30:49.000 --> 00:30:50.620
You're not ready to really go do the thing.

00:30:50.620 --> 00:30:55.320
You haven't really learned the thing, but you're much closer than somebody who hasn't, right?

00:30:55.320 --> 00:30:55.920
Something like that.

00:30:55.920 --> 00:30:57.380
School is generalized pre-training.

00:30:57.720 --> 00:30:58.600
Yes, I fully agree.

00:30:58.600 --> 00:31:03.420
I think a lot of people think that at least what I've seen, some people see that after

00:31:03.420 --> 00:31:05.020
they graduated, I'm done learning.

00:31:05.020 --> 00:31:09.220
But I don't know about you, but after I graduated, after I got into working world, I realized,

00:31:09.220 --> 00:31:11.320
hey, I didn't learn any of this stuff.

00:31:11.320 --> 00:31:11.740
Yeah.

00:31:11.740 --> 00:31:14.580
In the working world, I had to learn a lot more.

00:31:14.580 --> 00:31:15.580
Almost none.

00:31:15.580 --> 00:31:20.460
I think the only thing that I learned at school that helped was Excel and R and SPSS, the technical

00:31:20.460 --> 00:31:20.820
skills.

00:31:20.820 --> 00:31:26.280
So school sort of trains, you know, teamwork, how to work in a project team, communication,

00:31:26.960 --> 00:31:30.380
how to learn fast because, you know, every semester you'll be taking four subjects.

00:31:30.380 --> 00:31:31.940
It teaches you the general stuff.

00:31:31.940 --> 00:31:35.500
And then once you get to work, there's a lot of on-the-job training that you have to do yourself.

00:31:35.500 --> 00:31:40.520
So the point that I was trying to make here is that school is really just the start of

00:31:40.520 --> 00:31:40.620
it.

00:31:40.620 --> 00:31:41.700
It's generalized pre-training.

00:31:41.700 --> 00:31:42.880
You could stop there.

00:31:42.880 --> 00:31:48.840
But if you fine-tune, if you take the effort to fine-tune your model from school onto your

00:31:48.840 --> 00:31:52.020
very specific tasks, you would be orders of magnitude more effective.

00:31:52.020 --> 00:31:53.520
That's the message I'm trying to give.

00:31:53.520 --> 00:31:58.060
And I also want to tell people, and this is the next paragraph, is that we have transfer

00:31:58.060 --> 00:31:59.080
learning models, right?

00:31:59.080 --> 00:32:04.580
And in real life, that's the same thing for transfer learning models, which is books and

00:32:04.580 --> 00:32:05.680
academic papers.

00:32:05.680 --> 00:32:08.500
I want to try to get people to read more books.

00:32:08.500 --> 00:32:10.460
I've gained a lot from books.

00:32:10.660 --> 00:32:12.240
I've gained a lot from academic papers.

00:32:12.240 --> 00:32:16.540
And you can imagine, I don't know, maybe you read a book by, I don't know, Sapiens by

00:32:16.540 --> 00:32:21.320
Yuval Noah Harari or Deep Work by Cal Newport, who I'm a big fan of.

00:32:21.320 --> 00:32:24.080
They have thought about this, or Naval's thread, right?

00:32:24.080 --> 00:32:29.200
They have thought about this for so long, five years, decades, and they have compressed it in

00:32:29.200 --> 00:32:31.700
a book that you can read in eight to 10 hours.

00:32:31.700 --> 00:32:34.140
Read it and you'll be that much smarter.

00:32:34.140 --> 00:32:37.940
And you'll see life in a different way and you'll gain a lot.

00:32:37.940 --> 00:32:39.300
Yeah, that's super interesting.

00:32:39.300 --> 00:32:42.460
I do agree with it that you don't necessarily have to agree with everything they say.

00:32:42.460 --> 00:32:47.220
It doesn't have to match your situation, but it does give you a whole lot more experience

00:32:47.220 --> 00:32:50.300
without going through the hardship of getting that experience.

00:32:50.300 --> 00:32:50.880
Exactly.

00:32:50.880 --> 00:32:52.060
And I think it's magic.

00:32:52.060 --> 00:32:52.520
Yeah.

00:32:52.520 --> 00:32:55.620
That's what separates human beings from animals, right?

00:32:55.620 --> 00:32:56.980
Where we can transfer knowledge.

00:32:56.980 --> 00:32:58.760
We can perform telepathy.

00:32:58.760 --> 00:33:02.300
I can transfer knowledge to you, to anyone on the internet by writing.

00:33:02.300 --> 00:33:04.320
And people did that through books in the past.

00:33:04.320 --> 00:33:05.340
Yeah, very interesting.

00:33:05.340 --> 00:33:06.180
So that's magic.

00:33:06.180 --> 00:33:11.140
So you say books are the weights and biases of the great thinkers who've come before us.

00:33:11.140 --> 00:33:11.740
That's pretty awesome.

00:33:14.040 --> 00:33:16.600
This portion of Talk Python To Me is sponsored by Linode.

00:33:16.600 --> 00:33:20.960
Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

00:33:20.960 --> 00:33:25.000
Develop, deploy, and scale your modern applications faster and easier.

00:33:25.000 --> 00:33:32.400
Whether you're developing a personal project or managing large workloads, you deserve simple, affordable, and accessible cloud computing solutions.

00:33:32.640 --> 00:33:36.680
As listeners of Talk Python To Me, you'll get a $100 free credit.

00:33:36.680 --> 00:33:40.800
You can find all the details at talkpython.fm/Linode.

00:33:40.800 --> 00:33:46.540
Linode has data centers around the world with the same simple and consistent pricing regardless of location.

00:33:46.540 --> 00:33:49.380
Just choose the data center that's nearest to your users.

00:33:49.560 --> 00:33:56.080
You'll also receive 24/7/365 human support with no tiers or handoffs regardless of your plan size.

00:33:56.080 --> 00:34:06.300
You can choose shared and dedicated compute instances, or you can use your $100 in credit on S3-compatible object storage, managed Kubernetes clusters, and more.

00:34:06.520 --> 00:34:08.780
If it runs on Linux, it runs on Linode.

00:34:08.780 --> 00:34:16.160
Visit talkpython.fm/Linode or click the link in your show notes, then click that create free account button to get started.

00:34:16.160 --> 00:34:22.680
On this, I wanted to ask you about what you thought about, quote, new media, right?

00:34:22.680 --> 00:34:26.240
Like, I'm thinking in particular YouTube, but other places, right?

00:34:26.240 --> 00:34:33.120
Like, these books are very, like, the tradition of meaningful books are very important and long, and everybody knows about them, right?

00:34:33.120 --> 00:34:35.560
But, you talked about the MOOCs before.

00:34:35.940 --> 00:34:46.900
I mean, YouTube, I was talking about the negative aspects of it before, but there, it's amazing what you can learn if you go over to places like YouTube with the desire to seek out this kind of information.

00:34:46.900 --> 00:34:52.580
Like, there's so much good stuff there mixed in with, like, cat videos, right?

00:34:52.580 --> 00:34:53.340
Yeah.

00:34:53.340 --> 00:34:53.980
What do you think?

00:34:53.980 --> 00:34:57.340
I think that I am ambivalent to it.

00:34:57.340 --> 00:35:00.220
I think I have a preferred way of learning, which is to read books.

00:35:00.220 --> 00:35:05.840
Nonetheless, I know that video is awesome in the sense that I can write some code, I can iterate, and you can see me doing it.

00:35:05.860 --> 00:35:09.640
It's almost impossible to convey that in a book, maybe some code examples.

00:35:09.640 --> 00:35:11.140
So, that's really amazing.

00:35:11.140 --> 00:35:19.440
I think both new media has strengths, and it also has, the difference between new media and traditional media is that new media is just a lot more powerful.

00:35:19.440 --> 00:35:20.380
A lot more powerful.

00:35:20.380 --> 00:35:23.720
It can be powerful for good, and also powerful for not so good.

00:35:23.720 --> 00:35:26.320
A lot more powerful in the sense that, you know, MOOCs are amazing, right?

00:35:26.520 --> 00:35:29.340
A professor can teleport into your house and teach you.

00:35:29.340 --> 00:35:30.300
That's powerful.

00:35:30.300 --> 00:35:39.880
But powerful for not so good in the sense that we have things that reduce your attention span, like TikTok videos or short snippets of articles. That's also very powerful.

00:35:39.880 --> 00:35:42.560
It's just whether it's for good or not so good.

00:35:42.680 --> 00:35:46.160
So, that's something to think about when you're using new media to learn.

00:35:46.160 --> 00:35:46.700
It is.

00:35:46.700 --> 00:35:52.560
It's definitely more risky in that you can get distracted and pulled away because all those places are about that for sure.

00:35:52.680 --> 00:36:04.780
You know, like, I was recently watching a video of an opera singer and, like, vocal coach analyzing, like, this heavy metal singer and, like, dissecting the song from a perspective of, like, an opera singer.

00:36:04.780 --> 00:36:10.120
And I'm like, I appreciate both of those art forms way more having seen, like, that experience.

00:36:10.120 --> 00:36:12.000
And I would never have that experience.

00:36:12.000 --> 00:36:12.620
Exactly.

00:36:12.620 --> 00:36:17.920
And I would never go down to the bookstore and pick up a book on, like, singing theory and stuff.

00:36:17.920 --> 00:36:19.080
I just wouldn't, right?

00:36:19.080 --> 00:36:21.280
I would find something else to spend my time on.

00:36:21.280 --> 00:36:24.800
But anyway, like, those are the kinds of things I'm thinking that you just, you wouldn't expect.

00:36:24.800 --> 00:36:26.900
But you can find interesting things there, right?

00:36:26.900 --> 00:36:27.380
Definitely.

00:36:27.380 --> 00:36:27.820
Definitely.

00:36:27.820 --> 00:36:32.300
Hey, I want to go back really to this one really super quick because I skipped over it right at the end.

00:36:32.300 --> 00:36:42.820
But you talk about this exploration versus exploitation thing and saying, look, you don't always have to worry so much about this stuff because many things are what you call two-way doors.

00:36:42.820 --> 00:36:45.680
And there's this interview over here of Jeff Bezos.

00:36:45.680 --> 00:36:51.620
And he has this distinction of some decisions being two-way doors and some of them being one-way doors.

00:36:51.620 --> 00:36:57.320
And you shouldn't put the same concern, worry, energy, and whatnot into both types of decisions.

00:36:57.320 --> 00:36:58.840
They're not equal, so don't treat them equal.

00:36:58.840 --> 00:36:59.920
Can you speak to that real quick?

00:36:59.920 --> 00:37:01.000
I thought that was super interesting.

00:37:01.000 --> 00:37:01.540
Yeah.

00:37:01.660 --> 00:37:03.420
I'm just going to use Jeff's example.

00:37:03.420 --> 00:37:10.080
For example, maybe you're doing a side hustle and someone wants to acquire it or someone wants to completely purchase it from you.

00:37:10.080 --> 00:37:13.940
That's a one-way door that, you know, after you sold it, there's no way to reverse it, right?

00:37:13.940 --> 00:37:14.380
Yeah.

00:37:14.600 --> 00:37:24.460
So, however, you know that maybe you start a side hustle and then maybe you're thinking, hey, should I be targeting Python programmers or R programmers or, I don't know, Scala programmers?

00:37:24.460 --> 00:37:26.160
That's a two-way door.

00:37:26.160 --> 00:37:27.580
You can easily pivot.

00:37:27.580 --> 00:37:33.960
So, I think what Jeff is saying is that the one-way door, very difficult to reverse, but it's a lot less.

00:37:33.960 --> 00:37:34.900
It happens a lot less.

00:37:34.900 --> 00:37:39.340
And a lot of us, a lot of people treat two-way door decisions as one-way door decisions.

00:37:39.340 --> 00:37:48.260
And I think he's saying that, hey, you know, it's good to distinguish between both and devote the right amount of energy and due diligence and analysis to each of these.

00:37:48.260 --> 00:37:53.440
And that helps you make more effective decisions more efficiently.

00:37:53.440 --> 00:37:53.960
Yeah.

00:37:53.960 --> 00:37:54.900
I totally agree.

00:37:54.900 --> 00:37:57.320
I thought this was, like I said, I thought this was insightful.

00:37:57.560 --> 00:38:04.680
And I feel like in programming and in code, a lot of people get stuck in the so-called analysis paralysis.

00:38:04.680 --> 00:38:06.540
They're just stuck, like, trying to decide.

00:38:06.540 --> 00:38:12.640
Like, every decision is, like, it's overwhelming and it's hard to decide what to do because you feel like you might make the wrong decision.

00:38:12.640 --> 00:38:13.700
You don't have enough experience.

00:38:13.700 --> 00:38:16.320
Like, so much of those things, you can just, oh, we'll just refactor this later.

00:38:16.320 --> 00:38:17.620
Or we can just throw away this later.

00:38:17.620 --> 00:38:19.880
Like, oh, I'm going to use a relational database.

00:38:19.880 --> 00:38:21.300
Oh, we should have used a NoSQL database.

00:38:21.300 --> 00:38:23.500
We'll just throw it away and switch it over.

00:38:23.500 --> 00:38:30.840
It's, like, probably not that big of a deal versus we're going to let all the Python guys go and hire a bunch of Java guys and rewrite the whole thing.

00:38:30.840 --> 00:38:33.720
Like, if you're a year into that decision, you're fairly committed, right?

00:38:33.720 --> 00:38:40.180
Like, so there's really, I think, understanding, like, oh, that's a two-way door decision allows you to just, like, try it and experiment.

00:38:40.180 --> 00:38:40.980
Fully agree.

00:38:40.980 --> 00:38:46.420
And that's hopefully by understanding the difference between one-way door and two-way doors, it makes it easier to explore, right?

00:38:46.420 --> 00:38:46.880
Yeah.

00:38:46.880 --> 00:38:47.320
Yeah.

00:38:47.320 --> 00:38:47.760
I agree.

00:38:47.760 --> 00:38:48.900
I think that's worth calling out.

00:38:48.900 --> 00:38:49.340
All right.

00:38:49.340 --> 00:38:49.960
Iterations.

00:38:49.960 --> 00:38:52.300
Find reps you can tolerate and iterate fast.

00:38:52.600 --> 00:38:55.400
So I think machine learning, a lot of machine learning involves iteration.

00:38:55.400 --> 00:38:57.780
Clearly, neural networks are iterations.

00:38:57.780 --> 00:39:06.120
Every time you pass the data through multiple epochs, the model learns. With each iteration, with each epoch, the machine learning model's error reduces.

00:39:06.120 --> 00:39:11.080
The machine learning model provides better predictions and gets more accurate on your metrics.
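
NOTE
A minimal sketch of the "error reduces with each epoch" point, assuming a toy one-weight linear model trained by plain gradient descent. The data, the learning rate, and the name train_toy_model are made up for illustration and are not from the episode.
```python
# Toy gradient descent: fit y = w * x and record the error after every epoch.
def train_toy_model(xs, ys, epochs=20, lr=0.01):
    """Return the fitted weight and the mean squared error after each epoch."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        losses.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return w, losses
#
# Made-up data in which the true weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
weight, losses = train_toy_model(xs, ys)
```
Printing losses shows the error shrinking epoch after epoch on this toy data, which is the "iterate and improve a little each pass" idea in miniature.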

00:39:11.080 --> 00:39:12.860
That is the same with life.

00:39:12.860 --> 00:39:16.160
I think a lot of people expect, this is what I used to expect.

00:39:16.160 --> 00:39:19.880
I expected that I would read something once and I would fully understand it.

00:39:19.880 --> 00:39:21.300
All of the knowledge is in my brain.

00:39:21.680 --> 00:39:23.980
And then, you know, I realized that that is never true.

00:39:23.980 --> 00:39:27.860
So, and actually, that actually lowers my expectation of myself.

00:39:27.860 --> 00:39:32.280
You know, sometimes I'll read a paper and then, you know, I'll try to discuss it and realize I don't actually know the details.

00:39:32.280 --> 00:39:33.660
That lowers expectation of myself.

00:39:33.660 --> 00:39:37.020
You know, I tell myself, Eugene, by reading it once, you're never going to get it.

00:39:37.020 --> 00:39:37.540
Yeah.

00:39:37.540 --> 00:39:39.200
Don't expect that of yourself.

00:39:39.200 --> 00:39:40.100
It's too high a bar.

00:39:40.100 --> 00:39:42.280
So maybe read it a few times.

00:39:42.280 --> 00:39:43.520
Be kinder to yourself.

00:39:43.520 --> 00:39:44.620
So that's the same thing.

00:39:44.940 --> 00:39:47.360
When I read papers, I go through it multiple times.

00:39:47.360 --> 00:39:52.360
When I do A/B tests, I fail two times for every one time I succeed.

00:39:52.360 --> 00:39:54.720
So I fail like 50 to 75% of the time.

00:39:54.720 --> 00:39:58.660
And you just got to learn to be kinder with yourself where you iterate, right?

00:39:58.660 --> 00:40:00.620
I think I've posted some examples here.

00:40:00.980 --> 00:40:03.620
The Angry Birds developers failed 51 times.

00:40:03.620 --> 00:40:09.900
Sir James Dyson failed 5,000 times in 15 years before his vacuum cleaner worked, right?

00:40:09.900 --> 00:40:12.600
And, you know, imagine if he gave up, we would never have that.

00:40:12.600 --> 00:40:13.280
Yeah, yeah.

00:40:13.280 --> 00:40:17.580
But again, a lot of great examples here about people who just iterated and just stuck to it.

00:40:17.580 --> 00:40:20.300
I think it also sort of ties into the previous one, right?

00:40:20.300 --> 00:40:21.260
The two-way doors.

00:40:21.260 --> 00:40:23.940
It's okay if it doesn't work on a lot of these types of things.

00:40:23.940 --> 00:40:24.600
Just keep going.

00:40:24.600 --> 00:40:25.560
Just try again, right?

00:40:25.560 --> 00:40:27.340
Eventually, you'll find one that fits, yeah?

00:40:27.340 --> 00:40:27.860
Exactly.

00:40:27.860 --> 00:40:28.380
Yeah.

00:40:28.480 --> 00:40:31.340
Speaking of fitting, overfitting, focus on intuition and keep learning.

00:40:31.340 --> 00:40:36.420
So I think overfitting is, well, I guess it's overfitting is when your machine learning model

00:40:36.420 --> 00:40:40.320
memorizes the training set too much and can't predict well on the prediction set.

00:40:40.320 --> 00:40:40.700
Right.

00:40:40.700 --> 00:40:42.340
It's almost perfect on the training set.

00:40:42.340 --> 00:40:46.600
Like it knows that it's a dog, but it's so specific that any slight variation, even though

00:40:46.600 --> 00:40:49.040
it should be a dog, it doesn't know it's a dog.

00:40:49.040 --> 00:40:49.580
Exactly.

00:40:49.580 --> 00:40:53.480
An example of this is when your machine learning model learns on customer IDs.

00:40:53.480 --> 00:40:56.500
And, you know, when new customer IDs come in, it's just crap.
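
NOTE
A minimal sketch of the customer-ID example, assuming a tiny made-up dataset. The names memorizer and general_rule and the threshold of 5 visits are hypothetical, chosen only to contrast memorizing IDs with learning a rule that generalizes.
```python
# Hypothetical rows: (customer_id, visits_last_month, bought_again).
train_rows = [("c1", 9, True), ("c2", 1, False), ("c3", 7, True), ("c4", 2, False)]
new_rows = [("c5", 8, True), ("c6", 1, False)]
#
# "Overfit" model: memorizes the label for every customer ID it has seen.
memory = {cid: label for cid, _, label in train_rows}
def memorizer(cid, visits):
    return memory.get(cid, False)  # unseen IDs fall back to a blind guess
#
# "Intuition" model: one general rule that ignores the ID entirely.
def general_rule(cid, visits):
    return visits >= 5
#
def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(cid, v) == label for cid, v, label in rows) / len(rows)
```
On this toy data both models are perfect on the training rows, but only the general rule stays accurate on the new customer IDs. The memorizer falls back to guessing.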

00:40:57.040 --> 00:41:01.160
So I think the clearest, the person who really pushes for this is Richard Feynman.

00:41:01.160 --> 00:41:04.700
He says that there's no, and the way he teaches math and physics is the same.

00:41:04.700 --> 00:41:06.540
He goes directly into intuition.

00:41:06.540 --> 00:41:09.420
Forget about formulas or forget about memorizing stuff.

00:41:09.420 --> 00:41:15.040
If you understand the intuition, you will understand it better and you can generalize across

00:41:15.040 --> 00:41:15.920
many, many things.

00:41:15.920 --> 00:41:17.940
I think in life, it also makes the same sense.

00:41:18.020 --> 00:41:22.120
Don't try to memorize things or don't try to memorize knowledge, right?

00:41:22.120 --> 00:41:28.280
I think if you have the intuition of the fundamentals, you'll find that it transfers across many,

00:41:28.280 --> 00:41:29.340
many different domains.

00:41:29.340 --> 00:41:32.320
For example, I think Elon Musk talks about knowledge as a tree.

00:41:32.320 --> 00:41:35.320
So, you know, the fundamentals of the tree are the trunk.

00:41:35.320 --> 00:41:38.800
That thick trunk is the fundamentals that supports all the branches.

00:41:38.800 --> 00:41:41.620
You want to make sure that the intuition is like the trunk.

00:41:41.800 --> 00:41:45.640
You want to make sure that your trunk is solid and then you can build on new branches or cut

00:41:45.640 --> 00:41:47.960
off new branches and grow new branches as necessary.

00:41:47.960 --> 00:41:52.100
So I think the way to grow this intuition, at least for me, I find that being a beginner

00:41:52.100 --> 00:41:53.420
is the best way to do this.

00:41:53.420 --> 00:41:54.240
Yeah, absolutely.

00:41:54.240 --> 00:41:54.760
Okay.

00:41:54.760 --> 00:41:55.320
Very interesting.

00:41:55.320 --> 00:41:58.940
A last one has to do with ensembles and ensembling.

00:41:58.940 --> 00:42:00.180
Diversity is strength.

00:42:00.180 --> 00:42:01.380
What is ensembling?

00:42:01.680 --> 00:42:06.380
Yeah, I guess ensembling is that I could train a model, maybe a linear regression and another

00:42:06.380 --> 00:42:09.840
model, maybe a decision tree and then another model, maybe a k-nearest neighbors.

00:42:09.840 --> 00:42:12.180
And they would all have their different errors.

00:42:12.180 --> 00:42:13.980
They would all have their different biases and strengths.

00:42:13.980 --> 00:42:17.420
But the unusual thing in machine learning is that, you know, you can just take all their

00:42:17.420 --> 00:42:21.320
predictions and average them and they would do better than any one of them individually.

00:42:21.320 --> 00:42:25.140
And actually that's a cheat code that everyone is using in Kaggle competitions.

00:42:25.140 --> 00:42:28.180
You just train thousands of models and just combine all of them.
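
NOTE
A minimal sketch of averaging predictions, assuming three made-up prediction lists that stand in for the linear regression, decision tree, and k-nearest neighbors models mentioned above. The numbers are invented so that the errors partly cancel. Averaging is not guaranteed to beat every individual model on real data.
```python
# Made-up ground truth and predictions from three hypothetical models.
truth = [10.0, 20.0, 30.0]
predictions = {
    "linear_regression": [12.0, 18.0, 33.0],
    "decision_tree": [8.0, 23.0, 29.0],
    "k_nearest_neighbors": [11.0, 19.0, 27.0],
}
#
def mean_absolute_error(preds, actual):
    return sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)
#
# Ensemble: average the three models' predictions for each example.
ensemble = [sum(column) / len(column) for column in zip(*predictions.values())]
#
individual_errors = {name: mean_absolute_error(p, truth) for name, p in predictions.items()}
ensemble_error = mean_absolute_error(ensemble, truth)
```
Here each model over- or under-shoots in a different direction, so the averaged prediction ends up with a lower error than any single model, which is the "diversity is strength" point.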

00:42:29.080 --> 00:42:32.340
It reminds me of like a much simpler example.

00:42:32.340 --> 00:42:34.420
There's sort of the wisdom of crowds.

00:42:34.420 --> 00:42:39.260
Like you hear stories of people saying, look, here's a jar, a big glass jar full of jelly

00:42:39.260 --> 00:42:39.580
beans.

00:42:39.580 --> 00:42:41.840
You got to guess how many jelly beans there are.

00:42:41.840 --> 00:42:45.280
Like, any given person will over- or underestimate by a whole lot.

00:42:45.280 --> 00:42:48.460
But if you ask a hundred people, it's usually really close to the actual number.

00:42:48.460 --> 00:42:53.960
Or there's some weird examples of this at like state fairs, there'll be like a cow and people

00:42:53.960 --> 00:42:56.180
have to guess like, how much does the cow weigh?

00:42:56.180 --> 00:42:58.520
You know, it's like a competition and people get it really wrong.

00:42:58.580 --> 00:43:03.160
But it's usually really close if enough people answer and participate and it's averaged, right?

00:43:03.160 --> 00:43:03.680
Exactly.

00:43:03.680 --> 00:43:08.900
So having diversity, diversity of opinions, diversity of thoughts, I think is very powerful.

00:43:08.900 --> 00:43:09.420
Yeah.

00:43:09.420 --> 00:43:11.300
So that's what I'm trying to encourage here as well.

00:43:11.300 --> 00:43:11.660
Yeah.

00:43:11.660 --> 00:43:15.440
So what's the story about life here instead of like guessing the weight of cows, which is not

00:43:15.440 --> 00:43:16.200
all that practical.

00:43:16.200 --> 00:43:21.680
Well, I think that one way, okay, maybe a quick one, which is one way to do when you are trying

00:43:21.680 --> 00:43:27.320
to build teams is you might want to deliberately try to find people which are different from

00:43:27.320 --> 00:43:27.820
you, right?

00:43:27.960 --> 00:43:29.760
which complement your strengths.

00:43:29.760 --> 00:43:33.460
Sometimes in tech interviews, we want to find people that are similar to us, have the

00:43:33.460 --> 00:43:37.240
same skill sets that, you know, fit this mold, fit this job description.

00:43:37.240 --> 00:43:38.280
That's useful.

00:43:38.280 --> 00:43:39.100
It's effective.

00:43:39.100 --> 00:43:43.400
But I personally have built teams whereby it's very diverse, maybe from different countries

00:43:43.400 --> 00:43:44.740
or one third female.

00:43:45.220 --> 00:43:49.440
And I found that the creativity that comes from this is really powerful.

00:43:49.440 --> 00:43:52.980
And the other one, which is, I think, of course, Scott Adams is known for this.

00:43:52.980 --> 00:43:56.800
He says that, you know, if you can't be the top of your field, combine multiple superpowers

00:43:56.800 --> 00:43:57.840
like Scott Adams did.

00:43:57.840 --> 00:44:01.780
He combined his ability to draw, his sense of humor and his business know-how and he created

00:44:01.780 --> 00:44:05.180
Dilbert, which is no one else can replicate Dilbert.

00:44:05.180 --> 00:44:07.220
It needs someone like Scott Adams to do that.

00:44:07.220 --> 00:44:08.100
Yeah.

00:44:08.100 --> 00:44:12.260
You know, that general idea, I actually hit on this a lot because there are a lot of people

00:44:12.260 --> 00:44:16.660
who listen to this show who are not traditional computer science developers, traditional data

00:44:16.660 --> 00:44:17.680
science folks.

00:44:17.680 --> 00:44:23.140
And I think sometimes they feel like they don't have quite the same skill set to compete with

00:44:23.140 --> 00:44:23.620
those people.

00:44:23.620 --> 00:44:27.220
And how are they going to compete with somebody with a master's degree from Stanford in computer

00:44:27.220 --> 00:44:27.600
science?

00:44:27.600 --> 00:44:33.020
And what I, my thought on all this is, you look, if you're really good at economics and

00:44:33.020 --> 00:44:36.800
you're pretty good at programming, there's not too many people who have both of those skills,

00:44:36.800 --> 00:44:37.160
right?

00:44:37.160 --> 00:44:41.760
Like all of a sudden you go from competing with a hundred thousand down to like 500 or something.

00:44:41.920 --> 00:44:42.180
I don't know.

00:44:42.180 --> 00:44:45.940
Not that maybe it's a little bit extreme, but you know, like the, if you need that intersection

00:44:45.940 --> 00:44:48.440
of skills, all of a sudden it becomes super powerful.

00:44:48.440 --> 00:44:51.200
And what you're suggesting here is maybe like building teams.

00:44:51.200 --> 00:44:54.400
You can kind of build that in the team rather than in an individual.

00:44:54.400 --> 00:44:55.080
Exactly.

00:44:55.080 --> 00:44:59.300
And I want to go back to your previous example, someone who's maybe decent programmer, but,

00:44:59.300 --> 00:45:03.400
you know, can't compete with someone who graduated with a degree in, degree in CS and a

00:45:03.400 --> 00:45:04.860
master's in CS and PhD in CS.

00:45:04.860 --> 00:45:06.920
It goes back to Naval's tweet, right?

00:45:07.220 --> 00:45:12.780
There's something that is just right for you that can tap on your skills in economics and

00:45:12.780 --> 00:45:13.480
computer science.

00:45:13.480 --> 00:45:15.860
You just need to find it and that'll be a great fit.

00:45:15.860 --> 00:45:16.800
Yeah, absolutely.

00:45:16.800 --> 00:45:17.540
All right.

00:45:17.540 --> 00:45:22.680
Well, that was the seven items and I, you know, I enjoyed thinking about them and just seeing

00:45:22.680 --> 00:45:27.400
how these machine learning examples maybe can be analogies for living life.

00:45:27.400 --> 00:45:28.160
It's pretty cool.

00:45:28.160 --> 00:45:28.660
Thank you.

00:45:28.660 --> 00:45:29.120
Yeah.

00:45:29.340 --> 00:45:30.120
Yeah, absolutely.

00:45:30.120 --> 00:45:33.860
So a couple of other things real quick that you've spoken about is you've written a couple

00:45:33.860 --> 00:45:38.760
of things on sort of productivity as a developer and in the tech field.

00:45:38.760 --> 00:45:42.900
One article called "How to Accomplish More with Less: Useful Tools and Routines."

00:45:42.900 --> 00:45:48.340
And then also "Routines and Tools to Optimize My Day," which is a guest post by Susan Shu.

00:45:48.340 --> 00:45:48.840
Yep.

00:45:48.840 --> 00:45:50.460
So those are really interesting.

00:45:50.460 --> 00:45:56.100
But in particular, during one of them, I don't remember which one you talk about this article

00:45:56.100 --> 00:46:03.040
by Paul Graham and Paul Graham wrote this thing called maker's schedule versus manager's schedule.

00:46:03.040 --> 00:46:08.640
And, you know, I think you talk a lot or we talk in the tech field a lot about getting into flow,

00:46:08.640 --> 00:46:12.140
really programming and just having uninterrupted time.

00:46:12.140 --> 00:46:17.020
And yet I think probably more than ever, people are being pulled in different directions because

00:46:17.020 --> 00:46:19.460
everyone is just a Zoom call away.

00:46:19.460 --> 00:46:23.860
Even if you're not in the office, you're now just as eligible to be sucked into a meeting

00:46:23.860 --> 00:46:24.720
as anyone else.

00:46:24.720 --> 00:46:24.980
Right.

00:46:24.980 --> 00:46:25.820
Yep, definitely.

00:46:26.180 --> 00:46:26.380
Yes.

00:46:26.380 --> 00:46:27.840
Can you talk real quickly about this?

00:46:27.840 --> 00:46:32.720
Because I think it's a short article, but I think having awareness of this idea of a maker's

00:46:32.720 --> 00:46:37.740
schedule and a manager's schedule and how they're not super compatible and you got to be careful

00:46:37.740 --> 00:46:38.740
to help them coexist.

00:46:38.740 --> 00:46:39.780
I think that's important.

00:46:39.780 --> 00:46:40.240
Yeah.

00:46:40.240 --> 00:46:43.620
So I think that, of course, all credit goes to Paul Graham for this.

00:46:43.620 --> 00:46:48.860
So makers to even start to design something or to start to code a framework, you sort of

00:46:48.860 --> 00:46:52.640
need, I don't know about you, but it takes me like 30 minutes to warm up, to have to load

00:46:52.640 --> 00:46:55.860
all the concepts into my memory so I can start juggling them in my head.

00:46:55.860 --> 00:46:58.960
And then, you know, once I load all that into my head and then I can, okay, I can start

00:46:58.960 --> 00:47:02.260
writing pseudocode, you know, tweaking things, testing things iteratively.

00:47:02.260 --> 00:47:03.460
And that takes time.

00:47:03.460 --> 00:47:06.860
And it takes me maybe about 45 minutes, 60 minutes to get into flow.

00:47:07.220 --> 00:47:11.180
And once I'm in the flow, I'm moving really quickly, like speeding things through or once

00:47:11.180 --> 00:47:12.760
I'm in the flow of, you know, fixing a bug.

00:47:12.760 --> 00:47:16.060
And I don't know about you, but if I don't fix the bug, I can't stop.

00:47:16.060 --> 00:47:16.920
I can't go for lunch.

00:47:16.920 --> 00:47:21.780
And that motivation, that drive, if someone pulls me into a meeting, it sort of kills the

00:47:21.780 --> 00:47:23.420
motivation sometimes for the day.

00:47:23.420 --> 00:47:27.400
And, you know, if you had continued for just 30 minutes, you would have fixed it.

00:47:27.400 --> 00:47:30.680
But if it's broken by something in the middle, you'll be gone.

00:47:31.020 --> 00:47:36.640
So how I try to do this is that I actually deliberately block my time in the morning before

00:47:36.640 --> 00:47:37.040
lunch.

00:47:37.040 --> 00:47:40.820
I actually block it out with meetings, my own meetings, so that I can actually use that time

00:47:40.820 --> 00:47:41.380
to get in the flow.

00:47:41.380 --> 00:47:43.420
Depends on when your energy level is highest.

00:47:43.420 --> 00:47:44.520
For me, it's actually in the morning.

00:47:44.520 --> 00:47:48.140
And then I actually have, I say that when people want to ask for a meeting, I say, oh, sure,

00:47:48.140 --> 00:47:49.520
let's do it after 3 p.m.

00:47:49.520 --> 00:47:50.660
If you're okay with it.

00:47:50.660 --> 00:47:53.780
Because after 3 p.m., I mostly can't do deep work anyway.

00:47:53.780 --> 00:47:55.580
So I think that's useful to be aware of.

00:47:55.580 --> 00:47:56.620
Yeah, it's really interesting.

00:47:56.620 --> 00:47:57.580
And I agree with that.

00:47:57.580 --> 00:48:01.000
Paul talks about, if you're on a manager's schedule, what do you do?

00:48:01.000 --> 00:48:02.940
You go from meeting to meeting to meeting.

00:48:02.940 --> 00:48:06.660
And if you've got an hour gap in your day, you know, oh, you could just meet with somebody

00:48:06.660 --> 00:48:06.840
else.

00:48:06.840 --> 00:48:10.360
Maybe that's like a time to just set up a meeting so you could get to know somebody and dig in

00:48:10.360 --> 00:48:12.640
with that person or the team or the project.

00:48:12.640 --> 00:48:15.560
And that's fine if you're on a manager's schedule.

00:48:15.560 --> 00:48:19.820
But if you're on the maker's schedule, maybe you do need the whole morning uninterrupted so

00:48:19.820 --> 00:48:20.780
that you can get into that.

00:48:20.780 --> 00:48:25.240
You know, like, you know, you've had a good session when, you know, you've been programming,

00:48:25.240 --> 00:48:26.640
programming, and then you stop for a second.

00:48:26.640 --> 00:48:28.240
And you're like, wow, I'm hungry.

00:48:28.640 --> 00:48:29.900
I really have to go get a snack.

00:48:29.900 --> 00:48:31.520
It's like three in the afternoon.

00:48:31.520 --> 00:48:32.400
I forgot to eat lunch.

00:48:32.400 --> 00:48:34.680
Like, that's totally possible that that happens, right?

00:48:34.680 --> 00:48:38.700
And I would just want to say that these sessions feel so fulfilling, feel so satisfying.

00:48:38.700 --> 00:48:43.500
You feel like you've gotten so much work done in such a compressed amount of time that, okay,

00:48:43.500 --> 00:48:46.320
and now you can, sure, I can have office hours now.

00:48:46.320 --> 00:48:47.840
So those sessions are really fulfilling.

00:48:47.840 --> 00:48:48.300
Yeah.

00:48:48.300 --> 00:48:49.660
I don't know how this is going to work out.

00:48:49.680 --> 00:48:54.200
But after reading this and some of your other writing, I decided on my calendar, I'm just

00:48:54.200 --> 00:48:57.520
blocking, like, Tuesday and Friday, like, all day.

00:48:57.520 --> 00:48:58.880
And I'm just going to call those maker days.

00:48:58.880 --> 00:48:59.800
We'll see how that works out.

00:48:59.800 --> 00:49:00.000
Wow.

00:49:00.000 --> 00:49:01.400
And if I can just, like, get a lot of stuff done.

00:49:01.400 --> 00:49:03.080
So other days, I'll have more meetings.

00:49:03.080 --> 00:49:03.720
We'll see.

00:49:03.720 --> 00:49:04.160
I don't know.

00:49:04.160 --> 00:49:05.920
I'm looking forward to hearing your experience.

00:49:05.920 --> 00:49:06.700
Yeah, absolutely.

00:49:06.700 --> 00:49:07.540
And Chris May.

00:49:07.540 --> 00:49:08.300
Hey, Chris.

00:49:08.300 --> 00:49:12.800
Out in the live stream says, personal productivity brings superpowers to the powers you got by learning

00:49:12.800 --> 00:49:13.120
Python.

00:49:13.120 --> 00:49:13.820
Totally agree.

00:49:13.820 --> 00:49:14.220
Yep.

00:49:14.500 --> 00:49:14.920
Yeah, awesome.

00:49:14.920 --> 00:49:15.580
Very, very cool.

00:49:15.580 --> 00:49:16.060
All right.

00:49:16.060 --> 00:49:20.040
Well, I think that's probably about all the stuff that we have time to talk about.

00:49:20.040 --> 00:49:25.020
Although, maybe really quickly, you could touch on the bottom of your homepage.

00:49:25.020 --> 00:49:26.240
You've got a bunch of resources.

00:49:26.240 --> 00:49:29.400
Maybe just highlight something you think that people would find valuable there.

00:49:29.400 --> 00:49:33.800
So, yeah, again, a lot of these are answering questions that people ask me.

00:49:33.800 --> 00:49:35.780
So people ask me, you know, what your favorite papers are?

00:49:35.780 --> 00:49:36.800
What paper should I read?

00:49:36.800 --> 00:49:41.780
So that's the second one on the list, Applied ML, where, you know, I try to collect papers on

00:49:41.780 --> 00:49:44.300
real-world machine learning by companies that have implemented it.

00:49:44.380 --> 00:49:45.200
The lessons they learn.

00:49:45.200 --> 00:49:48.500
And, you know, sometimes people ask me, you know, wow, I'm starting to get into this field.

00:49:48.500 --> 00:49:49.360
There's so much to learn.

00:49:49.360 --> 00:49:53.260
And that's the third link there, where I find machine learning surveys, where, you know,

00:49:53.260 --> 00:49:55.660
people summarize what has happened in the past.

00:49:55.660 --> 00:50:00.220
And, of course, you know, people ask me things like, you know, how do you set up your Python

00:50:00.220 --> 00:50:03.660
repo so that you have code reviews and all that automatically or linting?

00:50:03.660 --> 00:50:07.820
So I have things like, you know, the Python collab template or, you know, how to test machine

00:50:07.820 --> 00:50:08.620
learning models.

00:50:08.620 --> 00:50:14.260
And, of course, recently I wrote about how to write machine learning design docs, design documents

00:50:14.260 --> 00:50:15.620
and, of course, I have that as well.

00:50:15.620 --> 00:50:17.940
So, you know, I mean, some of these are Git repos.

00:50:17.940 --> 00:50:19.340
Some of these are just articles.

00:50:19.340 --> 00:50:23.760
And, of course, there's the email course, which is a lot of people ask me, you know, what makes

00:50:23.760 --> 00:50:24.700
an effective data scientist?

00:50:24.700 --> 00:50:28.360
This is the question I asked a lot of my mentors five years ago.

00:50:28.580 --> 00:50:34.420
And I try to summarize the five lessons that I've learned in a short email course where I only

00:50:34.420 --> 00:50:35.800
send you one lesson a day.

00:50:35.800 --> 00:50:39.300
And, of course, there's a short exercise that I hope people will do.

00:50:39.300 --> 00:50:40.880
And that's why I send you one lesson a day.

00:50:40.880 --> 00:50:42.900
And that short exercise maybe takes an hour each.

00:50:42.900 --> 00:50:46.800
And I hope that after this, it sort of opens your mind that, you know, being an effective

00:50:46.800 --> 00:50:52.020
data scientist is beyond coding well, is beyond PhD level research, is beyond math.

00:50:52.020 --> 00:50:52.420
Cool.

00:50:52.420 --> 00:50:53.940
Yeah, that looks really useful.

00:50:54.140 --> 00:50:57.120
And you also have a Papermill-MLFlow.

00:50:57.120 --> 00:50:58.420
What do you think of Papermill?

00:50:58.420 --> 00:51:04.400
I started using this because I wanted to run rapid experimentations in Jupyter notebooks.

00:51:04.400 --> 00:51:08.600
And MLFlow is something that, you know, helps you track your machine learning models.

00:51:08.600 --> 00:51:10.560
Papermill allows you to parameterize.

00:51:10.560 --> 00:51:13.420
At least how I'm using it is I'm parameterizing my Jupyter notebooks.

00:51:13.420 --> 00:51:18.340
By combining both of them, I have a master Jupyter notebook that has all the different params

00:51:18.340 --> 00:51:20.020
and all the different countries and marketplaces.

00:51:20.020 --> 00:51:23.480
I just run that huge Jupyter notebook and all the experiments are logged.

00:51:23.800 --> 00:51:24.940
So I love it.

00:51:24.940 --> 00:51:27.460
So I decided to make it a template that other people can use as well.
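
NOTE
The workflow described above (one driver that sweeps parameters, runs a
parameterized notebook per combination, and logs every run) might look roughly
like the sketch below. This is not Eugene's actual template: the notebook path,
the parameter names (country, learning_rate), and both helper functions are
assumptions made for illustration. It relies on papermill.execute_notebook and
on mlflow.start_run, mlflow.log_params, and mlflow.log_artifact.
```python
# Sketch only: notebook path, parameter names, and values are hypothetical.
import os
from itertools import product
def build_param_grid(countries, learning_rates):
    """Return one parameter dict per (country, learning_rate) combination."""
    return [{"country": c, "learning_rate": lr} for c, lr in product(countries, learning_rates)]
def run_experiments(template_notebook, grid, output_dir="runs"):
    """Execute the template notebook once per parameter set and log each run."""
    import mlflow  # imported lazily so the grid helper works without these installed
    import papermill as pm
    os.makedirs(output_dir, exist_ok=True)
    for params in grid:
        output_path = os.path.join(output_dir, f"{params['country']}_{params['learning_rate']}.ipynb")
        with mlflow.start_run(run_name=output_path):
            mlflow.log_params(params)
            # Papermill injects `params` after the notebook cell tagged "parameters".
            pm.execute_notebook(template_notebook, output_path, parameters=params)
            # Keep the executed notebook alongside the logged parameters.
            mlflow.log_artifact(output_path)
grid = build_param_grid(["SG", "ID"], [0.01, 0.1])
print(len(grid))
```
Calling run_experiments("template.ipynb", grid) would then execute the
hypothetical template.ipynb once per entry in the grid, assuming that notebook
has a cell tagged "parameters" and that papermill and mlflow are installed.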

00:51:27.460 --> 00:51:29.360
Well, yeah, people can check out all those things.

00:51:29.360 --> 00:51:31.180
I'll put them in the show notes, and they're also on your website.

00:51:31.180 --> 00:51:31.900
All right.

00:51:31.900 --> 00:51:34.620
Well, I think that's probably it for all the time we have.

00:51:34.620 --> 00:51:37.040
So let me ask you the final two questions before you get out of here.

00:51:37.040 --> 00:51:38.540
You've written about two options.

00:51:38.540 --> 00:51:40.320
So I don't know which one you're going to go with here.

00:51:40.320 --> 00:51:44.220
But if you're going to write some Python code outside of Jupyter, say, what text editor

00:51:44.220 --> 00:51:44.640
do you use?

00:51:44.640 --> 00:51:45.580
I have an answer.

00:51:45.580 --> 00:51:47.020
And I'm curious about your answer as well.

00:51:47.020 --> 00:51:49.800
For me, I'm a diehard PyCharm fan.

00:51:49.940 --> 00:51:51.320
I've tried using VS Code.

00:51:51.320 --> 00:51:55.040
It just doesn't feel as snappy, and the IntelliSense isn't as good as you'd think.

00:51:55.040 --> 00:51:57.120
I've been using VS Code for my JavaScript.

00:51:57.120 --> 00:51:58.580
But what's your take, Michael?

00:51:58.580 --> 00:52:00.180
Should I be using VS Code more?

00:52:00.180 --> 00:52:02.200
Look, I'm a fan of people using VS Code.

00:52:02.200 --> 00:52:03.560
And I know a lot of people love it.

00:52:03.560 --> 00:52:06.760
The style of PyCharm is exactly, it just fits my brain.

00:52:06.760 --> 00:52:12.300
Like, I feel that it just so perfectly understands the project I'm working on, that it's the right

00:52:12.300 --> 00:52:13.180
tool for me as well.

00:52:13.180 --> 00:52:13.960
That's me.

00:52:13.960 --> 00:52:16.200
And I do some Scala on the side.

00:52:16.200 --> 00:52:18.600
And PyCharm has a Scala sister, which is IntelliJ.

00:52:18.600 --> 00:52:21.660
But that's just me. I'm still a diehard PyCharm fan.

00:52:21.660 --> 00:52:22.560
Yeah, right on.

00:52:22.560 --> 00:52:28.040
And then, a notable PyPI project or package, something out there, maybe not the most popular,

00:52:28.040 --> 00:52:29.920
but you're like, oh, I found this thing and it was super helpful.

00:52:30.180 --> 00:52:33.300
Well, off the top of my head, I cannot think of anything, honestly.

00:52:33.300 --> 00:52:37.740
But one thing that I love, that I hope people will love, is pytest.

00:52:37.740 --> 00:52:38.260
Yeah.

00:52:38.260 --> 00:52:38.660
All right.

00:52:38.660 --> 00:52:39.680
Yeah, pytest is super good.

00:52:39.680 --> 00:52:41.580
Yeah, let me throw out an example for you.

00:52:41.580 --> 00:52:47.120
Along with this pytest idea, something that I came across recently is Great Expectations,

00:52:47.120 --> 00:52:51.780
which is kind of like automated testing for the data cleaning and data validation,

00:52:51.780 --> 00:52:54.660
both when you're pulling it in the first time as well as like production.

00:52:54.660 --> 00:52:57.040
So there's one to build on top of the pytest story.

00:52:57.040 --> 00:52:57.580
Exactly.

00:52:58.140 --> 00:53:02.720
So as of now, the things that are top of my mind are pytest, Pylint, mypy,

00:53:02.720 --> 00:53:06.960
the things that make code manageable and maintainable, I think about that a lot.
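
NOTE
To make the "automated testing for data validation" idea above concrete, here
is a minimal sketch in plain Python with pytest-style tests. It does not use
the Great Expectations API; the record schema (user_id, price) and the
validate_rows helper are hypothetical and exist only for illustration.
```python
# Sketch only: the record schema and the validate_rows helper are hypothetical.
def validate_rows(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("user_id") is None:
            problems.append(f"row {i}: user_id is missing")
        price = row.get("price")
        if not isinstance(price, (int, float)) or price < 0:
            problems.append(f"row {i}: price must be a non-negative number")
    return problems
# pytest collects functions named test_* and reports any failing assert.
def test_clean_rows_pass():
    assert validate_rows([{"user_id": 1, "price": 9.5}]) == []
def test_bad_rows_are_reported():
    problems = validate_rows([{"user_id": None, "price": -1}])
    assert len(problems) == 2
```
Saved as a file such as test_validation.py (a name chosen here for
illustration), these tests would run with the pytest command, and the same
validate_rows check could be called on incoming data in production.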

00:53:06.960 --> 00:53:07.740
Yeah, fantastic.

00:53:07.740 --> 00:53:08.360
All right.

00:53:08.360 --> 00:53:11.000
Well, that's it for all the stuff we've got to cover.

00:53:11.000 --> 00:53:12.380
Eugene, thank you for being on the show.

00:53:12.380 --> 00:53:13.260
Final call to action.

00:53:13.260 --> 00:53:16.680
Maybe people want to get into your writing, or they want to start writing and

00:53:16.680 --> 00:53:20.680
thinking more, sort of almost become a developer philosopher type.

00:53:20.680 --> 00:53:21.880
So what advice you got for them?

00:53:21.880 --> 00:53:22.740
Just start writing.

00:53:22.740 --> 00:53:23.620
Final advice.

00:53:23.620 --> 00:53:24.500
Why write?

00:53:24.500 --> 00:53:27.660
Like, okay, by writing, you put your stuff online and people find you.

00:53:27.660 --> 00:53:29.880
And this is why this podcast even happened, right?

00:53:29.880 --> 00:53:33.400
Michael found me through my writing, and we get to talk, and we find like-minded people.

00:53:33.400 --> 00:53:34.600
That has happened to me.

00:53:34.600 --> 00:53:39.460
I find so many like-minded people talking to me about machine learning and systems and writing.

00:53:39.460 --> 00:53:40.960
And I've made so many new friends.

00:53:40.960 --> 00:53:41.680
Do that.

00:53:41.680 --> 00:53:43.420
And you'll make a lot of new friends online.

00:53:43.420 --> 00:53:44.320
Highly recommend it.

00:53:44.320 --> 00:53:45.360
Yeah, it's great advice.

00:53:45.360 --> 00:53:49.920
I find like stepping just even a tiny bit outside of your comfort zone starts to lead

00:53:49.920 --> 00:53:50.540
to other things.

00:53:50.540 --> 00:53:51.620
And maybe you're not doing writing.

00:53:51.620 --> 00:53:53.460
Maybe you're speaking at a meetup.

00:53:53.460 --> 00:53:56.780
That's even more possible than it used to be because you don't have to travel anymore,

00:53:56.780 --> 00:53:56.980
right?

00:53:56.980 --> 00:53:59.820
You can reach out to meetups that are not next to you and so on.

00:53:59.820 --> 00:54:01.440
All those things make huge differences.

00:54:01.440 --> 00:54:02.060
Definitely.

00:54:02.060 --> 00:54:02.820
Highly recommend.

00:54:02.820 --> 00:54:06.400
And if you write because of this podcast, email me.

00:54:06.400 --> 00:54:07.380
My email is on my website.

00:54:07.380 --> 00:54:09.060
I would love to read what you wrote about.

00:54:09.060 --> 00:54:09.880
Oh, fantastic.

00:54:09.880 --> 00:54:11.500
All right, Eugene, thank you for being on the show.

00:54:11.500 --> 00:54:13.160
It's been really great to chat with you about all this stuff.

00:54:13.320 --> 00:54:13.680
You're welcome.

00:54:13.680 --> 00:54:14.380
It's my pleasure.

00:54:14.380 --> 00:54:15.020
Take care.

00:54:15.020 --> 00:54:15.660
Bye, take care.

00:54:15.660 --> 00:54:19.200
This has been another episode of Talk Python To Me.

00:54:19.200 --> 00:54:24.240
Our guest on this episode was Eugene Yan, and it's been brought to you by Retool and Linode.

00:54:24.240 --> 00:54:27.280
Supercharge your developers and power users.

00:54:27.280 --> 00:54:31.960
Let them build and maintain their internal tools quickly and easily with Retool.

00:54:31.960 --> 00:54:36.280
Just visit talkpython.fm/retool and get started today.

00:54:36.280 --> 00:54:41.380
Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

00:54:41.660 --> 00:54:44.760
Develop, deploy, and scale your modern applications faster and easier.

00:54:44.760 --> 00:54:49.720
Visit talkpython.fm/Linode and click the Create Free Account button to get started.

00:54:49.720 --> 00:54:52.040
Be sure to subscribe to the show.

00:54:52.040 --> 00:54:54.820
Open your favorite podcast app and search for Python.

00:54:54.820 --> 00:54:56.120
We should be right at the top.

00:54:56.600 --> 00:55:01.920
You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

00:55:01.920 --> 00:55:05.500
direct RSS feed at /rss on talkpython.fm.

00:55:06.340 --> 00:55:08.920
We're live streaming most of our recordings these days.

00:55:08.920 --> 00:55:13.040
If you want to be part of the show and have your comments featured on the air, be sure to

00:55:13.040 --> 00:55:16.760
subscribe to our YouTube channel at talkpython.fm/youtube.

00:55:16.760 --> 00:55:18.600
This is your host, Michael Kennedy.

00:55:18.600 --> 00:55:19.900
Thanks so much for listening.

00:55:19.900 --> 00:55:21.060
I really appreciate it.

00:55:21.060 --> 00:55:22.960
Now get out there and write some Python code.

00:55:22.960 --> 00:55:43.560
I'll see you next time.

