What ML Can Teach Us About Life: 7 Lessons

Episode #309, published Fri, Mar 26, 2021, recorded Fri, Mar 19, 2021

Machine learning and data science are full of best practices and important workflows. Can we extrapolate these to our broader lives? Eugene Yan and I give it a shot on this slightly more philosophical episode of Talk Python To Me.

The seven lessons:

1. Data cleaning: Assess what you consume
2. Low vs. high signal data: Seek to disconfirm and update
3. Explore-Exploit: Balance for greater long-term reward
4. Transfer Learning: Books and papers are cheat codes
5. Iterations: Find reps you can tolerate, and iterate fast
6. Overfitting: Focus on intuition and keep learning
7. Ensembling: Diversity is strength

Episode Deep Dive

Guest Introduction and Background

Eugene Yan transitioned from a background in psychology and behavioral research to a career in data science. He has extensive experience working with large-scale machine learning (ML) systems, most notably at Amazon, where he helps build recommendation engines for Kindle. Eugene is also an active writer, sharing insights on applying ML principles to real life, communicating effectively as a data scientist, and staying curious about emerging technologies and best practices.

What to Know If You’re New to Python

Below are a few key ideas to help new Python developers get the most from this conversation. These topics came up frequently as Eugene and Michael explored data science and machine learning:

Jupyter Notebooks: A popular environment for data exploration and ML experimentation, letting you write and run code in parts (cells).
Testing Culture: Tools like pytest help ensure code correctness and maintainability.
Working with Libraries: Libraries such as NumPy and pandas, along with others for ML and data handling, come up often and are central to real-world data science.
Iterative Mindset: Whether debugging code or refining ML models, the Python ecosystem encourages experimentation and quick feedback loops.

Key Points and Takeaways

Applying Machine Learning Lessons to Everyday Life
Eugene’s main theme is that many workflows and best practices in machine learning have parallels in our personal and professional growth. By drawing from concepts like data cleaning, exploring vs. exploiting, or overfitting, we can see how attention to detail and the willingness to adapt also apply to everyday decisions. Rather than purely technical tips, these lessons offer a more philosophical and practical outlook on continuous improvement and learning.
- Links & Tools:
  - Eugene’s Website
  - Talk Python Training
Data Cleaning: Assess What You Consume
“Garbage in, garbage out” applies as much to ML datasets as it does to the information and food we consume. Just as you must clean and validate a dataset to avoid misleading results, you should be mindful of which news, social media, and other content you let influence you. This filtering raises the signal behind your life decisions, mirroring how accurate, reliable data improves ML outcomes.
- Links & Tools:
  - Great Expectations (Data Testing)
  - IBM SPSS (mentioned as Eugene’s earlier tool)
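To make “garbage in, garbage out” concrete, here is a minimal sketch of the kind of checks a data-testing tool runs before data reaches a model. It is plain Python, not the Great Expectations API, and the record fields (`user_id`, `pages_read`) and rules are hypothetical examples.

```python
def clean_records(records, required=("user_id", "pages_read")):
    """Split raw records into (valid, rejected) using simple expectations.

    The field names and rules are hypothetical, for illustration only.
    """
    valid, rejected = [], []
    seen_ids = set()
    for rec in records:
        missing = [field for field in required if rec.get(field) is None]
        if missing:
            rejected.append((rec, f"missing fields: {missing}"))
        elif not isinstance(rec["pages_read"], int) or rec["pages_read"] < 0:
            rejected.append((rec, "pages_read must be a non-negative integer"))
        elif rec["user_id"] in seen_ids:
            rejected.append((rec, "duplicate user_id"))
        else:
            seen_ids.add(rec["user_id"])
            valid.append(rec)
    return valid, rejected


raw = [
    {"user_id": 1, "pages_read": 120},
    {"user_id": 2, "pages_read": -5},      # impossible value
    {"user_id": None, "pages_read": 30},   # missing key field
    {"user_id": 3, "pages_read": "ten"},   # wrong type
    {"user_id": 1, "pages_read": 120},     # duplicate row
]
valid, rejected = clean_records(raw)
for rec, reason in rejected:
    print(reason, rec)
```

Only one of the five raw records survives. Rejecting bad rows loudly, instead of silently feeding them to a model, is the data-science version of being selective about what you consume.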
Seek High-Signal Data & Disconfirmation
A support vector machine’s decision boundary is determined by the points closest to it or on its wrong side (the support vectors). Points that already agree comfortably with the model do not move it. In life, critical feedback plays the same role, and listening closely to it can drive growth and improvement. Embracing disconfirming data, rather than avoiding it, helps you refine your beliefs and make better decisions.
- Links & Tools:
  - Support Vector Machines (Concept)
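The SVM intuition can be seen in a small sketch. A linear SVM trained by sub-gradient descent on the regularized hinge loss changes its boundary only in response to points that are misclassified or fall inside the margin. The function names, learning rate, regularization strength, and toy data below are illustrative assumptions. A real project would typically use a tested library implementation.

```python
def train_linear_svm(points, labels, epochs=100, lr=0.1, reg=0.01):
    """Train a 2-D linear SVM with sub-gradient descent on the hinge loss.

    labels must be +1 or -1. Returns weights (w1, w2) and bias b.
    """
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w1 * x1 + w2 * x2 + b)
            if margin < 1:
                # Misclassified or inside the margin: this point "disconfirms"
                # the current boundary, so it pulls the weights toward itself.
                w1 += lr * (y * x1 - reg * w1)
                w2 += lr * (y * x2 - reg * w2)
                b += lr * y
            else:
                # Already confidently correct: only the regularizer acts.
                w1 -= lr * reg * w1
                w2 -= lr * reg * w2
    return (w1, w2), b


def predict(weights, b, point):
    score = weights[0] * point[0] + weights[1] * point[1] + b
    return 1 if score >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
weights, b = train_linear_svm(points, labels)
print(weights, b, [predict(weights, b, p) for p in points])
```

After the first few updates, the far-away points such as (3, 3) almost never trigger an update. The boundary is shaped by the few points that challenge it.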
Explore–Exploit Balance
In reinforcement learning, you must explore enough to discover better strategies but also exploit known winners. Likewise, we should try new experiences (explore) while also doubling down on known successes (exploit). This framework lets us branch out, then commit to actions that yield the best long-term returns.
- Links & Tools:
  - Reinforcement Learning (Concept)
  - Mentioned examples: Trying new restaurants vs. returning to your favorite one.
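The explore–exploit trade-off is often introduced with the multi-armed bandit problem. Below is a minimal epsilon-greedy sketch. The restaurants, their average “enjoyment” scores, epsilon = 0.1, and the Gaussian noise are all made-up assumptions for illustration, not figures from the episode.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a multi-armed bandit.

    true_means are the (hidden) average rewards of each option.
    Returns how often each option was chosen and the agent's estimates.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random option
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # one noisy experience
        counts[arm] += 1
        # Incremental average: update the estimate without storing history.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


restaurants = ["usual favorite", "new ramen place", "new taco truck"]
counts, estimates = epsilon_greedy([1.0, 2.0, 1.5])
for name, n, est in zip(restaurants, counts, estimates):
    print(f"{name}: visited {n} times, estimated score {est:.2f}")
```

Setting epsilon to 0 turns this into pure exploitation. The agent can then lock onto whichever option happened to look good first and never discover the better one.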
Transfer Learning: Books as “Cheat Codes”
Just as ML practitioners reuse large pretrained models to quickly tackle new problems, we can “transfer learn” by reading books and research from experts. Rather than reinventing the wheel, benefit from the knowledge that others have already condensed and compiled. This approach jumpstarts your expertise, saving time and effort while deepening your understanding.
- Links & Tools:
  - ImageNet (Dataset)
  - Kaggle (Platform)
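The “cheat code” effect of transfer learning can be shown in miniature with warm-starting. Warm-starting means fitting a new task from parameters learned on a related task instead of from zero. This toy sketch uses a one-feature linear model and plain gradient descent. The two tasks, the learning rate, and the tolerance are invented for illustration. Real transfer learning reuses far larger pretrained networks.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.1, tol=1e-6, max_steps=10_000):
    """Gradient descent on mean squared error for y ≈ w*x + b.

    Returns the fitted (w, b) and the number of steps needed to reach tol.
    """
    n = len(xs)
    for step in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        if mse < tol:
            return (w, b), step
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), max_steps


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
task_a = [3.0 * x + 1.0 for x in xs]  # related task we have already solved
task_b = [3.2 * x + 0.8 for x in xs]  # new task we care about

(pre_w, pre_b), _ = fit_linear(xs, task_a)                # "pretraining"
_, cold_steps = fit_linear(xs, task_b)                    # from scratch
_, warm_steps = fit_linear(xs, task_b, w=pre_w, b=pre_b)  # transfer
print(f"from scratch: {cold_steps} steps, warm start: {warm_steps} steps")
```

The warm start begins much closer to the new task’s solution, so it reaches the same tolerance in fewer steps. The benefit depends on the tasks actually being related.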
Iterate Quickly & Embrace Failure
Neural networks are trained over many epochs (full passes over the data), typically updating their weights after each mini-batch, so performance improves gradually rather than all at once. Similarly, we need many iterations to master a skill or produce quality work. Accepting that early attempts will fail, just as a model performs poorly early in training, is vital to eventually achieving success.
- Links & Tools:
  - Papermill (Parameterize Jupyter Notebooks)
  - MLflow (Experiment Tracking)
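Fast iteration is easier when each experiment is cheap and its result is recorded. The sketch below runs a few tiny gradient-descent “experiments” with different learning rates and keeps a simple log of each run. The list of dictionaries stands in for what an experiment tracker would store, but none of MLflow’s or Papermill’s APIs are used. The objective function is an arbitrary example.

```python
def run_experiment(lr, epochs=20, target=3.0):
    """Minimize (w - target)**2 by gradient descent.

    Each loop iteration plays the role of one training epoch.
    """
    w = 0.0
    history = []
    for _ in range(epochs):
        grad = 2 * (w - target)
        w -= lr * grad
        history.append((w - target) ** 2)  # loss after this epoch
    return {"lr": lr, "final_loss": history[-1], "history": history}


experiment_log = [run_experiment(lr) for lr in (0.01, 0.1, 0.5, 1.1)]
for run in experiment_log:
    print(f"lr={run['lr']}: final loss {run['final_loss']:.4g}")

best = min(experiment_log, key=lambda run: run["final_loss"])
print("best learning rate:", best["lr"])
```

The run at lr=1.1 diverges. That failed experiment cost almost nothing and still taught you something, which is the point of keeping iterations cheap.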
Overfitting: Don’t Just Memorize, Build Intuition
An overfitted model memorizes its training data, noise included, instead of learning patterns that generalize. It therefore does well on data it has seen and poorly on new data. This pitfall is a reminder that genuine understanding beats rote memorization. In personal or professional contexts, building intuition ensures adaptability and creativity when facing unfamiliar problems.
- Links & Tools:
  - Richard Feynman’s Learning Philosophy
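Overfitting is easy to reproduce with a model that has as many parameters as data points. This sketch uses plain Python and made-up data generated from y = 2x + 1 plus fixed noise. A degree-6 polynomial that passes exactly through all seven training points “memorizes” them. A straight line fitted by least squares captures the underlying trend instead. The data and test points are illustrative assumptions.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5, 0.2]
train_x = [0, 1, 2, 3, 4, 5, 6]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
test_x = [0.5, 2.5, 4.5, 7.0]         # unseen inputs, one beyond the training range
test_y = [2 * x + 1 for x in test_x]  # the true, noise-free relationship

slope, intercept = fit_line(train_x, train_y)
models = {
    "memorizer (degree-6 polynomial)": lambda x: lagrange_predict(train_x, train_y, x),
    "straight line": lambda x: slope * x + intercept,
}
for name, model in models.items():
    train_err = mse([model(x) for x in train_x], train_y)
    test_err = mse([model(x) for x in test_x], test_y)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The memorizer has zero training error but a large test error, driven mostly by its wild prediction at x = 7. The simpler line is slightly wrong everywhere on the training data yet generalizes far better.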
Ensembling: Diversity as a Strength
Combining diverse ML models (an ensemble) often yields better predictions than any single model, as long as the models make different mistakes. In human terms, teams or personal skill sets that reflect varied backgrounds and ideas tend to innovate more effectively. Encouraging cognitive diversity can lead to breakthroughs that no single individual or perspective would reach alone.
- Links & Tools:
  - Random Forests (Ensemble Method)
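This minimal sketch shows why diversity helps an ensemble. Three hypothetical classifiers are each 70% accurate, but each is wrong on different examples. Majority voting combines them into a perfect predictor on this toy data. Random forests use the same voting idea. They also build diversity by training each tree on a bootstrap sample with random feature subsets, which this sketch does not show. The labels and error patterns are made up for illustration.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model prediction lists by taking the most common label."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def with_mistakes(labels, wrong_at):
    """Hypothetical model: copies the true binary label except at given indices."""
    return [1 - y if i in wrong_at else y for i, y in enumerate(labels)]


labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = with_mistakes(labels, {0, 1, 2})
model_b = with_mistakes(labels, {3, 4, 5})
model_c = with_mistakes(labels, {6, 7, 8})

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c)]:
    print(f"model {name}: accuracy {accuracy(preds, labels):.0%}")
ensemble = majority_vote(model_a, model_b, model_c)
print(f"ensemble: accuracy {accuracy(ensemble, labels):.0%}")
```

If all three models made their mistakes on the same examples, voting would not help at all. That is the technical version of the point that diversity is what makes the group stronger.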
Maker vs. Manager Schedule
Borrowing from Paul Graham’s essay, Eugene highlights that developers need long, uninterrupted blocks of time (maker schedule) for deep focus. Managers, on the other hand, often move between short meetings. Balancing these styles helps maintain productivity, especially for complex tasks like coding and ML experimentation.
- Links & Tools:
  - Paul Graham’s Essay: “Maker’s Schedule, Manager’s Schedule”
The Power of Writing for Learning & Communication
Eugene credits writing with sharpening his thinking and accelerating his growth in data science. Communicating ideas in writing forces clarity, surfacing gaps in understanding that might be hidden otherwise. By documenting designs or lessons learned, you not only help others but also internalize concepts more deeply yourself.
- Links & Tools:
  - PyCharm (IDE)
  - VS Code (Editor)

Non-Traditional Backgrounds in Tech
Eugene came from a psychology background, which shows how people with diverse skill sets can thrive in Python and data science. Different perspectives can drive innovative solutions, and the Python ecosystem is accessible to people from a wide range of disciplines. Don’t be deterred if you lack a formal computer science degree; persistent learning and curiosity pay off.

Interesting Quotes and Stories

Jeff Bezos on Anecdotes vs. Data: When data disagrees with your direct anecdotes or experiences, that can be a sign to investigate more deeply.

Matthew McConaughey’s Father’s Advice: When he pivoted from law school to film, his father told him, “Don’t half-ass it,” serving as a reminder to commit wholeheartedly once you choose a path.

Tony Robbins on Guarding Your Mind: Eugene references the idea of being the guardian of your own mind—what you consume shapes your perspective, similar to how poor data corrupts ML models.

Key Definitions and Terms

Overfitting: When a model memorizes the training data rather than learning generalized patterns, leading to poor performance on new data.
Reinforcement Learning (RL): An ML paradigm in which agents learn optimal behaviors through rewards and penalties within an environment.
Transfer Learning: Using a pretrained model on one task as a starting point for a different but related task, reducing the data and time needed.
Ensemble Methods: Techniques combining multiple models (like decision trees or neural nets) to improve predictive performance compared to any single model.

Learning Resources

If you’re looking to expand your Python journey or deepen your testing and data science skills, here are a couple of suggestions:

Python for Absolute Beginners: A thorough introduction covering the fundamentals of Python to help you grow confident with the language quickly.
Python Data Visualization: Useful if you want to chart and explore your data, or learn more advanced plotting techniques.

Overall Takeaway

Eugene’s story and advice illustrate how core concepts from machine learning can guide our day-to-day decisions, career paths, and personal growth. By applying principles such as data curation, focused iteration, critical feedback, and diverse collaboration, we can thrive both technically and personally in the ever-evolving world of Python and data science.

Links from the show

Eugene Yan: @eugeneyan
What Machine Learning Can Teach Us About Life - 7 Lessons article: eugeneyan.com

Maker's schedule vs. manager's schedule: paulgraham.com
Naval Podcast: overcast.fm
How to Write Better with The Why, What, How Framework: https://eugeneyan.com/writing/writing-docs-why-what-how/
Resources mentioned towards the end of the podcast: eugeneyan.com/resources

New media example - Metal song decomposed by classical musicians
Opera singer: youtube.com
Composer music: youtube.com

YouTube Live Stream: youtube.com
PyCon Ticket Giveaway: talkpython.fm/pycon2021
Episode #309 deep-dive: talkpython.fm/309
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Machine learning and data science are full of best practices and important workflows.

00:03 Can we extrapolate these to our broader lives?

00:06 Eugene Yan and I give it a shot on this slightly more philosophical episode of Talk Python To Me.

00:11 This is episode 309, recorded March 19th, 2021.

00:16 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:36 This is your host, Michael Kennedy.

00:38 Follow me on Twitter where I'm @mkennedy.

00:40 And keep up with the show and listen to past episodes at talkpython.fm.

00:44 And follow the show on Twitter via @talkpython.

00:47 This episode is brought to you by Retool and Linode.

00:50 Please check out what they're offering during their segments.

00:52 It really helps support the show.

00:53 We'll be giving away five tickets to attend PyCon US 2021.

00:58 This conference is one of the primary sources of funding for the PSF.

01:02 And it's going to be held May 14th to 15th online.

01:05 And because it's online this year, it's open to anyone around the world.

01:09 So we decided to run a contest to help people, especially those who have never been part of PyCon before, attend it this year.

01:15 Just visit talkpython.fm/pycon2021 and enter your email address and you'll be in the running for an individual PyCon ticket.

01:23 Compliments of Talk Python.

01:25 These normally sell for about $100 each.

01:28 And if you're certain you want to go, I encourage you to visit the PyCon website, get a ticket, and that money will go to support the PSF and the Python community.

01:35 Congratulations to Ron Lee.

01:37 He won number three of the five tickets that were given away.

01:40 And there's still more chances to win.

01:41 If you want to be in this drawing, just visit talkpython.fm/pycon2021.

01:45 Enter your email address.

01:47 You'll be in the running to win a ticket.

01:49 Now let's get on to that interview.

01:52 Eugene, welcome to Talk Python To Me.

01:54 Thank you.

01:54 Yeah.

01:55 It's great to have you here.

01:56 I love getting down into the details of programming and writing code and working with APIs and building amazing things.

02:02 But it's also really interesting to sort of step back and take a big-picture view of the world of software.

02:08 And you wrote a really interesting article about applying some of the lessons you might learn from code back to your life.

02:14 And I really enjoyed the idea of it.

02:15 So I'm looking forward to talking to you about it on this show.

02:17 Thank you.

02:17 Happy to chat more about it as well.

02:19 Yeah.

02:19 It's going to be super fun.

02:20 Now, before we get into the details of all that, you know, let's start with your story.

02:24 How did you get into programming and Python?

02:26 My degree is in psychology.

02:27 So since then, I've been really interested in understanding, you know, how people behave, why they think the way they think, and how information changes their perceptions and behavior.

02:36 You know, I'll do this.

02:37 I used to run experiments and analyze the data in SPSS and Excel.

02:41 But eventually, the data just got bigger and bigger.

02:44 So I moved on to R.

02:45 And now I'm using Python.

02:46 So that's how it happened.

02:48 I mainly use Python because I have problems to solve that require me to process data with Python.

02:54 Yeah, really cool.

02:54 One of my very first programming jobs was at this incredibly cool place where it started out as a research lab and then it spun out of the university to a startup.

03:03 And the whole premise of what they did was to use eye tracking, not some phone thing, but like where you're looking to understand how people solve problems and how they think and so on.

03:13 And it was mostly a bunch of PhD cognitive science folks.

03:17 And they would work in MATLAB and Excel and all that stuff.

03:19 And I would help write software that would take that stuff and turn it into products and turn it into automation and whatnot.

03:25 And it's a really interesting world.

03:27 And there's a lot more opportunities for code and solving problems with code, especially on the data analysis side in psychology.

03:33 Fully agree.

03:34 Then you might first think, right?

03:36 You think, oh, psychology, that's talking to people on the couch.

03:38 Like sometimes, but not always.

03:40 Not a lot of the time.

03:41 You're right.

03:42 A lot of it is running experiments.

03:43 And like, I'm sure they collected data about eye tracking and you have to process the data.

03:47 And that happens in real life as well.

03:49 We run experiments, A-B testing, and we have to analyze the data.

03:53 Yeah, that was it.

03:54 We would run tons of experiments.

03:55 We would have maybe a week where 50 people came in.

03:58 We'd have like one-way mirrors and all sorts of recording equipment and like analyze that.

04:03 They'd even put ads, I think, on Craigslist.

04:06 All sorts of places.

04:07 Like, hey, we need somebody to come like surf on these websites for half an hour.

04:10 We'll pay you 50 bucks.

04:11 Can you come do that during lunch?

04:12 Be like, yeah, I'll do that.

04:13 That's really cool.

04:14 Yeah, it was really fun.

04:15 But it was really neat to do the programming and stuff there.

04:17 So tell me a little bit more about that transition, because it must have been a little bit challenging, right?

04:22 There's not a lot in your traditional education of psychology that teaches you programming.

04:28 Maybe a little bit, but not a ton, right?

04:29 No, actually, in my traditional education, for psychology, we use SPSS and R.

04:34 SPSS is this proprietary IBM product.

04:37 So nothing about it.

04:38 I did know a bit about the statistics and how to work with data, but that's about it.

04:43 So I think back then, when I first started getting my first job, I had to learn a lot of Python and SQL on my own.

04:51 And, you know, that was the time when, you know, Coursera is available.

04:53 And Coursera is a lifesaver.

04:55 If you ask me, I learned all my Python and data science stuff from Coursera.

04:58 So it was after work, spending one or two hours doing the courses and doing the lessons, the hands-on exercises were amazing.

05:06 And today and nowadays, you have so much available Python resources to help you quickly pick up, get something working, iterate on something, fix the bugs, and, you know, have something that you can play with.

05:15 And it just makes it so fun to learn Python now.

05:17 Yeah.

05:18 There's so much to know in programming these days.

05:20 But at the same time, there's so many resources out there to help you.

05:23 It's really a huge benefit.

05:25 I remember when MOOCs came on, as they called them, like these courses with many, many, many people in them.

05:31 And it was such a revolution at the time.

05:33 And now it's just one of the million options, you know?

05:36 Yeah, I agree.

05:37 I think MOOCs are really the great equalizer.

05:39 I mean, education is the great equalizer.

05:41 And MOOCs, by making it freely available across the internet with great educators, can just teach anyone, not confined to a classroom.

05:47 I think that was amazing.

05:48 And that was what helped me transition from SPSS and R to Python and Spark and machine learning.

05:54 Yeah.

05:55 It opened that door for me.

05:56 Yeah, yeah.

05:57 Instead of studying neurons, you're studying neural networks now.

06:00 Yeah.

06:00 Nice.

06:01 So how about now?

06:02 What do you do day to day?

06:03 You just made a big life change.

06:05 I was reading that you recently moved from Singapore to Seattle.

06:08 That's a big change, even though not a whole lot in between those two things.

06:11 If you draw a line, just a lot of water.

06:13 But yeah, it's still a big change in terms of jobs and whatnot.

06:15 Tell us about that.

06:16 So I think towards the end of 2019, I think, or maybe in the middle of 2019, my wife and I were thinking of, you know, stepping out of our comfort zone.

06:23 I don't know how many of your listeners have been to Singapore, but it's a beautiful place.

06:28 It's a very comfortable, tiny island, amazing weather.

06:31 But we thought, you know, we like to travel.

06:33 So we thought, hey, no, let's try to live somewhere for a bit.

06:35 So we looked at a couple of places, a couple of tech hubs, Seattle, San Francisco, Berlin, Shanghai.

06:42 And it so happens that I got an offer from Amazon.

06:46 Nice.

06:46 Yeah.

06:47 So that's where I am now.

06:48 I'm an applied scientist at Amazon.

06:49 And so what I do, I'm part of the Kindle team.

06:53 So what I do in my day-to-day job is I try to help people read more.

06:57 And we try to do this by helping them find books that they need to find.

07:00 So what I do is I work on recommendation engines, try to help people, you know, as you're finding a book or based on what you have read to the extent that you have read it.

07:08 We know this because we have Kindle data.

07:10 We know how much of a book you have completed.

07:11 We can recommend you new books that you might be interested in.

07:14 Or based on what you browse on the website, we can update our recommendations in real time and recommend you new books as well.

07:20 So that's what I do in my day-to-day job.

07:22 Yeah.

07:22 Well, that sounds really fun.

07:23 I actually really appreciate that from Amazon, specifically around the Kindle, to be honest.

07:28 Like a lot of stores are like, oh, you might also like that.

07:31 I'm like, no, I don't also like that.

07:32 I really don't care.

07:33 You know, there's just so many things where I'm shopping online or whatever, and it just doesn't make sense.

07:37 But specifically for Kindle, I'll be reading a book and it'll say you should also check out these other ones.

07:41 Like a lot of times, like my next book is out of that list.

07:44 I really love it.

07:45 So can't go browse a bookstore these days.

07:47 I mean, we have a fantastic bookstore called Powell's.

07:49 It's like one of the largest bookstores around here in Portland, but you can't even go to it anymore.

07:54 So, yeah, it's pretty cool.

07:55 Thanks for that good list there.

07:56 Yeah, you and your folks.

07:57 I'm happy to hear that.

07:58 And this is a great place where what I'm doing is very aligned with what I do,

08:01 which is trying to help people, very aligned with my values, which is trying to help people learn more by reading more books.

08:07 So that's great.

08:08 Yeah, absolutely.

08:08 And you talk a lot about writing.

08:10 And well, I guess not just talk.

08:12 You also put your time and energy where your words are, right?

08:16 You do a lot of writing.

08:18 And the reason that I wanted to talk to you about having you on the show, like I said, is one of these articles that you wrote.

08:23 But there's a whole bunch of them that you have.

08:26 So there's a couple of people who are really successful in the tech field that I can think of who really make writing an important part of what they're doing.

08:33 Like I had A. Jesse Jiryu Davis on the podcast at the time.

08:38 He's from MongoDB.

08:38 I think he still is.

08:39 Anyway, he talks about like all these design patterns of technical writing, right?

08:43 Like here's the how to, here's the first and like all these really interesting things.

08:47 And I kind of get that vibe from you as well, that like you've got these really strong ideas about writing and how it reinforces your programming side of the world.

08:56 What's your thoughts on that?

08:58 Well, I guess maybe just to share a bit of backstory on why I started writing so much.

09:02 I think a couple of years ago, I was interviewing a couple of mentors, informal mentors, right?

09:08 They didn't know I was getting them to be a mentor, but I would just reach out to people who are two to three steps ahead of me.

09:13 Heads of data science, CTOs, lead data scientists.

09:15 I would ask them the same question.

09:17 What makes an effective data scientist?

09:20 And I would say, you know, is it understanding the business domain deeply or is it PhD research level skills, ninja hacking skills, Python or C++ or Java?

09:29 And a lot of them said, actually, you know, all that is really important, but there's one thing that you're missing out.

09:34 And this one thing is transferable across all your entire career.

09:37 And that one thing they said is communication.

09:39 At that point in time, I didn't believe it.

09:41 I was immature.

09:41 I didn't think it was, I didn't think it makes sense, but I thought so many of these people tried it.

09:45 I owe it to them to, so many of these people mentioned, I owe it to them to try it and update it on my program.

09:50 So for one year, I said, I'm just going to volunteer for whatever writing opportunity, whatever speaking opportunity the company has.

09:57 So I started writing about my side projects.

09:59 I started writing an internal company newsletter to share about what our team, our data science team was doing.

10:05 And I started speaking at conferences and meetups.

10:07 And so that's why I started writing.

10:09 And since then, I found that writing actually helps me learn a lot because I would write about something and I realized, hey, you know, I don't know anything about this thing I'm writing about.

10:16 Yeah, exactly.

10:17 You think, you know, you've learned enough to sort of have some thoughts or, you know, in the programming space, you can maybe get something to work.

10:23 Right.

10:24 But if you've got to explain it, all of a sudden, it's not enough to know, well, there's two ways and I'll just use that way.

10:29 You've got to know, well, here's the two ways.

10:30 What are the trade-offs?

10:32 When should I use which?

10:33 Like you've got to dive into these sorts of details when you're either writing or speaking or presenting about it.

10:38 And it just forces you another level down in your understanding and depth, right?

10:43 Exactly.

10:43 Writing is really difficult.

10:45 And truth be told, I think a lot of people say, you know, you must really love writing.

10:48 No, actually, I don't really love writing, which is a bit weird.

10:51 I love learning and I love sharing about what I've learned online.

10:56 And writing is my vehicle to let me do that.

10:58 So that's why I started writing.

10:59 So why do I talk a lot about writing at work or writing in general?

11:03 There's this question I ask myself a lot.

11:05 I used to ask myself this question and now a lot of people ask me this question.

11:08 Hey, you know, as a data scientist, is your job to write code or is your job to write documents?

11:13 And I used to think my job was to write code, to build systems, to help customers.

11:17 But then as I'm doing this more, I find that, hey, you know, a lot of times before writing code,

11:21 I need to spend a lot of time thinking, researching and designing.

11:25 And the medium of that work is writing documents.

11:29 That's why I encourage a lot of people, you know, it's just like what you said.

11:32 Do I serve my recommendations via Redis cache or via Lambda or API Gateway or whatever?

11:37 And I'm thinking through all these designs and I need to make trade-offs.

11:40 And writing in a document the pros and cons and the rationale for the decision forces me to do that.

11:45 So that when we start implementing things, we don't do things the wrong way,

11:50 which is really expensive, right?

11:51 Implementation is expensive.

11:52 So that's why, and I find that it has really helped me a lot.

11:56 I don't feel that enough people talk about it.

11:59 And I want to encourage people to write a lot.

12:01 Well, yeah, I agree.

12:02 And it seems to me like one of the fundamental jobs of a data scientist, at least one branch of data science,

12:09 is to take the raw information, think about it, and then communicate what it means, right?

12:14 And that seems to go really hand in hand with what you're saying.

12:18 Exactly.

12:18 I think I was doing this one year of writing and speaking practice, right?

12:22 At work.

12:23 And then in the past, I had people come up to me and say that, you know, Eugene, every time I have a meeting with the data scientist on your team for half an hour,

12:31 and then I walk away not knowing anything.

12:33 I don't know what they spoke about.

12:34 And that's because I think, by and large, most people, you know, we tend to use jargon like AUC, ROC, distributed, all that.

12:41 And we assume that business people will know it.

12:43 But I made this mistake as well.

12:45 And then one day, I had a great boss who brought me aside, you know, Eugene, the way you're communicating, no one understands you.

12:50 And I asked, what do you mean?

12:51 And he gave me very good feedback.

12:53 He was a great boss.

12:54 And I started changing how I communicated things.

12:57 And that made me a lot more effective at work.

12:59 So that's how I get started on that.

13:02 Oh, fantastic.

13:03 Well, I really like what you've done with the writing and stuff.

13:05 And so maybe let's spend some time diving into the one that I think is probably the centerpiece of what we'll talk about.

13:12 We'll touch on a few other ones because you do a really good job in your writing of not just putting down your thoughts, but bringing other people's ideas and influences in there.

13:21 So you have a lot of quotes.

13:22 You have a lot of references to other things and so on.

13:26 So I really like the way, the style.

13:28 So let's talk about your article, What Machine Learning Can Teach Us About Life:

13:33 Seven Lessons.

13:33 Why did you write this?

13:34 Because when I was writing some of my previous articles, okay, this is an odd thing.

13:39 I know that my audience are machine learning practitioners.

13:42 And, you know, sometimes I write about things that sometimes I want to write about things that I know will not interest them.

13:47 So this is one of those.

13:49 I like to write about life lessons, right?

13:51 And I know it will not interest them, but I think it's really important.

13:54 I want to write about that anyway.

13:55 So, you know, in order to sneak it in, in this case, machine learning is really just a Trojan horse.

14:00 Yeah.

14:01 Machine learning is a Trojan horse where I sneak in these life lessons that I found really helpful for me.

14:05 And I just, upon some reflection, I just want to write about it.

14:09 And that's it.

14:09 That's how this article came about.

14:11 I love these Trojan horse ideas.

14:13 Like, oh, I'm going to teach you.

14:14 I'll do something fun.

14:15 I'll actually have a lesson, right?

14:16 Yeah.

14:17 And now the secret's out.

14:18 Yeah.

14:18 I haven't actually shared that with anyone yet.

14:19 Now they're going to know.

14:21 That's right.

14:24 This portion of Talk Python To Me is brought to you by Retool.

14:26 Do you really need a full dev team to build that simple internal app at your company?

14:30 I'm talking about those back office apps.

14:33 The tool your customer service team uses to access your database.

14:36 That S3 uploader you built last year for the marketing team.

14:39 The quick admin panel that lets you monitor key KPIs or maybe even the tool your data science team hacked together so they could provide custom ad spend insights.

14:47 Literally every type of business relies on these internal tools.

14:51 But not many engineers love building these tools, let alone get excited about maintaining or supporting them over time.

14:57 They eventually fall into the please don't touch it, it's working category of apps.

15:02 And here's where Retool comes in.

15:04 Companies like DoorDash, Brex, Plaid, and even Amazon use Retool to build internal tools super fast.

15:11 The idea is that almost all internal tools look the same.

15:13 Forms over data.

15:15 They're made up of tables, drop-downs, buttons, text input, and so on.

15:18 Retool gives you a point, click, and drag and drop interface that makes it super simple to build internal UIs like this in hours, not days.

15:27 Retool can connect to any database or API.

15:30 Want to pull data from Postgres?

15:32 Just write a SQL query and drag the table onto your canvas.

15:35 Search across those fields, add a search input bar and update your query.

15:39 Save it, share it, super easy.

15:41 Retool is built by engineers, explicitly for engineers.

15:45 It can be set up to run on-prem in about 15 minutes using Docker, Kubernetes, or Heroku.

15:50 Get started with Retool today.

15:52 Just visit talkpython.fm/retool or click the Retool link in your podcast player show notes.

15:58 So let's go through the seven lessons.

16:01 The first one here is data cleaning.

16:04 Assess what you consume.

16:06 And I'm a big fan of this idea as a life lesson as well.

16:10 So what's the story?

16:11 Wait, before you get into that, like every one of these lessons, you start by saying, okay, here's the machine learning meaning.

16:17 And then here's the sort of follow-on lesson, right?

16:20 So what's the machine learning lesson of data cleaning?

16:22 I think the machine learning lesson of data cleaning is most machine learning practitioners would know when you use noisy data, your machine learning model is going to be noisy and it's just not going to work.

16:31 I think that's this cliche, garbage in, garbage out.

16:34 In machine learning world, that is absolutely true.

16:36 Yeah.

16:37 And cleaning the data itself is actually most of the work in terms of training your model, cleaning the data, refining it, making the model be able to learn from it.

16:45 So that's really important in the machine learning world.

16:47 Yeah, absolutely.

16:48 You have a really interesting quote in here.

16:50 And you also have some really fun pictures, which we can talk about.

16:53 Where is it in here?

16:54 It was Randy Au who shares, data cleaning isn't the grunt work.

16:59 It is the work, right?

17:00 I think so many of these things in a lot of these machine learning and like scientific programming sides, you have these amazing libraries, right?

17:08 Like you can pip install TensorFlow or whatever.

17:11 And then you just feed the data, you know, you got your data frame, you feed it over, boom.

17:14 And like magic happens, right?

17:15 You've got to get that data.

17:17 You've got to like format the data.

17:18 You've got to convert the data.

17:19 Like that's, you've got to understand that it's all correct, right?

17:22 There are cool libraries, like Great Expectations, that are sort of like unit tests for your data to make sure you don't feed in bad data and all that.
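The "unit tests for your data" idea can be sketched in a few lines. This is a minimal plain-Python illustration, not the Great Expectations API (which the episode doesn't go into); the `validate` function, the `user_id` and `age` fields, and the rules themselves (IDs present and unique, age between 0 and 120) are all hypothetical. Each row that breaks a rule is reported before it can reach a model.

```python
def validate(rows):
    """Return (row_index, problem) pairs for rows that break simple data rules."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        user_id = row.get("user_id")
        if user_id is None:
            problems.append((i, "missing user_id"))
        elif user_id in seen_ids:
            problems.append((i, "duplicate user_id"))
        else:
            seen_ids.add(user_id)
        age = row.get("age")
        # Hypothetical rule: an age must be present and plausible
        if age is None or not 0 <= age <= 120:
            problems.append((i, "age missing or out of range"))
    return problems


rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": -5},     # impossible age
    {"user_id": 1, "age": 40},     # duplicate ID
    {"user_id": None, "age": 29},  # missing ID
]
problems = validate(rows)
```

In a real pipeline you would stop or quarantine the batch when `problems` is non-empty rather than training on it.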

17:28 But yeah, I mean, I agree with that statement a lot.

17:30 That's pretty neat.

17:30 Thank you.

17:31 And I linked to Randy Au's post, which is a very good post.

17:34 I think he wrote that, of course, I might be putting words into his mouth, but I think he wrote that because a lot of people think that data preparation is not sexy.

17:42 Data cleaning is just grunt work.

17:43 But he's trying to remind people, no, actually it is the work.

17:47 It is what actually makes a big difference in your analysis outcomes, in your machine learning outcomes.

17:51 And I fully agree.

17:52 Yeah, absolutely.

17:53 All right.

17:53 So life lesson from this one, what's the parallel life lesson you were trying to draw?

17:57 Well, the main life lesson that I was trying to draw is actually the one below that image.

18:02 Okay.

18:03 So you have a really fantastic image and you talk a little bit about food first before you get into what I think is more important.

18:09 Although food is not unimportant.

18:10 But we've all seen these horrible pictures of like the 50s and 60s of doctors recommending cigarettes.

18:16 Like my doctor recommends Camel, not Marlboro.

18:19 You're like, oh my God, what is this?

18:20 But there's a really similar one for sugar, like, here's how you don't get overweight.

18:26 You eat your sugar.

18:27 So you don't eat fatty food.

18:28 Oh my gosh.

18:29 Anyway, so yeah, that's one is food, which is really interesting.

18:32 But then more importantly is news, right?

18:34 And information.

18:35 So if you could just scroll up to the image, just in this image, right?

18:38 We see that food is important and bad food actually makes you bad, makes you unhealthy.

18:43 But in this image, this image is both bad food and bad information.

18:47 There's so much bad information out there on Twitter, on Facebook, on social media.

18:52 And you really have to be careful about what you consume, right?

18:55 Misinformation.

18:56 I mean, it is really easy to consume, you know, these small 200-character tweets or, you know, small empty calories.

19:04 I like to call them empty calories on social media.

19:06 You know, empty-calorie info bites, you call them.

19:09 That's great.

19:09 Yeah.

19:10 There's some influencers who post really short things and, you know, they just go viral.

19:14 And if you consume a lot of that, it actually, you think about, hey, did that actually change my life?

19:19 No, it's actually kind of empty calories.

19:21 And a lot of good writers, David Perell writes about this.

19:24 It's like, you want your content to be very niche, very deep, very high in nutrition.

19:28 So I think that's the same thing.

19:30 When you consume light content, you don't actually gain a lot.

19:34 It can be actually downright toxic.

19:35 A lot of times it's light content.

19:36 It's just one statement, you know, trying to sway information, sway the public.

19:39 So I think for your own sake, curate what you consume.

19:42 I agree.

19:43 I mean, it's so important.

19:44 And there's so many knock on effects, right?

19:46 And it also has some interesting machine learning tie-ins.

19:49 I would say there are some recommendation engine tie-ins here.

19:52 Yes, definitely.

19:53 I think maybe it's, I'm not 100% sure about the attribution, but I think Tony Robbins said,

19:58 you've got to be the guardian of your own mind, right?

20:00 You've got to consciously decide what you let in, what thoughts you let influence you and

20:05 what ones you just eject and say, this doesn't matter.

20:07 Because what you think can really affect you.

20:11 But I was thinking even more, like if I go onto YouTube and I start to consume something

20:15 that's a little kind of crummy, but whatever.

20:17 The very next thing is you're never intense enough for YouTube.

20:21 If you watch three videos on one topic, like it's like you need a hundred more of these

20:25 videos, right?

20:25 It's just, with a lot of social media, and I don't use Facebook enough, but I can imagine

20:29 it's similar.

20:30 Like as you start to trend even a little bit in one way or the other, it just throws you

20:35 a huge rope and tries to pull you hard for the sake of engagement down that path.

20:40 And so small curves these days seem to matter a lot more.

20:43 Like if you used to like grab, I don't know, some crummy newspaper, like a rumor newspaper

20:48 at the grocery store and read it.

20:49 Then you went back and you read the New York Times and you got home.

20:51 The New York Times wouldn't stop showing you the important news

20:54 and start showing you all sorts of junk because you read the rumor paper. But nowadays that's what happens.

20:58 It's crazy.

20:59 It is.

21:00 I think a lot of it is because of just how machine learning works.

21:03 If you click on something and you read something, it thinks that you like that thing and it recommends

21:07 you more of that thing, even though sometimes you just misclicked, or sometimes

21:12 it was a mistake.

21:13 And that's why machine learning and social media can sort of polarize people, which is not what

21:17 we want to do.
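To make the "one misclick" effect concrete, here is a toy sketch of a naive click-count recommender. It is not how YouTube or any real system works: the `naive_recommender` function, the topic names, and the assumption that the user clicks whatever is shown are all hypothetical. With clicks tied, the first topic in the list wins; one extra stray click flips every later recommendation, and each recommendation then reinforces itself.

```python
from collections import Counter


def naive_recommender(history, topics, rounds):
    """Toy recommender: always show the most-clicked topic so far."""
    clicks = Counter(history)
    shown = []
    for _ in range(rounds):
        # max() returns the first topic with the highest count, so ties go to list order
        pick = max(topics, key=lambda t: clicks[t])
        shown.append(pick)
        clicks[pick] += 1  # assumption: the user clicks whatever is recommended
    return shown


topics = ["python", "cooking", "rumors"]
balanced = naive_recommender(["python", "cooking", "rumors"], topics, rounds=5)
after_misclick = naive_recommender(["python", "cooking", "rumors", "rumors"], topics, rounds=5)
```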

21:18 So it really takes, you need to be conscious about how you're affected by it.

21:22 Right.

21:22 Obviously your article is not about this, but we're going to have to reckon with this as

21:25 a society.

21:26 Definitely.

21:26 Yes.

21:27 Period.

21:27 In a big way.

21:28 But let's go on to lesson number two, low versus high signal data and seeking to disconfirm

21:34 and update.

21:35 Tell us about this one.

21:36 So maybe I'll start with the machine learning aspect for it.

21:38 I think what we have here is a support vector machine.

21:41 So, you know, you're trying to separate the blue dots from the red dots.

21:44 And, you know, on the leftmost image, I mean, it's very easy to separate, right?

21:47 So you can see the margin is very wide.

21:49 The margin is the distance from the dashed line to the solid line.

21:53 On the middle image, all of a sudden we introduce a new red dot and the margin becomes very narrow.

21:59 So, you know, your certainty is a lot less and you start to think, hey, you know, maybe

22:02 I'm less certain.

22:02 And then on the right, you start to collect more information, more data points

22:06 around that.

22:07 And then all of a sudden your margin becomes a curved margin.

22:09 So what you thought was true and you were very certain about it on the left side of the

22:15 image now suddenly changes to the right.
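The shrinking margin in those images can be computed directly. The sketch below does not train an SVM; it fixes one straight separating line (the weights `w = (1, 1)`, `b = -5` and all the points are made-up assumptions) and measures its geometric margin, the smallest signed distance from the line to any labeled point. Adding one red point near the line narrows the margin; adding one on the wrong side makes it negative, meaning the straight line no longer separates the data, like the curved boundary on the right.

```python
import math


def geometric_margin(w, b, points):
    """Smallest signed distance from the line w.x + b = 0 to any labeled point.

    Positive: every point is on its correct side.
    Negative: the line misclassifies at least one point.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm for (x1, x2), y in points)


w, b = (1.0, 1.0), -5.0  # assumed line: x1 + x2 = 5
blue = [((5, 5), +1), ((6, 4), +1)]
red = [((1, 1), -1), ((0, 2), -1)]

wide = geometric_margin(w, b, blue + red)                       # about 2.12
narrow = geometric_margin(w, b, blue + red + [((2, 2.5), -1)])  # about 0.35
broken = geometric_margin(w, b, blue + red + [((3, 3), -1)])    # about -0.71
```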

22:18 Yeah.

22:18 You get a little bit more information and you're like, oh, this is not the dividing line or

22:22 the distinction at all.

22:23 It's totally more nuanced or whatever.

22:25 Yeah.

22:26 Exactly.

22:26 So I know I shouldn't be referring to the image because this is a podcast, but I think

22:30 the image is really powerful in terms of how in machine learning, you change your decision

22:34 boundaries.

22:35 And in real life, this is the same.

22:37 So I think Jeff Bezos has a very powerful quote, something like this

22:41 (I actually didn't put it here, but I'm just reminded of it), that says,

22:45 hey, you know, when data disagrees with anecdotes, he tends to prefer the anecdotes because

22:51 a lot of times it means that maybe you're measuring things wrongly.

22:55 So data is very easy to collect.

22:57 We have a lot of data, whereas anecdotes are one or two of those feedback points that

23:01 disagree, and we need to dig into an anecdote and collect more data around it.

23:05 So I think this is the same thing.

23:07 And when you're asking for feedback, often people give you good feedback, you know, you're

23:10 doing great, continue what you're doing, you're doing fantastic.

23:13 And when people give you bad feedback, dig into it, right?

23:17 That is a gift for you to improve.

23:19 Dig into it.

23:20 Ask them, hey, you know, I love what you just said.

23:22 Can you give me more detail?

23:23 And so they're giving you more detail, more information on how you should be thinking, how you should

23:28 be designing, how your code should change in your code reviews.

23:31 And it helps you grow.

23:33 Yeah.

23:33 I think especially in the US, people are very uncomfortable with negative feedback, both

23:37 giving and receiving it.

23:38 If it's given in the right way, it can be very valuable.

23:41 I mean, maybe listening to like what people put on your YouTube video, that might not be

23:45 all that constructive.

23:46 There's a lot of weird, just people with issues at scale.

23:49 But I totally agree.

23:50 And, you know, in a more closed space, right?

23:52 Like a code review or something, right?

23:54 That's definitely an opportunity to learn something.

23:56 Even if the person is wrong, it's still an opportunity to learn their perspective.

23:59 Definitely.

24:00 And learn from it, right?

24:01 So one of the quotes you have in the section is by Karl Popper.

24:04 True ignorance is not the absence of knowledge, but the refusal to acquire it.

24:09 And I think that also goes hand in hand with the polarization and stuff you talked about

24:12 before.

24:12 I agree.

24:13 I think that I adopt the growth mindset.

24:15 I think that people can change, can grow.

24:17 And that's what's necessary, right?

24:18 In our industry, right?

24:19 Things change so fast.

24:20 Yeah.

24:21 So we have to try to keep up with the times in terms of the technology.

24:24 I think fundamentally, we should focus on the problems, but don't neglect how technology

24:28 is changing as well.

24:29 For sure.

24:29 All right.

24:30 Number three, explore, exploit.

24:32 Balance for the greater long-term reward.

24:34 It has to do with reinforcement learning, right?

24:36 Yeah.

24:37 So in reinforcement learning, well, you can imagine that at the start, you don't know anything

24:41 about the state of the world, right?

24:42 Here's an example from my own life.

24:46 Let's say you have a restaurant that you love.

24:48 Let's say you first landed in Seattle.

24:49 When I first landed in Seattle, I didn't know where to eat.

24:51 I didn't know what was good.

24:52 And I would explore many different restaurants, many different takeouts.

24:56 And after exploring, after maybe I explored 10 of them, I found that, hey, you know, this

25:01 place is great.

25:01 It's cheap.

25:02 It's nearby.

25:03 The food is solid.

25:03 And I would just exploit it all the time.

25:06 Exactly.

25:06 Like in my neighborhood, there's maybe two or three Thai food restaurants that I always

25:11 go to.

25:11 And if somebody says, hey, let's get some Thai food, I'm not going to go, well, there's

25:14 another one we haven't tried yet.

25:15 Let's go to, it's just like, nope, that one's good.

25:17 We're going here.

25:18 You know what I mean?

25:18 I am exactly like that.

25:20 I'm such a lazy thinker.

25:21 Yeah.

25:21 And I just like, this is really good.

25:24 It suits my appetite.

25:25 It fits my taste.

25:26 And it's cheap.

25:27 Why do we have to try something new?

25:28 But thankfully, my wife is an explorer.

25:31 So while I'm exploiting, she encourages me, you know, let's explore these new things.

25:36 And sometimes I find hidden gems that I would never have tried.

25:40 But because she's encouraging me to explore, I found it.

25:44 Yeah.

25:44 So that's a balance.

25:46 So at the start of your career or at the start when you're trying to solve a new problem,

25:49 explore.

25:50 Take some time to explore as much as you can.

25:52 But then once you find it, exploit.

25:54 But you know, as you're exploiting, don't forget to also be exploring a bit.
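The "exploit, but keep exploring a bit" balance is what the classic epsilon-greedy strategy for a multi-armed bandit does. This is a minimal sketch of that standard technique, not anything from Eugene's article: the restaurants are hypothetical "arms" whose chance of a great meal (`true_means`) is made up, `epsilon` is the fraction of visits spent exploring at random, and the rest of the time it goes to the best-looking place so far.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit with Bernoulli rewards (1 = great meal, 0 = meh)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # visits per restaurant
    estimates = [0.0] * n_arms   # running average reward per restaurant
    for t in range(steps):
        if t < n_arms:
            arm = t  # try every restaurant once to start
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick one at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit the best so far
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return counts, estimates


# Hypothetical restaurants with a 20%, 50%, and 80% chance of a great meal
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8], steps=5000, epsilon=0.1, seed=42)
```

With `epsilon=0` the strategy can lock onto whichever place happened to look good early; a small nonzero `epsilon` is the "wife who encourages exploring" that lets it discover the better option.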

25:58 Yeah.

25:58 So you tie this back to careers a lot, basically, as you sort of touched on, continuous

26:03 learning, and, you know, don't get too comfortable and don't just fall all the

26:09 way into the exploiting side.

26:11 Like I went to college, I got my engineering degree.

26:13 Why do I need to learn Jupyter?

26:15 I'm just going to keep using MATLAB or Excel and we're just going to keep working on this

26:19 building or bridge or whatever.

26:20 Right.

26:20 Like I think it's easy to do that.

26:22 I spent the time and the money and worked hard and got good grades and a four-year degree.

26:26 Why do I need to keep learning? I'm done with tests and learning.

26:29 I can just stop.

26:29 Right.

26:30 Exactly.

26:30 I was just going to give a very similar example.

26:34 I studied SPSS and R in college.

26:36 Why do I need to learn Python?

26:37 Right.

26:38 Well, you know, Python has a lot of benefits.

26:39 It's a lot faster.

26:40 And then, you know, now that I learned Python, I know SQL, why do I need to learn Spark?

26:45 But, you know, if you explore a bit, it can really help you make your work a lot more

26:50 effective.

26:50 Maybe I know decision trees and linear regressions.

26:52 Do I really need neural networks?

26:54 But, you know, it's always worth taking some time to explore what might be new.

26:58 Sometimes some of this exploration doesn't work out and that's fine.

27:02 If you think of it from a learning perspective, you're learning a lot, but take the time to

27:05 sample around every now and then.

27:07 Yeah, absolutely.

27:07 You have a couple of great quotes in here from two people.

27:11 I find both of them very interesting.

27:12 Naval, a Silicon Valley guy who just goes by Naval, N-A-V-A-L.

27:17 And he has a really interesting tweetstorm that turned into a podcast series and some interesting

27:23 thinking.

27:23 Are you familiar with this?

27:24 Oh, yes.

27:24 I love that.

27:25 Yeah, I do too, actually.

27:26 Maybe it was reading your articles and stuff.

27:28 It reminded me of that.

27:29 But he says your goal in life is to find out the people who need you the most, to find

27:33 out the businesses that need you the most, to find the projects and the art that need

27:37 you the most.

27:38 There's something out there for you.

27:39 And then Matthew McConaughey talks about when he was going, he was in law school and decided

27:45 to go into film school, which is obviously a big career switch.

27:49 And he's like, no, no, I have to do this.

27:50 Right.

27:50 And he was afraid of what his dad would say.

27:52 And instead of saying you've thrown away your career or whatever, he just said, don't

27:57 half-ass it.

27:57 Like, if you're going to do this, you better go do it.

27:59 Right.

28:00 So that's sort of the, Naval's is the explore, McConaughey's, maybe it's the exploit.

28:04 Like once you're in it, go in it full.

28:06 Exactly.

28:06 That being said, I mean, a lot of us, maybe we are a couple of years in our careers, but

28:11 I think what Naval tries to remind us is that you might not be very happy with what you're doing right

28:15 now, or you might love what you're doing right now, but there's always something that

28:19 suits you specifically.

28:21 So some people might love research and they might not fit into a startup environment.

28:25 Or, for example, a big innovation lab might need its

28:31 data scientists to really focus on research, and some people might not fit in there

28:34 if they're more about iterating fast and, you know, shipping fast to customers.

28:37 So there's always something better for you.

28:39 So keep exploring.

28:41 And once you've found it, exploit it like Matthew McConaughey.

28:43 He didn't half-ass it.

28:45 He ran all in and he's doing fantastically well.

28:47 Yeah.

28:47 It worked out okay for him.

28:49 Yeah.

28:49 Nice.

28:51 All right.

28:51 Transfer learning.

28:52 Books and papers are cheat codes.

28:54 Yeah.

28:55 So in machine learning, I think a couple of years ago, there was this thing called transfer

29:00 learning, which didn't make quite a splash, but I thought it was a real breakthrough.

29:03 So what it means is that there's this competition called ImageNet, where you try to classify images

29:08 into thousands of categories, right?

29:10 I think 1,000 categories.

29:12 And big companies like Google and Microsoft would train huge models to classify these.

29:18 And this would be deep neural networks.

29:19 And what people found is that you can take what they have trained, this huge model with

29:24 all the weights and parameters, and you can just chop off the last layer and put your own

29:29 model.

29:29 Maybe you're trying to classify cat versus dog.

29:31 You can use that model and then classify cat versus dog and put in your own data and just

29:36 update the model.

29:37 And it would work fantastically well.

29:40 And orders of magnitude better than if you had to train from scratch.

29:43 Wow.

29:43 Yeah.

29:44 I think that's a cheat code.

29:45 When I first heard about it, I used that cheat code.

29:47 Okay.

29:48 Since I heard about it, I've used transfer learning for work as much as I can.

29:52 If there's a transfer learning model available.

29:53 Yeah.

29:53 So the idea is you, instead of starting with a completely blank set of weights in your model

29:58 and just feeding data and going, no, that's right.

30:00 That's not right.

30:01 That's a dog.

30:01 That's not a cat.

30:02 No, that's yes.

30:03 That one is a cat.

30:03 Good job.

30:04 You can use kind of a vague one to automate some of that driving.

30:08 Is that kind of the idea?

30:09 Like you give it a little bit of knowledge, but not too much.

30:11 And then you keep still teaching it.

30:13 It's that this model is able to distinguish a thousand different things: cats and dogs and hamburgers

30:18 and cars.

30:19 Yeah.

30:19 And you're just taking all that knowledge that's in there and you're just fine tuning

30:23 it for your specific use case of cat versus dog.

30:26 And it really cuts out so much effort.
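Here is a toy, dependency-free sketch of the idea: keep a pretrained feature extractor frozen and train only a new last layer for your own task. It is heavily simplified and hypothetical: `pretrained_features` is a made-up fixed function standing in for an ImageNet-trained network (in practice you'd load real pretrained weights with a deep learning library), the 2D "cat" and "dog" points are invented, and the new head is plain logistic regression trained by gradient descent.

```python
import math


def pretrained_features(x):
    """Hypothetical frozen backbone; its 'weights' are never updated below."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.1, epochs=200):
    """Train only a new logistic-regression last layer on top of frozen features."""
    feats = [pretrained_features(x) for x in samples]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y  # gradient of log loss
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0  # 1 = dog, 0 = cat


cats = [(-2, -1), (-1, -2), (-2, -2)]
dogs = [(2, 1), (1, 2), (2, 2)]
w, b = train_head(cats + dogs, [0, 0, 0, 1, 1, 1])
```

Only `w` and `b` are learned; everything the backbone "knows" is reused as-is, which is why so little data is needed.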

30:28 Yeah, absolutely.

30:29 Very, very cool.

30:30 So in this one, the life lesson here, and we were touching on this before, is that people feel

30:34 like a lot of times they've gotten, they've got their degree, they've studied, they've

30:37 worked hard.

30:37 Like they just want to have fun and live their life and not just keep going.

30:40 But like the way to think of education, formalized education, let's say, even I would say college

30:46 is this generalized pre-training, right?

30:49 You're not ready to really go do the thing.

30:50 You haven't really learned the thing, but you're much closer than somebody who hasn't, right?

30:55 Something like that.

30:55 Schools, generalized pre-training.

30:57 Yes, I fully agree.

30:58 I think a lot of people, at least from what I've seen, feel that after

31:03 they graduate, they're done learning.

31:05 But I don't know about you, but after I graduated, after I got into working world, I realized,

31:09 hey, I didn't learn any of this stuff.

31:11 Yeah.

31:11 In the working world, I had to learn a lot more.

31:14 Almost none.

31:15 I think the only thing that I learned at school that helped was Excel and R and SPSS, the technical

31:20 skills.

31:20 So school sort of trains, you know, teamwork, how to work in a project team, communication,

31:26 how to learn fast because, you know, every semester you'll be taking four subjects.

31:30 It teaches you the general stuff.

31:31 And then once you get to work, there's a lot of on-the-job training that you have to do yourself.

31:35 So the point that I was trying to make here is that school is really just the start of

31:40 it.

31:40 It's generalized pre-training.

31:41 You could stop there.

31:42 But if you fine-tune, if you take the effort to fine-tune your model from school onto your

31:48 very specific tasks, you would be orders of magnitude more effective.

31:52 That's the message I'm trying to give.

31:53 And I also want to tell people, and this is the next paragraph, is that we have transfer

31:58 learning models, right?

31:59 And in real life, the equivalent of transfer learning models is books and

32:04 academic papers.

32:05 I want to try to get people to read more books.

32:08 I've gained a lot from books.

32:10 I've gained a lot from academic papers.

32:12 And you can imagine, I don't know, maybe you read a book like Sapiens by

32:16 Yuval Noah Harari or Deep Work by Cal Newport, who I'm a big fan of.

32:21 They have thought about this, like in Naval's thread, right?

32:24 They have thought about this for so long, five years, decades, and they have compressed it in

32:29 a book that you can read in eight to 10 hours.

32:31 Read it and you'll be that much smarter.

32:34 And you'll see life in a different way and you'll gain a lot.

32:37 Yeah, that's super interesting.

32:39 I do agree with it that you don't necessarily have to agree with everything they say.

32:42 It doesn't have to match your situation, but it does give you a whole lot more experience

32:47 without going through the hardship of getting that experience.

32:50 Exactly.

32:50 And I think it's magic.

32:52 Yeah.

32:52 That's what separates human beings from animals, right?

32:55 Where we can transfer knowledge.

32:56 We can perform telepathy.

32:58 I can transfer knowledge to you, to anyone on the internet by writing.

33:02 And people did that through books in the past.

33:04 Yeah, very interesting.

33:05 So that's magic.

33:06 So you say books are the weights and biases of the great thinkers who've come before us.

33:11 That's pretty awesome.

33:14 This portion of Talk Python To Me is sponsored by Linode.

33:16 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

33:20 Develop, deploy, and scale your modern applications faster and easier.

33:25 Whether you're developing a personal project or managing large workloads, you deserve simple, affordable, and accessible cloud computing solutions.

33:32 As listeners of Talk Python To Me, you'll get a $100 free credit.

33:36 You can find all the details at talkpython.fm/Linode.

33:40 Linode has data centers around the world with the same simple and consistent pricing regardless of location.

33:46 Just choose the data center that's nearest to your users.

33:49 You'll also receive 24/7/365 human support with no tiers or handoffs regardless of your plan size.

33:56 You can choose shared and dedicated compute instances, or you can use your $100 in credit on S3 compatible object storage, managed Kubernetes clusters, and more.

34:06 If it runs on Linux, it runs on Linode.

34:08 Visit talkpython.fm/Linode or click the link in your show notes, then click that create free account button to get started.

34:16 On this, I wanted to ask you about what you thought about, quote, new media, right?

34:22 Like, I'm thinking in particular YouTube, but other places, right?

34:26 Like, these books, the tradition of meaningful books, are very important and long, and everybody knows about them, right?

34:33 But, you talked about the MOOCs before.

34:35 I mean, YouTube, I was talking about the negative aspects of it before, but there, it's amazing what you can learn if you go over to places like YouTube with the desire to seek out this kind of information.

34:46 Like, there's so much good stuff there mixed in with, like, cat videos, right?

34:52 Yeah.

34:53 What do you think?

34:53 I think that I am ambivalent to it.

34:57 I think I have a preferred way of learning, which is to read books.

35:00 Nonetheless, I know that video is awesome in the sense that I can write some code, I can iterate, and you can see me doing it.

35:05 It's almost impossible to convey that in a book, maybe some code examples.

35:09 So, that's really amazing.

35:11 I think new media has strengths, and the difference between new media and traditional media is that new media is just a lot more powerful.

35:19 A lot more powerful.

35:20 It can be powerful for good, and also powerful for not so good.

35:23 A lot more powerful in the sense that, you know, MOOCs are amazing, right?

35:26 A professor can teleport into your house and teach you.

35:29 That's powerful.

35:30 But powerful for not so good in the sense that we have things that reduce your attention span, like TikTok videos or short snippets of articles, and that's also very powerful.

35:39 It's just whether it's for good or not so good.

35:42 So, that's something to think about when you're using new media to learn.

35:46 It is.

35:46 It's definitely more risky in that you can get distracted and pulled away because all those places are about that for sure.

35:52 You know, like, I was recently watching a video of an opera singer and, like, vocal coach analyzing, like, this heavy metal singer and, like, dissecting the song from a perspective of, like, an opera singer.

36:04 And I'm like, I appreciate both of those art forms way more having seen, like, that experience.

36:10 And I would never have that experience.

36:12 Exactly.

36:12 And I would never go down to the bookstore and pick up a book on, like, singing theory and stuff.

36:17 I just wouldn't, right?

36:19 I would find something else to spend my time on.

36:21 But anyway, like, those are the kinds of things I'm thinking that you just, you wouldn't expect.

36:24 But you can find interesting things there, right?

36:26 Definitely.

36:27 Definitely.

36:27 Hey, I want to go back to this one really quick because I skipped over it right at the end.

36:32 But you talk about this exploring versus exploitation thing and saying, look, you don't always have to worry so much about this stuff because sometimes many things are what you call two-way doors.

36:42 And there's this interview over here of Jeff Bezos.

36:45 And he has this distinction of some decisions being two-way doors and some of them being one-way doors.

36:51 And you shouldn't put the same concern, worry, energy, and whatnot into both types of decisions.

36:57 They're not equal, so don't treat them equal.

36:58 Can you speak to that real quick?

36:59 I thought that was super interesting.

37:01 Yeah.

37:01 I'm just going to use Jeff's example.

37:03 For example, maybe you're doing a side hustle and someone wants to acquire it or someone wants to completely purchase it from you.

37:10 That's a one-way door that, you know, after you sold it, there's no way to reverse it, right?

37:13 Yeah.

37:14 However, maybe you start a side hustle and you're thinking, hey, should I be targeting Python programmers or R programmers or, I don't know, Scala programmers?

37:24 That's a two-way door.

37:26 You can easily pivot.

37:27 So, I think what Jeff is saying is that one-way doors are very difficult to reverse, but they're a lot rarer.

37:33 They happen a lot less.

37:34 And a lot of us, a lot of people treat two-way door decisions as one-way door decisions.

37:39 And I think he's saying that, hey, you know, it's good to distinguish between the two and devote the appropriate amount of energy and due diligence and analysis to each.

37:48 And that helps you make more effective decisions more efficiently.

37:53 Yeah.

37:53 I totally agree.

37:54 I thought this was, like I said, I thought this was insightful.

37:57 And I feel like in programming and in code, a lot of people get stuck in the so-called analysis paralysis.

38:04 They're just stuck, like, trying to decide.

38:06 Like, every decision is, like, it's overwhelming and it's hard to decide what to do because you feel like you might make the wrong decision.

38:12 You don't have enough experience.

38:13 Like, so much of those things, you can just, oh, we'll just refactor this later.

38:16 Or we can just throw away this later.

38:17 Like, oh, I'm going to use a relational database.

38:19 Oh, we should have used a NoSQL database.

38:21 We'll just throw it away and switch it over.

38:23 It's, like, probably not that big of a deal versus we're going to let all the Python guys go and hire a bunch of Java guys and rewrite the whole thing.

38:30 Like, if you're a year into that decision, you're fairly committed, right?

38:33 Like, so there's really, I think, understanding, like, oh, that's a two-way door decision allows you to just, like, try it and experiment.

38:40 Fully agree.

38:40 And hopefully, understanding the difference between one-way doors and two-way doors makes it easier to explore, right?

38:46 Yeah.

38:47 I agree.

38:47 I think that's worth calling out.

38:48 All right.

38:49 Iterations.

38:49 Find reps you can tolerate and iterate fast.

38:52 So I think machine learning, a lot of machine learning involves iteration.

38:55 Clearly, neural networks are iterations.

38:57 Every time you pass the data through over multiple epochs, the model learns, and with each iteration, with each epoch, the model's error goes down.

39:06 The model provides better predictions and does better on your metrics.
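The "error goes down with each epoch" loop is easiest to see on a model much smaller than a neural network. This sketch is an assumption-laden stand-in: plain linear regression on made-up data (`ys = 2 * xs + 1`), trained by full-batch gradient descent, so one epoch is one pass over the data and one update. The recorded mean squared error shrinks every epoch, the same iterate-and-improve principle that neural network training relies on.

```python
def train_linear(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x + b by gradient descent, recording the error at each epoch."""
    w, b = 0.0, 0.0
    losses = []
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)  # mean squared error this epoch
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w  # one small step downhill per epoch
        b -= lr * grad_b
    return w, b, losses


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # hypothetical data generated by y = 2x + 1
w, b, losses = train_linear(xs, ys)
```

The learning rate matters: too large and the error bounces around instead of shrinking, which is a decent metaphor for trying to absorb a paper in one heroic pass.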

39:11 That is the same with life.

39:12 I think a lot of people expect, this is what I used to expect.

39:16 I expected that I would read something once and I would fully understand it.

39:19 All of the knowledge is in my brain.

39:21 And then, you know, I realized that that is never true.

39:23 And actually, that lowered my expectations of myself.

39:27 You know, sometimes I'll read a paper and then, you know, I'll try to discuss it and realize I don't actually know the details.

39:32 That lowers expectation of myself.

39:33 You know, I tell myself, Eugene, by reading it once, you're never going to get it.

39:37 Yeah.

39:37 Don't expect that of yourself.

39:39 It's too high a bar.

39:40 So maybe read it a few times.

39:42 Be kinder to yourself.

39:43 So that's the same thing.

39:44 When I read papers, I go through it multiple times.

39:47 When I do A/B tests, I fail two times for every one time I succeed.

39:52 So I fail something like 50 to 75% of the time.

39:54 And you just got to learn to be kinder to yourself as you iterate, right?

39:58 I think I've posted some examples here.

40:00 The Angry Birds developers failed 51 times.

40:03 Sir James Dyson failed 5,000 times in 15 years before his vacuum cleaner worked, right?

40:09 And, you know, imagine if he gave up, we would never have that.

40:12 Yeah, yeah.

40:13 But again, a lot of great examples here about people who just iterated and just stuck to it.

40:17 I think it also sort of ties into the previous one, right?

40:20 The two-way doors.

40:21 It's okay if it doesn't work on a lot of these types of things.

40:23 Just keep going.

40:24 Just try again, right?

40:25 Eventually, you'll find one that fits, yeah?

40:27 Exactly.

40:27 Yeah.

40:28 Speaking of fitting, overfitting, focus on intuition and keep learning.

40:31 So I think overfitting is when your machine learning model

40:36 memorizes the training set too much and can't predict well on new data.

40:40 Right.

40:40 It's almost perfect on the training set.

40:42 Like it knows that's a dog, but it's so specific that with any slight variation, even though

40:46 it should be a dog, it doesn't know it's a dog.

40:49 Exactly.

40:49 An example of this is when your machine learning model learns from customer IDs.

40:53 And, you know, when new customer IDs come in, it's just crap.
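Here is a toy illustration of the customer-ID trap. The data and both "models" are hypothetical. The first model memorizes the label for every customer ID it saw in training. The second learns a threshold on a behavioral feature (number of past purchases). The memorizer is perfect on the training data, but for new IDs it can only fall back to guessing the majority class.

```python
from collections import Counter


def make_customers(id_start, n):
    """Hypothetical toy data: (customer_id, past_purchases, bought_again)."""
    rows = []
    for i in range(n):
        purchases = i % 7  # deterministic toy feature
        rows.append((id_start + i, purchases, purchases >= 3))
    return rows


train = make_customers(0, 100)
test = make_customers(1000, 100)  # same behavior, brand-new customer IDs

# Model 1: memorizes the label for each customer ID it has seen.
memory = {cid: label for cid, _, label in train}
majority = Counter(label for _, _, label in train).most_common(1)[0][0]


def predict_by_id(cid, purchases):
    return memory.get(cid, majority)  # unseen ID: fall back to the majority class


# Model 2: learns a threshold on the behavioral feature instead.
def fit_threshold(rows):
    best_t, best_acc = 0, -1.0
    for t in range(8):
        acc = sum((p >= t) == y for _, p, y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


threshold = fit_threshold(train)


def predict_by_feature(cid, purchases):
    return purchases >= threshold


def accuracy(model, rows):
    return sum(model(cid, p) == y for cid, p, y in rows) / len(rows)


print("memorize IDs   train/test:", accuracy(predict_by_id, train), accuracy(predict_by_id, test))
print("learn behavior train/test:", accuracy(predict_by_feature, train), accuracy(predict_by_feature, test))
```

The ID-based model scores 100% on training data and no better than the base rate on new customers. The feature-based model carries over to new customers because it learned the underlying pattern rather than the individual examples.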

40:57 So I think the clearest, the person who really pushes for this is Richard Feynman.

41:01 The way he teaches math and physics follows the same idea.

41:04 He goes directly into intuition.

41:06 Forget about formulas or forget about memorizing stuff.

41:09 If you understand the intuition, you will understand it better and you can generalize across

41:15 many, many things.

41:15 I think in life, it also makes the same sense.

41:18 Don't try to memorize things or don't try to memorize knowledge, right?

41:22 I think if you have the intuition of the fundamentals, you'll find that it transfers across many,

41:28 many different domains.

41:29 For example, I think Elon Musk talks about knowledge as a tree.

41:32 So, you know, the fundamentals of the tree are the trunk.

41:35 That thick trunk is the fundamentals that supports all the branches.

41:38 You want to make sure that the intuition is like the trunk.

41:41 You want to make sure that your trunk is solid and then you can build on new branches or cut

41:45 off new branches and grow new branches as necessary.

41:47 So I think the way to grow this intuition, at least for me, I find that being a beginner

41:52 is the best way to do this.

41:53 Yeah, absolutely.

41:54 Okay.

41:54 Very interesting.

41:55 The last one has to do with ensembles and ensembling.

41:58 Diversity is strength.

42:00 What is ensembling?

42:01 Yeah, I guess ensembling is that I could train a model, maybe a linear regression, and another

42:06 model, maybe a decision tree, and then another model, maybe a k-nearest neighbors.

42:09 And they would all have their different errors.

42:12 They would all have their different biases and strengths.

42:13 But the unusual thing in machine learning is that, you know, you can just take all their

42:17 predictions and average them, and the average would do better than any of them on its own.

42:21 And actually that's a cheat code that everyone uses in Kaggle competitions.

42:25 You just train thousands of models and just combine all of them.
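Here is a small, hedged sketch of the averaging Eugene describes, using the same three model families he lists: a linear regression, a decision tree (here only a one-split "stump"), and k-nearest neighbors. All three are written from scratch in NumPy. The toy data (noisy samples of sin(x)) and every hyperparameter are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression problem (assumption): noisy samples of sin(x) on [0, 6].
x_train = np.sort(rng.uniform(0, 6, size=80))
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0, 6, 200)
y_test = np.sin(x_test)


def fit_linear(x, y):
    """Ordinary least squares line: y ~ a * x + c."""
    design = np.column_stack([x, np.ones_like(x)])
    (a, c), *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda q: a * q + c


def fit_stump(x, y):
    """Depth-1 regression tree: one split, predict the mean on each side (x must be sorted)."""
    best = None
    for t in (x[:-1] + x[1:]) / 2:  # candidate splits between neighboring points
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_mean, right_mean = best
    return lambda q: np.where(q <= t, left_mean, right_mean)


def fit_knn(x, y, k=5):
    """k-nearest-neighbors regression: average the targets of the k closest points."""
    def predict(q):
        dist = np.abs(q[:, None] - x[None, :])  # (n_query, n_train) distances
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


models = {
    "linear regression": fit_linear(x_train, y_train),
    "decision stump": fit_stump(x_train, y_train),
    "5-nearest neighbors": fit_knn(x_train, y_train, k=5),
}


def mse(pred):
    return float(np.mean((pred - y_test) ** 2))


preds = {name: model(x_test) for name, model in models.items()}
ensemble = np.mean(list(preds.values()), axis=0)  # simple unweighted average

for name, pred in preds.items():
    print(f"{name:>22}: test MSE {mse(pred):.3f}")
print(f"{'average of all three':>22}: test MSE {mse(ensemble):.3f}")
```

The hard guarantee behind averaging is modest. Because squared error is convex, the averaged prediction can never do worse than the average member. It often beats even the best single member when the members' mistakes are not strongly correlated, which is the "different errors, different biases" point above.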

42:29 It reminds me of like a much simpler example.

42:32 There's sort of the wisdom of crowds.

42:34 Like you hear stories of people saying, look, here's a jar, a big glass jar full of jelly

42:39 beans.

42:39 You got to guess how many jelly beans there are.

42:41 Any given person will overestimate or underestimate by a whole lot.

42:45 But if you ask a hundred people, it's usually really close to the actual number.

42:48 Or there are some weird examples of this at, like, state fairs: there'll be a cow, and people

42:53 have to guess, like, how much does the cow weigh?

42:56 You know, it's like a competition and people get it really wrong.

42:58 But it's usually really close if enough people participate and you average their answers, right?

43:03 Exactly.

43:03 So having diversity, diversity of opinions, diversity of thoughts, I think is very powerful.

43:08 Yeah.

43:09 So that's what I'm trying to encourage here as well.
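The jelly-bean effect has a tidy identity behind it, sometimes called Scott Page's diversity prediction theorem. It says the crowd's squared error equals the average individual's squared error minus the spread (diversity) of the guesses. Here is a minimal simulation with a made-up jar of 1,000 beans and 100 guessers. Their errors are assumed to be independent and unbiased, which is the favorable case.

```python
import numpy as np

rng = np.random.default_rng(7)

true_count = 1_000  # hypothetical jar
guesses = true_count + rng.normal(0, 300, size=100)  # independent, unbiased errors (assumption)

crowd_guess = guesses.mean()
crowd_error = (crowd_guess - true_count) ** 2
avg_individual_error = np.mean((guesses - true_count) ** 2)
diversity = np.mean((guesses - crowd_guess) ** 2)  # how spread out the guesses are

print(f"crowd guess: {crowd_guess:.0f}")
print(f"crowd error {crowd_error:.0f} = "
      f"avg individual error {avg_individual_error:.0f} - diversity {diversity:.0f}")
```

The identity holds for any set of guesses. What it does not promise is that the crowd is right: if everyone shares the same bias, the diversity term stays small and averaging cannot cancel that shared mistake. That is why diversity of opinions matters.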

43:11 Yeah.

43:11 So what's the story about life here, instead of, like, guessing the weight of cows, which is not

43:15 all that practical?

43:16 Well, I think that one way, okay, maybe a quick one: when you are trying

43:21 to build teams, you might want to deliberately try to find people who are different from

43:27 you, right?

43:27 People who complement your strengths.

43:29 Sometimes in tech interviews, we want to find people that are similar to us, have the

43:33 same skill sets that, you know, fit this mold, fit this job description.

43:37 That's useful.

43:38 It's effective.

43:39 But I personally have built teams that are very diverse, maybe from different countries

43:43 or one third female.

43:45 And I found that the creativity that comes from this is really powerful.

43:49 And the other one, which is, I think, of course, Scott Adams is known for this.

43:52 He says that, you know, if you can't be the top of your field, combine multiple superpowers

43:56 like Scott Adams did.

43:57 He combined his ability to draw, his sense of humor and his business know-how and he created

44:01 Dilbert, and no one else can replicate Dilbert.

44:05 It needs someone like Scott Adams to do that.

44:07 Yeah.

44:08 You know, that general idea, I actually hit on this a lot because there are a lot of people

44:12 who listen to this show who are not traditional computer science developers, traditional data

44:16 science folks.

44:17 And I think sometimes they feel like they don't have quite the same skill set to compete with

44:23 those people.

44:23 And how are they going to compete with somebody with a master's degree from Stanford in computer

44:27 science?

44:27 And my thought on all this is, look, if you're really good at economics and

44:33 you're pretty good at programming, there's not too many people who have both of those skills,

44:36 right?

44:37 Like all of a sudden you go from competing with a hundred thousand down to like 500 or something.

44:41 I don't know.

44:42 Maybe that's a little bit extreme, but, you know, if you need that intersection

44:45 of skills, all of a sudden it becomes super powerful.

44:48 And what you're suggesting here is maybe like building teams.

44:51 You can kind of build that in the team rather than in an individual.

44:54 Exactly.

44:55 And I want to go back to your previous example, someone who's maybe a decent programmer, but,

44:59 you know, can't compete with someone who graduated with a degree in, degree in CS and a

45:03 master's in CS and PhD in CS.

45:04 It goes back to Naval's tweet, right?

45:07 There's something that is just right for you that can tap on your skills in economics and

45:12 computer science.

45:13 You just need to find it and that'll be a great fit.

45:15 Yeah, absolutely.

45:16 All right.

45:17 Well, those were the seven items, and I, you know, I enjoyed thinking about them and just seeing

45:22 how these machine learning examples maybe can be analogies for living life.

45:27 It's pretty cool.

45:28 Thank you.

45:28 Yeah.

45:29 Yeah, absolutely.

45:30 A couple of other things real quick: you've written a couple

45:33 of things on productivity as a developer and in the tech field.

45:38 One article is called "How to Accomplish More with Less: Useful Tools and Routines."

45:42 And then also "Routines and Tools to Optimize Your Day," which is a guest post by Susan Hsu.

45:48 Yep.

45:48 So those are really interesting.

45:50 But in particular, in one of them, I don't remember which, you talk about an article

45:56 by Paul Graham called "Maker's Schedule, Manager's Schedule."

46:03 And, you know, I think you talk a lot or we talk in the tech field a lot about getting into flow,

46:08 really programming and just having uninterrupted time.

46:12 And yet I think probably more than ever, people are being pulled in different directions because

46:17 everyone is just a Zoom call away.

46:19 Even if you're not in the office, you're now just as eligible to be sucked into a meeting

46:23 as anyone else.

46:24 Right.

46:24 Yep, definitely.

46:26 Yes.

46:26 Can you talk real quickly about this?

46:27 Because I think it's a short article, but I think having awareness of this idea of a maker's

46:32 schedule and a manager's schedule and how they're not super compatible and you got to be careful

46:37 to help them coexist.

46:38 I think that's important.

46:39 Yeah.

46:40 So I think that, of course, all credit goes to Paul Graham for this.

46:43 For makers, to even start to design something or to start to code a framework, you sort of

46:48 need, I don't know about you, but it takes me like 30 minutes to warm up, to load

46:52 all the concepts into my memory so I can start juggling them in my head.

46:55 And then, you know, once I load all that into my head and then I can, okay, I can start

46:58 writing pseudocode, you know, tweaking things, testing things iteratively.

47:02 And that takes time.

47:03 And it takes me maybe about 45 minutes, 60 minutes to get into flow.

47:07 And once I'm in the flow, I'm moving really quickly, like speeding things through or once

47:11 I'm in the flow of, you know, fixing a bug.

47:12 And I don't know about you, but if I don't fix the bug, I can't stop.

47:16 I can't go for lunch.

47:16 And that motivation, that drive, if someone pulls me into a meeting, it sort of kills the

47:21 motivation sometimes for the day.

47:23 And, you know, if you had continued for just 30 minutes, you would have fixed it.

47:27 But if it's broken by something in the middle, it's gone.

47:31 So how I try to do this is that I actually deliberately block my time in the morning before

47:36 lunch.

47:37 I actually block it out with meetings, my own meetings, so that I can actually use that time

47:40 to get in the flow.

47:41 Depends on when your energy level is highest.

47:43 For me, it's actually in the morning.

47:44 And then I actually have, I say that when people want to ask for a meeting, I say, oh, sure,

47:48 let's do it after 3 p.m.

47:49 If you're okay with it.

47:50 Because after 3 p.m., I mostly can't do deep work anyway.

47:53 So I think that's useful to be aware of.

47:55 Yeah, it's really interesting.

47:56 And I agree with that.

47:57 Paul talks about this: if you're on a manager's schedule, what do you do?

48:01 You go from meeting to meeting to meeting.

48:02 And if you've got an hour gap in your day, you know, oh, you could just meet with somebody

48:06 else.

48:06 Maybe that's like a time to just set up a meeting so you could get to know somebody and dig in

48:10 with that person or the team or the project.

48:12 And that's fine if you're on a manager's schedule.

48:15 But if you're on the maker's schedule, maybe you do need the whole morning uninterrupted so

48:19 that you can get into that.

48:20 You know you've had a good session when, you know, you've been programming,

48:25 programming, and then you stop for a second.

48:26 And you're like, wow, I'm hungry.

48:28 I really have to go get something to eat.

48:29 It's like three in the afternoon.

48:31 I forgot to eat lunch.

48:32 Like, that's totally possible that that happens, right?

48:34 And I would just want to say that these sessions feel so fulfilling, feel so satisfying.

48:38 You feel like you've gotten so much work done in such a compressed amount of time that, okay,

48:43 and now you can, sure, I can have office hours now.

48:46 So those sessions are really fulfilling.

48:47 Yeah.

48:48 I don't know how this is going to work out.

48:49 But after reading this and some of your other writing, I decided on my calendar, I'm just

48:54 blocking, like, Tuesday and Friday, like, all day.

48:57 And I'm just going to call those maker days.

48:58 We'll see how that works out.

48:59 Wow.

49:00 And see if I can just, like, get a lot of stuff done.

49:01 On other days, I'll have more meetings.

49:03 We'll see.

49:03 I don't know.

49:04 I'm looking forward to hearing your experience.

49:05 Yeah, absolutely.

49:06 And Chris May.

49:07 Hey, Chris.

49:08 Out in the live stream says, personal productivity brings superpowers to the powers you got by learning

49:12 Python.

49:13 Totally agree.

49:13 Yep.

49:14 Yeah, awesome.

49:14 Very, very cool.

49:15 All right.

49:16 Well, I think that's probably about all the stuff that we have time to talk about.

49:20 Although, maybe really quickly, you could touch on the bottom of your homepage.

49:25 You've got a bunch of resources.

49:26 Maybe just highlight something you think that people would find valuable there.

49:29 So, yeah, again, a lot of these are answers to questions that people ask me.

49:33 So people ask me, you know, what are your favorite papers?

49:35 What paper should I read?

49:36 So that's the second one on the list, Applied ML, where, you know, I try to collect papers on

49:41 real-world machine learning by companies that have implemented it.

49:44 The lessons they learned.

49:45 And, you know, sometimes people ask me, you know, wow, I'm starting to get into this field.

49:48 There's so much to learn.

49:49 And that's the third link there, where I find machine learning surveys, where, you know,

49:53 people summarize what has happened in the past.

49:55 And, of course, you know, people ask me things like, you know, how do you set up your Python

50:00 repo so that you have code reviews and all that automatically or linting?

50:03 So I have things like, you know, the Python collab template or, you know, how to test machine

50:07 learning models.

50:08 And, of course, recently I wrote about how to write machine learning design docs, design documents

50:14 and, of course, I have that as well.

50:15 So, you know, I mean, some of these are Git repos.

50:17 Some of these are just articles.

50:19 And, of course, there's the email course, which is a lot of people ask me, you know, what makes

50:23 an effective data scientist?

50:24 This is the question I asked a lot of my mentors five years ago.

50:28 And I try to summarize the five lessons that I've learned in a short email course where I only

50:34 send you one lesson a day.

50:35 And, of course, there's a short exercise that I hope people will do.

50:39 And that's why I send you one lesson a day.

50:40 And that short exercise maybe takes an hour each.

50:42 And I hope that after this, it sort of opens your mind that, you know, being an effective

50:46 data scientist is beyond coding well, is beyond PhD level research, is beyond math.

50:52 Cool.

50:52 Yeah, that looks really useful.

50:54 And you also have papermill-mlflow.

50:57 What do you think of Papermill?

50:58 I started using this because I wanted to run rapid experiments in Jupyter notebooks.

51:04 And MLflow is something that, you know, helps you track your machine learning models.

51:08 Papermill allows you to parameterize.

51:10 At least how I'm using it is I'm parameterizing my Jupyter notebooks.

51:13 By combining both of them, I have a master Jupyter notebook that has all the different params

51:18 and all the different countries and marketplaces.

51:20 I just run that huge Jupyter notebook and all the experiments are logged.

51:23 So I love it.

51:24 So I decided to make it a template that other people can use as well.
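Here is a rough sketch of how a papermill-driven experiment sweep can be wired up. It is not the papermill-mlflow template itself. The notebook name `experiment.ipynb`, the parameter names, and the `runs` output folder are all hypothetical. papermill's `execute_notebook(input_path, output_path, parameters=...)` injects the values after the notebook's cell tagged `parameters`. The notebook would then log those values and its metrics to MLflow, for example with `mlflow.log_params` and `mlflow.log_metric`.

```python
from itertools import product
from pathlib import Path

# Hypothetical sweep: one notebook run per (country, learning rate) combination.
COUNTRIES = ["us", "uk", "de"]
LEARNING_RATES = [0.01, 0.1]


def build_runs(countries, learning_rates):
    """Expand the grid into one parameter dict per notebook run."""
    return [
        {"country": country, "learning_rate": lr}
        for country, lr in product(countries, learning_rates)
    ]


def run_all(runs, execute, template="experiment.ipynb", out_dir="runs"):
    """Execute the template notebook once per parameter set.

    `execute` is passed in so the loop can be tried without papermill;
    in real use, pass papermill.execute_notebook.
    """
    outputs = []
    for params in runs:
        out_path = Path(out_dir) / f"{params['country']}_lr{params['learning_rate']}.ipynb"
        execute(template, str(out_path), parameters=params)
        outputs.append(out_path)
    return outputs
```

In real use, you would create the output folder first (`Path("runs").mkdir(exist_ok=True)`) and then call `run_all(build_runs(COUNTRIES, LEARNING_RATES), papermill.execute_notebook)`. Each executed copy of the notebook is kept as a record of that run, and MLflow holds the comparable metrics.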

51:27 Well, yeah, people can check out all those things.

51:29 We'll put them in the show notes; they're also on your website.

51:31 All right.

51:31 Well, I think that's probably it for all the time we have.

51:34 So let me ask you the final two questions before you get out of here.

51:37 You've written about two options.

51:38 So I don't know which one you're going to go with here.

51:40 But if you're going to write some Python code outside of Jupyter, what text editor

51:44 do you use?

51:44 I have an answer.

51:45 And I'm curious about your answer as well.

51:47 For me, I'm a diehard PyCharm fan.

51:49 I've tried using VS Code.

51:51 It just doesn't feel as snappy, or its IntelliSense as smart, as you'd think.

51:55 I've been using VS Code for my JavaScript.

51:57 But what's your take, Michael?

51:58 Should I be using VS Code more?

52:00 Look, I'm a fan of people using VS Code.

52:02 And I know a lot of people love it.

52:03 The style of PyCharm is exactly, it just fits my brain.

52:06 Like, I feel that it just so perfectly understands the project I'm working on, that it's the right

52:12 tool for me as well.

52:13 That's me.

52:13 And I do some Scala on the side.

52:16 And PyCharm has a Scala sister, which is IntelliJ.

52:18 So I'm still a diehard PyCharm fan.

52:21 Yeah, right on.

52:22 And then a notable PyPI project or package, something out there, maybe not the most popular,

52:28 but you're like, oh, I found this thing and it was super helpful.

52:30 Well, off the top of my head, I cannot think of anything, honestly.

52:33 But one thing that I love, that I hope people will love, is pytest.

52:37 Yeah.

52:38 All right.

52:38 Yeah, pytest is super good.
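For anyone who hasn't used pytest yet, here is a small, hypothetical example in the spirit of testing ML code. It defines a toy logistic `predict_proba` function and a few tests. pytest picks up any function whose name starts with `test_`, plain `assert` statements are enough, and `pytest.raises` checks that bad input fails loudly. You could save it as something like `test_model.py` and run `pytest`.

```python
import math

import pytest


def predict_proba(weights, bias, rows):
    """Toy logistic model: probability of the positive class for each row."""
    if not rows:
        raise ValueError("rows must be non-empty")
    probs = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError("feature count does not match weights")
        z = sum(w * x for w, x in zip(weights, row)) + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid; fine for moderate z
    return probs


def test_one_probability_per_row():
    assert len(predict_proba([0.5, -1.0], 0.0, [[1, 2], [3, 4], [0, 0]])) == 3


def test_probabilities_are_valid():
    probs = predict_proba([2.0, -3.0], 0.5, [[10, -10], [-10, 10], [0, 0]])
    assert all(0.0 <= p <= 1.0 for p in probs)


def test_zero_input_gives_sigmoid_of_bias():
    assert predict_proba([1.0, 1.0], 0.0, [[0, 0]]) == [0.5]


def test_mismatched_features_raise():
    with pytest.raises(ValueError):
        predict_proba([1.0], 0.0, [[1, 2]])
```

Tests like these check properties (shape, valid range, known edge cases) rather than exact model outputs. That tends to keep them stable as the model changes.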

52:39 Yeah, let me throw out an example for you.

52:41 Along with this pytest idea, something that I came across recently is Great Expectations,

52:47 which is kind of like automated testing for data cleaning and data validation,

52:51 both when you're pulling data in the first time and in production.

52:54 So that's one to build on top of the pytest story.

52:57 Exactly.

52:58 So as of now, the things that are top of my mind are pytest, Pylint, and mypy, the things that make code manageable and maintainable. I think about that a lot.
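Great Expectations has its own API, which isn't reproduced here. What follows is a rough, library-free sketch of the same idea: declare what valid data looks like, then check each batch against those rules before it reaches a model. The column names, rules, and sample batch are hypothetical.

```python
def check_not_null(rows, column):
    """Indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Indices of rows whose non-null value already appeared earlier."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return dupes


def check_between(rows, column, low, high):
    """Indices of rows whose non-null value falls outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not low <= row[column] <= high
    ]


def validate(rows):
    """Run every rule; return only the rules that failed, with offending row indices."""
    results = {
        "customer_id is not null": check_not_null(rows, "customer_id"),
        "customer_id is unique": check_unique(rows, "customer_id"),
        "age is between 0 and 120": check_between(rows, "age", 0, 120),
    }
    return {rule: bad for rule, bad in results.items() if bad}


batch = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 151},
    {"customer_id": 2, "age": 28},
    {"customer_id": None, "age": 45},
]
print(validate(batch))
```

In a pipeline, a non-empty result could stop the job or raise an alert. The same checks can also run inside a pytest test against sample data.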

53:06 Yeah, fantastic.

53:07 All right.

53:08 Well, that's it for all the stuff we've got to cover.

53:11 Eugene, thank you for being on the show.

53:12 Final call to action.

53:13 People are, maybe they want to get into your writing or they want to start writing and

53:16 thinking more, sort of almost become a developer philosopher type.

53:20 So what advice have you got for them?

53:21 Just start writing.

53:22 Final advice.

53:23 Why write?

53:24 Like, okay, by writing, you put your stuff online and people find you.

53:27 And this is why this podcast even happened, right?

53:29 Michael found me through my writing, and we got to talk. And you find like-minded people.

53:33 That has happened to me.

53:34 I find so many like-minded people talking to me about machine learning and systems and writing.

53:39 And I've made so many new friends.

53:40 Do that.

53:41 And you'll make a lot of new friends online.

53:43 Highly recommend it.

53:44 Yeah, it's great advice.

53:45 I find like stepping just even a tiny bit outside of your comfort zone starts to lead

53:49 to other things.

53:50 And maybe you're not doing writing.

53:51 Maybe you're speaking at a meetup.

53:53 That's even more possible than it used to be because you don't have to travel anymore,

53:56 right?

53:56 You can reach out to meetups that are not next to you and so on.

53:59 All those things make huge differences.

54:01 Definitely.

54:02 Highly recommend.

54:02 And if you write because of this podcast, email me.

54:06 My email is on my website.

54:07 I would love to read what you wrote about.

54:09 Oh, fantastic.

54:09 All right, Eugene, thank you for being on the show.

54:11 It's been really great to chat with you about all this stuff.

54:13 You're welcome.

54:13 It's my pleasure.

54:14 Take care.

54:15 Bye, take care.

54:15 This has been another episode of Talk Python To Me.

54:19 Our guest on this episode was Eugene Yan, and it's been brought to you by Retool and Linode.

54:24 Supercharge your developers and power users.

54:27 Let them build and maintain their internal tools quickly and easily with Retool.

54:31 Just visit talkpython.fm/retool and get started today.

54:36 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

54:41 Develop, deploy, and scale your modern applications faster and easier.

54:44 Visit talkpython.fm/Linode and click the Create Free Account button to get started.

54:49 Be sure to subscribe to the show.

54:52 Open your favorite podcast app and search for Python.

54:54 We should be right at the top.

54:56 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

55:01 direct RSS feed at /rss on talkpython.fm.

55:06 We're live streaming most of our recordings these days.

55:08 If you want to be part of the show and have your comments featured on the air, be sure to

55:13 subscribe to our YouTube channel at talkpython.fm/youtube.

55:16 This is your host, Michael Kennedy.

55:18 Thanks so much for listening.

55:19 I really appreciate it.

55:21 Now get out there and write some Python code.

55:22 I'll see you next time.