Building ML teams and finding ML jobs

Episode #298, published Mon, Jan 11, 2021, recorded Wed, Nov 18, 2020

Are you building or running an internal machine learning team? How about looking for a new ML position? On this episode, I talk with Chip Huyen from Snorkel AI about building ML teams, finding ML positions, and teaching ML at Stanford.

Episode Deep Dive

Guests Introduction and Background

Chip Huyen is a computer scientist, writer, and educator, currently working at Snorkel AI. She initially came from a writing background, traveling the world and authoring cultural and food stories, before discovering her passion for coding at Stanford. After immersing herself in computer science courses and teaching a TensorFlow-focused class, she pivoted to AI/ML engineering. Today, her work at Snorkel AI covers everything from developing the core platform to speaking with customers about their machine learning needs. She is also authoring a book that helps both interviewers and interviewees navigate the evolving realm of ML job interviews.

What to Know If You're New to Python

Here are a few essential ideas to help you follow along and get the most out of this discussion:

Python has a broad ecosystem for data science, letting you do everything from quick experiments in notebooks to running large-scale ML models.
You can start with simple examples and pre-trained models rather than building solutions entirely from scratch.
Tools like Jupyter Notebooks or VS Code make it easy to explore data and learn the language side-by-side.
Be prepared to pick up a bit of engineering knowledge (e.g., version control) so you can move from experimenting to deploying real projects.

Key Points and Takeaways

Building Internal ML Teams: Companies often jump into machine learning without a clear plan for who to hire or what problems ML will actually solve. Early on, people with business, data, and software skills can matter more than hires made strictly for PhD-level research.
- Tools and Links
  - snorkel.ai: The startup where Chip works, focused on data-centric AI development.
Machine Learning vs. Data Science: There is a significant difference between machine learning engineering (productionizing models, MLOps) and data science (analysis, insight generation). ML engineers care deeply about deployment, monitoring, and performance, while data scientists often focus on exploration and insights. The two skill sets overlap but require distinct expertise.
- Tools and Links
  - papermill on GitHub: Mentioned as a way to manage notebook-driven workflows, an example of bridging the gap between research and deployable work.
Domain Knowledge and Existing Talent: Many companies start their ML practice by transitioning internal data science teams into ML roles because of their familiarity with the domain and datasets. It’s often easier to teach solid engineers some ML, or data scientists the production details, than to hire external ML superstars who lack domain context.
- Tools and Links
  - handcalcs on GitHub: Demonstrates how specialized tools can enrich data analysis in Jupyter, even if you’re not an AI expert.
Iterative Development for Production ML: Real-world machine learning is never a “one and done” task. Once a model is deployed, there is constant iteration: capturing feedback, adjusting heuristics, gathering more data, or refining the architecture. Teams that succeed treat ML as an ongoing lifecycle rather than a single project.
- Links
  - VS Code: A popular editor mentioned for quickly iterating on Python and ML code.
ML Competitions vs. Real-World Constraints: Public leaderboards (e.g., the old Netflix Prize) can produce complex models that are tough to put into production. Winning solutions often prioritize raw accuracy over deployment feasibility and maintainability. Real systems must consider factors like user experience, latency, interpretability, and business objectives.
Startups and Rapid ML Experimentation: Smaller companies let you gain exposure to everything from data pipelines to user-facing features. You might be experimenting with labeling strategies one week and building production pipelines the next. For many, this fast-paced environment drives tremendous growth but also demands adaptable, broad skill sets.
Teaching as a Path to Mastery: Chip’s journey reveals the power of teaching for deepening your expertise. By leading courses on TensorFlow and systems design, she forced herself to understand ML tools more thoroughly. If you want to level up, consider writing blog posts, giving talks, or tutoring others on what you’ve learned.
The Interview Process for ML Roles: The ML interview experience can be confusing for both candidates and companies, particularly because these roles are evolving. Expect questions about system design, data management, deployment, and teamwork, not just coding challenges or advanced math proofs. Chip is writing a book to help standardize these expectations.
Choosing Tools and Frameworks: With so many available libraries (TensorFlow, PyTorch, Hugging Face models, and more), it can be challenging to decide what to learn first. In practice, the best stack often depends on your project, existing infrastructure, and team’s familiarity with Python. It’s better to master a few libraries deeply than to jump around too frequently.
Snorkel AI’s Approach to Data-Centric ML: Snorkel AI focuses on labeling data programmatically and iteratively improving model performance. Rather than having humans annotate thousands of items, you can encode domain heuristics as functions. The platform unifies data management, model training, and deployment, emphasizing the cyclical nature of modern ML development.
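
To make the idea of encoding domain heuristics as functions more concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or the Snorkel AI platform, and every name in it (the label constants, the three heuristic functions, and weak_label) is hypothetical and chosen only for illustration. It also combines the heuristics with a simple majority vote, which is an assumed stand-in for whatever aggregation a real system would use.

```python
from collections import Counter

# Hypothetical label values for a toy spam-vs-ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages containing a URL are probably spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages that mention a prize are probably spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Heuristic: very short messages are probably ordinary replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text):
    """Combine the heuristics by majority vote; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the heuristics disagree evenly
    return ranked[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at https://example.com before it expires today",
        "Thanks, see you soon",
        "Win a prize",
    ]:
        print(weak_label(message), message)
```

The point of the sketch is the workflow rather than the rules themselves: when a heuristic turns out to be wrong, you edit one function and relabel everything, instead of asking annotators to redo thousands of items by hand.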

Interesting Quotes and Stories

On discovering programming: “When I was younger, I thought programming was the most boring job in the world… but once I actually tried it, I realized it could be creative, collaborative, and fun.”
On building real ML teams: “There’s a big gap between reading academic papers and getting that model into production. A lot of it is engineering—and yes, it’s a lot of iteration.”
On teaching: “I taught TensorFlow not because I was an expert, but because I wanted to become one. Nothing shows you what you don’t know faster than teaching it.”

Key Definitions and Terms

Data Science: Focuses on exploring data, finding trends or anomalies, and generating insights for business decisions.
Machine Learning Engineering: Involves deploying and maintaining ML models in production, emphasizing robust code, scalability, and monitoring.
Heuristics: Rule-of-thumb or domain-driven logic used to label or filter data automatically, rather than labeling large datasets entirely by hand.
Leaderboards / Competitions: Platforms (like the old Netflix Prize) where teams compete to achieve the best model accuracy. While helpful for research, these solutions aren’t always practical for production.

Learning Resources

If you want to sharpen your Python skills to support machine learning work, here are a couple of curated courses from Talk Python Training.

Python for Absolute Beginners: Ideal if you’re just getting started with Python programming and need a solid foundation for advanced topics like ML.
Data Science Jumpstart with 10 Projects: A hands-on exploration of Python’s data science stack, suitable for people stepping into real projects and wanting concrete experience.

Overall Takeaway

Companies are eager to leverage machine learning, but their success depends heavily on bringing the right mix of technical, domain, and collaborative talent to the table. Chip’s experience—from writing to coding to teaching—underscores the value of diverse backgrounds and continuous learning in this field. By recognizing that practical engineering considerations, labeling strategies, and iterative improvement are central to deploying ML, organizations and aspiring ML professionals alike can be better prepared to thrive.

Links from the show

Chip on Twitter: @chipro
Snorkel AI: snorkel.ai
Chip's Book Preview: twitter.com
handcalcs project: github.com
IBM Buzzword Bingo: youtube.com
Episode #298 deep-dive: talkpython.fm/298
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Are you building or running an internal machine learning team?

00:03 How about looking for a new ML position?

00:04 On this episode, I talk with Chip Huyen from Snorkel AI about building ML teams,

00:09 finding ML positions, and teaching machine learning.

00:12 This is Talk Python To Me, episode 298, recorded November 18th, 2020.

00:17 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:35 ecosystem, and the personalities.

00:37 This is your host, Michael Kennedy.

00:39 Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past

00:43 episodes at talkpython.fm, and follow the show on Twitter via at Talk Python.

00:48 This episode is sponsored by Datadog and Linode.

00:51 Please check out what they're sponsoring during their episodes.

00:53 It really helps support the show.

00:55 Chip, welcome to Talk Python To Me.

00:57 Hi, Michael.

00:58 Nice to meet you.

00:59 Thanks so much for having me.

01:00 Nice to meet you, and thanks for coming on the show.

01:03 It's going to be a lot of fun to talk about ML and putting ML into production and building

01:08 ML teams.

01:09 We're going to talk a lot of, probably cover a lot of buzzwords, right?

01:12 Like, AI and ML are so top of mind in all of the...

01:16 I need to impress people by throwing out all the buzzwords, yeah.

01:19 That's right, right.

01:20 IBM had this really funny commercial, which is ironic that it was IBM, but it had this really

01:25 funny commercial, like, 10 years ago called Buzzword Bingo.

01:29 I don't know if you ever saw that, but it was really, really hilarious.

01:32 We'll see if I can link to it in the show notes, if I can dig it back up on YouTube.

01:35 But yeah, so we could definitely win that one today, just because it's such a growing

01:40 and interesting topic.

01:41 Yeah.

01:42 But before we get to all that, of course, let's start with your story.

01:44 How do you get into working with programming in Python and machine learning?

01:48 It's really funny because when I was younger, I thought being a programmer was the most boring

01:54 job in the world.

01:55 I was like, why would anyone want to spend the rest of their life sitting in a basement looking

02:00 at a computer screen?

02:02 This is for the anti-social people.

02:04 They don't want to go outside.

02:06 They just, like you said, sit in a basement.

02:08 Yeah.

02:08 Don't they have friends?

02:09 Yeah, I know.

02:10 Yes, exactly.

02:11 Come on.

02:12 Yeah.

02:12 The views changed over the last 20 years or so, right?

02:16 Like it's...

02:16 Yeah.

02:17 Society has sort of viewed that differently, but yeah, it's how it was.

02:20 I think as part of growing up, you just realize you become the person that you make fun of,

02:24 you know?

02:25 So that's the story of my life.

02:27 So when I was younger, I actually come from a writing background.

02:30 So I was traveling the world.

02:32 I know it sounds like nice and stuff, but it's like not that nice.

02:36 So I was traveling the world and writing a lot about culture, people, a lot of food.

02:40 So I came to Stanford thinking I would major in script writing because I thought it

02:45 would be fun.

02:46 But then I took some CS courses and I talked to CS friends and I was like, what?

02:51 Is this what you make for an internship?

02:53 It's like what my family makes like the entire year or something.

02:56 So I was like, what is so cool about it?

02:58 So I took CS courses.

03:00 Wait a minute.

03:00 Maybe I should pay attention to this, right?

03:02 This is starting to sound good.

03:03 Yes, yes.

03:05 So yeah, so I took some CS courses and I think Stanford did a really great job of like getting

03:10 people interested in computer science because all the introductory courses are extremely well

03:15 designed and exercises are not just like, I don't know, boring things.

03:19 It's like trying to design a button to increase something.

03:22 I don't know.

03:22 I don't know what boring lectures are, but there's an exercise like building games.

03:27 So you could play games.

03:28 Yeah.

03:28 So a lot of fun things.

03:30 So I took them and I took the courses.

03:32 I really enjoyed them.

03:33 And then I took more courses.

03:35 I think initially it was more of like financial needs because I need to TA to get some pocket

03:41 money.

03:41 But then I TA'd and then I met some wonderful people who are also TAing.

03:46 I think at Stanford it was called section leading.

03:48 It was really fun.

03:49 And I think I just got sucked into it.

03:52 And fast forward four years, I majored in computer science.

03:57 Yeah.

03:57 That's amazing.

03:58 Do you still do any writing?

04:00 Oh, yes.

04:00 I still write a lot.

04:02 So I think it's kind of tricky because of writing.

04:06 And technical writing is very different from the kind of writing I did before.

04:10 So I can't really switch among them easily.

04:13 So I think for me, it would be like a month just focused on technical writing.

04:19 So like blog posts and documentation.

04:22 So I write a lot about, for example, like paper summarizations or just some new techniques

04:28 that I learned.

04:28 I think recently I did something about.

04:30 So I looked up like 200 machine learning tools I could find and I tried to analyze them.

04:35 Oh, no, no.

04:36 Wow.

04:37 It took me so much time.

04:38 So the kind of writing is very different because in technical writing, people want to

04:42 get to the point, right?

04:42 But also sometimes I still like to write stories like non-fictions.

04:46 And when you write stories, people want to be taken on a journey.

04:50 They just don't want to be shown the destination right away.

04:53 So I try to do a month of that and then I switch my mindset to the other.

04:57 So yes, I still write a lot.

04:59 That's cool.

04:59 It sounds like a lot of fun.

05:00 So one of the things that is interesting about your story is like you decided to do some programming

05:06 classes and get into it.

05:07 And at first you were not so sure that that was the full on path that you wanted to go

05:12 down.

05:13 And as you got deeper into it, you saw it as more interesting.

05:16 You made friends and connections and you sort of saw the human side of it and got sucked

05:20 more and more into it, right?

05:21 Yeah.

05:22 Like, yeah, programmers do have friends.

05:24 Yes.

05:24 I learned that.

05:25 So yeah, that's good.

05:27 Right.

05:27 Encouraging.

05:28 Yeah.

05:29 Well, the thing that's interesting to me is a lot of times you hear people early in

05:34 their career thinking about like, well, what should they study?

05:36 What should they go into?

05:38 Or if they're changing careers, what should they go into?

05:40 And a lot of times the advice is follow your passion.

05:43 Like, what are you passionate about?

05:44 Like, well, I'm really passionate about soccer.

05:45 Okay.

05:46 Well, go into soccer.

05:47 Like what I've, I think I've seen over the years is a lot of people who are actually

05:52 super passionate about what they're doing.

05:53 They didn't go to it because they just knew from the beginning that that was it.

05:57 It's like somehow as you get pulled in, as you master a topic and you learn more about

06:02 it, like it, it's like this mastery and understanding leads to passion, not the other way around.

06:07 Yeah.

06:07 No, I totally agree with you.

06:09 So this is something I think about sometimes.

06:12 So I realized over the years that there are things I thought I would enjoy doing.

06:16 So I thought it was my passion.

06:17 So at some point I think like I would totally want to do AI research.

06:21 And then I took three months off to travel and I realized that I read every day and

06:25 write every day, but I didn't read a single paper in the three months.

06:28 So like, it's just not something I enjoy doing.

06:31 So I realized that I want to become the person who does AI research, but I don't want to

06:36 actually be doing it.

06:37 Yeah.

06:37 So I think those things, you just need to like spend time and think about it and like

06:41 to just observe what you do in your free time, and you'll know what you're passionate about.

06:45 And so don't think of passion as something you find.

06:47 I think passion is something you cultivate.

06:49 So you might not know.

06:51 So, so you know, have you ever had this moment?

06:53 Like you study, you learn something and it's like so weird.

06:55 You don't understand anything.

06:56 It's like miserable.

06:57 And then you have an aha moment.

06:59 Everything just makes so much sense.

07:00 And you just keep going deeper and deeper into it.

07:03 And it suddenly becomes a passion.

07:04 Yeah.

07:04 So I do think that in the beginning, I think it's really useful to just try a new thing,

07:08 but like, don't quit too soon, like give it some time to actually see how much you learn, how much you grow in it.

07:14 And I think it's no, there's no pain.

07:16 You know, I just, there's no shame.

07:18 In, like, leaving something you don't think is for you, but you definitely need to give

07:21 things time.

07:22 That's good advice.

07:23 Yeah.

07:23 A lot of people want to find that thing that they love and just go for it.

07:27 And I think actually the answer might be just experience a lot of things and then decide,

07:31 right?

07:31 Which is awesome.

07:32 So you took some CS courses and did a bunch of writing.

07:36 And now all of a sudden you're on the other side of the podium, right?

07:41 You're doing a little teaching as well.

07:42 Yes.

07:42 I don't think learning and teaching have to be mutually exclusive.

07:45 So I started teaching when I was a student and I started out not because I was an expert.

07:51 As a TA?

07:52 No, as an instructor for the course.

07:54 So I started out not because, I started teaching it not because I was an expert, but rather

07:58 because I wanted to become an expert.

08:00 And so in the beginning it was like, so my first course I taught as an instructor was TensorFlow

08:06 for deep learning research.

08:07 So at that point, TensorFlow was fairly new and I was using it in my own internship and I couldn't

08:12 really find good training material.

08:15 So I went to my professors and was like, hey, can you teach a course on it?

08:19 Teach a course on it?

08:20 And they were like, oh, we don't have time.

08:22 Like, you know, for professors, they put their name on the line.

08:24 They would have to, like, make a lot of investment to make it

08:28 good.

08:28 And they were like, why don't you do it?

08:29 And I was like, what?

08:30 And I was like, yeah.

08:31 Stanford has this thing that allows students to initiate courses.

08:34 So I did it.

08:35 And in the process, so it's not really it's like having a group of people who also want

08:40 to learn TensorFlow and learn together with them.

08:41 So I just tried to like anticipate a lot of questions by just Googling a lot.

08:46 It was like, I know I spent like half of my waking hours on, like, Stack Overflow or something.

08:50 So that's how I started.

08:52 And even now I still continue doing it.

08:54 That's cool.

08:55 I think you could just learn so much when you try to teach something.

08:58 It's a really valuable way to just get deeper and deeper into it.

09:01 And this is at Stanford, right?

09:02 Yeah, it was Stanford.

09:03 I think it's not so like learning, but that you realize what you don't know.

09:06 Because, you know, sometimes you think that you know something and then you start explaining

09:09 to people it was like, you have no idea how it goes.

09:13 Yeah, for sure.

09:14 The way that I think about it is like if you were, say, a consultant at a company and you

09:17 had a programming problem to solve, like let's say you need to do something with multi-threading,

09:21 right?

09:22 If you find one way that works, you're done.

09:24 You move on to the next thing.

09:25 Yeah.

09:26 But if there's three ways you could have done it as a teacher, you have to know what the

09:30 three ways are.

09:31 When should you use one versus the other?

09:34 What are the three?

09:34 These are questions that just a lot of times you don't have time or energy to dig into.

09:38 But once you start teaching, you're like, well, I better know it because they're going to

09:40 ask me.

09:41 There's more ways than one.

09:42 How do I do it?

09:43 And why?

09:44 Right.

09:44 Like it really just makes you give it like this other perspective on trying to learn about

09:49 stuff.

09:49 Yeah.

09:49 No, I think there must be like some like teaching rule.

09:52 If there's a question you don't want to answer, students will ask it.

09:55 Oh, they can like use.

09:56 Yes.

09:57 I remember from teaching math classes as well.

10:00 Like, you know that they're just going to hone right in on that one thing that you were

10:03 afraid.

10:04 Please don't ask me this thing.

10:05 I don't totally.

10:06 They can smell fear.

10:07 Exactly.

10:09 Yeah, exactly.

10:10 All right.

10:10 So you are doing teaching at Stanford right now, but you're also working at a new startup,

10:16 right?

10:16 So what are you doing today?

10:17 That is a great question.

10:18 I have been asking myself that question ever since I joined the startup.

10:22 So I think startup life is great.

10:26 I think it's so dynamic.

10:28 We have been growing so much.

10:29 The company has increased in size multiple times since I joined in December.

10:34 So it's been like less than a year.

10:36 So it's a blessing.

10:37 I think when I joined Snorkel AI, I told the founding team that I'm looking for an environment

10:43 where I can learn like different aspects of business because eventually I want to start

10:48 my own company.

10:48 And they've been extremely supportive.

10:50 And so in the beginning, before we launched, we were like very much heads down building the

10:55 platform.

10:56 So my job was like entirely on the engineering side, like building out the modeling service

11:01 and like other features.

11:03 And then there's a company launch and we suddenly like had a lot of interest from people.

11:07 Like I was humble brag.

11:09 We had two people.

11:10 We don't have time to talk to them.

11:11 No.

11:12 So we still had a lot of interest.

11:14 So I think we needed more people to, like, respond, like just talk to potential customers.

11:19 So I have been spending more and more time on that side.

11:23 So yeah.

11:24 So recently I just decided to switch, like, most of my time to the go-to-market side.

11:30 You said your ultimate goal in the long run is to do something on your own potentially.

11:34 And those two things you talked about, those are the two really hard aspects of starting

11:38 like one, the technical side, because you have to sort of bootstrap it and get it going.

11:42 But the other is like marketing and get the word out, positioning, like all that stuff just

11:47 so often on the technical side just gets ignored until you build it and no one comes.

11:51 You're like, all right, well now what are we going to do?

11:53 Right?

11:53 Yeah.

11:54 I think it's definitely, they are both like really different, difficult aspects.

11:59 I think like another aspect is like recruiting.

12:01 I think like, I think companies can like, like good people can like, making a bad hire

12:06 can basically like bankrupt the company early on.

12:09 And so, so I think like one reason why I joined the company, this company I'm with is that,

12:14 so I look at the teams like, wow, how did you manage to like convince like great people?

12:18 And I'm saying, it's like when I said how they managed to convince great people, I mean,

12:21 how did they manage to convince me to join them?

12:23 I'm just kidding.

12:24 I feel like everyone on the team is like, it's pretty great.

12:27 But I feel like it's with a strong team.

12:29 Yeah.

12:30 So I think what I learned about startup life is that like, you don't just stick to an

12:34 idea, like from beginning to end, you try out different things and try to pivot.

12:37 You try to like, in the beginning you have a hypothesis, right?

12:40 You think that like, this might be something people want, but you don't know for sure until

12:44 you're actually like working with like customers.

12:47 So over time you learn different things, you change your ideas or like maybe there's some

12:51 like giant company launching the exact same thing and like, how do you compete with that?

12:55 Yeah.

12:55 So sometimes it's not about the ideas, not even about the product, but it's about the people

12:59 because like with a strong team, even if you throw out some existing product and build a

13:04 new product, it can still have a chance of like competing.

13:06 But if you have a bad team and the current idea proves to be wrong, then you can't,

13:11 you can't really recover from it.

13:13 Yeah.

13:13 One of the things I think is really interesting about working with small, in a small company

13:17 like a startup is you get exposure to so many different things, right?

13:21 You're not just the person that does billing in this way or build that part of like this

13:25 pipeline.

13:26 Like you have to really get your hands into many parts.

13:29 It's stressful, but also I think you grow a lot if you get that opportunity.

13:32 Yeah.

13:33 I think, I think it's the discussion people have been talking about like the difference between

13:37 working at a big company and a small company, right?

13:39 So like at big companies, you have to, you can't, you're allowed to like focus on one

13:44 small thing and go really deep into it and you spend like many, many of the waking hours

13:48 on it.

13:49 But at startups, like yeah, many things are going on and you have to like maybe like cycle among

13:53 them like quickly.

13:54 So let's say like at big companies, like big companies can afford to hire specialists who

13:59 can, who can do one small thing really well.

14:01 But startups, like a lot of, they might want somebody who can do a lot of things.

14:06 Okay-ish, like not like expert.

14:09 Yeah.

14:09 A lot of prototypes, do it quickly, try it out, then work, do something else, find a

14:14 gap, fill that hole, all that kind of stuff, right?

14:16 Yeah.

14:17 I think it's good.

14:17 Like really depends on like the phases of life.

14:20 I know there are people who, who really just want to like keep their heads down and focus

14:24 on one thing.

14:25 So, I mean, there's no, there's no shame.

14:27 I think I have so much respect for people who can do it as I have so much respect for people

14:32 who can like adapt quickly and learn things quickly and just like build things.

14:36 Yeah.

14:36 You got to find the one that works for you.

14:38 So one of the things that you've been writing a lot about, you're working on a book actually,

14:42 is basically about building teams in the ML space and hiring people.

14:48 And I think that this is a big challenge right now because so much of the folks in the data

14:53 science space is so hot and so many people are coming from different areas, right?

14:57 Like there might be somebody doing ML, but three years ago that person was, I don't know,

15:03 working in finance or maybe another person.

15:05 She was like a biologist, but she got into programming and now she's doing machine learning

15:09 because she's sort of right.

15:10 So it's, I think it's actually a little bit challenging to hire people in this space because

15:16 it's not just, well, show me your machine learning PhD and we'll talk about it.

15:20 Right.

15:20 Like there's probably not that many people in that realm, right?

15:24 Like there's not as much traditional education in the workforce yet.

15:27 Yeah.

15:28 So I think I agree with you that hiring is hard, but I think that hiring is hard right

15:35 now for machine learning for many reasons.

15:37 I think the first reason is like, it's probably because companies don't even know what they

15:42 are hiring for yet.

15:43 I think because machine learning is like, it's really new.

15:46 And if you're like imagining a company, like you have never deployed a machine learning model

15:50 before.

15:51 And now you're trying to start a new team.

15:53 So you're probably like, what do you need to build a machine learning model?

15:56 And you have no idea.

15:58 So you probably come up with some very generic ways.

16:01 And like you, like you say somebody who's like doing like state of the art research, somebody

16:05 who can code really well.

16:07 Yeah.

16:07 Somebody who can explain what they are doing.

16:09 So it's just like, these people just don't exist.

16:12 Yeah.

16:12 That's a good point.

16:13 But I think the second thing is that machine learning itself is not new, but machine learning

16:18 in production, like especially from the explosion of like deep learning since 2012.

16:23 I think the first major application of deep learning in industry is probably Google Translate

16:28 in 2016.

16:29 And since then, a lot of companies have been looking into it.

16:32 So it means that, so I think like industry is like lagging behind like research, like a

16:38 few years.

16:38 Right.

16:38 So like research like grows and you have a lot of people knowing like how to do machine learning

16:43 in an academic environment.

16:44 And then like industry say, oh wait, you can actually use that like to improve our business.

16:48 So let's do it.

16:49 So like at that time, like, so we're in the phase when companies are looking into it.

16:54 But most people who know machine learning come from an academic environment.

16:57 So they are familiar with like how to like do machine learning research, but they actually

17:02 might not be like familiar with like doing machine learning in production.

17:04 And there are not many people who can teach them because usually you need a hands-on experience.

17:08 So we are in a very early phase of machine learning adoption in the industry.

17:12 And I think like, so that's why we are lacking people.

17:15 But I think in a few years, we will have a lot more.

17:17 And hopefully like understanding of machine learning in production plus availability of people

17:23 with actually hands-on experience will make hiring less, a lot less difficult for companies.

17:28 Yeah.

17:28 That's really interesting.

17:29 I can imagine if I was hiring, say like a couple more web developers or somebody to do

17:35 database ETL, like bring in the data and clean it up.

17:38 Yeah.

17:39 You already have people in your organization who do that and you can say, well, what do you

17:43 need?

17:43 And please talk to this person.

17:45 But if you're creating an ML team, like I know there's really large companies out there

17:49 that don't have a single person who's doing like productionized machine learning.

17:54 So it's like you pointed out, like it's starting from zero.

17:57 And a lot of times, you know, is that person who you're talking to really competent to make

18:02 that decision or make the right trade-offs, right?

18:03 Or even know what you're hiring the person for.

18:05 Yeah.

18:06 So yeah.

18:06 So knowing what you hire the person for, and also having people to evaluate the skills, can be very,

18:13 very hard.

18:14 But actually, I do see some shift in the future.

18:19 So I think a lot of aspects of machine learning are being commoditized.

18:23 So for example, you're seeing a lot of pre-trained models, right?

18:27 People have already trained the model for you, and they open source it.

18:30 And you have a lot of pre-built models, like the ones from Hugging Face.

18:33 So you can just call an API and incorporate a machine learning model into your systems.

18:38 There are also a lot of tools for things like feature engineering or rules-based systems,

18:45 as well as monitoring tools and deployment tools.

18:47 So I do believe that the bottleneck for machine learning in production will now be in the engineering part.

18:54 So I'm not saying that we stop doing research.

18:56 I'm not saying that a lot fewer companies will do research.

19:00 I think the machine learning part can be done by a few very large, established companies who know what they are doing.

19:07 And then a lot of other companies who use machine learning can just use existing tools and platforms.

19:13 So the challenges can be engineering challenges and not machine learning challenges.

19:18 Yeah.

19:18 There's a lot of stuff that's getting pre-built out there.

19:20 Like I think Apple ships with some pre-trained models for running on iOS.

19:24 You've got like Azure Cognitive Services, stuff like that, right?

19:28 Where you just, you kind of just bring that in.

19:30 This portion of Talk Python To Me is brought to you by Datadog.

19:34 Are you having trouble visualizing latency and CPU or memory bottlenecks in your app?

19:39 Not sure where the issue is coming from or how to solve it?

19:42 Datadog seamlessly correlates logs and traces at the level of individual requests,

19:47 allowing you to quickly troubleshoot your Python application.

19:50 Plus, their continuous profiler allows you to find the most resource consuming parts of your production code

19:55 all the time at any scale with minimal overhead.

19:58 Be the hero that got that app back on track at your company.

20:01 Get started today with a free trial at talkpython.fm/Datadog or just click the link in your podcast player's show notes.

20:08 Get the insight you've been missing with Datadog.

20:11 So I guess one question, just thinking, sort of reversing it a little bit,

20:16 I guess you could see it from both ways.

20:17 If I was somebody who was looking for a machine learning job or I was hiring somebody,

20:22 how much do you think that engineering side should matter?

20:25 It sounds like it's pretty important.

20:27 So should, say, being competent with Git and source control be important?

20:31 Yeah.

20:31 Continuous integration.

20:32 Should you know something like FastAPI or Flask or something like that to build a service around your model?

20:38 Yeah.

20:38 What are the skills you think are really important there?

20:41 I think it really depends on what type of job you want.

20:45 So there is, I want to say, the traditional machine learning

20:49 engineering job, but I don't think machine learning engineering is that old

20:53 to deserve the term traditional.

20:55 But I think that people have been using the term machine learning engineering

20:58 to mean feature engineering, creating models, training models, babysitting models.

21:03 And I think that part requires quite a lot of machine learning and less engineering.

21:07 But if you work on setting up a distributed pipeline, it's questions like: how can you process data in parallel?

21:14 How can you train the model?

21:16 How can you deploy the model so that it can serve a lot of requests at the same time but with low latency?

21:22 Then you probably need more systems and databases than machine learning.

21:27 And if you are in the part where you want to monitor the system, the maintenance and monitoring part,

21:33 how do you push updates without interrupting the service?

21:38 Or how can you be alerted when something bad happens

21:43 so you can address it quickly, or how can you roll back the system?

21:46 Then I think a lot of it is very similar to DevOps.

21:49 So you need a lot more things as well.
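
Editor's note: the monitoring and alerting questions raised here can be made concrete with a small sketch. This is only an illustration under stated assumptions, not something described in the episode: the `ErrorRateMonitor` class, its window size, and its threshold are all hypothetical. It keeps a rolling window of request outcomes and flags when the recent error rate crosses a limit, which is the point at which a team might page someone or roll back to a previous model.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the outcome of recent requests and flag an unhealthy error rate."""

    def __init__(self, window_size=100, max_error_rate=0.05):
        # Both values are illustrative assumptions, not recommendations.
        self.outcomes = deque(maxlen=window_size)
        self.max_error_rate = max_error_rate

    def record(self, succeeded):
        """Record one request outcome (True for success, False for failure)."""
        self.outcomes.append(bool(succeeded))

    def error_rate(self):
        """Fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def should_alert(self):
        """Return True when the recent error rate exceeds the threshold."""
        return self.error_rate() > self.max_error_rate


# Simulate ten requests, three of which failed.
monitor = ErrorRateMonitor(window_size=10, max_error_rate=0.2)
for ok in [True] * 7 + [False] * 3:
    monitor.record(ok)

if monitor.should_alert():
    print("Error rate too high: alert the team or roll back the model")
```

In a real service the outcomes would come from request logs or a metrics system rather than a hard-coded list.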

21:51 Right.

21:52 So it really depends on what role you want and the company you want, because one thing I noticed is that

21:58 companies have very different structures for their machine learning teams.

22:01 So companies like Netflix, for example, have a separate algorithm team.

22:08 They focus on the algorithmic aspect of machine learning, right?

22:11 But so, so...

22:12 Yeah, the recommender engine is so important over at Netflix, right?

22:16 I mean, that's so central.

22:18 Yeah.

22:18 They even had that million-dollar prize to see who could recommend...

22:21 Yeah, they do.

22:22 ...movies you should watch next best, right?

22:24 Yeah, but it's just really funny because, like, I think, like, these competitions are great,

22:29 but the results are very hard to actually deploy.

22:33 So I'm not saying Netflix is not using the winning result.

22:37 I'm just saying that for a lot of these competitions,

22:42 even though the winning solutions perform well on the leaderboard, they tend to be very hard to deploy because the solution is way too complex

22:49 to be reliably deployable.

22:52 Yes.

22:52 Oh, interesting.

22:53 Yeah.

22:53 Or maybe it's over-trained exactly on that one thing, and it's perfect at that, but it's

22:57 not generalized enough or something.

22:59 Yes, that's one thing.

23:01 I think that's one thing we have been talking about: how this style of competition,

23:05 this leaderboard-driven work, is actually not very close to real life.

23:10 Because when you have a leaderboard, right, you tend to have one single objective you

23:14 work toward.

23:15 In this case, you know, it's which model has the best performance.

23:18 Whereas in production, you don't have one single objective.

23:22 You might have different stakeholders in the company, and they each want different

23:25 things.

23:26 One person might say, hey, we want the best performance.

23:29 But then another says, hey, we want the lowest latency.

23:32 And another says, hey, how can we do it in a way that shows the most ads

23:36 without being obnoxious?

23:37 So there's a lot of things.

23:39 And sometimes if you optimize for just one thing, you can't really go for the others.
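
Editor's note: the contrast between a leaderboard's single objective and production's competing objectives can be sketched in a few lines. Everything below is a made-up illustration: the model names, accuracy figures, and latency numbers are hypothetical, and a real team would weigh more than two objectives. The point is only that the "best" model changes once a second constraint, here a latency budget, enters the picture.

```python
# Hypothetical candidates: the names and numbers are invented for illustration.
candidates = [
    {"name": "large_ensemble", "accuracy": 0.95, "latency_ms": 900},
    {"name": "medium_model", "accuracy": 0.93, "latency_ms": 120},
    {"name": "small_model", "accuracy": 0.90, "latency_ms": 30},
]


def best_on_leaderboard(models):
    """Leaderboard view: a single objective, highest accuracy wins."""
    return max(models, key=lambda m: m["accuracy"])


def best_for_production(models, latency_budget_ms):
    """Production view: highest accuracy among models within the latency budget."""
    eligible = [m for m in models if m["latency_ms"] <= latency_budget_ms]
    if not eligible:
        return None  # nothing satisfies the constraint
    return max(eligible, key=lambda m: m["accuracy"])


print(best_on_leaderboard(candidates)["name"])
print(best_for_production(candidates, latency_budget_ms=200)["name"])
```

With these invented numbers, the leaderboard winner is the slow ensemble, while the production choice under a 200 ms budget is the medium model.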

23:43 I think there's an interesting example of how a machine learning model that can do very well

23:49 on leaderboards is not going to be useful in real life.

23:52 So think about that.

23:52 Do you remember, I think about 10 years ago, there were these giant retail companies

23:58 that had been trying to predict whether someone is pregnant, so that they could

24:03 advertise directly to them, right?

24:05 Yes, yes.

24:06 So someone found out, and then they sent all the baby products to

24:11 this teenage girl, to her family.

24:14 And they didn't know about it yet, and now they suddenly know about it.

24:17 So that's an example of, like...

24:18 Yeah, they got really angry.

24:19 Like, why are you sending my daughter this?

24:21 And it turns out actually she was pregnant, right?

24:23 Oh, my gosh.

24:24 That's...

24:24 Yeah.

24:24 That's not so good for her.

24:26 No, no, it's not.

24:27 So that's an example of, like, it can be so good, it's creepy.

24:30 And you don't want that.

24:32 Or...

24:32 Yeah.

24:33 Or think about how we have a machine learning model that can help the users

24:37 solve their problem really well.

24:39 So there are two things that can happen here.

24:41 One is that it solves the problem so well that the user is just done.

24:45 They never have to come back to you ever again, and you just lose business.

24:48 Or it solves the problem so well that the user just loves the system and they keep

24:52 coming back for more.

24:53 So it's really hard to find a linear relationship between the model performance

24:57 and business performance.

24:58 Yeah.

24:59 I didn't really think about the relationship of these competition-winning algorithms

25:04 and models and stuff.

25:05 But, yeah, that makes a lot of sense that just because it solves that one problem, it might

25:09 not be practical to run in production or to maintain or whatever and evolve it.

25:13 Yeah.

25:13 I think you need to, like...

25:15 So I think that's why it's important for the people who are in charge to

25:19 have a good sense of what they want and how to balance between different objectives

25:24 of different stakeholders in a project.

25:27 Yeah.

25:28 So give me...

25:29 Put the hiring hat on for a minute.

25:31 And if you're at a company that does not yet have an internal in-house machine learning

25:36 team, but you think maybe you want to, maybe we can analyze all this data we have and we

25:41 can find some trends and do more interesting stuff.

25:43 Yeah.

25:43 And you want to create an in-house ML team.

25:45 Like, what advice do you have for those people?

25:47 Okay.

25:47 So it really depends on who you are.

25:49 Like, if you're Walmart or McDonald's, right?

25:52 You might just want to acquire a very promising ML startup and just have an in-house team.

25:57 And I think a lot of the big companies are going for that approach.

26:01 Yeah.

26:01 Another approach that I think a lot of companies are taking is transitioning

26:06 to ML.

26:06 You might want to use some existing talent in the company.

26:10 So machine learning is new, but data science is not.

26:13 Companies have had data science

26:17 teams for a long time.

26:18 And data science teams also work with data, and they do a lot of it.

26:22 They probably already have access to data and they also try to get patterns from

26:27 data.

26:27 So I think a lot of companies, in the beginning, turn to their data science team

26:32 and say, hey, why don't you learn machine learning and try these things out?

26:36 Maybe a couple of you could learn PyTorch and work on this project and get started.

26:41 You might joke about it, but I think it's pretty much how people do it.

26:45 And I think I see a lot of people in data science transition into machine learning.

26:49 And I think especially now with the abundance of machine learning courses, there's just so many

26:54 courses online for free.

26:55 Yeah.

26:56 I think it's great that people are taking advantage of it.

26:58 I was looking up courses. Do you know Andrew Ng's machine learning course?

27:02 Yeah.

27:03 I haven't taken it, but I've heard of it.

27:04 Yeah.

27:05 I think it has, like, more than 2 million people who have taken the course already.

27:09 2 million students, right?

27:10 Yeah.

27:10 Crazy.

27:11 Yeah.

27:12 And it's pretty new, right?

27:13 I think it's pretty new.

27:14 Yeah.

27:14 That's really crazy.

27:15 Yeah.

27:15 It's new compared to other disciplines.

27:18 But I think in machine learning, it's one of the older courses.

27:21 So I think a lot of teams do that.

27:23 But for companies who do that, I think one thing I hope is that

27:29 they look into the difference between data science and machine learning.

27:32 So data science is about looking at data, and the output is insight to help

27:38 make decisions about the business.

27:40 For example, you can predict how much customer demand there will be in the future or,

27:44 yeah.

27:45 But with machine learning, the goal is to build a product, so it's more

27:50 engineering.

27:50 So for data science, you want people with stronger statistics skills because you look

27:55 at the data and get insight.

27:56 But for machine learning, it's more engineering.

27:59 So you want somebody with stronger engineering skills and less of that, yeah.

28:03 Right.

28:04 So as a data scientist, maybe your output might be, here's a Jupyter notebook with a Plotly

28:09 analysis of what we're thinking.

28:12 Whereas as a machine learning person, your output is, here's the API that gives you the answer.

28:17 Yes, you can think of it that way.

28:19 Something like that?

28:19 Yes, yes.

28:20 Something like that, yeah.

28:21 Okay.

28:21 So I think this is a very different focus.

28:24 I'm not trying to make a general statement here.

28:27 I'm just saying that from talking to a lot of people, I tend to notice that

28:31 data scientists are much better at statistics, whereas machine learning engineers

28:35 are much better engineers.

28:36 Yeah.

28:36 Yeah, I've seen that as well in some of the trends.

28:39 So.

28:39 Yeah.

28:39 It seems totally reasonable.

28:41 Let's reverse this a little bit.

28:43 So we were talking about if you want to build a team.

28:45 And you did point out, by the way, bringing someone in from the inside.

28:48 Like, I feel like data science, more than software developer, that role needs to be sort of intimately

28:56 familiar with the way that the business works and the way the data is collected and all the

29:01 little idiosyncrasies around it.

29:02 And so having somebody who already knows all that stuff, and now you're just like, okay.

29:06 Yeah.

29:06 Adapt that to machine learning might be easier than getting somebody who's good, but has no

29:11 experience in the business.

29:12 I make a living out of saying that I know machine learning, right?

29:16 So, of course, I want to make machine learning as hyped as possible.

29:20 But I have to admit that for a lot of simple machine learning models, you don't

29:25 need to learn that much.

29:26 You don't need to spend years and years and years learning to be able to

29:31 use simple models.

29:32 So one thing I noticed is that it's actually a lot easier for good

29:38 engineers to pick up machine learning

29:40 than for machine learning experts to pick up good engineering.

29:43 Gotcha.

29:43 Yeah.

29:44 So if I were to start a team, I would probably try to get really good engineers

29:49 and have them learn machine learning and then apply machine learning,

29:52 rather than hire machine learning experts and then have them spend

29:56 several decades to become good engineers.

29:58 Yeah.

29:59 That's a really good perspective.

30:01 Yeah.

30:01 All right.

30:02 So switching the role here to being interviewed for a machine learning job.

30:07 So you're working on this book for machine learning interviews.

30:11 It gives you, like, a sense of sort of if you're going to go apply for one of these jobs,

30:15 what are some of the skills and things you might expect to be asked about and so on, right?

30:19 Want to tell us quickly about that?

30:20 Yeah.

30:21 So this is a book I've been working on for, like, oh, my God, a year and a half now.

30:25 Do you know how we all had so many great plans for 2020 and none of them happened?

30:29 I think that's the case here.

30:30 That's the case with my book.

30:31 I had so many great plans for it, and then, boom.

30:35 So, yeah.

30:36 So it's slow.

30:37 It's coming along.

30:38 So I think my book is not just a book of, here are the questions they're going to ask you, or

30:43 how to answer them.

30:45 I think part of what I want to do with the book is to bring some standardization and understanding to the process.

30:51 I think it's new in the industry.

30:53 So it's new for both interviewees and interviewers.

30:56 So, for example, people are still confused about, like, what is a machine learning engineer?

31:01 What is a data scientist?

31:03 Like, what's the difference between a big company and a small company?

31:06 What is the hiring process?

31:08 What skills do you need?

31:09 So I think there's just so many skills that one might need, but usually, like, you don't need all of them for a single role.

31:16 So I think my book is pretty, definitely start from it.

31:18 It's a different, I think I lost you.

31:20 Sorry, I don't know what happened to my network.

31:23 It just said that it lost and, like, all my stuff disconnected.

31:25 But we're back.

31:26 Yeah.

31:27 So we were talking about the book and you said it wasn't just for people, like, to know what the questions and answers were,

31:33 but that it, like, it's such a new industry that it's both new for interviewers and interviewees.

31:38 And I think we were going from there.

31:40 Okay.

31:40 Yeah.

31:41 So part of the book should give some understanding and standardization of the process, and of the differences between different types of roles.

31:48 Like, what is a data scientist?

31:49 What is a machine learning engineer?

31:51 Or what is a research engineer?

31:52 Also, the differences between, for example, machine learning engineering, data science, and MLOps.

31:58 So it's going to give a good picture of the process and what skills are needed for each role.

32:04 And also the interview pipeline, building the pipeline.

32:08 Yeah.

32:09 And a lot more.

32:10 Yeah, that sounds, look, we need all the help I think we can get for fixing the interview process.

32:16 Oh, my God.

32:17 In software development and data science.

32:19 It seems so broken to me.

32:20 I've had some friends who have gone through it recently, and it just seemed really, really rough.

32:27 And I actually did an episode.

32:28 Wait, let me do a quick search.

32:30 What are some of the highlights of the, like, pain points?

32:34 I think a lot of it is you get asked to work on, like, low-level algorithms, like, explain or create or recreate low-level algorithms, sometimes even just on a whiteboard.

32:45 Yeah.

32:46 Where, like, you know, go create quicksort.

32:48 And then never, ever in your job will you ever go and create quicksort.

32:52 Or something like that, right?

32:53 Yeah.

32:54 Like, you would just go to the list and say dot sort, and it would be done.
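
Editor's note: to make the whiteboard point concrete, here is a small sketch contrasting a hand-written quicksort with the built-in sort. The `quicksort` function is a simple, non-in-place teaching version written for this note, not code from the episode, and the sample list is arbitrary.

```python
def quicksort(items):
    """Whiteboard-style quicksort: returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


numbers = [5, 2, 9, 1, 5, 6]

# The interview answer: implement the algorithm by hand.
by_hand = quicksort(numbers)

# What you would actually write on the job: call the built-in sort.
numbers.sort()

print(by_hand == numbers)
```

Both approaches produce the same ordering; the built-in version is one line and is the one most day-to-day code relies on.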

32:57 I interviewed Susan Tan a while ago, way, way, way back, in episode 123.

33:03 And she said, she did a talk called Lessons from 100 Straight Developer Job Interviews.

33:08 I think she was in San Francisco as well.

33:10 I'm pretty sure if I remember correctly.

33:12 A hundred is so much.

33:14 She literally did a hundred and then, like, took notes about what worked.

33:18 Oh, my God.

33:19 You know, you'll get, like, these big, like, homework projects.

33:22 Yeah.

33:22 Right?

33:22 Like, work on this for, like, a week.

33:24 And then, you know, there's a hundred applicants.

33:25 So, like, the energy put into that is often not.

33:29 Anyway, I think helping both sides of that story would be really good.

33:32 Yeah.

33:33 I would definitely love to, like, read her interview because it sounds exactly like what I've been working on.

33:38 I'm curious.

33:39 Does she, like, propose, like, what are some things that work?

33:43 She did, and it's been, gosh, it's been, like, two or three years since I spoke to her about it.

33:48 But I know she had some advice for, like, these things were really bad.

33:51 And these things I experienced were really good.

33:53 Yeah.

33:54 And so, she basically laid out, like, what are some bad interviews I had and what are some good ones and why?

33:58 And I think probably in there you could pull out some good advice.

34:01 Yeah, this is really interesting because before, when I was still interviewing for jobs, I had so much to complain about in the interviewing process, right?

34:11 But now, as part of a startup, we're trying to build the hiring pipeline, and we realize that it's really hard.

34:17 Even though we complain about the existing pipeline, it's really hard to come up with something that is better.

34:23 So, I think there are just too many things.

34:25 So, first of all, interviews are just a proxy to evaluate somebody's skills, right?

34:31 Yeah.

34:32 So, this example, I know it's maybe not very exact, but think about dating, right?

34:39 You try to find somebody and you just approximate whether that person is a good fit for you, and you might date for years.

34:45 And it's still possible to end up with a bad partner, right?

34:48 So, like...

34:49 Exactly.

34:50 The divorce rate's, like, 50% or something, right?

34:53 Like, we're not totally getting this nailed.

34:55 Yeah.

34:56 So, I think for job interviews, admittedly, the stakes are lower.

35:01 It's for a job, not for a partner.

35:03 But you have much less time, right?

35:05 You only have a resume.

35:06 Everyone says you shouldn't make the resume longer than one page.

35:09 You have one page of that.

35:10 And then you maybe go on LinkedIn, social media, look things up.

35:13 And then you have, like, a few hours.

35:14 Like, it's really hard to, like, get a good picture from it.

35:18 And a lot of it is biases, because interviewers are humans.

35:21 And even though we try not to, we are taught that you shouldn't let biases in,

35:27 you shouldn't judge people based on that.

35:29 But sometimes we still do it.

35:32 Like, sometimes it's, like...

35:33 Yeah.

35:33 We just do it without even, like, being conscious of doing so.

35:37 And also, it's very different for different people, because something that might work for

35:41 one group of people might not work for another group of people.

35:45 So, for example, we have been debating take-home challenges.

35:49 A lot of candidates told us, oh, my God, interviews are so stressful.

35:53 Like, one-on-one is really hard.

35:55 Why don't you just give a take-home challenge?

35:56 Just make it, I don't know, make it hard.

35:58 We're going to spend a day on it.

36:00 Then it'll be done, and you can see how good we are.

36:02 But then we thought about it, and we talked to people.

36:05 And then we realized that for people who have a lot of responsibilities outside of work,

36:10 for example, women or people with small kids,

36:13 they can't spend a day doing take-home challenges.

36:15 So I think, like, it's what might work for me.

36:18 Yeah.

36:18 And if they apply to 100 jobs, then all of a sudden that's half a year or something like that, right?

36:22 Yeah.

36:23 Yeah.

36:24 So it's very hard.

36:25 So some people told me that they like this concept some companies have,

36:29 where they bring you on as a sort of intern for a month, and they pay you.

36:34 And then if you do well, then you can get a job.

36:36 And some people say, oh, that's great, because now everyone gets a chance to show how good they are at the job.

36:40 But then not everyone can afford to just go take a job without any commitment for a month, right?

36:46 And it's going to totally exclude immigrants.

36:48 For example, if somebody needs visa sponsorship, they can't just go and work like that.

36:52 Right.

36:52 It's a very precarious situation if your presence in the country is based on, you know, you have to have a job.

36:59 And if it lapses for more than a month or two, then you've got to leave.

37:03 That's really stressful.

37:03 Yeah.

37:04 Yeah.

37:04 So I think it's really hard to find something that can work for everyone.

37:07 Yeah.

37:08 So a couple of thoughts.

37:09 One, it's been a very long time since I hired anybody.

37:12 But I used to help with hiring people to do training, people who would become trainers to teach, you know,

37:18 basically for professional development for software developers.

37:21 And we would obviously go through the resumes and see if they made any sense.

37:24 And then we would just do a quick, like, 30-minute call.

37:27 And I would say, okay, so imagine this person says, I'm a Python expert and I specialize in Flask.

37:33 All right, we're on a Zoom call.

37:34 Share your screen.

37:35 Build me a Flask app that has one function that returns JSON.

37:39 If I give it two numbers, it adds them.

37:42 I mean, like, anybody who's ever worked with Flask should be able to knock that out in five minutes.

37:47 And you can tell from, like, one minute in, is that person on that path to, like, get there?

37:52 Because they know you start with import Flask and then you create app equals Flask.

37:55 Or are they just flailing about, right?

37:58 They just have no idea.

37:59 And there's no way that they can both be an expert in Flask and not be able to create, like, a Hello World app in it, right?
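
Editor's note: for readers curious what that five-minute screening exercise might look like, here is one possible answer. It is a minimal sketch, not the solution used in those interviews: the `/add` route name and the `a` and `b` query parameters are assumptions made for this note. It uses Flask's `request.args.get` with a `type` converter and `jsonify`, and needs Flask installed.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/add")
def add():
    # Read the two numbers from the query string, e.g. /add?a=2&b=3
    a = request.args.get("a", type=float)
    b = request.args.get("b", type=float)
    if a is None or b is None:
        # Missing or non-numeric input: return a JSON error with HTTP 400.
        return jsonify(error="pass numeric query parameters a and b"), 400
    return jsonify(result=a + b)


if __name__ == "__main__":
    # Development server only; a real deployment would use a WSGI server.
    app.run(debug=True)
```

Running the file and visiting `/add?a=2&b=3` in a browser should return a small JSON document containing the sum.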

38:07 And so, I mean, that was the first sort of filter we used before we actually would ask them the equivalent of, like, here's a take-home project or something.

38:15 It's just like, show me live that you're semi-competent with the tools you claim to be, like, top-notch in, right?

38:22 And that actually worked pretty well, I think.

38:24 I was blown away at how many people would claim to be, like, I've done five years of this and I'm a super expert and I'm ready to teach it to other people.

38:30 And then they can't even begin to touch it.

38:32 So, I think that's an interesting interview approach.

38:35 But I also find that if we do interviews that are very tailored, overfit to a specific tool,

38:42 then we might find people who are really good at one tool but don't really have a broad range of skills.

38:48 And I think that for startups, or any fast-changing company where you face a lot of new problems, people have to keep learning new things.

38:57 If you just get someone who's really good at one thing, they may not be able to generalize to others.

39:02 Sure.

39:02 What we were trying to discern was: they said they were an expert at this thing.

39:09 Yeah.

39:09 Are they actually, like, how much can you trust?

39:12 So, if they can show they're expert at this thing they said, then probably the other stuff that they said they're pretty good at.

39:16 They're probably also in that realm.

39:17 Yeah.

39:18 But if they're, like, really far from, like, how they describe themselves in one axis, then they're probably not really going to be a good fit.

39:25 So, yeah.

39:26 I don't know.

39:26 It worked okay.

39:27 We didn't do that much hiring.

39:28 This portion of Talk Python To Me is sponsored by Linode.

39:32 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

39:36 Develop, deploy, and scale your modern applications faster and easier.

39:40 Whether you're developing a personal project or managing large workloads, you deserve simple, affordable, and accessible cloud computing solutions.

39:48 As listeners of Talk Python To Me, you'll get a $100 free credit.

39:52 You can find all the details at talkpython.fm/Linode.

39:57 Linode has data centers around the world with the same simple and consistent pricing regardless of location.

40:02 Just choose the data center that's nearest to your users.

40:05 You'll also receive 24-7, 365 human support with no tiers or handoffs regardless of your plan size.

40:12 You can choose shared and dedicated compute instances or you can use your $100 in credit on S3 compatible object storage, managed Kubernetes clusters, and more.

40:22 If it runs on Linux, it runs on Linode.

40:24 Visit talkpython.fm/Linode or click the link in your show notes.

40:29 Then click that create free account button to get started.

40:32 The other thing that I wanted to bring up is, did you hear that Guido van Rossum just joined Microsoft?

40:37 Oh my God, yes.

40:39 I think, yes.

40:40 Yeah, I was like, it was really interesting.

40:43 What was the thought on it?

40:44 So he said, basically, he's been retired for six months.

40:47 He's like, I'm really bored with this.

40:48 I want to go back to do something.

40:49 There's a ton of cool open source stuff going on there now.

40:52 And, you know, he gets to work with some of the other language teams and make Python, basically just focus on Python and be around it.

40:58 Yeah.

40:59 And that's all interesting.

41:00 And I think it's actually kind of a big deal that that happened.

41:02 And it's like a really big contrast from Microsoft 10 years ago that this is even possible.

41:07 But the thing that I want to bring up specifically now is somebody on Twitter asked him, so did you actually have to send in a resume, Guido, before they hired you?

41:16 And he said, yes.

41:18 Yeah.

41:18 He had to send in a resume.

41:20 He went through a bunch of interviews.

41:22 The interviews make sense, but he had to send in his resume and he had to provide his degree he got in university.

41:28 And like his transcript, like his grades and stuff he got in college.

41:31 That's what I don't get.

41:33 And I'm just thinking like, who cares if he got an F in literature or didn't?

41:40 Like, look what he's accomplished since then.

41:42 It doesn't matter.

41:43 But that's just another one of these hiring things, right?

41:45 Well, we got to check the box.

41:46 We need his, like, university degree and transcript.

41:49 This is so funny.

41:50 Or this technical fellow.

41:51 That reminds me of a few years ago.

41:54 Do you know Malala?

41:55 She was the youngest recipient of the Nobel Peace Prize?

41:59 No, no, Malala?

42:00 Yes.

42:01 Yes, I do.

42:01 Uh-huh.

42:02 Yeah.

42:02 It's really funny because at the time she wanted to study at Stanford.

42:06 And Stanford was like, yes, but what's her SAT score?

42:09 And everyone was like, she's the youngest recipient of the Nobel Peace Prize.

42:13 And you're asking, what's her SAT score?

42:16 I thought it was just like, yeah.

42:17 Just going to cram them through the bureaucratic pipeline.

42:21 It's so funny.

42:22 Yeah.

42:22 All right.

42:23 So what are some of the other takeaways that you're like hoping to give in this book?

42:27 And you also have a chapter that's open on GitHub that people can download, right?

42:32 Yeah.

42:32 So this is the chapter.

42:33 So I think one part of the interview a lot of people ask is a machine learning system design.

42:39 And so the question is usually like, yeah, like if you want to like build a system to do that, how would you do it?

42:46 So it's very design high level kind of questions.

42:49 And I think, so first of all, one question could be like, if you try to build a system to predict what keyword is trending on Twitter, then how would you go about it?

43:00 Like what is considered trending and blah, blah.

43:02 So I think this question is very interesting because it usually tries to measure the understanding of the different parts of the system and not just machine learning.

43:12 But I also find that these questions can be very hard, especially for junior candidates, because they don't have a good grasp of what a production environment is.

43:22 So some companies.

43:23 Yeah.

43:24 A lot of times you have to see examples of that or have built examples of that to know like, well, these are the five pieces we got to put together.

43:30 And then you do it, right?

43:31 Yeah.

43:31 So originally I wrote it as part of the interviews book, but then as I started writing more about it and learning more about it, it was like, oh my God, there is like so much more in machine learning system design.

43:43 So now it's actually become like a full blown book on its own.

43:47 So that's why it's taking me longer.

43:49 And I'm actually teaching a course on it, like machine learning system design, just on that part.

43:53 Oh, that's cool.

43:54 And you're teaching that in January.

43:56 Is that right?

43:57 At Stanford?

43:57 Yes.

43:57 And I'll be in January at Stanford.

44:00 Yeah.

44:00 It's a bit strange because I'm not sure how teaching online is going to go.

44:05 I'm a bit, a little bit nervous about that.

44:07 Yeah.

44:08 It's not the same as standing in front of the class and having that experience.

44:11 That's for sure.

44:11 Yeah.

44:12 But most people told me that it's a different experience, because some students like it more, especially the introverts.

44:18 Now they can just like ask questions anonymously without having to raise their hands and having anyone stare at them.

44:24 So it's going to be interesting.

44:25 I'm looking forward to it.

44:26 Yeah.

44:27 It should definitely be interesting.

44:28 All right.

44:28 I think we're just about out of time, but maybe just real quickly, you could give us the elevator pitch on Snorkel, Snorkel AI and what you guys got going on there.

44:37 Ooh.

44:37 Okay.

44:38 So I think like for the pitch, I think I can just say why I decided to join Snorkel.

44:43 So it's funny because it's a startup that comes out from Stanford AI lab and I have heard of them for a while.

44:49 And when they first approached me, I was like, oh my God, another startup from Stanford.

44:53 I know it sounds super smart, but I was like, oh, startup AI, whatever.

44:57 But then I came across the paper.

44:59 So most of the founding team are like PhD students and they have been publishing a lot.

45:05 I read one of their papers and I was like, this is really smart.

45:09 So the key idea for their paper was that like, instead of manually labeling the data, right?

45:15 You can have some heuristics, encode the heuristics into programmatic functions, and apply them to all the data at once.

45:22 That's the really hard thing about training your models is you get like all this data that you have to say car, bicycle, ball, tree, right?

45:30 And you just got to like go through it and teach it basically.

45:32 Yeah.

45:33 So you notice a helpful like labels.

45:35 Like for example, you see like an email that is spam or not spam, right?

45:38 You probably notice you probably have some heuristics.

45:40 Like, hey, if it says like, hey, please send me money to like a Nigerian prince or something, it's going to be spam.

45:48 So, so you have some like heuristics in the brain.

45:50 Like, so if you can find a way to encode those heuristics, then you don't have to manually label it one at a time.

45:55 So I think that's the algorithm.

45:56 So like how should I combine them, because some heuristics are going to be noisy and like overlapping and they can contradict each other.

46:02 So the core algorithm was like how should I combine all of them and generate the labels that are

46:08 most likely to be the correct ground truth, because you don't have ground truth to actually compare against.

46:12 So then you generate the set of ground truth labels, and they open sourced the code.

46:19 So like anyone can just go on GitHub and use it.
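
The idea described here, writing heuristics as small functions and applying them to every example, can be sketched in plain Python. This is not Snorkel's API or its algorithm: the function names and keyword rules are made up for illustration, and a simple majority vote stands in for the learned model Snorkel uses to weigh noisy, overlapping heuristics.

```python
from collections import Counter

# Label values; ABSTAIN means "this heuristic has no opinion".
SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1


def lf_asks_for_money(email: str) -> int:
    """Hypothetical heuristic: requests to send money are usually spam."""
    return SPAM if "send me money" in email.lower() else ABSTAIN


def lf_mentions_prince(email: str) -> int:
    """Hypothetical heuristic: mentions of a prince are suspicious."""
    return SPAM if "prince" in email.lower() else ABSTAIN


def lf_mentions_meeting(email: str) -> int:
    """Hypothetical heuristic: meeting-related mail is usually legitimate."""
    return NOT_SPAM if "meeting" in email.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_asks_for_money, lf_mentions_prince, lf_mentions_meeting]


def majority_vote(email: str) -> int:
    """Combine the heuristics by majority vote (a stand-in, not Snorkel's model)."""
    votes = [lf(email) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Heuristics can contradict each other; on a tie we decline to label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


emails = [
    "Please send me money, signed a Nigerian prince",
    "Reminder: team meeting at 3pm",
]
# Every email is labeled in one pass instead of by hand.
labels = [majority_vote(email) for email in emails]
```

Majority vote treats every heuristic as equally reliable, which is exactly the weakness the combining algorithm mentioned in the conversation is meant to address.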

46:21 So I went to the code and thought like, wow, these people are like good engineers.

46:25 Because you think of like PhD students as like bad engineers, but their code is like good, very clean.

46:31 They have more testing and everything.

46:32 Like unit tests.

46:33 Oh my God.

46:34 No.

46:34 So, no, no.

46:38 So, so I think the product now is not just that part. Actually a lot of people think of Snorkel as just that labeling part, but we're actually a full end-to-end platform.

46:49 So we have everything from your data to like modeling and training.

46:52 Like we do a lot with like monitoring and analysis, because we believe that machine learning is changing fast and you need to update your models constantly.

47:02 And so we believe in iterative development.

47:04 So like you train a model and you see it and it's not good.

47:08 So you go back and see what's wrong and how do you improve it.

47:10 And like you manage more data.

47:11 So, so we version everything as a process, by the way.

47:14 And so.

47:15 That's cool.

47:15 It's like agile data.

47:17 We don't use agile yet, but if it's one of the buzzwords of sales and maybe we can adopt it.

47:23 Yeah, I know.

47:24 I'm just teasing.

47:24 It's one of my buzzwords.

47:25 I'm just kidding.

47:25 Oh my God.

47:27 Please, somebody from Snorkel, please don't fire me.

47:29 But, but yeah, so it's an end-to-end platform for people to build machine learning, AI applications.

47:37 And it goes from data to model, monitoring, analysis.

47:41 And I think it's pretty dope.

47:42 You guys should totally check it out.

47:44 Yeah.

47:44 Right on.

47:45 Awesome.

47:45 Well, it sounds like a cool company to work for and definitely nice applied machine learning stuff.

47:51 So building tools for machine learning folks, right?

47:54 Yeah.

47:54 Awesome.

47:55 I think it's for machine learning folks, but I think like.

47:57 All right.

47:57 We recently wanted to lower the entry barriers for people to build AI applications.

48:01 So I think like, so our platform is actually no code.

48:05 So like you have the option to just build an application without any code at all.

48:09 But we also have like our SDK.

48:10 So like for people who want more like flexibility, then you can also like code.

48:15 Yeah.

48:15 Very nice.

48:16 All right.

48:16 Well, good luck with the whole company and the startup.

48:19 Hopefully it takes off and does well.

48:20 It sounds nice.

48:21 Yeah.

48:21 Thank you.

48:22 So we're pretty much out of time, but yeah.

48:24 Thanks for all the advice on building machine learning teams or getting to be part of one.

48:28 Now, before you go, there's always the two questions I ask at the end of the show.

48:32 And one is if you're going to write some code, some Python code, what editor would you use these days?

48:37 So sometimes I really want to be smart and say it's like I use Vim, but actually just use VS Code.

48:43 VS Code is definitely the most popular answer these days.

48:47 It's all good.

48:48 And notable PyPI package, like something, some Python library or package that you've come across like, oh, this was so cool.

48:54 People should know about X.

48:56 I'm not sure.

48:57 Is this, so, so, so, do you know about, I think it's like Papermill.

49:00 So it just allows you to format.

49:02 Yeah.

49:03 I think it's pretty cool.

49:04 It allows you to do a lot of experiments with like Jupyter Notebooks.

49:07 I think.

49:08 Yeah.

49:08 And Papermill comes out of Netflix.

49:10 So it's a neighbor of yours.

49:11 And.

49:12 Nice.

49:13 Yeah.

49:13 The idea is you can almost treat Notebooks like functions, right?

49:16 Like you can pass arguments to them, run them and get like something out and then even chain them together.

49:20 Yeah.

49:21 And one of the things I heard was really nice about it is if you create sort of data pipelines of one notebook going to the next, the next.

49:27 And if something goes wrong, like the notebook actually contains like all the data that came in and what it tried to do and how far it got.

49:34 Yeah.

49:34 And it's like almost a record instead of just like server failed with 500.

49:38 Like, no, like here's all the details.

49:40 You can go back and look at it.
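
As a rough sketch of the "notebooks as functions" idea, the helper below runs a list of notebooks in order with the same parameters and keeps each executed copy. It assumes the third-party papermill package and its `execute_notebook(input_path, output_path, parameters=...)` call; the helper name, the notebook names, and the `execute` hook (there so the loop can be exercised without papermill) are hypothetical.

```python
from pathlib import Path


def run_notebook_chain(notebooks, parameters, output_dir, execute=None):
    """Run notebooks one after another, saving an executed copy of each.

    `execute` defaults to papermill.execute_notebook; it is injectable so
    the chaining logic can be tried without papermill installed.
    """
    if execute is None:
        import papermill as pm  # third-party: pip install papermill

        execute = pm.execute_notebook

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    executed = []
    for notebook in notebooks:
        out_path = out_dir / f"{Path(notebook).stem}.executed.ipynb"
        # The executed copy is the "record" of the run: it holds the injected
        # parameters and the cell outputs, including how far the run got.
        execute(str(notebook), str(out_path), parameters=parameters)
        executed.append(out_path)
    return executed


# Hypothetical usage, assuming these notebooks exist and tag a "parameters" cell:
# run_notebook_chain(
#     ["extract.ipynb", "train.ipynb"],
#     {"run_date": "2021-01-11"},
#     "runs/2021-01-11",
# )
```

If a step raises, the loop stops there, and the output notebooks written so far are what you would open to see what happened.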

49:41 Yeah.

49:41 It's pretty dope.

49:41 I think, I think it's really cool.

49:43 I think there's been like so much exciting work in the notebook space.

49:47 I think it's like Streamlit.

49:48 I think Streamlit is like, it's like really cool.

49:51 Like you create like very quick applications.

49:53 I mean, there's just so many.

49:55 Yeah.

49:55 That's really nice as well.

49:56 Yeah.

49:57 What's your, which is your favorite?

49:59 My favorite.

50:00 Oh my gosh.

50:01 You know, there's all these different ones that always blow me away.

50:04 I go through so many of them.

50:06 One that I came across recently that was pretty neat is called backoff.

50:09 Someone told me about that.

50:10 And with backoff, what you do is just put a decorator on one of your functions.

50:13 You say, if I get this kind of error, like this type of exception.

50:16 Yeah.

50:17 Like wait five seconds and then try again and then wait 10 seconds and then try again.

50:20 So if you're doing like testing against like an API and you get like a too many requests error,

50:26 you can say instead of fail the test, just wait one second and try again.
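
To make the retry-decorator idea concrete, here is a standard-library-only sketch. It is not the backoff package's implementation (the real library offers decorators such as `backoff.on_exception`); the names `retry_with_backoff`, `TooManyRequests`, and `fetch_status` are hypothetical, and the doubling delay mirrors the "wait five seconds, then ten" pattern described above.

```python
import functools
import time


def retry_with_backoff(exception_type, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Retry the decorated function when `exception_type` is raised.

    The wait doubles after each failed attempt. `sleep` is injectable so the
    behavior can be checked without actually waiting.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception_type:
                    if attempt == max_tries:
                        raise  # out of attempts: surface the original error
                    sleep(delay)
                    delay *= 2

        return wrapper

    return decorator


class TooManyRequests(Exception):
    """Hypothetical error an API client might raise on HTTP 429."""


@retry_with_backoff(TooManyRequests, max_tries=3, base_delay=5.0)
def fetch_status():
    """Placeholder for a real API call that is mostly, but not always, reliable."""
    return "ok"
```

Only the named exception type triggers a retry, so unrelated bugs still fail immediately instead of being hidden behind repeated attempts.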

50:29 This sounds pretty dope.

50:31 I think I need to check it out.

50:33 Yeah.

50:34 Yeah.

50:34 It's super easy to use, but it kind of solves that problem of like mostly reliable, but not

50:38 all the time reliable stuff.

50:39 Do you like star things on GitHub when you see repos that you like?

50:43 I do.

50:44 Yeah.

50:44 I star stuff all the time.

50:46 Yeah.

50:46 Can I just ask you to go to the star list and let's see like what have

50:50 you been like looking at?

50:51 Yeah.

50:52 So github.com/mikeckennedy.

50:54 And let's see, I'll pull up my stars and see where are the things that I've starred?

50:59 There we go.

51:00 So the things that I have up here right now, that's a really good question, by the way,

51:04 like really cool way to look at it.

51:05 So I have pip-chill, which is like pip.

51:08 You know, if you do pip freeze, it'll show you what you've installed.

51:11 pip freeze will include everything that was installed, including the

51:15 dependencies.

51:15 pip-chill will show you just what you manually installed, not the dependencies, which is cool.
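
The difference between the two listings can be approximated with the standard library. The sketch below is not how pip-chill itself works; it simply treats a package as "top-level" when no other installed package declares it as a requirement, and all function names are hypothetical. As a simplification it counts every declared requirement, including optional extras.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string like 'requests>=2.0'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return normalize(match.group(0)) if match else ""


def installed_requirements():
    """Map each installed package to the names it requires (what a freeze-style view covers)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[normalize(name)] = [
                requirement_name(req) for req in (dist.requires or [])
            ]
    return packages


def top_level_packages(packages):
    """Keep only packages that nothing else depends on (a chill-style view)."""
    required = {dep for deps in packages.values() for dep in deps}
    return sorted(name for name in packages if name not in required)


# Hypothetical usage against the current environment:
# print(top_level_packages(installed_requirements()))
```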

51:21 Nice.

51:22 Then linqit, L I N Q I T, adds like LINQ functionality to the Python language, which is cool.

51:28 I love the name, by the way, pip-chill.

51:31 pip-chill, yeah, it's so good.

51:33 And then I have fastapi-chameleon and fastapi-jinja, which add like those templating languages

51:40 to FastAPI as a decorator.

51:42 Oh, nb_black is cool.

51:44 Yeah.

51:45 Yeah.

51:45 nb_black adds Black to notebooks.

51:47 Yeah.

51:47 So those are the ones I've starred recently, I guess.

51:49 I'm so funny.

51:50 So those are all good.

51:51 So you saw like a lot of FastAPI, but still a lot of Flask.

51:55 Do you have like, do you prefer one over another?

51:58 Yeah, I do.

51:58 I really like FastAPI.

52:00 I've been liking it a lot.

52:01 Oh, it's brilliant.

52:02 Yeah.

52:03 I think it's, yeah, it's so brilliant.

52:05 It takes all the cool modern features of Python and puts it together.

52:08 So let me make one recommendation for you.

52:10 Check this out.

52:11 I just want to get your reaction to this for people as a data science machine learning

52:15 person.

52:15 handcalcs.

52:16 What is that?

52:17 Have you, have you seen handcalcs?

52:19 No.

52:19 So handcalcs is crazy.

52:21 So what this does is you write, you create a Jupyter notebook.

52:23 Yeah.

52:24 And you write some sort of math equation that is actually just the computation.

52:28 And then you can ask it to show you, and it'll show you as if it wrote it in LaTeX.

52:32 What?

52:33 Like step by step, like how, yeah, like how it solved out the problem.

52:37 So it'd have like, like the nice square root.

52:38 So if you're doing some kind of like computation that's somewhat technical and hard.

52:43 Yeah.

52:43 In Jupyter, it'll actually show you like what you would put into like a math textbook or a

52:48 physics textbook to derive the equations and even the steps you might take to go from what?

52:52 Can it do proof for you?

52:54 I don't know how far it can go with a proof, but if you just go to like Google handcalcs.

52:59 Wait, how do you spell it?

52:59 There's a bunch of animated GIFs.

53:01 How can you say the name?

53:03 H-A-N-D, C-A-L.

53:05 H-A-N-D.

53:06 C-S.

53:07 Like.

53:07 Oh.

53:07 C-A-L-C-S.

53:10 Like hand calculations.

53:11 Okay.

53:11 Is this, is this from Connor Ferster?

53:14 Yes.

53:14 Oh, that's dope.

53:16 And.

53:16 Oh yeah.

53:17 If you just page down through it, you can see like all these amazing steps.

53:20 You can like render like symbolic mathematics and like the steps between various things.

53:24 Yeah.

53:24 It's really, really.

53:25 So if you were doing like complex calculations.

53:28 Whoa.

53:28 That you want to make sure you got right.

53:30 Like reading the Python code to do it is harder than like reading the symbolic mathematics of it.

53:36 Wait, how does it do this?

53:37 Yes.

53:38 I have no idea.

53:38 But it uses like SymPy and a bunch of other LaTeX and all sorts of crazy stuff.

53:43 So as a data scientist, like this thing is killer, I think.

53:47 That is pretty dope.

53:48 Nice.

53:49 Thanks for showing me.

53:50 I'm going to show it to my friends.

53:51 Yeah.

53:52 Yeah.

53:53 There you go.

53:53 So there's, there's a topical recommendation.

53:56 How's that?

53:56 Nice.

53:57 That's helpful.

53:58 Thank you.

53:58 That's dope.

53:59 Of course.

54:00 Cool.

54:01 All right, Chip.

54:02 Well, I think we're about out of time, but I just want to say thank you for being on the show and sharing all of your advice.

54:07 And I guess one final question, if people are interested, if they're out there looking to do some machine learning, are you guys hiring?

54:14 Yes.

54:14 We are hiring a lot, actually.

54:18 That's actually one of our challenges.

54:21 Like how to keep on, yeah, hiring a large quantity of very good people.

54:27 Yeah.

54:27 That's, that is definitely a challenge, but we'll put a link maybe over to like the jobs page or something.

54:31 If you want, people can check it out.

54:33 This has been another episode of Talk Python To Me.

54:37 Our guest in this episode has been Chip Huyen, and it's been brought to you by Datadog and Linode.

54:42 Datadog gives you visibility into the whole system running your code.

54:46 Visit talkpython.fm/datadog and see what you've been missing.

54:51 We'll throw in a free t-shirt with your free trial.

54:52 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines.

54:58 Develop, deploy, and scale your modern applications faster and easier.

55:01 Visit talkpython.fm/linode and click the create free account button to get started.

55:06 Want to level up your Python?

55:08 If you're just getting started, try my Python Jumpstart by building 10 apps course.

55:13 Or if you're looking for something more advanced, check out our new async course that digs into all the

55:18 different types of async programming you can do in Python.

55:21 And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

55:26 It's like a subscription that never expires.

55:27 Be sure to subscribe to the show.

55:30 Open your favorite podcatcher and search for Python.

55:32 We should be right at the top.

55:33 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

55:38 and the direct RSS feed at /rss on talkpython.fm.

55:43 This is your host, Michael Kennedy.

55:44 Thanks so much for listening.

55:46 I really appreciate it.

55:47 Now get out there and write some Python code.

55:49 Thank you.
