#262: Build a career in data science Transcript

Recorded on Wednesday, Apr 22, 2020.

00:00 Has anyone told you that you should become a data scientist?

00:02 Have you heard it's a great career?

00:04 In fact, data scientist is the best job in America according to Glassdoor's 2018 rankings.

00:11 That's great, but how do you get a career in data science?

00:14 And once you've landed that first job, how do you find the right fit?

00:17 How do you find the right company?

00:19 And how do you get more deeply involved with the community as you grow in that career?

00:24 I've brought two great guests, both highly successful data scientists, on the show today who have been thinking deeply about this.

00:30 Jacqueline Nolis and Emily Robinson are here to give you real-world, actionable advice on getting into this rewarding career.

00:37 This is Talk Python to Me, episode 262, recorded Wednesday, April 22nd, 2020.

00:57 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

01:02 and the personalities.

01:03 This is your host, Michael Kennedy.

01:05 Follow me on Twitter where I'm @mkennedy.

01:07 Keep up with the show and listen to past episodes at talkpython.fm.

01:11 And follow the show on Twitter via @talkpython.

01:14 This episode is sponsored by Kite and Linode.

01:17 Please check out what they're offering during their segments.

01:19 It really helps support the show.

01:21 Jacqueline, Emily, welcome to Talk Python to Me.

01:24 Thank you.

01:24 Thank you.

01:25 Excited to be here.

01:26 I'm excited to have you both here.

01:28 Really excited to talk about this topic.

01:30 I think one of the things that a lot of listeners out there can benefit hugely from

01:35 is how do I get started in programming?

01:39 How do I get started in data science?

01:41 How do I get started in this overall sort of Python career?

01:44 And there's so many different paths and ways you can go.

01:49 You could go get a four-year degree.

01:50 You could drop out of college and do a startup.

01:53 What is the right path?

01:55 And what is some guidance around there?

01:57 What are some of the trade-offs?

01:58 And so you both have been writing about this lately.

02:02 And this is really, really good work that you're putting out.

02:05 So I'm excited to talk to you both about that.

02:07 Great.

02:07 Yeah.

02:08 So we just published our book, Build a Career in Data Science.

02:12 We've been working on this, I think, almost two years now.

02:14 Yeah.

02:15 Two years since you reached out to me.

02:17 Yeah.

02:17 And so two years, a lot of work, a lot of talking to people in the field.

02:21 And so it's really been great to finally get it out there and talk to people and watch people

02:25 actually get help by it.

02:27 So yeah, we love talking about it.

02:29 Awesome.

02:30 Well, I'm really excited to have you all here to talk about it.

02:32 But before we get to that, let's maybe do a little bit of a meta thing.

02:36 I always ask this question on the show, but it's a bit meta this time.

02:39 It's just how did you get into programming in Python?

02:41 Emily, you want to go first?

02:42 Yeah.

02:43 I don't know if I can admit this on this podcast, but I actually don't program in Python

02:46 anymore.

02:47 I program in R day to day.

02:49 I have programmed in Python.

02:50 So how I got started was back in college.

02:53 I did Python in one computer science class, but most of the programming I did was in R in

02:57 the statistics program.

02:59 So I was lucky enough to go to Rice University when Hadley Wickham was a professor there who,

03:03 for your listeners who don't use R, is a very famous R programmer who contributed a lot of

03:08 the big packages to it.

03:09 So that's how I got started.

03:11 And I kept doing it in grad school, which I got my master's in organizational behavior.

03:15 After that, I went to a data science bootcamp called Metis, which was all in Python.

03:20 So, you know, sort of upped my Python skills there and then got started working in data science

03:24 in industry.

03:25 Yeah, very cool.

03:26 Jacqueline, how about you?

03:27 Okay.

03:27 So like Emily, most of my work is in R, although I do some Python too.

03:31 But okay.

03:32 So my background, I did an undergrad and master's in math.

03:36 And then like, I really want to help companies use math to solve problems.

03:40 This is before the term data science existed.

03:41 So I went out in industry, did some what is now data science, but I didn't know it at the

03:46 time.

03:46 Went and got a PhD because I wanted to get some more technical skills.

03:50 And then now I work as a consultant helping out companies.

03:52 Awesome.

03:53 What's your PhD in?

03:54 Industrial engineering.

03:55 Yeah.

03:55 But actually, so how I actually started using Python, I think the first Python project I

03:59 ever did was there's a style of games from like the 80s called roguelikes, where you have

04:04 like your little asterisk symbol and you're walking around like the computer screen trying

04:07 to fight monsters and stuff.

04:08 And like the monster might be the letter M.

04:10 And I wanted to make one of these.

04:12 It's like a mud, but it has like some visual representation.

04:15 Okay.

04:16 Yeah.

04:16 Awesome.

04:16 Yeah.

04:16 And so it just like, it gets weirder.

04:18 So I wanted to make one of these and I wanted to make one of like when you're on a river and

04:22 you're like tubing and you're sitting there, I want your character, your little at symbol

04:25 to be like you floating around the river and you like see horses and stuff.

04:28 So anyways, Python was the language that had the most straightforward library for making

04:32 one of these roguelike games.

04:33 And so I spent like two weeks coding up this tubing simulator and then I got bored of the

04:38 project and left it.

04:39 But that was my first time actually using Python.

04:41 Yeah, it's cool.

04:42 And these little personal projects are super valuable for getting into programming because

04:46 you just go through and say, well, I'm learning about loops.

04:49 So I'm going to write like some different kind of loops.

04:50 Like no one learns that way, right?

04:52 Not really.

04:53 Yeah.

04:53 So that's one of the chapters in our book.

04:56 We have a whole chapter on, hey, you know, it's really good to learn things this way.

04:59 And you can actually make a portfolio of projects that you then can use to help

05:03 you get a job.

05:04 And for me, one of the, besides making tubing simulators, one of the projects I actually

05:09 did was to learn neural networks.

05:11 I generated a neural network that would create offensive license plates that would get banned

05:17 by the state of Arizona because I had a data set of all these license plates.

05:20 And that ended up being the basis for like an extremely valuable consulting project where

05:26 I ended up doing natural language processing using the same stuff I learned from that offensive

05:29 license plate thing.

05:30 That is so cool.

05:31 It's just such a fun and playful and kind of silly project.

05:34 But then, yeah, what I found really interesting in programming in general is you have these

05:40 two different realms, you know, like think hedge fund or Air Force, right?

05:45 But the fundamental thing you learn and the skills you gain to solve or work in those areas, it's

05:51 almost exactly the same.

05:52 It's just like, what layer is the specialty that they're working in?

05:57 You know, how do you work with trading?

05:58 They care maybe more about like timing versus, I don't know, visualization or whatever.

06:03 But it's blown my mind like how similar stuff is like that.

06:07 Like I created this fun thing for license plates.

06:09 And then it turns out to actually let me, I don't know what exact project you were working

06:14 on, but like something that obviously was probably not that, right?

06:17 It was a real thing someone paid me for as opposed to a license plate thing which gets me some blog post

06:22 views.

06:22 Yeah.

06:23 Just a very short side note.

06:25 Did you see that somebody thought they were going to be so, so clever and get out of all

06:29 the camera, like camera speed traps and stuff at red lights, red light cameras and whatnot.

06:36 So they got their license plate to say null, N-U-L-L.

06:39 And they thought like that would just trigger the database to think nothing was there.

06:43 But they started getting tickets for every faulty piece of data that was in.

06:48 Oh, no.

06:50 There was nobody, like there wasn't a license plate properly.

06:53 It was null.

06:54 So they started getting thousands of dollars of tickets that weren't theirs, but that was,

06:59 they're now registered for null, like across the board.

07:01 It's upstate.

07:02 I care.

07:03 Playing with fire.

07:04 Oh, that backfired bad.

07:05 Anyway, that sounds really, really fun.

07:07 So Emily, what are you doing today?

07:09 Like day to day, what do you work on?

07:11 Yeah.

07:11 So I work as a senior data scientist at Warby Parker, which makes eyeglasses and now contacts

07:16 as well.

07:17 You can get online and well, you could in previous times get in stores as well.

07:21 But that, of course, like many other companies, we've closed all our retail stores at the moment.

07:25 Yeah.

07:26 We have a Warby Parker here in Portland and I almost went in it the other day, but not anymore.

07:30 Yeah.

07:30 But still online, right?

07:31 Yeah.

07:31 Still online.

07:32 Still online.

07:32 Yeah.

07:33 So I joined there in December.

07:34 So about five months ago now on the data science team, which is a centralized team that works

07:40 with departments across the company.

07:42 So that's been really fun because previously I'd always been a data scientist that was embedded

07:47 with a team.

07:48 So when I worked at Etsy, the analytics department was centralized, but we were sort of paired with

07:53 one partner team.

07:54 So I worked with search my whole time.

07:55 And then my last job, I was part of the growth team.

07:59 So I reported to the VP of growth, not to the chief data scientist.

08:02 So this has been a new experience being on a fully centralized team where for one month I

08:07 might work with finance.

08:08 And then a couple months later, I'm working with product strategy.

08:11 That sounds super fun.

08:12 And you get to experience different kinds of problems and work with different teams

08:16 and technologies, I'm sure.

08:18 Yeah, exactly.

08:18 And so, you know, and the team tackles a wide range of projects.

08:22 So one thing we discussed in our book, right, is data science is a pretty broad field.

08:25 And so there's lots of different projects you can do.

08:27 So some of ours are making like a dashboard to view analytics and some other ones may be more

08:34 modeling problems or making a machine learning product.

08:36 So that's been really interesting to get this breadth of things that we can work on depending

08:41 on what the team we're working with needs.

08:43 Yeah, sounds super cool.

08:44 Jacqueline, how about you?

08:45 I'm working as an independent consultant.

08:47 So I've spent the last couple of years working, you know, as my own company.

08:51 So helping out big companies like T-Mobile, Expedia, some smaller startups in the Seattle area.

08:57 And so this is pretty fun because, you know, like Emily was saying, by being a consultant,

09:01 I got to work on all sorts of different projects, whether it's taking machine learning models

09:05 and pulling them into production or helping a company figure out who are the most like active,

09:09 engaged customers and how to, you know, think about targeting them differently.

09:12 So that's pretty great.

09:13 Unfortunately, it is not like the best time to be a consultant on your own right now.

09:19 You know, it's been a little dicey lately, but all in all, I've really enjoyed getting

09:22 to work with all these big companies on all these different interesting data science problems.

09:26 Yeah, it sounds really fun.

09:27 And it is nice to be able to pick who you want to work with and what projects you want

09:31 to take.

09:31 And you have a lot, a little more freedom, I think, to kind of go your own way.

09:35 But right now, I don't know, all bets are off.

09:39 It's yeah, I think it's tricky to be a freelancer right now because you feel it, all these changes

09:44 and all this pressure, you feel it immediately.

09:46 Right.

09:47 But if you work at, I don't know, some large company, you might not feel it right away,

09:52 but then, you know, maybe that company goes under and then all of a sudden you don't have

09:56 all the connections you had as a freelancer.

09:58 Right.

09:58 Maybe Expedia dries up, well, because travel's down, but, you know, maybe some other companies

10:04 like, hey, we got some more work.

10:05 Why don't you come work for us?

10:06 Right.

10:06 Whereas if you work at a company, you don't necessarily cultivate those connections as

10:11 much.

10:11 Yeah.

10:11 And I think a lot of people tend to like, kind of make working as a freelancer, kind of like

10:16 a, like a cool thing that like, oh, when you're really a big shot, you get to like work, you

10:20 know, just as an independent consultant.

10:21 But it's, so I think a lot of people kind of like aspire to have that sort of a job, but

10:25 it's really hard.

10:26 It's really hard because as you're saying, there's a lot of instant, you know, if there are changes

10:30 in the market, you feel it instantly.

10:31 And, you know, half of your job is going out and finding new clients.

10:36 Making deals, trying to like work with stakeholders.

10:38 None of that has to do with programming or technical stuff, right?

10:42 You've got a, you're almost in marketing for yourself as a bit.

10:46 Right.

10:46 So like people think, oh, I want to be a freelancer.

10:48 So I don't have to do all the boring stuff.

10:50 I can just do the data science.

10:51 And it's like, no, you actually have to do more of the boring stuff.

10:54 So it's, we actually, we have a chapter in our book about, okay, what do you do once you

10:58 become like a senior data scientist and you're looking at the next steps.

11:00 And one of the paths we've discussed is the consultant path, which has some perks to it, but also

11:05 has some serious risks and downsides.

11:07 Yeah.

11:07 I also have to share.

11:08 So how we split writing the book was we each like, were the primary writer for half the

11:14 chapters and the other person edited it.

11:15 And so Jacqueline was writing this chapter she was just talking about.

11:18 And the first version of the independent consultant one was so negative.

11:22 Like she was basically like, never do this.

11:25 I'm like, Jacqueline, I think we need to like pull back a little bit.

11:27 Like, I understand.

11:28 You definitely want to share the cons.

11:30 Well, it's not that, yeah, it's not that I think consulting is a bad thing to do.

11:36 It's that I've had so many people come up to me and be like, oh, that sounds so cool.

11:40 I want to do that.

11:40 How do I do that?

11:41 And I feel like I'm the, like the old woman in front of the cave, like, you know, I think

11:48 I've even came a little too hard, but yeah.

11:50 Careful what you wish for.

11:52 You might get it.

11:54 How funny.

11:55 Yeah, it's, it definitely has this aura of like, hey, you're your own boss.

11:59 You can just do whatever.

12:00 But yeah, there's, there's a lot of work to be done there.

12:03 And I think it also makes sense at different stages in your career, for sure.

12:07 Let's start this whole conversation off with a question about how you both got your first

12:13 job.

12:13 So I heard about what you're doing now.

12:15 And I have this theory.

12:17 I've only really tested it.

12:19 I guess I've tested with a few people.

12:21 I was going to say, I really only tested with myself because I only know my career that

12:24 well, but I've, I work with some people who are interns and then found their way through

12:28 like really successful stuff and sort of saw that as well.

12:30 And my theory is in the developer data science space, the first job is the hardest.

12:39 Because once you've had one job, you have a portfolio of work, you have experience, you

12:44 can say, I've done this thing and you have a problem similar to like, here, I've done this

12:48 thing with license plates.

12:49 It's technically not license plates, but that's basically what you're asking to do.

12:52 So it's not a matter of convincing a person in an interview that you can do it because you

12:57 can just show them, look, this is what I built and they're happy.

13:00 But in the very beginning, it's such an unknown to people.

13:03 So I think getting that first job is probably like one of the biggest steps to kind of going

13:09 down this path.

13:10 So I wanted to ask you two, how'd you get your first jobs?

13:13 Emily, you want to go first?

13:14 Yeah.

13:15 So I mentioned my first job was Etsy.

13:17 And so, all right.

13:19 So it's like take a time machine back to fall 2016.

13:22 So I finished the Metis Data Science Bootcamp.

13:25 But I interviewed.

13:26 So like one thing they did was there was a demo day for your final project and there were

13:30 some companies hiring there.

13:31 But yeah, I ended up at Etsy.

13:32 And actually how I got that initial, I don't know if it would have happened anyway, but what

13:37 helped initially was I actually knew someone, Hilary Parker, used to work there as a data

13:41 analyst.

13:41 And she had since left, but she still knew people who worked there.

13:45 So she offered to introduce me to a manager there.

13:47 And he took a look at my profile and said, yeah, I can refer you.

13:50 So definitely a network is a big part of it, even for later jobs, although I definitely agree

13:57 the first is the hardest.

13:57 And then I think what helped there was like Etsy was a really great company.

14:03 I really enjoyed working there.

14:04 And the title at the time I had was data analyst.

14:07 It wasn't data scientist.

14:08 And now actually that team since then, shortly after I left, their titles are now data scientists.

14:12 But one thing we talk about in the book is avoiding this.

14:15 People can get very attached to the title data scientist, and those can sometimes be harder

14:21 to get, or they're very attached to like, oh, I need to go work at like Google or Facebook

14:24 or Airbnb, right?

14:26 Like this.

14:26 I would say like Etsy probably falls under there, but like, you know, very well known like data

14:30 science company.

14:30 And I do think like you are going to get such valuable experience from almost like any

14:35 job.

14:36 If you're working in like data and whether you're called data, data analyst or, you know,

14:41 research analyst or product analyst, like if you're doing code, if you're working with data,

14:45 if you're working with stakeholders, it's that can be a really great first experience.

14:49 And like you said, just having that on your resume can open up a lot more doors.

14:53 And it's very common, especially in the tech field for people to switch jobs every, you

14:58 know, one to three years.

14:58 So it's not like you're signing up there forever.

15:00 That's how I got my first job.

15:02 Yeah, cool.

15:03 And one of the things I want to talk to you all about is the trade-offs of different types

15:07 of companies to work at, but we'll get to that.

15:09 I do think this idea of having someone to introduce you or someone who knows you or someone who knows

15:15 someone, you know, like a couple of layers removed is really valuable.

15:19 When I was working at companies where we were doing a lot more hiring, it was,

15:23 does anybody know somebody who can do this and they can recommend who's good?

15:28 If the answer was no, then maybe it becomes a job search.

15:31 Then maybe it becomes a job posting.

15:33 But it was not first a job posting.

15:35 It was first, anybody knows somebody great that can do this that we're in need of.

15:39 And if somebody knew somebody, then we probably just would go out and talk to them, you know,

15:43 first.

15:43 And I don't know if that's fair or not, but that's just, it's how it works.

15:48 Because if you put a job posting out there, you could get a thousand, not a thousand,

15:51 a hundred applicants.

15:53 You've got to go through.

15:54 And, you know, I've had plenty of people I've interviewed where it's like, I can do this

15:59 thing.

15:59 Like, okay, how about this?

16:01 How about we turn on screen sharing?

16:02 And what was it at the time?

16:03 It was like GoToMeeting or something.

16:04 I turn on screen sharing.

16:06 And why don't you just write a real simple program that does that?

16:08 It should be like five lines of code.

16:09 Anyone could do it.

16:10 Who's like, couldn't do it.

16:12 You're like, okay, you clearly have not been doing this for two years because this

16:16 would be like the first week of a class that covered this topic.

16:19 So, you know, it's just, it's really tricky.

16:21 So I do think, you know, just people out there listening, like cultivate those connections

16:26 as much as possible, even if it's not, it's not a perfect meritocracy, but that's just the

16:31 way it is.

16:31 Right.

16:31 Yeah.

16:32 And I mean, I think there's lots of advantages to a network beyond like helping get a job.

16:36 It's like finding a community of other people who are having like, maybe if you're the only

16:40 data scientist at a company.

16:41 And like you said, I mean, it's partly when you sort of said like, maybe not thousands.

16:45 I mean, certainly the large companies get like thousands or tens of thousands of applications

16:49 for a data science position.

16:51 I remember Angela Bassa, who's one of our interviewees in the book, she posted, I'm not

16:55 sure it was data science, maybe it was a data analyst position at her company.

16:58 And she actually closed it after four days because they'd gotten a thousand applications.

17:02 I've got a backlog.

17:03 We're just going to have to go through this.

17:04 Yeah.

17:05 Yeah.

17:05 Yeah.

17:05 This portion of Talk Python to Me is brought to you by Kite, the smart AI-powered autocomplete

17:12 for your editor.

17:13 As developers, our choice of editor is central to our work.

17:17 The more powerful and effective that that editor is, the more effective that you are.

17:22 That's why I'm excited about Kite.

17:24 Kite is a free plugin for your code editor that gives you ML powered autocompletions and

17:29 documentation.

17:29 Chances are it works with your editor of choice.

17:32 Even if that editor has existing autocomplete features, the list includes PyCharm, VS Code,

17:38 Atom, Sublime, Vim, and more.

17:40 And Kite runs locally.

17:42 So your code is private with no cloud or internet connection necessary.

17:46 And Kite is 100% free.

17:48 So try it today at talkpython.fm/Kite.

17:51 Kite, K-I-T-E.

17:52 And see how Kite can help you be more effective with your Python code.

17:58 Jacqueline, how'd you get your first job?

18:00 Okay.

18:00 So first off, I feel old because I was like, oh, in 2016.

18:05 So my story starts in 2008.

18:07 So I was finishing my master's in math.

18:10 And so, you know, because it was much earlier in this whole field.

18:13 Again, data science wasn't really a term yet.

18:16 So it was much harder to just like, you know, I remember going to like monster.com and searching

18:20 mathematician and getting just lots of math jobs.

18:22 I'm like, I don't, that's not what I want.

18:23 Like I don't want to be an actuary and I don't want to teach math.

18:27 So what do I, what can I do?

18:28 Yeah, I had no idea.

18:29 Well, and I knew there were jobs out there.

18:31 I just didn't know how to find them.

18:32 And one of my very good friends had just started the year before working at a company.

18:35 And he's like, oh, you know, we actually have a department that hires people with math degrees.

18:40 You should apply.

18:41 And so I applied and the interview process was like a two-day thing where they bring you on

18:45 site and they do a bunch of interviews.

18:47 And so at that point in my, by my master's, I had some internships.

18:51 I had some research projects.

18:52 So to your point of like, it's hard when you haven't had a first job before.

18:55 I think you can have things like internships or like, you know, that license plate or,

18:59 you know, any like project you can hang a hat on is something you can talk about.

19:03 So I had some of that.

19:04 Yeah, that's cool.

19:04 And I do think actually now is even easier than back then with stuff like GitHub and open source.

19:10 You don't have to be, you know, employed to create a cool project that people can start

19:15 to like or share or something.

19:17 Right.

19:17 So the opportunities are certainly there.

19:19 Right.

19:19 And I think that compared to before when I was doing it, no one knew what like an analytics

19:23 person needed for skills.

19:24 Was it math?

19:25 Was it programming?

19:26 Now we've really got a much better idea of what you need to have on your resume.

19:29 And so it's like a two day thing that ended with like a night where they took us to a bowling

19:33 alley, tried to get us to drink a lot to get more excited.

19:35 I think there's a lot of ethical deviousness.

19:37 Anyway, let me just finish the story.

19:42 I ended up taking the job.

19:44 And the job when they talked about it during the interview process was like, oh, you're

19:47 a, you know, analytics, business analytics team member.

19:50 You're going to do forecasting.

19:51 You're going to maintain the models.

19:53 And I'm like, oh, cool.

19:54 I've worked in forecasting before.

19:55 I would love a job where I help build cool, interesting new forecasting models.

19:58 And then like on the job, it became very clear that what they actually wanted me to do is rerun

20:03 the forecast each month in SAS, copy and paste it into Excel, copy and paste the chart

20:07 from Excel into PowerPoint, and then stand in front of people and read off the numbers.

20:10 And that was like a huge shock because that is not what I wanted to do in a job.

20:14 And it basically took me a year of working there before the job became kind of what I

20:18 wanted it to be.

20:18 But also I had given up at that point and moved on to a different job.

20:21 So for the people who when you do get that first job, if it is not what you expect, that's

20:27 probably, you know, I talked to a lot of people, a lot of people we interview in our book have

20:30 the same problem when they get to their first job.

20:32 They're like, oh, my God, this is not what I was expecting at all.

20:34 And by your second job search, you have a much better understanding of really what it is

20:38 you're looking for and what actually exists in industry versus there are no jobs.

20:41 Like there's no job in industry that's just writing math theorems or whatever.

20:45 That's right.

20:46 Well, also, I think sometimes you can get into those situations because the company thinks

20:52 that's how it has to be done.

20:54 Like we've done this and we need somebody to keep doing that.

20:57 Like we only knew how to run Excel and then export this stuff and then do this weird thing

21:02 and then manually fix it up.

21:03 And then you could tell us like what the picture says.

21:05 I think that's true.

21:06 Companies naturally have a tendency to that.

21:07 And I think at the time coming out of a math master's, I'm like, oh, all I want to do is create

21:12 new and exciting mathematical stuff.

21:14 So I have this like affinity for like having to change the world on my job.

21:18 Yeah.

21:18 And I think, you know, I have since then in my career done a lot of good by constantly coming

21:22 up with new forecasts and new methods of doing things.

21:25 But also it is totally fine to have a job where you spend 80% of it just pressing go again.

21:31 And then the 20% doing something interesting if you like that.

21:33 Right.

21:34 Like, so there's like, I think I kind of like held my nose up against those kinds of jobs,

21:37 but I think they're pretty good.

21:38 I've hired people on teams.

21:40 I've had a lot of people who come straight from academia have the same problem I had.

21:44 Like, oh, wow.

21:45 You want me to copy and paste each number like individually?

21:48 That could take me 10 minutes.

21:50 Like, you know.

21:51 It's got to get done.

21:52 Well, my thought was, you know, maybe the job starts out that way, but then you're like,

21:56 well, I can, I can do some program.

21:58 We actually, we don't really need to load Excel and copy and paste.

22:01 I could use something like openpyxl where I could actually write code that talks to the

22:06 database and then runs a report and then just puts it in there.

22:09 Right.

22:10 So you could like slowly take away these manual steps by starting to create like cool pipelines

22:14 of like processing and automation.

22:16 And they didn't ask anyone to do that because they thought that was basically impossible.

22:21 Right.

22:21 And so I feel like a lot of people can end up in these situations where there's like

22:26 one workflow that you are hired for, but you, you know, as people who can write code, we're

22:31 kind of magicians, right?

22:33 We can kind of like magic stuff into existence and you can solve some of these problems and

22:38 they would probably much rather have a single button click or something that's automatic every

22:42 day, but they just, they couldn't create it or put it in place.
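
Editor's sketch: the automation idea discussed above (query the database, build the report, and write it out without opening Excel by hand) might look roughly like the following. This is a minimal illustration, not code from the episode. It uses only Python's standard library (sqlite3 and csv) rather than openpyxl, and the `sales` table, its columns, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3


def build_monthly_report(conn):
    """Return (month, total) rows from the hypothetical `sales` table."""
    query = """
        SELECT month, SUM(amount) AS total
        FROM sales
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


def write_report_csv(rows, out):
    """Write the report rows as CSV, a format Excel can open directly."""
    writer = csv.writer(out)
    writer.writerow(["month", "total"])
    writer.writerows(rows)


# Demo with an in-memory database standing in for the real one (assumed data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-01", 100.0), ("2020-01", 50.0), ("2020-02", 75.0)],
)

rows = build_monthly_report(conn)

# An in-memory buffer is used here; a real job would open a file instead.
buffer = io.StringIO()
write_report_csv(rows, buffer)
report_text = buffer.getvalue()
print(report_text)
```

Scheduling a script like this to run each month would replace the manual rerun, copy, and paste steps. Swapping the CSV writer for a spreadsheet library is one possible next step if a native Excel file is required.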

22:45 Yep.

22:45 I think that's true.

22:46 And I think depending on the different size company, you may get more opportunities to

22:50 do that.

22:50 And depending on your appetite of, I want to code an interesting thing in Python to try

22:54 and automate the Excel or I'll just press Excel.

22:56 I'll hit, I'll hit go for five minutes each day.

22:59 That's fine.

22:59 Exactly.

23:00 Yeah.

23:00 I do think that's where the company culture or your manager can be important though, right?

23:04 Cause I can imagine some companies that like have a lot of bureaucracy would just be

23:07 very uncomfortable with this idea.

23:09 They're like, no, we've always done it this way.

23:11 Or maybe like companies that are like, or the government or like companies that work with

23:14 the government.

23:15 So I do think like, it's important to be, and that also kind of the reason we wrote

23:20 this book was because, you know, we felt there's a lot of technical guidance out there, but not

23:25 on these other really important skills you need.

23:27 And I do think, you know, one of those skills, if you want to change the practice of a company,

23:31 you can't necessarily just be like, you know, email it to them one day and have that be done.

23:35 You need to like, you know, talk to them, figure out their, like, you know, what kind of scares

23:38 them about this change, like do change management and other things.

23:42 And I think that's like, to not underestimate the importance of things like communication

23:46 and, and working with stakeholders when thinking of things like technological solutions, even

23:51 if to you, it may seem really obvious that like, oh, of course this is like going to be a hundred

23:54 percent better.

23:55 Yeah, absolutely.

23:57 And Jacqueline, just to make you feel better, I got my first programming job in

24:01 1997 when I was working on my PhD in math.

24:04 So, you know, you can always go farther back.

24:08 All right.

24:11 Well, one of the interesting things that you discussed was that there's this term data science,

24:18 but in a sense, there's almost like three branches of data science, kind of a little bit like in

24:24 software development, you'd say, hey, I'm a programmer.

24:27 I'm like, oh, cool.

24:28 Could you build me a mobile app?

24:29 Like, no, I have no idea how to build you a mobile app.

24:31 I could build you a website.

24:32 And then someone else would go, I can't build a website, but I build a cool desktop app.

24:36 Right.

24:36 So, you know, what does that kind of partitioning look like in the data science space?

24:40 Yeah.

24:41 Who wants to jump in?

24:41 So, you know, and this is something that could be like one of the more controversial parts of

24:46 the book.

24:46 But I think people sort of come around to this, but how we divided it is in three areas,

24:51 which is analytics, machine learning and decision science.

24:53 And for example, one company that basically has this division and they wrote a great post on this

25:00 is Airbnb. Airbnb does analytics, machine learning, and they call it inference instead of decision

25:05 science.

25:05 But the idea behind this is analytics is basically like taking data and putting it in front of the

25:11 right people.

25:11 So just sort of showing the data that you maybe already have or going out, like maybe

25:16 going out and collecting it, but basically just, you know, maybe by making dashboards or showing

25:21 a report is just surfacing data to the right people, which is really valuable.

25:25 And then the next one, machine learning, is I think often what people think of when they

25:30 think of data science, which are things like, you know, creating the recommendation model on

25:35 amazon.com, right?

25:37 When you look at a product and it says like, you know, you may like these products or at Etsy,

25:41 we have the search ranking team, which is when you search Harry Potter, what of the 200,000

25:46 Harry Potter items do you show first, right?

25:48 And they don't pick randomly.

25:49 There's an algorithm that's based on how the items have behaved historically.
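
[Editor's note: a rough illustration of ranking search results by historical behavior. This is not Etsy's algorithm; the `Item` fields, the smoothing constants, and the listings are all made up for the example. The sketch scores each item by a smoothed click-through rate, so an item with very little history does not jump to the top on a couple of lucky clicks.]

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    impressions: int  # times the item was shown in search results (hypothetical)
    clicks: int       # times it was clicked (hypothetical)


def smoothed_ctr(item, prior_clicks=2.0, prior_impressions=50.0):
    """Click-through rate pulled toward an assumed 4% baseline.

    The prior acts like 50 imaginary impressions with 2 clicks, so items
    with little history stay close to the baseline.
    """
    return (item.clicks + prior_clicks) / (item.impressions + prior_impressions)


def rank(items):
    """Return items ordered from highest to lowest smoothed click-through rate."""
    return sorted(items, key=smoothed_ctr, reverse=True)


items = [
    Item("Harry Potter wand", impressions=10, clicks=2),
    Item("Harry Potter scarf", impressions=1000, clicks=80),
    Item("Harry Potter mug", impressions=500, clicks=20),
]

ranked_titles = [item.title for item in rank(items)]
print(ranked_titles)
```

Here the wand has the highest raw click rate (2 of 10), but with so little history behind it the smoothing ranks it below the scarf.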

25:54 And then the final one is decision science.

25:56 So this is basically going beyond the numbers to help companies or people make decisions.

26:00 And also generally involves a lot of statistics because basically it's, we need to understand

26:06 how to quantify uncertainty.

26:08 So even though we know, for example, that, you know, the people who answered this, we

26:13 ran a survey and, you know, 80% of people said this.

26:16 Well, but we had a 50% non-response rate and maybe we know that more women than men didn't

26:22 respond.

26:22 So how do we adjust for that?

26:24 What's the uncertainty around this estimate?

26:26 Making a forecast, you know, as Jacqueline talked about, like that's decision science,

26:30 you know, there.

26:31 So those are the three main areas we have.
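
[Editor's note: a minimal sketch of the survey adjustment described above, assuming the simplest possible setup. All counts are invented, the population is assumed to be half women and half men, and non-response is assumed to be random within each group. The technique shown is post-stratification weighting: each group's "yes" rate is weighted by its share of the population rather than its share of the respondents, and a standard error is attached to the result.]

```python
import math

# Hypothetical survey results: group -> (respondents, number answering "yes").
# Women are under-represented among respondents relative to the population.
responses = {"women": (100, 90), "men": (300, 230)}

# Assumed known population shares (for example, from the full mailing list).
population_share = {"women": 0.5, "men": 0.5}


def raw_estimate(responses):
    """Share of all respondents who said yes, ignoring who responded."""
    total = sum(n for n, _ in responses.values())
    yes = sum(y for _, y in responses.values())
    return yes / total


def poststratified_estimate(responses, population_share):
    """Weight each group's yes-rate by its population share.

    Returns the estimate and its standard error, treating each group
    as a simple random sample from that group.
    """
    estimate = 0.0
    variance = 0.0
    for group, (n, yes) in responses.items():
        rate = yes / n
        weight = population_share[group]
        estimate += weight * rate
        variance += weight ** 2 * rate * (1 - rate) / n
    return estimate, math.sqrt(variance)


raw = raw_estimate(responses)
adjusted, std_error = poststratified_estimate(responses, population_share)
print(f"raw: {raw:.3f}")
print(f"adjusted: {adjusted:.3f} +/- {1.96 * std_error:.3f} (approx. 95% interval)")
```

With these made-up numbers the raw figure is 80%, while the weighted figure is a bit higher because the under-represented group said yes more often. If the people who did not respond differ from those who did even within a group, this adjustment alone will not fix the bias.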

26:33 Right.

26:33 Our mailing list already is like skewed towards this audience.

26:37 And if we just ask the mailing list, hey, everybody, tell us what you think.

26:40 It's going to carry that bias forward or that slant forward unless we can somehow do more

26:47 to take care of it and whatnot.

26:48 Right.

26:48 Yeah, exactly.

26:49 And yeah.

26:49 Anything to add there, Jacqueline?

26:50 Yeah.

26:51 I would just say that I think a lot of people have like a preconceived notion that like one

26:55 of these types is more pure or like one of these types is like the better.

26:58 Yeah.

26:59 And I'm sure it's a real data scientist, not the ones who use Excel.

27:02 Exactly.

27:03 To get the title.

27:03 And it's like, you can see it in Stack Overflow posts.

27:06 You can see it a lot in LinkedIn posts.

27:09 Like there's a lot of this idea.

27:10 Probably a lot on Reddit.

27:10 Yeah.

27:11 Oh, yeah.

27:11 Oh, God.

27:12 Yeah.

27:12 I said I was going Reddit data science, but like, yeah, there's definitely can be this

27:16 culture of like, you're not a real data scientist.

27:17 Like if you don't do machine learning.

27:19 Just real quick, I'll give you the pitch for why I think each one of these has the right

27:23 to be great and like, isn't the best one.

27:25 So, okay.

27:26 So I think the reason why people like the machine learning the best is you're like, oh, cool.

27:31 I get to use, you know, real time inferences.

27:34 I get to actually help.

27:35 So like a customer, when they go on their website, they actually, what happens to them depends

27:39 on what my algorithm did.

27:41 And like, it's pretty cool to be able to say that like, I actually improved everyone's outcome.

27:45 So my car drives down the street by itself.

27:47 Yeah.

27:47 Everyone can see that.

27:49 The decision scientist, you got to be like the company's detective, right?

27:52 Like the CEO, like, like high level people can come up to you and be like, yo, I have

27:56 this question.

27:56 Can you figure it out?

27:57 And you get to like put on your detective hat and go into data and really try and come

28:01 up with an answer.

28:01 So yeah, you get to like play detective.

28:03 And the, the kind of like analysis, the analyst role.

28:06 It's great because it's like those other two roles, your things can go terribly wrong, right?

28:12 You're, you're, you can be a detective and not find the killer.

28:14 Your machine learning model can ruin things for customers.

28:17 Like things can go catastrophically wrong.

28:18 Being an analyst, you're just here to help things.

28:20 You know, you're helping, you're keeping the company going.

28:22 It's like a more relaxed.

28:23 You're giving advice, but you're not making the decision.

28:25 Just this is what we, we found.

28:27 Yeah.

28:28 So it's like, yeah, it's like, it's helping everything run more effectively without the

28:33 like incredible amounts of stress of trying to get things right that the, you know, or trying

28:37 to build new questions, you know, research and development things that you have in the other

28:40 two fields.

28:41 So it's like more of a relaxed, but enjoyable job.

28:43 I'd also say like, so often there's so much low hanging fruit in the analytics side of

28:48 like things that companies aren't looking at that would really change their decisions if

28:52 you just surface these numbers.

28:53 And plus, like, I think sometimes people can look down and it's like, oh, that's like easy.

28:57 Like you're not using, you know, stats or machine learning.

28:59 Well, it's actually, you know, it can be really hard to like pull the right data sometimes to

29:05 understand when someone's asking you the question, like, hey, can you, oh, there was a great

29:09 tweet yesterday where someone is like, you know, stakeholder, like, can you pull this data for me?

29:12 And you're, you know, and you're like, yeah, sure.

29:15 Let me just pull from, you know, select star from ideal and pristine table that you think

29:19 somehow exists.

29:20 And there's actually a lot of work to elicit the true question they're asking.

29:26 So probably underlying all of this is data wrangling.

29:29 Yeah.

29:29 I think all of the people have to do data wrangling and it's really just a skill like data wrangling.

29:34 I think trying to be able to explain what is happening in the data, like, so kind of the

29:38 input and output.

29:39 Like you really need all, you need that for all three of these jobs.

29:42 So if you don't, if you're not comfortable taking data, trying to figure out, like, you

29:46 know, put it in a way that you can then use it.

29:48 And if you, you aren't comfortable looking at some numbers and trying to say like, oh, well,

29:51 this number plus this number really means that any of these three jobs is going to be more

29:55 difficult.

29:56 Yeah.

29:56 How much does knowing how to talk to databases matter?

30:00 Like writing SQL queries or things like that?

30:03 Or can you get away without that?

30:05 It really matters.

30:06 Even so the, I see, I would say, the SQL ideas I've seen show up in every data science job.

30:11 And I mean, I don't know.

30:12 I haven't seen every data science job, but every one I have seen.

30:15 There's that one that only works with CSVs, but besides that one.

30:18 But even if you don't actually work directly with SQL, the idea of taking two CSVs and joining

30:22 them somehow together and then filtering out the rows, like because so much of the data

30:26 in the world is stored in a tabular format, you really have to think, like understand how

30:31 SQL and like relational databases work.

30:33 And if you don't actually know exact SQL syntax, that's fine.

30:36 Like maybe, you know, the pandas, whatever, or the R dplyr or whatever.

30:40 But like the just concept of thinking through tables is like, yeah, you need it everywhere.
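
[Editor's note: a small sketch of the "take two CSVs, join them, then filter rows" idea, using only Python's standard library. The table names, columns, and data are made up. The CSV text is loaded into an in-memory SQLite database so the join and filter can be written as ordinary SQL; the same table-oriented thinking carries over to pandas or dplyr.]

```python
import csv
import io
import sqlite3

# Two tiny CSV "files", inlined so the example is self-contained (made-up data).
CUSTOMERS_CSV = """customer_id,name
1,Ada
2,Grace
3,Linus
"""
ORDERS_CSV = """order_id,customer_id,amount
10,1,25.0
11,1,40.0
12,2,15.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


conn = sqlite3.connect(":memory:")
# Declaring column types lets SQLite convert the CSV strings to numbers.
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (:customer_id, :name)",
    read_rows(CUSTOMERS_CSV),
)
conn.executemany(
    "INSERT INTO orders VALUES (:order_id, :customer_id, :amount)",
    read_rows(ORDERS_CSV),
)

# Join the two tables, drop small orders, and total what is left per customer.
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.amount > 20
    GROUP BY c.name
    ORDER BY c.name
"""
result = conn.execute(query).fetchall()
print(result)
```

Only Ada comes back, with a total of 65.0: Grace's single order is filtered out by the `WHERE` clause, and Linus has no orders, so the inner join drops him.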

30:44 Yeah.

30:44 Emily, what do you think about that?

30:45 Yeah.

30:46 I would definitely say it's, it's one of the foundational skills.

30:49 And the good thing is like the basics of SQL, you can pick up pretty quickly, like just

30:53 like how to select from a table.

30:54 And then, you know, you can grow as needed, you know, maybe if the data engineer is helping

30:58 you out.

30:59 But, you know, of course, if you can't, if you can't access any data, you probably can't

31:03 do much data science.

31:04 Yeah.

31:05 That's a really good way to put it.

31:06 But also that's not a hard skill.

31:08 I mean, it's not really a hard skill to learn.

31:10 Like, yeah.

31:11 It seems weird and hard if you've never seen it, right?

31:14 Like, how do I connect to it?

31:15 This connection string is really complicated.

31:17 Yeah.

31:17 But you're right.

31:18 It's not a big deal.

31:19 It's just something you got to learn.

31:20 Now, I guess thinking of these three different types, it's one of the things that struck me

31:26 and you pointed out one, there's two things.

31:29 One was that the machine learning role is probably a little more computer science-y because you're

31:36 taking code and you're putting it into production and it's real time.

31:38 You're probably fitting in with APIs that other people are talking to and you're building stuff

31:45 that machines talk to.

31:46 Is that accurate?

31:47 What do you think?

31:47 I would say that the machine learning is more computer science-y.

31:51 Yes.

31:51 A hundred percent.

31:51 You do really need to understand things like unit testing or load testing in ways that the

31:56 decision scientists and the other roles don't necessarily need as much.

31:59 Right.

31:59 HTTP status codes and JSON and all that potentially, right?

32:02 Yeah.

32:02 The risk of the machine learning engineer is that that actually becomes the risk.

32:05 The risk is if you're not careful, your job could just become software engineering.

32:09 I know a lot of machine learning engineers who, well, their company doesn't have that much

32:12 machine learning engineering to do at the moment.

32:14 So you're just going to be a software engineer and then that's not great.

32:16 But the converse is as a decision scientist, you have much more stats and like just building

32:22 the actual like models.

32:23 But if you don't have the work to do as a decision scientist, there's not reports, you know,

32:28 not super interesting models to build and questions to answer.

32:31 You might end up just doing dashboards or something that, you know, like any of these jobs kind

32:35 of have a risk of falling into something you don't like.

32:37 It's just a question of which way does the rock fall down the mountain or whatever.

32:41 I don't know if that's a real metaphor, but.

32:43 Some mountains.

32:45 So another thought that I had while we were talking about this is that

32:50 the people in these different groups will have massively different exposure to like the

32:55 C-suite or the decision makers of the company at a high level.

32:59 I'm thinking of a large company, like 500 people or more, not like a startup.

33:02 But, you know, the analysis person could easily get called in front, you know, for like a board

33:09 meeting to help them decide, you know, how things are going.

33:12 Maybe the decision scientist, it's not so likely the machine learning developer is like, well,

33:17 they've decided and then they were told you're going to build this model and here's what they're

33:21 hoping for.

33:21 Right.

33:21 It's, it's a different kind of, you would still be working with a lot of technical people,

33:26 but you have like different ways to grow within the company, I guess.

33:30 Is that a good way to think of it?

33:31 Yes.

33:32 I think that is absolutely the case that if you're in this, if you're an analyst or a decision

33:37 scientist, then you are much more likely to get to go to a CEO, like go in that meeting

33:41 and show some interesting data that can prove something.

33:43 If you're a machine learning engineer, usually you are building a product, like Emily was

33:47 saying, like you're building a recommendation engine.

33:49 And then there's some product person whose job it is just to be in charge of that product

33:52 and they get to go and have to see.

33:54 You only go to the C-suite if you're going to be like raked over the coals because you

33:59 wrecked it with your machine learning.

34:00 I recommend it wrong.

34:01 But that being said, I think a lot of people who are sufficiently technical are like, oh,

34:06 I wouldn't want to do decision science.

34:07 I really want to do machine learning because I don't want to have to deal with like convincing

34:10 people.

34:10 I just want to have to deal with cool data modeling or whatever, you know, machine learning

34:14 modeling.

34:15 But it turns out that as Emily is saying, to do those jobs well, you still have to be able

34:19 to talk to the software engineers and the data scientists who built the model and the product

34:23 person who needs to know if the recommendation is going to be good enough for the customer.

34:26 Like you still have to do lots of talking to be good at it.

34:28 It's just that it is less of a core tenet than it is of perhaps some of the other roles.

34:32 Yeah.

34:32 How does this affect early stage careers?

34:35 Right.

34:35 Like I can, I can see somebody who like Emily in 2017 just came out of a bootcamp and

34:41 they said, okay, you're going to go talk to the CEO of Etsy and the board and like help

34:45 them with this product.

34:46 You'd be like, oh my goodness.

34:47 Like what have I gotten myself into?

34:49 Like that would on one hand be awesome, but also terrifying.

34:51 Do they fit better at different stages of careers or does that really matter?

34:56 I think it probably doesn't matter as much because like for a company that's big enough

35:00 for that prospect to be kind of terrifying, like if my last company was a startup, so like

35:03 I talked to the CEO all the time, but basically felt like another coworker.

35:06 So yeah, for it to matter, like you're probably going to have more senior people, right?

35:10 Who, if they are going to like have someone present to the CEO, it's probably not going

35:13 to be the person who joined two months ago.

35:14 Also, the other thing we didn't really talk about is how much you're specialized into one

35:19 of these roles does depend on the company.

35:20 So often that's like the company size and maturity of the data science team, right?

35:25 So at certain companies, you may be like fully like a machine learning engineer, but

35:28 if you're the first data scientist at a startup, you're probably doing a mix of all of these

35:32 and you wouldn't go as in depth in any one of them, right?

35:35 Like a startup probably doesn't need someone who can handle hundreds of millions of items,

35:40 like recommendation items, like Amazon would, like you don't need that compute power, but

35:44 maybe you build like a simpler recommendation model.

35:46 And then you also play detective work and you also, no one actually knows what the sales

35:50 numbers are.

35:50 So you like finally make a dashboard.

35:52 Right.

35:52 You probably do a lot of growth at like an early stage startup, a lot of A/B testing type

35:56 of work.

35:57 Yeah, exactly.

35:58 So I don't want to make it seem like, oh, every role like falls into like one and only

36:01 one of this, because you certainly can have roles where you're, where you're putting on

36:05 multiple of these hats.

36:06 I would also say that not only depends on the company you work at, you may do multiple, but

36:10 also you can during your career change.

36:12 I didn't do any machine learning up until like two or three years ago.

36:15 And then I switched over to doing that now.

36:17 So now I kind of do both, but like lots of people switch in lots of directions between

36:21 any of these three jobs.

36:23 And that, that is the thing that it is possible to do.

36:25 Yeah.

36:25 Yeah, for sure.

36:25 Yeah.

36:26 The chapter one interview with Robert Chang, who is over at Airbnb, is a really good case study in this.

36:30 So he started more on like the analytics side and the decision science

36:33 when he was working at Twitter.

36:34 He then started to continue that work in Airbnb.

36:36 And then he ended up switching over to do machine learning.

36:39 And he actually has blogged about this.

36:40 And like, as part of that process, like he did need to up his skills a bit.

36:44 So for example, he'd previously done most of his work in R, but the teams that do machine

36:49 learning, like a lot of the libraries were built in Python.

36:51 So he actually has a repo where he talks, where he like put his deliberate practice for Python

36:56 and how he was going to learn that over a couple months.

36:58 So he can make the switch.

36:59 That's cool.

37:00 Yeah.

37:00 You can definitely switch.

37:02 I mean, I've definitely made big switches in my career as well, from like being terrified

37:06 of the web to only working on the web and stuff like that as well.

37:08 Yeah.

37:08 And I would just add, oh, sorry.

37:09 I would just add on that.

37:11 You know, I've talked to a lot of people who have wanted to switch and had trouble because

37:15 these jobs are a resource and the company has a finite amount of them, right?

37:19 So there's some companies where they just don't have any machine learning engineering.

37:22 And so if you really just would love to do machine learning engineering, you're going to be

37:25 in trouble because there's just none of those jobs available.

37:26 Or as Emily points out, maybe they have a couple of them, but like people who are super

37:30 senior are already working on them.

37:32 And some companies, like you're a startup and like they have way more work than they

37:35 could possibly do.

37:37 So you kind of have a lot of freedom.

37:39 And so sometimes if you want to make this transition and you're finding it difficult, you need to

37:45 switch companies.

37:45 This portion of Talk Python to Me is brought to you by Linode.

37:50 Whether you're working on a personal project or managing your enterprise's infrastructure,

37:54 Linode has the pricing, support, and scale that you need to take your project to the

37:59 next level.

37:59 With 11 data centers worldwide, including their newest data center in Sydney, Australia,

38:04 enterprise-grade hardware, S3-compatible storage, and the next-generation network,

38:10 Linode delivers the performance that you expect at a price that you don't.

38:14 Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40-gigabit

38:20 network, industry-leading processors, their revamped cloud manager at cloud.linode.com,

38:26 root access to your server, along with their newest API and a Python CLI.

38:30 Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically

38:36 get $20 credit for your next project.

38:38 Oh, and one last thing.

38:39 They're hiring.

38:40 Go to linode.com/careers to find out more.

38:43 Let them know that we sent you.

38:46 Speaking of companies, in your book, you have a really interesting conversation about different

38:52 kinds of companies.

38:53 And I've been fascinated.

38:54 I've worked at almost all of these different types.

38:58 Early-stage startup, late-stage startup, probably.

39:01 Mass, quite, yeah, let's go with massive tech company.

39:04 But not a government contractor.

39:06 I've worked sort of subcontracting with them.

39:08 I've worked at most of these.

39:09 And a lot of those experiences are not really obvious if, say, you're in a boot camp and

39:14 you're just looking for a job.

39:16 You have been through the internals of these things.

39:19 So maybe you'll give us a flyover of the five different types of companies and maybe a little

39:26 bit of example about each.

39:27 What's the team like?

39:28 What's the tech like?

39:30 What are the pros and cons?

39:31 And so on.

39:32 Sure.

39:32 We realized very quickly that when we were writing our book that we needed some sort

39:37 of way to help people understand what is the actual job like.

39:40 And then we're like, well, it really is so different depending on which company you're at.

39:44 And so Emily and I kind of brainstormed five different companies we worked at.

39:48 And then we kind of came up with goofy alternative names for them.

39:51 But if you look at our LinkedIn profile, you can probably guess.

39:54 Don't give it away, Emily.

39:55 I love that you have an actual little...

40:01 custom logo for each one.

40:05 Yeah, that was all Jacqueline.

40:06 Yeah, and I thought about which fonts to use with which company.

40:09 Yeah, it was well done.

40:11 So the five companies...

40:13 So we have MTC, which MTC is like your Google, your Apple, your Microsoft, these companies,

40:18 that's just like giant tech company.

40:20 So they're rich.

40:21 They're so big that like each part of the company uses a different type of tech.

40:25 You know, so they have lots of advanced stuff.

40:27 But because they're so big, you may not actually...

40:29 Your stuff may not link up with...

40:31 You know, if you're working on Google Maps, you may have nothing to do with a Google self-driving

40:34 car sort of a thing.

40:35 The second company is Handbag Love, which is just some company that's like a retail company,

40:39 you know, like a Nordstrom, DSW, one of these companies that is big.

40:44 They've been around for a while.

40:44 They use data science, but that's not like their thing.

40:46 But they're not a tech company.

40:48 Right.

40:48 Right.

40:49 And so I really like working at those kinds of companies because you got to like go in

40:53 and really do a lot because no one's there to tell you, oh, you can't use Python.

40:56 You have to use R or whatever.

40:58 Yeah, exactly.

40:58 There's no...

40:59 Like, let me talk to other software developers.

41:01 There are no...

41:02 Yeah.

41:02 There are none.

41:03 Like, okay, well, I can just...

41:05 These are the problems.

41:06 Please solve it with technology.

41:07 These are your requirements, right?

41:09 Yeah.

41:09 There's no rules and restrictions.

41:11 And so then we have this Seg-Metra company, which is like some company with like a hot new

41:15 idea for a startup.

41:16 And they're...

41:17 You know, it's really just like a classic startup where it's like there's so many things

41:20 that need to be built at once that like everyone's just kind of in a constant panic

41:23 attack.

41:23 You get to do whatever you want.

41:24 So it's a lot of fun and exciting.

41:25 Then there's Videory, which is like, imagine if like...

41:29 What's that company that's Vimeo?

41:31 The company that's not YouTube, right?

41:32 So like some company that's...

41:34 Yeah.

41:34 You know, it's a tech company.

41:35 It's, you know, decent size, but it's not huge.

41:37 So everyone knows each other.

41:38 Right, right.

41:38 Maybe Zoom even.

41:39 Yeah.

41:39 Like we're talking on Zoom.

41:40 Could be something like that, right?

41:42 Yeah.

41:42 Yeah.

41:42 And then lastly, I forget what I call it.

41:44 Some GAD.

41:44 So it's basically like some giant government contractor.

41:47 Geo Aerospace or something like that.

41:48 Something like that.

41:49 And it's basically, think of your Lockheed Martin, your Boeing.

41:52 People don't, I think when they talk about data science, they usually don't think about

41:54 these companies as often, but they have tons of people like that, especially analysts.

41:58 Like these companies run on that.

42:00 And because these kind of government contracting companies are massive, they've been around for

42:05 a long time and they really don't want to make mistakes because that can cause a lot

42:08 of damage.

42:09 It's just a lot.

42:09 Everything moves a lot slower.

42:10 There's a lot more bureaucracy.

42:11 It's more of a relaxed job than working at like a startup.

42:13 Yeah, sure.

42:14 All right.

42:14 So which one of you worked at the massive tech company equivalent?

42:18 I don't know if I should say, I consulted for a massive tech company equivalent.

42:22 I'm not asking which one, just like, but you did, Jacqueline, that was you that worked

42:26 at something like this?

42:27 I worked at something like this.

42:28 So the reason I'm asking is because I want to ask you for your take on it, right?

42:33 Like, what is the team like?

42:34 Oh.

42:35 What is the tech like?

42:36 And so on, right?

42:37 Don't name names.

42:37 Oh, no, no, no.

42:38 Okay.

42:38 Okay.

42:38 Yeah.

42:39 Actually, I realized I actually consulted for a couple of them, so I'm not incriminating

42:43 anyone.

42:43 Anyway.

42:44 So when I consulted for these companies, they're like, because they're so big, they're

42:49 so big that, you know, they may have this like big onboarding process that everyone goes

42:53 through, but it has nothing to do with your actual job because the company is too big to

42:56 do that.

42:57 And then when you got on your team, it's like really specific.

42:59 I recently started working with some company like this.

43:02 They were working with the podcast, right?

43:04 They were doing some ads and stuff.

43:05 I had to go through and like sign a waiver that said nobody would climb on a ladder in a dangerous

43:11 way.

43:11 Yeah.

43:12 I'm like, it's a podcast recording.

43:15 You're going to give me audio.

43:15 Like, there's no ladders.

43:16 I don't know.

43:17 But like, this is the other one.

43:18 I see a ladder in your background.

43:19 I didn't know about that.

43:21 Yeah.

43:21 Actually, maybe this is what they're talking about.

43:23 It was like the warehouse person and the like contractor who does podcasting, whatever.

43:29 Like it didn't, you know, they wanted to run an ad.

43:31 So I had to go through this like weird process.

43:34 It was bizarre.

43:34 Yeah.

43:35 And so the cool thing about working with this company is they have tons of money and they're

43:38 really excited about technology.

43:39 So if you're like, I want to buy this expensive thing and try building a solution using that

43:42 people are generally like, sure, whatever.

43:44 It's fine.

43:44 The bad thing is this is true for everyone else as well.

43:47 So when your product A is trying to link up to product B, you may struggle a bit.

43:52 So there's just a lot of this kind of lots of tech, lots of money, high salaries, not

43:55 necessarily everything working in sync that you have to deal with.

43:58 Yeah.

43:58 You probably get to work with a ton of smart coworkers.

44:00 Yeah.

44:01 It's a bit of a bonus and a curse, right?

44:03 It's hard to stand out probably, but it's also great to have that support.

44:07 Right.

44:08 And if you're a person who really likes learning from other people and like having direct

44:11 mentorship, you are, this is one of the best companies to get that at.

44:13 Because yeah, like this company just draws people who know a lot of tech like a magnet.

44:18 Yeah.

44:18 So like one thing what we did is because, you know, it may be the case, like, you're

44:21 looking for jobs and like, you know, it's easy for you.

44:23 Like you're thinking of finding Google.

44:25 You're like, okay, that's the massive tech company.

44:26 But maybe you find a job and you're like, well, it doesn't really fit into any one of

44:29 these five things.

44:30 It's like one thing we do at the end of the chapter is we pull it together.

44:33 Okay.

44:33 Like what are some of the vectors that the companies differ on?

44:36 Right.

44:37 So mentorship, bureaucracy, like the tech stack.

44:39 So even if you find one, you know, you have a company that's not in one of these five

44:43 archetypes, you can sort of go through those things.

44:45 You say like, oh, okay, well, like it's a huge company.

44:48 So like probably there's a decent amount of bureaucracy.

44:49 I would be the first data scientist.

44:51 There's not going to be a lot of mentorship.

44:52 And so you can think about these different pieces and, you know, people have different

44:57 preferences, right?

44:58 Like some folks really, I've talked to people who really love, usually we don't, it's, I

45:02 wouldn't recommend it for someone's first job, but people who want to be the first data

45:06 scientist at the company because they want to get to build everything.

45:08 And then there's some experienced data scientists who are like, I would never want to be the first

45:12 or the only data scientist.

45:13 Like I really like working on a team.

45:14 So it's not like, you know, one of these is like, you know, oh, everyone, you know, it's

45:18 always bad to like have these certain things, but it's just different criteria that you can

45:22 think about and reflect for yourself.

45:24 Like what's important to me?

45:25 What am I looking for?

45:26 Yeah.

45:26 What's the fit?

45:27 All right.

45:27 So Handbag Love, who wants to talk about that one?

45:29 I can do that one too.

45:31 All right.

45:32 Just as I was talking about before.

45:33 So like, there's like a retailer, like let's call it like if it's, yeah, again, like Nordstrom,

45:37 Foot Locker, one of these companies that's like a retail company.

45:40 The cool thing about this is they have a very real product that they've been selling for

45:43 a long time and understand what they are doing.

45:45 So like you have a lot of stability there.

45:47 And by adding on data science, these companies are a lot like, okay, well, let's try and use

45:51 data to improve the product recommendations, improve the product, improve, improve our understanding

45:57 of things, you know, like use data to answer questions.

45:59 So you get a lot of, there's a lot you got to do as a data scientist.

46:02 You have a lot of.

46:03 Right.

46:03 They used to use intuition and now they're going to use data or something like that.

46:06 Right.

46:06 Yeah.

46:06 And so that's the upside.

46:08 So downsides are you don't have as much money.

46:10 Cause you're not like a rich tech company.

46:11 Your tech isn't as good because you know, you just don't care as much about getting the

46:15 best of the best.

46:16 Like, you know, older tech is generally fine.

46:18 And, you know, just as we were talking about, you generally have fewer people who can like

46:21 mentor you.

46:22 Like there'll be someone there, but you know, there'll be people there generally, but it

46:25 might be that everyone's using like a far outdated Python library because

46:30 no one knows about the new way and no one's reading up on it.

46:32 So exactly.

46:33 They're still on Python 2.

46:34 Something like that.

46:38 Cool.

46:39 And then early stage startup.

46:40 What's the story for data scientists there?

46:42 Yeah.

46:43 I could talk a little bit about this.

46:44 So yeah, with data scientists, like you come in and you really get to shape everything.

46:47 So like there's some negative parts.

46:50 So even beyond the data science part, right?

46:51 Like you might show up at the startup and they're like, oh, we don't have your laptop yet.

46:54 So it's sort of a funny thing is like, there's, but there's also more freedom because they

46:58 might ask you like, Hey, what kind of laptop do you want?

47:00 Like if they're like a decently well-funded startup and you can be like, oh, I want this really

47:03 souped-up laptop.

47:04 You don't get that super slow clunky one with a huge company banner that takes five

47:09 minutes to start up.

47:10 Yeah, exactly.

47:11 That's a mixed bag.

47:12 But yeah, often, like we were talking about, there's a lot of low-hanging fruit.

47:16 You also have to wear, you may have to do some data engineering, right?

47:19 Like maybe there's not any data engineers and all of the databases are optimized to like,

47:24 you know, serve the website.

47:26 And so it takes you five minutes

47:27 to, like, get a count of an 800,000-row table.

47:32 So yeah, so you have to wear a lot of different hats.

47:35 You might be pulled in a bunch of different directions.

47:37 So it's also really important to be able to prioritize, like to not just be like firefighting,

47:41 also take some time, like to, for example, build up some skills, like to build up your toolbox.

47:46 So like, okay, maybe write a library for yourself of like, that's a wrapper around pulling

47:50 the data.

47:51 So that becomes easier.
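A minimal sketch of the kind of personal data-pulling wrapper described here, assuming a SQLite database stands in for the company's real data store. The function names (`connect`, `query`, `row_count`) and the `events` table are hypothetical, chosen only for illustration.

```python
import sqlite3
from contextlib import closing


def connect(path=":memory:"):
    """Open a connection whose rows can be read by column name."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    return conn


def query(conn, sql, params=()):
    """Run a parameterized query and return the rows as plain dicts."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql, params)
        return [dict(row) for row in cur.fetchall()]


def row_count(conn, table):
    """Count rows in a table. Table names cannot be bound as parameters,
    so the name is validated before it is formatted into the SQL."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return query(conn, f"SELECT COUNT(*) AS n FROM {table}")[0]["n"]


# Hypothetical demo data standing in for a production table.
conn = connect()
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "visit"), (1, "purchase"), (2, "visit")],
)

print(row_count(conn, "events"))
print(query(conn, "SELECT user_id FROM events WHERE kind = ?", ("purchase",)))
```

Once a helper like this exists, every later request for "just give me this answer" reuses the same connection and query code instead of copy-pasted SQL boilerplate.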

47:52 That's a really important point, because I think a lot of these, I've worked in places

47:55 like this, and it's nobody asks you to build a helper library for data access.

48:00 They help you.

48:01 They ask you, give me this answer or make this product or give me this thing.

48:04 And you're like, yeah, but we really need this thing in place.

48:08 And somebody's gonna have to build it.

48:10 It's gonna be me or the other person I'm working with.

48:12 And you just kind of have to be willing to put in that infrastructure along the way, right?

48:17 Because you're going to appreciate it later, but there's no guidance for that.

48:20 Right.

48:20 And it's definitely not in place usually.

48:22 Yeah.

48:22 And you have to like help teach people like how to ask questions, like what is possible,

48:26 like bring in best practices.

48:28 So it's like I was saying earlier, I really would not most of the time recommend this for

48:33 someone's like first data science job to do this.

48:35 But for an experienced data scientist, like I found some people who really, really love

48:39 doing this because they're like, oh, I don't have to deal with like, you know, the decisions

48:43 of past data scientists.

48:44 I get to shape this in my vision and I get to use the most modern tools, for example.

48:48 You want to use Python, you can.

48:49 You want to use R, you can.

48:50 Like no one's going to, there's no one there.

48:52 So you just, they just need answers.

48:54 Yeah.

48:54 You can use F#, right, Jacqueline?

48:55 Yeah.

48:56 I get a lot of, a lot of people make fun of me because my favorite programming language,

49:00 no one else in the world uses.

49:02 Well, I did see that Jupyter Notebooks now support F#.

49:05 So that's, that's a vote.

49:06 Be still, my heart.

49:08 That's awesome.

49:09 Yeah.

49:10 But I mean, like early stage startups.

49:12 And I would say also HandbagLOVE companies a little bit as well because they may have some

49:17 tech stack, but it might be so outdated.

49:19 They're like, you're new.

49:20 We want to, like, go in a refreshing direction where you can go in this other way.

49:24 We're not going to make you use this old thing.

49:26 We're going to try to get, you know, get something new growing here.

49:29 So you can go and have some flexibility as well, I think.

49:32 Now, what about Videory, the later stage startup?

49:36 Yeah.

49:36 For my preference, I think this is kind of a sweet spot because like you have like, you

49:40 know, it's sort of like in the, in the, in the middle of a lot of these things, right?

49:43 Like there's like some bureaucracy, but I kind of like bureaucracy sometimes.

49:46 Like HR has their stuff figured out.

49:48 Like that's nice.

49:49 It's like benefits and other things.

49:51 There actually is vacation.

49:53 Yeah, exactly.

49:54 You know, so there's usually like, there's a team of data scientists, but since it's, it's

49:58 still like a startup, you know, they weren't, you don't have like a, a, you know, 40 year

50:02 old tech stack, right?

50:03 Like most decisions were made like five or 10 years ago.

50:05 If that.

50:06 Yeah.

50:07 So I think this can be a nice fit.

50:09 Like you can still get, you know, you can still like know everyone on the data science

50:12 team.

50:12 Unlike if you're at like a massive tech company, and have support, but also have some,

50:17 some structure in there as well.

50:18 And there's probably like data engineers and other people to help out with like data science

50:22 adjacent problems.

50:23 Yeah.

50:23 You're probably a little more locked into a tech stack.

50:25 Yes, that is true.

50:27 Yeah.

50:27 I don't think you can really like do a tech stack from start.

50:30 And so you're locked into certain decisions, you know, and, and there may be sometimes you're

50:34 like, oh, I wish I had a time machine and like could go back and like fix this decision

50:37 they made a while ago.

50:38 Right.

50:38 Like you're at an early stage startup.

50:39 You can be like, all right, we're going to start like collecting data right away.

50:42 You know, we're going to log everything.

50:43 And then if you're at like a, you know, later stage company, they're like, oh, like, why don't

50:48 we like look at this data?

50:49 And you're like, oh, actually we weren't collecting that a year ago.

50:51 And they're like, okay, but make a forecasting model anyway.

50:53 And you're like, oh no, no, no, no.

50:55 You don't understand how uncertain this answer is going to be.

50:59 Yeah.

51:00 And then I guess the last type of company archetype that you all covered was the government contractor,

51:05 the Lockheed Martins and the Halliburtons and so on.

51:08 Yeah.

51:08 And I should also mention this includes the government itself, right?

51:11 Like if you work for the Department of Transportation or something like that, or just, you know, companies

51:14 where there is for legal reasons, there is just a lot of regulation, a lot of things

51:18 like, you know, keeping things moving a little slower.

51:20 And so these kinds of jobs, they tend to, they have to tend to have lots of people who are

51:24 not data scientists.

51:25 And you tend to have the data scientists maybe embedded in little groups of that.

51:28 So like in the missile department or whatever, or the, you know, truck department, I don't

51:32 know.

51:33 And so because of that, you generally, you don't have as much mentorship often, but you often

51:39 don't have as much people telling you, no, you can't do it that way.

51:42 You're wrong.

51:42 I mean, you may have it like, oh no, we don't support Python past 2.7 because our, you know,

51:47 our procurement department hasn't cleared it or whatever.

51:49 So maybe bureaucracy, but there isn't like, oh, you have to, you know, like there's, there's

51:53 just not as much of like a standardization around tech just because there's, you know, that's

51:57 not the focus of the company.

51:58 And so these kinds of jobs, I'd say are really, they're really great.

52:01 If you want a job where you go in each day, you work eight hours with a 45 minute lunch

52:06 in there, you get a little bit of stuff done, but you don't stress crazy about getting it

52:10 the most you possibly can done.

52:11 And no one's stressed about you getting exactly the most, right?

52:14 So there's not like, you know, if you're a job where you're like, I'm going to go in,

52:17 I'm going to be the 10X data scientist.

52:18 I'm going to rock, you know, my career is going to be a rocket ship up to the C-suite as fast

52:23 as I can.

52:23 Like, this is not the kind of company for you.

52:25 It's the kind of company that's for more for people who are like, I just want to do a consistently

52:29 good job.

52:30 Like then go home and take my paycheck and spend it on something I enjoy.

52:32 Yeah.

52:33 And don't need a lot of perks.

52:34 That's a good way to put it.

52:35 Yeah.

52:36 Yeah.

52:36 Because I've talked to people who are like, yeah, it's like, especially if you look at some of these

52:40 tech companies, right.

52:41 And like, I don't know, Airbnb and like rosé on tap or something. Like, you're lucky if you get

52:45 coffee at some of the like government contractors.

52:47 Yeah.

52:48 That's for sure.

52:49 I think another thing that is interesting is so many of these types of companies are driven by like

52:56 government contracts or projects.

52:59 I'm thinking of like DARPA funding and like, here's a project that is guaranteed to run for

53:04 one year and then it may immediately get canceled no matter what.

53:08 Right.

53:08 So you have like these sort of long time horizons of working on something, but there's, it could

53:12 become a totally different type of job because some other contract was won and this one was

53:17 expired, lost, whatever.

53:19 Yeah.

53:19 And I think there's kind of, I would say not just besides government contractors, you can

53:22 imagine there are some other fields that might kind of fall into this, like certain parts

53:25 of healthcare might kind of fall into this area.

53:28 Yeah, definitely.

53:29 You imagine parts of finance, like some like, you know, rules around, you know, financial

53:33 risk regulations might kind of have some of these components too, but it's more the archetype

53:37 of the company that's got a lot of regulations or reasons why it has to move slowly and not

53:42 break things.

53:42 Yeah.

53:42 Yeah.

53:43 Interesting.

53:43 I think that's a really cool list you put together and I agree with a lot of your assessments

53:48 there.

53:49 So pretty neat.

53:49 Now we've been talking forever and we just barely touched on the stuff that you're covering.

53:53 We could just talk for so long because this is such a great book and a great topic,

53:57 but just for the sake of time, let's talk about one more topic and maybe blend this together.

54:01 So let's talk about getting the skills, becoming a data scientist from wherever you're starting.

54:06 And then also maybe just real quickly building a portfolio, because like I said at the beginning,

54:10 I do think having that first job is super important and getting that first job is strongly influenced

54:16 by just having something I can show.

54:18 You want me to do this?

54:19 I've already done it.

54:20 You don't have to verify if I can do it.

54:22 Look, this is it.

54:23 Just look at it.

54:23 You know, is it a personal fit or a salary fit?

54:27 Or whatever, right?

54:27 So let's start with getting the skills first.

54:30 Okay.

54:30 Yeah.

54:31 Jacqueline, you were talking about a master's.

54:32 Yeah.

54:33 And Emily, you're talking about a bootcamp.

54:34 Those sound like two different paths to me.

54:36 You didn't necessarily study programming, right?

54:39 You kind of went the math side, which actually I did as well.

54:41 Yeah.

54:42 I can cover.

54:42 Yeah.

54:43 Let me talk about all the different ways you can get skills.

54:46 And then Emily can talk a little bit about the portfolio, because that may or may not

54:49 have aligned with the chapters before.

54:52 I mean, who's to say?

54:53 I would never reveal that.

54:54 So we really, we think there's like four ways you can kind of get data science skills.

54:58 One is you can get a degree, which usually it's for people that go and get some sort of

55:02 master's degree, which is either like data science or maybe computer science or math, something

55:06 like that.

55:07 And the degree is great in that it, if you don't have that much of a background, you

55:11 will learn what you should, you will spend two years doing it.

55:14 So you should learn the basics of what you actually need, right?

55:16 The data science degree, you should learn the data science skills and you might do some projects

55:20 during it.

55:20 The downside is it takes two years and like 80 grand.

55:23 That's so much money.

55:24 Yeah.

55:25 A bootcamp takes 12 weeks and like 15 grand.

55:28 So that's much faster, much cheaper.

55:30 And the whole point of a bootcamp is to get you what you need as quickly as possible.

55:33 And I feel to me like bootcamps almost do a better job of connecting you with a job afterwards

55:38 than like a master's program.

55:41 Yeah, I think that's true.

55:42 And I think generally I would recommend bootcamps more except for people.

55:45 Bootcamps, you really need some sort of background already.

55:47 Like you need to have some idea of programming or some knowledge of this kind of field already.

55:51 If you don't know anything about data science, that might be 15 grand.

55:54 Then you still are just kind of confused.

55:56 You can, the third option is you could try and find data science work within your job.

56:00 If you're an analyst and you want to do more decision science, you can try and find

56:03 places where you can do decision science in your analyst job.

56:05 If you're a decision scientist and you want to do machine learning engineering, you can

56:08 try and find places where you can do some more engineering.

56:10 So you could kind of try and learn within whatever your job is.

56:13 What if you're a scientist who kind of does a little computation and you kind of want to

56:17 drift towards the data science side?

56:19 Yeah.

56:19 So I actually know someone who is trying, you know, doing that very thing.

56:23 She was, you know, she's a scientist.

56:25 She takes measurements and she's in her job has started to use R to actually make plots and

56:29 do the kind of investigatory stuff.

56:31 And that's totally been working for her.

56:32 And then lastly, you can teach yourself, right?

56:35 There's all these courses online.

56:36 You can work on your portfolio, which we'll get into.

56:39 And teaching yourself is great because it's free.

56:40 You get to focus on the stuff you care about.

56:42 And yeah, you can really, if you can motivate yourself correctly, you can really like learn

56:46 a lot that way.

56:47 I've learned a lot this way.

56:48 The downside is, is that it requires an immense amount of discipline, right?

56:53 If you try and do everything learning online, you have to actually do those courses instead

56:56 of playing Animal Crossing.

56:58 But yeah, not that I play Animal Crossing, but no, Jacqueline's just calling me out.

57:02 And you know, you don't know if you're teaching yourself the important things or not.

57:08 At some level, you don't have a mentor when you're teaching yourself.

57:10 And that's a problem.

57:11 Yeah.

57:11 It's also, I feel like sometimes when people are trying to teach themselves, they try to

57:16 boil the ocean, right?

57:17 Yeah.

57:17 Yes.

57:17 You know, like, well, I saw this and this and this.

57:20 So I got to know all those things.

57:21 Like, no, no, no.

57:21 You just want vertical slices, not horizontal.

57:23 Yeah.

57:23 Like, figure out what you got to try to build something and learn what you need to

57:27 build that shallow or deep in these areas and then go from, like, iterate, right?

57:31 Yeah.

57:31 And I think a similar problem to that is when you're teaching yourself, there's not like

57:35 a natural stopping point.

57:37 So like in a master's or a boot camp, like they end and then you're like, oh, I guess

57:40 it's like time for you to like find a data science job versus if you're teaching yourself,

57:44 it's so easy to be like, well, I can't apply to like a data scientist yet.

57:46 I haven't learned like this thing or I haven't learned this thing.

57:48 And you just make this endless list.

57:49 Yeah.

57:49 Right.

57:49 Yeah.

57:50 Which I think is, is, you know, you're always data science is a, is a career where you're

57:54 always going to be learning.

57:55 And so it's not like you... like, no one knows everything.

57:58 So you don't have to feel like, okay, I must like master, you know, the whole world to

58:02 be able to get a data science job.

58:04 Yeah.

58:04 Yeah.

58:05 I hear you.

58:05 Let me put data science up a little bit on a pedestal here.

58:08 Like, so I feel like as a developer, you can build web apps, work with databases, whatever,

58:12 like you can totally do a quick bootcamp.

58:16 You can take online courses, read books, teach yourself.

58:19 I do feel like those are skills you can mostly get yourself.

58:22 They're like painful lessons you have to learn, but I'm not sure you'd learn those in school

58:25 anyway.

58:26 But with data science, I feel like there is a level of statistics and like scientific understanding

58:33 and a little bit of math that I think is a little bit harder for people to just get on

58:36 their own.

58:37 So having some formal training in the background seems more important for data science than pure

58:43 development.

58:44 I would guess I would broadly agree.

58:46 And not because the actual statistics and machine learning models you learn as a data scientist

58:50 are like somehow harder to learn than software engineering, but because the fields, those

58:55 fields are so confusing.

58:57 And like in a layout, like statistics, like what is considered statistics versus machine learning

59:02 versus industrial engineering?

59:03 Like these are all extremely poorly laid out.

59:05 The people in those fields make them as confusing as possible to make it seem like only they understand

59:09 it.

59:09 And there's not a really an easy pattern.

59:12 There's not really just like one book out there that's like, oh, this thing in statistics

59:15 is actually that thing in computer science.

59:17 And like, they're the same.

59:17 And don't worry about that half of statistics doesn't super matter.

59:20 Like that's not easy information to find.

59:23 Yeah.

59:23 Yeah.

59:24 I do think though, it also depends on like what type of role you want.

59:26 Right.

59:27 Like, so I think that's a little bit less important in analytics role, for example, to have like

59:30 that background.

59:31 And there is certainly like, you know, people do, you can still like learn some of this on

59:35 the job, you know, whether for like mentorship or like reading books or like other, other things.

59:40 But I agree, like there is a bit of a danger because I feel like, especially with statistics,

59:43 it's like, if you run a statistical test, like it will generally spit out an answer, but it

59:49 may not be answering what you think versus right.

59:51 Like it's a little more obvious sometimes in development work, like, oh, the website didn't

59:54 load.

59:54 So I guess I have to like figure out what went wrong.

59:56 So there's a bit more of a danger there.

59:58 Yeah.

59:58 You're always going to get a number from that library, from those algorithms.

01:00:01 Right.

01:00:02 And you have to understand what it's doing.
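To make the "you always get a number" point concrete, here is a small sketch. It uses Pearson correlation as the example statistic, which is an illustrative choice and not something named in the conversation. The data is constructed so that `ys` is completely determined by `xs`, yet the statistic still reports no linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y depends entirely on x, but the relationship is a parabola, not a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

The function runs without complaint and returns a value. Reading that value as "x and y are unrelated" would be wrong, because Pearson's r only measures linear association. Nothing in the output warns you about that.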

01:00:04 That's kind of what I was saying.

01:00:05 Like, it's really clear if the website is letting the user log in or not.

01:00:09 Yeah.

01:00:09 There's not a huge debate.

01:00:10 Maybe security is not quite right.

01:00:12 There's details you got to get right.

01:00:13 But it's generally, it works or it doesn't.

01:00:16 Just because I'm so upset about the point I made previously, because I think the point's

01:00:20 right, but I'm getting upset thinking about it.

01:00:21 It's like, so like a linear regression or logistic regression has some built in assumptions.

01:00:26 If you're in a CS department and you're like, I'm going to use a linear regression

01:00:28 to fit this as part of a neural network.

01:00:30 People are like, fine.

01:00:31 You did that in a stats department.

01:00:33 They're like, how dare you?

01:00:34 That's so incredibly wrong.

01:00:35 Right.

01:00:35 You violated the assumptions.

01:00:36 And it's like, well, these are two trained academic professionals telling you two totally

01:00:40 different things.

01:00:41 And I think that is something you get all the time in data science that you don't get as

01:00:45 often in software engineering.
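One way to see why the assumptions behind a linear regression matter: the sketch below fits a least-squares line to two of the datasets from Anscombe's quartet. Both produce essentially the same slope and intercept, but the residuals of the second follow a clear curve, which means the straight-line model is misspecified there. The helper names are hypothetical, and the sign-change count is a rough diagnostic chosen for brevity, not a standard test.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sign_changes(xs, ys):
    """Count how often residuals flip sign when ordered by x.
    Very few flips suggests a systematic pattern the line is missing."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in sorted(zip(xs, ys))]
    signs = [r > 0 for r in residuals]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# First two datasets of Anscombe's quartet (shared x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_scattered = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for label, ys in [("scattered", y_scattered), ("curved", y_curved)]:
    slope, intercept = fit_line(x, ys)
    flips = residual_sign_changes(x, ys)
    print(f"{label}: slope={slope:.2f} intercept={intercept:.2f} sign changes={flips}")
```

Whether the curved case is "incredibly wrong" or "fine" depends on what the fit is for, which is roughly the disagreement between the two departments described above.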

01:00:47 Yeah.

01:00:47 It's exactly that kind of stuff I was thinking of.

01:00:49 Yeah.

01:00:49 Both the two things you both mentioned.

01:00:51 All right.

01:00:52 Let's close out our conversation by talking about getting a portfolio, maybe mixing a little

01:00:57 possibly contributing to open source while we're at it or something.

01:01:00 Emily, do you want to give us the rundown on that?

01:01:02 Yeah, absolutely.

01:01:03 So the idea behind a portfolio, and this is especially helpful for people who don't have

01:01:07 a formal education or haven't worked in very similar jobs or been able to learn on the job.

01:01:13 Because as you were talking about, this is a way they can show they can do the work, even

01:01:16 if they hadn't had an opportunity in school or at a company.

01:01:19 Like a portfolio project.

01:01:21 So what we really recommend for it is doing something original that you care about.

01:01:26 Because, you know, one thing people might default to is like, all right, I'm going to go look

01:01:29 on Kaggle.

01:01:30 And I'm going to find like one of the data sets they have.

01:01:32 And I'm going to like do this competition where they like give you a data set.

01:01:35 And they like, you know, tell you to predict this thing.

01:01:37 And the problem with that is like, one, it doesn't really show your personality.

01:01:41 It skips over the steps that are really critical.

01:01:44 And you'll need to do in like data science roles, which is like gathering the data, figuring

01:01:48 out what question to answer.

01:01:49 And also, honestly, like if a company sees that in portfolio, you know, maybe they're worried

01:01:54 that like, oh, did they just copy someone else's code, right?

01:01:56 Like this is a problem a lot of people have worked on.

01:01:58 So we recommend, you know, kind of finding, figuring out a question you're interested in

01:02:04 answering, or finding a data set that's interesting to you and exploring it to like figure out

01:02:09 like, okay, what are some like interesting findings I can have from that?

01:02:11 And so putting that together and then sharing it on GitHub.

01:02:15 So you have the code with the readme that describes it.

01:02:17 And then ideally also having a blog, because a blog is really great.

01:02:21 Someone may not look through, you know, like hundreds of lines of your code, but they might

01:02:25 be like, oh, yeah, let me read about like what they found.

01:02:27 I'm like, look at some visualizations or read a tutorial that they wrote because they use

01:02:31 natural language processing for this project.

01:02:33 Or even just look back two years and see they've been doing this for as long as they said

01:02:38 they have been or something like that.

01:02:39 Yeah, exactly.

01:02:40 Exactly.

01:02:40 And so Jacqueline shared the example project that she did, which is training a neural network on

01:02:44 offensive license plates.

01:02:45 So I do want to emphasize like it doesn't have to be, you know, something very serious.

01:02:49 Or if you want to go into finance, it doesn't necessarily have to be like a finance project.

01:02:52 Because, you know, if you're like, oh, I use like neural networks, or like one of the

01:02:57 projects I did was I built a dashboard.

01:02:59 And so that shows like I can build a dashboard from scratch.

01:03:02 So I really think this can be like a great way to show off like some of your personalities,

01:03:06 your coding skills, your communication skills with the README and the blog, and

01:03:11 maybe even demo in an interview.

01:03:12 So when I was doing the job search after graduating from boot camp, I would bring my laptop

01:03:17 and sometimes I would show this dashboard that I had built.

01:03:19 And I'd be like, look, and you could filter it and you can click around and it like you

01:03:22 click this goes to a link.

01:03:23 And I think that made it like much more, much more real to them than if I was just talking

01:03:27 about this theoretical project.

01:03:29 Absolutely.

01:03:29 That's awesome.

01:03:30 That's really good advice.

01:03:31 Another thing I think would be valuable is if people can in the right place, the right

01:03:35 background and whatnot is to maybe contribute to some project that's relevant in the data

01:03:40 science space.

01:03:41 Right?

01:03:41 Like, if you have two people you're interviewing, and one's like, well, I'm pretty good at using

01:03:46 Jupyter.

01:03:46 The other person's like, I had two PRs merged into Jupyter.

01:03:50 And actually, I know some of the people on the team a little bit who work

01:03:54 on it. Like, okay, I know who I'm going to talk to a little bit more next, you know,

01:03:57 it's a different level of credibility.

01:03:59 Even if like what you did was there were no unit tests for this part of the library.

01:04:03 So I wrote some unit tests, or I worked on the documentation, or I worked on a tutorial.

01:04:07 Like, it doesn't have to be I rewrote the main thing, right?

01:04:11 Yeah, absolutely.

01:04:12 And we have like, I think it's chapter 14, on like joining the community.

01:04:15 And that's one of the things we talk about is contributing to open source and exactly what

01:04:19 you said.

01:04:20 Like it can be, you know, writing new documentation, even fixing a typo, just these ways to get

01:04:25 involved.

01:04:26 And, you know, I do want to emphasize that like, you know, this isn't something that is required

01:04:30 to get a data science job.

01:04:31 Like I know a lot of data scientists who don't have like a GitHub with personal projects who

01:04:36 don't have a blog who don't contribute to open source.

01:04:38 So they're still like excellent data scientists.

01:04:40 But it's just like, what are the ways that, one, hopefully it's fun; two, hopefully you learn

01:04:44 something like that's the other big point of the portfolio project.

01:04:47 It's a great way to direct your learning like you find out, oh, I need to like, you know,

01:04:51 figure out how to scrape this website, let me go like to gather the data.

01:04:54 So let me go learn web scraping.

01:04:55 And three, like and to stand out in interviews, but it certainly, you know, shouldn't, I don't

01:05:00 think it should be like a requirement for any job, for example.

01:05:02 Yeah, right.

01:05:03 I agree.

01:05:04 And it probably a different company archetypes, they probably care or completely don't care

01:05:09 about this, right?

01:05:09 Like the big aerospace contractors, they're probably like, okay, great.

01:05:13 We don't know that we trust you if you're writing code for just for open source.

01:05:17 That might be weird, right?

01:05:18 Whereas like the startups are like, oh my gosh, that's so amazing.

01:05:20 I can't believe, you know, or the big tech company.

01:05:22 We're trying to move to open source.

01:05:24 So that's great.

01:05:25 You can be one of our advocates.

01:05:26 So yeah, I suppose that it probably varies a lot as well in there.

01:05:29 All right.

01:05:30 Well, I would love to talk more about this because there's all kinds of cool ideas you two put in there, but I think we have to leave it at that.

01:05:37 Let me ask you the two quick questions before I let you out here.

01:05:40 If you're going to write some code, do some data science, data analysis, what editor do

01:05:45 you use these days?

01:05:45 I use RStudio, although as one of my development goals,

01:05:48 I'm actually starting to use Vim as the editor within it.

01:05:52 So I'm trying out that.

01:05:54 But yeah, I've been, I've been using RStudio for, for years now.

01:05:57 Although I also heard, what is it?

01:05:58 Is it Visual Studio?

01:05:59 Like, it now supports R, and like one of my teammates was trying that out and really liked it.

01:06:03 Yeah.

01:06:03 Probably VS Code.

01:06:04 Yeah.

01:06:04 That's awesome.

01:06:04 VS Code.

01:06:05 Yeah.

01:06:05 And Jacqueline?

01:06:05 So I'm a 50, 50 split between RStudio and Visual Studio Code.

01:06:09 So RStudio for anything R related, literally anything else, including like just notes to myself,

01:06:14 Visual Studio Code.

01:06:15 Yeah.

01:06:16 Awesome.

01:06:16 And then notable libraries out there for data scientists, not necessarily something super popular,

01:06:21 but you're like, oh, this package is really awesome.

01:06:22 People should know about it.

01:06:24 Do they have to be Python libraries?

01:06:25 No, they don't have to be Python.

01:06:27 No, this is more of a data science topic.

01:06:31 So it could be running across the board.

01:06:33 Well, I will mention one of the libraries that I created when I was consulting for T-Mobile.

01:06:38 It's called loadtest and it's for R.

01:06:41 Me and the T-Mobile team made it and it's to help you if you're making an API in R using

01:06:47 the R library plumber, which is great.

01:06:49 You can use the loadtest library to test it to make sure that your R model will be able

01:06:55 to handle the load.

01:06:55 Okay.

01:06:56 Awesome.

01:06:56 Yeah.

01:06:56 Very cool.

01:06:57 Emily?

01:06:58 I have so many.

01:06:59 Now I'm wondering if I should, you know, say my own package as well.

01:07:01 Do it.

01:07:02 Self-promote.

01:07:03 Self-promote.

01:07:04 I'll briefly share.

01:07:05 I use it less now, but I used it a lot at my last company: funneljoin, for like analyzing

01:07:08 sequences of events.

01:07:09 You're like, all right, who like came to the website?

01:07:11 1% of people who visited the homepage then bought a subscription.

01:07:14 But what about if we want that within two days?

01:07:17 So that's one.
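The funnel question described here ("what share of homepage visitors then bought a subscription, and what if we require that within two days?") can be sketched in plain Python. This is only an illustration of the idea and does not use or mirror the funneljoin API. The function name, event names, and sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def funnel_conversion(events, first, then, within):
    """Share of users whose `then` event happened no later than `within`
    after their earliest `first` event.

    events: iterable of (user_id, event_name, timestamp) tuples.
    """
    events = list(events)

    # Earliest occurrence of the first step per user.
    first_seen = {}
    for user, name, ts in sorted(events, key=lambda e: e[2]):
        if name == first and user not in first_seen:
            first_seen[user] = ts

    # Users with a second-step event inside the allowed window.
    converted = set()
    for user, name, ts in events:
        if name == then and user in first_seen:
            gap = ts - first_seen[user]
            if timedelta(0) <= gap <= within:
                converted.add(user)

    return len(converted) / len(first_seen) if first_seen else 0.0


# Hypothetical event log.
events = [
    ("a", "homepage", datetime(2020, 1, 1)),
    ("a", "subscribe", datetime(2020, 1, 2)),   # one day later
    ("b", "homepage", datetime(2020, 1, 1)),
    ("b", "subscribe", datetime(2020, 1, 5)),   # four days later
    ("c", "homepage", datetime(2020, 1, 1)),    # never subscribes
    ("d", "subscribe", datetime(2020, 1, 1)),   # subscribed before visiting
    ("d", "homepage", datetime(2020, 1, 3)),
]

print(funnel_conversion(events, "homepage", "subscribe", timedelta(days=2)))
print(funnel_conversion(events, "homepage", "subscribe", timedelta(days=30)))
```

Changing the window changes the answer, which is why the time constraint has to be part of the question before anyone quotes a conversion rate.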

01:07:18 But another package, R has so many packages that I like.

01:07:21 So one thing that I'm very excited about, which is like sort of hot, it's been in development

01:07:26 for a while and in pieces, but is tidymodels.

01:07:29 So it's a rethinking of how to do modeling in R, with a brand new website out now too.

01:07:33 So I think it's tidymodels.org.

01:07:35 So I'm excited about that.

01:07:36 And then finally, the janitor package is a fun one if you do data cleaning; it

01:07:42 just has all these functions for like you import a data set and there are spaces in the names

01:07:47 and like weird capitalizations and like weird like characters that make it hard to work with.

01:07:52 It has a function, clean_names, and it will just fix all of those for you.
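To make the idea of name cleaning concrete, here is a rough Python sketch of that kind of normalization using only the standard library. It is not janitor's implementation, and the `clean_name` helper and the sample headers are hypothetical; this version simply drops symbols rather than translating them into words.

```python
import re


def clean_name(name: str) -> str:
    """Convert a messy column header to lower snake_case."""
    # Insert an underscore at camelCase boundaries: "signupDate" -> "signup_Date".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Drop leading/trailing underscores and normalize capitalization.
    return name.strip("_").lower()


messy = ["First Name", "  Total $ Spent ", "signupDate", "% Churned?"]
cleaned = [clean_name(col) for col in messy]
print(cleaned)
```

With a pandas data frame, the same helper could be applied through `df.rename(columns=clean_name)`.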

01:07:55 Oh, that's cool.

01:07:55 Yeah.

01:07:56 And I think there's a pyjanitor as well.

01:07:58 I'm not sure if it's directly the same, but for people in Python, there's pyjanitor.

01:08:02 I'll also go and throw one out there from the Python world for folks.

01:08:05 There's this thing called missingno, missing N-O.

01:08:09 It's a visualizer for missing data.

01:08:12 So you just have a Pandas data frame and you throw it at it and it'll draw you like a big

01:08:16 cool graph of visually where your data is filled in, where it's missing and all these sort

01:08:20 of like correlations of you're missing this data, you're probably also missing that data.

01:08:24 It's super cool.
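missingno draws these views as plots (its `matrix` and `heatmap` functions take a pandas data frame). The underlying idea can be sketched without any plotting, as below; this is a standard-library illustration of the concept rather than missingno's code, and the sample rows and column names are made up.

```python
from itertools import combinations

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 34,   "income": 52000, "city": "Seattle"},
    {"age": None, "income": None,  "city": "Portland"},
    {"age": 29,   "income": 48000, "city": None},
    {"age": None, "income": None,  "city": "Boise"},
]
columns = ["age", "income", "city"]

# Fraction of rows in which each column is missing.
missing_rate = {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

# Number of rows in which both columns of a pair are missing together.
co_missing = {
    (a, b): sum(r[a] is None and r[b] is None for r in rows)
    for a, b in combinations(columns, 2)
}

# Text version of a nullity matrix: one line per row, '#' = present, '.' = missing.
matrix = ["".join("." if r[c] is None else "#" for c in columns) for r in rows]
print("\n".join(matrix))
print(missing_rate)
print(co_missing)
```

In this toy data, `age` and `income` are always missing together, which is the "you're missing this, you're probably also missing that" pattern described above.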

01:08:25 Yeah.

01:08:25 There's actually an R one for that, which is naniar.

01:08:28 I never know how to pronounce that, but it's also for like missing, yeah, naniar for missing

01:08:33 data.

01:08:33 Awesome.

01:08:34 Yeah.

01:08:34 Yeah.

01:08:34 That seems super valuable.

01:08:35 Just get a quick, like I've got all this data loaded up.

01:08:37 Let me just look at it.

01:08:39 Yeah.

01:08:39 Visually.

01:08:39 Yeah.

01:08:40 Cool.

01:08:40 Yeah.

01:08:41 Yeah.

01:08:41 So tell us how people can get your book.

01:08:42 Our book is online.

01:08:43 You can buy it from the website of Manning, our publisher.

01:08:46 And we actually have two URLs because we had a disagreement about this.

01:08:49 So we have the professional URL.

01:08:51 Do you want the professional version of the book?

01:08:53 Yeah.

01:08:53 Yeah.

01:08:53 DataSciCareer.com.

01:08:57 And then we have the fun version of the book, which is at bestbook.cool.

01:09:02 And now those will take you to the same webpage, but know if you click bestbook.cool, you're

01:09:06 getting the fun version of the website.

01:09:07 Oh, yeah.

01:09:08 You're getting DataSciCareer.com.

01:09:09 It's the professional one.

01:09:10 Yeah.

01:09:10 And maybe we should have people guess like which one of us, which one of us is the fun

01:09:15 one, which one is the more serious.

01:09:16 Exactly.

01:09:17 Put it in the show notes.

01:09:18 Put it in the comment section at the bottom of the show page.

01:09:21 Awesome.

01:09:22 Well, Jacqueline, Emily, it was really great to have you on the show.

01:09:25 And I can certainly recommend your book.

01:09:27 It's spot on.

01:09:28 It covers a bunch of great topics.

01:09:30 People ask me about careers all the time, and I always want to have good advice to give

01:09:34 them.

01:09:34 And so here's definitely something they should check out.

01:09:36 Thank you so much.

01:09:37 Thank you so much.

01:09:38 Yeah.

01:09:38 You bet.

01:09:39 Yep.

01:09:39 Bye.

01:09:39 Bye.

01:09:39 Bye.

01:09:40 Bye.

01:09:40 This has been another episode of Talk Python to Me.

01:09:43 Our guests on this episode were Emily Robinson and Jacqueline Nolis, and it's been brought

01:09:49 to you by Kite and Linode.

01:09:50 Kite is the smart AI-powered autocomplete for your editor.

01:09:54 And the more powerful your editor is, the more effective that you are.

01:09:57 Get Kite for free at talkpython.fm/kite.

01:10:01 Start your next Python project on Linode's state-of-the-art cloud service.

01:10:06 Just visit talkpython.fm/Linode.

01:10:09 L-I-N-O-D-E.

01:10:10 You'll automatically get a $20 credit when you create a new account.

01:10:13 Want to level up your Python?

01:10:16 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:10:20 Or if you're looking for something more advanced, check out our new async course that digs into

01:10:26 all the different types of async programming you can do in Python.

01:10:29 And of course, if you're interested in more than one of these, be sure to check out our

01:10:33 Everything Bundle.

01:10:33 It's like a subscription that never expires.

01:10:35 Be sure to subscribe to the show.

01:10:37 Open your favorite podcatcher and search for Python.

01:10:40 We should be right at the top.

01:10:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

01:10:47 direct RSS feed at /rss on talkpython.fm.

01:10:51 This is your host, Michael Kennedy.

01:10:52 Thanks so much for listening.

01:10:54 I really appreciate it.

01:10:55 Now get out there and write some Python code.
