Build a career in data science
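Has anyone told you that you should become a data scientist? Have you heard it's a great career? In fact, data scientist is the best job in America according to Glassdoor's 2018 rankings. That's great. But how do you get a career in data science? Once you land that first job, how do you find the right fit? How do you find the right company? And how do you get more deeply involved in the community?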
I have brought two great guests, both highly successful data scientists, on the show today who have been thinking deeply about this. Jacqueline Nolis and Emily Robinson are here to give you real-world, actionable advice on getting into this rewarding career.
Links from the show
Jacqueline on Twitter: @skyetetra
Data Science Careers book (choose your version!)
Professional: datascicareer.com
Cool: bestbook.cool
Book discount code at Manning: podtalkpython19
Jacqueline’s offensive license plate project: github.com
Emily’s Pokémon project: hookedondata.org
PyJanitor package: pyjanitor.readthedocs.io
MissingNo package: github.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
Episode Transcript
00:00 Has anyone told you that you should become a data scientist?
00:02 Have you heard it's a great career?
00:04 In fact, data scientist is the best job in America according to Glassdoor's 2018 rankings.
00:11 That's great, but how do you get a career in data science?
00:14 And once you've landed that first job, how do you find the right fit?
00:17 How do you find the right company?
00:19 And how do you get more deeply involved with the community as you grow in that career?
00:24 I've brought two great guests, both highly successful data scientists, on the show today who have been thinking deeply about this.
00:30 Jacqueline Nolis and Emily Robinson are here to give you real-world, actionable advice on getting into this rewarding career.
00:37 This is Talk Python to Me, episode 262, recorded Wednesday, April 22nd, 2020.
00:57 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,
01:02 and the personalities.
01:03 This is your host, Michael Kennedy.
01:05 Follow me on Twitter where I'm @mkennedy.
01:07 Keep up with the show and listen to past episodes at talkpython.fm.
01:11 And follow the show on Twitter via @talkpython.
01:14 This episode is sponsored by Kite and Linode.
01:17 Please check out what they're offering during their segments.
01:19 It really helps support the show.
01:21 Jacqueline, Emily, welcome to Talk Python to Me.
01:24 Thank you.
01:24 Thank you.
01:25 Excited to be here.
01:26 I'm excited to have you both here.
01:28 Really excited to talk about this topic.
01:30 I think one of the things that a lot of listeners out there can benefit hugely from
01:35 is how do I get started in programming?
01:39 How do I get started in data science?
01:41 How do I get started in this overall sort of Python career?
01:44 And there's so many different paths and ways you can go.
01:49 You could go get a four-year degree.
01:50 You could drop out of college and do a startup.
01:53 What is the right path?
01:55 And what is some guidance around there?
01:57 What are some of the trade-offs?
01:58 And so you both have been writing about this lately.
02:02 And this is really, really good work that you're putting out.
02:05 So I'm excited to talk to you both about that.
02:07 Great.
02:07 Yeah.
02:08 So we just published our book, Build a Career in Data Science.
02:12 We've been working on this, I think, almost two years now.
02:14 Yeah.
02:15 Two years since you reached out to me.
02:17 Yeah.
02:17 And so two years, a lot of work, a lot of talking to people in the field.
02:21 And so it's really been great to finally get it out there and talk to people and watch people
02:25 actually get helped by it.
02:27 So yeah, we love talking about it.
02:29 Awesome.
02:30 Well, I'm really excited to have you all here to talk about it.
02:32 But before we get to that, let's maybe do a little bit of a meta thing.
02:36 I always ask this question on the show, but it's a bit meta this time.
02:39 It's just how did you get into programming in Python?
02:41 Emily, you want to go first?
02:42 Yeah.
02:43 I don't know if I can admit this on this podcast, but I actually don't program in Python
02:46 anymore.
02:47 I program in R day to day.
02:49 I have programmed in Python.
02:50 So how I got started was back in college.
02:53 I did Python in one computer science class, but most of the programming I did was in R in
02:57 the statistics program.
02:59 So I was lucky enough to go to Rice University when Hadley Wickham was a professor there, who,
03:03 for your listeners who don't use R, is a very famous R programmer who has contributed a lot of
03:08 the big packages to it.
03:09 So that's how I got started.
03:11 And I kept doing it in grad school, where I got my master's in organizational behavior.
03:15 After that, I went to a data science bootcamp called Metis, which was all in Python.
03:20 So, you know, I sort of upped my Python skills there and then got started working in data science
03:24 in industry.
03:25 Yeah, very cool.
03:26 Jacqueline, how about you?
03:27 Okay.
03:27 So like Emily, most of my work is in R, although I do some Python too.
03:31 But okay.
03:32 So my background, I did an undergrad and master's in math.
03:36 And then like, I really want to help companies use math to solve problems.
03:40 This is before the term data science existed.
03:41 So I went out in industry, did some what is now data science, but I didn't know it at the
03:46 time.
03:46 Went and got a PhD because I wanted to get some more technical skills.
03:50 And then now I work as a consultant helping out companies.
03:52 Awesome.
03:53 What's your PhD in?
03:54 Industrial engineering.
03:55 Yeah.
03:55 But actually, so how I actually started using Python, I think the first Python project I
03:59 ever did was there's a style of games from like the 80s called roguelikes, where you have
04:04 like your little asterisk symbol and you're walking around like the computer screen trying
04:07 to fight monsters and stuff.
04:08 And like the monster might be the letter M.
04:10 And I wanted to make one of these.
04:12 It's like a mud, but it has like some visual representation.
04:15 Okay.
04:16 Yeah.
04:16 Awesome.
04:16 Yeah.
04:16 And so it just like, it gets weirder.
04:18 So I wanted to make one of these and I wanted to make one of like when you're on a river and
04:22 you're like tubing and you're sitting there, I want your character, your little at symbol
04:25 to be like you floating around the river and you like see horses and stuff.
04:28 So anyways, Python was the language that had the most straightforward library for making
04:32 one of these roguelike games.
04:33 And so I spent like two weeks coding up this tubing simulator and then I got bored of the
04:38 project and left it.
04:39 But that was my first time actually using Python.
04:41 Yeah, it's cool.
04:42 And these little personal projects are super valuable for getting into programming because
04:46 you just go through and say, well, I'm learning about loops.
04:49 So I'm going to write like some different kind of loops.
04:50 Like no one learns that way, right?
04:52 Not really.
04:53 Yeah.
04:53 So that's one of the chapters in our book.
04:56 We have a whole chapter on, hey, you know, it's really good to learn things this way.
04:59 And you can actually make a portfolio of projects that you then can use to help
05:03 you get a job.
05:04 And for me, one of the, besides making tubing simulators, one of the projects I actually
05:09 did was to learn neural networks.
05:11 I generated a neural network that would create offensive license plates that would get banned
05:17 by the state of Arizona because I had a data set of all these license plates.
05:20 And that ended up being the basis for like an extremely valuable consulting project where
05:26 I ended up doing natural language processing using the same stuff I learned from that offensive
05:29 license plate thing.
05:30 That is so cool.
05:31 It's just such a fun and playful and kind of silly project.
05:34 But then, yeah, what I found really interesting in programming in general is you have these
05:40 two different realms, you know, like think hedge fund or Air Force, right?
05:45 But the fundamental thing you learn and the skills you gain to solve or work in those areas, it's
05:51 almost exactly the same.
05:52 It's just like, what layer is the specialty that they're working in?
05:57 You know, how do you work with trading?
05:58 They care maybe more about like timing versus, I don't know, visualization or whatever.
06:03 But it's blown my mind like how similar stuff is like that.
06:07 Like I created this fun thing for license plates.
06:09 And then it turns out to actually let me, I don't know what exact project you were working
06:14 on, but like something that obviously was probably not that, right?
06:17 It was a real thing someone paid me for as opposed to a license plate thing which gets me some blog post
06:22 views.
06:22 Yeah.
06:23 Just a very short side note.
06:25 Did you see that somebody thought they were going to be so, so clever and get out of all
06:29 the camera, like camera speed traps and stuff at red lights, red light cameras and whatnot.
06:36 So they got their license plate to say null, N-U-L-L.
06:39 And they thought like that would just trigger the database to think nothing was there.
06:43 But they started getting tickets for every faulty piece of data that was in.
06:48 Oh, no.
06:50 There was nobody, like there wasn't a license plate properly.
06:53 It was null.
06:54 So they started getting thousands of dollars of tickets that weren't theirs, but that was,
06:59 they're now registered for null, like across the board.
07:01 It's upstate.
07:02 I care.
07:03 Playing with fire.
07:04 Oh, that backfired bad.
07:05 Anyway, that sounds really, really fun.
07:07 So Emily, what are you doing today?
07:09 Like day to day, what do you work on?
07:11 Yeah.
07:11 So I work as a senior data scientist at Warby Parker, which makes eyeglasses and now contacts
07:16 as well.
07:17 You can get them online and, well, you could in previous times get them in stores as well.
07:21 But that, of course, like many other companies, we've closed all our retail stores at the moment.
07:25 Yeah.
07:26 We have a Warby Parker here in Portland and I almost went in it the other day, but not anymore.
07:30 Yeah.
07:30 But still online, right?
07:31 Yeah.
07:31 Still online.
07:32 Still online.
07:32 Yeah.
07:33 So I joined there in December.
07:34 So about five months ago now on the data science team, which is a centralized team that works
07:40 with departments across the company.
07:42 So that's been really fun because previously I'd always been a data scientist that was embedded
07:47 with a team.
07:48 So when I worked at Etsy, the analytics department was centralized, but we were sort of paired with
07:53 one partner team.
07:54 So I worked with search my whole time.
07:55 And then my last job, I was part of the growth team.
07:59 So I reported to the VP of growth, not to the chief data scientist.
08:02 So this has been a new experience being on a fully centralized team where for one month I
08:07 might work with finance.
08:08 And then a couple months later, I'm working with product strategy.
08:11 That sounds super fun.
08:12 And you get to experience different kinds of problems and work with different teams
08:16 and technologies, I'm sure.
08:18 Yeah, exactly.
08:18 And so, you know, and the team tackles a wide range of projects.
08:22 So one thing we discussed in our book, right, is data science is a pretty broad field.
08:25 And so there's lots of different projects you can do.
08:27 So some of ours are making like a dashboard to view analytics and some other ones may be more
08:34 modeling problems or making a machine learning product.
08:36 So that's been really interesting to get this breadth of things that we can work on depending
08:41 on what the team we're working with needs.
08:43 Yeah, sounds super cool.
08:44 Jacqueline, how about you?
08:45 I'm working as an independent consultant.
08:47 So I've spent the last couple of years working, you know, as my own company.
08:51 So helping out big companies like T-Mobile, Expedia, some smaller startups in the Seattle area.
08:57 And so this is pretty fun because, you know, like Emily was saying, by being a consultant,
09:01 I got to work on all sorts of different projects, whether it's taking machine learning models
09:05 and putting them into production or helping a company figure out who are the most, like, active,
09:09 engaged customers and how to, you know, think about targeting them differently.
09:12 So that's pretty great.
09:13 Unfortunately, it is not like the best time to be a consultant on your own right now.
09:19 You know, it's been a little dicey lately, but all in all, I've really enjoyed getting
09:22 to work with all these big companies on all these different interesting data science problems.
09:26 Yeah, it sounds really fun.
09:27 And it is nice to be able to pick who you want to work with and what projects you want
09:31 to take.
09:31 And you have a lot, a little more freedom, I think, to kind of go your own way.
09:35 But right now, I don't know, all bets are off.
09:39 It's, yeah, I think it's tricky to be a freelancer right now because you feel it, all these changes
09:44 and all this pressure, you feel it immediately.
09:46 Right.
09:47 But if you work at, I don't know, some large company, you might not feel it right away,
09:52 but then, you know, maybe that company goes under and then all of a sudden you don't have
09:56 all the connections you had as a freelancer.
09:58 Right.
09:58 Maybe Expedia dries up because travel's down, but, you know, maybe some other company's
10:04 like, hey, we got some more work.
10:05 Why don't you come work for us?
10:06 Right.
10:06 Whereas if you work at a company, you don't necessarily cultivate those connections as
10:11 much.
10:11 Yeah.
10:11 And I think a lot of people tend to like, kind of make working as a freelancer, kind of like
10:16 a, like a cool thing that like, oh, when you're really a big shot, you get to like work, you
10:20 know, just as an independent consultant.
10:21 But it's, so I think a lot of people kind of like aspire to have that sort of a job, but
10:25 it's really hard.
10:26 It's really hard because as you're saying, there's a lot of instant, you know, if there are changes
10:30 in the market, you feel it instantly.
10:31 And, you know, half of your job is going out and finding new clients.
10:36 Making deals, trying to like work with stakeholders.
10:38 None of that has to do with programming or technical stuff, right?
10:42 You've got a, you're almost in marketing for yourself as a bit.
10:46 Right.
10:46 So like people think, oh, I want to be a freelancer.
10:48 So I don't have to do all the boring stuff.
10:50 I can just do the data science.
10:51 And it's like, no, you actually have to do more of the boring stuff.
10:54 So it's, we actually, we have a chapter in our book about, okay, what do you do once you
10:58 become like a senior data scientist and you're looking at the next steps.
11:00 And one of the paths we've discussed is the consultant path, which has some perks to it, but also
11:05 has some serious risks and downsides.
11:07 Yeah.
11:07 I also have to share.
11:08 So how we split writing the book was we each like, were the primary writer for half the
11:14 chapters and the other person edited it.
11:15 And so Jacqueline was writing this chapter she was just talking about.
11:18 And the first version of the independent consultant one was so negative.
11:22 Like she was basically like, never do this.
11:25 I'm like, Jacqueline, I think we need to like pull back a little bit.
11:27 Like, I understand.
11:28 You definitely want to share the cons.
11:30 Well, it's not that, yeah, it's not that I think consulting is a bad thing to do.
11:36 It's that I've had so many people come up to me and be like, oh, that sounds so cool.
11:40 I want to do that.
11:40 How do I do that?
11:41 And I feel like I'm the, like the old woman in front of the cave, like, you know, I think
11:48 I maybe came on a little too hard, but yeah.
11:50 Careful what you wish for.
11:52 You might get it.
11:54 How funny.
11:55 Yeah, it's, it definitely has this aura of like, hey, you're your own boss.
11:59 You can just do whatever.
12:00 But yeah, there's, there's a lot of work to be done there.
12:03 And I think it also makes sense at different stages in your career, for sure.
12:07 Let's start this whole conversation off with a question about how you both got your first
12:13 job.
12:13 So I heard about what you're doing now.
12:15 And I have this theory.
12:17 I've only really tested it.
12:19 I guess I've tested with a few people.
12:21 I was going to say, I really only tested with myself because I only know my career that
12:24 well, but I've, I work with some people who are interns and then found their way through
12:28 like really successful stuff and sort of saw that as well.
12:30 And my theory is in the developer data science space, the first job is the hardest.
12:39 Because once you've had one job, you have a portfolio of work, you have experience, you
12:44 can say, I've done this thing and you have a problem similar to like, here, I've done this
12:48 thing with license plates.
12:49 It's technically not license plates, but that's basically what you're asking to do.
12:52 So it's not a matter of convincing a person in an interview that you can do it because you
12:57 can just show them, look, this is what I built and they're happy.
13:00 But in the very beginning, you're such an unknown to people.
13:03 So I think getting that first job is probably like one of the biggest steps to kind of going
13:09 down this path.
13:10 So I wanted to ask you two, how'd you get your first jobs?
13:13 Emily, you want to go first?
13:14 Yeah.
13:15 So I mentioned my first job was Etsy.
13:17 And so, all right.
13:19 So it's like take a time machine back to fall 2016.
13:22 So I finished the Metis Data Science Bootcamp.
13:25 But I interviewed.
13:26 So like one thing they did was there was a demo day for your final project and there were
13:30 some companies hiring there.
13:31 But yeah, I ended up at Etsy.
13:32 And actually how I got that initial, I don't know if it would have happened anyway, but what
13:37 helped initially was I actually knew someone, Hilary Parker, who used to work there as a data
13:41 analyst.
13:41 And she had since left, but she still knew people who worked there.
13:45 So she offered to introduce me to a manager there.
13:47 And he took a look at my profile and said, yeah, I can refer you.
13:50 So definitely a network is a big part of it, even for later jobs, although I definitely agree
13:57 the first is the hardest.
13:57 And then I think what helped there was like Etsy was a really great company.
14:03 I really enjoyed working there.
14:04 And the title at the time I had was data analyst.
14:07 It wasn't data scientist.
14:08 And now actually that team since then, shortly after I left, their titles are now data scientists.
14:12 But one thing we talk about in the book is avoiding this.
14:15 People can get very attached to the title data scientist, and those can sometimes be harder
14:21 to get, or they're very attached to like, oh, I need to go work at like Google or Facebook
14:24 or Airbnb, right?
14:26 Like this.
14:26 I would say like Etsy probably falls under there, but like, you know, very well known like data
14:30 science company.
14:30 And I do think like you are going to get such valuable experience from almost like any
14:35 job.
14:36 If you're working in like data and whether you're called data, data analyst or, you know,
14:41 research analyst or product analyst, like if you're doing code, if you're working with data,
14:45 if you're working with stakeholders, it's that can be a really great first experience.
14:49 And like you said, just having that on your resume can open up a lot more doors.
14:53 And it's very common, especially in the tech field for people to switch jobs every, you
14:58 know, one to three years.
14:58 So it's not like you're signing up there forever.
15:00 That's how I got my first job.
15:02 Yeah, cool.
15:03 And one of the things I want to talk to you all about is the trade-offs of different types
15:07 of companies to work at, but we'll get to that.
15:09 I do think this idea of having someone to introduce you or someone who knows you or someone who knows
15:15 someone, you know, like a couple of layers removed is really valuable.
15:19 When I was working at companies where we were doing a lot more hiring, it was,
15:23 does anybody know somebody who can do this and they can recommend who's good?
15:28 If the answer was no, then maybe it becomes a job search.
15:31 Then maybe it becomes a job posting.
15:33 But it was not first a job posting.
15:35 It was first, does anybody know somebody great who can do this that we're in need of?
15:39 And if somebody knew somebody, then we probably just would go out and talk to them, you know,
15:43 first.
15:43 And I don't know if that's fair or not, but that's just, it's how it works.
15:48 Because if you put a job posting out there, you could get a thousand, not a thousand,
15:51 a hundred applicants.
15:53 You've got to go through.
15:54 And, you know, I've had plenty of people I've interviewed where it's like, I can do this
15:59 thing.
15:59 Like, okay, how about this?
16:01 How about we turn on screen sharing?
16:02 And what was it at the time?
16:03 It was like GoToMeeting or something.
16:04 I turn on screen sharing.
16:06 And why don't you just write a real simple program that does that?
16:08 It should be like five lines of code.
16:09 Anyone could do it.
16:10 And they, like, couldn't do it.
16:12 You're like, okay, you clearly have not been doing this for two years because this
16:16 would be like the first week of a class that covered this topic.
16:19 So, you know, it's just, it's really tricky.
16:21 So I do think, you know, just people out there listening, like cultivate those connections
16:26 as much as possible, even if it's not, it's not a perfect meritocracy, but that's just the
16:31 way it is.
16:31 Right.
16:31 Yeah.
16:32 And I mean, I think there's lots of advantages to a network beyond like helping get a job.
16:36 It's like finding a community of other people who are having like, maybe if you're the only
16:40 data scientist at a company.
16:41 And like you said, I mean, it's partly when you sort of said like, maybe not thousands.
16:45 I mean, certainly the large companies get like thousands or tens of thousands of applications
16:49 for a data science position.
16:51 I remember Angela Bassa, who's one of our interviewees in the book, she posted, I'm not
16:55 sure it was data science, maybe it was a data analyst position at her company.
16:58 And she actually closed it after four days because they'd gotten a thousand applications.
17:02 I've got a backlog.
17:03 We're just going to have to go through this.
17:04 Yeah.
17:05 Yeah.
17:05 Yeah.
17:05 This portion of Talk Python to Me is brought to you by Kite, the smart AI powered autocomplete
17:12 for your editor.
17:13 As developers, our choice of editor is central to our work.
17:17 The more powerful and effective that editor is, the more effective you are.
17:22 That's why I'm excited about Kite.
17:24 Kite is a free plugin for your code editor that gives you ML powered autocompletions and
17:29 documentation.
17:29 Chances are it works with your editor of choice.
17:32 Even if that editor has existing autocomplete features, the list includes PyCharm, VS Code,
17:38 Atom, Sublime, Vim, and more.
17:40 And Kite runs locally.
17:42 So your code is private with no cloud or internet connection necessary.
17:46 And Kite is 100% free.
17:48 So try it today at talkpython.fm/Kite.
17:51 Kite, K-I-T-E.
17:52 And see how Kite can help you be more effective with your Python code.
17:58 Jacqueline, how'd you get your first job?
18:00 Okay.
18:00 So first off, I feel old because I was like, oh, in 2016.
18:05 So my story starts in 2008.
18:07 So I was finishing my master's in math.
18:10 And so, you know, because it was much earlier in this whole field.
18:13 Again, data science wasn't really a term yet.
18:16 So it was much harder to just like, you know, I remember going to like monster.com and searching
18:20 mathematician and getting just lots of math jobs.
18:22 I'm like, I don't, that's not what I want.
18:23 Like I don't want to be an actuary and I don't want to teach math.
18:27 So what do I, what can I do?
18:28 Yeah, I had no idea.
18:29 Well, and I knew there were jobs out there.
18:31 I just didn't know how to find them.
18:32 And one of my very good friends had just started the year before working at a company.
18:35 And he's like, oh, you know, we actually have a department that hires people with math degrees.
18:40 You should apply.
18:41 And so I applied and the interview process was like a two-day thing where they bring you on
18:45 site and they do a bunch of interviews.
18:47 And so at that point in my, by my master's, I had some internships.
18:51 I had some research projects.
18:52 So to your point of like, it's hard when you haven't had a first job before.
18:55 I think you can have things like internships or like, you know, that license plate or,
18:59 you know, any like project you can hang a hat on is something you can talk about.
19:03 So I had some of that.
19:04 Yeah, that's cool.
19:04 And I do think actually now is even easier than back then with stuff like GitHub and open source.
19:10 You don't have to be, you know, employed to create a cool project that people can start
19:15 to like or share or something.
19:17 Right.
19:17 So the opportunities are certainly there.
19:19 Right.
19:19 And I think that compared to before when I was doing it, no one knew what like an analytics
19:23 person needed for skills.
19:24 Was it math?
19:25 Was it programming?
19:26 Now we've really got a much better idea of what you need to have on your resume.
19:29 And so it's like a two day thing that ended with like a night where they took us to a bowling
19:33 alley, tried to get us to drink a lot to get more excited.
19:35 I think there's a lot of ethical deviousness.
19:37 Anyway, let me just finish the story.
19:42 I ended up taking the job.
19:44 And the job when they talked about it during the interview process was like, oh, you're
19:47 a, you know, analytics, business analytics team member.
19:50 You're going to do forecasting.
19:51 You're going to maintain the models.
19:53 And I'm like, oh, cool.
19:54 I've worked in forecasting before.
19:55 I would love a job where I help build cool, interesting new forecasting models.
19:58 And then like on the job, it became very clear that what they actually wanted me to do is rerun
20:03 the forecast each month in SAS, copy and paste it into Excel, copy and paste the chart
20:07 from Excel into PowerPoint, and then stand in front of people and read off the numbers.
20:10 And that was like a huge shock because that is not what I wanted to do in a job.
20:14 And it basically took me a year of working there before the job became kind of what I
20:18 wanted it to be.
20:18 But also I had given up at that point and moved on to a different job.
20:21 So for the people who when you do get that first job, if it is not what you expect, that's
20:27 probably, you know, I talked to a lot of people, a lot of people we interview in our book have
20:30 the same problem when they get to their first job.
20:32 They're like, oh, my God, this is not what I was expecting at all.
20:34 And by your second job search, you have a much better understanding of really what it is
20:38 you're looking for and what actually exists in industry versus there are no jobs.
20:41 Like there's no job in industry that's just writing math theorems or whatever.
20:45 That's right.
20:46 Well, also, I think sometimes you can get into those situations because the company thinks
20:52 that's how it has to be done.
20:54 Like we've done this and we need somebody to keep doing that.
20:57 Like we only knew how to run Excel and then export this stuff and then do this weird thing
21:02 and then manually fix it up.
21:03 And then you could tell us like what the picture says.
21:05 I think that's true.
21:06 Companies naturally have a tendency to that.
21:07 And I think at the time coming out of a math master's, I'm like, oh, all I want to do is create
21:12 new and exciting mathematical stuff.
21:14 So I have this like affinity for like having to change the world on my job.
21:18 Yeah.
21:18 And I think, you know, I have since then in my career done a lot of good by constantly coming
21:22 up with new forecasts and new methods of doing things.
21:25 But also it is totally fine to have a job where you spend 80% of it just pressing go again.
21:31 And then the 20% doing something interesting if you like that.
21:33 Right.
21:34 Like, so there's like, I think I kind of like held my nose up against those kinds of jobs,
21:37 but I think they're pretty good.
21:38 I've hired people on teams.
21:40 I've had a lot of people who come straight from academia have the same problem I had.
21:44 Like, oh, wow.
21:45 You want me to copy and paste each number like individually?
21:48 That could take me 10 minutes.
21:50 Like, you know.
21:51 It's got to get done.
21:52 Well, my thought was, you know, maybe the job starts out that way, but then you're like,
21:56 well, I can, I can do some program.
21:58 We actually, we don't really need to load Excel and copy and paste.
22:01 I could use something like openpyxl where I could actually write code that talks to the
22:06 database and then runs a report and then just puts it in there.
22:09 Right.
22:10 So you could like slowly take away these manual steps by starting to create like cool pipelines
22:14 of like processing and automation.
22:16 And they didn't ask anyone to do that because they thought that was basically impossible.
22:21 Right.
22:21 And so I feel like a lot of people can end up in these situations where there's like
22:26 one workflow that you are hired for, but you, you know, as people who can write code, we're
22:31 kind of magicians, right?
22:33 We can kind of like magic stuff into existence and you can solve some of these problems and
22:38 they would probably much rather have a single button click or something that's automatic every
22:42 day, but they just, they couldn't create it or put it in place.
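A minimal sketch of the kind of automation described above, in Python. It assumes a hypothetical `sales` table with `month` and `amount` columns in a SQLite database; the table, the column names, the sheet title, and both function names are illustrative and do not come from the episode. The spreadsheet step uses openpyxl, a third-party package, and is only one of several ways this could be done.

```python
import sqlite3


def monthly_totals(conn):
    """Return (month, total) rows from a hypothetical `sales` table."""
    query = """
        SELECT month, SUM(amount) AS total
        FROM sales
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


def write_report(rows, path):
    """Write the rows to an .xlsx file instead of copying them by hand."""
    # Third-party dependency (pip install openpyxl); imported here so the
    # query step above can be used on its own without it.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active          # the default first worksheet
    ws.title = "Forecast"   # illustrative sheet name
    ws.append(["Month", "Total"])
    for row in rows:
        ws.append(list(row))
    wb.save(path)
```

Running `write_report(monthly_totals(conn), "report.xlsx")` on a schedule would stand in for the manual copy-and-paste step; building the chart and the PowerPoint slide is left out of this sketch.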
22:45 Yep.
22:45 I think that's true.
22:46 And I think depending on the different size company, you may get more opportunities to
22:50 do that.
22:50 And depending on your appetite of, I want to code an interesting thing in Python to try
22:54 and automate the Excel or I'll just press Excel.
22:56 I'll hit, I'll hit go for five minutes each day.
22:59 That's fine.
22:59 Exactly.
23:00 Yeah.
23:00 I do think that's where the company culture or your manager can be important though, right?
23:04 Cause I can imagine some companies that like have a lot of bureaucracy would just be
23:07 very uncomfortable with this idea.
23:09 They're like, no, we've always done it this way.
23:11 Or maybe like companies that are like, or the government or like companies that work with
23:14 the government.
23:15 So I do think like, it's important to be, and that also kind of the reason we wrote
23:20 this book was because, you know, we felt there's a lot of technical guidance out there, but not
23:25 on these other really important skills you need.
23:27 And I do think, you know, one of those skills, if you want to change the practice of a company,
23:31 you can't necessarily just be like, you know, email it to them one day and have that be done.
23:35 You need to like, you know, talk to them, figure out their, like, you know, what kind of scares
23:38 them about this change, like do change management and other things.
23:42 And I think that's like, to not underestimate the importance of things like communication
23:46 and, and working with stakeholders when thinking of things like technological solutions, even
23:51 if to you, it may seem really obvious that like, oh, of course this is like going to be a hundred
23:54 percent better.
23:55 Yeah, absolutely.
23:57 And Jacqueline, just to make you feel better, I got my first programming job in
24:01 1997 when I was working on my PhD in math.
24:04 So, you know, you can always go farther back.
24:08 All right.
24:11 Well, one of the interesting things that you discussed was that there's this term data science,
24:18 but in a sense, there's almost like three branches of data science, kind of a little bit like in
24:24 software development, you'd say, hey, I'm a programmer.
24:27 I'm like, oh, cool.
24:28 Could you build me a mobile app?
24:29 Like, no, I have no idea how to build you a mobile app.
24:31 I could build you a website.
24:32 And then someone else would go, I can't build a website, but I build a cool desktop app.
24:36 Right.
24:36 So, you know, what does that kind of partitioning look like in the data science space?
24:40 Yeah.
24:41 Who wants to jump in?
24:41 So, you know, and this is something that could be like one of the more controversial parts of
24:46 the book.
24:46 But I think like we people sort of come around to this, but how we divided it is in three areas,
24:51 which is analytics, machine learning and decision science.
24:53 And for example, one company that basically has this division and they wrote a great post on this
25:00 at Airbnb is Airbnb does analytics, machine learning, and they call it inference instead of decision
25:05 science.
25:05 But the idea behind this is analytics is basically like taking data and putting it in front of the
25:11 right people.
25:11 So just sort of showing the data that you maybe already have or going out, like maybe
25:16 going out and collecting it, but basically just, you know, maybe by making dashboards or showing
25:21 a report is just surfacing data to the right people, which is really valuable.
25:25 And then the next one, machine learning, is I think often what people think of when they
25:30 think of data science, which are things like, you know, creating the recommendation model on
25:35 amazon.com, right?
25:37 When you look at a product and it says like, you know, you may like these products or at Etsy,
25:41 we have the search ranking team, which is when you search Harry Potter, what of the 200,000
25:46 Harry Potter items do you show first, right?
25:48 And they don't pick randomly.
25:49 There's an algorithm that's based off of historical, how the items have behaved historically.
25:54 And then the final one is decision science.
25:56 So this is basically going beyond the numbers to help companies or people make decisions.
26:00 And also generally involves a lot of statistics because basically it's, we need to understand
26:06 how to quantify uncertainty.
26:08 So even though we know, for example, that, you know, the people who answered this, we
26:13 ran a survey and, you know, 80% of people said this.
26:16 Well, but we had a 50% non-response rate and maybe we know that more women than men didn't
26:22 respond.
26:22 So how do we adjust for that?
26:24 What's the uncertainty around this estimate?
26:26 Making a forecast, you know, as Jacqueline talked about, like that's decision science,
26:30 you know, there.
26:31 So those are the three main areas we have.
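As a rough illustration of the survey adjustment mentioned above, here is a minimal sketch of weighting responses by group. Every number in it is made up, the 50/50 population split is an assumption, and it relies on the strong assumption that people who did not respond would have answered like the respondents in their own group.

```python
import math

# Made-up survey results: respondents and "yes" answers per group.
responses = {
    "women": {"respondents": 200, "yes": 180},
    "men": {"respondents": 300, "yes": 220},
}
# Assumed share of each group in the full population being surveyed.
population_share = {"women": 0.5, "men": 0.5}


def raw_estimate(responses):
    """Share of 'yes' answers among everyone who responded."""
    total = sum(g["respondents"] for g in responses.values())
    yes = sum(g["yes"] for g in responses.values())
    return yes / total


def weighted_estimate(responses, population_share):
    """Weight each group's 'yes' rate by its share of the population."""
    return sum(
        population_share[name] * g["yes"] / g["respondents"]
        for name, g in responses.items()
    )


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)


raw = raw_estimate(responses)
adjusted = weighted_estimate(responses, population_share)
margin = margin_of_error(raw, 500)
print(f"raw: {raw:.3f}, adjusted: {adjusted:.3f}, margin: +/-{margin:.3f}")
```

The raw figure treats respondents as if they were the whole population; the adjusted figure gives the under-represented group its full weight, and the margin of error is one simple way to put a number on the remaining uncertainty.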
26:33 Right.
26:33 Our mailing list already is like skewed towards this audience.
26:37 And if we just ask the mailing list, hey, everybody, tell us what you think.
26:40 It's going to carry that bias forward or that slant forward unless we can somehow do more
26:47 to take care of it and whatnot.
26:48 Right.
26:48 Yeah, exactly.
26:49 And yeah.
26:49 Anything to add there, Jacqueline?
26:50 Yeah.
26:51 I would just say that I think a lot of people have like a preconceived notion that like one
26:55 of these types is more pure or like one of these types is like the better.
26:58 Yeah.
26:59 And I'm sure it's a real data scientist, not the ones who use Excel.
27:02 Exactly.
27:03 To get the title.
27:03 And it's like, you can see it in Stack Overflow posts.
27:06 You can see it a lot in LinkedIn posts.
27:09 Like there's a lot of this idea.
27:10 Probably a lot on Reddit.
27:10 Yeah.
27:11 Oh, yeah.
27:11 Oh, God.
27:12 Yeah.
27:12 I was going to say Reddit data science, but like, yeah, there definitely can be this
27:16 culture of like, you're not a real data scientist.
27:17 Like if you don't do machine learning.
27:19 Just real quick, I'll give you the pitch for why I think each one of these has the right
27:23 to be great and like, isn't the best one.
27:25 So, okay.
27:26 So I think the reason why people like the machine learning the best is you're like, oh, cool.
27:31 I get to use, you know, real time inferences.
27:34 I get to actually help.
27:35 So like a customer, when they go on their website, they actually, what happens to them depends
27:39 on what my algorithm did.
27:41 And like, it's pretty cool to be able to say that like, I actually improved everyone's outcome.
27:45 So my car drives down the street by itself.
27:47 Yeah.
27:47 Everyone can see that.
27:49 The decision scientist, you get to be like the company's detective, right?
27:52 Like the CEO, like, like high level people can come up to you and be like, yo, I have
27:56 this question.
27:56 Can you figure it out?
27:57 And you get to like put on your detective hat and go into data and really try and come
28:01 up with an answer.
28:01 So yeah, you get to like play detective.
28:03 And the, the kind of like analysis, the analyst role.
28:06 It's great because it's like those other two roles, your things can go terribly wrong, right?
28:12 You're, you're, you can be a detective and not find the killer.
28:14 Your machine learning model can ruin things for customers.
28:17 Like things can go catastrophically wrong.
28:18 Being an analyst, you're just here to help things.
28:20 You know, you're helping, you're keeping the company going.
28:22 It's like a more relaxed.
28:23 You're giving advice, but you're not making the decision.
28:25 Just this is what we, we found.
28:27 Yeah.
28:28 So it's like, yeah, it's like, it's helping everything run more effectively without the
28:33 like incredible amounts of stress of trying to get things right that the, you know, or trying
28:37 to build new questions, you know, research and development things that you have in the other
28:40 two fields.
28:41 So it's like more of a relaxed, but enjoyable job.
28:43 I'd also say like, so often there's so much low hanging fruit in the analytics side of
28:48 like things that companies aren't looking at that would really change their decisions if
28:52 you just surface these numbers.
28:53 And plus, like, I think sometimes people can look down and it's like, oh, that's like easy.
28:57 Like you're not using, you know, stats or machine learning.
28:59 Well, it's actually, you know, it can be really hard to like pull the right data sometimes to
29:05 understand when someone's asking you the question, like, hey, can you, oh, there was a great
29:09 tweet yesterday where someone is like, you know, stakeholder, like, can you pull this data for me?
29:12 And you're, you know, and you're like, yeah, sure.
29:15 Let me just pull from, you know, select star from ideal and pristine table that you think
29:19 somehow exists.
29:20 And there's actually a lot of work to elicit the true question they're asking.
29:26 So I'm probably underlying all of this is data wrangling.
29:29 Yeah.
29:29 I think all of the people have to do data wrangling and it's really just a skill like data wrangling.
29:34 I think trying to be able to explain what is happening in the data, like, so kind of the
29:38 input and output.
29:39 Like you really need all, you need that for all three of these jobs.
29:42 So if you don't, if you're not comfortable taking data, trying to figure out, like, you
29:46 know, put it in a way that you can then use it.
29:48 And if you, you aren't comfortable looking at some numbers and trying to say like, oh, well,
29:51 this number plus this number really means that any of these three jobs is going to be more
29:55 difficult.
29:56 Yeah.
29:56 How much does knowing how to talk to databases matter?
30:00 Like writing SQL queries or things like that?
30:03 Or can you get away without that?
30:05 It really matters.
30:06 Even so the, I see, I would say, the SQL ideas I've seen show up in every data science job.
30:11 And I mean, I don't know.
30:12 I haven't seen every data science job, but everyone I have seen.
30:15 There's that one that only works with CSVs, but besides that one.
30:18 But even if you don't actually work directly with SQL, the idea of taking two CSVs and joining
30:22 them somehow together and then filtering out the rows, like because so much of the data
30:26 in the world is stored in a tabular format, you really have to think, like understand how
30:31 SQL and like relational databases work.
30:33 And if you don't actually know exact SQL syntax, that's fine.
30:36 Like maybe, you know, the pandas, whatever, or the R dplyr or whatever.
30:40 But like the just concept of thinking through tables is like, yeah, you need it everywhere.
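To make the "think in tables" idea concrete, here is a small sketch that loads two CSVs into an in-memory SQLite database, joins them, and filters the rows. The file contents, table names, and column names are invented for the example, and only the Python standard library is used.

```python
import csv
import io
import sqlite3

# Two tiny made-up CSV files, inlined as strings so the example is self-contained.
customers_csv = "customer_id,name\n1,Ada\n2,Grace\n3,Linus\n"
orders_csv = "order_id,customer_id,amount\n10,1,25.0\n11,1,40.0\n12,3,5.0\n"


def load_csv(conn, table, text):
    """Create a table from CSV text; the header row supplies the column names."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "customers", customers_csv)
load_csv(conn, "orders", orders_csv)

# Join the two tables on customer_id, then keep only orders of 20 or more.
big_orders = conn.execute(
    """
    SELECT c.name, CAST(o.amount AS REAL) AS amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE CAST(o.amount AS REAL) >= 20
    ORDER BY o.order_id
    """
).fetchall()
```

The same join-then-filter shape carries over to pandas (a merge followed by a boolean filter) or to dplyr in R, which is why the concept matters more than the exact syntax.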
30:44 Yeah.
30:44 Emily, what do you think about that?
30:45 Yeah.
30:46 I would definitely say it's, it's one of the foundational skills.
30:49 And the good thing is like the basics of SQL, you can pick up pretty quickly, like just
30:53 like how to select from a table.
30:54 And then, you know, you can grow as needed, you know, maybe if the data engineer is helping
30:58 you out.
30:59 But, you know, of course, if you can't, if you can't access any data, you probably can't
31:03 do much data science.
31:04 Yeah.
31:05 That's a really good way to put it.
31:06 But also that's not a hard skill.
31:08 I mean, it's not really a hard skill to learn.
31:10 Like, yeah.
31:11 It seems weird and hard if you've never seen it, right?
31:14 Like, how do I connect to it?
31:15 This connection string is really complicated.
31:17 Yeah.
31:17 But you're right.
31:18 It's not a big deal.
31:19 It's just something you got to learn.
31:20 Now, I guess thinking of these three different types, it's one of the things that struck me
31:26 and you pointed out one, there's two things.
31:29 One was that the machine learning role is probably a little more computer science-y because you're
31:36 taking code and you're putting it into production and it's real time.
31:38 You're probably fitting in with APIs that other people are talking to and you're building stuff
31:45 that machines talk to.
31:46 Is that accurate?
31:47 What do you think?
31:47 I would say that the machine learning is more computer science-y.
31:51 Yes.
31:51 A hundred percent.
31:51 You do really need to understand things like unit testing or load testing in ways that the
31:56 decision scientists and the other roles don't necessarily need as much.
31:59 Right.
31:59 HTTP status codes and JSON and all that potentially, right?
32:02 Yeah.
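For a sense of what unit testing, status codes, and JSON look like together in this kind of role, here is a toy sketch. The `recommend` and `handle_request` functions and the catalog are hypothetical stand-ins; a real service would sit behind a web framework and a trained model.

```python
import json
import unittest


def recommend(purchase_history, catalog, k=2):
    """Toy 'model': suggest the first k catalog items not yet purchased."""
    seen = set(purchase_history)
    return [item for item in catalog if item not in seen][:k]


def handle_request(body, catalog):
    """Return (status_code, json_body), as an API endpoint would."""
    try:
        history = json.loads(body)["purchase_history"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "bad request"})
    return 200, json.dumps({"recommendations": recommend(history, catalog)})


class HandleRequestTests(unittest.TestCase):
    catalog = ["wand", "scarf", "mug", "poster"]

    def test_valid_request_returns_unseen_items(self):
        status, body = handle_request('{"purchase_history": ["wand"]}', self.catalog)
        self.assertEqual(status, 200)
        self.assertEqual(json.loads(body)["recommendations"], ["scarf", "mug"])

    def test_malformed_json_returns_400(self):
        status, _ = handle_request("not json", self.catalog)
        self.assertEqual(status, 400)
```

Saved to a file, the tests can be run with `python -m unittest`.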
32:02 The risk of the machine learning engineer is that that actually becomes the risk.
32:05 The risk is if you're not careful, your job could just become software engineering.
32:09 I know a lot of machine learning engineers who, well, their company doesn't have that much
32:12 machine learning engineering to do at the moment.
32:14 So you're just going to be a software engineer and then that's not great.
32:16 But the converse is as a decision scientist, you have much more stats and like just building
32:22 the actual like models.
32:23 But if you don't have the work to do as a decision scientist, there's not reports, you know,
32:28 not super interesting models to build and questions to answer.
32:31 You might end up just doing dashboards or something that, you know, like any of these jobs kind
32:35 of have a risk of falling into something you don't like.
32:37 It's just a question of which way does the rock fall down the mountain or whatever.
32:41 I don't know if that's a real metaphor, but.
32:43 Some mountains.
32:45 So another thought that I had while we were talking about this is different.
32:50 The people in these different groups will have massively different exposure to like the
32:55 C-suite or the decision makers of the company at a high level.
32:59 I'm thinking of a large company, like 500 people or more, not like a startup.
33:02 But, you know, the analysis person could easily get called in front, you know, for like a board
33:09 meeting to help them decide, you know, how are things are going.
33:12 Maybe the decision scientist, it's not so likely the machine learning developer is like, well,
33:17 they've decided and then they were told you're going to build this model and here's what they're
33:21 hoping for.
33:21 Right.
33:21 It's, it's a different kind of, you would still be working with a lot of technical people,
33:26 but you have like different ways to grow within the company, I guess.
33:30 Is that a good way to think of it?
33:31 Yes.
33:32 I think that is absolutely the case that if you're in this, if you're an analyst or a decision
33:37 scientist, then you are much more likely to get to go to a CEO, like go in that meeting
33:41 and show some interesting data that can prove something.
33:43 If you're a machine learning engineer, usually you are building a product, like Emily was
33:47 saying, like you're building a recommendation engine.
33:49 And then there's some product person whose job it is just to be in charge of that product
33:52 and they get to go and have to see.
33:54 You only go to the C-suite if you're going to be like raked over the coals because you
33:59 wrecked it with your machine learning.
34:00 I recommend it wrong.
34:01 But that being said, I think a lot of people who are sufficiently technical are like, oh,
34:06 I wouldn't want to do decision science.
34:07 I really want to do machine learning because I don't want to have to deal with like convincing
34:10 people.
34:10 I just want to have to deal with cool data modeling or whatever, you know, machine learning
34:14 modeling.
34:15 But it turns out that as Emily is saying, to do those jobs well, you still have to be able
34:19 to talk to the software engineers and the data scientists who built the model and the product
34:23 person who needs to know if the recommendation is going to be good enough for the customer.
34:26 Like you still have to do lots of talking to be good at it.
34:28 It's just that it is less of a core tenet than it is of perhaps some of the other roles.
34:32 Yeah.
34:32 How does this affect early stage careers?
34:35 Right.
34:35 Like I can, I can see somebody who like Emily in 2017 just came out of a bootcamp and
34:41 they said, okay, you're going to go talk to the CEO of Etsy and the board and like help
34:45 them with this product.
34:46 You'd be like, oh my goodness.
34:47 Like what have I gotten myself into?
34:49 Like that would on one hand be awesome, but also terrifying.
34:51 Do they fit better at different stages of careers or does that really matter?
34:56 I think it probably doesn't matter as much because like for a company that's big enough
35:00 for that prospect to be kind of terrifying, like if my last company was a startup, so like
35:03 I talked to the CEO all the time, but basically felt like another coworker.
35:06 So yeah, for it to matter, like you're probably going to have more senior people, right?
35:10 Who, if they are going to like have someone present to the CEO, it's probably not going
35:13 to be the person who joined two months ago.
35:14 Also, the other thing we didn't really talk about is how much you're specialized into one
35:19 of these roles does depend on the company.
35:20 So often that's like the company size and maturity of the data science team, right?
35:25 So at certain companies, you may be like fully like a machine learning engineer, but
35:28 if you're the first data scientist at a startup, you're probably doing a mix of all of these
35:32 and you wouldn't go as in depth in any one of them, right?
35:35 Like a startup probably doesn't need someone who can handle hundreds of millions of items,
35:40 like recommendation items, like Amazon would, like you don't need that compute power, but
35:44 maybe you build like a simpler recommendation model.
35:46 And then you also play detective work and you also, no one actually knows what the sales
35:50 numbers are.
35:50 So you like finally make a dashboard.
35:52 Right.
35:52 You probably do a lot of growth at like an early stage startup, a lot of A/B testing type
35:56 of work.
35:57 Yeah, exactly.
35:58 So I don't want to make it seem like, oh, every role like falls into like one and only
36:01 one of this, because you certainly can have roles where you're, where you're putting on
36:05 multiple of these hats.
36:06 I would also say that not only depends on the company you work at, you may do multiple, but
36:10 also you can during your career change.
36:12 I didn't do any machine learning up until like two or three years ago.
36:15 And then I switched over to doing that now.
36:17 So now I kind of do both, but like lots of people switch in lots of directions between
36:21 any of these three jobs.
36:23 And that, that is the thing that it is possible to do.
36:25 Yeah.
36:25 Yeah, for sure.
36:25 Yeah.
36:26 Chapter one interview, Robert Chang is over at Airbnb is a really good case study in this.
36:30 So he started more on like the analytics side and the decision science.
36:33 He was working at Twitter.
36:34 He then started to continue that work in Airbnb.
36:36 And then he ended up switching over to do machine learning.
36:39 And he actually has blogged about this.
36:40 And like, as part of that process, like he did need to up his skills a bit.
36:44 So for example, he'd previously done most of his work in R, but the teams that do machine
36:49 learning, like a lot of the libraries were built in Python.
36:51 So he actually has a repo where he talks, where he like put his deliberate practice for Python
36:56 and how he was going to learn that over a couple months.
36:58 So he can make the switch.
36:59 That's cool.
37:00 Yeah.
37:00 You can definitely switch.
37:02 I mean, I've definitely made big switches in my career as well, from like being terrified
37:06 of the web to only working on the web and stuff like that as well.
37:08 Yeah.
37:08 And I would just add, oh, sorry.
37:09 I would just add on that.
37:11 You know, I've talked to a lot of people who have wanted to switch and had trouble because
37:15 these jobs are a resource and the company has a finite amount of them, right?
37:19 So there's some companies where they just don't have any machine learning engineering.
37:22 And so if you really just would love to do machine learning engineering, you're going to be
37:25 in trouble because there's just none of those jobs available.
37:26 Or as Emily points out, maybe they have a couple of them, but like people who are super
37:30 senior are already working on them.
37:32 And some companies, like you're a startup and like they have way too much work they
37:35 could possibly do any of it, you know, all of it.
37:37 So you can do kind of have a lot of freedom.
37:39 And so sometimes if you want to make this transition and you're finding it difficult, you need to
37:45 switch companies.
37:45 This portion of Talk Python to Me is brought to you by Linode.
37:50 Whether you're working on a personal project or managing your enterprise's infrastructure,
37:54 Linode has the pricing, support, and scale that you need to take your project to the
37:59 next level.
37:59 With 11 data centers worldwide, including their newest data center in Sydney, Australia,
38:04 enterprise-grade hardware, S3-compatible storage, and the next-generation network,
38:10 Linode delivers the performance that you expect at a price that you don't.
38:14 Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40-gigabit
38:20 network, industry-leading processors, their revamped cloud manager at cloud.linode.com,
38:26 root access to your server, along with their newest API and a Python CLI.
38:30 Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically
38:36 get $20 credit for your next project.
38:38 Oh, and one last thing.
38:39 They're hiring.
38:40 Go to linode.com/careers to find out more.
38:43 Let them know that we sent you.
38:46 Speaking of companies, in your book, you have a really interesting conversation about different
38:52 kinds of companies.
38:53 And I've been fascinated.
38:54 I've worked at almost all of these different types.
38:58 Early-stage startup, late-stage startup, probably.
39:01 Mass, quite, yeah, let's go with massive tech company.
39:04 But not a government contractor.
39:06 I've worked sort of subcontracting with them.
39:08 I've worked at most of these.
39:09 And a lot of those experiences are not really obvious if, say, you're in a boot camp and
39:14 you're just looking for a job.
39:16 You have been through the internals of these things.
39:19 So maybe you'll give us a flyover of the five different types of companies and maybe a little
39:26 bit of example about each.
39:27 What's the team like?
39:28 What's the tech like?
39:30 What are the pros and cons?
39:31 And so on.
39:32 Sure.
39:32 We realized very quickly that when we were writing our book that we needed some sort
39:37 of way to help people understand what is the actual job like.
39:40 And then we're like, well, it really is so different depending on which company you're at.
39:44 And so Emily and I kind of brainstormed five different companies we worked at.
39:48 And then we kind of came up with goofy alternative names for them.
39:51 But if you look at our LinkedIn profile, you can probably guess.
39:54 Don't give it away to Emily.
39:55 I love that you have an actual little...
40:01 custom logo for each one.
40:05 Yeah, that was all Jacqueline.
40:06 Yeah, and I thought about which fonts to use with which company.
40:09 Yeah, it was well done.
40:11 So the five companies...
40:13 So we have MTC, which MTC is like your Google, your Apple, your Microsoft, these companies,
40:18 that's just like giant tech company.
40:20 So they're rich.
40:21 They're so big that like each part of the company uses a different type of tech.
40:25 You know, so they have lots of advanced stuff.
40:27 But because they're so big, you may not actually...
40:29 Your stuff may not link up with...
40:31 You know, if you're working on Google Maps, you may have nothing to do with a Google self-driving
40:34 car sort of a thing.
40:35 The second company is Handbag Love, which is just some company that's like a retail company,
40:39 you know, like a Nordstrom, DSW, one of these companies that is big.
40:44 They've been around for a while.
40:44 They use data science, but that's not like their thing.
40:46 But they're not a tech company.
40:48 Right.
40:48 Right.
40:49 And so I really like working at those kinds of companies because you get to like go in
40:53 and really do a lot because no one's there to tell you, oh, you can't use Python.
40:56 You have to use R or whatever.
40:58 Yeah, exactly.
40:58 There's no...
40:59 Like, let me talk to other software developers.
41:01 There are no...
41:02 Yeah.
41:02 There are none.
41:03 Like, okay, well, I can just...
41:05 These are the problems.
41:06 Please solve it with technology.
41:07 These are your requirements, right?
41:09 Yeah.
41:09 There's no rules and restrictions.
41:11 And so then we have this SegMetra company, which is like some company with like a hot new
41:15 idea for a startup.
41:16 And they're...
41:17 You know, it's really just like a classic startup where it's like there's so many things
41:20 that need to be built at once that like everyone just kind of in a constant panic
41:23 attack.
41:23 You get to do whatever you want.
41:24 So it's a lot of fun and exciting.
41:25 Then there's Videory, which is like, imagine if like...
41:29 What's that company that's Vimeo?
41:31 The company that's not YouTube, right?
41:32 So like some company that's...
41:34 Yeah.
41:34 You know, it's a tech company.
41:35 It's, you know, decent size, but it's not huge.
41:37 So everyone knows each other.
41:38 Right, right.
41:38 Maybe Zoom even.
41:39 Yeah.
41:39 Like we're talking on Zoom.
41:40 Could be something like that, right?
41:42 Yeah.
41:42 Yeah.
41:42 And then lastly, I forget what I call it.
41:44 Some GAD.
41:44 So it's basically like some giant government contractor.
41:47 Geo Aerospace or something like that.
41:48 Something like that.
41:49 And it's basically, think of your Lockheed Martin, your Boeing.
41:52 People don't, I think when they talk about data science, they usually don't think about
41:54 these companies as often, but they have tons of people like that, especially analysts.
41:58 Like these companies run on that.
42:00 And because these kind of government contracting companies are massive, they've been around for
42:05 a long time and they really don't want to make mistakes because that can cause a lot
42:08 of damage.
42:09 It's just a lot.
42:09 Everything moves a lot slower.
42:10 There's a lot more bureaucracy.
42:11 It's more of a relaxed job than working at like a startup.
42:13 Yeah, sure.
42:14 All right.
42:14 So which one of you worked at the massive tech company equivalent?
42:18 I don't know if I should say, I consulted for a massive tech company equivalent.
42:22 I'm not asking which one, just like, but you did, Jacqueline, that was you that worked
42:26 at something like this?
42:27 I worked at something like this.
42:28 So the reason I'm asking is because I want to ask you for your take on it, right?
42:33 Like, what is the team like?
42:34 Oh.
42:35 What is the tech like?
42:36 And so on, right?
42:37 Don't name names.
42:37 Oh, no, no, no.
42:38 Okay.
42:38 Okay.
42:38 Yeah.
42:39 Actually, I realized I actually consulted for a couple of them, so I'm not incriminating
42:43 anyone.
42:43 Anyway.
42:44 So when I consulted for these companies, they're like, because they're so big, they're
42:49 so big that, you know, they may have this like big onboarding process that everyone goes
42:53 through, but it has nothing to do with your actual job because the company is too big to
42:56 do that.
42:57 And then when you got on your team, it's like really specific.
42:59 I recently started working with some company like this.
43:02 They were working with the podcast, right?
43:04 They were doing some ads and stuff.
43:05 I had to go through and like sign a waiver that said nobody would climb on a ladder in a dangerous
43:11 way.
43:11 Yeah.
43:12 I'm like, it's a podcast recording.
43:15 You're going to give me audio.
43:15 Like, there's no ladders.
43:16 I don't know.
43:17 But like, this is the other one.
43:18 I see a ladder in your background.
43:19 I didn't know about that.
43:21 Yeah.
43:21 Actually, maybe this is what they're talking about.
43:23 It was like the warehouse person and the like contractor who does podcasting, whatever.
43:29 Like it didn't, you know, they wanted to run an ad.
43:31 So I had to go through this like weird process.
43:34 It was bizarre.
43:34 Yeah.
43:35 And so the cool thing about working with this company is they have tons of money and they're
43:38 really excited about technology.
43:39 So if you're like, I want to buy this expensive thing and try building a solution using that
43:42 people are generally like, sure, whatever.
43:44 It's fine.
43:44 The bad thing is this is true for everyone else as well.
43:47 So when your product A is trying to link up to product B, you may struggle a bit.
43:52 So there's just a lot of this kind of lots of tech, lots of money, high salaries, not
43:55 necessarily everything working in sync that you have to deal with.
43:58 Yeah.
43:58 You probably get to work with a ton of smart coworkers.
44:00 Yeah.
44:01 It's a bit of a bonus and a curse, right?
44:03 It's hard to stand out probably, but it's also great to have that support.
44:07 Right.
44:08 And if you're a person who really likes learning from other people and like having direct
44:11 mentorship, you are, this is one of the best companies to get that at.
44:13 Because yeah, like this company just draws people who know a lot of tech like a magnet.
44:18 Yeah.
44:18 So like one thing what we did is because, you know, it may be the case, like, you're
44:21 looking for jobs and like, you know, it's easy for you.
44:23 Like you're thinking of finding Google.
44:25 You're like, okay, that's the massive tech company.
44:26 But maybe you find a job and you're like, well, it doesn't really fit into any one of
44:29 these five things.
44:30 It's like one thing we do at the end of the chapter is we pull it together.
44:33 Okay.
44:33 Like what are some of the vectors that the companies differ on?
44:36 Right.
44:37 So mentorship, bureaucracy, like the tech stack.
44:39 So even if you find one, you know, you have a company that's not in one of these five
44:43 archetypes, you can sort of go through those things.
44:45 You say like, oh, okay, well, like it's a huge company.
44:48 So like probably there's a decent amount of bureaucracy.
44:49 I would be the first data scientist.
44:51 There's not going to be a lot of mentorship.
44:52 And so you can think about these different pieces and, you know, people have different
44:57 preferences, right?
44:58 Like some folks really, I've talked to people who really love, usually we don't, it's, I
45:02 wouldn't recommend it for someone's first job, but people who want to be the first data
45:06 scientist at the company because they want to get to build everything.
45:08 And then there's some experienced data scientists who are like, I would never want to be the first
45:12 or the only data scientist.
45:13 Like I really like working on a team.
45:14 So it's not like, you know, one of these is like, you know, oh, everyone, you know, it's
45:18 always bad to like have these certain things, but it's just different criteria that you can
45:22 think about and reflect for yourself.
45:24 Like what's important to me?
45:25 What am I looking for?
45:26 Yeah.
45:26 What's the fit?
45:27 All right.
45:27 So Handbag Love, who wants to talk about that one?
45:29 I can do that one too.
45:31 All right.
45:32 Just as I was talking about before.
45:33 So like, there's like a retailer, like let's call it like if it's, yeah, again, like Nordstrom,
45:37 Foot Locker, one of these companies that's like a retail company.
45:40 The cool thing about this is they have a very real product that they've been selling for
45:43 a long time and understand what they are doing.
45:45 So like you have a lot of stability there.
45:47 And by adding on data science, these companies are a lot like, okay, well, let's try and use
45:51 data to improve the product recommendations, improve the product, improve, improve our understanding
45:57 of things, you know, like use data to answer questions.
45:59 So you get a lot of, there's a lot you get to do as a data scientist.
46:02 You have a lot of.
46:03 Right.
46:03 They used to use intuition and now they're going to use data or something like that.
46:06 Right.
46:06 Yeah.
46:06 And so that's the upside.
46:08 So downsides are you don't have as much money.
46:10 Cause you're not like a rich tech company.
46:11 Your tech isn't as good because you know, you just don't care as much about getting the
46:15 best of the best.
46:16 Like, you know, older tech is generally fine.
46:18 And, you know, just as we were talking about, you generally have fewer people who can like
46:21 mentor you.
46:22 Like there'll be someone there, but you know, there'll be people there generally, but it
46:25 might be that everyone know, everyone's using like a far outdated Python library because
46:30 no one knows about the new way and no one's reading up on it.
46:32 So exactly.
46:33 They're still on Python 2.
46:34 Something like that.
46:38 Cool.
46:39 And then early stage startup.
46:40 What's the story for data scientists there?
46:42 Yeah.
46:43 I could talk a little bit about this.
46:44 So yeah, with data scientists, like you come in and you really get to shape everything.
46:47 So like there's some negative parts.
46:50 So even beyond the data science part, right?
46:51 Like you might show up at the startup and they're like, oh, we don't have your laptop yet.
46:54 So it's sort of a funny thing is like, there's, but there's also more freedom because they
46:58 might ask you like, Hey, what kind of laptop do you want?
47:00 Like if they're like a decently well-funded startup and you can be like, oh, I want this really
47:03 souped-up laptop.
47:04 You don't get that super slow clunky one with a huge company banner that takes five
47:09 minutes to start up.
47:10 Yeah, exactly.
47:11 That's a mixed bag.
47:12 But yeah, often you're like, as we were talking about, there's a lot of low-hanging fruit.
47:16 You also have to wear, you may have to do some data engineering, right?
47:19 Like maybe there's not any data engineers and all of the databases are optimized to like,
47:24 you know, serve the website.
47:26 And so it takes you five minutes
47:27 to like get a count of an 800,000-row table.
47:32 So yeah, so you have to wear a lot of different hats.
47:35 You might be pulled in a bunch of different directions.
47:37 So it's also really important to be able to prioritize, like to not just be like firefighting,
47:41 also take some time, like to, for example, build up some skills, like to build up your toolbox.
47:46 So like, okay, maybe write a library for yourself of like, that's a wrapper around pulling
47:50 the data.
47:51 So that becomes easier.
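A wrapper around pulling the data, as described above, might look something like the following minimal sketch. It assumes a SQLite database reached through Python's standard sqlite3 module; the function names fetch_dicts and row_count and the orders table in the usage note are hypothetical, chosen only for illustration.

```python
import sqlite3


def fetch_dicts(conn, sql, params=()):
    """Run a query and return each row as a dict keyed by column name."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def row_count(conn, table):
    """Return the number of rows in a table.

    Table names cannot be passed as SQL parameters, so the name is
    checked against a simple identifier rule before being interpolated.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

With helpers like these in one shared module, a call such as fetch_dicts(conn, "SELECT id FROM orders WHERE total > ?", (10,)) replaces the connection and cursor boilerplate that otherwise gets copied into every analysis script.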
47:52 That's a really important point, because I think a lot of these, I've worked in places
47:55 like this, and it's nobody asks you to build a helper library for data access.
48:00 They ask you,
48:01 give me this answer or make this product or give me this thing.
48:04 And you're like, yeah, but we really need this thing in place.
48:08 And somebody's gonna have to build it.
48:10 It's gonna be me or the other person I'm working with.
48:12 And you just kind of have to be willing to put in that infrastructure along the way, right?
48:17 Because you're going to appreciate it later, but there's no guidance for that.
48:20 Right.
48:20 And it's definitely not in place usually.
48:22 Yeah.
48:22 And you have to like help teach people like how to ask questions, like what is possible,
48:26 like bring in best practices.
48:28 So it's like I was saying earlier, I really would not most of the time recommend this for
48:33 someone's like first data science job to do this.
48:35 But for an experienced data scientist, like I found some people who really, really love
48:39 doing this because they're like, oh, I don't have to deal with like, you know, the decisions
48:43 of past data scientists.
48:44 I get to shape this in my vision and I get to use the most modern tools, for example.
48:48 You want to use Python, you can.
48:49 You want to use R, you can.
48:50 Like no one's going to, there's no one there.
48:52 So you just, they just need answers.
48:54 Yeah.
48:54 You can use F sharp, right, Jacqueline?
48:55 Yeah.
48:56 I get a lot of, a lot of people make fun of me because my favorite programming language,
49:00 no one else in the world uses.
49:02 Well, I did see the Jupyter Notebooks now support F sharp.
49:05 So that's, that's a vote.
49:06 Be still, my heart.
49:08 That's awesome.
49:09 Yeah.
49:10 But I mean, like early stage startups.
49:12 And I would say also HandbagLOVE companies a little bit as well because they may have some
49:17 tech stack, but it might be so outdated.
49:19 They're like, you're new.
49:20 We want to like go in, we want a refreshing direction where you can go in this other way.
49:24 We're not going to make you use this old thing.
49:26 We're going to try to get, you know, get something new growing here.
49:29 So you can go and have some flexibility as well, I think.
49:32 Now, what about Videory, the later stage startup?
49:36 Yeah.
49:36 For my preference, I think this is kind of a sweet spot because like you have like, you
49:40 know, it's sort of like in the, in the, in the middle of a lot of these things, right?
49:43 Like there's like some bureaucracy, but I kind of like bureaucracy sometimes.
49:46 Like HR has their stuff figured out.
49:48 Like that's nice.
49:49 It's like benefits and other things.
49:51 There actually is vacation.
49:53 Yeah, exactly.
49:54 You know, so there's usually like, there's a team of data scientists, but since it's, it's
49:58 still like a startup, you know, they weren't, you don't have like a, a, you know, 40 year
50:02 old tech stack, right?
50:03 Like most decisions were made like five or 10 years ago.
50:05 If that.
50:06 Yeah.
50:07 So I think this can be a nice fit.
50:09 Like you can still get, you know, you can still like know everyone on the data science
50:12 team.
50:12 Unlike if you're at like a massive tech company, and have support, but also have some,
50:17 some structure in there as well.
50:18 And there's probably like data engineers and other people to help out with like data science
50:22 adjacent problems.
50:23 Yeah.
50:23 You're probably a little more locked into a tech stack.
50:25 Yes, that is true.
50:27 Yeah.
50:27 I don't think you can really like do a tech stack from start.
50:30 And so you're locked into certain decisions, you know, and, and there may be sometimes you're
50:34 like, oh, I wish I had a time machine and like could go back and like fix this decision
50:37 they made a while ago.
50:38 Right.
50:38 Like you're at an early stage startup.
50:39 You can be like, all right, we're going to start like collecting data right away.
50:42 You know, we're going to log everything.
50:43 And then if you're at like a, you know, later stage company, they're like, oh, like, why don't
50:48 we like look at this data?
50:49 And you're like, oh, actually we weren't collecting that a year ago.
50:51 And they're like, okay, but make a forecasting model anyway.
50:53 And you're like, oh no, no, no, no.
50:55 You don't understand how uncertain this answer is going to be.
50:59 Yeah.
51:00 And then I guess the last type of company archetype that you all covered was the government contractor,
51:05 the Lockheed Martins and the Halliburtons and so on.
51:08 Yeah.
51:08 And I should also mention this includes the government itself, right?
51:11 Like if you work for the Department of Transportation or something like that, or just, you know, companies
51:14 where there is for legal reasons, there is just a lot of regulation, a lot of things
51:18 like, you know, keeping things moving a little slower.
51:20 And so these kinds of jobs, they tend to have lots of people who are
51:24 not data scientists.
51:25 And you tend to have the data scientists maybe embedded in little groups of that.
51:28 So like in the missile department or whatever, or the, you know, truck department, I don't
51:32 know.
51:33 And so because of that, you generally, you don't have as much mentorship often, but you often
51:39 don't have as many people telling you, no, you can't do it that way.
51:42 You're wrong.
51:42 I mean, you may have it like, oh no, we don't support Python past 2.7 because our, you know,
51:47 our procurement department hasn't cleared it or whatever.
51:49 So maybe bureaucracy, but there isn't like, oh, you have to, you know, like there's, there's
51:53 just not as much of like a standardization around tech just because there's, you know, that's
51:57 not the focus of the company.
51:58 And so these kinds of jobs, I'd say are really, they're really great.
52:01 If you want a job where you go in each day, you work eight hours with a 45 minute lunch
52:06 in there, you get a little bit of stuff done, but you don't stress crazy about getting it
52:10 the most you possibly can done.
52:11 And no one's stressed about you getting exactly the most, right?
52:14 So there's not like, you know, if you want a job where you're like, I'm going to go in,
52:17 I'm going to be the 10X data scientist.
52:18 I'm going to rock, you know, my career is going to be a rocket ship up to the C-suite as fast
52:23 as I can.
52:23 Like, this is not the kind of company for you.
52:25 It's the kind of company that's more for people who are like, I just want to do a consistently
52:29 good job.
52:30 Like then go home and take my paycheck and spend it on something I enjoy.
52:32 Yeah.
52:33 And don't need a lot of perks.
52:34 That's a good way to put it.
52:35 Yeah.
52:36 Yeah.
52:36 Because I've talked to people who are like, yeah, it's like, especially if you look at some of these
52:40 tech companies, right.
52:41 And like, I don't know, Airbnb and like rosé on tap or something like you, you're lucky if you get
52:45 coffee at some of the like government contractors.
52:47 Yeah.
52:48 That's for sure.
52:49 I think another thing that is interesting is so many of these types of companies are driven by like
52:56 government contracts or projects.
52:59 I'm thinking of like DARPA funding and like, here's a project that is guaranteed to run for
53:04 one year and then it may immediately get canceled no matter what.
53:08 Right.
53:08 So you have like these sort of long time horizons of working on something, but there's, it could
53:12 become a totally different type of job because some other contract was won and this one was
53:17 expired, lost, whatever.
53:19 Yeah.
53:19 And I think there's kind of, I would say not just besides government contractors, you can
53:22 imagine there are some other fields that might kind of fall into this, like certain parts
53:25 of healthcare might kind of fall into this area.
53:28 Yeah, definitely.
53:29 You imagine parts of finance, like some like, you know, rules around, you know, financial
53:33 risk regulations might kind of have some of these components too, but it's more the archetype
53:37 of the company that's got a lot of regulations or reasons why it has to move slowly and not
53:42 break things.
53:42 Yeah.
53:42 Yeah.
53:43 Interesting.
53:43 I think that's a really cool list you put together and I agree with a lot of your assessments
53:48 there.
53:49 So pretty neat.
53:49 Now we've been talking forever and we just barely touched on the stuff that you're covering.
53:53 We could just talk for so long because this is such a great book and a great topic,
53:57 but just for the sake of time, let's talk about one more topic and maybe blend this together.
54:01 So let's talk about getting the skills, becoming a data scientist from wherever you're starting.
54:06 And then also maybe just real quickly building a portfolio, because like I said at the beginning,
54:10 I do think having that first job is super important and getting that first job is strongly influenced
54:16 by just having something I can show.
54:18 You want me to do this?
54:19 I've already done it.
54:20 You don't have to verify if I can do it.
54:22 Look, this is it.
54:23 Just look at it.
54:23 You know, is it a personal fit or a salary fit?
54:27 Or whatever, right?
54:27 So let's start with getting the skills first.
54:30 Okay.
54:30 Yeah.
54:31 Jacqueline, you were talking about a master's.
54:32 Yeah.
54:33 And Emily, you're talking about a bootcamp.
54:34 Those sound like two different paths to me.
54:36 You didn't necessarily study programming, right?
54:39 You kind of went the math side, which actually I did as well.
54:41 Yeah.
54:42 I can cover.
54:42 Yeah.
54:43 Let me talk about all the different ways you can get skills.
54:46 And then Emily can talk a little bit about the portfolio, because that may or may not
54:49 have aligned with the chapters before.
54:52 I mean, who's to say?
54:53 I would never reveal that.
54:54 So we really, we think there's like four ways you can kind of get data science skills.
54:58 One is you can get a degree, which usually it's for people that go and get some sort of
55:02 master's degree, which is either like data science or maybe computer science or math, something
55:06 like that.
55:07 And the degree is great in that it, if you don't have that much of a background, you
55:11 will learn what you should, you will spend two years doing it.
55:14 So you should learn the basics of what you actually need, right?
55:16 The data science degree, you should learn the data science skills and you might do some projects
55:20 during it.
55:20 The downside is it takes two years and like 80 grand.
55:23 That's so much money.
55:24 Yeah.
55:25 A bootcamp takes 12 weeks and like 15 grand.
55:28 So that's much faster, much cheaper.
55:30 And the whole point of a bootcamp is to get you what you need as quickly as possible.
55:33 And I feel to me like bootcamps almost do a better job of connecting you with a job afterwards
55:38 than like a master's program.
55:41 Yeah, I think that's true.
55:42 And I think generally I would recommend bootcamps more except for people.
55:45 Bootcamps, you really need some sort of background already.
55:47 Like you need to have some idea of programming or some knowledge of this kind of field already.
55:51 If you don't know anything about data science, that might be 15 grand
55:54 and then you're still just kind of confused.
55:56 You can, the third option is you could try and find data science work within your job.
56:00 If you're an analyst and you want to do more decision science, you can try and find
56:03 places where you can do decision science in your analyst job.
56:05 If you're a decision scientist and you want to do machine learning engineering, you can
56:08 try and find places where you can do some more engineering.
56:10 So you could kind of try and learn within whatever your job is.
56:13 What if you're a scientist who kind of does a little computation and you kind of want to
56:17 drift towards the data science side?
56:19 Yeah.
56:19 So I actually know someone who is trying, you know, doing that very thing.
56:23 She was, you know, she's a scientist.
56:25 She takes measurements and she, in her job, has started to use R to actually make plots and
56:29 do the kind of investigatory stuff.
56:31 And that's totally been working for her.
56:32 And then lastly, you can teach yourself, right?
56:35 There's all these courses online.
56:36 You can work on your portfolio, which we'll get into.
56:39 And teaching yourself is great because it's free.
56:40 You get to focus on the stuff you care about.
56:42 And yeah, you can really, if you can motivate yourself correctly, you can really like learn
56:46 a lot that way.
56:47 I've learned a lot this way.
56:48 The downside is, is that it requires an immense amount of discipline, right?
56:53 If you try and do everything learning online, you have to actually do those courses instead
56:56 of playing Animal Crossing.
56:58 But yeah, not that I play Animal Crossing, but no, Jacqueline's just calling me out.
57:02 And you know, you don't know if you're teaching yourself the important things or not.
57:08 At some level, you don't have a mentor when you're teaching yourself.
57:10 And that's a problem.
57:11 Yeah.
57:11 It's also, I feel like sometimes when people are trying to teach themselves, they try to
57:16 boil the ocean, right?
57:17 Yeah.
57:17 Yes.
57:17 You know, like, well, I saw this and this and this.
57:20 So I got to know all those things.
57:21 Like, no, no, no.
57:21 You just vertical slices, not horizontal.
57:23 Yeah.
57:23 Like, figure out what you got to try to build something and learn what you need to
57:27 build that shallow or deep in these areas and then go from, like, iterate, right?
57:31 Yeah.
57:31 And I think a similar problem to that is when you're teaching yourself, there's not like
57:35 a natural stopping point.
57:37 So like in a master's or a boot camp, like they end and then you're like, oh, I guess
57:40 it's like time for you to like find a data science job versus if you're teaching yourself,
57:44 it's so easy to be like, well, I can't apply to like a data scientist yet.
57:46 I haven't learned like this thing or I haven't learned this thing.
57:48 And you just make this endless list.
57:49 Yeah.
57:49 Right.
57:49 Yeah.
57:50 Which I think is, is, you know, you're always data science is a, is a career where you're
57:54 always going to be learning.
57:55 And so it's not like you, you know, like no one knows everything.
57:58 So you don't have to feel like, okay, I must like master, you know, the whole world to
58:02 be able to get a data science job.
58:04 Yeah.
58:04 Yeah.
58:05 I hear you.
58:05 Let me put data science up a little bit on a pedestal here.
58:08 Like, so I feel like as a developer, you can build web apps, work with databases, whatever,
58:12 like you can totally do a quick bootcamp.
58:16 You can take online courses, read books, teach yourself.
58:19 I do feel like those are skills you can mostly get yourself.
58:22 They're like painful lessons you have to learn, but I'm not sure you'd learn those in school
58:25 anyway.
58:26 But with data science, I feel like there is a level of statistics and like scientific understanding
58:33 and a little bit of math that I think is a little bit harder for people to just get on
58:36 their own.
58:37 So having some formal training in the background seems more important for data science than pure
58:43 development.
58:44 I guess I would broadly agree.
58:46 And not because the actual statistics and machine learning models you learn as a data scientist
58:50 are like somehow harder to learn than software engineering, but because the fields, those
58:55 fields are so confusing.
58:57 And like in the layout, like what is considered statistics versus machine learning
59:02 versus industrial engineering?
59:03 Like these are all extremely poorly laid out.
59:05 The people in those fields make them as confusing as possible to make it seem like only they understand
59:09 it.
59:09 And there's not really an easy pattern.
59:12 There's not really just like one book out there that's like, oh, this thing in statistics
59:15 is actually that thing in computer science.
59:17 And like, they're the same.
59:17 And don't worry about that, half of statistics doesn't super matter.
59:20 Like that's not easy information to find.
59:23 Yeah.
59:23 Yeah.
59:24 I do think though, it also depends on like what type of role you want.
59:26 Right.
59:27 Like, so I think that's a little bit less important in an analytics role, for example, to have like
59:30 that background.
59:31 And there is certainly like, you know, people do, you can still like learn some of this on
59:35 the job, you know, whether for like mentorship or like reading books or like other, other things.
59:40 But I agree, like there is a bit of a danger because I feel like, especially with statistics,
59:43 it's like, if you run a statistical test, like it will generally spit out an answer, but it
59:49 may not be answering what you think versus right.
59:51 Like it's a little more obvious sometimes in development work, like, oh, the website didn't
59:54 load.
59:54 So I guess I have to like figure out what went wrong.
59:56 So there's a bit more of a danger there.
59:58 Yeah.
59:58 You're always going to get a number from that library, from those algorithms.
01:00:01 Right.
01:00:02 And you have to understand what it's doing.
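The point that a statistical routine will always hand back a number can be illustrated with a small sketch. The code below hand-rolls an ordinary least squares line fit (the helper name fit_line and the toy data are assumptions made for this example, not anything from the episode) and applies it to data where y depends on x exactly, just not linearly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# y is completely determined by x here, but the relationship is a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept, r_squared)
```

The fit runs without complaint and reports a flat line with an R squared of zero, which read naively says "x tells you nothing about y". Nothing crashed and no page failed to load; the only signal that the linearity assumption was violated is the analyst knowing to check for it.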
01:00:04 That's kind of what I was saying.
01:00:05 Like, it's really clear if the website is letting the user log in or not.
01:00:09 Yeah.
01:00:09 There's not a huge debate.
01:00:10 Maybe security is not quite right.
01:00:12 There's details you got to get right.
01:00:13 But it's generally, it works or it doesn't.
01:00:16 Just because I'm so upset about the point I made previously, because I think the point's
01:00:20 right, but I'm getting upset thinking about it.
01:00:21 It's like, so like a linear regression or logistic regression has some built in assumptions.
01:00:26 If you're in a CS department and you're like, I'm going to use a linear regression
01:00:28 to fit this as part of a neural network.
01:00:30 People are like, fine.
01:00:31 You did that in a stats department.
01:00:33 They're like, how dare you?
01:00:34 That's so incredibly wrong.
01:00:35 Right.
01:00:35 You violated the assumptions.
01:00:36 And it's like, well, these are two trained academic professionals telling you two totally
01:00:40 different things.
01:00:41 And I think that is something you get all the time in data science that you don't get as
01:00:45 often in software engineering.
01:00:47 Yeah.
01:00:47 It's exactly that kind of stuff I was thinking of.
01:00:49 Yeah.
01:00:49 Both the two things you both mentioned.
01:00:51 All right.
01:00:52 Let's close out our conversation by talking about getting a portfolio, maybe mixing in a little,
01:00:57 possibly, contributing to open source while we're at it or something.
01:01:00 Emily, do you want to give us the rundown on that?
01:01:02 Yeah, absolutely.
01:01:03 So the idea behind a portfolio, and this is especially helpful for people who don't have
01:01:07 a formal education or haven't worked in very similar jobs or been able to learn on the job.
01:01:13 Because as you were talking about, this is a way they can show they can do the work, even
01:01:16 if they hadn't had an opportunity in school or at a company.
01:01:19 Like a portfolio project.
01:01:21 So what we really recommend for it is doing something original that you care about.
01:01:26 Because, you know, one thing people might default to is like, all right, I'm going to go look
01:01:29 on Kaggle.
01:01:30 And I'm going to find like one of the data sets they have.
01:01:32 And I'm going to like do this competition where they like give you a data set.
01:01:35 And they like, you know, tell you to predict this thing.
01:01:37 And the problem with that is like, one, it doesn't really show your personality.
01:01:41 It skips over the steps that are really critical.
01:01:44 And you'll need to do in like data science roles, which is like gathering the data, figuring
01:01:48 out what question to answer.
01:01:49 And also, honestly, like if a company sees that in a portfolio, you know, maybe they're worried
01:01:54 that like, oh, did they just copy someone else's code, right?
01:01:56 Like this is a problem a lot of people have worked on.
01:01:58 So we recommend, you know, kind of finding, figuring out a question you're interested in
01:02:04 answering, or finding a data set that's interesting to you and exploring it to like figure out
01:02:09 like, okay, what are some like interesting findings I can have from that?
01:02:11 And so putting that together and then sharing it on GitHub.
01:02:15 So you have the code with the README that describes it.
01:02:17 And then ideally also having a blog, because a blog is really great.
01:02:21 Someone may not look through, you know, like hundreds of lines of your code, but they might
01:02:25 be like, oh, yeah, let me read about like what they found.
01:02:27 And like, look at some visualizations or read a tutorial that they wrote because they use
01:02:31 natural language processing for this project.
01:02:33 Or even just look back two years and see they've been doing this for as long as they said
01:02:38 they have been or something like that.
01:02:39 Yeah, exactly.
01:02:40 Exactly.
01:02:40 And so Jacqueline shared the example project that she did, which is training a neural network on
01:02:44 offensive license plates.
01:02:45 So I do want to emphasize like it doesn't have to be, you know, something very serious.
01:02:49 Or if you want to go into finance, it doesn't necessarily have to be like a finance project.
01:02:52 Because, you know, if you're like, oh, I use like neural networks, or like one of the
01:02:57 projects I did was I built a dashboard.
01:02:59 And so that shows like I can build a dashboard from scratch.
01:03:02 So I really think this can be like a great way to show off like some of your personality,
01:03:06 your coding skills, your communication skills with the README and the blog, and
01:03:11 maybe even demo in an interview.
01:03:12 So when I was doing the job search after graduating from boot camp, I would bring my laptop
01:03:17 and sometimes I would show this dashboard that I had built.
01:03:19 And I'd be like, look, and you could filter it and you can click around and it like you
01:03:22 click this goes to a link.
01:03:23 And I think that made it like much more, much more real to them than if I was just talking
01:03:27 about this theoretical project.
01:03:29 Absolutely.
01:03:29 That's awesome.
01:03:30 That's really good advice.
01:03:31 Another thing I think would be valuable is if people are in the right place, with the right
01:03:35 background and whatnot is to maybe contribute to some project that's relevant in the data
01:03:40 science space.
01:03:41 Right?
01:03:41 Like, if you have two people you're interviewing, and one's like, well, I'm pretty good at using
01:03:46 Jupyter.
01:03:46 The other person's like, I had two PRs merged into Jupyter.
01:03:50 And actually, you know, some of the people on the team, you know, a little bit who work
01:03:54 on like, okay, I know who I'm going to talk to a little bit more next about, you know,
01:03:57 it's a different level of credibility.
01:03:59 Even if like what you did was there were no unit tests for this part of the library.
01:04:03 So I wrote some unit tests, or I worked on the documentation, or I worked on a tutorial.
01:04:07 Like, it doesn't have to be I rewrote the main thing, right?
01:04:11 Yeah, absolutely.
01:04:12 And we have like, I think it's chapter 14, on like joining the community.
01:04:15 And that's one of the things we talk about is contributing to open source and exactly what
01:04:19 you said.
01:04:20 Like it can be, you know, writing new documentation, even fixing a typo, just these ways to get
01:04:25 involved.
01:04:26 And, you know, I do want to emphasize that like, you know, this isn't something that is required
01:04:30 to get a data science job.
01:04:31 Like I know a lot of data scientists who don't have like a GitHub with personal projects who
01:04:36 don't have a blog who don't contribute to open source.
01:04:38 So they're still like excellent data scientists.
01:04:40 But it's just like, what are the ways that, one, hopefully it's fun; two, hopefully you learn
01:04:44 something like that's the other big point of the portfolio project.
01:04:47 It's a great way to direct your learning like you find out, oh, I need to like, you know,
01:04:51 figure out how to scrape this website, let me go like to gather the data.
01:04:54 So let me go learn web scraping.
01:04:55 And three, like, to stand out in interviews, but it certainly, you know, shouldn't, I don't
01:05:00 think it should be like a requirement for any job, for example.
01:05:02 Yeah, right.
01:05:03 I agree.
01:05:04 And probably at different company archetypes, they probably care or completely don't care
01:05:09 about this, right?
01:05:09 Like the big aerospace contractors, they're probably like, okay, great.
01:05:13 We don't know that we trust you if you're writing code just for open source.
01:05:17 That might be weird, right?
01:05:18 Whereas like the startups are like, oh my gosh, that's so amazing.
01:05:20 I can't believe, you know, or the big tech company.
01:05:22 We're trying to move to open source.
01:05:24 So that's great.
01:05:25 You can be one of our advocates.
01:05:26 So yeah, I suppose that it probably varies a lot as well in there.
01:05:29 All right.
01:05:30 Well, I would love to talk more about this because there's all kinds of cool ideas you two put in there, but I think we have to leave it at that.
01:05:37 Let me ask you the two quick questions before I let you out here.
01:05:40 If you're going to write some code, do some data science, data analysis, what editor do
01:05:45 you use these days?
01:05:45 I use RStudio, although, one of my development goals:
01:05:48 I'm actually starting to use Vim as the editor within it.
01:05:52 So I'm trying out that.
01:05:54 But yeah, I've been, I've been using RStudio for, for years now.
01:05:57 Although I also heard, what is it?
01:05:58 Is it Visual Studio?
01:05:59 Like now supports R and like one of my teammates was trying that out and really liked it.
01:06:03 Yeah.
01:06:03 Probably VS Code.
01:06:04 Yeah.
01:06:04 That's awesome.
01:06:04 VS Code.
01:06:05 Yeah.
01:06:05 And Jacqueline?
01:06:05 So I'm a 50-50 split between RStudio and Visual Studio Code.
01:06:09 So RStudio for anything R related, literally anything else, including like just notes to myself,
01:06:14 Visual Studio Code.
01:06:15 Yeah.
01:06:16 Awesome.
01:06:16 And then notable libraries out there for data scientists, not necessarily something super popular,
01:06:21 but you're like, oh, this package is really awesome.
01:06:22 People should know about it.
01:06:24 Do they have to be Python libraries?
01:06:25 No, they don't have to be Python.
01:06:27 No, this is more of a data science topic.
01:06:31 So it could be running across the board.
01:06:33 Well, I will mention one of the libraries that I created when I was consulting for T-Mobile.
01:06:38 It's called loadtest and it's for R.
01:06:41 Me and the T-Mobile team made it and it's to help you if you're making an API in R using
01:06:47 the R library plumber, which is great.
01:06:49 You can use the loadtest library to test it to make sure that your R model will be able
01:06:55 to handle the load.
01:06:55 Okay.
01:06:56 Awesome.
01:06:56 Yeah.
01:06:56 Very cool.
01:06:57 Emily?
01:06:58 I have so many.
01:06:59 Now I'm wondering if I should, you know, say my own package as well.
01:07:01 Do it.
01:07:02 Self-promote.
01:07:03 Self-promote.
01:07:04 I'll briefly share.
01:07:05 I use it less now, but I used it a lot at my last company: funneljoin, for like analyzing
01:07:08 sequences of events.
01:07:09 You're like, all right, who like came to the website?
01:07:11 What percent of people who visited the homepage then bought a subscription?
01:07:14 But what about if we want that within two days?
01:07:17 So that's one.
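The kind of funnel question described here, a first event followed by a second one within a time window, can be sketched in plain Python. This is not the package's API (that is an R package); the function conversion_rate, its parameters, and the (user, event name, timestamp) tuple layout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def conversion_rate(events, first_step, next_step, within=None):
    """Share of users who did first_step and later did next_step.

    events is an iterable of (user, event_name, timestamp) tuples.
    If within is given, next_step must happen no more than that long
    after the user's earliest first_step.
    """
    events = list(events)

    # Earliest time each user performed the first step.
    first_seen = {}
    for user, name, ts in events:
        if name == first_step and (user not in first_seen or ts < first_seen[user]):
            first_seen[user] = ts

    # Users whose next step falls at or after that time, inside the window.
    converted = set()
    for user, name, ts in events:
        if name != next_step or user not in first_seen:
            continue
        gap = ts - first_seen[user]
        if gap >= timedelta(0) and (within is None or gap <= within):
            converted.add(user)

    return len(converted) / len(first_seen) if first_seen else 0.0
```

Calling conversion_rate(events, "homepage", "subscribe") answers the first question, and adding within=timedelta(days=2) answers the follow-up. Anchoring on each user's earliest homepage visit is one design choice among several; anchoring on the most recent visit before the purchase would give a different number.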
01:07:18 But another package, R has so many packages that I like.
01:07:21 So one thing that I'm very excited about, which is like sort of hot, it's been in development
01:07:26 for a while and in pieces, but is tidymodels.
01:07:29 So it's a rethinking of how to do modeling in R, with a brand new website out now too.
01:07:33 So I think it's tidymodels.org.
01:07:35 So I'm excited about that.
01:07:36 And then finally, just one more, the janitor package is a fun one for if you do cleaning data, it
01:07:42 just has all these functions for like you import a data set and there are spaces in the names
01:07:47 and like weird capitalizations and like weird like characters that make it hard to work with.
01:07:52 It has a function like clean_names and it will just fix all of those for you.
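To make the column-name clean-up concrete, here is a rough Python sketch of the idea using only the standard library. It is not the janitor implementation and will not match its output in every case; the function name clean_name and the exact rules (split camelCase, collapse anything that is not a letter or digit into an underscore, lowercase) are assumptions for illustration.

```python
import re


def clean_name(name):
    """Turn a messy column header into a lowercase snake_case identifier."""
    name = name.strip()
    # Put an underscore at lowercase-to-uppercase boundaries: zipCode -> zip_Code.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse every run of spaces, punctuation, or symbols into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


headers = [" First Name ", "Total $ (USD)", "zipCode", "ID#"]
print([clean_name(h) for h in headers])
```

A real cleaner also has to handle duplicate names and non-ASCII characters, which this sketch leaves out.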
01:07:55 Oh, that's cool.
01:07:55 Yeah.
01:07:56 And I think there's a PyJanitor as well.
01:07:58 I'm not sure if it's directly the same, but for people in Python, there's PyJanitor.
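To make the name-cleaning idea concrete, here is a minimal standard-library sketch of what a function like clean_names does in spirit: trim spaces, normalize capitalization, and replace awkward characters. It is a simplification written for illustration and does not reproduce the exact rules of janitor or PyJanitor; the helper names `clean_name` and `clean_names` below are this sketch's own.

```python
import re


def clean_name(name):
    """Turn a messy column name into lowercase snake_case."""
    name = name.strip()
    # Insert an underscore at camelCase boundaries: "firstName" -> "first_Name".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    name = name.lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    # Drop underscores left over at either end.
    return name.strip("_")


def clean_names(columns):
    """Clean a whole list of column names."""
    return [clean_name(column) for column in columns]


# Made-up column names for illustration only.
messy = [" First Name ", "ZIP Code", "Total $ (USD)", "firstName"]
tidy = clean_names(messy)
```

With a data frame library, the cleaned list would typically be assigned back as the column names right after import.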
01:08:02 I'll also go and throw one out there in the Python world for folks.
01:08:05 There's this thing called MissingNo, MissingN-O.
01:08:09 It's a visualizer for missing data.
01:08:12 So you just have a Pandas data frame and you throw it at it and it'll draw you like a big
01:08:16 cool graph of visually where your data is filled in, where it's missing and all these sort
01:08:20 of like correlations of you're missing this data, you're probably also missing that data.
01:08:24 It's super cool.
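The two ideas mentioned here, seeing where data is filled in versus missing and checking whether missing values in one column go along with missing values in another, can be sketched without any plotting library. This is a rough text-only illustration, not MissingNo's API; the function names and the sample rows are hypothetical.

```python
def missing_mask(rows, columns):
    """True where a value is missing (None or absent), one list per row."""
    return [[row.get(column) is None for column in columns] for row in rows]


def render(mask):
    """Draw the mask as text: '#' for a present value, '.' for a missing one."""
    return "\n".join(
        "".join("." if missing else "#" for missing in row) for row in mask
    )


def co_missing(mask, i, j):
    """Among rows missing column i, the fraction that also miss column j."""
    rows_missing_i = [row for row in mask if row[i]]
    if not rows_missing_i:
        return 0.0
    return sum(row[j] for row in rows_missing_i) / len(rows_missing_i)


# Made-up sample data for illustration only.
columns = ["age", "income", "city"]
rows = [
    {"age": 31, "income": 50000, "city": "Portland"},
    {"age": None, "income": None, "city": "Seattle"},
    {"age": 45, "income": None, "city": None},
    {"age": None, "income": None, "city": "Boise"},
]

mask = missing_mask(rows, columns)
picture = render(mask)
age_then_income = co_missing(mask, 0, 1)
```

In this sample, every row that is missing age is also missing income, which is the sort of pattern the visual correlations are meant to surface at a glance.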
01:08:25 Yeah.
01:08:25 There's actually an R one for that, which is naniar.
01:08:28 I never know how to pronounce that, but it's also for like missing, yeah, naniar for missing
01:08:33 data.
01:08:33 Awesome.
01:08:34 Yeah.
01:08:34 Yeah.
01:08:34 That seems super valuable.
01:08:35 Just get a quick, like I've got all this data loaded up.
01:08:37 Let me just look at it.
01:08:39 Yeah.
01:08:39 Visually.
01:08:39 Yeah.
01:08:40 Cool.
01:08:40 Yeah.
01:08:41 Yeah.
01:08:41 So tell us how people can get your book.
01:08:42 Our book is online.
01:08:43 You can buy it from the website of Manning, who's our publisher.
01:08:46 And we actually have two URLs because we had a disagreement about this.
01:08:49 So we have the professional URL.
01:08:51 Do you want the professional version of the book?
01:08:53 Yeah.
01:08:53 Yeah.
01:08:53 DataSciCareer.com.
01:08:57 And then we have the fun version of the book, which is at bestbook.cool.
01:09:02 And now those will take you to the same webpage, but know if you click bestbook.cool, you're
01:09:06 getting the fun version of the website.
01:09:07 Oh, yeah.
01:09:08 You're getting DataSciCareer.com.
01:09:09 It's the professional one.
01:09:10 Yeah.
01:09:10 And maybe we should have people guess which one of us is the fun
01:09:15 one, which one is the more serious.
01:09:16 Exactly.
01:09:17 Put it in the show notes.
01:09:18 Put it in the comment section at the bottom of the show page.
01:09:21 Awesome.
01:09:22 Well, Jacqueline, Emily, it was really great to have you on the show.
01:09:25 And I can certainly recommend your book.
01:09:27 It's spot on.
01:09:28 It covers a bunch of great topics.
01:09:30 People ask me about careers all the time, and I always want to have good advice to give
01:09:34 them.
01:09:34 And so here's definitely something they should check out.
01:09:36 Thank you so much.
01:09:37 Thank you so much.
01:09:38 Yeah.
01:09:38 You bet.
01:09:39 Yep.
01:09:39 Bye.
01:09:39 Bye.
01:09:39 Bye.
01:09:40 Bye.
01:09:40 This has been another episode of Talk Python to Me.
01:09:43 Our guests on this episode were Emily Robinson and Jacqueline Nolis, and it's been brought
01:09:49 to you by Kite and Linode.
01:09:50 Kite is the smart AI-powered autocomplete for your editor.
01:09:54 And the more powerful your editor is, the more effective that you are.
01:09:57 Get Kite for free at talkpython.fm/kite.
01:10:01 Start your next Python project on Linode's state-of-the-art cloud service.
01:10:06 Just visit talkpython.fm/Linode.
01:10:09 L-I-N-O-D-E.
01:10:10 You'll automatically get a $20 credit when you create a new account.
01:10:13 Want to level up your Python?
01:10:16 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.
01:10:20 Or if you're looking for something more advanced, check out our new async course that digs into
01:10:26 all the different types of async programming you can do in Python.
01:10:29 And of course, if you're interested in more than one of these, be sure to check out our
01:10:33 Everything Bundle.
01:10:33 It's like a subscription that never expires.
01:10:35 Be sure to subscribe to the show.
01:10:37 Open your favorite podcatcher and search for Python.
01:10:40 We should be right at the top.
01:10:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the
01:10:47 direct RSS feed at /rss on talkpython.fm.
01:10:51 This is your host, Michael Kennedy.
01:10:52 Thanks so much for listening.
01:10:54 I really appreciate it.
01:10:55 Now get out there and write some Python code.