#262: Build a career in data science Transcript
00:00 Has anyone told you that you should become a data scientist?
00:02 Have you heard it's a great career?
00:04 In fact, data scientist is the best job in America according to Glassdoor's 2018 rankings.
00:11 That's great, but how do you get a career in data science?
00:14 And once you've landed that first job, how do you find the right fit?
00:17 How do you find the right company?
00:19 And how do you get more deeply involved with the community as you grow in that career?
00:24 I've brought two great guests, both highly successful data scientists, on the show today who have been thinking deeply about this.
00:30 Jacqueline Nolis and Emily Robinson are here to give you real-world, actionable advice on getting into this rewarding career.
00:37 This is Talk Python to Me, episode 262, recorded Wednesday, April 22nd, 2020.
00:57 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,
01:02 and the personalities.
01:03 This is your host, Michael Kennedy.
01:05 Follow me on Twitter where I'm @mkennedy.
01:07 Keep up with the show and listen to past episodes at talkpython.fm.
01:11 And follow the show on Twitter via @talkpython.
01:14 This episode is sponsored by Kite and Linode.
01:17 Please check out what they're offering during their segments.
01:19 It really helps support the show.
01:21 Jacqueline, Emily, welcome to Talk Python to Me.
01:24 Thank you.
01:24 Thank you.
01:25 Excited to be here.
01:26 I'm excited to have you both here.
01:28 Really excited to talk about this topic.
01:30 I think one of the things that a lot of listeners out there can benefit hugely from
01:35 is how do I get started in programming?
01:39 How do I get started in data science?
01:41 How do I get started in this overall sort of Python career?
01:44 And there's so many different paths and ways you can go.
01:49 You could go get a four-year degree.
01:50 You could drop out of college and do a startup.
01:53 What is the right path?
01:55 And what is some guidance around there?
01:57 What are some of the trade-offs?
01:58 And so you both have been writing about this lately.
02:02 And this is really, really good work that you're putting out.
02:05 So I'm excited to talk to you both about that.
02:07 Great.
02:07 Yeah.
02:08 So we just published our book, Build a Career in Data Science.
02:12 We've been working on this, I think, almost two years now.
02:14 Yeah.
02:15 Two years since you reached out to me.
02:17 Yeah.
02:17 And so two years, a lot of work, a lot of talking to people in the field.
02:21 And so it's really been great to finally get it out there and talk to people and watch people
02:25 actually get helped by it.
02:27 So yeah, we love talking about it.
02:29 Awesome.
02:30 Well, I'm really excited to have you all here to talk about it.
02:32 But before we get to that, let's maybe do a little bit of a meta thing.
02:36 I always ask this question on the show, but it's a bit meta this time.
02:39 It's just how did you get into programming in Python?
02:41 Emily, you want to go first?
02:42 Yeah.
02:43 I don't know if I can admit this on this podcast, but I actually don't program in Python
02:46 anymore.
02:47 I program in R day to day.
02:49 I have programmed in Python.
02:50 So how I got started was back in college.
02:53 I did Python in one computer science class, but most of the programming I did was in R in
02:57 the statistics program.
02:59 So I was lucky enough to go to Rice University when Hadley Wickham was a professor there who,
03:03 for your listeners who don't use R, is a very famous R programmer who has contributed a lot of
03:08 the big packages to it.
03:09 So that's how I got started.
03:11 And I kept doing it in grad school, where I got my master's in organizational behavior.
03:15 After that, I went to a data science bootcamp called Metis, which was all in Python.
03:20 So, you know, sort of upped my Python skills there and then got started working in data science
03:24 in industry.
03:25 Yeah, very cool.
03:26 Jacqueline, how about you?
03:27 Okay.
03:27 So like Emily, most of my work is in R, although I do some Python too.
03:31 But okay.
03:32 So my background, I did an undergrad and master's in math.
03:36 And then I was like, I really want to help companies use math to solve problems.
03:40 This is before the term data science existed.
03:41 So I went out in industry, did some what is now data science, but I didn't know it at the
03:46 time.
03:46 Went and got a PhD because I wanted to get some more technical skills.
03:50 And then now I work as a consultant helping out companies.
03:52 Awesome.
03:53 What's your PhD in?
03:54 Industrial engineering.
03:55 Yeah.
03:55 But actually, so how I actually started using Python, I think the first Python project I
03:59 ever did was there's a style of games from like the 80s called roguelikes, where you have
04:04 like your little asterisk symbol and you're walking around like the computer screen trying
04:07 to fight monsters and stuff.
04:08 And like the monster might be the letter M.
04:10 And I wanted to make one of these.
04:12 It's like a MUD, but it has like some visual representation.
04:15 Okay.
04:16 Yeah.
04:16 Awesome.
04:16 Yeah.
04:16 And so it just like, it gets weirder.
04:18 So I wanted to make one of these and I wanted to make one of like when you're on a river and
04:22 you're like tubing and you're sitting there, I wanted your character, your little at symbol
04:25 to be like you floating around the river and you like see horses and stuff.
04:28 So anyways, Python was the language that had the most straightforward library for making
04:32 one of these roguelike games.
04:33 And so I spent like two weeks coding up this tubing simulator and then I got bored of the
04:38 project and left it.
04:39 But that was my first time actually using Python.
04:41 Yeah, it's cool.
04:42 And these little personal projects are super valuable for getting into programming because
04:46 you just go through and say, well, I'm learning about loops.
04:49 So I'm going to write like some different kind of loops.
04:50 Like no one learns that way, right?
04:52 Not really.
04:53 Yeah.
04:53 So that's one of the chapters in our book.
04:56 We have a whole chapter on, hey, you know, it's really good to learn things this way.
04:59 And you can actually make a portfolio of projects that you then can use to help
05:03 you get a job.
05:04 And for me, one of the, besides making tubing simulators, one of the projects I actually
05:09 did was to learn neural networks.
05:11 I generated a neural network that would create offensive license plates that would get banned
05:17 by the state of Arizona because I had a data set of all these license plates.
05:20 And that ended up being the basis for like an extremely valuable consulting project where
05:26 I ended up doing natural language processing using the same stuff I learned from that offensive
05:29 license plate thing.
05:30 That is so cool.
05:31 It's just such a fun and playful and kind of silly project.
05:34 But then, yeah, what I found really interesting in programming in general is you have these
05:40 two different realms, you know, like think hedge fund or Air Force, right?
05:45 But the fundamental thing you learn and the skills you gain to solve or work in those areas, it's
05:51 almost exactly the same.
05:52 It's just like, what layer is the specialty that they're working in?
05:57 You know, how do you work with trading?
05:58 They care maybe more about like timing versus, I don't know, visualization or whatever.
06:03 But it's blown my mind like how similar stuff is like that.
06:07 Like I created this fun thing for license plates.
06:09 And then it turns out to actually let me, I don't know what exact project you were working
06:14 on, but like something that obviously was probably not that, right?
06:17 It was a real thing someone paid me for as opposed to a license plate thing which gets me some blog post
06:22 views.
06:22 Yeah.
06:23 Just a very short side note.
06:25 Did you see that somebody thought they were going to be so, so clever and get out of all
06:29 the camera, like camera speed traps and stuff, the red light cameras and whatnot.
06:36 So they got their license plate to say null, N-U-L-L.
06:39 And they thought like that would just trigger the database to think nothing was there.
06:43 But they started getting tickets for every faulty piece of data that was in.
06:48 Oh, no.
06:50 There was nobody, like there wasn't a license plate properly.
06:53 It was null.
06:54 So they started getting thousands of dollars of tickets that weren't theirs, but that was,
06:59 they're now registered for null, like across the board.
07:01 It's upstate.
07:02 I care.
07:03 Playing with fire.
07:04 Oh, that backfired bad.
07:05 Anyway, that sounds really, really fun.
07:07 So Emily, what are you doing today?
07:09 Like day to day, what do you work on?
07:11 Yeah.
07:11 So I work as a senior data scientist at Warby Parker, which makes eyeglasses and now contacts
07:16 as well.
07:17 You can get them online and, well, you could in previous times get them in stores as well.
07:21 But of course, like many other companies, we've closed all our retail stores at the moment.
07:25 Yeah.
07:26 We have a Warby Parker here in Portland and I almost went in it the other day, but not anymore.
07:30 Yeah.
07:30 But still online, right?
07:31 Yeah.
07:31 Still online.
07:32 Still online.
07:32 Yeah.
07:33 So I joined there in December.
07:34 So about five months ago now on the data science team, which is a centralized team that works
07:40 with departments across the company.
07:42 So that's been really fun because previously I'd always been a data scientist that was embedded
07:47 with a team.
07:48 So when I worked at Etsy, the analytics department was centralized, but we were sort of paired with
07:53 one partner team.
07:54 So I worked with search my whole time.
07:55 And then my last job, I was part of the growth team.
07:59 So I reported to the VP of growth, not to the chief data scientist.
08:02 So this has been a new experience being on a fully centralized team where for one month I
08:07 might work with finance.
08:08 And then a couple months later, I'm working with product strategy.
08:11 That sounds super fun.
08:12 And you get to experience different kinds of problems and work with different teams
08:16 and technologies, I'm sure.
08:18 Yeah, exactly.
08:18 And so, you know, and the team tackles a wide range of projects.
08:22 So one thing we discussed in our book, right, is data science is a pretty broad field.
08:25 And so there's lots of different projects you can do.
08:27 So some of ours are making like a dashboard to view analytics and some other ones may be more
08:34 modeling problems or making a machine learning product.
08:36 So that's been really interesting to get this breadth of things that we can work on depending
08:41 on what the team we're working with needs.
08:43 Yeah, sounds super cool.
08:44 Jacqueline, how about you?
08:45 I'm working as an independent consultant.
08:47 So I've spent the last couple of years working, you know, as my own company.
08:51 So helping out big companies like T-Mobile, Expedia, some smaller startups in the Seattle area.
08:57 And so this is pretty fun because, you know, like Emily was saying, by being a consultant,
09:01 I got to work on all sorts of different projects, whether it's taking machine learning models
09:05 and pulling them into production or helping a company figure out who are the most like active,
09:09 engaged customers and how to, you know, think about targeting them differently.
09:12 So that's pretty great.
09:13 Unfortunately, it is not like the best time to be a consultant on your own right now.
09:19 You know, it's been a little dicey lately, but all in all, I've really enjoyed getting
09:22 to work with all these big companies on all these different interesting data science problems.
09:26 Yeah, it sounds really fun.
09:27 And it is nice to be able to pick who you want to work with and what projects you want
09:31 to take.
09:31 And you have a lot, a little more freedom, I think, to kind of go your own way.
09:35 But right now, I don't know, all bets are off.
09:39 It's yeah, I think it's tricky to be a freelancer right now because you feel it, all these changes
09:44 and all this pressure, you feel it immediately.
09:46 Right.
09:47 But if you work at, I don't know, some large company, you might not feel it right away,
09:52 but then, you know, maybe that company goes under and then all of a sudden you don't have
09:56 all the connections you had as a freelancer.
09:58 Right.
09:58 Maybe Expedia dries up because travel's down, but, you know, maybe some other company's
10:04 like, hey, we got some more work.
10:05 Why don't you come work for us?
10:06 Right.
10:06 Whereas if you work at a company, you don't necessarily cultivate those connections as
10:11 much.
10:11 Yeah.
10:11 And I think a lot of people tend to like, kind of make working as a freelancer, kind of like
10:16 a, like a cool thing that like, oh, when you're really a big shot, you get to like work, you
10:20 know, just as an independent consultant.
10:21 But it's, so I think a lot of people kind of like aspire to have that sort of a job, but
10:25 it's really hard.
10:26 It's really hard because as you're saying, there's a lot of instant, you know, if there are changes
10:30 in the market, you feel it instantly.
10:31 And, you know, half of your job is going out and finding new clients.
10:36 Making deals, trying to like work with stakeholders.
10:38 None of that has to do with programming or technical stuff, right?
10:42 You're almost in marketing for yourself a bit.
10:46 Right.
10:46 So like people think, oh, I want to be a freelancer.
10:48 So I don't have to do all the boring stuff.
10:50 I can just do the data science.
10:51 And it's like, no, you actually have to do more of the boring stuff.
10:54 So it's, we actually, we have a chapter in our book about, okay, what do you do once you
10:58 become like a senior data scientist and you're looking at the next steps.
11:00 And one of the paths we've discussed is the consultant path, which has some perks to it, but also
11:05 has some serious risks and downsides.
11:07 Yeah.
11:07 I also have to share.
11:08 So how we split writing the book was we each like, were the primary writer for half the
11:14 chapters and the other person edited it.
11:15 And so Jacqueline was writing this chapter she was just talking about.
11:18 And the first version of the independent consultant one was so negative.
11:22 Like she was basically like, never do this.
11:25 I'm like, Jacqueline, I think we need to like pull back a little bit.
11:27 Like, I understand.
11:28 You definitely want to share the cons.
11:30 Well, it's not that, yeah, it's not that I think consulting is a bad thing to do.
11:36 It's that I've had so many people come up to me and be like, oh, that sounds so cool.
11:40 I want to do that.
11:40 How do I do that?
11:41 And I feel like I'm the, like the old woman in front of the cave, like, you know, I think
11:48 I even came on a little too hard, but yeah.
11:50 Careful what you wish for.
11:52 You might get it.
11:54 How funny.
11:55 Yeah, it's, it definitely has this aura of like, hey, you're your own boss.
11:59 You can just do whatever.
12:00 But yeah, there's, there's a lot of work to be done there.
12:03 And I think it also makes sense at different stages in your career, for sure.
12:07 Let's start this whole conversation off with a question about how you both got your first
12:13 job.
12:13 So I heard about what you're doing now.
12:15 And I have this theory.
12:17 I've only really tested it.
12:19 I guess I've tested with a few people.
12:21 I was going to say, I really only tested with myself because I only know my career that
12:24 well, but I've, I work with some people who are interns and then found their way through
12:28 like really successful stuff and sort of saw that as well.
12:30 And my theory is in the developer data science space, the first job is the hardest.
12:39 Because once you've had one job, you have a portfolio of work, you have experience, you
12:44 can say, I've done this thing and you have a problem similar to like, here, I've done this
12:48 thing with license plates.
12:49 It's technically not license plates, but that's basically what you're asking to do.
12:52 So it's not a matter of convincing a person in an interview that you can do it because you
12:57 can just show them, look, this is what I built and they're happy.
13:00 But in the very beginning, it's such an unknown to people.
13:03 So I think getting that first job is probably like one of the biggest steps to kind of going
13:09 down this path.
13:10 So I wanted to ask you two, how'd you get your first jobs?
13:13 Emily, you want to go first?
13:14 Yeah.
13:15 So I mentioned my first job was Etsy.
13:17 And so, all right.
13:19 So it's like take a time machine back to fall 2016.
13:22 So I finished the Metis Data Science Bootcamp.
13:25 But I interviewed.
13:26 So like one thing they did was there was a demo day for your final project and there were
13:30 some companies hiring there.
13:31 But yeah, I ended up at Etsy.
13:32 And actually how I got that initial, I don't know if it would have happened anyway, but what
13:37 helped initially was I actually knew someone, Hilary Parker, who used to work there as a data
13:41 analyst.
13:41 And she had since left, but she still knew people who worked there.
13:45 So she offered to introduce me to a manager there.
13:47 And he took a look at my profile and said, yeah, I can refer you.
13:50 So definitely a network is a big part of it, even for later jobs, although I definitely agree
13:57 the first is the hardest.
13:57 And then I think what helped there was like Etsy was a really great company.
14:03 I really enjoyed working there.
14:04 And the title at the time I had was data analyst.
14:07 It wasn't data scientist.
14:08 And now actually that team since then, shortly after I left, their titles are now data scientists.
14:12 But one thing we talk about in the book is avoiding this.
14:15 People can get very attached to the title data scientist, and those can sometimes be harder
14:21 to get, or they're very attached to like, oh, I need to go work at like Google or Facebook
14:24 or Airbnb, right?
14:26 Like these,
14:26 and I would say Etsy probably falls under there, you know, very well-known data
14:30 science companies.
14:30 And I do think like you are going to get such valuable experience from almost like any
14:35 job.
14:36 If you're working in like data and whether you're called data, data analyst or, you know,
14:41 research analyst or product analyst, like if you're doing code, if you're working with data,
14:45 if you're working with stakeholders, that can be a really great first experience.
14:49 And like you said, just having that on your resume can open up a lot more doors.
14:53 And it's very common, especially in the tech field for people to switch jobs every, you
14:58 know, one to three years.
14:58 So it's not like you're signing up there forever.
15:00 That's how I got my first job.
15:02 Yeah, cool.
15:03 And one of the things I want to talk to you all about is the trade-offs of different types
15:07 of companies to work at, but we'll get to that.
15:09 I do think this idea of having someone to introduce you or someone who knows you or someone who knows
15:15 someone, you know, like a couple of layers removed is really valuable.
15:19 When I was working at companies where we were doing a lot more hiring, it was,
15:23 does anybody know somebody who can do this and they can recommend who's good?
15:28 If the answer was no, then maybe it becomes a job search.
15:31 Then maybe it becomes a job posting.
15:33 But it was not first a job posting.
15:35 It was first, does anybody know somebody great that can do this that we're in need of.
15:39 And if somebody knew somebody, then we probably just would go out and talk to them, you know,
15:43 first.
15:43 And I don't know if that's fair or not, but that's just, it's how it works.
15:48 Because if you put a job posting out there, you could get a thousand, not a thousand,
15:51 a hundred applicants.
15:53 You've got to go through.
15:54 And, you know, I've had plenty of people I've interviewed where it's like, I can do this
15:59 thing.
15:59 Like, okay, how about this?
16:01 How about we turn on screen sharing?
16:02 And what was it at the time?
16:03 It was like GoToMeeting or something.
16:04 I turn on screen sharing.
16:06 And why don't you just write a real simple program that does that?
16:08 It should be like five lines of code.
16:09 Anyone could do it.
16:10 And they, like, couldn't do it.
16:12 You're like, okay, you clearly have not been doing this for two years because this
16:16 would be like the first week of a class that covered this topic.
16:19 So, you know, it's just, it's really tricky.
16:21 So I do think, you know, just people out there listening, like cultivate those connections
16:26 as much as possible, even if it's not, it's not a perfect meritocracy, but that's just the
16:31 way it is.
16:31 Right.
16:31 Yeah.
16:32 And I mean, I think there's lots of advantages to a network beyond like helping get a job.
16:36 It's like finding a community of other people who are having like, maybe if you're the only
16:40 data scientist at a company.
16:41 And like you said, I mean, it's partly when you sort of said like, maybe not thousands.
16:45 I mean, certainly the large companies get like thousands or tens of thousands of applications
16:49 for a data science position.
16:51 I remember Angela Bassa, who's one of our interviewees in the book, she posted, I'm not
16:55 sure it was data science, maybe it was a data analyst position at her company.
16:58 And she actually closed it after four days because they'd gotten a thousand applications.
17:02 I've got a backlog.
17:03 We're just going to have to go through this.
17:04 Yeah.
17:05 Yeah.
17:05 Yeah.
17:05 This portion of Talk Python to Me is brought to you by Kite, the smart AI-powered autocomplete
17:12 for your editor.
17:13 As developers, our choice of editor is central to our work.
17:17 The more powerful and effective that editor is, the more effective you are.
17:22 That's why I'm excited about Kite.
17:24 Kite is a free plugin for your code editor that gives you ML powered autocompletions and
17:29 documentation.
17:29 Chances are it works with your editor of choice.
17:32 Even if that editor has existing autocomplete features, the list includes PyCharm, VS Code,
17:38 Atom, Sublime, Vim, and more.
17:40 And Kite runs locally.
17:42 So your code is private with no cloud or internet connection necessary.
17:46 And Kite is 100% free.
17:48 So try it today at talkpython.fm/Kite.
17:51 Kite, K-I-T-E.
17:52 And see how Kite can help you be more effective with your Python code.
17:58 Jacqueline, how'd you get your first job?
18:00 Okay.
18:00 So first off, I feel old because I was like, oh, in 2016.
18:05 So my story starts in 2008.
18:07 So I was finishing my master's in math.
18:10 And so, you know, because it was much earlier in this whole field.
18:13 Again, data science wasn't really a term yet.
18:16 So it was much harder to just like, you know, I remember going to like monster.com and searching
18:20 mathematician and getting just lots of math jobs.
18:22 I'm like, I don't, that's not what I want.
18:23 Like I don't want to be an actuary and I don't want to teach math.
18:27 So what do I, what can I do?
18:28 Yeah, I had no idea.
18:29 Well, and I knew there were jobs out there.
18:31 I just didn't know how to find them.
18:32 And one of my very good friends had just started the year before working at a company.
18:35 And he's like, oh, you know, we actually have a department that hires people with math degrees.
18:40 You should apply.
18:41 And so I applied and the interview process was like a two-day thing where they bring you on
18:45 site and they do a bunch of interviews.
18:47 And so at that point in my, by my master's, I had some internships.
18:51 I had some research projects.
18:52 So to your point of like, it's hard when you haven't had a first job before.
18:55 I think you can have things like internships or like, you know, that license plate or,
18:59 you know, any like project you can hang a hat on is something you can talk about.
19:03 So I had some of that.
19:04 Yeah, that's cool.
19:04 And I do think actually now is even easier than back then with stuff like GitHub and open source.
19:10 You don't have to be, you know, employed to create a cool project that people can start
19:15 to like or share or something.
19:17 Right.
19:17 So the opportunities are certainly there.
19:19 Right.
19:19 And I think that compared to before when I was doing it, no one knew what like an analytics
19:23 person needed for skills.
19:24 Was it math?
19:25 Was it programming?
19:26 Now we've really got a much better idea of what you need to have on your resume.
19:29 And so it's like a two day thing that ended with like a night where they took us to a bowling
19:33 alley, tried to get us to drink a lot to get more excited.
19:35 I think there's a lot of ethical deviousness.
19:37 Anyway, so let me just finish the story.
19:42 I ended up taking the job.
19:44 And the job when they talked about it during the interview process was like, oh, you're
19:47 a, you know, analytics, business analytics team member.
19:50 You're going to do forecasting.
19:51 You're going to maintain the models.
19:53 And I'm like, oh, cool.
19:54 I've worked in forecasting before.
19:55 I would love a job where I help build cool, interesting new forecasting models.
19:58 And then like on the job, it became very clear that what they actually wanted me to do is rerun
20:03 the forecast each month in SAS, copy and paste it into Excel, copy and paste the chart
20:07 from Excel into PowerPoint, and then stand in front of people and read off the numbers.
20:10 And that was like a huge shock because that is not what I wanted to do in a job.
20:14 And it basically took me a year of working there before the job became kind of what I
20:18 wanted it to be.
20:18 But also I had given up at that point and moved on to a different job.
20:21 So for people out there, when you do get that first job, if it is not what you expect, that's
20:27 probably, you know, I talked to a lot of people, a lot of people we interviewed in our book had
20:30 the same problem when they got to their first job.
20:32 They're like, oh, my God, this is not what I was expecting at all.
20:34 And by your second job search, you have a much better understanding of really what it is
20:38 you're looking for and what actually exists in industry versus there are no jobs.
20:41 Like there's no job in industry that's just writing math theorems or whatever.
20:45 That's right.
20:46 Well, also, I think sometimes you can get into those situations because the company thinks
20:52 that's how it has to be done.
20:54 Like we've done this and we need somebody to keep doing that.
20:57 Like we only knew how to run Excel and then export this stuff and then do this weird thing
21:02 and then manually fix it up.
21:03 And then you could tell us like what the picture says.
21:05 I think that's true.
21:06 Companies naturally have a tendency to that.
21:07 And I think at the time coming out of a math master's, I'm like, oh, all I want to do is create
21:12 new and exciting mathematical stuff.
21:14 So I had this like affinity for like having to change the world on my job.
21:18 Yeah.
21:18 And I think, you know, I have since then in my career done a lot of good by constantly coming
21:22 up with new forecasts and new methods of doing things.
21:25 But also it is totally fine to have a job where you spend 80% of it just pressing go again.
21:31 And then the 20% doing something interesting if you like that.
21:33 Right.
21:34 Like, so there's like, I think I kind of like held my nose up against those kinds of jobs,
21:37 but I think they're pretty good.
21:38 I've hired people on teams.
21:40 I've had a lot of people who come straight from academia have the same problem I had.
21:44 Like, oh, wow.
21:45 You want me to copy and paste each number like individually?
21:48 That could take me 10 minutes.
21:50 Like, you know.
21:51 It's got to get done.
21:52 Well, my thought was, you know, maybe the job starts out that way, but then you're like,
21:56 well, I can, I can do some programming.
21:58 We actually, we don't really need to load Excel and copy and paste.
22:01 I could use something like openpyxl where I could actually write code that talks to the
22:06 database and then runs a report and then just puts it in there.
22:09 Right.
22:10 So you could like slowly take away these manual steps by starting to create like cool pipelines
22:14 of like processing and automation.
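A minimal sketch of the kind of pipeline described here: query a database, build the monthly report, and write it out with no copy and paste. To stay dependency-free it uses only the standard library and writes CSV, which Excel opens directly; producing a real .xlsx workbook would need a library such as openpyxl instead. The `sales` table, its columns, and the numbers are all hypothetical.

```python
import csv
import io
import sqlite3


def monthly_report(conn):
    """Return (month, total) rows summed from a hypothetical `sales` table."""
    query = (
        "SELECT month, SUM(amount) FROM sales "
        "GROUP BY month ORDER BY month"
    )
    return conn.execute(query).fetchall()


def write_report(rows, out):
    """Write the report as CSV, a format Excel can open directly."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "total"])
    writer.writerows(rows)


# Demo with an in-memory database and made-up numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-01", 100.0), ("2020-01", 50.0), ("2020-02", 75.0)],
)

rows = monthly_report(conn)
buffer = io.StringIO()  # stands in for an open file on disk
write_report(rows, buffer)
print(buffer.getvalue())
```

Once the two functions exist, rerunning the report each month is a single script run rather than a manual sequence of exports.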
22:16 And they didn't ask anyone to do that because they thought that was basically impossible.
22:21 Right.
22:21 And so I feel like a lot of people can end up in these situations where there's like
22:26 one workflow that you are hired for, but you, you know, as people who can write code, we're
22:31 kind of magicians, right?
22:33 We can kind of like magic stuff into existence and you can solve some of these problems and
22:38 they would probably much rather have a single button click or something that's automatic every
22:42 day, but they just, they couldn't create it or put it in place.
22:45 Yep.
22:45 I think that's true.
22:46 And I think depending on the different size company, you may get more opportunities to
22:50 do that.
22:50 And depending on your appetite of, I want to code an interesting thing in Python to try
22:54 and automate the Excel or I'll just press Excel.
22:56 I'll hit, I'll hit go for five minutes each day.
22:59 That's fine.
22:59 Exactly.
23:00 Yeah.
23:00 I do think that's where the company culture or your manager can be important though, right?
23:04 Cause I can imagine some companies that like have a lot of bureaucracy would just be
23:07 very uncomfortable with this idea.
23:09 They're like, no, we've always done it this way.
23:11 Or maybe like companies that are like, or the government or like companies that work with
23:14 the government.
23:15 So I do think like, it's important to be, and that's also kind of the reason we wrote
23:20 this book was because, you know, we felt there's a lot of technical guidance out there, but not
23:25 on these other really important skills you need.
23:27 And I do think, you know, one of those skills, if you want to change the practice of a company,
23:31 you can't necessarily just be like, you know, email it to them one day and have that be done.
23:35 You need to like, you know, talk to them, figure out their, like, you know, what kind of scares
23:38 them about this change, like do change management and other things.
23:42 And I think that's like, to not underestimate the importance of things like communication
23:46 and, and working with stakeholders when thinking of things like technological solutions, even
23:51 if to you, it may seem really obvious that like, oh, of course this is like going to be a hundred
23:54 percent better.
23:55 Yeah, absolutely.
23:57 And Jacqueline, just to make you feel better, I got my first programming job in
24:01 1997 when I was working on my PhD in math.
24:04 So, you know, you can always go farther back.
24:08 All right.
24:11 Well, one of the interesting things that you discussed was that there's this term data science,
24:18 but in a sense, there's almost like three branches of data science, kind of a little bit like in
24:24 software development, you'd say, hey, I'm a programmer.
24:27 I'm like, oh, cool.
24:28 Could you build me a mobile app?
24:29 Like, no, I have no idea how to build you a mobile app.
24:31 I could build you a website.
24:32 And then someone else would go, I can't build a website, but I build a cool desktop app.
24:36 Right.
24:36 So, you know, what does that kind of partitioning look like in the data science space?
24:40 Yeah.
24:41 Who wants to jump in?
24:41 So, you know, and this is something that could be like one of the more controversial parts of
24:46 the book.
24:46 But I think people sort of come around to this. How we divided it is in three areas,
24:51 which is analytics, machine learning and decision science.
24:53 And for example, one company that basically has this division, and they wrote a great post on this,
25:00 is Airbnb. Airbnb does analytics, machine learning, and they call it inference instead of decision
25:05 science.
25:05 But the idea behind this is analytics is basically like taking data and putting it in front of the
25:11 right people.
25:11 So just sort of showing the data that you maybe already have or going out, like maybe
25:16 going out and collecting it, but basically just, you know, maybe by making dashboards or showing
25:21 a report is just surfacing data to the right people, which is really valuable.
25:25 And then the next one, machine learning, is I think often what people think of when they
25:30 think of data science, which are things like, you know, creating the recommendation model on
25:35 amazon.com, right?
25:37 When you look at a product and it says like, you know, you may like these products or at Etsy,
25:41 we have the search ranking team, which is when you search Harry Potter, what of the 200,000
25:46 Harry Potter items do you show first, right?
25:48 And they don't pick randomly.
25:49 There's an algorithm that's based off of historical, how the items have behaved historically.
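As a toy illustration of ranking items by how they have behaved historically, the sketch below sorts listings by a smoothed purchase rate. This is not Etsy's algorithm; the item names, counts, and the `prior_views` and `prior_rate` parameters are all invented for the example.

```python
# Toy data: (item name, historical views, historical purchases). All values invented.
history = [
    ("wand replica", 1000, 50),
    ("house scarf", 200, 20),
    ("brand-new listing", 2, 1),
]


def smoothed_rate(views, purchases, prior_views=100, prior_rate=0.05):
    """Purchase rate pulled toward an assumed prior so tiny samples don't dominate."""
    return (purchases + prior_views * prior_rate) / (views + prior_views)


# Highest smoothed purchase rate first.
ranked = sorted(
    history,
    key=lambda item: smoothed_rate(item[1], item[2]),
    reverse=True,
)

for name, views, purchases in ranked:
    print(f"{name}: {smoothed_rate(views, purchases):.3f}")
```

Without the smoothing, the brand-new listing (1 purchase from 2 views) would jump to the top on almost no evidence, which is one reason a real ranking system needs more than a raw ratio.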
25:54 And then the final one is decision science.
25:56 So this is basically going beyond the numbers to help companies or people make decisions.
26:00 And also generally involves a lot of statistics because basically it's, we need to understand
26:06 how to quantify uncertainty.
26:08 So even though we know, for example, that, you know, the people who answered this, we
26:13 ran a survey and, you know, 80% of people said this.
26:16 Well, but we had a 50% non-response rate and maybe we know that more women than men didn't
26:22 respond.
26:22 So how do we adjust for that?
26:24 What's the uncertainty around this estimate?
26:26 Making a forecast, you know, as Jacqueline talked about, like that's decision science,
26:30 you know, there.
26:31 So those are the three main areas we have.
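The survey example above (80% said yes, but response was uneven across groups) can be sketched as a simple reweighting. This is a minimal sketch with made-up numbers: the group sizes, the 50/50 population split, and the function names are assumptions for illustration, and real non-response adjustment usually involves more than one variable.

```python
import math

# Hypothetical survey results; every number here is invented for illustration.
population_share = {"women": 0.5, "men": 0.5}  # assumed known from elsewhere
respondents = {
    "women": {"n": 100, "yes": 90},   # women responded less often
    "men": {"n": 300, "yes": 230},
}


def raw_rate(groups):
    """Unadjusted share of 'yes' answers among everyone who responded."""
    total = sum(g["n"] for g in groups.values())
    yes = sum(g["yes"] for g in groups.values())
    return yes / total


def weighted_rate(groups, shares):
    """Each group's 'yes' rate, weighted by its share of the population."""
    return sum(shares[name] * g["yes"] / g["n"] for name, g in groups.items())


def weighted_std_error(groups, shares):
    """Normal-approximation standard error of the weighted estimate."""
    variance = 0.0
    for name, g in groups.items():
        p = g["yes"] / g["n"]
        variance += shares[name] ** 2 * p * (1 - p) / g["n"]
    return math.sqrt(variance)


print(f"raw estimate:      {raw_rate(respondents):.3f}")
print(f"weighted estimate: {weighted_rate(respondents, population_share):.3f}")
print(f"standard error:    {weighted_std_error(respondents, population_share):.3f}")
```

With these invented numbers the raw figure is 80%, while the reweighted figure is a little higher because the under-represented group said yes more often. The standard error is the part that answers "what's the uncertainty around this estimate?"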
26:33 Right.
26:33 Our mailing list already is like skewed towards this audience.
26:37 And if we just ask the mailing list, hey, everybody, tell us what you think.
26:40 It's going to carry that bias forward or that slant forward unless we can somehow do more
26:47 to take care of it and whatnot.
26:48 Right.
26:48 Yeah, exactly.
26:49 And yeah.
26:49 Anything to add there, Jacqueline?
26:50 Yeah.
26:51 I would just say that I think a lot of people have like a preconceived notion that like one
26:55 of these types is more pure or like one of these types is like the better.
26:58 Yeah.
26:59 Like, I'm a real data scientist, not one of the ones who use Excel.
27:02 Exactly.
27:03 To get the title.
27:03 And it's like, you can see it in Stack Overflow posts.
27:06 You can see it a lot in LinkedIn posts.
27:09 Like there's a lot of this idea.
27:10 Probably a lot on Reddit.
27:10 Yeah.
27:11 Oh, yeah.
27:11 Oh, God.
27:12 Yeah.
27:12 I was going to say Reddit data science, but like, yeah, there definitely can be this
27:16 culture of like, you're not a real data scientist.
27:17 Like if you don't do machine learning.
27:19 Just real quick, I'll give you the pitch for why I think each one of these has the right
27:23 to be great and, like, is the best one.
27:25 So, okay.
27:26 So I think the reason why people like the machine learning the best is you're like, oh, cool.
27:31 I get to use, you know, real time inferences.
27:34 I get to actually help.
27:35 So like a customer, when they go on their website, they actually, what happens to them depends
27:39 on what my algorithm did.
27:41 And like, it's pretty cool to be able to say that like, I actually improved everyone's outcome.
27:45 So my car drives down the street by itself.
27:47 Yeah.
27:47 Everyone can see that.
27:49 The decision scientist, you got to be like the company's detective, right?
27:52 Like the CEO, like, like high level people can come up to you and be like, yo, I have
27:56 this question.
27:56 Can you figure it out?
27:57 And you get to like put on your detective hat and go into data and really try and come
28:01 up with an answer.
28:01 So yeah, you get to like play detective.
28:03 And the, the kind of like analysis, the analyst role.
28:06 It's great because it's like those other two roles, your things can go terribly wrong, right?
28:12 You're, you're, you can be a detective and not find the killer.
28:14 Your machine learning model can ruin things for customers.
28:17 Like things can go catastrophically wrong.
28:18 Being an analyst, you're just here to help things.
28:20 You know, you're helping, you're keeping the company going.
28:22 It's like a more relaxed.
28:23 You're giving advice, but you're not making the decision.
28:25 Just this is what we, we found.
28:27 Yeah.
28:28 So it's like, yeah, it's like, it's helping everything run more effectively without the
28:33 like incredible amounts of stress of trying to get things right that the, you know, or trying
28:37 to build new questions, you know, research and development things that you have in the other
28:40 two fields.
28:41 So it's like more of a relaxed, but enjoyable job.
28:43 I'd also say like, so often there's so much low hanging fruit in the analytics side of
28:48 like things that companies aren't looking at that would really change their decisions if
28:52 you just surface these numbers.
28:53 And plus, like, I think sometimes people can look down and it's like, oh, that's like easy.
28:57 Like you're not using, you know, stats or machine learning.
28:59 Well, it's actually, you know, it can be really hard to like pull the right data sometimes to
29:05 understand when someone's asking you the question, like, hey, can you, oh, there was a great
29:09 tweet yesterday where someone is like, you know, stakeholder, like, can you pull this data for me?
29:12 And you're, you know, and you're like, yeah, sure.
29:15 Let me just pull from, you know, select star from ideal and pristine table that you think
29:19 somehow exists.
29:20 And there's actually a lot of work to elicit the true question they're asking.
29:26 So probably underlying all of this is data wrangling.
29:29 Yeah.
29:29 I think all of the people have to do data wrangling and it's really just a skill like data wrangling.
29:34 I think trying to be able to explain what is happening in the data, like, so kind of the
29:38 input and output.
29:39 Like you really need all, you need that for all three of these jobs.
29:42 So if you don't, if you're not comfortable taking data, trying to figure out, like, you
29:46 know, put it in a way that you can then use it.
29:48 And if you, you aren't comfortable looking at some numbers and trying to say like, oh, well,
29:51 this number plus this number really means that any of these three jobs is going to be more
29:55 difficult.
29:56 Yeah.
29:56 How much does knowing how to talk to databases matter?
30:00 Like writing SQL queries or things like that?
30:03 Or can you get away without that?
30:05 It really matters.
30:06 Even so the, I see, I would say, the SQL ideas I've seen show up in every data science job.
30:11 And I mean, I don't know.
30:12 I haven't seen every data science job, but every one I have seen.
30:15 There's that one that only works with CSVs, but besides that one.
30:18 But even if you don't actually work directly with SQL, the idea of taking two CSVs and joining
30:22 them somehow together and then filtering out the rows, like because so much of the data
30:26 in the world is stored in a tabular format, you really have to think, like understand how
30:31 SQL and like relational databases work.
30:33 And if you don't actually know exact SQL syntax, that's fine.
30:36 Like maybe, you know, the pandas, whatever, or the R dplyr or whatever.
30:40 But like the just concept of thinking through tables is like, yeah, you need it everywhere.
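The "join two CSVs, then filter the rows" idea can be sketched in plain Python without any SQL syntax. The CSV contents and column names below are invented for illustration. In pandas the same steps would typically be a `merge` followed by a boolean filter.

```python
import csv
import io

# Two small CSV "files", inlined as strings. All data here is made up.
customers_csv = """customer_id,name,country
1,Ada,UK
2,Grace,US
3,Linus,FI
"""

orders_csv = """order_id,customer_id,amount
10,1,25.00
11,2,140.00
12,2,15.50
13,4,99.00
"""

# Index customers by their key so each order can look up its customer.
customers = {
    row["customer_id"]: row
    for row in csv.DictReader(io.StringIO(customers_csv))
}
orders = list(csv.DictReader(io.StringIO(orders_csv)))

# Inner join: keep only orders whose customer_id exists in the customers table.
joined = [
    {**order, **customers[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in customers
]

# Filter: keep rows with an amount of at least 20.
large_orders = [row for row in joined if float(row["amount"]) >= 20]

for row in large_orders:
    print(row["order_id"], row["name"], row["amount"])
```

Order 13 drops out at the join because customer 4 does not exist, and order 12 drops out at the filter. Noticing which step removed a row is most of what "thinking through tables" means in practice.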
30:44 Yeah.
30:44 Emily, what do you think about that?
30:45 Yeah.
30:46 I would definitely say it's, it's one of the foundational skills.
30:49 And the good thing is like the basics of SQL, you can pick up pretty quickly, like just
30:53 like how to select from a table.
30:54 And then, you know, you can grow as needed, you know, maybe if the data engineer is helping
30:58 you out.
30:59 But, you know, of course, if you can't, if you can't access any data, you probably can't
31:03 do much data science.
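For a sense of how small "the basics of SQL" are, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `listings` table and its rows are hypothetical. A real job would connect to whatever database the company runs, but the `SELECT ... FROM ... WHERE` shape stays the same.

```python
import sqlite3

# An in-memory database, so nothing needs to be installed or configured.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only for this example.
conn.execute("CREATE TABLE listings (title TEXT, price REAL, category TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Python Basics", 30.0, "books"),
        ("SQL Pocket Guide", 15.0, "books"),
        ("Desk Lamp", 45.0, "home"),
    ],
)

# Select two columns, filter by category, sort by price.
rows = conn.execute(
    "SELECT title, price FROM listings WHERE category = ? ORDER BY price",
    ("books",),
).fetchall()
conn.close()

for title, price in rows:
    print(title, price)
```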
31:04 Yeah.
31:05 That's a really good way to put it.
31:06 But also that's not a hard skill.
31:08 I mean, it's not really a hard skill to learn.
31:10 Like, yeah.
31:11 It seems weird and hard if you've never seen it, right?
31:14 Like, how do I connect to it?
31:15 This connection string is really complicated.
31:17 Yeah.
31:17 But you're right.
31:18 It's not a big deal.
31:19 It's just something you got to learn.
31:20 Now, I guess thinking of these three different types, it's one of the things that struck me
31:26 and you pointed out one, there's two things.
31:29 One was that the machine learning role is probably a little more computer science-y because you're
31:36 taking code and you're putting it into production and it's real time.
31:38 You're probably fitting in with APIs that other people are talking to and you're building stuff
31:45 that machines talk to.
31:46 Is that accurate?
31:47 What do you think?
31:47 I would say that the machine learning is more computer science-y.
31:51 Yes.
31:51 A hundred percent.
31:51 You do really need to understand things like unit testing or load testing in ways that the
31:56 decision scientists and the other roles don't necessarily need as much.
31:59 Right.
31:59 HTTP status codes and JSON and all that potentially, right?
32:02 Yeah.
32:02 The risk of the machine learning engineer is that that actually becomes the risk.
32:05 The risk is if you're not careful, your job could just become software engineering.
32:09 I know a lot of machine learning engineers who, well, their company doesn't have that much
32:12 machine learning engineering to do at the moment.
32:14 So you're just going to be a software engineer and then that's not great.
32:16 But the converse is as a decision scientist, you have much more stats and like just building
32:22 the actual like models.
32:23 But if you don't have the work to do as a decision scientist, there's not reports, you know,
32:28 not super interesting models to build and questions to answer.
32:31 You might end up just doing dashboards or something that, you know, like any of these jobs kind
32:35 of have a risk of falling into something you don't like.
32:37 It's just a question of which way does the rock fall down the mountain or whatever.
32:41 I don't know if that's a real metaphor, but.
32:43 Some mountains.
32:45 So another thought that I had while we were talking about this is different.
32:50 The people in these different groups will have massively different exposure to like the
32:55 C-suite or the decision makers of the company at a high level.
32:59 I'm thinking of a large company, like 500 people or more, not like a startup.
33:02 But, you know, the analysis person could easily get called in front, you know, for like a board
33:09 meeting to help them decide, you know, how things are going.
33:12 Maybe the decision scientist, it's not so likely the machine learning developer is like, well,
33:17 they've decided and then they were told you're going to build this model and here's what they're
33:21 hoping for.
33:21 Right.
33:21 It's, it's a different kind of, you would still be working with a lot of technical people,
33:26 but you have like different ways to grow within the company, I guess.
33:30 Is that a good way to think of it?
33:31 Yes.
33:32 I think that is absolutely the case that if you're in this, if you're an analyst or a decision
33:37 scientist, then you are much more likely to get to go to a CEO, like go in that meeting
33:41 and show some interesting data that can prove something.
33:43 If you're a machine learning engineer, usually you are building a product, like Emily was
33:47 saying, like you're building a recommendation engine.
33:49 And then there's some product person whose job it is just to be in charge of that product
33:52 and they get to go and have to see.
33:54 You only go to the C-suite if you're going to be like raked over the coals because you
33:59 wrecked it with your machine learning.
34:00 I recommend it wrong.
34:01 But that being said, I think a lot of people who are sufficiently technical are like, oh,
34:06 I wouldn't want to do decision science.
34:07 I really want to do machine learning because I don't want to have to deal with like convincing
34:10 people.
34:10 I just want to have to deal with cool data modeling or whatever, you know, machine learning
34:14 modeling.
34:15 But it turns out that as Emily is saying, to do those jobs well, you still have to be able
34:19 to talk to the software engineers and the data scientists who built the model and the product
34:23 person who needs to know if the recommendation is going to be good enough for the customer.
34:26 Like you still have to do lots of talking to be good at it.
34:28 It's just that it is less of a core tenet than it is of perhaps some of the other roles.
34:32 Yeah.
34:32 How does this affect early stage careers?
34:35 Right.
34:35 Like I can, I can see somebody who like Emily in 2017 just came out of a bootcamp and
34:41 they said, okay, you're going to go talk to the CEO of Etsy and the board and like help
34:45 them with this product.
34:46 You'd be like, oh my goodness.
34:47 Like what have I gotten myself into?
34:49 Like that would on one hand be awesome, but also terrifying.
34:51 Do they fit better at different stages of careers or does that really matter?
34:56 I think it probably doesn't matter as much because like for a company that's big enough
35:00 for that prospect to be kind of terrifying, like if my last company was a startup, so like
35:03 I talked to the CEO all the time, but basically felt like another coworker.
35:06 So yeah, for it to matter, like you're probably going to have more senior people, right?
35:10 Who, if they are going to like have someone present to the CEO, it's probably not going
35:13 to be the person who joined two months ago.
35:14 Also, the other thing we didn't really talk about is how much you're specialized into one
35:19 of these roles does depend on the company.
35:20 So often that's like the company size and maturity of the data science team, right?
35:25 So at certain companies, you may be like fully like a machine learning engineer, but
35:28 if you're the first data scientist at a startup, you're probably doing a mix of all of these
35:32 and you wouldn't go as in depth in any one of them, right?
35:35 Like a startup probably doesn't need someone who can handle hundreds of millions of items,
35:40 like recommendation items, like Amazon would, like you don't need that compute power, but
35:44 maybe you build like a simpler recommendation model.
35:46 And then you also play detective, and also, no one actually knows what the sales
35:50 numbers are.
35:50 So you like finally make a dashboard.
35:52 Right.
35:52 You probably do a lot of growth at like an early stage startup, a lot of A/B testing type
35:56 of work.
35:57 Yeah, exactly.
35:58 So I don't want to make it seem like, oh, every role like falls into like one and only
36:01 one of these, because you certainly can have roles where you're, where you're putting on
36:05 multiple of these hats.
36:06 I would also say that not only depends on the company you work at, you may do multiple, but
36:10 also you can during your career change.
36:12 I didn't do any machine learning up until like two or three years ago.
36:15 And then I switched over to doing that now.
36:17 So now I kind of do both, but like lots of people switch in lots of directions between
36:21 any of these three jobs.
36:23 And that, that is the thing that it is possible to do.
36:25 Yeah.
36:25 Yeah, for sure.
36:25 Yeah.
36:26 The chapter one interview, with Robert Chang, who is over at Airbnb, is a really good case study in this.
36:30 So he started more on like the analytics side and the decision science
36:33 when he was working at Twitter.
36:34 He then continued that work at Airbnb.
36:36 And then he ended up switching over to do machine learning.
36:39 And he actually has blogged about this.
36:40 And like, as part of that process, like he did need to up his skills a bit.
36:44 So for example, he'd previously done most of his work in R, but the teams that do machine
36:49 learning, like a lot of the libraries were built in Python.
36:51 So he actually has a repo where he talks, where he like put his deliberate practice for Python
36:56 and how he was going to learn that over a couple months.
36:58 So he can make the switch.
36:59 That's cool.
37:00 Yeah.
37:00 You can definitely switch.
37:02 I mean, I've definitely made big switches in my career as well, from like being terrified
37:06 of the web to only working on the web and stuff like that as well.
37:08 Yeah.
37:08 And I would just add, oh, sorry.
37:09 I would just add on that.
37:11 You know, I've talked to a lot of people who have wanted to switch and had trouble because
37:15 these jobs are a resource and the company has a finite amount of them, right?
37:19 So there's some companies where they just don't have any machine learning engineering.
37:22 And so if you really just would love to do machine learning engineering, you're going to be
37:25 in trouble because there's just none of those jobs available.
37:26 Or as Emily points out, maybe they have a couple of them, but like people who are super
37:30 senior are already working on them.
37:32 And at some companies, like a startup, they have way more work than they
37:35 could possibly do, you know, all of it.
37:37 So you kind of have a lot of freedom.
37:39 And so sometimes if you want to make this transition and you're finding it difficult, you need to
37:45 switch companies.
37:45 This portion of Talk Python to Me is brought to you by Linode.
37:50 Whether you're working on a personal project or managing your enterprise's infrastructure,
37:54 Linode has the pricing, support, and scale that you need to take your project to the
37:59 next level.
37:59 With 11 data centers worldwide, including their newest data center in Sydney, Australia,
38:04 enterprise-grade hardware, S3-compatible storage, and the next-generation network,
38:10 Linode delivers the performance that you expect at a price that you don't.
38:14 Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40-gigabit
38:20 network, industry-leading processors, their revamped cloud manager at cloud.linode.com,
38:26 root access to your server, along with their newest API and a Python CLI.
38:30 Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically
38:36 get $20 credit for your next project.
38:38 Oh, and one last thing.
38:39 They're hiring.
38:40 Go to linode.com/careers to find out more.
38:43 Let them know that we sent you.
38:46 Speaking of companies, in your book, you have a really interesting conversation about different
38:52 kinds of companies.
38:53 And I've been fascinated.
38:54 I've worked at almost all of these different types.
38:58 Early-stage startup, late-stage startup, probably.
39:01 Mass, quite, yeah, let's go with massive tech company.
39:04 But not a government contractor.
39:06 I've worked sort of subcontracting with them.
39:08 I've worked at most of these.
39:09 And a lot of those experiences are not really obvious if, say, you're in a boot camp and
39:14 you're just looking for a job.
39:16 You have been through the internals of these things.
39:19 So maybe you'll give us a flyover of the five different types of companies and maybe a little
39:26 bit of example about each.
39:27 What's the team like?
39:28 What's the tech like?
39:30 What are the pros and cons?
39:31 And so on.
39:32 Sure.
39:32 We realized very quickly that when we were writing our book that we needed some sort
39:37 of way to help people understand what is the actual job like.
39:40 And then we're like, well, it really is so different depending on which company you're at.
39:44 And so Emily and I kind of brainstormed five different companies we worked at.
39:48 And then we kind of came up with goofy alternative names for them.
39:51 But if you look at our LinkedIn profile, you can probably guess.
39:54 Don't give it away, Emily.
39:55 I love that you have an actual little...
40:01 custom logo for each one.
40:05 Yeah, that was all Jacqueline.
40:06 Yeah, and I thought about which fonts to use with which company.
40:09 Yeah, it was well done.
40:11 So the five companies...
40:13 So we have MTC, which MTC is like your Google, your Apple, your Microsoft, these companies,
40:18 that's just like giant tech company.
40:20 So they're rich.
40:21 They're so big that like each part of the company uses a different type of tech.
40:25 You know, so they have lots of advanced stuff.
40:27 But because they're so big, you may not actually...
40:29 Your stuff may not link up with...
40:31 You know, if you're working on Google Maps, you may have nothing to do with a Google self-driving
40:34 car sort of a thing.
40:35 The second company is Handbag Love, which is just some company that's like a retail company,
40:39 you know, like a Nordstrom, DSW, one of these companies that is big.
40:44 They've been around for a while.
40:44 They use data science, but that's not like their thing.
40:46 But they're not a tech company.
40:48 Right.
40:48 Right.
40:49 And so I really like working at those kinds of companies because you got to like go in
40:53 and really do a lot because no one's there to tell you, oh, you can't use Python.
40:56 You have to use R or whatever.
40:58 Yeah, exactly.
40:58 There's no...
40:59 Like, let me talk to other software developers.
41:01 There are no...
41:02 Yeah.
41:02 There are none.
41:03 Like, okay, well, I can just...
41:05 These are the problems.
41:06 Please solve it with technology.
41:07 These are your requirements, right?
41:09 Yeah.
41:09 There's no rules and restrictions.
41:11 And so then we have this Seg-Metra company, which is like some company with like a hot new
41:15 idea for a startup.
41:16 And they're...
41:17 You know, it's really just like a classic startup where it's like there's so many things
41:20 that need to be built at once that like everyone's just kind of in a constant panic
41:23 attack.
41:23 You get to do whatever you want.
41:24 So it's a lot of fun and exciting.
41:25 Then there's Videory, which is like, imagine if like...
41:29 What's that company that's Vimeo?
41:31 The company that's not YouTube, right?
41:32 So like some company that's...
41:34 Yeah.
41:34 You know, it's a tech company.
41:35 It's, you know, decent size, but it's not huge.
41:37 So everyone knows each other.
41:38 Right, right.
41:38 Maybe Zoom even.
41:39 Yeah.
41:39 Like we're talking on Zoom.
41:40 Could be something like that, right?
41:42 Yeah.
41:42 Yeah.
41:42 And then lastly, I forget what I call it.
41:44 Some GAD.
41:44 So it's basically like some giant government contractor.
41:47 Geo Aerospace or something like that.
41:48 Something like that.
41:49 And it's basically, think of your Lockheed Martin, your Boeing.
41:52 People don't, I think when they talk about data science, they usually don't think about
41:54 these companies as often, but they have tons of people like that, especially analysts.
41:58 Like these companies run on that.
42:00 And because these kind of government contracting companies are massive, they've been around for
42:05 a long time and they really don't want to make mistakes because that can cause a lot
42:08 of damage.
42:09 It's just a lot.
42:09 Everything moves a lot slower.
42:10 There's a lot more bureaucracy.
42:11 It's more of a relaxed job than working at like a startup.
42:13 Yeah, sure.
42:14 All right.
42:14 So which one of you worked at the massive tech company equivalent?
42:18 I don't know if I should say, I consulted for a massive tech company equivalent.
42:22 I'm not asking which one, just like, but you did, Jacqueline, that was you that worked
42:26 at something like this?
42:27 I worked at something like this.
42:28 So the reason I'm asking is because I want to ask you for your take on it, right?
42:33 Like, what is the team like?
42:34 Oh.
42:35 What is the tech like?
42:36 And so on, right?
42:37 Don't name names.
42:37 Oh, no, no, no.
42:38 Okay.
42:38 Okay.
42:38 Yeah.
42:39 Actually, I realized I actually consulted for a couple of them, so I'm not incriminating
42:43 anyone.
42:43 Anyway.
42:44 So when I consulted for these companies, they're like, because they're so big, they're
42:49 so big that, you know, they may have this like big onboarding process that everyone goes
42:53 through, but it has nothing to do with your actual job because the company is too big to
42:56 do that.
42:57 And then when you got on your team, it's like really specific.
42:59 I recently started working with some company like this.
43:02 They were working with the podcast, right?
43:04 They were doing some ads and stuff.
43:05 I had to go through and like sign a waiver that said nobody would climb on a ladder in a dangerous
43:11 way.
43:11 Yeah.
43:12 I'm like, it's a podcast recording.
43:15 You're going to give me audio.
43:15 Like, there's no ladders.
43:16 I don't know.
43:17 But like, this is the other one.
43:18 I see a ladder in your background.
43:19 I didn't know about that.
43:21 Yeah.
43:21 Actually, maybe this is what they're talking about.
43:23 It was like the warehouse person and the like contractor who does podcasting, whatever.
43:29 Like it didn't, you know, they wanted to run an ad.
43:31 So I had to go through this like weird process.
43:34 It was bizarre.
43:34 Yeah.
43:35 And so the cool thing about working with this company is they have tons of money and they're
43:38 really excited about technology.
43:39 So if you're like, I want to buy this expensive thing and try building a solution using that
43:42 people are generally like, sure, whatever.
43:44 It's fine.
43:44 The bad thing is this is true for everyone else as well.
43:47 So when your product A is trying to link up to product B, you may struggle a bit.
43:52 So there's just a lot of this kind of lots of tech, lots of money, high salaries, not
43:55 necessarily everything working in sync that you have to deal with.
43:58 Yeah.
43:58 You probably get to work with a ton of smart coworkers.
44:00 Yeah.
44:01 It's a bit of a bonus and a curse, right?
44:03 It's hard to stand out probably, but it's also great to have that support.
44:07 Right.
44:08 And if you're a person who really likes learning from other people and like having direct
44:11 mentorship, this is one of the best companies to get that at.
44:13 Because yeah, like this company just draws people who know a lot of tech like a magnet.
44:18 Yeah.
44:18 So like one thing what we did is because, you know, it may be the case, like, you're
44:21 looking for jobs and like, you know, it's easy for you.
44:23 Like you're thinking of finding Google.
44:25 You're like, okay, that's the massive tech company.
44:26 But maybe you find a job and you're like, well, it doesn't really fit into any one of
44:29 these five things.
44:30 It's like one thing we do at the end of the chapter is we pull it together.
44:33 Okay.
44:33 Like what are some of the vectors that the companies differ on?
44:36 Right.
44:37 So mentorship, bureaucracy, like the tech stack.
44:39 So even if you find one, you know, you have a company that's not in one of these five
44:43 archetypes, you can sort of go through those things.
44:45 You say like, oh, okay, well, like it's a huge company.
44:48 So like probably there's a decent amount of bureaucracy.
44:49 I would be the first data scientist.
44:51 There's not going to be a lot of mentorship.
44:52 And so you can think about these different pieces and, you know, people have different
44:57 preferences, right?
44:58 Like some folks really, I've talked to people who really love, usually we don't, it's, I
45:02 wouldn't recommend it for someone's first job, but people who want to be the first data
45:06 scientist at the company because they want to get to build everything.
45:08 And then there's some experienced data scientists who are like, I would never want to be the first
45:12 or the only data scientist.
45:13 Like I really like working on a team.
45:14 So it's not like, you know, one of these is like, you know, oh, everyone, you know, it's
45:18 always bad to like have these certain things, but it's just different criteria that you can
45:22 think about and reflect for yourself.
45:24 Like what's important to me?
45:25 What am I looking for?
45:26 Yeah.
45:26 What's the fit?
45:27 All right.
45:27 So Handbag Love, who wants to talk about that one?
45:29 I can do that one too.
45:31 All right.
45:32 Just as I was talking about before.
45:33 So like, there's like a retailer, like let's call it like if it's, yeah, again, like Nordstrom,
45:37 Foot Locker, one of these companies that's like a retail company.
45:40 The cool thing about this is they have a very real product that they've been selling for
45:43 a long time and understand what they are doing.
45:45 So like you have a lot of stability there.
45:47 And by adding on data science, these companies are a lot like, okay, well, let's try and use
45:51 data to improve the product recommendations, improve the product, improve, improve our understanding
45:57 of things, you know, like use data to answer questions.
45:59 So you get a lot of, there's a lot you got to do as a data scientist.
46:02 You have a lot of.
46:03 Right.
46:03 They used to use intuition and now they're going to use data or something like that.
46:06 Right.
46:06 Yeah.
46:06 And so that's the upside.
46:08 So downsides are you don't have as much money.
46:10 Cause you're not like a rich tech company.
46:11 Your tech isn't as good because you know, you just don't care as much about getting the
46:15 best of the best.
46:16 Like, you know, older tech is generally fine.
46:18 And, you know, just as we were talking about, you generally have fewer people who can like
46:21 mentor you.
46:22 Like there'll be someone there, but you know, there'll be people there generally, but it
46:25 might be that everyone know, everyone's using like a far outdated Python library because
46:30 no one knows about the new way and no one's reading up on it.
46:32 So exactly.
46:33 They're still on Python 2.
46:34 Something like that.
46:38 Cool.
46:39 And then early stage startup.
46:40 What's the story for data scientists there?
46:42 Yeah.
46:43 I could talk a little bit about this.
46:44 So yeah, with data scientists, like you come in and you really get to shape everything.
46:47 So like there's some negative parts.
46:50 So even beyond the data science part, right?
46:51 Like you might show up at the startup and they're like, oh, we don't have your laptop yet.
46:54 So it's sort of a funny thing is like, there's, but there's also more freedom because they
46:58 might ask you like, Hey, what kind of laptop do you want?
47:00 Like if they're like a decently well-funded startup and you can be like, oh, I want this really
47:03 souped-up laptop.
47:04 You don't get that super slow clunky one with a huge company banner that takes five
47:09 minutes to start up.
47:10 Yeah, exactly.
47:11 That's a mixed bag.
47:12 But yeah, often, like we were talking about, there's a lot of low-hanging fruit.
47:16 You also have to wear, you may have to do some data engineering, right?
47:19 Like maybe there's not any data engineers and all of the databases are optimized to like,
47:24 you know, serve the website.
47:26 And so it takes you five minutes
47:27 to, like, get a count of an 800,000-row table.
47:32 So yeah, so you have to wear a lot of different hats.
47:35 You might be pulled in a bunch of different directions.
47:37 So it's also really important to be able to prioritize, like to not just be like firefighting,
47:41 also take some time, like to, for example, build up some skills, like to build up your toolbox.
47:46 So like, okay, maybe write a library for yourself of like, that's a wrapper around pulling
47:50 the data.
47:51 So that becomes easier.
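A minimal sketch of the kind of personal data-access wrapper described above, assuming a SQLite database for illustration. The names `fetch_rows` and `count_rows` and the `orders` table are hypothetical and do not come from the conversation; a real team would point this at whatever database it actually uses.

```python
import sqlite3


def fetch_rows(conn, query, params=()):
    """Run a query and return the rows as a list of dicts keyed by column name."""
    cursor = conn.execute(query, params)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def count_rows(conn, table):
    """Return the number of rows in a table that is known to exist."""
    # Table names cannot be bound as query parameters, so validate them first.
    known = {
        row["name"]
        for row in fetch_rows(conn, "SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    return fetch_rows(conn, f'SELECT COUNT(*) AS n FROM "{table}"')[0]["n"]


# Hypothetical in-memory data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(9.5,), (20.0,), (3.25,)])

big_orders = fetch_rows(
    conn, "SELECT id, amount FROM orders WHERE amount > ? ORDER BY id", (5,)
)
```

The point of a wrapper like this is that every later analysis starts from one tested function instead of a copied and pasted connection snippet.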
47:52 That's a really important point, because I think a lot of these, I've worked in places
47:55 like this, and nobody asks you to build a helper library for data access.
48:00 They help you.
48:01 They ask you, give me this answer or make this product or give me this thing.
48:04 And you're like, yeah, but we really need this thing in place.
48:08 And somebody's gonna have to build it.
48:10 It's gonna be me or the other person I'm working with.
48:12 And you just kind of have to be willing to put in that infrastructure along the way, right?
48:17 Because you're going to appreciate it later, but there's no guidance for that.
48:20 Right.
48:20 And it's definitely not in place usually.
48:22 Yeah.
48:22 And you have to like help teach people like how to ask questions, like what is possible,
48:26 like bring in best practices.
48:28 So it's like I was saying earlier, I really would not most of the time recommend this for
48:33 someone's like first data science job to do this.
48:35 But for an experienced data scientist, like I found some people who really, really love
48:39 doing this because they're like, oh, I don't have to deal with like, you know, the decisions
48:43 of past data scientists.
48:44 I get to shape this in my vision and I get to use the most modern tools, for example.
48:48 You want to use Python, you can.
48:49 You want to use R, you can.
48:50 Like no one's going to, there's no one there.
48:52 So you just, they just need answers.
48:54 Yeah.
48:54 You can use F#, right, Jacqueline?
48:55 Yeah.
48:56 I get a lot of, a lot of people make fun of me because my favorite programming language,
49:00 no one else in the world uses.
49:02 Well, I did see that Jupyter Notebooks now support F#.
49:05 So that's, that's a vote.
49:06 Be still, my heart.
49:08 That's awesome.
49:09 Yeah.
49:10 But I mean, like early stage startups.
49:12 And I would say also HandbagLOVE-type companies a little bit as well because they may have some
49:17 tech stack, but it might be so outdated.
49:19 They're like, you're new.
49:20 We want to like go in, we want a refreshing direction where you can go in this other way.
49:24 We're not going to make you use this old thing.
49:26 We're going to try to get, you know, get something new growing here.
49:29 So you can go and have some flexibility as well, I think.
49:32 Now, what about Videory, the later-stage startup?
49:36 Yeah.
49:36 For my preference, I think this is kind of a sweet spot because like you have like, you
49:40 know, it's sort of like in the, in the, in the middle of a lot of these things, right?
49:43 Like there's like some bureaucracy, but I kind of like bureaucracy sometimes.
49:46 Like HR has their stuff figured out.
49:48 Like that's nice.
49:49 It's like benefits and other things.
49:51 There actually is vacation.
49:53 Yeah, exactly.
49:54 You know, so there's usually like, there's a team of data scientists, but since it's, it's
49:58 still like a startup, you know, they weren't, you don't have like a, a, you know, 40 year
50:02 old tech stack, right?
50:03 Like most decisions were made like five or 10 years ago.
50:05 If that.
50:06 Yeah.
50:07 So I think this can be a nice fit.
50:09 Like you can still get, you know, you can still like know everyone on the data science
50:12 team.
50:12 Unlike if you're at, like, a massive tech company, and have support, but also have some,
50:17 some structure in there as well.
50:18 And there's probably like data engineers and other people to help out with like data science
50:22 adjacent problems.
50:23 Yeah.
50:23 You're probably a little more locked into a tech stack.
50:25 Yes, that is true.
50:27 Yeah.
50:27 I don't think you can really like do a tech stack from start.
50:30 And so you're locked into certain decisions, you know, and, and there may be sometimes you're
50:34 like, oh, I wish I had a time machine and like could go back and like fix this decision
50:37 they made a while ago.
50:38 Right.
50:38 Like you're at an early stage startup.
50:39 You can be like, all right, we're going to start like collecting data right away.
50:42 You know, we're going to log everything.
50:43 And then if you're at like a, you know, later stage company, they're like, oh, like, why don't
50:48 we like look at this data?
50:49 And you're like, oh, actually we weren't collecting that a year ago.
50:51 And they're like, okay, but make a forecasting model anyway.
50:53 And you're like, oh no, no, no, no.
50:55 You don't understand how uncertain this answer is going to be.
50:59 Yeah.
51:00 And then I guess the last type of company archetype that you all covered was the government contractor,
51:05 the Lockheed Martins and the Halliburtons and so on.
51:08 Yeah.
51:08 And I should also mention this includes the government itself, right?
51:11 Like if you work for the Department of Transportation or something like that, or just, you know, companies
51:14 where there is for legal reasons, there is just a lot of regulation, a lot of things
51:18 like, you know, keeping things moving a little slower.
51:20 And so these kinds of jobs, they tend to have lots of people who are
51:24 not data scientists.
51:25 And you tend to have the data scientists maybe embedded in little groups of that.
51:28 So like in the missile department or whatever, or the, you know, truck department, I don't
51:32 know.
51:33 And so because of that, you generally, you don't have as much mentorship often, but you often
51:39 don't have as many people telling you, no, you can't do it that way.
51:42 You're wrong.
51:42 I mean, you may have it like, oh no, we don't support Python past 2.7 because our, you know,
51:47 our procurement department hasn't cleared it or whatever.
51:49 So maybe bureaucracy, but there isn't like, oh, you have to, you know, like there's, there's
51:53 just not as much of like a standardization around tech just because there's, you know, that's
51:57 not the focus of the company.
51:58 And so these kinds of jobs, I'd say are really, they're really great.
52:01 If you want a job where you go in each day, you work eight hours with a 45 minute lunch
52:06 in there, you get a little bit of stuff done, but you don't stress like crazy about getting
52:10 the most you possibly can done.
52:11 And no one's stressed about you getting exactly the most, right?
52:14 So there's not like, you know, if you want a job where you're like, I'm going to go in,
52:17 I'm going to be the 10X data scientist.
52:18 I'm going to rock, you know, my career is going to be a rocket ship up to the C-suite as fast
52:23 as I can.
52:23 Like, this is not the kind of company for you.
52:25 It's the kind of company that's more for people who are like, I just want to do a consistently
52:29 good job.
52:30 Like then go home and take my paycheck and spend it on something I enjoy.
52:32 Yeah.
52:33 And don't need a lot of perks.
52:34 That's a good way to put it.
52:35 Yeah.
52:36 Yeah.
52:36 Because I've talked to people who are like, yeah, it's like, especially if you look at some of these
52:40 tech companies, right.
52:41 And like, I don't know, Airbnb and like rosé on tap or something like you, you're lucky if you get
52:45 coffee at some of the like government contractors.
52:47 Yeah.
52:48 That's for sure.
52:49 I think another thing that is interesting is so many of these types of companies are driven by like
52:56 government contracts or projects.
52:59 I'm thinking of like DARPA funding and like, here's a project that is guaranteed to run for
53:04 one year and then it may immediately get canceled no matter what.
53:08 Right.
53:08 So you have like these sort of long time horizons of working on something, but there's, it could
53:12 become a totally different type of job because some other contract was won and this one was
53:17 expired, lost, whatever.
53:19 Yeah.
53:19 And I think there's kind of, I would say, besides government contractors, you can
53:22 imagine there are some other fields that might kind of fall into this, like certain parts
53:25 of healthcare might kind of fall into this area.
53:28 Yeah, definitely.
53:29 You imagine parts of finance, like some like, you know, rules around, you know, financial
53:33 risk regulations might kind of have some of these components too, but it's more the archetype
53:37 of the company that's got a lot of regulations or reasons why it has to move slowly and not
53:42 break things.
53:42 Yeah.
53:42 Yeah.
53:43 Interesting.
53:43 I think that's a really cool list you put together and I agree with a lot of your assessments
53:48 there.
53:49 So pretty neat.
53:49 Now we've been talking forever and we just barely touched on the stuff that you're covering.
53:53 We could just talk for so long because this is such a great book and a great topic,
53:57 but just for the sake of time, let's talk about one more topic and maybe blend this together.
54:01 So let's talk about getting the skills, becoming a data scientist from wherever you're starting.
54:06 And then also maybe just real quickly building a portfolio, because like I said at the beginning,
54:10 I do think having that first job is super important and getting that first job is strongly influenced
54:16 by just having something I can show.
54:18 You want me to do this?
54:19 I've already done it.
54:20 You don't have to verify if I can do it.
54:22 Look, this is it.
54:23 Just look at it.
54:23 You know, is it a personal fit or a salary fit?
54:27 Or whatever, right?
54:27 So let's start with getting the skills first.
54:30 Okay.
54:30 Yeah.
54:31 Jacqueline, you were talking about a master's.
54:32 Yeah.
54:33 And Emily, you're talking about a bootcamp.
54:34 Those sound like two different paths to me.
54:36 You didn't necessarily study programming, right?
54:39 You kind of went the math side, which actually I did as well.
54:41 Yeah.
54:42 I can cover.
54:42 Yeah.
54:43 Let me talk about all the different ways you can get skills.
54:46 And then Emily can talk a little bit about the portfolio, because that may or may not
54:49 have aligned with the chapters before.
54:52 I mean, who's to say?
54:53 I would never reveal that.
54:54 So we really, we think there's like four ways you can kind of get data science skills.
54:58 One is you can get a degree, which usually it's for people that go and get some sort of
55:02 master's degree, which is either like data science or maybe computer science or math, something
55:06 like that.
55:07 And the degree is great in that it, if you don't have that much of a background, you
55:11 will learn what you should, you will spend two years doing it.
55:14 So you should learn the basics of what you actually need, right?
55:16 The data science degree, you should learn the data science skills and you might do some projects
55:20 during it.
55:20 The downside is it takes two years and like 80 grand.
55:23 That's so much money.
55:24 Yeah.
55:25 A bootcamp takes 12 weeks and like 15 grand.
55:28 So that's much faster, much cheaper.
55:30 And the whole point of a bootcamp is to get you what you need as quickly as possible.
55:33 And I feel to me like bootcamps almost do a better job of connecting you with a job afterwards
55:38 than like a master's program.
55:41 Yeah, I think that's true.
55:42 And I think generally I would recommend bootcamps more except for people.
55:45 Bootcamps, you really need some sort of background already.
55:47 Like you need to have some idea of programming or some knowledge of this kind of field already.
55:51 If you don't know anything about data science, that might be 15 grand,
55:54 and then you still are just kind of confused.
55:56 You can, the third option is you could try and find data science work within your job.
56:00 If you're an analyst and you want to do more decision science, you can try and find
56:03 places where you can do decision science in your analyst job.
56:05 If you're a decision scientist and you want to do machine learning engineering, you can
56:08 try and find places where you can do some more engineering.
56:10 So you could kind of try and learn within whatever your job is.
56:13 What if you're a scientist who kind of does a little computation and you kind of want to
56:17 drift towards the data science side?
56:19 Yeah.
56:19 So I actually know someone who is trying, you know, doing that very thing.
56:23 She was, you know, she's a scientist.
56:25 She takes measurements, and she, in her job, has started to use R to actually make plots and
56:29 do the kind of investigatory stuff.
56:31 And that's totally been working for her.
56:32 And then lastly, you can teach yourself, right?
56:35 There's all these courses online.
56:36 You can work on your portfolio, which we'll get into.
56:39 And teaching yourself is great because it's free.
56:40 You get to focus on the stuff you care about.
56:42 And yeah, you can really, if you can motivate yourself correctly, you can really like learn
56:46 a lot that way.
56:47 I've learned a lot this way.
56:48 The downside is, is that it requires an immense amount of discipline, right?
56:53 If you try and do everything learning online, you have to actually do those courses instead
56:56 of playing Animal Crossing.
56:58 But yeah, not that I play Animal Crossing, but no, Jacqueline's just calling me out.
57:02 And you know, you don't know if you're teaching yourself the important things or not.
57:08 At some level, you don't have a mentor when you're teaching yourself.
57:10 And that's a problem.
57:11 Yeah.
57:11 It's also, I feel like sometimes when people are trying to teach themselves, they try to
57:16 boil the ocean, right?
57:17 Yeah.
57:17 Yes.
57:17 You know, like, well, I saw this and this and this.
57:20 So I got to know all those things.
57:21 Like, no, no, no.
57:21 You just vertical slices, not horizontal.
57:23 Yeah.
57:23 Like, figure out what you got to try to build something and learn what you need to
57:27 build that shallow or deep in these areas and then go from, like, iterate, right?
57:31 Yeah.
57:31 And I think a similar problem to that is when you're teaching yourself, there's not like
57:35 a natural stopping point.
57:37 So like in a master's or a boot camp, like they end and then you're like, oh, I guess
57:40 it's like time for you to like find a data science job versus if you're teaching yourself,
57:44 it's so easy to be like, well, I can't apply to, like, a data scientist job yet.
57:46 I haven't learned like this thing or I haven't learned this thing.
57:48 And you just make this endless list.
57:49 Yeah.
57:49 Right.
57:49 Yeah.
57:50 Which I think is, you know, data science is a career where you're
57:54 always going to be learning.
57:55 And so, like, no one knows everything.
57:58 So you don't have to feel like, okay, I must like master, you know, the whole world to
58:02 be able to get a data science job.
58:04 Yeah.
58:04 Yeah.
58:05 I hear you.
58:05 Let me put data science up a little bit on a pedestal here.
58:08 Like, so I feel like as a developer, you can build web apps, work with databases, whatever,
58:12 like you can totally do a quick bootcamp.
58:16 You can take online courses, read books, teach yourself.
58:19 I do feel like those are skills you can mostly get yourself.
58:22 They're like painful lessons you have to learn, but I'm not sure you'd learn those in school
58:25 anyway.
58:26 But with data science, I feel like there is a level of statistics and like scientific understanding
58:33 and a little bit of math that I think is a little bit harder for people to just get on
58:36 their own.
58:37 So having some formal training in the background seems more important for data science than pure
58:43 development.
58:44 I guess I would broadly agree.
58:46 And not because the actual statistics and machine learning models you learn as a data scientist
58:50 are like somehow harder to learn than software engineering, but because the fields, those
58:55 fields are so confusing.
58:57 And like how they're laid out, like what is considered statistics versus machine learning
59:02 versus industrial engineering?
59:03 Like these are all extremely poorly laid out.
59:05 The people in those fields make them as confusing as possible to make it seem like only they understand
59:09 it.
59:09 And there's not really an easy pattern.
59:12 There's not really just like one book out there that's like, oh, this thing in statistics
59:15 is actually that thing in computer science.
59:17 And like, they're the same.
59:17 And don't worry about that, half of statistics doesn't super matter.
59:20 Like that's not easy information to find.
59:23 Yeah.
59:23 Yeah.
59:24 I do think though, it also depends on like what type of role you want.
59:26 Right.
59:27 Like, so I think that's a little bit less important in an analytics role, for example, to have like
59:30 that background.
59:31 And there is certainly like, you know, people do, you can still like learn some of this on
59:35 the job, you know, whether through like mentorship or like reading books or like other, other things.
59:40 But I agree, like there is a bit of a danger because I feel like, especially with statistics,
59:43 it's like, if you run a statistical test, like it will generally spit out an answer, but it
59:49 may not be answering what you think versus right.
59:51 Like it's a little more obvious sometimes in development work, like, oh, the website didn't
59:54 load.
59:54 So I guess I have to like figure out what went wrong.
59:56 So there's a bit more of a danger there.
59:58 Yeah.
59:58 You're always going to get a number from that library, from those algorithms.
01:00:01 Right.
01:00:02 And you have to understand what it's doing.
01:00:04 That's kind of what I was saying.
01:00:05 Like, it's really clear if the website is letting the user log in or not.
01:00:09 Yeah.
01:00:09 There's not a huge debate.
01:00:10 Maybe security is not quite right.
01:00:12 There's details you got to get right.
01:00:13 But it's generally, it works or it doesn't.
01:00:16 Just because I'm so upset about the point I made previously, because I think the point's
01:00:20 right, but I'm getting upset thinking about it.
01:00:21 It's like, so like a linear regression or logistic regression has some built in assumptions.
01:00:26 If you're in a CS department and you're like, I'm going to use a linear regression
01:00:28 to fit this as part of a neural network.
01:00:30 People are like, fine.
01:00:31 You do that in a stats department.
01:00:33 They're like, how dare you?
01:00:34 That's so incredibly wrong.
01:00:35 Right.
01:00:35 You violated the assumptions.
01:00:36 And it's like, well, these are two trained academic professionals telling you two totally
01:00:40 different things.
01:00:41 And I think that is something you get all the time in data science that you don't get as
01:00:45 often in software engineering.
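A small illustration of the point above that a model will always hand back a number, whether or not its assumptions hold. This is a minimal sketch with made-up data, assuming an ordinary least-squares line fit written by hand; `fit_line` is a hypothetical helper, not a function from any library mentioned in the episode.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data where y depends entirely on x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)

# The fit runs without complaint and reports a flat line,
# which reads as "x has no effect on y".
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

The residuals come out as a clear U shape rather than random scatter, which is the signal that the linearity assumption is violated even though the fit itself "succeeded".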
01:00:47 Yeah.
01:00:47 It's exactly that kind of stuff I was thinking of.
01:00:49 Yeah.
01:00:49 Both the two things you both mentioned.
01:00:51 All right.
01:00:52 Let's close out our conversation by talking about getting a portfolio, maybe mixing in a little,
01:00:57 possibly contributing to open source while we're at it or something.
01:01:00 Emily, do you want to give us the rundown on that?
01:01:02 Yeah, absolutely.
01:01:03 So the idea behind a portfolio, and this is especially helpful for people who don't have
01:01:07 a formal education or haven't worked in very similar jobs or been able to learn on the job.
01:01:13 Because as you were talking about, this is a way they can show they can do the work, even
01:01:16 if they hadn't had an opportunity in school or at a company.
01:01:19 Like a portfolio project.
01:01:21 So what we really recommend for it is doing something original that you care about.
01:01:26 Because, you know, one thing people might default to is like, all right, I'm going to go look
01:01:29 on Kaggle.
01:01:30 And I'm going to find like one of the data sets they have.
01:01:32 And I'm going to like do this competition where they like give you a data set.
01:01:35 And they like, you know, tell you to predict this thing.
01:01:37 And the problem with that is like, one, it doesn't really show your personality.
01:01:41 It skips over the steps that are really critical.
01:01:44 And you'll need to do in like data science roles, which is like gathering the data, figuring
01:01:48 out what question to answer.
01:01:49 And also, honestly, like if a company sees that in your portfolio, you know, maybe they're worried
01:01:54 that like, oh, did they just copy someone else's code, right?
01:01:56 Like this is a problem a lot of people have worked on.
01:01:58 So we recommend, you know, kind of finding, figuring out a question you're interested in
01:02:04 answering, or finding a data set that's interesting to you and exploring it to like figure out
01:02:09 like, okay, what are some like interesting findings I can have from that?
01:02:11 And so putting that together and then sharing it on GitHub.
01:02:15 So you have the code with the readme that describes it.
01:02:17 And then ideally also having a blog, because a blog is really great.
01:02:21 Someone may not look through, you know, like hundreds of lines of your code, but they might
01:02:25 be like, oh, yeah, let me read about like what they found.
01:02:27 And, like, look at some visualizations or read a tutorial that they wrote because they use
01:02:31 natural language processing for this project.
01:02:33 Or even just look back two years and see they've been doing this for as long as they said
01:02:38 they have been or something like that.
01:02:39 Yeah, exactly.
01:02:40 Exactly.
01:02:40 And so Jacqueline shared the example project that she did, which is training a neural network on
01:02:44 offensive license plates.
01:02:45 So I do want to emphasize like it doesn't have to be, you know, something very serious.
01:02:49 Or if you want to go into finance, it doesn't necessarily have to be like a finance project.
01:02:52 Because, you know, if you're like, oh, I use like neural networks, or like one of the
01:02:57 projects I did was I built a dashboard.
01:02:59 And so that shows like I can build a dashboard from scratch.
01:03:02 So I really think this can be like a great way to show off like some of your personality,
01:03:06 your coding skills, your communication skills with the read me and the and the blog, and
01:03:11 maybe even demo in an interview.
01:03:12 So when I was doing the job search after graduating from boot camp, I would bring my laptop
01:03:17 and sometimes I would show this dashboard that I had built.
01:03:19 And I'd be like, look, and you can filter it and you can click around and, like, you
01:03:22 click this and it goes to a link.
01:03:23 And I think that made it like much more, much more real to them than if I was just talking
01:03:27 about this theoretical project.
01:03:29 Absolutely.
01:03:29 That's awesome.
01:03:30 That's really good advice.
01:03:31 Another thing I think would be valuable, if people are in the right place, with the right
01:03:35 background and whatnot is to maybe contribute to some project that's relevant in the data
01:03:40 science space.
01:03:41 Right?
01:03:41 Like, if you have two people you're interviewing, and one's like, well, I'm pretty good at using
01:03:46 Jupyter.
01:03:46 The other person's like, I had two PRs merged into Jupyter.
01:03:50 And actually, I know some of the people on the team a little bit who work
01:03:54 on it. Like, okay, I know who I'm going to talk to a little bit more next. You know,
01:03:57 it's a different level of credibility.
01:03:59 Even if like what you did was there were no unit tests for this part of the library.
01:04:03 So I wrote some unit tests, or I worked on the documentation, or I worked on a tutorial.
01:04:07 Like, it doesn't have to be I rewrote the main thing, right?
01:04:11 Yeah, absolutely.
01:04:12 And we have like, I think it's chapter 14, on like joining the community.
01:04:15 And that's one of the things we talk about is contributing to open source and exactly what
01:04:19 you said.
01:04:20 Like it can be, you know, writing new documentation, even fixing a typo, just these ways to get
01:04:25 involved.
01:04:26 And, you know, I do want to emphasize that like, you know, this isn't something that is required
01:04:30 to get a data science job.
01:04:31 Like I know a lot of data scientists who don't have like a GitHub with personal projects who
01:04:36 don't have a blog who don't contribute to open source.
01:04:38 So they're still like excellent data scientists.
01:04:40 But it's just like, what are the ways that, one, hopefully it's fun; two, hopefully you learn
01:04:44 something like that's the other big point of the portfolio project.
01:04:47 It's a great way to direct your learning like you find out, oh, I need to like, you know,
01:04:51 figure out how to scrape this website, let me go like to gather the data.
01:04:54 So let me go learn web scraping.
01:04:55 And three, like, to stand out in interviews, but it certainly, you know, shouldn't, I don't
01:05:00 think it should be like a requirement for any job, for example.
01:05:02 Yeah, right.
01:05:03 I agree.
01:05:04 And probably the different company archetypes, they probably care or completely don't care
01:05:09 about this, right?
01:05:09 Like the big aerospace government contractors, they're probably like, okay, great.
01:05:13 We don't know that we trust you if you're writing code for just for open source.
01:05:17 That might be weird, right?
01:05:18 Whereas like the startups are like, oh my gosh, that's so amazing.
01:05:20 I can't believe, you know, or the big tech company.
01:05:22 We're trying to move to open source.
01:05:24 So that's great.
01:05:25 You can be one of our advocates.
01:05:26 So yeah, I suppose that it probably varies a lot as well in there.
01:05:29 All right.
01:05:30 Well, I would love to talk more about this because there's all kinds of cool ideas you two put in there, but I think we have to leave it at that.
01:05:37 Let me ask you the two quick questions before I let you out here.
01:05:40 If you're going to write some code, do some data science, data analysis, what editor do
01:05:45 you use these days?
01:05:45 I use RStudio, although, as one of my development goals,
01:05:48 I'm actually starting to use Vim as the editor within it.
01:05:52 So I'm trying out that.
01:05:54 But yeah, I've been, I've been using RStudio for, for years now.
01:05:57 Although I also heard, what is it?
01:05:58 Is it Visual Studio?
01:05:59 Like it now supports R, and like one of my teammates was trying that out and really liked it.
01:06:03 Yeah.
01:06:03 Probably VS Code.
01:06:04 Yeah.
01:06:04 That's awesome.
01:06:04 VS Code.
01:06:05 Yeah.
01:06:05 And Jacqueline?
01:06:05 So I'm a 50/50 split between RStudio and Visual Studio Code.
01:06:09 So RStudio for anything R related, literally anything else, including like just notes to myself,
01:06:14 Visual Studio Code.
01:06:15 Yeah.
01:06:16 Awesome.
01:06:16 And then notable libraries out there for data scientists, not necessarily something super popular,
01:06:21 but you're like, oh, this package is really awesome.
01:06:22 People should know about it.
01:06:24 Do they have to be Python libraries?
01:06:25 No, they don't have to be Python.
01:06:27 No, there's more of a data science topic.
01:06:31 So it could be running across the board.
01:06:33 Well, I will mention one of the libraries that I created when I was consulting for T-Mobile.
01:06:38 It's called loadtest and it's for R.
01:06:41 Me and the T-Mobile team made it and it's to help you if you're making an API in R using
01:06:47 the R library plumber, which is great.
01:06:49 You can use the loadtest library to test it to make sure that your R model will be able
01:06:55 to handle the load.
01:06:55 Okay.
01:06:56 Awesome.
01:06:56 Yeah.
01:06:56 Very cool.
01:06:57 Emily?
01:06:58 I have so many.
01:06:59 Now I'm wondering if I should, you know, say my own package as well.
01:07:01 Do it.
01:07:02 Self-promote.
01:07:03 Self-promote.
01:07:04 I'll briefly share.
01:07:05 I use it less now, but I used it a lot at my last company: funneljoin, for like analyzing
01:07:08 sequences of events.
01:07:09 You're like, all right, who like came to the website?
01:07:11 1% of people who visited the homepage then bought a subscription.
01:07:14 But what about if we want that within two days?
01:07:17 So that's one.
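The funnel question described above (what share of homepage visitors later subscribe, optionally within two days) can be sketched in a few lines of plain Python. This is a minimal illustration with made-up events; `conversion_rate` is a hypothetical helper and is not the API of the package mentioned here.

```python
from datetime import datetime, timedelta


def conversion_rate(events, first, second, max_gap=None):
    """Share of users whose earliest `first` event is followed by a `second` event.

    `events` is an iterable of (user, event_name, timestamp) tuples.
    If `max_gap` is given, the `second` event must happen within that window.
    """
    first_seen = {}
    for user, name, ts in sorted(events, key=lambda e: e[2]):
        if name == first and user not in first_seen:
            first_seen[user] = ts

    converted = set()
    for user, name, ts in events:
        if name != second or user not in first_seen:
            continue
        gap = ts - first_seen[user]
        # Only count the second event if it comes after the first one.
        if gap >= timedelta(0) and (max_gap is None or gap <= max_gap):
            converted.add(user)

    return len(converted) / len(first_seen) if first_seen else 0.0


# Made-up events for four hypothetical users.
events = [
    ("a", "homepage", datetime(2020, 4, 1, 9)),
    ("a", "subscribe", datetime(2020, 4, 1, 18)),
    ("b", "homepage", datetime(2020, 4, 1, 10)),
    ("b", "subscribe", datetime(2020, 4, 5, 10)),
    ("c", "homepage", datetime(2020, 4, 2, 12)),
    ("d", "subscribe", datetime(2020, 4, 1, 8)),
    ("d", "homepage", datetime(2020, 4, 3, 8)),
]

overall = conversion_rate(events, "homepage", "subscribe")
within_two_days = conversion_rate(
    events, "homepage", "subscribe", max_gap=timedelta(days=2)
)
```

User `d` subscribed before ever visiting the homepage, so that subscription does not count as a conversion in either number.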
01:07:18 But another package, R has so many packages that I like.
01:07:21 So one thing that I'm very excited about, which is like sort of hot, it's been in development
01:07:26 for a while and in pieces, is tidymodels.
01:07:29 So it's a rethinking of how to do modeling in R, with a brand new website out now too.
01:07:33 So I think it's tidymodels.org.
01:07:35 So I'm excited about that.
01:07:36 And then finally, just one more: the janitor package is a fun one if you do data cleaning. It
01:07:42 just has all these functions for like you import a data set and there are spaces in the names
01:07:47 and like weird capitalizations and like weird like characters that make it hard to work with.
01:07:52 It has a function, clean_names, and it will just fix all of those for you.
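To make the column-name cleanup described above concrete, here is a minimal Python sketch of the idea. It is an assumption-laden illustration, not the janitor package's implementation: the `clean_name` and `clean_names` helpers below are hypothetical and only handle spaces, capitalization, and stray punctuation.

```python
import re


def clean_name(name):
    """Lowercase a column name and collapse anything non-alphanumeric into underscores."""
    collapsed = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return collapsed.strip("_").lower()


def clean_names(names):
    """Apply clean_name to every column name in a list."""
    return [clean_name(name) for name in names]


# Made-up column names of the kind that show up in imported spreadsheets.
messy = ["First Name", " Total $ Amount ", "ZIP-Code"]
tidy = clean_names(messy)
```

A real cleaning function also has to deal with duplicate names and non-ASCII characters, which this sketch ignores.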
01:07:55 Oh, that's cool.
01:07:55 Yeah.
01:07:56 And I think there's a pyjanitor as well.
01:07:58 I'm not sure if it's directly the same, but so people in Python, they can do pyjanitor.
01:08:02 I'll also go and throw one out there in the Python world for folks.
01:08:05 There's this thing called missingno, missing N-O.
01:08:09 It's a visualizer for missing data.
01:08:12 So you just have a Pandas data frame and you throw it at it and it'll draw you like a big
01:08:16 cool graph of visually where your data is filled in, where it's missing and all these sort
01:08:20 of like correlations of you're missing this data, you're probably also missing that data.
01:08:24 It's super cool.
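The information such a missing-data view summarizes can be sketched without any plotting at all. The example below is a minimal, text-only illustration using made-up records stored as plain dictionaries; `missing_counts` and `missing_together` are hypothetical helpers, not functions from the library mentioned above, which works on Pandas data frames and draws charts.

```python
def missing_counts(rows):
    """Count missing (None) values per column across a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def missing_together(rows, col_a, col_b):
    """Count records where both columns are missing at the same time."""
    return sum(
        1 for row in rows if row.get(col_a) is None and row.get(col_b) is None
    )


# Made-up survey records with gaps.
rows = [
    {"age": 34, "income": 52000, "city": "Portland"},
    {"age": None, "income": None, "city": "Seattle"},
    {"age": 41, "income": None, "city": None},
    {"age": None, "income": None, "city": "Boise"},
]

per_column = missing_counts(rows)
age_and_income = missing_together(rows, "age", "income")
```

Here every record that is missing `age` is also missing `income`, which is the kind of "missing this, probably missing that" pattern the visual version makes obvious at a glance.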
01:08:25 Yeah.
01:08:25 There's actually an R one for that, which is naniar.
01:08:28 I never know how to pronounce that, but it's also for like missing, yeah, naniar for missing
01:08:33 data.
01:08:33 Awesome.
01:08:34 Yeah.
01:08:34 Yeah.
01:08:34 That seems super valuable.
01:08:35 Just get a quick, like I've got all this data loaded up.
01:08:37 Let me just look at it.
01:08:39 Yeah.
01:08:39 Visually.
01:08:39 Yeah.
01:08:40 Cool.
01:08:40 Yeah.
01:08:41 Yeah.
01:08:41 So tell us how people can get your book.
01:08:42 Our book is online.
01:08:43 You can buy it from the website of Manning, who's our publisher.
01:08:46 And we actually have two URLs because we had a disagreement about this.
01:08:49 So we have the professional URL.
01:08:51 Do you want the professional version of the book?
01:08:53 Yeah.
01:08:53 Yeah.
01:08:53 DataSciCareer.com.
01:08:57 And then we have the fun version of the book, which is at bestbook.cool.
01:09:02 And now those will take you to the same webpage, but know if you click bestbook.cool, you're
01:09:06 getting the fun version of the website.
01:09:07 Oh, yeah.
01:09:08 You're getting DataSciCareer.com.
01:09:09 It's the professional one.
01:09:10 Yeah.
01:09:10 And maybe we should have people guess which one of us is the fun
01:09:15 one, which one is the more serious.
01:09:16 Exactly.
01:09:17 Put it in the show notes.
01:09:18 Put it in the comment section at the bottom of the show page.
01:09:21 Awesome.
01:09:22 Well, Jacqueline, Emily, it was really great to have you on the show.
01:09:25 And I can certainly recommend your book.
01:09:27 It's spot on.
01:09:28 It covers a bunch of great topics.
01:09:30 People ask me about careers all the time, and I always want to have good advice to give
01:09:34 them.
01:09:34 And so here's definitely something they should check out.
01:09:36 Thank you so much.
01:09:37 Thank you so much.
01:09:38 Yeah.
01:09:38 You bet.
01:09:39 Yep.
01:09:39 Bye.
01:09:39 Bye.
01:09:39 Bye.
01:09:40 Bye.
01:09:40 This has been another episode of Talk Python to Me.
01:09:43 Our guests on this episode were Emily Robinson and Jacqueline Nolis, and it's been brought
01:09:49 to you by Kite and Linode.
01:09:50 Kite is the smart AI-powered autocomplete for your editor.
01:09:54 And the more powerful your editor is, the more effective that you are.
01:09:57 Get Kite for free at talkpython.fm/kite.
01:10:01 Start your next Python project on Linode's state-of-the-art cloud service.
01:10:06 Just visit talkpython.fm/Linode.
01:10:09 L-I-N-O-D-E.
01:10:10 You'll automatically get a $20 credit when you create a new account.
01:10:13 Want to level up your Python?
01:10:16 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.
01:10:20 Or if you're looking for something more advanced, check out our new async course that digs into
01:10:26 all the different types of async programming you can do in Python.
01:10:29 And of course, if you're interested in more than one of these, be sure to check out our
01:10:33 Everything Bundle.
01:10:33 It's like a subscription that never expires.
01:10:35 Be sure to subscribe to the show.
01:10:37 Open your favorite podcatcher and search for Python.
01:10:40 We should be right at the top.
01:10:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the
01:10:47 direct RSS feed at /rss on talkpython.fm.
01:10:51 This is your host, Michael Kennedy.
01:10:52 Thanks so much for listening.
01:10:54 I really appreciate it.
01:10:55 Now get out there and write some Python code.