
#262: Build a career in data science Transcript

Recorded on Wednesday, Apr 22, 2020.

00:00 Has anyone told you that you should become a data scientist? Have you heard it's a great career? In fact, data scientist is the best job in America according to Glassdoor's 2018 rankings. That's great. But how do you get a career in data science? And once you've landed that first job, how do you find the right fit? How do you find the right company? And how do you get more deeply involved with the community as you grow in that career? I brought two great guests, both highly successful data scientists, on the show today who've been thinking deeply about this. Jacqueline Nolis and Emily Robinson are here to give you real-world, actionable advice on getting into this rewarding career. This is Talk Python To Me, Episode 262, recorded Wednesday, April 22, 2020. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talk And follow the show on Twitter via @talkpython. This episode is sponsored by Kite and Linode. Please check out what they're offering during their segments. It really helps support the show. Jacqueline, Emily, welcome to Talk Python To Me.

01:24 Thank you. Thank you, excited to be here.

01:26 I'm excited to have you both here. Really excited to talk about this topic. I think one of the things that a lot of listeners out there can benefit hugely from is: how do I get started in programming? How do I get started in data science? How do I get started in this overall sort of Python career? And there are so many different paths and ways you can go. You could go get a four-year degree, you could drop out of college and do a startup. Like, you know, what is the right path? What is some guidance around there? What are some of the trade-offs? And so you both have been writing about this lately, and this is really, really good, the work that you're putting out. So I'm excited to talk to you both about that. Great.

02:07 Yeah, no, we've had, you know, so we just published our book, Build a Career in Data Science. We've been working on this, I think, almost two years now.

02:14 Yeah, two years since you reached out to me.

02:17 Yeah. And so two years, a lot of work, a lot of talking to people in the field. And so it's really been great to finally get it out there and talk to people and watch people actually get helped by it. So yeah, we love talking about it. Awesome.

02:30 Well, I'm really excited to have you all here to talk about it. But before we get to that, let's maybe do a little bit of a meta thing. I always ask this question on the show, but it's a bit meta this time: how did you get into programming and Python? Emily, you wanna go first?

02:42 Yeah, I don't know if I can admit this on this podcast, but I actually don't program in Python anymore. I program in R day to day. I have programmed in Python. So how I got started was back in college. I did Python in one computer science class, but most of the programming I did was in R, in the statistics program. So I was lucky enough to go to Rice University when Hadley Wickham was a professor there, who, for your listeners who don't use R, is a very famous R programmer who's contributed a lot of the big packages to it. So that's how I got started. And I kept doing it in grad school, where I got my master's in organizational behavior. After that, I went to a data science boot camp called Metis, which was all in Python. So, you know, sort of built my Python skills there and then got started working in data science in industry.

03:25 Oh, very cool. Jacqueline, how about you?

03:27 Okay, so like Emily, most of my work is in R, though I do some Python too. But okay, so my background: I did an undergrad and master's in math. And then, like, I really wanted to help companies use math to solve problems. And this was before the term data science existed. So I went out in industry, did some of what is now data science, but I didn't know it at the time, went and got a PhD because I wanted to get some more technical skills. And now I work as a consultant helping companies. Awesome. What's your PhD in? Industrial engineering. Yeah, but actually, so how I actually started using Python: I think the first Python project I ever did was, there's a style of games from like the 80s called roguelikes, where you have, like, your little @ symbol, and you're walking around, like, the computer screen trying to fight monsters and stuff. And, like, the monster might be the letter M.

04:11 So this is like a MUD, but it has, like, some visual, yeah, representation. Okay. Yeah. Awesome. Yeah.

04:16 And so it just, like, it gets weirder. So I wanted to make one of these. I wanted to make one where you're on a river, and you're, like, tubing, and you're sitting there, and your character would be, like, you floating around the river, and you, like, see horses and stuff. So anyways, Python was the language that had the most straightforward library for making one of these roguelike games. And so I spent, like, two weeks coding up this tubing simulator, and then I got bored of the project and left it. But that was my first time actually using Python.

04:41 Yeah, it's cool. And these little personal projects are super valuable for getting into programming. Because to just go through and say, Well, I'm learning about loops, so I'm going to write, like, some different kinds of loops. Like, no one learns that way. Right? Not really.

04:53 Yeah. So that's one of the chapters in our book. We have a whole chapter on, hey, you know, it's really good to learn things this way. You can actually make a portfolio of projects that you can then use to help you get a job. And for me, besides making tubing simulators, one of the projects I actually did was to learn neural networks. I built a neural network that would create offensive license plates that would get banned by the state of Arizona, because I had a dataset of all these license plates. And that ended up being the basis for, like, an extremely valuable consulting project where I ended up doing natural language processing using the same stuff I learned from that offensive license plate thing.

05:30 That is so cool. It's just such a fun and playful and kind of silly project. But then, yes, what I found really interesting in programming in general is you have these two different realms, you know, like think hedge fund or Air Force, right? But the fundamental things you learn and the skills you gain to solve problems or work in those areas, it's almost exactly the same. It's just, like, what layer is the specialty that they're working in? You know, how do you work with trading, they care maybe more about, like, timing, versus, I don't know, visualization or whatever. But it blew my mind how similar stuff is like that. Like, I created this fun thing for license plates, and then it turns out to actually let you, I don't know what exact project you were working on, but, like, something that obviously was probably not that, right? There's a

06:17 real thing someone paid me for, as opposed to a license plate thing, which gets me some blog post views. Yeah.

06:23 Just as a very short side note, did you see that somebody thought they were gonna be so, so clever and get out of all the camera, like, camera speed traps and stuff, red light cameras and whatnot? So they got their license plate to say NULL. And they thought that would just trigger the database to think nothing was there. But they started getting tickets for every faulty piece of data that was in there, where there wasn't a license plate. They started getting thousands of dollars of tickets that weren't theirs, because they are now registered for NULL, like, across the board. It's Upstate. I kept playing with fire. Oh, that backfired bad. Anyway. Oh, that sounds really, really fun. So, Emily, what are you doing today? Day to day, what do you work on?

07:11 Yeah. So I work as a senior data scientist at Warby Parker, which makes eyeglasses and now contacts as well. You can get them online and, well, you could in previous times get them in stores as well. But of course, like many other companies, we've closed all retail stores at the moment. Yeah,

07:26 we have a Warby Parker here in Portland. And I almost went in the other day, but not anymore. Yeah.

07:30 But still online. Right. Yeah. Still online. So I joined there in December, so about five months ago now, on the data science team, which is a centralized team that works with departments across the company. So that's been really fun, because previously I'd always been a data scientist that was embedded with a team. So when I worked at Etsy, the analytics department was centralized, but we were sort of paired with one partner team. So I worked with search my whole time. And then at my last job, I was part of the growth team. So, like, I reported to the VP of growth, not to the chief data scientist. So this has been a new experience, being on a fully centralized team, where for one month I might work with finance, and then a couple months later I'm working with product strategy. That's super fun. And you get to experience different kinds of problems and work with different teams and technologies, I'm sure. Yeah, exactly. And so, you know, the team tackles a wide range of projects. So one thing we discuss in our book, right, is data science is a pretty broad field. And so there's lots of different projects you can do. So some of ours are making, like, a dashboard to view analytics, and some other ones may be more modeling problems or making machine learning products. So that's been really interesting, to get this breadth of things that we can work on, depending on what the team we're working with needs. Yeah.

08:43 Sounds super cool.

08:45 Jacqueline, how about you? I'm working as an independent consultant. So I've spent the last couple years working as my own company, helping out big companies like T-Mobile and Expedia, and some smaller startups in the Seattle area. And so this is pretty fun. Because, you know, like Emily is saying, by being a consultant, I get to work on all sorts of different projects, whether it's taking machine learning models and deploying them in production, or helping a company figure out who are the most, like, active, engaged customers and how to, you know, think about targeting them differently. That's pretty great. Unfortunately, it's not, like, the best time to be a consultant on your own right now. You know, it's been a little dicey lately, but, you know, I've really enjoyed getting to work with all these big companies on all these different interesting data science problems.

09:26 Yeah, it sounds really fun. And it is nice to be able to pick who you want to work with and what projects you want to take, and you have a little more freedom, I think, to kind of go your own way. But right now, I don't know, all bets are off. Yeah, I think it's tricky to be a freelancer right now, because you feel all these changes and all this pressure, you feel it immediately, right? But if you work at, I don't know, some large company, you might not feel it right away, but then, you know, maybe that company goes under, and then all of a sudden you don't have all the connections you had as a freelancer. Right, maybe Expedia dried up, well, because travel's down, but, you know, maybe some other company's like, Hey, we've got some more work, why don't you come work for us? Right? Whereas if you work at a company, you don't necessarily cultivate those connections as much.

10:11 Yeah, and I think a lot of people tend to, like, kind of make working as a freelancer kinda like a cool thing, that, like, when you're really a big shot, you get to, like, work, you know, just as an independent consultant. So I think a lot of people kind of, like, aspire to have that sort of a job. But it's really hard. It's really hard. Because, as you're saying, when there are changes in the market, you feel it instantly. And, you know, half of your job is going out and finding new clients. You have to, kind of, like, work with stakeholders. None of that has

10:39 to do with programming. Yeah, or technical stuff, right? You've got to, you're almost in marketing for yourself a bit,

10:45 right? So, like, people think, Oh, I want to be a freelancer so I don't have to do all the boring stuff, I can just do the data science. And it's like, No, you actually have to do more of the boring stuff. So we actually have a chapter in our book about, okay, what do you do once you become, like, a senior data scientist and you're looking at the next steps. And one of the paths we discuss is the consultant path, which has some perks to it, but also has some serious risks and downsides.

11:07 Yeah, I also have to share this. So how we split writing the book was we were each, like, the primary writer for half the chapters, and the other person edited. And so Jacqueline was writing this chapter, and the first version of the independent consultant section was so negative. Like, she was basically like, never do this.

11:26 Hold back a little bit. Like, I understand you want to share the cons.

11:32 Not that, yeah, it's not that I think consulting is a bad thing to do. It's that I've had so many people come up to me and be like, Oh, that sounds so cool. I want to do that. How do I do that? And I feel like I'm, like, the old woman in front of the cave. Like,

11:48 I think I'm even Campbell. It's too hard.

11:51 Careful what you wish for. Yeah, I get it. How funny. Yeah, it definitely has this arc of, like, hey, you're your own boss, and just do whatever. But yeah, there's a lot of work to be done there. And I think it also makes sense at different stages in your career, for sure. Let's start this whole conversation off with a question about how you both got your first job. So I heard about what you're doing now. And I have this theory. I've only really tested it, I guess I've tested it with a few people. I was gonna say I really only tested it with myself, because I only know my career that well. But I've worked with people who were interns and then found their way through, like, really successful stuff, and sort of saw that as well. And my theory is, in the developer and data science space, the first job is the hardest. Because once you've had one job, you have a portfolio of work, you have experience, you can say, I've done this thing, and you have a problem similar to it. Like, here, I've done this thing with license plates. It's technically not license plates, but that's basically what you're asking me to do. And so it's not a matter of convincing a person in an interview you can do it, because you can just show them: Look, this is what I built. And they're happy. But in the very beginning, it's such an unknown to people. So I think getting that first job is probably, like, one of the biggest steps to kind of going down this path. So I wanted to ask you two, how did you get your first jobs? Emily, you wanna go first? Yeah.

13:14 So I mentioned my first job was at Etsy. And so, all right, it's like, take a time machine back to fall 2016. So I'd finished the Metis data science boot camp, and I interviewed, so, like, one thing they did was there was a demo day for your final project, and there were some companies hiring there. But yeah, I ended up at Etsy. And actually, how I got that initially, I don't know if it would have happened anyway, but what helped initially was I actually knew someone. Hilary Parker used to work there as a data analyst. And she had since left, but, you know, she still knew people who worked there. So she offered to introduce me to a manager there. And he took a look at my profile and said, Yeah, I can refer you. So, you know, definitely a network is a big part of it, even for later jobs, although I definitely agree the first is the hardest. And then I think what helped there was, like, it was a really great company, I really enjoyed working there. And the title I had at the time was data analyst, so it wasn't data scientist. And actually, on that team, shortly after I left, their titles are now data scientists. But one thing we talk about in the book is avoiding this: people can get very attached to the title data scientist, and those can sometimes be harder to get, or they're very attached to, like, Oh, I need to go work at, like, Google or Facebook or Airbnb, right? Like, I'd say Etsy probably falls under there, but, like, you know, a very well-known data science company. And I do think, like, you are going to get such valuable experience from almost any job if you're working with data, whether you're called a data analyst or, you know, research analyst or product analyst. Like, if you're doing code, if you're working with data, if you're working with stakeholders, that can be a really great first experience. And like you said, just having that on your resume can open up a lot more doors.
And it's very common, especially in the tech field, for people to switch jobs every, you know, one to three years. So it's not like you're signing up there forever. That's how I got my first job.

15:02 Yeah, cool. And one of the things I want to talk to you about is the trade-offs of different types of companies to work at. But we'll get to that. I do think this idea of having someone to introduce you, or someone who knows you, or someone who knows someone, you know, like a couple layers removed, is really valuable. When I was working at companies where we were doing a lot more hiring, it was just: does anybody know somebody who could do this, who they can recommend as good? If the answer was no, then maybe it becomes a job search, then maybe it becomes a job posting. But it was not first a job posting. It was first, anybody know somebody great that can do this, that we're in need of? If somebody knew somebody, then we probably just would go out and talk to them, you know, first. And I don't know if that's fair or not, but that's just how it works. Because if you put a job post out there, you could get 1,000 applicants you've got to go through. And, you know, I've had plenty of people I've interviewed where it's like, I can do this thing. Like, okay, how about this? How about we turn on screen sharing, what was it at the time, like GoToMeeting or something, turn on screen sharing, and why don't you write a real simple program that does that? Should be, like, five lines of code. Anyone could do it. And they, like, couldn't do it. Like, okay, you clearly have not been doing this for two years, because this would be, like, the first week of a class that covered this topic. So yeah, it's really tricky. So I do think, you know, just, people out there listening, like, cultivate those connections as much as possible, even if it's not a perfect meritocracy. But that's just the way it is. Right.

16:31 Yeah. And I mean, I think there's lots of advantages to a network beyond, like, helping get a job. It's, like, finding a community of other people who are, like, maybe the only data scientist at a company. And like you said, I mean, it's partly what you sort of said, like, maybe not thousands. I mean, certainly the large companies get, like, thousands or tens of thousands of applications for a data science position. I remember Angela Bassa, who's one of our interviewees in the book, she posted, I'm not sure if it was a data scientist or a data analyst position, at her company. And she actually closed it after four days because they'd gotten 1,000 applications.

17:03 Because I've got a backlog. We've just got to go through this. Yeah. This portion of Talk Python To Me is brought to you by Kite, the smart AI-powered autocomplete for your editor. As developers, our choice of editor is central to our work. The more powerful and effective that editor is, the more effective you are. That's why I'm excited about Kite. It's a free plugin for your code editor that gives you ML-powered autocompletions and documentation. Chances are it works with your editor of choice, even if that editor has existing autocomplete features. The list includes PyCharm, VS Code, Atom, Sublime, Vim, and more. And Kite runs locally, so your code is private, with no cloud or internet connection necessary. And Kite is 100% free. So try it today at talk slash kite, K-I-T-E, and see how Kite can help you be more effective with your Python code. Jacqueline, how did you get your first job?

18:00 Okay, so first off, I feel old

18:02 because I was like, oh, in 2016. So my story starts in 2008. So I was finishing my master's in math. And so, you know, because it was much earlier in this whole field, again, data science wasn't really a term yet. So it was much harder to just, like, you know, I remember going to like and searching mathematician and getting just lots of math jobs. Like, I don't, that's not

18:23 what I want. Like, I don't want to be an actuary. I don't want to teach math. So what do I care? What can I do?

18:28 Yeah, I had no idea. Well, and I knew there were jobs out there. I just didn't know how to find them. And one of my very good friends had just started the year before working at a company. He's like, Oh, you know, we actually have a department that hires people with math degrees, you should apply. And so I applied, and the interview process was like a two-day thing where they bring you on site and they do a bunch of interviews. And so at that point, by my master's, I had some internships, I had some research projects. So to your point of, like, it's hard when you haven't had a first job before, I think you can have things like internships or, like, you know, that license plate thing, or, you know, any, like, project you can hang a hat on is something you can talk about.

19:03 So I had some of that. That's cool. And I do think actually now it's even easier than back then, with stuff like GitHub and open source. You don't have to be, you know, employed to create a cool project that people can start to like or share or something. Right. So the opportunities are certainly there,

19:18 right. And I think that compared to before, when I was doing it, no one knew what, like, an analytics person needed for skills. Was it math? Was it programming? Now we've really got a much better idea of what you need to have on your resume. And so it was like a two-day thing that ended with, like, a night where they took us to a bowling alley and tried to get us to drink a lot. So, side remark, though, I think there's a lot of ethical... anyway, let me just finish the story. I ended up taking the job. And the job, as they talked about it during the interview process, was like, Oh, you're, you know, an analytics, business analytics team member, you're going to do forecasting, you're going to maintain the models. And I'm like, Oh, cool. I've worked in forecasting before. I would love a job where I help build cool, interesting new forecasting models. And once I was on the job, it became very clear that what they actually wanted me to do was rerun the forecast each month in SAS, copy and paste it into Excel, copy-paste the charts from Excel into PowerPoint, and then stand in front of people and read off the numbers. And that was, like, a huge shock, because that is not what I wanted to do in a job. And it basically took me a year of working there before the job became kind of what I wanted it to be. But also I'd given up at that point and moved on to a different job. So for the people who, when you do get that first job, if it is not what you expect, that's probably, you know, I've talked to a lot of people, a lot of people we interviewed in our book had the same problem. When they get to their first job, they're like, Oh my God, this is not what I was expecting at all. And by your second job search, you have a much better understanding of really what it is you're looking for and what actually exists in industry, versus, like, there's no job in industry that's just writing math theorems or whatever. That's right.

20:47 Well, also, I think, sometimes you can get into those situations, because the company thinks that's how it has to be done, like we've done this. So we need somebody to keep doing that. Like, we only knew how to run Excel, and then export this stuff, and then do this weird thing and then manually fix it up. And then you could tell us like what the picture says, I think that's true

21:05 companies naturally have a tendency toward that. I think at that time, coming out of a math master's, I was like, Oh, all I want to do is create new and exciting mathematical things. I just had, like, an affinity for, like, having to change the world in my job. Yeah. And I think, you know, I have since then in my career done a lot of good by constantly coming up with new forecasts and new methods of doing things. But also, it is totally fine to have a job where you spend 80% of it just pressing go again, and then the 20% doing something interesting, if you like that, right? Like, so, yeah, I think I kind of, like, held my nose up against those kinds of jobs, but I think they're pretty good. I've hired people on teams, and I've had a lot of people who come straight from academia have the same problem I had: like, Oh, wow, you want me to copy and paste each number, like, individually? That could take me 10 minutes. Like, you know,

21:51 it's got to get done. Well, my thought was, you know, maybe the job starts out that way. But then you're like, well, I can do some programming. We actually don't really need to load Excel and copy and paste. I could use something like openpyxl, where I could actually write code that talks to the database, and then runs the report, and then just puts it in there, right? So you could, like, slowly take away these manual steps by starting to create, like, cool pipelines of, like, processing and automation. And they didn't ask anyone to do that, because they thought that was basically impossible, right? And so I feel like a lot of people can end up in these situations, where there's, like, one workflow that you are hired for. But, you know, people who can write code are kind of magicians, right? They can kind of, like, magic stuff into existence. And you can solve some of these problems. And they would probably much rather have a single button click or something that's automatic every day, but they just couldn't create it or put it in place.

22:45 Yep, I think that's true. And I think depending on the size of the company, you may get more opportunities to do that. And it depends on your appetite for, I want to code an interesting thing in Python to try and automate the Excel, versus, I'll just open Excel and hit go for five minutes each day. Yeah, exactly.
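A minimal sketch of the kind of automation described here, assuming the library mentioned is openpyxl and using an in-memory SQLite database as a stand-in for the real one. The `sales` table, its columns, the function names, and the output file name are all hypothetical and chosen only for illustration.

```python
import sqlite3


def monthly_totals(conn):
    """Query the database for (month, total) rows instead of copying them by hand."""
    query = """
        SELECT month, SUM(amount) AS total
        FROM sales
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


def write_report(rows, path):
    """Write the rows into an .xlsx workbook with a header row."""
    # openpyxl is a third-party package (pip install openpyxl). It is imported
    # here so the query step above still runs where it is not installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "Forecast"
    sheet.append(["month", "total"])
    for row in rows:
        sheet.append(list(row))
    workbook.save(path)


def demo_connection():
    """Build a tiny in-memory database (hypothetical data) to run the sketch against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2020-01", 100.0), ("2020-01", 50.0), ("2020-02", 75.0)],
    )
    return conn
```

Calling `write_report(monthly_totals(demo_connection()), "monthly_report.xlsx")` would produce the workbook in one step, which is roughly the "single button click" idea from the conversation.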

23:00 Yeah, I do think that's where the company culture or your manager can be important, though, right? Like, I imagine some companies that, like, have a lot of bureaucracy would just be very uncomfortable with this idea. They're like, No, we've always done it this way. Or maybe, like, companies that are, like, the government, or, like, companies that work with the government. So I do think, like, it's important to be. And then also, kind of the reason we wrote this book was because, you know, we felt there's a lot of technical guidance out there, but not on these other really important skills you need. And I do think, you know, one of those skills is, if you want to change the practice of a company, you can't necessarily just, like, you know, email it to them one day and have that be done. You need to, like, you know, talk to them, figure out, like, you know, what kind of scares them about this change, like, do change management and other things. And I think that's, like, to not underestimate the importance of things like communication and working with stakeholders when thinking of things like technological solutions, even if to you it seems really obvious that, like, Oh, of course, this is, like, going to be 100% better.

23:55 Yeah, absolutely. And Jacqueline, just to make you feel better, I got my first programming job in 1997.

24:07 Because you're farther.

24:10 All right. Well, one of the interesting things that you discussed was that there's this term, data science, but in a sense there's almost, like, three branches of data science. Kind of a little bit like in software development, you'd say, Hey, I'm a programmer, and they're like, Oh, cool, could you build me a mobile app? Like, no, I have no idea how to build your mobile app. I could build you a website. And someone else would go, I can't build a website, but I can build a cool desktop app, right? So, you know, what does that kind of partitioning look like in the data science space? Yeah, who wants to jump in? So

24:43 you know, and this is something that could be, like, one of the more controversial parts of the book, but I think people sort of come around to this. How we divided it is into three areas, which are analytics, machine learning, and decision science. And for example, one company that basically has this division, and Elena Grewal wrote a great post on this at Airbnb, is Airbnb, which does analytics, machine learning, and they call it inference instead of decision science. But the idea behind this is analytics is basically, like, taking data and putting it in front of the right people. So just sort of showing the data that you maybe already have, or maybe going out and collecting it. But basically, just, you know, maybe by making dashboards or showing a report, it's just surfacing data to the right people, which is really valuable. And then the next one, machine learning, is, I think, often what people think of when they think of data science, which are things like, you know, creating the recommendation model, right? When you look at a product, and it says, like, you know, you may like these products. Or at Etsy, we have the search ranking team, which is, when you search Harry Potter, which of the 200,000 Harry Potter items do you show first, right? And they don't pick randomly. There's an algorithm that's based on how the items have behaved historically. And then the final one is decision science. So this is basically going beyond the numbers to help companies or people make decisions, and it also generally involves a lot of statistics. Because basically, we need to understand how to quantify uncertainty. So even though we know, for example, that, you know, we ran a survey and, you know, 80% of people said this, well, we had a 50% non-response rate, and maybe we know that more women than men didn't respond. So how do we adjust for that? What's the uncertainty around this estimate? Making a forecast,
you know, as Jacqueline talked about, like, that's decision science. So those are the three main areas we have, right,
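The survey adjustment mentioned above can be sketched as a simple post-stratification: compute the rate within each group, then weight each group by its share of the target population rather than its share of respondents. This is only an illustration, not the method used at any company discussed here. The counts, the 50/50 population split, and the function name are made up, and the sketch assumes that non-respondents in a group would have answered like the respondents in that group.

```python
def poststratified_rate(responses, population_share):
    """Estimate the overall 'yes' rate, reweighting each group to its population share.

    responses: dict mapping group -> (number answering yes, number of respondents)
    population_share: dict mapping group -> share of the target population (sums to 1)
    """
    estimate = 0.0
    for group, (yes, answered) in responses.items():
        group_rate = yes / answered
        estimate += population_share[group] * group_rate
    return estimate


# Hypothetical survey of 1,000 people with a 50% response rate,
# where women responded much less often than men.
responses = {"women": (60, 100), "men": (340, 400)}
population_share = {"women": 0.5, "men": 0.5}

# Unadjusted rate: all "yes" answers over all respondents.
raw_rate = sum(yes for yes, _ in responses.values()) / sum(
    answered for _, answered in responses.values()
)

# Adjusted rate: each group counts according to its population share.
adjusted_rate = poststratified_rate(responses, population_share)
```

With these made-up numbers the unadjusted rate is 80%, while the reweighted estimate is 72.5%, because the under-represented group answered "yes" less often. The sketch covers only the adjustment; quantifying the uncertainty around the estimate would need additional work such as confidence intervals.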

26:33 our mailing list already is, like, skewed towards this audience, and we just ask the mailing list, Hey, everybody, tell us what you think, it's going to carry that bias forward, or that slant forward, unless we can somehow do more to take care of it and whatnot, right? Yeah,

26:48 exactly. And yeah, anything to add there, Jacqueline?

26:50 Yeah, I would just say that I think a lot of people have like a preconceived notion that like one of these types is more pure, like one of these types of like,

26:58 better. Yeah. real data scientists use Excel. title.

27:03 And it's like, you can see it in Stack Overflow posts. You can comment, you could see it a lot in LinkedIn posts. Like, there's a lot on Reddit. Yeah. Oh, God. Yeah. I said, it's gone Reddit data science, but, like, yeah, there definitely can be this culture of, like, you're not a real data scientist, like, if you don't do machine learning. Just real quick, I'll give you the pitch for why I think each one of these has the right to be great. And like, isn't the best one. So okay. So I think the reason why people like machine learning the best is they're like, Oh, cool, I get to use, you know, real-time inferences, I get to actually help. So, like, a customer, when they go on the website, what actually happens then depends on what my algorithm did. And, like, it's pretty cool to be able to say that, like, I actually improved everyone's outcome. So

27:45 my car drives down the street by itself.

27:47 Yeah, everyone can see that. The decision scientist, you get to be, like, the company's detective, right? Like the CEO, like high level people can come up to you and be like, Yo, I have this question. Can you figure it out? You get to, like, put on your detective hat and go to the data and really try and come up with an answer. So yeah, you get to play detective. And the analyst role, it's great, because with those other two roles, your things could go terribly wrong, right? You could be a detective and not find the killer. Your machine learning model can ruin things for customers, like things can go catastrophically wrong. As an analyst, you're just here to help things, you know, you're helping, you're keeping the company going. It's like a more real

28:23 giving advice, but you're making the decision. Just this is what we found.

28:27 Yeah. So it's like, yeah, it's like it's helping everything run more effectively without the like, incredible amounts of stress of trying to get things right, that the, you know, are trying to build new question, you know, research and development things that you have in the other two fields. So it's like a more of a relaxed, but enjoyable job.

28:43 I'd also say like, so often, there's so much low hanging fruit in the analytics side of like, things that companies aren't looking at, that would really change their decisions, if you just surface these numbers. And plus, like, I think sometimes people can look down, it's like, oh, that's like, easy, like, you're not using, you know, stats or machine learning. Well, it's actually, you know, it can be really hard to like pull the right data sometimes to understand when someone's asking you the question like, Hey, can you Oh, there was a great tweet yesterday, or someone is like, you know, stickler, like, Can you pull this data for me? For me? And you're, you know, and you're like, yeah, sure, let me just pull from, you know, select star from ideal and pristine table that you think somehow exists? And there's actually a lot of words that the true question they're asking.

29:26 So probably underlying all of this is data wrangling?

29:29 Yeah, I think all the people have to do data wrangling. And it's really just a skill, like data wrangling, I think trying to be able to explain what is happening in the data like so kind of the input and output, like you really need all you need that for all three of these jobs. So if you don't, if you're not comfortable taking data, trying to figure out like, you know, put it in a way that you can then use it. And if you aren't comfortable looking at some numbers and trying to say like, Oh, well, this number plus this number really means that any of these three jobs is going to be more difficult.

29:56 Yeah. How much is knowing how to talk to databases Matter like writing SQL queries, or things like that, or can you get away without that

30:05 It really matters. SQL, I would say, the SQL ideas, I have seen show up in every data science job. And I mean, I don't know, I haven't seen every data science job. But

30:14 every one I have seen. There's that one that only works with CSVs. But

30:17 that one, but even if you don't actually work directly with SQL, the idea of taking two CSVs and joining them somehow together, and then filtering out the rows, like, because so much of the data in the world is stored in a tabular format, you really have to understand how SQL and relational databases work. And if you don't actually know exact SQL syntax, that's fine. Like maybe you know pandas, or R's dplyr, or whatever, but just the concept of thinking through tables is, yeah, needed everywhere.
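
To make the "join two CSVs, then filter the rows" idea concrete, here is a small sketch using only Python's standard library. The CSV contents, table names, and the `load_csv` helper are all made up for illustration. The same join-then-filter pattern could also be written with pandas or dplyr, as mentioned above.

```python
import csv
import io
import sqlite3

# Two tiny made-up CSV "files", inlined so the example is self-contained.
customers_csv = "customer_id,name\n1,Ana\n2,Ben\n3,Cy\n"
orders_csv = "order_id,customer_id,amount\n10,1,25.0\n11,1,5.0\n12,2,40.0\n"


def load_csv(conn, table, text):
    """Hypothetical helper: create a table from CSV text (all values stored as text)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)


conn = sqlite3.connect(":memory:")
load_csv(conn, "customers", customers_csv)
load_csv(conn, "orders", orders_csv)

# Join the two tables on their shared key, then keep only the larger orders.
query = """
    SELECT c.name, CAST(o.amount AS REAL) AS amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE CAST(o.amount AS REAL) >= 20
    ORDER BY o.order_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Customers with no orders drop out of an inner join like this one; keeping them would need a `LEFT JOIN` instead.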

30:44 Yeah. Emily, what do you think about that?

30:45 Yeah, I would definitely say it's, it's one of the foundational skills. And the good thing is like the basics of SQL, you can pick up pretty quickly, like, just like how to select from a table, and then you know, you can grow as needed, you know, maybe if the data engineers helping you out, but you know, of course, if you can't, if you can't access any data, you probably can't do much data science.

31:04 Yeah, that's a really good way to put it.

31:06 But also, that's not a hard skill. I mean, it's not really a hard skill to learn, like, yeah, it

31:10 seems weird and hard if you've never seen it, right? Like, how do I connect to it? This connection is really complicated. Yeah, but you're right. It's not a big deal. It's just something you got to learn. Now, I guess, thinking of these three different types, it's one of the things that struck me, and you pointed out one, there's two things. One was that the machine learning roll is probably a little more computer sciency. Because you're taking code and you're putting it into production, and it's real time, you're probably fitting in with API's that other people are talking to, and you're building stuff that machines talk to, is that accurate? What do you think I would say about the machine learning as more computer science? Yes, 100%, you do really need to understand like things like unit testing, or load testing in ways that the decision scientist and the other roles don't necessarily need as much right HTTP status codes in JSON? Yeah, potentially. Right? Yeah,

32:02 the risk of the machine learning engineer is that that actually becomes the risk, the risk is, if you're not careful, your job could just become software engineering. I know a lot of machine learning engineers who Well, the company doesn't have that much machine learning engineering to do at the moment. So you're just gonna be a software engineer, and then that's not great. But the converse is, as a decision scientist, you have much more stats, and like just building the actual like models, but if you don't have the work to do, as a decision scientist, there's not reports, you know, not super interesting models to build and questions to answer you might end up just doing dashboards is something that, you know, like any of these jobs kind of have a risk of falling into something you don't like, just a question of which way does the rock fall down the mountain or whatever. That's a real metaphor, but

32:44 some mountains. So another thought that I had, while we were talking about this is different people in these different groups will have massively different exposure to like the C suite. They're the decision makers of the company at a high level, I am thinking of a large company like 500 people or more like a startup. But you know, the analysis person could easily get called in from, you know, for like a board meeting to help them decide, you know, how are things are going? Maybe the decision scientist, it's, it's not so likely the machine learning developer, it's like, well, they've decided, and then you were told you're going to build this model? And here's what they're hoping for. Right? It's, it's a different kind of, you would still be working with a lot of technical people, but you have, like, different ways to grow within the company, I guess. Is that a good way to think of it?

33:31 Yes, I think that is absolutely the case that if you're in this, if you're a analyst or a decision scientists, then you're much more likely to get to go to a CEO, like go in that meeting and show some interesting data that proves something. If you're a machine learning engineer, usually you are building a product, like Emily was saying, like you're building a recommendation engine, and then there's some product person whose job it is, must be in charge of that product and make up to go and have to see

33:54 you only go to the C suite, if you're gonna be like raked over the coals, because you wrecked it with your machine learning recommended.

34:01 But that being said, I know, a lot of people who are sufficiently technical are like, Oh, I wouldn't want to do decision science, I really want to do machine learning, because I don't want to have to deal with like, convincing people, I just want to have to do with cool data modeling, or whatever, you know, machine learning modeling. But it turns out that as Emily was saying, to do those jobs, well, you still have to be able to talk to the software engineers, and the data scientists who built the model, and the product person who needs to know if the recommendation is going to be good enough for the customer. But you still have to do lots of talking to be good at it. It's just that it is less of a core tenet than it is perhaps some of the other roles.

34:32 Yeah. How does this affect early stage careers, right? Like I can, I can see somebody who, like Emily, in 2017 just came out of a boot camp, and they said, okay, you're gonna go talk to the CEO of Etsy and the board and like, help them with this product. You'd be like, Oh, my goodness, like, what have I gotten myself into, like, on one hand, be awesome, but also terrifying. Do they fit better at different stages of careers or that really matter?

34:56 I think it probably doesn't matter as much, because, like, for a company that's big enough for that prospect to be kind of terrifying... Like my last couple years at startups, I talked to the CEO all the time. Sure, basically just another coworker. So yeah, for a larger company, you're probably gonna have more senior people, right? So if they are gonna have someone present to the CEO, it's probably not going to be the person who joined two months ago. Yeah. Also, the other thing we didn't really talk about is that how much you're specialized into one of these roles does depend on the company. So often that's the company size, the maturity of the data science team, right? So at certain companies, you may be fully a machine learning engineer. But if you're the first data scientist at a startup, you're probably doing a mix of all of these. And you won't go as in depth in any one of them, right? Like, a startup probably doesn't need someone who can handle hundreds of millions of recommendation items like Amazon would. You don't need that compute power. But maybe you build a simpler recommendation model. And then you also do detective work. And also, no one actually knows what the sales numbers are, so you finally make a dashboard.

35:52 Right? You probably do a lot of growth work at this early stage startup, a lot of A/B testing type of work.

35:57 Yeah, exactly. So I don't want to make it seem like, oh, every role falls into one and only one of these. Because you certainly can have roles where you're putting on multiple of these hats. I would also

36:06 say that not only does it depend on the company, where you may do multiple, but also you can change during your career. I didn't do any machine learning up until like two or three years ago. And then I switched over to doing that, and now I kind of do both. But lots of people switch in lots of directions between any of these three jobs. And that is a thing that is possible to do. Yeah, sure.

36:25 Yeah. The chapter one interview, Robert Chang, who is over at Airbnb, is a really good case study in this. So he started more on the analytics and decision science side, working at Twitter. He then continued that work at Airbnb, and then he ended up switching over to do machine learning. And he actually has blogged about this. And as part of that process, he did need to add the skills for it. So for example, he had previously done most of his work in R, but on the teams that did machine learning, a lot of the libraries were built in Python. So he actually has a repo where he put his deliberate practice for Python and how he was going to learn that over a couple months, so he could make the switch. That's cool.

37:00 Yeah, you can definitely switch to me, I've definitely made big switches in my career as well, from like, being terrified to the web, only working on the web and stuff like that, as well.

37:08 Yeah, and I would just add outside, I would just add on that, you know, I've talked to a lot of people who have wanted to switch and have trouble, because these jobs are a resource. And they the company has finite amount of them, right. So there's some companies where they just don't have any machine learning, engineering. And so if you really just would love to do machine learning engineer, you're gonna be in trouble because it just none of those jobs available. Or, as Emily points out, maybe they have a couple of them, but like, people who are super senior already working on them. And some companies do a startup and like they have way too much work, they could possibly do any of it, you know all of that. So you can do kind of have a lot of freedom. And so sometimes if you want to make this transition, and you're finding it difficult, and you need to switch up

37:45 Yeah. This portion of talk Python to me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise grade hardware, S3 compatible storage, and the next generation network, Linode delivers the performance that you expect, at a price that you don't. Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40 gigabit network, industry leading processors, their revamped Cloud Manager, root access to your server, along with their newest API and a Python CLI. Just visit talk slash linode. When creating a new Linode account, you'll automatically get $20 credit for your next project. Oh, and one last thing, they're hiring. Go to slash careers to find out more. Let them know that we sent you. Speaking of companies, in your book, you have a really interesting conversation about different kinds of companies. And I've been fascinated. I've worked at almost all of these different types, early stage startup, late stage startup, probably mass quite Yeah, let's go with massive tech company but not a government contractor. I've worked sort of subcontract with EMS. I've worked in most of these, and a lot of those experiences are not really obvious if, say, you're in a boot camp, and you're just looking for a job, and you haven't been through the internals of these things. So maybe you could give us a flyover of the five different types of companies and maybe a little bit of example about each: what's the team like? What's the tech like, and what are the pros and cons and so on. Sure,

39:32 we realized very quickly that when we were writing our book that like we didn't, you know, we needed some sort of way to help people understand like, what is the actual job like, and then we're like, what, it really is so different depending on which company you're at. And so Emily and I kind of brainstormed five different companies we worked at, and then we kind of came up with goofy alternative names for them. But if you think look at art.

39:58 I love that you have Like a actual little custom logo?

40:06 Yeah. And I thought about what fonts to use with which company?

40:10 Yeah, it was, well done.

40:11 So the five companies. So we have MTC, which is like your Google, your Apple, your Microsoft, a company that's just a giant tech company. So they're rich, they're so big that each part of the company uses a different type of tech, you know, so they have lots of advanced stuff. But because they're so big, your stuff may not link up with, you know, if you're working at Google Maps, you may have nothing to do with the Google self driving car sort of thing. The second company is HandbagLOVE, which is just some company that's like a retail company, you know, like a Nordstrom, a DSW, one of these companies that is big, they've been around for a while, they use data science, but that's not like their thing.

40:46 But they're not a tech company.

40:48 Right? Right. And so I really like working at those kind of companies, because you got to, like go in and really do a lot because no one's there to tell you Oh, you can't use Python, you have to use our whatever.

40:57 Yeah, exactly. There's no like, let me talk to other software developers, there are no or other. There are none. like okay, well, I can just, these are the problems, please solve it with technology. These are the requirements, right? Yeah, there's no

41:10 rules and restrictions. And so then we have this Seg-Metra company, which is like some company with a hot new idea for a startup. And, you know, it's really just like a classic startup where there's so many things that need to be built at once that everyone is just kind of in a constant panic attack, because they'll do whatever you want. So it's a lot of fun and exciting. Then there's Videory, which is like, imagine, what's that company, Vimeo, the company that's not YouTube? Right. So some company that's, yeah, you know, it's a tech company. It's, you know, decent size. It's not huge. So everyone knows each other. Right? Maybe Zoom even?

41:39 Yeah, like, yeah, zoom could be something like that. Right?

41:42 Yeah. Yeah. And then lastly, I forget what I call it some gas. So it's basically like some giant government camio

41:46 aerospace or

41:48 something like that. And it's basically, think of your Lockheed Martins, your Boeings. When you talk about data science, people usually don't think about these companies as often. But they have tons of people like that, especially analysts, these companies run on that. And because these kinds of government contracting companies are massive, they've been around for a long time, and they really don't want to make mistakes, because that can cause a lot of damage, everything moves a lot slower. There's a lot more bureaucracy. It's more of a relaxed job than working at, like, a startup.

42:13 Yeah, sure. All right. So which one? Have you worked at the massive tech company equivalent?

42:18 I don't know if I should say I consulted for a massive tech company equivalent. I'm not

42:22 asking which one just like, but you did what Jacqueline, that was you? That was Yeah,

42:26 I guess I worked out like this.

42:28 So the reason I'm asking because I want to ask you what you'll for your take on it. Right? Like what is the team like, Oh, what is the tech like? And so yeah, right. Don't name names. Okay.

42:38 Okay. Yeah. Actually, I realized actually consulted for a couple of times. I'm not incriminating anyone. Anyway. So when I consulted for these companies, they're like, because they're so big. They're so big that, you know, they may have this, like, big onboarding process that everyone goes through, but it has nothing to do with your actual job, because the company is too big to do that. And then we got it. Right, right. It's like really specific. I

42:59 recently started working with some company like this. They were working with the podcast, right? They were doing some ads and stuff. I had to go through and like sign a waiver that said nobody would climb on a ladder in a dangerous way. Yeah. It's a podcast, are you gonna give me audio? Like there's no ladder?

43:18 a ladder in your background?

43:21 Actually, that maybe this is just like, it was like, the warehouse person and the like, contractor who does podcasting or whatever, like it didn't, you know, they wanted to run an ad. So I had to go through this like, weird process. It was bizarre.

43:34 Yeah. And so the cool thing about working at this company is they have tons of money. And they're really excited about technology. So if you're like, I want to buy this expensive thing and try building a solution using that people are generally like, sure, whatever, it's fine. The bad thing is, this is true for everyone else as well. So when you're Yeah, you're a product A is trying to link up to product B, you might struggle a bit. So there's just a lot of this kind of lots of tech, lots of money, high salary is not necessarily everything working in sync. They have to deal with Yeah,

43:58 you probably get to work with a ton of smart co workers. Mm hmm. Yeah, it's a bit of a bonus and a curse, right? It's hard to stand out probably. But it's it's also great to have that support. Right? And

44:08 if you're a person who really likes learning from other people, and like having direct mentorship, you are this is one of the best companies to get that out. Because Yeah, like this company just draws people who know a lot of tech like magnet like a magnet.

44:18 Yeah. So like one thing, what we did is because you know, maybe the case, like you're looking for jobs, and like, you know, it's easy for you like using, you know, find a Google you're like, Okay, that's a massive tech company, but maybe you you find a job and you're like, well, it doesn't really fit into any one of these five things, is like one thing we do at the end of the chapter is we pull it together, okay, like what are some of the vectors that the companies differ on? Right? So mentorship bureaucracy, like the tech stack, so even if you find one, you know, you have a company that's on one of these five archetypes. You can sort of go through those things you say like okay, well, like it's a huge companies like probably there's a decent amount of bureaucracy, I would be the first data scientist there's not going to be a lot of mentorship. And so you can think about these different pieces. And you know, people have different preferences, right like some folks really I've talked to people Really love. Usually we don't, it's, I wouldn't recommend it for someone's first job. But people want to be the first data scientist at the company, they want to get to build everything. And then there's some experienced data scientists who are like, I would never want to be the first to the only data scientist like, I really like working on a team. So it's sort of like, you know, one of these is like, you know, oh, everyone, you know, it's always bad to like, have these certain things. But it's just different criteria that you can think about and reflect for yourself, like, what's important to me? What am I looking for?

45:25 Yeah, what's the fit? Alright, so HandbagLOVE. Who wants to talk about that one?

45:30 I can do that one, too. All right. So this guy's talking about before. So like, there's like a retail like, let's call it like, if it's Yeah, I got like Nordstrom footlocker, one of these companies, that's like a retail company. The cool thing about this is they have a very real product that they've been selling for a long time and understand what they are doing. So like you add a lot of stability there. And by adding on data science, these companies are like, Okay, well, let's try and use data to improve the product recommendations, improve the product improve, improve our understanding of things, you know, like, use data to answer questions. So you get a lot of, there's a lot you got to do as a data scientist, you have a lot of leaves use intuition, and now they're gonna use data. Yeah, that right? Yeah. And so that's outside. So downsides are you don't have as much money, because you're not like a rich tech company. Your tech isn't as good because, you know, you just don't care as much about getting the best of the past. Like, you know, older tech is generally fine. And, you know, just as you're talking about, you generally have fewer people who can like mentor you, like there'll be someone there. But you know, there'll be people there generally. But it might be that everyone know, everyone's using, like a far outdated Python library, because no one knows the new way. And no one's reading up on it. So Exactly.

46:32 They're still on Python two, yeah. Something like that. And then early stage startup, what's the story for data scientists there?

46:42 Yeah, I can talk a little bit about this. So yeah, what data scientists like you come in, and you really get to shape everything. So like, there's some negative parts. So even beyond the data science part, right? Like, you might show up the startup, and they're like, Oh, we don't have your laptop yet. So it's sort of a funny thing is like, there's that but there's also more freedom, because they might ask you like, hey, what kind of laptop do you want? Like if they're like a decently well funded startup? And you're like, Oh, I want this really souped out laptop?

47:04 You don't get that super slow, clunky one with a huge company, bear. Yeah.

47:10 Yeah, exactly. That's a mixed bag. But yeah, often you're like a segment, there's a lot of low hanging fruit. You also may have to do some data engineering, right? Like, maybe there aren't any data engineers, and all of the databases are optimized to, you know, serve the website. And so it takes you five minutes to, like, get a count of an 800,000 row table. Yeah. So you have to wear a lot of different hats, and you might be pulled in a bunch of different directions. So it's also really important to be able to prioritize, to not just be firefighting, but also take some time to, for example, build up some skills, like to build up your toolbox. So, okay, maybe write a library for yourself that's a wrapper around pulling the data, so that becomes easier.
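
As a rough illustration of what "a wrapper around pulling the data" might look like, here is a minimal sketch. The `DataPuller` class, its methods, and the `sales` table are hypothetical and not from the book or the episode, and SQLite stands in for whatever database a real startup would use.

```python
import sqlite3


class DataPuller:
    """Hypothetical helper that hides connection details behind simple methods."""

    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        # Return rows that can be converted to dictionaries keyed by column name.
        self.conn.row_factory = sqlite3.Row

    def query(self, sql, params=()):
        """Run a parameterized query and return a list of dicts."""
        cursor = self.conn.execute(sql, params)
        return [dict(row) for row in cursor.fetchall()]

    def row_count(self, table):
        """Count rows in a table, checking the name against existing tables first."""
        # Table names cannot be bound as query parameters, so validate them explicitly.
        known = {
            row["name"]
            for row in self.query("SELECT name FROM sqlite_master WHERE type = 'table'")
        }
        if table not in known:
            raise ValueError(f"unknown table: {table}")
        return self.query(f"SELECT COUNT(*) AS n FROM {table}")[0]["n"]


# Usage with a made-up in-memory table.
puller = DataPuller(":memory:")
puller.conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
puller.conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-04-21", 10.0), ("2020-04-22", 32.5)],
)
daily = puller.query("SELECT day, amount FROM sales WHERE amount > ?", (20,))
print(daily, puller.row_count("sales"))
```

The point of a wrapper like this is that the connection setup and result handling live in one place, so each new question is just a query string rather than another copy of the boilerplate.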

47:52 That's a really important point. Because I think a lot of these I've worked in places like this. And it's nobody asked you to build a proper library for data access, they help you, they ask you give me this answer, or make this product or give me this thing, and you're like, Yeah, but we really need this thing in place. And I, somebody's got to build it, it's gonna be me or the other person I'm working with. And you just kind of have to be willing to put in that infrastructure along the way, right? Because you're gonna appreciate it later. But there's no guidance for that. Right? And it's definitely not in place, usually.

48:22 Yeah. And you have to, like, help teach people how to ask questions, what is possible, bringing in best practices. So like I was saying earlier, I really would not, most of the time, recommend this for someone's first data science job. But for an experienced data scientist, I've found some people who really, really love doing this, because they're like, ah, I don't have to deal with, you know, the decisions of past data scientists, I get to shape this in my vision. And I could use the most modern tools. For example,

48:48 you want to use Python, you can. You can even use R. Like, no one's gonna... there's no one there. They just need answers. Yeah,

48:54 you could use F sharp, right, Jacqueline? Yeah. I got

48:57 a lot of a lot people make fun of me, because my favorite programming language no one else in the world.

49:02 Well, I did see that Jupyter notebooks now support F sharp. So that's still my heart. That's awesome. Yeah, but I mean, like early stage startups, I would say also HandbagLOVE companies a little bit as well, because they may have some tech stack, but it might be so outdated that they're like, you're new, we want to go in a refreshing direction where you can go this other way. We're not going to make you use this old thing. We're going to try to get, you know, something new growing here. So you can have some flexibility as well, I think, yeah. What about Videory, the later stage startup?

49:36 Yeah, for my preference, I think this is kind of a sweet spot. Because, you know, it's sort of in the middle of a lot of these things, right. Like, there's some bureaucracy, but I kind of like bureaucracy sometimes. Like HR has their stuff figured out, that's nice. There's, like, benefits.

49:51 There actually is vacation. Well, yeah,

49:53 exactly. You know, so there's usually a team of data scientists, but since it's still like a startup, you know, you don't have a 40 year old tech stack, right? Like most decisions were made five or 10 years ago, if that. Yeah. So I think this can be a nice fit. You still get to, like, know everyone on the data science team, unlike if you're at a massive tech company, and have support but also have some structure in there as well. There's probably data engineers and other people to help out with data science adjacent problems. Yeah, you're

50:23 probably a little more locked into a tech stack.

50:25 Yes, that is true. Yeah, I don't think you can really, like do a tech stack from start until you're locked into certain decisions, you know, and there may be sometimes you're like, I wish I have a time machine and like, go back and like, fix this decision they made a while ago, right? Like, you're at an early stage startup, you can be like, Alright, we're gonna start, like collecting data right away, you know, we're gonna log everything. And then if you're at like a, you know, later stage company, they're like, oh, like, why don't we like look at the state? And you're like, Oh, actually, we weren't collecting that a year ago. They're like, okay, but forecasting model anyway. And you're like, Oh, no,

50:55 no, you don't understand how uncertain this answer is going to be. Yeah. And then I guess the last type of company that y'all covered was the government contractor, the Lockheed Martins and Halliburtons, and so on.

51:08 Yeah. And I should also mention, this includes the government itself, right. Like, if you work for the government transportation or something like that, or just, you know, companies where there is for legal reasons, there's just a lot of regulation, a lot of things like, you know, keeping things moving a little slower. And so these kinds of jobs, they tend to, they have to tend to have lots of people who are not data scientists, and you tend to have the data scientists maybe embedded in little groups about so like in the missile department, or whatever, you know, I don't know. And so, because of that, you generally you don't have mentorship often, but you often don't have as much people telling you, no, you can't do it that way. You're wrong. You may have it like, Oh, no, we don't support Python past 2.7. Because our, you know, our procurement department has it clear and whatever. So maybe if your accuracy, but there isn't like, Oh, you have to you know, like there's, there's just not as much of like a standardization around tech, just because there's, you know, that's not the focus of the company. And so these kinds of jobs, I'd say are really, they're really great. If you want a job where you go in each day, you work eight hours with a 45 minute lunch in there. Yeah, got a little bit of stuff done, but you don't stress crazy about getting it the most popular can. No one's stressed about you getting exactly the most, right. So there's not like, you know, if you're a job where you're like, I'm going to go in, I'm going to be the TEDx data scientist, I'm going to rock you know, get my career to be a rocket ship up to the C suite as fast as I can. Like, this is not the kind of company for you. It's the kind of company that's for more for people who are like, I just want to do a consistently good job, I think go home and take my paycheck and spend it on something I enjoy.

52:32 Yeah. And don't need a lot of perks. That's good.

52:35 Yeah,

52:36 yeah. cuz I've talked to people who was like, yeah, it's like, especially if you look at some of these tech companies, right? And like, I don't know, Airbnb and like Roseanne Taff, or something like you, you're lucky if you get coffee at some, like?

52:48 Yeah, that's for sure. I think another thing that is interesting is so many of these types of companies are driven by, like, government contracts or projects. I'm thinking of, like, DARPA funding, and, like, here's a project that is guaranteed to run for one year, and then it may immediately get canceled, no matter what, right? So you have these sort of long time horizons of working on something, but it could become a totally different type of job, because some other contract was won and this one expired, or whatever.

53:19 Yeah. And I think, I would say, besides government contractors, you can imagine there are some other fields that might kind of fall into this. Like, certain parts of healthcare might kind of fall into this area, or definitely, you can imagine, parts of finance, like, you know, rules around financial risk regulations might kind of have some of these components too. But it's more the archetype of the company that's got a lot of regulations or reasons why it has to move slowly and not break things. Yeah,

53:43 interesting. I think that's a really cool list you put together, and I agree with a lot of your assessments there. So pretty neat. Now, we've been talking forever, and we've just barely touched on the stuff that we could talk about, because this is such a great book and a great topic. But just for the sake of time, let's talk about one more topic and maybe blend this together. So let's talk about getting the skills, becoming a data scientist from wherever you're starting, and then also maybe just real quickly building a portfolio, because like I said at the beginning, I do think having that first job is super important. And getting that first job is strongly influenced by just having something you can show: you want me to do this? I've already done it. You don't have to verify if I can do it. Look, this is it. Just look at it. Then it's, you know, is it a personal fit or a salary fit or whatever, right? So let's start with getting the skills first. Okay, yeah, earlier you were talking about a Master's, and Emily, you were talking about a boot camp. Those sound like two different paths to me. You didn't necessarily study programming, right? You kind of went the math side, which actually I did as well.

54:41 Yeah, I can cover that. Let me talk about all the different ways you can get skills, and then Emily can talk a little bit about the portfolio, because that may or may not have aligned with the chapters. I mean, who's to say? I would never reveal that. So we think there are, like, four ways you can get data science skills. One is getting a degree, and usually that's people who go and get some sort of master's degree, in data science, or maybe computer science or math, something like that. And the degree is great in that, if you don't have that much of a background, you will spend two years doing it, so you should learn the basics of what you actually need, right? In a data science degree, you should learn data science skills, and you may do some projects during it. The downside is it takes two years and, like, that's so much money. Yeah. A boot camp takes 12 weeks and, like, 15 grand. That's much faster, much cheaper. And the whole point of a boot camp is to get you what you need as quickly as possible. And I feel, to me, like boot camps almost do a better job of connecting you with a job afterwards than, like, a master's program. Yeah, I think that's true. And generally I would recommend boot camps more, except that for boot camps you really need some sort of background already. Like, you need to have some idea of programming or, you know, some knowledge of this kind of field already. If you don't know anything about data science, that might be 15 grand and then you're still just kind of confused. The third option is you could try and find data science work within your job. If you're an analyst and you want to do more decision science, you can try and find places where you can do decision science in your analyst job.
If you're a decision scientist and you want to do machine learning engineering, you can try and find places where you can do software engineering. So you can kind of try and learn within whatever your job is. What if you're a scientist who kind of does a little computation, and you kind of want to drift towards the data science side? Yeah. So I actually know someone who is doing that very thing. Yeah, she's a scientist, she takes measurements, and in her job she has started to use R to actually make plots and do the kind of investigatory stuff. That's totally been working for her, and it's great. And then lastly, you can teach yourself, right? There are all these courses online, you can work on your portfolio, which we'll get into, and teaching yourself is great, because it's free, you get to focus on the stuff you care about, and yeah, if you can motivate yourself correctly, you can really learn a lot that way. I've learned a lot this way. The downside is that it requires an immense amount of discipline, right? If you try and do all your learning online, you have to actually do those courses instead of playing Animal Crossing.

56:59 Not that I play Animal Crossing. But no

57:01 Jacqueline's just calling me out.

57:05 And you know, you don't know if you're teaching yourself the important things or not at some level, you don't have a mentor when you're teaching yourself. And that's a problem.

57:11 Yeah. Also, I feel like sometimes when people are trying to teach themselves, they try to boil the ocean, right? Yeah, yes. You know, like, well, I saw this and this and this, so I've got to know all those things. Like, no, no, you just want vertical slices, not horizontal. Yeah. Like, figure out what you're trying to build, and learn what you need to build that, shallow or deep in these areas. And then go from there, like, iterate, right?

57:31 Yeah. And I think a similar problem to that is when you're teaching yourself, there's not, like, a natural stopping point. So in a Master's or boot camp, they end, and then you're like, I guess it's time to find a data science job. Versus if you're teaching yourself, it's so easy to be like, well, I can't apply to be a data scientist yet, I haven't learned this thing, or I haven't learned this thing, and just make

57:49 that right. Yeah.

57:50 Which I think is, you know, data science is a career where you're always going to be learning. And so it's not like you get done. Like, no one knows everything. So you don't have to feel like, okay, I must master, you know, the whole world to be able to get a data science job. Yeah,

58:04 yeah, I hear you. Let me put data science a little bit on a pedestal here. So I feel like as a developer, you can build web apps, work with databases, whatever, and you can totally do a quick boot camp, you can pick online courses, read books, teach yourself. I do feel like those are skills you can mostly get yourself. There are, like, painful lessons you have to learn, but I'm not sure you'd learn those in school anyway. But with data science, I feel like there is a level of statistics and, like, scientific understanding, a little bit of math, that I think is a little bit harder for people to just get on their own. So having some formal training in the background seems more important for data science than pure development,

58:44 I would guess. I would probably agree, and not because the actual statistical and machine learning models you learn as a data scientist are somehow harder to learn than software engineering, but because those fields are so confusingly laid out. Like, what is considered statistics versus machine learning versus industrial engineering, these are all extremely poorly laid out. The people in those fields make them as confusing as possible to make it seem like only they understand it. And there's not really an easy pattern. There's not really one book out there that's like, oh, this thing in statistics is actually that thing in computer science, they're the same, and don't worry about that half of statistics, it doesn't super matter. Like, that's not easy information to find. Yeah,

59:22 yeah. I do think, though, it also depends on what type of role you want, right? So I think it's a little bit less important in an analytics role, for example, to have that background, and there's certainly, you know, people who can still learn some of this on the job, whether it's through mentorship or reading books or other things. But I agree, there is a bit of a danger, because I feel like, especially with statistics, if you run a statistical test, it'll generally spit out an answer, but it may not be answering what you think. Versus, right, it's a little more obvious sometimes with development work, like, oh, the website didn't load, so I guess I have to figure out what went wrong. So there's a bit more of a danger there. Yeah, you're

59:58 always gonna get a number from that library, from those algorithms, right? And you have to understand what it's doing. Whereas, like, that's kind of what I was saying, it's really clear if the website is letting the user log in or not. Yeah, there's not a huge debate. Maybe the security's not quite right. There are details you've got to get right. But generally it works or it doesn't.
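
To make the point above concrete, here is a minimal sketch, not something from the episode: a least-squares line fit written with only the Python standard library and run on made-up data where the true relationship is a curve. The function name `fit_line` and the data are assumptions for illustration. The fit happily returns a slope and an intercept even though a straight line is the wrong model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data with a perfect but non-linear relationship: y = x squared.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)

# Residuals are what is left over after the line is subtracted from the data.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("residuals:", [round(r, 1) for r in residuals])
```

The numbers come back without any warning. Only the residuals, which are large and positive at both ends and negative in the middle, hint that the linearity assumption does not hold, and you have to know to look at them.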

01:00:16 Just 'cause I'm so upset about the point I made previously, because I think the point's right, but I'm getting upset thinking about it. It's like, so a linear regression or logistic regression has some built-in assumptions. If you're in a CS department and you're like, I used a linear regression to fit this as part of the neural network, CS people are like, fine. You do that in a stats department, and they're like, how dare you? That's so incredibly wrong, right? You violated the assumptions. And it's like, well, these are two trained academic professions saying two totally different things. And I think that is something you get all the time in data science that you don't get as often in software engineering. Yeah, it's exactly that kind of stuff I was thinking of, yeah, the two things you both mentioned. Alright, let's close out our conversation by talking about getting a portfolio, maybe mixing in a little, possibly, contributing to open source as well or something. Emily, do you want to give us a rundown on that?

01:01:02 Yeah, absolutely. So the idea behind a portfolio, and this is especially helpful for people who don't have a formal education or, you know, haven't worked in very similar jobs or been able to learn on the job, is that, as you were talking about, this is a way they can show they can do the work, even if they haven't had an opportunity in school or at a company. So for a portfolio project, what we really recommend is doing something original that you care about. Because, you know, one thing people might default to is, like, I'm going to go look on Kaggle, and I'm going to find one of the data sets they have, and I'm going to do this competition, where they give you a dataset and they, you know, tell you to predict this thing. And the problem with that is, one, it doesn't really show your personality; two, it skips over the steps that are really critical and that you'll need to do in data science roles, which is gathering the data and figuring out what question to answer. And also, honestly, if a company sees that in a portfolio, you know, maybe they're worried that, oh, they just copied someone else's code, right? Like, this is a problem a lot of people have worked on. So we recommend, you know, figuring out a question you're interested in answering, or finding a data set that's interesting to you and exploring it to figure out, okay, what are some interesting findings I can get from that? And then putting that together and sharing it on GitHub, so you have the code with a README that describes it. And then ideally also having a blog, because a blog is really great. Someone may not look through, you know, hundreds of lines of your code, but they might be like, oh yeah, let me read about what they found, and look at some visualizations, or read a tutorial that they wrote, because they used natural language processing for this project, or even

01:02:33 just look back two years. And see they've been doing this for as long as they said they have been or something like, Yeah,

01:02:39 exactly, exactly. And so Jacqueline shares the example project that she did, which is training a neural network on offensive license plates. So I do want to emphasize, it doesn't have to be, you know, something very serious, or if you want to go into finance, it doesn't necessarily have to be a finance project. Because, you know, you can be like, oh, I used neural networks. Or, like, one of the projects I did was I built a dashboard, so that shows I can build a dashboard from scratch. So I really think this can be a great way to show off some of your personality, your coding skills, your communication skills with the README and the blog, and maybe even demo it in an interview. So when I was doing the job search after graduating from boot camp, I'd bring my laptop, and sometimes I would show this dashboard that I had built. And I'd be like, look, you can filter it, and you can click around, and if you click this, it goes to a link. And I think that made it much more real to them than if I was just talking about this theoretical project.

01:03:29 Absolutely. That's awesome. That's really good advice. Another thing I think would be valuable, if people are in the right place with the right background and whatnot, is to maybe contribute to some project that's relevant in the data science space, right? Like, if you have two people you're interviewing and one's like, well, I'm pretty good at using Jupyter, and the other person's like, I had two PRs merged into Jupyter, and I actually know some of the people on the team, you know, a little bit, who work on it, it's like, okay, I know I'm going to talk to you a little bit more next. You know, it's just a different level of credibility, even if what you did was, there were no unit tests for this part of the library, so I wrote some unit tests, or I worked on the documentation, or I worked on a tutorial. Like, it doesn't have to be, I rewrote the main thing, right?

01:04:11 Yeah, absolutely. And we have, I think it's chapter 14, on joining the community. And one of the things we talk about is contributing to open source, and exactly what you said, it can be, you know, writing new documentation, even fixing a typo, just these ways to get involved. And I do want to emphasize that, you know, this isn't something that is required to get a data science job. I know a lot of data scientists who don't have a GitHub with personal projects, who don't have a blog, who don't contribute to open source, and they're still excellent data scientists. But it's just one of the ways that, one, hopefully it's fun; two, hopefully you learn something, that's the other big point of the portfolio project, it's a great way to direct your learning, like, you find out, oh, I need to figure out how to scrape this website to gather the data, so let me go learn web scraping; and three, it can help you stand out in interviews. But it certainly, you know, I don't think it should be a requirement for any job, for example.

01:05:02 Yeah, right. I agree. And at different company archetypes, they probably care or completely don't care about this, right? Like, the big aerospace contractors, they're probably like, okay, great, we don't know that we trust you if you're writing code just for open source, that might be weird, right? Whereas the startups are like, oh my gosh, that's so amazing, you know? Or the big tech company: we're trying to move to open source, that's great, be one of our advocates. So yeah, I suppose it probably varies a lot as well. Yeah. All right. Well, I would love to talk more about this, because there's a ton of cool ideas you two put in there. But I think we have to leave it at that. Let me ask you the two quick questions before I let you out of here. If you're gonna write some code, do some data science, data analysis, what editor do you use these days?

01:05:45 I use RStudio, although, as one of our development goals, I'm actually starting to use Vim as the editor within it. So I'm trying that out, but yeah, I've been using RStudio for four years now. Although I also heard, what is it, Visual Studio, like, now supports R, and one of my teammates was trying that out and really liked it. Yeah,

01:06:03 probably VS Code. Yeah. VS Code. Yeah. And Jacqueline?

01:06:05 So I'm a 50/50 split between RStudio and Visual Studio Code. So RStudio for anything R related, and for literally anything else, including just notes to myself, Visual Studio Code.

01:06:15 Yeah. Awesome. And then notable libraries out there for data scientists, not necessarily something super popular, but like, this package is really awesome, people should know about it.

01:06:23 Do they have to be Python libraries? No.

01:06:30 More data science topics.

01:06:32 cross border? Well, I will mention one of the libraries that I created when I was consulting for T-Mobile. It's called loadtest, and it's for R. Me and the T-Mobile team made it, and it's to help you if you're making an API and are using the R library plumber, which is great. You can use the loadtest library to test it to make sure that your R model will be able to handle the load. Okay, awesome. Yeah, very cool. Emily?

01:06:58 I have so many. Now I'm wondering if I should, you know, say my own package as well. Do it. I'll briefly share: I use it less now, but I used it a lot at my last company, funneljoin, for analyzing sequences of events. You're like, alright, who came to the website? What percent of people who visited the homepage then bought a subscription? But what about if we want that within two days? So that's one. But another package, there are really so many packages that I like. So one thing that I'm very excited about, which is sort of hot, it's been in development for a while and in pieces, is tidymodels. So it's a rethinking of how to do modeling in R, with a brand new website out now, which is tidymodels.org. So I'm excited about that. And then finally, the janitor package is a fun one if you're cleaning data. It just has all these functions for, like, you import a data set, and there are spaces in the names and weird capitalizations and weird characters that make it hard to work with. It has a function, clean_names, and it'll just fix all those for you. Oh, that's cool.
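
As a rough illustration of the column-cleaning idea described above, here is a small standard-library Python sketch. It is an assumption-laden approximation, not janitor's actual implementation, and the helper name `clean_name` and the sample column names are made up. It lowercases names, splits camelCase, and collapses spaces and symbols into underscores.

```python
import re


def clean_name(name):
    """Turn a messy column name into lowercase snake_case (illustrative only)."""
    # Split camelCase boundaries: "signupDate" becomes "signup_Date".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result.
    return name.strip("_").lower()


# Hypothetical column names of the kind you get from a spreadsheet export.
messy_columns = ["First Name", "  Total $ Spent ", "signupDate", "% Churned?"]
cleaned = [clean_name(c) for c in messy_columns]
print(cleaned)
```

The real package handles more cases than this sketch does, so treat it as a picture of the idea rather than a drop-in replacement.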

01:07:55 Yeah. And I think there's a pyjanitor as well. I'm not sure if it's directly the same, but some people do it in Python with pyjanitor. I'll also go and throw one out there from the Python world for folks. There's this thing called missingno, m-i-s-s-i-n-g-n-o, for missing data. It's a visualizer for missing data. So you just have a pandas DataFrame, and you throw it at it, and it'll draw you, like, a big cool graph of visually where your data is filled in and where it's missing, and all these sorts of correlations, like, if you're missing this data, you're probably also missing that data. Super cool. Yeah,

01:08:25 there's actually an R one for that, which is naniar, I never know how to pronounce that. But it's also for, like, missing, yeah, managing missing data.

01:08:33 Awesome. Yeah. Yeah, that seems super valuable. Just get a quick like, I've got all this data loaded up. Let me just look at it. Yeah, visually.
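
For readers who want a feel for what a missing-data overview shows, here is a minimal text-only sketch in standard-library Python. It does not use missingno or naniar; the records, the column names, and the helper names `nullity_matrix` and `missing_together` are all made up for illustration.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"name": "Ana", "age": 34, "city": "Seattle", "zip": "98101"},
    {"name": "Ben", "age": None, "city": None, "zip": None},
    {"name": "Cy", "age": 29, "city": "Austin", "zip": "73301"},
    {"name": "Di", "age": 41, "city": None, "zip": None},
]
columns = ["name", "age", "city", "zip"]


def nullity_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return [
        "".join("#" if row.get(col) is not None else "." for col in columns)
        for row in rows
    ]


def missing_together(rows, a, b):
    """Fraction of rows where columns a and b are both missing or both present."""
    same = sum((row.get(a) is None) == (row.get(b) is None) for row in rows)
    return same / len(rows)


for line in nullity_matrix(rows, columns):
    print(line)

print("city/zip agree:", missing_together(rows, "city", "zip"))
print("age/city agree:", missing_together(rows, "age", "city"))
```

Each printed row is a strip of present and missing cells, and the agreement score is a crude stand-in for the "if you're missing this, you're probably missing that" correlations mentioned in the conversation.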

01:08:39 Yeah. Cool. And yeah. So tell us how people can get your book. Our book is online, you can buy from the Manning website, who's our publisher, and we actually have two URLs because we had a disagreement about this. So we have the professional URL. You want the professional version of the book? Yeah, yeah. Data sigh And then we have a fun version of the book, which is at best book. Cool. And those will take you to the same webpage. But no, if you click best book, cool, you're getting the fun version of the web. site. is the professional one.

01:09:10 Yeah. And maybe we should have people guess which one of us picked which one. Which one of us is the fun one?

01:09:16 Exactly. Put it in the show notes. section at the bottom of the page. Awesome. Well, Jacqueline, Emily, it was really great to have you on the show. And I can certainly recommend your book. It's spot on. It covers a bunch of great topics. People ask me about careers all the time, and always want to have good advice to give them and so here's definitely something they should check out.

01:09:36 Thank you so much. Thank you so much. Yeah.

01:09:39 Yep. Bye. Bye.

01:09:41 This has been another episode of Talk Python to Me. Our guests in this episode were Emily Robinson and Jacqueline Nolis. And it's been brought to you by Kite and Linode. Kite is the smart AI-powered autocomplete for your editor. And the more powerful your editor is, the more effective you are. Get Kite for free at talkpython.fm/kite. Start your next Python project on Linode's state-of-the-art cloud service. Just visit talk slash linode, L-I-N-O-D-E. You'll automatically get a $20 credit when you create a new account. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play, and the direct RSS feed at slash RSS on talk. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write
