#139: Paths into a data science career Transcript
00:00 Michael Kennedy: Data science is one of the fastest growing segments of software development, but it takes a slightly different set of skills than your average full stack development job. This means there's a big opportunity to get into data science, but how do you do it? How do you get into the industry? Well, that's what Hugo Bowne-Anderson is here to tell us all about. This is Talk Python To Me, Episode 139 recorded November 7th, 2017. Welcome to Talk Python To Me, a weekly podcast on Python. The language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @MKennedy. Keep up with the show and listen to past episodes at TalkPython.fm, and follow the show on Twitter via @TalkPython. This episode has been sponsored by Rollbar and GoCD. Thank them both for supporting the podcast by checking out what they're offering during their segments. Hugo, welcome to Talk Python.
01:06 Hugo Bowne-Anderson: Thanks, Michael. Great to be on the show.
01:09 Michael Kennedy: It's fabulous to have you here. I think it's time that we do a dive into how people become data scientists, how they get into data science. I've done a couple of shows on becoming programmers, but that's not exactly the same thing as becoming a data scientist, in a sense. So, I'm super excited to talk to you about all these different paths into data science and how people can kind of level up in that space.
01:33 Hugo Bowne-Anderson: I'm really excited also, because I've been thinking about this a lot lately, of course.
01:38 Michael Kennedy: Yeah, of course. It's a really important topic. I mean, data science, I really attribute Python's meteoric growth over the last three years to data science. I know it's growing in other areas and it's playing important parts all over, but the rise of Python in data science and the rise of Python becoming more popular, those two graphs seem like the same.
01:58 Hugo Bowne-Anderson: Absolutely, and you can see the Python community embracing this as well. I mean, I was at PyCon in Portland, where you are, this year, and we had two keynotes, well, many keynotes, but Jake VanderPlas and Kathryn Huff. Seeing such data science luminaries and thought leaders being invited to something like PyCon to give keynotes more and more is really exciting.
02:19 Michael Kennedy: Yeah, and Jake VanderPlas' keynote especially struck a nerve with me because it really opened my eyes to, basically, his message was, this Python ecosystem is a mosaic and there are many different ways in which people are using Python, and basically, many different things that Python means to different people. The way that maybe a web developer working on a large scale web app works is really different than the way a data scientist exploring astronomical data would work, but these are both super valid and reinforcing ways to work. I really liked his message.
02:54 Hugo Bowne-Anderson: For sure.
02:56 Michael Kennedy: Awesome, okay, so before we get into data science, let's get your story. How did you get into programming and Python?
03:04 Hugo Bowne-Anderson: Okay, so, at grad school. I'm from Australia. I went to the University of New South Wales for grad school. I did pure math there, and I did a bit of MATLAB, I'd done some Maple as an undergrad, but all of this was relatively minimal. When I started my postdoc I moved away from pure math and went into more applied math that I'd done as part of my undergrad. I was working in cell biology, in fact, in Germany, and I was working in biophysics, so, thinking about kind of the physical, mechanical principles of how cells grow, reproduce, that type of stuff.
03:37 Michael Kennedy: It sounds really interesting.
03:39 Hugo Bowne-Anderson: Oh, it was incredible. I was actually working in an institute of maybe 400 cell biologists and theorists dedicated, so it wasn't on a university campus. It was an incredible environment. I was hired, ostensibly, to do mathematical modeling, but the biologists I worked with kept on coming and asking me the same questions with respect to data analysis, statistical inference, this type of stuff, which I didn't know a great deal about at the time but with my quantitative training, I really could pick it up on the fly. So, I started working more on the data analysis and statistical inference in conjunction with the modeling. Of course, to do that today, you need to be able to program because data sets are so large. I mean, you can't do it with pen and paper like they used to. So, I started learning Python and R to do this. I learned via online courses, a lot of web resources, and in fact, you know, the open source communities in Python and R are really embracing, so any questions I had, I could pick up on the fly.
04:46 Michael Kennedy: Yeah, that's really cool. I definitely see this working with scientists or in these types of areas very important. How much programming did the biologists do? Did they program in MATLAB, were they just Excel people? How much were they taking care of themselves and how much were you solving their science problems?
05:08 Hugo Bowne-Anderson: This was now, wow, seven years ago. I didn't quite realize that. The answer, then, was MATLAB. Grad students would come to me and say, "I need to learn how to do this image analysis in MATLAB," or, "How do we estimate these statistical parameters? How do I get the mean out of this dataset using MATLAB?" Something I saw, maybe, three or four years ago was a conversion in which more and more students came and started asking to learn Python and R. Biology is R, a lot of the time, but more and more Python in physics and R in biology. I think people just started seeing the value. Also, I think there's a challenge that, MATLAB is incredible for a number of things, but part of their business means they're embedded in institutions and it's really tough for institutions to break away. It's generational in a lot of ways, actually. The guys at the top, MATLAB worked for them in a number of respects, but seeing the resurgence of this, these open source libraries for academic research is really exciting.
06:23 Michael Kennedy: It's super exciting, and to be fair, you know, the world looked really different 25 years ago in scientific computing than it does now. Open source was not so much of a thing. Your alternative was probably C++ or something, right?
06:38 Hugo Bowne-Anderson: Absolutely.
06:39 Michael Kennedy: Maybe Fortran, right, but they were not great tools. It was a super clear choice to choose MATLAB, but you're right, now it's the senior professors that have been doing MATLAB for 30 years and all their work is in MATLAB, you know, there's probably a tendency to just stick with that. Your students come on to help you, like, hey, you got to learn MATLAB 'cause that's where I work, right? Something like this.
07:00 Hugo Bowne-Anderson: That's correct.
07:01 Michael Kennedy: I do think there's some really interesting growth around trying to displace MATLAB. I mean, there's SageMath also, from Seattle. A similar place to where Jake VanderPlas and the eScience Institute is up there. I think it's really powerful to see people coming in, learning Python. I think one of the major advantages people get is you can take that into general industry when you're done. If you go and study applied math but then you actually don't become a professor, what are you going to do, right? Knowing Python has a lot more doors that get opened for you than knowing MATLAB.
07:42 Hugo Bowne-Anderson: No, and you can collaborate with anyone around the world as well. For somebody to read and execute your code, they don't need to have a proprietary license.
07:51 Michael Kennedy: Yeah, that's a really good point. The cost of these proprietary systems like MATLAB or Maple like you mentioned, they're problematic, right? I did some work with wavelet decomposition, and that was like a $2,000 add on to MATLAB. That's a crazy amount of money for one license.
08:10 Hugo Bowne-Anderson: Absolutely.
08:11 Michael Kennedy: There's probably, like, pip install something now.
08:14 Hugo Bowne-Anderson: That's right. I mentioned Katy Huff's keynote at PyCon. She, in this keynote, did a wonderful thing where she laid out a series of points of what scientific research and scientific methodology has been historically and needs to be and demonstrated that open source communities actually are far better at all these scientific principles than most other communities that have existed since the Ancient Greeks. Things such as version control, reproducibility, absolutely open code bases, this type of stuff is exactly what science needs now, particularly as we're all, this buzz term reproducibility crisis, it's incredibly important that all of our tools and techniques are open.
08:54 Michael Kennedy: Yeah, that's a really nice point that open source, kind of, very much has the same zen as the principles of scientific research and exploration, right?
09:04 Hugo Bowne-Anderson: Yeah.
09:04 Michael Kennedy: That's awesome. That was how you got into Python and programming. How about now, what do you do today?
09:10 Hugo Bowne-Anderson: I work for a startup called DataCamp, and we do online education for data science. We have an in-browser platform, and now we have a mobile app, actually, where people can come and learn and practice and apply data science. I've recently changed positions in the company. Until recently, I was working on curriculum building. When I joined the company, we had two Python courses and over the past year and a half I've built it out with colleagues and external instructors to around 30 courses. That job was really ideation, what courses will look like, high level view of a curriculum, figuring out what data sets to use in courses, techniques to teach whether it's scikit-learn, pandas, how to approach these APIs. We've taught them with wonderful people such as Andreas Muller of scikit-learn. Great courses with the people at Anaconda. I spent my days writing code and explanations for courses, marketing material, being on calls and on GitHub with instructors which was so much fun.
10:11 Michael Kennedy: I actually did basically the same job but for a different focused curriculum. For a long time, I was head of curriculum at a company called DevelopMentor, which got acquired and I don't do that anymore. It was really fun to sort of look broadly at a technology, think about how people get started, how they become experts, what are the important parts? And really try to piece that together as a jigsaw puzzle. It's a fun job.
10:40 Hugo Bowne-Anderson: It's super fun. Particularly, just trying to match up all those different parts of curriculum building, just making sure that five days in a row are not in the weeds of figuring out which data sets to use, so mixing up high level curriculum building with being in the weeds.
10:56 Michael Kennedy: Yeah, that's awesome. It's definitely a social side of programming.
11:00 Hugo Bowne-Anderson: For sure. And now, I've transitioned to a job working as a data science advocate and evangelist for DataCamp. I'm doing data science on a daily basis, writing articles about data science for our community, pedagogical articles, technical blog posts, topical. Currently, I'm writing and developing an analysis of the #MeToo movement on Twitter, seeing how that has developed using the Twitter API and Python in a package called Tweepy. Yeah, doing data science. For example, the Twitter analysis I just spoke about, I'm also doing data science on student data to see how we can cluster users and cluster our students and see best learning techniques for students, whether, for example, a bit of practice on DataCamp every day reinforces learning more than dedicated long study sessions on weekends. Something I'm about to start, which I think would be of great interest to the open source community, and we have courses on pandas, for example. I'm going to start looking into the data to see if we can check out after a student imports pandas as pd, what are the top three mistakes they make straight away? These types of things will be of interest to us, to our students, and also to the open source community at large.
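A minimal sketch of the kind of "top three mistakes" analysis described here, using only the Python standard library. The log records, student IDs, and error names below are invented for illustration; they are not DataCamp data, and `top_mistakes` is a hypothetical helper, not part of any DataCamp or pandas API.

```python
from collections import Counter

# Hypothetical log: (student_id, name of the first exception raised after
# the student ran `import pandas as pd`). All values are made up.
first_errors = [
    ("s01", "KeyError"),
    ("s02", "NameError"),
    ("s03", "KeyError"),
    ("s04", "AttributeError"),
    ("s05", "KeyError"),
    ("s06", "NameError"),
    ("s07", "FileNotFoundError"),
    ("s08", "KeyError"),
    ("s09", "NameError"),
    ("s10", "AttributeError"),
]


def top_mistakes(records, n=3):
    """Return the n most frequent error names with their counts."""
    counts = Counter(error for _student, error in records)
    return counts.most_common(n)


top_three = top_mistakes(first_errors)
print(top_three)
```

With real data, the same counting step could be run per course or per exercise to see where students most often get stuck.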
12:25 Michael Kennedy: Yeah, I think that's really interesting. These ideas of, like, helping people with that first step. Because a lot of times, getting into a new technology or a new library, it's those first steps that are the hardest to take.
12:37 Hugo Bowne-Anderson: Absolutely.
12:38 Michael Kennedy: And you're also doing a podcast?
12:41 Hugo Bowne-Anderson: Exactly, I'm currently developing a podcast called DataFramed, which is about data science, about what data scientists do on a daily basis, and about the societal impact of data science, which is really exciting, 'cause I think it can be of great value to our students, and it can be of great value to a lot of working data scientists. For example, data scientists working at Uber really may have no idea what data scientists working at Netflix or data scientists working in astronomy do on a daily basis, but because it is a term that encompasses so much of what working professionals do, I think it's really exciting for me and will be for the community as well.
13:19 Michael Kennedy: Yeah, I think that's awesome to shine a light on these different areas. The stuff that you're doing at Uber, like you said, is really, really different than maybe if you're working at, say, a police department trying to understand how police violence or violence against police happens. These are really, really different, but maybe there's lessons to be learned from one to be applied to the other.
13:43 Hugo Bowne-Anderson: Absolutely, and all the way from that to city transit data and how, you know, I live in New York and New York Transit has a huge API, the MTA, where you can go and access data with respect to how the subway works and how decisions are implemented around that.
14:01 Michael Kennedy: Yeah, that's really interesting. I bet there's some awesome data science stories to be told out of the public transit of these major cities.
14:08 Hugo Bowne-Anderson: Absolutely. There's a blog called I Quant NY, which is all about getting access to public New York data and seeing what you can find in it. That's a great blog to check out.
14:20 Michael Kennedy: There's probably a lot of data science going on there in New York around the stock market as well.
14:25 Hugo Bowne-Anderson: Yeah, absolutely.
14:26 Michael Kennedy: Oh, man. Of course, they don't share that so much. Once you find something that works, they keep that quiet, right?
14:33 Hugo Bowne-Anderson: That's it. The other thing I've been doing recently is these Facebook Live code along sessions. I'm a huge fan of live coding. I know it's probably slightly masochistic, but I don't have a huge problem. One of my favorite things about live coding, one of the most valuable moments for me and people coding along is when I make a mistake that I can't figure out and I need to go use a search engine and go on Stack Overflow, and people see me figure it out in real time. Actually, I think Jake VanderPlas one upped that once where he was doing a live coding session and found a bug in scikit-learn and went in and issued a PR in the coding session that fixed the bug. That's online somewhere.
15:21 Michael Kennedy: That is awesome.
15:23 Hugo Bowne-Anderson: But the Facebook stuff is great, because Facebook is really pushing their live sessions at the moment, so everyone who follows us, we've got now, I think, 330,000 odd followers on Facebook. When I start a live code along session, they all get notified and a whole bunch of them jump on and interact and it's super fun. They can comment just below the video. I've got a colleague there who filters the questions and I answer some of them and that's also a lot of fun.
15:50 Michael Kennedy: That's a really interesting trend. I haven't seen a whole lot of that previously, but I did just talk about in Python Bytes last week this AI framework that you can basically plug your AI into almost any game and then teach it to play that game, and the guy who runs that, he actually has a Twitch channel and some of his Twitch code-alongs, building up these AIs and teaching them to play various games, they're like six hours long.
16:21 Hugo Bowne-Anderson: That's really cool.
16:23 Michael Kennedy: I'd never really watched one of those. It was really quite interesting, actually.
16:27 Hugo Bowne-Anderson: It's a whole different world, isn't it?
16:29 Michael Kennedy: Yeah, it sure is. A lot of the stuff that's online is really polished or somewhat polished, but it's at least intended to be polished and packaged up for, like, here's a 20 minute little thing, whereas those are more, like, let's just explore this until we have the answer. That's cool.
16:44 Hugo Bowne-Anderson: Yeah, exactly.
16:45 Michael Kennedy: Nice, alright. Career paths into data science, let's talk about those.
16:50 Hugo Bowne-Anderson: I think we'll, in this conversation, move to maybe more technical, more specifically data science-y material, but the first thing I wanted to state, very passionately is that, as with anything but perhaps more so in data science, be active, be curious, and be part of a community. There are lots of budding data scientists, aspiring data scientists, working data scientists, hiring managers out there, and getting in touch with them and putting yourself out there is incredibly important. To that end, I'd really suggest starting on some basic data science projects, if it's your first foray into it, and we can talk about what that could look like in a second, and create a public profile. Get yourself a GitHub account to do that. Maybe have a little blog where, every now and then, you post some analysis you've done. Even if it's a basic exploratory data analysis, that's great, and put some words in there, put some images, and figure out how to communicate around this. Go to conferences, go to meetups and talk to people. Hackathons are also effective.
17:58 Michael Kennedy: Yeah, hackathons, that's definitely a nice way to meet people who are more than just sitting next to you at a presentation, but actually, you're kind of working together a little bit. That's cool. I definitely encourage people to create some kind of blog or write stuff. I think that that's really valuable. And you don't have to wait until you're an expert for sure at something. It could be you're solving some problem and you couldn't find a way online how to solve that particular problem, so you know, blog about that. Talk about what you tried, what didn't work. There's a lot of people who would be interested in following along this I'm getting started sort of story.
18:38 Hugo Bowne-Anderson: Absolutely. And also, do a bit of self promotion or marketing. I'm not suggesting, you know, get your paid ads on Facebook, but if somebody asks a question on Reddit or Quora or Stack Overflow and you think your response may be helpful then get out there and put it there. There are also a number of blogs that have really wide distribution where you can write analyses for them as well. I mean, DataCamp, we've got a community section where we solicit external contributions. The Open Data Science Conference, ODSC, has a blog where they do the same. Once you feel a bit more comfortable with your material, definitely put it out there. I know that that can be difficult as well, so there's certainly a bit of a loss of ego that needs to occur in this scenario, but just remember, there's lots of interesting stuff going on out there and you can be part of it.
19:34 Michael Kennedy: Absolutely, and I think it's really, really important on how you frame what you've created and presented. If you say, I'm the expert in this thing and then you're not, well then, people may find that out and that's going to go badly. But if you're really upfront, like, look, I'm really just getting started everyone, but here's something that I couldn't find any help with and here's what I figured out and I thought it was awesome. Like, nobody's going to knock you for that.
19:54 Hugo Bowne-Anderson: Absolutely, yeah.
19:56 Michael Kennedy: Except for maybe on Reddit. They might send something angry at you.
20:00 Hugo Bowne-Anderson: There'll always be at least one troll.
20:03 Michael Kennedy: That's right, that's right, but you know, it's totally, totally worth it. This portion of Talk Python To Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors, ugh! Relying on users to report errors, digging through log files trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you get the context, insight and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self hosting tools for security or compliance reasons? You should really check out Rollbar's compliance SaaS option and get advanced security features and meet compliance without the hassle of self hosting, including HIPAA, ISO 27001, Privacy Shield and more. They'd love to give you a demo. Give Rollbar a try today. Go to TalkPython.fm/Rollbar and check them out. The other thing that you brought up is a GitHub repo, or a GitHub profile, and I think that's super important. One of the things about GitHub is you can't fake the commit history over time very easily, right? If you say, "I've been doing this for three years," but your GitHub repo only has a week of activity, that's not a great sign. If you're planning this, do that stuff early so that it can create this history that is proof of what you've been doing.
21:31 Hugo Bowne-Anderson: Absolutely, and something I did when I was starting up my profile on GitHub, I had a sticky, a literal sticky on my computer screen, not the app Stickies, but I had a sticky that said Commit to GitHub Today. I didn't actually do it every day, but you can actually see before I joined DataCamp that there was a lot of public activity I was doing. One, because I really enjoyed it, but two, because I made an active decision to put myself out there.
22:02 Michael Kennedy: Yeah, it makes a huge, huge difference. So, conferences, what are some of the data science conferences that people should go to?
22:10 Hugo Bowne-Anderson: I really like the Open Data Science Conference, ODSC, which I mentioned earlier. In fact, that's where I met the DataCamp people and I had two or three, let's say two and a half job offers from going to ODSC. I also think, not only conferences, but meetups are incredibly useful. It depends which city you're in, but in New York there are a lot of interesting meetups. A lot of people go there after work because they love data science, and even more so, you have hiring managers and recruiters get up, literally the organizers at these meetups say at the start or the end, "Hey, anyone who has a job, stand up and tell us what it is."
22:52 Michael Kennedy: Yeah, I see that, definitely in the Python programming meetups as well. I agree, that's a great way to get connected with your local people, not just people in the industry, but people down the street, right?
23:04 Hugo Bowne-Anderson: Absolutely. The great thing about data science recruiters and data science people in HR and managers is that there are a significant number of jobs out there, so they're really interested in the conversation. As someone approaching data science at the moment, you are in a relative position of power. I mean, it is competitive, but compared to the recruiters, they'll be definitely up for a conversation in a way that they wouldn't in other industries currently. I remember I had a conversation with this great guy from Goldman Sachs where I just asked him up front, you know, what are mistakes that you've had people make in interviews that I should not make? And he gave me lots of great feedback. One example he said, if you don't know something, just admit you don't know it and say that's a gap and I'm looking forward to filling that. There was one guy, he asked what the bias-variance tradeoff meant. It was on a call, and he heard the guy start typing and then answered the question.
24:04 Michael Kennedy: Pro tip, use some type of touch device if you're going to Google during your interview.
24:11 Hugo Bowne-Anderson: Exactly.
24:12 Michael Kennedy: Oh my goodness.
24:12 Hugo Bowne-Anderson: The other thing, when you go to conferences and hackathons and this type of stuff, conferences are also great because they have sprints. A lot of the big packages, whether it be scikit-learn, pandas, gensim, Project Jupyter, which we'll talk about later on, I think. They have sprints when the conference ends where you can go and help contribute to the project. The communities are super open. You can start, they actually encourage you to start by just helping out with documentation which is a huge bottleneck at a certain point in open source software development. You can actually be an active member of these development communities immediately without being like, "Oh, I don't know how to define this class correctly."
24:57 Michael Kennedy: Yeah. Well, I think another huge benefit of that is if you do want to have your public profile, have PRs against, say, pandas or scikit-learn or something like that, those are mature, polished libraries that are hard to just get into yourself, but if you go to a sprint and sit down with somebody who's an expert and you guys do it together, that's a pretty quick way to get up to speed to where you can start doing those things if you want, if that's one of the paths you want to follow.
25:25 Hugo Bowne-Anderson: And you're right, you're there at these sprints and you're able, you know, best case scenario, to be pair programming with core developers on pandas, or--
25:32 Michael Kennedy: Exactly.
25:32 Hugo Bowne-Anderson: Or scikit-learn.
25:32 Michael Kennedy: Exactly.
25:34 Hugo Bowne-Anderson: Or NumPy, right? It's crazy.
25:37 Michael Kennedy: Yeah, and so when you go to that job interview and they say, "Well, how does it really work inside pandas when you do this? Which would be better, should I do this or this?" Well, internally, it does this, and so here's why you do that. That's an incredible answer, and you could totally get those kinds of insights from these sprints. I agree.
25:52 Hugo Bowne-Anderson: Absolutely. What that also demonstrates is that you're entrepreneurial, which I think a lot of people are looking for these days, someone who will take responsibility and run with it.
26:02 Michael Kennedy: Yeah, that puts you in a pretty thin group already, which is great.
26:07 Hugo Bowne-Anderson: Yeah.
26:08 Michael Kennedy: You also said that reading blogs and things like that, pretty helpful?
26:14 Hugo Bowne-Anderson: Absolutely, read as widely as possible. I think reading blogs, getting on newsletters. Following people on Twitter is one of my greatest, greatest resources. We've chatted about Jake VanderPlas. On the R side, you have Mara Averick, Hadley Wickham, Hilary Mason is great, Dave Robinson on the R side. I follow all these people, so you may as well follow me, Hugo Bowne, 'cause I retweet a lot of this.
26:40 Michael Kennedy: Catch the important retweets, right.
26:42 Hugo Bowne-Anderson: We're really arching up on the DataCamp community at the moment. As I said, ODSC has a fantastic blog. Python Weekly. There are so many different places. I'll include a significant number of links in the notes of this podcast on this stuff as well.
27:03 Michael Kennedy: Yeah, I find that Twitter is super, super valuable. I also find Reddit, actually, if you don't mind a few angry comments every now and then. Certainly, the Reddit community is great and really smart, so you can drop in on the data science one or the Python one and pick up a lot there.
27:20 Hugo Bowne-Anderson: Yeah.
27:22 Michael Kennedy: Cool, and so this kind of sets the stage for you to be prepared to get a job, to make the connections to get a job, but eventually, probably most people's goal is to go get some kind of data science job, right?
27:36 Hugo Bowne-Anderson: Yes.
27:37 Michael Kennedy: So, you already brought up recruiters, and I think that's certainly one of the possibilities. Probably one of the least effective ways to get a job is to just go to the career page and apply by filling out the online form. A recruiter can help you get inside. If you have a friend that you know works at that place, ask for an introduction. I think most jobs that are really great jobs start looking for someone to fill it by saying, "Alright team, who knows somebody who would be awesome for this job, anyone?"
28:15 Hugo Bowne-Anderson: That's right.
28:16 Michael Kennedy: And then it becomes this open search. So, how do you get inside this first round before it becomes posted on the career page?
28:23 Hugo Bowne-Anderson: I actually think hackathons are a great way to do that because you actually start coding with people there, do a bit of pair programming, and you get to meet people there. When there are jobs going around, there are a lot of working data scientists from all levels at these hackathons. I also think more specific online platforms, AngelList. If you want to work with startups, there's a lot of stuff happening there. And LinkedIn. In North America anyway, making your LinkedIn profile as attractive as possible will definitely help. You'll get inbound mail coming as opposed to needing to go to the apply page.
29:06 Michael Kennedy: Yeah, and you're in a much better place when people are reaching out to you, rather than the other way around.
29:10 Hugo Bowne-Anderson: Yeah, absolutely.
29:10 Michael Kennedy: For sure. I think that's totally right.
29:14 Hugo Bowne-Anderson: This is general advice to anyone applying for a job, and maybe everyone knows this, but when I heard it a few years ago, it blew my mind. If you're applying for a job and sending a cover letter, use the same font and the same colors as that company's website.
29:27 Michael Kennedy: How interesting, yeah, that's pretty easy to do, right?
29:30 Hugo Bowne-Anderson: Yeah, exactly. Generally, they love it. We got one recently at DataCamp. We were like, "Wow, this looks really nice." And then we were like, "Wait a second. Oh, they've done that." And when we realized that they'd done that, it was even stronger.
29:44 Michael Kennedy: I was on the receiving end of people applying for jobs for quite a while. To me, when I saw something come in and it was just a standard resume, or, like, "Here, I'm applying for this job. Here's my info." If it wasn't, I think your company is amazing because, and I want to work with you to do X, like, that went straight in the trash. If there was not something about the job, the place, you know, if it was just like, here's a copy of my Word document, it was like, well, here's a copy of my recycle bin, next.
30:18 Hugo Bowne-Anderson: Exactly. And it's the same when recruiters reach out to you. I mean, I get recruiter mail on LinkedIn which is like, "Your skillset matches our company." And I'm like, come on, right?
30:28 Michael Kennedy: Exactly.
30:28 Hugo Bowne-Anderson: It's not even, hey, you've done this cool stuff in Python, and whatever it may be. But this actually speaks to something else, which is making it particular to the company, and also making it particular to yourself. Being yourself when doing data science or trying to build your portfolio is incredibly important, I think, playing to your own strengths. A lot of aspiring data scientists feel they need to be a data science unicorn so that they can do the data munging, data collection, data manipulation, machine learning, statistical inference, Bayesian methods, data visualization, you know? Like, this is crazy, right? When you're trying to teach people data science and they feel that, that's totally overwhelming. I'm actually overwhelmed by that sentence I just stated.
31:14 Michael Kennedy: Sounds like a PhD in math plus programming, right?
31:17 Hugo Bowne-Anderson: Yeah, exactly right, and you don't need to be an expert at machine learning algorithms, for example, to be an effective data scientist. That will make you some sort of effective data scientist. But playing to your own strengths and realizing that data scientists work in teams, so, I've worked on a course recently with an educator and data scientist, Sergey Fogelson. He manages a four person data science team at Viacom here at Times Square. I was chatting with him about his team, and he said if everyone he hired knew the ins and outs of support vector machines, that would be a horrible team. He's got one person who is great at statistical data visualization, he has one person who's a data engineer and fantastic at that, he has one person who does the machine learning stuff and also has a background in math and physics so can actually explain the ins and outs of these algorithms to the rest of the team. I actually forget what the fourth person does, but that speaks to the fact that managers are aware that when they hire in teams, they're going to hire people with different strengths. And for that reason, I'd suggest to anyone entering data science to do things that interest you. Have a play around. When developing your portfolio you'll see you've got to do different steps in the data science pipeline. Figure out what you enjoy the most, and then apply for those jobs as well.
32:39 Michael Kennedy: Yeah, I totally agree with you. I think one of the underlying things you're touching on here is authenticity, because if you feel like someone is reaching out to you and they're being super authentic, like you said earlier about, well, you know, I honestly have no idea what that term means, but I'm super excited to learn it if it's important. Like, I'm not against that, I just don't know every single little detail about this. I think when people are hiring, you see the enthusiasm, you see some real problem solving skills and some authenticity, it really goes a long way.
33:10 Hugo Bowne-Anderson: Yeah, and being able to adapt, pivot and learn as well. Being able to say, hey, this is what I've learned in the past year. I have no idea what that means, Mrs Hiring Manager, but I'm willing to learn that, is incredibly important in this space, because in all honesty, in five years, it might not be Python. Julia may come up, R may really blast in again. The ability to learn and relearn, I think, is incredibly important, and demonstrating that.
33:41 Michael Kennedy: Yeah, absolutely. At a minimum, you have to learn the details and the ins and outs of that actual problem set and that industry that maybe you don't have. Another thing you touched on was, do what interests you, because then you have the enthusiasm, and that really is super powerful as well. I'm a big fan of combining what you're interested in or what you have expertise in, plus programming, plus data science, and I think it really gives you this superpower. You talked about this cell biology project that you had. They were probably like, "Go to Hugo. He can solve the problem, because he controls the magic of programming and he can do this biology stuff." So, there's this really unique set of skills. You don't go from a million data scientists and how do you differentiate yourself from them. You're like, I'm the data scientist that also understands wind power like nobody else. So, if I'm trying to apply to a renewable energy company, well, that's a clear win, right?
34:43 Hugo Bowne-Anderson: For sure. I definitely think you've got to be doing something you're interested in. A lot of people may say, "I'm going to do a Kaggle competition because that's what people do." I think Kaggle competitions are great, but choose one that you're super interested in. If you're interested in flight patterns in North America, do a Kaggle competition about how often flights are delayed. Which airlines, which cities? That type of stuff. If you're a movie buff, jump into the MovieLens dataset and try to develop a basic recommendation systems engine. If you're into Yelp reviews, okay, if you hate Yelp reviews that don't give you enough information, try to learn a bit of natural language processing or natural language understanding by segmenting or filtering or clustering these Yelp reviews. Doing things that interest you is incredibly powerful when developing your data science portfolio. But also, it makes sense in the sense that if someone's talking to you about something that they don't really care about, you're not that affected, whereas we all love listening to people who are passionate about something, right? So, that's very powerful. Another approach, and I actually had this conversation with a data scientist and statistician in the R ecosystem, Mine Cetinkaya-Rundel. I'm sorry if I got that pronunciation wrong, but we were discussing this, and she said, "Yeah, do stuff that interests you, or stuff that you have to do." And I said, "What do you mean?" She said, let's say you're trying to learn data science and you're doing your budgets, your monthly family budgets in Excel. Try to do that in R, try to develop a minimal dashboard, or in Python, and see how that goes. If you wear a Fitbit, get your Fitbit data out of CSVs and have a look at your own sleeping patterns and your own heart rate data and accelerometer data and that type of stuff, and write something on your blog or on GitHub about that.
36:39 Michael Kennedy: I think even companies get created out of those types of activities, right? You're like, "You know, I really wish I could do this thing better for myself." and you're like, "Wait a minute, this seems like everybody must have this problem, and this is a cool solution. What can I do with that?"
36:52 Hugo Bowne-Anderson: Exactly.
36:54 Michael Kennedy: This portion of Talk Python To Me was brought to you by GoCD. GoCD is an on premise, open source continuous delivery tool to help you get better visibility into and control of your team's deployments. With GoCD's comprehensive pipeline modeling, you can model complex workflows from multiple teams with ease. And, GoCD's Value Stream Map lets you track changes from commit to deploy at a glance. Say goodbye to deployment panic, and hello to consistent, predictable deliveries. We all know that continuous integration is super important to the code quality of your applications. Choose the open source, local CI server, GoCD. Learn more at TalkPython.fm/GoCD. That's TalkPython.fm/GOCD.
37:38 Hugo Bowne-Anderson: I love that you spoke to this idea of creating superpowers by combining two or more areas of expertise, because I think that will also help differentiate you. A lot of people are out there trying to get data science jobs, but if you're data science plus, you differentiate yourself from everyone else who is speaking about data science. If you're interested in data science plus analyzing genomic data, or data science plus analyzing, as we discussed, Yelp reviews, that type of stuff will help differentiate you from the masses.
38:09 Michael Kennedy: Yeah, absolutely. If I was on the hiring side and I saw that this is a person who is a proper data scientist but they also know my industry, that goes right to the top. That's great.
38:19 Hugo Bowne-Anderson: Exactly.
38:19 Michael Kennedy: Let's talk about programming skills a little bit.
38:22 Hugo Bowne-Anderson: Love to.
38:24 Michael Kennedy: I'm familiar with the programming skills you need to be a web developer, but how about data scientist? What do you think people should really focus on there?
38:32 Hugo Bowne-Anderson: Currently? I would learn at least one technology really well by applying it to projects, the types of projects we just discussed. I think the two most applicable technologies right now are Python and R. If you learn one of them really well by applying it to projects, and I'm not necessarily saying going and learning all the ins and outs of object oriented programming in Python, but the type of stuff you pick up when doing a project of analyzing social media trends using Twitter, you'll gain so much knowledge doing that. I'd also suggest learning a bit about others, to be able to speak the language. If you choose Python, I'd then learn a bit of R. Not necessarily as much as you know in Python, but being able to speak that language will really help you in whatever roles you enter in the future.
39:24 Michael Kennedy: Yeah, certainly having these multiple languages as your skillset. If you understand, well, maybe over in R, there's this really cool way to do this one thing that's not so easy in Python, that can help you think of different ways to solve the problem, or maybe it's just not so obvious in Python how to do it. That can definitely open your mind to different avenues of solving these problems. You maybe can grab a library that's important over there, import it over to Python, and use it if you'd rather.
39:49 Hugo Bowne-Anderson: For sure, and I think one great example of this, I use Python substantially more than I use R these days. One case in which maybe I'll jump into R is doing some very basic exploratory data analysis and filtering and that type of stuff, because all these new Tidyverse tools developed by Hadley Wickham among other people, are incredibly useful for rapid iteration of exploratory data analysis in a way that the more Pythonic tools, perhaps, are not.
40:22 Michael Kennedy: Sure, that's a good example. What do you think about, I'm not sure what the proper way, sort of, software engineering type of skills, like refactoring, design patterns, those kinds of ideas. How important is that kind of stuff versus a good exploratory, we're just going to find the answer, we're just going to rummage through this data type of programming?
40:45 Hugo Bowne-Anderson: That's an incredibly important question that I don't have a concrete answer to yet. I think what people need to do is, I mean, you don't want to go down the hole of becoming a developer. You're trying to do data science. I didn't actually mean it's a hole that you enter when you're becoming a developer, but you don't want to go down the hole of, you know, developing software engineering best practices and only focusing on that, but you do need basic programming best practices. The first things are, you know, having a style guide. Python PEP 8 all the way. Commenting your code, using version control. Have a workflow, and maybe you don't have this at the very start, but do exploratory data analysis and write exploratory code while it's working for you. But when you start tripping over it, when it starts to become more inefficient, then perhaps start to refactor your code. Put your functions in modules, in .py files, for example. Have an editor that you use, or notebooks.
41:47 Michael Kennedy: One of the areas that I see this kind of stuff becoming really important is people can do super important work, especially if they're coming more from the science side towards the programming, rather than from the software side towards the data, is they're really good at writing scripts that will answer their problem, but they're not super reusable, right? They're kind of just, like, it goes through the steps that I need to solve my problem, rather than, here's a thing I could make an open source project. Imagine if pandas was just crammed inside of some other application in a way that wasn't able to become this amazing thing.
42:20 Hugo Bowne-Anderson: Exactly, and that's a huge bottleneck for working scientists. I don't want to be too hard on the biologist, but the type of code I saw was really, like, we had to go through it in serious detail to figure out what was happening in there, even when it was published. And of course, remember that you're writing code for other people to read, but more importantly, you're writing code for future you to read.
42:48 Michael Kennedy: Yes.
42:49 Hugo Bowne-Anderson: So, be good on future you.
42:51 Michael Kennedy: I often have this thought of, like, if I do this, my future self will thank me. In programming, but also just in, like, making coffee before I go to bed, right? Get it ready to press the button.
43:01 Hugo Bowne-Anderson: That's it. And I also think there are a few other technologies which we've spoken to in some sense. Git is incredibly useful. There can be a slightly steep learning curve before you see the value there. Version control is incredibly necessary for data science moving forward. Learning Bash, a bit of shell is really useful. If you're on a job and you need to spin up an AWS instance, you'll need to know a bit of that stuff. I don't necessarily say spend weeks or months using it, and I know all of this can be quite overwhelming, all these different tools, but if you know a bit of each, you'll be in good stead for getting into data science.
43:46 Michael Kennedy: Yeah, what's worked for me a lot in these things is, like, it's not like, well, I want to know Bash and Linux so I'm just going to study them to death. It's like, I have this problem I need to solve with Linux. Let me learn enough to solve that problem. And then, you just keep doing this. You build up enough to hit most of the important areas.
44:04 Hugo Bowne-Anderson: Exactly, and once again, you're speaking to doing projects and having some particular project which you can do and learn tools around it. And as we've discussed, putting that on your blog. Having a blog post, how I used Linux to solve this part of this problem, and if someone asks you about it, you can say, "Yeah, I know this and that about it, and you can check out more content on my blog." I think that's incredibly useful. Or on my GitHub, right?
44:31 Michael Kennedy: Yeah, yeah, absolutely. It's super important. We talked about the programming stuff, kind of low level. What are the core skills? I mean, do I need a math degree to be a data scientist? Do I need to be a scientist, a programmer? What are the core skills?
44:46 Hugo Bowne-Anderson: You definitely don't need a math degree to be an effective data scientist. I do think, though, if you learn a bit along the way. Let's say, you're totally not into matrices and linear algebra and all of that jazz. That's cool, but if you do learn a bit along the way and try to not be scared of it, you know, you'll become a bit more effective. I'd suggest to you to try and ease yourself into that stuff. But the more important initial skills are being able to explore data, being able to read in a dataset using pandas, for example, or data.table in R, and check it out, look at some figures, compute some summary statistics, that type of stuff. Very related to this is data cleaning and data manipulation. There's the saying that 80% of my job is cleaning data and manipulating it, and it's a joke, because it's more like 95% of most people's jobs. I think this is incredibly important. Statistics, I think, is really essential in data science. But I need to be careful there because when I say statistics I don't mean the central limit theorem. I'm talking about applied statistics or practical statistics and actually, when I was wrapping up my postdoc, I was asked the same question so many times by students that I started running workshops in R and Python called An Introduction To Practical Statistics. We'd take their data sets and see how we can find out stuff from Python and R. What I'm talking about there is, you know, how to compute the mean, standard deviation, how to do basic statistical modeling, fitting polynomials, that type of stuff.
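As a rough illustration of the first-pass exploration described here, the following sketch reads a small dataset with pandas, prints summary statistics, and fits a straight line (a degree-1 polynomial) with NumPy. The inline CSV text and the column names `x` and `y` are made up for the example; in practice you would point `pd.read_csv` at your own file.

```python
import io

import numpy as np
import pandas as pd

# Made-up toy data standing in for a real CSV file on disk.
csv_text = """x,y
0,1.1
1,2.9
2,5.2
3,7.1
4,8.8
"""

# Read the dataset and take a first look at it.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.describe())  # count, mean, std, min, quartiles, max per column

# Individual summary statistics.
mean_y = df["y"].mean()
std_y = df["y"].std()  # pandas uses the sample standard deviation (ddof=1)

# Basic statistical modeling: least-squares fit of y = slope * x + intercept.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"mean={mean_y:.2f}, std={std_y:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

Raising `deg` fits a higher-order polynomial, though for a handful of points a straight line is usually as far as it is sensible to go.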
46:28 Michael Kennedy: Are these correlated or not, things like that, right?
46:34 Hugo Bowne-Anderson: Exactly, and thinking about how then that translates into my initial question as well. It's not only, you know, does this look linear? It's, what are the implications of this? What can I tell to someone who doesn't know something about the Pearson correlation coefficient? How can I explain this in human terms to a manager, for example? Bootstrapping is an incredibly useful technique in statistics that I think everyone should know. I might try to explain very briefly what bootstrapping is.
47:03 Michael Kennedy: Yeah, yeah, go for it, 'cause I'm not entirely sure what it is myself.
47:07 Hugo Bowne-Anderson: It means something different in the world you're from, as well.
47:10 Michael Kennedy: Yes, there's two meanings of bootstrap that I know already. You don't even know what I'm thinking of. I don't think it's what you're thinking of.
47:17 Hugo Bowne-Anderson: Think about this. You've got some dataset, people's heights in a certain population, and you have that average. This is the average height of this dataset. But you know that, let's say, you only have 10 data points or 20 data points. You know that this won't actually be the average height of the entire population. The average height you've got has some sort of error bar associated with it, and what you want to do is estimate those error bars. What you do is, you resample from the sample you have. So, if you have 20 data points, you can resample 20 with replacement to get a slightly different average. You can do that 100, 1,000 times, and then you get some sort of distribution of potential means or potential averages, that's the bootstrap of the average. That will tell you the spread of possible averages in the total population. The great thing is, this doesn't just apply to averages or means. You can do this with any statistic under certain scenarios and it gives you a pretty good idea of what you're looking at statistically.
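The resampling procedure described above can be sketched in a few lines using only the Python standard library. The 20 heights are invented for illustration, and the function name `bootstrap_means`, the 1,000 resamples, and the 95% percentile interval are assumptions of this sketch rather than anything specified in the conversation.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Made-up heights (cm) for a sample of 20 people.
heights_cm = [
    158.2, 171.5, 165.0, 180.3, 162.7, 175.1, 169.4, 158.9, 183.0, 166.2,
    172.8, 160.5, 177.6, 168.1, 163.3, 174.0, 170.2, 181.7, 159.6, 167.9,
]

means = bootstrap_means(heights_cm)

# With n=40 the cut points are 2.5% apart, so the first and last
# cut points are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(means, n=40)
lo, hi = cuts[0], cuts[-1]

print(f"sample mean: {statistics.mean(heights_cm):.1f} cm")
print(f"approximate 95% bootstrap interval: {lo:.1f} to {hi:.1f} cm")
```

Swapping `statistics.mean` for `statistics.median` or another function gives the bootstrap distribution of that statistic instead.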
48:21 Michael Kennedy: That's really cool, it's like meta statistics. Statistics about statistics.
48:25 Hugo Bowne-Anderson: Exactly. The great thing is, once you have that distribution of means you can visualize it, right? You get a distribution, you can have a look at it, and that speaks to the next core skill that I think everyone, if you're not going to be a specialist in data visualization, that's fine, but as a working data scientist, you'll be asked time and time again to explain your results, and a picture is worth 1,000 lines of code. I think that's incredibly important to become adept at data visualization. The fifth point, which is the term on everyone's tongue, is machine learning, the related deep learning. I think machine learning is incredibly important for working data scientists, but I don't want aspiring data scientists or software engineers who are trying to enter the data science space to fall into the trap of thinking, if I can machine learn in inverted commas, you know, that makes me a data scientist. I'd suggest that, definitely, learn a bit about deep learning, but don't get too sucked in, unless you want that to be your focus, and then really do it, right?
49:33 Michael Kennedy: Yeah, it's definitely one of the most mysterious and sort of new buzzy parts of data science.
49:39 Hugo Bowne-Anderson: Exactly. The way it's related to this kind of reburgeoning concept of artificial intelligence is fascinating, but there's also a potential for a bubble. I don't want to be too harsh on it, because it's incredibly important, and the effects on society and the way we live will be huge, but we need to be careful as well.
50:00 Michael Kennedy: I think probably the danger is that it can become the hammer where everything becomes a nail to hit it with. There was this funny image I retweeted on Twitter yesterday. I don't know where it came from originally, but there's this huge bulldozer thing. Instead of having a big scoop on the end of its arm, it had a little regular person sized shovel and it was digging, and the quote was something like, you know, machine learning solution when all you really needed was a few if statements, something like that.
50:28 Hugo Bowne-Anderson: That's fantastic.
50:29 Michael Kennedy: I do see that possibly being a danger, right? It's not the only way to solve problems, but the problems that they can solve are, like, they were unsolvable before. It really does have the possibility to open new doors, but it's not the only tool for it.
50:43 Hugo Bowne-Anderson: Yeah, I mean, the pendulum swings both ways. Part of the reason it's really buzzing now is because it has been incredibly effective, as we've seen.
50:52 Michael Kennedy: Yeah, and these companies are saying, "Hey, we have tons of data and we don't fully understand it. Could this maybe be our magic silver bullet to unlock something that we didn't know about?" You said storytelling, right?
51:02 Hugo Bowne-Anderson: Storytelling is incredibly important. I think even when you're writing a chunk of code, you're telling a story to future you or someone else who is reading it and trying to interpret it, but when developing a data science project, you're introducing them to a dataset. You're showing them exploratory data analysis, you're potentially showing them some statistical inference, machine learning pipelines, so being able to explain in a variety of terms what your data science story is is incredibly important, and to give takeaways at the end, to give an introduction, this type of stuff. Consider it a story, and also think who your target audience is. If you want to write a blog post which a hiring manager can understand, that's one thing, but if you want to write a blog post that someone who is very well versed in machine learning can understand, they're very different things. Just kind of think about that, practice that, and read what other people do as well. There's a website, I can't remember what it is, but it's called something like, you know, 100 Interesting Jupyter Notebooks in Data Science or something like that.
52:09 Michael Kennedy: I think I've seen that. That's really cool. That definitely is a great place. I think Jupyter notebooks really are powerful and they've brought storytelling to code in a way that just wasn't here before.
52:22 Hugo Bowne-Anderson: Absolutely, and the idea of being able to interactively write your code and see output straight away below the cell you've written in is really strong. This was actually one of Jake VanderPlas' points in his PyCon keynote where, you know, someone said to him, "I can't believe you use Jupyter, it's so slow and beefy." And he was like, "Oh, I never thought about that. That doesn't affect my workflow. It's really about speed of development for me, not speed of execution." I think was his term and that he can go in there, and we all can, write some code, see the output, get some cool visualizations, move on, write some markdown in there in order to have some text and tell that story. Now, one of the greatest things, of course, now is that, and has been for some time, that GitHub renders Jupyter notebooks as well. You can just give someone a link to your Jupyter notebook on GitHub and they can go and check it out immediately without even needing to clone the repository.
53:20 Michael Kennedy: Oh, I didn't know that. That's awesome, very cool. We're kind of getting near the end. We've got to wrap it up a bit. One of the final things we should focus on, it's a time of unparalleled information and learning resources. I mean, 20 years ago it was get a book or get a degree. There's a whole lot more than that now, right?
53:43 Hugo Bowne-Anderson: Absolutely.
53:44 Michael Kennedy: You guys at DataCamp already have a ton of courses for data scientists.
53:48 Hugo Bowne-Anderson: Yeah, I definitely think one way to keep up to date with what's happening in the field is online education. There are lots of platforms for this which offer different things. I think Coursera and edX opened the world of online education, not only in data science and programming, but everything from the humanities to space exploration to politics, and it's incredible platforms, both of them are. What we do at DataCamp is we're building a vertical platform for people to learn data science. What we offer, really, one of our major value propositions is it's more personalized in the sense that you get a shell and you get to write a script in the course and you get automated personalized feedback. Let's say I try to import pandas and read in a CSV, but I pass the wrong argument to it or the wrong separator or something like that. DataCamp will say, "Hey, you passed in this argument. Why don't you try doing this instead in order to import it? Read the CSV correctly." So, we have a mixture of videos and interactive coding sessions. There are lots of other great places. Kevin Markham has his data school, which is great for Pythonic data science.
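The read-in mistake mentioned above, passing the wrong separator when importing a CSV with pandas, is easy to reproduce. This is only a sketch of the mistake and its fix, not of how DataCamp's feedback works; the semicolon-delimited text and its column names are made up for the example.

```python
import io

import pandas as pd

# Made-up data that uses semicolons, not commas, as the delimiter.
csv_text = "name;height_cm;city\nAda;170;London\nGrace;165;New York\n"

# Default separator is a comma, so each row is read as one long string.
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.shape)  # (2, 1)

# Passing the actual delimiter splits the rows into the intended columns.
right = pd.read_csv(io.StringIO(csv_text), sep=";")
print(right.shape)  # (2, 3)
print(right.columns.tolist())  # ['name', 'height_cm', 'city']
```

Checking `shape` and the column names right after reading a file is a cheap way to catch this kind of error early.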
55:06 Michael Kennedy: Yeah, Kevin Markham is doing really awesome stuff. Shout out to Kevin, I was just talking to him yesterday, actually, and he and I have done a little bit together. He's got some really cool stuff for data science and Python, for sure.
55:18 Hugo Bowne-Anderson: Absolutely, and of course, your courses, your Talk Python courses for pure Python. Everyone should do those.
55:23 Michael Kennedy: Well, thank you very much. I appreciate that shout out, that's awesome.
55:26 Hugo Bowne-Anderson: Of course.
55:27 Michael Kennedy: Hopefully people who are getting started in data science or the programmers that want to move into data science, hopefully this has been really helpful. I think there's a pretty concrete roadmap of steps that you can take to get there, so thanks for laying that out for us.
55:41 Hugo Bowne-Anderson: Absolutely, and thanks for coming up with this idea for us to have this chat, as well. It's been really cool.
55:46 Michael Kennedy: Yeah, it's super fun. I think everyone is going to enjoy it. Before I let you get out of here though, you've got two questions to answer. First of all, if you're going to write some code, namely Python code, really, what editor do you open up?
55:59 Hugo Bowne-Anderson: When I use an editor, which I do for scripting, I'll use Atom, but as we've said, for most data science, I do it in Jupyter notebooks. I love Jupyter notebooks. Also, I'd recommend very soon, or even now, people checking out Jupyter Lab.
56:12 Michael Kennedy: I don't know Jupyter Lab, tell us about it.
56:14 Hugo Bowne-Anderson: Jupyter Lab is amazing. It's really a modular infrastructure for data science and scientific computing. You open up your Jupyter Lab kernel and you can have a Jupyter notebook in there, you can have a terminal in there, you can have a markdown file which you see rendered immediately. You can even have notebooks. You and I can open Jupyter notebooks in our respective Jupyter Lab environments and collaborate on them in real time and you can paste code into the chat that I can then paste into my notebook. It's really a new modular infrastructure.
56:51 Michael Kennedy: That's awesome, it's like social Jupyter.
56:54 Hugo Bowne-Anderson: Yeah, absolutely.
56:54 Michael Kennedy: That sounds great.
56:57 Hugo Bowne-Anderson: That's super exciting, and the development around that is really strong.
57:01 Michael Kennedy: Nice, okay. Notable PyPI package?
57:05 Hugo Bowne-Anderson: Okay, there are so many.
57:07 Michael Kennedy: It's so hard. There's like 120,000 almost now, it's insane.
57:12 Hugo Bowne-Anderson: I'll mention one that I recently discovered and I've only played around with, but it seems super cool. It's called Newspaper, and I've been thinking about it a bit recently. I spend a lot of my time trying to scrape HTML and prettify it, so for that, generally I use Requests and Beautiful Soup, those are huge, but those aren't the ones I'm talking about at the moment. This is called Newspaper, and it's a really simple API for scraping articles and curating them and doing natural language processing, so you can get in touch with the New York Times or whatever it may be, scrape the article really easily. There are some natural language processing methods, title methods, text methods, that type of stuff. I probably won't get this right, but it's something like NLP method and spits out keywords and topics and that type of stuff.
57:59 Michael Kennedy: It's an incredible library, yeah. I just discovered it recently as well. Basically, the idea is, instead of combining Requests plus Beautiful Soup, you get the text and the semantic markup and you've got to do whatever you're going to do, it's like, you can just point it at an article and say, "Who is the author? When was this published?"
58:19 Hugo Bowne-Anderson: Exactly.
58:19 Michael Kennedy: "What are the key words?" You can point it at the home page, like the home page of the New York Times and say, "What are the articles on this page?" It's crazy.
58:25 Hugo Bowne-Anderson: It's awesome. And it deals with date times in a really intuitive, nice way, which, date times are the bane of my existence, a lot of the time.
58:34 Michael Kennedy: Why are date times so hard? They are though, they're really tricky.
58:38 Hugo Bowne-Anderson: James Gleick has this thing where, it's an article about how there should just be one timezone. I'm not going to go into that, but I'm just putting that out there. It's not obvious who would be the center of that time zone.
58:50 Michael Kennedy: Yeah, that's a big debate there, right? I wake up at two in the afternoon, and then I get up, like, that would totally simplify things.
58:58 Hugo Bowne-Anderson: His argument is that timezones are a historical artifact that we need to get rid of. But that's my notable PyPI package. I just wanted to give a few shout outs to a bunch of others from the data science Python stack. This list is by no means exhaustive, but I use pandas, scikit-learn, NumPy is huge, matplotlib, seaborn, Altair, and Bokeh are all great for data vis. Dask for distributed computing, PyMC3, StatsModels. These are all really interesting and core elements of the data science Python stack that I use and love.
59:33 Michael Kennedy: Yeah, those are all very, very good ones. Awesome. Yeah, Newspaper, lots of fun with that one. Alright, so, Hugo, final call to action? People that are wanting to get into data science, what do you say?
59:44 Hugo Bowne-Anderson: Get out there and do things. Play to your own strengths, be brave, and something we haven't really chatted about, realize that imposter syndrome is a real thing for everybody. At the inaugural JupyterCon this year, Fernando Perez, the creator of IPython, for real, the creator of IPython and co-lead of Project Jupyter, encouraged everyone to realize that everyone has imposter syndrome and that he himself has imposter syndrome. Any time you think you're an imposter, remember that Fernando Perez feels the same way.
01:00:16 Michael Kennedy: He's out there changing the world, and so can you, right?
01:00:18 Hugo Bowne-Anderson: Exactly.
01:00:18 Michael Kennedy: Awesome.
01:00:20 Hugo Bowne-Anderson: That's it.
01:00:21 Michael Kennedy: Alright, well, great to talk with you, and thanks for coming on the show.
01:00:24 Hugo Bowne-Anderson: Such a pleasure, thank you.
01:00:25 Michael Kennedy: This has been another episode of Talk Python To Me. Today's guest has been Hugo Bowne-Anderson. This episode has been brought to you by Rollbar and GoCD. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complained, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at Rollbar.com/TalkPythonToMe. GoCD is the on-premise, open source continuous delivery server. Want to improve your deployment workflow but keep your code and builds in house? Check out GoCD at TalkPython.fm/GOCD and take control over your process. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course, Python Jumpstart by Building 10 Apps at TalkPython.fm/course to experience a more engaging way to learn Python. And, if you're looking for something a little more advanced, try my Write Pythonic Code course at TalkPython.fm/Pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, Google Play feed at /Play, and direct RSS feed at /RSS on TalkPython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now, get out there and write some Python code.