#322: A path into data science Transcript
00:00 Are you interested in getting ahead in data science? On this episode, you'll meet Sanyam Bhutani,
00:04 who studied computer science but found his education didn't prepare him for getting a
00:08 data science-focused job. That's where he started his own path of self-education and advancement.
00:14 Now he's working at an AI startup and ranking high on Kaggle.
00:17 This is Talk Python to Me, episode 322, recorded June 10, 2021.
00:23 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the
00:41 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where
00:46 I'm at mkennedy, and keep up with the show and listen to past episodes at talkpython.fm,
00:50 and follow the show on Twitter via at talkpython. This episode is brought to you by Sentry and
00:56 your base, and the transcripts are brought to you by Assembly AI. Please check out what they're
01:01 offering during their segments. It really helps support the show. Sanyam, welcome to Talk Python
01:06 to Me. Michael, I'm disappointed I didn't hear the Steve Ballmer remix intro, but I'm very honored.
01:12 Aha, developers, developers, developers. Oh, come on. It's so good. You know it's so good.
01:17 I remember that was in your first few episodes. I think they came out right around the time I was
01:21 in university. Thanks for this opportunity. I've been a fan and listener of the show and
01:26 really excited to be talking to you. Yeah, I'm really excited to have you. I'm excited to hear
01:30 about your journey into data science. It's going to be so much fun because I feel like so many people
01:36 out there looking in from the outside, you know, they maybe didn't come into data science or to Python
01:43 from a traditional computer science education. And they feel like, well, I didn't go through that
01:48 path. And so I probably, this is not a good fit for me. I think that's very far from the truth. I think
01:54 there's so many opportunities to get into data science or to get into Python and programming.
01:59 And while you do have some experience with computer science at the university, it sounds like as we'll
02:04 learn through your journey that a lot of what is actually effective had very little to do with
02:09 university. Let me start out on a spicy note. I studied computer science at a university at one
02:15 of the best universities in the country. It didn't make me a better programmer at all. Let me start
02:20 with that spicy opening. That is spicy. Now you've thrown it down. And I want to come back to that,
02:28 but let's just start with your story. You know, what got you into programming first? Was this a
02:32 university thing that you pursued or were you interested in that beforehand or how'd you get into
02:37 programming? Sure. So I was the standard nerd definition. I enjoyed spending time with computers.
02:42 Whenever my parents would go to sleep, I'd figure out a way to sneak into the computer room, just play
02:47 games all night. And along the way, I think somewhere in high school, I discovered programming.
02:53 It was Java, unfortunately, but I saw the promise of it. I saw all of these interesting things that were
03:00 happening around it. And somewhere I just made up my mind that, Hey, I want to take up computer science
03:05 because that's what coders do. Unfortunately, not as I learned later. That's how I got interested in it.
03:11 And that's why I decided to take up a course in it. Yeah, that's fantastic. I never took computer
03:15 science as a major in college, but I studied math and I had to take a couple of programming courses
03:20 to sort of fulfill my math degree requirements. And yeah, I found it to be a mixed bag. Like I had to
03:27 learn Scheme and Lisp and I thought, well, that's not super practical, but...
03:31 What are those?
03:31 Yeah.
03:33 But I've got to start here. I was like, please let's do some C++ or something. Like, no,
03:38 no C++ for you. Darn it. And then I was told I had to learn Fortran because it was the most
03:44 important language I would ever learn. Turned out to not be true, but I learned Fortran as well.
03:49 And eventually I got into some fun languages that I got to build some things.
03:53 Well, I don't know how you felt, but my experience with being in a university, and this is speaking
04:00 from doing this in the nineties. So it could have absolutely changed, right? I haven't gone back to
04:04 the university since, but I didn't get a lot of projects that I really loved that I was really
04:09 super excited about. It was like, well, you're going to need to learn how to do this algorithm
04:13 by hand on paper. And you're going to need to implement this in this sort of archaic,
04:18 weird language. You got to do this.
04:20 You're compiling stuff in your head through paper. Talk about state of the art.
04:24 Yes. Oh my gosh. I'm like, why, why do we not get to use computers in our computer science course?
04:29 This is crazy. I just don't get it. But here I, here I am. So I didn't come away feeling like it
04:34 made me a super good programmer. It gave me some exposure and some interesting experience, but my
04:40 real exposure that got me into programming and told me, like revealed to me, like you can do this.
04:47 And this is for you was when I was doing a research project that had to do with math and
04:52 not programming, but it needed a little pro it needed some programming to do the simulations and
04:57 do the work. And I'm like, well, now this is super fun. This is the kind of stuff I wanted to do. And I
05:01 was up at 2am, you know, working on it in the computer labs then, because all of a sudden it was really cool.
05:07 So I don't know. Hopefully computer science is more practical these days, but I didn't find a huge
05:13 value in those computer courses I took at university.
05:15 To be honest, like I relate to that so much. I could just rant about this for hours, but I signed up for
05:21 computer science because there was this notion in my head that, hey, computer science is where you do
05:25 programming stuff, right? You make computers smarter. And then they're teaching us all of this stuff that,
05:31 you know, doesn't really make sense. Like I remember listening to Talk Python those days,
05:36 and you were talking about PyPy, which I didn't know what was because they never told us what it was.
05:41 And I'm listening to all of this stuff. And what they're teaching us is how to make for loops,
05:45 print out patterns. Like I don't see how these things connect, right? You're talking about Flask,
05:50 building apps on Talk Python. I just heard Michael talk about it. But now what is all this stuff?
05:56 What is inheritance? Where does it come into the picture? And there was this huge disconnect for me.
06:00 So very much echoed that experience as well, unfortunately.
06:04 Yeah, that's interesting to hear you look back on it. One of the things that you talk about in some
06:08 of your writings and some of your experience, and we'll get into it, has to do with top-down versus
06:15 bottom-up learning. Before we get to that, I want to make sure you get to answer both the opening
06:20 questions. So what are you doing these days right now before we dive into that aspect of learning?
06:24 Sure. I currently work at h2o.ai, which is a company building auto-ML products. I'm sure we'll
06:29 get into this later on as well. I work as a content creator slash engineer. So we have a makers gonna
06:36 make culture, which means that I have absolute freedom to bring ideas. Usually people don't stop
06:41 me and they encourage me, which also means that I can do a podcast at work. I started a podcast
06:47 a while ago called Chai Time Data Science, where I interview my heroes. A lot of them are Kaggle
06:52 grandmasters. So we can talk about this later as well, but Kaggle has different tiers. Grandmaster
06:58 is the highest one of them. H2O has, I think, more than 20 grandmasters. So at some point I said,
07:04 hey, can I interview our people? And they said, yes. So I have a lot of freedom over stuff I do,
07:08 but it's a lot of creating content and things in those domains. So blog posts, videos,
07:13 I get to do meetups as well. That sounds like a really fun job. It is.
07:17 This whole exploring ideas and creating content and interviewing people and just being out in the
07:22 community. It's the aspect of programming that when people first hear about it, I think is extremely
07:27 surprising, right? Yes. A lot of people think of programming, especially before they really get
07:32 into it, it's this solitary thing that kind of geeky, super smart people do, mostly alone,
07:38 mostly to avoid other contact with others, right? And then as you get into it, you learn like,
07:43 actually there's a whole lot of team dynamics and programming. And then there's these roles that
07:47 are like developer evangelist, which sounds pretty similar to what you're kind of doing,
07:51 like community outreach on the dev side, which is very social and outgoing and interesting. And so,
07:58 yeah, it's a whole spectrum. It's a lot of fun for sure. And yeah, it would be closest to evangelism, but also I have a lot of freedom to do a lot of, honestly,
08:06 anything I bring to the table, usually I get positive feedback about it. So I just keep doing stuff,
08:11 even if it's interviewing people over time.
08:13 Yeah, that's fantastic. So back to my top down, bottom up thing that you talk about. In a lot of
08:19 academic, you know, high school, college settings, the foundation is set at the beginning, right? Okay,
08:26 well, we're going to teach you how to do derivatives or differential calculus. So what we're going to do
08:31 is we're going to start out real, real simple. And I'm going to talk about what does a difference look
08:36 like, and then we're going to talk about limits. And then we're going to, you know, eventually,
08:39 like two months later, you can do derivatives and you can actually do calculus, right?
08:44 Yeah.
08:44 That first two months, you just have to have faith that I'm just going to keep cranking on the details
08:49 until something interesting happens.
08:52 But even after those two months, like you get to solve these problems, you get to, at least for me,
08:57 I was able to confirm that, hey, the answers I'm getting match with those in the book. But what's
09:02 the point of all this? Like, okay, I'm able to solve these problems. I know how to ace my test. I know
09:06 how to match into that. But where will this be used? I'll never find out. And I just didn't have the
09:12 passion for it. My only passion was, I need to get good grades to get into a good university. But apart from
09:17 that I had no real...
09:19 Right, right. I want a good job, but like, this must be the path.
09:21 Yeah, basically.
09:22 Yeah, yeah. I hear you. And I feel like so much of academics and many presentations and courses are done
09:29 this way as well. They, but especially academics, because you have to finish it to get the grade,
09:34 to get the degree. So they're like, well, it's fine if it takes three months before this is interesting
09:39 to anyone, because they have to stay here. They have no choice. Like, we're going to build up slowly,
09:44 a bit at a time for three months, because guess what? They're all enrolled and they need this.
09:51 This is a required course. And so we're going to make sure we get every little detail in place along
09:55 the way. And eventually it'll be interesting to them. I just feel like that is so backwards
10:00 from trying to capture inspiration of people, you know?
10:03 Somehow, every single time there's always this disconnect, like, okay, I get it. At some point,
10:09 you have to know the concepts, but you're never told of the bigger picture, which is what is a
10:14 larger focus in the top-down approach. So I never know where these individual things will really be
10:19 used. It's like, before you get to drive, you need to know about the thermodynamics of the engine
10:24 rather than sitting in the driver's seat.
10:26 Yeah, there's a beautiful Ferrari you want to take out for a drive. And you're like, no,
10:30 no, no, no, no, no, no. We're going to study physics, study thermodynamics,
10:33 a little chemistry for the combustion. And then a couple of years, you can take that thing for a drive.
10:37 Exactly. And I feel like it actually captures it pretty well. You know, contrast that with,
10:42 well, let's just teach people the rules of taking a derivative, right? Derivative of x squared is 2x.
10:48 Okay, great. Now let's show them how they solve cool problems. Like, oh, here's a ball flying through
10:53 the air and we can figure out its velocity when it hits the ground based on things like the derivative
10:57 and acceleration and so on. And then eventually, once you're like, this is really interesting,
11:01 then you could talk about like, all right, now let's dig in. Let's talk about like the details of why
11:07 this math or this data science algorithm works. And it just doesn't really go that way. So I think
11:12 that's definitely an interesting part of the journey that you had to make that switch, right? To sort of
11:18 go from this like really theoretical academic background to, oh, I've got like a Kaggle competition.
11:24 Yes.
11:25 And I've got two weeks to solve the problem. Like we can't be rebuilding it from the foundation. Let's go
11:30 the other direction.
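The falling-ball example mentioned in this exchange can be tried out in a few lines of Python. This is only a rough sketch for illustration, not something from the episode: the function names, the 20-meter drop height, and the value used for gravity are all assumptions, and air resistance is ignored. It estimates a derivative numerically, checks the "derivative of x squared is 2x" rule, and then uses the same helper to get the ball's speed when it hits the ground.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def derivative(f, x, h=1e-6):
    """Estimate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    """The textbook example: f(x) = x^2, whose derivative is 2x."""
    return x * x


def height(t, h0):
    """Height of a ball dropped from rest at h0 meters, t seconds later."""
    return h0 - 0.5 * G * t * t


def impact_speed(h0):
    """Speed of the ball at the moment it reaches the ground."""
    # Solve height(t) = 0 for t to find when the ball lands.
    t_hit = math.sqrt(2 * h0 / G)
    # Velocity is the derivative of height with respect to time.
    velocity = derivative(lambda t: height(t, h0), t_hit)
    return abs(velocity)


if __name__ == "__main__":
    print("d/dx of x^2 at x = 3:", derivative(square, 3.0))
    print("impact speed from 20 m (m/s):", impact_speed(20.0))
```

In the top-down spirit of the conversation, the point is to play with the numbers first (change the height, change the function) and only then dig into why the central difference works.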
11:31 Yeah. Just to be clear to the audience, I just did a bachelor's in computer science. I didn't do a
11:36 master's or PhD. I gave up on academia midway. But yeah, echo on that. And it's the top down approach.
11:44 I was introduced to this through fast.ai. They are a big advocate of this. And that's how I became a fan
11:51 of this. Essentially, what they cover in their blog post is you're given the baseball bat and you get to play
11:57 first rather than being taught the physics of the curveball. And I think at least for me, in retrospect,
12:03 the main challenges throughout all of these months of learning a subject in university, you need to be able to
12:10 stay motivated. And remember why you've taken up a course. I took up web programming because let's say I want to learn
12:16 how to make websites and not because I need to remember what HTML tags come in the final semester question every year.
12:23 And somewhere in the middle, you lose out on this motivation. And the top down approach essentially takes care of that.
12:28 That, hey, bring your project and figure out stuff along the way. And I think I mentioned in our interview, I think
12:34 Talk Python courses really cover this well because you're given 10 sets of projects and you can just build them along the way.
12:41 Yeah, thank you. Yeah. I mean, I really am a big fan of this, and I tried to incorporate it into the courses
12:46 that we have, because I do think you need to have these little wins right away. And you hear a lot of times
12:52 people talk about like, well, if you're teaching kids, the kids need to have these like good experiences early.
12:58 It's like, you know what? Replace kids with people. Like people just, they've got a lot of time and other options and you want
13:05 to make them feel good and excited and like they're making progress. They need to make progress,
13:09 even if it's little steps to the beginning, make it feel like legitimate progress, not just algorithms
13:15 and loops and stuff like that. Yeah, absolutely. So you went through your computer science degree,
13:21 but you didn't come out the other side feeling like a data scientist. And this was around the time
13:26 of the MOOCs, right? The massive online open courses. Is that what this stands for?
13:31 I think so.
13:32 I think so. Something like that. It might have several variations. And one of those is over at
13:39 fast.ai, right? Focused on deep learning and data science type topics, right?
13:45 Yeah. So just going back to the universities, like you said, I was just really unhappy that, hey, there's this huge disconnect.
13:51 And like any smart person in their 20s, I just spent a lot of evenings ranting about it.
13:58 And at some point I decided, OK, this is going to help me. And I just started signing up for every single course on the
14:06 internet. I used to say this proudly that I've done 50 plus courses to my peers who would look up to me that, oh, this guy's...
14:13 Oh, I've only done 10.
14:14 Yeah. But in retrospect, I was just being dumb and chasing all of these courses. Fast.ai, in retrospect,
14:21 and I keep saying this, but it's the most impactful course in my career. So Fast.ai is not just a course,
14:27 it's also community and software. But I got introduced to top-down learning through them. And they make you
14:34 excited about this stuff. In the first lecture of the deep learning course, they have a bunch of courses.
14:39 They teach you how to put together a few lines of code. Of course, you don't know what's happening,
14:43 behind it. But you build something that's state-of-the-art. And Jeremy Howard, the creator,
14:48 shows you how you can get to the top of a leaderboard on a Kaggle competition. I don't know
14:52 what's more exciting than that, at least to someone who was printing out star patterns in university.
14:57 Yeah, that's really neat. And I think the community aspect is also pretty important, having that ability
15:03 to sort of bond with people there. So MOOC, the M stands for massive, like number of massive in terms
15:09 of number of people, because it's a large group. I haven't gone through their courses or anything.
15:13 At this point, it's a few hundred thousand people, I'm sure might be more than that.
15:18 That probably counts as massive. You know, if you compare it against a 30-person college course or
15:22 whatever. Yeah. Okay.
15:24 The biggest mind opener for me was, we suck at diversity in tech, right? No other way of putting
15:29 it. And just talking to different people on these online communities, people who don't have computer
15:36 science degree or were coming from different walks of life. I didn't understand that, hey, you're supposed
15:41 to have other responsibilities as well. You're supposed to be helping your family out. I just assume you can do this
15:47 in your free time and that's all you do. But that was also a mind opener for me during those days.
15:51 This portion of Talk Python to Me is brought to you by Sentry. How would you like to remove a little stress from your
15:58 life? Do you worry that users might be having difficulties or are encountering errors in your app right now?
16:03 Would you even know it until they send that support email? How much better would it be to have the error and
16:09 performance details immediately sent to you, including the call stack and values of local variables and the
16:15 active user recorded in that report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry
16:22 on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade
16:29 ready to roll out as we got their support email. That was a great email to write back. We saw your error and
16:34 have already rolled out the fix. Imagine their surprise. Surprise and delight your users today.
16:39 Create your Sentry account at talkpython.fm/sentry. And if you sign up with the code
16:44 Talk Python2021, it's good for two months of Sentry's team plan, which will give you up to 20 times as
16:51 many monthly events as well as other features. So just use that code Talk Python2021 as your promo code
16:58 when you sign up. One of the things I really value about the Python community is it's not just straight CS
17:07 to sort of deep applied Python out of like this university chain, but rather so many people are
17:14 brought in from different areas, right? People are interested in biology and they learn a little
17:18 Python. People are doing astronomy and they learn a little Python. People are building Instagram and
17:23 you know, they're using Python. So there's just this diversity of viewpoints and specialties that
17:29 comes to Python that's really unique. And it sounds like you kind of got that feeling as well here.
17:33 I was always very welcomed, and the fast.ai community especially is very warm and welcoming. So
17:40 at least at that time, I went to Reddit to ask a few questions. And I got a lot of harsh feedback,
17:47 which really demotivated me. But the fast.ai community was the exact opposite at that time. Reddit is a lot
17:53 better now. Any other community for that matter. But it's a very welcoming community. And no one says
17:58 that, hey kid, you're not supposed to be asking these stupid questions. Rather, even the creator
18:02 himself, Jeremy Howard, often hangs out in the forums and answers all of the questions. So it really put back
18:08 the inspiration in me while I was just in this dark phase, with nothing making sense in university.
18:13 Yeah, very cool. So there's a couple of things that you've done. Let's set the stage and then we'll dive
18:19 into the details on them. So as you've gotten your degree, as you've gotten better in data science and
18:25 deep learning, there's a handful of things you've done to give back to the community and stretch yourself as
18:31 well. One is to work on your blog and write articles. Two is to create your podcast, which I was happy to be a
18:37 guest on a while ago. It's very nicely done. Thanks for saying yes. It was a very exciting moment to host you,
18:44 honestly. Yeah, thank you. And yeah, so the podcasts and then also the Kaggle competitions. So let's start
18:52 with your blog posts. I just picked a couple out of here that are interesting. One of them is how not to do
18:58 fast.ai or any other ML MOOC course, right? Yeah. And so you go through sort of how you
19:05 approached these courses. You talked about how you took 50 courses, which is on one hand, I think it's
19:11 really awesome to get that exposure. But on the other hand, to really master programming, you need to
19:18 stop and try to like solve concrete problems, fail at that, figure out like, well, I'm trying to solve
19:25 this problem. I can't even get a virtual environment set up to let me install this library. Like what is
19:29 going, you have to hit your head against that. And it feels like you're bad. It's just, you know,
19:34 it's building layers of experience in a way that like, it's not the funnest, but you got to go through
19:40 those steps and then you sort of work your way into developing that experience. There's not a super
19:45 shortcut. Having the courses helps give you the perspective and know where to focus, but it's still,
19:50 you kind of got to go through that path, right? So maybe talk us through how you approached it and
19:55 then the advice you might have after. Yeah. And to counter back also, just generally speaking,
20:00 maybe I'm not the most outward looking person, but I didn't find these ideas of, you know, building any
20:08 projects. So I couldn't think of a website that would look interesting. So I would just go to a course,
20:12 assuming that I would learn all of this stuff. And a lot of these MOOCs are very nicely marketed,
20:17 that they make you feel that, okay, I'm going to come out learning something. So I just followed
20:22 this trail of stuff that I would keep looking up. I need to know Python. So I would do a Python course.
20:28 Then I would take a course on different frameworks, keep doing that. And even at the end, I didn't
20:34 accomplish much because again, there was this huge disconnect because if anyone would tell me to do
20:39 anything that's slightly outside of the curriculum, I would fail at that. And that's just because I didn't
20:43 experiment as much. And in retrospect, I should have spent
20:50 at least twice as much time just trying to code even the stupidest idea possible instead of just
20:57 watching those lectures because they felt in my comfort zone that, okay, I'm learning something,
21:03 but I wasn't learning something at that point.
21:05 Well, you are learning something. I do think that being able to watch the lectures of an online
21:10 course and following along, like you're getting real exposure and real stuff, but you're not,
21:16 even though you're feeling comfortable, you're not at a place where if somebody said,
21:20 now go build something different, it's not that different, but it's different and do it from
21:24 scratch, right? You're not building up that skillset unless you're also experimenting along the way.
21:29 Yes. And I'm sorry, just to clarify. So this wasn't for the first course I had taken like at
21:33 least 10 of them and I was watching the same stuff over and over again. So at that point,
21:37 it was a waste of time, I think.
21:39 Yeah, for sure. For sure. So you said, all right, well, I'm not so sure that the way I was doing it
21:45 was totally the right way. So what would you say is the right way? What advice would you give
21:50 there for being successful in these online courses?
21:52 Sure. I'll point out a book by Radek Osmulski, who I had interviewed on my podcast earlier, but he's
21:59 put out a book that essentially talks about different things that you should be learning or how should
22:03 you really approach learning. And in his book, he talks about code twice as much as reading theory,
22:09 have this northern light of an idea. Again, I couldn't think of anything. So I took to Kaggle
22:15 competitions. At least in my opinion, just do fast.ai and then jump on to
22:20 Kaggle. Those are the two best places to learn about data science, in my opinion.
22:24 So the Kaggle competitions are interesting. Let's maybe talk about those for a little bit.
22:29 Sure.
22:30 I haven't talked about Kaggle a lot on the show. I'm sure people are mostly familiar, but maybe not
22:36 everyone is. So just tell us what is Kaggle.
22:39 Sure. And fun fact, the CEO actually tweeted yesterday. So at this point, Kaggle is at seven
22:45 million users, I think. So when they say they're the home of data science, it's really the biggest
22:49 community in data science. And why do I say community? It has competitions that are hosted
22:54 on the platform. So different use cases for different companies exist as competitions.
23:00 Now, as you can see, the first one is an example one, but different competitions are brought onto
23:07 the platform by companies who want the community to solve a problem. In return, there are prize pools,
23:14 but really what people are there for is the knowledge sharing that happens. And how does
23:18 that happen? They also have very nice discussion forums, as well as notebooks. At some point,
23:24 they called them kernels, if you're not familiar. But essentially, you can host Jupyter notebooks
23:29 on the platform where people share their stuff. And this is the best of the best on the platform.
23:34 So they share tips and tricks of how you can approach the competition. And then you start to
23:38 try and compete on a leaderboard. And you get real-time feedback because there are at times
23:43 a thousand people competing on the leaderboard, which may or may not be a good experience,
23:47 from my experience, at least for the first few competitions. But it's very exciting.
23:52 It's a little bit like a hackathon type of thing, but very focused on a data science problem,
23:56 not only generating an app or a website. Maybe that's a good elevator pitch.
24:01 Exactly.
24:01 Okay. So I'm sitting here looking at kaggle.com/competitions. And yeah, I can see a bunch
24:08 of interesting things. It doesn't explicitly say who it's sponsored by on the outside. Maybe if I click,
24:12 it'll say, oh yeah, this is brought to you by or sponsored by or put out by so-and-so. But the
24:20 first one is the SIIM-FISABIO-RSNA COVID-19 Detection, which sounds like a bunch of acronyms. I don't know
24:27 anything about it, although I've heard of COVID. The idea is to identify and localize COVID-19 abnormalities
24:35 in chest x-rays, which is interesting. And that's a genuinely useful thing that we could all benefit from.
24:42 Right. Having machine learning that can assist doctors and say, wait a minute, wait a minute,
24:47 this person seems to have either had or currently has COVID based on this picture. Let's do something
24:53 about that. That's genuinely helpful for society. And if I can just point out, at least for this
24:58 particular competition, I think it launched a few days ago. And just in those few days, you already
25:03 have 450 people that are, I can say just hyper, they'll be hyperactive in the discussion. Then a lot
25:09 of us just go there for the learning. I'm sure most of us just go there for the learning and things you
25:14 get to experiment with and learn on there. Yeah. And it says the prize for this is a hundred thousand
25:18 dollars in the US, which is pretty sweet. Is that split like number one gets half, number two gets a
25:25 quarter and it like trails off or is it all or nothing? Number one or zero? I think it's in the top
25:30 three, sometimes in the top five. It varies from competition to competition. And again, it's really hard to get into
25:36 the top. They have medals. They've gamified all of this stuff. And how's that helpful from an outside
25:43 perspective? As you gain medals, you move higher up the ranks as well as tiers. You start as a novice,
25:48 then you become a quote unquote expert, master and then grandmaster. So as you earn a certain set of
25:54 medals, you start on your path towards becoming a grandmaster. So that's more exciting than the prize pool.
26:01 Again, legends or very experienced people are aiming for the prize pool. I don't think I've ever even
26:08 dreamt of that. Right. If it's, you know, too far out of reach, it's not worth trying to worry about
26:14 that. It's more about making the progress and seeing yourself go up in the charts and gain that
26:19 experience. Right. Yeah. Yeah. So let's see some other ones. I went and sorted by prize purse here. So
26:25 Jane Street market prediction, test your model against future real market data. That's interesting.
26:30 There are 4,000 teams competing for that. There's one about discover how data is used for the public
26:35 good in the US for 90,000. That's pretty cool. Major League Baseball has one on digital engagement
26:43 forecasting. So predicting fan engagement for a baseball player, digital content. That's pretty
26:48 cool. This launched, I think, less than a day ago and there are already 15 teams on there. I'm sure if you
26:54 go over to this competition, you can see some stuff in the discussion and kernels already. Yeah.
26:59 One that is very close to my heart is SETI, Breakthrough Listen, ET signal search. So
27:05 find extraterrestrial signals in the data from deep space. That's pretty cool. The prize is not huge,
27:10 but you know, if you were the person that discovered aliens, come on, I mean, that's a pretty good prize.
27:15 And that's just zooming back to where this conversation started. Like I said, I'm not the
27:20 person who could think of these ideas. And now I'm given this large number of options,
27:25 whatever is exciting to me. I can jump on that competition. Even if I have zero idea about how
27:31 to approach that problem, there'll be plenty of stuff that's shared there. And I can just go from
27:35 there. I can just start learning. I can just try to approach this in a top down fashion.
27:39 Yeah, absolutely. So another one of your blog posts that you wrote is your first Kaggle competition
27:44 experience, basically writing a retrospective on that. So maybe tell us what that was like.
27:49 Sure. So in this competition, and I tend to set these goals every year. So I just
27:54 announce my goals, go big or go home, right? I just tweet out the craziest stuff that I couldn't imagine.
28:00 Last year, I wanted to lose 50 pounds. I managed to lose 70 pounds.
28:04 Congratulations. That's massive.
28:05 Thank you. But yeah, I just set these goals. And one of these goals was to start competing on Kaggle.
28:12 So in this competition, my first one ever, and all of these competitions are a similar experience. I just
28:18 joined the quick draw doodle competition because again, it looked exciting to me. What I did at that
28:24 time was just went to the discussion. I found people sharing stuff, sharing code. I just took that,
28:29 tweaked a few numbers, tweaked a few parameters, didn't make much sense. And I started moving up
28:34 the leaderboard. So the leaderboard is the most exciting and most addicting thing on Kaggle because
28:39 you're getting this real-time feedback. Okay, I'm doing better than these people. And then you go to bed,
28:44 you wake up, someone has shared a tip or a trick somewhere in a kernel or a discussion. And now
28:51 everyone has used that. And by the time you wake up, you're down by a hundred positions.
28:54 I see. They're like, oh, you're all just training on all the data. What if you use like transfer learning
29:00 on this little subset? This is actually totally crushing it. Everybody's like, we're changing
29:05 what we're doing. And you wake up and you fall down the leaderboard massively, huh?
29:08 Exactly. And again, now you have to get back to work.
29:11 One thing you talked about in your blog posts was how going through it, you got some pretty good
29:16 real world experience, right? You talked about
29:22 how you took all the training data and the data is a lot for this competition. There's like a billion
29:29 images described as a CSV file or something weird like that in each image. And so you took all that
29:35 data, loaded the training data, not all the data, and loaded it up and sent it over on your GPU. And it
29:42 took 50 hours, like more than two days. And you expected you're going to crush it, right? And it turns out
29:49 that like, actually that made it less accurate, right? So you had to get more creative. Maybe tell
29:53 us about that.
29:54 Yeah. And again, this was this disconnect that I found coming from these MOOCs,
29:59 where everything is just structured so nicely that it's supposed to work. And I just took that approach.
30:03 Okay. I'm just going to chuck all of the data in a data loader, put it on my GPU, let it train, and
30:08 I'll get a good accuracy. It turns out not really because it's not how this problem was structured. And again,
30:14 I learned about all of these, I think from a practitioner's perspective, important things
30:19 where I learned, Hey, I need to structure my project in a way, because at some point I'll be
30:24 on an Untitled152.ipynb notebook and I need to go back, and I wouldn't have a track of that.
30:30 I should probably do smaller experiments rather than the first one being a 50 hour long experiment.
30:35 So I should try and figure out how to run it on a subset of the data.
30:39 Yeah. That's a really good point because if you're waiting 50 hours per iteration, that's not going to go
30:44 very quickly.
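The "run it on a subset of the data" idea from this exchange can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the competition: the `sample_fraction` helper, the 1% fraction, the fixed seed, and the stand-in list of example ids are all assumptions made for the sketch.

```python
import random


def sample_fraction(items, fraction=0.01, seed=0):
    """Return a reproducible random subset holding roughly `fraction` of `items`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always keep at least one item so tiny datasets still produce a subset.
    k = max(1, round(len(items) * fraction))
    # A dedicated Random instance keeps the draw repeatable between experiments.
    rng = random.Random(seed)
    return rng.sample(items, k)


# Hypothetical stand-in for the ids of the training examples.
all_ids = list(range(100_000))
subset = sample_fraction(all_ids, fraction=0.01)
print(len(subset))
```

Training on `subset` first gives a quick baseline, and the same seed returns the same examples, so later experiments stay comparable before scaling up to more data.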
30:45 It sounds very easy and very obvious, but it wasn't to me at least. Maybe I was stupid at that point.
30:50 Well, no, I wouldn't say necessarily that. I mean, it probably seemed like, well, of course, if it's working
30:56 a little bit, let's just give it all the data, then it's going to really work. Right. That's a pretty reasonable,
31:00 naive beginner point of view that that's going to be totally fine. But then in reality, you know, reality comes
31:08 along. Well, it's more complicated. So you ended up coming up with a combination of like some of the
31:13 larger images, some of the smaller images and building up out of like that kind of stuff.
31:17 Right. Yeah. So I learned that, hey, maybe I should start with 1% of the training data, put up a baseline
31:23 again, obvious stuff, and then try to work with different image sizes. And what I was trying to do
31:28 is see if the accuracy, according to my local validation was going up and submitting it to the
31:33 leaderboards and just checking if it's actually working and then training bigger models through
31:38 that. At that point, ResNet was, I think, state of the art. That's what I was sticking to,
31:43 because I didn't have any outside idea about that. Other people were, of course, doing a lot of things
31:47 that I was just trying to catch up with. Sure. This portion of Talk Python to
31:52 Me is brought to you by YourBase. YourBase has a really cool product that will dramatically improve
31:58 testing and CI of your Python applications. If you could benefit from having pytest run your test 100
32:04 times faster or more, you need to check them out. Here's how it works. YourBase observes what tests
32:10 interact with which part of your application code. And the first time you run it, the speed is roughly the
32:14 same as normal. But the next time you run pytest is where the magic is. YourBase knows which parts of
32:21 your application code have changed. If the code under test hasn't changed, why test it again? So YourBase
32:26 only runs the tests that have interacted with the part of the code that has. If you change just a couple
32:32 of functions, you only need to run the few relevant tests and all the others can be safely skipped.
32:37 This means skipping hundreds or even thousands of tests most of the time, making your dev test
32:43 workflow and your CI builds much, much faster. All you have to do is install YourBase and run pytest as usual.
32:50 They'll take it from there. Get your free trial by visiting talkpython.fm/yourbase. YourBase test
32:57 acceleration works with the tools you're already using. So give them a pip install and see the difference right away.
33:02 Get started at talkpython.fm/yourbase.
33:05 So you're a fan of Kaggle. You recommend people come along and use this for concrete ways to
33:13 get started and build their knowledge beyond just theoretical stuff?
33:16 100%. I would just say in retrospect, I would just tell myself to, hey, do Fast AI sincerely once and
33:23 then just sign up for any competition and go from there.
33:25 Is it better to do it with a team of people? Do it by yourself?
33:28 I'll be honest. Sometimes I would not be the person working the hardest in the team. So
33:34 I would tell myself to at least start solo and then team up with different people. Everyone follows
33:39 different approaches, but at least for me, I tend to be the lazy person. So I would
33:43 make sure that I've done some homework before asking other people to join the team.
33:48 Yeah, that makes a lot of sense. But apart from that, all of my
33:53 Kaggle quote unquote successes I would credit to the teams I've been a part of. And when you join a team, you
33:58 get to meet all of these data scientists in a team where they're from different levels of experience and
34:04 they're doing these things that I couldn't have imagined. It's again, a greater learning experience
34:09 in that sense.
34:09 Yeah. What's the story in terms of people who are in the talk, you talked about them being
34:15 grandmasters or whatever they're called. Yeah. There's grandmasters, masters, experts,
34:19 contributors, and novices in the ranking here. What's the job story look like? The career
34:25 story. So if I'm over here and I'm one of the 1,500 masters in Kaggle, like dropping that
34:33 information at a job interview, is that going to get me somewhere or not? Do you think?
34:36 It depends on the company a lot. The company where I work, H2O.ai, has a lot of
34:43 Kagglers. We have 20 grandmasters. I think out of the five we can see right now, three are a part of
34:49 H2O.
34:50 Oh my gosh. Yeah, that's like 10% of all of them. That's awesome.
34:54 Oh, sorry. Four in the top five are a part of H2O at this point. Three of them, sorry.
34:57 Yeah, amazing.
34:58 So such a place, they of course recognize the fact that this isn't easy. If you're a master,
35:03 you're probably already in the top 1%, top 0.5% of the global rankings. And there's a lot of work
35:09 behind that. So I think it does make a lot of sense. Some companies don't recognize it. Maybe
35:14 I wouldn't want to work at those companies. Again, hot take.
35:17 Yeah, that's actually an interesting point, isn't it? Like if the person interviewing you for a data
35:22 science position doesn't know about Kaggle and respect like massive progress there, maybe you don't
35:27 want to really be on that team. Unless you're like, we're hiring you to like modernize this and set
35:33 the stage and like bring like the real stuff to us. But if it's like, join the team, we'll show you how
35:38 it's done. It's like, eh, you don't know what Kaggle is. Okay.
35:41 It's just a portfolio of projects. You can tell everyone that, hey, I worked on this problem that
35:47 your company is working on. And against the best of the best, I ranked, say, 10 out of 1,000. And that
35:54 should be a huge signal to the hiring people. I agree. I think, you know, put aside
35:58 the competition, put aside the, how do you rank against other people? If you can come over here
36:03 and say, oh, you see this major league baseball digital engagement thing? I did that and it came
36:08 out pretty well, actually solved that problem. And here's my GitHub repo for that and our conversations
36:14 around it. This one about the prediction of future sales also did that. And then this home price one,
36:20 actually I was near the top of that, like just having that kind of portfolio to share as part of
36:27 an interview is so incredibly important. So many people ask me, I want to get a job in this thing.
36:33 How do I get started? Do I need degree X or should I go learn this technology or that technology? Like
36:40 all those things are interesting and valuable, but something I really like about the tech industry,
36:46 though it's also challenging because that's kind of where you've got to live, is that it's not
36:51 so much your credentials or your background that will get you the opportunities. It's I need somebody
36:55 that does this. I need somebody that knows how to predict house prices. You predicted house prices.
37:00 You've shown, you can do it. You're hired, right? Like if you can show that you're doing the thing that
37:05 they already need, there's not a whole large discussion going on after that, right? You're
37:10 really close to being in the right place to do that thing. So building up this portfolio
37:14 is important. I think I managed to somewhat figure this out in my university days, out of an interest
37:19 in just exploring problems. I started freelancing because I wasn't allowed to have a job
37:25 job while being in university. And at that point I figured out, Hey, if I'm going to approach a person
37:30 on let's say Upwork and they want me to build something, I shouldn't be starting after we've had that
37:36 conversation. If I can just look at the problem, even put together the most basic structure around
37:42 it. And I can show it to them that, Hey, I put this together in two days. If you hire me, I can
37:46 build this in X amount of days. And most of the time that got me the clients or whatever
37:52 deals I've got in that world. Yeah. And getting that first or second project under your belt.
37:57 It's really important. I feel like Kaggle is part of that. Also, you know, Upwork is interesting that
38:01 you bring that up. I'm a fan of Upwork. If I was starting out and trying to get my first project,
38:07 my first job, and I was having a hard time in my local area of finding that I'd certainly consider
38:13 looking and seeing what jobs are out there in Upwork, even if I thought they didn't pay very
38:17 well, or I didn't totally want them just having that one or two projects done. And part of my resume,
38:23 then you can start looking, you know, more broadly. And it's just going to be such a help to have some
38:29 kind of portfolio. Right. So as a student, the pay really didn't matter. It was a lot for a
38:34 student, but my biggest promotion in life was going from that basic food menu to looking up
38:41 that menu as I started making money. That was exciting. It almost felt illegal that, hey, someone
38:47 is paying me to write code. Yeah. I remember my first job. I was so super excited. It almost didn't
38:52 matter what they could have paid minimum wage and I would have been thrilled about it because,
38:55 oh my gosh, someone's paying me to learn programming. And look, I have a book. I'm
38:59 spending half my time just learning how to do this. I mean, they're basically paying me to learn this
39:04 stuff. It's amazing. So yeah, I really, really had the same feeling when I was getting started.
39:08 Fantastic. All right. So another interesting area around what you're doing has to do with your
39:14 podcast. So maybe we could talk about just a couple of your interviews that
39:19 you've done that you really liked, right? Sure.
39:21 You found interesting. So tell us a bit about a couple of them.
39:24 Sure. So at some point how this started was I was doing all of these, I was trying to
39:28 essentially explore different areas of content creation. I started with blogging.
39:32 Fast AI's guru, Jeremy Howard, told us to write blog posts. So I started doing that.
39:38 And at some point I found this disconnect in advice. So I reached out to a friend:
39:43 Hey, would you, you've been helping me a lot. Is it okay if I put this together in a blog post
39:48 and put it out in the world? And that went on for a while. I started this as a blog series.
39:52 And later after I graduated, I thought, okay, maybe if I do this as a podcast and I'm sure
39:58 you would agree, I could explore all of these great people's minds in greater depth. So that's
40:04 how the podcast started for me.
40:05 Yeah. Well, one of the big secrets about having a podcast is I get to be the first
40:10 listener basically to all these interviews, right? I mean, I guess now that we're live streaming,
40:14 it would have like 50 first listeners or something, whatever it turns out to be. But
40:17 it's really amazing the opportunity to just meet these people that you're really interested in,
40:23 especially with conferences being gone and stuff. Now that's really hard to find time to like meet
40:27 up and just talk about them, but Hey, you can have them as a guest on your show. It's really nice.
40:31 Exactly. So coming back to my favorite interviews, I try to interview people about their journey.
40:36 As someone who's trying to understand how did this great person, Radek Osmulski, we have Radek
40:42 Osmulski's interview on top. He's one of my heroes from Fast AI. But how did someone like him learn
40:48 programming? How did they learn how to Kaggle? How did they break into the field? And we, at least in
40:53 the interviews, I try to ask them, did you face this problem? How did you overcome it?
40:57 My three favorite interviews would be Radek's, Dima Damen's, and Andrada's. So I try to interview people who
41:04 are Kagglers, practitioners and researchers, essentially anyone I can find who would like
41:09 to share their journey. And these are from all three aspects, essentially.
41:13 That's cool. Dima Damen, she did video recognition and computer vision. That sounds super interesting.
41:19 I remember in her interview, I started by asking her, Hey, when you were doing your research,
41:24 you were just using OpenCV. What do you think about it nowadays? Apart from that, it's also a lot about
41:30 her research perspective. So Dima is very much experienced and is a great orator as well. So
41:35 she was talking about how to approach your first research project or how to just go about research.
41:41 What is even research as someone who doesn't understand what that word means? And that's
41:45 what I try to explore in all of these interviews.
41:47 Yeah. So let's go back to Kaggle for just a minute, because something you touched on is really
41:52 interesting. I know there are a lot of research teams and groups at universities who are trying to
41:59 build models or trying to build mathematical algorithms or trying to do research. I feel like
42:05 maybe some of these Kaggle competitions would be really, really good to say, as part of our research
42:10 project, let's take what we're trying to develop here and try to actually apply it to one of these
42:15 competitions and see where it stands.
42:17 For sure. And it's highly encouraged, at least in the community, some organizers,
42:21 so the sponsors of the competition invite you to present your solution even in research conferences.
42:28 And apart from that, even if you end up creating a blog post or a research paper outside of that,
42:33 the community is very close and they recognize it instantly. And they know that, I mean, nothing
42:38 against research, but at least this particular solution has been tried and tested against this
42:44 leaderboard and it works really well. It is quite cutting edge because it's been tested against all
42:48 of these people.
42:49 Yeah. Yeah. Very neat. All right. Third one that you had queued up for us is Andrada
42:53 Olteanu.
42:54 Olteanu. Yeah.
42:55 Olteanu. Yeah.
42:57 The best part about these interviews is I just get to meet all of these people with such amazing
43:01 energy and such openness about their journey. This was again, such a fun interview because Andrada was
43:07 so open about her journey. I was at that point, I was just starting my journey in data visualization.
43:12 And I asked her, Hey Andrada, did you feel the same that you couldn't plot things against X and Y axis and then you would have a hard time figuring out where they are ending up? Because at least for me,
43:22 I couldn't understand what's going on. And that's what we discussed. And she was essentially talking about how she started her journey as someone who's fairly new to coding.
43:31 And at this point, she's become a Kaggle Grandmaster in kernels and she's been writing all of these amazing notebooks. And in this interview, we just learned about how she went about that as someone who,
43:42 just started out and then learned all about this as they went about.
43:45 Yeah. That looks like a really interesting interview and somewhat similar to the conversation we're having here, right?
43:50 I think so. Yes.
43:51 Yeah. Yeah. All right. One final area that I want to make sure we get to spend some time on is, you know, you work at H2O.ai. You've had a lot of experience with these different frameworks. Maybe we could do like a survey of the various deep learning ML libraries and you could sort of tell me how they compare and your thoughts on the various ones.
44:09 Sure. So the vision set forth by our founder, co-founder and CEO, is "makers going to make." And what we're trying to do on a philosophical level is just create products that allow people to build stuff.
44:23 So with that vision, and this is just my take on it, not from a company's perspective, they've put together all of these AutoML products. Wave is the latest one, and I'm sure we'll talk about this. But apart from that, they started out with the open source H2O-3, which was an AutoML framework. It's still one of the most widely used ones. That was followed by Driverless AI, which is an end-to-end AutoML product where essentially you just upload your data.
44:48 And what I like to call the Ironman mode where you just click a button, it figures out what models need to be trained, does all the feature engineering and puts out a nice model for you.
44:59 So we have this arsenal of AutoML products at this point, both open source and enterprise facing, aimed at different problems, Wave being the latest one of them.
45:08 Yeah. Wave, H2O Wave is pretty interesting. I guess it's at wave.h2o.ai. And it's a real-time web app dashboard for Python and data science. And a lot of the data science things I see are about making static graphs or maybe graphs that you can go and explore.
45:26 Like I could move my mouse over and it'll like highlight information about different parts that I could zoom into it and whatnot. But this is like a real-time changing dashboard, like a stock market or like a factory or something like that. You want to see what's happening as time passes, right?
45:42 I wish you said crypto market.
45:45 Yes, exactly. So the reason I spoke about the philosophy is because I think this is the next bigger goal for the company. What we're trying to create is, I'm not sure if this is out yet or not. We're trying to build a public app store of AI apps.
45:58 So Wave is an open source framework that takes care of a lot of the things that as a data scientist, at least I wouldn't want to worry about. I don't want to learn HTML, CSS, JavaScript. So it's just a framework that takes care of all of the UI, UX stuff, does it very nicely. I don't have to worry about messing up because it's taken care of. And then I can build different AI apps.
46:20 As a company, what we're trying to do is we're also putting out an app store where you can already use the open source apps if you want. And you can also contribute your own apps if you want.
46:30 Yeah, very cool. So there's a bunch of cool examples. There's a whole gallery full of many, many different things that you can go and write. Basically, Wave is an open source dashboard for Python developers so they don't have to do web stuff, but they can share it on the web, right? On a website.
46:47 Exactly. And just to be clear for the audience, when I say we, my biggest contribution would probably be this interview. But again, it's this amazing team of engineers who have been building these products at H2O that they know how to scale them and how to properly engineer them through all of this experience. It's really a data scientist-focused product.
47:08 Every now and then there's like a project I'm like, or something out there. I'm like, I really wish I had a reason to use this. This looks like really fun to play with. I just have no use for it personally. This is one of those things, right?
47:18 I would love to have an excuse to use something like this and make it go, but I just don't have that much data that changes that much in my world.
47:27 Maybe you could put together a web page for Talk Python where different episodes structure themselves and the listeners can see a dashboard in real time. Just a suggestion.
47:36 That would be cool. Like maybe downloads in real time and interaction in real time, comments. Yeah, for sure. Something like that.
47:42 Yeah.
47:43 Yeah. I mean, I could definitely like put a little bit of something, but if you worked at a place that like had a lot of stuff going on, like a factory or like a big e-commerce site or something, you could make a really cool live app out of this stuff, I feel like.
47:57 For sure. And again, it's still under development and all of our grandmasters who have this rich experience are also contributing to it. So I'm sure by the time this interview goes out, we would have added a lot to it.
48:09 Yeah, it's cool. It's already got 2.6 thousand GitHub stars. That's pretty cool. So really, really nice. Maybe let's talk about some of the other mainstream ones as well, like Keras, TensorFlow, Fast AI. Give us your thoughts on these different frameworks. Obviously, it's your opinion, not like a, you know, endorsement or a deep dive or whatever. But just what do you think about these, for people who don't necessarily have experience with all of them?
48:30 Yeah. Just to be clear, I strongly endorse Fast AI. I've been a fan of that. That I'll agree on. But apart from that, I started my journey with TensorFlow.
48:38 With TensorFlow, at least in that day (of course, TensorFlow has come a long way, and the Keras API has been merged in), I was really struggling because it had this static graph structure and it didn't feel Pythonic. Not that I was a good Python programmer. I'm still not. So that's why I moved to Fast AI. What Keras is to TensorFlow, Fast AI is somewhat to PyTorch. PyTorch follows this more Pythonic approach, and Fast AI is a wrapper, but more than that, on top of PyTorch.
49:05 Okay, interesting. Yeah. So Fast AI is maybe a little easier to get started with, you think?
49:10 Yeah. So the nice thing about Fast AI is that it's a very heavily opinionated library. So there are a lot of things that have been baked into it. And for some reason, whenever I just switch to PyTorch, I am not able to replicate similar accuracies. That's what I mean, because somewhere the defaults are so good that it always gets better results. But essentially, it's this layered API. And from an end user perspective, I could just use the high level API. On the left, if you click on applications, you can see the different
49:40 applications that they support. Or if I want to work on something that's cutting edge, I can also use the training loop, which is really nice, and just bring in a PyTorch model and connect that.
49:50 Yeah. Okay. Yeah, that looks really cool. Again, computer vision. I want to build a computer game AI that plays me in real life. Like put a camera over, say, a chess board, and it'll play me, but not just on the screen. As I actually move the things, right, it'll see. That'd be fun. Maybe I can try that out here.
50:11 Sounds very cool. Yeah.
50:12 Yeah, just a little bit of interaction with some real something or other there. That sounds cool. But a lot of options these days, right? And we've got all these different people and libraries and many things to choose from, right?
50:23 Again, one of the things that I've learned, at least from the podcast, and this is the collective opinion of everyone I've interviewed, again, you don't need to worry about the framework as long as you really understand the concepts. So that's why I encourage Fast AI, because it's also a course around the framework. So once, at least from my perspective, when I've gotten around to learning all of these things, it shouldn't be that hard to switch to another framework, depending on whatever your job requires you to do or whatever your project needs you to use.
50:51 Yeah. Well, yeah, you learn the foundation, solve the problem one way, and then you can solve it with some other library more easily again and again.
50:58 It's really hard for me to remind myself that, hey, the problem is what I'm trying to solve and not create more problems. I don't want to learn more of different things, but I need to figure out how to minimize my time in a way that I actually solve the problem.
51:12 Yeah. I think one interesting thing that people learn as they get more experience is, even if the technology is super different, right? If I learned how to build something interesting in JavaScript, maybe I know nothing about Python, so how am I going to do that? But actually, what you've learned over in one place is really way more transferable and reusable.
51:31 Like the way of just thinking about solving problems, the way of thinking about, okay, I got to pay attention to this and not that. So what's important in this library, picking the right library are these things and so on.
51:41 And of course, you shouldn't keep switching between frameworks as well. The thing for me was I was switching as a very early stage developer. I'm still very early stage, if I can even call myself a developer.
51:51 And I was switching between frameworks every 15 days just because they looked exciting. That's not the right thing I would tell myself to do.
52:00 Yeah, this is true. This is true. Get comfortable in one and then you can move around. But yeah, don't just chase the shiny thing all over the place for sure. Although in the data science world, there's so many shiny new things that there are to pay attention to and visualizations and libraries and charting and graphing and whatnot. It's easy to get distracted, I think.
52:18 For sure. And that's why I mentioned I need to remember what I'm working on. So I need to make the graph and not figure out how to make it prettier as long as it does what it's supposed to do.
52:26 But that framework looks exciting. Maybe I should try that over the weekend and now I'm spending 15 days.
52:32 More like here's an excuse to try that framework. This is my chance to try it. So I'm going to go do it.
52:36 Exactly.
52:37 Exactly. Well, all right. Comment on the live stream. Davinas says, hey, some advice on getting started on web development or data science, you know, Python. I'll throw out a little bit then you can add your thoughts.
52:49 Sure.
52:50 I would say you need to have some foundation in just Python basics, right? You need to know variables, loops, functions, like that kind of stuff. But, kind of like the beginning conversation we had, don't go so deep and say, well, I've got to completely understand everything about this language before I take the step to my first web app or before I take the step to like firing up Jupyter and doing my first analysis.
53:14 Like, don't do that. You know, just get comfortable with the basics. Start building. And as you go into more advanced areas, then you're like, OK, well, now I kind of need to learn about what is a list comprehension.
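For listeners wondering what "the basics" look like next to the list comprehension mentioned here, the following is a small sketch, not anything shown in the episode. The function names and sample values are made up for illustration. It writes the same task twice: once with a variable, a loop, and a function, and once as a list comprehension.

```python
def squares_loop(numbers):
    """Square each number with the basics: a variable, a loop, and a function."""
    result = []
    for n in numbers:
        result.append(n * n)
    return result


def squares_comprehension(numbers):
    """Build the same list with a list comprehension."""
    return [n * n for n in numbers]


values = [1, 2, 3, 4]
print(squares_loop(values))
print(squares_comprehension(values))
```

Both functions return the same list, so the loop version is enough to get started, and the comprehension can wait until it feels useful.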
53:24 Michael, but first question for you. What are the basics? Really? That's one thing I really struggle with. I still struggle with because I look around on Twitter. Everyone smarter than me is talking about this stuff. And this is pretty basic. Is it? Am I the stupid person who needs to know all of this?
53:48 Well, here is the interesting thing. Like the people who are blogging, the people who are recording YouTube videos or people who are tweeting about things, they're already at like some certain level. And then they're super psyched about something advanced that they've just learned or some really cool scalability thing that they've learned.
54:09 And there's a really good article titled something like you're not Instagram, you're not Google, you're not LinkedIn or something. So you don't need all these crazy design patterns and this like crazy cloud architecture that companies like that have because you're a two person startup that doesn't even yet have a business.
54:25 Build something simple. And I think there's a lot of people that are fascinated by either looking up, like, look where we could go and look at what Instagram is doing, look at what Google is doing. And they are amazing and interesting what those companies and teams are doing, but they don't apply to you now.
54:47 You know what I mean? So I think there's just a lot of really interesting conversations about stuff that's interesting, but not applicable to people who are beginners at all. Right.
54:57 Exactly.
54:57 Do you need to master Docker and Kubernetes? Probably not. Can you run it on your computer? Yes. Okay. Then start there. We'll worry about Docker later. Like once you get something working, maybe we'll put it in a container. But don't worry about that now. Get started.
55:10 Exactly. And just to the person asking this question, focus on getting the website up and give yourself a deadline. That's why I love setting goals publicly. Give yourself 10-20 days to figure out the Python basics and put together your first website. You won't like it. In retrospect, you might hide it from GitHub. I do that a lot. And over time, you'll polish it. It doesn't have to look like, like you said, Facebook or Instagram when it comes out.
55:36 It just needs to function somewhat. And sometimes you'll click a button, something will fail. But then you figure out that, okay, I need to fix this now. And now you have stuff to do. And then you can think of other things. Okay, maybe I should add this. Maybe I should add a button. You're making progress already.
55:51 Yeah. Yeah, absolutely. And the other thing to keep in mind is that software is plastic. It's malleable. It can be changed. You don't have to get it right the first time. You have to just make progress.
56:02 You learn more, then you change it, and you make more progress. And so many people can get hung up, like not even getting started because, well, I'm not really sure how to get started. Like, just take a step. If it's wrong, you take a step in a slightly different direction until you get to the right place. Like that's how you do it without getting hung up, without trying to boil the ocean by learning everything.
56:21 Exactly.
56:21 All right. Maybe that's a good place to leave that conversation. But yeah, it's super interesting to hear your story. And congratulations on the success, going from getting started with small projects in college to working for H2O.ai.
56:35 I'm still learning a lot. But again, thanks so much for this opportunity. Like I said, I think there are two types of teachers. The first introduce you to something, and the second make you really interested in it. You were the second one to me because I just got so excited about all of these things through your podcast. And of course, there were others as well, but you were a major part of it. And yeah, thanks for this opportunity.
56:55 Oh, yeah. Thanks so much. Now you're not out of here yet, though. You got to answer the two final questions. If you're going to write some Python code, what editor do you use?
57:03 Jupyter notebook.
57:04 Okay, yeah, right on. And then is there some library or something on PyPI you've come across recently? You're like, oh, this is super cool. Got to tell people about this.
57:12 I keep running into new ones every second day. But I would say just discovering fast.ai was the biggest wow moment for me.
57:18 Yeah. All right. So fast AI. Perfect. That's a good one. All right. Final call to action. People are interested.
57:24 They're maybe also listening, getting into programming, getting into data science. And what advice do you have for them?
57:30 Just build something or just go to Kaggle if you can't figure out what project to work on. I still struggle with that inspiration a lot. So I just, I would just tell myself to go to Kaggle and sign up for any competition that I like the most and go from there. Probably take fast AI along the way and you're all set.
57:45 All right. Fantastic. Well, thanks so much for being here and catch you later.
57:49 Thanks so much.
57:50 Yeah. Bye.
57:52 This has been another episode of Talk Python to Me. Our guest on this episode was Sanyam Bhutani.
57:57 It was brought to you by Sentry, YourBase, and AssemblyAI.
58:01 Take some stress out of your life. Get notified immediately about errors in your web applications with Sentry.
58:07 Just visit talkpython.fm/sentry and get started for free and use the promo code talkpython2021 when you sign up.
58:16 YourBase test acceleration will dramatically improve dev test workflows and CI builds of your Python applications.
58:23 If you could benefit from having pytest run your tests 100 times faster or more, you need to check them out.
58:28 Get started at talkpython.fm/yourbase.
58:33 Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai.
58:45 Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python.
58:50 Our content ranges from true beginners to deeply advanced topics like memory and async.
58:55 And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.
59:01 Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top.
59:07 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.
59:17 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.
59:28 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.