
#322: A path into data science Transcript

Recorded on Thursday, Jun 10, 2021.

00:00 Are you interested in getting ahead in data science? On this episode, you'll meet Sanyam Bhutani, who studied computer science but found his education didn't prepare him for getting a data science-focused job. That's where he started his own path of self-education and advancement. Now he's working at an AI startup and ranking high on Kaggle. This is Talk Python To Me, Episode 322, recorded June 10, 2021.

00:37 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm. And follow the show on Twitter via @talkpython. This episode is brought to you by Sentry and YourBase, and the transcripts are brought to you by AssemblyAI. Please check out what they're offering during their segments. It really helps support the show. Sanyam, welcome to Talk Python To Me.

Michael, I'm disappointed I didn't hear the Steve Ballmer remix intro, but I'm very honored.

Aha, developers, developers, developers. It's so good.

No, it's so good. I remember that was in your first few episodes; I think they came out right around the time I was in university. Thanks for this opportunity. I've been a fan and listener of the show, and I'm really excited to be talking to you.

Yeah, I'm really excited to have you. I'm excited to hear about your journey into data science. It's going to be so much fun, because I feel like so many people out there are looking in from the outside. You know, maybe they didn't come into data science or into Python from a traditional computer science education, and they feel like, well, I didn't go through that path, so this probably is not a good fit for me. I think that's very far from the truth. I think there are so many opportunities to get into data science, or to get into Python and programming. And while you do have some experience with computer science at university, it sounds like, as we'll learn through your journey, a lot of what was actually effective had very little to do with university.

Let me start out on a spicy note: I studied computer science at one of the best universities in the country. It didn't make me a better programmer at all. Let me start with that.

02:23 That is spicy. Now you've thrown it down, and I want to come back to that. But let's just start with your story. What got you into programming first? Was this a university thing that you pursued, or were you interested in it beforehand? How did you get into programming?

Sure. So I was the standard definition of a nerd. I enjoyed spending time with computers: whenever my parents would go to sleep, I'd figure out a way to sneak into the computer room and just play games all night. And along the way, I think somewhere in high school, I discovered programming. It was Java, unfortunately. But I saw the promise of it, with all of these interesting things that were happening around it. And somewhere I just made up my mind that, hey, I want to take computer science, because that's what coders do. Unfortunately not, as I learned later. That's how I got interested in it, and that's why I decided to take a course in it.

Yeah, that's fantastic. I never took computer science as a major in college, but I studied math, and I had to take a couple of programming courses to fulfill my math degree requirements. And yeah, I found it to be a mixed bag. I had to learn Scheme and Lisp, and I thought, well, that's not super practical, but...

03:33 But I've got to start here. I was like, please, let's do some C++ or something. Like, no, no C++ for you, darn it. And then I was told I had to learn Fortran, because it was the most important language I would ever learn. That turned out not to be true, but I learned Fortran as well. And eventually I got into some fun languages where I got to build some things. But, well, I don't know how you felt, but my experience with being in a university, and this is speaking from doing this in the '90s, so it could have absolutely changed, right? I haven't gone back to the university since. But I didn't get a lot of projects that I really loved, that I was really super excited about. It was like, well, you're going to need to learn how to do this algorithm by hand on paper, and you're going to need to implement this in this sort of archaic, weird language.

You've got to do compiling stuff in your head while people talk about the state of the art. Yes.

Oh my gosh, I'm like, why? Why do we not get to use computers in our computer science courses? It's crazy. I just don't get it. But here, here I am. So I didn't come away feeling like it made me a super good programmer. It gave me some exposure and some interesting experience. But the real exposure that got me into programming, that revealed to me, like, you can do this and this is for you, was when I was doing a research project that had to do with math, not programming. But it needed a little programming to do the simulations and do the work, and I'm like, well, now this is super fun. This is the

05:00 kind of stuff I wanted to do. And I was up at 2am, you know, working on it in the computer labs, because all of a sudden it was really cool. So I don't know, hopefully computer science is more practical these days. But I didn't find a huge value in those computer courses I took at university.

To be honest, I relate to that so much. I could just rant about this for hours. But I signed up for computer science because there was this notion in my head that, hey, computer science is where you do programming stuff, right? You make computers smarter. And then they're teaching us all of this stuff that, you know, doesn't really make sense. Like, I remember listening to Talk Python in those days, and you were talking about PyPI, which I didn't know about, because they never told us what it was. And I'm listening to all of this stuff, and what they're teaching us is how to make for loops print out patterns. I don't see how these things connect, right? You're talking about building apps with Flask on Talk Python. I just heard Michael talk about it, but now what is all this stuff? What is inheritance? Where does it come into the picture? There was this huge disconnect for me. So very much my experience as well, unfortunately.

Yeah, that's interesting to hear you look back on it. One of the things that you talk about in some of your writings, and we'll get into it, has to do with top-down versus bottom-up learning. Before we get to that, I want to make sure that you get to answer the opening question. So what are you doing these days? Right now, before we dive into that aspect of learning.

Sure. I currently work at H2O.ai, which is a company building AutoML products; I'm sure we'll get into this later on as well. I work as a content creator/engineer. We have a "makers gonna make" culture, which means that I have absolute freedom to bring ideas.
Usually people don't stop me; they encourage me, which also means that I can do a podcast at work. I started a podcast a while ago called Chai Time Data Science, where I interview my heroes. A lot of them are Kaggle Grandmasters; we can talk about this later as well, but Kaggle has different tiers, and Grandmaster is the highest one of them. H2O has, I think, more than 20 Grandmasters, so at some point I said, hey, can I interview our people? And they said yes. So I have a lot of freedom over the stuff I do. But it's a lot of creating content and things in those domains: blog posts, videos, and, yeah, I get to do meetups as well.

That sounds like a really fun job. It's this whole exploring ideas and creating content and interviewing people and just being out in the community. It's an aspect of programming that, when people first hear about it, I think is extremely surprising, right?

Yes.

A lot of people think of programming, especially before they really get into it, as a solitary thing that geeky, super smart people do mostly alone, mostly to avoid contact with others, right? And then as you get into it, you learn that actually there's a whole lot of team dynamics in programming. And then there are these roles like developer evangelist, which sounds pretty similar to what you're doing: community outreach on the dev side, which is very social and outgoing and interesting. So yeah, it's a whole spectrum. It's a lot of fun, for sure.

And yeah, it would be closest to evangelism. But also, I have a lot of freedom to do, honestly, anything I bring to the table; usually I get positive feedback about it. So I just keep doing stuff, even if it's interviewing people.

Yeah, and it's fantastic. So, back to the top-down, bottom-up thing that you talk about. In a lot of academic settings, high school, college, the foundation is set at the beginning, right?
Okay, well, we're going to teach you how to do derivatives, differential calculus. So what we're going to do is start out real, real simple: we're going to talk about what a difference looks like. And then we're going to talk about limits. And then eventually, like two months later, you can do derivatives and you can actually do calculus, right? Yeah, for those first two months, you just have to have faith that I'm just going to keep cranking on the details until something interesting happens.

But even after those two months, you get to solve these problems, and at least for me, I was able to confirm that the answers I was getting matched those in the book. But what's the point of all this? Like, okay, I'm able to solve these problems. I know how to ace my tests. I know how to pack math into that. But where will this be used? I never found out, and I just didn't have the passion for it. My only passion was, I need to get good grades to get into a good university. But apart from that, I had no real...

Right. I want a good job.

Yeah, basically.

Yeah, I hear you. And I feel like so much of academics, and many presentations and courses, are done this way as well, but especially academics, because you have to finish it to get the grade to get the degree. So it's like, well, it's fine if it takes three months before this is interesting to anyone, because they have to stay here. They have no choice. We're going to build up slowly, bit by bit, for three months, because guess what, they're all enrolled and they need this. This is a required course. So we're going to make sure we get every little detail in place along the way, and eventually it'll be interesting to them. I just feel like that is so

10:00 backwards from trying to capture people's inspiration, you know.

Somehow, every single time, there's always this disconnect. Like, okay, I get it: at some point, you have to know the concepts. But you never told us the bigger picture, which is a larger focus in the top-down approach. So I never knew where these individual things would really be used. It's like, before you get to drive, you need to know about the thermodynamics of the engine, rather than sitting in the driver's seat.

Yeah, there's a beautiful Ferrari you want to take out for a drive, like

10:31 you have to study physics. Study thermodynamics, a little chemistry for the combustion, and then in a couple of years you can take that thing for a drive.

Exactly.

And I feel like that actually captures it pretty well. You know, contrast that with: well, let's just teach people the rules of taking a derivative, right? The derivative of x squared is 2x. Okay, great. Now let's show them how to solve cool problems, like, whoa, here's a ball flying through the air, and we can figure out its velocity when it hits the ground based on things like the derivative and acceleration and so on. And then eventually, once you're like, this is really interesting, then you could say, all right, now let's dig in and talk about the details of why this math or this data science algorithm works. And it just doesn't really go that way. So I think that's definitely an interesting part of the journey, that you had to make that switch, right? To go from this really theoretical academic background to, oh, I've got a Kaggle competition and I've got two weeks to solve the problem. We can't be rebuilding it from the foundation. Let's go the other direction.

Yeah. Just to be clear to the audience, I just did a bachelor's in computer science. I didn't do a master's or PhD; I gave up on

11:39 midway. But yeah, going on that: the top-down approach, I was introduced to it through fast.ai. They are a big advocate of it, and that's how I became a fan of it. Essentially, what they cover in the blog post is that you're given the baseball bat and you get to play first, rather than being taught the physics of the curveball. And I think, at least for me, in retrospect, the main challenge throughout all of these months of learning a subject in university is that you need to be able to stay motivated and remember why you took up a course. I took up web programming because, let's say, I want to learn how to make websites, not because I need to remember which HTML tags come up in the final semester question every year. And somewhere in the middle, you lose this motivation. The top-down approach essentially takes care of that: hey, bring your project and figure out stuff along the way. And I think I mentioned in our interview that Talk Python courses really cover this well, because you're given 10 sets of projects and you can just build them along the way.

Yeah, thank you. I mean, I really am a big fan of this, and I try to incorporate it into the courses that we have, because I do think you need to have these little wins right away. You hear people talk a lot about how, if you're teaching kids, the kids need to have these good experiences early. It's like, you know what, replace kids with people. People have got a lot of other options for their time, and you want to make them feel good and excited, like they're making progress. They need to make progress, even if it's little steps in the beginning. Make it feel like legitimate progress, not just algorithms and loops and stuff.

Yeah, absolutely.

So you went through your computer science degree, but you didn't come out the other side feeling like a data scientist.
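The top-down calculus example mentioned earlier in the conversation was to learn the rule "the derivative of x squared is 2x" first and then use derivatives to get a ball's velocity when it hits the ground. It can be made concrete in a few lines of Python. This is a minimal sketch and is not from the episode. It assumes a ball dropped from rest, ignores air resistance, and takes g = 9.81 m/s². It estimates derivatives numerically with a central difference, and the function names are hypothetical.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2; air resistance ignored


def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: (f(x + h) - f(x - h)) / 2h."""
    return (f(x + h) - f(x - h)) / (2 * h)


def height(t, h0):
    """Height in metres of a ball dropped from rest at h0 metres, t seconds later."""
    return h0 - 0.5 * G * t**2


def impact_velocity(h0):
    """Velocity in m/s (negative means downward) when the ball reaches the ground."""
    t_hit = math.sqrt(2 * h0 / G)  # solves height(t) == 0 for t > 0
    return numerical_derivative(lambda t: height(t, h0), t_hit)


if __name__ == "__main__":
    # The rule "the derivative of x squared is 2x" predicts 6 at x = 3.
    print(numerical_derivative(lambda x: x**2, 3.0))
    # Velocity is the derivative of height; the exact answer is -sqrt(2 * G * h0).
    print(impact_velocity(20.0), -math.sqrt(2 * G * 20.0))
```

Height is quadratic in t, so the central difference agrees with the exact derivative, -G·t, up to floating-point rounding. Both printed velocities come out at about -19.8 m/s for a 20 m drop.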
And this was around the time of the MOOCs, right? The massive open online courses were just forming.

Okay, I think so, something like that.

There might be several variations, and one of those is over at fast.ai, right? Focused on deep learning and data science type topics.

Yeah. So just going back to the universities, like I said, I was just really unhappy that, hey, there's this huge disconnect. And like any smart person in their 20s, I just spent a lot of evenings ranting about it. And at some point, I decided, okay, is this going to help me? And I just started signing up for every single course on the internet. I used to proudly tell my peers that I'd done 50-plus courses, and they would look up to me: oh, this guy... I've only done 10.

Yeah.

14:16 But in retrospect, I was just being dumb, chasing all of these courses. fast.ai, in retrospect, and I keep saying this, is the most impactful course of my career. And fast.ai is not just a course; it's also a community and software. I got introduced to top-down learning through them, and they make you excited about this stuff. They have a bunch of courses, and in the first lecture of the deep learning course, they teach you how to put together a few lines of code. Of course, you don't know what's happening behind it, but you build something that's state of the art, and Jeremy Howard, the creator, shows you how you can get to the top of a leaderboard in a Kaggle competition. I don't know what's more exciting than that, at least to someone who was printing out star patterns in university.

Yeah, that's really neat. And I think

15:00 the community aspect is also pretty important, having that ability to sort of bond with people there. So in MOOC, the M stands for massive, in terms of the number of people, because it's a large group. I haven't gone through their courses or anything, but I'd guess it's a few hundred thousand people, and I'm sure it might be more than that. That probably counts as massive, you know, if you compare it against, like, a 30-person college course or whatever.

Yeah. Okay, the biggest mind opener for me was that we suck at diversity in tech, right? There's no other way of putting it. I talked to different people in these online communities, people who don't have a computer science degree or who were coming from different walks of life. I didn't understand that they have other responsibilities as well, like helping their family out. I just assumed you could do this in your free time and that's all you do. So that was also a mind opener for me during those days.

15:53 This portion of Talk Python To Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users might be having difficulties or encountering errors in your app right now? Would you even know it until they send that support email? How much better would it be to have the error and performance details immediately sent to you, including the call stack, the values of local variables, and the active user recorded in that report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got their support email. That was a great email to write back: we saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users today. Create your Sentry account at talkpython.fm/sentry, and if you sign up with the code talk python 2021, it's good for two months of Sentry's team plan, which will give you up to 20 times as many monthly events, as well as other features. So just use that code, talk python 2021, as your promo code when you sign up.

17:00 One of the things I really value about the Python community is that it's not just people going straight from CS to deep applied Python out of this university chain. Rather, so many people are brought in from different areas, right? People are interested in biology and they learn a little Python; people are doing astronomy and they learn a little Python; people are building Instagram and, you know, they're using Python. So there's just this diversity of viewpoints and specialties that comes to Python, and that's really unique. And it sounds like you kind of got that feeling as well here.

I was always very welcomed, and the fast.ai community especially is very warm and welcoming. At least at that time, I went to Reddit to ask a few questions, and I got a lot of harsh feedback, which really demotivated me. But the fast.ai community was the exact opposite at that time. Reddit is a lot better now, as is any other community, for that matter. But it's a very welcoming community, and no one says, hey kid, you're not supposed to be asking these stupid questions. Rather, even the creator himself, Jeremy Howard, often hangs out in the forums and answers all of the questions. So it really put the inspiration back in me while I was in this dark phase, with nothing making sense in university.

Yeah, very cool. So there are a couple of things that you've done. Let's set the stage, and then we'll dive into the details on them. As you've gotten your degree and gotten better at data science and deep learning, there's a handful of things you've done to sort of give back to the community and stretch yourself as well. One is to work on your blog and write articles. Two is to create your podcast, which I was happy to be a guest on a while ago, and it's very nicely done.

Thanks for saying that. It was a very exciting moment to host you, honestly.

Yeah, thank you. So the podcast, and then also the Kaggle competitions. Let's start with your blog posts.
I just picked a couple out of here that are interesting. One of them is "How not to do fast.ai (or any other ML MOOC)", right?

Yeah.

And so you go through how you approached these courses. You talked about how you took 50 courses. On one hand, I think it's really awesome to get that exposure. But on the other hand, to really master programming, you need to stop and try to solve concrete problems.

Yes.

Fail at that, and figure out, like, I'm trying to solve this problem, and I can't even get a virtual environment set up to let me install this library; what is going on? You have to hit your head against that, and it makes you feel like you're bad. But, you know, it's building layers of experience in a way that's not the fun part, but you've got to go through those steps, and then you sort of work your way into developing that experience. There's not a super shortcut. Having the courses helps give you the perspective and know where to focus, but you still kind of have to go down that path, right? So maybe talk us through how you approached it, and then the advice you might have after.

Yeah. And to counter that, also, just generally

20:00 speaking, maybe I'm not the most outward-looking person, but I didn't come up with ideas for, you know, building any project. I couldn't think of a website that would look interesting. So I would just go to a course, assuming that I would learn all of this stuff. And a lot of these MOOCs are very nicely marketed; they make you feel that, okay, I'm going to come out having learned something. So I just followed this trail of stuff that I would keep looking up. I need to know Python, so I would do a Python course; then I would take a course on different frameworks; and I kept doing that. And even at the end, I didn't accomplish much, because again, there was this huge disconnect. If anyone told me to do anything slightly outside of the curriculum, I would fail at it. And that's just because I didn't experiment as much. In retrospect, I should have spent at least twice as much time just trying to code even the stupidest idea possible, instead of just watching those lectures. They were my comfort zone: okay, I'm learning something. But I wasn't learning anything at that point.

Well, you are learning something. I do think that by watching the lectures of an online course and following along, you're getting real exposure to real stuff. But even though you're feeling comfortable, you're not at a place where you could build something if somebody said, now go build something different (not that different, but different) and do it from scratch, right? You're not building up that skill set unless you're also experimenting along the way.

Yes. And sorry, just to clarify: this wasn't the first course I had taken. I had taken at least 10 of them, and I was watching the same stuff over and over again. So at that point, it was a waste of time, I think.

Yeah, for sure. So you said, all right, well, I'm not so sure that the way I was doing it was totally the right way.
So what would you say is the right way? What advice would you give for being successful in these online courses?

Sure. I'll point you to a book by Radek Osmulski, whom I had interviewed on my podcast earlier. He's put out a book that essentially talks about different things that you should be learning, or how you should really approach learning. In his book, he talks about coding twice as much as you read theory, and having this northern light of an idea. Again, I couldn't think of anything, so I took to Kaggle competitions. At least in my opinion, just do fast.ai and then jump onto Kaggle. Those are the two best places to learn about data science.

So the Kaggle competitions are interesting. Let's maybe talk about those for a little bit. I haven't talked about Kaggle a lot on the show. I'm sure most people are familiar, but maybe not everyone is. So just tell us, what is Kaggle?

Sure. And fun fact, this was actually tweeted yesterday: at this point, Kaggle is at 7 million users, I think. So when they say they're the home of data science, they really are the biggest community in data science. And why do I say community? It has competitions that are hosted on the platform, so different use cases from different companies exist as competitions. Now, as you can see, the first one is an example. Different competitions are brought onto the platform by companies who want the community to solve a problem. In return, there are prize pools, but really what people are there for is the knowledge sharing that happens. And how does that happen? They also have very nice discussion forums, as well as notebooks. At some point, they were called kernels, if you're familiar. But essentially, you can host Jupyter Notebooks on the platform, where people share their stuff, and this is the best of the best on the platform. They share tips and tricks for how you can approach the competition.
And then you use that to try and compete on a leaderboard. And you get real-time feedback, because there are 10,000 people competing on the leaderboard, which may or may not be a good experience, in my experience, at least for the first few competitions. But it's really exciting.

It's a little bit like a hackathon type of thing, but focused on a data science problem, not on, like, generating an app or a website. Maybe that's a good elevator pitch.

Exactly.

Okay. So I'm sitting here looking at kaggle.com/competitions, and yeah, I can see a bunch of interesting things. It doesn't explicitly say who it's sponsored by on the outside; maybe if I click in, it'll say, oh yeah, this is brought to you by, or sponsored by, or put out by so-and-so. But the first one is SIIM-FISABIO-RSNA COVID-19 Detection, which sounds like a bunch of acronyms I don't know anything about, although I've heard of COVID.

24:30 The idea is to identify and localize COVID-19 abnormalities in chest X-rays, which is interesting. And you know, that's a genuinely useful thing that we could all benefit from, right? Having machine learning that can assist doctors and say, wait a minute, wait a minute, this person seems to have either had or currently have COVID, based on this picture. Let's do something about that. That's genuinely helpful for society.

And if I can just point out, at least for this particular competition, I think it launched

25:00 a few days ago, and just in those few days, you already have 450 people who, I can say, will be hyperactive in the discussions. A lot of us just go there for the learning. I'm sure most of us go there for the learning and the things you get to experiment with and learn.

It says the prize for this is $100,000 US, which is pretty sweet. Is that split, like number one gets half, number two gets a quarter, and it trails off? Or is it all or nothing, number one or zero?

I think it's the top three, sometimes the top five. It varies from competition to competition. And again, it's really hard to get into the top. They have medals; they've gamified all of this stuff. And how is that helpful? From an outside perspective, as you gain medals, you move higher up the ranks. You start as a Novice, then you become a quote-unquote Expert, Master, and then Grandmaster. So as you earn a certain set of medals, you start on your path towards becoming a Grandmaster, and that's more exciting than the prize pool. Again, it's the legends or very experienced people who are aiming for the prize pool. I don't think I've ever even dreamt of that.

Right. If it's, you know, too far out of reach, it's not worth worrying about. It's more about making progress, seeing yourself go up in the charts, and gaining that experience, right?

Yeah, yeah.

So let's see some other ones. I went and sorted by prize purse here. So, Jane Street Market Prediction: test your model against future real market data. That's interesting. There are 4,000 teams competing for that. There's one about discovering how data is used for the public good in the US, for 90,000. That's pretty cool. Major League Baseball has one on digital engagement forecasting, so predicting fan engagement for baseball player digital content. That's pretty cool. This launched, I think, less than a day ago, and there are already 15 teams on there.
I'm sure if you go over to this competition, you can see some stuff in the discussions and kernels already.

Yeah. One that is very close to my heart is SETI Breakthrough Listen - E.T. Signal Search: find extraterrestrial signals in data from deep space. That's pretty cool. The prize is not huge, but yeah, if you were the person who discovered aliens, come on, that's a pretty good prize.

And that's just zooming back to where this conversation started. Like I said, I'm not the person who could think of these ideas. And now I'm given this large number of options. Whatever is exciting to me, I can jump onto that competition, even if I have zero idea about how to approach the problem. There'll be plenty of stuff shared there, and I can just go from there. I can just start learning; I can just try to approach it in a top-down fashion.

Yeah, absolutely. So another one of your blog posts is about your first Kaggle competition experience, basically a retrospective on that. So maybe tell us what that was like.

Sure. I tend to set these goals every year. I just announce my goals: go big or go home, right? I just write out the craziest stuff I can imagine. Last year, I wanted to lose 50 pounds, and I managed to lose 70 pounds.

Congratulations. That's massive.

Thank you. But yeah, I just set these goals, and one of these goals was to start competing on Kaggle. This competition was my first one ever, and all of these competitions are a similar experience. I just joined the Quick, Draw! doodle competition because, again, it looked exciting to me. What I did at that time was just go to the discussions. I found people sharing stuff, sharing code. I just took that, tweaked a few numbers, tweaked a few parameters (it didn't make much sense to me), and I started moving up the leaderboard. The leaderboard is the most exciting and most addicting thing on Kaggle.
You're getting this real-time feedback: okay, I'm doing better than these people. And then you go to bed, and while you sleep, someone shares a tip or a trick somewhere in a kernel or a discussion, and now everyone has used it. By the time you wake up, you're down by 100 positions.

I see. They're like, oh, you're just training on all the data? What if you use transfer learning on this little subset? This is actually totally crushing it. And everybody's like, we're changing what we're doing. And you wake up and you've fallen down the leaderboard massively.

Hmm, exactly. And again, now you have to get back to work.

One thing you talked about in your blog post was how, going through it, you got some pretty good real-world experience, right? You talked about how you took all the training data. And you know, the data is a lot for this competition: there's like a billion images described as a CSV file, or something weird like that. So you took all that data, loaded up the training data (not all the data), sent it over to your GPU, and it took 50 hours, more than two days.

29:46 to run. And I remember you expected you were gonna crush it, right? And it turns out that, actually, that made it less accurate, right? So you had to get more creative. Maybe tell us about that. Yep. And again, this was this disconnect that I found coming from the MOOCs, where everything

30:00 just worked out so nicely, the way it's supposed to. And I just took that approach: okay, I'm just gonna chuck all of the data in a DataLoader, put it on my GPU, let it train, and I'll get a good accuracy, right? Turns out, not really, because that's not how this problem was structured. And again, I learned about all of these. I think from a practitioner's perspective, the important thing I learned here is that I need to structure my project, because at some point I'll be at Untitled152.ipynb, I'll need to go back, and I won't have a track of that. I should probably do smaller experiments, rather than the first one being a 50-hour experiment. I should try and figure out how to run it on a subset of the data. Yeah, that's a really good point. Because if you're waiting 50 hours per iteration, that's not going to go very quickly. It sounds very easy and very obvious, but it wasn't to me, at least. Maybe I was stupid. I wouldn't necessarily say that. I mean, it probably seemed like, well, of course, if it's working a little bit, let's just give it all the data, then it's gonna really work, right? That's a pretty reasonable, naive, beginner point of view: that's gonna be totally fine. But then reality comes along, like, wow, it's more complicated. So you ended up coming up with a combination of some of the larger images and some of the smaller images, and building up out of that, that kind of stuff, right? Yeah. So I learned that, hey, maybe I should start with 1% of the training data, put up a baseline, again, obvious stuff, and then try to work with different image sizes. And what I was trying to do is see if the accuracy according to my local validation was going up, submitting it to the leaderboard, and just checking if it's actually working. And then training bigger models from there. At that point, ResNet was, I think, state of the art. 
That's what I was sticking to, because I didn't have any outside ideas. Other people were, of course, doing a lot of things, and I was just trying to catch up. Sure.
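Sanyam's "start with 1% of the training data, put up a baseline" advice can be sketched in a few lines of Python. This is a minimal illustration, not code from the episode: `subsample_split` and its parameters are hypothetical names, and it assumes the dataset fits in memory as a sequence of records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_split(
    records: Sequence[T],
    fraction: float = 0.01,
    val_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Return (train, val) drawn from a small random subset of `records`.

    Hypothetical helper: the same seed always yields the same subset,
    so quick experiments (e.g. different image sizes) stay comparable.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be in (0, 1)")
    if len(records) < 2:
        raise ValueError("need at least two records to split")

    rng = random.Random(seed)  # local RNG, does not touch global random state
    # Keep at least two items so both splits can be non-empty.
    n_subset = max(2, int(len(records) * fraction))
    subset = rng.sample(list(records), n_subset)  # sampling without replacement

    # Hold out part of the subset as a local validation set (at least one item,
    # and always fewer than n_subset, so the train split is never empty).
    n_val = max(1, int(n_subset * val_fraction))
    return subset[n_val:], subset[:n_val]


if __name__ == "__main__":
    data = list(range(100_000))  # stand-in for 100k labeled examples
    train, val = subsample_split(data, fraction=0.01)
    print(len(train), len(val))  # 800 200
```

Fixing the seed means every quick run (say, one per image size) trains and validates on the same subset, so a change in local validation accuracy reflects the change you made rather than a different sample.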

31:51 This portion of Talk Python to Me is brought to you by YourBase. YourBase has a really cool product that will dramatically improve testing and CI of your Python applications. If you could benefit from having pytest run your tests 100 times faster or more, you need to check them out. Here's how it works. YourBase observes which tests interact with which parts of your application code. The first time you run it, the speed is roughly the same as normal. But the next time you run pytest is where the magic is: YourBase knows which parts of your application code have changed. If the code under test hasn't changed, why test it again? So YourBase only runs the tests that have interacted with the part of the code that has changed. If you change just a couple of functions, you only need to run the few relevant tests, and all the others can be safely skipped. This means skipping hundreds or even thousands of tests most of the time, making your dev/test workflow and your CI builds much, much faster. All you have to do is install YourBase and run pytest as usual; they'll take it from there. Get your free trial by visiting talkpython.fm/yourbase. YourBase test acceleration works with the tools you're already using. So give them a pip install and see the difference right away. Get started at talkpython.fm/yourbase.

33:07 So you're a fan of Kaggle. You recommend people come along and use this as a concrete way to get started and build their knowledge beyond just theoretical stuff. 100%. In retrospect, I would just tell myself, hey, do fast.ai sincerely once, and then just sign up for any competition and go from there. Is it better to do it with a team of people or do it by yourself? I will be honest, sometimes I would not be the person working the hardest in the team. So I would tell myself to at least start solo and then team up with different people. Everyone follows different approaches. But at least for me, I tend to be the lazy person, so I would make sure that I've done some homework before asking other people to join the team. Yeah, that makes a lot of sense. But apart from that, when you join a team, and all of my Kaggle, quote unquote, success, I would credit to all of the teams I've been a part of, you get to meet all of these data scientists from different levels of experience, and they're doing these things that I couldn't have imagined. It's, again, a great learning experience in that sense. Yeah. What's the story in terms of, like, people who are at the top? You talked about them being grandmasters or whatever they're called. Yeah, there are grandmasters, masters, experts, contributors, and novices in the ranking here. What does the job story look like? The career story? So if I'm over here, and I'm one of the 1,500 masters on Kaggle, is dropping that information at a job interview going to get me somewhere or not? I think it depends on the company a lot. So we, when I say we, the company I work at, H2O.ai, has a lot of Kagglers. We have 20 grandmasters, I think. Out of the five we can see right now, three are part of H2O. Oh my gosh. Yeah, that's like 10% of all of them. Are four in the top five part of H2O at this one? Three of them. Yeah, amazing. 
So such a pleasure.

35:00 Of course, they recognize the fact that this isn't easy. If you're a master, you're probably already in the top 1%, top 0.5% of the global rankings, and there's a lot of work behind that. So I think it does make a lot of sense. Some companies don't recognize it. Maybe I wouldn't be inclined to work at those companies. Again, hot take. Yeah, that's actually an interesting point, isn't it? Like, if the person interviewing you for a data science position doesn't know about Kaggle and respect, like, massive progress there, maybe you don't really want to be on that team, unless they're like, we're hiring you to modernize this and set the stage and bring the real stuff to us. But if it's like, join the team, we'll show you how it's done, you know, okay. It's just a portfolio of projects. You can tell everyone, hey, I worked on this problem that your company is working on, and against the best of the best, I ranked 10th out of 1,000. And that should be a huge signal to the hiring manager. I agree. I think, you know, put aside the competition, put aside how you rank against other people. If you can come over here and say, oh, you see this Major League Baseball digital engagement thing? I did that, and it came out pretty well, actually solved that problem. And here's my GitHub repo for that, and our conversations around it. This one about the prediction of future sales, I also did that. And then this home price one, actually, I was near the top of that. Just having that kind of portfolio to share as part of an interview is so incredibly important. So many people ask me, I want to get a job in this thing. How do I get started? Do I need degree X? 
Or should I go learn this technology or that technology? All those things are interesting and valuable, but this is something I really like about the tech industry, and it's also, you know, challenging, because that's kind of where you've got to live: it's not so much your credentials or your background that will get you the opportunities. It's, I need somebody that does this. I need somebody that knows how to predict house prices. You predicted house prices, you've shown you can do it, you're hired, right? If you can show that you're doing the thing that they already need, there's not a whole large discussion going on after that, right? You're really close to being in the right place to do that thing. So building up this portfolio is important. I think I managed to somewhat figure this out in my university days. Out of an interest just to explore problems, I started freelancing, because I wasn't allowed to have a job job while being in university. And at that point, I figured out, hey, if I'm going to approach a person on Upwork and they want me to build something, it shouldn't be starting after we've had that conversation. I can just look at the problem, even put together the most basic structure around it, and show it to them: hey, I put this together in two days; if you hire me, I can build this in X amount of days. And most of the time, that's what got me the clients or whatever deals I got. Yeah, and getting that first or second project under your belt is really important. I feel like Kaggle is part of that. Also, you know, Upwork, interesting that you bring that up. I am a fan of Upwork. 
If I was starting out and trying to get my first project, my first job, and I was having a hard time finding that in my local area, I'd certainly consider looking at what jobs are out there on Upwork, even if I thought they didn't pay very well, even if I didn't totally want them, just to have those one or two projects done and part of my resume. Then you can start looking, you know, more broadly, and it's just gonna be such a help to have some kind of portfolio, right? So as a student, the pay really didn't matter. Or it was a lot, as a student. But my biggest promotion in life was going from the basic food menu to looking further up that menu as I started making money. That was exciting. It almost felt illegal that, hey, someone is paying me to write code. Yeah, I remember my first job, I was so super excited. It almost didn't matter; they could have paid minimum wage, and I would have been thrilled about it. Because oh my gosh, someone's paying me to learn programming. And look, I have a book. I'm spending half my time just learning how to do this. I mean, they're paying me to learn this stuff. It's amazing. So yeah, I really, really had the same feeling when I was getting started. Fantastic. All right. So another interesting area around what you're doing has to do with your podcast. So maybe we could talk about a couple of the interviews you've done that you really liked or found interesting. So tell us about a couple of them. Sure. So how this started was, I was trying to essentially explore different areas of content creation. I started with blogging. The fast.ai guru, Jeremy Howard, told us to write blog posts, so I started doing that. And at some point, I found this disconnect of advice. So I reached out to a friend: hey, you've been helping me a lot, is it okay if I put this together in a blog post and put it out in the world? And that went on for a while. 
I started this as a blog series. And later, after I graduated, I thought, okay, maybe if I do this as a podcast, and I'm sure you would agree, I could explore

40:00 all of these great people's minds in bigger depth. So that's how the podcast started for me. Yeah. Well, one of the big secrets about having a podcast is I get to be the first listener, basically, to all these interviews, right? I mean, I guess now that we're live streaming it, we have like 50 first listeners or something like that too. But it's really amazing, the opportunity to just meet these people that you're really interested in. Especially with conferences being gone and stuff, it's really hard to find time to meet up and just talk with them. But hey, you can have them as a guest on your show. It's really nice. Exactly. So coming back to my favorite interviews, I try to interview people about their journey, as someone who's trying to understand how this great person did it. Radek Osmulski, we have Radek Osmulski's interview up top. He's one of my heroes from fast.ai. But how did someone like him learn programming? How did they learn how to Kaggle? How did they break into the field? At least in the interviews, I try to ask them, did you face this problem? How did you overcome it? My three favorite interviews would be Radek's, Dima Damen's, and Andrada's. So I try to interview people who are Kagglers, practitioners, and researchers, essentially anyone I can find who would like to share their journey, and these are from all three aspects, essentially. That's Dima Damen. She did video recognition and computer vision. That sounds super interesting. I remember in her interview, I started by asking her, hey, when you were doing your research, you were just using OpenCV? What do you think about it nowadays? Apart from that, it's also a lot about her research perspective. So Dima is very much experienced, and is a great orator as well. So she was talking about how to approach your first research project, or how to just go about research, what even is research, as someone who doesn't understand what that word means. 
And that's what I try to explore in all of these interviews. Yeah. So let's go back to Kaggle for some time. Something you touched on is really interesting. I know there are a lot of research teams and groups at universities who are trying to build models, they're trying to build mathematical algorithms, they're trying to do research. I feel like some of these Kaggle competitions would be really, really good to say, as part of our research project, let's take what we're trying to develop here, actually apply it to one of these competitions, and see where it stands. For sure. And it's highly encouraged, at least in the community. Some organizers, so the sponsors of the competition, invite you to present your solution, even at research conferences. And apart from that, even if you end up creating a blog post or research paper outside of the community, the community is very close, and they recognize it instantly. And they know that, I mean, nothing against research, but at least this particular solution has been tried and tested against this leaderboard, and it works really well. It is quite cutting edge, because it's been tested against all of these people. Very neat. All right, the third one that you had queued up for us is Andrada Olteanu. Yeah, Olteanu. Yeah, the best part about these interviews is I just get to meet all of these people with such amazing energy and such openness about their journey. This was again such a fun interview, because Andrada was so open about her journey. At that point, I was just starting my journey in data visualization. And I asked her, did you feel the same, that you couldn't plot things against the x and y axis, and then you would have a hard time figuring out where they're ending up? Because at least for me, I couldn't understand what's going on. And that's what we discussed. And she was essentially talking about how she started her journey as someone who's fairly new to coding. 
And at this point, she's become a Kaggle Grandmaster in kernels, and she's been writing all of these amazing notebooks. And in this interview, we just learned about how she went about that as someone who just started out, and then learned all about this as she went along. Yeah, that looks like a really interesting interview, and somewhat similar to the conversation we're having here. I think so. Yes. Yeah. All right. One final area that I want to make sure we get to spend some time on: as you know, you work at H2O.ai, and you've had a lot of experience with these different frameworks. Maybe we could do a survey of the various deep learning and ML libraries, and you could sort of tell me how they compare and your thoughts on the various ones. Sure. So the vision set forth by our founder, co-founder and CEO, is "makers gonna make," and what we're trying to do on a philosophy level is just create products that allow people to build stuff. So with that vision, and this is just my take on H2O from a company's perspective, they've built all of these AutoML products. Wave is the latest one; I am sure we will talk about this. But apart from that, they started out with the open source H2O-3, which was the AutoML framework and is still one of the most widely used ones, followed by Driverless AI, which is an end-to-end AutoML product where essentially you just upload your data. And there's what I like to call the Iron Man mode, where you just click a button, it figures out what models need to be trained, does all the feature engineering, and puts out a nice model for you. So we have this awesome

45:00 range of products at this point, both open source and enterprise facing, for different problems. Wave is the latest one of them. Yeah, yeah, H2O Wave is pretty interesting. I guess it's at wave.h2o.ai. And it's a real-time web app dashboard for Python and data science. A lot of the data science things I see are about making static graphs, or maybe graphs that you can go and explore, like, I could move my mouse over, and it'll highlight information about different parts, or I could zoom into it and whatnot. But this is like a real-time changing dashboard, like a stock market, or like a factory or something like that. You want to see what's happening as time passes, right? I wish you said crypto market.

45:45 Yes, exactly. So the reason I spoke about the philosophy is because I think this is the next bigger goal for the company. What we're trying to create, and I'm not sure if this is out yet or not, is a public app store of AI apps. So Wave is an open source framework that, you know, takes care of a lot of the things that, as a data scientist at least, I wouldn't want to worry about. I don't want to learn HTML, CSS, JavaScript. So it's just a framework that takes care of all of the UI/UX stuff and does it very nicely. I don't have to worry about messing up, because it's taken care of. And then I can build different AI apps. As a company, we're also putting out an app store where you can already use the open source apps if you want, and you can also contribute your own apps if you want. Yeah, very cool. So there's a bunch of cool examples. There's a whole gallery full of many, many different things that you can go and run. Basically, Wave is an open source dashboard for Python developers who don't have to do web stuff, but they can share it on the web, right? So on a website. Exactly. And just to be clear for the audience, when I say Wave, my biggest contribution will probably be this interview. It's this amazing team of engineers at H2O who have been building these products, and who know how to scale them and how to properly engineer them through all of this experience. It's really a data science focused, data scientist focused product. Every now and then there's a project out there where I'm like, I really wish I had a reason to use this, it looks really fun to play with, I just have no use for it personally. This is one of those things, right? I would love to have an excuse to use something like this and make it go. But I just don't have that much data that changes that much in my world. 
Maybe you could put together a web page of Talk Python with the different episodes structured there, and the listeners can see a dashboard in real time. This is cool, like maybe downloads in real time and interaction in real time, comments? Yeah, for sure. Something like that. Yeah. Yeah. I mean, I could definitely put a little bit of something together. But if you worked at a place that had a lot of stuff going on, like a factory or a big e-commerce site or something, you could make a really cool live app out of this stuff, I feel like. For sure. And again, it's still under development, and all of our grandmasters who have this rich experience are also contributing to it. So I'm sure by the time this interview goes out, we will have added a lot to it. Yeah, it's cool. It's already got 2,600 GitHub stars. That's pretty cool. So really, really nice. Maybe let's talk about some of the other mainstream ones as well, like Keras, TensorFlow, fast.ai. Give us your thoughts on these different frameworks. Obviously, it's your opinion, not, you know, an endorsement or a deep dive or whatever. But just what do you think about these? People are going to have experience with all of them. Yeah, just to be clear, I strongly endorse fast.ai. I've been a fan of that, that I agree on. But apart from that, I started my journey with TensorFlow, at least in that day. Of course, TensorFlow has come along, and the Keras API has been merged into it. But I was really struggling, because it had this static graph structure, and it didn't feel Pythonic. Not that I was a good Python programmer; I'm still not. So that's why I moved to fast.ai. What Keras is to TensorFlow, fast.ai is sort of to PyTorch. PyTorch has a more Pythonic approach, and fast.ai is a wrapper on top of PyTorch. Okay, interesting. Yeah, so fast.ai is maybe a little easier to get started with, I imagine. Yep. So the nice thing about fast.ai is that it's a very heavily opinionated library. 
So there are a lot of things that have been baked into it. And for some reason, whenever I switch to plain PyTorch, I am not able to replicate similar accuracies. That's what I mean: somewhere, the defaults are so good that it always gets better results. But essentially, it's this layered API. From an end user perspective, I could just use the high-level API. We have it on the left: if you click on applications, you can see the different applications that they support. Or if I want to work on something that's cutting edge, I can also use a custom training loop, which is really nice, and just bring in a PyTorch model and connect it. Yeah, okay. Oh, it looks really cool. Again, computer vision, I want to build a game that plays... a computer game AI that plays me in real life, like, put a

50:00 camera over, say, a chessboard, and it'll play me, but not just on the screen. As I actually move the pieces, it'll see. That'd be fun. Maybe I can try that out here. Sounds good. Yeah, just a little bit of interaction with something real there. It sounds cool. But there are a lot of options these days, right? We've got all these different deep learning libraries and many things to choose from, right? Again, one of the things that I've learned, at least from the podcast, and this is the collective opinion of everyone I interviewed: you don't need to worry about the framework as much as you need to really understand the concepts. That's why I encourage fast.ai, because it's also a course around the framework. At least from my perspective, once you've gotten around to learning all of these things, it shouldn't be that hard to switch to another framework, depending on whatever your job requires you to do, or whatever your project needs you to use. Yeah, you learn the foundation, solve the problem one way, and then you can solve it with some other library more easily. Again and again, it's really hard for me to remind myself that, hey, the problem is what I'm trying to solve, not create more problems. I don't want to learn more different things; I need to figure out how to minimize my time in a way that I actually solve the problem. Yeah, I think one interesting thing that people learn as they get more experience is that even if the technology is super different, right? If I learned how to build something interesting in JavaScript, maybe I know nothing about Python, so how am I going to do that? But actually, what you've learned in one place is way more transferable and reusable, like the way of just thinking about solving problems, the way of thinking about, okay, I've got to pay attention to this and not that. 
So what's important in this library, picking the right library, is these things, and so on. And of course, you should keep switching between frameworks as well. The thing for me was, I was switching as a very early stage developer, and I'm still very early stage, if I can even call myself a developer. I was switching between frameworks every 15 days, just because they looked exciting. That's not what I would tell myself to do. Yeah, this is true. This is true. Get comfortable for a while, and then you can move around. But yeah, don't just chase the shiny thing all over the place, for sure. Although in data science, there are so many shiny new things to pay attention to, visualizations and libraries and charting and graphing and whatnot, that it's easy to get distracted, I think. And that's why I mentioned that I need to remember what I'm working on. I need to make the graph, and not figure out how to make it prettier, as long as it does what it's supposed to do. But that framework

52:28 looks exciting. Maybe I should try that over the weekend. And now I'm spending more... Like, here's an excuse to try that framework. This is my chance to try it, so I'm gonna go do it. Exactly, exactly. Well, all right, coming out of the live stream, David Ross says, hey, some advice on getting started on web development or data science, you know, Python. I'll throw out a little bit, then you can add your thoughts. I would say you need to have some foundation in just the Python basics, right? You need to know variables, loops, functions, that kind of stuff. But kind of like the beginning conversation we had, don't go so deep and say, well, I've got to completely understand everything about this language before I take the step to my first web app, or before I take the step to firing up Jupyter and doing my first analysis. Don't do that. Just get comfortable with the basics. Start building, and as you go into more advanced areas, you're like, okay, well, now I kind of need to learn what a list comprehension is. I didn't need it before, but now I'm going to add it on as I need it. Don't try to exhaustively learn Python. And Sanyam, what would you say after that? Michael, first question for you: what are the basics? Really, that's one thing I really struggled with and still struggle with, because I look around on Twitter, everyone smarter than me is talking about this stuff, and "this is pretty basic," is it? Am I the stupid person who needs to know all of this? Well, here is the interesting thing: the people who are blogging, the people who are recording YouTube videos, or people who are tweeting about things, they're already at some certain level, and then they're super psyched about something advanced that they've just learned, or some really cool scalability thing that they've learned. 
And there's a really good article titled something like "You're not Instagram, you're not Google, you're not LinkedIn," so you don't need all these crazy design patterns and this crazy cloud architecture that companies like that have, because you're a two-person startup that doesn't even have a business yet. Build something simple. Worry about, you know, zero downtime and chaos monkeys later. Focus on the simple, right? And I think there are a lot of people that are fascinated by looking up, like, look where we could go, look at what Instagram is doing, look at what Google's doing. What those companies and teams are doing is amazing and interesting, but it doesn't apply to you now. You know what I mean? So I think there are just a lot of really interesting conversations about stuff that's interesting, but not applicable to people who are beginners at all. Right, exactly. Do I master Docker and Kubernetes?

55:00 Probably not. A computer? Yes. Okay, then start there. We'll worry about Docker once you get something working; maybe then we'll put it in a container. But don't worry about that now. Get started. Exactly, and just deploy it. So to the person asking this question, focus on getting the website up and give yourself a deadline. That's why I love setting goals publicly. Give yourself 10, 20 days to figure out the Python basics and put together a first website. You won't like it in retrospect; you might hide it from your GitHub, I do that a lot. And over time, you will polish it. It doesn't have to look like, like you said, Facebook or Instagram when it comes out. It just needs to function somewhat. And sometimes you'll freak out and something will fail. But then you figure out that, okay, I need to fix this now. And now you have stuff to do. And then you can think of other things. Okay, maybe I should add this. Maybe I should add a button. You're making progress already. Yeah. Yeah, absolutely. And the other thing to keep in mind is that software is plastic; it's malleable, it can be changed. You don't have to get it right the first time, you just have to make progress. And the more you learn, the more you change it and the more progress you make. So many people get hung up and don't even get started because, well, I'm not really sure how to get started. Just take a step. If it's wrong, you take a step in a slightly different direction until you get to the right place. That's how you do it, without getting hung up, without trying to boil the ocean by learning everything. Exactly. All right, maybe that's a good place to leave that conversation. But yeah, it's super interesting to hear your story. And congratulations on the success, from getting started with small projects in college to working for H2O.ai. I'm still learning a lot. But again, thanks so much for this opportunity. 
Like I said, I think there are two types of teachers: the first introduce you to something, and the second make you really interested in it. You are the second one to me, because I just got so excited about all of these things through your podcast. And of course there are others as well, but you are a major part of it. And yeah, thanks for this opportunity. Oh, yeah, thanks so much. Now, you're not out of here yet, though. You've got to answer the two final questions. If you're gonna write some Python code, what editor do you use? Jupyter Notebook. Okay, yeah. Right. And then, is there some library or something on PyPI you've come across recently, like, oh, this is super cool, I've got to tell people about this? I keep running into all of them every second day. But I would say just discovering fast.ai was the biggest moment for me. Yeah. All right. So fast.ai. Perfect. That's a good one. All right, final call to action. People are interested; they may be listening and getting into programming, getting into data science. What advice do you have? Just build something, or just go to Kaggle? If you can't figure out what project to work on, and I still struggle with that inspiration a lot, I would just tell myself to go to Kaggle, sign up for any competition that I like the most, and go from there. Probably take fast.ai along the way, and you're all set. All right, fantastic. Well, thanks so much for being here, and catch you later. Thanks so much. Yep. Bye.

57:52 This has been another episode of Talk Python to Me. Our guest on this episode was Sanyam Bhutani, and it's been brought to you by Sentry, YourBase, and AssemblyAI. Take some stress out of your life: get notified immediately about errors in your web applications with Sentry. Just visit 'talkpython.fm/sentry' and get started for free, and use the promo code 'talkpython2021' when you sign up. YourBase test acceleration will dramatically improve dev/test workflows and CI builds of your Python applications. If you could benefit from having pytest run your tests 100 times faster or more, you need to check them out. Get started at 'talkpython.fm/yourbase'.

58:33 Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit 'talkpython.fm/assemblyai'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, the Google Play feed at /play, and the direct RSS feed at /RSS on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at 'talkpython.fm/youtube'. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
