A path into data science

Episode #322, published Fri, Jun 25, 2021, recorded Thu, Jun 10, 2021

Are you interested in getting ahead in data science? On this episode, you'll meet Sanyam Bhutani who studied computer science but found his education didn't prepare him for getting a data science-focused job. That's where he started his own path of self-education and advancement. Now he's working at an AI startup and ranking high on Kaggle.

Episode Deep Dive

Guest introduction and background

Sanyam Bhutani is the featured guest on this episode. He graduated with a bachelor's degree in computer science but felt that his formal education did not fully prepare him for a career in data science. Through self-directed study, community involvement, and experimentation with tools like Kaggle, fast.ai, and various online courses, he charted a personal path into AI and machine learning. Sanyam currently works at h2o.ai, where he creates data science content, hosts a podcast, and interacts with a community of Kaggle grandmasters and practitioners.

What to Know If You're New to Python

If you’re newer to Python and data science, here are a few tips from the episode to help you get started faster:

Understand core Python basics (variables, loops, functions) before diving into specialized libraries.
Experiment in small steps: Work on small project ideas and gradually increase complexity rather than mastering “all the theory” up front.
Leverage communities such as Kaggle, fast.ai, and local meetups to ask questions and share progress.

Key points and takeaways

Building a Self-Guided Path into Data Science
Sanyam found that university coursework in computer science did not fully align with the practical demands of data science and AI. By taking online courses (including MOOCs) and entering Kaggle competitions, he filled in the gaps and developed real-world skills on his own terms.
- Links and Tools:
  - Kaggle
  - fast.ai
Top-Down vs. Bottom-Up Learning
Traditional degree programs often rely on bottom-up learning, emphasizing fundamentals before tackling applied projects. Sanyam discovered that fast.ai and Kaggle encouraged a top-down approach: Build exciting, functional projects first, then go back and solidify the underlying concepts.
- Links and Tools:
  - fast.ai’s Top-Down Learning Philosophy
Harnessing Kaggle Competitions for Real-World Practice
Kaggle competitions offered Sanyam an avenue for hands-on learning. He could iterate quickly, compare his results on a leaderboard, and collaborate with teams of experienced data scientists. This fast feedback loop built motivation and practical expertise.
- Links and Tools:
  - Kaggle Competitions
  - Kaggle Discussion Forums
Iterative Project Development
In one of his early competitions (the “quick draw” doodle challenge), Sanyam learned the importance of smaller data subsets and iterative experiments rather than huge, 50-hour training runs that might not necessarily yield better results. This was a pivotal lesson in experimentation and workflow organization.
- Tools and Concepts:
  - Python-based experimentation (Jupyter notebooks)
  - Subset training and data loaders
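As a rough illustration of the subset-first workflow described above, here is a minimal sketch in plain Python. The dataset, the 5% fraction, the batch size, and the helper names `sample_subset` and `batches` are all hypothetical and are not taken from the episode or from any Kaggle starter code; a real project would more likely use a framework's own data loader.

```python
import random


def sample_subset(records, fraction, seed=0):
    """Return a reproducible random subset of `records` (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so experiments are comparable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def batches(records, batch_size):
    """Yield consecutive mini-batches; a simple stand-in for a data loader."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical dataset: 10,000 (drawing_id, label) pairs.
full_dataset = [(i, i % 10) for i in range(10_000)]

# Iterate on 5% of the data first; scale up only once the pipeline works.
subset = sample_subset(full_dataset, fraction=0.05)
for batch in batches(subset, batch_size=64):
    pass  # a real training step would go here
```

The point of the fixed seed is that two short experiments see the same subset, so a change in the score reflects the change in the model rather than a change in the sample.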
Building a Portfolio and Reputation
Engaging in Kaggle, writing articles, and creating content (like Sanyam’s blog and podcast) not only honed his skills but also served as a public portfolio. Being able to point to top Kaggle rankings or in-depth blog posts often carries more weight in hiring decisions than traditional credentials alone.
- Relevant Platforms:
  - Upwork (for freelance projects)
  - Personal blogging platforms like Medium or Dev.to
Creating Community-Driven Content
Sanyam’s podcast (“Chai Time Data Science”) highlights insights from Kaggle Grandmasters and AI researchers. Through interviews, he shows how different backgrounds, approaches, and problem-solving methods lead to breakthroughs in data science.
- Links and Tools:
  - Chai Time Data Science (Sanyam’s podcast)
h2o.ai and AutoML
Sanyam now works at h2o.ai, which focuses on automated machine learning tools. Products like Driverless AI and the open-source H2O suite help data scientists quickly train and deploy models without manually managing endless parameters.
- Links and Tools:
  - h2o.ai
  - Driverless AI
Building Interactive Dashboards with H2O Wave
H2O Wave is a real-time web application framework for Python developers who want to build dashboards without heavy front-end coding. This underscores the importance of sharing real-time insights and analytics in a way that non-technical teams can act upon.
- Links and Tools:
  - H2O Wave
Fast.ai’s Practical Deep Learning
The fast.ai course combines code, community, and a library that wraps PyTorch in a higher-level API. Sanyam praises it for making deep learning approachable quickly, so you can see meaningful results—like top leaderboard spots in Kaggle competitions—right away.
- Links and Tools:
  - fast.ai Course
Avoiding Overengineering at the Start
Both Michael and Sanyam emphasized that new developers shouldn’t jump into advanced practices like Kubernetes or intricate design patterns too soon. Start small, get something working, learn from it, and only adopt more complex tools when they become necessary.

Key Concepts:
- Minimum Viable Product (MVP) approach
- Simple scripts before containers or large-scale architecture

Overcoming Impostor Syndrome
The episode touches on how easy it is to feel unprepared or “behind” in the fast-moving tech landscape. Both host and guest encourage developers to focus on incremental learning and celebrating small wins, such as finishing a mini project or climbing a Kaggle leaderboard.

Tools and Strategies:
- Public goal-setting (tweets, blog)
- Discussion forums for peer validation

Translating Learning into Job Readiness
Rather than fixating on degrees, focus on demonstrating capabilities through open-source code, Kaggle competitions, or freelance gigs. This show-don’t-tell approach resonates strongly with data-driven companies who want evidence of applied skills.

Links and Tools:
- fast.ai
- Talk Python Training (Data Science Jumpstart)

Interesting quotes and stories

"It didn't make me a better programmer at all, let me start with that spicy opening." — Sanyam on his university experience

"I think there's so many opportunities to get into data science or to get into Python and programming, no matter your background." — Michael

"I was just going to watch more MOOCs, but I realized I should spend my time actually coding." — Sanyam

"Some companies might not recognize Kaggle experience. Maybe I wouldn't want to work there." — Sanyam

Key definitions and terms

Kaggle: A platform that hosts machine learning competitions, data science challenges, and community notebooks. Great for honing practical ML skills.
fast.ai: A high-level deep learning library built on PyTorch, paired with a course emphasizing top-down learning, making deep learning more accessible.
AutoML: Automated Machine Learning, which automates repetitive tasks of training and tuning ML models. h2o.ai’s Driverless AI is one such product.
Top-Down Learning: An educational approach focusing on immediately building real projects or models before fully diving into low-level theoretical details.
PyTorch: A popular deep learning framework in Python, often used in combination with fast.ai.

Learning resources

Here are a few recommended resources mentioned or inspired by this episode:

Python for Absolute Beginners
Ideal if you want a clear and friendly introduction to Python fundamentals, step by step.
Data Science Jumpstart with 10 Projects
If you’re excited about data-driven projects, this course will guide you from zero to working on interesting data science tasks.
Kaggle
Join competitions to get hands-on experience with real datasets and see how your solutions compare with others.
fast.ai Course
Learn top-down deep learning with accessible lessons and a supportive community.

Overall takeaway

Aspiring data scientists can thrive by focusing on practical, hands-on experience. That means picking an interesting real-world challenge, using tools like Kaggle to get quick feedback, and gradually learning the theoretical underpinnings. Resources like fast.ai, community platforms, and lightweight frameworks provide a gentle entry into projects that make a real impact—proving that consistent effort, collaboration, and curiosity can shape a successful path into data science.

Links from the show

Sanyam on Twitter: @bhutanisanyam1
Chai Time Data Science Podcast: youtube.com
Fast AI: fast.ai
How not to do Fast.ai (or any ML MOOC): medium.com
First Kaggle Competition Experience: towardsdatascience.com
Kaggle competitions: kaggle.com
Radek Osmulski interview: youtube.com
Dima Damen interview: youtube.com
Andrada Olteanu interview: youtube.com
H2O Wave: wave.h2o.ai
Keras: keras.io
Tensorflow: tensorflow.org
PyTorch: pytorch.org
Quick, Draw! Doodle Recognition Challenge: kaggle.com
Developers, Developers, Developers song: soundcloud.com
YouTube Live Stream: youtube.com
Episode #322 deep-dive: talkpython.fm/322
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
Episode Transcript

00:00 Are you interested in getting ahead in data science? On this episode, you'll meet Sanyam Bhutani,

00:04 who studied computer science but found his education didn't prepare him for getting a

00:08 data science-focused job. That's where he started his own path of self-education and advancement.

00:14 Now he's working at an AI startup and ranking high on Kaggle.

00:17 This is Talk Python To Me, episode 322, recorded June 10, 2021.

00:23 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:41 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where

00:46 I'm at mkennedy, and keep up with the show and listen to past episodes at talkpython.fm,

00:50 and follow the show on Twitter via at talkpython. This episode is brought to you by Sentry and

YourBase, and the transcripts are brought to you by Assembly AI. Please check out what they're

01:01 offering during their segments. It really helps support the show. Sanyam, welcome to Talk Python

01:06 to Me. Michael, I'm disappointed I didn't hear the Steve Ballmer remix intro, but I'm very honored.

01:12 Aha, developers, developers, developers. Oh, come on. It's so good. You know it's so good.

01:17 I remember that was in your first few episodes. I think they came out right around the time I was

01:21 in university. Thanks for this opportunity. I've been a fan and listener of the show and

01:26 really excited to be talking to you. Yeah, I'm really excited to have you. I'm excited to hear

01:30 about your journey into data science. It's going to be so much fun because I feel like so many people

01:36 out there looking in from the outside, you know, they maybe didn't come into data science or to Python

01:43 from a traditional computer science education. And they feel like, well, I didn't go through that

01:48 path. And so I probably, this is not a good fit for me. I think that's very far from the truth. I think

01:54 there's so many opportunities to get into data science or to get into Python and programming.

01:59 And while you do have some experience with computer science at the university, it sounds like as we'll

02:04 learn through your journey that a lot of what is actually effective had very little to do with

02:09 university. Let me start out on a spicy note. I studied computer science at a university at one

02:15 of the best universities in the country. It didn't make me a better programmer at all. Let me start

02:20 with that spicy opening. That is spicy. Now you've thrown it down. And I want to come back to that,

02:28 but let's just start with your story. You know, what got you into programming first? Was this a

02:32 university thing that you pursued or were you interested in that beforehand or how'd you get into

02:37 programming? Sure. So I was the standard nerd definition. I enjoyed spending time with computers.

02:42 Whenever my parents would go to sleep, I'd figure out a way to sneak into the computer room, just play

02:47 games all night. And along the way, I think somewhere in high school, I discovered programming.

02:53 It was Java, unfortunately, but I saw the promise of it. I saw all of these interesting things that were

03:00 happening around it. And somewhere I just made up my mind that, Hey, I want to take up computer science

03:05 because that's what coders do. Unfortunately, not as I learned later. That's how I got interested in it.

03:11 And that's why I decided to take up a course in it. Yeah, that's fantastic. I never took computer

03:15 science as a major in college, but I studied math and I had to take a couple of programming courses

03:20 to sort of fulfill my math degree requirements. And yeah, I found it to be a mixed bag. Like I had to

03:27 learn Scheme and Lisp and I thought, well, that's not super practical, but...

03:31 What are those?

03:31 Yeah.

03:33 But I've got to start here. I was like, please let's do some C++ or something. Like, no,

03:38 no C++ for you. Darn it. And then I was told I had to learn Fortran because it was the most

03:44 important language I would ever learn. Turned out to not be true, but I learned Fortran as well.

03:49 And eventually I got into some fun languages that I got to build some things.

03:53 Well, I don't know how you felt, but my experience with being in a university, and this is speaking

04:00 from doing this in the nineties. So it could have absolutely changed, right? I haven't gone back to

04:04 the university since, but I didn't get a lot of projects that I really loved that I was really

04:09 super excited about. It was like, well, you're going to need to learn how to do this algorithm

04:13 by hand on paper. And you're going to need to implement this in this sort of archaic,

04:18 weird language. You got to do this.

04:20 You're compiling stuff in your head through paper. Talk about state of the art.

04:24 Yes. Oh my gosh. I'm like, why, why do we not get to use computers in our computer science course?

04:29 This is crazy. I just don't get it. But here I, here I am. So I didn't come away feeling like it

04:34 made me a super good programmer. It gave me some exposure and some interesting experience, but my

04:40 real exposure that got me into programming and told me, like revealed to me, like you can do this.

04:47 And this is for you was when I was doing a research project that had to do with math and

04:52 not programming, but it needed a little pro it needed some programming to do the simulations and

04:57 do the work. And I'm like, well, now this is super fun. This is the kind of stuff I wanted to do. And I

05:01 was up at 2am, you know, working on it in the computer labs then, because all of a sudden it was really cool.

05:07 So I don't know. Hopefully computer science is more practical these days, but I didn't find a huge

05:13 value in those computer courses I took at university.

05:15 To be honest, like I relate to that so much. I could just rant about this for hours, but I signed up for

05:21 computer science because there was this notion in my head that, hey, computer science is where you do

05:25 programming stuff, right? You make computers smarter. And then they're teaching us all of this stuff that,

you know, doesn't really make sense. Like I remember listening to Talk Python those days,

05:36 and you were talking about PyPy, which I didn't know what was because they never told us what it was.

05:41 And I'm listening to all of this stuff. And what they're teaching us is how to make for loops,

print out patterns. Like I don't see how these things connect, right? You're talking about Flask

building apps on Talk Python. I just heard Michael talk about it. But now what is all this stuff?

05:56 What is inheritance? Where does it come into the picture? And there was this huge disconnect for me.

06:00 So very much echoed that experience as well, unfortunately.

06:04 Yeah, that's interesting to hear you look back on it. One of the things that you talk about in some

06:08 of your writings and some of your experience, and we'll get into it, has to do with top-down versus

06:15 bottom-up learning. Before we get to that, I want to make sure you get to answer both the opening

06:20 questions. So what are you doing these days right now before we dive into that aspect of learning?

06:24 Sure. I currently work at h2o.ai, which is a company building auto-ML products. I'm sure we'll

06:29 get into this later on as well. I work as a content creator slash engineer. So we have a makers gonna

06:36 make culture, which means that I have absolute freedom to bring ideas. Usually people don't stop

06:41 me and they encourage me, which also means that I can do a podcast at work. I started a podcast,

06:47 a while ago called Chai Time Data Science, where I interview my heroes. A lot of them are Kaggle

06:52 grandmasters. So we can talk about this later as well, but Kaggle has different tiers. Grandmaster

is the highest one of them. H2O has, I think, more than 20 grandmasters. So at some point I said,

07:04 hey, can I interview our people? And they said, yes. So I have a lot of freedom over stuff I do,

07:08 but it's a lot of creating content and things in those domains. So blog posts, videos,

07:13 I get to do meetups as well. That sounds like a really fun job. It is.

07:17 This whole exploring ideas and creating content and interviewing people and just being out in the

07:22 community. It's the aspect of programming that when people first hear about it, I think is extremely

07:27 surprising, right? Yes. A lot of people think of programming, especially before they really get

07:32 into it, it's this solitary thing that kind of geeky, super smart people do, mostly alone,

07:38 mostly to avoid other contact with others, right? And then as you get into it, you learn like,

07:43 actually there's a whole lot of team dynamics and programming. And then there's these roles that

07:47 are like developer evangelist, which sounds pretty similar to what you're kind of doing,

07:51 like community outreach on the dev side, which is very social and outgoing and interesting. And so,

07:58 yeah, it's a whole spectrum. It's a lot of fun for sure. And yeah, it would be closest to evangelism, but also I have a lot of freedom to do a lot of, honestly,

08:06 anything I bring to the table, usually I get positive feedbacks about it. So I just keep doing stuff,

08:11 even if it's interviewing people over time.

08:13 Yeah, that's fantastic. So back to my top down, bottom up thing that you talk about. In a lot of

08:19 academic, you know, high school, college settings, the foundation is set at the beginning, right? Okay,

08:26 well, we're going to teach you how to do derivatives or differential calculus. So what we're going to do

08:31 is we're going to start out real, real simple. And I'm going to talk about what does a difference look

08:36 like, and then we're going to talk about limits. And then we're going to, you know, eventually,

08:39 like two months later, you can do derivatives and you can actually do calculus, right?

08:44 Yeah.

08:44 That first two months, you just have to have faith that I'm just going to keep cranking on the details

08:49 until something interesting happens.

08:52 But even after those two months, like you get to solve these problems, you get to, at least for me,

08:57 I was able to confirm that, hey, the answers I'm getting match with those in the book. But what's

09:02 the point of all this? Like, okay, I'm able to solve these problems. I know how to ace my test. I know

09:06 how to match into that. But where will this be used? I'll never find out. And I just didn't have the

09:12 passion for it. My only passion was, I need to get good grades to get into a good university. But apart from

09:17 that I had no real...

09:19 Right, right. I want a good job, but like, this must be the path.

09:21 Yeah, basically.

09:22 Yeah, yeah. I hear you. And I feel like so much of academics and many presentations and courses are done

09:29 this way as well. They, but especially academics, because you have to finish it to get the grade,

09:34 to get the degree. So they're like, well, it's fine if it takes three months before this is interesting

09:39 to anyone, because they have to stay here. They have no choice. Like, we're going to build up slowly,

09:44 bit, bit at a time for three months, because guess what? They're all enrolled and they need this.

09:51 This is a required course. And so we're going to make sure we get every little detail in place along

09:55 the way. And eventually it'll be interesting to them. I just feel like that is so backwards

10:00 from trying to capture inspiration of people, you know?

10:03 Somehow, every single time there's always this disconnect, like, okay, I get it. At some point,

10:09 you have to know the concepts, but you're never told of the bigger picture, which is what is a

10:14 larger focus in the top-down approach. So I never know where these individual things will really be

10:19 used. It's like, before you get to drive, you need to know about the thermodynamics of the engine

10:24 rather than sitting in the driver's seat.

10:26 Yeah, there's a beautiful Ferrari you want to take out for a drive. And you're like, no,

10:30 no, no, no, no, no, no. We're going to study physics, study thermodynamics,

10:33 a little chemistry for the combustion. And then a couple of years, you can take that thing for a drive.

10:37 Exactly. And I feel like it actually captures it pretty well. You know, contrast that with,

10:42 well, let's just teach people the rules of taking a derivative, right? Derivative of x squared is 2x.

10:48 Okay, great. Now let's show them how they solve cool problems. Like, oh, here's a ball flying through

10:53 the air and we can figure out its velocity when it hits the ground based on things like the derivative

10:57 and acceleration and so on. And then eventually, once you're like, this is really interesting,

11:01 then you could talk about like, all right, now let's dig in. Let's talk about like the details of why

11:07 this math or this data science algorithm works. And it just doesn't really go that way. So I think

11:12 that's definitely an interesting part of the journey that you had to make that switch, right? To sort of

11:18 go from this like really theoretical academic background to, oh, I've got like a Kaggle competition.

11:24 Yes.

11:25 And I've got two weeks to solve the problem. Like we can't be rebuilding it from the foundation. Let's go

11:30 the other direction.

11:31 Yeah. Just to be clear to the audience, I just did a bachelor's in computer science. I didn't do a

11:36 master's or PhD. I gave up on academia midway. But yeah, echo on that. And it's the top down approach.

11:44 I was introduced to this through fast.ai. They are a big advocate of this. And that's how I became a fan

11:51 of this. Essentially, what they cover in their blog post is you're given the baseball bat and you get to play

11:57 first rather than being taught the physics of the curveball. And I think at least for me, in retrospect,

12:03 the main challenges throughout all of these months of learning a subject in university, you need to be able to

12:10 stay motivated. And remember why you've taken up a course. I took up web programming because let's say I want to learn

12:16 how to make websites and not because I need to remember what HTML tags come in the final semester question every year.

12:23 And somewhere in the middle, you lose out on this motivation. And the top down approach essentially takes care of that.

12:28 That, hey, bring your project and figure out stuff along the way. And I think I mentioned in our interview, I think

Talk Python courses really cover this well because you're given 10 sets of projects and you can just build them along the way.

12:41 Yeah, thank you. Yeah. I mean, I really am a big fan of this because I think, and I tried to incorporate the courses

12:46 that we have, because I do think you need to have these little wins right away. And you hear a lot of times

12:52 people talk about like, well, if you're teaching kids, the kids need to have these like good experiences early.

12:58 It's like, you know what? Replace kids with people. Like people just, they've got a lot of time and other options and you want

13:05 to make them feel good and excited and like they're making progress. They need to make progress,

13:09 even if it's little steps to the beginning, make it feel like legitimate progress, not just algorithms

13:15 and loops and stuff like that. Yeah, absolutely. So you went through your computer science degree,

13:21 but you didn't come out the other side feeling like a data scientist. And this was around the time

13:26 of the MOOCs, right? The massive online open courses. Is that what this stands for?

13:31 I think so.

13:32 I think so. Something like that. It might have several variations. And one of those is over at

13:39 fast.ai, right? Focused on deep learning and data science type topics, right?

13:45 Yeah. So just going back to the universities, like you said, I was just really unhappy that, hey, there's this huge disconnect.

13:51 And like any smart person in their 20s, I just spent a lot of evenings ranting about it.

13:58 And at some point I decided, OK, this is going to help me. And I just started signing up for every single course on the

14:06 internet. I used to say this proudly that I've done 50 plus courses to my peers who would look up to me that, oh, this guy's...

14:13 Oh, I've only done 10.

14:14 Yeah. But in retrospect, I was just being dumb and chasing all of these courses. Fast.ai, in retrospect,

14:21 and I keep saying this, but it's the most impactful course in my career. So Fast.ai is not just a course,

14:27 it's also community and software. But I got introduced to top-down learning through them. And they make you

14:34 excited about this stuff. In the first lecture of the deep learning course, they have a bunch of courses.

14:39 They teach you how to put together a few lines of code. Of course, you don't know what's happening,

14:43 behind it. But you build something that's state-of-the-art. And Jeremy Howard, the creator,

14:48 shows you how you can get to the top of a leaderboard on a Kaggle competition. I don't know

14:52 what's more exciting than that, at least to someone who was printing out star patterns in university.

14:57 Yeah, that's really neat. And I think the community aspect is also pretty important, having that ability

15:03 to sort of bond with people there. So MOOC, the M stands for massive, like number of massive in terms

15:09 of number of people, because it's a large group. I haven't gone through their courses or anything.

15:13 At this point, it's a few hundred thousand people, I'm sure might be more than that.

15:18 That probably counts as massive. You know, if you compare it against a 30-person college course or

15:22 whatever. Yeah. Okay.

15:24 The biggest mind opener for me was, we suck at diversity in tech, right? No other way of putting

15:29 it. And just talking to different people on these online communities, people who don't have computer

15:36 science degree or were coming from different walks of life. I didn't understand that, hey, you're supposed

15:41 to have other responsibilities as well. You're supposed to be helping your family out. I just assume you can do this

15:47 in your free time and that's all you do. But that was also a mind opener for me during those days.

This portion of Talk Python To Me is brought to you by Sentry. How would you like to remove a little stress from your

15:58 life? Do you worry that users might be having difficulties or are encountering errors in your app right now?

16:03 Would you even know it until they send that support email? How much better would it be to have the error and

16:09 performance details immediately sent to you, including the call stack and values of local variables and the

16:15 active user recorded in that report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry

16:22 on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade

16:29 ready to roll out as we got their support email. That was a great email to write back. We saw your error and

16:34 have already rolled out the fix. Imagine their surprise. Surprise and delight your users today.

16:39 Create your Sentry account at talkpython.fm/sentry. And if you sign up with the code

16:44 Talk Python2021, it's good for two months of Sentry's team plan, which will give you up to 20 times as

16:51 many monthly events as well as other features. So just use that code Talk Python2021 as your promo code

16:58 when you sign up. One of the things I really value about the Python community is it's not just straight CS

17:07 to sort of deep applied Python out of like this university chain, but rather so many people are

17:14 brought in from different areas, right? People are interested in biology and they learn a little

17:18 Python. People are doing astronomy and they learn a little Python. People are building Instagram and

17:23 you know, they're using Python. So there's just this diversity of viewpoints and specialties that

17:29 comes to Python that's really unique. And it sounds like you kind of got that feeling as well here.

I was always very welcomed by, and the fast.ai community especially is very warm and welcoming. So

17:40 it's at least at that time, I went to Reddit to ask a few questions. And I got a lot of harsh feedback,

which really demotivated me. But the fast.ai community was the exact opposite at that time. Reddit is a lot

17:53 better now. Any other community for that matter. But it's a very welcoming community. And no one says

17:58 that, hey kid, you're not supposed to be asking these stupid questions. Rather, even the creator

18:02 himself, Jeremy Howard often hangs out in the forums, answer all of the questions. So it's really put back

18:08 the inspiration in me while I was just in this dark phase, nothing making sense in university.

18:13 Yeah, very cool. So there's a couple of things that you've done. Let's set the stage and then we'll dive

18:19 into the details on them. So as you've gotten your degree, if you've gotten better in data science and

18:25 deep learning, there's a handful of things you've done to give back to the community and stretch yourself as

18:31 well. One is to work on your blog and write articles. Two is to create your podcast, which I was happy to be a

18:37 guest on a while ago. It's very nicely done. Thanks for saying yes. It was a very exciting moment to host you,

18:44 honestly. Yeah, thank you. And yeah, so the podcasts and then also the Kaggle competitions. So let's start

18:52 with your blog posts. I just pick a couple out of here that are interesting. One of them is how not to do

Fast.ai or any other ML MOOC course, right? Yeah. And so you go through sort of how you

19:05 approached these courses. You talked about how you took 50 courses, which is on one hand, I think it's

19:11 really awesome to get that exposure. But on the other hand, to really master programming, you need to

19:18 stop and try to like solve concrete problems, fail at that, figure out like, well, I'm trying to solve

19:25 this problem. I can't even get a virtual environment set up to let me install this library. Like what is

19:29 going on? You have to hit your head against that. And it feels like you're bad. It's just, you know,

19:34 it's building layers of experience in a way that like, it's not the funnest, but you got to go through

19:40 those steps and then you sort of work your way into developing that experience. There's not a super

19:45 shortcut. Having the courses helps give you the perspective and know where to focus, but it's still,

19:50 you kind of got to go through that path, right? So maybe talk us through how you approached it and

19:55 then the advice you might have after. Yeah. And to counter back also, just generally speaking,

20:00 maybe I'm not the most outward looking person, but I didn't find these ideas of, you know, building any

20:08 projects. So I couldn't think of a website that would look interesting. So I would just go to a course,

20:12 assuming that I would learn all of this stuff. And a lot of these MOOCs are very nicely marketed,

20:17 that they make you feel that, okay, I'm going to come out learning something. So I just followed

20:22 this trail of stuff that I would keep looking up. I need to know Python. So I would do a Python course.

20:28 Then I would take a course on different frameworks, keep doing that. And even at the end, I didn't

20:34 accomplish much because again, there was this huge disconnect because if anyone would tell me to do

20:39 anything that's slightly outside of the curriculum, I would fail at that. And that's just because I didn't

20:43 experiment as much. And in retrospect, I should have spent at least thrice or twice,

20:50 at least twice as much time just trying to code even the stupidest idea possible instead of just

20:57 watching those lectures because they felt in my comfort zone that, okay, I'm learning something,

21:03 but I wasn't learning something at that point.

21:05 Well, you are learning something. I do think that being able to watch the lectures of an online

21:10 course and following along, like you're getting real exposure and real stuff, but you're not,

21:16 even though you're feeling comfortable, you're not at a place where if somebody said,

21:20 now go build something different, it's not that different, but it's different and do it from

21:24 scratch, right? You're not building up that skillset unless you're also experimenting along the way.

21:29 Yes. And I'm sorry, just to clarify. So this wasn't for the first course. I had taken like at

21:33 least 10 of them and I was watching the same stuff over and over again. So at that point,

21:37 it was a waste of time, I think.

21:39 Yeah, for sure. For sure. So you said, all right, well, I'm not so sure that the way I was doing it

21:45 was totally the right way. So what would you say is the right way? What advice would you give

21:50 there for being successful in these online courses?

21:52 Sure. I'll point out a book by Radek Osmulski, who I had interviewed on my podcast earlier, but he's

21:59 put out a book that essentially talks about different things that you should be learning or how should

22:03 you really approach learning. And in his book, he talks about code twice as much as reading theory,

22:09 have this northern light of an idea. Again, I couldn't think of anything. So I took to Kaggle

22:15 competitions. In my opinion, at least in my opinion, just do fast.ai and then jump on to

22:20 Kaggle. Those are the two best places to learn about data science, in my opinion.

22:24 So the Kaggle competitions are interesting. Let's maybe talk about those for a little bit.

22:29 Sure.

22:30 I haven't talked about Kaggle a lot on the show. I'm sure people are mostly familiar, but maybe not

22:36 everyone is. So just tell us what is Kaggle.

22:39 Sure. And fun fact, the CEO actually tweeted yesterday. So at this point, Kaggle is at seven

22:45 million users, I think. So when they say they're the home of data science, it's really the biggest

22:49 community in data science. And why do I say community? It has competitions that are hosted

22:54 on the platform. So different use cases for different companies exist as competitions.

23:00 Now, as you can see, the first one is an example one, but different competitions are brought onto

23:07 the platform by companies who want the community to solve a problem. In return, there are prize pools,

23:14 but really what people are there for is the knowledge sharing that happens. And how does

23:18 that happen? They also have very nice discussion forums, as well as notebooks. At some point,

23:24 they called them kernels, if you're not familiar. But essentially, you can host Jupyter notebooks

23:29 on the platform where people share their stuff. And this is the best of the best on the platform.

23:34 So they share tips and tricks of how you can approach the competition. And then you start to

23:38 try and compete on a leaderboard. And you get real time feedback because there are at times

23:43 thousands of people competing on the leaderboard, which may or may not be a good experience,

23:47 from my experience, at least for the first few competitions. But it's very exciting.

23:52 It's a little bit like a hackathon type of thing, but very focused on a data science problem,

23:56 not only generating an app or a website. Maybe that's a good elevator pitch.

24:01 Exactly.

24:01 Okay. So I'm sitting here looking at kaggle.com/competitions. And yeah, I can see a bunch

24:08 of interesting things. It doesn't explicitly say who it's sponsored by on the outside. Maybe if I click

24:12 and it'll say, oh yeah, this is brought to you by or sponsored by or put out by so-and-so. But the

24:20 first one is the SIIM-FISABIO-RSNA COVID-19 Detection, which sounds like a bunch of acronyms. I don't know

24:27 anything about it, although I've heard of COVID. The idea is to identify and localize COVID-19 abnormalities

24:35 in chest x-rays, which is interesting. And that's a genuinely useful thing that we could all benefit from.

24:42 Right. Having machine learning that can assist doctors and say, wait a minute, wait a minute,

24:47 this person seems to have either had or currently has COVID based on this picture. Let's do something

24:53 about that. That's genuinely helpful for society. And if I can just point out, at least for this

24:58 particular competition, I think it launched a few days ago. And just in those few days, you already

25:03 have 450 people that are, I can say just hyper, they'll be hyperactive in the discussion. Then a lot

25:09 of us just go there for the learning. I'm sure most of us just go there for the learning and things you

25:14 get to experiment with and learn on there. Yeah. And it says the prize for this is a hundred thousand

25:18 dollars in the US, which is pretty sweet. Is that split like number one gets half, number two gets a

25:25 quarter and it like trails off or is it all or nothing? Number one or zero? I think it's in the top

25:30 three, sometimes in the top five. It varies from competition to competition. And again, it's really hard to get into

25:36 the top. They have medals. They've gamified all of this stuff. And how's that helpful from an outside

25:43 perspective? As you gain medals, you move higher up the ranks as well as tiers. You start as a novice,

25:48 then you become a quote unquote expert, master and then grandmaster. So as you earn a certain set of

25:54 medals, you start on your path towards becoming a grandmaster. So that's more exciting than the prize pool.

26:01 Again, legends or very experienced people are aiming for the prize pool. I don't think I've ever even

26:08 dreamt of that. Right. If it's, you know, too far out of reach, it's not worth trying to worry about

26:14 that. It's more about making the progress and seeing yourself go up in the charts and gain that

26:19 experience. Right. Yeah. Yeah. So let's see some other ones. I went and sorted by prize purse here. So

26:25 Jane Street market prediction, test your model against future real market data. That's interesting.

26:30 there's 4,000 teams competing for that. There's one about discover how data is used for the public

26:35 good in the US for 90,000. That's pretty cool. Major League Baseball has one on digital engagement

26:43 forecasting. So predicting fan engagement for a baseball player, digital content. That's pretty

26:48 cool. This launched, I think, less than a day ago and there are already 15 teams on there. I'm sure if you

26:54 go over to this competition, you can see some stuff in the discussion and kernels already. Yeah.

26:59 One that is very close to my heart is SETI, Breakthrough Listen, ET signal search. So

27:05 find extraterrestrial signals in the data from deep space. That's pretty cool. The prize is not huge,

27:10 but you know, if you were the person that discovered aliens, come on, I mean, that's a pretty good prize.

27:15 And that's just zooming back to where this conversation started. Like I said, I'm not the

27:20 person who could think of these ideas. And now I'm given this large number of options,

27:25 whatever is exciting to me. I can jump on that competition. Even if I have zero idea about how

27:31 to approach that problem, there'll be plenty of stuff that's shared there. And I can just go from

27:35 there. I can just start learning. I can just try to approach this in a top down fashion.

27:39 Yeah, absolutely. So another one of your blog posts that you wrote is your first Kaggle competition

27:44 experience, writing basically retrospective on that. So maybe tell us what that was like.

27:49 Sure. So in this competition, and I tend to set these goals every year. So I just

27:54 announce my goals, go big or go home, right? I just tweet out the craziest stuff that I couldn't imagine.

28:00 Last year, I wanted to lose 50 pounds. I managed to lose 70 pounds.

28:04 Congratulations. That's massive.

28:05 Thank you. But yeah, I just set these goals. And one of these goals was to start competing on Kaggle.

28:12 So in this competition, my first one ever, and all of these competitions are a similar experience. I just

28:18 joined the Quick, Draw! doodle competition because again, it looked exciting to me. What I did at that

28:24 time was just went to the discussion. I found people sharing stuff, sharing code. I just took that,

28:29 tweaked a few numbers, tweaked a few parameters, didn't make much sense. And I started moving up

28:34 the leaderboard. So the leaderboard is the most exciting and most addicting thing on Kaggle because

28:39 you're getting this real-time feedback. Okay, I'm doing better than these people. And then you go to bed,

28:44 you wake up, someone has shared a tip or a trick somewhere in a kernel or a discussion. And now

28:51 everyone has used that. And by the time you wake up, you're down by a hundred positions.

28:54 I see. They're like, oh, you're all just training on all the data. What if you use like transfer learning

29:00 on this little subset? This is actually totally crushing it. Everybody's like, we're changing

29:05 what we're doing. And you wake up and you've fallen down the leaderboard massively, huh?

29:08 Exactly. And again, now you have to get back to work.

29:11 One thing you talked about in your blog posts was how going through it, you got some pretty good

29:16 real world experience, right? You talked about how you, where were you talking about? You talked about

29:22 how you took all the training data and the data is a lot for this competition. There's like a billion

29:29 images described as a CSV file or something weird like that in each image. And so you took all that

29:35 data, loaded the training data, not all the data, and loaded it up and sent it over on your GPU. And it

29:42 took 50 hours, like more than two days. And you expected you're going to crush it, right? And it turns out

29:49 that like, actually that made it less accurate, right? So you had to get more creative. Maybe tell

29:53 us about that.

29:54 Yeah. And again, this was this disconnect that I found coming from these MOOCs,

29:59 where everything is just structured so nicely that it's supposed to work. And I just took that approach.

30:03 Okay. I'm just going to chuck all of the data in a data loader, put it on my GPU, let it train, and

30:08 I'll get a good accuracy. It turns out not really because it's not how this problem was structured. And again,

30:14 I learned about all of these, I think from a practitioner's perspective, important things

30:19 where I learned, Hey, I need to structure my project in a way, because at some point I'll be

30:24 at an Untitled152.ipynb notebook and I need to go back. I wouldn't have a track of that.

30:30 I should probably do smaller experiments rather than the first one being a 50 hour long experiment.

30:35 So I should try and figure out how to run it on a subset of the data.
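
Editor's note: the idea of starting with a small slice of the training data can be sketched in a few lines of Python. This is only an illustration, not the code Sanyam used in the competition. The function name `stratified_subset`, the toy labels, and the 1% fraction are all assumptions made for the example. It picks a reproducible random sample that keeps roughly the same share of each class, so a quick baseline run sees every category.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a random subset with about `fraction` of each class."""
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every run

    # Group example indices by their class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Sample from each class separately, keeping at least one example per class
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical toy labels standing in for a much larger training set
labels = ["cat"] * 1000 + ["dog"] * 500 + ["bird"] * 100
subset = stratified_subset(labels, fraction=0.01)
print(len(subset), "of", len(labels), "examples selected")
```

Training on the selected indices first gives a baseline in minutes rather than days, and the fixed seed means later experiments can be compared on the same slice before scaling up.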

30:39 Yeah. That's a really good point because if you're waiting 50 hours per iteration, that's not going to go

30:44 very quickly.

30:45 It sounds very easy and very obvious, but it wasn't to me at least. Maybe I was stupid at that point.

30:50 Well, no, I wouldn't say necessarily that. I mean, it probably seemed like, well, of course, if it's working

30:56 a little bit, let's just give it all the data, then it's going to really work. Right. That's a pretty reasonable,

31:00 naive beginner point of view that that's going to be totally fine. But then in reality, you know, reality comes

31:08 along. Well, it's more complicated. So you ended up coming up with a combination of like some of the

31:13 larger images, some of the smaller images and building up out of like that kind of stuff.

31:17 Right. Yeah. So I learned that, hey, maybe I should start with 1% of the training data, put up a baseline

31:23 again, obvious stuff, and then try to work with different image sizes. And what I was trying to do

31:28 is see if the accuracy, according to my local validation was going up and submitting it to the

31:33 leaderboards and just checking if it's actually working and then training bigger models through

31:38 that. At that point, ResNet was, I think, state of the art. That's what I was sticking to,

31:43 because I didn't have any outside idea about that. Other people were, of course, doing a lot of things

31:47 that I was just, I was just playing catch up with. Sure. This portion of Talk Python to

31:52 me is brought to you by YourBase. YourBase has a really cool product that will dramatically improve

31:58 testing and CI of your Python applications. If you could benefit from having pytest run your tests 100

32:04 times faster or more, you need to check them out. Here's how it works. YourBase observes what tests

32:10 interact with which part of your application code. And the first time you run it, the speed is roughly the

32:14 same as normal. But the next time you run pytest is where the magic is. YourBase knows which parts of

32:21 your application code have changed. If the code under test hasn't changed, why test it again? So YourBase

32:26 only runs the tests that have interacted with the part of the code that has. If you change just a couple

32:32 of functions, you only need to run the few relevant tests and all the others can be safely skipped.

32:37 This means skipping hundreds or even thousands of tests most of the time, making your dev test

32:43 workflow and your CI builds much, much faster. All you have to do is install YourBase and run pytest as usual.

32:50 They'll take it from there. Get your free trial by visiting talkpython.fm/yourbase. YourBase test

32:57 acceleration works with the tools you're already using. So give them a pip install and see the difference right away.

33:02 Get started at talkpython.fm/yourbase.

33:05 So you're a fan of Kaggle. You recommend people come along and use this for concrete ways to

33:13 get started and build their knowledge beyond just theoretical stuff?

33:16 100%. I would just say in retrospect, I would just tell myself to, hey, do fast.ai sincerely once and

33:23 then just sign up for any competition and go from there.

33:25 Is it better to do it with a team of people? Do it by yourself?

33:28 I'll be honest. Sometimes I would not be the person working the hardest in the team. So

33:34 I would tell myself to at least start solo and then team up with different people. Everyone follows

33:39 different approaches, but at least for me, I tend to be the lazy person. So I would

33:43 make sure that I've done some homework before asking other people to join the team.

33:48 Yeah, that makes a lot of sense. But apart from that, when you join a team, and all of my

33:53 Kaggle quote unquote successes, I would credit it to all of the teams I've been a part of. And then you

33:58 get to meet all of these data scientists in a team where they're from different levels of experience and

34:04 they're doing these things that I couldn't have imagined. It's again, a greater learning experience

34:09 in that sense.

34:09 Yeah. What's the story in terms of people who are in the talk, you talked about them being

34:15 grandmasters or whatever they're called. Yeah. There's grandmasters, masters, experts,

34:19 contributors, and novices in the ranking here. What's the job story look like? The career

34:25 story. So if I'm over here and I'm one of the 1,500 masters in Kaggle, like dropping that

34:33 information at a job interview, is that going to get me somewhere or not? Do you think?

34:36 It depends on the company a lot. So when I say the company where I work, H2O.ai has a lot of

34:43 Kagglers. We have 20 grandmasters. I think out of the five we can see right now, three are a part of

34:49 H2O.

34:50 Oh my gosh. Yeah, that's like 10% of all of them. That's awesome.

34:54 Oh, sorry. Four in the top five are a part of H2O at this point. Three of them, sorry.

34:57 Yeah, amazing.

34:58 So such a place, they of course recognize the fact that this isn't easy. If you're a master,

35:03 you're probably already in the top 1%, top 0.5% of the global rankings. And there's a lot of work

35:09 behind that. So I think it does make a lot of sense. Some companies don't recognize it. Maybe

35:14 I wouldn't want to work at those companies. Again, hot take.

35:17 Yeah, that's actually an interesting point, isn't it? Like if the person interviewing you for a data

35:22 science position doesn't know about Kaggle and respect like massive progress there, maybe you don't

35:27 want to really be on that team. Unless you're like, we're hiring you to like modernize this and set

35:33 the stage and like bring like the real stuff to us. But if it's like, join the team, we'll show you how

35:38 it's done. It's like, eh, you don't know what Kaggle is. Okay.

35:41 It's just a portfolio of projects. You can tell everyone that, hey, I worked on this problem that

35:47 your company is working on. And against the best of the best, I ranked, say, 10 out of 1000. And that's,

35:54 that should be a huge signal to the hiring people. I agree. I think, you know, put aside

35:58 the competition, put aside the, how do you rank against other people? If you can come over here

36:03 and say, oh, you see this Major League Baseball digital engagement thing? I did that and it came

36:08 out pretty well, actually solved that problem. And here's my GitHub repo for that and our conversations

36:14 around it. This one about the prediction of future sales also did that. And then this home price one,

36:20 actually I was near the top of that, like just having that kind of portfolio to share as part of

36:27 an interview is so incredibly important. So many people ask me, I want to get a job in this thing.

36:33 How do I get started? Do I need degree X or should I go learn this technology or that technology? Like

36:40 all those things are interesting and valuable, but something I really like about the tech industry,

36:46 but it's also, you know, it's challenging, because that's kind of where you've got to live, is it's not

36:51 so much your credentials or your background that will get you the opportunities. It's I need somebody

36:55 that does this. I need somebody that knows how to predict house prices. You predicted house prices.

37:00 You've shown, you can do it. You're hired, right? Like if you can show that you're doing the thing that

37:05 they already need, there's not a whole large discussion going on after that, right? You're

37:10 really close to being in the right place to do that thing. So building up this portfolio

37:14 is important. I think I managed to somewhat figure this out in my university days out of an interest

37:19 just to explore problems that I started freelancing, which was because I wasn't allowed to have a job

37:25 job while being in university. And at that point I figured out, Hey, if I'm going to approach a person

37:30 on let's say Upwork and they want me to build something, I shouldn't be starting after we've had that

37:36 conversation. If I can just look at the problem, even put together the most basic structure around

37:42 it. And I can show it to them that, Hey, I put this together in two days. If you hire me, I can

37:46 build this in X amount of days. And most of the times that got me through the clients or whatever

37:52 deals I've got in that world. Yeah. And getting that first or second project under your belt.

37:57 It's really important. I feel like Kaggle is part of that. Also, you know, Upwork is interesting that

38:01 you bring that up. I'm a fan of Upwork. If I was starting out and trying to get my first project,

38:07 my first job, and I was having a hard time in my local area of finding that I'd certainly consider

38:13 looking and seeing what jobs are out there in Upwork, even if I thought they didn't pay very

38:17 well, or I didn't totally want them, just having that one or two projects done as part of my resume,

38:23 then you can start looking, you know, more broadly. And it's just going to be such a help to have some

38:29 kind of portfolio. Right. So as a student, the pay really didn't matter. It was a lot as a

38:34 student, but my biggest promotion in life was going from that basic food menu to looking up

38:41 that menu as I started making money. That was exciting. It almost felt illegal that, hey, someone

38:47 is paying me to write code. Yeah. I remember my first job. I was so super excited. It almost didn't

38:52 matter what they could have paid minimum wage and I would have been thrilled about it because,

38:55 oh my gosh, someone's paying me to learn programming. And look, I have a book. I'm

38:59 spending half my time just learning how to do this. I mean, they're basically paying me to learn this

39:04 stuff. It's amazing. So yeah, I really, really had the same feeling when I was getting started.

39:08 Fantastic. All right. So another interesting area around what you're doing has to do with your

39:14 podcast. So maybe we could talk about just a couple of your, a couple of your interviews that

39:19 you've done that you really liked, right? Sure.

39:21 You found interesting. So tell us a bit about a couple of them.

39:24 Sure. So at some point how this started was I was doing all of these, I was trying to

39:28 essentially explore different areas of content creation. I started with blogging,

39:32 fast.ai's guru, Jeremy Howard, told us to write blog posts. So I started doing that.

39:38 And at some point I found this disconnect in advice. So I reached out to a friend:

39:43 Hey, would you, you've been helping me a lot. Is it okay if I put this together in a blog post

39:48 and put it out in the world? And that went on for a while. I started this as a blog series.

39:52 And later after I graduated, I thought, okay, maybe if I do this as a podcast and I'm sure

39:58 you would agree, I could explore all of these great people's minds in greater depth. So that's

40:04 how the podcast started for me.

40:05 Yeah. Well, one of the big secrets about having a podcast is I get to be the first

40:10 listener basically to all these interviews, right? I mean, I guess now that we're live streaming,

40:14 it would have like 50 first listeners or something, whatever it turns out to be. But

40:17 it's really amazing the opportunity to just meet these people that you're really interested in,

40:23 especially with conferences being gone and stuff. Now that's really hard to find time to like meet

40:27 up and just talk about them, but Hey, you can have them as a guest on your show. It's really nice.

40:31 Exactly. So coming back to my favorite interviews, I try to interview people about their journey.

40:36 As someone who's trying to understand how did this great person, Radek Osmulski, we have Radek

40:42 Osmulski's interview on top. He's one of my heroes from fast.ai. But how did someone like him learn

40:48 programming? How did they learn how to Kaggle? How did they break into the field? And we, at least in

40:53 the interviews, I try to ask them, did you face this problem? How did you overcome it?

40:57 My three favorite interviews would be Radek's, Dima Damen's and Andrada's. So I try to interview people who

41:04 are Kagglers, practitioners and researchers, essentially anyone I can find who would like

41:09 to share their journey. And these are from all three aspects, essentially.

41:13 That's cool. Dima Damen, she did video recognition and computer vision. That sounds super interesting.

41:19 I remember in her interview, I started by asking her, Hey, when you were doing your research,

41:24 you were just using OpenCV. What do you think about it nowadays? Apart from that, it's also a lot about

41:30 her research perspective. So Dima is very much experienced and is a great orator as well. So

41:35 she was talking about how to approach your first research project or how to just go about research.

41:41 What is even research as someone who doesn't understand what that word means? And that's

41:45 what I try to explore in all of these interviews.

41:47 Yeah. So let's go back to Kaggle for just a minute, because something you touched on is really

41:52 interesting. I know there are a lot of research teams and groups at universities who are trying to

41:59 build models or trying to build mathematical algorithms or trying to do research. I feel like

42:05 maybe some of these Kaggle competitions would be really, really good to say, as part of our research

42:10 project, let's take what we're trying to develop here and try to actually apply it to one of these

42:15 competitions and see where it stands.

42:17 For sure. And it's highly encouraged, at least in the community, some organizers,

42:21 so the sponsors of the competition invite you to present your solution even in research conferences.

42:28 And apart from that, even if you end up creating a blog post or a research paper outside of that,

42:33 the community is very close and they recognize it instantly. And they know that, I mean, nothing

42:38 against research, but at least this particular solution has been tried and tested against this

42:44 leaderboard and it works really well. It is quite cutting edge because it's been tested against all

42:48 of these people.

42:49 Yeah. Yeah. Very neat. All right. Third one that you had queued up for us is Andrada

42:53 Olteanu.

42:54 Olteanu. Yeah.

42:55 Olteanu. Yeah.

42:57 The best part about these interviews is I just get to meet all of these people with such amazing

43:01 energy and such openness about their journey. This was again, such a fun interview because Andrada was

43:07 so open about her journey. I was at that point, I was just starting my journey in data visualization.

43:12 And I asked her, Hey Andrada, did you feel the same that you couldn't plot things against X and Y axis and then you would have a hard time figuring out where they are ending up? Because at least for me,

43:22 I couldn't understand what's going on. And that's what we discussed. And she was essentially talking about how she started her journey as someone who's fairly new to coding.

43:31 And at this point, she's become a Kaggle Grandmaster in kernels and she's been writing all of these amazing notebooks. And in this interview, we just learned about how she went about that as someone who,

43:42 just started out and then learned all about this as they went about.

43:45 Yeah. That looks like a really interesting interview and somewhat similar to the conversation we're having here, right?

43:50 I think so. Yes.

43:51 Yeah. Yeah. All right. One final area that I want to make sure we get to spend some time on is, you know, you work at H2O.ai. You've had a lot of experience with these different frameworks. Maybe we could do like a survey of the various deep learning ML libraries and you could sort of tell me how they compare and your thoughts on the various ones.

44:09 Sure. So the vision set forth by our co-founder and CEO is "makers gonna make." And what we're trying to do on a philosophical level is just create products that allow people to build stuff.

44:23 So with that vision, and this is just my take on it, not from a company's perspective, they've put together all of these AutoML products. Wave being the latest one, I'm sure we'll talk about this. But apart from that, they started out with the open source H2O-3, which was an AutoML framework. It's still one of the most widely used ones. That was followed by Driverless AI, which is an end-to-end AutoML product where essentially you just upload your data.

44:48 And what I like to call the Ironman mode where you just click a button, it figures out what models need to be trained, does all the feature engineering and puts out a nice model for you.

44:59 So we have this arsenal of AutoML products. At this point, both open source and enterprise-facing, aimed at different problems. Wave being the latest one of them.

45:08 Yeah. Wave, H2O Wave is pretty interesting. I guess it's at wave.h2o.ai. And it's a real-time web app dashboard for Python and data science. And a lot of the data science things I see are about making static graphs or maybe graphs that you can go and explore.

45:26 Like I could move my mouse over and it'll like highlight information about different parts that I could zoom into it and whatnot. But this is like a real-time changing dashboard, like a stock market or like a factory or something like that. You want to see what's happening as time passes, right?

45:42 I wish you said crypto market.

45:45 Yes, exactly. So the reason I spoke about the philosophy is because I think this is the next bigger goal for the company. What we're trying to create is, I'm not sure if this is out yet or not. We're trying to build a public app store of AI apps.

45:58 So Wave is an open source framework that takes care of a lot of the things that as a data scientist, at least I wouldn't want to worry about. I don't want to learn HTML, CSS, JavaScript. So it's just a framework that takes care of all of the UI, UX stuff, does it very nicely. I don't have to worry about messing up because it's taken care of. And then I can build different AI apps.

46:20 As a company, what we're trying to do is we're also putting out an app store where you can already use the open source apps if you want. And you can also contribute your own apps if you want.

46:30 Yeah, very cool. So there's a bunch of cool examples. There's a whole gallery full of many, many different things that you can go and write. Basically, Wave is an open source dashboard for Python developers that don't have to do web stuff, but they can share it on the web, right? On a website.

46:47 Exactly. And just to be clear for the audience, when I say we, my biggest contribution would probably be this interview. But again, it's this amazing team of engineers who have been building these products at H2O, and through all of this experience they know how to scale them and how to properly engineer them. It's really a data scientist-focused product.

47:08 Every now and then there's a project or something out there and I'm like, I really wish I had a reason to use this. This looks really fun to play with. I just have no use for it personally. This is one of those things, right?

47:18 I would love to have an excuse to use something like this and make it go, but I just don't have that much data that changes that much in my world.

47:27 Maybe you could put together a web page of Talk Python where different episodes structure themselves and the listeners can see a dashboard in real time. Just a suggestion.

47:36 That would be cool. Like maybe downloads in real time and interaction in real time, comments. Yeah, for sure. Something like that.

47:42 Yeah.

47:43 Yeah. I mean, I could definitely like put a little bit of something, but if you worked at a place that like had a lot of stuff going on, like a factory or like a big e-commerce site or something, you could make a really cool live app out of this stuff, I feel like.

47:57 For sure. And again, it's still under development and all of our grandmasters who have this rich experience are also contributing to it. So I'm sure by the time this interview goes out, we would have added a lot to it.

48:09 Yeah, it's cool. It's already got 2.6 thousand GitHub stars. That's pretty cool. So really, really nice. Maybe let's talk about some of the other mainstream ones as well, like Keras, TensorFlow, fast.ai. Give us your thoughts on these different frameworks. Obviously, it's your opinion, not like a, you know, endorsement or a deep dive or whatever. But just what do you think about these, for people who don't necessarily have experience with all of them?

48:30 Yeah. Just to be clear, I strongly endorse fast.ai. I've been a fan of that. That I'll agree on. But apart from that, I started my journey with TensorFlow.

48:38 TensorFlow, at least back in that day (of course, TensorFlow has come a long way since, and the Keras API has been merged into it). But I was really struggling because it had this static graph structure and it didn't feel Pythonic. Not that I was a good Python programmer. I'm still not. So that's why I moved to fast.ai. What Keras is to TensorFlow, fast.ai is somewhat to PyTorch. PyTorch follows this more Pythonic approach, and fast.ai is a wrapper, and more, on top of PyTorch.

49:05 Okay, interesting. Yeah. So fast.ai is maybe a little easier to get started with, you think?

49:10 Yeah. So the nice thing about fast.ai is that it's a very heavily opinionated library. So there are a lot of things that have been baked into it. And for some reason, whenever I just switch to PyTorch, I am not able to replicate similar accuracies. That's what I mean, because somewhere the defaults are so good that it always gets better results. But essentially, it's this layered API. And from an end user perspective, I could just use the high level API. On the left, if you click on Applications, you can see the different applications that they support.

49:40 Or if I want to work on something that's cutting edge, I can also use the training loop, which is really nice, and just bring in a PyTorch model and connect that.
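
To make the "layered API" idea concrete, here is a minimal pure-Python sketch. It is not fast.ai or PyTorch code: `train_loop`, `LinearModel`, and `quick_fit` are hypothetical names invented for this illustration, and the toy data is made up. The point is only the shape: a high-level entry point with opinionated defaults, sitting on top of a lower-level training loop that accepts any model you bring.

```python
# Hypothetical two-layer API sketch. None of these names come from
# fast.ai or PyTorch; they only illustrate the layering idea.


def train_loop(predict, update, data, epochs):
    """Low-level layer: the caller supplies the predict and update steps."""
    for _ in range(epochs):
        for x, y in data:
            error = predict(x) - y
            update(x, error)


class LinearModel:
    """A tiny model y = w * x, trained with plain gradient descent."""

    def __init__(self, lr):
        self.w = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x

    def update(self, x, error):
        # The gradient of 0.5 * error**2 with respect to w is error * x.
        self.w -= self.lr * error * x


def quick_fit(data, lr=0.05, epochs=200):
    """High-level layer: baked-in defaults, so the user only passes data."""
    model = LinearModel(lr)
    train_loop(model.predict, model.update, data, epochs)
    return model


# Made-up data following y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
model = quick_fit(data)
print(round(model.w, 3))
```

A beginner would call only `quick_fit`, while someone working on something custom could write their own model and hand its `predict` and `update` steps straight to `train_loop`.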

49:50 Yeah. Okay. Yeah, that looks really cool. Again, computer vision. I want to build a computer game AI that plays me in real life. Like put a camera over, say, a chess board, and it'll play me, but not just on the screen. As I actually move the things, right, it'll see. That'd be fun. Maybe I can try that out here.

50:11 Sounds very cool. Yeah.

50:12 Yeah, just a little bit of interaction with some real something or other there. That sounds cool. But a lot of options these days, right? And we've got all these different people and libraries and many things to choose from, right?

50:23 Again, one of the things that I've learned, at least from the podcast, and this is the collective opinion of everyone I've interviewed, is that you don't need to worry about the framework as much as you need to really understand the concepts. So that's why I encourage fast.ai, because it's also a course around the framework. So, at least from my perspective, once I've gotten around to learning all of these things, it shouldn't be that hard to switch to another framework, depending on whatever your job requires you to do or whatever your project needs you to use.

50:51 Yeah. Well, yeah, you learn the foundation, solve the problem one way, and then you can solve it with some other library more easily again and again.

50:58 It's really hard for me to remind myself that, hey, the problem is what I'm trying to solve and not create more problems. I don't want to learn more of different things, but I need to figure out how to minimize my time in a way that I actually solve the problem.

51:12 Yeah. I think one interesting thing that people learn as they get more experience is, even if the technology is super different, right? If I learned how to build something interesting in JavaScript, maybe I know nothing about Python, so how am I going to do that? But actually, what you've learned over in one place is really way more transferable and reusable.

51:31 Like the way of just thinking about solving problems, the way of thinking about, okay, I got to pay attention to this and not that. So what's important in this library, how to pick the right library, these things and so on.

51:41 And of course, you should keep switching between frameworks as well. The thing for me was I was switching as a very early stage developer. I'm still a very early stage, if I can even call myself a developer.

51:51 And I was switching between frameworks every 15 days just because they looked exciting. That's not the right thing I would tell myself to do.

52:00 Yeah, this is true. This is true. Get comfortable in one and then you can move around. But yeah, don't just chase the shiny thing all over the place for sure. Although in the data science world, there are so many shiny new things to pay attention to: visualizations and libraries and charting and graphing and whatnot. It's easy to get distracted, I think.

52:18 For sure. And that's why I mentioned I need to remember what I'm working on. So I need to make the graph and not figure out how to make it prettier as long as it does what it's supposed to do.

52:26 But that framework looks exciting. Maybe I should try that over the weekend and now I'm spending 15 days.

52:32 More like here's an excuse to try that framework. This is my chance to try it. So I'm going to go do it.

52:36 Exactly.

52:37 Exactly. Well, all right. Comment on the live stream. Davinas says, hey, some advice on getting started on web development or data science, you know, Python. I'll throw out a little bit then you can add your thoughts.

52:49 Sure.

52:50 I would say you need to have some foundation in just Python basics, right? You need to know variables, loops, functions, that kind of stuff. But, kind of like the conversation we had at the beginning, don't go so deep and say, well, I've got to completely understand everything about this language before I take the step to my first web app or before I take the step to firing up Jupyter and doing my first analysis.

53:14 Like, don't do that. You know, just get comfortable with the basics. Start building. And as you go into more advanced areas, then you're like, OK, well, now I kind of need to learn about what is a list comprehension.
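
As a rough illustration of the basics mentioned here (variables, loops, functions) and of the list comprehension a beginner might pick up later, here is a minimal Python sketch. The episode lengths and the names `episode_minutes`, `total_minutes`, and `long_episodes` are made up for the example and do not come from the episode.

```python
# A variable holding a list of made-up episode lengths in minutes.
episode_minutes = [58, 61, 45, 72]


def total_minutes(durations):
    """A function with a loop: add up every duration."""
    total = 0
    for minutes in durations:
        total += minutes
    return total


# A list comprehension: build a new list in a single expression.
# This keeps only the episodes that run an hour or longer.
long_episodes = [m for m in episode_minutes if m >= 60]

print(total_minutes(episode_minutes))
print(long_episodes)
```

The comprehension does the same job as a loop that appends matching items to an empty list, which is why it can wait until the loop version already feels comfortable.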

53:24 Michael, but first question for you. What are the basics? Really? That's one thing I really struggle with. I still struggle with because I look around on Twitter. Everyone smarter than me is talking about this stuff. And this is pretty basic. Is it? Am I the stupid person who needs to know all of this?

53:48 Well, here is the interesting thing. Like the people who are blogging, the people who are recording YouTube videos or people who are tweeting about things, they're already at like some certain level. And then they're super psyched about something advanced that they've just learned or some really cool scalability thing that they've learned.

54:09 And there's a really good article titled something like "You're not Instagram, you're not Google, you're not LinkedIn" or something. So you don't need all these crazy design patterns and this crazy cloud architecture that companies like that have, because you're a two person startup that doesn't even have a business yet.

54:25 Build something simple. And I think there's a lot of people that are fascinated by looking up, like, look where we could go and look at what Instagram is doing, look at what Google is doing. And what those companies and teams are doing is amazing and interesting, but it doesn't apply to you now.

54:47 You know what I mean? So I think there's just a lot of really interesting conversations about stuff that's interesting, but not applicable to people who are beginners at all. Right.

54:57 Exactly.

54:57 Do you need to master Docker and Kubernetes? Probably not. Can you run it on your computer? Yes. Okay. Then start there. We'll worry about Docker later. Like once you get something working, maybe we'll put it in a container. But don't worry about that now. Get started.

55:10 Exactly. And just to the person asking this question, focus on getting the website up and give yourself a deadline. That's why I love setting goals publicly. Give yourself 10-20 days to figure out the Python basics and put together your first website. You won't like it. In retrospect, you might hide it from your GitHub. I do that a lot. And over time, you'll polish it. It doesn't have to look like, like you said, Facebook or Instagram when it comes out.

55:36 It just needs to function somewhat. And sometimes you'll click a button, something will fail. But then you figure out that, okay, I need to fix this now. And now you have stuff to do. And then you can think of other things. Okay, maybe I should add this. Maybe I should add a button. You're making progress already.

55:51 Yeah. Yeah, absolutely. And the other thing to keep in mind is that software is plastic. It's malleable. It can be changed. You don't have to get it right the first time. You have to just make progress.

56:02 You learn more, then you change it, and you make more progress. And so many people can get hung up, like not even getting started because, well, I'm not really sure how to get started. Like, just take a step. If it's wrong, you take a step in a slightly different direction until you get to the right place. That's how you do it without getting hung up, without trying to boil the ocean by learning everything.

56:21 Exactly.

56:21 All right. Maybe that's a good place to leave it there for that conversation. But yeah, it's super interesting to hear your story. And congratulations on the success coming from getting started with small projects in college to working for H2O.ai.

56:35 I'm still learning a lot. But again, thanks so much for this opportunity. Like I said, I think there are two types of teachers. The first introduce you to something, and the second make you really interested in it. You were the second one to me because I just got so excited about all of these things through your podcast. And of course, there were others as well, but you were a major part of it. And yeah, thanks for this opportunity.

56:55 Oh, yeah. Thanks so much. Now you're not out of here yet, though. You got to answer the two final questions. If you're going to write some Python code, what editor do you use?

57:03 Jupyter notebook.

57:04 Okay, yeah, right on. And then is there some library or something on PyPI you've come across recently? You're like, oh, this is super cool. Got to tell people about this.

57:12 I keep running into all of them every second day. But I would say just discovering fast.ai was one of the biggest wow moments for me.

57:18 Yeah. All right. So fast.ai. Perfect. That's a good one. All right. Final call to action. People are interested.

57:24 They're maybe also listening, getting into programming, getting into data science. And what advice do you have for them?

57:30 Just build something or just go to Kaggle if you can't figure out what project to work on. I still struggle with that inspiration a lot. So I would just tell myself to go to Kaggle and sign up for any competition that I like the most and go from there. Probably take fast.ai along the way and you're all set.

57:45 All right. Fantastic. Well, thanks so much for being here and catch you later.

57:49 Thanks so much.

57:50 Yeah. Bye.

57:52 This has been another episode of Talk Python To Me. Our guest on this episode was Sanyam Bhutani.

57:57 It was brought to you by Sentry, YourBase, and AssemblyAI.

58:01 Take some stress out of your life. Get notified immediately about errors in your web applications with Sentry.

58:07 Just visit talkpython.fm/sentry and get started for free and use the promo code talkpython2021 when you sign up.

58:16 YourBase test acceleration will dramatically improve dev test workflows and CI builds of your Python applications.

58:23 If you could benefit from having pytest run your tests 100 times faster or more, you need to check them out.

58:28 Get started at talkpython.fm/yourbase.

58:33 Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech to text API? Get human level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai.

58:45 Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python.

58:50 Our content ranges from true beginners to deeply advanced topics like memory and async.

58:55 And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.

59:01 Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top.

59:07 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

59:17 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

59:28 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
