A path into data science
Episode Deep Dive
Guest introduction and background
Sanyam Bhutani is the featured guest on this episode. He graduated with a bachelor's degree in computer science but felt that his formal education did not fully prepare him for a career in data science. Through self-directed study, community involvement, and experimentation with Kaggle, fast.ai, and various online courses, he charted a personal path into AI and machine learning. Sanyam currently works at h2o.ai, where he creates data science content, hosts a podcast, and interacts with a community of Kaggle Grandmasters and practitioners.
What to Know If You're New to Python
If you’re newer to Python and data science, here are a few tips from the episode to help you get started faster:
- Understand core Python basics (variables, loops, functions) before diving into specialized libraries.
- Experiment in small steps: Work on small project ideas and gradually increase complexity rather than mastering “all the theory” up front.
- Leverage communities such as Kaggle, fast.ai, and local meetups to ask questions and share progress.
Key points and takeaways
Building a Self-Guided Path into Data Science
Sanyam found that university coursework in computer science did not fully align with the practical demands of data science and AI. By taking online courses (including MOOCs) and entering Kaggle competitions, he filled in the gaps and developed real-world skills on his own terms.
Top-Down vs. Bottom-Up Learning
Traditional degree programs often rely on bottom-up learning, emphasizing fundamentals before tackling applied projects. Sanyam discovered that fast.ai and Kaggle encouraged a top-down approach: Build exciting, functional projects first, then go back and solidify the underlying concepts.
Harnessing Kaggle Competitions for Real-World Practice
Kaggle competitions offered Sanyam an avenue for hands-on learning. He could iterate quickly, compare his results on a leaderboard, and collaborate with teams of experienced data scientists. This fast feedback loop built motivation and practical expertise.
Iterative Project Development
In one of his early competitions (the “Quick, Draw!” doodle challenge), Sanyam learned the value of training on smaller data subsets and running iterative experiments rather than huge, 50-hour training runs that might not yield better results. This was a pivotal lesson in experimentation and workflow organization.
- Tools and Concepts:
- Python-based experimentation (Jupyter notebooks)
- Subset training and data loaders
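The subset idea can be sketched in a few lines of plain Python. This is only an illustration and not Sanyam's actual workflow: the function name `stratified_subset`, the 5% fraction, and the made-up doodle labels are all assumptions. It picks a small, class-balanced sample of example indices so that each experiment runs in minutes instead of hours.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices for a class-balanced sample of a dataset.

    labels   -- one class label per example
    fraction -- share of each class to keep, in (0, 1]
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps experiments comparable

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Sample the same share from every class, keeping at least one example.
    chosen = []
    for indices in by_class.values():
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Hypothetical labels: 1,000 examples for each of three doodle classes.
labels = ["cat"] * 1000 + ["dog"] * 1000 + ["tree"] * 1000
subset = stratified_subset(labels, fraction=0.05)
print(len(subset))  # 150 indices, 50 per class
```

The returned indices can then be used to select rows from whatever dataset object or data loader the project uses. Once an idea works on the subset, the fraction can be raised for a longer run.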
Building a Portfolio and Reputation
Engaging in Kaggle, writing articles, and creating content (like Sanyam’s blog and podcast) not only honed his skills but also served as a public portfolio. Being able to point to top Kaggle rankings or in-depth blog posts often carries more weight in hiring decisions than traditional credentials alone.
- Relevant Platforms:
- Upwork (for freelance projects)
- Personal blogging platforms like Medium or Dev.to
Creating Community-Driven Content
Sanyam’s podcast (“Chai Time Data Science”) highlights insights from Kaggle Grandmasters and AI researchers. Through interviews, he shows how different backgrounds, approaches, and problem-solving methods lead to breakthroughs in data science.
- Links and Tools:
- Chai Time Data Science (Sanyam’s podcast)
h2o.ai and AutoML
Sanyam now works at h2o.ai, which focuses on automated machine learning tools. Products like Driverless AI and the open-source H2O suite help data scientists quickly train and deploy models without manually tuning a long list of parameters.
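To make "automating parameter tuning" concrete, here is a toy sketch of the simplest version of the idea: try every combination in a grid and keep the best one. It does not use H2O or Driverless AI and is not how those products work internally. The `score` function is a hypothetical stand-in for "train a model and measure validation accuracy", and the parameter names are assumptions chosen for illustration.

```python
import itertools


def score(learning_rate, depth):
    """Stand-in for training a model and returning validation accuracy.

    Hypothetical formula chosen so the best setting is known in advance
    (learning_rate=0.1, depth=4); a real run would train and evaluate here.
    """
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(depth - 4)


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        result = score_fn(**params)
        if result > best_score:
            best_params, best_score = params, result
    return best_params, best_score


grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}
best_params, best_score = grid_search(grid, score)
print(best_params)  # {'learning_rate': 0.1, 'depth': 4}
```

AutoML tools go well beyond this kind of exhaustive loop, but the sketch shows the type of repetitive work they take off a data scientist's plate.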
Building Interactive Dashboards with H2O Wave
H2O Wave is a real-time web application framework for Python developers who want to build dashboards without heavy front-end coding. It reflects the importance of sharing real-time insights and analytics in a form that non-technical teams can act on.
Fast.ai’s Practical Deep Learning
The fast.ai course combines code, community, and a library that wraps PyTorch in a higher-level API. Sanyam praises it for making deep learning approachable quickly, so you can see meaningful results, such as top leaderboard spots in Kaggle competitions, right away.
Avoiding Overengineering at the Start
Both Michael and Sanyam emphasized that new developers shouldn’t jump into advanced practices like Kubernetes or intricate design patterns too soon. Start small, get something working, learn from it, and only adopt more complex tools when they become necessary.
- Key Concepts:
- Minimum Viable Product (MVP) approach
- Simple scripts before containers or large-scale architecture
Overcoming Impostor Syndrome
The episode touches on how easy it is to feel unprepared or “behind” in the fast-moving tech landscape. Both host and guest encourage developers to focus on incremental learning and celebrating small wins, such as finishing a mini project or climbing a Kaggle leaderboard.
- Tools and Strategies:
- Public goal-setting (tweets, blog)
- Discussion forums for peer validation
Translating Learning into Job Readiness
Rather than fixating on degrees, focus on demonstrating capabilities through open-source code, Kaggle competitions, or freelance gigs. This show-don’t-tell approach resonates strongly with data-driven companies that want evidence of applied skills.
Interesting quotes and stories
"It didn't make me a better programmer at all, let me start with that spicy opening." — Sanyam on his university experience
"I think there's so many opportunities to get into data science or to get into Python and programming, no matter your background." — Michael
"I was just going to watch more MOOCs, but I realized I should spend my time actually coding." — Sanyam
"Some companies might not recognize Kaggle experience. Maybe I wouldn't want to work there." — Sanyam
Key definitions and terms
- Kaggle: A platform that hosts machine learning competitions, data science challenges, and community notebooks. Great for honing practical ML skills.
- fast.ai: A high-level deep learning library built on PyTorch, paired with a course emphasizing top-down learning, making deep learning more accessible.
- AutoML: Automated Machine Learning, which automates repetitive tasks of training and tuning ML models. h2o.ai’s Driverless AI is one such product.
- Top-Down Learning: An educational approach focusing on immediately building real projects or models before fully diving into low-level theoretical details.
- PyTorch: A popular deep learning framework in Python, often used in combination with fast.ai.
Learning resources
Here are a few recommended resources mentioned or inspired by this episode:
Python for Absolute Beginners
Ideal if you want a clear and friendly introduction to Python fundamentals, step by step.
Data Science Jumpstart with 10 Projects
If you’re excited about data-driven projects, this course will guide you from zero to working on interesting data science tasks.
Kaggle
Join competitions to get hands-on experience with real datasets and see how your solutions compare with others.
fast.ai Course
Learn top-down deep learning with accessible lessons and a supportive community.
Overall takeaway
Aspiring data scientists can thrive by focusing on practical, hands-on experience. That means picking an interesting real-world challenge, using tools like Kaggle to get quick feedback, and gradually learning the theoretical underpinnings. Resources like fast.ai, community platforms, and lightweight frameworks provide a gentle entry into projects that make a real impact—proving that consistent effort, collaboration, and curiosity can shape a successful path into data science.
Links from the show
Chai Time Data Science Podcast: youtube.com
Fast AI: fast.ai
How not to do Fast.ai (or any ML MOOC): medium.com
First Kaggle Competition Experience: towardsdatascience.com
Kaggle competitions: kaggle.com
Radek Osmulski interview: youtube.com
Dima Damen interview: youtube.com
Andrada Olteanu interview: youtube.com
H2O Wave: wave.h2o.ai
Keras: keras.io
Tensorflow: tensorflow.org
PyTorch: pytorch.org
Quick, Draw! Doodle Recognition Challenge: kaggle.com
Developers, Developers, Developers song: soundcloud.com
YouTube Live Stream: youtube.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy