Building ML teams and finding ML jobs

Episode #298, published Mon, Jan 11, 2021, recorded Wed, Nov 18, 2020

Are you building or running an internal machine learning team? How about looking for a new ML position? On this episode, I talk with Chip Huyen from Snorkel AI about building ML teams, finding ML positions, and teaching ML at Stanford.

Episode Deep Dive

Guest Introduction and Background

Chip Huyen is a computer scientist, writer, and educator, currently working at Snorkel AI. She initially came from a writing background, traveling the world and authoring cultural and food stories, before discovering her passion for coding at Stanford. After immersing herself in computer science courses and teaching a TensorFlow-focused class, she pivoted to AI/ML engineering. Today, her work at Snorkel AI covers everything from developing the core platform to speaking with customers about their machine learning needs. She is also authoring a book that helps both interviewers and interviewees navigate the evolving realm of ML job interviews.

What to Know If You're New to Python

Here are a few essential ideas to help you follow along and get the most out of this discussion:

  • Python has a broad ecosystem for data science, letting you do everything from quick experiments in notebooks to running large-scale ML models.
  • You can start with simple examples and pre-trained models rather than building solutions entirely from scratch.
  • Tools like Jupyter Notebooks or VS Code make it easy to explore data and learn the language side-by-side.
  • Be prepared to pick up a bit of engineering knowledge (e.g., version control) so you can move from experimenting to deploying real projects.

Key Points and Takeaways

  1. Building Internal ML Teams: Companies often jump into machine learning without a clear plan for who to hire or what problems ML will actually solve. Early on, people who combine business, data, and software skills can matter more than hires chosen strictly for PhD-level research.
    • Tools and Links
      • snorkel.ai: The startup where Chip works, focused on data-centric AI development.
  2. Machine Learning vs. Data Science: There is a significant difference between machine learning engineering (productionizing models, MLOps) and data science (analysis, insight generation). ML engineers care deeply about deployment, monitoring, and performance, while data scientists often focus on exploration and insights. The two skill sets overlap, but each requires distinct expertise.
    • Tools and Links
      • Papermill on GitHub: Mentioned as a way to manage notebook-driven workflows, an example of bridging the gap between research and deployable work.
  3. Domain Knowledge and Existing Talent: Many companies start their ML practice by transitioning internal data science teams into ML roles because of their familiarity with the domain and datasets. It’s often easier to teach solid engineers some ML, or data scientists the production details, than to hire external ML superstars who lack domain context.
    • Tools and Links
      • handcalcs on GitHub: Demonstrates how specialized tools can enrich data analysis in Jupyter, even if you’re not an AI expert.
  4. Iterative Development for Production ML: Real-world machine learning is never a “one and done” task. Once a model is deployed, there is constant iteration: capturing feedback, adjusting heuristics, gathering more data, or refining the architecture. Teams that succeed treat ML as an ongoing lifecycle rather than a single project.
    • Tools and Links
      • VS Code: A popular editor mentioned for quickly iterating on Python and ML code.
  5. ML Competitions vs. Real-World Constraints: Public leaderboards (e.g., the old Netflix Prize) can produce complex models that are tough to put into production. Winning solutions often prioritize raw accuracy over deployment feasibility and maintainability. Real systems must consider factors like user experience, latency, interpretability, and business objectives.
  6. Startups and Rapid ML Experimentation: Smaller companies let you gain exposure to everything from data pipelines to user-facing features. You might be experimenting with labeling strategies one week and building production pipelines the next. For many, this fast-paced environment drives tremendous growth but also demands an adaptable, broad skill set.
  7. Teaching as a Path to Mastery: Chip’s journey reveals the power of teaching for deepening your expertise. By leading courses on TensorFlow and systems design, she forced herself to understand ML tools more thoroughly. If you want to level up, consider writing blog posts, giving talks, or tutoring others on what you’ve learned.
  8. The Interview Process for ML Roles: The ML interview experience can be confusing for both candidates and companies, particularly because these roles are still evolving. Expect questions about system design, data management, deployment, and teamwork, not just coding challenges or advanced math proofs. Chip is writing a book to help standardize these expectations.
  9. Choosing Tools and Frameworks: With so many available libraries (TensorFlow, PyTorch, Hugging Face models, and more), it can be challenging to decide what to learn first. In practice, the best stack often depends on your project, existing infrastructure, and your team’s familiarity with the tools. It’s better to master a few libraries deeply than to jump around too frequently.
  10. Snorkel AI’s Approach to Data-Centric ML: Snorkel AI focuses on labeling data programmatically and iteratively improving model performance. Rather than having humans annotate thousands of items, you can encode domain heuristics as functions. The platform unifies data management, model training, and deployment, emphasizing the cyclical nature of modern ML development.
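
To make the idea of "encoding domain heuristics as functions" from point 10 more concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or reflect Snorkel AI's platform. The spam/ham task, the three heuristics, the label constants, and the simple majority vote used to combine them are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical label constants for this sketch.
SPAM, HAM, ABSTAIN = 1, 0, -1


def lf_contains_link(text: str) -> int:
    """Heuristic: messages containing a URL are likely spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    """Heuristic: messages mentioning a prize are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    """Heuristic: very short messages are usually genuine replies."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text: str) -> int:
    """Combine the heuristics with a plain majority vote.

    Returns ABSTAIN when no heuristic fires or when the top votes tie.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics, no clear winner
    return ranked[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at https://example.com",
        "thanks, see you",
        "Let us meet for lunch on Friday",
    ]:
        print(message, "->", weak_label(message))
```

Each heuristic is cheap to write and easy to revise, which is what makes the iterate-and-relabel loop described above practical. Majority voting is only the simplest possible way to combine the votes and is used here as a stand-in.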

Interesting Quotes and Stories

  • On discovering programming: “When I was younger, I thought programming was the most boring job in the world… but once I actually tried it, I realized it could be creative, collaborative, and fun.”
  • On building real ML teams: “There’s a big gap between reading academic papers and getting that model into production. A lot of it is engineering—and yes, it’s a lot of iteration.”
  • On teaching: “I taught TensorFlow not because I was an expert, but because I wanted to become one. Nothing shows you what you don’t know faster than teaching it.”

Key Definitions and Terms

  • Data Science: Focuses on exploring data, finding trends or anomalies, and generating insights for business decisions.
  • Machine Learning Engineering: Involves deploying and maintaining ML models in production, emphasizing robust code, scalability, and monitoring.
  • Heuristics: Rule-of-thumb or domain-driven logic used to label or filter data automatically, rather than labeling large datasets entirely by hand.
  • Leaderboards / Competitions: Platforms (like the old Netflix Prize) where teams compete to achieve the best model accuracy. While helpful for research, the winning solutions aren’t always practical for production.
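
The gap between leaderboard rankings and production constraints can be sketched as a selection problem: pick the most accurate model that still fits a latency budget. The following is a rough illustration only. The `Candidate` type, the model names, the accuracy and latency figures, and the 100 ms budget are all hypothetical values invented for this example, not numbers from the episode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A trained model summarized by two illustrative metrics."""
    name: str
    accuracy: float        # offline accuracy, between 0 and 1
    p99_latency_ms: float  # 99th percentile inference latency


def pick_for_production(
    candidates: List[Candidate], latency_budget_ms: float
) -> Optional[Candidate]:
    """Return the most accurate candidate within the latency budget.

    Returns None when no candidate is fast enough.
    """
    eligible = [c for c in candidates if c.p99_latency_ms <= latency_budget_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.accuracy)


# Made-up candidates for illustration.
CANDIDATES = [
    Candidate("large_ensemble", accuracy=0.91, p99_latency_ms=850.0),
    Candidate("gradient_boosting", accuracy=0.89, p99_latency_ms=40.0),
    Candidate("logistic_regression", accuracy=0.86, p99_latency_ms=5.0),
]

# A leaderboard ranks on accuracy alone.
leaderboard_winner = max(CANDIDATES, key=lambda c: c.accuracy)

# A production system also has to respect the latency budget.
production_choice = pick_for_production(CANDIDATES, latency_budget_ms=100.0)

if __name__ == "__main__":
    print("Leaderboard winner:", leaderboard_winner.name)
    print("Production choice:", production_choice.name)
```

Latency is only one of the constraints mentioned in the episode. Interpretability and maintainability are harder to reduce to a single number, so a real selection process would involve more than a filter like this.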

Learning Resources

If you want to sharpen your Python skills to support machine learning work, here are a couple of curated courses from Talk Python Training.

Overall Takeaway

Companies are eager to leverage machine learning, but their success depends heavily on bringing the right mix of technical, domain, and collaborative talent to the table. Chip’s experience—from writing to coding to teaching—underscores the value of diverse backgrounds and continuous learning in this field. By recognizing that practical engineering considerations, labeling strategies, and iterative improvement are central to deploying ML, organizations and aspiring ML professionals alike can be better prepared to thrive.

Links from the show

Chip on Twitter: @chipro
Snorkel AI: snorkel.ai
Chip's Book Preview: twitter.com
handcalcs project: github.com
IBM Buzzword Bingo: youtube.com
Episode transcripts: talkpython.fm

