
Data Science Panel at PyCon 2024

Episode #467, published Thu, Jun 20, 2024, recorded Sat, May 18, 2024

I have a special episode for you this time around. We're coming to you live from PyCon 2024. I had the chance to sit down with some amazing people from the data science side of things: Jodie Burchell, Maria Jose Molina-Contreras, and Jessica Greene. We cover a whole set of recent topics from a data science perspective. We did have to cut the conversation a bit short, since they were all coming from and going to talks they were giving, but it was still a pretty deep conversation.

Episode Deep Dive

Guests

  • Jodie Burchell: Jodie is a data science developer advocate at JetBrains. She has a strong background in natural language processing (NLP) and statistics. Before becoming a developer advocate, Jodie spent many years in data science roles, focusing on both core statistical modeling with tabular data and cutting-edge NLP problems.
  • Maria Jose Molina-Contreras: Maria is originally from Barcelona and now lives in Berlin, working as a data scientist at a startup tackling sustainability challenges. She’s been involved in data science for around eight years and is passionate about using machine learning tools to address real-world problems, such as sustainable packaging.
  • Jessica Greene: Jessica is an ML engineer at Ecosia, which she describes as “the search engine for a better planet.” She moved into tech from a career in coffee roasting and has been in the field for six years. Jessica is largely self-taught, with a strong interest in backend engineering before moving into machine learning.
  1. PyCon Community and Experience
    • All three guests are excited to be at PyCon 2024. They note the welcoming, community-focused atmosphere at Python conferences and encourage people of all experience levels (especially beginners) to attend. They highlight that there are many programs, grants, and financial aid options that lower the barriers to entry for Python events.
  2. Transition into Data Science
    • Jessica’s career-change journey (from coffee roasting to ML engineering) illustrates how accessible data science can be today—there are many resources and communities to help people become self-taught developers.
    • Jodie and Maria also mention career transitions, emphasizing that non-traditional backgrounds or prior academic research can lead naturally into data science.
  3. Core Data Science vs. LLM “Hype”
    • Despite the excitement around large language models, a major portion of day-to-day data science work still focuses on more “traditional” approaches such as decision trees, linear regression, and solid data cleaning practices.
    • The panel points out that many business problems are still well-served by simpler and cheaper-to-deploy models.
  4. Measurement, Validation, and Bias
    • Jodie’s upcoming (now recorded) PyCon talk, “Lies, Damned Lies, and Large Language Models,” delves into measuring hallucinations and misinformation in LLMs. She mentions benchmarks like TruthfulQA for factual correctness, highlighting that these evaluations often target specific definitions or language domains.
    • Maria brings up the importance of carefully testing and evaluating LLMs for bias, toxicity, and potential security issues. She mentions the open-source library Giskard for model evaluation and fairness checks.
  5. Useful Tools for LLMs and Model Evaluation
    • LangChain: A popular Python framework that helps build chat-like applications and more general LLM-based pipelines, including retrieval-augmented generation.
    • Giskard: A tool for inspecting, testing, and evaluating machine learning models (including LLMs) with a focus on bias detection and interpretability.
    • CodeCarbon: Jessica highlights CodeCarbon for measuring the carbon emissions of Python workloads. It helps teams assess the energy footprint of training large models or running inference on them, which matters for cost, sustainability, and performance. See episode 318: Measuring your ML impact with CodeCarbon for a deep dive into CodeCarbon.
  6. Responsible Use of LLMs and Smaller Models
    • The guests stress that LLMs are expensive to run and come with privacy, security, and ethical concerns (e.g., hallucinations, data leaks, biases).
    • They foresee a growing move toward more domain-specific or smaller models to reduce cost and complexity while still maintaining solid, explainable results.
  7. Advice for Beginners in Data Science
    • All speakers agree that understanding core data cleaning, measurement, and statistical methods is crucial—no code generator or AI assistant can replace deep problem-solving and domain knowledge.
    • They encourage companies to hire more junior data scientists and invest in mentorship, stating that the “science” part of data science will remain essential regardless of how advanced AI tooling becomes.
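
To make the emissions measurement from point 5 a little more concrete, here is a minimal sketch of the arithmetic behind that kind of estimate: energy used multiplied by the carbon intensity of the electricity grid. This is not CodeCarbon's API and was not shown by the panel. The function name `estimate_emissions_kg` and all of the numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: NOT CodeCarbon's API, only the basic arithmetic
# (energy used x grid carbon intensity) behind an emissions estimate.


def estimate_emissions_kg(
    avg_power_watts: float,
    duration_hours: float,
    grid_intensity_kg_per_kwh: float,
) -> float:
    """Estimate CO2-equivalent emissions (kg) for a compute workload."""
    if min(avg_power_watts, duration_hours, grid_intensity_kg_per_kwh) < 0:
        raise ValueError("all inputs must be non-negative")
    # Convert average power draw over the run into energy in kWh.
    energy_kwh = avg_power_watts * duration_hours / 1000.0
    # Scale by how much CO2eq the local grid emits per kWh.
    return energy_kwh * grid_intensity_kg_per_kwh


# Made-up inputs: a 300 W accelerator running for 10 hours on a grid
# that emits 0.4 kg CO2eq per kWh.
training_run_kg = estimate_emissions_kg(300, 10, 0.4)
print(f"Estimated emissions: {training_run_kg:.2f} kg CO2eq")
```

A real measurement tool also has to work out the power draw and the local grid intensity for you, which is the hard part this sketch takes as given.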

Overall Takeaway

Data science remains firmly grounded in traditional modeling, solid experimentation, and careful measurement. While there is undeniable excitement around large language models and new AI-powered tools, many day-to-day business problems still benefit most from simpler solutions with a clear return on investment. Tools like LangChain, Giskard, and CodeCarbon address everything from LLM pipelines and bias checks to sustainability concerns, underscoring a key theme: combining a solid foundation in core data science with thoughtful exploration of new technologies.
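
As one illustration of the "simpler solutions" the takeaway refers to, the sketch below fits a one-feature linear regression using only the Python standard library. It is an assumption-laden example rather than anything the panel presented: the helper `fit_line` and the spend and sales figures are made up for demonstration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data: weekly ad spend (thousands) vs. weekly sales (thousands).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, sales)
predicted = slope * 6.0 + intercept  # forecast for a spend of 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={predicted:.2f}")
```

A model like this has two parameters that can be read and explained directly, which is the kind of low-cost, interpretable baseline the guests argue is still enough for many business problems.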

Links from the show

Jodie Burchell: @t_redactyl
Jessica Greene: linkedin.com
Maria Jose Molina-Contreras: linkedin.com

Talk Python's free Shiny course: talkpython.fm/shiny
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
