New course: Agentic AI for Python Devs

Building Data Science with Foundation LLM Models

Episode #526, published Sat, Nov 1, 2025, recorded Tue, Oct 7, 2025
Today, we’re talking about building real AI products with foundation models. Not toy demos, not vibes. We’ll get into the boring dashboards that save launches, evals that change your mind, and the shift from analyst to AI app builder. Our guide is Hugo Bowne-Anderson, educator, podcaster, and data scientist, who’s been in the trenches from scalable Python to LLM apps. If you care about shipping LLM features without burning the house down, stick around.


Episode Deep Dive

Guest Introduction and Background

Hugo Bowne-Anderson returns to Talk Python To Me for his third appearance, bringing a wealth of experience from the evolution of the Python data science ecosystem. Hugo's journey spans academic research in biology, math, and physics at Yale University to becoming a key figure in data science education and developer relations. He played a pivotal role at DataCamp working on curriculum and education, then moved on to work with significant projects like Dask at Coiled with Matt Rocklin, and Metaflow at Outerbounds (originally from Netflix). Currently, Hugo works as a freelance consultant, advisor, and educator helping organizations build, ship, and maintain AI, LLM, ML, and data-powered products. He hosts the podcast "Vanishing Gradients," which focuses on conversations with industry practitioners about building data and AI-powered products. His unique positioning at the intersection of education, product development, and developer relations makes him an ideal guide through the modern landscape of foundation models meeting data science.

What to Know If You're New to Python

  • Foundation models and LLMs are reshaping data science workflows. Understanding the basics of how these models work (text in, text out, plus structured output capabilities) will help you leverage them effectively in your projects; a short code sketch of this follows the list below.
  • AI-assisted coding tools like Cursor, Copilot, and Continue are becoming essential parts of the development workflow. Learning to use these tools effectively requires understanding when to use them and when to code manually.
  • Evaluation-driven development is a critical skill for building LLM-powered applications. This mirrors the machine learning workflow of creating test sets, evaluating performance, and iterating based on results.
  • The modern PyData stack now includes tools like Polars (fast DataFrames), DuckDB (analytical database), Marimo (reactive notebooks), and uv (package management) alongside traditional tools like Pandas and Jupyter.
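
To make the "text in, text out, plus structured output" point concrete, here is a minimal sketch of calling a hosted LLM and getting JSON back instead of free text. It assumes the openai Python package and an OPENAI_API_KEY in your environment; the model name and the tiny schema are illustrative only.

```python
import json
from openai import OpenAI  # assumes: pip install openai, OPENAI_API_KEY set

client = OpenAI()

# Text in: an ordinary natural-language request.
question = "Summarize the main risk in this sales note: 'Q3 pipeline is 40% below target.'"

# Ask for JSON so the result is easy to use programmatically (structured output).
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system",
         "content": 'Reply only with JSON: {"summary": str, "severity": "low|medium|high"}.'},
        {"role": "user", "content": question},
    ],
    response_format={"type": "json_object"},  # JSON mode instead of free-form text
)

# Text out: parse the structured reply like any other Python dict.
result = json.loads(response.choices[0].message.content)
print(result["summary"], result["severity"])
```

The same pattern, plain text in and a parseable structure out, underlies most of the LLM-powered features discussed below.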

Hugo's levels of AI-assisted coding, from Level 0 through Level 7:

  • Level 0: Copy-pasting code snippets from Stack Overflow into your development environment.
  • Level 1: Copying code between ChatGPT and your IDE rather than using Stack Overflow.
  • Level 2: Code completion suggestions appearing directly in your IDE through tools like GitHub Copilot.
  • Level 3: AI agents integrated into your IDE or terminal (like Cursor) that can build complete applications from scratch.
  • Level 4: AI agents embedded in collaborative tools like Slack, Discord, or email that fix documentation or answer code questions.
  • Level 5: Proactive agents that automatically perform code reviews in CI/CD pipelines without being explicitly asked.
  • Level 6: Async or background agents that work independently on tasks while you focus on other work.
  • Level 7: Fully proactive agents that monitor production systems, detect anomalies, and surface insights to developers like a colleague would.

Agentic AI and Its Transformative Power for Data Science

AI-assisted programming has fundamentally changed how data scientists and developers work with code. The technology moves far beyond simple autocomplete to systems that can understand context, write entire applications, and even participate in code reviews. Hugo emphasizes that this isn't just about making programmers obsolete but rather about empowering those who already know what they're doing. The comparison to Stack Overflow is apt: just as developers learned to copy and paste code snippets responsibly, today's developers must learn to use AI-generated code thoughtfully. The key difference is that AI assistance requires active engagement and understanding rather than passive consumption. These tools supercharge productivity for tasks like writing Pandas code, generating SQL queries, or creating data visualization scripts, but they demand that users read, understand, and validate the generated code. The wisdom layer that Hugo's colleague Vilay describes captures this perfectly: the real value isn't in the code generation itself but in the knowledge and judgment applied to using these tools effectively.

  • Cursor: AI-first code editor built as a VS Code fork with integrated agentic capabilities
  • Continue: Open-source AI code assistant that works with local models for privacy-preserving development
  • Super Whisper: Voice dictation tool for macOS enabling hands-free interaction with coding agents
  • Mac Whisper: Alternative dictation tool using Whisper for voice-to-text
  • Devin: AI software engineer agent (mentioned as an example of autonomous coding)

Modern Tools Reshaping the Data Science Stack

The PyData ecosystem has seen remarkable evolution beyond just AI assistance, with new tools addressing fundamental performance and workflow challenges. Polars has emerged as a blazing-fast alternative to Pandas, built in Rust and offering significant performance improvements for data manipulation tasks. DuckDB has become the go-to in-process analytical database, providing SQLite-like simplicity with PostgreSQL-like analytical capabilities and exceptional speed for data analysis. Marimo represents the next generation of computational notebooks, addressing Jupyter's well-known issues with execution order by using the abstract syntax tree to build a directed acyclic graph of cell dependencies, ensuring reproducibility while maintaining the literate programming experience. The package management landscape has been revolutionized by uv, which offers dramatically faster performance than traditional pip and better integration with modern workflows. For the Conda ecosystem, Pixi provides similar improvements built on the lessons learned from Mamba. These tools aren't replacements that invalidate the classic stack but rather enhancements that address specific pain points while maintaining compatibility with existing workflows and knowledge.

  • Polars: Lightning-fast DataFrames library built in Rust with a Python API
  • DuckDB: In-process analytical database optimized for OLAP queries
  • Marimo: Reactive Python notebooks that solve execution order problems through AST analysis
  • uv: Ultra-fast Python package installer and resolver built in Rust
  • Pixi: Modern package management tool for multi-language projects
  • Mamba: Fast, cross-platform package manager (predecessor to Pixi efforts)
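
As a concrete taste of this newer stack, here is a small sketch that answers the same question twice: once with Polars expressions and once by pointing DuckDB's SQL engine directly at the Polars DataFrame. It assumes recent versions of polars and duckdb; the data is invented.

```python
import duckdb
import polars as pl

# A tiny in-memory dataset standing in for a real CSV or Parquet file.
df = pl.DataFrame({
    "customer": ["a", "b", "a", "c"],
    "spend": [120.0, 15.5, 80.0, 42.0],
})

# Polars: fast, expression-based aggregation.
per_customer = df.group_by("customer").agg(pl.col("spend").sum().alias("total_spend"))
print(per_customer.sort("total_spend", descending=True))

# DuckDB: the same question in SQL, run against the Polars frame in place (no copy into a database).
sql = "SELECT customer, SUM(spend) AS total_spend FROM df GROUP BY customer ORDER BY total_spend DESC"
print(duckdb.sql(sql).pl())
```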

Python 3.14 and the Free-Threaded Future

Python 3.14's release on October 7, 2025 marks a significant milestone: official support for free-threaded Python removes the Global Interpreter Lock's constraints on parallel execution. This development has particularly important implications for data scientists who frequently work with computationally intensive tasks that could benefit from true parallelism. However, Hugo and Michael discuss the nuanced reality that many data science workloads may not immediately benefit from this change. Foundation models and LLM-powered products often rely on API calls to external services or leverage models that someone else trained, shifting the computational burden away from the data scientist's local machine. There's a narrow but important band of use cases, between simple scripting and the heavy computation that would have been done in C++ or Rust anyway, where free-threaded Python truly shines. The conversation acknowledges that while this is an important milestone for the language, the practical impact for many data scientists building LLM-powered applications may be less dramatic than for those doing traditional large-scale analytics, distributed computing, or training models from scratch.

  • Python 3.14 Release: Official Python release with free-threaded support
  • Dask: Parallel computing library for analytics that predates free-threaded Python
  • Fundamentals of Dask Course: Free course by Hugo Bowne-Anderson on parallel computing with Dask
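
For a feel of where free-threaded Python actually helps, here is a minimal sketch of a pure-Python, CPU-bound workload spread across threads. On a standard GIL build the four tasks mostly run one at a time; on a free-threaded build they can use four cores. Everything here is illustrative.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n: int) -> int:
    # Pure-Python CPU work: serialized by the GIL on standard builds,
    # truly parallel on a free-threaded build (e.g. python3.14t).
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # sys._is_gil_enabled() exists on 3.13+; fall back gracefully on older versions.
    print("GIL enabled?", getattr(sys, "_is_gil_enabled", lambda: True)())

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(busy, [5_000_000] * 4))
    print(f"4 CPU-bound tasks took {time.perf_counter() - start:.2f}s")
```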

Effective Prompting and Socratic Dialogue Development

One of the most critical skills for working with AI-assisted coding is learning to communicate effectively with the AI systems. Hugo introduces the concept of "Socratic and dialogue-driven development" coined by Isaac Flath, which emphasizes treating the AI as a pair programming partner rather than a magic code-generating black box. Before writing any code, developers should have extensive conversations with their AI assistant about the problem, the approach, and the architecture. This planning phase dramatically improves outcomes compared to immediately asking for code generation. The practice of writing product requirement documents with the AI before any implementation helps establish shared understanding and catch potential issues early. Using features like Cursor Rules allows developers to train their AI assistant on project-specific conventions and preferences, creating a form of project memory that persists across sessions. The dictation approach through tools like Super Whisper enables more thorough and patient prompting because speaking is less fatiguing than typing, leading to more detailed and context-rich instructions. The key insight is that these tools require active partnership rather than passive consumption, much like working with a highly enthusiastic junior developer who has perfect recall but needs clear direction.

  • Isaac Flath's Elite AI Assisted Coding Course: Course focusing on Socratic dialogue approach to AI coding
  • Cursor Rules: Project-specific and global rules system for customizing AI behavior in Cursor
  • System Prompts: Custom instructions for ChatGPT and other models to adjust behavior and tone
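
A minimal sketch of the "plan first, code second" rhythm, using an OpenAI-style chat API (the model name and the task are made up). The point is the shape of the conversation: a planning turn with no code, then an implementation turn that keeps the agreed plan in context, which is the same shape you get when working this way in Cursor or any other assistant.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set; model name is illustrative

client = OpenAI()
history = [
    {"role": "system", "content": "You are a pair programmer. Discuss design before writing code."},
    {"role": "user", "content": "I need a CLI that dedupes customer records in a CSV. "
                                "Before any code: ask clarifying questions and propose a short plan."},
]

# Phase 1: Socratic planning turn -- no code requested yet.
plan = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": plan.choices[0].message.content})

# (In practice you would answer its questions here and iterate on the plan.)
history.append({"role": "user", "content": "Plan approved. Now implement step 1 only, with tests."})

# Phase 2: implementation turn, grounded in the agreed plan that stays in the history.
code = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(code.choices[0].message.content)
```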

Common Gotchas and How to Avoid Them

Working with AI-assisted coding tools comes with a set of predictable challenges that experienced users learn to navigate. Dead looping is one of the most frustrating issues: the AI optimizes locally around each immediate problem, fixing one error only to introduce another, then cycling back to the original state. The solution involves prompting the AI to "zoom out" and take a holistic view of the system rather than fixating on the most recent error message. AI assistants frequently do things they weren't asked to do, like downloading packages unprompted or creating elaborate directory structures when told to keep everything in one file. They also regularly ignore explicit instructions, requiring repeated reminders and careful rule-setting. Memory issues plague longer conversations as the AI forgets earlier context or decisions, making it necessary to start fresh conversations periodically or explicitly ask the AI to summarize important context for transfer to a new session. The concept of treating the AI as an enthusiastic, somewhat scattered, incredibly knowledgeable intern with ADHD-like attention patterns helps set appropriate expectations. Git discipline becomes more important than ever in YOLO mode (auto-accept), with developers staging changes incrementally and watching diffs in real-time to catch issues before they compound. The planning folder approach, where each project maintains markdown files documenting the plan and progress, provides crucial continuity across sessions.

  • Version control best practices: Stage incrementally, watch diffs, commit frequently when using auto-accept mode
  • Planning folder pattern: Markdown files documenting project plans that are updated by AI as work progresses
  • Fresh conversation strategy: Summarizing context and starting new chats when sessions degrade

Exploratory Data Analysis with AI

AI tools excel at exploratory data analysis in ways that complement and enhance traditional data science workflows. When given a CSV file or dataset, modern LLMs can quickly identify patterns, clusters, and insights that might take human analysts hours to discover. Hugo shares an example of a client throwing thousands of rows of customer website data into an AI system, which immediately identified distinct clusters of power users (high usage, high spend), engaged but low-revenue users (high usage, low spend), and other segments that would typically require extensive manual analysis. The key advantage isn't just speed but also the ability to suggest visualizations and analytical approaches the data scientist might not have considered. However, critical thinking remains essential: while AI can write code to calculate means and medians correctly, the interpretation and validation of insights must remain human-driven. This exploratory capability extends to failure analysis in LLM-powered applications, where AI can examine conversation logs and identify patterns like users getting correct responses only after initially requesting human representatives (indicating a failure to route appropriately). The best practice involves using AI for hypothesis generation and initial pattern detection while applying domain expertise to validate and refine the insights before making business decisions.

  • ChatGPT, Claude, Gemini: Major LLM providers useful for data exploration
  • LangChain: Framework for building LLM applications including data analysis tools
  • Plotly: Interactive visualization library commonly used by AI for generating charts
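
Here is one hedged way to wire that up yourself: profile a DataFrame, hand the model a summary plus a small sample, and ask for segment hypotheses along with code you can run to verify them. The file name, columns, and model are hypothetical.

```python
import pandas as pd
from openai import OpenAI  # assumes OPENAI_API_KEY; the CSV path and model are illustrative

df = pd.read_csv("customer_usage.csv")              # hypothetical file
profile = df.describe(include="all").to_string()
sample = df.sample(min(len(df), 50), random_state=0).to_csv(index=False)

prompt = (
    "You are helping with exploratory data analysis.\n"
    f"Column summary:\n{profile}\n\nSample rows:\n{sample}\n"
    "Suggest 3 candidate user segments (e.g. high-usage/high-spend) and, for each, "
    "a pandas snippet I could run myself to check whether the segment really exists."
)

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# Treat the output as hypotheses to verify with your own code, not as conclusions.
print(reply.choices[0].message.content)
```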

Building LLM-Powered Applications: The Excitement Curve

Hugo presents a provocative and insightful visualization of the LLM application development experience compared to traditional software. Traditional software development starts boring with hello world and basic features, gradually building excitement as you add unit tests, scale, optimize, and deploy. LLM-powered software inverts this curve completely: you start with a flashy, impressive demo that generates tremendous excitement and dopamine. Then reality sets in as you discover basic functionality issues, hallucinations, monitoring challenges, and integration complexities, with excitement declining at each stage. Hugo suspects it's not coincidental that the most addictive technology of a generation emerged during a time of Instagram-driven instant gratification culture. The mission for serious builders is to raise the entire curve, not by reducing the initial demo excitement but by preventing the subsequent decline. This requires embracing evaluation-driven development, proper monitoring and observability, systematic error analysis, and treating LLM app development more like traditional machine learning projects than like building standard web applications. The parallels to ML development are strong: you need labeled data (pass/fail examples), failure mode classification (hallucination, retrieval error, wrong tool call), and systematic iteration based on what the data reveals about your system's performance.

  • Evaluation-driven development workflow: Generate data, label pass/fail, classify failure modes, iterate
  • Demo-first approach: Building impressive prototypes that mask underlying complexity
  • Production readiness gap: The distance between working demo and reliable production system

Evaluation-Driven Development for LLM Applications

The methodology Hugo teaches in his Maven course with Stefan Krawczyk centers on bringing machine learning discipline to LLM application development. The process begins by getting data flowing through your system, even if that means generating synthetic data before you have real users. Each input-output pair gets labeled as pass or fail, and failures are classified by mode: hallucination, retrieval error, wrong tool call, incorrect output format, or other categories specific to your application. This labeled dataset becomes your evaluation set, analogous to a test set in traditional ML. Using something as simple as a pivot table in a spreadsheet, you can rank-order failure modes by frequency to identify what to fix first. If retrieval errors dominate, focus on your RAG pipeline: embeddings, chunking strategy, or document preprocessing. If tool call errors are most common, refine how tools are defined and documented for the LLM. The architecture diagram approach helps identify which component to improve, recognizing that fixing OCR on PDFs often provides more lift than switching to the latest Sonnet or GPT model. Once you have a solid eval set with coverage across failure modes, you can objectively compare model performance rather than relying on vibes. This systematic approach transforms LLM development from an art into an engineering discipline, making it possible to ship reliable products instead of impressive demos that fail in production.

  • Hamel Husain's work on evals: Prominent figure in LLM evaluation practices
  • Pivot tables and spreadsheets: Simple but effective tools for early-stage failure analysis
  • RAG pipeline components: Embeddings, chunking, retrieval strategies, and document processing
  • Building LLM Powered Applications Course: Hugo's course with Stefan Krawczyk on systematic LLM app development
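
The early version of this workflow can be as small as the sketch below: a hand-labeled set of traces and a frequency count of failure modes (the spreadsheet pivot table, done in pandas). The rows and mode names are invented.

```python
import pandas as pd

# A hand-labeled eval set: each trace is an input/output pair marked pass/fail,
# and failures get a mode. The rows here are invented for illustration.
traces = pd.DataFrame([
    {"question": "refund policy?",  "passed": False, "failure_mode": "retrieval_error"},
    {"question": "reset password",  "passed": True,  "failure_mode": None},
    {"question": "cancel my order", "passed": False, "failure_mode": "wrong_tool_call"},
    {"question": "opening hours?",  "passed": False, "failure_mode": "retrieval_error"},
    {"question": "invoice total",   "passed": False, "failure_mode": "hallucination"},
])

# The 'pivot table' step: rank failure modes by frequency to decide what to fix first.
print(traces.loc[~traces.passed, "failure_mode"].value_counts())

# Overall pass rate becomes the baseline you re-measure after each change.
print(f"pass rate: {traces.passed.mean():.0%}")
```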

Tests as Specifications in the AI Era

Traditional test-driven development takes on new meaning when building applications with AI assistance. Tests become specifications that AI agents can understand and work toward, providing concrete success criteria that are more reliable than natural language descriptions. Writing comprehensive tests before implementation gives the AI clear targets and constraints, dramatically improving the quality of generated code. Furthermore, AI assistants excel at writing tests for code, creating unit tests, integration tests, and property-based tests that human developers might skip due to time pressure or tedium. The practice of asking AI to add assertions throughout code helps catch issues early and makes the codebase more maintainable. When combined with evaluation-driven development for LLM features, this creates a comprehensive quality framework where traditional code has traditional tests and LLM-powered features have eval sets. The investment in testing infrastructure pays dividends when refactoring or updating dependencies because the test suite provides immediate feedback on what broke. For data science applications specifically, tests can encode domain knowledge about expected data ranges, relationships, and invariants that might otherwise only exist in tribal knowledge.

  • Test-driven development with AI: Writing tests first as specifications for AI to follow
  • AI-generated test suites: Using AI to create comprehensive test coverage
  • Assertions and error checking: Defensive programming enhanced by AI assistance
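
A small sketch of what "tests as specifications" can look like for a data task: the test is written before the implementation, the clean_orders function and its module are hypothetical, and the assertions encode the domain invariants you want the AI-generated code to satisfy.

```python
# test_cleaning.py -- written *before* the implementation, as a spec an AI assistant can target.
import pandas as pd
import pytest

from cleaning import clean_orders  # hypothetical module, to be implemented (by you or an AI)

def test_clean_orders_invariants():
    raw = pd.DataFrame({
        "order_id": [1, 1, 2, 3],
        "amount": [10.0, 10.0, -5.0, 250.0],
    })
    cleaned = clean_orders(raw)

    # Domain invariants that otherwise live only in tribal knowledge:
    assert cleaned["order_id"].is_unique          # duplicates must be dropped
    assert (cleaned["amount"] > 0).all()          # refunds are handled elsewhere
    assert cleaned["amount"].le(10_000).all()     # sanity cap on single orders

def test_clean_orders_rejects_missing_columns():
    with pytest.raises(KeyError):
        clean_orders(pd.DataFrame({"amount": [1.0]}))
```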

The Changing Surface Area of Software

One of Hugo's most thought-provoking insights involves how AI-assisted coding fundamentally changes what software can be and who it serves. Historically, software has been expensive to build, requiring teams of well-compensated engineers. This economic reality meant that software needed large markets to justify development costs, leading to feature-rich applications trying to serve many use cases and edge cases to maximize revenue potential. With AI dramatically reducing development costs and time, the economics shift completely. Internal tools that would never have been built become viable when they take hours instead of weeks. Personal utilities for small user groups become worthwhile. Ephemeral or disposable software that solves an immediate problem then gets discarded becomes a reasonable category. This isn't about AI replacing Salesforce or major SaaS platforms but rather about opening up an entirely new middle ground of software that previously didn't exist. A chess tutor app for a hundred friends, a custom marketing automation stack for a specific team, bespoke data viewers for particular analysis needs - these represent a new category of "fast software" or "just-in-time software" that changes what developers can accomplish. The implications extend to dependency management, where generating simple utility code becomes preferable to taking on entire dependency trees for minor functionality.

  • Ephemeral software: Applications built quickly for temporary or specific needs
  • Just-in-time software: Building tools exactly when needed rather than planning far in advance
  • Internal tooling: Custom applications for specific teams or workflows
  • Personal utilities: Small applications serving niche use cases or small user groups

Prompt Engineering Over Premature Optimization

When building LLM-powered applications, the temptation to immediately jump to advanced techniques like fine-tuning or switching models must be resisted. Hugo emphasizes "prompt and prompt and prompt initially" because tremendous improvements can come from better prompting alone. Before considering fine-tuning, developers should exhaust prompt engineering, add relevant examples (few-shot learning), improve system messages, and refine tool descriptions. When retrieval errors occur, the fix often involves prompting the system to better understand the schema or data structure rather than switching embedding models. The architecture diagram exercise reveals that seemingly LLM-related problems often stem from upstream issues: bad OCR on PDF documents, poor data quality, inadequate chunking strategies, or incorrect metadata. Fixing these fundamentals typically provides more lift than upgrading to the latest Sonnet or GPT release. The newest and sexiest model might help, but only after getting the fundamentals right and establishing an eval set to measure improvement objectively. This mirrors traditional software optimization advice about premature optimization being the root of evil: profile first to identify bottlenecks rather than optimizing based on intuition. The same discipline applies to LLM applications, where measurement and systematic improvement beat speculation and premature sophistication.

  • Focus on prompting first: Exhaust prompt engineering before considering advanced techniques
  • Architecture diagrams: Visual mapping of system components to identify root causes
  • Data quality and preprocessing: Often more impactful than model selection
  • Measurement before optimization: Establish eval sets before making changes
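
As one example of squeezing more out of prompting before touching the model choice, here is a sketch of a text-to-SQL prompt that adds an explicit schema description and a few-shot example. The schema, example query, and model name are all invented; the pattern is the point.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY; schema and examples are invented

client = OpenAI()

system = (
    "Convert questions into SQL for this schema:\n"
    "orders(order_id, customer_id, amount, created_at)\n"
    "customers(customer_id, region, signup_date)\n"
    "Return SQL only, no prose."
)

# Few-shot examples: often a bigger lift than switching to a newer model.
examples = [
    {"role": "user", "content": "Total revenue last month?"},
    {"role": "assistant", "content":
        "SELECT SUM(amount) FROM orders "
        "WHERE created_at >= date_trunc('month', now()) - interval '1 month' "
        "AND created_at < date_trunc('month', now());"},
]

question = {"role": "user", "content": "Revenue by region for 2024?"}
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": system}, *examples, question],
)
print(resp.choices[0].message.content)
```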

The Future: Proactive Agents and Background Automation

The emerging frontier of AI-assisted development involves agents that don't wait for instructions but actively monitor systems and bring insights or issues to developers' attention. These proactive agents can notice production anomalies, identify trends in user behavior, cluster related issues, or highlight opportunities for optimization without being explicitly asked. Background agents already exist in non-coding contexts, like email inbox monitors that cluster messages by topic and priority, surfacing high-value client communications that deserve immediate attention. In the coding sphere, tools like Sentry's Seer represent early versions of this future: when an exception occurs in production, the system automatically analyzes it against the codebase, potentially suggesting fixes or even creating pull requests before the developer has investigated. Hugo envisions a Monday morning where instead of wading through logs and metrics, developers receive a curated briefing from their AI colleague highlighting what matters: "Check this out, this looks interesting, this might be a problem." The parallel to the industrial revolution is apt: just as looms created entire satellite industries, AI agents will create new categories of jobs and specializations we can't fully anticipate yet. Managing multiple agents, ensuring their work integrates properly, and handling CI/CD for AI-generated code represent entirely new problem spaces requiring new tools and practices.

  • Sentry Seer: AI-powered error analysis and fix suggestions
  • Proactive monitoring agents: Systems that watch for issues and surface insights
  • Background automation: Agents that work independently on lower-priority tasks
  • Agent management: Emerging skill of coordinating multiple AI agents working simultaneously
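
Stripped of any particular product, the core of a proactive monitoring agent is small: watch a metric, flag anomalies, and draft a briefing a human would actually want to read. The numbers below are invented; a real version would read from your observability stack and post to Slack or email.

```python
import statistics

# A toy 'proactive agent' step: watch a production metric, flag anomalies,
# and draft a human-readable briefing instead of waiting to be asked.
error_counts_last_24h = [3, 4, 2, 5, 3, 4, 41]  # invented hourly data; the spike is the point

baseline = statistics.mean(error_counts_last_24h[:-1])
spread = statistics.pstdev(error_counts_last_24h[:-1]) or 1.0
latest = error_counts_last_24h[-1]
z_score = (latest - baseline) / spread

if z_score > 3:
    briefing = (
        f"Heads up: error rate jumped to {latest}/hour "
        f"(baseline ~{baseline:.1f}, z={z_score:.1f}). "
        "The most recent deploy and the top stack traces are probably worth a look first."
    )
    print(briefing)  # in practice: post to Slack, open a ticket, or ask an LLM to triage further
```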

Advice for Early-Career Data Scientists

For those just entering the field or early in their careers, the landscape can seem daunting with AI tools potentially automating aspects of data science work. Hugo's advice focuses on three pillars: value delivery, skill development, and connecting work to business outcomes. Data scientists must focus on their core skills of exploring data, building systems, and deriving insights while using AI tools to enhance rather than replace these capabilities. The key differentiator isn't coding speed but the ability to ask the right questions, validate results critically, and translate technical findings into business value. Early-career professionals should resist the temptation to let AI tools make them passive consumers of generated code; instead, they must actively engage with every line, questioning why the AI made particular choices and using those moments as learning opportunities. Building a portfolio of projects that demonstrate business impact matters more than showcasing coding prowess alone. The alignment with business value that Hugo illustrates through Lorikeet's pricing model (charging per resolved ticket rather than per token) captures this mindset perfectly: ultimate success comes from solving real problems, not from technical sophistication for its own sake. Carving out time to experiment with emerging AI tools is essential, accepting that not every experiment will succeed but that developing fluency with these tools is now a core competency.

  • Lorikeet AI: Customer support automation company mentioned for business-value-first pricing
  • Focus on business value: Connecting technical work to concrete business outcomes
  • Active learning: Using AI-generated code as learning opportunities
  • Portfolio building: Demonstrating impact through real projects

Interesting Quotes and Stories

"DevRel is the wisdom layer, and it's firmly beside product as a pillar." - Vilay at Outerbounds, quoted by Hugo

"What type of human don't you need to have a conversation with to learn stuff?" - Jeremy Howard, on complaints about needing to iterate with ChatGPT

"I never particularly enjoyed writing Pandas code. If AI can help me write my Pandas code, I read it, make sure it's all good in the hood, and then I get to focus on building systems. I think that's a huge win." - Hugo Bowne-Anderson

"You can build a not insignificant amount of software with your voice and three buttons." - Hugo on using Super Whisper and Stream Deck for dictation-driven development

"We were all copy and pasting from Stack Overflow. We've been doing that for a long time. So in some ways, we're scaling and superpowering that behavior." - Hugo on AI-assisted coding

"The surface area of what software is, is expanding and changing completely." - Hugo on how AI changes software economics

"I would have to fire myself if I didn't talk about the other way, which is AI helping us do data science." - Hugo on AI's bidirectional relationship with data science

"Think of it as a super excited, somewhat scatterbrain junior helper. If you had hired somebody, even if they went to Stanford, but they hadn't really done work on any major data science projects, would you expect 100% correctness?" - Michael Kennedy on setting AI expectations

"One of the biggest wins here is being able to vibe code your own data viewers." - Hugo on practical applications of AI coding

"Focus on the fundamentals. When you have this set of evals, you can see how it performed on your test set. Imagine being able to switch out a model and seeing what's up there." - Hugo on evaluation-driven development

"Focus on three things: what value you can deliver, what's your skill as a data scientist, and tie that to business value. Build, build, build and consistently tie it to business value." - Hugo's advice for early-career data scientists

Key Definitions and Terms

Agentic Coding: Development approach where AI agents can autonomously write, modify, and test code based on natural language instructions, going beyond simple autocomplete to understanding context and generating complete solutions.

RAG (Retrieval-Augmented Generation): Architecture pattern where an LLM is augmented with the ability to retrieve relevant documents or information from a knowledge base before generating responses, improving accuracy and reducing hallucinations.

Tool Calls: Actions an AI agent can take beyond text generation, such as calling APIs, sending emails, querying databases, or executing functions. An agent is defined as an LLM plus tool calls in a control loop.
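
That "LLM plus tool calls in a control loop" definition fits in a short sketch. This one uses the OpenAI-style function-calling API with a single fake tool; the model name and the tool itself are illustrative.

```python
import json
from openai import OpenAI  # assumes OPENAI_API_KEY; the single tool here is illustrative

client = OpenAI()

def get_weather(city: str) -> str:
    return f"It is 18C and clear in {city}."   # stand-in for a real API call

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Should I bike to work in Sydney today?"}]

# The control loop: the model either answers or requests a tool; tool results are fed back in.
while True:
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    msg = reply.choices[0].message
    if not msg.tool_calls:
        print(msg.content)                       # no tool requested: this is the final answer
        break
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)             # execute the requested tool
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```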

Dead Looping: A failure mode where an AI assistant gets stuck cycling through the same errors repeatedly, fixing problem A which causes problem B which causes problem C which causes problem A again, without making progress.

Evaluation-Driven Development: Methodology for building LLM applications that mirrors ML best practices: collecting labeled examples (pass/fail), classifying failure modes, prioritizing fixes based on frequency, and objectively measuring improvements.

Vibe Coding: Informal term for rapidly building software with AI assistance based on rough specifications or feelings about what's needed, without detailed upfront planning. Can produce working prototypes quickly but requires validation.

YOLO Mode: Auto-accept mode in AI coding tools where the system automatically applies suggested changes without requiring manual approval for each modification. Requires strong version control discipline.

Cursor Rules: Project-specific or global configuration in Cursor that teaches the AI assistant about conventions, preferences, and requirements to improve code generation quality across sessions.

Free-Threaded Python: Python implementation without the Global Interpreter Lock (GIL), allowing true parallel execution of Python code across multiple CPU cores, officially supported starting in Python 3.14.

Eval Set: Collection of input-output pairs labeled with expected outcomes, used to measure LLM application performance objectively. Analogous to a test set in traditional machine learning.

Failure Modes: Categories of errors in LLM applications such as hallucination (generating false information), retrieval errors (finding wrong documents), tool call errors (calling wrong functions or with wrong parameters), or incorrect output formatting.

Proof of Concept Purgatory: The valley of disappointment after an impressive LLM demo where excitement decreases as developers encounter real-world challenges like hallucinations, integration issues, and monitoring complexity.

Socratic Dialogue Development: Approach coined by Isaac Flath emphasizing conversation with AI before code generation, treating the AI as a pair programming partner to establish shared understanding.

Learning Resources

Want to go deeper into the topics covered in this episode? Here are resources to expand your knowledge and develop practical skills in modern data science and LLM-powered application development.

Building LLM Powered Applications for Data Scientists and Software Engineers: Hugo Bowne-Anderson and Stefan Krawczyk's Maven course covering evaluation-driven development, systematic failure analysis, and best practices for shipping reliable LLM applications. Talk Python listeners get 20% off.

LLM Building Blocks for Python Course: Concise 1.2-hour video course teaching everything needed to integrate large language models into Python applications, from structured data to async pipelines.

Data Science Jumpstart with 10 Projects: Matt Harrison's practical course covering exploratory data analysis, data cleanup, visualization, and machine learning without drowning in theory.

Just Enough Python for Data Scientists Course: Bridges the gap from notebook-based analysis to production-quality code with essential Python skills, including modern AI tools for refactoring and testing.

Fundamentals of Dask: Free course by Hugo Bowne-Anderson on parallelizing Python computation across cores and clusters, essential background for understanding when free-threaded Python matters.

Python for Absolute Beginners: If you're completely new to Python, start here with comprehensive coverage of fundamentals at a beginner's pace.

Vanishing Gradients Podcast: Hugo's podcast featuring conversations with industry practitioners about building and shipping data-powered products, including episodes on evals, NASA's use of AI, and real-world LLM deployment.

Elite AI Assisted Coding Course on Maven: Isaac Flath's course on Socratic dialogue-driven development and advanced AI-assisted coding techniques.

Hamel Husain on Evals - Vanishing Gradients Episode: Deep dive into evaluation practices for LLM applications with one of the leading voices in the space.

The End of Programming As We Know It by Tim O'Reilly: Essay exploring how AI is changing the nature of software development, discussed in Hugo's podcast.

Overall Takeaway

The intersection of foundation models and data science represents not a replacement of existing skills but a profound expansion of what's possible. Hugo Bowne-Anderson's journey from academic biology through the evolution of PyData to today's LLM-powered landscape embodies the continuous adaptation required in this field. The message is clear: data scientists who embrace AI-assisted tools while maintaining their core competencies of data exploration, critical thinking, and connection to business value will find themselves empowered rather than replaced.

The modern data science stack now spans from classic tools like Pandas and Jupyter to cutting-edge alternatives like Polars and Marimo, from traditional package management to lightning-fast uv, and from manual coding to agentic AI assistants that can scaffold entire applications. But the fundamentals remain: you must understand what your code does, validate that your analyses are correct, and ensure your work delivers real value. The most successful practitioners treat AI tools as enthusiastic junior colleagues with perfect recall and impressive breadth but requiring direction, validation, and the wisdom that only human experience provides.

The shift from proof-of-concept excitement to production reliability requires discipline borrowed from machine learning: evaluation-driven development, systematic failure analysis, and objective measurement over vibes. The expanding surface area of software - internal tools, ephemeral applications, personal utilities - creates opportunities that didn't exist when software required large teams and could only be justified by serving massive markets. For those entering the field now, the challenge isn't competing with AI but rather learning to leverage it while developing the irreplaceable skills of asking the right questions, identifying what matters, and translating technical capability into business impact.

We're still in the early days of this transformation. The tools are evolving rapidly, best practices are emerging in real-time, and the full implications remain to be seen. But one thing is certain: the boring dashboards, careful evals, and systematic approaches that Hugo advocates for aren't just good engineering practice - they're the difference between impressive demos and products that actually ship and serve users reliably. The future belongs to those who can manage both the exciting creative possibilities and the unglamorous discipline required to make AI-powered systems work in production.

Hugo Bowne-Anderson: x.com
Vanishing Gradients Podcast: vanishinggradients.fireside.fm
Fundamentals of Dask: High Performance Data Science Course: training.talkpython.fm
Building LLM Applications for Data Scientists and Software Engineers: maven.com
marimo: a next-generation Python notebook: marimo.io
DevDocs (Offline aggregated docs): devdocs.io
Elgato Stream Deck: elgato.com
Sentry's Seer: talkpython.fm
The End of Programming as We Know It: oreilly.com
LorikeetCX AI Concierge: lorikeetcx.ai
Text to SQL & AI Query Generator: text2sql.ai
Inverse relationship between enthusiasm for AI and traditional projects: oreilly.com

Watch this episode on YouTube: youtube.com
Episode #526 deep-dive: talkpython.fm/526
Episode transcripts: talkpython.fm

Theme Song: Developer Rap
🥁 Served in a Flask 🎸: talkpython.fm/flasksong

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy

Episode Transcript


00:00 AI has changed how we write code and data science is right in the blast radius.

00:04 Today we move past autocomplete to systems that file PRs, comment in Slack, and even police our CI.

00:11 Hugo Bowne-Anderson is back to map the new stack.

00:14 When classic PyData still wins, where small local models beat the cloud, and how tests become your specs.

00:21 We dig into cursor and copilot, proactive agents, and practical patterns you can ship this week.

00:26 This episode is all about leveling up your data science workflow, not replacing it.

00:31 This is Talk Python To Me, episode 526, recorded October 7th, 2025.

00:55 Welcome to Talk Python To Me, the number one podcast for Python developers and data scientists.

01:00 This is your host, Michael Kennedy. I'm a PSF fellow who's been coding for over 25 years.

01:05 Let's connect on social media. You'll find me and Talk Python on Mastodon, BlueSky, and X.

01:11 The social links are all in the show notes. You can find over 10 years of past episodes at

01:16 Talk Python.fm. And if you want to be part of the show, you can join our recording live streams.

01:21 That's right. We live streamed the raw uncut version of each episode on YouTube.

01:26 Just visit talkpython.fm/youtube to see the schedule of upcoming events.

01:31 And be sure to subscribe and press the bell so you'll get notified anytime we're recording.

01:36 This episode is sponsored by Posit Connect from the makers of Shiny.

01:40 Publish, share, and deploy all of your data projects that you're creating using Python.

01:45 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:52 Posit Connect supports all of them.

01:54 Try Posit Connect for free by going to talkpython.fm/posit, P-O-S-I-T.

02:00 And it's brought to you by Nordstellar.

02:02 Nordstellar is a threat exposure management platform from the Nord security family,

02:07 the folks behind NordVPN, that combines dark web intelligence, session hijacking prevention,

02:13 brand and domain abuse detection, and external attack surface management.

02:18 Learn more and get started keeping your team safe at talkpython.fm/nordstellar.

02:24 Hey, I want to take just a minute and talk to you guys.

02:26 I just released a really cool new course called Agentic AI Programming for Python Developers and Data Scientists.

02:33 You've heard me mention a couple times on the podcast

02:36 how I've had some incredible success with some of these agentic AI coding tools.

02:41 I hear people talking about how they're not really working for them.

02:45 And then I look at the results that I'm getting and think, wow, that's something that would

02:49 have taken two weeks.

02:50 It's built in two hours and it's well factored and good looking code.

02:56 What gives?

02:57 Why is this difference here?

02:58 Well, I decided to create this course to share all the things that I'm doing with these agentic

03:05 coding tools with the idea of making you as successful and productive as well.

03:09 Yes, I know we're all tired about hearing about how AI is going to change everything for software developers.

03:15 But there are some tools here that will give you truly difference-making levels of productivity.

03:21 And that's what this course is about.

03:23 So check it out at talkpython.fm/agentic AI.

03:27 The link's in your podcast player show notes.

03:29 Let's get to the interview.

03:31 Hey, Hugo. Welcome to Talk Python.

03:33 Awesome to have you, man.

03:33 Such a pleasure to be here and to be back.

03:36 I think this is my third time over the years on Talk Python.

03:39 I do think it is your third time.

03:41 It's been a lot of fun data science things.

03:43 And I think people will see that our conversation, the thing that we're going to really focus on this time,

03:49 will have evolved a little bit as the times have changed.

03:52 Since the last couple of years, the data science and the entirety of programming has gotten a little different.

03:57 Actually, just our conversations over the years, like the arc can be a symbol of the trajectory

04:03 from like early days of data science, the PyData stack, all of these amazing things

04:07 as it went from academia to industry.

04:09 then to, you know, large-scale distributed compute

04:11 when I came and talked about Dask and Coiled.

04:14 And now a couple of years after our quote-unquote GPT moment

04:17 and our ChatGPT moment and our stable diffusion moment

04:19 here to talk about LLMs and foundation models meeting data science.

04:22 That's right.

04:22 We've gone from local machine to cloud, AI.

04:27 Wow.

04:27 I do see that we'll actually maybe be back.

04:29 I see a world where we build together smaller models that maybe run locally, doing other things,

04:36 Maybe some servers, connect MCP servers, connecting all these things, like little special agents.

04:41 That's going to be something we can potentially dive into.

04:44 We'll see.

04:44 But I think the arc is not done.

04:46 The other thing that comes to mind is I do, and we'll get into this with AI-assisted programming,

04:50 which I think superpowers people who know what they're doing, but may not be great for beginners.

04:56 I do imagine we might have boot camps or classes where you're in a cave and no access to AI

05:02 or the internet and you learn, you actually learn to code on a laptop without any of this stuff.

05:07 Yeah. There's going to be a local PyPI and be a local set of docs. You know, there was this really

05:12 cool app. Gosh, I've covered on Python bytes, the news podcast I do. And I wish I could remember,

05:18 but what it would do is it had like hundreds, maybe thousands of different projects like Flask

05:24 or Tailwind or whatever, Span Technologies. And you could click it and say, I want to have Flask

05:30 3.1 offline, get me the docs.

05:33 And you could create like a catalog of all these different projects you're using,

05:36 all offline searchable docs across.

05:39 Amazing.

05:39 I feel like something like that might come back.

05:41 You know what I mean?

05:42 It's funny you mentioned that because I was chatting with Ines Montani

05:44 of spaCy and Prodigy fame the other night.

05:47 She's here in Sydney.

05:48 And she reminded me that Sebastian Ramirez of FastAPI,

05:52 when he was building FastAPI originally, I think what she reminded me was

05:56 they didn't have great internet access where they were.

05:59 So they did download a lot of things at very slow speeds, then build everything locally.

06:04 I'm reliving that, by the way, in a very weird way.

06:07 My fiber modem died.

06:08 And if anybody watches the video, they'll see that I'm actually in the library here, which is great.

06:12 I preserve a little private room.

06:13 I got high speed internet here.

06:15 But at home, I'm tethered with one bar of LTE.

06:18 So anything I do is at like 30 kilobits.

06:20 And it brings me back to my youth.

06:22 But boy, you got to decide what you want to do next.

06:24 That's it.

06:25 Yeah.

06:25 So I do, circling back and closing this up, I do think this true learning the code thing is both,

06:32 it's going to be something that comes back around.

06:33 I think it's actually a challenge with all the tools and cheats.

06:38 I don't consider them bad cheats, but the things that can do the work for you,

06:41 it requires a lot of willpower to stay focused.

06:44 And I do think it's going to be kind of a COBOL moment as well,

06:47 where somewhere down the line, people are going to be like,

06:49 we need to just get some people that used to type this stuff in by hand.

06:53 And we need them to look at it and figure out why this doesn't work.

06:55 Without a doubt.

06:56 And, you know, learn Python the hard way.

06:58 Learn X the hard way.

06:59 Like sometimes most things you got to do the work, right?

07:02 You definitely do.

07:03 Well, Hugo, before we get into the topic too much, quick introduction on yourself.

07:08 Who are you?

07:09 I know you've been on the show a few times, but it's spanned many years and I say it often,

07:12 but it's always worth repeating.

07:13 And like 50% of the people in the Python community are new over the last two years.

07:18 Like they've only been here two years or less, which blows my mind.

07:21 So those people probably want to listen to the podcast before they got into Python.

07:23 Who are you?

07:24 So firstly, what is up Python community?

07:26 Clearly, I'm a huge fan of Python.

07:28 Used it for many years.

07:28 Love it.

07:29 Background many moons ago in scientific research, biology, math, physics.

07:33 Was working in academic research at Yale University, New Haven, Connecticut.

07:37 Living in New York City just over a decade ago.

07:40 The data science ML meetups, hackathons there blew my mind so much.

07:44 Moved to industry, a small startup at the time, DataCamp.

07:48 Worked on curriculum, education, internal data science product.

07:52 Wore many hats, as you do.

07:53 And worked a lot on Pythonic education there.

07:56 Since then, I've been working in a mixture of DevRel, marketing, product.

08:00 A year and a half ago on wonderful projects, such as Dask at Coiled with Matt Rocklin,

08:04 then Metaflow out of Netflix with the wonderful team at Outerbounds.

08:08 A year and a half ago or so, space was so exciting, man.

08:12 I decided to go freelance.

08:13 And so I mixed my time, essentially helping people build, ship, and maintain AI, LLM, ML, data-powered products more generally.

08:23 I do this through consulting.

08:25 I do it through advising.

08:27 I do it through education and developer relations.

08:30 So helping open source frameworks and products reach developers and getting material that

08:35 helps them.

08:36 My former colleague and boss at Outerbounds, Vilay, who really gets developer relations,

08:41 refers to DevRel as the wisdom layer.

08:44 And he puts it firmly beside product as a pillar.

08:47 And I love that because I think a lot of the time we consider education or DevRel as a

08:52 necessary thing you have to do as opposed to being at a foundational pillar. And once again,

08:58 that's why I'm in such admiration of the work you do in bringing so many resources to the community

09:03 at large. Thank you. And foreshadowing a little bit, I would like to kind of reinforce that quote

09:09 you just said. It's the wisdom layer. Like as data scientists, your job is to provide insight

09:15 and knowledge and trends, forecasting. Developers, our job is to provide solutions and things that

09:22 that we can use apps and tools and whatnot. And I think a lot of us, myself included,

09:27 get tied down and like, oh, I'm really good at coding and I'm good at this library.

09:32 And we can kind of forget that like the real first tier job of ours is to provide answers

09:39 and solutions and apps. And I think a lot of the pushback on AI is like, it's taken my coding.

09:44 One way I think about it is it has taken my coding in some ways, but as we'll get to,

09:49 I never particularly enjoyed writing, love Pandas, never particularly enjoyed writing

09:54 Pandas code, for example, incredible tool.

09:56 But if AI can help me write my Pandas code, I read it, make sure it's all good in the

10:01 hood.

10:01 And then I get to focus on building systems.

10:04 I think that's a huge win.

10:05 100%.

10:06 I would be remiss to not give a little shout out to a couple of things that you have done

10:10 or are doing.

10:11 Take it chronologically.

10:13 A while ago, you worked on the fundamentals of Dask, high performance data science course

10:18 over at Talk Python.

10:18 This course is 100% free.

10:20 People want to dive into it and learn from you.

10:22 They can absolutely take it.

10:23 I might even take it now.

10:24 It's free.

10:24 It's just over an hour.

10:26 Yeah, people can drop in and I'll be sure to put a link in the show notes for that.

10:29 So that's awesome.

10:30 That was really fun to build with you as well, Michael.

10:33 That was during the early days of COVID we were working on that.

10:35 So, you know.

10:36 What else do we have to do?

10:37 No, it was really great working on it with you as well.

10:40 I appreciate that tons.

10:41 And then you since then have started a data podcast called Vanishing Gradients.

10:47 Exactly.

10:48 Tell people about that.

10:49 So this is a podcast and I still call it a data podcast.

10:52 Although a lot of people like you have to call it AI, Hugo.

10:54 And AI is data.

10:56 And as we'll get to a lot of the principles in building AI powered products are the same.

11:00 Modulo implementation details of building data powered products.

11:03 It's a podcast where I talk with industry practitioners and builders about what they're

11:09 doing in the space, how they're building, and essentially trying to help propagate knowledge

11:13 from the bleeding edge back to builders and leaders in the space. So recently had a conversation

11:20 with Hamel Husain there, who's the evals guy, among other things, but all about the eval space

11:26 and how you can use evaluation in the LLM powered software development lifecycle to improve your

11:32 product. I've spoken with Chelle Gentemann at NASA, conversations with Jeremy Howard. So one of the

11:40 things, as I'm sure you do, I love about podcasting is I get to invite people I admire and who I think

11:45 are awesome to chat about stuff and then share it with the public. So that's the rationale there.

11:50 But it's really to help people propagate knowledge, wisdom, and skills back the adoption curve.

11:56 Along the adoption curve. That's good work. Who should listen to it? Beginners, experts,

12:00 data people, programmers? Everyone should listen to it. Everyone who's interested in building and

12:05 shipping data-powered stuff. And the way I actually chat about it with guests is the first third of

12:11 any conversation, I want everyone to understand, everyone who's somewhat technical. The middle,

12:16 we can go a bit deeper. And the third is a free-for-all. I definitely encourage everyone to

12:21 jump in. And we've got specific episodes on evals, of course, and that type of stuff.

12:26 But we also have industry-specific episodes, such as chatting about what was happening in the

12:31 early days of shipping LLM-powered software at Honeycomb or at NASA and these types of places.

12:36 That is one of the little secrets of being podcast hosts is you get to talk to people about

12:41 amazing stuff. You're like, huh, it'd be really cool to talk to the people that made that

12:45 Fusion breakthrough. They did Python. Why don't we invite them? And don't draw by,

12:48 you know, that's amazing. And I get like a lot of my friends who work in the space connect me

12:51 with other people. So I'm actually chatting with a data leader at Mozilla and then the VP of

12:55 learning at Duolingo. So we're going to have a lot of really fun episodes coming up. What's up with

12:59 the name, Vanishing Gradients? Where'd that come from? There's the Vanishing Gradient problem

13:04 in deep learning. So when you do stochastic gradient descent, you compute gradients and

13:08 climb down in order to optimize neural networks. And there's a challenge that sometimes gradients

13:14 vanish and you stop learning. So the rationale was, what happens when you stop learning? And

13:20 let's bring back the idea of learning in this space. The opposite, of course, is the exploding

13:24 gradients problem, which I also considered calling it, where the gradients just explode, of course,

13:29 But we went with vanishing for that reason.

13:30 I like that.

13:31 That's a very clever, very subtle.

13:34 Nice.

13:34 So let's talk data science in 2025.

13:38 And to be clear, I didn't ask, let's talk using AI for data science.

13:44 Let's talk data science in 2025.

13:46 And surely, I think there's two things here.

13:48 I think there's some really interesting, what I don't know how we want to think about,

13:51 like pure programming libraries and tools that are super powerful.

13:56 And we could give a quick shout out to some of them.

13:58 But then also, anytime you're exploring data, using some of these LLMs, especially the agentic

14:04 tooling, it's a game changer.

14:05 So let's start with the first one.

14:07 What tools, you know, things like polars maybe or whatever is like jumping out at you over

14:12 the last year or so that's like, wow.

14:14 I chatted about this on Vanishing Gradients with Akshay Agrawal, who built Marimo and

14:20 develops Marimo, which I encourage everyone to check out.

14:23 So let's actually rewind slightly and think about what we've been using over the past decade,

14:29 plus plus.

14:29 And it's the PyData stack, Jupyter Notebooks, Pandas, SQL, SQLite databases, and in production,

14:37 maybe Postgres and these types of things.

14:40 And how has this evolved now?

14:42 What are modern, really cutting edge tools that we use in similar ways?

14:46 You mentioned polars.

14:47 I totally agree that this is something we're seeing a lot of activity on and a lot of use

14:52 on.

14:52 On the database side, we've got DuckDB, right?

14:54 DuckDB is making a huge impact.

14:56 Beautiful to use, but it's so fast as well, right?

14:59 And I mean, and that's what you want there.

15:01 And then on the literate programming side, you've got Marimo, which I'm a huge fan of.

15:08 This portion of Talk Python To Me is brought to you by the folks at Posit.

15:11 Posit has made a huge investment in the Python community lately.

15:15 Known originally for RStudio, they've been building out a suite of tools

15:18 and services for Team Python.

15:21 Over the past few years, we've all learned some pretty scary terms.

15:25 Typosquatting, supply chain attacks, obfuscated code, and more.

15:30 These all orbit around the idea that when you install Python packages,

15:34 you're effectively running arbitrary code off the internet on your dev machine,

15:38 and usually even on your servers.

15:41 The thought alone makes me shudder, and this doesn't even touch the reproducibility issues surrounding external packages.

15:47 But there are tools to help.

15:49 Posit Package Manager can solve both problems for you.

15:52 Think of Posit Package Manager as your personal package concierge.

15:56 You use it to build your own package repositories within your firewall that keep your project safe.

16:00 You can upload your own internal packages to share or import packages directly from PyPI.

16:06 Your team members can install from these repos in normal ways using tools like pip, Poetry, and uv.

16:12 Posit Package Manager can help you manage updates, ensuring you're using the latest,

16:17 most secure versions of your packages.

16:19 But it also takes point-in-time snapshots of your repos,

16:23 which you can use to rerun your code reproducibly in the future.

16:27 Posit Package Manager reports on packages with known CVEs and other vulnerabilities

16:31 so you can keep ahead of threats.

16:34 And if you need the highest level of security, you can even run Posit Package Manager

16:38 in air-gapped environments.

16:40 If you work on a data science team where security matters,

16:42 You owe it to you and your org to check out Posit Package Manager.

16:46 Visit talkpython.fm/ppm today and get a three-month free trial to see if it's a good fit.

16:52 That's talkpython.fm/ppm.

16:55 The link is in your podcast player's show notes.

16:57 Thank you to Posit for supporting the show.

17:00 I still use Jupyter Notebooks, but one thing Marimo affords me,

17:04 because it's actually a .py file as well, you can convert them.

17:08 Well, they're essentially scripts as well.

17:09 So the notebook to production story is really interesting there.

17:13 I think Marimo is super interesting.

17:16 I think when I look at it, when I see people working with it or when I work with it, the

17:20 limited extent to which I have, it just looks smooth and polished and modern.

17:26 And I don't know, I just, when I use it, I feel, feel like it's something great.

17:29 It also solves the problem that while Jupyter Notebooks, or JupyterLab, whatever,

17:35 in general is like an incredible tool for data exploration and presenting data. It has this,

17:42 this crazy implicit go-to sort of sequence, right? Like if you don't just go run all cells

17:48 and you start bouncing around, you end up potentially running stuff out of order or

17:52 skipping a step that would have made a different answer, the step below. And that's, that's real

17:56 dangerous. And so Marimo uses the abstract syntax tree to look at dependencies across

18:02 cells and make sure they run in order, which I think is an underappreciated benefit.

18:07 It's like, oh, that's kind of nice.

18:08 Like, no, like, do you want the wrong answer or the right answer?

18:11 This is really important in data science and science in general.

18:14 You're right.

18:14 My understanding is it uses the AST to build a DAG of cells and execute them.

18:19 And what that means is, yeah, and it means you can't redefine something in a cell below,

18:24 but it'll give you a scratch pad to do so if you want to.
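
A toy sketch of the idea being described, not Marimo's actual implementation: use Python's ast module to work out which names each cell defines and which it reads, so cells can be ordered and re-run by dependency.

```python
import ast

# Three tiny "cells" keyed by name; Marimo works on real notebook cells.
cells = {
    "a": "x = 1",
    "b": "y = x + 1",
    "c": "print(x + y)",
}

defines, reads = {}, {}
for name, src in cells.items():
    tree = ast.parse(src)
    defines[name] = {n.id for n in ast.walk(tree)
                     if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    reads[name] = {n.id for n in ast.walk(tree)
                   if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}

# Cell B depends on cell A if B reads a name that A defines.
deps = {b: {a for a in cells if a != b and defines[a] & reads[b]} for b in cells}
print(deps)  # cell c depends on both a and b, so it always runs after them
```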

18:27 Now, I just want to say that's fantastic for a lot of cases.

18:30 There are cases when you just want to explore an experiment where Jupyter notebooks like

18:35 absolutely excel.

18:36 So it's not an either or here as well.

18:38 And I do want to say Jupyter notebooks, in all honesty, like get a bunch of hate for that.

18:43 And neither you nor I feel that way.

18:45 But I just want to be very explicit that that's not what we're saying here at all.

18:49 Yeah, I have a lot of reverence for notebooks.

18:52 Not only did they change the game for data science in general, but they changed it for

18:57 Python.

18:58 So if you look at the popularity and the people participating in Python,

19:02 like one of its really powerful aspects is people are coming from all these different angles

19:06 with different ideas and perspectives and different tools they want to build and so on.

19:10 That's made it so rich.

19:11 And that started basically in 2012 with the PyData stack with notebooks and all of that.

19:18 Yeah, I remember the first notebook I opened was called an IPython notebook,

19:22 not even a Jupyter notebook.

19:23 And of course, it's all based around IPython also.

19:27 So we got to give a shout out to that.

19:28 And as I said, I was actually working in biology research at the time.

19:32 And in biology, we have notebooks, right?

19:35 Like you write, you put your PCR gel, you put your figures there, you write things.

19:39 This idea of literate programming is exactly that.

19:42 And what it does is it brings experimentation.

19:45 It brings scientific rigor and scientific research into computation.

19:49 And very important for this space where we are, really what we're talking about in data

19:53 science, ML, AI is the convergence of software meeting data and experimentation. So we need new

19:59 tools for this. And notebooks are one of the most awesome examples of that. Okay. I took us down a

20:05 bit of a hole with the Marimo stuff because it is cool. Anything else that jumps out to you? I have

20:10 one at the end that I want to riff on before we get off this topic, but what else jumps out to you?

20:14 Circa 2024, 2025, that's kind of like, oh, that's different. To step back a bit, we are talking

20:19 about like data science with AI and that type of stuff. And this works both ways, right? Like data

20:24 science plays into AI and building with foundation models, but I'd have to fire myself if I didn't

20:30 talk about the other way, which is AI helping us do data science. And the tools are AI-assisted.

20:37 The biggest tool is AI-assisted programming for data science, which is revolutionary. I think

20:44 revolutionary maybe isn't even as big a term as we need for this. Absolutely groundbreaking.

20:48 It's easy to get frustrated saying it's bad for the environment.

20:52 I'm a good data scientist, a good developer.

20:54 I don't need this stuff.

20:55 But for the most part, I feel like the cat is out of the bag.

20:58 The Pandora's box is open, whatever analogy you want to use here.

21:01 And it's made such a difference.

21:03 Without a doubt.

21:03 And I definitely agree on climate concerns.

21:05 We should be having larger conversations around this.

21:07 You can start using smaller models and local models for AI-assisted programming.

21:11 They won't superpower you as much.

21:13 There are, you know... to say all of AI is very bad for the climate is,

21:19 I suppose, like putting both Hummers and electric cars in the same bucket, right? So, but I totally

21:25 agree that that's a concern we need to address. And the other thing though, man, if we're talking about like,

21:31 you know, software's writing the software, AI's writing the software, vibe coding, I don't

21:35 necessarily understand it. Guess what? We were all copy and pasting from Stack Overflow. We've

21:39 been doing that for a long time, right? And I don't necessarily understand all that code. So

21:43 in some ways it's scaling and superpowering that behavior.

21:47 It's on you.

21:48 It's on me.

21:48 It's on everyone who uses it to either, and we're going to get into this more in detail

21:53 later, use that as a learning experience or as a, well, I don't need to know that.

21:58 I'll just, whatever it says, right?

22:00 That was true with Stack Overflow as well.

22:02 Like you would go to Stack Overflow and you would just copy something.

22:05 And the knock on people who would take stuff from Stack Overflow and paste it was they

22:09 had no idea what it meant.

22:11 They just saw that it solved the problem.

22:12 There was even that joke keyboard that Stack Overflow created. All it had was like a control

22:17 and a C and a V and they had a Stack Overflow logo. It's like, this is all you need, right?

22:21 It was hilarious, right? I'd like to system prompt ChatGPT to not be so sycophantic

22:26 and treat me like people on Stack Overflow used to treat people sometimes as well.

22:29 My ego is doing way too well today. I need to be beat down. But the thing is like,

22:33 you could go to Stack Overflow and you could go, wow, okay, I didn't know that. And then you learn

22:37 it and you don't need to go back to Stack Overflow and copy that thing because now you've understood

22:41 something deep, and that's different. That's on you when you're copying from Stack Overflow, and it's

22:46 on you a hundred times over if you use these tools, right? Because a lot of times, especially the

22:51 agentic stuff, it explains what it does, like here's what it was, here's why I changed it. You could let

22:56 that scroll by, or you can go slow and study it and become smarter, not more brain-dead. You know what

23:01 I mean? If I was getting like Pandas or scikit-learn code from Stack Overflow, I'd really, like,

23:06 want to understand it, because that was my bread and butter. Whereas if it was front-end stuff,

23:11 Like I'd probably go and find the same issue question time and time again.

23:16 Same with like environment stuff, like getting environments working in Jupyter notebooks or

23:21 something.

23:21 I just, I still can't grok that stuff.

23:23 I know.

23:23 I was just thinking of bash scripts, like shell scripts.

23:27 I'm like, you know what?

23:28 This is just, I don't need to, I don't need to remember this.

23:30 I just bookmark that puppy.

23:31 And as long as it doesn't have rm -rf or something destructive in it, I'm just, right.

23:36 Incredible.

23:37 Okay, one more thing before we move on to like the AI depths that I think is, we got to talk about it.

23:43 Because today is October 7th, 2025, as we record this, not as I release it.

23:47 So I'll have to be a bit nostalgic for a few weeks.

23:50 But today is Python Pi Day.

23:53 Python 3.14 came out today, right?

23:56 And one of the main features of Python 3.14 is the free threaded aspect being sort of officially taken in.

24:03 And I know one of the big challenges has been solved with C extensions and Rust and other stuff, but it's still a bit of a challenge. It's like, I've got a ton of data.

24:10 I want to process it in my codes in Python.

24:12 How do I take advantage of the 32 cores I got?

24:15 Or do I get one thirty-second of a computer, right?

24:18 And so I think starting to think about parallel programming a little bit, it's going to take on whatever significance it takes, it's going to take on more than it has traditionally.

24:26 Totally agree.

24:27 And I think it's the data scientists who are going to need it more than anyone.

24:30 Yeah.

24:31 Our web frameworks handle that kind of stuff for us.

24:33 They fan that out into processes and other things.

24:36 But when you've got real computational stuff, there's no IO blocking that you can work around,

24:42 right, to leverage async, right?

24:43 You've got to do the CPU stuff.
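
A minimal sketch of the kind of CPU-bound parallelism the free-threaded build is meant to unlock; on a standard GIL build you would swap ThreadPoolExecutor for ProcessPoolExecutor to get the same effect.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def crunch(n: int) -> int:
    # Purely CPU-bound work: sum of integer square roots, no I/O to hide behind.
    return sum(math.isqrt(i) for i in range(n))

if __name__ == "__main__":
    chunks = [5_000_000] * 8  # eight independent chunks of work
    # On a free-threaded interpreter these threads can run on separate cores.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(crunch, chunks))
    print(sum(results))
```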

24:45 Exactly.

24:46 And it's a really good question because I would have a trillion percent agreed with you.

24:51 And I 100% do, but I would have a trillion percent agreed with you pre our ChatGPT moment, when

24:57 data scientists, ML engineers, all of these types of people weren't only building products,

25:02 serving models, that type of stuff, but they were responsible for training as well.

25:06 And I think you're totally right with large-scale analytics. Think about Dask and geospatial,

25:12 large-scale, multidimensional, geospatial, atmospheric data, these types of things,

25:17 and basic analytics. I'm not even talking about machine learning there, but we have entered

25:21 a regime now where you can build ML- and AI-powered products by pinging APIs or hosting your own models

25:29 and that type of stuff, whether it's from Hugging Face or wherever it may be,

25:33 or, you know, using Ollama locally. And in that case, I think because you're not doing the training

25:39 yourself, you're able to do a lot of things without requiring massive, massive compute. Yeah, there is a

25:44 bit of a, it seems like a big area, but it's a bit of a thin area, because you've got the regular

25:49 programming you can do, then you need that async, you need that parallelism for higher compute, but

25:54 you just don't go very far until someone says, fine, I'm doing it in Rust, I'm doing it in C++, or it's an

25:59 API, and then you don't need it again. You know what I mean? There's like a little stratosphere sort of

26:04 bit of it. And happy Python Pi Day. Yes, happy Python Pi Day. That's pretty cool. I've not even installed

26:09 it yet today, because uv has not shipped their support for it yet. And that's, when I mentioned

26:15 new modern tools like Marimo and Polars and DuckDB, uv has to be in there as well. I'm also very

26:22 excited about other package management, plus plus for lack of a better term, tools like

26:27 Pixi, that, uh, Wolf Vollprecht and his team, who I know have been on the show, and, you know, who

26:33 people may know from Mamba, which helped us with Conda so much. So uv is not the only story

26:39 out there solving these, working to solve these problems. I do think it's really interesting,

26:42 especially for the data science crowd, because things have gotten better for those of us that

26:47 use pip exclusively. It's like faster, a little bit better resolution, a little bit better workflow.

26:53 Like some of the tools are brought together, like pip-tools plus just regular pip. But I think it's

26:58 a bigger consideration for your side of the fence in that there's Conda. And now do you stick with

27:03 Conda or do you use uv? Like that's, they kind of compete more than pip did with uv, I think,

27:08 actually. I use uv. I agree. I think uv has pushed it over the edge. And then the pyx

27:12 that Charlie's released, if you're like an org, where they help with building like the layers of

27:17 machine models and PyTorch and stuff, it's pretty interesting. I also just want to say, on Python Pi

27:22 Day, that is so geeky, and I do want to show my geeky t-shirt to everyone, which

27:28 says Schrodinger's cat, wanted dead and alive. It's a wanted poster. So speaking of geeky stuff, that's

27:33 perfect. That's perfect. I just have a sweater. I'm sorry, I didn't prepare. Yeah, that's true.

27:38 I'm in a library, it's full of books, so I'm sure something geeky is behind me. All right, so

27:42 We've talked about maybe some of the other not so necessarily AI focused tools is like what people should focus on.

27:49 But like we both said, it's so transformative.

27:52 And if people haven't actually tried it and seen it in action, you got to see it to believe it.

27:56 Because I was a skeptic until my friend's like, no, let's sit down and let me show you.

27:59 I'm like, oh, OK, I get it.

28:02 And so let's talk about some of the AI tools.

28:04 I've done an initial slicing into different levels, which may be useful thinking through the evolution of these tools.

28:10 And how do we use AI to help us code?

28:13 And I jokingly call it level zero, because we zero-index here: copy and pasting from Stack Overflow.

28:20 Right, now that's not quite using AI, but it is using collective wisdom as opposed

28:24 to just the stuff in your own head.

28:25 Right.

28:26 So we've got that.

28:27 Okay.

28:28 But then after our ChatGPT moment, instead of copying and pasting

28:34 between Stack Overflow and their IDE, VS Code, Jupyter Notebooks, whatever it may be,

28:38 people started copy and pasting between ChatGPT and the IDE.

28:43 So you would, you know, let's say you get an error message from your IDE,

28:47 copy it into ChatGPT, give it a bit more context, whatever it may be.

28:50 And then such as, you know, give it the code you wrote,

28:53 plus the error message, plus your environment.

28:56 And it will be pretty good at helping you, depending on what packages you're using, what framework.

29:01 If you're working in PyData, absolutely fantastic, right?

29:05 Of course, it's trained on scikit-learn.

29:07 It's trained on Matplotlib.

29:08 It's trained on Seaborn.

29:09 It's trained on spaCy.

29:10 One of the biggest frustrations at that point was it wasn't trained on OpenAI's current

29:16 API.

29:16 So, even if you corrected it, it would always revert to the previous OpenAI

29:23 API.

29:23 Okay.

29:24 So level one, copy and pasting.

29:26 Level two, code completion in IDE.

29:28 So you can think co-pilot or whatever.

29:30 You're writing code and it will start suggesting things, right?

29:33 So that's just working your IDE.

29:35 Now, I think these things people probably know about.

29:38 But then level three is where things get really wild for me, where you actually, you have an

29:42 agent in your IDE or your terminal.

29:45 So Cursor, which is a VS Code fork, has an agent and a chat in it where you can have an

29:51 empty repository and say, hey, I want you to write a program that creates a RAG pipeline

30:00 over the documents in this subdirectory or something like that.

30:04 And it will go and do that immediately.

30:06 When you do that, it won't be great.

30:09 It'll do all types of nonsense.

30:10 It might write lots of directories and subdirectories.

30:13 So we can talk about some of the gotchas, but this is agentic coding where you're chatting

30:19 and it just throws stuff in.

30:21 I will also add, and this is something we mentioned briefly beforehand.

30:24 I don't like typing that much, to be honest.

30:26 So I use Super Whisper.

30:28 There are other tools to do this where I dictate to it.

30:31 And I also have a Stream Deck, which is what we mentioned beforehand, which a lot of content creators use. It's like buttons

30:39 and knobs that you can assign to macros. And so when I have a button that opens cursor, puts it in

30:45 agent mode, attaches Claude Sonnet 4.5 or Gemini 2.5, whatever it may be. Exactly. That's a Stream

30:52 Deck. And then I have accept code buttons and reject code and that type of stuff. So I can

30:57 actually build a not insignificant amount of software with my voice and three buttons,

31:02 which is so powerful.

31:04 I also find that if I do dictation, I can be a lot more patient and thorough

31:09 in a lot of ways, right?

31:10 Like I don't know about you, but I've had like RSI issues.

31:13 So I got to be real cognizant of like how much typing I do.

31:17 I've got my Microsoft Sculpt ergonomic keyboard that I drag with me everywhere

31:21 because square keyboards will destroy me in like a week.

31:24 And so it lets me go on without worrying about those kinds of things.

31:27 I use MacWhisper, but it's super similar, I believe, right?

31:31 It just uses the same underlying engine.

31:33 And it's really nice that you can, I do it for email.

31:36 I do it for lots of things.

31:37 I'll just give that.

31:38 I love how you mentioned it does help one be more patient because yeah, when typing, well,

31:42 when chatting with an AI, you can get frustrated and the friction in typing and correcting yourself

31:47 and that type of stuff, you just don't have when speaking natural language.

31:51 Yeah.

31:51 And I find, I mean, this is not scientific, but my experience has been

31:55 that when people say like, oh, this stuff is not good,

31:57 it always just gives me junk results and so on, so often there's not enough information given.

32:03 It'd be like, create me a graph, not use Plotly to create this type of graph with this type of

32:09 focus from this data like I did before. You know what I mean? Those are really different things.

32:14 And the more specificity you can give these tools, the better. I use a lot of AI stuff.

32:20 And I would say I have plenty of prompts that are pages long. And I was like, here's a file,

32:26 Here's four pages of what I want you to do with it.

32:28 Let's go.

32:29 And it is not always writing.

32:30 So, but incredible results compared to if you just say, you know, analyze this or whatever.

32:34 We'll get to this when we talk about gotchas.

32:36 But having a conversation with your system before getting it to write anything is incredibly important and productive.

32:46 This portion of Talk Python To Me is brought to you by NordStellar.

32:49 NordStellar is a threat exposure management platform from the Nord security family,

32:54 the folks behind NordVPN that combines dark web intelligence, session hijacking prevention,

32:59 brand and domain abuse detection, and external attack surface management. Keeping your team and your

33:04 company secure is a daunting challenge. That's why you need NordStellar on your side. It's a

33:10 comprehensive set of services, monitoring, and alerts to limit your exposure to breaches and

33:16 attacks and act instantly if something does happen. Here's how it works. NordStellar detects

33:21 compromised employee and consumer credentials. It detects stolen authentication cookies found in

33:27 InfoSteeler logs and dark web sources, then revokes live sessions and flags compromised devices,

33:34 reducing MFA bypass ATOs without extra code in your app. Nordstellar scans the dark web for

33:40 cyber threats targeting your company. It monitors forums, markets, ransomware blogs, and over 25,000

33:47 cybercrime telegram channels with alerting and searchable context you can route to Slack or your

33:53 IRR tool. Nordstellar adds brand and domain protection. It detects cyber squats and lookalikes

33:59 via visual, content similarity, and search transparency logs, plus broader brand abuse

34:05 takedowns across the web, social, and app stores to cut the phishing risk for your users. They

34:11 don't just alert you about impersonation, they file and manage the removals. Finally, Nordstellar

34:16 is developer-friendly.

34:17 It's available as a platform and API with integrations for Splunk,

34:21 QRadar, Datadog, Sentinel, Elastic, and Cortex.

34:24 No agents to install.

34:25 If security is important to you and your organization,

34:28 check out Nordstellar.

34:29 Visit talkpython.fm/nordstellar.

34:31 The link is in your podcast player's show notes and on the episode page.

34:35 Please use our link, talkpython.fm/nordstellar,

34:38 so that they know that you heard about their service from us.

34:41 Thank you to the whole Nord security team

34:44 for supporting Talk Python To Me.

34:46 Isaac Flath, who has a wonderful course on Maven called Elite AI Assisted Coding,

34:51 which I'm actually starting as a student this week. And I may give a guest talk there. He wrote,

34:55 I don't know whether he came up with it, but he calls it Socratic and dialogue

34:59 driven development where you essentially pair program with the AI. Don't expect it to do

35:03 everything, but have, have conversations. So the other thing is a lot of these agentic systems like

35:08 cursor, which I use daily, probably a bit too much. You can plug into any, you know, state of the art

35:15 API. So, you know, Sonnet 4.5 came out recently and the day after you could use that in Cursor.

35:20 Gemini 2.5 came out a while ago, then you could plug it in. GPT-5 and so on. The other thing I

35:26 just wanted to give a shout out to is Continue. Tyler Dunn, speaking of modern tools and open

35:31 source tools, it's like an open source cursor of sorts. And I mean, he wouldn't frame it that way,

35:37 but you can have all your own local models, data preserving, privacy preserving, and use those in

35:42 in this way. So just going back to this slicing though, level one, copy and paste code, level two,

35:48 code completion, level three agents in an IDE or terminal. So Claude code, for example, can be in

35:54 your terminal, cursor, VS Code, fork. Level four is embedded in other tools like Slack or Discord or

36:01 email. You can tag Cursor in Slack, so if you notice a documentation fault, you can tag it in

36:07 Slack to fix that.

36:10 Manus, you can tag in email threads now.

36:13 Level five is more proactive.

36:15 So all of these are reactive systems.

36:17 Level five is more proactive.

36:19 So you can have Cursor, for example, and a lot of these systems,

36:21 I just use Cursor the most, to do code review in CI,

36:24 in continuous integration.

36:26 So whenever I submit a PR, Cursor can come in and do a code review there.

36:31 Now I'm getting a bit into future music.

36:33 At level six, we've got async or background agents

36:36 that just do stuff in the background, essentially,

36:39 which we're going to see a lot more of.

36:41 And then level seven, which we haven't seen so much: proactive agents.

36:45 And I think these are going to be huge that will just notice stuff happening in production.

36:49 Like, hey, we have this outlier here.

36:51 Oh, this didn't quite work.

36:53 Or agents that come to me on a Monday morning and be like, hey, check this out.

36:57 Check that out.

36:57 Check that.

36:58 Like a good colleague, right?

37:01 A team member.

37:02 Yeah.

37:02 So, but we're already at, you know, level five and kind of got background agents as well.

37:08 So we're getting, yeah, to a lot of really exciting places.

37:13 Yeah.

37:13 So I'm a hundred percent bought in up to level four, probably, like certainly the agentic coding,

37:19 a little bit of the code review, not so much in CI, but like looking at the stuff that

37:22 I'm, I might ask it like, Hey, what's going on here?

37:25 Why is it like this background agents?

37:27 I just haven't, I haven't got there.

37:29 They seem, they seem like they don't have enough to work with, right?

37:32 they don't have my whole machine and all the setups and all the things they need. But I can

37:36 see how they would be useful. Certainly the way you're describing like a good colleague.

37:39 And so one example is not necessarily in software, but background agents, and I haven't done this,

37:44 but I've got friends and colleagues who've built background agents that monitor their inbox and

37:48 will ping them like their email, sorry, and will cluster emails and be like, hey, you really should

37:53 reply to this one. This is a prospect or a client that really needs your attention right now.

37:57 I think that kind of stuff would be really neat. I will throw out there now, this comes from

38:00 Sentry, who is a sponsor of the show, just to be fair, but I'm not doing this because they

38:04 sponsored it. They added this thing called Seer, S-E-E-R, that when your app collects an exception

38:10 or something or doesn't collect it and it gets up there, depending on how you send it up there,

38:15 is it'll apply AI to whatever it receives. And it'll look, if you bind it to your GitHub and so

38:19 on, it'll like try to understand the project. And maybe by the time you get to look at the bug

38:25 report, it actually has a solution suggested as well, which, and it'll do a PR, which that's on

38:30 the verge of what you were suggesting as this sort of like proactive buddy that's just hanging out

38:34 there. As I said, I do want to mention just a few gotchas. Or actually, let me just say some of the

38:40 really powerful use cases in data science, since we are talking about foundation models for data

38:45 science, just writing code. I mean, text-to-SQL, all these LLMs are trained on that now, right? So

38:52 writing SQL code based on what you say or what you type in natural language. Now, chat with it

38:57 beforehand so that you make sure it understands your schema. It may not be great at complex joins,

39:03 this type of stuff. Understand the system you're working with, get a feel for how it works.
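
A minimal sketch of that schema-first pattern, assuming the OpenAI Python client; the table, schema, and model name here are hypothetical placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical schema: showing it to the model before asking for SQL
# is the "make sure it understands your schema" step.
schema = """
CREATE TABLE orders (
    order_id INTEGER,
    customer_id INTEGER,
    total_usd DOUBLE,
    created_at TIMESTAMP
);
"""

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You write DuckDB SQL. Use only tables from the provided schema."},
        {"role": "user",
         "content": f"Schema:\n{schema}\nWrite a query for monthly revenue in 2024."},
    ],
)
print(resp.choices[0].message.content)
```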

39:07 Actually think of it as like a super excited ADHD-esque, perhaps slightly autistic as well.

39:14 And I mean that with all the love in terms of, you know, absolute like deep memory, be able to recall

39:19 a lot of information. And on the ADHD spectrum, in terms of just how it will do, like it will spread

39:26 its attention over a lot of different places and create lots of different stuff, some of which may

39:30 not work, but some of which will be incredible. So have empathy for your system in that sense,

39:34 but writing SQL code, amazing. PyDataStack, incredible. I want to run an idea by you and

39:39 just see what you think here with what you just said with like, think of it as this super excited,

39:44 somewhat scatterbrained junior helper, excited friend. If you had hired somebody, even if they

39:50 went to Stanford, but they hadn't really done work on any like major data science projects,

39:54 and they came to your company and you gave them a job,

39:57 would you expect 100%, like absolutely 100% correctness?

40:00 And the answer is, no.

40:01 So I don't know why people expect the AI to be literally 100%.

40:05 I think I have an idea why, but expecting the AI to be 100% right,

40:10 where it's kind of doing some of this human-level type of reasoning.

40:13 I mean, not thinking, I'm not saying that, but this kind of problem, this creative problem solving,

40:18 we should have a little bit of patience for if it gets it wrong,

40:21 especially if we give it poor directions.

40:23 You know what I mean?

40:23 And I just feel like people so often think, well, it's a computer and it's wrong, so it's trash.

40:29 It's no good.

40:29 Well, was it 95% right?

40:31 Because that's really helpful.

40:32 This is so important.

40:33 And there are several things in there.

40:34 Firstly, I think after our ChatGPT moment, Jeremy Howard noticed a complaint where people were like, oh, I need to chat with it, like it doesn't do the thing the first time and I need to correct it.

40:44 And Jeremy was like, what type of human don't you need to have a conversation with to learn stuff?

40:48 The other thing is in nearly all of these systems now, you have memory.

40:54 And cursor, for example, has cursor rules.

40:55 Where if you notice stuff that it does or doesn't do, you put it in the rules.

40:59 You can have project-specific rules.

41:01 You can have cursor general rules, this type of stuff.

41:04 Such as always explain your reasoning.

41:06 And to your point, I don't think these things think, but they mimic reasoning and thought

41:10 in pretty sophisticated ways.

41:12 So I totally agree with that.

41:14 Also, use these things to not only write code, but to write tests for the code it produces, right?

41:21 To debug, to add assertions.

41:24 Now, also make sure you're always reading the code on average that it writes.

41:30 If things are really important, it's about your appetite for risk.

41:32 Maybe if things are really important, make sure you know what the code does and read it.

41:37 If things aren't as important, maybe you don't need to.

41:40 And an example I'll give there is one of the biggest wins here is being able to vibe code your own data viewers.

41:47 So let's say I'm building.

41:48 100% I agree with you on this.

41:50 Incredible, right?

41:51 So if I'm building an LLM-powered app, okay, and I've got all these conversations, I can vibe code a bespoke custom viewer to view them.

42:02 If it's an agent that writes emails, I can even build a viewer that displays the conversations

42:08 as emails or whatever it may be, right?

42:10 Now, I need to make sure that it's displaying the correct information.

42:14 So I need to make sure that my traces are actually looking correct.

42:18 So I need to understand that code.

42:20 But the ins and outs of the front end it's building, not the biggest deal for me.

42:24 So figure out what's important and what isn't.

42:26 There's so many tools like this that are like, it's not worth me taking a week to write that.

42:31 But wouldn't it be cool if I had that?

42:33 And nothing depends on it.

42:34 It's kind of like its own side little utility.

42:37 There's no, it's not a building block that's going to become an important foundational thing.

42:41 It just, if it works, you're like, that's awesome.

42:43 I have that.

42:44 People got to take advantage of just going like, oh, I need a utility that does X

42:49 or I need a view in our admin section of the web app that does this.

42:53 Nothing depends on it.

42:54 If it works, amazing.

42:55 If it doesn't, well, it didn't exist anyway.

42:57 So whatever.

42:59 And there's so many opportunities.

43:01 I love this so much. And I want to take a slight detour because it's so important. I actually think the surface area of what software is, is expanding and changing completely. So let's just look at, take a bit of a sociological big-picture look at history, where software, classically, has been very expensive to build, right?

43:22 You've needed to pay a lot of not inexpensive engineers to build out a sophisticated product.

43:29 For that reason, you've had to get a lot of demand.

43:32 You've had to have a lot of surface area of the market.

43:34 What that's meant is that you've needed to cover a lot of edge cases to satisfy a large

43:41 market so that you can make revenue based on the costs that you're accruing, right?

43:45 Now, with the ability to even vibe code or use AI-assisted coding to build this type

43:52 stuff, it changes what we can build and what it needs, what type of people it needs to satisfy.

43:58 So that's why the conversation of like vibe coding is going to bury Salesforce or SaaS versus like

44:04 old men screaming at clouds, vibe coding sucks or whatever it is, misses the entire middle ground

44:10 of the types of things that are possible. And I do think internal products, such as we've talked

44:15 about data viewers, internal products, such as, you know, lots of companies are ripping out their

44:20 marketing automation stacks and building things internally through to, I mean, I've got a friend

44:26 who built a chess tutor app. I mean, it's not chess.com, but like a hundred of his friends

44:32 use it, right? So the idea of the different types of software we can build now and thinking about,

44:38 you know, ephemeral software or just in time software, disposable software, fast software

44:44 that we can build to solve a problem right now and then move on. I think we need to shift our

44:48 model of what software actually is. I 100% agree. And it's such an exciting time. I also think this

44:55 actually has an implication for PyPI packages and a lot of these just external packages. So think

45:01 about the ones you use, like this is not going to happen for pandas or polars or Jupyter or

45:05 something like that. But how often have you gone like, oh, I need, I need a package that will let

45:12 me look up both my IPv4 and IPv6 IP address, and you'd take that as a dependency,

45:19 that's like using one function out of whatever thing you grab that might give you that

45:23 information. And there's a lot more opportunity to have fewer building blocks and

45:28 dependencies that you need in your application, if you can just say, hey, agent thing, I need this, and

45:34 put it in its own file, and boom, now you're not dependent on, well, did that thing upgrade to

45:38 Python 3.15? Sorry, you're stuck, it didn't. You know what I mean? Like, you're sort of free

45:44 from these little weird, I depend on this whole tree of dependencies because I need

45:50 one little piece of functionality. You can vendor in stuff a lot easier if it's low stakes.
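
A hypothetical example of that kind of vendored-in utility: looking up the machine's IPv4 and IPv6 addresses with only the standard library, instead of taking on a dependency for one function.

```python
import socket

def local_addresses() -> dict[str, list[str]]:
    """Return this host's IPv4 and IPv6 addresses, grouped by family."""
    addrs: dict[str, list[str]] = {"ipv4": [], "ipv6": []}
    for family, _, _, _, sockaddr in socket.getaddrinfo(socket.gethostname(), None):
        ip = sockaddr[0]
        key = "ipv6" if family == socket.AF_INET6 else "ipv4"
        if ip not in addrs[key]:
            addrs[key].append(ip)
    return addrs

if __name__ == "__main__":
    print(local_addresses())
```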

45:54 The other thing that I've seen AI-assisted coding help with is, now this is,

45:59 this will be a bit controversial, but it's going from prototype to production. And what do I really

46:03 mean by that? Let's say your production stack is in like Databricks, Spark, whatever. You can

46:08 prototype, write your pandas code. And then because once again, Spark, Databricks, all that documentation

46:14 is in training data. You can then convert it relatively easily. But of course, read, test your

46:18 code that you're pushing to prod just as you would with a junior software engineer, right?

46:24 The other thing is some gotchas with this. It'll do lots of things you don't ask it to do, okay?

46:30 such as it will just start downloading packages, for example,

46:35 if you're trying to write some code, right?

46:37 And so it'll do lots of things you don't ask it to do,

46:40 and it will also do things you ask it to not do, right?

46:44 So you'll say only write in one file, and Devin, for example, will create nested subdirectories

46:51 each with 15 notebooks, something like that.

46:54 It will also forget lots of things in your conversation.

46:58 So make sure to remind it of things.

47:02 It will do something called dead looping.

47:04 And this is one of the most frustrating and pernicious things, depending on how long the

47:07 loop is.

47:08 But it will try to solve your most recent concern.

47:11 It will really optimize locally around the conversation.

47:15 And so it will solve that.

47:16 Then another error will appear and it will solve that.

47:18 And then another error will appear.

47:20 It will solve that.

47:21 And after a while, it might actually go back to the initial state.

47:24 So that's what dead looping is.

47:25 And this can be a loop of three.

47:26 And it just cycles.

47:27 Yeah.

47:27 five, seven, nine. So be very careful about that. Solutions are: get it to zoom out. Say, hey,

47:33 let's zoom out and have a holistic conversation about this. Now, this is wild that this is how

47:37 we're interacting with software as well. Now, this is part of the- It's science fiction.

47:41 It really is. Science fiction. But write product requirement docs with it before any code. Now,

47:46 it may still just start writing code, even when you tell it to do that, right? Plan with it.

47:50 Write rules. Have empathy for your super excited, bright, fast, forgetful intern.

47:57 I've really embraced this plan thing.

47:59 All my major projects have a plans folder.

48:01 And every time I start, I create a markdown file.

48:04 I say, we're going to plan it out.

48:06 And I want you to write in this markdown file what we're going to do.

48:09 And I'll do that with a really expensive model, you know, like a thinking

48:12 something or whatever, and it'll do it.

48:14 Then I'll switch.

48:15 I'll completely throw away that chat.

48:16 Get another one, a little lower model.

48:18 Say, we're going to do phase one of this plan.

48:19 Let's go.

48:20 And then two and then three.

48:21 And every time I say, when you're done, you update the plan.

48:24 So you know where we are and what we've done.

48:26 What's not.

48:27 And it's tremendously successful.
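
A hypothetical helper for the plans-folder workflow described here: scaffold a dated markdown plan with phases that you, or the agent, tick off and update as the work proceeds.

```python
from datetime import date
from pathlib import Path

def new_plan(slug: str, phases: list[str]) -> Path:
    """Create plans/YYYY-MM-DD-<slug>.md with one checkbox per phase."""
    plans = Path("plans")
    plans.mkdir(exist_ok=True)
    path = plans / f"{date.today():%Y-%m-%d}-{slug}.md"
    lines = [f"# Plan: {slug}", ""]
    for i, phase in enumerate(phases, start=1):
        lines.append(f"- [ ] Phase {i}: {phase}")
    path.write_text("\n".join(lines) + "\n")
    return path

if __name__ == "__main__":
    print(new_plan("email-triage-agent", ["design schema", "build viewer", "add evals"]))
```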

48:28 There is a concern of sometimes you want memory from one chat.

48:32 Like sometimes a chat just degrades.

48:34 So start a new conversation.

48:35 And there are clever ways, different for different products and models, but to get it to summarize

48:40 the conversation so far, like literally say, I'm going to take another instance of you.

48:44 Let's squash this so I can pass the important memories to it and so on.

48:48 The other thing is Cursor, a lot of these products have, it used to be called YOLO mode on Cursor,

48:53 Like Y-O-L-O mode, right?

48:56 Where it just executes.

48:58 I run in YOLO mode, by the way, I do it.

49:00 I have too much anxiety around that.

49:02 So here's the thing that I've really noticed that's interesting:

49:06 my Git discipline is significantly better now, if I do that.

49:13 So I'll open it up and I'll go and I'll start having a chat

49:16 and I'll make sure everything is checked in.

49:18 I might do a separate branch if I think it's going to go bonkers.

49:21 And then it'll do a little bit of work.

49:22 And I'm like, okay, that's successful.

49:23 So I'll stage those changes, but not even commit them.

49:26 And then I'll let it keep going and it'll create more work.

49:29 And if I see it's successful, I'll just, like, stage it, and then eventually I'll commit it at the end.

49:33 It won't even push it necessarily.

49:35 And then you can also, while it's running, have the get diff window open

49:39 and you can just sort of see what it's doing by looking at the diffs that start to appear.

49:43 And so that's why I'm okay with YOLO mode because I can always just get revert and we're fine again.

49:49 Totally.

49:50 And also, I didn't mention this, but these AI assistants are great at looking at diffs with you.

49:55 And I also, I'm really thinking through what happens when we have so much AI generated code.

50:01 And I think part of the future of work, I don't want to get too sci-fi-esque, but how many agents can you manage simultaneously?

50:08 Like maybe the superstar employees will be people who can manage 100 agents.

50:12 It's like the revenge of the program manager, I'm telling you.

50:15 And in fact, project management, product management are some of the most important skills moving forward.

50:20 But also when you have so many agents generating code, what happens to CICD?

50:24 What happens to Jenkins?

50:25 And there's going to be a whole new space of products and agents that may deal with these

50:31 types of systems, right?

50:32 Look at the Industrial Revolution, what sprung up when looms appeared, and the

50:37 satellite industries that happened there.

50:39 So I think we're talking about job automation and job displacement, but the

50:43 amount of new jobs that will become available, I'm actually very excited about.

50:48 And we'll put in the show notes, please.

50:50 But there's an essay by Tim O'Reilly.

50:52 And I did a podcast with him about it, actually, that we can link to if you're up for it, called

50:55 The End of Programming As We Know It.

50:57 OK.

50:59 And he essentially, with his great depth of historical knowledge and his forward thinking

51:04 through vectors, presents a really wonderful vision and ideas around where software is

51:10 heading.

51:11 Exactly.

51:11 What about exploring data?

51:13 So we've been talking about mostly writing data, testing, writing code, testing code in

51:18 the context of data science, but I think it's actually really pretty powerful to say, here's a

51:23 CSV. This is basically what it means. Let's start looking at, get me some graphs, pull me out some

51:29 trends. What's important? What do you see that I didn't see? What do you think about this, like

51:34 this exploratory analysis side of it? And that's not even, you know, that doesn't even worry you

51:38 about like, is there maybe a bug in the code because I don't want to put it in production,

51:42 right? It's just, it's fooling around to get a jumpstart on understanding.

51:45 So firstly, exploratory data analysis is one of a scientist's and data scientist's most important

51:51 jobs. So the question then becomes, how can we get AI to help us see what's happening that's so

51:57 integral to what we do? And the truth is they're wonderful. They can be wonderful at pulling out

52:02 insights that I just haven't noticed, or I don't even think about how to visualize. Now, if I ask

52:09 it to find the mean or median, it may suck at that on average, unless it writes the code to do so,

52:15 But when you get it to do EDA or exploratory data analysis, it will provide insights that I haven't thought of.

52:23 So one example I saw recently from a client was throwing in thousands of rows of customer data and website data of customers.

52:33 And it immediately showed clusters of high usage versus low usage.

52:39 You could see power users.

52:40 You could see power users who spent a lot on the platform.

52:43 You could see power users who didn't spend much on the platform.

52:46 And this is the type of work that takes hours for data scientists to sift through and develop

52:52 hypotheses around.
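
A hedged sketch of the kind of exploratory analysis being described, with hypothetical column names: cluster customers by usage and spend so power users who do and don't pay much stand out.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer data with usage and spend columns.
customers = pd.read_csv("customers.csv")
features = customers[["sessions_per_week", "spend_usd"]]

X = StandardScaler().fit_transform(features)
customers["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Average usage and spend per cluster: high-usage/high-spend vs high-usage/low-spend, etc.
print(customers.groupby("cluster")[["sessions_per_week", "spend_usd"]].mean())
```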

52:53 So when we're talking about, you know, data science is exploration and hypothesis driven,

52:58 right?

52:58 So when we're talking about exploration and hypothesis driven data science, it's wonderful

53:03 there.

53:04 And once again, this isn't to replace what we do, but it's to help us.

53:07 It's having a thought partner, which is fast.

53:11 Beyond the computer, a different bicycle of the mind.

53:13 of a sort. Yeah. Maybe like a, maybe it's like an e-bike of the mind. What do you think?

53:17 I love it.

53:17 Yeah. E-bikes. I love e-bikes.

53:19 They're awesome. And I especially do because I live in Sydney, right? So getting to the beach,

53:24 it's actually quite hilly depending where, where you are. And the other thing I, yeah, when,

53:30 so hopefully we have time to talk about this, but something I do a lot of work on is what I call

53:35 evaluation driven development for LLM and AI powered applications where a really important part

53:41 of this is error analysis and failure analysis. So seeing where your applications fail. So let's

53:49 say we're building a chatbot, which is RAG. So it has some corpus of documents and it retrieves

53:54 stuff from it. And we want to interact with it. When we're building that, some of the first steps

54:00 and ongoing process is seeing what failure modes they are. Is it hallucinating or is it not

54:05 retrieving things correctly? Is it looking at the wrong documents? These types of things. And you do

54:10 that to drive the development and iterative process of AI powered software. Now you can also use AI

54:18 to look at that and look at the results in the data exploration. And it can really bring out a lot of

54:24 different failure modes. It can say, Hey, look at this cluster of conversations where the user

54:30 finally got the correct response, but they asked to be connected to a representative first,

54:35 and they didn't get connected to a representative. So this is actually an example where the conversation

54:39 looks like it's resolved. The support ticket was resolved or whatever it is, but there's a deep

54:44 failure mode within there. So AI can be very good in terms of the data exploration and hypothesis

54:50 driving process with that. So that was an example, but yeah, should we get into building LLM powered

54:56 software? Yeah, sure. So I wonder whether you'd like to bring up the figure that I will talk

55:01 through because I know there'll be people listening and I'll share it in the chat with you. So this is

55:06 a slightly tongue-in-cheek figure. It speaks to real pain we have. So on the x-axis, we have time

55:11 and we're talking about building software. On the y-axis, we have excitement or dopamine, if you will,

55:16 if you want to measure it. And for the scientists out there, I apologize for not having units on my

55:20 axes, but it's all good. We know with traditional software, excitement increases over time. Things

55:26 are pretty boring at the start. You have a hello world and basic features and add unit tests,

55:30 then you scale and optimize and load balance. And over time, excitement increases and increases

55:34 and increases. This is absolutely inverted with generative AI powered software, right? Where you

55:39 have a flashy demo at the start. You're like, wow, check this out. Okay. Then you're like, oh, wait,

55:44 does it actually work? Or can someone else use it? And so you have issues with basic functionality

55:49 then. So excitement goes down. Then you're like, oh no, all these hallucinations, excitement goes

55:54 down. Then you have monitoring challenges. You're like, how can I even look at all my conversations

55:59 and tool calls and actions and this type of stuff? Excitement goes down. Then you have integration

56:03 issues? How do I integrate into my enterprise stack? Excitement goes down again. And I don't

56:07 think it's a coincidence that this is one of the most exciting technologies of a generation that's

56:14 totally addicted to Instagram as well, right? But the question is, and a lot of work I do,

56:18 is in helping people raise the curve, not change anything, not even change the excitement of the

56:23 flashy demo, but just make sure that excitement goes up and up and up as you build. And once again,

56:30 There's no free lunch.

56:30 You've got to do this the hard way, right?

56:33 So this is something, and a short plug for a course I teach,

56:36 but this is something I teach a lot on.

56:37 And I teach a Maven course, building LM powered applications

56:41 for data science and software engineers with my colleague, Stefan Krawczyk,

56:45 who he works on agent infrastructure at Salesforce.

56:48 I bring a lot of the data and ML stuff.

56:50 He brings a lot of the ops stuff.

56:52 But yeah, in this course, we teach a lot of these things.

56:56 And the way we do that, one way to think about it is what we call evaluation-driven development. So that's not

57:03 necessarily having very sophisticated evals and tests and that type of stuff to start,

57:09 but it's about having a sense of where you want your product to go and having a sense of evaluation

57:16 of how to drive it in that direction. And the skillset that's required for this is so similar

57:22 to people who've built data science and ML powered products, right? So it's a curiosity to explore

57:28 data. It's a hacker mindset. It's experimenting with different tools. And what have we been doing

57:34 in the PyDataStack? I mean, look at the data viz landscape in Python. I think it was maybe PyCon

57:39 2018 that Jake VanderPlas, I think, first gave his talk, you know, data visualization in

57:44 Python. And, you know, there were so many tools one could use there. It's this type of mindset

57:49 you need. And then in terms of the product workflow, how you make sure you don't kind of go down this

57:56 curve, lose excitement in what I call proof-of-concept purgatory. People call it the plateau of

58:00 productivity. There are all types of fun names about it. But what you do is you use the machine

58:06 learning mindset, which is get some data in and out of your system. If you don't have users yet,

58:11 you can generate synthetic data to do so, right? Then you label it, pass or fail initially. And

58:19 you give it a failure mode, such as: it was hallucination, retrieval error, wrong tool call.

58:25 And a tool call essentially is what an agent does. So an agent is an LLM plus tool calls. A tool call

58:30 could be ping an API, send an email, whatever it is. Agent plus tool call in some, you know, while loop

58:36 or for loop, right? And so you have, did this particular interaction pass or fail, what was the

58:43 failure mode, and then how do I fix these particular failure modes? And I mean, one way to do this

58:49 initially, depending on the complexity of your system, is to put all of this in a spreadsheet

58:52 and do a pivot table. I know AI engineers hate it when I tell them to do pivot tables, but if you

58:57 rank order, use a pivot table to rank order your failure modes by frequency, then you can see what

59:03 to fix first. And if it's a retrieval error, maybe you want to fix the rag part of your system and

59:08 the retrieval part, right?

59:09 As opposed to the generative part.

59:11 So focus on your embeddings or chunks.

59:13 If it's a tool call, focus on how that tool call is defined,

59:17 heuristics that the LLM uses there.
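
A minimal sketch of the labeling and pivot-table step described above; the file name and columns are hypothetical stand-ins for whatever traces you log.

```python
import pandas as pd

# Hypothetical labeled traces: trace_id, label ("pass"/"fail"), failure_mode.
traces = pd.read_csv("labeled_traces.csv")

failures = traces[traces["label"] == "fail"]
by_mode = (
    failures.pivot_table(index="failure_mode", values="trace_id", aggfunc="count")
            .rename(columns={"trace_id": "count"})
            .sort_values("count", ascending=False)
)
# The most frequent failure mode is what you fix first, e.g. retrieval before model swaps.
print(by_mode)
```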

59:20 A lot of the time, if you're doing...

59:21 I would like to add, well, this is great.

59:24 I would like to add that you can use those agentic coding tools

59:27 to help build better analytics.

59:29 Absolutely.

59:30 Other support.

59:31 So you're like, God, we don't really have,

59:33 we can't really track that, right?

59:34 Like, well, give it half an hour and you can, you know what I mean?

59:37 Exactly.

59:37 And so you got to kind of think out of the box and be a little creative there.

59:42 Maybe build an MCP server that specializes in solving, or a specialized LLM that addresses

59:48 a shortcoming, where you can then force it to use the MCP server and work more constrained

59:52 or whatever.

59:53 Exactly.

59:53 And as we kind of hinted at earlier, one of the wacky things here is that a lot of this

59:57 comes down to prompting.

59:58 And people like, should I fine tune or prompt engineer or rag or that type of stuff?

01:00:02 Prompt and prompt and prompt initially, because you can get so much lift by prompting.

01:00:08 Now, if it is actually a retrieval error, perhaps you want to improve your embeddings

01:00:13 or your chunking strategy or that type of stuff.

01:00:15 The other thing is, of course, data, metadata, data ingestion.

01:00:19 Draw an architectural diagram of your system where you see, you know, you have your RAG,

01:00:23 you have your output, you have your embeddings, you have your OCR, or if you've got PDFs,

01:00:27 however you're ingesting your data, a huge amount of the time,

01:00:30 fixing how you do your OCR on your PDFs will be far more significant lift

01:00:36 than switching out to Claude Sonnet 4.5, okay?

01:00:40 At that point, I totally understand why people want to try the newest and sexiest model.

01:00:45 And I'm not telling people not to.

01:00:47 What I'm saying is focus on the fundamentals.

01:00:50 And then when you have this set of evals of labeled data of what works, what doesn't,

01:00:57 you want a test set, essentially, right?

01:00:59 Like in machine learning.

01:01:00 So it's the same process.

01:01:02 You have this test set, your gold standard, which ideally covers, has coverage over all

01:01:06 your failure modes.

01:01:07 You want eval coverage.

01:01:09 Then when you switch out to a new model, you can see how it performed on your test set.

01:01:13 Imagine that.

01:01:14 Imagine being able to switch out a model and seeing what's up there.

01:01:18 You can say it's better concretely, with data, not just that it feels better, which is often the case.

01:01:23 There are all these eval conversations about we don't want these evals.

01:01:26 We want online evals.

01:01:27 And then there are people who do things only by vibes.

01:01:29 And all of these things are absolutely valid as well.

01:01:32 I think it's a healthy combination for whatever your product needs at any point in time.
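
A hedged sketch of rerunning a small gold-standard test set when you swap models, assuming the OpenAI Python client; the questions, substring checks, and model names are hypothetical stand-ins, and a real project would use task-specific checks or an LLM judge.

```python
from openai import OpenAI

client = OpenAI()

test_set = [  # hypothetical gold examples, each with a minimal expectation
    {"question": "Which Python tool gives you reactive notebooks?", "must_contain": "marimo"},
    {"question": "Which database is great for local analytics?", "must_contain": "duckdb"},
]

def pass_rate(model: str) -> float:
    passed = 0
    for case in test_set:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["question"]}],
        )
        answer = resp.choices[0].message.content.lower()
        passed += case["must_contain"] in answer
    return passed / len(test_set)

for model in ["gpt-4o-mini", "gpt-4o"]:
    print(model, pass_rate(model))  # concrete numbers, not just "it feels better"
```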

01:01:36 I do want to take just a couple minutes, literally, and get your thoughts on people coming into

01:01:41 the industry.

01:01:42 Data scientists who are just graduating now, or they got their first job, or they're about

01:01:47 to get their first job.

01:01:48 Like, I imagine this is something of a scary time.

01:01:50 I think, well, now I'm not just competing with all the other people.

01:01:53 Now these, like, AI things are competing against me getting a job as well.

01:01:57 But I think it's both a blessing and a curse.

01:02:00 What do you think?

01:02:00 Focus on three things.

01:02:02 What value you can deliver?

01:02:03 What's your skill as a data scientist?

01:02:05 And it is looking at data.

01:02:07 It is building and it is tying that to business value.

01:02:10 So if you focus on your skills and you build, build, build and consistently tie it to business

01:02:15 value, I think you'll go a long way.

01:02:18 And this actually speaks to how we think about evaluation more generally.

01:02:23 And I do want to give a shout out to a Sydney-based company called Lorikeet. They build customer support agents for all types of industries.

01:02:33 And when they do evaluation, they always evaluate was the ticket solved or not.

01:02:39 That's the most important evaluation.

01:02:40 It's not did this LLM call result in what we wanted it to be or anything along those lines.

01:02:45 Of course, once they see a failure, they tie it back to these more technical LLM-based failure modes.

01:02:53 But I wonder if, yeah, maybe Google Lorikeet AI or something like that.

01:02:58 I had it in another tab.

01:02:59 I don't know.

01:02:59 I'll get it back.

01:03:00 One thing I love about, and I'm indexing on them.

01:03:03 There are lots of companies that do this.

01:03:05 But one thing that's very interesting about them is that their pricing model is ticket resolution based, right?

01:03:12 So if they've built a concierge or customer service agent for you, you pay based on how many tickets are resolved.

01:03:19 A lot of their competitors, their pricing is based on tokens, right?

01:03:22 Because that's what they pay for.

01:03:24 And in terms of aligning incentives, having it based around resolution is incredibly important for how you build as well.

01:03:32 And the reason, the question was, what should early stage data scientists focus on?

01:03:37 The reason I detoured on that story is supreme focus on business value, how your skills can be tied to business value via building stuff.

01:03:46 It's great advice.

01:03:47 And I will just throw one more thing out there for people is don't let these AI tools undercut your desire to actually learn the details.

01:03:56 If you just go like, all right, I asked it and it gave me the answer and it just streamed by and you didn't pay attention, you're doing it wrong.

01:04:02 You've got to stop, read, pay attention, question why did it do that.

01:04:06 You can even ask it, why did you do this?

01:04:08 That way, it might give you a documentation link.

01:04:11 Like you got to stay active and it's so easy to just go next, next, next, because it's exciting

01:04:17 that it's building something.

01:04:18 Absolutely.

01:04:18 And I will add one other thing there, which is we need to carve out time.

01:04:22 These things won't always work.

01:04:24 You can spend a day working with AI and make less progress than if you'd done it yourself

01:04:28 as well.

01:04:28 I want to be very clear about that.

01:04:30 What we all need to do is figure out organizations we can work at, work with, and then time ourselves

01:04:36 to experiment with these seriously emerging and rapidly changing technologies.

01:04:42 So it won't always be wins is my point.

01:04:45 So don't get discouraged when it isn't.

01:04:47 Awesome, Hugo.

01:04:48 Thank you for being on the show, sharing what you've been up to,

01:04:51 your view of this.

01:04:52 It's my pleasure.

01:04:53 Absolutely.

01:04:53 I'll put a link to your podcasts, your courses, stuff like that

01:04:56 in the show notes for people.

01:04:56 I'd love that.

01:04:57 Oh, and I mentioned this to you earlier, but always love Talk Python, of course,

01:05:01 and I'm so grateful you've had me on three times.

01:05:03 And I'd love to offer your audience 20% off my course as well.

01:05:06 So we'll include that link in the show notes.

01:05:09 Beautiful.

01:05:09 We'll put it right by the link.

01:05:11 All right.

01:05:11 Well, thanks for being here.

01:05:12 We live in weird and amazing and crazy times.

01:05:15 And yeah, I'm going to leave it with that.

01:05:17 On the journey together.

01:05:18 Thank you.

01:05:18 Thanks, Michael.

01:05:19 That's right.

01:05:19 See you later.

01:05:19 Bye, everyone.

01:05:20 Ciao.

01:05:21 This has been another episode of Talk Python To Me.

01:05:24 Thank you to our sponsors.

01:05:25 Be sure to check out what they're offering.

01:05:27 It really helps support the show.

01:05:29 This episode is sponsored by Posit Connect from the makers of Shiny.

01:05:34 Publish, share, and deploy all of your data projects that you're creating using Python:

01:05:38 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, reports, dashboards, and APIs. Posit Connect supports

01:05:46 all of them. Try Posit Connect for free by going to talkpython.fm/posit, P-O-S-I-T. This episode

01:05:54 is brought to you by NordStellar. NordStellar is a threat exposure management platform from the

01:05:59 Nord Security family, the folks behind NordVPN, that combines dark web intelligence, session

01:06:05 hijacking prevention, brand and domain abuse detection, and external attack surface management.

01:06:11 Learn more and get started keeping your team safe at talkpython.fm/nordstellar.

01:06:17 If you or your team needs to learn Python, we have over 270 hours of beginner and advanced

01:06:22 courses on topics ranging from complete beginners to async code, Flask, Django,

01:06:28 HTML and even LLMs.

01:06:30 Best of all, there's not a subscription in sight.

01:06:33 Browse the catalog at talkpython.fm.

01:06:35 Be sure and subscribe to the show.

01:06:37 Open your favorite podcast player app, search for Python,

01:06:40 we should be right at the top.

01:06:41 If you enjoy the geeky rap theme song,

01:06:43 You can download the full track.

01:06:45 The link is in your podcast player's show notes.

01:06:47 This is your host, Michael Kennedy.

01:06:49 Thank you so much for listening.

01:06:50 I really appreciate it.

01:06:51 Now get out there and write some Python code.

01:07:03 Talk Python To Me, yeah we ready to roll Upgrading the code, no fear of getting whole

01:07:14 We tapped into that modern vibe, overcame each storm Talk Python To Me, async is the norm

