
Machine Learning Ethics and Laws Panel

Episode #351, published Thu, Feb 3, 2022, recorded Fri, Dec 17, 2021

The world of AI is changing fast. And the AI / ML space is a bit out of the ordinary for software developers. Typically in software, we can prove that given a certain situation, the code will always behave the same. We can point to where and why a decision is made.

ML isn't like that. We set it up and then it takes on a life of its own. Regulators and governments are starting to step in and make rules for AI, and the EU is among the first to do so. That's why it's great to have Ines Montani and Katharine Jarmul, both awesome data scientists and EU residents, here to give us an overview of the coming regulations, as well as the broader benefits and pitfalls of the AI / ML space.

Watch this episode on YouTube
Watch the live stream version

Episode Deep Dive

Guests introduction and background

Ines Montani is the co-founder of Explosion and the co-creator of the popular NLP library spaCy. She and her team also develop Prodigy, an annotation tool for training custom machine learning models. Ines focuses heavily on privacy and data ownership concerns, preferring solutions that let organizations run models on their own hardware while safeguarding user data.

Katharine Jarmul is a data scientist who has spent the last five years focusing on data privacy, data security, and ML. She has worked on solutions like anonymization, differential privacy, and encrypted learning. Katharine resides in Berlin, where she’s been active in Europe’s data ethics discussions. She also announced (during this conversation) that she’s joining ThoughtWorks as a principal data scientist, furthering her work in privacy and AI.


What to Know If You're New to Python

If you’re relatively new to Python and want to better follow the discussion on machine learning and data privacy, it helps to have a grasp of the following:

  • Installing and managing Python packages such as spaCy (for NLP) or Flask/FastAPI (for web backends); a minimal spaCy setup sketch follows this list.
  • Familiarity with virtual environments for isolating projects.
  • Basic knowledge of how Python handles data structures and importing modules.
  • Understanding of Python’s object-oriented nature can help you see how ML libraries (like spaCy or others) structure their models and workflows.
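
If those bullets are new territory, here is a minimal sketch of the workflow they describe: create a virtual environment, install spaCy, download a small pretrained English pipeline, and run it on a sentence. The commands and the en_core_web_sm model name follow spaCy's standard setup; treat this as a starting sketch rather than a complete guide.

```python
# Setup (run these in a terminal first):
#   python -m venv .venv && source .venv/bin/activate   # isolate the project
#   pip install spacy
#   python -m spacy download en_core_web_sm             # small English pipeline
import spacy

# Load the pretrained pipeline and run it on one sentence.
nlp = spacy.load("en_core_web_sm")
doc = nlp("The EU is drafting new rules for high-risk AI systems in Brussels.")

# Print each named entity the model found, with its predicted label (ORG, GPE, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)
```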

Key points and takeaways

  1. EU AI Regulation and “High-Risk” Systems: The episode revolves around Europe’s emerging regulations on AI, comparing them to how the EU introduced GDPR for data privacy. One major theme is that the regulation targets “high-risk” AI systems that potentially harm or manipulate human behavior, such as indiscriminate surveillance or social scoring. While there’s excitement about clear rules in the AI space, many details—like definitions and enforcement—remain vague. The panelists discuss the tension between protecting citizens and allowing room for innovation.
  2. Contrasting Classical Software vs. Machine Learning: Traditional software offers predictable, hard-coded decisions, whereas machine learning systems learn patterns from data. Once trained, ML models can behave like “black boxes” and produce unexpected decisions. This unpredictability complicates how you test, audit, and prove fairness or correctness—something legislators in the EU are grappling with.
  3. Auditing and Interpretability: Europe’s proposed rules require explainability in AI decisions, meaning companies should be able to demonstrate how an ML model arrives at a particular outcome. Auditing these black-box models is tricky—some interpretability techniques exist (e.g., LIME, SHAP; see the short sketch after this list), but the legal framework is still catching up. This prompts conversations on open-source versus proprietary approaches and the potential new wave of “AI auditing” professions.
  4. Europe’s Stance on Data Privacy: Both guests highlight how Berlin and broader Europe emphasize data protection. GDPR is a key legal precedent, but new AI regulations plan to go further by making ML decisions explainable and banning certain practices like social scoring. They also mention that large EU tech fines typically focus on major operators such as Big Tech, but smaller businesses still must comply with data collection standards (like avoiding hidden trackers).
  5. The “Moral Crumple Zone” and Accountability: Katharine introduces the concept of the “moral crumple zone,” where blame gets pinned on the nearest human in a partially automated system when failure or harm occurs. For example, if a self-driving car malfunctions, the fallback is to blame the driver monitoring the system, even if the real problem was the developer’s code or corporate decisions. This underscores the complexity of accountability in AI.
  6. Costs and Environmental Impact of ML: Training massive models can consume as much energy as running a car for years. The guests discuss how giant “foundation models” have sparked debate about carbon footprints, and wonder why the EU AI law doesn’t address environmental costs. They predict we might see more targeted, smaller-scale models, especially if environmental regulations catch up.
  7. GitHub Copilot and Data Privacy Concerns: Michael mentions installing GitHub Copilot and discovering that his code edits—including possible secrets—could be sent back as analytics data. This highlights the tension between convenience/AI assistance and privacy. Katharine and Ines both emphasize the need to read terms carefully and weigh the benefit of AI-driven coding against potential data exposure.
  8. Berlin’s Unique View on Privacy & Tech Activism: Both guests live in Berlin and note that Germany’s privacy activism (e.g., Google Street View blowback) shapes the regulatory environment. Local groups like the Chaos Computer Club champion data ethics, often sending a strong signal to lawmakers and big tech that privacy matters. Berlin’s tech scene, in turn, must adapt to a more privacy-aware populace.
  9. Practical Advice on Building Ethical ML Solutions: Ines advocates smaller, more purpose-built ML models, trained with curated data, rather than “just collecting everything” or blindly relying on large LLM-based APIs. Katharine suggests measuring not only accuracy but fairness, privacy, and security factors. The overall recommendation is to invest in robust data workflows—like using local annotation tools—and keep user data onsite when feasible.
  10. Fairness, Manipulation, and the Future of AI: The guests address how AI systems can subtly guide user behavior (from political ad targeting to YouTube recommendation “rabbit holes”). Legislators are trying to prohibit AI that exploits vulnerabilities, but drawing those lines can be challenging. Overall, the talk underscores how the next wave of AI regulation may balance innovation, user protection, and corporate responsibility.
  • Links and tools:
    • Rasa (rasa.com) – Mentioned in reference to chatbots that self-identify as AI
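
To make the auditing discussion in point 3 more concrete, below is a small, hypothetical sketch of inspecting a model with SHAP, one of the interpretability libraries mentioned above. The dataset, feature names, and "risk score" target are invented purely for illustration; a real audit would run against the production model and real features.

```python
# Toy interpretability sketch with SHAP (mentioned alongside LIME in the episode).
# All data and feature names are made up for illustration only.
import numpy as np
import shap                                   # pip install shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # pretend columns: income, age, debt_ratio
y = X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)  # synthetic "risk score"

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP attributes each individual prediction to per-feature contributions,
# the kind of evidence an "AI audit" might ask a company to produce.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])    # array of shape (5 samples, 3 features)

feature_names = ["income", "age", "debt_ratio"]
for i, contrib in enumerate(shap_values):
    top = int(np.argmax(np.abs(contrib)))
    print(f"prediction {i}: driven mostly by {feature_names[top]} ({contrib[top]:+.3f})")
```

Whether per-prediction evidence like this would satisfy the explainability requirements in the EU's proposal is exactly the open question the panel wrestles with.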

Interesting quotes and stories

  • On Berlin’s attitude toward Street View: “They drove the camera trucks once, and then basically said: ‘Germany’s so difficult, we’re never sending our cars here again!’” — Katharine Jarmul
  • On the moral crumple zone: “If you’re the one pressing the button, you have to answer for that... But if the machine did it, then it’s the nearest human who gets blamed.” — Katharine Jarmul
  • On reading AI Terms of Service: “What if one of the edits has my AWS key in there?” — Michael Kennedy, highlighting potential privacy leaks through Copilot.

Key definitions and terms

  • GDPR (General Data Protection Regulation): The EU law governing data privacy and user rights around personal data.
  • High-Risk AI: Term from the EU’s upcoming AI regulation describing models or applications that can significantly impact human rights or safety.
  • Moral Crumple Zone: A phenomenon where a human operator is blamed for failures in partially automated or AI-driven systems, even if the technology is at fault.
  • Foundation Models: Extremely large-scale ML models pretrained on broad data sets; they can be adapted to various tasks but raise questions on energy usage, bias, and interpretability.


Overall takeaway

Both guests highlight that AI’s ever-growing influence demands responsible design choices and thoughtful regulation. From ensuring data privacy, to managing interpretability, to mitigating environmental impact, the future of AI depends on balancing open innovation with strong ethical and legal frameworks. As developers and data scientists, we must take proactive steps to understand and address issues like fairness, accountability, and privacy—because building trustworthy AI is everyone’s responsibility.

Links from the show

Katharine Jarmul on Twitter: @kjam
Katharine's site: kjamistan.com

Ines Montani on Twitter: @_inesmontani
Explosion AI: explosion.ai

EU proposes new Artificial Intelligence Regulation: nortonrosefulbright.com
The EU’s leaked AI regulation is ambitious but disappointingly vague: techmonitor.ai
EU ARTIFICIAL INTELLIGENCE ACT: eur-lex.europa.eu/legal-content

Facial Recognition Technology Ban Passed by King County Council: kingcounty.gov

On the Opportunities and Risks of Foundation Models paper: arxiv.org
thoughtworks: thoughtworks.com

I don't care about cookies extension: chrome.google.com
Everybody hates “FLoC,” Google’s tracking plan: arstechnica.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
