Measuring your ML impact with CodeCarbon
In this episode, you'll meet Victor Schmidt, Jonathan Wilson, and Boris Feld. They work on the CodeCarbon project together. This project offers a Python package and dashboarding tool to help you understand and minimize your ML model's environmental impact.
Episode Deep Dive
Guests Introduction and Background
Jonathan Wilson is an associate professor of environmental studies at Haverford College. He brings a unique perspective to the CodeCarbon project, combining environmental science expertise with a background in computer science.
Boris Feld works at Comet ML, focusing on experiment management for machine learning workloads. He has helped create and shape CodeCarbon’s open-source development efforts, contributing coding expertise and solutions for measuring emissions during model training.
Victor Schmidt is a PhD student at Mila (Quebec’s AI Institute) in Montreal. He specializes in machine learning and co-founded the CodeCarbon project to better understand the carbon footprint of ML training.
Their collective mission is to empower software developers and data scientists to measure and minimize the environmental impact of machine learning models through CodeCarbon.
What to Know If You're New to Python
Are you new to Python but eager to follow along with measuring ML impact? Here are a few essentials:
- Use a popular code editor: Tools like VS Code or Sublime Text allow quick Python setup.
- Learn the basics of pip install: Installing packages like codecarbon requires comfort with pip or another dependency manager.
- Basic machine learning concepts: Understanding model training, hyperparameters, and inference will help you follow CodeCarbon’s workflow.
- Practice Python fundamentals: If you’re just starting out, see the “Learning Resources” section below for a beginner-friendly course.
Key Points and Takeaways
Measuring ML Carbon Footprint with CodeCarbon
CodeCarbon is a Python package and dashboard that estimates the energy consumption and carbon emissions of ML workloads. By integrating a few lines of code (pip install codecarbon and a simple start-stop tracker), data scientists can automatically capture energy metrics, map them to local or cloud energy grids, and see an approximate carbon footprint.
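To make the start-stop pattern concrete, here is a minimal sketch of the tracker API described above; train_model() is just a stand-in for your own training code:

```python
# Install first: pip install codecarbon
from codecarbon import EmissionsTracker

def train_model():
    # Placeholder for your real training loop.
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="my-training-run")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

CodeCarbon also offers a decorator form (track_emissions) if you prefer wrapping a single function instead of managing start and stop yourself.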
Why Estimating Emissions Matters
The rise of large-scale ML models has sparked concern about their environmental cost. Even medium-scale GPU training can be energy intensive, yet many organizations don’t track these impacts. CodeCarbon’s team underscores that you must measure usage before you can optimize or reduce it.
Links & Tools:
- NVIDIA SMI (for GPU power usage)
- Intel Power Gadget (for CPU monitoring on some setups)
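If you want to peek at the raw numbers this kind of tooling builds on, a quick way to sample GPU power draw is to shell out to nvidia-smi. This sketch assumes an NVIDIA GPU with drivers installed:

```python
# Query instantaneous GPU power draw (in watts) via nvidia-smi.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)
watts = [float(line) for line in out.stdout.strip().splitlines()]
print(f"GPU power draw per device: {watts} W")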
Grid Power Intensity and Location-Aware Emissions
CodeCarbon can use local grid emissions data to map kilowatt hours consumed to CO2 output. Where you train your model matters: some regions rely on coal, others on hydro or nuclear. A model that emits a certain amount of CO2 in one area could be “cleaner” elsewhere simply because of the greener energy mix.
Links & Tools:
- ElectricityMap / CO2 Signal (tracking real-time grid mixes)
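The underlying arithmetic is simple: energy consumed times grid carbon intensity. The sketch below uses illustrative intensity figures, not real-time values; services like ElectricityMap / CO2 Signal supply the actual numbers:

```python
# Back-of-the-envelope: emissions = energy used x grid carbon intensity.
energy_kwh = 12.5  # hypothetical energy consumed by one training run

grid_intensity_g_per_kwh = {
    "coal-heavy grid": 800,  # illustrative figure, not a live value
    "hydro-heavy grid": 30,  # illustrative figure, not a live value
}

for grid, intensity in grid_intensity_g_per_kwh.items():
    kg_co2 = energy_kwh * intensity / 1000  # grams -> kilograms
    print(f"{grid}: ~{kg_co2:.1f} kg CO2eq")
```

Same workload, roughly 25x difference in footprint, which is exactly why location-aware estimates matter.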
Cloud Providers and Data Transparency
Estimating emissions on the cloud can be trickier, since actual data center efficiency and energy mix may differ from local averages. CodeCarbon currently relies on published region-level data or assumptions about the local grid. The guests encourage cloud providers to share more detailed stats so that estimates can be more accurate.
- Cloud Provider Regions: AWS, GCP, and Azure all have varying degrees of published sustainability data.
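For runs without network access, or when you want to pin the estimate to a specific grid, CodeCarbon provides an offline tracker. A minimal sketch (parameter names follow the CodeCarbon docs at the time of writing; verify against your installed version):

```python
# Location-aware tracking pinned to a country's average grid mix.
from codecarbon import OfflineEmissionsTracker

tracker = OfflineEmissionsTracker(country_iso_code="CAN")  # e.g., a hydro-heavy grid
tracker.start()
sum(i * i for i in range(10_000_000))  # stand-in workload
emissions_kg = tracker.stop()
print(f"Estimated emissions for this run in CAN: {emissions_kg:.6f} kg CO2eq")
```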
Practical Ways to Reduce ML’s Carbon Footprint
Tactics like switching to low-carbon data center regions, employing early stopping, pruning large models, or using Bayesian search instead of brute-force grid searches all save time and reduce emissions. These optimizations deliver both environmental benefits and lower compute costs.
Links & Tools:
- Comet ML (Experiment management and hyperparameter tracking)
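As an example of the search-strategy point above, the sketch below wraps a scikit-learn randomized search (sampling 5 of the 16 possible grid points) in an emissions tracker; the dataset and model are stand-ins:

```python
# Random search visits far fewer configurations than an exhaustive grid,
# which translates directly into less compute and lower emissions.
from codecarbon import EmissionsTracker
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
params = {"n_estimators": [50, 100, 200, 400], "max_depth": [4, 8, 16, None]}

tracker = EmissionsTracker(project_name="hyperparam-search")
tracker.start()
search = RandomizedSearchCV(  # 5 sampled configs instead of all 16 grid points
    RandomForestClassifier(random_state=0), params, n_iter=5, cv=3, random_state=0
)
search.fit(X, y)
emissions_kg = tracker.stop()
print(f"Best score: {search.best_score_:.3f}, emissions: {emissions_kg:.6f} kg CO2eq")
```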
Balancing Accuracy vs. Emissions
More compute time doesn’t always mean better results. The “law of diminishing returns” in model accuracy often sets in before skyrocketing resource usage. CodeCarbon helps you identify when to stop training, or when further compute yields little performance gain relative to its environmental cost.
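One way to put this into practice is plain early stopping with a patience counter. This is a hypothetical sketch; train_one_epoch() and evaluate() stand in for your own training loop:

```python
import random

def train_one_epoch():  # placeholder for your real training step
    pass

def evaluate():  # placeholder: returns validation accuracy
    return min(0.95, 0.5 + random.random() * 0.4)

patience, best_acc, stale = 3, 0.0, 0
for epoch in range(100):
    train_one_epoch()
    acc = evaluate()
    if acc > best_acc + 1e-3:  # require a meaningful improvement
        best_acc, stale = acc, 0
    else:
        stale += 1
    if stale >= patience:  # diminishing returns: stop burning energy
        print(f"Stopping at epoch {epoch}; best accuracy {best_acc:.3f}")
        break
```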
Open-Source Collaboration and Volunteer Contributions
CodeCarbon is an open-source project sustained by volunteers and a few sponsor companies. The team welcomes help in areas like collecting energy grid data, refining CPU measurements, improving data visualizations, and building a central API for large organizations to consolidate their runs.
- GitHub Issues: CodeCarbon Issues
Going Beyond CSV: Future Reporting and Dashboards
Currently, each training run logs to a CSV file. In the future, CodeCarbon aims to provide an API and dashboards for entire teams or departments, showing aggregated footprints across multiple runs and experiments. This centralized tracking could drive data-driven environmental initiatives company-wide.
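Until that API lands, the per-run CSV is easy to aggregate yourself. A short sketch using pandas, assuming the column names found in recent CodeCarbon output (verify against your own file):

```python
# CodeCarbon appends each tracked run to emissions.csv by default.
import pandas as pd

df = pd.read_csv("emissions.csv")
total = df["emissions"].sum()  # kg CO2eq across all logged runs
by_project = df.groupby("project_name")["emissions"].sum()
print(f"Total: {total:.4f} kg CO2eq")
print(by_project)
```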
Transparent Offsets vs. Actual Reductions
Many companies claim “carbon neutrality” through offsets, but that doesn’t mean zero carbon was emitted upfront. The CodeCarbon discussion stresses direct reduction, through better hardware choices, code efficiency, or greener power, rather than relying entirely on offset programs.
Industry Impact and Collective Responsibility
While single developers can choose better coding practices or region selection, structural change from hardware manufacturers, cloud providers, and open-data policies is also needed. Projects like CodeCarbon hope to drive broader awareness and accountability in the ML industry.
Interesting Quotes and Stories
"If you want to improve it, you must measure it before." — Jonathan Wilson
"It's not just because a company claims carbon neutral... it doesn't mean no carbon was emitted." — Victor Schmidt
"CO2 offset is good, but not emitting in the first place is even better." — Boris Feld
Key Definitions and Terms
- CodeCarbon: A Python library to measure the estimated power usage and resulting carbon emissions during ML training or similar compute-intensive tasks.
- Grid Search: A brute-force hyperparameter search method that can lead to excessive training runs and higher emissions.
- Early Stopping: A technique that halts training once the model stops improving significantly, saving time and energy.
- RAPL (Running Average Power Limit): An Intel interface for reading estimated CPU power usage.
- ElectricityMap / CO2 Signal: Services that provide real-time or historical carbon intensity data for different geographic regions.
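As a concrete illustration of RAPL, Linux exposes its cumulative energy counter through the powercap interface. A sketch (the exact path varies by system, and reading it may require elevated permissions):

```python
# Sample Intel RAPL's package energy counter over one second.
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # CPU package 0 counter

def read_uj():
    with open(RAPL) as f:
        return int(f.read())

start = read_uj()
time.sleep(1)
joules = (read_uj() - start) / 1e6  # microjoules -> joules
print(f"CPU package power: ~{joules:.1f} W")  # joules over 1 s ~= average watts
```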
Learning Resources
If you are completely new to Python, consider starting with Python for Absolute Beginners. It covers foundational skills needed to follow along with the Python-based CodeCarbon library.
For those looking to strengthen ML and data science practices in Python, check out Data Science Jumpstart with 10 Projects. You’ll build practical skills that align well with applying CodeCarbon to real-world data science pipelines.
Overall Takeaway
CodeCarbon is a vital tool for uncovering and reducing machine learning’s hidden carbon footprint. By integrating straightforward Python tracking code, developers and researchers can make more informed decisions, optimize training loops, and hopefully drive an industry-wide shift toward greener AI. The team’s commitment to open-source collaboration, combined with forward-thinking practices like location-aware training and refined hyperparameter searches, demonstrates that sustainability in machine learning is an achievable goal—one measured line of code at a time.
Links from the show
Victor Schmidt: @vict0rsch
Jonathan Wilson: haverford.edu
Boris Feld: @Lothiraldan
CodeCarbon project: codecarbon.io
MIT "5 Cars" Article: technologyreview.com
Green Future Choice Renewable Power in Portland: portlandgeneral.com
YouTube Live Stream: youtube.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy