Python in the Electrical Energy Sector
In addition to that, we also spend some time on how Jack used Python and OpenCV (computer vision) to automate the study of massive bee colonies and their behaviors. Spoiler alert: that involved gluing Wingdings characters to the backs of bees!
Episode Deep Dive
Guest introduction and background
Jack Simpson is a computational biologist turned data scientist working in the Australian electrical energy sector. Originally drawn to programming through his fascination with honeybees and biology, he discovered the power of Python for analyzing large datasets and automating research tasks. Today, Jack applies these deep data and code skills to help understand Australia’s electrical markets and grid, particularly around the massive adoption of solar energy and the implications for power generation and pricing.
What to Know If You're New to Python
- You’ll hear mentions of pandas for data analysis and NumPy for numerical operations — familiarity with these libraries will help you follow the episode’s discussion of large-scale data handling.
- Know that Python has special tools for big data, such as Dask (for parallel and distributed computing) and Numba (for just-in-time compilation).
- Handling huge CSV files can be simplified by Python’s built-in modules like zipfile and techniques such as vectorization (applying operations over entire arrays rather than loops).
- Python also has incredible community-supported libraries like OpenCV (computer vision) and NetworkX (graph analysis) mentioned in the episode, which open up highly specialized uses.
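As a rough illustration of the vectorization idea in the list above, the sketch below converts power readings to energy in two ways: with an explicit loop and with a single array expression. The readings and the five-minute interval are made-up values for illustration, not data from the episode.

```python
import numpy as np

# Hypothetical power readings in megawatts, one per five-minute interval.
power_mw = np.array([120.0, 135.5, 90.0, 60.25])
interval_hours = 5 / 60  # five minutes expressed in hours


def energy_loop(readings, hours):
    """Convert MW readings to MWh one element at a time."""
    result = []
    for value in readings:
        result.append(value * hours)
    return np.array(result)


def energy_vectorized(readings, hours):
    """Same calculation as one array expression, with no Python-level loop."""
    return readings * hours


loop_result = energy_loop(power_mw, interval_hours)
vector_result = energy_vectorized(power_mw, interval_hours)
print(vector_result)
```

Both functions give the same numbers. On arrays with millions of elements the vectorized form is usually much faster, because the multiplication runs in NumPy's compiled code.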
Key points and takeaways
Python’s Role in the Australian Energy Market
Jack uses Python to analyze real-time and historical power generation data across Australia. By pulling data at five-minute or even four-second intervals, he can identify trends in energy consumption and generation, including emergent behavior from solar, wind, and gas power stations.
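The episode does not show Jack's code. As a minimal sketch of working across the two time resolutions mentioned above, the example below averages synthetic four-second readings into five-minute intervals with pandas. The timestamps, values, and the output_mw name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical four-second output readings for one generator (synthetic values).
n_readings = 225  # 225 readings * 4 seconds = 15 minutes
index = pd.date_range(
    "2021-01-01 12:00:00",
    periods=n_readings,
    freq=pd.Timedelta(seconds=4),
)
output_mw = pd.Series(
    np.linspace(100.0, 130.0, n_readings), index=index, name="output_mw"
)

# Average the four-second readings into five-minute intervals.
five_minute_mean = output_mw.resample("5min").mean()
print(five_minute_mean)
```

Resampling like this is one common way to line up fine-grained readings with a coarser series before comparing them.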
Massive Rooftop Solar Adoption and Negative Energy Prices
Australia’s surge in rooftop solar installations has drastically changed midday energy supply and pricing. Jack’s analysis in Python shows that energy prices can go negative when there’s excess solar generation, causing fascinating market shifts.
- Links and Tools:
- System Advisor Model (SAM) (via Python interface)
- NumPy
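To make the negative-price idea concrete, here is a small sketch that flags intervals where the price fell below zero. The prices, timestamps, and column names are invented for illustration and are not real market data.

```python
import pandas as pd

# Hypothetical five-minute prices in dollars per MWh (invented for illustration).
prices = pd.DataFrame(
    {
        "interval_start": pd.to_datetime(
            [
                "2021-01-01 11:55",
                "2021-01-01 12:00",
                "2021-01-01 12:05",
                "2021-01-01 12:10",
            ]
        ),
        "price": [12.5, -3.0, -45.2, 8.0],
    }
)

# A boolean mask selects the intervals where the price fell below zero.
negative = prices[prices["price"] < 0]
share_negative = len(negative) / len(prices)
print(negative)
print(f"{share_negative:.0%} of intervals had negative prices")
```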
Handling Large, Publicly Available Datasets
The Australian energy sector releases extensive, granular data, with millions of rows per month. Python’s standard libraries (zipfile, CSV handling) and frameworks like pandas streamline data ingestion, cleaning, and transformation for Jack’s analyses.
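One way the standard-library pieces mentioned above can fit together is to read a CSV file straight out of a zip archive without extracting it to disk. The sketch below builds a tiny in-memory archive so it runs on its own. The file name dispatch.csv, the columns, and the helper read_rows_from_zip are hypothetical, not the format of any real dataset.

```python
import csv
import io
import zipfile


def read_rows_from_zip(zip_source, member_name):
    """Yield CSV rows as dicts straight from a zip archive, without extracting."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # archive.open() returns bytes; wrap it so csv can read text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


# Build a tiny in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("dispatch.csv", "unit,output_mw\nGEN1,120.5\nGEN2,80.0\n")
buffer.seek(0)

rows = list(read_rows_from_zip(buffer, "dispatch.csv"))
total_mw = sum(float(row["output_mw"]) for row in rows)
print(rows, total_mw)
```

Because the helper is a generator, rows are produced one at a time, which keeps memory use low on very large files. Calling list() as above is only sensible for small inputs.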
Importance of Domain Knowledge in Data Science
Understanding the energy market’s physical and financial structures helps Jack interpret the data correctly (e.g., why a solar farm might hold its output fixed while a gas generator changes output rapidly). Pure code skills alone aren’t enough for meaningful insights.
- Links and Tools:
- NetworkX (for social or network-based data relationships)
Speeding Up Python with Vectorization and JIT Compilation
Jack stressed the performance gains from vectorizing calculations (applying operations across entire arrays in pandas / NumPy) and using Numba for just-in-time compiling of Python loops that depend on prior states.
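A loop that depends on prior state is hard to express as one array operation, which is where a JIT compiler can help. The sketch below is an invented example of such a loop, not Jack's actual calculation: an output that follows a target but can only change by a limited amount per interval. The function name ramp_limited, the values, and the 10 MW limit are all assumptions. The try/except fallback is there only so the sketch still runs when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function on first call
except ImportError:
    # Fallback so the sketch still runs (slowly) when Numba is not installed.
    def njit(func):
        return func


@njit
def ramp_limited(target, max_step):
    """Follow `target`, but move at most `max_step` per interval.

    Each output depends on the previous output, so the calculation is not
    easily written as a single array expression.
    """
    out = np.empty_like(target)
    out[0] = target[0]
    for i in range(1, target.shape[0]):
        step = target[i] - out[i - 1]
        if step > max_step:
            step = max_step
        elif step < -max_step:
            step = -max_step
        out[i] = out[i - 1] + step
    return out


# Hypothetical target output in MW and a made-up ramp limit of 10 MW per interval.
target_mw = np.array([100.0, 130.0, 130.0, 95.0, 95.0])
print(ramp_limited(target_mw, 10.0))
```

With Numba available, the first call pays a one-off compilation cost and later calls run as machine code. The plain-Python fallback gives the same results, only more slowly on large arrays.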
Parallelizing Work with Multiprocessing and Big Servers
For extremely large monthly data dumps (hundreds of millions of rows), Jack splits tasks into parallel processes. This approach, or using frameworks like Dask, helps harness multi-core servers (e.g., 60-core machines) efficiently.
- Links and Tools:
- Dask
- Python’s multiprocessing module
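The split-then-combine pattern described above can be sketched with the standard library's multiprocessing.Pool. The helper names summarise_chunk, combine, and summarise_in_parallel are hypothetical, and summing numbers stands in for whatever per-chunk work a real analysis would do.

```python
from multiprocessing import Pool


def summarise_chunk(chunk):
    """Return (row_count, total) for one chunk of numeric readings."""
    return len(chunk), sum(chunk)


def combine(partials):
    """Merge per-chunk results into one overall (row_count, total)."""
    rows = sum(count for count, _ in partials)
    total = sum(subtotal for _, subtotal in partials)
    return rows, total


def summarise_in_parallel(chunks, workers=4):
    """Process each chunk in a separate worker process, then combine."""
    with Pool(processes=workers) as pool:
        partials = pool.map(summarise_chunk, chunks)
    return combine(partials)
```

In a script, call summarise_in_parallel(chunks) from inside an if __name__ == "__main__": block, since worker processes may re-import the module on some platforms. Keeping the per-chunk function free of shared state is what makes the work easy to spread across many cores.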
Bee Research and Computer Vision with Python
Before shifting full-time to the energy sector, Jack’s PhD focused on tracking bees in a hive via tags and cameras. With OpenCV plus deep learning libraries like TensorFlow, he automated identifying bee behaviors, highlighting Python’s range of scientific uses.
Embracing Python’s Scientific Stack
From scanning images to analyzing time-series data, Python’s ecosystem (pandas, NumPy, SciPy, scikit-learn, etc.) provides a broad toolset. Jack also used specialized libraries like Cython for performance-critical code.
Curiosity and Problem-Solving in Different Domains
Jack’s path shows how Pythonic problem-solving can jump from analyzing bee colonies to studying power grids. The core data manipulation and machine learning ideas transfer with minimal friction.
- Links and Tools:
- GeoPandas for spatial analysis and mapping
Public Access to Energy Data
The transparency of Australia’s energy data enables hobbyists, researchers, and policymakers to dive deep. Jack underlines that the limiting factor is usually domain knowledge, not data availability.
- Links and Tools:
- AEMO Datasets for open energy data
Interesting quotes and stories
"I had a page of C++ code to convert between OpenCV and Caffe data structures, and I was terrified of breaking it. With Python, it’s all just NumPy arrays." — Jack Simpson
"I would put tiny patterns, even Wingdings, on the backs of bees. But the other bees didn’t like the smell of the glue and literally threw them out of the hive." — Jack Simpson
Key definitions and terms
- Vectorization: Performing an operation over an entire array or series instead of running explicit Python loops. This leverages fast, lower-level optimizations.
- Negative Electricity Prices: A market condition where power suppliers effectively pay others to take their electricity, usually due to excess generation.
- Numba: A just-in-time compiler for Python that translates numeric functions, especially loops over NumPy arrays, into fast machine code.
- System Advisor Model (SAM): A tool from NREL (U.S. National Renewable Energy Laboratory) for simulating energy system performance, including solar and wind.
- OpenCV: Open Source Computer Vision library for image processing and real-time computer vision tasks.
Learning resources
- Python for Absolute Beginners: Ideal if you're just starting out.
- Move from Excel to Python with Pandas: Great for those transitioning large spreadsheet workflows to pandas.
- Fundamentals of Dask: Learn how to scale your pandas workflows to large data.
Overall takeaway
Python’s flexibility and powerful libraries let developers move across scientific domains, from bee research to high-stakes energy grid analysis. With effective tooling for large-scale data, performance optimizations, and transparency in the energy sector’s data, anyone with Python skills and a thirst for domain knowledge can make a tangible impact, whether that means keeping the lights on for millions or decoding the secrets of a busy beehive.
Links from the show
Bees, lasers, and machine learning: jacksimpson.co
South Australian Gas Generator Interventions: jacksimpson.co
PySAM System Advisor Model: sam.nrel.gov
Visualizing the impact of Melbourne’s COVID-19 lockdown on Solar Panel Installations: jacksimpson.co
Stack Overflow Python graph: insights.stackoverflow.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm