Python in the Electrical Energy Sector

Episode #320, published Sat, Jun 12, 2021, recorded Sun, Jun 6, 2021

In this episode, we cover how Python is being used to understand the electrical markets and grid in Australia. Our guest, Jack Simpson, has used Python to uncover a bunch of interesting developments as the country has adopted more and more solar energy. We round out the episode looking at some best practices for high-performance, large-data processing in Pandas and beyond.

We also spend some time on how Jack used Python and OpenCV (computer vision) to automate the study of massive bee colonies and their behaviors. Spoiler alert: that involved gluing Wingdings characters to the backs of bees!

Episode Deep Dive

Guest introduction and background

Jack Simpson is a computational biologist turned data scientist working in the Australian electrical energy sector. Originally drawn to programming through his fascination with honeybees and biology, he discovered the power of Python for analyzing large datasets and automating research tasks. Today, Jack applies these deep data and code skills to help understand Australia’s electrical markets and grid, particularly around the massive adoption of solar energy and the implications for power generation and pricing.

What to Know If You're New to Python

  • You’ll hear mentions of pandas for data analysis and NumPy for numerical operations — familiarity with these libraries will help you follow the episode’s discussion of large-scale data handling.
  • Know that Python has dedicated tools for large datasets, such as Dask (for parallel and distributed computing) and Numba (for just-in-time compilation).
  • Handling huge CSV files is easier with standard-library modules like zipfile and csv, and with techniques such as vectorization (applying operations over entire arrays rather than looping element by element).
  • Python also has incredible community-supported libraries like OpenCV (computer vision) and NetworkX (graph analysis) mentioned in the episode, which open up highly specialized uses.
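
As a rough illustration of the zipfile point above, here is a minimal sketch that reads CSV rows straight out of a zip archive without extracting it to disk. The file name, column names, and sample values are hypothetical and built in memory so the example runs anywhere; they are not taken from the real market data files.

```python
import csv
import io
import zipfile

# Hypothetical sample data; real market files have different names and columns.
SAMPLE_CSV = (
    "interval_end,region,price\n"
    "2021-06-06 12:00,SA1,-12.50\n"
    "2021-06-06 12:05,SA1,35.00\n"
    "2021-06-06 12:10,SA1,-3.25\n"
)


def build_sample_archive() -> bytes:
    """Return the bytes of a zip archive holding one small CSV file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("prices.csv", SAMPLE_CSV)
    return buffer.getvalue()


def read_rows_from_zip(zip_bytes: bytes) -> list[dict[str, str]]:
    """Read every CSV member of a zip archive without extracting it to disk."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # archive.open() yields bytes, so wrap it to decode text for csv.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                rows.extend(csv.DictReader(text))
    return rows


rows = read_rows_from_zip(build_sample_archive())
negative = [row for row in rows if float(row["price"]) < 0]
print(len(rows), "rows,", len(negative), "with a negative price")
```

The binary handle returned by `archive.open()` is file-like, so it can also be passed directly to `pandas.read_csv` when a DataFrame is more convenient than a list of dictionaries.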

Key points and takeaways

  1. Python’s Role in the Australian Energy Market
    Jack uses Python to analyze real-time and historical power generation data across Australia. By pulling data at five-minute or even four-second intervals, he can identify trends in energy consumption and generation, including emergent behavior from solar, wind, and gas power stations.

  2. Massive Rooftop Solar Adoption and Negative Energy Prices
    Australia’s surge in rooftop solar installations has drastically changed midday energy supply and pricing. Jack’s analysis in Python shows that energy prices can go negative when there’s excess solar generation, causing fascinating market shifts.

  3. Handling Large, Publicly Available Datasets
    The Australian energy sector releases extensive, granular data—millions of rows per month. Python’s standard libraries (zipfile, CSV handling) and frameworks like pandas streamline data ingestion, cleaning, and transformation for Jack’s analyses.

  4. Importance of Domain Knowledge in Data Science
    Understanding the energy market’s physical and financial structures helps Jack interpret the data correctly (e.g., why a solar farm might keep output fixed vs. a gas generator’s rapid changes). Pure code skills alone aren’t enough for meaningful insights.

    • Links and Tools:
      • NetworkX (for social or network-based data relationships)

  5. Speeding Up Python with Vectorization and JIT Compilation
    Jack stressed the performance gains from vectorizing calculations (applying operations across entire arrays in pandas / NumPy) and using Numba for just-in-time compiling of Python loops that depend on prior states.

  6. Parallelizing Work with Multiprocessing and Big Servers
    For extremely large monthly data dumps (hundreds of millions of rows), Jack splits tasks into parallel processes. This approach, or using frameworks like Dask, helps harness multi-core servers (e.g., 60-core machines) efficiently.

    • Links and Tools:
      • Dask
      • Python’s multiprocessing module

  7. Bee Research and Computer Vision with Python
    Before shifting full-time to the energy sector, Jack’s PhD focused on tracking bees in a hive via tags and cameras. With OpenCV plus deep learning libraries like TensorFlow, he automated identifying bee behaviors, highlighting Python’s range of scientific uses.

  8. Embracing Python’s Scientific Stack
    From scanning images to analyzing time-series data, Python’s ecosystem (pandas, NumPy, SciPy, scikit-learn, etc.) provides a broad toolset. Jack also used specialized tools like Cython, which compiles Python-like code to C, for performance-critical code.

  9. Curiosity and Problem-Solving in Different Domains
    Jack’s path shows how Pythonic problem-solving can jump from analyzing bee colonies to studying power grids. The core data manipulation and machine learning ideas transfer with minimal friction.

  10. Public Access to Energy Data
    The transparency of Australia’s energy data enables hobbyists, researchers, and policymakers to dive deep. Jack underlines that the limiting factor is usually domain knowledge, not data availability.
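
Point 6 above describes splitting large data dumps into tasks that run in parallel processes. The sketch below shows one way that idea might look with the standard-library `multiprocessing.Pool`. The function names, chunk size, and price values are hypothetical, and the summary statistics are chosen only for illustration; this is not the code Jack describes in the episode.

```python
from multiprocessing import Pool


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def summarise_chunk(chunk):
    """Return (row count, total, negative-price count) for one chunk of prices."""
    return len(chunk), sum(chunk), sum(1 for price in chunk if price < 0)


def combine(partials):
    """Merge per-chunk summaries into one overall summary."""
    rows = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    negatives = sum(p[2] for p in partials)
    return {"rows": rows, "mean_price": total / rows, "negative_intervals": negatives}


def summarise(prices, chunk_size, workers=1):
    """Summarise prices, using a process pool when workers > 1."""
    chunks = split_into_chunks(prices, chunk_size)
    if workers > 1:
        with Pool(processes=workers) as pool:
            partials = pool.map(summarise_chunk, chunks)
    else:
        partials = [summarise_chunk(chunk) for chunk in chunks]
    return combine(partials)


if __name__ == "__main__":
    # Made-up prices; real inputs would more likely be one file or month per chunk.
    prices = [40.0, -5.0, 12.5, -1.5, 30.0, 60.0, -10.0, 4.0]
    print(summarise(prices, chunk_size=3, workers=2))
```

Each worker returns a small partial summary rather than the full data, which keeps the amount of data passed between processes low. The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module.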

Interesting quotes and stories

"I had a page of C++ code to convert between OpenCV and Caffe data structures, and I was terrified of breaking it. With Python, it’s all just NumPy arrays." — Jack Simpson

"I would put tiny patterns, even Wingdings, on the backs of bees. But the other bees didn’t like the smell of the glue and literally threw them out of the hive." — Jack Simpson

Key definitions and terms

  • Vectorization: Performing an operation over an entire array or series instead of running explicit Python loops. The work then runs in compiled code inside NumPy or pandas rather than in the Python interpreter.
  • Negative Electricity Prices: A market condition where power suppliers effectively pay others to take their electricity, usually due to excess generation.
  • Numba: A just-in-time compiler that translates numeric Python and NumPy code into machine code, especially useful for loops that cannot easily be vectorized.
  • System Advisor Model (SAM): A tool from NREL (U.S. National Renewable Energy Laboratory) for simulating energy system performance, including solar and wind.
  • OpenCV: Open Source Computer Vision library for image processing and real-time computer vision tasks.
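
To make the vectorization and Numba definitions above concrete, here is a minimal sketch using made-up price values. A vectorized comparison flags negative-price intervals in one step, while counting how long each run of negative prices has lasted depends on the previous element, which is the kind of loop Numba can compile. The function names are hypothetical, and the `try`/`except` fallback is an assumption added so the example still runs, uncompiled, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs (slowly) without Numba."""
        return func


def negative_mask(prices):
    """Vectorized: compare the whole array at once, no Python loop."""
    return prices < 0


@njit
def negative_run_lengths(prices):
    """Loop with state: each value depends on the previous element's result."""
    runs = np.zeros(prices.shape[0], dtype=np.int64)
    current = 0
    for i in range(prices.shape[0]):
        if prices[i] < 0.0:
            current += 1
        else:
            current = 0
        runs[i] = current
    return runs


# Hypothetical prices for eight consecutive intervals.
prices = np.array([30.0, -5.0, -12.0, 8.0, -1.0, -2.0, -3.0, 15.0])
mask = negative_mask(prices)
runs = negative_run_lengths(prices)
print(int(mask.sum()), "negative intervals; longest run:", int(runs.max()))
```

On an array this small the difference is invisible; the benefit of either approach only shows up on arrays with millions of elements.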

Overall takeaway

Python’s flexibility and powerful libraries let developers move across scientific domains, from beekeeping research to high-stakes energy grid analysis. With effective tooling for large-scale data, performance optimizations, and transparency in the energy sector’s data, anyone with Python skills and a thirst for domain knowledge can make a tangible impact, whether that means keeping the lights on for millions or decoding the secrets of a busy beehive.

Links from the show

Jack Simpson: jacksimpson.co
Bees, lasers, and machine learning: jacksimpson.co
South Australian Gas Generator Interventions: jacksimpson.co
PySAM System Advisor Model: sam.nrel.gov
Visualizing the impact of Melbourne’s COVID-19 lockdown on Solar Panel Installations: jacksimpson.co

Stack Overflow Python graph: insights.stackoverflow.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
