
Discovering exoplanets with Python

Episode #289, published Mon, Nov 9, 2020, recorded Tue, Sep 29, 2020

When I saw the headline "Machine learning algorithm confirms 50 new exoplanets in historic first" I knew the Python angle of this story had to be told! And that's how this episode was born. Join David Armstrong and Jev Gamper as they tell us how they used Python and machine learning to discover not one, but 50 new exoplanets in pre-existing Kepler satellite data.

Episode Deep Dive

Guests Introduction and Background

David Armstrong is a lecturer at the University of Warwick in the Physics Department, where he researches exoplanets and their detection using both space-based telescopes like Kepler and ground-based observatories. He focuses on machine learning applications in astrophysics and supervises students exploring new methods to confirm exoplanets.

Jev Gamper is a PhD candidate in medical imaging and a senior scientist at a startup in London working on climate modeling and remote sensing. He got his start in Python during undergrad projects in quantitative finance and later joined David Armstrong’s research group to apply machine learning and Python to exoplanet detection in Kepler data.


What to Know If You're New to Python

Here are a few points mentioned in the episode to help you follow along more smoothly:

  • It’s useful to understand basic data handling in Python, especially working with time-series data (how to load, transform, and analyze data arrays).
  • Familiarity with popular machine learning tools like scikit-learn and the concepts of training data and labels will help you follow the discussion on exoplanet detection.
  • Knowing how Python is used in scientific computing (e.g., Jupyter notebooks, NumPy, and Pandas) gives context for how large datasets (like Kepler’s) can be processed.
  • Because the guests compare their results to a “physics-based model,” some familiarity with basic statistical or Bayesian ideas helps, though deep knowledge is not required.

Key Points and Takeaways

  1. Machine Learning for Exoplanet Detection: The core topic is how Python-driven machine learning accelerated the process of finding and confirming exoplanets in data from NASA’s Kepler telescope. By using labeled examples of known planets and false positives, the researchers trained models (e.g., random forests, neural networks) to automate what was once a tedious, human-intensive process.
  2. Why Kepler Data is Special: Kepler stared at one patch of sky for about four years and captured incredibly detailed brightness measurements (“light curves”) for around 200,000 stars. Because it was so focused, Kepler generated a massive amount of high-quality data, making it perfect for testing and refining planet-finding algorithms.
  3. Working with Time-Series Brightness (“Light Curves”): The raw data from Kepler is brightness over time, which can be tens of thousands of measurements per star. The team used Python to load these time-series into arrays, clean them, and identify potential transit dips, which might indicate an orbiting exoplanet.
  4. Comparing Machine Learning Results to Physics-Based Models: The widely used model “VESPA” provided an established, simulation-driven approach to validate planets. The guests discussed how their machine learning probabilities sometimes conflicted with these classical fits, but independent checks tended to favor the ML model’s accuracy.
  5. Transit vs. Wobble Method: Two common ways of finding exoplanets came up: the transit method (looking for dips in brightness) and the wobble method (measuring the star’s radial velocity changes). While the transit method is especially well-suited to large-scale data like Kepler’s, the wobble method offers complementary insights for certain planets.
  6. Scaling Machine Learning Without Huge Clusters: Despite the large datasets, much of the team’s analysis ran on standard desktops or single GPUs, showing that well-structured data and targeted machine learning can reduce the need for massive compute clusters. This speaks to the efficiency of Python’s ecosystem and its modern libraries.
  7. Future Missions: TESS and Beyond: After Kepler’s mission ended, NASA launched TESS, which surveys almost the entire sky. TESS gathers less data per star (shorter observations) but covers far more stars overall, implying a dramatic uptick in potential planet candidates and a clear need for automated ML approaches.
  8. Validation and Calibration of Probabilistic Models: Producing a “score” isn’t enough; converting that score into a robust “probability” is crucial. The guests described how they used calibration and Bayesian approaches to ensure that a 70% ML confidence corresponds to a meaningful, real-world likelihood.
  9. Astronomy’s Human Challenge: Reducing Bias in Large Datasets: Before ML, graduate students manually examined transit signals, but human biases (energy levels, coffee breaks) crept into the results. Properly trained models can reduce these biases, producing consistent and standardized results even at large scales.
  10. Implications for Life Beyond Earth: Confirming thousands of exoplanets and showing how prevalent planetary systems may be makes it more plausible that life exists elsewhere in the universe. While detection of life was not within this project’s scope, it clearly inspires the big-picture question of whether Earth is alone.
  11. Reusable ML Patterns for Other Fields: The discussion briefly touched on other areas (like medical imaging and remote sensing) where automated classification powered by Python can unlock new research possibilities. Tools learned in astronomy (e.g., scikit-learn pipelines, probabilistic calibration) can be readily transferred to these different domains.
  • Tools / Links:
    • Rasterio (mentioned for geospatial data)
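
The calibration idea in key point 8 can be made concrete with a small sketch. This is an illustration only, not the guests' actual method: it uses histogram binning, one simple calibration technique, on synthetic data, and the helper names (fit_histogram_binning, calibrate, brier_score) are made up for this example. The hypothetical classifier here reports scores that run too low, and the calibration step maps each score to the fraction of real planets observed among training candidates with similar scores.

```python
import numpy as np


def fit_histogram_binning(scores, labels, n_bins=10):
    """Learn the observed fraction of real planets in each score bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    bin_ids = np.digitize(scores, edges[1:-1])
    bin_probs = np.empty(n_bins)
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            bin_probs[b] = labels[in_bin].mean()
        else:
            # No training examples here: fall back to the bin centre.
            bin_probs[b] = 0.5 * (edges[b] + edges[b + 1])
    return edges, bin_probs


def calibrate(scores, edges, bin_probs):
    """Replace each raw score with the calibrated probability of its bin."""
    return bin_probs[np.digitize(scores, edges[1:-1])]


def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and outcomes."""
    return float(np.mean((probs - labels) ** 2))


# Synthetic stand-in for classifier output (an assumption, not Kepler data):
# each candidate has a true planet probability, and the hypothetical
# classifier reports the square of it, so its scores run too low.
rng = np.random.default_rng(42)
n = 20_000
true_prob = rng.uniform(0.0, 1.0, n)
labels = (rng.random(n) < true_prob).astype(float)
scores = true_prob ** 2

# Fit the calibration map on one half, evaluate on the other half.
fit, held_out = slice(0, n // 2), slice(n // 2, None)
edges, bin_probs = fit_histogram_binning(scores[fit], labels[fit])
calibrated = calibrate(scores[held_out], edges, bin_probs)

print("Brier score, raw:       ", brier_score(scores[held_out], labels[held_out]))
print("Brier score, calibrated:", brier_score(calibrated, labels[held_out]))
```

A lower Brier score on the held-out half suggests the calibrated values behave more like probabilities than the raw scores do. In practice a library routine such as scikit-learn's CalibratedClassifierCV would likely be a better starting point than hand-rolled binning.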

Interesting Quotes and Stories

“People started finding a pattern of how often a planet got flagged right after a coffee break.” – David Armstrong, highlighting how human biases and breaks can show up in large, manual classification tasks.

“Some of the first transits were discovered from the ground, which surprised everyone—nobody thought we could measure the brightness that precisely from Earth.” – David Armstrong, sharing a story on how ground-based telescopes still play a major role in exoplanet discoveries.

“The data from Kepler is public. That’s the amazing part: You can get it online, do your own processing, and verify your own planets!” – Jev Gamper, on the accessibility of scientific data for amateurs and professionals alike.


Key Definitions and Terms

  • Transit Method: Detecting exoplanets by looking for dips in a star’s brightness as a planet crosses its face.
  • Wobble (Radial Velocity) Method: Finding exoplanets by measuring the star’s slight orbital motion induced by the planet’s gravity.
  • Light Curve: The graph of a star’s brightness over time; used to identify possible transit events.
  • False Positive: In exoplanet terms, a signal that appears to be a planet but is actually caused by something else (like eclipsing binary stars).
  • Gaussian Process Classifier: A probabilistic model often used in astronomy (and other fields) to handle uncertainty in classification tasks.
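
To make the transit and light curve definitions more tangible, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: the light curve is simulated rather than real Kepler data, the transits are simple box-shaped dips, and the function names (make_light_curve, flag_dips, estimate_period) are hypothetical. Real pipelines are considerably more involved.

```python
import numpy as np


def make_light_curve(n_points=2000, period=200, duration=10, depth=0.01,
                     noise=0.001, t0=50, seed=0):
    """Simulate a normalized light curve with periodic box-shaped transits."""
    rng = np.random.default_rng(seed)
    time = np.arange(n_points)
    flux = 1.0 + rng.normal(0.0, noise, n_points)
    # Points whose phase falls inside the transit window get dimmer.
    in_transit = ((time - t0) % period) < duration
    flux[in_transit] -= depth
    return time, flux, in_transit


def flag_dips(flux, n_sigma=5.0):
    """Flag points that sit well below the median brightness.

    The scatter is estimated from the median absolute deviation (MAD),
    which is only mildly affected by the dips themselves.
    """
    median = np.median(flux)
    mad = np.median(np.abs(flux - median))
    sigma = 1.4826 * mad  # converts MAD to a standard deviation for Gaussian noise
    return flux < median - n_sigma * sigma


def estimate_period(time, flagged):
    """Estimate the period as the median gap between successive dip starts."""
    # A dip starts where a flagged point follows an unflagged one.
    starts = time[1:][flagged[1:] & ~flagged[:-1]]
    return float(np.median(np.diff(starts)))


time, flux, truth = make_light_curve()
flagged = flag_dips(flux)
print("in-transit points:", int(truth.sum()), "flagged:", int(flagged.sum()))
print("estimated period (in samples):", estimate_period(time, flagged))
```

The simulated dip is about ten times deeper than the noise, which is why a plain threshold works here. Real transits are often much shallower relative to the noise, which is part of why false positives and careful validation matter.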

Learning Resources

Below are a few resources to continue your Python journey and apply it to data-focused projects:


Overall Takeaway

This episode shows how a compelling mix of Python, machine learning, and abundant astrophysical data from Kepler led to one of the most significant exoplanet confirmations in recent years. The guests demonstrated that modern ML libraries can not only automate large-scale data processing but also outperform or complement classic astrophysical models. The broader implication is clear: With the continued rise of missions like TESS and future telescopes, Python-based machine learning will remain central to unlocking the secrets of our universe and potentially discovering habitable worlds.

Links from the show

Jev Gamper on Twitter: @brutforcimag
Machine learning algorithm confirms 50 new exoplanets in historic first article: techrepublic.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
