Discovering exoplanets with Python

Episode #289, published Mon, Nov 9, 2020, recorded Tue, Sep 29, 2020


When I saw the headline "Machine learning algorithm confirms 50 new exoplanets in historic first" I knew the Python angle of this story had to be told! And that's how this episode was born. Join David Armstrong and Jev Gamper as they tell us how they use Python and machine learning to discover not 1, but 50 new exoplanets in pre-existing Kepler satellite data.

Episode Deep Dive

Guests Introduction and Background

David Armstrong is a lecturer at the University of Warwick in the Physics Department, where he researches exoplanets and their detection using both space-based telescopes like Kepler and ground-based observatories. He focuses on machine learning applications in astrophysics and supervises students exploring new methods to confirm exoplanets.

Jev Gamper is a PhD candidate in medical imaging and a senior scientist at a startup in London working on climate modeling and remote sensing. He got his start in Python during undergrad projects in quantitative finance and later joined David Armstrong’s research group to apply machine learning and Python to exoplanet detection in Kepler data.

What to Know If You're New to Python

Here are a few points mentioned in the episode to help you follow along more smoothly:

It’s useful to understand basic data handling in Python, especially working with time-series data (how to load, transform, and analyze data arrays).
Familiarity with popular machine learning tools like scikit-learn and the concepts of training data and labels will help you follow the discussion on exoplanet detection.
Knowing how Python is used in scientific computing (e.g., Jupyter notebooks, NumPy, and Pandas) gives context for how large datasets (like Kepler’s) can be processed.
Because they talk about comparing results to a “physics-based model,” you’ll benefit from some statistical or Bayesian ideas, though not too deeply.

Key Points and Takeaways

Machine Learning for Exoplanet Detection: The core topic is how Python-driven machine learning accelerated the process of finding and confirming exoplanets in data from NASA’s Kepler telescope. By using labeled examples of known planets and false positives, the researchers trained models (e.g., random forests, neural networks) to automate what was once a tedious, human-intensive process.
- Tools / Links:
  - scikit-learn
  - GPFlow
Why Kepler Data is Special: Kepler stared at one patch of sky for about four years and captured incredibly detailed brightness measurements (“light curves”) for around 200,000 stars. Because it was so focused, Kepler generated a massive amount of high-quality data, making it perfect for testing and refining planet-finding algorithms.
- Tools / Links:
  - NASA Kepler Mission Site
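The numbers in this takeaway can be checked with a little arithmetic. The sketch below assumes a 30-minute cadence and a four-year baseline, as described in the conversation, and applies the guests' rule of thumb that at least three transits are usually wanted before a planet is claimed. The function names are hypothetical, the 88-day period stands in for a Mercury-like orbit, and the sample count is a nominal upper bound that ignores any gaps in the real data.

```python
# Back-of-the-envelope numbers for a Kepler-like survey.
# Assumptions (taken from the conversation): one brightness measurement
# every 30 minutes for about four years, and at least three transits wanted
# before a signal is taken seriously. Function names are illustrative only.

DAYS_PER_YEAR = 365.25


def nominal_samples(baseline_days: float, cadence_minutes: float) -> int:
    """Number of measurements if the telescope never missed a sample."""
    return int(baseline_days * 24 * 60 / cadence_minutes)


def guaranteed_transits(baseline_days: float, period_days: float) -> int:
    """Transits certain to fall inside the baseline, whatever the orbital phase."""
    return int(baseline_days // period_days)


def longest_period_for(baseline_days: float, min_transits: int = 3) -> float:
    """Longest orbital period that still guarantees `min_transits` transits."""
    return baseline_days / min_transits


baseline = 4 * DAYS_PER_YEAR  # about four years of staring at one patch of sky

print("nominal samples per star:", nominal_samples(baseline, 30))
for name, period in [("3-day hot Jupiter", 3.0),
                     ("Mercury-like", 88.0),
                     ("Earth-like", 365.25)]:
    print(f"{name}: at least {guaranteed_transits(baseline, period)} transits")
print(f"longest period with 3 guaranteed transits: {longest_period_for(baseline):.0f} days")
```

The nominal count comes out somewhat above the roughly 60,000 points per star quoted in the conversation, which is what one would expect if the real light curves contain gaps. The last line illustrates why long-period planets are hard to confirm: orbits much longer than a year and a bit cannot be guaranteed three transits in a four-year stare.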
Working with Time-Series Brightness (“Light Curves”): The raw data from Kepler is brightness over time, which can be tens of thousands of measurements per star. The team used Python to load these time-series into arrays, clean them, and identify potential transit dips, which might indicate an orbiting exoplanet.
- Tools / Links:
  - Pandas
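To make the idea of a transit dip in a light curve concrete, here is a minimal sketch that simulates a noisy light curve with a box-shaped transit and then recovers the dip's depth. It is not the guests' pipeline: it uses only the standard library rather than Pandas or NumPy, all parameter values are invented, and it assumes the orbital period and transit duration are already known, whereas a real search has to find them first.

```python
# Minimal sketch: simulate a noisy light curve with a box-shaped transit,
# then recover the transit depth by comparing in-transit and out-of-transit
# brightness. All parameter values are invented for illustration.
import random
from statistics import median


def simulate_light_curve(n_points, cadence_days, period_days,
                         duration_days, depth, noise, seed=0):
    """Return (times, flux) with flux normalised to 1.0 outside transit."""
    rng = random.Random(seed)
    times, flux = [], []
    for i in range(n_points):
        t = i * cadence_days
        # The planet is in front of the star at the start of every orbit.
        in_transit = (t % period_days) < duration_days
        level = 1.0 - depth if in_transit else 1.0
        times.append(t)
        flux.append(level + rng.gauss(0.0, noise))
    return times, flux


def estimate_depth(times, flux, period_days, duration_days):
    """Median out-of-transit flux minus median in-transit flux."""
    inside = [f for t, f in zip(times, flux)
              if (t % period_days) < duration_days]
    outside = [f for t, f in zip(times, flux)
               if (t % period_days) >= duration_days]
    return median(outside) - median(inside)


# 100 days of 30-minute samples; a 1% dip lasting 3 hours every 3.5 days.
times, flux = simulate_light_curve(
    n_points=4800, cadence_days=1 / 48, period_days=3.5,
    duration_days=0.125, depth=0.01, noise=0.001,
)
depth = estimate_depth(times, flux, period_days=3.5, duration_days=0.125)
print(f"recovered depth: {depth:.4f}")
```

Taking the time modulo the period stacks every orbit on top of the others, so a dip that is barely visible in one transit becomes measurable once dozens of transits are combined.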
Comparing Machine Learning Results to Physics-Based Models: The widely used model “VESPA” provided an established, simulation-driven approach to validate planets. The guests discussed how their machine learning probabilities sometimes conflicted with these classical fits, but independent checks tended to favor the ML model’s accuracy.
- Tools / Links:
  - VESPA Model Paper / Project (Astronomy reference)
Transit vs. Wobble Method: Two common ways of finding exoplanets came up: the transit method (looking for dips in brightness) and the wobble method (measuring the star’s radial velocity changes). While the transit method is especially well-suited to large-scale data like Kepler’s, the wobble method offers complementary insights for certain planets.
- Tools / Links:
  - Wikipedia: Radial Velocity Method
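A rough calculation shows how small both signals are, and why small planets are hard for either method. The sketch below assumes circular, edge-on orbits around a Sun-like star and uses rounded textbook constants; the function names are hypothetical. The transit depth is taken as the ratio of the planet's disc area to the star's, and the star's wobble speed follows from the planet and star orbiting a shared center of mass, with the planet's mass assumed to be much smaller than the star's.

```python
# Rough size of the transit and wobble signals for a Jupiter-like and an
# Earth-like planet orbiting a Sun-like star. Assumes circular, edge-on
# orbits and a planet much lighter than its star; constants are rounded
# textbook values.
import math

R_SUN = 6.957e8   # metres
M_SUN = 1.989e30  # kilograms
AU = 1.496e11     # metres
DAY = 86400.0     # seconds

PLANETS = {
    # name: (radius in m, mass in kg, orbital distance in AU, period in days)
    "Jupiter": (6.9911e7, 1.898e27, 5.204, 4332.6),
    "Earth": (6.371e6, 5.972e24, 1.0, 365.25),
}


def transit_depth(planet_radius, star_radius=R_SUN):
    """Fraction of starlight blocked: ratio of the two disc areas."""
    return (planet_radius / star_radius) ** 2


def stellar_wobble_speed(planet_mass, distance_au, period_days, star_mass=M_SUN):
    """Star's speed around the shared centre of mass, in m/s."""
    planet_speed = 2 * math.pi * distance_au * AU / (period_days * DAY)
    # Momentum balance: star_mass * star_speed = planet_mass * planet_speed.
    return planet_speed * planet_mass / star_mass


for name, (radius, mass, distance, period) in PLANETS.items():
    depth = transit_depth(radius)
    wobble = stellar_wobble_speed(mass, distance, period)
    print(f"{name}: transit depth {depth * 1e6:.0f} ppm, "
          f"stellar wobble {wobble:.2f} m/s")
```

Under these assumptions a Jupiter-sized planet blocks about one percent of the star's light, while an Earth-sized one blocks less than a hundred parts per million, in line with the figures mentioned in the conversation. The wobble signal shrinks from roughly ten metres per second to under ten centimetres per second.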
Scaling Machine Learning Without Huge Clusters: Despite the large datasets, much of the team’s analysis ran on standard desktops or single GPUs, showing that well-structured data and targeted machine learning can reduce the need for massive compute clusters. This is a testament to Python’s ecosystem efficiency and the power of modern libraries.
- Tools / Links:
  - PyTorch
Future Missions: TESS and Beyond: After Kepler’s mission ended, NASA launched TESS, which surveys almost the entire sky. TESS gathers less data per star (shorter observations) but covers far more stars overall, implying a dramatic uptick in potential planet candidates and a clear need for automated ML approaches.
- Tools / Links:
  - TESS (NASA)
Validation and Calibration of Probabilistic Models: Producing a “score” isn’t enough; converting that score into a robust “probability” is crucial. The guests described how they used calibration and Bayesian approaches to ensure that a 70% ML confidence corresponds to a meaningful, real-world likelihood.
- Tools / Links:
  - Bootstrapping & Calibration in ML
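One common way to check whether a score behaves like a probability is a reliability table: group predictions by score and compare each group's average score with the fraction that turned out to be real. The sketch below is an illustration on synthetic data, not the guests' method. The helper names are hypothetical and the "skewed" scores are deliberately distorted to show what miscalibration looks like. scikit-learn ships a comparable helper, `sklearn.calibration.calibration_curve`; this version sticks to the standard library.

```python
# Reliability table on synthetic data: for well-calibrated scores, the mean
# score in each bin should match the observed fraction of positives.
# Data and helper names are invented for illustration.
import random


def reliability_table(scores, labels, n_bins=5):
    """Group predictions into equal-width score bins and compare, per bin,
    the mean predicted score with the observed fraction of positives."""
    bins = [[] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        index = min(int(score * n_bins), n_bins - 1)  # score 1.0 goes in the last bin
        bins[index].append((score, label))
    table = []
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_score, observed, len(members)))
    return table


def worst_gap(table):
    """Largest difference between predicted and observed rates over all bins."""
    return max(abs(mean_score - observed) for mean_score, observed, _ in table)


rng = random.Random(42)
true_prob = [rng.random() for _ in range(20000)]
labels = [1 if rng.random() < p else 0 for p in true_prob]

calibrated = true_prob               # scores equal to the true probability
skewed = [p ** 2 for p in true_prob]  # systematically too low

for name, scores in [("calibrated", calibrated), ("skewed", skewed)]:
    print(name)
    for mean_score, observed, count in reliability_table(scores, labels):
        print(f"  predicted {mean_score:.2f}  observed {observed:.2f}  (n={count})")
```

For the calibrated scores the two columns track each other closely. For the skewed scores the observed rate sits well above the predicted one, which is the kind of mismatch a calibration step is meant to remove.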
Astronomy’s Human Challenge: Reducing Bias in Large Datasets: Before ML, graduate students manually examined transit signals, but human biases (energy levels, coffee breaks) crept into the results. Properly trained models can reduce these biases, producing consistent and standardized results even at large scales.
- Tools / Links:
  - Jupyter Notebooks (commonly used for analysis)
Implications for Life Beyond Earth: By confirming thousands of exoplanets and showing how prevalent planetary systems may be, it becomes more plausible that life exists elsewhere in the universe. While detection of life was not within this project’s scope, it clearly inspires the big-picture question of whether Earth is alone.
- Tools / Links:
  - NASA Exoplanet Archive
Reusable ML Patterns for Other Fields: The discussion briefly touched on other areas (like medical imaging and remote sensing) where automated classification powered by Python can unlock new research possibilities. Tools learned in astronomy (e.g., scikit-learn pipelines, probabilistic calibration) can be readily transferred to these different domains.
- Tools / Links:
  - Rasterio (mentioned for geospatial data)

Interesting Quotes and Stories

“People started finding a pattern of how often a planet got flagged right after a coffee break.” – David Armstrong, highlighting how human biases and breaks can show up in large, manual classification tasks.

“Some of the first transits were discovered from the ground, which surprised everyone—nobody thought we could measure the brightness that precisely from Earth.” – David Armstrong, sharing a story on how ground-based telescopes still play a major role in exoplanet discoveries.

“The data from Kepler is public. That’s the amazing part: You can get it online, do your own processing, and verify your own planets!” – Jev Gamper, on the accessibility of scientific data for amateurs and professionals alike.

Key Definitions and Terms

Transit Method: Detecting exoplanets by looking for dips in a star’s brightness as a planet crosses its face.
Wobble (Radial Velocity) Method: Finding exoplanets by measuring the star’s slight orbital motion induced by the planet’s gravity.
Light Curve: The graph of a star’s brightness over time; used to identify possible transit events.
False Positive: In exoplanet terms, a signal that appears to be a planet but is actually caused by something else (like eclipsing binary stars).
Gaussian Process Classifier: A probabilistic model often used in astronomy (and other fields) to handle uncertainty in classification tasks.

Learning Resources

Below are a few resources to continue your Python journey and apply it to data-focused projects:

Python for Absolute Beginners: If you are just getting started, this course will build a solid foundation in Python from the ground up.
Data Science Jumpstart with 10 Projects: Dive into data exploration and machine learning in Python with real-world examples to expand the methods mentioned in this episode.

Overall Takeaway

This episode shows how a compelling mix of Python, machine learning, and abundant astrophysical data from Kepler led to one of the most significant exoplanet confirmations in recent years. The guests demonstrated that modern ML libraries can not only automate large-scale data processing but also outperform or complement classic astrophysical models. The broader implication is clear: With the continued rise of missions like TESS and future telescopes, Python-based machine learning will remain central to unlocking the secrets of our universe and potentially discovering habitable worlds.

Links from the show

Jev Gamper on Twitter: @brutforcimag
Machine learning algorithm confirms 50 new exoplanets in historic first article: techrepublic.com
Episode #289 deep-dive: talkpython.fm/289
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 When I saw the headline, Machine Learning Algorithm Confirms 50 New Exoplanets in Historic First,

00:05 I knew that Python must be operating somewhere in the background, and that the story must be told.

00:10 That's how this episode was born. Join David Armstrong and Jev Gamper as they tell us how

00:16 they use Python machine learning to discover not one, but 50 new exoplanets in pre-existing

00:22 Kepler satellite data. This is Talk Python To Me, episode 289, recorded September 29th, 2020.

00:28 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:48 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:53 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter

00:58 via @talkpython. This episode is brought to you by brilliant.org and us. David, Jev,

01:05 welcome to Talk Python To Me. Yeah, thanks for having us. Yeah, thanks for having us.

01:09 Yeah, it's good to have you guys here. You know, when I saw this article, its title was something like,

01:15 Machine Learning Algorithm Confirms 50 New Exoplanets, a Historic First. It didn't say

01:21 anything about Python, but I'm like, I got to contact these guys and find out the story because I bet you

01:25 Python is involved. And oh my gosh, what an amazing discovery.

01:29 And Python's all over astrophysics at the moment. I mean, almost everyone uses it.

01:33 So good chance, I think.

01:34 Yeah, exactly. I thought, okay, machine learning and astrophysics combined. This has to be Python.

01:39 So I reached out to you guys and I'm really excited to have you on the show to talk about the research

01:44 that you're doing, some of the discoveries that you've made. And it sounds also like this is kind

01:50 of the beginning of the stories where things are going. Yeah. So this was really a test case we

01:54 wanted to put together to build up for a load of big missions that are coming online soon. I mean,

01:58 you might've heard of the Kepler satellite before, but it's been around for a while,

02:01 but it's the biggest data set we have for these exoplanet sort of transits, I guess we'll talk

02:05 about in a bit, but it's a perfect test case to work with.

02:08 Yeah, that's awesome. And it's, it's so amazing. You found all these planets as a small test case,

02:12 but Kepler starting to near the end of its life, right?

02:15 Kepler stopped recording already. So the main mission stopped in 2013 and it had an extension

02:19 called K2 that went on for a bit longer than that. Yeah. But the data is still full of things to

02:24 discover, as you can see. Amazing. All right, before we get too far into the topics though,

02:28 let's just set the stage really quickly with each of your stories. How'd you get into programming in

02:32 Python? Would you like to start David or should I go ahead? Sure. I mean, I got into programming

02:37 with God, it's, it's like college education here, what you call an A-level in computer science

02:41 with learning Pascal, if you believe it. Yeah. So it's going back a fair way, right?

02:46 But now I got into Python while I was doing my undergrad degree. And then mostly when I started

02:51 doing a PhD in astrophysics, because back then it was kind of either Python or IDL. You basically

02:55 picked one when you started. I think Python's won the war now, but yeah, I think so as well. I mean,

03:00 I haven't done any IDL, but there was this sort of domain specific programming language for astronomy

03:04 called IDL, right? Yeah, but it's proprietary and you have to pay a fee for it. And I think that's

03:08 what killed it in the end, but it's not killed. There's still people that use it, but the numbers

03:11 are shrinking. Yeah. Yeah, for sure. Jev, how about you? So I also got started with Python during my

03:16 undergrad. At that time, I was very much interested in quantitative finance and quantitative trading.

03:22 Not interested in that anymore for a long time and wrote my undergrad thesis with Python and R.

03:29 And then I came to Warwick to do my master's, met David and Theo. Theo is another co-author on the

03:36 paper. And there was this master's project to do exoplanet detection and validation with Python and

03:43 machine learning. So I just jumped on it and got started. And that was kind of the second project

03:48 with Python. Yeah, that sounds really fun. I got into programming as programming in general,

03:52 into C++, because I had to learn enough programming for a math research project. And it's funny how these

03:58 things sort of take you down paths that you don't necessarily expect them to take you. Like,

04:02 oh, I'm kind of interested in this project. I guess I got to learn that. Wait, now that's my job or

04:05 that's my specialty. How'd that happen? Yeah, exactly. I mean, that has been the case for me. It's been

04:10 quantitative trading, then astronomy, and then PhD in medical imaging, which is all Python.

04:16 Yeah, absolutely. Especially the ML side of the medical imaging analysis, for sure. So how about

04:22 what you're up to day to day? What do you do these days?

04:25 So nowadays, I'm towards the end of my PhD in medical imaging, which mostly involves all things

04:31 Python, PyTorch, deep learning, computer vision, networks, all these sort of things. And I also

04:37 work as a senior scientist at a startup in London, where mostly we work with climate models, but also

04:43 remote sensing. So there's quite a bit of machine learning there as well, where computer vision on

04:48 remote sensing images and also fitting neural nets to climate data. So that's the stuff today.

04:54 How interesting. I think one of the cool takeaways, or it seems like you could infer from that is

04:59 you're using this skill you have with Python to do all these different things, right? Like

05:04 quantitative trading, climatology, and astrophysics. Like those are not typically a common skill set.

05:11 I mean, the common things in all those things, it's programming, being able to code. And Python is

05:17 a good language to start for everyone and has very kind of big community around it, where most of the

05:23 tools that you will need on your day to day basis for day to day tasks are already well developed and

05:28 well maintained. So that's really kind of what makes Python really cool for that.

05:32 Absolutely. David, how about you?

05:34 Well, Stack Overflow, of course.

05:36 Of course. You can always copy and paste that little segment to make it do something.

05:40 Who knows what?

05:41 Yeah, I mean, so I'm a lecturer at the University of Warwick in the physics department. So most of my

05:45 days, well, bits of research and teaching, writing grant proposals, supervising students,

05:50 this kind of thing. I mean, there's more and more of the later ones and less and less of the research,

05:54 but I think that's how it goes.

05:55 Yeah, absolutely. That is how it goes. Cool. Cool, cool. I think being in academics is just

05:59 really fun. You get to explore so many amazing ideas and you're not rushed, right? It's not like

06:04 you've got a six week sprint to write this code and then move on to the next thing.

06:08 Yeah, you can really think things through and try and work out what you're doing. I mean,

06:11 there's always something to do, right? It's a different kind of rush, but nothing like industry, I think. Yeah. Yeah. Well, as a PhD student, I'd like to speak here.

06:20 Well, sometimes you do have to rush if you have a deadline and your supervisor is behind you telling

06:24 you, no, you got to do this. You got to submit your paper there. So yeah. Yeah. I was in grad school

06:30 for a while and there's this guy, he was just, everybody knew him. He was working on his PhD and

06:35 he had been around for nine years in the PhD program. And he just, you could tell he was just

06:41 meant to be at a university. Wonderful guy, but he had no, it was just no urgency for him to finish

06:47 his degree and get going until they said, you know, there's a 10 year limit in this program. And if you

06:52 don't finish within 10 years, you don't get your PhD. He's like, oh my gosh, he was done in six months.

06:56 Well, 10 years as well. That's a lot of time.

06:59 Yeah. He just chilled. He just loved it. Like, I don't know. It was easy living, I guess.

07:04 All right. So let's talk about astronomy and Python. Now in this article, there's the official

07:09 research paper that you guys put out. And then there was pop-sci type articles. And in that one,

07:14 the one from, I think it was TechRepublic, it said now astronomers are using machine learning

07:20 algorithms to search for planets beyond our solar system, formerly known as exoplanets. Is there a

07:25 different name? Am I missing something or are they off here? There are still planets, but we just call

07:30 planets outside the solar system, extrasolar planets, kind of just an exoplanets for short.

07:33 Extra solar. Yeah. Okay. I don't know if it's a formal name in the same way,

07:38 but exoplanets is what they've been called in the literature for a few years now. So.

07:41 Yeah. Okay. That's what I thought too. You know, they're not wrong. They're not wrong.

07:44 Super. All right. So when was the first exoplanet discovered?

07:49 There's a bit of a argument about some of the details, but it's roughly 20 years ago,

07:53 1995, I think for some of the first discoveries. I mean, there's been, there were a few candidates

07:58 before them, but it was back and forth in the literature for a while.

08:00 Right. I feel like this was done by a couple of astronomers in the observatory in Hawaii.

08:06 And so this was a ground-based discovery originally, right?

08:09 Yeah. The two astronomers were based in Geneva, actually. It was Michel Mayor and Didier Queloz.

08:13 Okay.

08:14 And they got the Nobel prize for it last year. Right. And then.

08:17 Oh, wow. Okay.

08:17 Yeah. So that was the first one discovered. I mean, since then we've gone up over about 4,000

08:22 now and it goes up every day. I mean, it's just been accelerating rapidly.

08:25 Yeah. So we've always thought solar system setup that we have, this cannot be unique,

08:31 but we didn't have any proof of it. And so now, like you said, we do. And a lot of the data that

08:36 people are analyzing must be coming from Kepler these days, right? So maybe tell us,

08:40 maybe we start with the story of Kepler.

08:41 So Kepler launched in 2009. It was active in the, what we call the primary mission for four years.

08:46 And it just stared at one patch of the sky, like a fairly large bit, but still just one

08:50 patch for those straight four years, taking measurements of how bright the stars were every

08:53 half an hour, about 200,000 stars. And that's what really revolutionized the field in terms

08:58 of the statistics. I mean, we went from having sort of hundreds of planets that we knew about

09:01 to having thousands. And it was the first one that could really find planets that were a bit

09:05 closer to being like the earth, even if they were usually much hotter than the earth.

09:08 Right.

09:08 The ones we found.

09:09 Right. Because a lot of the discoveries early had to do with planets that were both very close

09:15 to the stars. So the pulsing was frequent and very large because, so it was extreme,

09:19 right? Like a hot Jupiter type of thing.

09:22 Yeah, that's right. And those hot Jupiters were fascinating at the time. I mean, no one expected

09:26 them to exist at all until they started finding them.

09:28 Yeah.

09:28 Really interesting stuff. But of course you want to find planets that are closer to the earth,

09:32 right? That's what everyone's got in their mind. So you're always trying to push that boundary

09:35 out of it.

09:35 Yeah. Yeah, absolutely. So we had Kepler and you said that shut down in 2013 and other ones must be coming

09:42 online. Yeah. So Kepler, they actually...

09:44 The excitement around this can't have gone down, right? With the discoveries being made.

09:48 No, no, of course not. Like, so Kepler, they actually, what happened is it had these reaction

09:52 wheels in the satellite and too many of them broke. So they ended up not being able to stabilize it very

09:57 well. And they sort of repurposed it by this very clever technique of kind of balancing it off the

10:01 wind from the sun. But I don't want to go into that. But it was called K2 for a bit longer and

10:05 carried on for a bit. But that's finished now as well. Eventually it ran out of fuel. But now the one

10:09 that's active that everyone's looking at is called TESS. There's another NASA mission that's,

10:13 again, observing stars, trying to look for planets transiting, like measuring the brightness of the

10:17 stars, but doing it in a very different way. So it looks at almost the entire sky. And that's been up

10:21 for about two years now and it's still going on.

10:23 Wow. Okay. Maybe give us a little background on how this works. So it's not visual, right? They don't

10:31 see the planets.

10:32 No, much as we'd like to.

10:33 They measure... Wouldn't that be amazing? You're like, oh, look, it's a water world.

10:37 No, it's measuring the effects of a planet on the star, right?

10:42 Yeah. So the way we were looking at it is called the transit method. It's found most of the planets

10:46 to date so far, though that might change going forward. I mean, it's when, if you imagine a

10:50 planet going around the star, if everything lines up absolutely perfectly, sometimes the planet passes

10:55 between us and the star or between the telescope and the star. And that's going to block out a tiny

10:59 little bit of the light from the planet, from the star.

11:00 Like a little, almost like an eclipse.

11:02 Yeah, it is an eclipse. I mean, we call them transits because they're planets, but it's called

11:05 eclipses if you have a star instead. And it's a very small fraction. I mean, those hot Jupiters

11:09 we talked about drop out about a percent of the star's light, but something like the Earth,

11:13 it's more very small fractions of a percent. Parts per million we usually end up talking about.

11:17 And so we're trying to measure the brightness of the stars and look for these dips in the light.

11:20 And if you see them in a sort of regular pattern with the right kind of shape and lots of other

11:24 things, you can say it's a planet.

11:25 Yeah. There was also the wobble method.

11:27 Yeah. Do you want to do it yet?

11:29 Yeah, go ahead. Yeah. I mean, with the wobble effect, it's mostly, again, it's an indirect

11:34 observation method for detecting the planets orbiting the star. And it's mostly based on

11:39 the sun or the star and the planet having the common center of mass and then kind of going

11:45 in an orbit around that common center of mass. And that would make the star wobble. But most of the

11:51 kind of the observation methods for the wobble effect are for the large planets, which we have

11:56 been detecting before and for the smaller planets that might not necessarily work, especially for the

12:00 earth-like planets. So that's why transit photometry that is used by the Kepler is a bit better,

12:05 is a lot better to detect earth-like planets. And that's what we're interested in is quantifying

12:09 the frequency of these earth-like planets.

12:11 Yeah. Figuring out if anything lives there is one thing, but finding them is the first step,

12:15 right? And the closer, the better.

12:17 Yes. You got to start somewhere.

12:19 Exactly. So this used to be done by hand or maybe studying one-off, like let's look at this one star.

12:26 Maybe you guys give us the background on how this was done before and then some of the techniques that

12:31 you brought in with Python here.

12:33 So previously, people would still survey lots of stars. They'd try and observe loads of them and

12:37 try and find the ones that had these transits around them. But you'd kind of identify candidates

12:41 automatically. And then once you have those candidates, you'd have to look through them all and

12:44 basically say, okay, this one's false because it's clearly, you know, the telescopes jumped out here

12:49 or something like that. And, you know, people would go through this and you'd have large teams of often

12:53 PhD students looking at these candidates for a while. You know, it wasn't really a great use of people's

12:57 time. You got very familiar with the data, right? And that's kind of okay. You know, that only works for

13:02 so long. So yeah.

13:03 Yeah. You study all the science, you go to school for seven, eight years with your undergrad, and you're

13:10 just looking for variations, like common variations, right?

13:13 I mean, there's a lot of science that's like that one way or another, right?

13:15 I know. Yeah.

13:16 I mean, I know people who sat in fields, basically watching swallows flying and measuring like how

13:22 often they did it and when for days upon days upon days. And it's kind of nice to be outside,

13:26 right? That one's not so bad. But, you know, there's a lot of data taking, right?

13:30 Yeah.

13:30 But anyway, the more we can automate that, the better, really. I mean, not only because you don't want

13:34 people to have to spend all their time doing that kind of thing, but because if you're trying to do

13:38 statistics with it, you really don't want human biases coming into it.

13:41 Yeah, absolutely.

13:42 Like some of the earliest things, I don't want to name any names or surveys here, but we could start,

13:46 I think you could see a frequency in how often people flagged candidates as potential planets

13:50 based on when the coffee breaks were. So there was a peak after coffee breaks and lunch, right?

13:54 And, you know, that's fine if you find all of them.

13:57 Exactly. And yeah, we talked about this previously. This is super interesting that human bias, not even

14:04 bias in the traditional sense, but just like moods and stuff would affect it. You know, you've had your

14:09 afternoon coffee and cake or cookies or whatever. And so now you're a little more focused, a little

14:16 happier and, oh yeah, that probably is a planet. Yeah. You're a little less pessimistic. Yeah.

14:21 Yeah. So you flag a few more things, right? Yeah.

14:23 That actually becomes really important if you work in medical imaging, because there it's not just

14:29 one PhD student looking at a particular light curve of a planet. Here you have multiple doctors looking

14:35 at the same image and all of them have different idea of what is happening in that image. And then

14:40 try to figure out what is the truth kind of thing.

14:42 How interesting. And of course it has a way more serious consequences, right? If there's hundreds

14:46 of thousands of planets out there and we miss one, well, yeah, maybe it'll even be found later.

14:50 But if you're trying to determine, do you need treatment for cancer? Yes or no? And they're like,

14:54 eh, no. Or yes, or whatever, right? That is a serious knock-on effect. All right. So you talked

15:00 about this light curve. Yeah. Tell us, maybe describe the data that you're looking at, because if you're

15:05 doing, say, this cancer research and having machines look at that, that's like a mammogram

15:09 or something along those lines. But this is not a picture in the traditional sense that you're

15:14 analyzing with this study, right? Yeah. So when I got started on this project as a master student,

15:18 I've spent quite a bit of time with David, just asking him loads of questions to understand what

15:22 is the data? What is it that I'm looking at? And how do I even work with it? What am I supposed

15:26 to do with it? Because around the Kepler satellite itself, there's a massive data processing pipeline.

15:31 First satellite itself naturally looks at the stars, records the brightness of a star over time.

15:37 Then that data is transitioned to the research center where the data is processed along with the

15:43 information of a kind of engineering information of what was happening on the satellite. And that

15:48 together is being post-processed to determine, to kind of get the final light curve or brightness of a

15:54 star. And then that brightness of a star over time is processed continuously to detect new planetary signals

16:00 again and again and again and again. And from that, then there are parameters derived on what specific

16:07 orbital period this planet might have, what might be the size of it, et cetera, et cetera, et cetera. And these become

16:12 the features that we work with. And I probably missed a whole lot of things already. So David could probably add some

16:18 more on that.

16:18 I mean, I don't want to add too much detail in a big go, right? Fundamentally, though, it's the satellite measuring how bright

16:24 these stars are over time and we have to take pictures with the satellite. So you have like a CCD image like you'd see on a normal

16:29 camera, but they have to like identify the stars in that and work out where they are and how bright they are at each

16:34 point and try and turn that into a long time series. But in the end, what we've got is lots of time series data.

16:39 Right. Once you've narrowed it down to a single star, you're able to just say, what is the brightness of that star over time?

16:45 Yeah. So I mean, and in Kepler's case, most of them, we'd have a measurement every 30 minutes and that would carry on

16:50 almost continually for about four years. So you're talking around 60,000 data points for each star there.

16:55 So it gets quite nice.

16:56 Yeah, that's quite a bit of data. One thing is step back a little bit on the science side, I guess.

17:00 One thing that just blows my mind about this is the timescale that you're working on here.

17:05 If you've got, say, like Mercury or hot Jupiter going around and around really fast, like you get lots of measurements.

17:10 But if you've got something far out from the star, it could be equivalent to a year, multiple years.

17:17 Like that's just a couple of passes, right?

17:20 Yeah. And so each pass, you get one transit, right? And if there's fewer than three, it starts to get

17:24 very difficult to say for sure whether it's really a planet you're seeing.

17:26 Yeah.

17:27 And not just two random blips where the instrument got a bit hot or something.

17:30 Yeah.

17:30 So really, we kind of require that there's at least three in most cases before you'd really claim

17:35 something was a planet. And even that's pushing it.

17:37 Yeah.

17:37 So ideally, you want more than that. And you can see you need to stare for four years just to start

17:41 to get anything like a year.
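
The "at least three transits" rule of thumb can be turned into a small calculation. This is a minimal sketch: the function name and the example periods are hypothetical, and it counts only the transits guaranteed to fall inside a continuous observing window regardless of when the first one happens.

```python
import math

MIN_TRANSITS = 3  # "at least three in most cases" (from the episode)

def guaranteed_transits(baseline_days: float, period_days: float) -> int:
    """Minimum number of transits seen in a continuous window, whatever the phase."""
    return math.floor(baseline_days / period_days)

baseline = 4 * 365.25  # roughly Kepler's four-year stare, in days

for period in (20.0, 365.25, 600.0):  # hypothetical orbital periods in days
    n = guaranteed_transits(baseline, period)
    verdict = "enough" if n >= MIN_TRANSITS else "too few"
    print(f"period {period:7.2f} d -> {n:3d} transits ({verdict})")
```

A 20-day planet transits dozens of times in four years, an Earth-like year gives about four, and anything much beyond a 480-day orbit drops below three, which is why short-period planets dominate the known sample.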

17:42 I know. It's great. Yeah. It's such a large timescale.

17:45 In your planets. And this is why, you know, this is why most of the planets we know just

17:48 aren't like the same sort of periods as the Earth. I mean, the vast majority of the planets we know

17:53 actually have periods less than Mercury.

17:55 Wow.

17:55 So Mercury's, I forget off the top of my head, it's around 80 days, right? Close to that.

17:59 Most of the planets we know are sort of sub-20-day periods, even, just because it's easier to find them.

18:03 That's flying. Yeah, yeah. That's where you get all the measurements.

18:06 But the other ones must be out there, right?

18:07 Yeah. I mean, the indications we have are that planets are everywhere. Very, very common.

18:11 And it's just a matter of getting the detection efficiency good enough to find the ones that

18:14 are further out, really, I think.

18:15 Yeah. David talked about this data, large data, you know, 60,000 measurements for a single star

18:21 and hundreds of thousands of stars. You maybe want to talk about the data processing pipeline

18:25 and how you took this data and worked through it, got it somewhere you guys could analyze?

18:31 I mean, the big advantage was that most of the data is already public. So all of the parts of the

18:36 data set, you can access it online on the associated Kepler websites and then do some of the processing

18:41 steps to kind of derive the features that you're interested in. Okay.

18:44 And also the confirmed planets and confirmed false positives as labels for machine learning are also

18:50 available online. So kind of when I was working on the master's project back then, David pointed me in the

18:56 right directions, where do I have to get the labels, where do I have to get the features, and then

19:00 naturally pointed me and directed me in terms of what features are we supposed to derive from that,

19:04 what kind of post-processing is supposed to happen.

19:06 And that's where really the advantage of Python comes in and packages like scikit-learn, because

19:10 most of these steps are already available in terms of normalizing the features and making the

19:16 machine learning pipelines and making the hyperparameter optimization pipelines. So that's all been kind of

19:21 easy to set up. In terms of the features for the data set, there were, I don't know how much into the

19:27 details we want to get, but there are a few features that we computed on top of the features available

19:32 with the Kepler satellite, Kepler data processing pipeline, and we naturally did some feature

19:37 importance to determine what do we need, what we don't need.
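
The steps described here (normalizing features, building a pipeline, optimizing hyperparameters, checking feature importance) map onto standard scikit-learn pieces. The sketch below is not the authors' code: the data is synthetic, the column names are hypothetical placeholders, and the random forest and parameter grid are just one reasonable choice.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of candidate features (1 = planet, 0 = false positive).
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]  # hypothetical names
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
features = pd.DataFrame(X, columns=feature_names)

# Normalization and the classifier live in one pipeline, so scaling is
# re-fitted inside every cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Small illustrative grid; real searches are usually wider.
param_grid = {"forest__n_estimators": [50, 100], "forest__max_depth": [3, None]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(features, y)

# Rank features by the fitted forest's impurity-based importance.
best_forest = search.best_estimator_.named_steps["forest"]
importances = pd.Series(
    best_forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(search.best_params_)
print(importances)
```

Features that end up near the bottom of such a ranking are candidates for dropping, which is the kind of pruning described in the conversation.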

19:40 Yeah, yeah. Tell us about that.

19:42 Feature importance. I wouldn't remember which features were important now. In fact, I don't.

19:46 I can still remember a few. I've been looking at this more recently, I think, I guess.

19:49 I mean, the one that always comes up at the top is the shape of the transit. So when the planet passes in front

19:54 of the star, there's a sort of characteristic shape you get. It's very much like a U kind of shape.

19:57 And that tells you a lot. I mean, a lot of the things that can cause kind of false positives.

20:01 So what we're trying to do with all of this is work out when we see a candidate, whether it's a real

20:05 planet or whether it comes from something else. Like quite often you get things called eclipsing

20:08 binaries where it's basically just two stars orbiting each other and you see the eclipse of one of them.

20:12 And that can give you a very similar signal that might look a bit like a planet, but isn't actually.

20:16 And there's a few other things like that that can cause false positives.

20:19 Right.

20:19 And they tend to have these very slightly subtly different shapes. So sort of narrower eclipses that are a bit more

20:24 V-shaped and so on and things like that. And that always helps us a lot, but it doesn't get every case,

20:28 sadly enough.

20:29 If my memory is not too wrong, David, was it also the case that some of the

20:33 uncertainty measurements that were computed by the Kepler satellite pipeline were also computed as

20:38 important features or am I wrong?

20:40 I don't think we used them in the end.

20:41 Okay.

20:42 But they did go in to start with, but they didn't come out so important when you were doing those tests.

20:46 Okay.

20:46 Yeah. I was going to say, I'm no expert at image processing and with machine learning, but it seems to me that these models are very good at detecting small, minute changes that humans often miss.

20:58 Right. Like it was so successful detecting breast cancer in mammogram scans that a lot of trained professionals actually missed.

21:05 And these slight variations in like, oh, this is how it is when it's a star with a star rather than a planet and a star is subtle, but somehow machine learning seems to be able to find those and pull them out.

21:18 This portion of Talk Python To Me is brought to you by Brilliant.org.

21:21 Brilliant has digestible courses in topics from the basics of scientific thinking all the way up to high end science like quantum computing.

21:29 And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated and fun.

21:34 It's super easy to get started.

21:36 And they've got so many science and math courses to choose from.

21:38 I recently used Brilliant to get into rocket science for an upcoming episode.

21:42 And it was a blast.

21:44 The interactive courses are presented in a clean and accessible way.

21:47 And you could go from knowing nothing about a topic to having a deep understanding.

21:51 Put your spare time to good use and hugely improve your critical thinking skills.

21:55 Go to talkpython.fm/brilliant and sign up for free.

21:59 The first 200 people that use that link get 20% off the premium subscription.

22:04 That's talkpython.fm/brilliant.

22:06 Or just click the link in the show notes.

22:09 I think the best thing here is that like sometimes in the extreme cases, like someone looking at it who was awake and paying proper attention and everything would spot the difference.

22:18 But what the machine can do is give you a nice sort of quantified boundary between the two.

22:22 And so, you know, this is where you need to draw your line and stuff near it.

22:25 Here it falls on this side or this side.

22:26 And here's how confident you should be about it, given what we can see.

22:29 And it can make all those quantified.

22:30 And that's the real benefit, I think.

22:32 You're not just relying on people sort of keeping themselves the same the whole way through, which no one does, of course.

22:37 And one of the things I think is probably a challenge here is certainly a challenge in some places is as you look at more of these signals, you get better at identifying those things.

22:47 But you weren't good at the first hundred you looked at or whatever, right?

22:51 You've become trained in looking at these type of data.

22:54 And so you bring on a new grad student, they go through that similar like learning curve as well.

22:58 And you can try and mitigate all that by like getting multiple people to look at every candidate.

23:02 But then you're just multiplying up the work hours, right?

23:04 In machine learning, they often refer to it as graduate student descent, where you just have multiple graduate students and they iterate on each and every one of them.

23:13 Exactly.

23:13 It's like a different form of deep learning.

23:15 Right.

23:15 They can probably still get the wrong minimum, right?

23:18 The wrong optimization.

23:19 So, Jev, one of the things about this that I think made it work is you talked about there's already these Kepler data where it's labeled, right?

23:29 These are verified exoplanet signals.

23:31 These are verified false positives and these are not exoplanets, right?

23:35 So you were able to leverage that to teach the scikit-learn model?

23:38 Yes.

23:39 So we took some of the verified planets and some of the verified false positives, took the features, and essentially you create a Pandas data frame that you can feed into the scikit-learn machine learning models.

23:50 Split them into your cross-validation folds and fit the hyperparameters and see what you get.

23:55 All of the metrics to really evaluate the performance of these classifiers are already out there.

24:00 It's more the question of how do you interpret them from the understanding of the model.

24:04 So one interesting thing from what I remember was that the dataset itself seemed quite easy in terms of us getting very high accuracy metrics, but the slight difference between the different classifiers was interesting.

24:19 So linear discriminant analysis performed slightly better than quadratic discriminant analysis, which meant that the decision boundary between the false positives and the confirmed planets was actually way simpler.

24:31 And you didn't need a quadratic function to describe what is the difference between them.

24:36 So you needed to be careful of not trying to overfit to too many outliers, essentially.
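
A comparison like the one described, linear versus quadratic discriminant analysis scored over cross-validation folds, can be sketched as follows. The data is synthetic, so which model wins here says nothing about the Kepler result; the point is only the mechanics of the comparison.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for planets vs. false positives.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0, random_state=1
)

models = {
    "LDA": LinearDiscriminantAnalysis(),     # linear decision boundary
    "QDA": QuadraticDiscriminantAnalysis(),  # quadratic decision boundary
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

If the simpler linear model matches or beats the quadratic one, as reported for the Kepler features, the extra flexibility is mostly fitting noise and outliers.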

24:41 And the clearer the distinction, the better it is for you guys, right?

24:44 Yeah.

24:45 And just quantifying how confident we should be about that distinction.

24:48 I mean, that was a big thing of the improvement we made is trying to get probabilistic output so we could really get a proper probability for whether something was a planet or not.

24:56 Yeah.

24:56 Versus just trying to rank the best ones, which is what people have tried to do before.

24:59 I understand normal surveys, normal studies, where they have got very basic statistics, right?

25:06 Like, how many standard deviations is this from the mean or various other standard statistical analyses?

25:13 What about this?

25:14 Like, once you've fed that all through machine learning, what you get out the other side, like, how do you make sense of that?

25:19 Oh, God.

25:20 So probabilistic machine learning is a huge thing in itself, right?

25:23 Beyond even that.

25:25 But there's different ways.

25:26 So, I mean, we used multiple different models.

25:27 One of them that's naturally probabilistic is a Gaussian process classifier, which we built with GPFlow as a different package.

25:33 And that actually took a GPU to run, which we thankfully got for free off NVIDIA.

25:37 So I should probably say that, you know, they'll be happy.

25:39 Awesome.

25:40 Yeah, that's cool.

25:41 So that sort of naturally gives you a probability of class membership.

25:44 So you set it up with the whole training set and labels that you have, and it'll tell you the actual probability that a new sample is a member of any one of those classes.

25:52 Yeah.

25:52 But all the other things we have to calibrate.

25:54 So we get some sort of score out of the machine learning model, and then we have to take all of our examples and say, okay, well, we've got a score of 0.6, but actually 80% of these are really planets.

26:03 So really, that needs to be 0.8.

26:05 And you can kind of calibrate that way, sort of bootstrapping yourself.
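
The calibration idea can be illustrated with a simple binning scheme: group candidates by raw score and replace each score with the fraction of known planets in its group. This is a toy sketch with made-up scores and labels, and the function name is hypothetical; the episode does not say which calibration method was actually used.

```python
import numpy as np

def binned_calibration(scores, labels, n_bins=5):
    """Return bin edges and the observed planet fraction in each score bin."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each score to a bin index in 0..n_bins-1.
    bin_index = np.digitize(scores, edges[1:-1])
    calibrated = np.full(n_bins, np.nan)  # NaN marks bins with no examples
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            calibrated[b] = labels[in_bin].mean()
    return edges, calibrated

# Toy data: ten candidates scored a bit above 0.6, eight of them real planets,
# plus five low-scoring candidates that are all false positives.
scores = [0.62, 0.65, 0.61, 0.68, 0.63, 0.66, 0.64, 0.67, 0.69, 0.65,
          0.15, 0.12, 0.18, 0.11, 0.16]
labels = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
          0, 0, 0, 0, 0]

edges, calibrated = binned_calibration(scores, labels)
print(calibrated[3])  # bin covering scores from about 0.6 to 0.8 -> 0.8
```

This reproduces the example from the conversation: a raw score near 0.6 is reported as 0.8 because 80% of the labeled candidates with that score are planets. scikit-learn's `CalibratedClassifierCV` offers smoother versions of the same idea.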

26:08 So do you have people go back through and basically reanalyze what the machine learning suggested, just focusing on those stars?

26:14 What was the, once it says, you know, 67% likelihood, what happened?

26:19 That's definitely where we had to actually work on calibrating the classifiers that were non-probabilistic.

26:25 We also wanted to compare it against the probabilities produced by some of the previous methods, like VESPA, which was a physics-based model that evaluates the likelihood of different false positive scenarios and the planetary scenario.

26:39 And that's where the things got interesting, because the probabilities that our machine learning classifiers were producing versus the probabilities produced by previous models were slightly different, particularly for false positives, if I'm not mistaken.

26:52 More than slightly.

26:53 I mean, they disagreed in something like 27% of the cases.

26:56 Yeah.

26:56 And this is VESPA, I should say, before we get deep into that, is a sort of field standard tool that's like a, it's an actual physical model.

27:03 So you're fitting models of these different planet and false positive scenarios to the data and trying to say, how well does each model fit?

27:08 Right.

27:08 From what you understand, does it make sense with the way you think planets and gravity and all that works?

27:13 Yeah.

27:14 Because we can sort of build up a fake system with a binary star and so on and say, okay, well, this is what that data looks like.

27:19 And, you know, does it match?

27:20 Does it not?

27:21 And so on.

27:21 And you can do that properly in a Bayesian way and get some good statistics out of it.

27:25 The trouble is to make it run fast enough on a large number of samples, because to do this properly is very, very slow.

27:31 And VESPA makes some approximations too, and it has a whole set of results.

27:34 But till now, it's been the only thing that can run fast enough to really do this on large numbers of candidates.

27:38 And so we were the first people to be able to compare against that.

27:41 And we actually got some quite big discrepancies, which is kind of worrying, right?

27:44 Oh, wow.

27:45 Yeah.

27:45 I mean, it's not that it...

27:47 It's both exciting and worrying because somebody's got to change something, right?

27:51 Or learn something.

27:52 Yeah.

27:53 Well, it's important to do, you know.

27:54 So we haven't really got to the root of that yet as to what's causing it.

27:58 I mean, but when there was an independent label, it pretty much favored our classifier.

28:01 So we're optimistic about that part, at least.

28:03 Yeah, that is optimistic.

28:04 Jev, do you want to tell us more about this GPFlow library?

28:08 So GPFlow was something I briefly worked with when working on this project.

28:13 At that time, I was quite interested in Gaussian process classifiers and Bayesian nonparametrics.

28:19 But also how does it...

28:21 The problem with Gaussian processes generally is that they're quite hard to scale because you need to do a matrix inversion to be able to kind of get the posterior.

28:30 What GPFlow allows you to do, or rather what the methods it uses do, is formulate the fitting of your Gaussian process to the data as an optimization problem.

28:43 And that's where the GPUs come in.

28:45 And GPFlow is naturally built on TensorFlow, which is built for optimizing parameters on the GPUs.

28:50 And that's what GPFlow is, but for Gaussian process specifically.

28:53 With a few kind of inference methods already pre-built into the package.

28:57 And you can just define your Gaussian process, define your kernel, define the hyperparameters, give it your data, and it will do the work for you.

29:05 That's what GPFlow essentially is.
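
The "define your kernel, give it your data, get class probabilities" workflow can be sketched without a GPU using scikit-learn's `GaussianProcessClassifier`. To be clear, this swaps in a different library from the GPFlow model used in the project: scikit-learn fits the classifier with a Laplace approximation on the CPU and does not scale to large training sets, which is exactly the limitation described above. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Small synthetic dataset: cost grows roughly cubically with the number of
# training points, so this approach is kept deliberately tiny.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=2
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

# Define the kernel; its hyperparameters are tuned during fit().
kernel = 1.0 * RBF(length_scale=1.0)
gp = GaussianProcessClassifier(kernel=kernel, random_state=2)
gp.fit(X_train, y_train)

# Probability of class 1 ("planet" in this toy setup) for each unseen sample.
planet_probability = gp.predict_proba(X_test)[:, 1]
print(planet_probability[:5])
```

The output is a probability per candidate rather than a bare score, which is the property that made the Gaussian process attractive for validation work.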

29:07 Yeah.

29:07 It basically requires an NVIDIA graphics card underneath.

29:11 Well, any decent GPU will help it speed up, right?

29:14 But you can get away without it.

29:15 It just takes forever.

29:15 Yeah.

29:16 Yeah.

29:17 Nice.

29:18 So how about the compute cluster?

29:21 You know, I speak, I've had folks from like CERN, Large Hadron Collider.

29:25 They've got like this incredible compute cluster that they can send their stuff out.

29:29 The Square Kilometre Array folks also had this huge amount of data they were processing in a big compute cluster.

29:35 What about yours?

29:35 So back when I started this project, I was doing it all on the laptop that they gave to us as part of the program.

29:41 But as David mentioned later on, it was run on a GPU.

29:45 So, yeah, it kind of progressed from a tiny laptop that I had to the GPU.

29:50 Yeah.

29:50 Though still, I mean, just one GPU.

29:52 We never really got to big computing nodes and clusters.

29:54 Like, it's not really a processing limited job, I think, at least at the scale we were running it at.

29:59 Right.

30:00 I think it's really interesting.

30:01 And that's why I want to kind of tease that out, because it didn't take a huge amount of computing power to do this.

30:07 All of the non-Gaussian process models, I mean, there's some random forests and multi-layer perceptron things and the like in there.

30:13 All of them, even by the end when we're running sort of like publication level results, we're still running on a pretty standard iMac, just on a desktop computer.

30:21 So it's just not using these big supercomputers, you know.

30:24 Yeah.

30:24 How much of the processing is happening in there?

30:26 Is the, do you start from the brightness over time signal or do you start from the actual image of the whole sky, the image of the star?

30:34 Like, where do you begin this analysis?

30:36 Yeah.

30:36 So this is the good question, right?

30:38 So a lot of the quick analysis I've been talking about is all of the training, the models and so on.

30:42 Like a lot of the features were pre-computed by NASA as part of the Kepler pipeline.

30:46 And that saves us a lot of processing time.

30:48 I mean, it really does.

30:49 So you can't really understate.

30:51 It's actually a bit more of a processing problem when you try and do that as well.

30:54 Right.

30:54 But it's already been sort of condensed down to you care about the star.

30:58 Here's the data.

30:58 Go look at it.

30:59 Yeah.

30:59 So here's a lot of metadata on it.

31:01 And we'll still take the time series and we'll calculate some extra things from it, but not at the same level.

31:05 So we don't have to start with the sort of real pixel level camera images.

31:08 And that takes a long time to process.

31:10 Yeah.

31:11 You talked about Kepler being done, TESS coming online.

31:15 Maybe give us a sense for where this kind of research is going, where like additional projects, additional studies you all might be doing.

31:24 What does the future look like with new telescopes, larger field of view, more data, so on?

31:29 Well, I mean, with this one, we were really just trying to demonstrate the method on Kepler, but TESS is where you need it.

31:34 I mean, so Kepler has maybe 8,000 candidates, I think, in the database.

31:38 The yield estimates for TESS kind of imply that we're going to find around 20,000 planets on top of several hundred thousand false positives.

31:46 So you're already upping it by a couple of orders of magnitude anyway.

31:48 Yeah.

31:49 That's in a sample of around 20 million light curves.

31:51 So it's a much bigger data set all by itself.

31:53 And that hasn't really been done yet to this sort of scale.

31:56 So really, the next thing we want to do is to generalize this model and apply it to TESS.

31:59 Yeah.

31:59 And there's future missions coming on.

32:01 So there's one called PLATO, the European Space Agency mission that's due to launch in a few years.

32:06 And, well, with various things coming online, it's a big interesting topic.

32:09 So there's lots of stuff in the pipeline.

32:11 How long will TESS run for?

32:12 Will you be able to detect things that are further out, like Neptune type of things, or is it still short?

32:19 So one of the trade-offs with TESS was that it's observing the whole sky.

32:23 But that means for most of the sky, it observes it for less time, just a month, actually, for most stars to start with.

32:28 I mean, the interesting thing about TESS, though, is they built it so it has about 10 years of fuel, I think.

32:32 Like a lot more fuel than it's currently funded to operate for.

32:35 So optimistically, it could be going for the next decade.

32:38 We'll figure it out.

32:38 If we could get someone to pay attention in the next eight years, we'll just keep it running.

32:42 You know, if you're doing well enough, eventually it just sort of pays.

32:45 Yeah.

32:45 It's like Hubble, I guess.

32:46 Once you're doing well enough.

32:47 Right.

32:48 David, I've got a bit of a machine learning question.

32:51 Like, how applicable are the models actually built on Kepler to the TESS satellite?

32:56 Because from the TESS satellite, you won't have initial data to start with, right?

33:00 Yeah.

33:00 So this is one of the big challenges of building it for TESS: we have to simulate a training set by making models of all the different scenarios and starting to inject them into the time series.
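
Injecting a simulated signal into a time series can be sketched with NumPy. This is a deliberately crude version: the transit is a flat-bottomed box rather than a physical model, the light curve is pure white noise, and every number (period, depth, noise level, about a month of 30-minute samples) is a hypothetical value chosen for illustration.

```python
import numpy as np

def inject_box_transit(time, flux, period, epoch, duration, depth):
    """Return a copy of flux with periodic box-shaped dips, plus the in-transit mask."""
    # Time from the nearest transit centre, wrapped into [-period/2, period/2).
    phase = (time - epoch + 0.5 * period) % period - 0.5 * period
    in_transit = np.abs(phase) < 0.5 * duration
    injected = flux.copy()
    injected[in_transit] *= 1.0 - depth  # dim the star by a fractional depth
    return injected, in_transit

rng = np.random.default_rng(0)
time = np.arange(0.0, 27.0, 1.0 / 48.0)                # days, 30-minute spacing
flux = 1.0 + rng.normal(0.0, 1e-4, size=time.size)     # noisy, planet-free star

injected, in_transit = inject_box_transit(
    time, flux, period=5.0, epoch=1.0, duration=0.2, depth=0.01
)

# The label is known by construction, which is what makes this usable as training data.
print(in_transit.sum(), "samples fall inside a simulated transit")
```

Because each injected light curve comes with a known label, the same routine with eclipsing-binary-like shapes could supply the false-positive class as well, though a realistic training set would need proper physical models and real instrument noise.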

33:08 Right.

33:09 Because the measurement might be more sensitive.

33:11 It might look slightly different.

33:12 It's not the same instrument.

33:14 And so...

33:15 Different cadence, different noise.

33:16 Yeah.

33:16 All that stuff.

33:17 Yeah.

33:17 For sure.

33:18 I mean, the advantage of that, though, is we can start to really build up the explainability of it because we can test how things perform on different scenarios and actually just increase the size of our training set hugely.

33:27 So there's some advantages, too.

33:29 But yeah, it's some work for sure.

33:30 Yeah.

33:31 You kind of got to go back to restart.

33:32 You could go find the ones that are verified by Kepler and say, let's look at those, see what the curves look like, line it up.

33:39 Yeah.

33:40 Yeah.

33:40 And it's a nice thing because you can look at the stars that Kepler observed and TESS observed and say, like, oh, well, we know there's not a planet there.

33:45 So this is clearly some kind of noise.

33:47 And use that in your training set and this kind of thing.

33:49 Yeah.

33:50 There's some nice little synergies to build in with it.

33:52 Yeah, absolutely.

33:53 Well, what about other fields?

33:55 Can people take this idea of creating these machine learning models, studying the data that was already labeled, and automatically get insight into, I don't know, climate or weather or earthquakes, things like this?

34:09 I mean, of course, machine learning is being used everywhere, right?

34:11 Yeah.

34:12 You must have a...

34:13 I mean, I didn't quite understand the question.

34:14 If you could repeat, what is it that...

34:16 Well, I'm just thinking there must be some inspiration that other fields can take from this who are not yet using machine learning.

34:23 Do you see any where it's being underutilized, I guess?

34:26 Obviously, it's been utilized in a lot of places.

34:28 Probably plenty of fields.

34:29 Like, I can't think of many off the top of my head.

34:31 But, I mean, one big field is remote sensing.

34:34 There is just so much data observing the Earth's surface.

34:38 And most of it just kind of is there unused.

34:41 If you think about two satellites, Sentinel-1 and Sentinel-2, most of these satellites are not put to use at all.

34:47 But yet they produce terabytes of data.

34:49 And we've actually seen that in practice.

34:51 But the problem there is that there are very few labels.

34:54 But what you can learn is that actually it's okay to use labels that have been produced by other methods.

35:00 So the same way that some of the planets have been confirmed by multiple follow-up observations,

35:06 there is still some likelihood that they could have been wrong.

35:09 But ultimately, once you put all of that together into one big data set, there will be still some signal from which machine learning can learn what features to use to do the classification.

35:19 And that way scale up once you have large data sets and slowly, slowly move towards some answer that you're looking for.

35:25 Or you might not be looking for an answer.

35:27 You just want to see what happens.

35:28 Yeah.

35:29 I guess if you start with sort of messy labels and you get a start-off classification, it can tell you which labels are most likely to be messy.

35:35 And you can make your training set better, right?

35:37 And start iterating.

35:37 Yeah.

35:38 What about using it for iteration?

35:41 I don't know very much about this data, but we could label a few things, see what it detects, go back and pay attention to what it thinks is important and say,

35:48 yo, you're right, no, you're wrong, let's retrain you, do it again, and so on.

35:53 Yeah.

35:53 So actually, there are quite a few publications in machine learning that look into these kind of problems.

35:58 And you can think of it like intuitively, it sounds very much like expectation maximization.

36:02 You move your parameters slightly, then you maximize over your data set.

36:06 And especially Amazon has quite a bit of research on that because they have the Amazon Mechanical Turk and they have loads of people labeling different data sets.

36:14 Yeah.

36:14 And sometimes people will label them wrong and therefore you want to fit your model to some of the labels, see how that compares to other labels.

36:21 And then, given that, work out what the possible labeling noise is and, given that, optimize your model, et cetera, et cetera, and iterate on that continuously.
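
One simple way to act on the "find the messy labels, fix them, retrain" loop is to flag samples whose given label the model finds least believable under cross-validation. The sketch below is a basic heuristic on synthetic data with deliberately flipped labels. It is not expectation maximization and not the method from the research mentioned; it only shows one round of the idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Clean synthetic data, then corrupt 15 labels to imitate labeling mistakes.
X, y_true = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    class_sep=2.0, random_state=3,
)
rng = np.random.default_rng(3)
flipped = rng.choice(len(y_true), size=15, replace=False)
y_noisy = y_true.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-fold probabilities: each sample is scored by a model that never saw it.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y_noisy, cv=5, method="predict_proba"
)

# How much probability the model assigns to the label each sample was given.
confidence_in_given_label = proba[np.arange(len(y_noisy)), y_noisy]

# The least believable labels go to a human for review before retraining.
suspects = np.argsort(confidence_in_given_label)[:30]
recovered = np.intersect1d(suspects, flipped).size
print(f"{recovered} of {flipped.size} flipped labels appear in the review list")
```

After the suspect labels are checked and corrected, the model is refitted and the process repeats, which is the iteration described in the conversation.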

36:31 Yeah.

36:32 Very cool.

36:32 Well, this is such a neat project.

36:34 And like I said, I think it really captures the imagination when people see what you guys have done here because there's so much data out there.

36:43 And I think we're just scratching the surface on what we've learned, right?

36:47 This stuff is so hard to detect that the more we can do these studies and the more we can learn about stuff outside the solar system is just amazing.

36:54 Yeah.

36:55 I mean, for sure.

36:56 And we keep getting more and more data just coming in year on year with this.

36:59 Yeah.

36:59 Can you study anything from the ground?

37:01 Is there anything accurate enough on ground-based telescopes to give you that level of information or has it got to be from space?

37:07 Yeah.

37:08 So, I mean, we actually run a telescope in Chile called NGTS, the Next Generation Transit Survey, that surveys stars and regularly picks up these transits.

37:16 What you can't do is get down really to the Earth-sized planets around sun-like stars.

37:21 That's really tough.

37:21 Yeah.

37:22 But certainly for Neptune-sized planets and Jupiter-sized planets, we can definitely go from the ground.

37:26 Yeah.

37:26 And some of the first transits were found from the ground for sure.

37:29 Oh, yeah.

37:29 Awesome.

37:30 Now, before we round out our conversation here, I do just want to ask you a couple of more sort of philosophical questions.

37:37 First of all, it seems to me that there's just an incomprehensible level of stars out there, right?

37:44 We've got how many stars in an average galaxy?

37:47 A hundred billion.

37:48 I don't know about the average, but that's about the Milky Way.

37:50 A hundred billion?

37:50 Yeah.

37:51 Something like that.

37:52 How many galaxies?

37:53 I mean, uncountable, right?

37:54 They keep going out to the edge of what we can see.

37:56 Right.

37:57 But it's millions of galaxies?

37:59 Well, more.

37:59 I mean, it's not a countable number.

38:01 I mean, there's just...

38:02 So, you've got to multiply those two numbers.

38:04 And then you think each star may have a planet?

38:07 Many stars have planets?

38:08 Maybe multiple planets?

38:10 They're certainly common.

38:10 Yeah.

38:11 Like, it's probably more common than not, at the very least.

38:13 So, yes, there's definitely planets everywhere.

38:15 It seems incomprehensible to me that life is only here, right?

38:19 Yeah.

38:20 And I think a lot of astronomers agree with that.

38:21 Though, obviously, we can't detect that life yet.

38:24 It seems remarkable that it wouldn't exist.

38:26 Right.

38:26 I mean, what it is hard to say is how often would intelligent life happen and this kind of

38:30 thing.

38:30 And there's all sorts of open questions there.

38:32 Yeah.

38:32 I think that's a really interesting thing to ponder.

38:34 Just, you put those numbers together and the fact that you all just keep discovering more

38:38 and more planets, so they must be fairly common.

38:41 The chances that there's not something like what we have here in other places in the galaxy,

38:46 it seems pretty small.

38:47 The chance that we find it or ever interact with it may be zero, but the fact that it exists

38:52 is already interesting to dream about.

38:55 Wow.

38:55 It would be incredible to prove that it exists, that's for sure.

38:58 It sure would.

39:00 What's even more interesting is that if we do kind of ponder about it, like you said, like

39:04 other life forms to exist, the impact it might have on the society and the society's perception

39:11 on itself.

39:12 Like as previously when there were big astronomical discoveries that kind of changed our worldview,

39:16 the society changes with that too.

39:18 And what can be the impacts of, I don't know, if we observe sufficiently many exoplanets and

39:23 we discover something new.

39:24 Maybe that will change our thinking and the way kind of we operate with one another.

39:29 I don't know.

39:30 Yeah.

39:31 That would be great.

39:31 Or if it turns out that there really is life on Venus.

39:35 Yes, that's a very new piece of news that they discovered.

39:38 I forgot the gas, but some gas that is typically...

39:41 Phosphine, wasn't it?

39:42 Yeah.

39:42 Yeah, it was phosphine.

39:43 That's right.

39:43 Amazing.

39:44 So it feels to me like there must be life out there.

39:49 And I think your work is tipping the scales and the likelihood of that, even if it has nothing

to do with actually detecting or interacting with the life itself.

39:56 It's really cool.

39:56 Like you said, you've got to find the planets to be able to look for life on them.

39:59 That's right.

40:03 Talk Python To Me is partially supported by our training courses.

40:06 Python's async and parallel programming support is highly underrated.

40:10 Have you shied away from the amazing new async and await keywords because you've heard it's

40:15 way too complicated or that it's just not worth the effort?

40:18 With the right workloads, a hundred times speed up is totally possible with minor changes to

40:23 your code.

40:23 But you do need to understand the internals.

40:26 And that's why our course, Async Techniques and Examples in Python, shows you how to write

40:31 async code successfully as well as how it works.

40:34 Get started with async and await today with our course at talkpython.fm/async.

40:40 Then the other one, I'm a big fan of SpaceX in the sense that it's really pushed the boundaries

40:47 of what we can do with space.

40:49 Like when I saw those two rockets land side by side, you know, people thought it was fake

40:53 because it was so synchronized.

40:55 And so like, there's no way that it's going to look that good.

40:57 That's just CGI, right?

40:59 And so the stuff they're doing is so amazing.

41:01 And yet they're not always positive for astronomy, right?

41:05 Like for their internet satellite stuff, there's a big uproar about them messing up ground-based

41:10 telescopes and stuff, right?

41:11 Yeah.

41:11 And I could find you some pictures where you see 50 of these satellites sort of streaking

41:15 through the sky, right?

41:16 Right.

41:16 Exactly.

41:17 It's not too bad now because there's not that many, but I think they have plans to

41:20 hugely expand the numbers.

41:21 Yeah.

41:22 So I just wanted to ask you too, what you thought about that?

41:24 I think some of their future generations are supposed to be designed to be a bit less kind

41:28 of bright in the images we have, but there's still a big open question about what effect

41:32 they're going to have.

41:32 I think people aren't really sure yet.

41:34 Yeah.

41:34 It's certainly not exactly my field.

41:36 I mean, I work with datasets from satellites, I should say.

41:38 The reason I thought about it is those things passing in front of stars are going to mess

41:43 up these light curves very much in the way that some other kind of transit would.

41:47 Well, if the light curve is coming from a satellite that's already outside of the orbit of these

41:51 ones, then it's okay, right?

41:52 But all the ground-based stuff can be affected, yeah.

41:54 Yeah.

41:54 And sometimes, I mean, some of the stars we look at are relatively bright, but people study

41:58 these very faint galaxies and things where the signal could be completely drowned out by

42:02 a satellite like that passing.

42:04 And that's what makes you worry.

42:05 I mean, there's other examples of radio telescopes where the datasets became unusable because of people's

42:10 mobile phones near them and things like that.

42:12 Right.

42:13 So you can see how some of the really sensitive measurements get affected by anything.

42:16 And a big trail of bright satellites is going to be something.

42:18 It is going to be something.

42:19 The fact that you all can detect things from so far away and so faint, it still just blows

42:25 my mind.

42:25 Yeah.

42:26 I mean, some of these, like the wobble method you were talking about earlier, like we can

42:30 sometimes measure stars moving towards us at a sort of walking pace.

42:33 Yeah.

42:33 Like when they're orbiting.

42:34 It's a whole star moving towards us at a few meters per second, like a brisk walk.

42:38 And it's 10 million light years away or something insane, right?

42:42 Or how far?

42:42 Hundreds of light years for sure.

42:44 Hundreds of light years.

42:45 Yeah.

42:45 Okay.

42:45 Yeah.

42:46 So many, many, many millions of miles, right?

42:48 Very far away.

42:49 And it's, yeah, like walking pace.

42:51 Unbelievable.
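To put the walking-pace figure in perspective, here is a minimal sketch of the spectral shift such a motion produces. It assumes the non-relativistic Doppler approximation (fractional wavelength shift = v / c), which is fine at these speeds. The function name, the 500 nm rest wavelength, and the 3 m/s velocity are illustrative choices, not values from the episode.

```python
# Illustrative only: non-relativistic Doppler shift for a slowly moving star.
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact by definition of the metre


def doppler_shift_nm(rest_wavelength_nm: float, radial_velocity_m_s: float) -> float:
    """Return the wavelength shift in nm, valid when the velocity is far below c."""
    return rest_wavelength_nm * radial_velocity_m_s / SPEED_OF_LIGHT_M_S


# Hypothetical inputs: a visible-light line at 500 nm, star moving at 3 m/s.
shift_nm = doppler_shift_nm(500.0, 3.0)
fractional_shift = shift_nm / 500.0

print(f"shift: {shift_nm:.2e} nm, fractional: {fractional_shift:.2e}")
```

The fractional shift comes out at about one part in a hundred million, which gives a sense of why these measurements are so easily disturbed.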

42:52 So I don't know.

42:53 We'll see what happens.

42:54 I guess, I mean, maybe there's things you could do if the, what you're measuring is like

42:58 looking for a long time.

42:59 You knew the satellite was coming.

43:01 Just don't take that signal at that moment.

43:04 Have a little gap in the time.

43:05 But if there's a continuous stream of them, I mean, what do you do?
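The idea of leaving a gap when a satellite is known to be passing can be sketched as a simple mask over a light curve. This is only an illustration under stated assumptions: the pass windows are taken as known in advance, the function name and `pad` parameter are hypothetical, and the sample values are made up rather than real telescope data.

```python
# Illustrative only: drop light-curve samples inside predicted satellite passes.
def mask_satellite_passes(times, fluxes, passes, pad=0.0):
    """Keep only samples outside every (start, end) pass window, widened by pad."""
    kept_times, kept_fluxes = [], []
    for t, f in zip(times, fluxes):
        contaminated = any(start - pad <= t <= end + pad for start, end in passes)
        if not contaminated:
            kept_times.append(t)
            kept_fluxes.append(f)
    return kept_times, kept_fluxes


# Made-up data: a bright streak contaminates the samples at t = 2 and t = 3.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
fluxes = [1.00, 1.00, 1.35, 1.40, 1.00, 0.99, 0.99, 1.00, 1.00, 1.00]
passes = [(2, 3)]  # hypothetical predicted pass window

clean_times, clean_fluxes = mask_satellite_passes(times, fluxes, passes)
print(clean_times)
```

With a single pass the cleaned series just has a short gap. With a continuous stream of passes the windows would cover most of the series and little data would survive, which is the concern raised here.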

43:07 Yeah, I know.

43:08 It's not good.

43:10 And I think that is the end goal.

43:11 I think.

43:11 It's also that to me, the question that comes up with this project that SpaceX has with releasing

43:16 these satellites, it's also like, why do we need that in the first place?

43:19 Because to me, it's rather strange and it's kind of more of a symptom of the problems we

43:24 have in our century overall is like the focus is growth, growth, growth.

43:28 Yeah.

43:28 So we can throw satellites in the sky.

43:30 Let's just throw even more satellites.

43:31 Maybe we don't need that.

43:33 And maybe in the first place we need to think about, well, maybe we need to think about degrowth

43:37 if we think about climate change, et cetera, et cetera.

43:39 Yeah.

43:40 Yeah.

43:40 There is an argument for providing internet to a lot of places where you just can't get

43:43 the connections to.

43:44 And I think once you get into those sort of things, it's quite hard to decide like what's

43:48 the best thing to do philosophically speaking.

43:50 But if it's a for-profit thing, that's different.

43:52 Yeah.

43:53 I mean, the point there is that there are people and the specialists who study these kind of

43:57 things like anthropologists, et cetera, et cetera, in terms of how much do we need of that

44:01 stuff?

44:01 And sometimes it's useful to consult them too.

44:03 And maybe we should consult them too in these cases and not just look at it as a technical

44:08 kind of thing.

44:08 Will it hurt astronomers or will it not hurt astronomers, but look at it from a broader picture.

44:12 Yeah.

44:13 It's a good question.

44:13 And I think we're going to leave it there for our main conversation.

44:16 But let me ask you the final two questions before I let you all go.

44:20 So yeah, we'll start with you.

44:22 If you're going to write some Python code, what editor do you use?

44:24 At the moment, for the most part, I'm using PyCharm and sometimes I'm using Atom and sometimes

44:30 I'm using Vim if I'm developing on a different machine that I'm connecting to remotely.

44:34 All right.

44:34 David?

44:35 I've been using TextWrangler for years, which is probably not a very fancy option.

44:38 I think it has to be BBEdit now.

44:40 They forced me to upgrade.

44:42 Perfect.

44:42 And then a notable PyPI package, like some library out there that you've run across and

44:48 you're like, this is really cool.

44:49 It really helped with our project.

44:50 Or maybe people don't know about it and should check it out.

44:54 You guys got any that come to mind?

44:55 I mean, the obvious one is GPflow, which enabled a lot of the Gaussian process calculations.

45:00 I mean, everyone knows about scikit-learn, I guess.

45:02 Yeah, everyone definitely knows about scikit-learn.

45:04 The one that was most useful to me recently was Rasterio, which is a package that allows you

45:09 to work with georeferenced images and remote sensing images.

45:13 So that was really useful for me.

45:15 And PyTorch.

45:16 I really like PyTorch.

45:17 Big fan.

45:17 If we want to get in a whole debate of Keras versus PyTorch, I'm definitely on the PyTorch side.

45:22 Yeah, the PyTorch gang.

45:23 Awesome.

45:23 All right.

45:23 Yeah, I hadn't heard of that one for geospatial data.

45:26 Very cool.

45:26 All right.

45:27 Final call to action.

45:28 People are out there listening.

45:29 They want to learn more about your study or maybe learn how they can take those ideas

45:33 and apply them to their area of research.

45:35 What do you say?

45:36 Just reach out on Twitter.

45:37 Yeah.

45:38 I mean, if you don't want to read the actual paper, that's definitely the best thing to do.

45:41 I mean, we're happy to answer questions.

45:42 Okay, cool.

45:43 I'll put links to the ways to get in touch with you in the show notes.

45:46 And of course, the article.

45:48 Cool.

45:48 Sounds good.

45:49 Great.

45:49 Thanks a lot.

45:50 Yeah.

45:50 All right.

45:51 Thank you guys for being on the show.

45:52 It was really fun to talk about your project.

45:54 And congratulations on doing this work.

45:57 It's very cool.

45:57 Yeah.

45:57 Thanks for inviting us.

45:58 It's been great.

45:59 Yeah.

45:59 Thank you so much.

46:00 Yep.

46:00 Bye.

46:00 Bye-bye.

46:01 This has been another episode of Talk Python To Me.

46:04 Our guests on this episode were David Armstrong and Jev Gamper.

46:08 And it's been brought to you by Brilliant.com.

46:11 Brilliant.com encourages you to level up your analytical skills and knowledge.

46:15 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every

46:20 day.

46:21 Want to level up your Python?

46:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

46:28 Or if you're looking for something more advanced, check out our new Async course that digs into

46:33 all the different types of Async programming you can do in Python.

46:36 And of course, if you're interested in more than one of these, be sure to check out our

46:40 Everything Bundle.

46:40 It's like a subscription that never expires.

46:42 Be sure to subscribe to the show.

46:44 Open your favorite podcatcher and search for Python.

46:47 We should be right at the top.

46:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

46:53 and the direct RSS feed at /rss on talkpython.fm.

46:57 This is your host, Michael Kennedy.

46:59 Thanks so much for listening.

47:01 I really appreciate it.

47:02 Now get out there and write some Python code.

47:03 I'll see you next time.
