Python and the James Webb Space Telescope

Episode #357, published Mon, Mar 21, 2022, recorded Wed, Feb 23, 2022

Telescopes have been fundamental in our understanding of our place in the universe. And when you think about images that have shaped our modern view of space, you probably think about Hubble. But just this year, the JWST, or James Webb Space Telescope, was launched. JWST will go far beyond what Hubble has discovered. And did you know Python is used extensively in the whole data pipeline of JWST? We have two great guests here to tell us about it: Megan Sosey and Mike Swam.

Guests

Megan Sosey

Mike Swam

Episode Deep Dive

Guests Introduction and Background

Megan Sosey is the technical lead for the data management system of the Nancy Grace Roman Space Telescope at the Space Telescope Science Institute (STScI). She began coding in BASIC on an Osborne computer, developed an early love of programming, and later adopted Python for its power in scientific computing. Her work focuses on overseeing data pipelines and scientific analysis software for future missions.

Mike Swam leads the data processing team for the James Webb Space Telescope (JWST) at STScI. He started programming in Fortran, then moved to Python around 2002 when it became prominent for astronomy and data analysis tasks. Mike’s team monitors the flow of data from the JWST through NASA’s Deep Space Network into STScI, where it’s processed and eventually made available to the scientific community.

What to Know If You're New to Python

Here are a few basics that came up during the episode to help you follow along:

Basic familiarity with Python’s syntax is helpful since JWST’s pipelines heavily use Python modules and scripts.
Understanding how Python handles file I/O and basic data structures (like lists and NumPy arrays) is key to grasping how telescope data is processed.
Many astronomy tools rely on metadata (data about data) alongside raw pixel data, so be prepared to read structured data such as JSON or YAML.

Key Points and Takeaways

Python’s Pivotal Role in JWST’s Data Pipeline: The episode highlights how nearly every stage of JWST’s data journey uses Python. From checking data completeness to reconstructing full images out of telemetry packets, Python scripts ensure that information flowing from the observatory is validated, corrected, and distributed efficiently. Astronomers also rely on Python to calibrate and interpret the processed data.
- Links and Tools:
  - JWST Pipeline GitHub (github.com/spacetelescope/jwst)
  - archive.stsci.edu
Astronomy Data Flow from JWST to STScI: After NASA’s Deep Space Network receives the observatory’s data, it’s transferred to STScI for processing and archiving. Mike described how incoming files are chopped into smaller pieces on the telescope, sent down incrementally, and then validated and reassembled on the ground. This guards against corrupted or missing pieces, so scientists see complete images and detailed telemetry.
- Links and Tools:
  - NASA’s Deep Space Network (nasa.gov)
  - HTCondor (htcondor.org)
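
The validate-and-reassemble step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the STScI implementation: the `Chunk` layout, the SHA-256 checksum, and the function names are all hypothetical, chosen only to show how out-of-order and retransmitted pieces can be put back together safely.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One piece of a downlinked file (hypothetical layout, for illustration only)."""
    index: int      # position of this piece in the original file
    payload: bytes  # raw bytes carried by this piece
    checksum: str   # SHA-256 hex digest of the payload, computed before sending


def make_chunks(data: bytes, size: int) -> list[Chunk]:
    """Split data into fixed-size pieces, standing in for the onboard step."""
    chunks = []
    for index, start in enumerate(range(0, len(data), size)):
        payload = data[start:start + size]
        chunks.append(Chunk(index, payload, hashlib.sha256(payload).hexdigest()))
    return chunks


def reassemble(chunks: list[Chunk], expected_count: int) -> bytes:
    """Validate pieces and join them in order; raise if any piece is missing."""
    good: dict[int, bytes] = {}
    for chunk in chunks:
        if hashlib.sha256(chunk.payload).hexdigest() != chunk.checksum:
            continue  # corrupt copy: ignore it and rely on a retransmission
        good[chunk.index] = chunk.payload  # duplicate copies overwrite harmlessly
    missing = [i for i in range(expected_count) if i not in good]
    if missing:
        raise ValueError(f"incomplete file, missing chunks: {missing}")
    return b"".join(good[i] for i in range(expected_count))


# Pieces may arrive out of order and more than once.
original = b"pixels and telemetry " * 10
pieces = make_chunks(original, size=16)
arrived = list(reversed(pieces)) + pieces[:2]
restored = reassemble(arrived, expected_count=len(pieces))
```

Keying the pieces by index is what makes arrival order and duplicates irrelevant; the completeness check is simply the list of indexes that never showed up with a valid checksum.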
Calibration and Data Processing Tools: The JWST calibration software, written in Python, handles everything from removing cosmic ray artifacts to correcting the “instrumental signatures” that the detectors imprint on the data. With multiple detectors and complex optics, this software accounts for temperature, ephemeris data (where the telescope is in space), and more. It helps create science-ready data that astronomers can trust.
- Links and Tools:
  - JWST Calibration Pipeline (github.com/spacetelescope/jwst)
  - ASDF File Format (github.com/spacetelescope/asdf)
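
To make the cosmic ray idea concrete, here is a small sketch that flags outliers among repeated measurements of a single pixel. It is a generic robust-statistics stand-in, not the algorithm used by the `jwst` package; the function names, the sample values, and the threshold of 5 are assumptions made for illustration.

```python
from statistics import median


def flag_cosmic_rays(samples: list[float], threshold: float = 5.0) -> list[bool]:
    """Flag samples that sit far above the median of repeated pixel measurements."""
    mid = median(samples)
    # Median absolute deviation: a scatter estimate that one big hit cannot inflate.
    mad = median([abs(s - mid) for s in samples])
    # 1.4826 rescales the MAD to a standard deviation for Gaussian noise;
    # the fallback avoids dividing by zero when the data are perfectly flat.
    noise = 1.4826 * mad or 1e-9
    return [(s - mid) / noise > threshold for s in samples]


def clean_mean(samples: list[float], threshold: float = 5.0) -> float:
    """Average the samples that were not flagged as cosmic ray hits."""
    flags = flag_cosmic_rays(samples, threshold)
    kept = [s for s, hit in zip(samples, flags) if not hit]
    return sum(kept) / len(kept)


# Five hypothetical measurements of one pixel; the fourth took a hit.
readings = [100.2, 99.8, 100.1, 450.0, 99.9]
flags = flag_cosmic_rays(readings)
value = clean_mean(readings)
```

Only positive excursions are flagged, because a cosmic ray deposits extra energy in the detector rather than removing it.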
Hubble and JWST Distinctions: JWST is not simply a bigger, better Hubble. It orbits around the Sun–Earth L2 point and uses infrared detectors, requiring a sunshield to keep instruments cold. Hubble primarily focuses on UV, visible, and near-infrared wavelengths while JWST goes further into the infrared spectrum, enabling it to see the earliest phases of galaxy formation and peer through dust clouds.
- Links and Tools:
  - Hubble Space Telescope (nasa.gov/hubble)
  - JWST Mission Overview (jwst.nasa.gov)
JWST’s Infrared Focus and Science Missions: Because JWST observes in infrared wavelengths, it can detect the first galaxies and stars formed after the Big Bang, whose light has been “redshifted” over billions of years by the expansion of space. JWST is also excellent for exoplanet research, examining planetary atmospheres and potential transit signals from smaller, rocky worlds.
- Links and Tools:
  - JWST Science Goals (stsci.edu/jwst/science)
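
The redshift argument comes down to one formula: the observed wavelength equals the emitted wavelength multiplied by (1 + z). The sketch below applies it to hydrogen's Lyman-alpha line. The example redshifts and the rough band boundaries (380 nm and 750 nm) are illustrative assumptions, not figures from the episode.

```python
LYMAN_ALPHA_NM = 121.567  # rest wavelength of hydrogen Lyman-alpha, in nanometers


def observed_wavelength_nm(rest_nm: float, z: float) -> float:
    """Stretch a rest-frame wavelength by cosmological redshift z."""
    return rest_nm * (1.0 + z)


def rough_band(wavelength_nm: float) -> str:
    """Very coarse band label; the boundaries are approximate conventions."""
    if wavelength_nm < 380.0:
        return "ultraviolet"
    if wavelength_nm < 750.0:
        return "visible"
    return "infrared"


# The same emission line, seen from increasingly distant (and older) sources.
for z in (0.0, 3.0, 10.0):
    wavelength = observed_wavelength_nm(LYMAN_ALPHA_NM, z)
    print(f"z = {z:4.1f}: {wavelength:8.1f} nm ({rough_band(wavelength)})")
```

Light that leaves a very early galaxy in the ultraviolet arrives in the infrared, which is why a telescope aimed at the first galaxies needs infrared detectors.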
Reprocessing Data to Improve Accuracy Over Time: Data coming in from JWST isn’t static. As calibration algorithms evolve and new reference data becomes available, entire archives are reprocessed. Python makes it easier to run large, automated pipelines that apply updated corrections and produce superior final data products for astronomers.
- Links and Tools:
  - HTCondor for distributed processing (htcondor.org)
  - AstroConda (astroconda.readthedocs.io)
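
One way to picture reprocessing is as a sweep over the archive that re-runs calibration for anything produced with older reference data. The sketch below is purely illustrative: the `Product` record, its version tuple, and the `recalibrate` callback are hypothetical and do not reflect how the STScI archive tracks its products.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Product:
    """An archived data product (hypothetical record, for illustration only)."""
    name: str
    calibration_version: tuple[int, int]  # (major, minor) of the reference data used


def reprocess_stale(
    products: list[Product],
    current_version: tuple[int, int],
    recalibrate: Callable[[str], None],
) -> list[Product]:
    """Re-run calibration for every product made with older reference data."""
    updated = []
    for product in products:
        if product.calibration_version < current_version:
            recalibrate(product.name)  # in practice, a job handed to a batch system
            product = replace(product, calibration_version=current_version)
        updated.append(product)
    return updated


archive = [
    Product("obs-001", (1, 2)),
    Product("obs-002", (1, 4)),
    Product("obs-003", (1, 3)),
]
queued: list[str] = []
archive = reprocess_stale(archive, (1, 4), queued.append)
```

Passing the calibration step in as a callback keeps the selection logic separate from how the work is executed, whether that is a local function call or a job submitted to a cluster.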
Open Source and GitHub Integration: Many of the tools used for JWST are freely accessible on GitHub under the spacetelescope organization. Astronomers and external developers can contribute fixes and features, and run the same pipelines on their own local data. This encourages broad collaboration, faster bug fixes, and an exchange of new ideas.
- Links and Tools:
  - STScI GitHub Repos (github.com/spacetelescope)
  - WebbPSF (github.com/spacetelescope/webbpsf)
Nancy Grace Roman Space Telescope: Megan works on Roman’s future data pipeline, which will handle even larger amounts of data: 300+ megapixels per image, with Hubble-like resolution across a field of view roughly 100 times larger. Roman is designed for rapid surveys of the cosmos, tackling dark energy research and potentially discovering thousands of exoplanets.
- Links and Tools:
  - Nancy Grace Roman Space Telescope (nasa.gov/roman)
  - STScI Roman Mission (stsci.edu/roman)
High-Performance Computing and HTCondor: Managing enormous data sets requires distributing tasks across many machines. HTCondor is used to harness idle CPU time and run massive processing jobs in parallel, which is crucial for reprocessing data as calibration algorithms change. The cloud also plays a role, letting scientists spin up short-term resources.
- Links and Tools:
  - HTCondor (htcondor.org)
  - AWS Cloud (aws.amazon.com)
World Coordinate Systems with GWCS: Astronomy relies on accurately mapping pixels to celestial coordinates. GWCS (Generalized World Coordinate System) is a Python framework that records the optical path from a star to a detector, encoding transformations so users can correlate image pixels with the actual sky. It’s a crucial piece for precision science.
- Links and Tools:
  - GWCS (github.com/spacetelescope/gwcs)
  - Astropy Modeling (docs.astropy.org/en/stable/modeling/)
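
The core idea, a pixel-to-sky mapping built by chaining simple transforms, can be sketched in plain Python. This does not use GWCS or Astropy, and it is far cruder than either: the detector size, pixel scale, rotation angle, and reference coordinates are made-up values, and the final step is a flat-sky approximation that only holds for small offsets away from the poles.

```python
import math
from typing import Callable

# A transform maps one (x, y) pair to another.
Transform = Callable[[float, float], tuple[float, float]]


def shift(dx: float, dy: float) -> Transform:
    return lambda x, y: (x + dx, y + dy)


def scale(factor: float) -> Transform:
    return lambda x, y: (x * factor, y * factor)


def rotate(angle_deg: float) -> Transform:
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return lambda x, y: (x * c - y * s, x * s + y * c)


def to_sky(ra0_deg: float, dec0_deg: float) -> Transform:
    """Add small angular offsets to a reference position (flat-sky approximation)."""
    cos_dec = math.cos(math.radians(dec0_deg))
    return lambda dx, dy: (ra0_deg + dx / cos_dec, dec0_deg + dy)


def chain(*steps: Transform) -> Transform:
    """Apply the given transforms one after another, left to right."""
    def combined(x: float, y: float) -> tuple[float, float]:
        for step in steps:
            x, y = step(x, y)
        return x, y
    return combined


# Hypothetical detector: reference pixel (1024, 1024), 0.031 arcsec per pixel,
# rotated 30 degrees, pointed at RA 150.0, Dec 2.2 (all values illustrative).
pixel_to_sky = chain(
    shift(-1024.0, -1024.0),    # measure from the reference pixel
    scale(0.031 / 3600.0),      # pixels to degrees
    rotate(30.0),               # detector orientation on the sky
    to_sky(ra0_deg=150.0, dec0_deg=2.2),
)
ra_deg, dec_deg = pixel_to_sky(1100.0, 980.0)
```

Keeping each step as its own small transform is the design point: a real instrument model adds distortion and projection stages to the same chain without changing how the chain is evaluated.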

Interesting Quotes and Stories

Megan reminisced about growing up in the 1980s, teaching herself programming on an Osborne computer because she wanted to play games—showing how early curiosity can lead to a career in astronomy software.
Mike highlighted how the data pipeline must account for multiple re-transmissions from JWST, describing the intricate system that ensures scientists see a complete picture despite data packets arriving out of order.

Key Definitions and Terms

Ephemeris: Positional data indicating where the telescope (or another celestial body) is over time. Essential for accurately calibrating observations.
Cosmic Rays: High-energy particles that can strike a detector and create spurious signals or “hits” in imagery. The calibration software flags and removes these artifacts.
Infrared Wavelengths: Electromagnetic radiation with longer wavelengths than visible light, enabling telescopes to see through dust and observe extremely distant or cool objects.

Learning Resources

If you’d like to develop or deepen your Python skills for astronomy, data science, or general programming, you can check out these courses:

Python for Absolute Beginners: Ideal for those just starting their Python journey, covering core concepts and language features step-by-step.
Move from Excel to Python with Pandas: Ideal if you’re an Excel user and want to adopt more scalable solutions like Polars or Pandas.
Modern APIs with FastAPI and Python: For those wanting to dive deeper into building powerful Python-based services to distribute or analyze telescope data.

Overall Takeaway

The James Webb Space Telescope embodies a new era of astronomy, fueled by an intricate Python ecosystem to capture, process, and deliver data. Whether monitoring exoplanets or observing the earliest galaxies, Python underpins everything from raw telemetry parsing to high-level scientific analysis. As you heard from Megan and Mike, it’s a testament to how flexible, open, and collaborative the Python community has become—making groundbreaking discoveries more accessible to scientists around the world.

Links from the show

James Webb Space Telescope: webbtelescope.org
JWST at NASA: jwst.nasa.gov
JWST's YouTube channel: youtube.com

JWST Repo on GitHub: github.com/spacetelescope/jwst
STScI's AstroConda: ssb.stsci.edu/astroconda
Telescope pointing: github.com/spacetelescope/gwcs
Simulator: github.com/spacetelescope/webbpsf
STScI's Archive and Tools: archive.stsci.edu
htcondor: datasci.danforthcenter.org/htcondor
Silly faker: github.com/cube-drone/silly
Nancy Grace Roman Space Telescope: roman.gsfc.nasa.gov
Myst Parser: myst-parser.readthedocs.io
Watch this episode on YouTube: youtube.com
Episode #357 deep-dive: talkpython.fm/357
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Telescopes have been fundamental in our understanding of our place in the universe.

00:04 And when you think about images that have shaped our modern view of space,

00:07 you probably think about Hubble.

00:09 But just this year, the JWST or James Webb Space Telescope was launched.

00:15 JWST will go far beyond what Hubble has discovered.

00:18 And did you know that Python is used extensively in the whole data pipeline of JWST?

00:24 We have two great guests here to tell us all about it, Megan Sosey and Mike Swam.

00:29 This is Talk Python To Me, episode 357, recorded February 23rd, 2022.

00:35 Welcome to Talk Python To Me, a weekly podcast on Python.

00:51 This is your host, Michael Kennedy.

00:52 Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past episodes at talkpython.fm.

00:58 And follow the show on Twitter via @talkpython.

01:02 We've started streaming most of our episodes live on YouTube.

01:05 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:13 This episode is brought to you by Datadog and Stack Overflow.

01:18 Transcripts for this and all of our episodes are brought to you by Assembly AI.

01:30 Megan and Mike, welcome to Talk Python To Me.

01:34 We're happy to be here.

01:35 It's great to have you here.

01:36 I'm such a fan of space and just it's amazing to think about our place in the universe

01:44 and where we are.

01:46 And so much of that has been revealed really through telescopes, right?

01:51 From the very beginning of like, oh, look, the sun doesn't rotate around us.

01:54 How weird to, you know, the universe is expanding with what Hubble did or just all the amazing discoveries we've had, the exoplanets and whatnot.

02:02 So super cool.

02:04 And if we're going to mix Python into that, it's definitely an interesting thing to do.

02:08 So I'm happy to have you both here to talk about that.

02:10 Yeah, we're happy to join you.

02:11 It's a pleasure.

02:12 Yeah, it's great.

02:13 Let's just really quickly before we get into the topics, maybe start with just a little bit of a story on your background.

02:18 Now, Megan, how'd you get into programming and find your way over to Python?

02:22 Yeah, so programming.

02:23 I started young and it was mostly because I was jealous of some of the other kids.

02:29 This is the early 80s that had access to the new Atari systems, the new ColecoVisions.

02:34 And I didn't really have that.

02:36 I wanted to be able to play games and do those types of things.

02:39 And as it turned out, my dad had just bought one of these new computers, an Osborne,

02:45 which is kind of one of the early personal laptops.

02:47 That was one of the very early ones.

02:49 It might be the first laptop, a very heavy laptop, though.

02:53 Probably not as good battery life as today either, I would suspect.

02:57 I don't think there was any battery life.

02:59 Yeah.

02:59 And it only had a small green screen and you couldn't display graphics.

03:04 But I was determined I was going to play a game on it.

03:06 So I found this programming book in BASIC for microprocessors and followed along,

03:15 learned what it was doing to create responses on the screen.

03:18 I ended up making things like Mad Libs and choose your own adventure and ASCII art and

03:24 stuff like that.

03:25 And just learning what the commands I was doing really did.

03:28 So that got me into programming, and I kind of just did that on and off.

03:32 It was really strange, even though I was always working with computers.

03:36 When I hit college, nobody ever suggested to me that I should do computer science or that

03:40 I should be a programmer or that I should do this other thing.

03:43 So I went along with some of my other interests, which were astronomy and physics and music.

03:47 But even in astronomy, software and software engineering and working with computers is a must.

03:54 And after I started working at the Institute for a few years, Python started to become a much more used language.

04:01 It's something that could provide real benefit for the scientific community.

04:05 And so I started learning that.

04:07 And that's how I got into Python.

04:09 Wow, fantastic.

04:09 The days of just holding up a telescope and looking at stuff, those are pretty long gone, right?

04:15 Yeah, they're pretty long gone.

04:17 In fact, most astronomers don't even get to go to the telescope on the ground anymore.

04:21 There's a lot of remote observing.

04:23 It was starting to switch over when I was in college.

04:26 It's much, much heavier now.

04:28 Yeah.

04:28 I can imagine.

04:29 What a bummer.

04:30 Those are amazing places.

04:31 Yeah.

04:31 How about you?

04:33 How'd you get into programming and Python?

04:34 Yeah, for me, I dipped a toe in it in high school with a Fortran class.

04:37 We actually got bussed over to another school to run our deck of cards through someone else's computer.

04:41 Oh, with cards.

04:42 That's another level right there.

04:44 I'm that ancient.

04:44 And then in college, a few classes here and there.

04:47 And then when I got out, one of the only skills I had was programming.

04:51 So I had a math degree.

04:53 And so I started with programming in Fortran at the start, but then got into other languages

04:59 and eventually got to Space Telescope in 1996 and have been here ever since.

05:04 I started with Python in about 2002.

05:06 There was a software conference called ADASS, A-D-A-S-S, Astronomical Data Analysis Software and

05:13 Systems.

05:13 And we put one on, the institute that I work at, put one on in 2002 in Baltimore.

05:18 And Python was really a highlight of that conference.

05:21 It was really coming into its own.

05:22 And that just kind of broke the door down for everyone at the institute.

05:25 Oh, that's cool.

05:26 Behind one of our founders, Perry Greenfield, who works at the institute, is very, very big

05:31 in the Python community.

05:32 And he kind of led the way and we all followed.

05:34 Right on.

05:34 Well, it's certainly an amazing language for data science and for working with things like

05:41 these telescopes and whatnot.

05:42 So it has a special blend of approachability, but you can actually do quite a bit of real

05:48 stuff with it.

05:49 Whereas, you know, so many other languages are either approachable or you can do real stuff,

05:53 but not both.

05:53 Yeah.

05:54 It also cuts down a lot on development time and lines of code and stuff like that, which

05:59 makes it a lot easier to maintain larger systems that can handle the Python.

06:03 Yeah, absolutely.

06:04 I was so impressed with the clarity.

06:05 When you looked at a piece of Python code, you stripped away all the syntax and all the

06:09 language decorations and it was just, the design was staring at you.

06:13 So to me, it's the simplicity of it that's the best feature.

06:16 Yeah, absolutely.

06:17 So let's start a conversation maybe by talking about where you two work at the Space Telescope

06:25 Science Institute.

06:26 It just sounds so amazing.

06:28 That is a cool thing.

06:31 Yeah.

06:31 So what do you do there?

06:32 Megan, you want to go first?

06:33 Yeah, sure.

06:34 So right now I'm the technical lead for the data management system that's going to be accepting

06:41 and processing and distributing all of the data that we're getting from one of the next

06:46 big telescopes, the Nancy Grace Roman Space Telescope.

06:48 So my day-to-day job is making sure that all that is designed and functioning and data is

06:55 flowing through and the software we're going to write for scientists to do analysis of that

06:59 data will run and is appropriate and accessible.

07:02 That's kind of my, that's my day-to-day.

07:04 Before that, I worked heavily on post-pipeline scientific analysis processing software.

07:11 So visualization tools, data analysis tools, things that really the astronomy community

07:17 uses after we've constructed the data for them from the telescope.

07:22 Okay.

07:22 Tell us just a little bit about the data story of this.

07:26 There must be a lot of data coming off of these newer telescopes with their huge resolution.

07:31 Yeah.

07:31 So there is a lot of data coming down.

07:33 The detectors on board JWST are 2,000 pixels by 2,000 pixels.

07:39 And there's a lot of them and there's a lot of instruments.

07:41 And so we have to be able to manage all of that data stream coming through.

07:45 And I might let Mike talk a little bit here since he works heavily on the ops side of the house,

07:50 helping with managing that data.

07:52 All right, Mike.

07:53 Yeah.

07:53 Take it away.

07:54 What's your role at STScI?

07:56 My role is, yeah, it is a team lead for the data processing team, also a scrum lead.

08:02 We use the agile approach to software development.

08:05 And I'm focused completely on JWST science data processing and guider data processing.

08:10 The data comes down to us through the Deep Space Network of NASA and hits the ground,

08:16 comes up to us at Baltimore.

08:18 We have the control center right at the building, so they're actually talking to the observatory

08:21 through the tel, right through there, our main building.

08:24 And then the data feeds down to our processing systems.

08:27 And that's where my code comes in.

08:29 Making sure we got all the data, completeness checking, sending the data through the pipelines

08:34 and the processing systems with the right processing recipe so that we get the right kind of data

08:39 products out and getting them into our archive.

08:40 The Institute has a well-known archive for astronomy data.

08:45 And that's where a lot of astronomers come to do research and find data sets that they

08:49 can use in their science.

08:49 A lot of that data is public, right?

08:51 It is.

08:52 It is very much so.

08:54 There's a proprietary period for some kinds of data so that scientists have a chance to

08:59 write their special papers that they proposed for.

09:02 But a lot of data is made immediately public.

09:04 And then other data becomes public after a bit of a window.

09:08 Oh, interesting.

09:09 So you might write a proposal for time on Webb or one of the telescopes.

09:15 And then if that's approved, then you get your time, then you get your data.

09:19 And of course, it wouldn't make sense to just instantly publish that because that's part of

09:23 your work, right?

09:25 Yeah.

09:25 And so this is actually, you know, pieces of the entire process.

09:30 The whole world of astronomers is allowed to propose to use these telescopes.

09:35 And there's a telescope review process that happens where they submit their suggestions.

09:42 This is what I want to do.

09:43 This is how awesome the data is going to be.

09:45 This is what I'm going to provide to the community and the discoveries I think I'll make.

09:48 And a team of experts, you know, agrees and decides how much time everybody is going to get.

09:55 And we have a lot of tools on what you might think of as the front end of this process

10:00 to help them figure out, you know, how long do the observations have to be?

10:04 If I'm looking through these filters and these wavelengths, what kind of errors am I going

10:09 to get?

10:09 You know, what's the exposure time?

10:10 How should I most effectively use the telescope time to get the science out?

10:15 Because telescope time is always, it's gold, right?

10:18 Right.

10:18 Absolutely.

10:19 There's one and only one of these things, right?

10:21 Yeah.

10:22 This particular one.

10:23 I didn't really think about that you, for asking different types of questions, there

10:28 was a different amount of time that you might need.

10:31 I mean, obviously, if you're going to stare at space for a certain amount of time, it's

10:34 going to take that time.

10:35 But I hadn't really thought about the different processing pipelines might require more data

10:41 to get the right level of accuracy and stuff.

10:43 That's pretty interesting.

10:44 Yeah.

10:44 So it really does depend on what you want to get out of it.

10:48 When you observe in different wavelengths, you have to look at those things for a long enough

10:52 amount of time to collect the light, to get statistical certainty, to get detections.

10:57 But then you have to play that against the fact that as you look longer and deeper, your

11:04 field might get more crowded because you'll be able to pick up light from more distant objects.

11:08 You may have things that are called cosmic rays, extra energy that gets added into the

11:14 detector that you don't want.

11:15 That's not from your object, that's just messing up your data.

11:18 And the pipelines take these things into account.

11:21 We look for things that are, oh, this is not coming from the object.

11:24 This is coming from something in space that we don't want to detect right now or something

11:30 that is being imparted by the instrument onto the data.

11:33 And we want to remove the instrumental signatures.

11:35 So that's what a lot of the processing software tries to look at.

11:38 Yeah.

11:38 Fantastic.

11:39 Mike, anything you want to add to that?

11:41 Just one other kind of data is time-dependent data.

11:43 You may need to look at objects over time, either calendar time or even many hours or many

11:49 days to see variability.

11:50 And that's a big part of some of the data sets is they just stretch on for time so that you can look for

11:56 planetary transits or flares or various phenomena that might vary over time.

12:01 Yeah.

12:01 How interesting.

12:02 One of the goals that Webb is trying to solve is to detect exoplanets, right?

12:08 And for those, you've got to watch multiple years of whatever that planet's year orbital

12:14 time is, right?

12:15 If it takes it three months to go around, you've got to watch the star for several three-month

12:20 periods, right?

12:21 Well, and even in the case that you don't know that the stars have planets, you want to find

12:25 new ones.

12:25 You need to go back and look at that star multiple times so that you can detect that difference

12:30 in the light that it's emitting.

12:32 So when the planet goes across the star, some amount of light is going to be dimmed by that.

12:37 Or if you're looking at spectroscopy, you might see the starlight passing through the atmosphere

12:43 of the planet itself.
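
The dimming Megan describes can be estimated with a standard back-of-the-envelope formula: the fraction of starlight blocked during a transit is roughly the ratio of the planet's disk area to the star's, (Rp / Rs)². The sketch below is an illustration added alongside the transcript, not something computed in the episode; it assumes a uniformly bright stellar disk and uses approximate radii for Jupiter, Earth, and the Sun.

```python
SUN_RADIUS_KM = 695_700.0     # nominal solar radius
JUPITER_RADIUS_KM = 71_492.0  # equatorial radius, approximate
EARTH_RADIUS_KM = 6_371.0     # mean radius, approximate


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fraction of starlight blocked when the planet crosses the star's disk."""
    return (planet_radius_km / star_radius_km) ** 2


for name, radius in (("Jupiter", JUPITER_RADIUS_KM), ("Earth", EARTH_RADIUS_KM)):
    depth = transit_depth(radius, SUN_RADIUS_KM)
    print(f"{name}-size planet, Sun-like star: {depth * 100:.4f}% dimming")
```

A Jupiter-size planet dims a Sun-like star by about one percent, while an Earth-size planet dims it by less than a hundredth of a percent, which is why repeated observations are needed to detect the change.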

12:45 Yeah.

12:45 Fantastic.

12:45 All right.

12:46 I want to dive into the Python side, given that that's what most people are interested in.

12:50 But before we get into that, let's just talk real briefly about what are some of the science

12:55 missions and how is this, say, different than Hubble?

12:58 It looks really different than Hubble, right?

13:01 It's this bunch of hexagon gold plates that unfold instead of being a tube like a traditional,

13:07 even, you know, the sort of space telescope like Hubble is.

13:10 It's got a tennis court sized shield around it and stuff.

13:15 So whoever wants to jump in, tell us a bit about just like what is the goal and the science

13:20 of Webb?

13:21 Yeah.

13:21 So Webb is really the largest space telescope.

13:24 It does have segmented mirrors, 18 of them, that can be adjusted to make sure that the focus

13:31 is good.

13:32 It's something like six meters across.

13:34 It's really big.

13:35 It's six and a half meters across the mirror.

13:37 I think it's something like 21 feet in diameter, something like that.

13:43 One of the things that makes it different from Hubble is not only the size of the mirror and

13:48 the size of the telescope, but Hubble orbits the Earth.

13:51 And JWST is demonstrably not orbiting the Earth.

13:55 It's not very close at all.

13:57 It's out at the second Lagrangian point, which is a more stable point that things orbiting in

14:04 the Earth-Sun system tend to collect at.

14:07 And it's not even at that point.

14:09 It's orbiting around that point and facing away from the Earth.

14:13 And so one of the things that's really important for Webb is that it stays cool, that its instruments

14:18 stay cool.

14:18 And that's why it has this big sun shield.

14:20 And so the cold side of the telescope is where we have all of our instrument packages.

14:25 And that allows it to pick up this infrared wavelength of light, which is sensitive to heat.

14:31 And it allows it to do the science that we need to do to look at dust and look through dust and look all the way back, you know, to the very earliest time just after the Big Bang when light was starting to be visible, to collect and start to form objects and stuff like that.

14:49 So the design of the telescope is very much to enable the science that's possible with it.

14:53 And Hubble uses visible light?

14:55 Is that the difference?

14:56 So Hubble has a range.

14:57 Hubble actually has instrumentation that looks at UV, visible and then near infrared.

15:04 So it has a little bit of everything.

15:06 JWST also looks at near infrared, but it goes much further out into the mid infrared as well.

15:11 And I guess that goes through dust clouds and things like that better.

15:15 It can.

15:15 So you can switch between the different wavelengths of light depending on what types of phenomena you want to see.

15:22 But also because we want to look very far back in time, what happens and what enables us to do that is the red shifting of the light.

15:30 So because after the Big Bang and the expansion of the universe, the light gets stretched by the expansion of space.

15:38 To look back in time, you want to look at the light that is of that similar wavelength that's been stretched.

15:43 So that's why we're looking in the infrared wavelengths.

16:53 So what are some of the missions?

16:55 I talked about the, you know, we talked about exoplanets a bit.

16:59 Exoplanets, galaxy history, evolution.

17:02 Star formation.

17:03 Star formation.

17:04 Star formation.

17:05 Looking at the chemical composition of the galaxies and of stars.

17:10 So it really uses this combination of spectroscopy, imaging, and coronagraphy.

17:15 And the coronagraphy is excellent for exoplanets because it allows you to put, you know, basically put a stopper in front of the star

17:23 and look for really, really dim things that are very close to it.

17:27 Yeah.

17:27 Solar system objects as well, and even things out in the comet, you know, Oort cloud.

17:33 Or the Kuiper belt.

17:34 Things that we just can't see.

17:35 We don't have anything that can get out there in the infrared like this telescope will be able to.

17:39 That's going to be super exciting.

17:41 Hubble really changed people's view of the world, right?

17:45 I'm seeing all the, like the sky full of galaxies and stuff like that.

17:50 What do you think we're going to learn?

17:52 I mean, maybe it's too hard to predict, but what do you think is going to be surprising?

17:56 Or what types of things do you think we're going to learn and change our perspective with this?

18:00 I guess what I hear from most of the scientists is they know they're going to find out things that they never even thought of

18:06 because that's what Hubble brought.

18:07 And when you put something up that's this groundbreaking, that it goes so far beyond what we can currently do,

18:13 there's going to be just surprises that are just going to blow people's minds because they did not think that things were even possible.

18:19 Hubble basically was involved in finding the acceleration of the universe, expansion, and that wasn't even conceived of.

18:26 So things that are groundbreaking and just never thought of, the astronomers are open to those,

18:32 and they're waiting for the data to come in to start to get to look.

18:35 Yeah, fantastic.

18:36 All right, speaking of data, let's start talking about that side, Mike.

18:41 You said the data comes in over the Deep Space Network.

18:45 Is that the right name?

18:46 Yeah, the Deep Space Network.

18:48 Tell us about the kind of data coming in and how that all works.

18:52 Sure. The Deep Space Network is what we use to talk to the observatory.

18:56 We get a couple contacts.

18:58 Well, eventually, when it's in steady state operations, we'll get a couple contacts a day where it'll download the data to us.

19:03 Right now, we're in fairly steady contact.

19:05 There are three ground stations around the world in Australia, California, and Spain that kind of give nearly continuous coverage.

19:13 So they're getting the data down.

19:15 What kind of bandwidth can you get on this network?

19:18 These networks are big antennas, and we share this with other missions, with other things that are in space that need to communicate.

19:24 Like Mars rovers and whatnot.

19:26 Like Mars rovers and stuff.

19:27 And they typically operate in the Ka-band and S-band regions.

19:32 Yeah, cool.

19:33 All right.

19:34 I'm sorry.

19:34 I didn't mean to derail you, Mike.

19:35 Go ahead.

19:35 No, all good.

19:36 So the data comes down to the Deep Space Network facilities.

19:39 It gets transferred up to the Institute, which is in Baltimore, to our flight ops team, the flight operations system.

19:46 They get the data in several forms.

19:48 They get binary files right off the flight data recorders.

19:52 There's a recorder that captures both science data from the science instruments and engineering data that's monitoring the state of the telescope temperatures and pressures and various things.

20:01 Right, of course.

20:01 Because there's a whole control center going, it's focused right, it's still running, and things like that.

20:07 Absolutely.

20:07 Yeah.

20:08 They have the health and safety of the observatory and then the science as well.

20:11 So we get those binary files that come up.

20:14 We get auxiliary files that are processed on the ground.

20:17 We take the engineering data that comes down, extract out some really key parameters that are necessary for the data processing and put those in special files and send those over to our data processing system, along with kind of a summary of what the telescope was doing since we last contacted.

20:31 So we know, I'd observe these things and these observations worked, these had problems, gives us kind of a status of things.

20:38 And the other kind of data that we absolutely need is called ephemeris data.

20:41 It tells us exactly where the telescope was at any given time, which can be important for some types of science.

20:47 So all that data flows our way in files.

20:49 We use Python tasks to poll a common disk area that files are dumped in, pick them up, and transfer them to our processing system.

20:57 So the telescopes receive the data, that gets processed probably at the tele, not the telescope.

21:02 Ground stations.

21:03 Ground stations.

21:03 They receive it.

21:04 Yeah.

21:04 And then that gets sent probably over the internet, gets dropped into some like a local file or some cloud storage.

21:10 You're watching that and then Python picks it up from there.

21:13 That's right.

21:14 We pick it up.

21:15 We segregate what kind of data each is, send it on for various kinds of processing.

21:19 We put a lot of that, those data files, right as they come in the door in our archive.

21:22 So if we need to get them back out to look at them later, we can.
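
The poll-and-sort step described above can be sketched roughly as follows. This is a minimal illustration, not STScI's actual code: the file suffixes, the category names, and the `poll_once` helper are all hypothetical, and a real system would also archive each file and handle partially written files.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from file suffix to processing category.
ROUTES = {".sci": "science", ".eng": "engineering", ".eph": "ephemeris"}


def poll_once(drop_dir: Path, work_dir: Path) -> dict[str, list[str]]:
    """Move every recognized file out of drop_dir, grouped by category."""
    moved: dict[str, list[str]] = {}
    for path in sorted(drop_dir.iterdir()):
        category = ROUTES.get(path.suffix)
        if category is None or not path.is_file():
            continue  # leave unknown files where they are for inspection
        dest_dir = work_dir / category
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))
        moved.setdefault(category, []).append(path.name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        drop, work = Path(tmp, "drop"), Path(tmp, "work")
        drop.mkdir()
        for name in ("obs001.sci", "temps.eng", "orbit.eph", "notes.txt"):
            (drop / name).write_text("placeholder")
        print(poll_once(drop, work))
```

In practice a task like this would run on a timer or in a loop with a sleep between passes; a single pass is shown to keep the sketch short.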

21:26 We use a system for distributing the processing called HTCondor.

21:32 It's made by the University of Wisconsin and it lets us distribute the processing over a big network of machines.

21:38 This was developed even before the cloud came into being, and now the cloud's just part of it.

21:43 So we use that system to fan out the data processing of these various types of files.

21:47 And we have a lot of kind of data completeness checking that we do in Python where we got to register what came in, whether we got all the packets.

21:54 The data can come in different orders.

21:55 It can be split up different ways in the recorder.

21:57 And you have to kind of do a bunch of data accounting before you can send it down the pipeline for processing because otherwise you'll have holes in it and the processing won't give you the best product.

22:06 So we do all that pre-data accounting analysis and when we've got something that's got all the parts, we send it on down the pipeline with a particular processing recipe that tells the pipeline to apply these exact calibration recipes, make these exact kinds of files and get them into our archive for the scientists to retrieve and do science with.
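
As a rough illustration of that data accounting, the check below reports which packets are still missing before an exposure is released to the pipeline. It assumes, purely for the example, that each packet carries a sequence number from 0 to N-1; the real packet format is not described in the episode.

```python
def missing_packets(received: list[int], expected_count: int) -> list[int]:
    """Return the sequence numbers in 0..expected_count-1 that never arrived."""
    seen = set(received)  # arrival order and duplicates do not matter
    return [seq for seq in range(expected_count) if seq not in seen]


def is_complete(received: list[int], expected_count: int) -> bool:
    """True when every expected packet has been seen at least once."""
    return not missing_packets(received, expected_count)


# Packets arrive out of order, with one duplicate and two gaps.
arrived = [3, 0, 1, 5, 1]
print(missing_packets(arrived, 6))  # [2, 4]
print(is_complete(arrived, 6))      # False
```

Only when `is_complete` returns `True` would the data be handed on with its processing recipe.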

22:24 That's really interesting.

22:25 So you're doing some of the error checks and all that kind of data cleaning stuff for them before they have to pick it up.

22:32 Well, we also need to check to make sure that there wasn't an error in the transmission of the data, that we got

22:36 everything off the telescope that we thought we were going to get, and whether things need to be retransmitted.

22:41 And we're recording all the information about the instrumentation surrounding the detectors, what the temperatures are on the telescope and saving that and associating it with the data so that when scientists are looking at it, they can make correlations with the different things that they see.

22:57 Oh, nice.

22:57 Teddy in the audience has a question that's pretty related to that.

23:02 How do you manage potential disruption in data transfer?

23:05 Like if you lose the connection?

23:06 Do you all have to worry about that or is that handled like below the layer?

23:11 Okay.

23:11 We do.

23:12 It does happen.

23:13 It is handled upstream from us.

23:15 The Deep Space Network has that capability.

23:18 Our flight ops team has to get them sometimes to retransmit the data.

23:21 If we got it on the ground, but we just didn't get it to our institute, we can get it resent from another ground station.

23:25 But if it really didn't make it all the way down from the observatory, then they've got to go back on the next contact pass and get it to retransmit to the ground.

23:33 Yeah.

23:33 So there must be some kind of protocol exchange over the Deep Space Network, right, that sort of takes care of that for you.

23:42 Yep.

23:43 And as Megan said, that's a shared resource.

23:44 So if they happen to be talking to the Mars rover or they happen to be talking to something else, we got to get in line with everyone else until we get our turn again.

23:51 Sure.

23:52 Also related to that is, you know, what's the latency between the first piece of data sent and the first one received?

23:57 I mean, we are at great distances talking about the speed of, you know, limited by the speed of light, right?

24:03 So it's not milliseconds, I'm sure.

24:05 Yeah.

24:06 Megan, do you recall what the travel time to L2 is?

24:09 I don't recall, honestly.

24:11 I don't know.

24:11 Probably by the time we get it, we're talking minutes, some range of minutes, but I haven't calculated it recently.
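
For the propagation part of that latency alone, a quick back-of-the-envelope calculation is possible. It assumes an Earth-to-L2 distance of roughly 1.5 million km, which is an approximation and not a figure given in the episode, and it ignores any time spent at the ground stations or in transfer to the Institute.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact by definition of the metre
L2_DISTANCE_KM = 1.5e6             # approximate Earth-to-L2 distance (assumption)

# One-way signal travel time, ignoring all ground-side handling.
one_way_seconds = L2_DISTANCE_KM / SPEED_OF_LIGHT_KM_S
print(f"one-way light travel time: {one_way_seconds:.1f} s")
```

Under that assumption the signal itself takes about five seconds, so an end-to-end delay of minutes would presumably come mostly from relay and handling on the ground.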

24:18 Yeah.

24:19 Yeah.

24:20 Very interesting.

24:21 I mean, when we talk about data, you know, you may take pictures with your camera, with a CCD in your camera, and you'll see that picture immediately.

24:28 And it's a square, and it looks like the scene that you got.

24:30 But when Mike was talking about, you know, we're waiting for the pieces of the data to come, that square gets chopped up on board the telescope and sent to us in little tiny pieces, and we have to reconstruct it.

24:40 Yeah.

24:40 You don't do it all at once, right?

24:42 And the other part of that analogy is if you hold your camera up and you take a picture of the sky, it's just a picture.

24:46 We add all kinds of what we call metadata, supplementary data to that image that tells where it was pointed in the sky, what filters were in use on the observatory, what astronomer asked for this data, what data grouping is it part of?

24:58 They do groupings in what they call proposals.

25:00 And so we have all this extra data that we put into the files so that someone coming in to our archive who didn't propose the data can still extract it and get some understanding of what was done, how that piece of observation was set up.

25:14 That's a good idea, of course, because if somebody says, here's a directory full of large JPEGs, do science now, right?

25:21 That's, well, hold on.

25:23 Exactly.

25:23 They're beautiful and they're almost useless.

25:26 Yeah.

25:26 They're almost useless for science.

25:28 Yeah, absolutely.

25:29 They would still make a good coffee table book, though, probably.

25:32 Once they were cleaned up of the cosmic rays and all those things that Megan talked about.

25:36 Once you combine all the multiple filter images, you get the pretty colors.

25:42 Yeah, I bet.

25:42 Yeah, those are amazing.

25:43 Samapriya asks, you know, what's the max onboard storage in case, like, the data, if there's an extended period where you can't get to it?

25:52 You know, how long will this thing run before its hard drive just fills up?

25:55 Meg, do you remember the capacity for the recorder?

25:58 No, because I have Roman in my head right now, so I don't remember the JWST capacity.

26:02 But they take that into account, and that's how they've scheduled our contacts and downlinks with the Deep Space Network so that we can get the data off in a reasonable time.

26:11 And it's not affecting, you know, scheduling of new observations.

26:15 So they've done a lot of work.

26:18 And a lot of the work that we've done with Hubble in the last 30 years, because we do scheduling and data processing on Hubble, has allowed us to understand how to best optimize those types of things.

26:27 Yeah, very interesting.

26:28 Yeah, you must have a lot of experience from Hubble because that's a lot of data as well.

26:32 I was wondering how much computing happens on the telescope versus how much is it just a receptor and a transmitter?

26:38 There is a little bit that happens on board, JWST, especially in the instruments.

26:42 Meg probably knows more of the details of some of the infrared instruments, but they do image differencing and summations.

26:49 Infrared detectors build up their signal over time.

26:51 And rather than send all those bits to the ground, there are some on-board calculations that are done so that they send a bit less to the ground than they actually collected.

27:01 Yeah, so one of the things that's different about IR instruments compared to UVIS CCDs is that there is no shutter in the IR.

27:11 Every time we want to know what the detector's collecting, we kind of ask it really nicely.

27:17 What's the voltage of this pixel?

27:18 Then we ask it again, and we ask it a whole bunch more times.

27:21 And that's the data that we send down.

27:23 And knowing how that signal is accumulating in the pixel without removing the signal from the pixel allows us to do cool things with IR data that allow us to reject the cosmic rays that may have come during the course of the exposure and stuff like that.
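
The idea of reading a pixel repeatedly and using the accumulated signal to reject cosmic rays can be illustrated with a toy example. This is a simplified sketch with made-up numbers and a hypothetical `ramp_rate` helper, not the algorithm the JWST pipeline uses: it looks at the differences between consecutive reads, flags any difference far above the typical one as a hit, and averages the rest.

```python
from statistics import median


def ramp_rate(reads, jump_counts=50.0):
    """Return (counts per read, indices of differences flagged as jumps)."""
    # Differences between consecutive non-destructive reads of one pixel.
    diffs = [later - earlier for earlier, later in zip(reads, reads[1:])]
    typical = median(diffs)
    # A difference far above the typical one is treated as a cosmic-ray hit.
    jumps = [i for i, d in enumerate(diffs) if d - typical > jump_counts]
    good = [d for i, d in enumerate(diffs) if i not in jumps]
    return sum(good) / len(good), jumps


reads = [100, 110, 120, 130, 330, 340, 350]  # a hit lands between reads 3 and 4
rate, jumps = ramp_rate(reads)
print(rate, jumps)  # 10.0 [3]
```

Because the signal before and after the hit is still available, the underlying rate of 10 counts per read is recovered even though one interval was contaminated.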

27:41 Nice.

27:41 Do you have to do, like, stabilization?

27:43 I know the thing is probably pretty stable out there, but it's also looking really far away.

27:48 Yeah, stabilization, yeah, it is looking far away.

27:51 And it is mostly stable.

27:53 Where it is orbiting around L2 is not a completely stable orbit.

27:57 And so they do have to do station keeping, which I believe fires rockets to make sure that it's in the correct orbit.

28:05 Yeah, one of the big pieces of news that was really great was ESA, the European Space Agency who launched this, did an excellent job of getting it right on target.

28:14 So it didn't have to correct much, which means it has extra fuel to run longer to do that, right?

28:18 Exactly.

28:19 They did an amazing job with that launch.

28:22 That was like a flawless launch.

28:23 It was really, really cool.

28:25 Fantastic.

28:26 All right, a real-time follow-up.

28:27 We've got Adam out in the audience says, JWST can store at least 65 gigs of science data.

28:32 Downloads occur in two four-hour contacts per day where each can transmit 20.6 gigs of data.

28:39 How about that?

28:39 Nice.

28:40 When we reach steady state science operations, that's probably correct.
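
Taking the figures quoted from the audience at face value (65 GB of storage, two four-hour contacts per day, 20.6 GB per contact), the implied rates work out as below. The calculation assumes 1 GB means 10^9 bytes and that data flows evenly throughout each contact; neither assumption is stated in the episode.

```python
RECORDER_GB = 65.0     # onboard science storage quoted in the episode
PER_CONTACT_GB = 20.6  # data per contact quoted in the episode
CONTACT_HOURS = 4.0
CONTACTS_PER_DAY = 2

# Average rate during a contact, assuming 1 GB = 10**9 bytes.
rate_mbit_s = PER_CONTACT_GB * 1e9 * 8 / (CONTACT_HOURS * 3600) / 1e6
daily_downlink_gb = PER_CONTACT_GB * CONTACTS_PER_DAY
contacts_to_empty = RECORDER_GB / PER_CONTACT_GB

print(f"average downlink rate: {rate_mbit_s:.1f} Mbit/s")
print(f"daily downlink capacity: {daily_downlink_gb:.1f} GB")
print(f"contacts to drain a full recorder: {contacts_to_empty:.2f}")
```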

28:43 Yeah, right now we're contacting a little more often because as they're going through commissioning, they really need to interact with the observatory much more often.

28:50 When we get steady state operations, they'll send plans up to the observatory.

28:54 And the observatory will just tick through the plan.

28:56 It'll look at this star.

28:57 It'll throw these filters in it.

28:58 It'll turn to this galaxy.

28:59 Look at these filters.

29:00 Right.

29:01 Go look at this planet.

29:02 And it'll basically run it as a program, whereas right now they're interacting much more often with the observatory.

29:07 So if you tell it to stare at this spot, this blank dark spot in the sky for four hours, you don't need to check in with it as frequently as long as it's...

29:15 Exactly.

29:16 Yeah.

29:16 If you told it to stare at that spot and something went wrong, the James Webb telescope is smart enough to skip ahead and go to the next thing in the schedule, whereas Hubble was very ground dependent.

29:25 Everything Hubble did had to be told from the ground.

29:27 And JWST is a little bit more automated, where if it's got a problem with something it's trying to do, it'll just skip ahead to the next thing and someone else will reschedule that later on.

29:36 So I heard that there's some kind of JavaScript stuff running actually on the telescope.

29:41 Is there any Python happening there or is the Python story really once the data gets back here?

29:46 There's no Python on the telescope.

29:48 There is JavaScript on the telescope.

29:50 It is a forked private version of JavaScript from a while ago, I believe.

29:57 There was a study done to think about what's the best way to do instrument commanding and that's what I believe was chosen.

30:04 Yeah.

30:05 Interesting.

30:05 Yeah, it's fine.

30:06 It doesn't really have numbers, but I guess you don't need numbers for science.

30:09 So it's okay.

30:09 Well, so, I mean, when you think about the volumes of data that are coming down from the telescope, we're also transmitting, you know, the science pixels as integers.

30:18 They're unsigned integer arrays.

30:20 And then when we do the actual processing, those are expanded into floating point.

30:24 So there's expansion that happens on our end for processing and storage and analysis.
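
The size growth from that integer-to-float expansion can be shown with Python's standard `array` module. The pixel values here are made up, and the choice of 16-bit integers and 32-bit floats is an assumption for illustration; the episode only says unsigned integers are expanded into floating point.

```python
from array import array

# Four raw pixel values stored as unsigned 16-bit integers (type code "H").
raw = array("H", [0, 1200, 40000, 65535])

# The same values expanded to 32-bit floats (type code "f") for calibration math.
expanded = array("f", (float(value) for value in raw))

print(raw.itemsize * len(raw), "bytes as integers")
print(expanded.itemsize * len(expanded), "bytes as floats")
```

On typical platforms the float version occupies twice the space, which is why storage and processing needs on the ground exceed the raw downlink volume.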

30:28 Yeah.

30:29 Cool.

30:29 Mike, you mentioned this HTCondor.

30:32 Maybe tell us a bit more about this.

30:34 Is this something that people would find generally useful outside of telescopes?

30:39 Absolutely.

30:39 It's a very generic product developed by the University of Wisconsin.

30:43 Back in the day, they developed it because they had all these computers sitting around on people's desktops that they wanted to make use of.

30:50 You know, people were using them for three or four hours a day and then going off to lunch and off to meetings and the computers were sitting there.

30:56 And they used this system as a way to harness those cycles to the point where if no one's sitting at a computer, Condor could tell, and it could drop a job on that computer and run it until the person comes back.

31:05 As soon as they hit a keystroke, the job would leave that computer and if needed, go off and finish its work somewhere else.

31:11 So it started in that realm and they just expanded and expanded over the years.

31:14 Kind of like SETI at home or one of these sort of grid computing stories.

31:18 Exactly.

31:19 It started in that realm.

31:20 Now it can process over full universities using all the machines at a university, research clusters.

31:26 There are government initiatives that have grid setups.

31:30 Now the commercial clouds are involved.

31:31 They have interfaces to AWS and Microsoft Cloud and a whole bunch of others.

31:35 So they really expanded their access.

31:38 It's highly used.

31:40 The realm that pushes it the most are the ones that do the...

31:44 Oh, Megan, what is the big science that just came out last year?

31:46 I'm throwing a blank.

31:47 The big science that just came out?

31:48 Yeah, the detectors where they found...

31:51 Not LIGO?

31:53 LIGO, thank you.

31:54 Oh, that's right.

31:56 Something about space and black holes, yeah.

31:58 The computing demands of those detectors are just...

32:01 They make JWST look like a pen.

32:03 Oh, really?

32:04 Computing on your pen.

32:05 So they're really off the charts.

32:06 They need so many cores and clusters to do their computations.

32:09 And this is one of the only systems...

32:11 Condor is one of the only systems around that can get them the access to the cores they need.

32:15 And it's highly used in the LIGO community.

33:48 And I think Python too is a huge part of that LIGO result.

33:53 It is.

33:53 They use a lot of Python.

33:55 Yeah, that's awesome.

33:56 Condor has a full Python interface.

33:58 So you can talk Python right to Condor, import the package, and you're off and running.

34:03 Yeah, that's really neat.

34:05 Just pip install or conda install HTCondor and off it goes, right?

34:09 Something like that?

34:09 Yep.

34:09 pip install htcondor.

34:10 Awesome.
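
To give a feel for what a distributed job looks like, the sketch below renders a minimal HTCondor submit description as text. It deliberately does not call the Python bindings, so it runs without HTCondor installed; the script name `calibrate.py`, its `--chunk` argument, and the `submit_description` helper are hypothetical.

```python
def submit_description(executable: str, arguments: str, n_jobs: int) -> str:
    """Render a minimal HTCondor submit description as text."""
    lines = [
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.$(Process).out",  # $(Process) is the per-job index
        "error = job.$(Process).err",
        "log = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Four jobs, each told which chunk of the data to work on.
print(submit_description("calibrate.py", "--chunk $(Process)", 4))
```

Text like this can be saved to a file and handed to `condor_submit`, which then queues one job per chunk across whatever machines are available.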

34:11 You mentioned that it takes advantage of these computers, which are kind of sitting idle.

34:17 And of course, we've got like folding at home and we've got SETI at home,

34:22 those types of things, which they seem to have gone away, which I think it's a little bit sad,

34:26 actually.

34:26 But there's been these things where it's like, well, if a personal computer is sitting around

34:30 idle and we've got a bunch of them in an office or a university or whatever, then we could use them.

34:35 That's great, right?

34:36 But you also mentioned the cloud.

34:37 And I guess I had never really thought about it, but I know the cloud providers don't want this

34:42 from you.

34:42 But if you pay for a virtual machine, there's a good chance that it's sitting there doing 20%

34:48 of what it could do, right?

34:49 So you could actually, if you've got 10 or 100 virtual machines running in the cloud, you could

34:54 say, you know what?

34:54 Whatever extra capacity you have, I'm going to use that for this other sort of scheduling

34:59 service.

35:00 The good thing about astronomy is most of it doesn't have to happen in real time.

35:03 So we can buy up the cheap cycles where the machines are not being used around the clock

35:09 and get good deals to go off and do our reprocessing and processing that has to happen.

35:14 Right.

35:15 Like reserved instances or something like that at AWS.

35:18 Right.

35:19 HST is already doing some of their reprocessing in the cloud.

35:22 That's another point is we get the data down from the observatories once and we process it once,

35:27 put it in our archive, and then we reprocess it again and again as the calibration algorithms

35:31 improve, as software bugs are found, and as reference data improves, particularly calibration

35:36 reference data, special data that's used to remove those instrumental signatures that

35:40 Megan was talking about.

35:41 As all that supplemental data and algorithms improve, you rerun the data again and again and

35:46 you need the computational capacity to do that while data is still coming down from the telescope

35:50 because your archive is getting bigger all the time.

35:53 So you've got a bigger crank to turn to get everything up to the best possible product it

35:58 could be.

35:58 That is pretty fascinating.

35:59 I hadn't really thought about that you would have to reprocess the data, but sure.

36:03 It drives all our designs because we reprocess the data tens of times more than that very

36:10 first time it hits the ground.
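
One simple way to picture the reprocessing loop is to tag every archived product with the version of software and reference data that produced it, then select whatever is out of date. The sketch below does only that selection; the `ArchiveEntry` class, the version numbers, and the observation names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArchiveEntry:
    name: str
    calibration_version: int  # version of software/reference data last applied


def needs_reprocessing(entries, current_version):
    """Return the names of entries calibrated with an older version."""
    return [e.name for e in entries if e.calibration_version < current_version]


archive = [
    ArchiveEntry("obs_a", 3),
    ArchiveEntry("obs_b", 5),
    ArchiveEntry("obs_c", 4),
]
print(needs_reprocessing(archive, current_version=5))  # ['obs_a', 'obs_c']
```

As the archive grows, each new calibration version makes this list longer, which is the "bigger crank to turn" described above.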

36:11 Sure.

36:11 What does the compute stuff look like for you all?

36:14 Are you using GPUs?

36:15 Are you using some of the NumPy, SciPy, Astropy stack to do this?

36:21 Those are definitely involved, yeah.

36:22 Yeah.

36:23 So GPUs, I don't think we have an excessive amount of GPUs.

36:27 Probably those are more in the post-processing parts of the software.

36:30 But those other things you mentioned, most definitely.

36:33 And we actually are heavy contributors to those packages as well.

36:37 Fantastic.

36:37 We do a lot to provide back to the community the software that we're creating, especially

36:44 our external data analysis, post-processing analysis software, so that it can be used by

36:50 the rest of the community.

36:51 Yeah.

36:51 Samapriya asks, what can a user at home do with the data?

36:56 Or do you need these huge clusters, compute clusters, to work on them?

37:01 On their own notebooks?

37:02 Yeah.

37:03 So if people are out there listening, I suspect there are some astronomers who have an expertise

37:08 in this, but there's probably a lot of people who are just fans of space and might want to

37:12 play with this.

37:12 So any data that's public that you can access in our archive, you can download and you can

37:19 analyze it.

37:19 You can install the science calibration software.

37:23 So the stuff that takes the data that we've already prepared for you and does further analysis

37:28 on it, we have visualization tools you can install.

37:31 We have many different things that scientists or even non-scientists might want to play with

37:37 to look at the data.

37:38 That's all possible.

37:39 I do that with Hubble data all the time on my laptop and my laptop just has 16 gigabytes of

37:44 RAM.

37:44 And now Hubble data is fairly small compared to the machines we were running on.

37:50 Yeah, I bet when it first, how old is Hubble?

37:52 20 years?

37:53 30.

37:54 30.

37:54 Yeah, 30-year-old computers probably thought that data wasn't small.

37:57 Yeah.

37:57 So we've gone from terminal services to desktops and laptops and back to terminal services eventually

38:03 in the near future because of the size of the data.

38:05 Yeah.

38:05 So do you have notebook servers that are running close to the data that then you can log into

38:11 and play with?

38:11 Or what's that look like?

38:12 So we do at different levels.

38:14 You're talking about Jupyter Notebooks, Jupyter Lab, that type of thing?

38:17 Yeah.

38:18 So that does exist.

38:19 We've been playing around with science platforms in the cloud to give people access to not only

38:25 the software, but the data in a very easy way.

38:28 And I think Jupyter Lab makes that really, really effective for the tools that we can provide

38:34 to people.

38:34 Yeah.

38:35 But you can also run it yourself on your own local machines.

38:37 Sure.

38:38 As the data gets bigger and bigger, part of the cost and latency is just transferring it,

38:43 right?

38:43 So if you can run your compute close to it, you can save money by not transferring it and all

38:47 sorts of things, right?

38:48 Exactly.

38:48 Or reading it into memory for very, very large files.

38:51 As a system, the system itself needs to be very large to process the sheer volume of data

38:56 that we get and are constantly getting and reprocessing and storing and serving.

39:00 But for individual data pieces, it's still very possible for people to process it on their

39:06 own.

39:06 Sure.

39:06 Do you use things like Dask or any of these sort of pandas- or NumPy-like things that scale

39:11 out to larger data?

39:12 Yeah.

39:12 And it depends on the tool and what the purpose is that you're using.

39:17 Sure.

39:17 For sure.

39:18 Mike, got some thoughts on this before I move us along?

39:21 No, that's a really good summary of the picture.

39:24 Yeah.

39:24 And I think, I'm pretty sure at this point, all the public HST data is in the cloud.

39:28 I think it's all on AWS.

39:30 And I believe you can work with it there.

39:32 Yeah.

39:33 Especially if you've got research to do.

39:35 Yeah.

39:35 So one of the things that got my attention, besides someone, I think, sending me a message,

39:40 hey, you should talk to the JWST folks, is you have over at, on GitHub, you have the Space

39:46 Telescope Science Institute, where you've got all kinds of various things that people can

39:52 go play with, right?

39:53 Maybe tell us a bit about some of the highlights over there.

39:56 The one that I ran across is just JWST.

39:58 Yeah.

39:58 Which is, it describes itself as the Python Library for Science Observations from the JWST.

40:04 Yeah.

40:05 So this is the, what we call the science calibration package for JWST.

40:10 The software that lives here is able to do the detector calibration that we talked about

40:17 and image combination and everything else that we need to do to create the standard products

40:22 we've agreed to create for the mission.

40:24 That's all contained in this package.

40:26 And this package gets installed in our backend systems to be run as we're processing the data.

40:32 And it's ready to be processed at this level.

40:34 So this is what users would install if you wanted to reprocess or do higher level analysis of

40:42 the JWST data for the different instruments.

40:44 You spoke about the reprocessing and the reanalyzing.

40:48 So you would install this.

40:49 And if you had an idea on how to maybe do different noise reduction or other processing,

40:55 this is what you use.

40:56 Exactly.

40:57 Exactly.

40:58 Exactly.

40:58 We performed some base calibration to put the data in the archive and to make it somewhat

41:03 usable.

41:03 But most high level scientists are doing their own recalibration and they're tuning the data

41:08 for what they're trying to get out of it.

41:10 Especially if they're working at the margins of noise and other things where they really have

41:14 to work with the data to get out of it to fit their science needs.

41:17 So a lot of this is the byproduct of this proposal process because we don't know where

41:21 the science is.

41:22 It's going to come through.

41:23 We try to provide the best generic processing for all the data.

41:27 Right.

41:27 It's pretty neat that this is just here on GitHub.

41:30 That's cool.

41:31 Yeah.

41:31 I suspect when Hubble came out, it was not like, well, here's the open source thing and

41:35 here's how you contribute back in the same way.

41:37 Right.

41:37 I'm sure it was shared to some degree, but the openness of science and really the computational

41:44 bits of science over on GitHub is pretty amazing.

41:47 It's a lot of fun.

41:47 I think we started using GitHub when it first came out.

41:50 We were, especially during Hubble, using subversion and even older version control systems.

41:56 CVS or something dreadful like that.

41:58 CVS, RCS, even.

41:59 Whatever we could use.

42:01 Yeah.

42:01 They were run on internal systems and managed on internal web pages.

42:06 So when GitHub came out, it was really nice to be able to share our software, not only with

42:12 astronomers in the larger community, but with other missions that we interact with and to

42:18 be able to talk to them about how we develop and accept changes and improvements into the

42:23 software.

42:24 Yeah.

42:24 You see right here, there are 541 issues and 20 PRs that are open, but 3,462 that are closed.

42:32 Yeah.

42:33 And that's pretty amazing.

42:33 Yeah.

42:34 That's what we do a lot of work.

42:37 And JWST is a new mission now.

42:39 Yeah.

42:40 Awesome.

42:40 So another thing that I ran across that looks interesting is WebbPSF, which is a simulation

42:48 tool, right?

42:49 Maybe tell people about this.

42:50 So Webb is obviously the telescope.

42:52 PSF are point spread functions.

42:56 This is the statistical pattern that light falling on the detector from a star would make.

43:00 And you can predict what that pattern is going to be based on the optics that are in your

43:06 telescope.

43:06 And so this piece of software takes that understanding of the optics in the telescope and how light

43:13 gets transmitted through those optics, including through different wavelengths, and creates simulated

43:19 images of what we might be able to see.

43:22 This allows us to not only predict how the telescope is going to perform in different ways, but develop

43:30 our software, develop the algorithms that we use to pull out stars and stuff from images.

43:36 Yeah.

43:37 Very neat.

43:38 I suspect if people didn't have access to the telescope and they wanted to play around

43:41 with some stuff, maybe they could use this.

43:44 Yeah, they could.

43:45 There's other simulation tools that are out there that will simulate full astronomical scenes

43:49 as well.

43:50 So not just individual stars, but galaxies and the combination of the two.

43:55 Do you have something you can point people at?

43:56 I think I'd have to send you a link.

43:59 Yeah, we'll put it in the show notes.

44:01 Yeah.

44:01 Okay.

44:01 So people can get to it.

44:03 That's great.

44:03 Let's do that.

44:03 Yeah.

44:04 Awesome.

44:04 Another thing I ran across looking at all this stuff is this place called AstroConda.

44:09 And there's a whole bunch of stuff, just tons of libraries in here.

44:14 It looks like there's a lot of neat things like, for example, working with the ASDF file format,

44:19 which I suspect is something that you all provide a lot of.

44:22 Yeah.

44:23 And so on.

44:23 So we developed that format.

44:25 We wrote that format.

44:27 The primary interpreter is in Python.

44:29 So a little bit of astronomy history.

44:31 Astronomy for a very long time has used a file format called FITS, Flexible Image Transport

44:37 System.

44:37 It started around the time data was being saved on tapes, a tape store.

44:41 You might want to optimize for different things if it's going on tape and SSD.

44:46 Right.

44:47 And so it's been in the community a long time.

44:49 And a lot of community tools are based on it, accessing it.

44:52 It was a big part of this other thing in the astronomy community called IRAF, which was

44:59 a common software package.

45:01 It actually had its own virtual operating system and command line languages that did the reduction

45:06 for us.

45:07 One of the things that JWST brought about is we needed to be able to handle those complex

45:14 optical path descriptions for how the light from a star you're observing gets onto the

45:20 detector and how you translate those positions back and forth.

45:24 So where's the telescope pointed?

45:25 Where is this light as it's moving around?

45:28 And how can I tell at this pixel what star that relates to in the sky?

45:32 Yeah, it seems very non-trivial because you've got all these different hexagonal pieces that

45:37 are independently adjustable and you've got to sort of reassemble that into a continuous

45:41 thing, right?

45:41 So that's part of it.

45:42 An even larger part is the number of optics that are in that chain and then the optics

45:47 inside each of the instruments.

45:48 And some of them are spectroscopic instruments that divide light into its constituent wavelengths.

45:54 And those wavelengths have different travel paths and will fall on the detector in different

45:59 places in mathematically predictable ways, but not always simple mathematically predictable

46:04 ways.

46:05 And so saving that information was really difficult in FITS.

46:09 And we developed this new format that will allow us to save analytical models into the

46:15 data itself that can then be opened up by the users and very easily used to understand the

46:21 relationship between the stars and the pixels that they are looking at and the stars in the

46:26 sky.

46:26 And so we started JWST with that and it actually gets packaged into a FITS file, but in later

46:33 missions, we'll be using just ASDF.

46:35 Yeah, that's really neat.

46:36 So it lets you save more than just the raw data.

46:40 It gives us a really nice way of saving information we need to understand about the data along with

46:47 the data itself.

46:47 We also save the binary arrays.

46:49 It's actually, if you were to look at it, the text part of the format is YAML, based on the JSON

46:55 schema standard.
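
The idea described here, a human-readable text header that carries metadata and an analytical model, stored next to raw binary arrays, can be sketched with the standard library alone. This is only an illustration of the concept and not the ASDF format itself: the header below is JSON rather than YAML, and the `MAGIC` marker, the file layout, the `wavelength_model` key, and all function names are hypothetical.

```python
import json
import struct
import tempfile
from array import array
from pathlib import Path

MAGIC = b"#SKETCH 1\n"  # hypothetical marker, NOT the real ASDF header


def write_sketch(path, meta, pixels):
    """Write a text metadata header followed by a binary block of float64 values."""
    header = json.dumps(meta, indent=2).encode("utf-8")
    payload = array("d", pixels).tobytes()  # native byte order, fine for a local demo
    with open(path, "wb") as fh:
        fh.write(MAGIC)
        fh.write(struct.pack("<II", len(header), len(payload)))
        fh.write(header)
        fh.write(payload)


def read_sketch(path):
    """Read the header and the binary block back into Python objects."""
    with open(path, "rb") as fh:
        if fh.read(len(MAGIC)) != MAGIC:
            raise ValueError("not a sketch file")
        header_len, payload_len = struct.unpack("<II", fh.read(8))
        meta = json.loads(fh.read(header_len).decode("utf-8"))
        pixels = array("d")
        pixels.frombytes(fh.read(payload_len))
    return meta, list(pixels)


def evaluate_model(model, x):
    """Evaluate the stored polynomial c0 + c1*x + c2*x**2 + ... at x."""
    return sum(c * x**i for i, c in enumerate(model["coefficients"]))


# Hypothetical metadata: an instrument name plus a small analytical model.
meta = {
    "instrument": "EXAMPLE",
    "wavelength_model": {"type": "polynomial", "coefficients": [1.0, 0.5, 0.01]},
}
path = Path(tempfile.mkdtemp()) / "example.sketch"
write_sketch(path, meta, [10.0, 12.5, 11.0])

loaded_meta, loaded_pixels = read_sketch(path)
wavelength = evaluate_model(loaded_meta["wavelength_model"], 2.0)
print(loaded_meta["instrument"], loaded_pixels, wavelength)
```

The point of the design is that a reader gets the model together with the pixels, so the relationship between them travels inside the file instead of living in separate documentation.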

46:56 Yeah.

46:56 Oh, fantastic.

46:56 So looking through this AstroConda thing here, obviously there's stuff that's very focused

47:03 on astronomy like ASDF, but there's also other cool libraries like appdirs.

47:09 There's all these like little things.

47:13 So if I want to write something to a temp file or where's my user home or stuff like that,

47:19 right?

47:19 You can ask questions like that of this appdirs thing.

47:22 Yeah, that's interesting.

47:23 I actually, I haven't looked at appdirs, but it must be being used by one of our other sub

47:28 packages and somebody wrote it for a good purpose.
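
The kind of question mentioned here (where is the user's home directory, where can a temp file go) can also be answered with the standard library. The sketch below uses only `pathlib` and `tempfile`; it is not the API of the package being discussed, and the helper names are made up for illustration.

```python
import tempfile
from pathlib import Path


def describe_locations() -> dict:
    """Return a few per-user locations using only the standard library."""
    return {
        "home": Path.home(),
        "temp_dir": Path(tempfile.gettempdir()),
    }


def write_scratch(text: str) -> Path:
    """Write text to a uniquely named temp file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        return Path(handle.name)


locations = describe_locations()
scratch = write_scratch("calibration notes")
print(locations["home"], scratch)
```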

47:30 This AstroConda site that you're looking at was, you know, this repackaging of tools that

47:36 are written in Python, often with C extensions that can be used by astronomers in the community

47:43 to do what they need to do to look at the data, calibrate the data.

47:46 And so this AstroConda channel is a Conda channel that allows us to organize that information

47:51 and deliver it to users who are using Conda environments.

47:54 Nice.

47:55 Or virtual environments.

47:56 Yeah.

47:56 You can say something like install this channel, everything in this channel, and then you'll

48:00 basically be able to do most of the work that we're talking about or something like that.

48:05 Right.

48:05 And so most of these are now available also separately on conda-forge and PyPI.

48:10 So often people will install from there too.

48:13 Sure.

48:13 Yeah.

48:13 So we've got, let's see, asteval, a safe, minimalistic evaluator for Python.

48:19 That's pretty cool.

48:20 You've got mysterious ones like CubeTools, which just refuse to identify themselves.

48:25 That's actually for 3D images.

48:27 So we stack things up.

48:29 Yeah, cool.

48:30 Anyway, I'll put the link to this in the show notes so people can look around.

48:33 It's just interesting to see all the stuff that you were bringing together here.

48:37 You've got the STScI version.

48:40 That's, you know, a meta package for Conda that will install the individual things that

48:44 go together.

48:45 Yeah, of course.

48:45 Of course.

48:46 All right.

48:47 This GWCS one sounds pretty interesting.

48:51 One of the challenges with the telescopes, I'm sure, is, you know, things orbit around other

48:57 things which are orbiting around other things, you know, around the sun, which is around through

49:02 the galaxy and whatnot.

49:03 Right.

49:03 So figuring out where you're pointing is probably pretty tricky.

49:07 Is that what this library does?

49:08 So when I was talking about being able to understand the light and where it is in the sky and on

49:13 the detector, this is what we're saving in the ASDF file.

49:16 These GWCS representations, generalized world coordinate system.

49:20 And that's a mouthful.

49:22 World coordinate systems are what astronomers use to relate, you know, an undistorted scene

49:27 on the sky to what you have on the detector.

49:30 And it changes depending on what the telescope optics are.

49:33 But what GWCS provides is layered on top of astropy modeling, which allows you to string

49:41 together mathematical models to translate coordinates between two different systems.

49:46 So we use this for translating perfect sky coordinates to detector coordinates and intermediate

49:52 systems as well.

49:53 And one of the benefits to that is sometimes there are effects that happen in the detector

50:00 that should be corrected at different stages along the way.

50:02 And you can insert those corrections in this pipeline of models.
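
The idea of stringing models together between named coordinate frames, and slotting a correction in at an intermediate stage, can be sketched in plain Python. This is a conceptual illustration and not the GWCS or Astropy API: the `FramePipeline` class, the frame names, and the toy transforms (a 1024-pixel shift, a small skew, a 0.25 plate scale) are all hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Transform = Callable[[Point], Point]


class FramePipeline:
    """Ordered (frame name, transform leaving that frame) steps plus a final frame."""

    def __init__(self, steps: List[Tuple[str, Transform]], output_frame: str):
        self.steps = list(steps)
        self.output_frame = output_frame

    def frames(self) -> List[str]:
        return [name for name, _ in self.steps] + [self.output_frame]

    def insert_frame(self, before: str, name: str, transform: Transform) -> None:
        """Add frame `name` just before `before`; `transform` maps name -> before."""
        self.steps.insert(self.frames().index(before), (name, transform))

    def transform(self, point: Point, from_frame: str, to_frame: str) -> Point:
        """Apply every step between two frames (forward direction only)."""
        names = self.frames()
        start, stop = names.index(from_frame), names.index(to_frame)
        if start > stop:
            raise ValueError("this sketch only transforms forward")
        for _, step in self.steps[start:stop]:
            point = step(point)
        return point


def shift_to_center(p: Point) -> Point:
    return (p[0] - 1024.0, p[1] - 1024.0)  # hypothetical detector center


def apply_plate_scale(p: Point) -> Point:
    return (p[0] * 0.25, p[1] * 0.25)  # hypothetical scale factor


def skew_correction(p: Point) -> Point:
    return (p[0] + 0.5 * p[1], p[1])  # hypothetical detector effect


pipeline = FramePipeline(
    [("detector", shift_to_center), ("focal_plane", apply_plate_scale)],
    output_frame="sky",
)
before = pipeline.transform((1124.0, 1032.0), "detector", "sky")

# Insert the correction at an intermediate stage without touching the other steps.
pipeline.insert_frame(before="focal_plane", name="centered", transform=skew_correction)
after = pipeline.transform((1124.0, 1032.0), "detector", "sky")
print(pipeline.frames(), before, after)
```

Because every step is named, a caller can also start from an intermediate frame, for example `pipeline.transform(point, "centered", "sky")`.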

50:06 It makes sense to correct in this coordinate system, potentially, rather than at different

50:12 levels.

50:13 Something like that.

50:13 I see.

50:14 Right.

50:15 Yeah.

50:15 Pretty fascinating.

50:16 It sounds complicated.

50:17 It's cool.

50:18 Yeah.

50:19 It's cool.

50:19 It is definitely cool.

50:20 It's built on a lot, you know, on other open source code that is useful for more things

50:25 than just astronomy.

50:26 Right.

50:27 Modeling is useful for everything in science.

50:30 Fitting is useful for everything.

50:31 So it's cool that we can, you know, help provide tools like that to the community.

50:35 Yeah, for sure.

50:36 What's the machine learning story around the telescopes and the data?

50:41 Are you all doing anything with that?

50:42 Any AI stuff?

50:44 Avery, you want to take it?

50:45 So I'll talk about, you can pipe up Mike for the HST stuff if you want.

50:49 I'm not familiar with that aspect.

50:50 I know.

50:51 So, yeah, obviously we have data science groups at the Institute that are doing a lot

50:56 of machine learning.

50:57 Machine learning can be applied to things like the catalogs of objects that astronomers scrutinize

51:03 details about what these things are that are confirmed that can be used to build up information

51:07 about unknown images.

51:09 Machine learning for processing.

51:11 We've started doing some of that for the HST processing in order to optimize scheduling,

51:17 time, spreading the data out, understanding if there are certain metadata keywords, like

51:23 what Mike was talking about before, that we know will affect the processing in certain

51:27 ways.

51:27 We can detect that early and make up for it.

51:30 Interesting.

51:31 How much of a challenge is an optimization problem of who gets time when?

51:35 So this person wants to look here and do this, then that person is going to look over there.

51:39 But if there's somebody that wants to look in the middle, maybe you could save some fuel

51:43 by turning part way, letting them do their job, like skip ahead in line and then keep moving.

51:49 It's a great point.

51:50 It's not only fuel, it's you don't want to move your mechanisms more often than you have

51:54 to.

51:54 So you don't want to flip the filter wheel three positions left, 10 positions right,

51:58 when you could have done that in a more economic way and save the wear and tear on your system.

52:02 So there's a lot of optimization that goes into the planning and scheduling system.
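
The filter wheel example can be made concrete with a toy model. The sketch below assumes a circular wheel that can turn in either direction and reorders requested filter positions with a greedy nearest-position heuristic. It is only an illustration of the wear-and-tear trade-off, not the actual planning and scheduling system, and the wheel size and request list are made up.

```python
from typing import List


def wheel_distance(a: int, b: int, slots: int) -> int:
    """Fewest steps between two positions on a circular wheel with `slots` positions."""
    step = abs(a - b) % slots
    return min(step, slots - step)


def total_moves(start: int, order: List[int], slots: int) -> int:
    """Total wheel steps needed to visit the positions in the given order."""
    moves, current = 0, start
    for position in order:
        moves += wheel_distance(current, position, slots)
        current = position
    return moves


def greedy_order(start: int, requests: List[int], slots: int) -> List[int]:
    """Repeatedly pick the closest remaining position (a heuristic, not optimal)."""
    remaining = list(requests)
    order, current = [], start
    while remaining:
        nearest = min(remaining, key=lambda p: wheel_distance(current, p, slots))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest
    return order


SLOTS = 12                   # hypothetical wheel size
requests = [9, 3, 10, 2, 3]  # hypothetical filter positions, in arrival order

as_requested = total_moves(0, requests, SLOTS)
reordered = greedy_order(0, requests, SLOTS)
optimized = total_moves(0, reordered, SLOTS)
print(as_requested, reordered, optimized)
```

A real scheduler also has to weigh visibility windows and pointing constraints, so mechanism motion is one cost term among several.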

52:05 And of course, with astronomy, there's visibility.

52:08 Certain targets are not visible at all times of the year, just depending

52:13 on where they are relative to the sun.

52:14 Right.

52:15 You can't turn around and look right past the sun because it'll...

52:17 Oh no.

52:18 JWST especially cannot look back.

52:22 Even in the software, there are defined regions of avoidance that we are not allowed to move

52:27 the telescope.

52:28 And so we have planning and scheduling systems that know what these things are, that know

52:34 the range of places that this telescope proposal approval committee has decided to point the

52:42 telescope.

52:42 And it figures out what is the most optimal way to organize those observations to get them

52:49 to the astronomers as fast as possible.

52:51 I see.

52:52 Towards the end of our time together.

52:54 What other things are you doing with Python you think people find interesting that maybe

52:58 I haven't asked you about yet?

52:59 Or did I cover everything?

53:01 We covered a lot of ground.

53:04 Yeah.

53:04 There's a lot of nuts and bolts that the data processing side has to do.

53:07 You know, we interface with a lot of databases.

53:09 So we need database interface packages that talk to Microsoft SQL Server and SQLite database

53:14 files and things like that.

53:16 As Megan said, we do a lot of parsing.

53:18 So we're JSON parsing and XML parsing.

53:20 And so all those packages come into play.

53:23 And then there's good old-fashioned bit busting where we've got to get into the telemetry binary

53:26 files and bust out the sections of bits that are the data from the wrappers that come down

53:32 that let us make sure we got the right data.
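
As an illustration of this kind of bit busting, the sketch below unpacks a 6-byte packet header with `struct` and bit masks. It assumes a CCSDS-style space packet primary header; the episode does not say which packet layout the telemetry actually uses, so the field layout and the sample packet here are assumptions for demonstration only.

```python
import struct


def parse_primary_header(packet: bytes) -> dict:
    """Split an assumed CCSDS-style 6-byte primary header into its bit fields."""
    if len(packet) < 6:
        raise ValueError("packet shorter than the 6-byte header")
    word1, word2, length = struct.unpack(">HHH", packet[:6])  # big-endian 16-bit words
    return {
        "version": word1 >> 13,                   # top 3 bits
        "type": (word1 >> 12) & 0x1,              # 1 bit
        "secondary_header": (word1 >> 11) & 0x1,  # 1 bit
        "apid": word1 & 0x7FF,                    # low 11 bits
        "sequence_flags": word2 >> 14,            # top 2 bits
        "sequence_count": word2 & 0x3FFF,         # low 14 bits
        "data_length": length + 1,                # field stores (byte count - 1)
    }


# Build a made-up packet: secondary header flag set, APID 0x123, sequence count 42.
payload = b"\x01\x02\x03\x04"
packet = struct.pack(">HHH", (1 << 11) | 0x123, (3 << 14) | 42, len(payload) - 1) + payload

header = parse_primary_header(packet)
data = packet[6:6 + header["data_length"]]
print(header, data)
```

Checking the length and sequence fields against what actually arrived is one way such wrappers help confirm that the right data came down.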

53:34 Figure it out.

53:35 Yeah.

53:35 Exactly.

53:36 So various packages across the Python scheme help us.

53:39 And there are a lot of external community packages that help us as well.

53:42 And then there's also all the documentation that has to be done for every user, especially

53:47 for our externally delivered packages, with Sphinx.

53:50 Okay.

53:50 And code documentation.

53:51 We have a lot of our stuff on Read the Docs.

53:54 A lot of our infrastructure for testing.

53:56 I just had the folks on from MyST and have to do a better search to get something other than

54:02 a video game.

54:03 But the MyST project, do you still do all your stuff in reStructuredText?

54:06 Or are you doing any of the Markdown things with MyST?

54:09 I'm not doing anything with MyST.

54:11 I have done a lot of reStructuredText and pure LaTeX.

54:15 And Sphinx helps with those two things, too.

54:18 A lot of the scientists themselves just write pure LaTeX.

54:21 They just write all their papers in pure LaTeX.

54:23 And the symbols for it are so crazy.

54:25 It's so many.

54:26 Yeah.

54:27 But what comes out of it is really remarkable.

54:29 So, yeah, it's worth it.

54:31 I mean, it's really pretty.

54:32 I remember being introduced to it even as an undergrad in the 90s.

54:36 And it was like, this actually looks much cooler than Word.

54:39 Yeah.

54:40 Absolutely.

54:41 Yeah.

54:41 When I was in grad school, I did some of my homework in LaTeX.

54:44 But I haven't touched it for many, many years.

54:46 Not anything practical.

54:48 Yeah.

54:49 Still, a lot of the journals and a lot of the astronomy conferences expect you to turn in

54:54 posters and papers written in LaTeX that conform to their standards.

54:58 Yeah.

54:59 Absolutely.

54:59 All right.

55:00 One final question I want to ask you about.

55:02 And Megan, you said you were working on this.

55:05 It's the Nancy Grace Roman Space Telescope.

55:08 Yeah.

55:08 We haven't even really got the Webb telescope fully online.

55:13 And you all are working on this next one.

55:15 Yeah.

55:16 Give us an elevator pitch for this so people can be excited.

55:18 And what's the time frame?

55:20 So Nancy Grace Roman is really cool.

55:23 It is also going to go out to L2 where JWST is sitting in part because it's an infrared telescope

55:30 as well.

55:31 It's focused in near infrared.

55:32 So more similar.

55:33 And the Webb one might get lonely.

55:34 So it needs a friend.

55:35 Yeah.

55:36 I mean, there are some other telescopes out there too.

55:38 But yes, we need more friends.

55:40 And so it is about, the mirror is about the same size as Hubble.

55:45 But its optical prescription gives it a much wider field of view.

55:48 So 100 times the field of view of Hubble.

55:50 And it has way more detectors and pixels.

55:54 About 300 megapixels.

55:55 So there's 18 detectors.

55:57 And every time we take a picture, all those detectors are going off.

56:01 So there's also a lot more data volume coming down that we get to process.

56:06 But a lot of cool things we can do.

56:08 We can reach the level of the Hubble deep field very fast.

56:12 We're going to cover about 50 times more space in the first five years of this mission

56:17 than Hubble has in 30 years.

56:19 So it's going to provide a lot of cool things for the community.

56:23 It's also a survey telescope.

56:26 So a lot of its time is dedicated to some surveys that want to be done to support the community.

56:31 Looking for exoplanets.

56:33 And they expect to find several thousand exoplanets during the first five years of the mission.

56:39 Investigating dark energy and the expansion of the universe and the physics behind that.

56:44 And other things that I'm sure astronomers will come up with.

56:47 One of the cool things is that it will be launched to be compatible and collaborative with JWST.

56:53 And LSST, one of the large ground-based missions.

56:57 And Euclid, one of the European-based missions.

57:00 So it's really cool to be in this era of astronomy where the missions can really be, you know,

57:04 working together to make cool new discoveries and take data.

57:08 That's fantastic.

57:08 I haven't even thought of them working together.

57:10 Quick question from Julian in the audience.

57:12 We'll wrap it up.

57:13 How many of the experiments are providing data through the pipelines immediately versus

57:18 that proprietary, like private period that we talked about?

57:22 So some of that is up to the scientists.

57:24 The scientists can choose when they submit their proposal to make the data public immediately.

57:30 As soon as we get it, we process it, it's in the archive and it's there.

57:33 That's very, very fast.

57:35 One of the standards for Hubble is there's typically a year that the scientists have to look at the

57:41 data exclusively with their group and do the science they want to do.

57:44 And then it'll be released.

57:45 But I think that there are also cases of longer than a year.

57:48 And wasn't Roman going to...

57:49 Roman has...

57:50 Yeah.

57:51 Roman has no proprietary period.

57:53 The data will all be public immediately and we'll be serving data in the cloud and on

57:57 prem.

57:58 Wow.

57:58 So no dragging your feet if you're trying to make a discovery and want to get credit for

58:02 it.

58:02 No, no.

58:03 Yeah.

58:04 No scooping other people either.

58:06 You got to be on the ball.

58:07 That's right.

58:08 Okay.

58:08 So I think we got to leave it there, but it's really fascinating to see the ways that

58:13 Python and the data science tools are connecting us with space and these discoveries.

58:18 So thanks for sharing those.

58:20 Yeah.

58:20 Thanks for having us.

58:21 You're quite welcome.

58:22 Yeah.

58:22 You bet.

58:22 Now, before you get out of here, you got to answer the final two questions, but there's

58:25 a couple of you.

58:26 So just maybe lightning round.

58:29 So if you're going to write some Python code, what editor would you use these days?

58:33 I'm old school.

58:34 I'm a VI editor.

58:35 Right on.

58:37 Yeah.

58:37 I use VI, especially if I'm logging into servers and stuff.

58:40 I'm doing something quick.

58:42 If I'm doing a more complicated development, I usually use Sublime.

58:45 Okay.

58:46 Fantastic.

58:46 But in the past, I've used PyCharm and other ones.

58:48 Sure.

58:48 And how about a notable PyPI package?

58:51 Some library you came across recently and you're like, oh, this is so cool.

58:54 People should know about this.

58:56 I would recommend the Condor package.

58:57 If you've got to do any kind of distributed data processing, you should really know what

59:02 Condor is providing.

59:03 So I highly recommend exploring that.

59:05 Fantastic.

59:06 Megan?

59:07 I know we've talked about this.

59:08 I was so excited to see PyLab come out because it provides us, not PyLab, I'm sorry, JupyterLab.

59:13 It provides us with so many opportunities to get the data and the analysis to our scientists in the easiest way possible and allows them flexibility to work on it with their colleagues and produce really good science.

59:27 But from the other side, I actually saw recently this package called Silly, which just produces random bits of data you can use for tests with all sorts of entertaining things.

59:39 Yeah.

59:40 Silly.

59:40 I think you can even have it emit like Chuck Norris quotes and stuff like that, which can make it, you know, oh, no, that's PyJokes.

59:47 That was the other one I've heard recently, PyJokes.

59:49 Oh, I love PyJokes.

59:50 You can have it.

59:51 Yes.

59:51 I love Chuck Norris.

59:53 Yeah.

59:54 He only makes programming better.

59:56 He does.

59:56 And so silly is like faker, but silly.

01:00:01 It's silly.

01:00:02 I love it.

01:00:03 I love it.

01:00:04 All right.

01:00:04 Cool.

01:00:05 Well, thank you both for being here.

01:00:07 If people want to get started with some of the JWST data or some of the libraries, you know, what would you say?

01:00:12 Go to our archive.

01:00:13 Yeah.

01:00:14 archive.stsci.edu.

01:00:16 And as soon as the JWST data is coming through that they can look at, the publicly available data will be there.

01:00:23 Yeah.

01:00:23 And there's tools already there.

01:00:24 You can see, you know, the images, what they look like.

01:00:27 You can play with them a little bit.

01:00:28 You don't even have to know astronomy.

01:00:29 They have both the FITS formats that Megan talked about for scientists.

01:00:32 And they have plain old JPEG preview images for folks to just browse and get a quick look at the sky.

01:00:37 Okay.

01:00:38 Fantastic.

01:00:38 All right.

01:00:39 Well, Megan and Mike, thank you so much for being here.

01:00:41 It's been great to chat JWST and space in general with you.

01:00:45 Yeah.

01:00:45 Thank you so much.

01:00:46 It was fun.

01:00:46 Take care.

01:00:47 Yeah.

01:00:47 You bet.

01:00:47 Bye.

01:00:49 This has been another episode of Talk Python To Me.

01:00:51 Thank you to our sponsors.

01:00:53 Be sure to check out what they're offering.

01:00:54 It really helps support the show.

01:00:56 Datadog gives you visibility into the whole system running your code.

01:00:59 Visit talkpython.fm/datadog and see what you've been missing.

01:01:03 We'll throw in a free t-shirt with your free trial.

01:01:06 For over a dozen years, the Stack Overflow podcast has been exploring what it means to be a developer

01:01:11 and how the art and practice of software programming is changing the world.

01:01:14 Join them on that adventure at talkpython.fm/stackoverflow.

01:01:19 Want to level up your Python?

01:01:20 We have one of the largest catalogs of Python video courses over at Talk Python.

01:01:24 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:01:30 And best of all, there's not a subscription in sight.

01:01:32 Check it out for yourself at training.talkpython.fm.

01:01:35 Be sure to subscribe to the show.

01:01:37 Open your favorite podcast app and search for Python.

01:01:40 We should be right at the top.

01:01:41 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:01:46 and the direct RSS feed at /rss on talkpython.fm.

01:01:51 We're live streaming most of our recordings these days.

01:01:54 If you want to be part of the show and have your comments featured on the air,

01:01:57 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:02:02 This is your host, Michael Kennedy.

01:02:04 Thanks so much for listening.

01:02:05 I really appreciate it.

01:02:06 Now get out there and write some Python code.

01:02:08 We'll see you next time.