#357: Python and the James Webb Space Telescope Transcript
00:00 Telescopes have been fundamental in our understanding of our place in the universe. And when you think about images that have shaped our modern view of space, you probably think about Hubble.
00:10 Just this year, the JWST, or James Webb Space Telescope, was launched. JWST will go far beyond what Hubble has discovered. And did you know that Python is used extensively in the whole data pipeline of JWST? We have two great guests here to tell us all about it: Megan Sosey and Mike Swam. This is Talk Python to Me, episode 357, recorded February 23, 2022.
00:47 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at 'talkpython.fm/youtube' to get notified about upcoming shows and be part of that episode.
01:13 This episode is brought to you by Datadog and Stack Overflow. Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai. Megan and Mike, welcome to Talk Python to Me.
01:35 We're happy to be here.
01:35 Great to have you here. I'm such a fan of space, and it's amazing to think about our place in the universe and where we are. And so much of that has been revealed, really, through telescopes, right from the very beginning, from "oh, look, the sun doesn't rotate around us, how weird" to "the universe is expanding" with what Hubble did, or just all the amazing discoveries we've had, the exoplanets and whatnot. So super cool. And if we're going to mix in Python, that's definitely an interesting thing to do. So I'm happy to have you both here to talk about that.
02:10 Yeah, we're happy to join you.
02:11 It's a pleasure. It's great.
02:13 Let's just really quickly before we get into the topics, maybe start with just a little bit of a story on your background. Megan, how did you get into programming and find your way over to Python?
02:22 Yeah, so programming. I started young, and it was mostly because I was jealous of some of the other kids. This was the early 80s, and they had access to the new Atari systems, the new ColecoVisions. And I didn't really have that. I wanted to be able to play games and do those types of things. And as it turned out, my dad had just bought one of these new computers, an Osborne, which was one of the very early laptops. It might be the first laptop, a very heavy laptop.
02:53 Though probably not as good battery life as today either. I would suspect.
02:58 Yeah. There's no battery life, and it only had a small green screen and you couldn't display graphics. But I was determined I was going to play a game on it.
03:07 So I found this programming book on BASIC for microprocessors and followed along, learned what it was doing to create responses on the screen, and ended up making things like Mad Libs and Choose Your Own Adventure.
03:22 Oh, really?
03:23 Art and stuff like that, and just learning what the commands I was typing really did. So that got me into programming, and I kind of just did that on and off. It was really strange. Even though I was always working with computers, when I hit college, nobody ever suggested to me that I should do computer science, or that I should be a programmer, or that I should do this other thing. So I went along with some of my other interests, for sure: astronomy and physics and music. But even in astronomy, software and software engineering and working with computers is a must. And after I had been working at the Institute for a few years, Python started to become a much more used language, something that could provide real benefit for the scientific community. And so I started learning that, and that's how I got into Python.
04:09 Fantastic. The days of just holding up a telescope and looking at stuff, those are pretty long gone, right?
04:15 Yeah, they're pretty long gone. In fact, most astronomers don't even get to go to the telescope on the ground anymore. There's a lot of remote observing. It was starting to switch over when I was in college, and it's much heavier now.
04:29 What a bummer. Those are amazing places.
04:32 How did you get into programming, Mike?
04:34 Yeah, for me, I dipped a toe in it in high school with a Fortran class. We actually got bused over to another school to run our deck of cards through someone else's computer.
04:42 That's another level right there.
04:44 I'm that ancient. And then in college, a few classes here and there, and then when I got out, one of the only skills I had was programming. So I had a math degree. And so I started with programming and Fortran at the start, but then got into other languages and eventually got to Space Telescope in 1996. And I've been here ever since. I started with Python about 2002. There's a software conference called ADASS, Astronomical Data Analysis Software and Systems. And the Institute that I work at put one on in 2002 in Baltimore. And Python was really a highlight of that conference. It was really coming into its own, and that just kind of broke the door down for everyone at the Institute. That's behind one of our founders, Perry Greenfield, who works at the Institute, is very big in the Python community, and he kind of led the way.
05:33 And we all followed right on. Well, it's certainly an amazing language for data science and for working with things like these telescopes and whatnot. So it has a special blend of approachability, but you can actually do quite a bit of real stuff with it, whereas so many other languages are either approachable or you can do real stuff, but not both.
05:53 Yeah. It also cuts down a lot on development time and lines of code and stuff like that, which makes it a lot easier to maintain larger systems that can handle the Python.
06:03 Yeah, absolutely.
06:04 I was impressed with the clarity when you looked at a piece of Python code, you stripped away all the syntax and all the language decorations, and it was just the design was staring at you. So to me, it's the simplicity of it. It's the best feature.
06:16 Yeah, absolutely. So let's start a conversation, maybe by talking about where you two work at the Space Telescope Science Institute. Just sounds so amazing.
06:30 That is the cool thing.
06:31 So what do you do there, Megan? You want to go first?
06:33 Yeah, sure. So right now I'm the technical lead for the data management system that's going to be accepting and processing and distributing all of the data that we're getting from one of the next big telescopes, the Nancy Grace Roman Space Telescope. So my day-to-day job is making sure that all that is designed and functioning, that data is flowing through, and that the software we're going to write for scientists to do analysis of that data will run and is appropriate and accessible.
07:03 That's my day to day. Before that, I worked heavily on post-pipeline scientific analysis software: visualization tools, data analysis tools, things that the astronomy community really uses after we've constructed the data for them from the telescope.
07:22 Okay. Tell us just a little bit about the data story of this. There must be a lot of data coming off of these newer telescopes with their huge resolution.
07:31 Yeah. So there is a lot of data coming down. The detectors on board JWST are 2000 pixels, and there's a lot of them and there's a lot of instruments. And so we have to be able to manage all of that data stream coming through. And I might let Mike talk a little bit here, since he works heavily on the ops side of the house helping with managing that data.
07:52 All right, Mike. Yeah. Take it away. What's your role at STScI?
07:58 Yeah, I'm a team lead for the data processing team, also a scrum lead. We use the Agile approach to software development, and I'm focused completely on JWST science data processing and guiding data processing. The data comes down to us through the Deep Space Network of NASA and hits the ground, comes up to us at Baltimore. We have a control center right in the building. So they're actually talking to the Observatory right through our main building. And then the data feeds down to our processing systems. And that's where my code comes in: making sure we got all the data, completeness checking, sending the data through the pipelines and processing systems with the right processing recipe so that we get the right kind of data products out, and getting them into our archive. The Institute has a well-known archive for astronomy data, and that's where a lot of astronomers come to do research and find data sets that they can use in their science. Fantastic.
08:50 A lot of the data is public, right?
08:52 It is very much so. There's a proprietary period for some kinds of data so that scientists have a chance to write their special papers that they proposed for. But a lot of data is made immediately public, and then other data becomes public after a bit of a window.
09:08 Interesting. So you might write a proposal for time on Webb or one of the telescopes, and then if that's approved, then you get your time, then you get your data. And of course, it wouldn't make sense to just instantly publish that because that's part of your work, right?
09:25 Yeah. And so this is actually pieces of the entire process. The whole world of astronomers is allowed to propose to use these telescopes. So Hubble and JWST and there is a telescope review process that happens where they submit their suggestions. This is what I want to do. This is how awesome the data is going to be. This is what I'm going to provide to the community and the discoveries I think I'll make. And a team of experts agree and decide how much time everybody is going to get. And we have a lot of tools on what you might think of as the front end of this process to help them figure out how long do the observations have to be? If I'm looking through these filters and these wavelengths, what kind of errors am I going to get? What's the exposure time? How should I most effectively use the telescope time to get the science out? Because telescope time is always gold. Right? Right.
10:18 Absolutely. There's one and only one of these things, right?
10:22 This particular one, I didn't really think about that for asking different types of questions. There was a different amount of time that you might need. Obviously, if you're going to stare at space for a certain amount of time, it's going to take that time. But I hadn't really thought about the different processing pipelines might require more data to get the right level of accuracy and stuff. That's pretty interesting.
10:44 Yeah. So it really does depend on what you want to get out of it. When you observe in different wavelengths, you have to look at those things for a long enough amount of time to collect the light to get statistical certainty, to get detection. But then you have to play that against the fact that as you look longer and deeper, your field might get more crowded because you'll be able to pick up light from more distant objects. You may have things that are called cosmic rays, extra energy that gets added into the detector that you don't want, that's not from your object. That's just messing up your data, and the pipelines take these things into account. We look for things where, oh, this is not coming from the object. This is coming from something in space that we don't want to detect right now, or something that is being imparted by the instrument onto the data, and we want to remove the instrumental signatures. So that's what a lot of the processing software tries to look at.
11:38 Yeah, fantastic. If you want to add to that.
11:41 Just one other kind of data is time dependent data. You may need to look at objects over time, either calendar time or even many hours or many days to see variability. And that's a big part of some of the data sets as they just stretch on for time so that you can look for planetary transits or flares or various phenomena that might vary over time.
12:01 Yeah. How interesting.
12:02 One of the goals that Webb is trying to solve is to detect exoplanets. Right. And for those, you've got to watch multiple years of whatever that planet's orbital time is. Right. If it takes it three months to go around, you've got to watch the star for several three month periods. Right.
12:21 Well, and even in the case that you don't know that the stars have planets and you want to find new ones, you need to go back and look at that star multiple times so that you can detect that difference in the light that it's emitting. So when the planet goes across the star, some amount of light is going to be dimmed by that. Or if you're looking at spectroscopy, you might see the starlight passing through the atmosphere of the planet itself.
12:45 Fantastic. All right. I want to dive into the Python side, given that that's what most people are interested in. But before we get into that, let's just talk real briefly about what are some of the science missions and how is this different than Hubble? It looks really different than Hubble. Right. It's this bunch of hexagonal gold plates that unfold instead of being a tube like a traditional telescope, even the sort of space telescope that Hubble is. It's got a tennis court sized shield around it and stuff. So whoever wants to jump in and tell us a bit about what is the goal and the science of Webb.
13:21 Yeah. So Webb is really the largest space telescope. It does have segmented mirrors, 18 of them that can be adjusted to make sure that the focus is good.
13:32 It's something like 6 meters across. It's really big.
13:34 It's six and a half meters across the mirror. I think it's something like 21 feet in diameter, something like that. One of the things that makes it different from Hubble is not only the size of the mirror and the size of the telescope. But Hubble orbits the Earth, and JWST is demonstrably not orbiting the Earth. It's not very close at all. It's out at the second Lagrangian point, which is a more stable point that things orbiting in the Earth-Sun system tend to collect at. And it's not even at that point. It's orbiting around that point and facing away from the Earth. And so one of the things that's really important for Webb is that it stays cool, that its instruments stay cool. And that's why it has this big sunshield. And so the cold side of the telescope is where we have all of our instrument packages, and that allows it to pick up this infrared wavelength of light, which is sensitive to heat. And it allows us to do the science that we need to do to look at dust and look through dust and look all the way back to the very earliest time, just after the Big Bang, when light was starting to be visible, to collect and start to form objects and stuff like that. So the design of the telescope is very much to enable the science that's possible with it.
14:53 And Hubble uses visible light, is that right?
14:56 Hubble has a range. Hubble actually has instrumentation that looks at UV, visible, and then near infrared. So it has a little bit of everything. JWST also looks at infrared, but it goes much further out into the mid infrared as well.
15:11 And I guess that goes through dust clouds and things like that better.
15:15 It can. So you can switch between the different wavelengths of light depending on what types of phenomena you want to see, but also because we want to look very far back in time. What happens, and what enables us to do that, is the redshifting of the light. So because after the Big Bang and the expansion of the universe, the light gets stretched by the expansion of space. To look back in time, you want to look at the light that is of that similar wavelength that's been stretched. So that's why we're looking in the infrared wavelengths.
15:45 This portion of Talk Python to Me is brought to you by Datadog. Let me ask you a question. Are you having trouble locating the bug in your app's code or identifying the cause of your application's latency? Track your Python app's performance with end-to-end tracing using Datadog's Application Performance Monitoring, or APM. Datadog's APM generates detailed flame graphs to help you debug and optimize your Python code by tracing requests across web servers, database calls, and services in your environment without switching tools for context. You can navigate seamlessly to related logs and metrics within the same UI to troubleshoot errors fast. Break down inefficient silos between your Dev and Ops teams, and visualize your application's performance end to end. Allow your development team to focus on revenue generating projects and releasing applications faster to market.
16:40 Get started with a free trial of Datadog at talkpython.fm/datadog or just click the link in your podcast player show notes. Thank you so much to Datadog for supporting the show.
16:53 So what are some of the missions? We talked about the exoplanets a bit.
16:59 Exoplanets, galaxy history.
17:01 Evolution, star formation, star formation.
17:04 Star formation, looking at the chemical composition of the galaxies and in stars. So it really uses this combination of spectroscopy, imaging, and coronagraphy. And the coronagraphy is excellent for exoplanets because it allows you to basically put a stopper in front of the star and look for really dim things that are very close to it.
17:27 Solar system objects as well. And even things out in the Oort cloud and the Kuiper belt, things that we just can't see. We don't have anything that can get out there in the infrared.
17:38 Like this telescope will be able to. That's going to be super exciting. Hubble really changed people's view of the world, right? I've seen all the sky full of galaxies and stuff like that. What do you think we're going to learn?
17:53 Maybe it's too hard to predict, but what do you think is going to be surprising or what types of things do you think we're going to learn and change our perspective with this?
18:00 I guess what I hear from most of the scientists is they know they're going to find out things that they've never even thought of, because that's what Hubble brought. And when you put something up that's this groundbreaking, that goes so far beyond what we can currently do, there's going to be surprises that are just going to blow people's minds because they did not think that things were even possible. Hubble basically was involved in finding the acceleration of the universe's expansion, and that wasn't even conceived of. So things that are groundbreaking and just never thought of, the astronomers are open to those, and they're waiting for the data to come in to start to get the low.
18:35 Yeah, fantastic. All right, speaking of data, let's start talking about that side. Mike, you said the data comes in over the Deep Space Network. Is that amazing?
18:46 Yeah, the Deep Space Network.
18:48 Tell us about the kind of data coming in and how that all works.
18:52 Sure. The Deep Space Network is what we use to talk to the Observatory. We get a couple of contacts. Well, eventually when it's in steady state operations, we'll get a couple of contacts a day where it will download the data to us. Right now, we're in fairly steady contact. There are three ground stations around the world in Australia, California and Spain. That kind of gives nearly continuous coverage. So they're getting the data down.
19:15 What kind of bandwidth can you get on this network?
19:18 These networks are big antennas, and we share this with other missions, with other things that are in space that need to communicate, like Mars rovers and stuff, and they typically operate in the Ka and S band regions.
19:32 Yeah, cool. All right. Sorry, I didn't mean to derail you, Mike. Go ahead.
19:35 No, all good. So the data comes down to the Deep Space Network facilities. It gets transferred up to the Institute, which is in Baltimore, to our flight ops team, the flight operations system. They get the data in several forms. They get binary files right off the flight data recorders. There's a recorder that captures both science data from the science instruments and engineering data that's monitoring the state of the telescope: temperatures and pressures and various things.
20:01 Right. Of course, because there's a whole control center going. It's focused. Right. It's still running and things like that.
20:07 Absolutely. Yeah. They have the health and safety of the Observatory and then the science as well. So we get those binary files that come up, and we get auxiliary files that are processed on the ground. We take the engineering data that comes down, extract out some really key parameters that are necessary for the data processing, put those in special files, and send those over to our data processing system, along with kind of a summary of what the telescope was doing since we last contacted it. So we know it observed these things, and these observations worked, these had problems. It gives us kind of a status of things. The other kind of data that we absolutely need is called ephemeris data. It tells us exactly where the telescope was at any given time, which can be important for some types of science. Yeah. So all that data flows our way in files. We use Python tasks to poll a common disk area that files are dumped in, pick them up, and transfer them to our processing system.
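The polling step Mike describes can be pictured with a short sketch. This is only an illustration under stated assumptions, not the Institute's code: the directory layout, the file names, the helper name `poll_once`, and the convention that in-progress transfers end in `.part` are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def poll_once(drop_dir: Path, processing_dir: Path) -> list[str]:
    """Move every finished file found in drop_dir into processing_dir.

    Returns the names of the files that were moved, in sorted order.
    """
    processing_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(drop_dir.iterdir()):
        # Skip directories and files still being written (hypothetical
        # convention: in-progress transfers carry a ".part" suffix).
        if not path.is_file() or path.suffix == ".part":
            continue
        shutil.move(str(path), str(processing_dir / path.name))
        moved.append(path.name)
    return moved


# Demo in a throwaway directory tree; all file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    drop = Path(tmp) / "drop"
    work = Path(tmp) / "processing"
    drop.mkdir()
    (drop / "science_0001.bin").write_bytes(b"\x00\x01")
    (drop / "ephemeris_0001.txt").write_text("demo")
    (drop / "science_0002.bin.part").write_bytes(b"\x00")

    first_pass = poll_once(drop, work)
    second_pass = poll_once(drop, work)  # nothing new, so nothing moves
```

In a long-running task, `poll_once` would sit inside a loop with a sleep between passes; the sketch shows a single pass so it stays easy to test.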
20:57 The telescopes receive the data that gets processed, probably another telescope, the ground station. Ground station.
21:04 And then that gets sent probably over the Internet, gets dropped into some like a local file or some cloud storage. You're watching that? And then Python picks it up from there.
21:13 That's right.
21:14 We pick it up, we segregate what kind of data each is, and send it on for various kinds of processing. We put a lot of those data files in our archive right as they come in the door. So if we need to get them back out to look at them later, we can. We use a system for kind of distributing the processing called HTCondor. It's made by the University of Wisconsin, and it lets us distribute the processing over a big network of machines. This was developed even before the cloud came into being, and now the cloud is just part of it. So we use that system to fan out the data processing of these various types of files. And we have a lot of kind of data completion checking that we do in Python, where we've got to register what came in and whether we got all the packets. The data can come in different orders. It can be split up different ways in the recorder, and you have to kind of do a bunch of data accounting before you can send it down the pipeline for processing, because otherwise you'll have holes in it and the processing won't give you the best product. So we do all that data accounting analysis up front. And when we've got something that's got all the parts, we send it on down the pipeline with a particular processing recipe that tells the pipeline to apply these exact calibration recipes, make these exact kinds of files, and get them into our archives for the scientists to retrieve and do science with.
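The data accounting idea, where packets arrive out of order and nothing is released to the pipeline until every piece is present, can be sketched in a few lines. The packet representation here (an exposure id plus a sequence number) and the function name `find_complete_exposures` are assumptions made for illustration; the transcript does not describe the real packet format.

```python
from collections import defaultdict


def find_complete_exposures(packets, expected_counts):
    """Group packets by exposure and report which exposures are complete.

    packets: iterable of (exposure_id, sequence_number) in arrival order.
    expected_counts: dict mapping exposure_id -> number of packets expected.
    Returns (complete, missing): a sorted list of complete exposure ids and
    a dict of exposure_id -> sorted list of missing sequence numbers.
    """
    seen = defaultdict(set)
    for exposure_id, seq in packets:
        seen[exposure_id].add(seq)  # a set ignores duplicates and ordering

    complete, missing = [], {}
    for exposure_id, count in expected_counts.items():
        gaps = sorted(set(range(count)) - seen[exposure_id])
        if gaps:
            missing[exposure_id] = gaps
        else:
            complete.append(exposure_id)
    return sorted(complete), missing


# Made-up arrival log: out of order, one duplicate, one hole.
arrivals = [("exp_a", 2), ("exp_b", 0), ("exp_a", 0), ("exp_a", 1),
            ("exp_a", 1), ("exp_b", 2)]
complete, missing = find_complete_exposures(arrivals, {"exp_a": 3, "exp_b": 3})
```

Only `exp_a` would be sent on for processing here; `exp_b` waits until its missing packet shows up or is retransmitted.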
22:24 Oh, that's really interesting. So you're doing some of the error checks and all that kind of data cleaning stuff for them before they have to pick it up.
22:32 We also need to check to make sure that there wasn't an error in the transmission of the data, that we got everything off the telescope that we thought we were going to get, whether things need to be retransmitted, and we're recording all the information about the instrumentation surrounding the detectors, what the temperatures are on the telescope, and saving that and associating it with the data so that when scientists are looking at it, they can make correlations with the different things that they see.
22:57 Oh, nice. Patio in the audience has a question that's pretty related to that: how do you manage potential disruption in data transfer? Like, if you lose the connection, do you have to worry about that or is it handled, like, below the layer?
23:11 Okay, we do. It does happen. It is handled upstream from us. The Deep Space Network has that capability, and our flight ops team sometimes has to get them to retransmit the data. If we got it on the ground, but we just didn't get it to our Institute, we can get it resent from another ground station. But if it really didn't make it all the way down from the Observatory, then they've got to go back on the next contact pass and get it to retransmit to the ground.
23:33 Yeah. So there must be some kind of protocol exchange over the Deep Space Network, right? That sort of takes care of that for you. Yes.
23:43 And as Megan said, that's a shared resource. So if they happen to be talking to the Mars Rover or they happen to be talking to something else, we got to get in line with everyone else till we get our turn again.
23:51 Sure. Also related to that: what's the latency between the first piece of data sent and the first one received? I mean, we are at great distances, talking about being limited by the speed of light. Right? So it's not milliseconds, I'm sure.
24:06 Yeah. Megan, do you recall what the travel time to L2 is? I don't recall.
24:10 Honestly, I don't know. Probably by the time we get it, we're talking minutes, some range of minutes, but I haven't calculated it recently.
24:19 Yeah, very interesting.
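For the speed-of-light part of that question, a back-of-the-envelope estimate is easy to write down. The sketch assumes the commonly quoted distance of roughly 1.5 million km from Earth to L2; it covers only signal propagation, not onboard recording, waiting for a scheduled contact, or ground transfer to the Institute, which add to the end-to-end time.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # exact, by definition of the metre
L2_DISTANCE_KM = 1.5e6              # approximate Earth-to-L2 distance (assumption)

one_way_seconds = L2_DISTANCE_KM / SPEED_OF_LIGHT_KM_S
round_trip_seconds = 2 * one_way_seconds
print(f"one way: {one_way_seconds:.1f} s, round trip: {round_trip_seconds:.1f} s")
```

Under that assumption the propagation delay alone comes out at about five seconds each way.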
24:21 I mean, when we talk about data, you may take pictures with your camera, with the CCD in your camera, and you'll see that picture immediately and it's a square. It looks like the scene that you got. But when Mike was talking about how we're waiting for the pieces of the data to come, that square gets chopped up on board the telescope and sent to us in little tiny pieces, and we have to reconstruct it.
24:40 You don't get it all at once.
24:41 Right. The other part of the analogy is if you hold your camera up and you take a picture of the sky, it's just a picture. We add all kinds of what we call metadata, supplementary data to that image that tells where it was pointed in the sky, what filters were in use on the Observatory, what astronomer asked for this data, what data grouping is it part of they do groupings in what they call proposals. And so we have all this extra data that we put into the files so that someone coming in to our archive who didn't propose the data can still extract it and get some understanding of what was done, how that piece of observation is set up. Sure.
25:14 Of course. Because if somebody says here's a directory full of large JPEGs now. Right, exactly.
25:24 They're beautiful and they're almost useless. Yeah, they're almost useless for science.
25:28 Yeah, absolutely. They would still make a good coffee table book, though.
25:32 Probably, once they were cleaned up of the cosmic rays and all those things that Megan talked about.
25:37 Once you combine all the multiple filter images, you get the pretty colors.
25:41 Yeah, I bet. Those are amazing. Sempria asks, what's the max onboard storage, in case there's an extended period where you can't get to the data? How long will this thing run before its hard drive just fills up?
25:56 Do you remember the capacity for the recorder?
25:58 I don't, because I have Roman in my head right now. So I don't remember the JWST capacity, but they take that into account. And that's how they've scheduled our contacts and downlinks with the Deep Space Network so that we can get the data off at a reasonable time and it's not affecting scheduling of new observations. So they've done a lot of work, and a lot of the work that we've done with Hubble in the last 30 years, because we do scheduling and data processing on Hubble, has allowed us to understand how to best optimize those types of things.
26:27 Yeah, very interesting. Yeah. You must have a lot of experience from Hubble because that's a lot of data as well. I was wondering how much computing happens on the telescope versus how much is it just a receptor and a transmitter?
26:39 There is a little bit that happens on board James Webb, especially in the instruments. Megan probably knows more of the details of some of the infrared instruments, but they do image differences and summations. Infrared detectors build up their signal over time. And rather than send all those bits to the ground, there are some onboard calculations that are done so that they send a bit less to the ground than they actually collected.
27:01 Yeah. So one of the things that's different about IR instruments than UV/visible CCDs is that there is no shutter in the IR. Every time we want to know what the detector's collecting, we kind of ask it really nicely, what's the voltage of this pixel? Then we ask it again, and we ask it a whole bunch more times. And that's the data that we send down. And knowing how that signal is accumulating in the pixel, without removing the signal from the pixel, allows us to do cool things with IR data, like rejecting the cosmic rays that may have come during the course of the exposure and stuff like that.
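A toy version of that idea: because the pixel is read repeatedly without being reset, the differences between consecutive reads should all be about the same, and a cosmic-ray hit shows up as one difference that is far too large. The sketch below is a deliberately simplified illustration, not the JWST pipeline's algorithm; the function name `ramp_rate`, the fixed threshold, and the sample numbers are all made up, and a real jump detector would use a proper noise model.

```python
from statistics import median


def ramp_rate(reads, jump_threshold=5.0):
    """Estimate counts-per-read from non-destructive reads of one pixel.

    reads: accumulated signal at each successive read (never reset).
    A difference between consecutive reads that exceeds the median
    difference by more than jump_threshold is treated as a cosmic-ray
    hit and left out of the average.
    Returns (rate, indices_of_rejected_differences).
    """
    diffs = [b - a for a, b in zip(reads, reads[1:])]
    typical = median(diffs)
    good, rejected = [], []
    for i, d in enumerate(diffs):
        if d - typical > jump_threshold:
            rejected.append(i)
        else:
            good.append(d)
    return sum(good) / len(good), rejected


# Made-up ramp: about 10 counts per read, with a cosmic-ray hit adding
# roughly 200 counts between the 4th and 5th reads.
reads = [100, 110, 121, 130, 340, 350, 359, 370]
rate, rejected = ramp_rate(reads)
```

With a single shuttered exposure the hit would simply contaminate the pixel; with the ramp, the one bad interval can be dropped and the rest of the exposure kept.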
27:41 Do you have to do, like stabilization? I know the thing is probably pretty stable out there, but it's also looking really far away.
27:48 Stabilization. Yeah. It is looking far away, and it is mostly stable where it is. Orbiting around L2 is not a completely stable orbit, and so they do have to do station keeping, which I believe fires rockets to make sure that it's in the correct orbit.
28:05 Yes. One of the big pieces of news that was really great was that ESA, the European Space Agency, who launched this, did an excellent job of getting it right on target. So it didn't have to correct much, which means it has extra fuel to run longer. Right?
28:18 Exactly. They did an amazing job with that launch. That was like a flawless launch. It was really cool.
28:25 Fantastic. All right, real-time follow-up. We've got Adam in the audience, who says JWST can store at least 65 gigs of science data. Downloads occur in two four-hour contacts per day, where each can transmit 20.6 gigs of data.
28:39 How about that. When we reach steady-state science operations, that's probably correct. Right now, we're contacting a little more often because, as they're going through commissioning, they really need to interact with the Observatory much more often. When we get to steady-state operations, they'll send plans up for the Observatory, and the Observatory will just tick through the plan. It will look at this star. It'll throw these filters in, it'll turn to this galaxy, look at these filters, go look at this planet. And it'll basically run a program, whereas right now they're interacting much more often with the Observatory.
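Taking the figures quoted from the chat at face value (65 gigs of storage, two contacts a day, 20.6 gigs per contact), the downlink budget works out as follows. These are the audience member's numbers, not verified specifications, and the calculation ignores any new data recorded while the backlog is draining.

```python
import math

STORAGE_GB = 65.0          # onboard capacity, as quoted in the chat
PER_CONTACT_GB = 20.6      # downlink per contact, as quoted in the chat
CONTACTS_PER_DAY = 2

daily_downlink_gb = PER_CONTACT_GB * CONTACTS_PER_DAY
contacts_to_drain_full = math.ceil(STORAGE_GB / PER_CONTACT_GB)
days_to_drain_full = contacts_to_drain_full / CONTACTS_PER_DAY
```

So a completely full recorder would take four contacts, about two days at the steady-state cadence, to empty, which gives a feel for why contacts and observations are scheduled together.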
29:07 So if you tell it to stare at this spot, this blank, dark spot in the sky for 4 hours, you don't need to check in with it as frequently. As long as it's.
29:16 If you told it to stare at that spot and something went wrong, the James Webb telescope is smart enough to skip ahead and go to the next thing in the schedule, where Hubble was very ground dependent. Everything Hubble did had to be told from the ground. And James Webb is a little bit more automated, where if it's got a problem with something it's trying to do, it'll just skip ahead to the next thing and someone else will reschedule that later on.
29:42 Is there any Python happening there? Or is the Python story really once the data gets back here?
30:05 Interesting. Yeah, it's fine. It doesn't really have numbers, but I guess you don't need numbers for science, so it's okay.
30:11 Well, I mean, when you think about the volumes of data that are coming down from the telescope, we're also transmitting those science pixels as integers. They're unsigned integer arrays. And then when we do the actual processing, those are expanded into floating point. So there's expansion that happens on our end for processing and storage and analysis.
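The size effect of that integer-to-float expansion can be shown with the standard library. The widths are assumptions for illustration only: the transcript says "unsigned integer" and "floating point" without giving bit depths, so the sketch picks 16-bit unsigned integers and 32-bit floats.

```python
import struct

# Assumed widths for illustration: 16-bit unsigned integers on the downlink,
# 32-bit floats during processing. The transcript gives neither number.
raw_pixels = [0, 1, 1023, 65535]

packed_as_uint16 = struct.pack(f"<{len(raw_pixels)}H", *raw_pixels)
packed_as_float32 = struct.pack(f"<{len(raw_pixels)}f", *map(float, raw_pixels))

# Ratio of bytes needed after expansion to bytes transmitted.
expansion = len(packed_as_float32) / len(packed_as_uint16)

# The integer encoding round-trips exactly.
round_trip = list(struct.unpack(f"<{len(raw_pixels)}H", packed_as_uint16))
```

Under these assumed widths, the same pixels take twice the space on the ground as they did on the link, before any calibration products are added.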
30:29 Yeah. Cool. Mike, you mentioned this HTCondor. Maybe tell us a bit more about this. Is this something that people would find generally useful outside of telescopes?
30:39 Absolutely. It's a very generic product developed by the University of Wisconsin back in the day. They developed it because they had all these computers sitting around on people's desktops that they wanted to make use of.
30:51 People were using them for three or four hours a day and then going off to lunch and off to meetings. And the computers were sitting there, and they used this system as a way to harness those cycles, to the point where, if no one was sitting at the computer, Condor could tell. It could drop a job on that computer and run until the person came back. As soon as they hit a keystroke, the job would leave that computer and, if needed, go off and finish its work somewhere else. So it started in that realm, and they just expanded and expanded over the years.
31:14 Like SETI@home or one of these sort of grid computing stories.
31:18 Exactly. It started in that realm. Now it can process over full universities using all the machines at a University, research clusters. There are government initiatives that have grid setups. Now the commercial clouds are involved. They have interfaces to AWS and Microsoft Cloud and a whole bunch of others. So they really expanded their access. It's highly used. The realm that pushes it the most are the ones that do the Omega. What is the big science that just came out last year? I'm drawing a blank.
31:47 The big science that just came out.
31:48 Yeah. The detectors where they found... LIGO. Thank you.
31:58 The computing demands of those detectors are just they make JWST look like a pen. Oh, really?
32:05 So they're really off the charts. They need so many cores and clusters to do their computations, and Condor is one of the only systems around that can get them the access to the cores they need. It's highly used in the LIGO community.
32:19 This portion of Talk Python to Me is brought to you by the Stack Overflow Podcast.
32:24 There are few places more significant to software developers than Stack Overflow, but did you know they have a podcast?
32:31 For a dozen years, the Stack Overflow Podcast has been exploring what it means to be a developer and how the art and practice of software programming is changing our world. Are you wondering what skills you need to break into the world of technology or level up as a developer? Curious how the tools and frameworks you use every day were created? The Stack Overflow Podcast is your resource for tough coding questions and your home for candid conversations with guests from leading tech companies about the art and practice of programming. From Rails to React, from Java to Python, the Stack Overflow Podcast will help you understand how technology is made and where it's headed. Hosted by Ben Popper, Cassidy Williams, Matt Kiernander, and Ceora Ford, the Stack Overflow Podcast is your home for all things code. You'll find new episodes twice a week wherever you get your podcasts. Just visit talkpython.fm/stackoverflow and click your podcast player icon to subscribe. One more thing. I know you're a podcast veteran, and you could just open up your favorite podcast app and search for the Stack Overflow Podcast and subscribe there. But our sponsors continue to support us when they see results, and they'll only know you're interested from Talk Python if you use our link. So if you plan on listening, do use our link, talkpython.fm/stackoverflow, to get started. Thank you to Stack Overflow for sponsoring the show.
33:50 And I think Python, too, is a huge part of that LIGO result.
33:53 It is. They use a lot of Python.
33:56 Condor has a full Python interface, so you can talk Python right to Condor. Import the package and you're off and running.
34:03 Yeah, it's really neat. Just pip install or conda install htcondor and off it goes, right? Something like that. Yes.
34:09 Pip install power.
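As a rough sketch of what "import the package and you're off and running" can look like, the helper below builds a job description as a plain dictionary, and a second function hands it to HTCondor through the `htcondor` Python bindings. The script name, arguments, and file names are hypothetical, and the submit function only works on a machine with HTCondor installed and a scheduler running, so treat it as an outline rather than a tested recipe.

```python
def build_submit_description(executable, arguments, n_cpus=1):
    """Return HTCondor submit commands for one job as a plain dict."""
    return {
        "executable": executable,
        "arguments": arguments,
        "request_cpus": str(n_cpus),
        "output": "job.$(ProcId).out",   # hypothetical file names
        "error": "job.$(ProcId).err",
        "log": "job.log",
    }


def submit(description, count=1):
    """Queue `count` copies of the job; needs a reachable HTCondor scheduler."""
    # Imported here so the helper above also works where HTCondor is absent.
    import htcondor

    schedd = htcondor.Schedd()
    result = schedd.submit(htcondor.Submit(description), count=count)
    return result.cluster()


# Hypothetical calibration script and dataset name.
description = build_submit_description("calibrate.sh", "obs-001", n_cpus=2)
print(description)
```

Calling `submit(description, count=10)` would then queue ten copies of the job, assuming the bindings and a scheduler are available.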
34:11 Awesome. You mentioned that it takes advantage of these computers, which are kind of sitting idle. And of course, we've got Folding@home and SETI@home, those types of things, which seem to have gone away, which I think is a little bit sad, actually. But there have been these things where it's like, well, if a personal computer is sitting around idle and we've got a bunch of them in an office or university or whatever, then we could use them. That's great, right? But you also mentioned the cloud, and I guess I had never really thought about it. I know the cloud providers don't want this from you, but if you pay for a virtual machine, there's a good chance that it's sitting there doing 20% of what it could do. Right. So if you've got ten or 100 virtual machines running in the cloud, you could say, you know what, whatever extra capacity you have, I'm going to use that for this other sort of scheduling service.
35:00 The good thing about astronomy is most of it doesn't have to happen in real time, so we can buy up the cheap cycles when the machines are not being used at 1:00 a.m. and get good deals to go off and do the reprocessing and processing that has to happen.
35:14 Right. Like reserved instances or something like that at AWS.
35:18 Right. HST is already doing some of their reprocessing in the cloud. That's another point: we get the data down from the observatories once and we process it once, put it in our archive, and then we reprocess it again and again as the calibration algorithms improve, as software bugs are found, and as reference data improves, particularly calibration reference data, the special data that's used to remove those instrumental signatures that Megan was talking about. As all that supplemental data and the algorithms improve, you rerun the data again and again and again, and you need the computational capacity to do that while data is still coming down from the telescope, because your archive is getting bigger all the time. So you've got a bigger crank to turn to get everything up to the best possible product it could be.
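One way to picture that reprocessing loop is a check that flags archived products made with older software or older calibration reference data. The record fields, version numbers, and dataset names below are hypothetical; this is a minimal sketch of the idea, not how the archive actually tracks versions.

```python
from dataclasses import dataclass


@dataclass
class ArchivedProduct:
    dataset_id: str
    software_version: tuple   # e.g. (1, 4, 0); hypothetical numbering
    reference_version: int    # version of the calibration reference data used


def needs_reprocessing(product, current_software, current_reference):
    """True if the product was made with older software or reference data."""
    return (
        product.software_version < current_software
        or product.reference_version < current_reference
    )


archive = [
    ArchivedProduct("obs-001", (1, 4, 0), 12),  # older software
    ArchivedProduct("obs-002", (1, 5, 0), 11),  # older reference data
    ArchivedProduct("obs-003", (1, 5, 0), 12),  # up to date
]

stale = [
    p.dataset_id
    for p in archive
    if needs_reprocessing(p, current_software=(1, 5, 0), current_reference=12)
]
print(stale)
```

Because the archive keeps growing, the list of stale products grows with every software or reference-data release, which is the "bigger crank to turn" mentioned above.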
35:58 That is pretty fascinating. I hadn't really thought about that. You would have to reprocess the data.
36:03 But it drives all our designs, because we reprocess the data tens of times more than that very first time it hits the ground.
36:11 Sure. What does the compute stuff look like for you all? Are you using GPUs? Are you using some of the NumPy, SciPy, Astropy stack to do this?
36:21 Those are definitely involved. Yeah.
36:23 Yeah. So GPUs, I don't think we have an excessive amount of GPUs. Probably those are more in the post processing parts of the software. But those other things you mentioned, most definitely. And we actually are heavy contributors to those packages as well.
36:38 We do a lot to provide back to the community the software that we're creating, especially our external data analysis, post processing analysis software, so that it can be used by the rest of the community.
36:51 Yeah, Samarita asks what kind of work a user at home can do with the data, or do you need these huge compute clusters to work on them?
37:01 There are notebooks.
37:04 I suspect there are some astronomers who have an expertise in this, but there's probably a lot of people who are just fans of space and want to play with this.
37:13 So any data that's public that you can access in our archive, you can download and you can analyze it. You can install the science calibration software. So the stuff that takes the data that we've already prepared for you and does further analysis on it, we have visualization tools you can install. We have many different things that scientists or even non-scientists might want to play with to look at the data. That's all possible. I do that with Hubble data all the time on my laptop. And my laptop just has 16 GB of RAM.
37:45 Now, Hubble data is fairly small compared to the machines we were needing.
37:50 Yeah. I bet.
37:51 When it first. How old is Hubble? 20 years.
37:54 30 year old computers probably thought that data wasn't small.
37:57 Yeah. So we've gone from terminal services to desktops and laptops and back to terminal services eventually in the near future because of the size of the data.
38:05 Yeah. Do you have notebook servers that are running close to the data that then you can log into and play with or what's that looked like?
38:12 So we do at different levels. Like you're talking about Jupyter notebooks, JupyterLab type of thing. Yeah. So that does exist. We've been playing around with science platforms in the cloud to give people access to not only the software, but the data in a very easy way. And I think JupyterLab makes that really effective for the tools that we can provide to people.
38:35 But you can also run it yourself on your own local machines.
38:37 Sure. As the data gets bigger and bigger, part of the cost and latency is just transferring.
38:43 So if you can be close to it, you can save money by not transferring it and all sorts of things.
38:48 Right, exactly. Or reading it into memory for very large files. As a system, the system itself needs to be very large to process the sheer volume of data that we get and are constantly getting and reprocessing and storing and serving. But for individual data pieces, it's still very possible for people to process on their own.
39:06 Sure. Do you use things like Dask or any of these sort of pandas- or NumPy-like things that scale out to larger data?
39:12 Yeah. And it depends on the tool and what the purpose is that you're using. Sure, for sure.
39:18 Mike, got some thoughts on this before I move us along?
39:21 No, that's a really good summary of the picture.
39:24 And I'm pretty sure at this point all the public HST data is in the cloud. I think it's all on AWS, and I believe you can work with it there, especially if you have research to do. Yeah.
39:35 So one of the things that got my attention, besides someone, I think, sending me a message, hey, you should talk to the JWST folks, is that over on GitHub you have the Space Telescope Science Institute, where you've got all kinds of various things that people can go play with. Right. Maybe tell us a bit about some of the highlights over there. The one that I ran across is just JWST, which describes itself as a Python library for science observations from the JWST.
40:04 Yeah. So this is what we call the science calibration package for JWST. The software that lives here is able to do the detector calibration that we talked about, and image combination, and everything else that we need to do to create the standard products we've agreed to create for the mission. That's all contained in this package. And this package gets installed in our back-end systems to be run as we're processing the data, once it's ready to be processed at this level.
40:35 So this is what users would install if they wanted to reprocess or do higher-level analysis of JWST data from different instruments.
40:44 You spoke about the reprocessing and the re analyzing, so you would install this. And if you had an idea on how to maybe do different noise reduction or other processing. Exactly, this is what you do.
40:57 Got it.
40:58 We perform some base calibration to put the data in the archive and to make it somewhat usable. But most high-level scientists are doing their own recalibration, and they're tuning the data for what they're trying to get out of it, especially if they're working at the margins of noise and other things where they really have to work with the data to get out of it what fits their science needs.
41:17 So a lot of this is the byproduct of this proposal process because we don't know what the science is going to come through. We try to provide the best generic processing for all the data.
41:27 Right. It's pretty neat that this is just here on GitHub.
41:30 That's cool.
41:31 Yeah. I suspect when Hubble came out, it was not like, well, here's the open source thing and here's how you contribute back in the same way. Right. I'm sure to some degree, but the openness of science and really the computational bits of science over on GitHub is pretty amazing.
41:47 It's a lot of fun. I think we started using GitHub when it first came out. We were especially during Hubble, using subversion and even older version control systems.
41:56 CVS or something dreadful.
41:59 Even whatever we could use.
42:01 They were run on internal systems and managed on internal web pages. So when GitHub came out, it was really nice to be able to share our software not only with astronomers and the larger community, but with other missions that we interact with and to be able to talk to them about how we develop and accept changes and improvements into the software.
42:24 Yeah. You see right here there's 541 issues and 20 PRs that are open, but 3462 that are closed. That's pretty amazing.
42:35 We do a lot of work.
42:37 JWST is a new mission now.
42:39 Yeah. Awesome. Another thing that I ran across that looks interesting is WebbPSF, which is a simulation tool. Right. Maybe tell people about this.
42:50 So Webb is obviously the telescope, and PSF is point spread function. This is the statistical pattern that the light from a star would make when it falls on the detector. And you can predict what that pattern is going to be based on the optics that are in your telescope. And so this piece of software takes that understanding of the optics in the telescope and how light gets transmitted through those optics, including at different wavelengths, and creates simulated images of what we might be able to see. This allows us to not only predict how the telescope is going to perform in different ways, but develop our software, develop the algorithms that we use to pull out stars and stuff from images.
43:37 Yeah, very neat. I suspect that if people didn't have access to the telescope and they wanted to play around with some stuff, maybe they could use this.
43:44 Yeah, they could. There are other simulation tools out there that will simulate full astronomical scenes as well. So not just individual stars, but galaxies and the combination of the two.
43:55 Do you have something you can point people at?
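The idea that a star's pattern on the detector can be predicted from the optics can be illustrated with a toy calculation. In the simplest textbook (far-field diffraction) model, the point spread function is the squared magnitude of the Fourier transform of the telescope's aperture. The sketch below uses a plain circular aperture on a small grid; it is an assumption-laden illustration and says nothing about how WebbPSF itself models JWST's segmented mirror.

```python
import numpy as np


def toy_psf(n=128, aperture_radius=16):
    """Toy PSF of a circular aperture: |FFT(pupil)|^2, normalized to sum to 1."""
    # Pixel coordinates measured from the grid center.
    y, x = np.indices((n, n)) - n // 2
    # 1.0 inside the aperture, 0.0 outside.
    pupil = (np.hypot(x, y) <= aperture_radius).astype(float)
    # Far-field amplitude is the Fourier transform of the pupil;
    # the shifts keep the pattern centered on the grid.
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()


wide = toy_psf(aperture_radius=16)
narrow = toy_psf(aperture_radius=8)

# A smaller aperture (or, equivalently, a longer wavelength) spreads
# the light out more, so less of it lands in the central pixel.
print(wide.max(), narrow.max())
```

A simulated star image like this is the kind of input that source-finding algorithms can be developed against before real data exists.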
43:58 I think I'd have to send you a link.
43:59 Yeah. We'll put it in the show notes so people can get it. That's great.
44:03 Let's do that.
44:03 Yeah. Awesome. Another thing I ran across looking at all this stuff is this place called AstroConda, and there's a whole bunch of stuff, just tons of libraries in here. It looks like there's a lot of neat things, like, for example, working with the ASDF file format, which I suspect is something that you all provide a lot.
44:24 So we developed that format. We wrote that format. The primary interpreter is in Python. So a little bit of astronomy history: astronomy for a very long time has used a file format called FITS, Flexible Image Transport System. It started around the time data was being saved on tapes, on tape storage.
44:42 You might want to optimize for different things if it's going on tape versus SSD.
44:46 Right. And so it's been in the community a long time, and a lot of community tools are based on accessing it. It was a big part of this other thing in the astronomy community called IRAF, which was a common software package. It actually had its own virtual operating system and command line languages that did the reduction for us. One of the things that JWST brought about is we needed to be able to handle those complex optical path descriptions for how the light from a star you're observing gets onto the detector and how you translate those positions back and forth. So where is the telescope pointed, where is this light as it's moving around, and how can I tell at this pixel what star that relates to in the sky?
45:32 Yeah, it seems very nontrivial because you've got all these different hexagonal pieces that are independently adjustable, and you've got to sort of reassemble that into a continuous thing. Right.
45:41 So that's part of it. An even larger part is the number of optics that are in that chain. And then there are the optics inside each of the instruments, and some of them are spectroscopic instruments that divide light into its constituent wavelengths. And those wavelengths have different travel paths and will fall on the detector in different places in mathematically predictable ways, but not always simple mathematically predictable ways. And so saving that information was really difficult in FITS. So we developed this new format that allows us to save analytical models into the data itself that can then be opened up by the users and very easily used to understand the relationship between the pixels that they are looking at and the stars in the sky. And so we started JWST with that, and it actually gets packaged into a FITS file.
46:32 But in later missions, we'll be using just ASDF.
46:36 That's really neat. So it lets you save more than just the raw data.
46:40 It gives us a really nice way of saving information we need to understand about the data along with the data itself. It also saves the binary arrays. Actually, if you were to look at it, the text part of the format is YAML, based on the JSON Schema standard.
46:55 Yeah, fantastic.
46:57 So looking through this AstroConda thing here, obviously there's stuff that's very focused on astronomy, like ASDF, but there's also other cool libraries like appdirs.
47:12 There's all these little things. So if I want to write something to a temp file, or where's my user home, or stuff like that, you can ask questions like that of this appdirs thing.
47:22 Yeah, that's interesting.
47:24 I haven't looked at appdirs, but it must be being used by one of our other subpackages, and somebody wrote it for a good purpose. This AstroConda site that you're looking at was this repackaging of tools that are written in Python, often with C extensions, that can be used by astronomers in the community to do what they need to do to look at the data, calibrate the data. And so this AstroConda channel is a Conda channel that allows us to organize that information and deliver it to users who are using Conda environments.
47:56 Yeah. You say something like install this channel, everything in this channel, and then you'll basically be able to do most of the work that we're talking about, or something like that.
48:05 Right. And so most of these are now available also separately on conda-forge and PyPI. So often people will install from there, too.
48:13 Sure. Yeah. So we've got, let's see, asteval, a safe, minimalistic evaluator for Python. That's pretty cool. You've got mysterious ones like cube tools, which just refuse to identify themselves.
48:26 That's actually for 3D images.
48:28 Yeah. Nice.
48:28 We stack things up.
48:29 Yeah, cool. Anyway, I'll put the link to this in the show notes. People can look around. It's just interesting to see all the stuff that you were bringing together here. You've got the STScI version.
48:40 That's a metapackage for Conda that will install the individual things that go together.
48:45 Yeah, of course.
48:46 All right. This GWCS one sounds pretty interesting. One of the challenges with the telescope, I'm sure, is things orbit around other things which are orbiting around other things around the sun, which is going around through the galaxy and whatnot. Right. So figuring out where you're pointing is probably pretty tricky. Is that what this library does?
49:08 So when I was talking about being able to understand the light and where it is in the sky and on the detector, this is what we're saving in the ASDF file: the GWCS representations, Generalized World Coordinate System. And that's a mouthful. World coordinate systems are what astronomers use to relate an undistorted scene on the sky to what you have on the detector. And it changes depending on what the telescope optics are. But what GWCS provides is layered on top of Astropy modeling, which allows you to string together mathematical models to translate coordinates between two different systems. So we use this for translating perfect sky coordinates to detector coordinates, and intermediate systems as well. And one of the benefits of that is sometimes there are effects that happen in the detector that should be corrected at different stages along the way, and you can insert those corrections in this pipeline of models.
50:06 It makes sense to correct in this coordinate system potentially rather than a different level.
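The idea of stringing models together, and slotting a correction in at the stage where it makes sense, can be sketched in plain Python. This is a stand-in for the concept only: it does not use the GWCS or Astropy modeling APIs, and the offsets, scale, and distortion term are hypothetical numbers chosen to keep the arithmetic easy to follow.

```python
def shift(dx, dy):
    """Model that moves the coordinate origin."""
    return lambda x, y: (x + dx, y + dy)


def scale(factor):
    """Model that converts pixel units to sky units."""
    return lambda x, y: (x * factor, y * factor)


def compose(*steps):
    """Chain coordinate models; the output of each feeds the next."""
    def pipeline(x, y):
        for step in steps:
            x, y = step(x, y)
        return x, y
    return pipeline


# Detector pixel -> centered pixel -> sky offset (hypothetical values).
detector_to_sky = compose(shift(-1024.0, -1024.0), scale(0.25))


def distortion_fix(x, y):
    """Hypothetical correction that belongs in the centered-pixel frame."""
    return x + 0.5, y


# Insert the correction between the two existing stages.
corrected = compose(shift(-1024.0, -1024.0), distortion_fix, scale(0.25))

print(detector_to_sky(1028.0, 1024.0))
print(corrected(1028.0, 1024.0))
```

The correction is applied in the intermediate frame, before scaling, which is the "correct in this coordinate system rather than at a different level" point being made.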
50:13 I see, right?
50:14 Yeah. Pretty fascinating. It sounds complicated.
50:20 It's built a lot on other open source code that is useful for more things than just astronomy. Right. Modeling is useful for everything in science. Fitting is useful for everything. So it's cool that we can help provide tools like that to the community.
50:35 Yeah, for sure.
50:36 What's the machine learning story around the telescopes and the data? Are you all doing anything with that? Any AI stuff, if you want to take it.
50:45 So I'll talk about you can pipe up Mike for the HST stuff if you want.
50:49 Not familiar with that aspect of it.
50:50 I know.
50:52 Yes. Obviously we have data science groups at the Institute that are doing a lot of machine learning. Machine learning can be applied to things like the catalogs of objects, where astronomers scrutinize details about what these confirmed things are. They can be used to build up information about unknown images. Machine learning for processing: we've started doing some of that with the HST processing, in order to optimize scheduling time, spreading the data out, understanding if there are certain metadata keywords, like what Mike was talking about before, that we know will affect the processing in certain ways. We can detect that early and make up for it.
51:30 Interesting. How much of a challenge is the optimization problem of who gets time when? So this person wants to look here and do this, then that person is going to look over there. But if there's somebody that wants to look in the middle, maybe you could save some fuel by turning part way, letting them do their job, like skip ahead in line, and then keep moving. The other day.
51:49 It's a great point. It's not only fuel. You don't want to move your mechanisms more often than you have to, so you don't want to flip the filter wheel three positions left, ten positions right, when you could have done that in a more economic way and saved the wear and tear on your system. So there's a lot of optimization that goes into the planning and scheduling system. And of course, with astronomy, there's visibility. Certain targets are not visible at all times of the year, just depending on where they are relative to the sun, right?
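The filter wheel point lends itself to a small worked example. The sketch below counts single-slot moves on a circular wheel and compares two orderings of the same filter requests. The 12-slot wheel and the slot numbers are hypothetical, and sorting the requests is just one simple heuristic, not the mission's actual scheduling algorithm.

```python
def wheel_steps(start, target, n_slots):
    """Fewest single-slot moves between two positions on a circular wheel."""
    clockwise = (target - start) % n_slots
    return min(clockwise, n_slots - clockwise)


def total_steps(sequence, n_slots, start=0):
    """Total wheel movement needed to visit the slots in the given order."""
    total, position = 0, start
    for target in sequence:
        total += wheel_steps(position, target, n_slots)
        position = target
    return total


N_SLOTS = 12                    # hypothetical wheel size
requested = [3, 9, 4, 10]       # filters in the order they were asked for
reordered = sorted(requested)   # same filters, visited in wheel order

print(total_steps(requested, N_SLOTS))   # 20
print(total_steps(reordered, N_SLOTS))   # 10
```

Same observations, half the mechanism movement: that is the kind of saving a planning and scheduling system looks for, alongside fuel and visibility constraints.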
52:14 You can't turn around and look right past the sun because it all.
52:20 Cannot look back.
52:22 Even in the software, there are defined regions of avoidance where we are not allowed to move the telescope. And so we have planning and scheduling systems that know what these things are, that know the range of places that this telescope proposal approval committee has decided to point the telescope, and they figure out the most optimal way to organize those observations, to get them to the astronomers as fast as possible.
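A region-of-avoidance check can be sketched as an angular separation test between a target and the sun. The allowed range of 85 to 135 degrees used as a default below is a placeholder assumption for illustration, as are the coordinates; the real planning system applies its own constraints.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert sky coordinates (degrees) to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )


def separation_deg(a, b):
    """Angle in degrees between two (ra, dec) positions."""
    dot = sum(p * q for p, q in zip(unit_vector(*a), unit_vector(*b)))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def is_observable(target, sun, min_sun_angle=85.0, max_sun_angle=135.0):
    """True if the target sits inside the allowed band of sun angles."""
    return min_sun_angle <= separation_deg(target, sun) <= max_sun_angle


sun = (0.0, 0.0)                          # hypothetical sun position
print(is_observable((90.0, 0.0), sun))    # 90 degrees away: allowed
print(is_observable((10.0, 0.0), sun))    # too close to the sun
print(is_observable((180.0, 0.0), sun))   # directly opposite the sun
```

Because the sun's position on the sky changes through the year, the same target moves in and out of the allowed band, which is why some targets are only visible at certain times.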
52:51 Interesting. Towards the end of our time together, what other things are you doing with Python? Do you think people find interesting that maybe I haven't asked you about yet, or did I cover everything?
53:03 We covered a lot of ground. There's a lot of nuts and bolts that the data processing side has to do. We interface with a lot of databases, so we need database interface packages that talk to Microsoft SQL Server and SQLite database files and things like that. As Megan said, we do a lot of parsing, so we're JSON parsing and XML parsing. And so all those packages come into play.
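As a small illustration of that nuts-and-bolts work, the sketch below parses a JSON document of observation metadata and loads it into an in-memory SQLite database using only the standard library. The table layout, field names, and records are hypothetical.

```python
import json
import sqlite3

# Hypothetical metadata as it might arrive from another system.
metadata_json = """
[
  {"dataset_id": "obs-001", "instrument": "NIRCam", "exposure_s": 120.5},
  {"dataset_id": "obs-002", "instrument": "MIRI",   "exposure_s": 300.0},
  {"dataset_id": "obs-003", "instrument": "NIRCam", "exposure_s": 60.0}
]
"""
records = json.loads(metadata_json)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "dataset_id TEXT PRIMARY KEY, instrument TEXT, exposure_s REAL)"
)
# Named placeholders pull values straight from each parsed record.
conn.executemany(
    "INSERT INTO observations VALUES (:dataset_id, :instrument, :exposure_s)",
    records,
)

rows = conn.execute(
    "SELECT dataset_id FROM observations "
    "WHERE instrument = ? ORDER BY dataset_id",
    ("NIRCam",),
).fetchall()
print(rows)
```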
53:23 There's good old fashioned bit busting where we've got to get into the telemetry binary files and bust out the sections of bits that are the data from the wrappers that come down that let us try to figure it out. Yeah, exactly.
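"Bit busting" of this kind usually means slicing fixed-width fields out of a binary header and then unpacking the payload. The packet layout below (a 16-bit header holding a 3-bit version, a 5-bit instrument ID, and an 8-bit sequence number, followed by 16-bit pixels) is entirely made up for illustration and is not the JWST telemetry format.

```python
import struct


def unpack_header(packet: bytes):
    """Split a hypothetical big-endian 16-bit header into its bit fields."""
    (word,) = struct.unpack(">H", packet[:2])
    return {
        "version": (word >> 13) & 0b111,         # top 3 bits
        "instrument_id": (word >> 8) & 0b11111,  # next 5 bits
        "sequence": word & 0xFF,                 # low 8 bits
    }


def unpack_pixels(packet: bytes):
    """Read the payload after the header as big-endian unsigned 16-bit values."""
    payload = packet[2:]
    count = len(payload) // 2
    return list(struct.unpack(f">{count}H", payload[: count * 2]))


# Build a sample packet: version 1, instrument 5, sequence 42, three pixels.
header_word = (1 << 13) | (5 << 8) | 42
packet = struct.pack(">H3H", header_word, 100, 200, 65535)

print(unpack_header(packet))
print(unpack_pixels(packet))
```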
53:36 Various packages across the Python ecosystem help us. And there are a lot of external community packages that help us as well.
53:42 And then there's also all the documentation that has to be done.
53:45 We have users, especially for our externally delivered packages, so there's Sphinx and code documentation. We have a lot of our stuff on Read the Docs, and a lot of our infrastructure for testing.
53:56 I just had the folks on from MyST, and you have to do a better search to get something other than a video game. But the MyST project: do you still do all your stuff in reStructuredText, or are you doing any of the Markdown things with MyST?
54:09 I'm not doing anything with MyST. I have done a lot of reStructuredText and pure LaTeX, and Sphinx helps with those two things, too. A lot of the scientists themselves just write pure LaTeX. They just write their papers in pure LaTeX.
54:23 And the symbols for it are so crazy. It's so many.
54:27 But what comes out of it is really remarkable.
54:31 I mean, it's really pretty. I remember being introduced to it even as an undergrad in the 90s. I was like, this looks much cooler than Word. Absolutely.
54:41 When I was in grad school, I did some of my homework in LaTeX, but I haven't touched it for many years.
54:47 Not anything practical.
54:48 Yeah. Still, a lot of the journals and a lot of the astronomy conferences expect you to turn in posters and papers written in LaTeX to conform to their standards.
54:58 Yeah, absolutely.
54:59 All right. One final question I want to ask you about.
55:03 And Megan, you said you were working on this is the Nancy Grace Roman Space Telescope.
55:09 We haven't even really got the Webb telescope fully online, and you are working on this next one. Give us the elevator pitch for this so people can be excited. And what's the time frame?
55:20 So Nancy Grace Roman is really cool.
55:23 It is also going to go out to L2, where JWST is sitting, in part because it's an infrared telescope as well. It's focused on the near infrared. It might get lonely.
55:34 So it needs a friend. Yeah.
55:36 I mean, there are some other telescopes out there, too, but yes, we need more friends. And so the mirror is about the same size as Hubble, but its optical prescription gives it a much wider field of view. So 100 times the field of view of Hubble. And it has way more detectors and pixels, about 300 megapixels. So there's 18 detectors. And every time we take a picture, all those detectors are going off. So there's also a lot more data volume coming down that we get to process.
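The "18 detectors, about 300 megapixels" figure is easy to check with a little arithmetic. The 4096 x 4096 detector size and the 2 bytes per pixel used below are assumptions added for this sketch; the conversation gives only the detector count and the approximate total.

```python
N_DETECTORS = 18          # from the conversation
PIXELS_PER_SIDE = 4096    # assumption: square 4096 x 4096 detectors
BYTES_PER_PIXEL = 2       # assumption: 16-bit unsigned integers

total_pixels = N_DETECTORS * PIXELS_PER_SIDE * PIXELS_PER_SIDE
megapixels = total_pixels / 1e6
frame_bytes = total_pixels * BYTES_PER_PIXEL

print(f"{total_pixels:,} pixels, about {megapixels:.0f} megapixels")
print(f"{frame_bytes / 1e9:.2f} GB for one raw frame from all detectors")
```

Under those assumptions a single readout of all 18 detectors is roughly 0.6 GB before any expansion to floating point, which gives a feel for the data volume being described.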
56:06 But there are a lot of cool things we can do. We can reach the level of the Hubble Deep Field very fast. We're going to cover about 50 times more space in the first five years of this mission than Hubble has in 30 years.
56:20 So it's going to provide a lot of cool things for the community. It's also a survey telescope. So a lot of its time is dedicated to some surveys that want to be done to support the community looking for exoplanets. And they expect to find several thousand exoplanets during the first five years of the mission, investigating dark energy and the expansion of the universe and the physics behind that and other things that I'm sure astronomers will come up with. One of the cool things is that it will be launched to be compatible and collaborative with JWST and LSST, one of the large ground-based missions, and Euclid, one of the European-based missions. So it's really cool to be in this era of astronomy, where the missions can really be working together to make cool new discoveries and take data.
57:08 That's fantastic. I haven't even thought of them working together.
57:10 Quick question from Julian in the audience, then we'll wrap it up. How many of the experiments are providing data through the pipelines immediately versus that proprietary, like private, period that we talked about?
57:22 So some of that is up to the scientists. Scientists can choose when they submit their proposal to make the data public immediately. As soon as we get it, we process it. It's on the archive and it's there. That's very fast. One of the standards for Hubble is there's typically a year that the scientists have to look at the data exclusively with their group and do the science they want to do, and then it will be released. But I think that there are also cases of longer than a year.
57:48 And is Roman going to a year?
57:51 Roman has no proprietary period. The data will all be public immediately, and we'll be serving data in the cloud and on-prem.
57:58 Well, so no dragging your feet if you're trying to make a discovery and want to get credit for it.
58:02 No scooping other people either. You have to be on the ball.
58:07 That's right. Okay, so I think we've got to leave it there. But it's really fascinating to see the ways that Python and the data science tools are connecting us with space and these discoveries. So thanks for sharing those.
58:20 Yeah. Thanks for having us.
58:21 You're quite welcome.
58:22 Yeah. Now before you get out of here, you've got to answer the final two questions. But let's maybe do a lightning round. So if you're going to write some Python code, what editor would you use these days?
58:33 I'm old school. I'm a vi editor.
58:36 Right on.
58:37 Yeah, I use vi, especially if I'm logging into servers and stuff or doing something quick. If I'm doing more complicated development, I usually use Sublime.
58:46 But in the past I've used PyCharm and other ones.
58:49 And how about a notable PyPI package, some library you came across where it's like, oh, this is so cool, people should know about this.
58:56 I recommend the Condor package. If you've got to do any kind of distributed data processing, you should really know what Condor is providing. So I highly recommend exploring that.
59:05 Fantastic.
59:06 Megan, I know we've talked about this. I was so excited to see PyLab come out because it provides us, not PyLab, I'm sorry, JupyterLab. It provides us with so many opportunities to get the data and the analysis to our scientists in the easiest way possible and allows them flexibility to work on it with their colleagues and produce really good science. But from the other side, I actually saw recently this package called Silly, which just produces random bits of data you can use for tests with all sorts of entertaining things. Yeah, Silly. I think you can even have it emit, like, Chuck Norris quotes and stuff like that, which can make it, you know. Oh, no, that's pyjokes. That was the other one I found: Silly and pyjokes. I love Chuck Norris.
59:54 He only makes programming better.
59:56 He does.
59:57 Silly is like Faker, but silly.
01:00:01 It's silly.
01:00:03 I love it. All right, cool. Well, thank you both for being here. If people want to get started with some of the JWST data or some of the libraries, what would you say?
01:00:12 Go to our archive. Yeah, archive.stsci.edu. And as soon as the JWST data is coming through, the publicly available data that they can look at will be there.
01:00:23 Yeah. And HST data is already there. You can see the images, what they look like. You can play with them a little bit.
01:00:28 You don't even have to know astronomy. They have both the FITS formats that Megan talked about for scientists, and they have plain old JPG preview images for folks to just browse and get a quick look at the sky.
01:00:37 Okay. Fantastic. All right. Well, Megan and Mike, thank you so much for being here. It's been great to chat about JWST and space in general with you.
01:00:45 Yeah. Thank you so much. It was fun.
01:00:46 Take care.
01:00:47 Yeah. Bye. This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. They'll throw in a free T-shirt with your free trial. For over a dozen years, the Stack Overflow Podcast has been exploring what it means to be a developer and how the art and practice of software programming is changing the world. Join them on that adventure at talkpython.fm/stackoverflow. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.
01:01:51 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.