The Journal of Open Source Software

Episode #157, published Fri, Apr 6, 2018, recorded Fri, Apr 6, 2018

One of the hottest areas of growth for Python is in the scientific and data science communities. But if that work is done in an academic or research setting, it can be very hard to get proper credit for it. You have to write full-on peer-reviewed articles.

That's where Arfon Smith and JOSS, the Journal of Open Source Software, come in. Here, developer-scientists and other research-oriented folks can submit their software as a brief paper.

Join us on this episode to learn all about that and Arfon's work with some of the most cutting-edge projects in astronomy at the Space Telescope Science Institute.
Arfon on Twitter: @arfon
Announcing The Journal of Open Source Software: arfon.org
The Journal of Open Source Software: joss.theoj.org
Become a reviewer: joss.theoj.org/reviewer-signup.html
A quick tour of a few papers: joss.theoj.org/papers/accepted
Zooniverse: zooniverse.org
Making Your Code Citable: guides.github.com/activities/citable-code
BATMAN article review: github.com/openjournals/joss-reviews
Space Telescope Science Institute: stsci.edu
Episode transcripts: talkpython.fm

Episode Transcript

00:00 One of the hottest areas of growth for Python is in the scientific and data science communities.

00:04 But if that work is done in an academic or research setting, it can be very hard to get

00:09 proper credit for it. You have to write full-on peer-reviewed articles. That's where Arfon Smith

00:15 and JOSS, the Journal of Open Source Software, come in. Here, developer-scientists and other

00:20 research-oriented folks can submit their software as a brief paper. Join us on this episode to learn

00:26 all about that and Arfon's work with some of the most cutting-edge projects in astronomy

00:31 at the Space Telescope Science Institute. This is Talk Python to Me, episode 157, recorded

00:37 April 6, 2018.

00:53 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

00:58 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter

01:03 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm

01:08 and follow the show on Twitter via @talkpython. This episode is brought to you by ActiveState

01:13 and Rollbar. Please check out what they're both offering during their segments. It really

01:17 helps support the show. Arfon, welcome to Talk Python.

01:20 Thank you for having me. It's really wonderful to have you here. I'm super excited to talk

01:25 about JOSS and this whole open journal for open source and scientific computing. I

01:31 think what you guys are doing there is really wonderful, and I think it'll open up some

01:34 possibilities and opportunities to a lot of listeners that maybe they weren't aware of.

01:38 Great. Yeah. I mean, it's a fun project, and so it's always fun to talk about it with new

01:44 people.

01:44 Should be great. But before we get to it, let's start with your story. How did you get into

01:48 programming?

01:48 Yeah. So I'm definitely not a professionally trained programmer. Let's put it that way.

01:54 So I have a background in chemistry as an undergraduate, and then actually I have a PhD in astrochemistry,

02:01 which is kind of like doing chemistry with big telescopes, so looking at gas and dust in space.

02:07 You're one of these people that can look at stuff like 25 light years away or something,

02:11 and go, oh, that probably has this element in the atmosphere.

02:13 Yeah. Yeah.

02:14 Yeah. Basically, I went into astrochemistry. I did a PhD, in fact,

02:22 because I really just didn't know what else to do, which sounds like an awful idea, but

02:26 it's how it happened. And I also wasn't that interested in chemistry, so I decided to go towards

02:33 astrochemistry, where you don't actually have to be that good at chemistry, it turns out,

02:37 because most astronomers don't know anything about chemistry, so you can be quite successful

02:42 with a little bit.

02:43 I absolutely love chemistry, but I'm scared of doing chemistry. Like when I pick up, say,

02:49 benzene or something, and they're like, oh, yeah, that's carcinogenic. And if it gets on your skin,

02:54 it'll soak through, so don't do that.

02:55 I know.

02:56 Yeah.

02:56 It really freaks me out to do this stuff, even though it's cool.

02:58 Yeah. So I had a lot of time in the lab as an undergraduate, and actually, it means I'm really

03:04 averse to precision anything now. So precision baking, in fact, baking, which is really just

03:10 chemistry, I hate it, and it drives my wife crazy. I just don't like measuring anything. She's a very

03:17 good cook and just won't ever have me in the kitchen, because I'm so averse to anything precise

03:22 involving ingredients now. And I blame my undergraduate.

03:26 Yeah.

03:27 Yeah.

03:28 So, yeah, sorry. So I did some Fortran programming. That was my first exposure to programming,

03:35 as an undergraduate: Fortran, which is still very popular in chemistry, computational chemistry

03:42 especially, because it's got a lot of very fast numerical routines. If you need to work out

03:47 how an electron is interacting with another electron, you need fast maths to do that.

03:52 And then during my PhD, like lots of people, I had to do some data analysis and had a reasonable

04:00 amount of data to process, so I started with scripting languages. The first language I really

04:05 picked up was Perl. And that was really because that was what was on the shelf in

04:12 the office. And this would be like 2002 or something. So, you know, that was probably a reasonable choice

04:19 then, I guess. And so, you know, a mixture of Perl, some C, some Fortran. And then I went to this

04:26 course run by the library, by what was called Information Services, which is library science,

04:32 and did a "web programming" course, as it was called, and learned about HTML and stuff. And I thought that

04:40 was really exciting. And I learned about iframes and then came back to my office and said, hey, iframes are really cool.

04:47 They were like, never use iframes. I was like, really? Oh, they're so awesome.

04:50 They seem amazing.

04:52 Yeah, they seem amazing.

04:53 And so then I started to pick up just the very slightest touch of PHP. But actually,

05:01 that was at a time when this framework Ruby on Rails was in a sort of beta phase,

05:07 I guess, 2004 or 2005. And a friend of mine, who was a legitimately very talented

05:13 programmer and had coded since he was a kid, was like, oh, you should totally look at

05:19 this Rails thing if you're interested in web application development.

05:21 Put down the PHP.

05:23 Yeah, yeah, yeah, for sure. So I started using Rails, and at the time,

05:28 I didn't really even know I was using Ruby, I guess. And then I had a few years just kind of

05:34 building, toying around with stuff, and then read this book called Ruby for Rails, written by a guy

05:43 called David Black, who's big in the Ruby community. He was famously

05:48 on IRC when David Heinemeier Hansson was learning Ruby and building Rails and stuff. And

05:55 it really just explains how this framework you're using is

06:00 actually just Ruby syntax. It's just a DSL. And that was really enlightening for me.

06:06 And actually, that meant that by the end of my PhD, I realized I was much more interested in

06:11 programming than in any of the science I was doing. I took that as a strong signal I should get

06:16 out of academia. Yeah, I received the same signal, by the way.

06:19 Yeah, well, it's just, you just start to realize that some people read the literature a lot more

06:23 than you and just know more. And I was like, how do you know about that result? And they're like,

06:28 well, I read papers. I'm like, huh, I'm not really doing that very much. And that seems like a bad sign.

06:33 But I was reading a lot of programming books. So I just sort of gracefully exited with

06:38 my PhD at the end of my studies, and actually went and had a year working in bioinformatics,

06:45 which is where Ruby is very big as well. So knowing Ruby there was actually

06:48 kind of a big deal. That's kind of the go-to scripting language for bioinformatics,

06:53 certainly in the mid 2000s.

06:55 That sounds really interesting. And what you're doing

06:59 today, I'm just so excited to be able to talk to you about it a little bit later

07:04 in the show. So tell people where you've gotten to today.

07:07 Yeah. So today I work at a place called Space Telescope Science Institute, which is in Baltimore,

07:13 on the US East Coast. We were actually set up to fly and operate the Hubble

07:19 Space Telescope. The Institute was created something like 30 years ago.

07:25 And we're actually a nonprofit government contractor. So NASA still pays us to operate

07:32 Hubble. And we're currently developing all the ground systems and data management systems for the

07:39 James Webb Space Telescope, which is the next big flagship mission for NASA.

07:43 And so we build a lot of core infrastructure for data processing,

07:48 which is a lot of the work that I oversee here. We call it data management. And we also build a lot

07:56 of community tools, which these days are pretty much exclusively built on the

08:01 scientific Python stack. So I'm in the position of being in charge of a lot of the time and effort that we spend on scientific

08:07 Python, but full disclosure, I have never written a single line of Python in my life. So that's kind of

08:12 interesting for me. But I know a reasonable amount about open source and that kind of stuff, so I feel

08:17 qualified. It's interesting, though: sometimes I write pseudocode, and I'm pretty sure it would

08:22 never compile. In fact, I've been told as much. So, you know, I'm not a Python expert by any

08:29 stretch, although I should probably learn sometime.

08:32 Yeah, I think it's really interesting how Python is really becoming highly used in this open source

08:37 science space. And it seems to really be something adopted by the various telescopes,

08:43 right? That was like a big theme at the conference last year.

08:46 I think at PyCon last year, Jake VanderPlas gave a keynote.

08:50 Yeah. Yeah, he gave a great keynote.

08:52 Yeah. And it was something like "The Unexpected Effectiveness of Python in Science."

08:57 It was kind of a fun title. And yeah, he really sketched out what I would say now,

09:04 which is that there's this really deep set of libraries that you can use for numerical

09:11 scientific computing in Python. And then, of course, if you need C bindings or

09:17 whatever, you can do that, too. So it goes very deep. And

09:22 really now there's this almost overwhelming quantity of core libraries out there.

09:30 So, for example, we have a lot of core contributors here for a thing

09:35 called Astropy, which is a very popular library in astronomy and astrophysics. And, you know,

09:40 that builds upon SciPy and NumPy and obviously Python, and so it's really, you know,

09:46 where we're contributing to that ecosystem. We had people here a few years ago who were very active

09:53 on things like Matplotlib, that kind of thing. And so the Institute's

09:58 credentials in the Python community are pretty deep, actually. Perry Greenfield, for example,

10:04 who's still here, was really one of the key people who introduced the astronomical

10:10 community to Python. And it's kind of interesting to reflect on the

10:15 fact that sometimes big changes come from just one or two people deciding that they're going to

10:20 make a change. Right from the ground up. Yeah. Yeah. And so I feel very lucky that I can go,

10:26 he's like four doors down from me, and just ask him questions like, hey, why is it this

10:31 way? And he's like, let's talk about that. You know, he's got so much context. It's fantastic.

10:35 That's probably a good segue into talking about the journal that you're the

10:40 chief editor of, the Journal of Open Source Software, JOSS.

10:44 That's right. Yeah. So JOSS is, well, I'll give you the one-line description. I like to

10:52 call it a developer-friendly journal, and we should probably talk more about what I

10:58 mean by that. But JOSS is a journal that tries to do all the right things in terms of being a legitimate

11:05 academic journal. And it's surprising how the establishment, I would say, at some level makes

11:11 that look very hard and very complex. And it's actually really not. You have to do some of the

11:15 right things, like register with the Library of Congress and get an ISSN, and things like

11:20 that. I mean, there's weird stuff that you just genuinely wouldn't know. But it's like

11:25 five things you need to do, not 50. And we publish papers about open source software with a scientific

11:32 goal, where the software is science- or, I should say, research-focused. And so generally, that means

11:38 academics who are writing software submit to us. And there are a couple of things that are kind of

11:42 important about it. One is that we review primarily for the quality of the software submission. So we're

11:51 actually not doing a big review of a paper. There does need to be a paper, but it's generally very short. In

11:58 fact, we encourage it to be short. So JOSS papers are generally less than two sides of A4 or US letter,

12:04 however you print them out; they're really genuinely short. And the submission format is Markdown

12:11 and BibTeX, and we use Pandoc to compile them. And the whole submission, review,

12:18 and editorial process happens on GitHub, in a public

12:25 reviews repository. So it's sort of interesting and weird in some of the

12:31 things we do. But in the end, we try to do all the right things, in terms of: we give a DOI,

12:37 which is a kind of weird URL shortener that academics use, and it means that you can index

12:43 citations to other work.
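For readers who haven't worked with DOIs: a DOI resolves through the doi.org proxy to wherever the work currently lives, and the registration agencies behind most scholarly DOIs (Crossref and DataCite, the latter used by Zenodo) also support HTTP content negotiation, so the same URL can return citation metadata instead of a landing page. The sketch below is a minimal standard-library illustration, not anything JOSS itself runs; `10.1234/example` is a made-up placeholder DOI, and only `fetch_bibtex` needs network access.

```python
import urllib.request

RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Normalize a DOI (bare, 'doi:'-prefixed, or a doi.org URL) to a resolver URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return RESOLVER + doi


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a GET request asking the resolver for BibTeX via content negotiation."""
    return urllib.request.Request(doi_url(doi), headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request (needs network); urllib follows the resolver's redirect, keeping Accept."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 10.1234/example is a hypothetical placeholder, not a real record.
    req = bibtex_request("doi:10.1234/example")
    print(req.full_url)              # https://doi.org/10.1234/example
    print(req.get_header("Accept"))  # application/x-bibtex
```

The point of the indirection is the one discussed in the episode: a citation holds the DOI, not the hosting URL, so the record can move without breaking the reference.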

12:46 Right. One of the issues that comes up around that is, if I wrote, say, a paper published

12:53 in a high-end journal, and it references some package I depend upon that generated my results,

12:59 the owner of that package could just be having a bad day and go, I'm deleting this GitHub repo,

13:04 and it's gone, right? And so this DOI is almost like code in escrow,

13:10 right?

13:10 Yeah. So there are actually two DOIs that get created when a JOSS paper

13:17 gets published. We make an archive of the software, or actually we request that the author makes an archive

13:23 of the software. There are tools for that, like one called Figshare, and there's another one called

13:32 Zenodo that is run by people at CERN, the people who do the computing infrastructure

13:39 for the Large Hadron Collider. What they do is set up a webhook. If you do

13:45 this from a GitHub repository, you basically configure this add-on, and when you do a

13:51 release on GitHub, it takes a snapshot of that code. It actually doesn't include

13:56 the Git history, which legitimately you might want,

14:00 but it takes a tarball from GitHub, archives it, and gives it a DOI. So the DOI points to Zenodo,

14:07 and Zenodo also has a copy of the code. So if you or I decided to rage quit

14:14 open source or something, or just got really burnt out (actually, that's not rage quitting at all, that's just

14:20 legitimately deciding to disappear off the planet), the code is still available. So when you submit to JOSS,

14:27 one of the last steps is that, when the review is complete and the requested changes have been

14:32 made to the satisfaction of the reviewer, we ask the author to make an archive of the

14:38 code. And then the paper also gets a DOI. So then when people want to cite that package,

14:45 we encourage them to cite the paper, and the paper connects to the archive of the code, if that makes

14:51 sense. So there's some guarantee that in the future, if you stumbled across this paper,

14:56 you should still be able to find the source code, even if it's not on GitHub or GitHub doesn't exist or

15:01 something. Right, right, right. You know, there's a lot of stuff happening around there, and

15:05 I don't want to go too deep down that hole, but even if you have the source code, that doesn't

15:11 necessarily mean it's saved for all time. Right. So maybe it runs on a certain flavor of

15:17 Linux that has a certain version of some internal bits that it depends on. And if that is gone, right,

15:23 there are layers outside just the software. There are the versions of Python, if it

15:27 were based on Python. There are whole layers of this, and things like containers,

15:32 like Docker and whatnot, are interesting players in the space as well.

15:36 Yeah. Yeah, for sure. And I think there's definitely more we could do there. One of the

15:41 things that's come out of the work we've done on JOSS is that we've got a fairly generic

15:48 set of tooling now. So we've got a fairly lightweight web application that allows

15:54 people to submit something for review. And then we've got an automated bot that's called

16:01 Whedon. Some of us are Firefly fans, I guess: Joss Whedon.

16:06 It's the @whedon handle on GitHub, which is kind of fun. And so that bot actually helps with a lot of

16:12 the editorial management. So a lot of that is sort of ChatOps, automated in GitHub issues.

16:18 And that tool chain can actually be applied to other things. So one of the things that's coming

16:23 up is that JOSS has actually been forked to make a sister journal called the Journal of Open Source

16:28 Education, or JOSE. And that's using exactly the same tool chain. It's literally just a fork of

16:34 the code base. And so we've generalized that. We've definitely talked about

16:39 containers as something that's interesting to think about reviewing and saving, and

16:46 there's actually been the idea of a Journal of Open Source Containers.

16:51 I'm not sure exactly what I think about that yet, because I actually think, to your point,

16:56 it might even just be better to say, well, if you've got a JOSS submission, really,

17:00 what we want you to do is have a supporting infrastructure piece, like a container, to make sure

17:04 that that software has some chance of running in the future, some increased longevity. But we

17:10 haven't gone that far yet. But it's definitely interesting.
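The ChatOps pattern described here, where editors drive the process by mentioning a bot in GitHub issue comments, comes down to parsing commands out of comment text. The sketch below is hypothetical: only the `@whedon` handle comes from the episode, while the command vocabulary (`assign ... as reviewer`, `start review`) and the parsing rules are illustrative assumptions, not the bot's real grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A line addressed to the bot: "@whedon" followed by whitespace and a command (or nothing).
BOT_LINE = re.compile(r"^@whedon(?:\s+(.*))?$", re.IGNORECASE)
# Hypothetical command: "assign @user as reviewer" or "assign @user as editor".
ASSIGN = re.compile(r"^assign\s+(@[\w-]+)\s+as\s+(reviewer|editor)$", re.IGNORECASE)


@dataclass
class Command:
    action: str
    user: Optional[str] = None
    role: Optional[str] = None


def parse_command(comment: str) -> Optional[Command]:
    """Return the first bot command in an issue comment, or None if the bot isn't addressed."""
    for line in comment.splitlines():
        match = BOT_LINE.match(line.strip())
        if not match:
            continue
        text = (match.group(1) or "").strip()
        assign = ASSIGN.match(text)
        if assign:
            return Command("assign", user=assign.group(1), role=assign.group(2).lower())
        if text.lower() == "start review":
            return Command("start_review")
        # Addressed to the bot, but not a command this sketch understands.
        return Command("unknown")
    return None


if __name__ == "__main__":
    comment = "Thanks for the submission!\n@whedon assign @reviewer-handle as reviewer"
    print(parse_command(comment))
```

A real bot would then act on the parsed command through the GitHub API (labeling the issue, posting a reply); keeping parsing separate from those side effects makes it easy to test.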

17:13 It's very interesting. I think it may also put extra pressure and friction, though,

17:18 on getting submissions.

17:20 Sure. Yeah, I mean, we're definitely not short on submissions. We've been going for a little

17:26 under two years now. And we're up to close to 300 submissions.

17:29 That's awesome.

17:30 Yeah, it's great. And it keeps me busy, and the editorial team too; we've got a great team of editors.

17:36 So part of me thinks, huh, if we could slightly reduce the number of submissions, that'd be kind

17:40 of cool. That'd help my Friday evenings. But no, no, you're absolutely right.

17:45 We don't want to raise the bar too high. We feel like we've got a pretty good quality bar right

17:51 now. You know, it says it in the name, you have to use an open source license, not one that you've

17:56 made up, you know, one that's approved by the OSI.

17:58 An official one.

17:59 Yeah, yeah. You pick one of these 300, it turns out, or whatever. There are lots, but, you

18:03 know, pick one. And then our review is primarily about the usability

18:08 of the software. We encourage people to have tests, ideally automated tests. Documentation is

18:15 a must. We have acceptable, better, and best kinds of categories.

18:22 And one of the reasons we set up JOSS was that we felt like, with a lot of the software

18:29 that's in the academic literature, when people write software papers, which is a thing

18:33 outside of JOSS (you write a paper about software to get some sort of career credit as

18:38 an academic, to give people something to cite), nobody ever looked at the software. The review

18:44 was always about the paper and never about the software. So we've turned it on its head. Most

18:48 of our review is about the software and not about the paper.

18:50 Yeah, I think that's the right way to do it. And like you said

18:54 at the beginning, the actual submission, your guidelines for the thing you accept, is

19:00 really simple. It's like an abstract and basically supporting materials and links to the software.

19:05 Absolutely.

19:05 So maybe it's worth talking about briefly. Like, why does this exist? Because you mentioned

19:11 there were these other software-oriented journals. Pick an industry, and there are 50 journals

19:17 in that industry. They're usually expensive. You've got to buy them. They're private. They go out

19:22 to university libraries and professors and stuff like that.

19:26 For me, the number one motivation for JOSS is to find a way to credit people in academic settings,

19:34 or in fact, research settings. The difference, I don't know whether it's interesting, is

19:38 academic being sort of public, not-for-profit, whereas research is a more

19:44 general term that could include commercial activity, I guess. So people who are in a research

19:50 setting, who are writing software as part of their job, who are struggling to get career credit for

19:55 that. And that turns out to encompass a lot of people that I know. I guess technically, it probably

20:03 would have been me at one point, except that I personally wasn't ever trying to

20:08 follow an academic career track, which relies upon papers and that kind of thing.

20:13 Right. But this is basically the currency of professorships and tenure track positions, right?

20:19 Yeah, yeah. So the way that you get a job as an academic is you write papers, and then people cite

20:24 your work. And, you know, if you write enough papers, and you get enough citations, and then at some point,

20:31 a group of people in a room, they're generally called like a tenure committee, tenure review committee will

20:36 decide that you can have a permanent job in academia. And that's the golden ticket: tenure.

20:40 And the problem with that is that it's primarily, and in many universities

20:49 exclusively, based on papers. It does not take into account whether you give really good public talks,

20:57 which lots of universities would say is important, like outreach, or it would not take

21:01 into account the fact that you spent three years collecting a really valuable data set that lots of

21:07 people have used; data sets generally aren't creditworthy. The only thing we have is

21:12 this one-dimensional credit model, which is papers and citations. And so the problem is,

21:18 if you write really high-quality software in a research setting, you might spend a significant

21:24 fraction of your time doing that. And if you spend so much time that your number of papers

21:28 suffers, then you're going to get dinged on that in terms of your career prospects. And so software

21:34 papers, papers that describe a piece of software, are a sort of understood hack on the current academic

21:43 system, except that software papers come with a bunch of problems. And JOSS tries to address a few of

21:49 those. One of which is, if you want to write a paper about a piece of software,

21:52 you generally have to have supporting new research results, right? That's the hardest part,

21:58 I think. It's incredibly tough. It's not enough to say I've built the most efficient, most awesome AI

22:04 framework for discovering exoplanets; you have to go and find exoplanets. And then, coincidentally,

22:11 you get to talk about how you did it.

22:13 Yeah, it just happens. "I have this repository over here, which happens to be on GitHub with a license,

22:18 and you might want to use it." You know, it's like a byline of the paper. So I have a problem

22:22 with that. And I think it's especially bad for long-lived, so arguably successful, software,

22:30 right? Like, if there were only ever one release of a piece of software, you could say, okay, well,

22:36 you probably built it when trying to do some research. So you should write a paper that

22:40 describes the software and some research results, the end. Okay, but what happens on version two?

22:45 Like, if you and I decide to work on a piece of software that you did version one of, you probably

22:51 don't want to write another paper, because now people might cite the other paper. And now you don't get...

22:56 Academics worry about citation dilution. It sounds like such a ridiculous thing, but it's real,

23:01 because it turns into this number called an h-index, which is

23:08 a way of trying to parameterize the impact of a researcher. So yeah,

23:14 with JOSS papers, the idea is that no results are actually permitted. It's not that we

23:20 don't need results; you're not allowed to put novel results in the paper, because we're not going to

23:24 review that. That's not what the reviewers are there to do. So that's really why we call

23:30 it developer-friendly. The idea being, if you have done the hard work to write this piece of

23:35 software, we don't want you to spend more than roughly an hour writing the paper to go with it. And that

23:42 turns out to be appealing to quite a lot of people.
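Since the h-index comes up here as the number academics worry about diluting, a concrete definition may help. A researcher's h-index is the largest h such that h of their papers have at least h citations each (the metric Jorge Hirsch proposed in 2005). The sketch below is the standard computation, nothing JOSS-specific, and the citation counts in the example are made up.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    # Rank papers from most to least cited; rank r counts if that paper has >= r citations.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    # Made-up citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))  # 4
```

Because the index depends only on how many papers clear the bar, the way citations are spread across papers (for example, between a version one and a version two software paper) matters, not just the total.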

23:45 Yeah, that's really awesome. And so people who have worked in any research area, like you said, not just

23:51 academics, can get credit for the software that they have created, in terms of academic credit, right?

23:57 Yeah, the special coins you get at the university when you get cited, right? That type of currency.

24:02 Yeah. So actually, to that point, I was looking through some submissions

24:08 earlier today, and we do have a mixture of people submitting. Most of our authors are in an academic

24:16 setting, or in some kind of research institute, where papers legitimately count towards their

24:23 career progression. But we do have people in commercial companies as well, especially in the data science

24:30 ecosystem. I was looking today, and there was a paper for a scikit-learn-contrib

24:36 package. I think it was HDBSCAN, some new implementation of an

24:44 algorithm. And it was somebody at a university and somebody from Spotify, I think, or, no, Shopify,

24:51 actually, yeah, very similar. But it clearly looked like it was a data scientist

24:56 at a company who maybe doesn't care about that credit, but also maybe was in a university

25:03 setting at some point. I think, especially in data science, I feel like there are many people who

25:09 go a long way in the academic space, and then they go into commercial data science. And I feel

25:14 there's this interesting tension and trade-off in the whole data science space, in that the demand for

25:20 data scientists so strongly motivates or pulls people out of academia. They're like, we'll pay

25:27 you half a million dollars. Forget tenure, you can just do this, right? Yeah, yeah.

25:33 But there's still that tie back to, like, I worked with this professor or this group,

25:37 and I'm still kind of helping them, and they give me good ideas. And so I feel like maybe a lot of the

25:42 papers come from those sorts of remaining ties. Yeah, some definitely do. I don't

25:49 think I have a good handle on how many. But I think, even if you've left academia,

25:57 the conventional way to share your ideas is still in the literature, publishing

26:03 papers. And so I think it's very natural for people to want to write a paper. And then,

26:09 if we're like, well, here's a super short way of getting a paper...

26:12 Exactly. Well, especially if you've done the work. And we'll get to some of the details in a little bit. But I

26:18 feel like what you've done is you've come up with this concept of how you submit your software to the journal,

26:23 and it pretty much just checks the boxes of how you run a good open source

26:28 project. Yeah, it has a good open source license, it has documentation, it has

26:32 tests, it's hosted somewhere like GitHub, etc., etc., right?

26:36 Yeah. So when I was working at GitHub, I really learned a lot about how successful

26:42 projects come into existence, and what some of the key things are that you need. And so

26:47 really, there's a lot of material out there now. But certainly,

26:51 four and a half years ago, this idea of what healthy open source looked like,

26:56 or what a successful project looked like, wasn't written down that much. And so

27:00 the team I was in at GitHub was trying to create some of that shared understanding

27:07 in the community. And actually, while I was there, I was thinking about journals, and had already been

27:14 talking to some commercial publishers who were asking me about GitHub. So I was sort of helping

27:21 them understand open source, but I was representing the company. And then, in

27:27 my final year at the company, I just sort of figured, you know what, none of the conversations I've

27:32 had have been very satisfying. They're not getting it; they're doing

27:37 kind of the wrong thing. And I just started hacking on some code. I was like, you know,

27:41 I think I could do this, and I think it'd be super easy, if you assume that the review could happen

27:46 in an issue, and submission is just creating an issue. I mean, we have a

27:52 strong GitHub dependency via the API. But I just thought I kind of knew enough

27:59 to do it better myself. So then I just decided to go for it.

28:05 This portion of Talk Python to Me is brought to you by ActiveState. ActiveState gives you a faster way

28:10 to build and secure open source runtimes from your first line of code through to production. Every

28:15 second you spend building your Python distro or trying to secure your Python programs is time away

28:20 from doing the work you love. Tired of resolving dependencies or making sure you tick off all the

28:25 security boxes when you ship to production? With ActiveState, you can focus on your code and leave the

28:30 open source to them. Your teams can standardize with their Python builds for your specific use.

28:36 You'll have less friction in the development cycle. And that means you can deliver apps faster. If you

28:41 need to manage your apps in production, they even give you a unique server side way to verify your Python

28:46 applications at runtime. You can bake security right into your Python products without impacting performance.

28:51 Cut the hours wasted building your distro, finding the right package, or making sure you tick off all

28:56 the security boxes when you ship to production. Go faster, spend more time doing the work you love,

29:00 and comply with your enterprise needs for security. Try them and see why their distribution was chosen

29:06 by IBM, Microsoft, NASA, Siemens, PepsiCo, and others. Join millions of developers who trust ActiveState to

29:13 build their open source language distros. Visit talkpython.fm/ActiveState and cut the time

29:19 configuring and securing your Python builds. That's talkpython.fm/ActiveState.

29:25 Let's talk a little bit about the whole process. Actually, let's touch first on the kinds

29:32 of projects that come up here. So people are listening, and they're like, well, I've done some

29:35 stuff. It's kind of research. It's a project. Would it fit? So let me just read the quick

29:40 descriptions of a couple of recent things you guys accepted. So one is NiaPy, N-I-A-P-Y, and it's a new

29:48 microframework for building and using nature-inspired algorithms in Python. So that's pretty cool.

29:55 There's one, pynucastro, which is Python interfaces for nuclear reaction rate databases,

30:02 including the JINA stuff. There's a bunch of these. They're not super large scale. It seems a lot of

30:11 them are like, I need to get to this data or I need to do this calculation and nothing quite works, so

30:16 here's the bridge that I had to build for myself, a lot of times.

30:19 Yeah. So there are a few different categories of software that we get. Actually,

30:22 there's a bunch. I mean, those two, I'm pretty sure I didn't edit either of them, because,

30:27 you know, one of the problems we have is we get submissions and I'm like, I have no idea.

30:31 We ask the authors to submit a summary of their software

30:38 for a generalist audience. And I read them and I'm like,

30:43 I have no idea what this software does. I literally do not understand any of these sentences.

30:47 Thankfully, we've got, I think, 16 editors now, and somebody will be like, oh yeah, I know this

30:53 stuff, or I know enough that I can help edit this. And then we have a pretty good reviewer pool now.

31:00 It's, I think, well over 200 people who volunteered to review for the journal. And so we just have

31:07 their language expertise, their subject expertise, and their GitHub handles, so we just

31:13 ping them when a submission is made.

31:15 I think that works really well. And you do have a real rockstar cast of supporting editors. I mean,

31:20 you look through there and there's a bunch of big names, including Jake VanderPlas.

31:24 Yep.

31:24 Kathryn Huff, who does a bunch of the nuclear stuff. Maybe she reviewed that one

31:28 that I just mentioned.

31:30 Yeah.

31:30 And the reason I learned about you guys is I actually was one of the reviewers on a thing called

31:36 BATMAN.

31:36 Ah.

31:37 Statistical analysis for expensive computer codes made easy.

31:41 Fantastic. I thought I recognized your face. There you go.

31:44 Yeah.

31:45 Yeah.

31:45 See?

31:45 I didn't make that connection.

31:47 It's from my GitHub. Yeah.

31:48 Yeah.

31:48 That's awesome.

31:49 Thank you for your time.

31:50 Yeah.

31:50 Absolutely.

31:51 Absolutely.

31:51 So one of the things I thought might be fun, and one of the reasons I wanted to

31:56 feature this, is twofold. One, if you're working in an area of research,

32:02 say you're a grad student or even an undergrad, and you've got some kind of interesting open source

32:07 software, that would be a good thing to submit. That would be great. But there are many

32:11 people who ask me, hey, I'm really just getting started.

32:13 I want to contribute to open source. But you can't just drop in on Django and start

32:18 adding to Django, because it's like an eight-year-old, highly polished, maybe not exactly, but not new

32:25 greenfield stuff. It's very sensitive to change, and it's very nuanced. Whereas I think

32:31 becoming a reviewer, for example, might be a really nice way to start being part of open source if you're

32:36 like a student or something.

32:38 For sure.

32:38 And you're trying to get it on your resume for when you get out of school or whatever.

32:42 Yeah, I mean, one of the things that I think is true, and I only have anecdotal evidence,

32:47 but I'm going to believe it because it supports what I want to believe, is that people seem to

32:52 genuinely enjoy both reviewing and being reviewed. Authors and reviewers seem to just,

33:01 we could probably do a sentiment analysis of all the comments or something, because they're all public.

33:06 But, you know, I say, would you mind reviewing this? And people say, I'd love to.

33:12 I'm like, really? Okay. I mean, I've had people email me and say, why haven't you given me anything

33:16 to review yet? I'm like, I don't know, your name just hasn't come up yet. And I do worry

33:22 that sometimes I go to people I know who would be good for this. So one of the things

33:28 I'd like to automate is reviewer suggestions. We have this big list. It's in a spreadsheet.

33:32 I feel like it's something that our bot could do. But yeah, maybe you could have tags, like,

33:37 these are their specialties. Yeah, exactly. Exactly. And it would also keep track

33:42 of how many reviews you've done, because overtaxing people is one of the things you worry

33:47 about. This person's good. Yeah, yeah, exactly. So, yeah, for sure. People seem to,

33:54 you know, there have been a number of times where reviewers are like, this was a really good experience because I learned

33:58 about this package. And really, all we want people to do as a reviewer is try and install

34:03 it and run it and verify it. Right. So that involves reading the docs, looking at the code.

34:08 Ideally, you know, if you find that there are methods that are uncommented, quite often reviewers will

34:14 actually make pull requests against the thing they're reviewing, which is kind of nice. And there's

34:20 this, you know, everyone gets the idea that what we're trying to do is create a, you know, highly usable

34:26 piece of software that solves a real problem. So yeah, I think actually that's a

34:32 great suggestion for people who are looking to dip their toe, you know, take first steps

34:38 in open source. JOSS is actually a great place to come and just read people's code.

34:45 Often, as you say, these are pretty small packages, and you might actually even become a

34:51 contributor. Yeah, that's a great idea. Absolutely. For a lot of people who are getting into

34:56 open source, some of their first steps into any particular project are to help with documentation

35:00 or tutorials or examples. And this review process is similar to that.
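Arfon's idea of automating reviewer suggestions, with volunteers tagged by specialty and a count of how many reviews each has done so nobody gets overtaxed, could be prototyped along these lines. This is a hypothetical sketch, not the JOSS bot: the `Reviewer` record, the `suggest_reviewers` helper, the three-review cap, and the handles are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reviewer:
    handle: str                                # GitHub handle to @-mention (hypothetical)
    tags: set = field(default_factory=set)     # language and subject expertise
    reviews_done: int = 0                      # how many reviews this person has done


def suggest_reviewers(pool, submission_tags, max_reviews=3, limit=2):
    """Rank volunteers for one submission (hypothetical helper).

    Keeps reviewers who share at least one tag with the submission and have
    done fewer than `max_reviews` reviews, then prefers the most matching
    tags, then the lightest review load, then the handle for a stable order.
    """
    wanted = set(submission_tags)
    matches = [r for r in pool if r.tags & wanted and r.reviews_done < max_reviews]
    matches.sort(key=lambda r: (-len(r.tags & wanted), r.reviews_done, r.handle))
    return [r.handle for r in matches[:limit]]


pool = [
    Reviewer("alice", {"python", "astronomy"}, reviews_done=1),
    Reviewer("bob", {"python"}, reviews_done=0),
    Reviewer("carol", {"r", "genomics"}, reviews_done=0),
    Reviewer("dave", {"python", "astronomy"}, reviews_done=5),  # already busy, skipped
]
print(suggest_reviewers(pool, {"python", "astronomy"}))  # ['alice', 'bob']
```

Sorting on a tuple keeps the ranking rule in one place: expertise match first, then workload, so a strong but busy reviewer like `dave` is passed over once they hit the cap.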

35:05 It is. Yeah. And, you know, we have a fairly prescriptive process, so we don't

35:10 leave people with just, what do you think about this piece of software? There are like 20

35:14 checkboxes that we ask people to go through. Yeah. Does it have an open source license? Yes or no? Yes.

35:19 Does it have tests? Yes or no? It's almost like a little checklist. I think there are actual

35:22 checkboxes in there. There are. There are. Yeah, absolutely. So we do

35:27 checkbox-driven development or something. Nice. Yeah. So I recommend to people out there, if this

35:32 sounds super interesting, if maybe you're still in college or grad school and you're like, I want to

35:37 sort of, you know, start to build a resume around this kind of stuff, becoming a reviewer

35:42 would be really easy. And people, especially in school, already have some specialty, so they

35:47 can help in that area. Yeah, yeah, for sure. Nice. And so JOSS is actually one of four journals

35:53 under a larger banner called Open Journals, at theoj.org, right? Yeah, that's correct. Tell us

35:59 briefly about the others. Yeah. So I think the first one I set up is this one

36:06 called the Open Journal of Astrophysics, and that's my least successful. So, stepping

36:11 back, it would appear, based on the evidence, that I create journals in my spare time,

36:16 which is a horrible thing to do if you want to have any spare time in the future. So yeah,

36:21 this is like my problem in life: I seem to make academic journals. And it's

36:27 a big time sink. So the Open Journal of Astrophysics was the first. We actually

36:34 published three papers. It's kind of paused right now, mostly because we don't really have a

36:39 particularly strong, well, no, we have a strong but not very engaged editorial board.

36:44 And the important thing to realize about a journal is that it lives and dies by the ability and

36:50 willingness of the editors and the reviewers to come together and review content.

36:55 So, you know, the Open Journal of Astrophysics is on a bit of a hiatus right now.

37:00 I don't know what will happen with that project. It's a nice project,

37:05 in the sense that the model is to review papers that are already on the arXiv, which is a preprint

37:12 server where people put free and open copies of papers that they're going to submit to other

37:17 journals. So the idea is that you could just do the review in a browser. And

37:23 there are lots of other journals now following this model. They call them arXiv overlay

37:27 journals. And so I'm sorry we weren't more successful with that, but you know,

37:34 such is life. The second is the Journal of Brief Ideas. I built that with a guy called David

37:41 Harris, who's a physicist in Australia. And he just had this problem. He really wanted to

37:49 find a way to capture short ideas, good or bad, and have a way for people to just write them

37:57 down and say, you know, here's an idea. I'm not going to take it further, or I don't have time right

38:03 now. And so it's more like a diary of thoughts from the community. And it could be

38:09 like the seed of potential research projects, but I'm not going to pursue it. That type of thing.

38:13 Right. And, you know, why that exists is kind of interesting. You know, academics live or die at

38:20 some level by the quality of their ideas and the novelty of their ideas, which is good and bad.

38:26 So academics, it turns out, care a lot about who had the idea first. And I really feel

38:33 like that's one thing, reflecting on my time in industry, that I find engineering cultures

38:39 care much less about. You're like, we care about building a good system, something reliable. I don't

38:45 care whose idea this was. This is just a good idea. Whereas academics are very keen,

38:49 very careful to award credit: oh, it was this person, you know, Mike's idea first. And then

38:54 I took it forward, but he had the original. You hear that a lot.

38:58 It's very careful.

38:59 It's the citations. It's the papers. And these are all driven by the first paper. It's the citations on every subsequent paper and all that, right?

39:06 So the Journal of Brief Ideas is a way for people to say, I've got an idea. It could be good. It could be bad.

39:12 I want to write it down because I want to, I guess, you know, put a stake in the

39:18 ground. A little bit of a flag on this idea, if I ever come back to it. It's kind of fun.

39:22 And people do use it. And I don't edit that; David is the sole editor. It doesn't go

39:30 through review. These ideas are limited to 200 words, so really short. So you can have 200 words

39:37 and a figure, and that's it. And then JOSS is the third journal that I've

39:45 created, and that's by far the most successful. One of the things that

39:52 I did when building JOSS was I really wanted to absolutely minimize the amount of new software I

39:58 wrote. One of the hard things about the Open Journal of Astrophysics is there was quite a complex

40:03 web-based UI with PDF annotations and lots of bits of technology that I wasn't particularly well

40:10 versed in. And so JOSS is super simple. It's like a web form that leverages the GitHub API to open an

40:16 issue. And that's it. Like, literally, that's it. And it's got a very small database behind it so that

40:21 it can render out the accepted papers. And then the Journal of Open Source Education is ready to go. In fact,

40:29 I think they're very close to accepting submissions. And that's not a journal that I'm going to be

40:35 involved in running day to day, but it's part of the family of journals. So really, the two

40:43 that are most similar are JOSS and JOSE. They're very, very similar journals, using the

40:49 same review model. Yeah, those are really cool. I like it.
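JOSS submission is described here as a web form that uses the GitHub API to open an issue. As a rough sketch of what that one call involves, the code below builds a request for GitHub's public REST "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`) using only the Python standard library. It is not JOSS's actual code: the helper name, repository, issue title format, and token are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_issue_request(owner, repo, title, body, token):
    """Build (but don't send) a request to GitHub's REST "create an issue" endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # a token allowed to open issues on the repo
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholders: not JOSS's real repository, title format, or token.
req = build_issue_request(
    owner="example-org",
    repo="example-reviews",
    title="[SUBMISSION]: mypackage",
    body="Repository: https://github.com/example-org/mypackage",
    token="YOUR_TOKEN",
)
print(req.get_method(), req.full_url)
# Sending it for real would be: urllib.request.urlopen(req)
```

Keeping the "database" to little more than issue numbers is what makes this design so small: GitHub stores the submission, the review thread, and its history.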

40:55 This portion of Talk Python to Me has been brought to you by Rollbar. One of the frustrating things about

41:00 being a developer is dealing with errors, relying on users to report errors, digging through log files,

41:06 trying to debug issues, or getting millions of alerts, just flooding your inbox and ruining your day.

41:11 With Rollbar's full stack error monitoring, you get the context, insight and control you need to find and

41:16 fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start

41:23 tracking production errors and deployments in eight minutes or less. Are you considering self hosting

41:28 tools for security or compliance reasons? Then you should really check out Rollbar's compliance

41:32 SaaS option. Get advanced security features and meet compliance without the hassle of self hosting,

41:38 including HIPAA, ISO 27001, Privacy Shield and more. They'd love to give you a demo. Give Rollbar a try

41:46 today. Go to talkpython.fm/Rollbar and check them out.

41:51 So let's just compare and contrast real briefly. Most of these articles are written

41:59 and published in high-end, very private, cloistered sort of journals, right? Like

42:06 JAMA, the Journal of the American Medical Association, or JRME for education. And you can't just

42:13 easily go get them. The papers are often not available on the internet.

42:17 They're really packaged away just for a few folks to get to, which I think is very odd, because so much

42:23 of the research is paid for by the National Science Foundation or the National Institutes of Health or

42:27 whatever. So basically, the public pays for this research, and then the results of it are hidden away

42:34 from public view. Right. Yes.

42:37 So yeah, like this is very much not like what you guys are doing.

42:41 No, I mean, everything you said is true. I think, you know, there's a growing interest in what's

42:47 generally termed open access publishing, where stuff, once it's accepted and is in

42:52 the journal, is available for all to read. But right now, a lot of the business models of academic publishing

42:59 rely on journal subscriptions. So your library or you as an individual buy access to these

43:06 papers, and that's generally, you know, a journal subscription, and that can run to, you know,

43:12 enormous amounts of money, you know, single universities paying millions of dollars a year

43:16 to publishers, just to gain access to, hilariously or disgracefully, the papers that

43:23 their academics have written. So, you know, yeah, exactly, messed up actually.

43:29 And you as an academic often secure public funding for research. You then do the

43:36 research, you give the copyright for your research to the journal, which then puts it behind a paywall

43:43 and sells it back to your university and to the public. So, you know, there's a lot wrong with that.

43:47 To be fair, the journals would say, well, we add a lot of value. We bring peer review to the process.

43:53 We, you know, make the papers, we format them nicely. We, you know, we run,

44:00 we maintain quality at some level, and much of that is true. But the cost is still pretty

44:07 high. And I think there's a lot of interest in low-cost publishing these days. And that doesn't

44:12 mean low quality. It just means how low can that cost go at some level. And so our running costs,

44:19 if you ignore people's time, which I'm going to, because we're all volunteers, are something

44:25 around $4 per paper in production costs. And most of that cost is,

44:32 you know, a small web server for running the web app, and the fact that

44:38 we have to pay subscription fees to get the DOIs. So it costs us about a dollar fifty for each

44:45 DOI. And we have to pay a membership fee to this organization called Crossref to

44:51 be able to continue to generate those DOIs.

44:55 Right. That dramatically changes the structure. And, you know, I know that for the academic

45:01 journals, it's often professors, who aren't paid by the journal in any way or form, who are asked to volunteer

45:07 in the same role as the reviewers here. So it's not like they're paying huge sums to the

45:13 reviewers. No, no, no. I mean, yeah, yeah. It's crazy. Yeah. So I,

45:17 you know, yeah, there's so much to say about publishing. And, you know,

45:24 one thing that's kind of interesting is that peer review, which we see as this, you know, pinnacle of

45:29 quality as a process, is actually pretty new. Peer review's only existed for 50 years,

45:35 full stop. Most journals just didn't have it. And famously, people like Einstein

45:40 would write to the editor with a letter saying, oh, you know, here's my new theory of

45:44 special relativity. There is nobody qualified in the world to review this, so you must publish it

45:49 right now. And they'd be like, yeah, okay, I think you're right. Seems legit.

45:53 You just didn't get a review. It just got published. And so peer review is, you know,

45:58 important, but generally, as you say, people aren't paid for it. I think

46:02 almost exclusively people aren't paid for it. And so it's part of your contribution

46:07 to the academic ecosystem. It's part of your job as an academic to review, and

46:14 that's understood. We have the same model. We have people review, and we don't pay them.

46:19 And I don't think we have any interest in paying people to review. That would be weird, given

46:24 that we have no money anyway. But what we do have, I would say, is openness. Our reviews are open. A lot of

46:30 peer review in academic journals is closed. You

46:36 don't know who's reviewing your work; it's anonymized. I feel like that openness incentivizes good

46:42 behavior, and actually quality. I hope that one day somebody

46:49 will be able to say, I am a JOSS reviewer. I've reviewed 20 submissions. Here are my reviews.

46:53 And you can go and look at those and be like, this person does really nice reviews. This is actually

46:59 really high quality, good insights from this person. And, you know, there is some work

47:05 already going on in publishing to make reviews a credit-worthy activity.

47:12 I mean, people write it on their resumes. They'll say, I review for, you know, ApJ or something,

47:17 but you can't actually prove that. Unless you're the editor and you're like,

47:22 no, you don't review for me, you would never know. So yeah, I think there's

47:26 a lot to be said for being open, and there's a lot to be said for innovating

47:30 with cost models and pricing models. And so, yeah, we don't charge anything to submit

47:35 to JOSS. I don't think we have any interest in charging authors. We do, you know,

47:40 have some ideas about how the review that we do could be valuable in other academic

47:48 settings, like for other journals who want to get software reviewed, but that's just early-stage

47:53 conversation right now. But it's interesting.

47:56 Yeah. It's really, really neat. I feel like JOSS is open source and, you know,

48:02 2018 or 2016 business on the internet meets old, old business model. You know, it's just like,

48:09 wait, why is it done that way? Because it doesn't seem like it needs to be done that way.

48:13 So yeah, pretty fun. Right. Right. Yeah. So I do want to spend a little bit of time talking

48:18 about other stuff. So maybe we'll leave it there for JOSS. Okay. I just want to encourage people

48:23 who've worked on open source projects to either submit them or sign up to review, because I think

48:27 that would be cool. So let's talk about the Space Telescope Science Institute, where you work.

48:32 Right. So you've got two major new telescopes coming out that you mentioned at the top,

48:40 right? The James Webb Space Telescope, and what's the other one called? I forgot. It's like the whole sky.

48:45 Oh, so yeah. So, I mean, Hubble has been running for 25 years. We still operate that. And then there's

48:51 WFIRST, which is a wide-field infrared survey telescope. In fact, that's what the acronym

48:56 stands for. It's a mission that's currently having a little bit of a rocky

49:03 stage in Congress, because, you know, budgets are weird. And these projects span longer than

49:10 election cycles, which is dangerous. Yeah. It's so interesting. I've never had a job

49:17 where I've actually had to pay attention to politics daily. I now have that job, and it's

49:22 interesting. Also, not being American, I'm kind of learning about that world.

49:28 So wait, they could do that? Yeah. And so currently we're very

49:36 active on JWST, the James Webb Space Telescope, which is meant to fly June-ish 2020. So

49:46 these are, you know, long lifetimes. It turns out it takes a long time to both convince the

49:52 government to spend $9 billion, which is what JWST is going to cost, so that's a lot

49:58 obviously, and then you have to build it, and there's lots of novel

50:02 technology that's just never been developed before. And yeah, they're

50:07 complex and, you know, they take decades to build, it turns out. Yeah. So what's the

50:11 primary result expected from the James Webb one and then the wide-field one? Yeah. So,

50:17 I mean, for me, the most exciting thing about JWST is, it's going to,

50:25 so it's an infrared space telescope, and infrared light is different from optical in the

50:31 sense that it can kind of look further back in time, because infrared light isn't obscured by dust

50:37 in the galaxy and in the universe, or isn't obscured as much. And so it allows us to look back further,

50:43 and so look at the first light coming from the universe. So a period of time called

50:49 reionization when the universe kind of, when the sort of, you know, first atoms and,

50:55 were forming after the Big Bang. And so that's, you know, some hundreds of millions of years after the

51:01 Big Bang. JWST is going to be able to see the first galaxies and the first stars forming: the

51:07 assembly of the very first galaxies, as the stars become gravitationally bound.

51:12 And so that's very exciting for lots of reasons: understanding the earliest

51:18 phases of the universe. Another couple of big areas of science

51:24 highlights there: because of the infrared light, the ability not to be obscured

51:31 as much by dust. You can look deeper into places where stars and planets are being formed,

51:36 what generally get called protoplanetary environments. So before that,

51:42 when the stars are just, even actually before nuclear fusion has started and the star

51:47 hasn't actually turned on, you can probe those environments. So understanding how solar systems like

51:53 ours form is kind of a big theme. Right, because it takes a while for that stuff to build up

51:58 to get enough gravitational force to actually light up a star, right? Yeah. It's forming for a long time.

52:03 Yeah. And then the final big highlight for JWST is it's going to be

52:07 the first telescope that's really going to be able to look at the atmospheres of planets outside our

52:13 own solar system. Those are generally called exoplanets. So over the past, you know, five, 10 years,

52:19 the number of known planets going around stars other than our sun has grown from like two to, you know,

52:26 thousands. And we now think that most stars have planets, and there are pretty good reasons to

52:32 believe that most stars have rocky planets somewhat like Earth, you know, maybe not the same mass,

52:39 but places that might have an atmosphere. So JWST is going to be able to

52:45 look at the light passing through the atmosphere of those planets and actually characterize that.

52:52 So you can look for things like methane and ozone. And so one of the things that is exciting about

52:58 that is that you could look for exoplanets that have atmospheres that aren't in an equilibrium state,

53:03 as in, maybe they have life. So that's kind of exciting. So we're really at this point where we're beginning

53:09 to think about characterizing, you know, we've discovered all these exoplanets. Now we're going

53:13 to say, well, what are they like? And, I mean, this is a pretty exciting time

53:18 for everybody really.

53:20 Yeah. And it sounds like it's right up your alley as well. So what about the

53:25 Wide Field Infrared Survey? Yeah. When is it up? So it's a little bit later. So James Webb is 2020.

53:31 WFIRST is 2025. Yes.

53:34 Theoretically.

53:35 Yes.

53:35 Planned.

53:36 Maybe '26 now. Well, we'll see. So, yeah, WFIRST is fundamentally a different

53:42 type of telescope. About five years ago, the US government sort of,

53:49 somebody picked up a phone or emailed somebody at NASA and said, hey, we've got a couple of

53:53 spare space telescopes. Would you like one? And so this is, I forget which agency it is that

53:59 builds all the US spy telescopes, but they basically donated it. They said, we have this

54:04 telescope that's never flown. In fact, we've got two, but you probably don't need two. Do you want

54:09 this one? And it's kind of a bit like Hubble, in the sense that it has about a two and a

54:14 half meter mirror. And the goal is to do infrared again, so longer wavelength

54:21 than optical light, and do a large-scale survey of the sky. So one of the things about

54:26 building telescopes in space is that you don't have an atmosphere to look through.

54:31 And that turns out to be a big deal, because it means you get much better,

54:34 we astronomers call it seeing, but you get much better resolution. So the shape of the thing that

54:39 you see is not blurred by the atmosphere that you're looking through. So infrared,

54:45 infrared space telescopes particularly are very exciting, especially when you're doing a large

54:50 survey. So WFIRST is a survey telescope. Hubble and JWST are what are generally called, sort of,

54:57 well, they aren't survey telescopes. They're sort of point and shoot. They fix on a point, and

55:01 they'll stay there for maybe a long time to see farther into the past. Yeah. So WFIRST is exciting

55:06 because of the volume of data. So, you know, Hubble over the last 30 years has produced

55:11 something like a hundred terabytes of data. WFIRST will produce about five petabytes, which is a

55:17 not ridiculous amount of data, but it's enough to be interesting and requires some thought.

55:23 Yeah. I see a lot of interesting machine learning and image AI type stuff being applied there.

55:28 Right. And so WFIRST has a number of key science goals. Again,

55:33 exoplanets feature heavily there, especially what's called microlensing. So

55:41 when a planet passes in front of a star, you get a slight increase in the brightness

55:46 because of the microlensing effect of the planet, which just sounds crazy,

55:52 but it's great. Yeah, the bend in spacetime curves the light. And dark energy,

55:58 which is this not very well understood, in fact pretty poorly understood, you know,

56:03 component of the universe. So understanding, in cosmology terms, how

56:09 the universe kind of works. And you need very large samples

56:15 of galaxies, and looking at supernovae and their distances and how they go off and how they're

56:22 affected over cosmological distances. And then you need to do lots of shape

56:28 measurements of galaxies, and you need a lot of them, and you need

56:33 very high precision data. And so WFIRST is fundamentally a different type of telescope, but

56:39 yeah, it should be really interesting to see new science coming from this new type of telescope.
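The microlensing brightening mentioned above is commonly modeled, for a single lens, with the standard point-lens (Paczyński) magnification A(u) = (u² + 2) / (u·√(u² + 4)), where u is the lens-source separation in Einstein radii. Strictly, in a microlensing event a foreground star acts as the lens for a more distant background star, and a planet orbiting the lens adds a brief extra deviation on top of this smooth curve; this sketch models only the single-lens curve. The parameter values (impact parameter u0, crossing time t_e, sampling) are illustrative assumptions, not WFIRST numbers.

```python
import math


def magnification(u):
    """Point-lens (Paczynski) magnification at lens-source separation u,
    measured in Einstein radii. Always > 1, and diverges as u -> 0."""
    return (u * u + 2) / (u * math.sqrt(u * u + 4))


def light_curve(times, t0=0.0, u0=0.1, t_e=20.0):
    """Magnification over time for straight-line lens-source motion.

    t0: time of closest approach, u0: impact parameter in Einstein radii
    (must be > 0 here), t_e: Einstein-radius crossing time, in the same
    units as `times`.
    """
    curve = []
    for t in times:
        tau = (t - t0) / t_e
        u = math.sqrt(u0 * u0 + tau * tau)
        curve.append(magnification(u))
    return curve


days = list(range(-40, 41, 10))   # hypothetical sampling, in days
for t, a in zip(days, light_curve(days)):
    print(f"t = {t:+4d} d  magnification = {a:.3f}")
```

The curve is symmetric about t0 and peaks there; a smaller impact parameter u0 gives a taller, sharper peak, which is why high-magnification events are the most sensitive to a planet's extra blip.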

56:43 Yeah. Yeah.

56:45 Awesome. All right. Well, we could talk about space for hours, actually, but I want to be

56:50 cognizant of your time. There's one more thing you worked on that I thought was cool, and I want to give

56:54 you a chance to tell the world about it: Zooniverse. What's Zooniverse?

56:58 Yeah. So Zooniverse is a web-based platform for citizen science. So

57:04 citizen science is this idea where members of the public, citizens of the world, can help

57:11 solve real research problems. So it's basically a platform that brings together

57:17 people with research problems, generally academic research problems. So generally, professional

57:22 researchers have a problem where some part of their analysis or some part

57:29 of their research project requires a lot of, you know, human effort is probably the best

57:36 way to think of it. So maybe classifying images by their type, or pictures of galaxies by their shape.

57:43 And so Zooniverse is a platform for bringing together the people who have the problems

57:47 and members of the public who are interested in working on these problems.

57:50 That's cool. So you can go and say, hey, I'm interested in a project, maybe browse the existing projects,

57:55 and then learn how to participate.

57:56 That's right. I'm guessing there are about 50 projects listed there today.

58:01 I haven't been involved with Zooniverse day to day for about five years now,

58:05 but I was, I guess, the second hire on the project after they'd had this original success with a project

58:11 called Galaxy Zoo, which was taking a lot of images from a telescope called the Sloan

58:16 Digital Sky Survey, and looking at galaxies and making a judgment about their shape: whether they

58:21 had spiral arms, if they did, which way they were spinning, how many there were,

58:28 and that kind of thing. It was a one-off project that was very,

58:32 very successful. They secured some research funding to build out this approach to doing

58:40 science with the public, and Zooniverse was born out of that. So, yeah, we did a whole bunch

58:46 of stuff. As I say, I don't track it day to day these days, but we did crazy stuff

58:52 like looking at images from camera traps in the Serengeti, looking at

58:59 pictures of animals, doing things like tracing particle paths in particle

59:07 physics data, looking for new physics, and lots more looking at galaxies and gravitational

59:13 lensing. It was really broad, actually. It was really fun. Yeah, that sounds really fun, actually.

59:18 I don't know if this was part of it, but I knew there was this protein folding challenge where

59:24 they almost gamified it. Is it that kind of stuff? Yeah, that's right. So that

59:29 wasn't us, but it was definitely a similar idea. The idea being that

59:34 there are lots of people who are interested in science, but

59:39 aren't doing it day to day and are interested in contributing. And yeah. So there's a chance

59:44 to go help with some problem, and you don't need a PhD and a grant to do it. Sure. And, you

59:49 know, some of the best projects are ones where I think we didn't know at the

59:55 start they were going to be successful, but probably still my favorite project is this one called

59:59 Old Weather, which is probably the most boring title ever, as was pointed out to

01:00:04 me once. It was taking logbooks from Royal Navy ships in World War One,

01:00:11 where they recorded the weather. It turns out that six times a day the Royal Navy, and

01:00:17 actually lots of navies still do this now, would record the air pressure, the water

01:00:22 temperature, the atmospheric conditions, cloud coverage, that kind of thing, and just write

01:00:27 it down. And the handwriting is generally not very good, and the way it's laid out on the page is complex.

01:00:32 So you can do OCR on it and try to get a machine to read it, but you still need that sort

01:00:37 of context to then extract the data. But these logbooks are really cool. They've

01:00:43 got basically stories about what's going on on the ship. We made a website where people could

01:00:47 transcribe them, and people just got really into it. There was a guy, Lieutenant

01:00:53 Dolphin. I remember because his last name was Dolphin, like the, well, not a fish, the mammal.

01:01:00 Dolphin kept getting thrown off ships for being drunk and disorderly and getting reassigned. And they

01:01:07 tracked him over about 10 years on different Royal Navy ships. And there'd be a note from the captain

01:01:13 saying that Dolphin had been, you know, relieved of command and sent to another ship.

01:01:17 And somebody got interested in this person and followed him. And then,

01:01:22 you know, it turns out there were only about two major sea battles

01:01:27 in the First World War, the Battle of Jutland and the Battle of the Falklands.

01:01:31 Turns out I know about this stuff now. That's the other thing: it was fun just doing lots of

01:01:35 other people's research. But in these logbooks, you're watching the battle

01:01:39 happen. It's saying, you know, "spotted enemy battleship," "engaging,"

01:01:44 "collecting survivors," or "sinking," all this real-world stuff happening.

01:01:50 But at the same time, the data we were extracting got fed into climate

01:01:56 models. They do reconstructions of climate over historical times, because one of the

01:02:02 challenges in understanding climate change today is having a long enough baseline

01:02:07 to build models that can make good predictions for the future. And so,

01:02:11 Right. Right. Much of that's over land, right? Right. Because on land

01:02:15 you could dig down into the ice or whatever, but the water washes that away, right? It's gone.

01:02:20 Exactly. So this was a project we did with a bunch of meteorologists, and it was, yeah,

01:02:25 it was a lot of fun. For me, it was mostly a technology problem, building that

01:02:31 kind of infrastructure, but it was fun to be involved in lots of people's research

01:02:36 as well. All right, Arfon, I think we're going to have to leave it there. Maybe just a quick

01:02:41 final call to action. People want to get involved with JOSS, or more generally with all the stuff we've been

01:02:46 speaking about. If people would like to help review or want to learn more, I think the URL will

01:02:51 probably be in your show notes, but it's joss.theoj.org. Yeah, we'd love to have

01:02:57 your help. It's been really good to chat with you and share what we're up to. Thanks for the

01:03:01 opportunity. Absolutely. Thanks for sharing your story. It's cool work, and keep it up. Yeah.

01:03:05 Thank you. All right, bye. Take care. This has been another episode of Talk Python to Me.

01:03:11 Today's guest was Arfon Smith, and this episode has been brought to you by ActiveState and Rollbar.

01:03:17 ActiveState gives you a faster way to build and secure open source runtimes from your first line of code

01:03:25 through to production. Check it out at talkpython.fm/activestate. Rollbar takes the pain out of

01:03:32 errors. They give you the context and insight you need to quickly locate and fix errors that might have gone

01:03:38 unnoticed until your users complain, of course. As Talk Python to Me listeners, track a ridiculous

01:03:43 number of errors for free at rollbar.com/talkpythontome. Want to level up your Python? If you're just

01:03:50 getting started, try my Python Jumpstart by Building 10 Apps or our brand new 100 Days of Code in Python.

01:03:57 And if you're interested in more than one course, be sure to check out the Everything Bundle. It's like

01:04:01 a subscription that never expires. Be sure to subscribe to the show, open your favorite podcatcher and search

01:04:06 for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play

01:04:12 feed at /play and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy.

01:04:19 Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.

01:04:24 Thank you.

