
#157: The Journal of Open Source Software Transcript

Recorded on Friday, Apr 6, 2018.

00:00 One of the hottest areas of growth for Python is in the scientific and data science communities.

00:04 But if that work is done in an academic or research setting, it can be very hard to get

00:09 proper credit for it. You have to write full-on peer-reviewed articles. That's where Arfon Smith

00:15 and JOSS, the Journal of Open Source Software, come in. Here, developers, scientists, or other

00:20 research-oriented folks can submit their software as a brief paper. Join us on this episode to learn

00:26 all about that and Arfon's work with some of the most cutting-edge projects in astronomy

00:31 at the Space Telescope Science Institute. This is Talk Python to Me, episode 157, recorded

00:37 April 6, 2018.

00:53 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

00:58 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter

01:03 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm

01:08 and follow the show on Twitter via @talkpython. This episode is brought to you by ActiveState

01:13 and Rollbar. Please check out what they're both offering during their segments. It really

01:17 helps support the show. Arfon, welcome to Talk Python.

01:20 Thank you for having me. It's really wonderful to have you here. I'm super excited to talk

01:25 about JOSS and this whole open journal of open source computing and scientific computing. I

01:31 think what you guys are doing there is really wonderful, and I think it'll open up some

01:34 possibilities and opportunities for a lot of listeners that they maybe weren't aware of.

01:38 Great. Yeah. I mean, it's a fun project, and so it's always fun to talk about it with new

01:44 people.

01:44 Should be great. But before we get to it, let's start with your story. How did you get into

01:48 programming?

01:48 Yeah. So I'm definitely not a professionally trained programmer. Let's put it that way.

01:54 So I have a background in chemistry as an undergraduate, and then actually I have a PhD in astrochemistry,

02:01 which is kind of like doing chemistry with big telescopes, so looking at gas and dust in space.

02:07 You're one of these people that can look at stuff like 25 light years away or something,

02:11 and go, oh, that probably has this element in the atmosphere.

02:13 Yeah. Yeah.

02:14 Yeah. Basically. I went into astrochemistry. I did a PhD, in fact,

02:22 because I really just didn't know what else to do, which sounds like an awful idea, but

02:26 it's how it happened. And I also wasn't that interested in chemistry, so I decided to go towards

02:33 astrochemistry, where you don't actually have to be that good at chemistry, it turns out,

02:37 because most astronomers don't know anything about chemistry, so you can be quite successful

02:42 with a little bit.

02:43 I absolutely love chemistry, but I'm scared of doing chemistry. Like when I pick up, say,

02:49 benzene or something, and they're like, oh, yeah, that's carcinogenic. And if it gets on your skin,

02:54 it'll soak through, so don't do that.

02:55 I know.

02:56 Yeah.

02:56 This really freaks me out to do this stuff, even though it's cool.

02:58 Yeah. So I had a lot of time in the lab as an undergraduate, and actually, it means I'm really

03:04 averse to precision anything now. So precision baking, in fact, baking, which is really just

03:10 chemistry, I hate it, and it drives my wife crazy. I just don't like measuring anything. She's a very

03:17 good cook and just won't ever have me in the kitchen, because I'm so averse to anything precise

03:22 involving ingredients now. And I blame my undergraduate.

03:26 Yeah.

03:27 Yeah.

03:28 So, yeah, sorry. So I did some Fortran programming. My first exposure to programming

03:35 was Fortran as an undergraduate, which is still very popular in chemistry, computational chemistry

03:42 especially, because it's got a lot of very fast kind of numerical routines. If you need to work out

03:47 how an electron is interacting with another electron, you need fast maths to do that.

03:52 And then during my PhD, like lots of people, I had to do some data analysis and had a reasonable

04:00 amount of data to process. So I started with scripting languages. So the first language I really

04:05 picked up was Perl. And that was really, you know, that was because that was what was on the shelf in

04:12 the office. And this would be like 2002 or something. So, you know, that was probably a reasonable choice

04:19 then, I guess. And so, you know, a mixture of Perl, some C, some Fortran. And then I went to this

04:26 course run by the library, by what was called information services, which is library science,

04:32 and did a "web programming" course, as it was called, and learned about HTML and stuff. And I thought that

04:40 was really exciting. And learned about iframes and then came back to my office and said, hey, iframes are really cool.

04:47 They were like, never use iframes. I was like, really? Oh, they're so awesome.

04:50 They seem amazing.

04:52 Yeah, they seem amazing.

04:53 And so then I started to pick up just the very slightest touch of PHP. But actually,

05:01 that was at a time when this framework Ruby on Rails was in a sort of beta phase,

05:07 I guess, 2004 or 2005. And a friend of mine, who was actually a legitimately very talented

05:13 programmer, I think, and, you know, had coded since he was a kid, was like, oh, you should totally look at

05:19 this Rails thing if you're interested in sort of web application development.

05:21 Put down the PHP.

05:23 Yeah, yeah, yeah, for sure. So I started using Rails, and at the time,

05:28 I didn't really even know I was using Ruby, I guess, and then had a few years just kind of

05:34 building, toying around with stuff, and then read this book called Ruby for Rails, written by a guy

05:43 called David Black, who's big in the Ruby community. He was famously, you know,

05:48 on IRC when David Heinemeier Hansson was learning Ruby and building Rails and stuff. And

05:55 it really explains how this framework you're using

06:00 is actually just Ruby syntax. It's just a DSL. And that was really enlightening for me.

06:06 And actually, that meant that by the end of my PhD, I realized I was much more interested in

06:11 programming than in any of the science I was doing. I took that as a strong signal I should get

06:16 out of academia. Yeah, I received the same signal, by the way.

06:19 Yeah, well, it's just, you just start to realize that some people read the literature a lot more

06:23 than you and just know more. And I was like, how do you know about that result? And they're like,

06:28 well, I read papers. I'm like, huh, I'm not really doing that very much. And that seems like a bad sign.

06:33 But I was reading a lot of programming books. So I just sort of gracefully exited with

06:38 my PhD at the end of my studies and actually went and had a year working in bioinformatics,

06:45 which is where Ruby is very big as well. So knowing Ruby there was actually

06:48 kind of a big deal. That's kind of the go-to scripting language for bioinformatics,

06:53 certainly in the mid-2000s.

06:55 I think, you know, that sounds really interesting. And I think what you're doing

06:59 today is, well, I'm just so excited to be able to talk with you about it a little later in the

07:04 show. So tell people where you've gotten to today.

07:07 Yeah. So today I work at a place called Space Telescope Science Institute, which is in Baltimore,

07:13 on the US East Coast. We were actually set up to fly and operate the Hubble

07:19 Space Telescope. So that was something like 30 years ago that the Institute was created.

07:25 And we're actually a nonprofit government contractor, so, you know, NASA still pays us to operate

07:32 Hubble. And we're currently developing all the sort of ground systems, data management systems for the

07:39 James Webb Space Telescope, which is the kind of next big flagship mission for NASA.

07:43 And so we build a lot of core infrastructure for data processing,

07:48 which is a lot of the work that I oversee here. We call it data management. And we also build a lot

07:56 of community tools, which are all pretty much exclusively these days built on the sort of

08:01 scientific Python. So my position is being in charge of a lot of the time and effort that we spend on scientific

08:07 Python, but full disclosure, I have never written a single line of Python in my life. So that's kind of

08:12 interesting for me. But I know a reasonable amount about open source and that kind of stuff. So I feel

08:17 qualified, but it's interesting that sometimes I write pseudocode, and I'm pretty sure it would

08:22 never compile. In fact, I've been told as much. So, you know, yes, I'm not a Python expert by any

08:29 stretch, although I should probably learn sometime.

08:32 Yeah, I think it's really interesting how Python is really becoming highly used in this open source

08:37 science space. And it seems to really be something adopted by the various telescopes,

08:43 right? That was like a big theme at the conference last year.

08:46 I think at PyCon last year, a guy called Jake VanderPlas gave a keynote.

08:50 Yeah. Yeah, he gave a great keynote.

08:52 Yeah. And it was something like the unreasonable effectiveness of Python for science or something.

08:57 It was kind of a fun title. And yeah, he just really sketched out what I would say now,

09:04 which is that there's this really deep set of libraries that you can use for numerical

09:11 scientific computing in Python. And then, of course, if you need C bindings or

09:17 whatever, you can do that, too. So it really goes very deep. And

09:22 now there's this overwhelming quantity of core libraries out there.

09:30 So then, for example, we have a lot of core contributors for a project

09:35 called Astropy, which is a very popular library in astronomy and astrophysics. And, you know,

09:40 that builds upon SciPy and NumPy and obviously Python, and so, you know,

09:46 we're contributing to that ecosystem. We had people here a few years ago who were very active

09:53 on things like Matplotlib, that kind of thing. And so, you know, the Institute's

09:58 credentials in the Python community are pretty deep, actually. People like Perry Greenfield,

10:04 who's still here, was really one of the key people that actually introduced the astronomical

10:10 community to Python, and it's kind of interesting to reflect on the

10:15 fact that sometimes big changes come from just one or two people just deciding that they're going to

10:20 make a change. Right from the ground up. Yeah. Yeah. And so I feel very lucky that, you know, I can go and

10:26 he's like four doors down from me. I can go and just ask him questions about, hey, why? Why is it this

10:31 way? And he's like, let's talk about that. You know, he's got so much context. It's fantastic.

10:35 That's probably a good segue to talking about the journal that you're the

10:40 chief editor of, the Journal of Open Source Software, JOSS.

10:44 That's right. Yeah. So JOSS is, well, I'll give you the one-line description. I'd like to

10:52 call it a developer-friendly journal, and so we should probably talk more about what I

10:58 mean by that. But JOSS is a journal that tries to do all the right things in terms of being a legitimate

11:05 academic journal. And it's surprising how the establishment, I would say, at some level makes

11:11 that look very hard and very complex. And it's actually really not. You have to do some of the

11:15 right things, like register with the Library of Congress and get an ISSN and things like

11:20 that. I mean, there's weird stuff that you just genuinely wouldn't know. But it's like

11:25 five things you need to do, not 50. And we publish papers about open source software with a scientific

11:32 goal, where the software is science-focused or, I should say, research-focused. And so generally, that means

11:38 academics who are writing software submit to us. And there are a couple of things that are kind of

11:42 important about it. One is that we review primarily for the quality of the software submission. So we're

11:51 actually not doing a big review of a paper. There does need to be a paper, but it's generally very short. In

11:58 fact, we encourage it to be short. So JOSS papers are generally less than two sides of A4 or US letter,

12:04 however you print them out; they're really genuinely short. And the submission format is Markdown

12:11 and BibTeX, and we use Pandoc to compile them. And the submission and review,

12:18 the whole editorial process, happens on GitHub, in a public

12:25 reviews repository. So it's sort of interesting and weird in some of the

12:31 things we do. But in the end, we try to do all the right things; for example, we give a DOI,

12:37 which is kind of a weird URL shortener that academics use, and which means that citations

12:43 to other work can be indexed.
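A DOI is a persistent identifier made of a registrant prefix that begins with `10.`, a slash, and a suffix; the public resolver at https://doi.org/ redirects it to wherever the object currently lives (for a JOSS software archive, that is the Zenodo record mentioned later in the conversation). A minimal Python sketch of turning a DOI into a resolvable link, assuming only the standard library; the function name `doi_to_url` and the DOI `10.1234/example.5678` are made up for illustration:

```python
from urllib.parse import quote

# Public DOI resolver; it redirects to the object's current location.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Hypothetical helper: turn a bare or 'doi:'-prefixed DOI into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # DOI names start with the "10." directory indicator and contain a prefix/suffix slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unsafe characters but keep '/', which separates prefix from suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
```

The point of the indirection is the one made in the episode: a citation stores the DOI, not a hosting URL, so the link keeps working if the code moves.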

12:46 Right. One of the issues that comes up around that is, if I wrote, say, a paper published

12:53 in a high end journal, and it references some package I depend upon that generated my results,

12:59 the owner of that package could just be having a bad day and just go, I'm deleting this GitHub repo,

13:04 and it's gone, right? And so this DOI is sort of a, almost like code in escrow type of thing,

13:10 right?

13:10 Yeah, so there are actually two DOIs that get created when a JOSS paper

13:17 gets published. We make an archive of the software, or actually we request that the author makes an archive

13:23 of the software. There are tools for this, like one called Figshare, and there's another one called

13:32 Zenodo, which is run by people at CERN, the people who do the computing infrastructure

13:39 for the Large Hadron Collider. And what they do is they set up a webhook. If you do

13:45 this from a GitHub repository, you basically configure this add-on. And when you do a

13:51 release on GitHub, it makes an archive; it takes a snapshot of that code. And it actually doesn't include

13:56 the Git history, which, weirdly, you might actually legitimately want,

14:00 but it takes it like a tarball from GitHub, archives it, and gives it a DOI. So the DOI points to Zenodo

14:07 then, but Zenodo also has a copy of the code. So if you or I decided to, you know, rage quit

14:14 open source or something, or just got really burnt out, actually, that's not rage quitting at all, that's just

14:20 legitimately deciding to disappear off the planet, the code is still available. So when you submit to JOSS,

14:27 one of the last steps is, when the review is complete and the changes have been requested and

14:32 made to the satisfaction of the reviewer, we ask the author to make an archive of the

14:38 code. And then the paper also gets a DOI. So then when people want to cite that package,

14:45 we encourage them to cite the paper, and then the paper connects to the archive of the code, if that makes

14:51 sense. So there's some guarantee that in the future, if you stumbled across this paper, then you

14:56 should still be able to find the source code, even if it's not on GitHub or GitHub doesn't exist or

15:01 something. Right, right, right. You know, I think there's a lot happening around there, and

15:05 I don't want to go too deep down that hole, but even if you have the source code, that doesn't

15:11 necessarily mean it's saved for all time. Right. So maybe it runs on a certain flavor of

15:17 Linux that has a certain version of some internal bits that it works on. And if that is gone, right,

15:23 there are layers outside just the software. There are the versions of Python, if it

15:27 were based on Python, right; there are whole layers of this, and, you know, things like containers,

15:32 like Docker and whatnot, are like interesting players in the space as well.

15:36 Yeah. Yeah, for sure. And I think there's definitely more we could do there. One of the

15:41 things that have come out of the work we've done on JOSS is that we've got a fairly generic

15:48 set of tooling now. So we've got a fairly lightweight web application that allows

15:54 people to submit something for review. And then we've got an automated bot that's called

16:01 Whedon. Some of us are Firefly fans, I guess: Joss Whedon.

16:06 It's the @whedon handle on GitHub, which is kind of fun. And so that bot actually helps with a lot of

16:12 the editorial management. So a lot of that is sort of ChatOps, automated in GitHub issues.

16:18 And so that tool chain can actually be applied to other things. So one of the things that's coming

16:23 up is that JOSS has actually been forked to make a sister journal called the Journal of Open Source

16:28 Education, or JOSE. And that's using exactly the same tool chain. It's literally just a fork of

16:34 the code base. And so we've generalized that. We've definitely talked about

16:39 containers as something that's actually interesting to think about reviewing and saving, and

16:46 there's even been the idea of a Journal of Open Source Containers.

16:51 I'm not sure exactly what I think about that yet, because I actually think, to your point,

16:56 it might even just be better to say, well, if you've got a JOSS submission, really,

17:00 what we want you to do is have a supporting infrastructure piece, like a container, to make sure

17:04 that that software has some chance of running in the future, some increased longevity. But we haven't,

17:10 we haven't gone that far yet. But it's definitely interesting.

17:13 It's very interesting. I think it may also put extra pressure and friction, though,

17:18 on getting submissions.

17:20 Sure. Yeah, I mean, we're definitely not short on submissions. We've been going for a little

17:26 under two years now. And we're up to close to 300 submissions.

17:29 That's awesome.

17:30 Yeah, it's great. And it keeps me busy. And the editorial team, we've got a great team of editors.

17:36 So part of me thinks, huh, if we could slightly reduce the number of submissions, that'd be kind

17:40 of cool. Might help my Friday evenings. But no, no, you're absolutely right. It's not,

17:45 we don't want to raise the bar too high. We feel like we've got a pretty good kind of quality bar right

17:51 now. You know, it says it in the name, you have to use an open source license, not one that you've

17:56 made up, you know, one that's approved by the OSI.

17:58 An official one.

17:59 Yeah, yeah. You pick one of these 300, it turns out, or whatever. There are lots, but you

18:03 know, pick one. And then our review is primarily, you know, about the usability

18:08 of the software. We encourage people to have tests, ideally automated tests. Documentation is

18:15 a must. We sort of have acceptable, better, and best kinds of categories.

18:22 And, you know, one of the reasons we set up JOSS was that we felt like a lot of the software

18:29 that's in the academic literature, when people write software papers, which is a thing

18:33 outside of JOSS, like you write a paper about software to get some sort of career credit as

18:38 an academic, give people something to cite. Nobody ever looked at the software. The review

18:44 was always about the paper and never about the software. So we've turned it on its head. Most

18:48 of our review is about the software and not about the paper.

18:50 Yeah, I think that's right. I think that's the right way to do it. And like you said

18:54 at the beginning, your guidelines for the actual submission, the thing you accept, are

19:00 really simple. It's like an abstract and basically supporting materials and links to the software.

19:05 Absolutely.

19:05 So maybe it's worth talking about briefly: why does this exist? Because you mentioned

19:11 there were these other software-oriented journals. Pick an industry; there are 50 journals

19:17 in that industry. They're usually expensive. You've got to buy them. They're private. They go out

19:22 to like university libraries and professors and stuff like that.

19:26 For me, the number one motivation for JOSS is to find a way to credit people in academic settings,

19:34 or, in fact, research settings. The difference, and I don't know whether it's interesting, is

19:38 that academic research is sort of public, not-for-profit, while research is more

19:44 general and could include commercial activity, I guess. So people who are in a research

19:50 setting, who are writing software as part of their job, who are struggling to get career credit for

19:55 that. And that turns out to encompass a lot of people that I know. I guess technically, it probably

20:03 would have been me at one point, except I personally wasn't ever trying to

20:08 follow an academic career track, which relies on papers and that kind of thing.

20:13 Right. But this is basically the currency of professorships and tenure track positions, right?

20:19 Yeah, yeah. So the way that you get a job as an academic is you write papers, and then people cite

20:24 your work. And, you know, if you write enough papers, and you get enough citations, and then at some point,

20:31 a group of people in a room, they're generally called like a tenure committee, tenure review committee will

20:36 decide that you can have a permanent job in academia. And that's the golden ticket: tenure.

20:40 And the problem with that is that it's primarily, and in many universities

20:49 exclusively, based on papers. It does not take into account whether you give really good public talks,

20:57 which lots of universities would say is good; outreach is important. Or it does not take

21:01 into account the fact that you spent three years collecting a really valuable data set that lots of

21:07 people have used; data sets generally aren't credit-worthy. The only thing we have is

21:12 this one-dimensional credit model, which is papers and citations. And so the problem is,

21:18 if you write really high-quality software for a research setting, you might spend a significant

21:24 fraction of your time doing that. And if you spend so much time that your number of papers

21:28 suffers, then you're going to get dinged on that in terms of your career prospects. And so software

21:34 papers, so a paper that describes a piece of software, are a sort of understood hack on the current academic

21:43 system, except that software papers come with a bunch of problems. And JOSS tries to address a few of

21:49 those, one of which is, you know, if you want to write a paper about a piece of software,

21:52 you generally have to have supporting new research results, right? That's the hardest part.

21:58 I think it's incredibly tough. It's not enough to say I've built the most efficient, most awesome AI

22:04 framework for discovering exoplanets; you have to go and find exoplanets. And then, coincidentally,

22:11 you get to talk about how you did it.

22:13 Yeah, it just happens. I have this repository over here, which happens to be on GitHub with a license,

22:18 and you might want to use it, you know; it's like a byline of the paper. So I have a problem

22:22 with that. And I think it's especially bad for long-lived, so arguably successful, software,

22:30 right? Like, if there were only ever one release of a piece of software, you could say, okay, well,

22:36 you know, you probably built it when trying to do some research. So you should write a paper that

22:40 describes the software and some research results, the end. Okay, but what happens with version two?

22:45 Like, if you and I decide to work on a piece of software where you did version one, you probably

22:51 don't want to write another paper, because now people might cite the other paper, and you don't get

22:56 all the citations. Academics worry about citation dilution. It sounds like such a ridiculous thing, but it's real,

23:01 because it turns into this number called an h-index, which is a way of

23:08 trying to capture the impact of a researcher. So yeah,

23:14 with JOSS papers, the idea is that no results are actually permitted. It's not that we

23:20 don't need results; you're not allowed to put novel results in the paper, because we're not going to

23:24 review that; that's not what the reviewers are there to do. So that's really why we call

23:30 it developer-friendly, the idea being, if you have done the hard work to write this piece of

23:35 software, we don't want you to spend more than roughly an hour writing the paper to go with it. And that

23:42 turns out to be, you know, appealing to quite a lot of people.
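The h-index mentioned above has a simple definition: a researcher has index h if h of their papers have each been cited at least h times. A minimal Python sketch of that definition, with made-up citation counts; the function name `h_index` is ours for illustration, not from any particular library:

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

Because the metric only looks at per-paper citation counts, where citations for one piece of software land (one paper or several) changes the inputs, which is why researchers pay attention to it.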

23:45 Yeah, that's really awesome. And so people who have worked in any research area, like you said, not just

23:51 academics, can get credit for the software they have created, in terms of academic credit, right?

23:57 Yeah, the special coins you get at the university when you get cited, right? That type of currency.

24:02 Yeah. So actually, to that point, I was actually looking through some submissions

24:08 earlier today, and we do get a mixture of people submitting. Most of our authors are in an academic

24:16 setting, or in some kind of research institute, where papers legitimately count towards their

24:23 career progression. But we do have people in commercial companies as well, especially in the data science

24:30 ecosystem. I was looking today, and there was a paper for a scikit-learn-contrib

24:36 package. I think it was HDBSCAN, some new implementation of an

24:44 algorithm. And it was somebody at a university and somebody from Spotify, I think, or, no, Shopify,

24:51 actually, yeah, very similar. But it clearly looked like it was a data scientist

24:56 at a company who maybe doesn't care about that credit, but who maybe was in a university

25:03 setting at some point. I think, especially in data science, there are many people who go

25:09 a long way in the academic space, and then they go into commercial data science. And I feel

25:14 there's this interesting tension and trade-off in the whole data science space, in that the demand for

25:20 data scientists so strongly pulls people out of academia; they're like, we'll pay

25:27 you half a million dollars. Forget tenure, you can just, you know, you can do this, right? Yeah, yeah.

25:33 But there's there's still that that tie back to like, I work with this professor or this group,

25:37 and I'm still kind of helping them and they give me good ideas. And so I feel like maybe a lot of the

25:42 papers come from those sort of remaining ties. Yeah, some definitely do. I don't

25:49 think I have a good handle on how many. But I think, even if you've left academia,

25:57 the conventional way to share your ideas is still in the literature, publishing

26:03 papers. And so I think it's very natural for people to want to, to write a paper. And then, you know,

26:09 if we're like, well, here's a super short way of getting a paper.

26:12 Exactly. Well, especially if you've done the work, and we'll get to some of the details in a little bit. But I

26:18 feel like what you've done is you've come up with this concept of how you submit your stuff to the journal,

26:23 your software, and it pretty much just checks the box of here's how you run a good open source

26:28 project. Yeah, it has a good open source license, it has documentation, it has tests,

26:32 it's hosted somewhere like GitHub, etc., etc., right?

26:36 Yeah, so when I was working at GitHub, I really learned a lot about how, you know, successful

26:42 projects come into existence, and what some of the key things are that you need. And so

26:47 really, you know, there's a lot of material out there now. But certainly,

26:51 four and a half years ago, this idea of what healthy open source looked like,

26:56 or what a successful project looked like, wasn't written down that much. And so

27:00 the team I was on at GitHub was trying to create some of that shared understanding

27:07 in the community. And actually, while I was there, I was thinking about journals, and had already been

27:14 talking to some commercial publishers who were asking me about GitHub. So I was sort of helping

27:21 them understand open source, but I was representing the company. And then, in

27:27 my final year at the company, I just sort of figured, you know what, none of the conversations I've

27:32 had have been very satisfying. They're not getting it; they're

27:37 kind of doing the wrong thing. And I just started hacking on some code. I was like, you know,

27:41 I think I could do this. And I think it'd be super easy. And if you assume that the review could happen

27:46 in an issue, and, you know, submission is just creating an issue. And like, I realized, I mean, we have a

27:52 strong GitHub dependency via the API. But I just realized that I thought I kind of knew enough

27:59 to do it better myself. So then I just decided to go for it.

28:05 This portion of Talk Python to Me is brought to you by ActiveState. ActiveState gives you a faster way

28:10 to build and secure open source runtimes from your first line of code through to production. Every

28:15 second you spend building your Python distro or trying to secure your Python programs is time away

28:20 from doing the work you love. Tired of resolving dependencies or making sure you tick off all the

28:25 security boxes when you ship to production? With ActiveState, you can focus on your code and leave the

28:30 open source to them. Your teams can standardize with their Python builds for your specific use.

28:36 You'll have less friction in the development cycle. And that means you can deliver apps faster. If you

28:41 need to manage your apps in production, they even give you a unique server side way to verify your Python

28:46 applications at runtime. You can bake security right into your Python products without impacting performance.

28:51 Cut the hours wasted building your distro, finding the right package, or making sure you tick off all

28:56 the security boxes when you ship to production. Go faster, spend more time doing the work you love,

29:00 and comply with your enterprise needs for security. Try them and see why their distribution was chosen

29:06 by IBM, Microsoft, NASA, Siemens, PepsiCo, and others. Join millions of developers who trust ActiveState to

29:13 build their open source language distros. Visit talkpython.fm/ActiveState and cut the time

29:19 configuring and securing your Python builds. That's talkpython.fm/ActiveState.

29:25 Let's talk a little bit about the whole process. Actually, let's touch first on the kinds

29:32 of projects that come up here. So people listening might be thinking, well, I've done some

29:35 stuff. It's kind of research. It's a project. Would it fit? So let me just read the quick

29:40 descriptions of a couple of recent things you guys accepted. One is NiaPy, N-I-A-P-y, and it's a

29:48 microframework for building and using nature-inspired algorithms in Python. So that's pretty cool.

29:55 There's pynucastro, which is Python interfaces for nuclear reaction rate databases,

30:02 including JINA stuff. There's a bunch of these. They're not super large scale. It seems a lot of

30:11 them are like, I need to get to this data or I need to do this calculation and nothing quite works, so

30:16 here's the bridge that I had to build for myself.

30:19 Yeah. So we get, I mean, there are a few different categories of software that we get. Actually,

30:22 there are a bunch. I mean, those two, I'm pretty sure I didn't edit either of those, because,

30:27 you know, one of the problems we have is, you know, we get submissions and I'm like, I have no idea.

30:31 We ask the authors to submit a summary of their software

30:38 for a generalist audience. And I read them, and I'm like,

30:43 I have no idea what this software does. I literally do not understand any of these sentences.

30:47 Thankfully, we've got, I think, 16 editors now, and somebody will be like, oh yeah, I know this

30:53 stuff, or I know enough that I can help edit this. And then we have a pretty good reviewer pool now.

31:00 It's, I think, well over 200 people who volunteered to review for the journal. And so we just have

31:07 their language expertise, their subject expertise, and their GitHub handles. So we just

31:13 ping them when the submission is made.

31:15 I think that works really well. And you do have a real rockstar cast of supporting editors. I mean,

31:20 you look through there and there are a bunch of big names, including Jake VanderPlas.

31:24 Yep.

31:24 Kathryn Huff, who does a bunch of the nuclear stuff. She probably handled that one

31:28 that I just mentioned.

31:30 Yeah.

31:30 And the reason I learned about you guys is I was actually one of the reviewers on a thing called

31:36 BATMAN.

31:36 Ah.

31:37 Statistical analysis for expensive computer codes made easy.

31:41 Fantastic. I thought I recognized your face. There you go.

31:44 Yeah.

31:45 Yeah.

31:45 See?

31:45 I didn't make that connection.

31:47 It's from my GitHub. Yeah.

31:48 Yeah.

31:48 That's awesome.

31:49 Thank you for your time.

31:50 Yeah.

31:50 Absolutely.

31:51 Absolutely.

31:51 So one of the things I thought might be fun, and one of the reasons I wanted to

31:56 feature this, is twofold. One, if you're working in research,

32:02 say you're a grad student or even an undergrad, and you've got some kind of interesting open source

32:07 software, that would be a good thing to submit. That would be great. But there are many

32:11 people who ask me, hey, I'm really just getting started.

32:13 I want to contribute to open source. But you can't just drop in on Django and start

32:18 adding to Django, because it's a mature, highly polished, well, maybe not exactly, but it's not new

32:25 greenfield stuff. It's very sensitive to change, and it's very nuanced. Whereas I think

32:31 becoming a reviewer, for example, might be a really nice way to start being part of open source if you're

32:36 like a student or something.

32:38 For sure.

32:38 And you're trying to get it on your resume for when you get out of school or whatever.

32:42 Yeah, I mean, one of the things that I think is true, and I only have anecdotal evidence,

32:47 but I'm going to believe it because it supports what I want to believe, is that people seem to

32:52 genuinely enjoy both the review process and being reviewed. Authors and reviewers both seem to,

33:01 well, we could probably do a sentiment analysis of all the comments or something, because they're all public.

33:06 But, you know, I ask people, would you mind reviewing this? And they say, I'd love to.

33:12 I'm like, really? Okay. I've had people email me and say, why haven't you given me anything

33:16 to review yet? And I'm like, I don't know, your name just hasn't come up yet. And I do worry

33:22 that sometimes I just go to people I know who would be good for this. So one of the things

33:28 I'd like to automate is reviewer suggestions. We have this big list. It's in a spreadsheet.

33:32 I feel like it's something that our bot could do. Yeah, maybe you could have tags, like,

33:37 these are their specialties. Yeah, exactly. And it would also keep track

33:42 of how many reviews you've done, because overtaxing people is one of the things you worry

33:47 about. This person's good. Yeah, yeah, exactly. So, but yeah, for sure. People seem to,

33:54 you know, there have been a number of times where reviewers say this was a really good experience because they learned

33:58 about this package. And really, all we want people to do as a reviewer is try to install

34:03 it, run it, and verify it. Right. So that involves reading the docs, looking at the code.

34:08 Ideally, you know, if you find that there are methods that are uncommented, quite often reviewers will

34:14 actually make pull requests against the thing they're reviewing, which is kind of nice. And there's

34:20 this, you know, everyone gets the idea that what we're trying to do is create a, you know, highly usable

34:26 piece of software that solves a real problem. So yeah, I think that's actually a

34:32 great suggestion for people who are looking to, you know, dip their toe, you know, take first steps

34:38 in open source. Joss is actually a great place to come and just read people's code.

34:45 Often, as you say, these are pretty small packages, and actually maybe even become a

34:51 contributor. Yeah, that's a great idea. Absolutely. For a lot of people getting into

34:56 open source, some of their first steps into any particular project are helping with documentation

35:00 or tutorials or examples. And this review process is similar to that.
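The automated reviewer suggestions Arfon would like the bot to make (matching on the specialties already recorded in the reviewer spreadsheet, while tracking how many reviews each person has so nobody gets overtaxed) could look roughly like the sketch below. This is not Joss's bot: the Reviewer fields, the scoring weights (2 points for a language match, 1 per shared topic), and the max_load cap are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reviewer:
    handle: str              # GitHub handle to ping (hypothetical data)
    languages: set[str]      # programming-language expertise
    topics: set[str]         # subject expertise
    active_reviews: int = 0  # reviews currently assigned to this person


def suggest_reviewers(pool: list[Reviewer], language: str, topics: set[str],
                      k: int = 3, max_load: int = 2) -> list[str]:
    """Rank reviewers by overlap with a submission, skipping overloaded ones."""
    candidates = []
    for r in pool:
        if r.active_reviews >= max_load:
            continue  # don't keep asking the same good people
        score = (2 if language in r.languages else 0) + len(topics & r.topics)
        if score > 0:
            candidates.append((score, r.active_reviews, r.handle))
    # Highest score first; ties go to the lighter current load, then alphabetically.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [handle for _, _, handle in candidates[:k]]


pool = [
    Reviewer("alice", {"python"}, {"astronomy"}),
    Reviewer("bob", {"python"}, {"astronomy", "statistics"}, active_reviews=1),
    Reviewer("carol", {"r"}, {"ecology"}),
    Reviewer("dave", {"python"}, {"astronomy"}, active_reviews=2),
]
print(suggest_reviewers(pool, "python", {"astronomy", "statistics"}))  # → ['bob', 'alice']
```

Folding the current load into both the filter and the tie-break, rather than ranking on expertise alone, is one simple way to address the overtaxing concern raised in the conversation.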

35:05 It is. Yeah. And, you know, we have a fairly prescriptive process, so we don't just

35:10 leave people with, what do you think about this piece of software? There are like 20

35:14 checkboxes that we ask people to go through. Yeah. Does it have an open source license? Yes or no? Yes.

35:19 Does it have tests? Yes or no? It's almost like a little checklist. I think there are actual

35:22 checkboxes in there. There are. There are. Yeah, yeah, yeah. Yeah, absolutely. Absolutely. So it's

35:27 checkbox-driven development or something. Nice. Yeah. So I recommend to people out there, if this

35:32 sounds super interesting, if maybe you're still in college or grad school and you're like, I want to

35:37 sort of, you know, start to build a resume around this kind of stuff, you know, becoming a reviewer

35:42 would be really easy. And people who are in school especially already have some specialty, so they

35:47 can help in that area. Yeah, yeah, for sure. Nice. And so Joss is actually one of four journals

35:53 under a larger banner called Open Journals at openjournals.org, right? Yeah, that's correct. Tell us

35:59 briefly about the others. Yeah. So the first one I set up is this one

36:06 called the Open Journal of Astrophysics, and that's my least successful. So, stepping

36:11 back, it would appear, based on the evidence, that I create journals in my spare time,

36:16 which is a horrible thing to do if you want to have any spare time in the future. So yeah,

36:21 this is like my problem in life, that I seem to make academic journals. And it's

36:27 a big time sink. So the Open Journal of Astrophysics was the first. We actually

36:34 published three papers. It's kind of paused right now, mostly because we don't really have a

36:39 particularly strong, or, well, no, we have a strong, but not very engaged, editorial board.

36:44 And the important thing to realize about a journal is it kind of lives and dies by the ability and

36:50 willingness of the editors, and the reviewers, to come together and review content.

36:55 So, you know, the Open Journal of Astrophysics is on a bit of a hiatus right now.

37:00 I don't know what will happen with that project. It's a nice project

37:05 in the sense that the model is to review papers that are already on the arXiv, which is a preprint

37:12 server where people put free and open copies of papers that they're going to submit to other

37:17 journals. So the idea being that you could just do the review in a browser. And there are

37:23 lots of other journals now following this model. They call them arXiv overlay

37:27 journals. So I'm sorry we weren't more successful with that, but, you know,

37:34 such is life. The second is the Journal of Brief Ideas. I built that with a guy called David

37:41 Harris, who's a physicist in Australia. And he just had this problem. He really wanted to

37:49 find a way to capture short ideas, good or bad, and have a way for people to just write them

37:57 down and say, you know, here's an idea. I'm not going to take it further or I don't have time right

38:03 now. And so it's more like a sort of diary of thoughts from the community. It could be

38:09 like the seed of a potential research project, but I'm not going to pursue it. That type of thing.

38:13 Right. And, you know, why that exists is kind of interesting. You know, academics live or die at

38:20 some level by the quality of their ideas and the novelty of their ideas, which is good and bad.

38:26 So academics, it turns out, care a lot about, you know, who had the idea first. And I really feel

38:33 like that's one thing, reflecting on my time in industry, that I find engineering cultures

38:39 care much less about. You're like, we care about building a good system, something reliable. I don't

38:45 care whose idea this was. This is just a good idea. Whereas academics are very keen,

38:49 very careful to attribute, you know, oh, it was this person, you know, Mike's idea first. And then

38:54 I took it forward, but he had the original. You know, you hear that a lot.

38:58 It's very careful.

38:59 It's the citations. It's the papers. And these are all driven by the first paper. It's the citations on every subsequent paper and all that. Right.

39:06 So the Journal of Brief Ideas is a way for people to say, I've got an idea. It could be good. It could be bad.

39:12 I want to write it down because I want to, I guess, I mean, you know, put a stake in the

39:18 ground. A little bit of a flag on this idea, if I ever come back to it. It's kind of fun.

39:22 And people do use it. And I don't edit that; David is the sole editor. It doesn't go

39:30 through review. The ideas are limited to 200 words, so really short. You can have 200 words

39:37 and a figure, and that's it. And then Joss is the third journal that I've

39:45 created. And that's by far the most successful. One of the things that

39:52 I did when building Joss was I really wanted to absolutely minimize the amount of new software I

39:58 wrote. One of the hard things about the Open Journal of Astrophysics was that there was quite a complex

40:03 web-based UI with PDF annotations and lots of bits of technology that I wasn't particularly well

40:10 versed in. And so Joss is super simple. It's like a web form that leverages the GitHub API to open an

40:16 issue. And that's it. Literally, that's it. And it's got a very small database behind it so that

40:21 it can render out the accepted papers. And then the Journal of Open Source Education is ready to go. In fact,

40:29 I think they're very close to accepting submissions. And that's not a journal that I'm going to be

40:35 involved in running day to day, but it's part of the family of journals. So really the two

40:43 that are most similar are Joss and JOSE. They are, you know, very, very similar journals using the

40:49 same model. Yeah, those are really cool. I like it.

40:55 This portion of Talk Python to Me has been brought to you by Rollbar. One of the frustrating things about

41:00 being a developer is dealing with errors, relying on users to report errors, digging through log files,

41:06 trying to debug issues, or getting millions of alerts, just flooding your inbox and ruining your day.

41:11 With Rollbar's full stack error monitoring, you get the context, insight and control you need to find and

41:16 fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start

41:23 tracking production errors and deployments in eight minutes or less. Are you considering self hosting

41:28 tools for security or compliance reasons? Then you should really check out Rollbar's compliance

41:32 SaaS option. Get advanced security features and meet compliance without the hassle of self hosting,

41:38 including HIPAA, ISO 27001, Privacy Shield and more. They'd love to give you a demo. Give Rollbar a try

41:46 today. Go to talkpython.fm/Rollbar and check them out.

41:51 So let's just briefly compare and contrast. Most academic articles are written

41:59 and published in high-end, very private, cloistered sort of journals, right? Like

42:06 JAMA, the Journal of the American Medical Association, or JRME for education. And you can't just

42:13 easily go get them. The papers are often not available on the internet.

42:17 They're really packaged away just for a few folks to get to, which I think is very odd because so much

42:23 of the research is paid for by the National Science Foundation or the National Institutes of Health or

42:27 whatever. So basically, the public pays for this research and then the results of it are hidden away

42:34 from public view. Right. Yes.

42:37 So yeah, this is very much not like what you guys are doing.

42:41 No, I mean, everything you said is true. I think, you know, there's a growing interest in what's

42:47 generally termed open access publishing, which means that once something is accepted and is in

42:52 the journal, it's available for all to read. But right now, a lot of the business models of academic publishing

42:59 rely on journal subscriptions. That's when your library or you as an individual buy access to these

43:06 papers, and that can run to, you know,

43:12 enormous amounts of money, you know, single universities paying millions of dollars a year

43:16 to publishers, just to gain access to, hilariously or disgracefully, the papers that

43:23 their own academics have written. So, you know. Yeah, exactly. Messed up, actually.

43:29 And you as an academic secure public funding, often, for research. You then do the

43:36 research, you give the copyright for your research to the journal, which then puts it behind a paywall

43:43 and sells it back to your university and to the public. So, you know, there's a lot wrong with that.

43:47 To be fair, the journals would say, well, we add a lot of value. We bring peer review to the process.

43:53 We, you know, make the papers; we format them nicely; we run,

44:00 we maintain quality at some level, and much of that is true. But the cost is still pretty

44:07 high. And I think there's a lot of interest in low-cost publishing these days. And that doesn't

44:12 mean low quality. It just means, how low can that cost go, at some level. And so our running costs,

44:19 if you ignore people's time, which I'm going to because we're all volunteers, are something

44:25 around $4 per paper in production costs. And most of that cost is,

44:32 you know, a small web server for running the web app, and the fact that

44:38 we have to pay subscription fees to get the DOIs. So it costs us about $1.50 for each

44:45 DOI. And we have to pay a membership fee to this organization called Crossref to

44:51 be able to continue to generate those DOIs.

44:55 Right. That dramatically changes the structure. And, you know, I know that at the academic

45:01 journals, it's often professors, who are not paid by the journal in any way or form, who are asked to volunteer

45:07 in the same role as the reviewers here. So it's not like they're paying huge sums to the

45:13 reviewers. No, no, no, no. I mean, yeah. It's crazy. Yeah. So,

45:17 you know, there's so much to say about publishing. And, you know,

45:24 one thing that's kind of interesting is that peer review, which we see as this, you know, pinnacle of

45:29 quality, as a process is actually pretty new. Peer review has only existed for about 50 years,

45:35 full stop. Most journals just didn't have it. And famously, people like Einstein

45:40 would write a letter to the editor saying, oh, you know, here's my new theory of

45:44 special relativity, and there is nobody qualified in the world to review this, so you must publish it

45:49 right now. And they'd be like, yeah, okay. You know, I think you're right. Seems legit,

45:53 but like you just didn't get a review. It just got published. And so peer review is, you know,

45:58 important, but generally, as you say, people aren't paid for it. I think

46:02 almost exclusively people aren't paid for it. And so it's part of your contribution

46:07 to the academic ecosystem; it's part of your job as an academic to review. And

46:14 that's understood. We have the same model. We have people review, and we don't pay them.

46:19 And I don't think we have any interest in paying people to review. That would be weird, given

46:24 that we have no money anyway. But what we do have, I would say, is openness. Our reviews

46:30 are open. A lot of peer review in academic journals is closed. You

46:36 don't know who's reviewing your work; it's anonymized. I feel like that openness incentivizes good

46:42 behavior, and actually quality. I hope that one day somebody

46:49 will be able to say, I am a Joss reviewer. I've reviewed 20 submissions. Here are my reviews.

46:53 And you can go and look at those and be like, this person does really nice reviews. This is actually

46:59 really high quality, good insights from this person. And, you know, there is some work

47:05 already going on in publishing to make reviewing a credit-worthy activity.

47:12 I mean, people write it on their resumes. They'll say, I review for, you know, ApJ or something,

47:17 but you can't actually prove that. Unless you're the editor and you're like,

47:22 no, you don't review for me, you would never know. So yeah, I think there's

47:26 a lot to be said for being open, and there's a lot to be said for innovating

47:30 with cost models and pricing models. And so, yeah, we don't charge anything to submit

47:35 to Joss. I don't think we have any interest in charging authors. We do, you know,

47:40 have some ideas about how the review that we do could be valuable in other academic

47:48 settings, like for other journals who want to get software review, but that's just at an early

47:53 stage of conversation right now. But it's interesting.

47:56 Yeah. Yeah. It's really, really neat. I feel like Joss is open source and, you know,

48:02 2016-to-2018 business on the internet meets the old, old business model. You know, it's just like,

48:09 wait, why is it done that way? Because it doesn't seem like it needs to be done that way.

48:13 So yeah, pretty fun. Right. Right. Yeah. So I do want to spend a little bit of time talking

48:18 about other stuff. So maybe we'll leave it there for Joss. Okay. I just want to encourage people

48:23 who've worked on open source projects to either submit them or sign up to review, because I think

48:27 that would be cool. So let's talk about the Space Telescope Science Institute, where you work.

48:32 Right. So you've got two major new telescopes coming out that you mentioned at the top,

48:40 right? The James Webb Space Telescope. And what's the other one called? I forgot. It's like the whole-sky one.

48:45 Oh, yeah. So, I mean, Hubble has been running for 25 years. We still operate that. And then there's

48:51 WFIRST, which is the Wide Field Infrared Survey Telescope. In fact, that's what the acronym

48:56 stands for. It's a mission that's currently having a little bit of a rocky

49:03 time in Congress, because, you know, budgets are weird. And these projects span longer than

49:10 election cycles, which is dangerous. Yeah. It's so interesting. I've never had a job

49:17 where I've actually had to pay attention to politics daily. I now have that job, and it's

49:22 interesting. Also, not being American, I'm kind of learning about that world.

49:28 So wait, they can do that? Yeah. And so, you know, currently we're very

49:36 active on JWST, the James Webb Space Telescope, which is meant to fly June-ish 2020. So

49:46 these are, you know, long-lived projects. It takes a long time, it turns out, both to convince the

49:52 government to spend $9 billion, which is what JWST is going to cost, so that's a lot,

49:58 obviously, and then you have to build it, and, you know, there's lots of novel

50:02 technology that's just never been developed before. And they're

50:07 complex and, you know, they take decades to build, it turns out. Yeah. So what's the

50:11 primary result expected from the James Webb one and then the wide-field one? Yeah. So,

50:17 I mean, I think for me, the most exciting thing about JWST is, it's going to,

50:25 so it's an infrared space telescope, and infrared light is different from optical in the

50:31 sense that it can kind of look further back in time because infrared light isn't obscured by dust

50:37 in the galaxy and in the universe or isn't obscured as much. And so it allows us to look back further.

50:43 And so we can look at the first light coming from the universe. So a period of time called

50:49 reionization, when the universe kind of, when the first stars and galaxies

50:55 were forming after the Big Bang. And so that's, you know, some hundreds of millions of years after the

51:01 Big Bang. JWST is going to be able to see the first galaxies and the first stars forming, and the

51:07 assembly of the very first galaxies, as the stars become sort of gravitationally bound.

51:12 And so that's very exciting for lots of reasons, understanding the earliest

51:18 phases of the universe. Another couple of big areas of science

51:24 highlights: the ability, because of the infrared light, to not be obscured

51:31 as much by dust. You can look deeper into places where stars and planets are being formed,

51:36 what generally get called protoplanetary environments. So before that,

51:42 when the stars are just forming, actually before nuclear fusion has started and the star

51:47 hasn't actually turned on, you can probe those environments. So understanding how solar systems like

51:53 ours form is kind of a big theme. Right, because it takes a while for that stuff to build up,

51:58 to get enough gravitational force to actually light up a star, right? Yeah. It's forming for a long time.

52:03 Yeah. And then the final big highlight for JWST is it's going to be

52:07 the first telescope that really is going to be able to look at the atmospheres of planets outside our

52:13 own solar system. Those are generally called exoplanets. So over the past, you know, five, 10 years,

52:19 the number of planets going around stars other than our sun has grown from like two to, you know,

52:26 thousands. And we now think that most stars have planets, and there are pretty good reasons to

52:32 believe that most stars have rocky planets somewhat like Earth, you know, maybe not the same mass, but

52:39 places that might have an atmosphere. So JWST is going to be able to

52:45 look at the light passing through the atmosphere of those planets and actually characterize that.

52:52 So you can look for things like methane and ozone. And one of the things that is exciting about

52:58 that is that you could look for exoplanets that have atmospheres that aren't in an equilibrium state,

53:03 as in, maybe they have life. So that's kind of exciting. So we're really at this point where we're beginning

53:09 to think about characterizing, you know, we've discovered all these exoplanets. Now we're going

53:13 to say, well, what are they like? And, I mean, this is a pretty exciting time

53:18 for everybody really.

53:20 Yeah. And it sounds like it's right up your alley as well. So what about the

53:25 Wide Field Infrared Survey Telescope? Yeah. When is it up? So it's a little bit later. So James Webb is 2020.

53:31 WFIRST is 2025. Yes.

53:34 Theoretically.

53:35 Yes.

53:35 Planned.

53:36 Maybe '26 now. Well, we'll see. So, yeah, WFIRST is a fundamentally different

53:42 type of telescope. About five years ago, the US government sort of,

53:49 somebody picked up a phone or emailed somebody at NASA and said, hey, we've got a couple of

53:53 spare space telescopes. Would you like one? I forget which agency it is that

53:59 builds all the US spy telescopes, but they basically donated one. They said, we have this

54:04 telescope that's never flown. In fact, we've got two, but you probably don't need two. Do you want

54:09 this one? And it's kind of a bit like Hubble, in the sense that it's about a

54:14 two-and-a-half-meter mirror. And the goal is to do infrared again, so longer wavelength than

54:21 optical light, and to do a large-scale survey of the sky. One of the things about

54:26 building telescopes in space is that you don't have an atmosphere to look through.

54:31 And that turns out to be a big deal, because it means you get much better,

54:34 we astronomers call it seeing, but you get much better resolution. So the shape of the thing that

54:39 you see is not blurred by the atmosphere that you're looking through. So infrared,

54:45 infrared space telescopes are particularly exciting, especially when you're doing a large

54:50 survey. So WFIRST is a survey telescope. Hubble and JWST are what are generally called, sort of,

54:57 well, they aren't survey telescopes. They're sort of point-and-shoot. They fix on a point and

55:01 they'll stay there for maybe a long time to see farther into the past. Yeah. So WFIRST is exciting

55:06 because of the volume of data. So, you know, Hubble over the last 30 years has produced

55:11 something like a hundred terabytes of data. WFIRST will produce about five petabytes, which is

55:17 not a ridiculous amount of data, but it's enough to be interesting and requires some thought.

55:23 Yeah. I see a lot of interesting machine learning and image, AI-type stuff being applied there.

55:28 Right. And so WFIRST has a number of key science goals. Again,

55:33 exoplanets feature heavily there, especially what's called microlensing. So

55:41 when a planet passes in front of a star, you get a slight increase in the brightness

55:46 because of the microlensing effect of the planet, which just sounds crazy,

55:52 but it's great. Yeah. The bend in space-time curves the light to come over. And dark energy,

55:58 which is this not very well understood, in fact pretty poorly understood, you know,

56:03 component of the universe. So understanding, in cosmology terms, how

56:09 the universe kind of works. And you need very large samples

56:15 of galaxies, looking at supernovae and distances and how they go off and how they're

56:22 affected over cosmological distances. And then you need to do lots of shape

56:28 measurements of galaxies. You need a lot of them, and you need

56:33 very high precision data. And so WFIRST fundamentally is a different type of telescope, but it's,

56:39 yeah, it should be really interesting to see new science coming from this new type of telescope.

56:43 Yeah. Yeah.

56:45 Awesome. All right. Well, we could talk about space for hours, actually, but I want to be

56:50 cognizant of your time. One more thing that you worked on that I thought was cool, just to give

56:54 you a chance to tell the world about it, is Zooniverse. What's Zooniverse?

56:58 Yeah. So Zooniverse is a web-based platform for citizen science, which is,

57:04 so citizen science is this idea where members of the public, citizens of the world, can help

57:11 solve real research problems. So it basically is a platform that brings together

57:17 people with research problems, generally academic research problems. So generally professional

57:22 researchers have a problem where some part of their analysis, or some part

57:29 of their research project, requires a lot of, you know, human effort is probably the best

57:36 way to think of it. So maybe classifying images by their type, or pictures of galaxies by their shape.

57:43 And so Zooniverse is a sort of a platform for bringing together the people who have the problems

57:47 and members of the public who are interested in working on these problems.

57:50 That's cool. So you can go and say, Hey, I'm interested in a project and maybe browse the existing projects

57:55 and then you'll learn how to participate.

57:56 That's right. So there's probably about, I'm guessing there's about 50 projects there listed today.

58:01 I haven't been involved day to day with Zooniverse for about five years now,

58:05 but I was, I guess, the second hire on the project after they'd had this original success with a project

58:11 called Galaxy Zoo, which was taking a lot of images from a telescope called the Sloan

58:16 Digital Sky Survey, and looking at galaxies and making a judgment about their shape: whether they

58:21 had spiral arms, if they did, which way they were spinning, how many there were,

58:28 and that kind of thing. It was a one-off project that was very,

58:32 very successful. They secured some research funding to build out this approach to doing

58:40 science with the public, and Zooniverse was born out of that. So, yeah, we did a whole bunch

58:46 of stuff. As I say, I don't track it day to day these days, but we did crazy stuff

58:52 like looking at images from camera traps in the Serengeti, looking at

58:59 pictures of animals, doing things like tracing particle paths in particle

59:07 physics data, looking for new physics, lots more looking at galaxies and gravitational

59:13 lensing. It was really broad, actually. It was really fun. Yeah, that sounds really fun, actually.

59:18 I don't know if this was part of it, but I knew there was this protein folding challenge where

59:24 they almost gamified it. Is it like that kind of stuff? Yeah, that's right. So that

59:29 wasn't us, but it was definitely a similar idea. And so, you know, the idea being that there are just

59:34 lots of people who are interested in science but

59:39 aren't doing it day to day and are interested in contributing. And, yeah. So there's a chance

59:44 to go help with some problem, and you don't need a PhD and a grant to do it. Sure. And you

59:49 know, some of the best projects are ones where, I think, we didn't know at the

59:55 start they were going to be successful, but I think probably still my favorite project is this one called

59:59 Old Weather, which is probably the most boring title ever, as was pointed out to

01:00:04 me once. It's taking logbooks from Royal Navy ships in World War One,

01:00:11 where they recorded the weather. And so it turns out that six times a day, the Royal Navy, and

01:00:17 actually lots of navies do now still, you know, they record, you know, the air pressure, the water

01:00:22 temperature, the atmospheric conditions, cloud coverage, that kind of thing, and just write

01:00:27 it down. And the handwriting is generally not very good. The way it's laid out on the page is complex.

01:00:32 And so you can do OCR on it, and try and get a machine to read it, but you still need that sort

01:00:37 of context to then extract the data. But these logbooks are really cool. They've got

01:00:43 basically stories about what's going on on the ship. We made a website where people could

01:00:47 transcribe them, and people just got really into this and following along. There was a guy, Lieutenant

01:00:53 Dolphin. I remember because his last name's Dolphin, like the, the fish, or is it the mammal?

01:01:00 And Dolphin kept getting thrown off ships for being drunk and disorderly and getting reassigned. And they

01:01:07 found him over like 10 years on different Royal Navy ships. And there'd be a note from the captain

01:01:13 saying that Dolphin's been, you know, relieved of command and sent to another ship.

01:01:17 And somebody got interested in this person and followed him. And then,

01:01:22 you know, it turns out there were only, like, two major sea battles

01:01:27 in the First World War, the Battle of Jutland and the Battle of the Falklands.

01:01:31 Turns out I know about this stuff now. That's the other thing. It was fun just doing lots of

01:01:35 other people's research, but, you know, in these logbooks you're watching the battle

01:01:39 happen. It's saying, you know, "spotted enemy battleship," "engaging,"

01:01:44 "collecting survivors," or "sinking," all real-world stuff happening.

01:01:50 But at the same time, the data that we're extracting get fed into these climate

01:01:56 models. So they do reconstructions of climate over historical times, because one of the

01:02:02 challenges in sort of understanding climate change today is actually having a long enough baseline

01:02:07 to build models that can actually make good predictions for the future. And so,

01:02:11 Right. Right. Right. Much of that's over land, right? Right. Right. Because you have that,

01:02:15 you could dig down into the ice or whatever, but the water washes that away, right? It's gone.

01:02:20 Exactly. So this was a project we did with a bunch of meteorologists, and it was, yeah,

01:02:25 so it was a lot of fun. And, and for me, it was mostly a technology problem, building that

01:02:31 kind of infrastructure, but it was fun to like do lots of, or be involved in lots of people's research

01:02:36 as well. All right, Arvon, I think probably we're going to have to leave it there. Maybe just a quick

01:02:41 final call to action. If people want to get involved with Joss or, more generally, all the stuff we've been

01:02:46 speaking about, if people would like to help review or want to learn more, then I think the URL will

01:02:51 probably be in your show notes, but Joss is at joss.theoj.org. Yeah, we'd love to have

01:02:57 your help. It's been really good to chat with you and share what we're up to. Thanks for the

01:03:01 opportunity. Absolutely. Thanks for sharing your story. It's cool work, and keep it up. Yeah.

01:03:05 Thank you. All right, bye. Take care. This has been another episode of Talk Python to Me.

01:03:11 Today's guest was Arvon Smith, and this episode has been brought to you by ActiveState and Rollbar.

01:03:17 ActiveState gives you a faster way to build and secure open source runtimes from your first line of code

01:03:25 through to production. Check it out at talkpython.fm/activestate. Rollbar takes the pain out of

01:03:32 errors. They give you the context and insight you need to quickly locate and fix errors that might have gone

01:03:38 unnoticed until your users complain, of course. As Talk Python to Me listeners, track a ridiculous

01:03:43 number of errors for free at rollbar.com/talkpythontome. Want to level up your Python? If you're just

01:03:50 getting started, try my Python Jumpstart by Building 10 Apps course or our brand new 100 Days of Code in Python.

01:03:57 And if you're interested in more than one course, be sure to check out the Everything Bundle. It's like

01:04:01 a subscription that never expires. Be sure to subscribe to the show, open your favorite podcatcher and search

01:04:06 for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play

01:04:12 feed at /play and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy.

01:04:19 Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
