#157: The Journal of Open Source Software Transcript
00:00 Michael Kennedy: One of the hottest areas of growth for Python is in the scientific and data science communities, but if that work is done in an academic or research setting, it can be very hard to get proper credit for it. You have to write full-on, peer-reviewed articles. That's where Arfon Smith, and JOSS, or the Journal of Open Source Software, comes in. Here, developers, scientists, or other research-oriented folks can submit their software as a brief paper. Join us on this episode to learn all about that and Arfon's work with some of the most cutting-edge projects in astronomy at the Space Telescope Science Institute. This is Talk Python To Me, Episode 157, recorded April 6th, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by ActiveState and Rollbar. Please check out what they're both offering during their segments, it really helps support the show. Arfon, welcome to Talk Python.
01:20 Arfon Smith: Thank you for having me.
01:21 Michael Kennedy: It's really wonderful to have you here. I'm super excited to talk about JOSS and this whole open journal of open-source computing and scientific computing. I think what you guys are doing there is really wonderful and I think it will open up some possibilities and opportunities for a lot of listeners that maybe they weren't aware of.
01:38 Arfon Smith: Great, it's a fun project, and so it's always fun to talk about it with new people.
01:44 Michael Kennedy: Should be great, but before we get to it, let's start with your story. How did you get into programming?
01:48 Arfon Smith: I'm definitely not a professionally trained programmer, let's put it that way. I have a background in chemistry as an undergraduate, and actually, I have a Ph.D. in astrochemistry, which is kind of like doing chemistry with big telescopes, so looking at gas and dust in space.
02:07 Michael Kennedy: You're one of these people that can look at stuff, like, 25 light years away or something and go, oh, that probably has this element in the atmosphere.
02:13 Arfon Smith: Yeah, basically, I went into astrochemistry, I did a Ph.D. in fact because I really just didn't know what else to do, which sounds like an awful idea, but it's how it happened. I also wasn't that interested in chemistry so I decided to go towards astrochemistry, where you don't actually have to be that good at chemistry, it turns out, because most astronomers don't know anything about chemistry, so you can be quite successful with a little bit.
02:43 Michael Kennedy: I absolutely love chemistry, but I'm scared of doing chemistry. When I pick up, say benzene or something they're like, oh yeah that's carcinogenic and if it gets on your skin it'll soak through, so don't do that, you know?
02:55 Arfon Smith: I know, yeah.
02:56 Michael Kennedy: This is what really freaks me out to do this stuff, even though it's cool, I don't want to do it.
02:59 Arfon Smith: I had a lot of time in the lab as an undergraduate. Actually it means I'm really averse to precision anything now, so precision baking, in fact, baking, which is really just chemistry. I hate it and it drives my wife crazy. I just don't like measuring anything. She's a very good cook and just won't ever have me in the kitchen because I'm so averse to anything precise involving ingredients now, and I blame my undergraduate chemistry. So, yes, sorry. I did some FORTRAN programming; that was my first exposure to programming, as an undergraduate. FORTRAN is very popular in chemistry still, computational chemistry especially, because it's got a lot of very fast kind of numerical routines. If you need to work out how an electron is interacting with another electron, you need fast maths to do that. Then during my Ph.D., like lots of people, I had to do some data analysis and had a reasonable amount of data to process so started with scripting languages. So the first kind of language I really picked up was Perl. That was because that was what was on the shelf in the office, and this would be, like, 2002 or something. So that was probably a reasonable choice then, I guess. A mixture of Perl, some C, some FORTRAN, and then I went to this course run by the library, by what was called Information Services, I should have said Library Science, and did a web programming course, as it was called, and learned about HTML and stuff and thought that was really exciting, and learned about IFrames and came back to my office and said, "Hey, IFrames are really cool!" They were like, never use IFrames. I was like, really? They're so awesome.
04:51 Michael Kennedy: They seem amazing. Don't do that.
04:52 Arfon Smith: So then I started to pick up, had kind of the very slightest touch of PHP, but actually that was at a time when the framework Ruby on Rails was actually in sort of beta phase, I guess 2004 or '05. A friend of mine who is a legitimately very talented programmer, I think, and coded since he was a kid, was like, oh you should totally look at this Rails thing if you're interested in sort of web application development.
05:21 Michael Kennedy: Put down the PHP!
05:23 Arfon Smith: Yeah, yeah, yeah, for sure. So actually I started using Rails which, at the time, I didn't really even know I was sort of using Ruby, I guess, and then had a few years just kind of building, toying around with stuff, and then read this book called Ruby for Rails, written by a guy called David Black, who's big in the Ruby community. He's famously sort of, you know, was on IRC when David Heinemeier Hansson was learning Ruby and building Rails. It really explains how this framework you're using is actually just Ruby syntax and a DSL, and that was really enlightening for me. That meant that by the end of my Ph.D. I realized I was much more interested in programming than I was any of the science I was doing. I took that as a strong signal I should get out of academia.
06:17 Michael Kennedy: I received the same signal, by the way.
06:20 Arfon Smith: You just start to realize that some people read the literature a lot more than you and just know more, and I was like, how do you know about that result? They were like, well, I read papers. I was like, huh, I'm not really doing that very much. That seems like a bad sign. But I was reading a lot of programming books, so I just sort of gracefully exited with my Ph.D. at the end of my studies and actually went and had a year working in bioinformatics, which is where Ruby's very big as well. So actually knowing Ruby there was kind of a big deal. It was the kind of go-to scripting language for bioinformatics, certainly in the mid-2000s.
06:55 Michael Kennedy: I think that sounds really interesting and I think what you're doing today is actually... I'm just so excited to be able to talk to you a little bit later at the end of the show about it. So tell people where you've gotten to today.
07:06 Arfon Smith: Today I work at a place called Space Telescope Science Institute, which is in Baltimore, on the U.S. east coast. We were actually set up to fly and operate the Hubble Space Telescope, so that was something like 30 years ago that the institute was created. We're actually a nonprofit government contractor. NASA still pays us to operate Hubble, and we're currently developing all the ground systems, data management systems, for the James Webb Space Telescope, which is the next big flagship mission for NASA. We build a lot of core infrastructure for data processing, which is a lot of the work that I oversee here. We call it data management and we also build a lot of community tools which are all pretty much exclusively these days built on the scientific Python stack. So I'm in the position of being in charge of lots of the time and effort that we spend on scientific Python but, full disclosure, I've never written a single line of Python in my life. So that's kind of interesting for me. But I know a reasonable amount about open source and that kind of stuff, so I feel qualified, but it's interesting that sometimes I write pseudo-code, and I'm pretty sure it would never compile. In fact, I've been told as much. So I'm not a Python expert by any stretch. Although I should probably learn sometime.
08:32 Michael Kennedy: I think it's really interesting how Python is really becoming highly used in this open-source science space, and it seems to really be something adopted by the various telescopes, right? That was a big theme at the conference last year.
08:46 Arfon Smith: I think at PyCon last year a guy called Jake VanderPlas gave a keynote.
08:50 Michael Kennedy: Yeah, he gave a great keynote.
08:52 Arfon Smith: Yeah, and it was something like The Unreasonable Effectiveness of Python for Science or something. It was kind of a fun title. He really sketched out what I would say now, which is just this really deep set of libraries that you can use in numerical scientific computing in Python. Of course you can do things if you need to have C bindings or whatever, you can do that too. It goes very deep and really now there's this sort of overwhelming quantity of core libraries out there. So then we have a project where we have a lot of core contributors, a thing called Astropy, which is a very popular library in astronomy and astrophysics, and that builds upon SciPy and NumPy and obviously Python. We're contributing to that ecosystem. We had people here a few years ago who were very active on things like Matplotlib, that kind of thing. The institute's actual credentials in the Python community are pretty deep actually. People like Perry Greenfield, who's still here, was really one of the key people that actually introduced the astronomical community to Python. It's kind of interesting to reflect on the fact that sometimes big changes come from just one or two people deciding that they're going to make a change.
10:20 Michael Kennedy: Right, from the ground up, yeah.
10:23 Arfon Smith: So I feel very lucky that he's, like, four doors down from me, I can go and ask him questions about... hey, why is it this way? And he's like, well, let's talk about that. You know, he's got so much context, it's fantastic.
10:35 Michael Kennedy: That's probably a good segue to talking about the journal that you're the chief editor for.
10:41 Arfon Smith: Yes.
10:42 Michael Kennedy: Journal of Open Source Software, JOSS, right?
10:45 Arfon Smith: That's right, yeah.
10:46 Michael Kennedy: What is JOSS?
10:47 Arfon Smith: So JOSS is a, well, I'll give you the one-line description. I like to call it a developer-friendly journal. We should probably talk more about what I mean by that. JOSS is a journal that tries to do all the right things in terms of being a legitimate academic journal, and it's surprising how the establishment at some level makes that look very hard and very complex, when it's actually really not. You have to do some of the right things like register with the Library of Congress and get an ISSN and things like that. There's weird stuff that you genuinely wouldn't know, but it's like five things you need to do, not 50. We publish papers about open-source software with a scientific goal, where the software is science or research focused, I should say. Generally that means academics who are writing software submit to us. There's a couple of things that are kind of important about it. One is we review primarily for the quality of the software submissions. We're actually not doing a big review of a paper. There does need to be a paper. It's generally very short. In fact, we encourage it to be short, so JOSS papers are generally less than two sides, A4 or U.S. letter, however you print it out. They're really genuinely short. Our submission format is Markdown and BibTeX, and we use Pandoc to compile the things. The submission and review and the whole editorial process happens on GitHub, in a public reviews repository. So it's interesting and weird, some of the things we do, but at the end we try and do all the right things in terms of...we give it a DOI, which is a kind of weird URL shortener that academics use. You can index citations to other work.
12:46 Michael Kennedy: One of the issues that comes up around that is if I wrote, say, a paper published in a high-end journal and it references some package I depend upon that generated my results, the owner of that package could just be having a bad day and go, I'm deleting this GitHub repo, and it's gone, right? So this DOI is sort of a code in escrow type of thing, right?
13:11 Arfon Smith: Yeah, there's actually two DOIs that get created when a JOSS paper gets published. We make an archive of the software, or we request that the author makes an archive of the software, so there's tools like... A tool called Figshare, and there's another one called Zenodo, that is run by the people at CERN, the people who do the computing infrastructure for the Large Hadron Collider. What they do is they actually set up a WebHook if you do this from a GitHub repository, they basically configure this add-on. When you do a release on GitHub, it makes an archive. It takes a snapshot of that code and actually doesn't include the Git history which, weirdly, maybe you would want. You know, actually legitimately might want. But it takes a tarball from GitHub, archives it, and gives it a DOI. The DOI points to Zenodo, then, but Zenodo then also has a copy of the code. So yes, if you or I decided to rage quit open-source or something, or just get really burnt out. Actually, that's not rage quitting at all, that's just legitimately decided to disappear off the planet, the code is still available. So when you submit to JOSS and when you're accepted, one of the last steps is when the review is complete and the changes are being requested and made to the satisfaction of the reviewer, we ask the author to make an archive of the code and then the paper also gets a DOI. So then when people want to cite that package, we encourage them to cite the paper, and then the paper connects to the archive of the code, if that makes sense. So there's some guarantee that in the future if you stumbled across this paper, then you should still be able to find the source code even if it's not on GitHub or if GitHub doesn't exist or something.
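The snapshot step described above (a tarball of the released code, without the Git history) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Zenodo or GitHub actually build their archives; the function names, the example paths, and the choice of SHA-256 as a fingerprint are all assumptions made for this sketch.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def snapshot_release(source_dir: Path, archive_path: Path) -> str:
    """Write a .tar.gz of source_dir that omits .git; return its SHA-256 hex digest."""

    def skip_git(member: tarfile.TarInfo):
        # Returning None drops the member (and, for a directory, its contents).
        return None if ".git" in Path(member.name).parts else member

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name, filter=skip_git)
    return hashlib.sha256(archive_path.read_bytes()).hexdigest()


def list_members(archive_path: Path) -> list:
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Hypothetical project layout, created in a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "mypkg"
        (project / ".git").mkdir(parents=True)
        (project / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
        (project / "mypkg.py").write_text("print('hello')\n")

        archive = Path(tmp) / "mypkg-v1.0.0.tar.gz"
        digest = snapshot_release(project, archive)
        print(list_members(archive))
        print(digest)
```

The point of the sketch is the trade-off mentioned in the conversation: the archive is small and self-contained, but the commit history is not part of what gets preserved.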
15:01 Michael Kennedy: Right, right, right. You know, I think there's a lot of stuff happening around there and I don't want to go too deep down the hole of that, but even if you have the source code, that doesn't necessarily mean it's saved for all time, right? So maybe it runs on a certain flavor of Linux that has a certain version of some internal bits that it works on, and if that is gone...right? There's layers outside just the software. There's the versions of Python, if it were based on Python, right? There's whole layers of this. Things like containers, like Docker and whatnot, are interesting players in this space as well.
15:36 Arfon Smith: Yeah, yeah, for sure. I think there's definitely more we could do there. One of the things that's come out of the work we've done on JOSS is that we've got a fairly generic set of tooling now. So we've got a fairly lightweight web application that allows people to submit something for review, and then we've got an automated bot that's called Whedon. Some of us are Firefly fans, I guess--Joss Whedon. It's the Whedon handle on GitHub, which is kind of fun. That bot actually helps with a lot of the editorial management. So a lot of that is sort of ChatOps, kind of automated in GitHub issues. That toolchain can actually be applied to other things. One of the things that's coming up is JOSS has actually been forked to make a sister journal called Journal of Open Source Education, or JOSE, and that's actually using exactly the same toolchain. It's literally just a fork of the code base. We've generalized that. So you can imagine, we've definitely talked about containers as something that's interesting to think about reviewing and saving. There's been the idea of this sort of journal of open-source containers as a journal. I'm not sure exactly what I think about that yet, because I actually think to your point it might even just be better to say, well, if you've got a journal submission, really what we want you to do is have a supporting kind of infrastructure piece like a container to make sure that that software has some chance of running in the future with some increased longevity. But we haven't gotten that far yet, but it's definitely interesting.
17:13 Michael Kennedy: It's very interesting. I think it also may put extra pressure and friction, though, on getting submissions.
17:19 Arfon Smith: For sure. I mean we're definitely not short on submissions. We've been going for a little under two years now, and we're close to 300 submissions.
17:29 Michael Kennedy: That's awesome.
17:30 Arfon Smith: Yeah, it's great, and it keeps me busy and the editorial team. We've got a great team of editors. So part of me thinks, huh, if we could slightly reduce the number of submissions, that'd be kind of cool, it would help my Friday evenings. But no, no, you're absolutely right. We don't want to raise the bar too high. We feel like we've got a pretty good kind of quality bar right now. It says it in the name, you have to use an open-source license, not one that you've made up. You know, one that's approved by the OSI.
17:58 Michael Kennedy: An official one.
17:59 Arfon Smith: Yeah, yeah, you pick one of these 300, it turns out, whatever, but there are lots, but pick one. Then our reviews are primarily about usability of the software. We encourage people to have tests, ideally automated tests. Documentation is a must. We have acceptable, better, best categories. One of the reasons we set up JOSS was we felt like a lot of the software that's in the academic literature, when people write software papers, which is a thing outside of JOSS, like you write a paper about software to get some sort of career credit as an academic, give people something to cite. Nobody ever looked at the software. The review was always about the paper, never about the software, so we've turned it on its head. Most of our review is about the software and not about the paper.
18:50 Michael Kennedy: Yeah, I think that's right. I think that's the right way to do it. And like you said at the beginning, the actual submission, your guidelines for the thing you accept, is really simple. It's like an abstract and basically supporting materials and links to the software. So maybe it's worth talking about briefly, why does this exist? Because you mentioned there were these other software-oriented journals. Pick an industry, there's 50 journals in that industry. They're usually, like, expensive, you've got to buy them, they're private, they go out to university libraries and professors and stuff like that.
19:26 Arfon Smith: For me, the number one motivation for JOSS is to find a way to credit people in academic settings or, in fact, research settings. The difference, I don't know whether this is interesting, academic research being sort of public, not for profit, and research is more sort of general, you could include commercial activity, I guess. So people who are in a research setting, who are writing software as part of their job, who are struggling to get career credit for that. That turns out to encompass a lot of people that I know. I guess technically it probably would have been me at one point, except I don't actually, I personally wasn't ever trying to sort of follow an academic career track, which relied upon papers and that kind of thing.
20:14 Michael Kennedy: Right, but this is basically the currency of professorships and tenure-track positions, right?
20:19 Arfon Smith: Yeah, yeah, so the way that you get a job as an academic is you write papers, and then people cite your work, and if you write enough papers and you get enough citations, then at some point a group of people in a room, they're generally called like a tenure committee, a tenure review committee, will decide that you can have a permanent job in academia. That's like the golden ticket, tenure. The problem with that is that it's primarily and, in many universities, exclusively based on papers. It does not take into account whether you give really good public talks, which lots of universities would say is a good, like, outreach is important. Or it would not take into account the fact that you spent three years collecting a really valuable data set that lots of people have used. Like, data sets generally aren't credit worthy. Like the only thing we have, we have this one-dimensional kind of credit model, which is papers and citations. So the problem is if you write really high-quality software for a research setting, you might spend a significant fraction of your time doing that. If you spend so much time that your number of papers suffers, then you're going to get dinged on that in terms of your career prospects. So software papers, a paper that describes a piece of software, is a sort of understood hack on the current academic system, except that software papers come with a bunch of problems, and JOSS tries to address a few of those. One of which is if you want to write a paper about a piece of software, you generally have to have supporting new research results.
21:56 Michael Kennedy: Right, and that's the hardest part, I think.
21:58 Arfon Smith: It is, it's incredibly tough.
21:59 Michael Kennedy: It's not enough to say I've built the most efficient, most awesome AI framework for discovering exoplanets, you have to go and find exoplanets.
22:09 Arfon Smith: You do.
22:10 Michael Kennedy: Then coincidentally you get to talk about how you did it, right?
22:13 Arfon Smith: Yeah, it just happens I have this repository over here which happens to be on GitHub with a license, and you might want to use it. It's like a byline of the paper. So I have a problem with that. I think it especially is bad for long-lived, so arguably successful, software, right? Like, if there were only ever one release of a piece of software, you could say okay, well, you know, you probably built it when trying to do some research, so you should write a paper that describes the software and some research results, the end. But what happens on version two? Like, if you and I decide to work on a piece of software that you did version one, you probably don't want to write another paper, because now people might cite the other paper, and now you don't get... Academics worry about citation dilution. It sounds like such a ridiculous thing, but it's real. Because it turns into this number called an h-index, which is, well, a way of trying to parameterize, capture the impact of a researcher. So yes, I have...JOSS papers are, the idea is that no results are actually permitted. It's not that we don't need results, you're not allowed to put novel results in the paper. Because we're not going to review that. That's not what the reviewer is there to do. So that's really why we call it developer friendly, the idea being if you have done the hard work to write this piece of software, we don't want you to spend more than roughly an hour writing the paper to go with it. That turns out to be appealing to quite a lot of people.
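For readers unfamiliar with the h-index mentioned here: it is the largest number h such that a researcher has h papers with at least h citations each. The short sketch below computes it and uses made-up citation counts to show one way that splitting citations across two software papers can lower the number, which is the "citation dilution" worry. The numbers are purely illustrative.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Toy example: four papers, the last one being a single software paper.
one_software_paper = [5, 4, 4, 4]

# Same total citations, but the software paper's 4 citations are now split
# between a "version one" paper and a "version two" paper.
split_software_papers = [5, 4, 4, 2, 2]

print(h_index(one_software_paper))     # 4
print(h_index(split_software_papers))  # 3
```

Splitting does not always lower the h-index, but the example shows why authors of long-lived software may be reluctant to publish a new paper per major version.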
23:46 Michael Kennedy: Yeah, that's really awesome, and so people who have worked in any research area, like you said, not just academics, if they want credit for the software that they have created, in terms of academic credit--
23:57 Arfon Smith: Academic, yeah.
23:58 Michael Kennedy: The special coins you get at the university when you've been cited, right? That type of currency.
24:02 Arfon Smith: Yeah, and to that point, we do get, I was actually looking through some submissions earlier today, we do have mixtures of people submitting. Most of our authors are in an academic setting, so in some kind of research institute where papers legitimately count towards their sort of career progression. But we do have people in commercial companies as well, especially in the sort of data science ecosystem. I was looking today and there was a paper, it was a scikit-learn-contrib package. I think it was hdbscan, some new implementation of an algorithm, and it was somebody at a university and somebody from Spotify, I think.
24:50 Michael Kennedy: Nice.
24:51 Arfon Smith: No, Shopify, actually, yeah. Very similar. But it clearly looked like it was a data scientist at a company who maybe doesn't care about that credit, but also maybe was in a university setting at some point.
25:04 Michael Kennedy: I think especially in data science, I feel like there are many people who go a long way in the academic space, and then they go into commercial data science, and I feel there's this interesting tension and tradeoff in the whole data science space, in that the demand for data scientists so strongly motivates or pulls people out of academics. They're like, we'll pay you half a million dollars. Forget tenure, you can do this.
25:31 Arfon Smith: Yeah, yeah, and--
25:33 Michael Kennedy: But there's still that tie back to, like, I work with this professor or this group, and I'm still kind of helping them, and they give me good ideas. So I feel like maybe a lot of the papers come from that sort of remaining ties that these groups have.
25:46 Arfon Smith: Some definitely do. I don't think I have a good handle on how many. I think even if you've left academia, the conventional way to share your ideas is still in the literature, publishing papers. I think it's very natural for people to want to write a paper and then if we're like, well, here's a super-short way of getting a paper, then...
26:14 Michael Kennedy: Exactly. Well, especially if you've done the work. We'll get to some of the details in a little bit, but I feel like what you've done is you've come up with this concept of how you submit your stuff to the journal, your software, and it pretty much just checks the box of here's how you run a good open-source project.
26:29 Arfon Smith: Yeah.
26:29 Michael Kennedy: It has a good open-source license, it has documentation, it has tests, it's hosted somewhere like GitHub, etc. etc., right?
26:36 Arfon Smith: Yeah, so when I was working at GitHub, I really learned a lot about how successful projects come into existence and what are some of the key things that you need and so, really, there's a lot of material out there now, but certainly four and a half years ago this idea of what healthy open source looked like or what a successful project looked like wasn't written down that much. The team I was in at GitHub was trying to create some of that shared understanding in the community. And actually while I was there, I was thinking about journals and had already been talking to some other commercial publishers who were asking me about GitHub, so I was helping them understand open source, but I was representing the company. In my final year at the company, I figured, you know what, none of the conversations I've had have been very satisfying. They're not getting it. They're kind of doing the wrong thing. I just started hacking on some code, and I was like, you know, I think I could do this. I think it'd be super easy, if you assume the review could happen in an issue and, you know, submission is just creating an issue. We have a strong GitHub dependency via the API, but I realized that I kind of knew enough to do it better myself. So then I just decided to go for it.
28:05 Michael Kennedy: This portion of Talk Python to Me is brought to you by ActiveState. ActiveState gives you a faster way to build and secure open-source runtimes, from your first line of code through to production. Every second you spend building your Python distro or trying to secure your Python programs is time away from doing the work you love. Tired of resolving dependencies or making sure you tick off all the security boxes when you ship to production? With ActiveState, you can focus on your code and leave the open source to them. Your teams can standardize with their Python build for your specific use. You have less friction in the development cycle, and that means you can deliver apps faster. If you need to manage your apps in production, they even give you a unique, server-side way to verify your Python applications at runtime. You can bake security right into your Python products without impacting performance. Cut the hours wasted building your distro, finding the right package, or making sure you tick off all the security boxes when you ship to production. Go faster, spend more time doing the work you love, and comply with your enterprise needs for security. Try them and see why their distribution was chosen by IBM, Microsoft, NASA, Siemens, PepsiCo, and others. Join millions of developers who trust ActiveState to build their open-source language distros. Visit talkpython.fm/activestate and cut the time configuring and securing your Python build. That's talkpython.fm/activestate. Let's talk a little bit about the whole process. Actually, let's touch first on the kinds of projects that come up here. So people that are listening, they're like, well, I've done some stuff, it's kind of research, it's a project, would it fit? So let me just read the quick description of a couple recent things that you guys accepted. One is NiaPy, N-I-A Py, and it's a new microframework for building and using nature-inspired algorithms in Python. So that's pretty cool.
There's one, pynucastro, which is Python interfaces for nuclear reaction rate databases, including JINA stuff. There's a bunch of these. They're not super large scale. A lot of them are like, I need to get to this data, or I need to do this calculation and nothing quite works, so here's the bridge that I had to build for myself a lot of times.
30:19 Arfon Smith: Yep, yes. We get, I mean, there's a few different categories of software that we get. Actually, there's a bunch. Those two, I'm pretty sure I didn't edit either of those, 'cause one of the problems we have is we get submissions where I have no idea. We ask the authors to submit a summary of their software for a generalist audience, and I read them, and I'm like, I have no idea what this software does. I literally do not understand any of these sentences. Thankfully, we've got, I think 16 editors now, and somebody'll be like, oh yeah, I know this stuff, or I know enough that I can help edit this. We have a pretty good reviewer pool now. I think it's well over 200 people who volunteer to review for the journal, so we have their language expertise and their subject expertise and their GitHub handle, so we just ping them when a submission--
31:14 Michael Kennedy: Yeah, I think that works really well, and you do have a real rockstar cast of supporting editors. I mean you look through, there's a bunch of big names, including Jake VanderPlas.
31:24 Arfon Smith: Yep.
31:25 Michael Kennedy: Kathryn Huff, who does a bunch of the nuclear stuff. Maybe she probably reviewed that one.
31:28 Arfon Smith: Yeah, I think she did, yeah.
31:31 Michael Kennedy: And the reason I learned about you guys is I was one of the reviewers on a thing called BATMAN.
31:37 Arfon Smith: Ah!
31:37 Michael Kennedy: Statistical analysis for expensive computer codes made easy.
31:41 Arfon Smith: Fantastic. I thought I recognized your face, there you go. I didn't make that connection.
31:47 Michael Kennedy: Yeah, it's from my GitHub. That's awesome.
31:49 Arfon Smith: Thank you for your time.
31:50 Michael Kennedy: Absolutely. So one of the things I thought might be fun, and one of the reasons I wanted to feature this, is sort of twofold. One, if you're working in an area of either research or you're a grad student or even undergrad, and you've got some kind of interesting open-source software thing, that would be a good thing to submit, that would be great. But there are many people who ask me, hey, I'm really just getting started. I want to contribute to open source, but you can't just drop in on Django and just start adding to Django, because it's like an eight-year-old highly polished-- well, not exactly eight--but not new greenfield stuff. It's like very sensitive to change, and it's very nuanced, whereas I think becoming a reviewer, for example, might be a really nice way to start to be part of open source. If you're a student or something and you're trying to get it on your resume for when you get out of school, whatever.
32:41 Arfon Smith: Yeah, I mean one of the things... I mean, I think this is true, I only have anecdotal evidence, but I'm going to believe it, 'cause it supports what I want to believe, which is that people seem to genuinely enjoy both the review process and being reviewed. Both authors and reviewers seem to-- we probably could do a sentiment analysis of all the comments or something, as they're all public. But you know, we get people who say...I say, would you mind reviewing this? And...I'd love to! I went, really? Okay! I mean, I've had people email me and say, why haven't you given me anything to review yet? I don't know, I'm just kind of... Your name hasn't come up yet. I do worry that sometimes I go to people I know who would be good for this. One of the things I'd like to automate is reviewer suggestions. We have this big list that's in a spreadsheet. I feel like it's something that our bot could do.
33:36 Michael Kennedy: Maybe you could have tags, like these are their specialties.
33:38 Arfon Smith: Yeah, exactly, exactly. It would also keep track of how many reviews you've done. Because overtaxing people is one of the things I worry about.
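Editor's sketch: the reviewer-suggestion idea discussed here (specialty tags plus a count of reviews already done) could look roughly like the following. This is a minimal illustration under stated assumptions, not JOSS's actual bot; the `Reviewer` fields, the `suggest_reviewers` function, and the sample pool are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reviewer:
    handle: str                                   # GitHub handle (made-up sample data)
    languages: set = field(default_factory=set)   # programming languages they review
    topics: set = field(default_factory=set)      # subject-area tags
    reviews_done: int = 0                         # used to avoid overtaxing anyone


def suggest_reviewers(reviewers, language, topics, limit=3):
    """Rank reviewers by topic overlap, breaking ties by lightest load."""
    wanted = set(topics)
    candidates = [
        r for r in reviewers
        if language in r.languages and wanted & r.topics
    ]
    # More shared topics first; among equals, fewer completed reviews first.
    candidates.sort(
        key=lambda r: (-len(wanted & r.topics), r.reviews_done, r.handle)
    )
    return candidates[:limit]


pool = [
    Reviewer("alice", {"python"}, {"astronomy", "statistics"}, reviews_done=5),
    Reviewer("bob", {"python", "r"}, {"statistics"}, reviews_done=0),
    Reviewer("carol", {"julia"}, {"astronomy"}, reviews_done=1),
    Reviewer("dave", {"python"}, {"astronomy", "statistics"}, reviews_done=1),
]

picks = suggest_reviewers(pool, "python", ["astronomy", "statistics"])
print([r.handle for r in picks])  # ['dave', 'alice', 'bob']
```

Sorting on review count as the tie-breaker is one simple way to spread the load; a real system might also weigh recency or let reviewers opt out.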
33:48 Michael Kennedy: This person's good, they get everything.
33:49 Arfon Smith: Yeah, yeah, exactly. But yeah, for sure, people seem to... a number of times where reviewers are like, this was a really good experience, because I learned about this package. All really we want people to do as a reviewer is try and install it and run it and verify it, right? So that involves reading the docs, looking at the code, ideally if you find that there's methods that are uncommented, quite often reviewers will actually make pull requests against the thing they're reviewing, which is kind of nice, and everyone gets the idea that what we're trying to do is create a highly usable piece of software that solves a real problem. So yeah, that's a great suggestion, for people who are looking to dip their toe, take first steps in open source, JOSS is actually a great place to come and just read people's code. Often, as you say, these are pretty small packages. Actually maybe even become a contributor. Yeah, that's a great idea.
34:55 Michael Kennedy: Often a lot of people who are getting into open source, some of their first steps into any particular project is to help with documentation or tutorials or examples, and this review process is similar to that.
35:05 Arfon Smith: It is, yeah, and we have a fairly prescriptive process, so we don't leave people with just what do you think about this piece of software? There's like 20 checkboxes that we ask people to...
35:16 Michael Kennedy: Does it have an open-source license, yes or no? Does it have tests, yes or no? It's almost like a little checkbox. I think there are actual checkboxes in the...
35:24 Arfon Smith: Yeah, there are, absolutely, absolutely. We are checkbox-driven development or something.
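Editor's sketch: a checkbox-driven review like the one described can be represented as a Markdown task list, which GitHub renders as clickable boxes in an issue. The example below is an assumption-laden illustration: only the two questions mentioned in the conversation are included, and the helper names `render_checklist` and `progress` are hypothetical.

```python
import re

# Only the two questions quoted in the conversation; the real list is longer
# and is not reproduced here.
ITEMS = [
    "Does it have an open-source license?",
    "Does it have tests?",
]


def render_checklist(items):
    """Render items as unchecked Markdown task-list lines."""
    return "\n".join(f"- [ ] {item}" for item in items)


def progress(markdown):
    """Return (checked, total) for task-list boxes found in a Markdown body."""
    boxes = re.findall(r"^- \[( |x|X)\] ", markdown, flags=re.MULTILINE)
    checked = sum(1 for b in boxes if b.lower() == "x")
    return checked, len(boxes)


body = render_checklist(ITEMS)
# Simulate a reviewer ticking the second box.
body = body.replace("- [ ] Does it have tests?", "- [x] Does it have tests?")
print(progress(body))  # (1, 2)
```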
35:30 Michael Kennedy: I recommend to people out there if this sounds super interesting, if maybe you're still in college or grad school and you're like, I want to sort of start to build a resume around this kind of stuff, becoming a reviewer would be really easy. People who are especially in school, they already have some specialty, so they can help in that area.
35:48 Arfon Smith: Yeah, yep, for sure.
35:50 Michael Kennedy: So JOSS is actually one of four journals under a larger banner called just Open Journals at TheOJ.org, right?
35:57 Arfon Smith: Yep, that's correct.
35:58 Michael Kennedy: Tell us briefly about the others.
36:00 Arfon Smith: Yeah, so the first, I think the first one I set up was this one called the Open Journal of Astrophysics, and that's my least successful. Stepping back, it would appear, based on the evidence, I create journals in my spare time, which is a horrible thing to do if you want to have any spare time in the future. This is like my problem in life, that I seem to make academic journals, and it's a big time sink. So the Open Journal of Astrophysics was the first. We actually published three papers. It's kind of currently paused right now, mostly because we don't really have a particularly strong or--no, we have a strong but not very engaged editorial board. The important thing to realize about a journal is that it kind of lives and dies by the ability and willingness of the editors to do... and the reviewers to come together and review content. The Open Journal of Astrophysics is on a bit of a hiatus right now. I don't know what will happen with that project. It's a nice project in the sense that the model is to review papers that are already on the arXiv, which is a preprint server where people put free and open copies of papers that they're going to submit to other journals. The idea being that you can just do a sort of review in a browser. There's lots of other journals now following this model, they call them arXiv overlay journals. I'm sorry we weren't more successful with that, but such is life. The second is the Journal of Brief Ideas. I built that with a guy called David Harris, who's a physicist in Australia. And he has this problem. He just really wanted to find a way to capture short ideas, good or bad, and have a way for people to write them down and say here's an idea, I'm not going to take it further or I don't have time right now, and it's more like a sort of diary of thoughts from the community.
38:09 Michael Kennedy: It could be the seed of potential research projects, but I'm not going to pursue it? That type of thing?
38:14 Arfon Smith: Right, and why that exists is kind of interesting. Academics live or die at some level by the quality of their ideas and the novelty of their ideas, which is good and bad. Academics, it turns out, care a lot about who had the idea first, and I really feel like that's one thing, reflecting on time in industry, that I find engineering cultures care much less about. We care about building a good system, something reliable. I don't care whose idea this was, this is just a good idea. Whereas academics are very keen, very careful to award, oh it was this person, Mike's idea first, and then I took it forward, but he had the original... You know, you hear them a lot, being very careful--
38:59 Michael Kennedy: It's the citations, it's the papers, and these are all driven by the first paper, the citations on every subsequent paper, and all that, right?
39:06 Arfon Smith: The Journal of Brief Ideas is a way for people to say I've got an idea, it could be good, it could be bad. I want to write it down because I want to, what is it, put a stake in the ground.
39:18 Michael Kennedy: A little bit of a flag on this idea
39:20 Arfon Smith: Yeah, I think so.
39:20 Michael Kennedy: if I ever come back to it.
39:22 Arfon Smith: It's kind of fun. And people do use it, and I don't edit that. David is the sole editor. It doesn't go through a review. So they're limited to 200 words. These ideas are really short. You can have 200 words, and a figure... and that's it. And then JOSS is the third journal that I've created, and that's by far the most successful. At some level, one of the things that I did when building JOSS was I really wanted to absolutely minimize the amount of new software I wrote. One of the hard things about Open Journal of Astrophysics is it's quite a complex web-based UI with PDF annotations and lots of bits of technology that I wasn't particularly well versed in. JOSS is super simple. It's like a web form that leverages the GitHub API to open an issue, and that's it. Literally, that's it. It's got a very small database behind it so that it can render out the accepted papers. Then the Journal of Open Source Education is ready to go, in fact I think they're very close to accepting submissions, and that's not a journal that I'm going to be day-to-day involved in running, but it's part of the family of journals. The two that are most similar are JOSS and JOSE. They are very similar journals, using the same model.
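Editor's sketch: "a web form that leverages the GitHub API to open an issue" maps onto GitHub's REST endpoint for creating issues (`POST /repos/{owner}/{repo}/issues`). The snippet below only builds such a request and does not send it. The repository name, issue title, body text, and the `build_issue_request` helper are hypothetical placeholders, not details of how JOSS is actually implemented.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_issue_request(repo, title, body, token):
    """Build (but do not send) a request that would open an issue in `repo`."""
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/issues",
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_issue_request(
    repo="example-org/example-reviews",   # hypothetical repository
    title="[REVIEW]: example-package",    # hypothetical submission title
    body="Repository: https://example.org/example-package",
    token="<personal-access-token>",      # placeholder, never hard-code a real token
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```

Keeping the request construction separate from the network call makes the form handler easy to test without touching GitHub.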
40:51 Michael Kennedy: Those are really cool, I like it. This portion of Talk Python To Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors, relying on users to report errors, digging through log files trying to debug issues, or getting millions of alerts flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self-hosting tools for security or compliance reasons? Then you should really check out Rollbar's compliant SaaS option. Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more. They'd love to give you a demo. Give Rollbar a try today. Go to talkpython.fm/rollbar and check 'em out. Let's just talk real briefly about compare and contrast. Most of these articles are written and published in high-end, very private, cloistered sort of journals, right? Like JAMA for Journal of the American Medical Association, or JRME for education, and you can't just easily go get them, the papers are often not available on the internet. They're really packaged away just for a few folks to get to, which I think is very odd, because so much of the research is paid for by the National Science Foundation, or the National Institutes of Health, or whatever. So we, basically the public, pay for this research, and then the results of it are hidden away from public view.
42:36 Arfon Smith: Right, yes.
42:38 Michael Kennedy: This is very much not like what you guys are doing.
42:41 Arfon Smith: No, everything you said is true. I think there's a growing interest in what's generally termed open-access publishing, where once something is accepted and is in the journal, it's available for all to read. But right now, a lot of the business models of academic publishing rely on journal subscriptions, so your library or you as an individual buy access to these papers, and that's generally a journal subscription, and that can run to enormous amounts of money, single universities paying millions of dollars a year to publishers just to gain access to, hilariously or disgracefully, the papers that their academics have written. It's that messed up, actually.
43:29 Michael Kennedy: It is.
43:30 Arfon Smith: You as an academic secure public funding, often for research, you then do the research, you give your copyright to your research to the journal, that then puts it behind a paywall and sells it back to your university and the public. There's a lot wrong with that. To be fair, the journals would say well, we add a lot of value. We bring peer review to the process, we make the papers, we format them nicely, we maintain quality at some level, and much of that is true, but the cost is still pretty high. I think there's a lot of interest in low-cost publishing these days. And that doesn't mean low quality, it just means how low can that cost go at some level. So JOSS, our running costs, if you ignore people time, which I'm going to, because we're all volunteers, we're something around $4 per paper in production costs. Most of that cost is a small web server for running the web app, and the fact that we have to pay subscription fees to get the DOI. So it costs us about $1.50 for each DOI, and we have to pay a membership fee to this organization called Crossref to be able to continue to generate those DOIs.
44:55 Michael Kennedy: Right, that dramatically changes the structure, and I know that on the academic journals, it's often professors who are not paid by the journal in any way or form, asked to volunteer in the same role as the reviewers are here, so it's not like they're paying huge sums to the reviewers.
45:13 Arfon Smith: No, no, no, no, no, no, yeah, yeah.
45:16 Michael Kennedy: It's crazy.
45:16 Arfon Smith: There's so much to say about publishing. One thing that's kind of interesting is that peer review, which we see as this pinnacle of quality and process, is actually pretty new. Peer review has only existed for 50 years, just full stop. Most journals didn't have it. Famously, people like Einstein would write to the editor, and they'd send a letter saying here's my new theory of special relativity, and there's nobody qualified in the world to review this. So you must publish it right now. They'd be like, yeah, okay.
45:52 Michael Kennedy: I think you're right.
45:53 Arfon Smith: Seems legit. But it just didn't get a review, it just got published. So peer review is important, but generally, as you say, people aren't paid for it. I think almost exclusively people aren't paid for it. It's part of your contribution to the academic ecosystem--
46:11 Michael Kennedy: The society.
46:12 Arfon Smith: Yeah, it's part of the job as an academic to review, and that's understood. We have the same model. We ask people to review, we don't pay them, and I don't think we have any interest in paying people to review. That would be weird, given that we have no money anyway. But what we do have, I would say, is I think we, by being open, our reviews being open, a lot of peer review in academic journals is closed, you don't know who's reviewing your work, it's anonymized. I feel like that openness incentivizes good behavior and actually quality. I hope that one day somebody will be able to say I am a JOSS reviewer, I've reviewed 20 submissions, here are my reviews. You can go and look at those and be like, this person does really nice reviews. This is actually really high-quality, good insights from this person. There is some work already going on in publishing to make reviews a sort of credit-worthy activity. People write it on their resumes. They'll say, I review for, you know, ApJ or something. But you can't actually prove that unless you're the editor, and you're like, no, you don't review for me. You would never know. So there's a lot to be said for being open, and there's a lot to be said for innovating with cost models and pricing models. We don't charge anything to submit to JOSS. I don't think we have any interest in charging authors. We do have some ideas about how the review that we do could be valuable in other academic settings, like for other journals who want to get software review, but that's just early stages of conversation right now. But it's interesting.
47:56 Michael Kennedy: That's really, really neat. I feel like JOSS is open source, and 2018, or 2016, business on the internet meets old business model.
48:07 Arfon Smith: Yeah.
48:08 Michael Kennedy: It's like wait, why is it done that way? Because it doesn't seem like it needs to be done that way. So yeah, pretty fun. I do want to spend a little bit of time talking about your other stuff. Maybe we'll leave it there for JOSS for now. Just want to encourage people who've worked on open-source projects to either submit them or sign up for review, because I think that would be cool. So let's talk about the Space Telescope Science Institute, where you work, right? So you've got two major new telescopes coming out that you mentioned at the top, right? The James Webb Space Telescope, and what's the other one called? I forgot. It's like the whole sky...
48:45 Arfon Smith: Oh yeah, so I mean Hubble has been running for 25 years, we operate that still, and then there's WFIRST, which is the Wide Field Infrared Survey Telescope. In fact, that's what the acronym stands for. It's a mission that's currently having a little bit of a rocky stage in Congress because, you know, budgets are weird and...
49:08 Michael Kennedy: These projects span longer than election cycles, which is dangerous, right?
49:12 Arfon Smith: They do, yeah. It's so interesting to see... Like, I've never had a job where I've had to pay attention to politics daily. I now have that job, and it's interesting. It's also, not being American, it's kind of learning about that world.
49:29 Michael Kennedy: Wait, they could do that? Why do they do that?
49:32 Arfon Smith: Yeah, so currently very active, we're very active on JWST, James Webb Space Telescope, which is meant to fly June-ish 2020. So these are lifetime...it takes a long time, it turns out, to both convince the government to spend $9 billion, which is what JWST is going to cost, so that's a lot, obviously. And then you have to build it. There's lots of novel technology that's just never been developed before, and it's complex. They take decades to build, it turns out.
50:10 Michael Kennedy: What's the primary result expected from the James Webb one, and then the wide-field one?
50:18 Arfon Smith: JWST is...I think, for me, the most exciting thing about JWST is it's an infrared space telescope. Infrared light is different from optical in the sense that it can look further back in time because infrared light isn't obscured by dust in the galaxy and the universe, or isn't obscured as much, so it allows us to look back further. Look at the first light coming from the universe, so a period of time called re-ionization, when the universe kind of...when the first atoms were forming after the Big Bang. That's some hundreds of millions of years after the Big Bang. JWST is going to be able to see the first galaxies and the first stars forming, and that assembly of the very first galaxies, the stars become sort of gravitationally bound. So that's very exciting for lots of reasons, but understanding the earliest phases of the universe. Another couple of big areas, some science highlights there, are the ability because of the infrared light to be able to not be obscured as much because of dust, you can look deeper into places where stars and planets are being formed, so what gets called generally proto-planetary environments, so pre...before... when the stars...even before nuclear fusion has started and the star hasn't actually turned on. You can probe those environments. So understanding how solar systems like ours form is kind of a big theme.
51:55 Michael Kennedy: Right, because it takes a while for that stuff to build up to get enough gravitational force to actually light up a star, right?
52:01 Arfon Smith: Oh, for sure, yeah.
52:02 Michael Kennedy: It's formed for a long time.
52:03 Arfon Smith: And then the sort of final big highlight for JWST is it's going to be the first telescope that really is going to be able to look at the atmospheres of planets outside our own solar system. Those are generally called exoplanets. Over the past five, 10 years the number of planets going around stars other than our sun has grown from, like, two to thousands. And we now think that most stars have planets. And there's pretty good reasons to believe that most stars have rocky planets somewhat like Earth, not maybe the same mass, but have places that might have atmospheres. So JWST is going to be able to look at the light passing through the atmosphere of those planets and actually characterize that. So you can look for things like methane and ozone and one of the things that is exciting about that is that you could look for exoplanets that have atmospheres that aren't in an equilibrium state, as in maybe have life. So that's kind of exciting. We're really at this point where--
53:07 Michael Kennedy: Yeah, that's super exciting.
53:08 Arfon Smith: where we're beginning to think about characterizing... We've discovered all these exoplanets. Now we're going to say what are they like? This is kind of a pretty exciting time for everybody, really.
53:20 Michael Kennedy: Yeah, and it sounds like it's right up your alley, actually, as well. So what about the wide-field infrared survey? It's a little bit later. So James Webb is 2020, the WFIRST is 2025.
53:34 Arfon Smith: Yes.
53:35 Michael Kennedy: Theoretically. Planned.
53:36 Arfon Smith: Maybe 2026 now. Well, we'll see. So WFIRST is fundamentally a different type of telescope. About five years ago, the U.S. government, I guess somebody picked up a phone or emailed somebody at NASA and said hey, we've got a couple of spare space telescopes. Would you like one? I forget which agency it is that builds all the U.S. spy telescopes, but they basically donated, they said we have this telescope that's never flown. In fact, we've got two, but you probably don't need two. Do you want this one? It's a bit like Hubble, in the sense that it's about a two, two and a half meter mirror. And the goal is to do infrared again, so longer wavelength optical light, and do a large-scale survey of the sky. One of the things about building telescopes in space is you don't have an atmosphere to look through, and that turns out to be a big deal, because it means you get much better... astronomers call it seeing, but you get much better resolution. So the shape of the thing that you see is not blurred by the atmosphere that you're looking through. Infrared space telescopes particularly are very exciting, especially when you're doing a large survey. So WFIRST is a survey telescope. Hubble and JWST are what are generally called, well, aren't survey telescopes. They're sort of a point and shoot.
55:00 Michael Kennedy: They fix on a point and they'll stay there for maybe a long time to see farther into the past, yeah.
55:04 Arfon Smith: So WFIRST is exciting because of the volume of data. Instead of, you know, Hubble over the last 30 years has produced something like 100 terabytes of data. WFIRST will produce about 5 petabytes, which is a not ridiculous amount of data, but it's enough to be interesting and requires some thought.
55:23 Michael Kennedy: Yeah, I see a lot of interesting machine learning and image AI type stuff being applied there.
55:29 Arfon Smith: WFIRST has a number of key science goals. Again, exoplanets feature heavily there, especially what gets called microlensing, so when a planet passes in front of a star, you get a slight increase in the brightness because of the microlensing effect of the planet, which just sounds crazy, but...
55:53 Michael Kennedy: It's crazy, a bend in space-time curves the light to come over, right?
55:57 Arfon Smith: And dark energy, which is a not very well understood-- well, in fact, pretty poorly understood-- component of the universe. In cosmology terms, how the universe kind of works. You need very large samples of galaxies, and looking at supernovae and distances and how they go off and how they're affected over cosmological distances. Then you also need to do lots of shape measurements of galaxies, but you need a lot of them, and you need very high-precision data. WFIRST fundamentally is a different type of telescope.
56:39 Michael Kennedy: It should be really interesting to see new science coming from this new type of telescope.
56:43 Arfon Smith: For sure, for sure, yep.
56:45 Michael Kennedy: Awesome. Well, we could talk about space for hours, actually. But I want to be cognizant of your time. One more thing that you worked on that I thought was cool, just give you a chance to tell the world about, is Zooniverse. What's Zooniverse?
56:59 Arfon Smith: Zooniverse is a platform, a web-based platform, for Citizen Science. Citizen Science is this idea where members of the public, citizens of the world, can help solve real research problems. It basically is a platform that brings together people with research problems, generally academic research problems, so generally professional researchers, have a problem where they have some part of their analysis, or some part of their research project requires a lot of human effort, is probably the best way to think of it. Maybe classifying images by their type or pictures of galaxies by their shape. Zooniverse is a platform for bringing together the people who have the problems and members of the public who are interested in working on these problems.
57:51 Michael Kennedy: That's cool. So you can go and say hey, I'm interested in a project and maybe browse the existing projects and then you learn how to participate?
57:56 Arfon Smith: That's right, so there's probably about, I'm guessing there's about 50 projects there, listed today. I haven't been day-to-day involved with Zooniverse for about five years now, but I was, I guess, second hire on the project after they'd had this original success with a project called Galaxy Zoo, which was taking a lot of images from a telescope called the Sloan Digital Sky Survey and looking at galaxies and making a judgment about their shape--whether they had spiral arms, if they did, which way they were spinning, how many there were, and that kind of thing. It was a one-off project that was very successful, they secured some research funding to build out this approach to doing science with the public, and Zooniverse was born out of that. We did a whole bunch of stuff. As I say, I don't track it day to day these days, but we did crazy stuff like looked at images from camera traps in the Serengeti, looking at pictures of animals, doing things like tracing particle paths in particle physics data, looking for new physics, lots more looking at galaxies and gravitational lensing. It was really broad, actually. It was really fun.
59:16 Michael Kennedy: Yeah, it sounds really fun, actually. I don't know if this was part of it, but I knew there was this protein folding challenge, where they almost gamified that. It's that kind of stuff, right?
59:27 Arfon Smith: That's right. That wasn't us, but it definitely was a similar idea. The idea being that people are generally, like, lots of people are interested in science, but they just aren't doing that day to day and are interested in contributing.
59:42 Michael Kennedy: So there's a chance to go help with some problem, and you don't need a Ph.D. and a grant to do it.
59:47 Arfon Smith: For sure. Some of the best projects are ones that really, I think, we didn't know at the start were going to be successful. But I think probably still my favorite project was this one called Old Weather, which is probably the most boring title ever. It was pointed out to me once. So taking log books from ships from the Royal Navy, World War I, where they recorded the weather. So it turns out that six times a day, the Royal Navy, and actually lots of navies do now still, they record the air pressure, the water temperature, the atmospheric conditions, cloud coverage, that kind of thing, and just write it down. Their handwriting is generally not very good, the way it's laid out on the page is complex, so you could do OCR on it and try and get a machine to read it, but you still need that sort of context to then extract the data. But these log books are really cool. They've got basically stories about what's going on on the ship. We made a website where people could transcribe them. People got really into this and following... There was a guy, Lieutenant Dolphin. I remember because his last name's Dolphin, like the...not a fish, is it? The mammal. Dolphin kept getting thrown off ships for being drunk and disorderly, getting reassigned, and they found him, over 10 years, on different Royal Navy ships. There'd be a note from the captain saying that Dolphin's been relieved of command and sent to another ship. But somebody got interested in this person and followed them, and it turns out there were only two major sea battles in the First World War, the Battle of Jutland and the Battle of the Falklands. It turns out I know about this stuff now. That's the other thing, it was fun doing lots of other people's research, but in these log books you're watching the battle happen. It's saying spotted enemy, battleship engaging, collecting survivors, or sinking. Real world, serious stuff happening. 
But at the same time, these data that we're extracting get fed into these climate models, so they do reconstructions of climate over historical times. Because one of the challenges in understanding climate change today is actually having a long enough baseline to build models that can actually make good predictions for the future.
01:02:11 Michael Kennedy: Right, and much of that is over land, right? You can dig down into the ice or whatever, but the water washes that away, right? It's gone.
01:02:20 Arfon Smith: Exactly. So this was a project we did with a bunch of meteorologists. It was a lot of fun and for me it was mostly a technology problem, building that kind of infrastructure, but it was fun to be involved in lots of people's research as well.
01:02:37 Michael Kennedy: Alright, Arfon, I think probably we're going to have to leave it there. Maybe just a quick final call to action if people want to get involved with JOSS or more generally all the stuff we've been speaking about?
01:02:46 Arfon Smith: If people would like to help review or want to learn more, I think the URL will probably be in your show notes.
01:02:53 Michael Kennedy: It will.
01:02:54 Arfon Smith: But JOSS.theOJ.org. Yeah, we'd love to have your help. It's been really good to chat with you and share what we're up to. Thanks for the opportunity.
01:03:01 Michael Kennedy: Absolutely. Thanks for sharing your story. It's cool work, and keep it up.
01:03:04 Arfon Smith: Yeah, thank you. All right.
01:03:06 Michael Kennedy: Bye.
01:03:06 Arfon Smith: Take care.
01:03:08 Michael Kennedy: This has been another episode of Talk Python To Me. Today's guest was Arfon Smith, and this episode has been brought to you by ActiveState and Rollbar. ActiveState gives you a faster way to build and secure open-source runtimes from your first line of code through to production. Check it out at talkpython.fm/activestate. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand-new 100 Days of Code in Python. And if you're interested in more than one course, be sure to check out the Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code!