Python in Geoscience

Episode #163, published Fri, May 25, 2018, recorded Thu, May 3, 2018

Learn how Python is being used in research to understand the inner workings of the Earth. This week, you'll meet Lindsey Heagy, a PhD student in geophysics at the University of British Columbia. She shares how she is using Python to solve these computational problems along with an amazing framework for viewing scientific writing itself through the lens of Python and open source.

Episode Deep Dive

Guest Introduction and Background

This episode features Lindsey Heagy, a PhD student in geophysics at the University of British Columbia. Lindsey focuses on geophysical imaging, particularly electromagnetics and reservoir monitoring. She’s part of a group that develops open-source software and educational resources to advance geoscience research and teaching. Lindsey also contributes to the Journal of Open Source Software (JOSS), helping other scientists publish and get credit for their open-source projects.

What to Know If You're New to Python

If you’re brand new to Python and geoscience, here are a few key points to help you follow the episode’s themes:

Python’s strength in scientific computing (via libraries like NumPy, Matplotlib, etc.) is central to the applications mentioned here.
Collaboration matters: Learning version control (like Git/GitHub) and open-source workflows is often just as important as learning the core language.
Being able to reproduce code and results (tests, automated builds) is a hallmark of modern scientific programming.
Geoscience often involves specialized libraries such as SimPEG, Discretize, and others that build on the broader Python ecosystem.

Key Points and Takeaways

Python’s Role in Geophysics Research
Python provides an accessible yet powerful environment for scientific computing in geophysics, rivaling traditional tools like MATLAB. By adopting Python, researchers can leverage open-source libraries, collaborative development, and reproducible workflows. This fosters a community-driven approach where ideas and code improvements can be easily shared and peer-reviewed.
- Links and Tools:
  - Python (python.org)
  - MATLAB (mathworks.com/products/matlab.html)
SimPEG: Simulation and Parameter Estimation in Geophysics
SimPEG is a Python-based framework that lets researchers simulate physical phenomena (such as electromagnetic fields) and invert the resulting data to estimate subsurface properties. It features an open ecosystem where new users can extend or adapt the code for their own scientific problems. Its approach demonstrates how Python’s open-source nature helps unify researchers across different institutions and domains.
- Links and Tools:
  - SimPEG GitHub Repository (github.com/simpeg/simpeg)
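To make the two halves of that idea concrete, here is a minimal sketch of simulation and parameter estimation for a toy linear problem, d = G m, in plain Python. It is not SimPEG's API: the functions `forward` and `invert`, the matrix `G`, the regularization weight `beta`, and the gradient-descent settings are all hypothetical choices for illustration. Real geophysical problems use a PDE solver for the forward step and a more capable optimizer.

```python
def forward(G, m):
    """Simulation: predict data d = G m for a model m (G is a list of rows)."""
    return [sum(g * m_j for g, m_j in zip(row, m)) for row in G]


def invert(G, d_obs, beta=1e-3, step=0.1, n_iter=5000):
    """Parameter estimation: minimise 0.5*||G m - d_obs||^2 + 0.5*beta*||m||^2
    by gradient descent from a zero starting model."""
    n_data, n_model = len(G), len(G[0])
    m = [0.0] * n_model
    for _ in range(n_iter):
        # data residual: predicted minus observed
        residual = [p - o for p, o in zip(forward(G, m), d_obs)]
        # gradient of the objective: G^T residual + beta * m
        grad = [
            sum(G[i][j] * residual[i] for i in range(n_data)) + beta * m[j]
            for j in range(n_model)
        ]
        m = [m_j - step * g_j for m_j, g_j in zip(m, grad)]
    return m


# Usage with hypothetical numbers: simulate data from a "true" model, then recover it
G = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
m_true = [1.0, 2.0, 0.5]
d_obs = forward(G, m_true)
m_est = invert(G, d_obs)
print([round(v, 2) for v in m_est])
```

Because the data here are noise-free, the recovered model lands close to `m_true`; the small `beta` term trades a slight bias for stability, which is the same kind of trade-off real inversions have to tune.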
Refactoring Geoscience Education with Open Source: GeoSci.xyz
Lindsey and her team created GeoSci.xyz as “living textbooks” for geoscience. By separating content from formatting using Markdown and Sphinx, they can apply software engineering best practices, like version control and automated testing, to educational materials. This approach makes it easy to fix errors, accept outside contributions, and keep materials updated without major rewrites.
- Links and Tools:
  - GeoSci.xyz (geosci.xyz)
  - Sphinx (www.sphinx-doc.org)
Open-Source Tools from the Python Ecosystem
Lindsey highlighted how libraries like Matplotlib (for plots) and the Matplotlib plugin for Sphinx (for inline figures) make scientific communication more dynamic. Travis CI or other continuous integration services ensure that builds and docs keep working as dependencies evolve. This interplay of tools underlines the advantage of Python’s extensive ecosystem in research.
- Links and Tools:
  - Matplotlib (matplotlib.org)
  - Travis CI (travis-ci.com)
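As a rough illustration of the Sphinx side, a project's `conf.py` might enable MathJax for equations and Matplotlib's plot directive so figures are regenerated from code on every build. This is a minimal sketch, not GeoSci.xyz's actual configuration; the project name is a placeholder, and Markdown sources would need an additional parser extension that is not shown.

```python
# conf.py: minimal Sphinx configuration sketch (project name is a placeholder)
project = "geoscience-notes"

extensions = [
    "sphinx.ext.mathjax",                   # render LaTeX equations in the HTML output
    "matplotlib.sphinxext.plot_directive",  # run plotting code at build time via `.. plot::`
]

# Options for Matplotlib's plot directive
plot_include_source = True  # show the plotting code alongside each generated figure
plot_formats = ["png"]      # image formats produced for each figure
```

A page can then contain a `.. plot::` directive whose body is Python plotting code, and a CI job that runs `sphinx-build -b html <source> <build>` on each commit will fail if that code stops working after a dependency update.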
Testing and Reproducibility in Scientific Computing
Reproducibility is crucial in research. Lindsey mentioned using testipynb to verify that Jupyter notebooks run to completion without errors. Tools like this catch broken dependencies and help ensure that changes to libraries or data do not silently break research code over time.
- Links and Tools:
  - testipynb GitHub (github.com/opengeophysics/testipynb)
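The core idea behind notebook testing can be sketched with the standard library alone: read each `.ipynb` file (it is JSON), execute its code cells in order, and report any exception. This is a simplified stand-in, not testipynb's implementation or API; `run_notebook` and `check_notebooks` are hypothetical names, and real tools run cells through a Jupyter kernel, which also handles IPython magics that this sketch simply skips.

```python
import json
from pathlib import Path


def run_notebook(path):
    """Execute the code cells of a .ipynb file, in order, in one shared namespace.

    Simplified stand-in for a notebook-testing tool: no Jupyter kernel is started,
    and lines starting with IPython syntax (% magics, ! shell escapes) are skipped.
    Any exception raised by a cell propagates to the caller.
    """
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {"__name__": "__notebook__"}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a string or a list of lines
            source = "".join(source)
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        exec(compile("\n".join(lines), f"{path} [cell {index}]", "exec"), namespace)
    return namespace


def check_notebooks(directory):
    """Run every notebook in `directory`; map each file name to None or an error message."""
    results = {}
    for nb_path in sorted(Path(directory).glob("*.ipynb")):
        try:
            run_notebook(nb_path)
            results[nb_path.name] = None
        except Exception as exc:  # record the failure and keep checking the other notebooks
            results[nb_path.name] = repr(exc)
    return results
```

In a pytest suite, one test could call `check_notebooks("notebooks")` and assert that every value is `None`, so a CI run flags any notebook that no longer executes.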
JOSS and Academic Credit for Code
The Journal of Open Source Software (JOSS) provides a mechanism for researchers to receive formal academic credit for writing code, something often overlooked in traditional publication models. Lindsey serves as a geoscience editor at JOSS, and she emphasized how its review process focuses on codebase quality as well as the scholarly write-up.
- Links and Tools:
  - Journal of Open Source Software (joss.theoj.org)
Combining Geophysics with Data Science and Machine Learning
While not always “big data” at CERN-like scale, geoscience often involves multiple small but diverse datasets. Researchers are increasingly exploring data-science techniques, including clustering and machine learning, to integrate or interpret these various geophysical readings. Python’s data-science libraries make these workflows significantly more straightforward and flexible.
- Links and Tools:
  - scikit-learn (scikit-learn.org)
  - PyTorch (pytorch.org) (mentioned as an example of modern ML in Python)
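As a small example of the clustering idea, the sketch below groups readings that combine two survey attributes, such as log conductivity and magnetic susceptibility, with Lloyd's k-means algorithm. Everything here is illustrative: the function names, the readings, and the deterministic initialization are assumptions made to keep the code short, and in practice you would more likely reach for `sklearn.cluster.KMeans` from scikit-learn.

```python
import math


def standardize(points):
    """Scale each attribute to zero mean and unit variance so no column dominates."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means: assign each point to its nearest centre, then move each
    centre to the mean of its points. Centres start at the first k points,
    which keeps this sketch deterministic."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # update step
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # leave an empty cluster's centre in place
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Hypothetical readings: (log10 conductivity in S/m, susceptibility in SI units)
readings = [(-3.0, 0.001), (-0.5, 0.05), (-2.9, 0.002), (-0.4, 0.06), (-3.1, 0.0015), (-0.6, 0.045)]
labels, centres = kmeans(standardize(readings), k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Standardizing first matters because susceptibility values are orders of magnitude smaller than log conductivities; without it, one attribute would dominate the distance calculation.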
Collaboration and Community in Open Source Science
Because geophysics is a relatively small community, sharing code and lessons learned is particularly beneficial. Lindsey noted the importance of open collaboration on GitHub, which promotes peer review, targeted issue tracking, and minimal duplication of effort. This dynamic helps researchers focus on innovations instead of re-implementing common routines.
- Links and Tools:
  - GitHub (github.com)
Challenges Balancing Code, Writing, and Research
Lindsey underscored the tension between developing software, conducting core scientific research, and producing written publications or dissertations. In open-source, collaborative settings, these tasks can reinforce each other (colleagues can split work or peer-review code), but scheduling and prioritizing remain significant hurdles for many in academia.
- (No specific link or tool beyond general references to Git/GitHub and publications.)
Properties, Discretize, and the “Equation Bank” Approach
Two additional Python libraries came up: Properties, which provides strong typing and data validation for interactive scientific APIs, and Discretize, which underpins SimPEG’s numerical methods. An “equation bank” approach was also discussed, letting authors “import” common equations (like Maxwell’s) so they appear consistently across multiple chapters and documents. This further demonstrates how software-engineering patterns map neatly onto technical writing and publishing.
- Links and Tools:
  - Properties GitHub (github.com/seequent/properties)
  - Discretize GitHub (github.com/simpeg/discretize)
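One way to picture the “equation bank” idea is a small Python module that stores each equation once, as LaTeX, and emits a reStructuredText `math` directive wherever a chapter needs it. This is a hypothetical sketch: `EQUATIONS` and `equation()` are illustrative names, not part of Properties, Discretize, or any GeoSci.xyz tooling; the `:label:` option, however, is a real option of Sphinx's `math` directive.

```python
# equation_bank.py: hypothetical module that stores each equation exactly once.
EQUATIONS = {
    # Faraday's law and the Ampere-Maxwell law in differential form
    "faraday": r"\nabla \times \mathbf{e} = -\frac{\partial \mathbf{b}}{\partial t}",
    "ampere_maxwell": r"\nabla \times \mathbf{h} = \mathbf{j} + \frac{\partial \mathbf{d}}{\partial t}",
}


def equation(name, label=None):
    """Return a reStructuredText `math` directive for a stored equation.

    Raises KeyError for an unknown name, so a typo fails the build instead of
    silently producing an empty equation.
    """
    latex = EQUATIONS[name]
    lines = [".. math::"]
    if label is not None:
        lines.append(f"   :label: {label}")
    lines += ["", f"   {latex}"]
    return "\n".join(lines) + "\n"


print(equation("faraday", label="faraday"))
```

A build script could write these snippets to files that chapters pull in with the standard `.. include::` directive, so fixing a sign in the bank fixes it everywhere at once.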

Interesting Quotes and Stories

"It was more like, 'What is this?' rather than thinking, 'Oh, this is something I'm excited about.'" -- Lindsey, on initially encountering programming in Perl and C++

"One of the important pieces is when you want to go back and estimate things, we need that inversion piece, and that’s such an essential piece to actually solve the parameter estimation problem." -- Lindsey, on how SimPEG simulates and inverts physical processes

"We've separated out what is data and what is packaging... you can immediately leverage all of those new developments." -- Lindsey, describing the advantage of Sphinx-driven educational content

Key Definitions and Terms

Geophysical Inversion: The process of using physical field data (e.g., magnetics, gravity) to estimate subsurface properties such as rock types or fluid content.
SimPEG: A Python library for Simulation and Parameter Estimation in Geophysics.
Sphinx: A Python-powered documentation generator that uses reStructuredText or Markdown as source content and can automatically build websites or PDFs.
Continuous Integration (CI): A practice of merging code changes frequently and verifying them via automated builds/tests (e.g., Travis CI).
JOSS: The Journal of Open Source Software, which formalizes peer review and publication for software codebases.
Properties: A Python library for strong typing, validation, and serialization in scientific code.
Discretize: A library providing finite volume meshes and discretization routines, often used in geophysics simulations.

Learning Resources

If you’d like to sharpen your Python skills in areas relevant to Lindsey’s work, like scientific scripting, robust documentation, and data exploration, these courses from Talk Python Training can help:

Python for Absolute Beginners: Ideal for those new to Python, covering foundational concepts with practical coding exercises.
Static Sites with Sphinx and Markdown: Learn how to use Sphinx (which Lindsey’s team used for GeoSci.xyz) to transform Markdown content into a polished site.
Data Science Jumpstart with 10 Projects: If you want a quick start applying Python to real data, this course helps you dive into multiple data-oriented projects.
Getting started with pytest: Learn to write effective tests and maintain reproducibility for research or production code.

Overall Takeaway

Python’s open-source tools, community-driven mindset, and large scientific ecosystem have transformed geophysics and similar fields. Whether for simulation, data analysis, or educational outreach, the Pythonic approach promotes collaboration, reproducibility, and continuous improvement of research methods and instructional materials. Lindsey’s story exemplifies how researchers can harness open-source practices to take geoscience, and science education, to new heights.

Links from the show

Lindsey on Twitter: @lindsey_jh
SimPEG: simpeg.xyz
SimPEG example: twitter.com/rowancockett/status/989361802893967360
GeoSci: geosci.xyz
Properties: propertiespy.readthedocs.io
Using Open Source Tools to Refactor Geoscience Education: youtube
EarthPy: earthpy.org

Extras
My MongoDB workshop: mongodb.com/webinar/python
Anvil: talkpython.fm/anvil
My Anvil App (from course): pypoint-100days.anvilapp.net
Episode #163 deep-dive: talkpython.fm/163
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript


00:00 Learn how Python is being used in research to understand the inner workings of the Earth.

00:04 This week, you'll meet Lindsey Heagy, a PhD student in geophysics at the University of

00:09 British Columbia. She shares how she's using Python to solve these computational problems,

00:13 along with an amazing framework for viewing scientific writing itself through the lens of Python and open source. This is Talk Python To Me,

00:21 episode 163, recorded May 3rd, 2018.

00:25 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:44 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

00:48 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm,

00:53 and follow the show on Twitter via @talkpython. This episode is brought to you by MongoDB and

00:59 Anvil. Please check out what they're offering during their segments. It really helps support

01:03 the show. Lindsey, welcome to Talk Python.

01:06 Thanks. Good to be here.

01:07 It's great to meet you. You're doing some really cool stuff in geophysics intersecting with Python,

01:13 so I'm super excited to explore that with you today. But before we do, let's get started with

01:18 your story. How did you get into programming in Python?

01:20 Most of that actually was undergrad, so I really hadn't touched much computing until I started at

01:26 university. I did my undergrad in Edmonton at University of Alberta. And it was in my first

01:31 year that I sort of got introduced a bit to programming ideas, but that was actually in Perl.

01:35 So did programming look like a good thing you wanted to do, or were you like, whoa, what is this?

01:39 It was more like, what is this?

01:42 I mean, it was interesting to just start getting exposed to the types of things you can automate to

01:47 make your life a bit easier. And so that was all that we had really done in the first year.

01:50 And then got a bit into C++ in my second year when I was starting in physics. And that was intimidating,

01:58 for sure. And then finally, actually, once I started getting much more into geophysics,

02:03 we were all in MATLAB. And that was much closer to what sort of the day-to-day scientific work

02:09 is readily supported by. And so, yeah, that was really my first foray into computing.

02:14 Yeah. And there's a pretty easy switch from MATLAB to Python. I mean, they're not the same.

02:19 Obviously, they have really different philosophies, right? Like MATLAB is like,

02:23 let's make it all super commercial. And every little function you want, you got to buy that

02:27 individually and so on. But, you know, the sort of feel of scripting it is kind of the same.

02:33 So that's pretty cool. So how did you get into Python? Actually, was that in grad school?

02:37 Yeah, it was grad school. And it was actually, I'll credit Rowan Cockett,

02:40 who I've done a lot of work with. And he just recently finished his PhD.

02:44 And so that was, I guess, my first or second year at grad school. He had really suggested that we

02:50 started working together on a number of projects as like a geophysics group. And so that's how

02:54 Simpeg started, which I'm sure we'll come back to. But then starting to build software together was

03:00 when, you know, we looked at Python because there is such a healthy community and so many tools for

03:05 making that easy. That, yeah, that's how we got into it.

03:08 Yeah, that's cool. And what year was that?

03:09 That was only a few years ago. So I guess like my second year of grad school.

03:13 Yeah. Earlier than that. Maybe 2014. Yeah.

03:17 I'm just trying. I'm just thinking because if you look at the history of Python, and especially how

03:24 it's appeared in data science, it looks like around 2012, I don't know what the actual trigger for this

03:29 was. But something happened that just really drove the adoption of Python, especially in the data science

03:34 space. I'm pretty sure Jupyter has a lot to do with it. But I'm sure there's other factors as well.

03:39 So it makes sense. Yeah, yeah.

03:41 Yeah. So what do you do today? Like, you're still in grad school? Is that right?

03:45 Yes, I am. My sort of day to day, I am trying to wrap up my PhD. So it's a lot of writing at the

03:50 moment. Like the thousand little details you didn't know you had to finish up are attacking you?

03:56 Exactly. Yeah. So that's where I'm at at this point. But I'm fortunate to be a part of like a really

04:01 great group of people. And so there's a lot of collaboration that goes on back and forth.

04:05 So like most of my PhD and grad study journey has really been a lot of collaborative work,

04:10 which has been a lot of fun.

04:11 Nice. And so you're studying geophysics. And you're working at this place called the

04:16 Geophysical Inversion Facility. Is that correct? Is that right?

04:19 Yes, that's right. Yeah.

04:20 So first of all, what is geoscience or geophysics? Like, generally?

04:25 Geophysics would be a subset of geoscience. So geoscience is basically anything concerned with

04:31 trying to understand the earth. So that is very, very broad.

04:33 Right. It could be plate tectonics. It could be magnetic field. I guess it could even be climate

04:37 change, right?

04:38 Yeah. It could be climate change. It could be atmospheric studies. It could be like economic

04:42 geology, trying to map out where different rock units are. All of that.

04:46 Okay. So what's geophysics?

04:47 So geophysics, we're then understanding the earth through physics. And so what I do specifically in

04:54 geophysics is we're looking at geophysical imaging. So it's a lot like medical imaging in a lot of ways.

05:00 So, you know, when you go to the doctor, you're hoping that they don't have to drill into you to

05:04 get information about what's going on inside of you. So there are sensors, like if you go into an MRI or

05:11 something like that, there's sources and sensors on one side. And we can take the data that have been

05:16 collected in that survey of you and work with those to then get an image of what's going on inside your

05:22 head. And so we basically do the exact same thing, but then on the earth. And so it's a larger scale

05:27 survey, but it's the same general principles.

05:29 Okay. That sounds really interesting. I've seen some really amazing graphics and I'll link to a

05:34 couple of them, of course, in the show notes. What are some of the types of questions that you are

05:40 trying to answer or people generally in the field are trying to answer?

05:43 So it totally ranges. Our group, a lot of the history has been really connected with minerals.

05:47 So trying to map out and locate where mineral deposits are, characterize them, delineate the

05:52 different units and things like that. One of the big topics that's becoming more and more relevant

05:57 is characterizing groundwater. And so trying to figure out where do we have pockets of aquifers? You know,

06:04 how much is in that aquifer? Can we quantify that using geophysics? Because in a lot of places right

06:09 now, what's done is just wells are drilled and you can get water levels at, you know, single points,

06:14 but that obviously doesn't characterize the whole aquifer.

06:17 Right. Of course. And that's going to be an increasingly interesting question. You know,

06:21 we saw California go through some really serious droughts and they're finally out of it, but that

06:26 kind of stuff is going to be happening more and more, most likely. So those questions become more

06:31 critical, right?

06:31 Yeah, absolutely. So groundwater and seawater intrusion is another big thing in California. So it's not only

06:37 that you are losing, you know, aquifer water, the seawater is actually also coming in. So there's a whole

06:43 bunch of different things going on that we really need to get a handle on fairly soon.

06:46 All right. So what are some other questions or some other areas people are focused on?

06:50 A lot of the work that I've been doing, I've been looking at trying to monitor subsurface injections,

06:55 like if carbon dioxide or hydraulic fracturing is done, you know, we're injecting fluids into the

07:00 earth and trying to track out where those are growing. And so I've been doing a lot of work

07:04 looking at electromagnetics when we have steel-cased wells there, because often, you know,

07:09 we've got this well and we're going to inject the fluid through there. But what's

07:12 interesting about a well is that it is actually very, very conductive. So in a lot of ways,

07:16 it's like a big electrode. So it can help us get current to depth. But we're then trying to

07:21 understand, you know, the mechanisms and how does that actually happen? And how does it work?

07:25 Yeah.

07:25 So that's been a lot of, yeah, what I've done.

07:27 That sounds really, really interesting. I guess the first place we should probably start looking at

07:33 some of the programming side of things is, I guess, let's talk about Simpeg. So Simpeg is this thing

07:40 for simulation and parameter estimation. What does that mean? Put that in plain English for us.

07:46 Fair enough. What we need to do if we're trying to collect any sort of data,

07:50 so we can maybe go out and do a magnetic survey. And so in that case, the source is the Earth's

07:55 magnetic field. And then basically, whatever kind of rocks that you have that are susceptible,

08:00 meaning that they have, they act like little dipoles. And so they'll try and line up their

08:06 dipoles with the Earth's field. And then we can go over with a magnetometer and try and find these.

08:11 And so that would be a magnetic survey. And what we need to be able to do is we actually need to be

08:16 able to simulate the physics of that process. So in this case, we'd be simulating like the

08:21 magnetostatic equations.
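Lindsey's description of susceptible rocks acting like little dipoles suggests the simplest possible forward simulation: the field of a single buried point dipole. The sketch below assumes a vertical dipole moment and returns only the vertical field component along a surface profile; real magnetic surveys must account for the inclination of the Earth's field, typically measure total-field anomalies, and model whole 3D distributions of susceptibility. The function name and parameters are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability (T·m/A), classical value


def dipole_bz(x, depth, moment):
    """Vertical magnetic field (tesla) at horizontal offset x (m) on the surface,
    above a point dipole buried at `depth` (m) with vertical moment `moment` (A·m²).

    Follows from B = mu0/(4 pi) * (3 (m·r̂) r̂ - m) / r³ with m pointing straight up,
    which reduces to Bz = mu0 * moment * (2 depth² - x²) / (4 pi r⁵).
    """
    r = math.hypot(x, depth)
    return MU_0 * moment * (2 * depth**2 - x**2) / (4 * math.pi * r**5)


# Hypothetical profile: a 1e6 A·m² dipole 10 m deep, sampled every 10 m, in nanotesla
profile_nT = {x: dipole_bz(x, depth=10.0, moment=1e6) * 1e9 for x in range(-40, 41, 10)}
print(profile_nT)
```

The anomaly peaks directly above the source and changes sign once the offset exceeds √2 times the depth, a pattern an inversion has to reproduce when it works backwards from data.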

08:22 Yeah, I was gonna say, so different types of minerals maybe react differently to the magnet.

08:27 So if you're looking for a cobalt or lithium or something else, like you can detect that using

08:33 magnets?

08:34 Yeah, some of them. So it depends if they're magnetic or not. So not all minerals are.

08:38 And then the strength can vary. And so mag is just one survey that you can do. But generally,

08:43 we'll try and do a few different surveys. So they might be magnetic, they might also be conductive.

08:48 So if you have something like iron, it's pretty easy to get current through that. And it's magnetic.

08:53 So if you go and do both of those surveys together, we can start to like pinpoint what these different

08:57 minerals are.

08:58 So iron is easy?

08:59 Iron's easier. Yeah.

09:01 Nice. Okay, so back to Simpeg. So you want to analyze the magnetic response

09:06 of the stuff deep underground?

09:09 Yeah, so we first need to be able to like simulate the set of equations that governs that process.

09:14 So that's the simulation piece is we're going to solve some sort of partial differential equation

09:17 that's governing the physics. So we assume that we know what the earth is, like what the earth model

09:22 is, what the physical properties are, and we can predict what the data should be. So that's a

09:26 simulation piece. And then the parameter estimation piece is trying to go backwards,

09:30 is that we have data. And now from those data, we're going to try and estimate.

09:35 And so we do that through an optimization process, we can basically say, okay, I know what these data

09:40 are. And I know how to guess or I know how to simulate data. And so now we're going to try and

09:45 find a model that fits those data. And it's somewhat geologically reasonable.
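"Find a model that fits those data and is somewhat geologically reasonable" is commonly written as a regularized optimization problem. The form below is a generic textbook statement given as a sketch; the exact weighting and regularization terms used in Simpeg or in a particular study may differ.

```latex
\min_{\mathbf{m}} \; \phi(\mathbf{m})
  = \underbrace{\tfrac{1}{2}\,\big\lVert \mathbf{W}_d\big(F(\mathbf{m}) - \mathbf{d}^{\text{obs}}\big)\big\rVert_2^2}_{\phi_d\ \text{(data misfit)}}
  \; + \; \beta\,\underbrace{\tfrac{1}{2}\,\big\lVert \mathbf{W}_m\big(\mathbf{m} - \mathbf{m}_{\text{ref}}\big)\big\rVert_2^2}_{\phi_m\ \text{(regularization)}}
```

Here $F$ is the forward simulation, $\mathbf{d}^{\text{obs}}$ the observed data, $\mathbf{W}_d$ a data weighting (often the inverse of estimated noise standard deviations), $\mathbf{W}_m$ and $\mathbf{m}_{\text{ref}}$ encode what counts as geologically reasonable, and $\beta$ sets the trade-off between fitting the data and keeping the model simple.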

09:48 Okay, that's pretty cool. And this is all done in Python. Is there maybe some like underlying

09:54 C or Fortran code that gets brought in there? Or is it all straight Python?

09:59 It's basically all Python, we interface to lower level solvers, and things like that. And there's

10:04 a little bit of Cython for some of the meshing stuff. But for the most part, it's pure Python.

10:08 Okay. Yeah, that's, it's really cool. There's a picture, you mentioned Rowan Cockett, and I happen

10:14 to just have grabbed something off Twitter from him. So that's kind of a funny tie in. But if people are

10:21 kind of interested in seeing what those look like, I'll put in the show notes this cool,

10:26 sort of three-dimensional graph of what's called the Richards equation. That's pretty cool. So

10:32 are you familiar with this?

10:34 Yes, I'm a co-author on that paper, actually. So... Oh, awesome. Yeah. Well, tell us, tell us a little bit about like what,

10:39 what the main question was, and what you guys found using Simpeg on it.

10:43 What this was, this paper actually describes a lot of the fluid flow machinery that we have in Simpeg.

10:49 So Richards equation describes fluid flow, like going through soils. So it's a two-phase flow where

10:55 you have air and water in some sort of medium like a soil. And so depending on how hydraulically

11:03 conductive pieces of that soil are, basically how easily water goes through them, that front propagates down

11:08 differently. And so in this case, what we developed in this paper is describing, like, how do we solve

11:15 those equations? But then the important piece is when you want to actually go back and estimate things

11:20 like hydraulic conductivity, we need that inversion piece. And so we need to have gradients that tell

11:26 you basically, if I change this model parameter this much, it changes my data in this manner.
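For reference, a standard mixed form of Richards equation for unsaturated flow is shown below, with pressure head $\psi$, volumetric water content $\theta(\psi)$, hydraulic conductivity $K(\psi)$, and $z$ pointing upward. This is the textbook statement; the formulation in the paper discussed here may use different conventions.

```latex
\frac{\partial \theta(\psi)}{\partial t}
  - \nabla \cdot \big( K(\psi)\, \nabla \psi \big)
  - \frac{\partial K(\psi)}{\partial z} = 0
```

Because both $\theta$ and $K$ depend nonlinearly on $\psi$, the data depend nonlinearly on the hydraulic conductivity parameters, which is why the gradients described here take real work to derive.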

11:31 I see. Like how responsive is it to variation in parameters and estimation and stuff?

11:36 Absolutely. And so that's like such an essential piece to be able to solve that parameter estimation

11:41 problem. So in this paper, we go through and basically derive all the mathematics for that, and then show a couple

11:48 examples.
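The question of how much the data change when a model parameter changes is captured by the sensitivity (Jacobian) matrix. The paper derives those gradients analytically; the sketch below swaps in the much simpler finite-difference approach, which is handy for checking derivatives on small problems. `jacobian_fd` and `toy_forward` are hypothetical names, and the toy model stands in for a real Richards-equation simulation.

```python
def jacobian_fd(forward, m, h=1e-6):
    """Approximate J[i][j] = d(datum i)/d(parameter j) with forward differences:
    perturb one model parameter at a time by h and re-run the simulation."""
    d0 = forward(m)
    columns = []
    for j in range(len(m)):
        m_pert = list(m)
        m_pert[j] += h
        d1 = forward(m_pert)
        columns.append([(a - b) / h for a, b in zip(d1, d0)])
    # transpose columns[j][i] into rows so that J[i][j] is indexed (datum, parameter)
    return [list(row) for row in zip(*columns)]


def toy_forward(m):
    """Hypothetical nonlinear 'simulation' with two model parameters and two data."""
    return [m[0] ** 2 + m[1], 3.0 * m[1]]


J = jacobian_fd(toy_forward, [1.0, 2.0])
print(J)  # close to the exact Jacobian [[2, 1], [0, 3]]
```

Finite differences need one extra forward simulation per model parameter, which quickly becomes unaffordable for models with many cells. That cost is the practical reason to derive the derivatives analytically.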

11:48 Yeah, the pictures are really, really compelling. And it's pretty interesting. So I think it's really nice what you guys have

11:55 put together here. So are a lot of people using Simpeg for other work? Or is it mostly within your group?

12:01 It started mostly within our group, but it is starting to branch out a bit. So we've got some collaborators at Colorado

12:07 School of Mines. There's some people with USGS who are starting to dabble in it. We recently gave a short course that

12:14 traveled around the world. And so that was a great opportunity to start introducing people to the Python ecosystem.

12:20 And for some researchers, you know, Simpeg was an appropriate tool. So they're starting to come in and explore that a bit.

12:28 Hey, everyone, Michael here. Did you know I'll be doing a three-part webcast series about MongoDB and Python from May to June?

12:36 We'll see why MongoDB is a great choice for Python web apps. In this series, we'll go through the entire process of

12:42 building a clone of PyPI, Python's packaging website over at pypi.org. Everything from building the front end to

12:50 deploying the web app and MongoDB to the cloud. You'll learn everything from document modeling basics to special

12:56 considerations for running MongoDB in production. The webinar is free. So just click on the link in the show notes or go to

13:02 mongodb.com/webinar/python and sign up. See you in May.

13:08 It sounds like a lot of what you've learned from Simpeg has sort of fed back into some of the other work that you're doing as well around like more general education, research and science, right?

13:19 Yeah, absolutely. I think for our group and for me in particular, Simpeg was like really my first entry point to the whole open source ecosystem and like what it's like to operate and run an open source project and be involved in an open source project.

13:34 And what some of the strategies are, things, you know, like peer review and issue tracking and all of that sort of stuff. That was all really learned when we first jumped into Simpeg.

13:43 I think it's really interesting. I mean, you come from a place like say working with MATLAB as a community in general, right? Like they publish packages you can add on and you kind of wait for them to release new versions and they give you what you, you know, you get what they give you, right?

14:00 And there's not a whole lot of give and take there. And that's just such a contrast with things like Simpeg or other open source projects, right? Like even on the courses that I do, I have people who will come and say, oh, you did this little demo like this, but actually I've refactored it to really take into account this other thing.

14:19 And it's really, you know, just you get these out of the blue, people are helping you out.

14:22 Even if you don't ask for it, it's really quite an interesting experience.

14:25 It really is. And like just the ability to put code out there, share it and get other people's opinions is like such a part of the Python community and like what people follow.

14:36 Whereas if you look at something like MATLAB that just, there's not easy mechanisms to do that.

14:40 So even if you write something useful, you then email code to your friends, and that just is not sustainable.

14:47 Which one do you want me to run? The one that you sent me on May 23rd or the one that you sent in June?

14:53 Like I can't remember which one we're working from, right? That's not a great version control, is it?

14:58 No.

14:58 So maybe let's spend a little time talking about this presentation that you gave at SciPy?

15:04 2016.

15:05 And it was called Using Open Source Tools to Refactor Geoscience Education.

15:11 And I think you kind of put it around the scope of geoscience, but I would say scientific education more generally, right?

15:18 Yeah. I mean, geoscience is our domain.

15:20 And so that's where like we've thought through a lot of these things.

15:23 But, you know, as we've been developing along the way, really trying to take a bit of a big picture perspective and figure out,

15:29 are there things here that we can learn and perhaps translate to teaching and learning as well as scientific publication?

15:35 Yeah. So maybe give us the big idea on your talk and then we'll touch on some of the pieces.

15:40 Geosci.xyz is a collection of like open source.

15:44 They're basically living textbooks.

15:46 And so the idea is that we're trying to take a lot of what we've learned from developing open source software

15:52 and try and apply that to open source educational resources.

15:56 And so that there are opportunities for collaboration, for peer review, for iterating on things and, you know,

16:03 to grow and develop resources as a community.

16:06 One of the things that we've really noticed in geophysics in particular is it's a really small field.

16:11 And so there really aren't actually many good textbooks, especially for introductory level classes.

16:16 And so what happens is, is, you know, there are professors scattered around the world teaching this

16:22 and they're all developing all of their own course notes from scratch.

16:25 And each of them has, you know, different expertise and different background.

16:28 And so there's one aspect of that that's going to be really strong.

16:31 But then the rest of it, they're having to go and learn from A to Z, like a whole bunch of techniques that there may or may not be.

16:37 They maybe learned it in grad school, but they haven't used it for 15 years.

16:40 But now they got to write about it, right?

16:41 Because they got to be comprehensive.

16:42 Yeah.

16:43 Yeah.

16:43 And so we're hoping to eliminate some of that so that, you know, people who are experts in given topics can contribute that

16:51 and then can leverage what other people know.

16:52 So, yeah, I think that is really quite an interesting perspective.

16:56 You talked about some of the problems with the sort of teaching and learning and writing sort of largely focused around like textbooks and stuff, but more generally as well.

17:06 If there are bugs in the book, you don't know.

17:10 They're hard to track.

17:11 I mean, maybe you can find it, right?

17:12 Like versioning is difficult.

17:14 Do you have the current version?

17:16 You don't know, right?

17:17 How do you diff, you know, two textbooks?

17:24 You just kind of flip through it, right?

17:26 I don't know.

17:26 It's not super easy, is it?

17:28 And sometimes it's really obvious, like, oh, we added a chapter on this or a figure on that.

17:32 But other times, like you said, it's much more subtle.

17:35 Like, oh, there was an error.

17:36 Should have been a minus sign in this equation right here.

17:39 And like that's, you probably wouldn't catch that, right?
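Once book content lives in plain text files, "diffing a textbook" becomes routine. The sketch below uses Python's standard `difflib` module on two hypothetical editions of a page that differ only by a minus sign, exactly the kind of change that is easy to miss by flipping through pages; version control tools such as `git diff` show the same kind of line-level change.

```python
import difflib

# Two hypothetical editions of the same page, stored as plain-text lines.
edition_1 = [
    "Faraday's law relates the electric and magnetic fields:\n",
    "curl E = dB/dt\n",
]
edition_2 = [
    "Faraday's law relates the electric and magnetic fields:\n",
    "curl E = -dB/dt\n",
]

# unified_diff yields file headers, a hunk header, then context, removed, and added lines
diff = list(difflib.unified_diff(edition_1, edition_2,
                                 fromfile="chapter3_v1.txt", tofile="chapter3_v2.txt"))
print("".join(diff), end="")
```

The output marks the old line with `-` and the corrected one with `+`, so the sign fix is explicit rather than hidden in a new printing.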

17:42 That's really a challenging position to be in when you're first trying to understand a topic, right?

17:47 It's like you work through the whole question.

17:49 You create the plot based on your understanding.

17:51 And it doesn't match what theirs is.

17:53 And then what do you do?

17:54 Like they're different.

17:55 And they could be wrong.

17:56 You could be wrong.

17:57 And there's no way to sort that out.

17:58 And like there's no way to contact the author really in most cases.

18:03 Yeah, it's quite hard.

18:04 And a lot of times you're a student.

18:06 So you're like, well, I'm wrong because I'm new at this.

18:10 I must be wrong, right?

18:11 Until, you know, somehow you maybe decide, no, I think I really am right.

18:16 This is really broken.

18:17 But yeah, that can be frustrating.

18:18 So you had a really nice way of breaking down the sort of things involved in that type of creation, that educational content creation, and sort of framing it in terms of concepts from the Python space.

18:34 And you started with functions.

18:36 And you asked the question, like, what are functions in the context of science and writing?

18:40 So what are functions in that context?

18:43 There's a few different things to think through.

18:45 But one of the things that we first looked at, maybe I'll give you a little bit of the backstory before diving into this.

18:51 One of the big motivating factors for this project was that my supervisor had developed a website quite a number of years ago instead of a textbook.

19:01 So he was really forward-looking in the sense that he wanted to get content out there for students in a very tangible, like easy to interact with way.

19:09 And so they built this website.

19:10 It's a great site.

19:11 But then we found a few typos in it and wanted to try and go in and fix them and realized it was tangled up in this crazy HTML mess.

19:18 The first step, I guess, in the refactor of this was really identifying, like, what actually is the data here?

19:25 What's the data and what is the packaging?

19:26 So in this case, really, the data is just the text and the equations and the images.

19:31 The HTML and CSS, all of that is just packaging.

19:34 And so in this case, perhaps a function is actually like Sphinx.

19:40 And so you take your data, which is text and images, and then you compile that into a website.

19:45 And so what's powerful about that is, as styles and things like that update, and as there are better ways to interact or somebody builds a fantastic new search tool, you've separated out what is data and what is the packaging.

19:59 And so you can immediately leverage all of those new developments.
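The "function" analogy here, content in and website out, can be illustrated in a few lines of plain Python. This is a hypothetical sketch, not how Sphinx works internally: it assumes a page is stored as plain data and that the "packaging" is a swappable template. The names `PAGE`, `render`, and the two themes are invented for the example.

```python
from string import Template

# The "data": text stored independently of any styling (hypothetical page).
PAGE = {
    "title": "Electrostatic sphere",
    "body": "A sphere in a uniform electric field distorts the field around it.",
}

# The "packaging": two interchangeable themes (illustrative only).
PLAIN_THEME = Template("$title\n\n$body\n")
HTML_THEME = Template("<html><h1>$title</h1><p>$body</p></html>")


def render(page: dict, theme: Template) -> str:
    """Compile the same content with a given theme, like a build step."""
    return theme.substitute(page)


print(render(PAGE, PLAIN_THEME))
print(render(PAGE, HTML_THEME))
```

Because the content never mentions the theme, restyling every page means changing one template, which is the benefit described for adopting new styles or search tools.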

20:03 Yeah, it sounds super obvious.

20:04 I mean, as people who write software, like, you think, of course, you're going to do these types of things.

20:08 But then in practice, you go look at, you know, for example, the website you're talking about.

20:12 And it's all crammed together.

20:14 A lot of times these are written by people who don't have sort of formal software training either.

20:19 So they maybe don't even have some of these ideas sort of in the back of their mind when they come to it.

20:24 And so, yeah, so you're like, all right, well, let's break this out into reStructuredText.

20:29 People can edit that super easy.

20:31 I mean, you probably know LaTeX or something that's like an easier version of that.

20:34 So that's good.

20:35 And then, obviously, the styles.

20:37 So that's one part.

20:39 Another part you talked about was capturing inputs.

20:42 So, you know, you probably have some picture, but that picture has, you know, like a view onto it.

20:49 It has parameters for the equation.

20:51 It has all kinds of stuff, right?

20:52 So that's another aspect of sort of the function analogy, right?

20:56 One of the great things in the Sphinx ecosystem is the Matplotlib plugin.

21:01 And so that's what we've been leveraging in order to capture the inputs to your figures.

21:05 And so because to create a figure, we're running some sort of code with, as you said, some inputs.

21:10 So maybe we're looking at the electrostatic response of a sphere.

21:13 And you want to change it from a resistive sphere to a conductive sphere.

21:17 That should be something that, like, the user of this resource should be able to, like, readily do.

21:22 And so you can't do that with a textbook, obviously.

21:25 But in this case, if you've preserved that source code, then that's actually an entry point for people to actually take that single picture and start to be able to explore that.

21:34 And, I mean, once you, yeah.
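To make the idea of capturing a figure's inputs concrete, here is a minimal sketch of the kind of function that might sit behind a figure like the sphere example. It computes the standard analytic potential outside a sphere in a uniform electric field, in a DC (steady current) setting. The function name, parameter names, and default conductivities are assumptions for illustration, not code from the GeoSci resources.

```python
import math


def sphere_potential(r, theta, *, e0=1.0, radius=1.0,
                     sigma_sphere=1e-1, sigma_background=1e-2):
    """Total electric potential (V) outside a sphere in a uniform primary field.

    Assumes a primary field of magnitude e0 (V/m) pointing along theta = 0,
    a sphere of `radius` (m) at the origin, conductivities in S/m, and r >= radius.
    """
    if r < radius:
        raise ValueError("this sketch only covers points outside the sphere")
    # Contrast factor: > 0 for a conductive sphere, < 0 for a resistive one.
    contrast = (sigma_sphere - sigma_background) / (sigma_sphere + 2 * sigma_background)
    primary = -e0 * r * math.cos(theta)                             # uniform field
    secondary = e0 * contrast * radius**3 * math.cos(theta) / r**2  # dipole from charges on the sphere
    return primary + secondary


conductive = sphere_potential(2.0, 0.0)                    # default sphere is 10x more conductive
resistive = sphere_potential(2.0, 0.0, sigma_sphere=1e-4)  # same geometry, resistive sphere
```

Because the inputs are preserved as arguments, switching the figure from a conductive to a resistive sphere is a one-argument change rather than a redrawn picture.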

21:36 It sounds like that's touching on one of the super significant things in science in general these days, the whole reproducibility thing, right?

21:45 And this just makes the whole paper or the book or whatever it is more reproducible if you can re-execute it to regenerate the output, right?

21:53 Well, Rowan and I gave a talk at JupyterCon last year that touched on reproducibility.

21:59 And I think one of the things that's important to keep in mind, too, is, like, what is the point of reproducibility, I guess?

22:07 So, I mean, it is obviously good practice to have your content be able to be regenerated.

22:11 That is a good thing.

22:13 But the way that you actually build upon somebody's ideas in science is you take what they've done and then extend it.

22:20 And so, also, by being able to, you know, at least capture all of the instructions to get to point A, then somebody can immediately pick that up and start to play around with it and hopefully get to point B, which is then actually maybe some sort of new discovery that extends on that work.

22:37 Yeah, that's cool.

22:37 So, you can actually literally build on the sort of algorithm and steps.

22:41 So, there's probably a lot of data exploration in Jupyter as well in this.

22:46 Like, you could, you know, copy little bits into Jupyter, play around, put it back in reStructuredText.

22:50 Or can you load reStructuredText directly?

22:52 You know, I don't actually know about that.

22:54 I'm sure that somebody's written something.

22:56 It's got to be out there somewhere, right?

22:58 Yeah.

22:59 Yeah.

23:00 But we've sort of been developing Jupyter Notebooks in parallel.

23:04 We've been teaching a lot of courses, actually, where computation is, like, not taught at all.

23:08 But we want to be able to have people play with figures.

23:11 So, we've been leveraging Jupyter and ipywidgets to basically wrap functions that compute things and give you plots, to make that interactive.

23:20 And so, that's been really exciting to see that, you know, people can actually get up and running.

23:25 And they're running code, but they don't necessarily even need to know anything about Python, what is Jupyter, any of these things.

23:31 Right, right, right.

23:32 Just inputs and pictures.

23:35 Another thing you said is once we have this concept of sort of a reusable function, you can test it, like in Travis CI or continuous integration, right?

23:42 This has been exciting to see.

23:44 There's a few aspects to the testing.

23:46 First off is, like, when we have code snippets and things like that in the textbook, we can test them.

23:51 Same thing with all of the figures.

23:53 We can test those.

23:54 And so, I mean, if there's API changes or things like that down the road, we'll catch that so that the code always continues to work.

24:00 Because I think we've all seen the case, you know, where somebody actually wrote a textbook and there's printed examples of code in there.

24:06 And there's inevitably a bug.

24:08 And there's nothing that can be done about that once it's been published.

24:11 And so, then you just end up with generations of frustrated students who can't even get the code to run at the first step.

24:18 So, here, at least, that's something that we can test.
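Testing the code snippets in a page can be done with Python's built-in `doctest` module, which finds `>>>` examples in text and checks their printed output. This is a minimal sketch, assuming the snippets are written in doctest style. The page text and the `check_snippets` helper are hypothetical, and this is not a description of the GeoSci build. Sphinx's `sphinx.ext.doctest` extension offers a similar check inside documentation builds.

```python
import doctest

# A page of prose with embedded, executable snippets (hypothetical content).
PAGE_TEXT = '''
The apparent resistivity is a geometric factor times the voltage-to-current ratio:

>>> voltage, current, k = 0.5, 2.0, 4.0
>>> k * voltage / current
1.0
'''


def check_snippets(text: str, name: str = "page") -> doctest.TestResults:
    """Run every >>> example found in `text` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # failures are reported on stdout


results = check_snippets(PAGE_TEXT)
```

Run in continuous integration, a nonzero `results.failed` would fail the build as soon as an API change breaks a printed example.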

24:21 Another thing that's actually been kind of interesting, too, is you can go in and test links.

24:26 And so, make sure that all of the things that you are pointing to and all of the extra resources that you are connected to continue to be there.

24:33 And if not, then you can go in and find something else relevant to point people to.
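Checking links comes down to two steps: pull the URLs out of the source text, then ask each server whether it still answers. This is a standard-library sketch, assuming links are written in reStructuredText's inline form with the URL in angle brackets. `extract_links` and `is_reachable` are hypothetical helpers.

```python
import http.client
import re
import urllib.request

# Matches the URL inside reStructuredText external links such as `SimPEG <https://...>`_
RST_LINK = re.compile(r"<(https?://[^>\s]+)>`_")


def extract_links(rst_text: str) -> list[str]:
    """Return every external URL referenced with the `text <url>`_ syntax."""
    return RST_LINK.findall(rst_text)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answered without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")  # ValueError for malformed URLs
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx) are OSError subclasses, as are timeouts.
        return False
```

Sphinx also ships a `linkcheck` builder (`sphinx-build -b linkcheck`) that does this for a whole documentation project. Some servers reject `HEAD` requests, so a real checker may need to fall back to `GET`.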

24:37 Yeah, that's a really good point because you don't necessarily control all the external resources that exist, right?

24:42 And you don't keep checking them.

24:43 Yeah.

24:44 Yeah, yeah, really cool.

24:44 So, the functions, that's a pretty low-level concept in structuring code.

24:50 So, the next level up would be classes, maybe?

24:53 So, starting to get to a bit more organization.

24:56 So, I mean, a function, you've defined a piece that is reusable.

24:59 And now a class, we're going to try and define something that you can perhaps inherit and build upon.

25:05 One of the things that we pointed to in this analogy is just looking at, like, a given page structure.

25:10 So, when you're talking through a concept, I mean, there's a few obvious things that every page has.

25:15 Like, it has a title.

25:16 It has contributors to that page.

25:18 One of the things that we've been trying to promote is a purpose statement on each of the pages.

25:24 You know, to give just, like, a high-level overview of why should I care about what is on this page.

25:28 I think that's a really good idea because one of the major benefits of people writing, say, like, unit tests against real proper, you know, software code is you know when you're done.

25:41 Like, if you put out all the things it's supposed to do in the test and it does them, well, you can stop messing around.

25:46 Because people can, like, fiddle with the code and think about what it might need in the future forever, right?

25:51 And so, there's this really clear, this is what I wanted to do.

25:54 I've done it.

25:55 Now, what's next, right?

25:56 And you obviously have the same problem in writing.

25:59 I really love this idea of, like, we're going to give this a purpose and almost test it.

26:04 You also said that this leads really well to a collaboration, right?

26:08 Because people coming to it know what the purpose is.

26:10 They all agree upon the purpose and it sort of helps communicate that.

26:13 Yeah, absolutely.

26:14 And, I mean, because when you have multiple authors contributing content to one resource, everybody's got a bit of a different writing style and all of those sorts of things.

26:23 And that's totally fine.

26:24 But it can lead to sort of a hodgepodgey resource.

26:27 But at the very least, you know what is going to be achieved in each page.

26:33 It's so much easier to collaborate and give meaningful feedback as well.

26:36 So, if I am reviewing somebody else's page and I know what they're trying to accomplish and have maybe a couple ideas about some different examples that they could include to help achieve their goal, that's easy to then point them to.

26:48 But if the purpose isn't clear from the outset and it's not immediately transparent, it's very hard to then give productive feedback.
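The "class" analogy can be sketched literally. Suppose, hypothetically, that each page's metadata were loaded into a Python object. The resources discussed here are reStructuredText built with Sphinx, so this is an illustration rather than their tooling. A page with a required title, contributors, and purpose statement could then check itself much like a unit test does. `Page` and `problems` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical structure shared by every page in the resource."""
    title: str
    purpose: str
    contributors: list[str] = field(default_factory=list)
    body: str = ""

    def problems(self) -> list[str]:
        """List what a reviewer would flag before reading the content itself."""
        issues = []
        if not self.title.strip():
            issues.append("missing title")
        if not self.purpose.strip():
            issues.append("missing purpose statement")
        if not self.contributors:
            issues.append("no contributors listed")
        return issues
```

A reviewer, or a CI job, can then ask the simplest question first: does the page state what it is for, and who to talk to about it?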

26:55 Yeah, for sure.

26:56 It also helps with peer review.

26:57 You know, a simple peer review question is, have you achieved, does this thing do what it says it does, right?

27:03 Rather than, is it good or is it accurate or is it good, you know, whatever.

27:07 Those are really hard to answer.

27:08 Another thing is you say it leads to templates, right?

27:12 So, you could say this page is for, you know, a case study of this type.

27:17 And then that means it has this structure, right, which can really help with writing.

27:21 Yeah, and it helps, you know, solicit input from other authors as well.

27:26 So, what we've done with case histories, and how we've defined a case history, is it's basically an exploration geophysics example.

27:34 So, we walk through, you start with some sort of question.

27:37 And then we're going to walk through what are the relevant physical properties?

27:40 So, what are the different rock types or things like that that we're going to look for?

27:44 How are we going to try and detect those?

27:46 What do the data look like?

27:48 Then we go through and, you know, how do we process those data, interpret the results?

27:52 And then did we actually answer our initial question?

27:55 So, that's sort of like seven steps that we've broken all of these case histories into.

28:00 And once you actually start having a few examples of that, we've been able to send these templates out to researchers around the world who have experience in different applications than we do.

28:10 And just said, hey, do you have a good example that you could put into this framework?

28:14 And once you lay out the pieces, people are a lot more willing to go in and put their content there.

28:19 Because you've just removed, like, all of that overhead of figuring out how should I structure this.

28:24 Right.

28:24 How long should it be?

28:25 What should I say?

28:26 What's important?

28:27 What's not?
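Templates map naturally onto inheritance. A base page defines how a skeleton is produced, and a case history only extends the list of sections it requires. In this sketch the seven section names (Setup, Properties, Survey, Data, Processing, Interpretation, Synthesis) are labels chosen for the seven steps described above, and the classes are hypothetical. The output is a reStructuredText outline that a contributor could fill in.

```python
class PageTemplate:
    """Hypothetical base template: every page starts with a purpose statement."""
    sections: tuple[str, ...] = ("Purpose",)

    def __init__(self, title: str):
        self.title = title

    def skeleton(self) -> str:
        """Emit a reStructuredText outline with one heading per section."""
        lines = [self.title, "=" * len(self.title), ""]
        for name in self.sections:
            lines += [name, "-" * len(name), "", ".. TODO: fill in this section", ""]
        return "\n".join(lines)


class CaseHistory(PageTemplate):
    """Inherits the skeleton logic; only the required sections change."""
    sections = PageTemplate.sections + (
        "Setup",           # the initial question
        "Properties",      # relevant physical properties
        "Survey",          # how to detect them
        "Data",            # what the data look like
        "Processing",      # how the data are processed
        "Interpretation",  # interpreting the results
        "Synthesis",       # did we answer the question?
    )


outline = CaseHistory("Example deposit").skeleton()
```

Sending `outline` to a potential contributor removes the "how should I structure this?" overhead: they only fill in the sections.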

28:27 Yeah, it's super, super interesting.

28:29 I had Jesse Davis on the show quite a while ago.

28:33 And he did something similar for blogging.

28:36 He talked about how to write a good developer blog.

28:39 And he had come up with five design patterns for blog posts.

28:44 Right.

28:45 What are your goals?

28:45 Then this pattern applies.

28:47 And it's just, like, once you know what you're trying to do and you have the pattern, you're like, okay, these three steps.

28:51 This is what I do.

28:51 And then all of a sudden, writer's block can be largely gone.

28:56 It's much, much quicker.

28:58 And so, yeah, I really like this idea.

29:00 This is where we've seen definitely the most contribution coming in is because it's an easy place to jump in once something's structured.

29:06 Okay.

29:06 So once you have functions and classes, you might want to reuse them in other places.

29:10 That might be, like, import.

29:12 The import statement.

29:13 Yeah.

29:14 So this is where things get a little fuzzy.

29:16 But we've played around with this analogy.

29:18 And one of the things that has been kind of exciting is we've developed an equation bank.

29:23 And so one of the resources we've been working on is about electromagnetics.

29:27 And so Maxwell's equations are going to show up all over the place.

29:31 And so that's something that you don't want to repeat writing.

29:36 And especially when you have multiple people, we want to try and stick to the same notation conventions and all of that sort of stuff.

29:42 So ideally, we don't want people rewriting these.

29:45 So we've actually set up an equation bank.

29:47 And then that's something that you can just include in your page is just include Maxwell's equations.
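Sphinx, via docutils, already supports this pattern with the `.. include::` directive, which pulls a shared file into any page. The sketch below shows the same single-source idea in plain Python, assuming a hypothetical `{{eq:name}}` placeholder syntax. The bank holds two of Maxwell's equations (Faraday's law and the Ampère-Maxwell law) as LaTeX strings, written with lowercase time-domain field symbols. The keys and helper name are invented for the example.

```python
import re

# Single source of truth for shared equations (hypothetical keys, LaTeX values).
EQUATION_BANK = {
    "faraday": r"\nabla \times \mathbf{e} = -\frac{\partial \mathbf{b}}{\partial t}",
    "ampere": r"\nabla \times \mathbf{h} = \mathbf{j} + \frac{\partial \mathbf{d}}{\partial t}",
}

# Placeholder syntax assumed for this sketch: {{eq:name}}
PLACEHOLDER = re.compile(r"\{\{eq:(\w+)\}\}")


def include_equations(page_text: str, bank: dict[str, str]) -> str:
    """Replace each {{eq:name}} placeholder with the banked equation."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in bank:
            raise KeyError(f"equation {key!r} is not in the bank")
        # Returned from a function, so backslashes are inserted literally.
        return bank[key]
    return PLACEHOLDER.sub(substitute, page_text)
```

Every page that writes `{{eq:faraday}}` picks up the banked version, so correcting a sign in `EQUATION_BANK` corrects it everywhere at the next build.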

29:53 That's cool.

29:53 And so if there's a mistake, you fix it in one place, right?

29:56 And it fixes it everywhere.

29:57 Exactly.

29:57 Yeah.

29:58 Nice.

29:58 And let's see.

29:59 You could also maybe think of links as sort of a thing you import, like an external resource that you depend upon.

30:07 Yeah, in a sense, because what you're doing there is you've got some word that's linked.

30:13 So, you know, maybe we've linked the word to some sort of specific geophysical system.

30:17 And so that is a containerized piece of knowledge that you can sort of bring in and expose to the user in a meaningful way that also is in context.

30:26 So that's one piece that I think fits into that analogy.

30:30 Yeah, yeah.

30:30 Another one is once you start importing things, you see structures and dependencies.

30:34 And then you could almost say like, well, we should refactor this into this other form that's better once you see the large overall structure.

30:42 That's pretty cool.

30:43 Because then you get to looking at ideas of like which concepts build upon each other and what concepts you need to understand, you know, this given method or another method, which is kind of cool to be able to actually like introspect the field.

30:54 Yeah.

30:56 Yeah, it's pretty amazing.

30:57 And I'm sure there'll be some good visualizations at some point.

31:00 And then at the very outer end, someone else wants to use the thing you've created.

31:06 So you have the pip and packaging analogy as well.

31:09 This is something that I would love to see evolve a bit more.

31:13 But I think at the basic level, making it clear how people can use things.

31:18 So applying a license, showing which concepts build upon others.

31:22 So if you're importing this more advanced resource, what are the things that you should be familiar with before that?

31:28 And then as well, versioning and all of that.

31:31 Make sure that that's clear when you're changing things.
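"Showing which concepts build upon others" amounts to a dependency graph. A prerequisite map can be turned into a reading order the same way a package manager orders installs. This is a minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The topic names and their prerequisites are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: each topic lists the topics it builds upon.
PREREQUISITES = {
    "maxwells_equations": set(),
    "electrostatics": {"maxwells_equations"},
    "dc_resistivity": {"electrostatics"},
    "frequency_domain_em": {"maxwells_equations"},
    "mineral_exploration_case_history": {"frequency_domain_em", "dc_resistivity"},
}


def reading_order(prereqs: dict[str, set[str]]) -> list[str]:
    """Order topics so every prerequisite comes before the topics that use it."""
    return list(TopologicalSorter(prereqs).static_order())  # raises CycleError on loops


def what_to_read_first(topic: str, prereqs: dict[str, set[str]]) -> set[str]:
    """Collect all direct and indirect prerequisites of a topic."""
    needed, stack = set(), list(prereqs.get(topic, ()))
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(prereqs.get(current, ()))
    return needed
```

The same map answers "what should I be familiar with before this?" and flags circular explanations, since `TopologicalSorter` raises `graphlib.CycleError` when two topics depend on each other.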

31:36 This portion of Talk Python To Me is brought to you by Anvil.

31:40 With Anvil, you can build full stack web apps with nothing but Python.

31:44 Building for the web is complex.

31:46 You typically have to write JavaScript, HTML, CSS, some front end framework like React.

31:51 And then you've only done the front end.

31:53 You still have the server to write.

31:55 And then you have to decide where and how to deploy it.

31:57 With Anvil, all you need to know is Python to build production-ready apps and deploy and scale them with a single click.

32:04 You have a visual designer for your page.

32:06 And you've got the entire Python ecosystem to integrate with.

32:09 It even comes with a built-in database as a service.

32:11 I've been using Anvil myself.

32:13 And I'm really excited how accessible it makes the web, even for people who are not excited about writing HTML.

32:20 And if you happen to take my 100 Days of Code course, you'll see near the end, we actually spend a lot of time building a really cool web app with Anvil.

32:27 I'll put that app in the show notes.

32:29 But you can find it at pypoint-100days.anvilapp.net.

32:33 Get started at talkpython.fm/Anvil.

32:37 And they'll throw in a 10% discount on an individual plan just for you Talk Python listeners.

32:42 If you've been afraid of the web, go have a look.

32:45 This is something special.

32:46 And they're doing really interesting things with Python.

32:48 So that's the whole overall conceptual way of thinking about the work that you guys are doing in education and writing in the software Python space.

33:00 But you actually took a lot of the tooling literally from the software space, right?

33:05 Things like Git and continuous integration.

33:07 What all did you use there?

33:08 Everything's hosted on GitHub.

33:10 So that's our peer review mechanism.

33:12 That's all the versioning for us.

33:13 Issue tracking, all of that.

33:15 We have used Sphinx to actually build the pages.

33:18 And then I did mention that the Matplotlib plugin has been one that we're using to generate these reproducible figures.

33:25 We started out hosting stuff on Read the Docs.

33:29 But then the site got way too big.

33:30 So we host it separately now.

33:33 Yeah, and then Travis CI for all of the testing pieces.

33:37 It's not just the concept of it.

33:38 You're actually applying a lot of these tools and techniques to it.

33:41 That's pretty cool.

33:42 Oh, yeah.

33:42 And then Jupyter throughout as well.

33:44 Yeah, I'm sure Jupyter is in there.

33:46 Are you guys moving to JupyterLab these days?

33:49 Are you sticking with Jupyter?

33:50 What's the thought there?

33:51 I've dabbled in JupyterLab.

33:53 I'm quite excited to start diving into it a bit more.

33:56 I've just been writing.

33:58 I'm trying to wrap up the PhD.

34:00 I'm hesitating diving into new and exciting tools because it's easy to lose track of time there.

34:05 Yeah.

34:06 Yeah, I can imagine that the defense probably has top priority.

34:10 Yeah, at this point.

34:11 Yeah, JupyterLab looks really cool.

34:13 I haven't done anything with it.

34:14 But I've kind of looked and said, oh, this looks a little nicer than Jupyter.

34:17 Maybe I should start learning this.

34:19 I've been excited to see some of the Markdown plugins and things like that that they've been working on and actually being able to execute and test code within Markdown.

34:29 Because I think there's a lot of utility there for the writing that we've done, to make that process a lot easier for contributors.

34:37 So that I'm excited to start playing with.

34:39 Yeah, I really like this idea of testing your work.

34:41 And one of the things that I've seen as something of a detraction from the whole notebook way of working, and I definitely see the exploration and flow benefits.

34:53 But one of the things I see as less good is that it's harder to, say, run tests over your Jupyter notebook, or to measure coverage of the code in your notebook as part of those tests.

35:06 And you actually tweeted out a really cool project that, when I first saw it, I didn't realize had its origins in geophysics.

35:15 Over at github.com/opengeophysics/testipynb, there's a thing that lets you unit test Jupyter notebooks, right?

35:26 We've just started this.

35:27 This actually got pulled out of the GeoSci ecosystem because, as part of a lot of these courses, we were distributing notebooks.

35:35 And there were a whole bunch of different people who were contributing to these notebook repositories.

35:40 And then we were deploying them either on Microsoft Azure or MyBinder and then stepping up in front of a course and using them to teach.

35:47 And so, you know, when you're in front of a classroom, you really don't want errors popping up, especially if you didn't write the original notebook.

35:56 Yes, exactly.

35:57 Yeah.

35:57 So this was really born out of need to make sure that at least if you are standing in front of the class that the notebook should run.

36:04 And so what we've been trying to do here is extract a lot of the work that we've done using nbconvert, which just runs the notebook and makes sure that it completes with no errors.

36:15 So I think there's a lot that we can think about to, you know, increase the utility of this.

36:20 But as a first pass, like just making sure that the notebook goes from A to Z without erroring, that makes sure, too, that you've properly defined all your dependencies and all of those pieces that are so easy to forget, especially, you know, to new contributors.

36:33 It's not always clear that, you know, if it works on my machine, why doesn't it work over here?
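The core check described here, "does the notebook run from A to Z without erroring", can be sketched with the standard library alone. This is a deliberately simplified swap-in, not testipynb's implementation. The real tool uses nbconvert to execute notebooks through a Jupyter kernel, and nbconvert's `ExecutePreprocessor` is the usual entry point for that. Here the notebook is treated as the JSON document it is, and code cells are run in order in one namespace. The helper names are hypothetical.

```python
import json


def run_notebook(notebook: dict) -> None:
    """Execute every code cell in order, sharing one namespace like a kernel would.

    Simplification: cells are plain Python, so IPython magics (%matplotlib, !pip)
    would fail with a SyntaxError here.
    """
    namespace: dict = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        if isinstance(source, list):  # the notebook format allows a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as error:
            raise RuntimeError(f"cell {index} failed: {error!r}") from error


def run_notebook_file(path: str) -> None:
    """Load an .ipynb file (JSON) and run it."""
    with open(path, encoding="utf-8") as handle:
        run_notebook(json.load(handle))
```

A missing dependency surfaces as a failing `import` in some cell, which is exactly the "works on my machine" problem a scheduled CI run is meant to catch.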

36:37 Yeah, exactly. So it tests things like that your Python environment has the dependencies installed and stuff like that.

36:43 Yeah.

36:43 Which that can be challenging in the whole data science, scientific computing space, right?

36:48 Another one of the reasons I wanted to have something like this out there is that we're sharing notebooks that go along with our publications.

36:56 And so a lot of them are built on SimPEG.

36:59 We know that, you know, down the road, we're going to make changes that are not backwards compatible.

37:03 And if you catch that right away, it's very easy to fix.

37:07 But if you let the notebook lag by a year, it's really hard to then go in and maintain and upgrade that.

37:13 And so part of this, too, was just to be able to set up Travis cron jobs that run once a month on our research notebooks and make sure that the research still continues to run.

37:25 Yeah, that's a really good idea.

37:26 Just run it periodically and just go grab everything new and see if it works.

37:31 Yeah, that's cool.

37:31 So do you have any way to do more specific data validation, like result validation?

37:39 So, for example, like if you have a cell, it would be great if it could convert it like to a function that you could call with parameters and get the response out or, you know, things like that, right?

37:49 Like not only does it still run, but it actually gives me the same results.

37:53 Yeah, I think that that would be super cool.

37:54 I mean, the simplest way right now to do that is to include an assert statement that, like, downloads your archived data set and makes sure that you can still reproduce it.

38:03 But having something like that a little more exposed on the outside, I think, could be quite neat.

38:08 Same thing with even sort of checking figures.

38:11 So make sure that your figure looks the same.
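Comparing against an archived data set, rather than only checking that the code runs, usually needs a tolerance, because floating-point results can shift slightly across library versions or platforms. This is a minimal sketch using `math.isclose`. The helper name and the default tolerances are assumptions to adjust per problem.

```python
import math


def matches_archive(computed, archived, *, rel_tol=1e-6, abs_tol=1e-12):
    """True if two equal-length sequences agree element by element within tolerance."""
    if len(computed) != len(archived):
        return False
    return all(
        math.isclose(c, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, a in zip(computed, archived)
    )
```

In a notebook test, `assert matches_archive(simulated, archived)` would replace a bare "it ran" check. Loading `archived` from wherever the data set is stored is left out here.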

38:13 Right.

38:14 Just like visually or, you know, pixel by pixel.

38:18 Just compare that the picture is the same.

38:21 Of course, when, you know, matplotlib updates to have like slightly faded cool axes, you know, that's going to break.

38:29 But you could just say, oh, no, no, this new picture is still OK for us.

38:33 We'll just upgrade that.

38:34 Right.

38:35 Or update the baseline.

38:36 Yeah, that would be nice.

38:38 But it sounds like a lot of work, right?

38:39 Yeah, absolutely.

38:40 Yeah.

38:41 But I think the biggest thing is just knowing when stuff changes.

38:43 So even if, you know, you compare these two things and they are different, but you can visually tell that it was just a style update.

38:49 Like, that's fine.

38:49 Yeah.

38:50 A friend of mine, Llewellyn Falco, has this project called ApprovalTests.

38:54 And I don't know its integration with Python.

38:56 I know he was doing some work there.

38:57 And it's all based on that idea.

38:59 Like, you write a test and you either get a result as a picture or as, say, a JSON document or whatever.

39:05 And you just go, yeah, that looks good.

39:07 And then all subsequent tests just go, is it the same?

39:09 Like, you don't have to do all sorts of testing.

39:11 You just feed it two pictures and it goes, they're the same, they're different.

39:14 And you can either reapprove the new one or it's an error.

39:17 Right.

39:17 And so, like, that idea, I don't know.

39:19 Maybe somehow these can be put together.

39:20 It sounds cool.
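The approval-testing idea described here can be sketched in a few lines. This is not the ApprovalTests library's API, just a hypothetical illustration of the workflow. New output (for example a figure's PNG bytes) is compared with an approved copy. On a mismatch, the new output is saved for a person to inspect and, if it is acceptable, approve as the new baseline.

```python
from pathlib import Path


def verify_against_approved(received: bytes, approved_path: Path) -> bool:
    """Approval-style check: compare output bytes with a stored, approved copy.

    On a mismatch, or on a first run with no approved copy, the new output is
    written next to it as '<name>.received' for a person to inspect.
    """
    received_path = approved_path.with_name(approved_path.name + ".received")
    if approved_path.exists() and approved_path.read_bytes() == received:
        received_path.unlink(missing_ok=True)  # clean up a stale mismatch, if any
        return True
    received_path.write_bytes(received)
    return False


def approve(approved_path: Path) -> None:
    """Promote the last received output to be the new approved baseline."""
    received_path = approved_path.with_name(approved_path.name + ".received")
    received_path.replace(approved_path)
```

Exact byte comparison is strict. For figures, `matplotlib.testing.compare.compare_images` compares images with a tolerance, which is more forgiving of tiny rendering differences such as a Matplotlib style update.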

39:21 Yeah, that'd be interesting.

39:22 Yeah.

39:22 Nice.

39:24 So, one thing that sort of comes to mind in this whole space is there's got to be a lot of data that you're collecting to get all these pictures and stuff, right?

39:35 Like, the Earth is big.

39:35 So, what are some of the challenges around, like, big data in geophysics?

39:41 So, in a lot of cases, we're not necessarily encountering sort of the same style of big data problems that you think about when you think of, like, social media.

39:49 We don't have data sets that are that big, at least in our group.

39:51 Not like CERN, for example?

39:53 No, not nearly like that.

39:55 But a lot of what we're working with is small, disparate data sets.

39:59 And so, we'll have collected, you know, a whole bunch of different types of geophysical surveys over one setting.

40:05 And now we want to try and integrate all those different data types and figure out, okay, like, what is this telling us in terms of the rocks?

40:11 Like, what does that mean in terms of the geology?

40:14 So, that's one aspect where we're starting to see machine learning coming in in a very powerful way is actually trying to either take data that have been interpreted independently and then try and sort of merge those interpretations to then give you something stronger in terms of, like, I think this is rock A and this is rock B.

40:32 Rather than "this one is magnetically susceptible, this one is conductive."

40:36 I see.

40:36 More like trying to draw the proper conclusions from the raw data.

40:41 Yeah.

40:41 So, really trying to drive much more so at, like, the geologic interpretation.
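As a toy illustration of merging independently recovered properties into rock classes, plain k-means clustering can group locations by their recovered susceptibility and conductivity. This sketch is not a description of any particular published workflow. It assumes each location already has values from two separate inversions, and the sample values are made up. In practice, `sklearn.cluster.KMeans` in scikit-learn is the usual choice.

```python
import math

# Made-up values: each point is (log10 susceptibility [SI], log10 conductivity [S/m])
# recovered at one location from two separate inversions.
SAMPLES = [
    (-4.0, -3.0), (-4.1, -2.9), (-3.9, -3.1),  # weakly magnetic, resistive
    (-1.0, -1.0), (-1.1, -0.9), (-0.9, -1.1),  # magnetic, conductive
]


def kmeans(points, initial_centres, iterations=20):
    """Plain k-means: assign points to the nearest centre, then move centres to member means."""
    centres = [tuple(c) for c in initial_centres]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        new_centres = []
        for k, centre in enumerate(centres):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centres.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centres.append(centre)  # an empty cluster keeps its old centre
        if new_centres == centres:
            break  # converged: assignments will no longer change
        centres = new_centres
    return labels, centres


labels, centres = kmeans(SAMPLES, initial_centres=[SAMPLES[0], SAMPLES[3]])
```

The algorithm only returns "cluster 0" and "cluster 1". Naming them rock A and rock B is the interpretation step, which is where geologic knowledge comes in.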

40:45 Yeah.

40:46 So, another thing I was wondering is how is machine learning being used there?

40:50 Because with these pictures and a lot of this data, it seems like somebody could come along and make some really interesting uses of TensorFlow or PyTorch or something like that.

41:01 Yeah.

41:01 I mean, I haven't seen a ton of neural network work yet, but that's also just my one sample point.

41:08 Where I have seen a fair bit of work done is on the clustering side of things.

41:12 So, really trying to either cluster interpretations or one of the things that I think is really kind of cool is when we start to meld together deterministic inversions.

41:23 So, that's a lot of the work that we've done in the past where you're running some sort of optimization problem to fit your data.

41:28 But then connecting that with statistical or machine learning approaches where you say, okay, I've got some sort of geologic knowledge about these rocks and how do I now couple that to my physics?

41:38 So, that's one area where I think there's a lot of potential.
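For readers new to the term, a deterministic inversion is usually posed as an optimization problem that balances fitting the data against keeping the model reasonable. The generic Tikhonov-style form below is the common textbook statement, not the specific formulation used in this work:

```latex
\min_{\mathbf{m}} \; \phi(\mathbf{m}) =
  \underbrace{\frac{1}{2}\left\| \mathbf{W}_d \left( \mathcal{F}[\mathbf{m}] - \mathbf{d}^{\text{obs}} \right) \right\|_2^2}_{\phi_d}
  + \beta \,
  \underbrace{\frac{1}{2}\left\| \mathbf{W}_m \left( \mathbf{m} - \mathbf{m}_{\text{ref}} \right) \right\|_2^2}_{\phi_m}
```

Here $\mathcal{F}$ is the forward simulation (the physics) and $\mathbf{d}^{\text{obs}}$ the observed data. $\mathbf{W}_d$ is a data weighting, often built from the estimated data uncertainties. $\mathbf{W}_m$ and $\mathbf{m}_{\text{ref}}$ are a regularization operator and a reference model, and $\beta$ sets the trade-off between the two terms. The reference model and regularization are one natural place where prior geologic knowledge can enter.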

41:41 Yeah.

41:42 We're just at the beginning of all this machine learning stuff, right?

41:44 So, who knows what it'll look like in 20 years.

41:47 Yeah.

41:48 Yeah.

41:48 It'll be wild.

41:50 So, it sounds like you do a fair amount of programming around your research projects and stuff.

41:56 And I know what the world looks like as a full-on software developer, right?

42:00 You write a lot of code and things like that.

42:03 But, you know, how do you think about balancing like programming and working on some of the libraries versus, say, research versus writing?

42:11 Like, I know right now it's writing because of the time frame and all.

42:15 But I feel like that's the question of any academic is how do you balance all these things?

42:22 So, if anybody knows, I would love to know.

42:23 But I think in a lot of ways what's been powerful with the group that I'm in is that a lot of this is collaborative.

42:33 So, that there are pieces that you're working on together in each of these aspects.

42:36 Because, I mean, as an academic, you do need to write about the things that you've been doing.

42:40 So, you need to do things.

42:41 And that involves, you know, programming and research.

42:43 And then you need to share that with the world.

42:45 And so, that's writing.

42:46 Yeah.

42:46 And I think having this open source angle to it, the collaboration really helps, right?

42:50 Because other people can do some of the programming where they have the specialty, right?

42:54 Absolutely.

42:54 And there's less duplication of efforts, especially on a lot of those like more mundane tasks of just like parsing data files and things like that.

43:02 Because there's no need for everybody to be writing that.

43:05 But you all need it.

43:06 Yeah.

43:06 Somebody gave me this thing in IDL out of this program or generated by IDL.

43:11 And like, we'll just import this thing and we'll read it.

43:14 We'll be good, right?

43:15 Yeah.

43:15 So, another major thing in that space is getting citations and credit, right?

43:21 Those are sort of like your upvotes for your career.

43:24 Yeah.

43:25 Are you familiar with the Journal of Open Source Software?

43:28 Yes, I am.

43:28 I'm actually the geoscience editor.

43:30 You are?

43:31 I didn't realize that.

43:32 Yeah.

43:32 Yes.

43:33 Okay.

43:33 That's awesome.

43:34 Yeah.

43:35 So, what are your thoughts?

43:37 I mean, obviously, you must be a supporter if you're one of the editors, right?

43:40 But just, you know, maybe tell people really quickly what it is, in case they didn't listen to my show a while ago with Arfon.

43:46 So, it's really a developer-friendly journal.

43:51 And so, you know, I think it's a lot of people that I'm going to do.

43:52 In that sense, the biggest thing is you're actually sort of getting peer review much more so on your code development and practices.

43:58 There's a short paper.

44:00 It's like one to two pages.

44:01 So, it's meant to be pretty lightweight.

44:03 Like, that's supposed to be, you know, a couple hours work.

44:05 What we're really evaluating is the software.

44:08 So, I'm obviously very supportive of this.

44:10 I think it's fantastic to be getting credit tied basically immediately to the software.

44:15 Because that, I think, is a big piece that is missing in academia.

44:20 But one of the things that I've been really sort of stunned by, and I've been with JOSS now for almost a year, is just what a positive process it is.

44:29 You know, I send out requests for reviewers.

44:31 And people not only say yes, they're very enthusiastic to jump in and learn about projects.

44:39 I've seen people, like, do small pull requests to fix typos or fix small things.

44:44 They're writing very well thought out issues.

44:48 And then the authors are so grateful to have somebody with a fresh set of eyes coming in and looking at what they've done.

44:54 And so, it's just been such an amazingly positive process in, like, in peer review, which is not something I've really seen elsewhere.

45:01 Normally, it's like, oh, I don't want to do this, right?

45:03 No, it's, I think it's a really cool project.

45:06 And I definitely wanted to give a shout out to it.

45:08 Because I think it really ties in well with, you know, the way you're thinking about the writing and structuring it in terms of software and open source.

45:16 And I think this is just the perfect complement to it.

45:19 Yeah, and related to that, JOSE, the Journal of Open Source Education, is starting up very soon.

45:25 In fact, they might actually even be accepting submissions.

45:28 It's a parallel effort.

45:30 And I think there are a lot of interesting things, sort of from the geoscience side, where we're now thinking about developing open education resources.

45:37 And they'll be accepting submissions like that.

45:39 So, a few avenues.

45:40 Yeah, that's really, really cool.

45:41 And I definitely think the work that you are doing, even though it sort of found its roots in geophysics, I think it could equally apply to biology or chemistry or lots of things.

45:52 Hopefully.

45:53 Yeah, that's the goal.

45:54 Hopefully.

45:54 With a slight adaptation.

45:56 Yeah.

45:56 Another thing that I ran across doing some research for the show is EarthPy at earthpy.org.

46:03 So, over at EarthPy, it's a collection of IPython notebooks with examples of earth science and the related Python code.

46:10 So, if people are listening and they're into geoscience of some sort, like maybe there's some really interesting things to draw from there.

46:18 Oh, cool.

46:19 I had not seen that before.

46:20 So, I will also be checking that out.

46:21 Yeah, that's really quite neat.

46:24 What's the IoT story in all of your research?

46:29 I mean, I can imagine like a bunch of little sensors planted around, but I don't know if you're involved in any of that.

46:35 I'm not involved in any of that, but there's definitely things coming.

46:39 There's a lot of interest in looking at the use of drones for smaller scale surveys.

46:45 I think actually, too, looking at more of the precision farming and precision viticulture for vineyards.

46:52 I think there's a lot of potential on those smaller scales for bringing in sensors that are giving you much more real-time feedback.

47:00 And you can make decisions about which areas need to be irrigated or not based on what you're seeing.

47:05 And you could get that information like, at 2 p.m., it needs to be irrigated, not at 3, right?

47:10 Yeah, exactly.

47:11 Like, that kind of, like, super detailed.

47:12 With the advent of such cheap connected devices, it just seems like, and the ability to program them so easily with Python, right?

47:21 Like, things like MicroPython and, like, $5 microchips that you can set up.

47:27 I don't know how long you can run those on battery, but they've got to last a long time.

47:30 So, it just seems like you could do such great stuff with that.

47:34 Yeah, and this is something I'm going to be very curious to watch and see what happens over the next few years.

47:39 Because I think a lot of these ideas are just starting to be worked out.

47:42 And people are figuring out how to make geophysical sensors a lot smaller.

47:45 Because that's been a problem for a long time, as they're, in a lot of cases, they're big.

47:49 But people are making progress.

47:50 It totally depends on the survey.

47:53 So, there's some, if you're trying to go out and do a gravity survey, it's very big and slow.

47:58 Because you need to go out, you need to level it, you need to stay level.

48:01 If the wind picks up, that's a problem.

48:03 That's a challenging one.

48:04 But looking at magnetometers or things like that, people can now start to make them only a few centimeters in size.

48:09 So, that's then something that is readily adapted to an IoT application.

48:14 Yeah, that sounds quite cool.

48:16 All right, well, I think maybe we'll leave it here for our geophysics talk.

48:22 But, of course, I have two more questions for you before you go.

48:25 So, if you're going to write some Python, yeah, if you're going to write some Python code, what editor do you use?

48:30 Or editors, if you use more than one, I guess.

48:31 Well, it'll be Jupyter and Sublime is the combination I use.

48:34 Nice.

48:35 Yeah, I figured Jupyter must be in there.

48:37 Although, maybe JupyterLab someday.

48:38 I don't know.

48:39 I'll see.

48:39 Yes, absolutely.

48:40 Nice.

48:40 And then, notable PyPI package.

48:44 I mean, we definitely have SimPEG out there, which people can install with pip install simpeg.

48:50 And that's really cool that it's that easy, right?

48:52 Yeah.

48:52 So, what else is out there that people maybe haven't heard of that would be great?

48:56 I guess there's two.

48:57 So, there's discretize, which SimPEG is built on.

49:00 And that does a lot of finite volume meshing and all of that.

49:03 We've got octree, tensor, and cylindrical meshes.

49:06 So, that's an interesting package if you want to simulate PDEs.
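To give a feel for what "finite volume meshing" means in practice, here is a minimal sketch in plain NumPy of a 1D tensor mesh and its face-to-cell divergence operator. This is not discretize's API; `tensor_mesh_1d` and `face_divergence_1d` are hypothetical helpers written for illustration, and the real package (with classes such as `TensorMesh`) handles 2D/3D, octree, and cylindrical meshes plus many more operators.

```python
import numpy as np


def tensor_mesh_1d(h, origin=0.0):
    """Return node (face) locations and cell centers for cell widths h."""
    h = np.asarray(h, dtype=float)
    nodes = origin + np.concatenate(([0.0], np.cumsum(h)))
    centers = nodes[:-1] + h / 2.0
    return nodes, centers


def face_divergence_1d(h):
    """Finite-volume divergence: maps n+1 face fluxes to n cell values.

    Row i computes (flux[i+1] - flux[i]) / h[i], i.e. net flux out of
    cell i divided by the cell's width.
    """
    h = np.asarray(h, dtype=float)
    n = h.size
    D = np.zeros((n, n + 1))
    idx = np.arange(n)
    D[idx, idx] = -1.0 / h
    D[idx, idx + 1] = 1.0 / h
    return D


# Usage: a non-uniform mesh with a linear flux sampled on the faces.
h = [1.0, 0.5, 0.5, 1.0]
nodes, centers = tensor_mesh_1d(h)   # nodes: 0, 1, 1.5, 2, 3
flux = 2.0 * nodes                   # flux = 2x, so its divergence is 2
div = face_divergence_1d(h) @ flux   # exact for a linear flux: all entries 2
```

Storing the operator as a matrix is the design choice that makes PDE solves composable: a Poisson-type problem becomes products of such matrices, which is the kind of machinery a meshing library builds and tests for you.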

49:09 But one that I'm very excited about is called Properties.

49:12 And it's, basically, it does, like, strong typing, validation, serialization, all of that in Python.

49:19 But it does a really good job of helping you design an API that's meant to be interactive.

49:24 So, in the sense of Jupyter.

49:25 Like, you want to design an explorable API.

49:28 Properties is a great package to help you build that.
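The idea of typed, validated, serializable attributes that stay explorable in Jupyter can be sketched with standard-library descriptors. This is a hypothetical illustration of the concept only, not the API of the `properties` package: the `Float` descriptor, the `HasProps` base class, and the `Survey` example with its attribute names are all invented for this sketch.

```python
class Float:
    """Hypothetical descriptor: a documented, validated float attribute."""

    def __init__(self, doc, default=None, lower=None, upper=None):
        self.doc = doc
        self.default = default
        self.lower = lower
        self.upper = upper

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor itself
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        value = float(value)  # coerce ints; raises ValueError/TypeError on junk
        if self.lower is not None and value < self.lower:
            raise ValueError(f"{self.name}={value} is below {self.lower}")
        if self.upper is not None and value > self.upper:
            raise ValueError(f"{self.name}={value} is above {self.upper}")
        obj.__dict__[self.name] = value


class HasProps:
    """Hypothetical base class: collects Float descriptors for init, docs, export."""

    @classmethod
    def props(cls):
        found = {}
        for klass in reversed(cls.__mro__):  # base classes first
            for name, attr in vars(klass).items():
                if isinstance(attr, Float):
                    found[name] = attr
        return found

    def __init__(self, **kwargs):
        known = self.props()
        for name, value in kwargs.items():
            if name not in known:
                raise TypeError(f"unknown property: {name}")
            setattr(self, name, value)  # goes through Float.__set__ validation

    @classmethod
    def describe(cls):
        """One line per property, handy when exploring an object in Jupyter."""
        return "\n".join(f"{name}: {p.doc}" for name, p in cls.props().items())

    def serialize(self):
        return {name: getattr(self, name) for name in self.props()}


# Usage with a made-up survey class.
class Survey(HasProps):
    frequency = Float("Transmitter frequency in Hz", default=1.0, lower=0.0)
    conductivity = Float("Background conductivity in S/m", default=0.01, lower=0.0)


survey = Survey(frequency=10)
print(survey.serialize())  # {'frequency': 10.0, 'conductivity': 0.01}
```

Validating at assignment time, rather than deep inside a solver, means a bad value fails right where a user typed it in a notebook cell, which is much of what makes an API pleasant to explore interactively.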

49:31 That sounds really cool.

49:32 And I've never heard of that.

49:33 That's great.

49:34 That's why I asked this question.

49:35 That's awesome.

49:38 Okay.

49:38 So, final call to action.

49:39 Maybe there's educators or scientists out there who would like to adapt some of the work you've done.

49:45 What would you say to them?

49:46 Yeah.

49:47 I mean, I think that if they have ideas and want to contribute directly to any of these resources,

49:51 we are always keen to have more contributors.

49:54 I think, too, that, you know, trying to build in a way that promotes community and build in a way that you can invite others to go in and contribute is such a powerful thing to keep in mind.

50:05 It's because then you yourself don't have to do all of the work.

50:08 And the end product that comes out is going to be better than any one person could have done.

50:13 And sort of the parallels to that, then, too, is if you see something that you have ideas about how to contribute to, get involved and get in touch.

50:21 People are always looking for contributors, and everybody has something to add, even if you don't know how to approach it or where to even start.

50:28 Yeah.

50:29 It doesn't have to be a major rewrite of some system, right?

50:31 Like, first contributions can be much, much smaller, just a little thing you can fix, right?

50:36 Yeah.

50:36 Absolutely.

50:37 All right.

50:37 Well, Lindsey, thank you so much for being on the show and sharing what you're up to.

50:40 It's been really fun to chat.

50:42 Thanks, Michael.

50:42 It's been good.

50:43 Yeah.

50:43 And good luck on the dissertation defense.

50:45 Thank you very much.

50:46 Yeah.

50:46 Bye.

50:47 This has been another episode of Talk Python To Me.

50:51 Our guest for this episode has been Lindsey Heagy, and it's brought to you by MongoDB and Anvil.

50:57 Interested in seeing how web apps are built with Python and MongoDB?

51:02 Register for my webinar I'm doing with MongoDB over at mongodb.com slash webinar slash Python.

51:08 See you there.

51:10 Anvil lets you build your web apps quickly and easily.

51:13 With Anvil, you can get your web app in Python up and running in hours, not weeks.

51:19 Want to level up your Python?

51:21 If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new 100 Days of Code in Python.

51:28 And if you're interested in more than one course, be sure to check out the Everything Bundle.

51:32 It's like a subscription that never expires.

51:35 Be sure to subscribe to the show.

51:36 Open your favorite podcatcher and search for Python.

51:39 We should be right at the top.

51:40 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

51:49 This is your host, Michael Kennedy.

51:51 Thanks so much for listening.

51:52 I really appreciate it.

51:54 Now get out there and write some Python code.

51:55 Thank you.