Python in Climate Science

Episode #134, published Thu, Oct 19, 2017, recorded Mon, Oct 16, 2017


What is the biggest challenge facing human civilization right now? Fake news, poverty, hunger? Yes, all of those are huge problems right now. Well, if climate change kicks in, you can bet it will amplify these problems and more. That's why it's critical that we get answers and fundamental models to help understand where we are, where we are going, and how we can improve things.

On this episode, you'll meet Dr. Damien Irving. He's a climate science researcher using Python to understand what the climate models are telling us.

Episode Deep Dive

Guests Introduction and Background

Guest: Dr. Damien Irving is a climate scientist and postdoctoral research fellow at Australia’s national science agency, CSIRO.
Expertise: He works extensively with Python for climate data analysis, focusing on large-scale climate model output.
Research Focus: Dr. Irving’s role involves studying simulated climate data (multidimensional arrays covering variables like temperature and rainfall) to better understand climate change and inform decisions on mitigation and adaptation.

What to Know If You're New to Python

If you are new to Python and the topics from this episode, here are a few pointers to help you follow along:

Familiarity with data-focused Python libraries like NumPy, pandas, and matplotlib is helpful, as climate model data often involves large arrays and visualizations.
Understanding the basics of Conda for environment management will help you easily install scientific libraries without the hassle of manual dependency resolution.
Knowing how version control tools (e.g., Git) integrate with Python projects is essential for reproducibility, crucial in climate science.
Gaining some exposure to the concept of “metadata” (information describing the data) will help make sense of formats like NetCDF, which store climate model outputs and associated documentation.

Key Points and Takeaways

Central Role of Python in Climate Research: Python is integral to modern climate science because it excels at handling large-scale data and has powerful libraries for analysis and visualization. Dr. Irving highlighted that while many climate models are written in Fortran for historical and performance reasons, Python drives post-processing, statistics, and data visualization. This split (model in Fortran, analyze in Python) is a standard pattern across much of the climate science community.
- Links/Tools:
  - Python
  - Fortran (General info)
NetCDF as a Key Data Format: Climate models typically output data in NetCDF (Network Common Data Form), a self-describing format that includes detailed metadata about the variables, dimensions, and units. Dr. Irving explained how metadata standards like “CF conventions” make it easier to share and compare climate model outputs. These conventions streamline tasks such as visualizing or extracting climate metrics across time, latitude, and altitude.
- Links/Tools:
  - NetCDF Documentation
  - CF Conventions
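
To make the value of agreed metadata a little more concrete, here is a minimal sketch of decoding a CF-style time axis, where the units attribute reads something like "days since 1850-01-01". The helper name `decode_cf_time` is hypothetical, and the sketch assumes the standard calendar and only four units; libraries such as Xarray and Iris handle many more cases than this.

```python
from datetime import datetime, timedelta

# Seconds per supported unit; CF allows more units and calendars than this sketch.
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, units):
    """Convert numeric offsets to datetimes given CF-style units.

    `units` looks like "days since 1850-01-01" or
    "hours since 2000-01-01 00:00:00". Hypothetical helper: assumes
    the standard calendar and no time zone suffix.
    """
    unit, since, reference = units.split(maxsplit=2)
    if since != "since" or unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    origin = datetime.fromisoformat(reference)
    step = _SECONDS_PER_UNIT[unit]
    return [origin + timedelta(seconds=v * step) for v in values]


# Offsets of 0, 31 and 59.5 days from the reference date.
times = decode_cf_time([0, 31, 59.5], "days since 1850-01-01")
```

Because every compliant file describes its time axis the same way, one small function like this can serve data from any modeling group, which is the point Dr. Irving makes about libraries being able to assume things about the data.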
Powerful Python Libraries for Climate Data: Dr. Irving described two main Python libraries widely used in the climate community: Xarray and Iris. Xarray extends the Pandas concept to n-dimensional data, while Iris enforces stricter metadata rules built on CF conventions. Both libraries simplify loading, manipulating, and visualizing large arrays of atmospheric or oceanographic data.
- Links/Tools:
  - Xarray
  - Iris (Met Office)
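
As a rough sketch of what "labeled n-dimensional data" looks like in practice, the example below builds a small Xarray `DataArray` from synthetic numbers. The variable name `tas`, the grid, and the sample coordinates are illustrative assumptions rather than anything from the episode; real work would load a NetCDF file instead of generating random values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for model output: 12 monthly fields on a 4 x 8 grid.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
lat = np.linspace(-67.5, 67.5, 4)
lon = np.arange(0.0, 360.0, 45.0)
data = np.random.default_rng(0).normal(15.0, 5.0, size=(12, 4, 8))

tas = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="tas",
    attrs={"units": "degC", "long_name": "near-surface air temperature"},
)

# Select by coordinate label rather than by integer position.
point_series = tas.sel(lat=-42.9, lon=147.3, method="nearest")

# Reduce over a named dimension instead of remembering the axis number.
time_mean = tas.mean(dim="time")
```

The selection and the reduction both refer to dimensions by name, which is what makes code like this easier to read and reuse than raw index arithmetic on a NumPy array.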
Visualization with Declarative Libraries: While Matplotlib is a standard for plotting in Python, Dr. Irving introduced GeoViews and HoloViews, which enable high-level “declarative” plotting. These libraries “automatically” decide the best visual representation based on metadata, freeing researchers from manually tweaking projections, color scales, and plot details. This efficient approach is especially helpful when dealing with massive climate datasets and iterative experiments.
- Links/Tools:
  - HoloViews
  - GeoViews
  - Bokeh
Parallelism and Big Data: With petabytes of simulation data, parallel and distributed analysis is essential. Although Dr. Irving mentioned that many “embarrassingly parallel” tasks in climate science can be handled via vectorized NumPy operations or built-in Dask support within Xarray, more advanced users might still leverage HPC systems or explicit parallel libraries. The main takeaway is that even moderate HPC techniques can drastically reduce analysis time.
- Links/Tools:
  - Dask
  - Conda
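
The vectorization point can be sketched with plain NumPy. The array below is synthetic, and the cosine-of-latitude weighting is a common approximation for regular latitude-longitude grids that is assumed here for illustration; it is not something described in the episode.

```python
import numpy as np

# Synthetic (time, lat, lon) array standing in for a model field.
rng = np.random.default_rng(42)
field = rng.normal(size=(24, 6, 12))
lat = np.linspace(-75.0, 75.0, 6)

# Loop version: one time mean per grid point.
looped = np.empty(field.shape[1:])
for i in range(field.shape[1]):
    for j in range(field.shape[2]):
        looped[i, j] = field[:, i, j].mean()

# Vectorized version: the same reduction in a single call.
vectorized = field.mean(axis=0)

# Area-weighted global mean per time step, weighting rows by cos(latitude).
weights = np.cos(np.deg2rad(lat))
zonal_mean = field.mean(axis=2)  # shape (time, lat)
global_mean = np.average(zonal_mean, axis=1, weights=weights)  # shape (time,)
```

Both versions give the same answer, but the vectorized call pushes the loop into compiled code. On a full-size grid that difference is often enough to avoid writing explicit parallel code at all.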
Reproducibility and Best Practices: The episode underscored the “reproducibility crisis” in science, where many researchers do not preserve the exact code or environment to replicate results. Tools like Git for version control and environment management with Conda or Docker (though Docker usage is not widespread yet) help ensure that analysis is repeatable and transparent. Capturing version info in NetCDF history attributes is an extra step that Dr. Irving uses to track analysis provenance.
- Links/Tools:
  - GitPython (for capturing commit IDs in workflows)
  - Docker
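
One way to picture the history-attribute idea is the sketch below, which builds a single provenance line from a timestamp, the command that ran, and the current commit. The function names and the exact line format are hypothetical. The notes above mention GitPython; this sketch calls the `git` command line through `subprocess` instead, purely to avoid an extra dependency.

```python
import subprocess
import sys
from datetime import datetime


def get_git_hash(repo_dir="."):
    """Return the current commit hash, or "unknown" outside a Git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def build_history_entry(argv, git_hash, when=None):
    """Format one line: timestamp, the command that ran, and the code version."""
    when = when or datetime.now()
    stamp = when.strftime("%a %b %d %H:%M:%S %Y")
    return f"{stamp}: {sys.executable} {' '.join(argv)} (Git hash: {git_hash[:7]})"


def prepend_history(new_entry, old_history=""):
    """Put the newest entry first so the attribute reads as a reverse-chronological log."""
    return new_entry if not old_history else f"{new_entry}\n{old_history}"


entry = build_history_entry(sys.argv, get_git_hash())
```

In a real workflow the combined string would typically be written to the output file's global `history` attribute, so that anyone opening the file later can see which command and which commit produced it.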
Software Carpentry and Training: Dr. Irving discussed Software Carpentry and Data Carpentry workshops as an introduction to best practices in programming and data analysis, especially tailored for scientists. These efforts train researchers in basic coding, version control, and reproducible workflows, topics often missing from traditional academic programs. Python’s ecosystem, combined with carpentry courses, levels the playing field for newcomers in climate or any scientific domain.
- Links/Tools:
  - Software Carpentry
  - Data Carpentry
Engaging in the Wider Python Community: A key recommendation was for researchers to attend PyCon or SciPy events to connect with developers beyond academia. This cross-pollination accelerates the sharing of new libraries, techniques, and problem-solving strategies. Collaborating with Python professionals and open-source maintainers can raise the quality of climate-science tools.
- Links/Tools:
  - PyCon
  - SciPy Conference
Scientific Consensus vs. Public Debate on Climate Change: A significant portion of the conversation addressed the gap between scientific consensus and public perception. At conferences, there is nearly unanimous agreement that humans are causing climate change, yet mainstream media often presents it as a contested debate. This disparity highlights the importance of clear communication, activism, and making data analysis (and the code behind it) accessible.
Activism and Economic Incentives: Beyond coding and analysis, Dr. Irving encouraged climate-conscious actions like contacting representatives, protesting harmful projects, and using personal finances to support sustainable energy. While political efforts remain vital, economic shifts, such as the falling cost of solar power, also drive adoption of climate-friendly solutions. “Winning slowly” is not enough if continued emissions ramp up climate risks.

Interesting Quotes and Stories

"When I started, obviously Conda didn't exist... and it used to be a complete nightmare to install these libraries. But now it's just 'conda install', one line, it's all done." – Dr. Damien Irving on how packaging and environment management have evolved

"We need to win fast on climate, because a slow victory is effectively losing." – Dr. Damien Irving on the urgency of climate action

"For me, the real game-changer in climate analysis is the proper use of metadata. If everything is labeled in NetCDF the right way, so many tasks become just one command." – Dr. Damien Irving on the power of NetCDF CF conventions

Key Definitions and Terms

NetCDF: A self-describing, machine-independent file format that is widely used in climate and atmospheric sciences for array-oriented data.
CF Conventions: A set of metadata standards ensuring that NetCDF files describe climate and forecast data consistently.
Xarray: A Python library designed to handle multidimensional arrays by adopting a Pandas-like interface.
Iris: A Python library developed by the UK Met Office, enforcing stricter CF metadata standards for climate data.
Software Carpentry: A volunteer organization that teaches computing skills to scientists, focusing on coding and data management best practices.

Learning Resources

Below are a few curated Talk Python courses that can complement the ideas in this episode. These links include a special query string so we can track engagement.

Python for Absolute Beginners: Perfect if you're just starting out, with a focus on core concepts in Python.
Data Science Jumpstart with 10 Projects: Gain hands-on experience with data handling and analysis, useful in climate science contexts.
Fundamentals of Dask: Ideal for those ready to scale data operations and handle parallel tasks, which is directly applicable to large climate datasets.
Python Data Visualization: Explore plotting libraries to elevate how you visualize complex climate data sets.

Overall Takeaway

Python has become an indispensable tool in climate science, making it easier to manage and visualize massive amounts of simulation data. Dr. Damien Irving’s experience highlights how open-source software, reproducible practices, and an engaged scientific community can collectively push climate research forward. At the same time, bridging the gap between science and society remains crucial: developers and climate scientists alike can collaborate not only on the computational challenges but also on the broader activism and policy decisions that shape our planet’s future.

Links from the show

Damien on the web: drclimate.wordpress.com
Damien on Twitter: @DrClimate
Python for the Atmospheric and Oceanic Sciences blog: pyaos.johnny-lin.com
Software Carpentry: software-carpentry.org
Episode #134 deep-dive: talkpython.fm/134
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 What's the biggest challenge facing human civilization right now?

00:03 Fake news, poverty, hunger, oppression?

00:05 Yes, all of these are huge problems right now.

00:08 But if climate change kicks in, you can bet that it'll amplify these problems and many more.

00:12 That's why it's critical that we get answers and fundamental models to help understand where we are,

00:18 where we're going, and how we can improve things.

00:20 On this episode, you'll meet Dr. Damien Irving.

00:22 He's a climate science researcher using Python to understand what the climate models are telling us.

00:28 This is Talk Python To Me, episode 134, recorded October 16, 2017.

00:33 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:53 This is your host, Michael Kennedy.

00:55 Follow me on Twitter, where I'm @mkennedy.

00:58 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:04 Hey, everyone.

01:05 I'm super excited to bring you this climate science, climate change episode.

01:09 But before we do, I want to just catch you up really quick on my free MongoDB course that I talked about last week.

01:15 If you're looking to learn MongoDB, especially with Python, check out freemongodbcourse.com.

01:21 Just last week, over 5,000 people signed up and really enjoyed it.

01:24 So drop by the website, sign up, and check it out.

01:27 Now let's talk with Damien about climate research and Python.

01:30 Damien, welcome to Talk Python.

01:32 Thanks for having me.

01:33 Very happy to be on.

01:34 Yeah, I'm really excited about our topic that we've got lined up today, Python and climate science.

01:39 And I think there's just so many aspects to talk about that.

01:42 Well, we're going to cover a bunch of them, right?

01:44 We're going to cover a bunch of different things, the programming, the problems you're trying to solve,

01:49 the education problems in terms of educating data scientists, and much more than that, right?

01:54 Yeah, absolutely.

01:54 Hopefully, we can have a wide-ranging chat.

01:57 Yeah, I think that's the way we should definitely do this.

01:59 But before we get to that, let's start with your story.

02:02 How did you get into programming in Python?

02:03 Well, I mean, I did my undergraduate degree in science, majoring in meteorology, way back in 2008 now.

02:09 But that actually kind of involved very little programming, which sounds shocking,

02:14 but is actually pretty common for people who study science.

02:16 So I think the only programming course I did was a short summer intensive learning FORTRAN in the engineering department.

02:23 And that actually put me ahead of the curve and ahead of most undergraduate scientists.

02:27 And so I picked up a summer job.

02:29 Yeah, that's amazing.

02:30 Yeah, it's scary to think back.

02:32 Yeah, and then so I was able to pick up a job off the back of that, basically doing some work for an assistant professor in the meteorology department

02:39 after I finished my undergraduate degree.

02:40 And he kind of sat me down in front of a command prompt, pointed me towards Ferret, which is a scripting language used a bit in oceanography,

02:49 and kind of left me to it.

02:50 And I bumbled my way through that.

02:52 Nice.

02:53 Was that actually in FORTRAN, or what language was that in?

02:55 No, so Ferret is kind of a standalone language, I guess, just for doing basic data analysis in oceanography.

03:02 So it's a fairly limited language, but it's good if you're dealing with oceanographic data.

03:06 But yeah, I kind of bumbled my way through that, and then bumbled my way through using FORTRAN and things like that in an honors degree,

03:12 which is kind of a year-long research project you do after your undergraduate in Australia.

03:16 Yeah, and that was my kind of introduction to programming.

03:19 And it's kind of a pretty typical scientist experience, kind of self-taught,

03:23 no formal training or education in programming.

03:25 Right.

03:26 Here's the problem to solve.

03:27 We think programming can help.

03:28 Here's some tools.

03:29 Yeah.

03:30 Have at it, right?

03:30 Go.

03:30 Yeah.

03:31 But then, actually, after my honors year, I was lucky enough to get a job at CSIRO,

03:35 which is a national science organization in Australia.

03:38 And it was kind of a half research, half support scientist role.

03:42 And in my support science role, I got to spend a lot of time with IT people who basically introduced me to Python.

03:48 And that's kind of where my path kind of diverged from typical.

03:51 I got to spend a lot more time with IT nerds than your typical scientist would.

03:56 So I was kind of really lucky in that respect.

03:58 And they pushed me onto Python.

03:59 Thank goodness.

04:00 Yeah.

04:00 What is this language?

04:01 Nobody knows what you're doing.

04:02 Go learn Python.

04:03 Yeah.

04:03 Nice.

04:04 And so nowadays, what do you do?

04:06 Yeah.

04:06 So today, nowadays, I'm a postdoctoral research fellow at CSIRO.

04:11 So I'm back there after my PhD, which is nice.

04:13 That's cool.

04:13 What was your PhD in?

04:14 My PhD was in waves in the upper atmosphere and how they affect the weather down at the surface.

04:20 Okay.

04:21 Yeah.

04:21 I didn't even know there were waves in the upper atmosphere.

04:23 That's awesome.

04:24 Yeah.

04:24 It's kind of, I mean, the atmosphere is kind of a fluid.

04:27 You kind of, I guess you don't think of it like a fluid like you would an ocean, but it flows the same way as a fluid and stuff.

04:33 So there are waves, just like if you drop a stone into a pond and they ripple and things like that.

04:39 So looking at those waves in the atmosphere.

04:41 Okay.

04:41 Very cool.

04:42 Yeah.

04:42 But yeah, so the work involves, my work right now involves looking at climate model data.

04:47 There's probably about 40 or so modeling groups around the world that have computer climate models.

04:52 And periodically, every five or 10 years or so, they all perform sets of common experiments.

04:58 So they'll have worlds with humans in them and worlds without humans in them and worlds with different levels of greenhouse gases and all those things.

05:05 And they'll make all that data available.

05:08 And the entire data set is, you know, multiple petabytes in size.

05:12 But individual researchers like me typically only require, you know, a subset of the data set.

05:17 And so all that data in Australia is at a national computing facility.

05:21 And rather than have people download all that data to their own institution, which would obviously be impractical, they build a lot of, I guess, analysis infrastructure on top of the data.

05:30 So you log in remotely.

05:32 I see.

05:32 That's cool.

05:33 So there's so much data that it's in one place and you send the code and analysis to the data rather than the other way, huh?

05:38 Yeah.

05:39 So they have all the analysis infrastructure on top of the data so you don't have to move it anywhere.

05:42 So I live in Hobart in Australia, but I spend my days on a computer that sits on top of all the data in Canberra.

05:48 Yeah.

05:49 So that's kind of what I spend my days doing.

05:51 That's awesome.

05:52 That sounds really fun.

05:52 That's an incredible amount of data.

05:54 How much of what you work with, is it simulation?

05:57 And how much is it analyzing observational type data?

06:01 Yeah, for me, it's pretty much all simulation.

06:03 So it's all scenarios of the future world to look at climate change.

06:08 Or it's kind of reruns of the past 150 years with various elements taken out.

06:13 So humans might be removed or different aspects to kind of figure out what's caused what, if you like, in the observed climate.

06:20 I see.

06:21 What if we never ate beef or something like this, right?

06:23 Never raised cattle?

06:24 Yeah.

06:25 You can have models that have different kind of treatments of what's happening at the surface in terms of vegetation.

06:30 You know, yeah, livestock and stuff like that.

06:32 Yeah, farming practices.

06:33 Yeah.

06:34 So I deal almost exclusively in fictitious worlds, but there are plenty of people who deal with actual observations.

06:40 Yeah, and I'm sure that there's a really important place for both.

06:45 So maybe that's a good place to start.

06:47 Let's talk about the types of problems that you guys are trying to solve with Python.

06:51 Maybe some of the modeling and how that goes, things like this.

06:54 I guess most problems that climate scientists are trying to solve when they're using Python is that

06:59 someone's run one of these models, either a fictitious kind of world like I'm looking at,

07:03 or ones where a whole bunch of observations have been thrown in and the model is used to fill the gaps, basically.

07:09 So obviously, we don't observe temperature and rainfall and all those things everywhere.

07:14 So models can be used to kind of fill in the gaps between the observations using the observations as kind of a ground truth.

07:21 And so either way, whichever type of model you're using, it'll output large multidimensional arrays of data.

07:28 So they'll have a time axis, a latitude axis, a longitude axis, and a depth or a height altitude axis.

07:34 So large multidimensional arrays.

07:36 And then it's basically trying to draw insights from that data about how the climate system works, really.

07:43 So a lot of kind of time series analysis on those large arrays, other statistical analysis.

07:49 And just a lot of the time, it's just the fundamental task of actually visualizing what the model is simulating.

07:55 So actually visualizing the temperature data or the humidity data or whatever output it is, just what is this model doing?

08:02 And a lot of the challenge is just seeing it.

08:05 Right.

08:06 I can imagine, right?

08:07 Like once you've got so much data, just great ways to visualize it so you actually understand because that's a lot of axes.

08:14 Yeah.

08:14 It seems like a simple thing, but actually just seeing what a model is simulating is quite a challenge at times.

08:19 Yeah.

08:20 So do you feel like there's a lot of pandas, numpy, matplotlib, Jupyter Notebooks, those types of things?

08:27 Yeah, definitely numpy or libraries built on top of that for multidimensional arrays.

08:33 Pandas and stuff would be probably more common among, I guess... a lot of scientists like me have model data that comes out on these nice grids.

08:40 And then you have, I guess, the other scientists who is kind of, they're more GIS problems.

08:45 They've been out on a research vessel and they've dropped a temperature probe down at one point.

08:50 And then 100 nautical miles later, they drop another one and then another one.

08:54 So then they have this kind of unstructured spatial data.

08:57 I mean, it becomes much more of a GIS pandas kind of problem.

09:01 Right.

09:01 Because when you simulate the data, you don't have to clean it up as much.

09:04 No, it becomes fairly nice.

09:06 And the grid is, you know, all the latitudes and longitudes are perfectly spaced.

09:10 And it's this very nice regular grid.

09:13 Whereas if you're more of an observational scientist and you're just taking observations wherever you can, you have to clean it.

09:20 You have to get it on a grid that's a bit nicer.

09:22 And so, yeah, I kind of, I thank my lucky stars some days that I'm not in that space, but they don't have the ridiculous amount of data that I've got to deal with.

09:29 So maybe, maybe there's some nice aspects to it.

09:31 Yeah, of course.

09:32 Of course.

09:33 That's pretty interesting.

09:35 And so for the simulations, I suspect that you have Python in action there, but it's not, is it all in Python or is it C++ or what's the mix there?

09:45 Most climate models themselves are written in Fortran.

09:48 So mostly you're using Python for the post-processing.

09:52 So analyzing the data once it's come out of the model.

09:55 And this, I guess, one reason most of them are in Fortran is kind of legacy issues that most climate models out there have been developed for decades.

10:04 So when they started, Fortran was the only thing available.

10:07 And the other one, I guess, is, well, I guess from a speed point of view, you know, with Cython and things like that these days, you probably could do Python, write a climate model in Python.

10:16 And I think some very simple models written in Python do exist.

10:20 But in general, yeah, the models are written in Fortran and the post-processing data analysis is done with Python.

10:26 Right.

10:26 Okay.

10:26 Yeah, that's pretty interesting.

10:27 It scares me a little to think of that much Fortran.

10:30 Oh, yeah.

10:31 But that's okay.

10:32 Yeah.

10:33 Yeah, when I started in – I studied engineering for a while and they said the most important language you're going to learn is Fortran.

10:40 And I said, please let me take C++.

10:42 They said, nope, you have to learn Fortran.

10:44 Then you go do these elective courses if you want.

10:45 Like, oh, okay.

10:46 I see how that's going to be.

10:47 But, yeah, it sounds like it really might be the case in climate science.

10:51 That's cool.

10:51 So you talked about, like, tremendous amounts of data.

10:55 Probably parallelism is, like, an important thing there, right?

10:57 A lot of the problems are embarrassingly parallel in that you want to just do the same process on every single latitude point or something like that.

11:06 So there's no kind of talking between analysis points.

11:09 Or you want to do the same process across lots of different climate models.

11:12 So, yeah, so some people will write, I guess, parallel code explicitly, if you like, using the multiprocessing Python library or something like that.

11:21 Or, increasingly, I guess, it's kind of built into the package they're using.

11:25 So Dask is, I'm sure, one that's been talked about on previous episodes of your podcast.

11:30 But that's kind of built into XArray, which is kind of a build-on of pandas for multidimensional arrays.

11:36 And so it kind of chunks your problem and does all this parallel scheduling and stuff for you.

11:41 So some people are doing that.

11:43 But sometimes I feel like it's a little bit like everyone talks about big data, but most people aren't actually in that space with the analysis they do.

11:51 And it's a little bit the same.

11:52 I think most people are kind of still in the medium data space where even if they're dealing with a big data set, they're only interested in a smaller subset.

12:01 And you can kind of get around the need for writing parallel code by just using the numpy functions that allow you to vectorize your problem rather than looping.

12:10 And a lot of the packages in climate science kind of do lazy loading where you can kind of load all the metadata about the data first and have a look at what's in there.

12:19 And then you can just actually load the data that you want instead of the whole thing at once.

12:24 Okay.

12:24 Yeah.

12:25 That sounds pretty interesting.

12:26 Do you leverage like GPUs, you know, the running on GPUs for computational stuff?

12:32 Any or is it mostly just straight CPU?

12:34 I think I've spoken to people who are using GPUs as well as CPUs.

12:38 So I know it can be done.

12:39 I haven't personally.

12:41 But yeah, some people do.

12:42 Yeah.

12:43 Yeah.

12:43 Okay.

12:44 That's cool.

12:44 Yeah.

12:45 There's probably even some computational bits you can plug in that like what you're actually doing could harness the power of GPUs maybe.

12:52 So all this data, is this like in a giant database?

12:56 Is it in a bunch of CSVs?

12:58 Where do you get your data from?

12:59 Most of the output from climate models these days goes into a file format called NetCDF.

13:04 I think Network Common Data Form must be what it stands for.

13:08 So it's a subset of HDF5.

13:11 So it's a self-describing format.

13:13 So it carries all its kind of metadata with it, which is really cool.

13:18 Right.

13:19 And then in kind of weather, climate, ocean, science, there's a whole set of CF conventions they're called.

13:25 So climate and forecast conventions around what metadata you use in the file.

13:29 So how exactly do you describe the time axis?

13:33 Do you say days since YYYY slash MM slash DD?

13:38 Yeah.

13:39 So all these things so that then when people build libraries in weather and climate science, they can assume this basic level of metadata compliance in the files, which means that they can write functions that really speed up your analysis because they can assume certain things about the data.

13:55 So yeah, NetCDF in this kind of metadata format is kind of ubiquitous through weather, ocean, climate, science now.

14:04 All right.

14:05 Yeah, that's interesting.

14:05 So it sounds like maybe if you've got this common format, like a bunch of different libraries and packages can read and understand it.

14:12 So what are the major packages that you might use in Python to work with climate science?

14:17 Probably these days, I mean, first off, you would be installing your environment using Conda for sure.

14:23 So a big headache used to be just getting all the non-Python dependencies installed.

14:28 So obviously there's a lot of our libraries that are C underneath or there's NetCDF libraries that have to be installed and things like that.

14:35 And they used to be a complete nightmare.

14:37 And so Conda has absolutely been a game changer.

14:39 So yeah, people are installing their stuff with Conda.

14:42 Yeah, and Conda gives you basically a virtual environment, but it delivers the packages pre-compiled in a binary form for your OS rather than via source like pip or something like that, right?

14:53 But it used to be that if you wanted to use a particular library that had C dependencies or something, you had to figure out how to install those C dependencies yourself as well as pip installing it.

15:02 But now it's just, yeah, you go, Conda install, one line, it's all done.

15:05 It's amazing.

15:07 But yeah, so basically in terms of the main libraries that get used, so I mentioned XArray before.

15:12 So basically that library takes Pandas, which is obviously that labeled array concept for two-dimensional arrays and then expands it for multi-dimensional arrays.

15:23 And it was actually a climate scientist, Stephan Hoyer, who initially wrote it.

15:28 And now it's kind of been taken up by the broader PyData community.

15:31 But yeah, so that one is very popular.

15:35 And then the other one is the Met Office in the UK have written one called Iris, which is similar to XArray, except that XArray will let your files not be so kind of CF, NetCDF metadata compliant.

15:50 So if you have files from a project that isn't very strict on their metadata and their NetCDF files, it'll work just fine.

15:56 Whereas Iris really demands those things of you.

16:03 are very good about the metadata and stuff.

16:06 Because it's just like, you can do tasks slightly faster if you like, with less commands because it can make more assumptions of the input data.

16:17 Yeah, so I use Iris.

16:19 If you're going to draw results, you probably want to have some strictness on the data you're basing those results on, right?

16:24 Some people find it overly restrictive in that they want some files that don't have, you know, standard metadata type things.

16:30 So they'll go the XArray route.

16:32 But basically, if you're a climate scientist, you're kind of choosing between Iris or XArray for the bulk of your work.

16:39 So for input-output to NetCDF files, for calculating basic statistical quantities and basic visualization and stuff like that,

16:46 it's all kind of, they're the core libraries you'll kind of leverage off.

16:50 This portion of Talk Python To Me is brought to you by ParkMyCloud.

16:55 Every second your cloud servers are running is costing you money.

16:59 Cut your monthly cloud spend and stop paying for idle instances and VMs with ParkMyCloud,

17:03 a cloud cost management tool that turns off resources when you don't need them.

17:08 From their dashboard, automatically schedule your instances to be turned on or off,

17:12 saving you as much as 65% or more on your cloud spend.

17:16 Manage databases, auto-scaling groups, set up logical groups and servers

17:20 to turn off during nights and weekends when they're not in use.

17:23 Whether you're using AWS, Azure, or Google Cloud, it's easy to save money with ParkMyCloud.

17:28 Try ParkMyCloud and see why it's chosen by McDonald's, Capital One, Unilever, Fox, and more,

17:33 saving customers tens of thousands of dollars every month.

17:37 Visit talkpython.fm/park and cut the cost of your cloud today.

17:42 That's talkpython.fm/park.

17:45 It seems to me like the whole Python data science space in the last five, ten, definitely five years,

17:52 has gotten really polished and really blown up.

17:55 Probably it was a little harder to work with Python and stuff in the beginning, right?

17:58 When I started, yeah, obviously Conda didn't exist.

18:01 Iris and XArray weren't there.

18:03 There was something a little bit like those that existed.

18:07 Even these days, they're taking it a step forward.

18:09 And I'm not sure if you've heard of the GeoViews or HoloViews library for visualization in Python?

18:16 Yeah, I don't think I've heard of that one.

18:17 Tell us about it.

18:17 So basically, I guess it's getting into that kind of declarative visualization space,

18:23 the idea being that for data exploration, you don't want to spend all your time kind of

18:27 tweaking the axes of a plot and deciding on exactly which type of plot you want to create

18:33 to look at this particular data.

18:34 You basically just want to tell the library the characteristics of your data and have it decide

18:40 what the best axes would be, what the best map projection would be, and all those things.

18:44 So there's kind of two major libraries getting into that space.

18:49 One's called HoloViews and the other one's called Altair.

18:52 So HoloViews is much more established, I guess.

18:55 And it can kind of use Matplotlib or Bokeh under the hood, depending on whether you want

18:59 like a static image or an interactive image.

19:01 Right.

19:01 That sounds great.

19:02 And you can maybe almost publish it to the web really easy with Bokeh, yeah?

19:05 Yeah.

19:06 So with HoloViews, you can definitely do that.

19:07 But anyway, so HoloViews doesn't have support for geographic plots of the type on a world map

19:13 of whatever kind of map projection you'd like, which is a common thing in climate science.

19:17 So the Met Office, again, have developed GeoViews on top of HoloViews.

19:22 And so basically, the idea being that you just throw your data at it, you give it a description

19:26 of the basic characteristics of your data, and it figures out the rest.

19:29 And you can have a static image, or you can have an interactive image that you can publish

19:33 to the web.

19:33 And this is really kind of taking down the barrier to kind of what I was talking about before,

19:40 of just that task of visualizing the data you've got quickly and easily.

19:44 Right.

19:44 Yeah.

19:45 That's awesome.

19:45 And the more the system can do it automatically, just look at it and go, well, it looks like

19:48 the axis should be this, and it should all do this.

19:50 Like, that's really great.

19:51 These type of things are just game-changing in terms of the amount of analysis you can get

19:56 done.

19:56 You know, you're not spending your days mucking around with axes and map projections.

20:00 It just happens.

20:01 And what used to take weeks now takes a couple of hours.

20:03 Oh, that's awesome.

20:05 So either now or going in the future, how do you see, like, machine learning and AI starting

20:11 to work its way into, like, analyzing climate data and making predictions and stuff?

20:15 There is some active stuff happening in that space.

20:17 I was at a conference last year at Lawrence Berkeley National Lab in San Francisco, and there

20:23 was a group there looking at applying machine learning to weather and climate type problems.

20:28 So I think it'll definitely happen.

20:30 I think at the moment they're in that kind of space where they're trying to figure out

20:34 which problems are kind of amenable to machine learning type solutions.

20:41 It's almost like it doesn't make sense to apply to everything, but there will be certain

20:46 questions within weather and climate science that could be answered much better by applying

20:52 that.

20:53 So it seemed like they have, if you like, machine learning as a solution, and now they're trying

20:58 to find good questions to answer with it.

21:02 Yeah, there's a bit of that.

21:03 Yeah.

21:03 That was kind of interesting.

21:04 It was kind of, it was an interesting conference.

21:06 Oh, I'm sure.

21:07 I know that machine learning has been taught to do things like look at mammograms and predict

21:13 breast cancer as good or better than professionals, right?

21:17 At least under some circumstances.

21:19 So it feels like, you know, it's really good at looking at pictures and finding subtle, subtle

21:25 trends that people might miss.

21:26 And I'm just wondering, like, it seems like there's probably some good ways to use it to

21:30 understand things that are subtle, but then explode later in climate evolution.

21:36 Well, particularly like you could imagine it could look at a weather map and identify the

21:40 fronts or other weather features.

21:43 And it's just a question of, would it do that better than existing ways that we do it, where

21:48 you might look at a temperature field and you can look at the gradient, the temperature gradients

21:52 and identify it.

21:53 And it's just, the question is, would it actually do it better than we do it now?

21:56 Or yeah.

21:57 So that's kind of the, I think the question at the moment is figuring out what it would be

22:00 most useful for.
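
As a minimal sketch of the gradient-based approach mentioned here (look at a temperature field, compute the temperature gradients, and pick out where they are strong), the following uses only the Python standard library. The function names, the toy field, and the threshold are all illustrative assumptions; operational front detection is considerably more involved.

```python
import math


def gradient_magnitude(field, spacing_km=1.0):
    """Horizontal gradient magnitude at each interior point of a 2D grid.

    `field` is a list of rows of temperatures on a regular grid.
    Central differences are used, so edge points are left as None.
    """
    n_rows, n_cols = len(field), len(field[0])
    result = [[None] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            d_dx = (field[i][j + 1] - field[i][j - 1]) / (2 * spacing_km)
            d_dy = (field[i + 1][j] - field[i - 1][j]) / (2 * spacing_km)
            result[i][j] = math.hypot(d_dx, d_dy)
    return result


def flag_front_candidates(field, threshold, spacing_km=1.0):
    """Return (row, column) indices where the gradient meets the threshold."""
    grad = gradient_magnitude(field, spacing_km)
    return [
        (i, j)
        for i, row in enumerate(grad)
        for j, value in enumerate(row)
        if value is not None and value >= threshold
    ]


if __name__ == "__main__":
    # Toy field (hypothetical data): warm air to the west, cold air to the east.
    toy_field = [[20.0, 20.0, 20.0, 10.0, 10.0] for _ in range(5)]
    print(flag_front_candidates(toy_field, threshold=4.0))
```

The open question raised in the conversation is whether a learned model would pick out such features better than a fixed threshold like this one.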

22:01 Sure.

22:01 I guess one of the real challenges is if you're trying, especially for prediction,

22:05 if you're trying to predict the future, you know, in the longterm, not, not like what's

22:09 it going to be like tomorrow in this town, but like, what's it going to be like in 20 years?

22:12 Machine learning is really good when you feed it lots of examples.

22:15 It was like this.

22:16 And the outcome was that it was like, is that, you know, like that, the breast cancer one

22:20 was given like a hundred thousand mammography scans.

22:23 And the answer, then it was asking more, right?

22:25 We only live from the past towards the future once, right?

22:29 Like we can't, we don't have a bunch of examples to feed it, right?

22:32 Yeah, you're right.

22:33 Which makes it difficult in a climate sense because there's only one realization of our

22:36 past climate.

22:37 Whereas the applications might be more in forecasting and stuff where a cold front has come across

22:43 Portland, Oregon hundreds and hundreds of times in the past.

22:46 So yeah.

22:48 So maybe there's more of a weather forecasting application.

22:51 Right.

22:51 That's interesting.

22:52 Not as cool as if we could answer major, major questions with machine learning, but who

22:56 knows?

22:56 Maybe someday someone will figure out a way to make it work, but speaking of figuring

23:00 out how to make it work.

23:01 One of the challenges I suspect is all of this programming and these data tools and programming

23:07 tools are great, but you said your experience was you weren't given tons of programming support

23:11 as a scientist, and that's probably true for many of them.

23:14 So what do we need to be doing to help like instill those right real programming skills?

23:20 I don't think the situation, I mean, I only graduated.

23:22 I graduated from my undergraduate degree back in 2008.

23:25 I don't think the situation has changed much since.

23:28 And so I think obviously Software Carpentry is a big one, a big organization that's kind

23:33 of helping with this.

23:34 And I think you've had Greg Wilson on a, on a previous episode talking a little bit about

23:38 Software Carpentry and everything they do.

23:40 Yeah.

23:40 I actually had Jonah Duckles on the podcast and that was back on episode 93.

23:45 And we talked about Software Carpentry, which is a group that basically works with scientists

23:49 around the world to help them become better programmers and better data scientists.

23:53 It's really cool.

23:54 So there's some aspect that with, with climate science stuff.

23:58 Is there a particular course for climate scientists or is it just the general stuff that Software

24:03 Carpentry teaches?

24:03 I'm actually working on some, I guess, climate specific stuff.

24:07 Basically the Australian Meteorological and Oceanographic Society, which is like the professional society

24:13 in Australia for weather, ocean, and climate scientists, have hosted a Software Carpentry workshop alongside

24:18 their annual conference for the last four years now.

24:21 And so I'm, I'm basically in the process of writing those material, those teaching materials

24:25 we've been using for that up into a, I guess, a more climate specific course that'll be hosted

24:31 with Data Carpentry, which is actually a sibling organization of Software Carpentry that

24:36 actually has discipline specific materials.

24:39 Yeah.

24:40 So if you're, if you're lucky enough to, I guess, to be a young scientist coming through who, who

24:44 has someone at your institution who's really into Software Carpentry or a professional society

24:49 like AMOS who, who offer these types of things, then I guess your situation is, is a bit better

24:54 than, than mine was when I came through.

24:56 But, there's still, yeah, a long way to go.

24:58 And a lot of people who kind of slipped through the cracks and just kind of get lumped on a

25:02 research project with really not much assistance at all in terms of learning how to program,

25:06 which is, which is really sad on a lot of fronts, sad personally for them, but also in

25:10 terms of limiting the progress of their research, it's terrible.

25:13 Right.

25:14 I also suspect that there's a, a sharing component that's limited as well, right?

25:18 Like if, if you can write, go to MATLAB and write a script that will analyze something

25:24 and come up with an answer.

25:25 That's one thing.

25:25 If you could form it in a way and say in Python, in a way that is reusable and general and tested,

25:32 then you could put it out on GitHub and all sorts of people could use it and add to it.

25:36 And I think there's a pure scientific research sort of sharing the knowledge upside there as

25:43 well.

25:43 When you think about where, I guess the state of climate science is at the moment,

25:47 and I guess for lack of a better word that the computational literacy of the community,

25:51 you're basically at the point where you're just introducing people to, to say GitHub and

25:56 to having their own personal code under version control and things like that, to, to go the

26:00 next step and have, and have them writing code that the wider community can use.

26:05 And it's, you know, it's up on GitHub and people are submitting pull requests and, you know,

26:09 it's tested and there's continuous integration.

26:11 And all those things is just, it seems like there are exceptional, I guess, individuals

26:16 if you like, within the discipline who, who do those things, but they're very rare.

26:20 And so I think the discipline is another five or 10 years off kind of just having the, the

26:26 computational literacy, if you like, to do those, those things that would, as you say,

26:30 make, make life so much easier for everyone.

26:32 Yeah, it would really be great, but that, that's definitely, those are some advanced ways of

26:38 working and if, especially if, if people don't get started early in that, right.

26:43 They, they don't build up those skills slowly over time.

26:45 You know, they're busy solving real problems with science and computation and probably with

26:49 something like MATLAB or something.

26:51 Yeah, I can definitely see how that's, that's really, really quite challenging, but it's, I

26:56 think it's important for a lot of things.

26:58 So hopefully the Software and Data Carpentry folks can keep that going.

27:01 That's great.

27:02 They're doing great things.

27:03 They're an amazing organization.

27:04 What else would you recommend to people out there who are scientists to level up,

27:08 their skills or to keep improving or whatever?

27:10 Really just encouraging people to, to kind of participate in the wider Python community.

27:16 Go to a PyCon conference or a SciPy conference or something like that, which, which seems obvious

27:20 to people who are developers in Python and stuff, but academics usually don't really think about,

27:26 you know, attending conferences outside of their research discipline.

27:30 Right.

27:30 They might go to a climate science conference and not PyCon, for example, right?

27:34 The PyCons that I've been to, there are a lot of kind of support staff there.

27:38 So the support staff at the institution that I work at or at the Bureau of Meteorology or

27:42 wherever it might be, they're all there, but the actual scientists who are doing a lot of

27:46 data science with Python aren't and it probably wouldn't occur to them.

27:49 I must say my, my first kind of PyCon really blew my mind in terms of compared to an academic

27:54 lecture, academic conference with like, you know, the recorded lectures that go up straight

27:59 away.

27:59 Even simple things like you could use your own laptop rather than having to give them a USB

28:03 and put it into their Microsoft Windows machine.

28:06 And people like doing things like live, like live coding on the screen and stuff, as opposed

28:10 to just a static PowerPoint presentation.

28:12 And it was kind of, it was all very mind blowing compared to an academic conference.

28:16 So I think, yeah, I'd really encourage people to kind of try and get actively involved in the

28:21 community in some way and really kind of broaden their horizons and keep their skills improving.

28:25 Yeah, I would totally second that.

28:27 I think especially the PyCon conferences, you know, pick your continent, maybe there's a lot of

28:34 science conversation and data science conversations going on there.

28:38 Like at PyCon US this year, the keynote speaker was Jake VanderPlas on the first day who opened it,

28:45 basically surveying all the different ways people use Python, how it's used in astrophysics,

28:49 how it's used in space telescopes and all sorts of things.

28:53 And it was very, I think it would actually be a really welcoming environment to people

28:57 who have, say, some programming skill and some programming ambition, but mostly are doing data analysis type stuff.

29:03 I think they should definitely check it out.

29:04 Yeah, for sure.

29:05 Even like in Australia, there isn't quite a critical mass to have a standalone SciPy conference.

29:10 But at PyCon Australia, there's always a data science track.

29:14 So you don't have to kind of sit there and, you know, listen to talks on web development and Django.

29:20 Yeah, exactly.

29:21 And stuff like that.

29:22 I don't care anymore about what Instagram did with Python.

29:25 I'm over it now, even if it was cool.

29:27 No, you can definitely have an almost a pure data science experience when you go to one.

29:31 Sure.

29:31 Oh, that's awesome.

29:32 Yeah, absolutely.

29:33 The hallway track of those conferences is great as well.

29:36 You know, just the people you meet doing similar stuff.

29:38 A lot of the times you don't have much in common with researchers once you get outside of your specific niche.

29:44 But you often have a lot in common with, in terms of the Python libraries you use and the types of basic data analysis you do.

29:51 So you end up, you find, oh, I have a lot in common with all these people who are from very different fields.

29:55 And that doesn't usually happen if you're just talking about your specific research discipline.

30:00 Yeah, I was really blown away at how many different people kind of solve similar problems with similar tools.

30:06 But, you know, the outcome is really different because of the questions they ask.

30:08 Cool.

30:09 So I definitely want to, yeah, encourage people to go to the local Python conferences.

30:13 They're really good.

30:14 Some of that Software Carpentry stuff we talked about sounds like it would be really beneficial for one of the major problems in science, which is the whole reproducibility thing.

30:24 Right.

30:25 The better you can get your code on GitHub, maybe create a Docker image that people can download and run exactly.

30:30 Like the more that you could share, distribute, and sort of save your computation.

30:35 It seems really valuable.

30:36 The reproducibility crisis, I guess, is a big one.

30:39 And it's, so I guess the central tenet is obviously if someone publishes some research that they describe their methods in such a way that someone else working in that field would be able to reproduce their results if they wanted to.

30:52 It turns out that most papers published these days aren't reproducible.

30:55 And it's for a variety of reasons.

30:57 Some of them are to do with experimental design, the availability of the data sets that were analyzed and things like that.

31:03 But a big one is computational reproducibility in that most papers don't make the code available that they wrote to do the analysis or the details of, you know, the software environment that that code was executed in.

31:16 So, yeah, the Software Carpentry skills and just the basic things about, you know, using version control and all those types of things are huge if we're ever going to actually get past the reproducibility crisis and get to a point where our research is truly reproducible.
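
One small step toward the computational reproducibility described here is recording the software environment the analysis code ran in. The sketch below uses only the standard library; the function name and the package list are illustrative assumptions, not a method described in the episode.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names here are only examples of what an analysis might use.
    record = describe_environment(["numpy", "xarray"])
    print(json.dumps(record, indent=2))
```

Saving a record like this alongside the published code gives a reader at least the version details needed to attempt a rerun.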

31:30 Yeah, I definitely think that's something that's important.

31:33 I mean, the more that our research depends upon code and data, the more important it is that I think that that's accessible.

31:42 This portion of Talk Python To Me has been brought to you by Rollbar.

31:45 One of the frustrating things about being a developer is dealing with errors.

31:49 Relying on users to report errors, digging through log files, trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day.

31:58 With Rollbar's full-stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster.

32:05 Adding Rollbar to your Python app is as easy as pip install rollbar.

32:09 You can start tracking production errors and deployments in eight minutes or less.

32:13 Are you considering self-hosting tools for security or compliance reasons?

32:17 Then you should really check out Rollbar's compliant SaaS option.

32:21 Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more.

32:30 They'd love to give you a demo.

32:31 Give Rollbar a try today.

32:33 Go to talkpython.fm/Rollbar and check them out.

32:37 I know the guys at the Large Hadron Collider are doing some really interesting stuff with taking their code and putting it in an escrow thing.

32:45 It's not GitHub, but it's something kind of like GitHub where it's like we promise we won't change or delete this code because it's linked to by this paper.

32:53 Things like that.

32:54 It's been a while.

32:55 I forgot what the name was.

32:56 It was like two years ago when we talked about this.

32:58 But it sounds really important.

32:59 Some papers that I published recently, I mean, I give the link to GitHub of where the code is if people want to see, I guess, the latest version, if you like.

33:06 But then also, I guess, at the time of publication, you kind of take a snapshot, if you like, of your code repository at that point.

33:14 And there's websites.

33:15 There's one called FigShare and another one called Zenodo where you can put – they kind of call it the long tail of your research.

33:21 So things like code, supplementary figures, supplementary tables and stuff.

33:26 You put it all out there, it gets a DOI, a digital object identifier, and those –

33:31 Yeah, that's right.

33:31 And those websites guarantee that they're not going to disappear and it'll be around for, you know, all eternity for people to be able to get.

33:38 So, yeah.

33:39 So, in general, the best practice these days is definitely give people the link to GitHub so they can see the latest version, but also have a version up on persistent storage place like FigShare or Zenodo.

33:50 So if you ever change the name of your GitHub repo or something like that, it doesn't all just disappear.

33:55 Right.

33:55 Exactly.

33:56 I mean, it's great that it is – GitHub is there and you do get the latest version, but you could delete that if you just were in a bad mood or whatever, right?

34:03 Or your account gets suspended or hacked or whatever, right?

34:06 You want to – you definitely want to be careful.

34:07 So that sounds really cool.

34:09 And, yeah, that's what the Large Hadron Collider guys were talking about as well, the digital object identifiers for the papers.

34:14 That sounds great.

34:15 What about, like, Docker containers and these other things where you sort of ship, like, whole systems?

34:20 Do you guys – do you see much of that being used?

34:22 I hear people talking about it as a possible solution.

34:25 I think because I'm kind of more tuned in than most climate scientists to the kind of reproducibility scene.

34:32 But I think in general, it would probably be asking too much, I guess, of a regular climate scientist to be so up on these things to be able to do the whole Docker thing themselves.

34:42 So I think it's a possible solution in future, but it's kind of out of the reach of a regular climate scientist right now, if you know what I mean.

34:50 Yeah, yeah, I definitely know what you mean.

34:51 I think it's out of reach of a lot of developers as well.

34:53 Like, not that many people are actually doing, you know, complicated Docker things or containers in practice.

34:59 But still, it seems like it would be a great solution because you can – you capture the whole platform and its dependencies, not just the code plus the data.

35:08 I mean, I guess a lighter scale – very lighter scale version of that is Conda environments that – because you can post your environment on anaconda.org.

35:16 And then someone can just kind of go conda env and then the URL to where you posted it on your profile on anaconda.org.

35:24 And then it'll install your software environment.
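
A related, file-based way to share a Conda environment is an `environment.yml` file that others can install with `conda env create -f environment.yml`. This is a minimal sketch of writing such a file from Python, not the anaconda.org workflow described above; the function name, environment name, and pinned versions are hypothetical placeholders.

```python
def render_environment_yml(name, channels, pinned):
    """Build the text of a minimal Conda environment.yml file.

    `pinned` maps package names to exact versions (conda's `pkg=version`).
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(pinned.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder names and versions, for illustration only.
    text = render_environment_yml(
        name="climate-paper",
        channels=["conda-forge"],
        pinned={"xarray": "0.9.6", "numpy": "1.13.1"},
    )
    print(text)
```

Pinning exact versions trades flexibility for a better chance that someone else ends up with the same software environment.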

35:26 Oh, that is really cool.

35:27 I didn't know that that was a feature they had.

35:29 That's awesome.

35:30 Yeah, it's really cool.

35:31 And they actually – they go a step further now.

35:33 They've got a – I guess a bit of a beta thing called Conda Kapsel.

35:37 Well, Kapsel.

35:38 I think it's K-A-P-S-E-L.

35:39 Yeah.

35:40 In that one, instead of posting an environment to anaconda.org that someone can install, you basically write the specifications of the libraries you use in essentially your readme file.

35:51 And then Conda Kapsel just takes your readme file and then installs it all.

35:55 Yeah, that sounds really cool.

35:57 All the stuff they're doing around Conda and Conda environments is really exciting in terms of things that are doable for a regular scientist right now, because it takes one line of code.

36:07 Yeah, which is totally possible.

36:09 It sounds like the Conda guys are doing tons of good stuff.

36:12 I know with the Anaconda distribution and all that, but it sounds like even – maybe even more than I realize.

36:17 That's awesome.

36:18 We talked a little bit about reproducibility and the various things people should be doing.

36:22 What are some of the steps that you think are within reach to get people to sort of become better at participating in the wider community, do more stuff on GitHub, more libraries, things like that?

36:35 One of the major issues I think up until now has been – up until recently has been there isn't really an incentive for people to take the code that they've hacked together for themselves for the research problem that they were doing today for this particular paper and put in the time and effort to make it general enough that the wider community could use it and they could pip install it or Conda install it and those types of things.

36:57 There's a few journals out there now.

36:59 There's the Journal of Open Research Software, which I'm actually an associate editor on, or there's one called the Journal of Open Source Software as well.

37:05 Basically, the idea is because citations of academic papers is currency in academia.

37:11 That's how you get promoted.

37:12 That's what your career depends on.

37:14 That's the kind of incentive.

37:15 If people can write a paper documenting this research, scientific research software that they've released, and then if people start citing that every time they use it in their papers, then that's the kind of incentive that academics kind of need to go that extra step and actually take their personal code and make it community code.

37:34 That's starting to happen, and certainly we're getting a lot of submissions at the Journal of Open Research Software and stuff.

37:39 Now, I guess it's a matter of it becoming part of the culture for people to actually look for those papers and cite them in the methods section of their papers just to absolutely make sure that the authors of that software are getting the academic credit they deserve for it.

37:54 That's a really good point.

37:55 You're right that it definitely is the currency of academia.

37:59 I guess sort of making that a habit, right?

38:01 It's one thing to go to GitHub and grab a package and just go and do some analysis with it, but I guess it almost would be great to have in the package, if you're going to cite this or use this in an academic paper, please, this goes in the bibliography or whatever, right?

38:17 Just getting people to start the methods section of their paper with just one paragraph talking about the software that they use and citing the actual publications that relate to that software.

38:27 That would make a huge difference.

38:28 And so it's kind of – it's getting that to be a cultural thing where our method sections always have a bit about the statistical methods that we use.

38:36 Why don't they have a paragraph on the code that we used and where we got it from?
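
One way a package could make this easy is to ship a ready-made citation entry for the paper that documents it. The helper below is a hypothetical sketch that formats a BibTeX entry; the function name and every field value in the usage example are placeholders, not a real reference.

```python
def bibtex_entry(key, authors, title, journal, year, doi):
    """Format a BibTeX @article entry for a paper describing a software package."""
    fields = {
        "author": " and ".join(authors),
        "title": title,
        "journal": journal,
        "year": str(year),
        "doi": doi,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # All values below are placeholders, not a real publication.
    print(
        bibtex_entry(
            key="example2017",
            authors=["A. Author", "B. Author"],
            title="An Example Package",
            journal="An Example Journal",
            year=2017,
            doi="10.0000/placeholder",
        )
    )
```

A package could expose the resulting text in its readme or documentation so that users can paste it straight into their bibliography.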

38:41 I think that's a great idea, and it's so low-hanging fruit, right?

38:45 It would be really easy to do that.

38:46 Yeah.

38:46 So hopefully as these journals get bigger and people start pushing people to cite them, that'll happen.

38:52 And in 10 years' time, it'll just be every method section has a paragraph on code.

38:56 I definitely see that as a possible future for us.

38:59 So speaking of possible futures, let's have a conversation a little bit about climate change.

39:05 You know, you've studied it more than almost – I'm sure you've studied it more than anyone else I've spoken to, more than most people, let's say.

39:13 So what do you think?

39:14 Climate change?

39:15 Is this a real thing?

39:16 Do people cause it?

39:17 Absolutely.

39:18 I guess the frustrating thing from a climate scientist's point of view is that that question, is climate change real and are humans causing it, hasn't been an active research question for more than 30 years.

39:29 It's been accepted in the scientific community for at least that long.

39:33 And so, you know – Well, hold on.

39:35 If I turn on the news here in the U.S., there's always some other – there's like half of the people on the TV channel saying, oh, I have 400, 4,000 scientists who say this is not real.

39:46 Just give us a sense of like why it's so accepted and whatnot.

39:51 You know, maybe put a little pushback on that vision that gets projected by news.

39:56 Yeah, those ones are very – even their statistics where they go, you know, 97% of climate scientists agree.

40:01 It's actually 100%.

40:03 Like I've been going to climate science conferences for a decade now and I've never, ever, ever sat in a presentation or read a paper that suggests that climate change isn't happening.

40:12 It's a given and it's been a given for a very long time.

40:15 So the disconnect between kind of what's being discussed in the science community and what's being discussed in public is very frustrating.

40:23 I'm sure it is, yeah.

40:24 And it's not unique.

40:25 Obviously, like if I went to a health conference, you know, the difference between, you know, public discussions and policy on health would be very different from what experts in health think should be happening.

40:37 And it's not a unique thing, if you like, but it is particularly frustrating just because it – yeah, that's a question that we moved on from.

40:44 30 plus years ago.

40:45 Yeah, I think one of the things that like makes people feel that this is more up for debate than it is at least in the U.S.

40:53 I haven't – I don't know outside the U.S.

40:55 But there's this tendency of if you're going to present something, you're going to present both sides of it.

41:00 So if you're going to talk about climate change, you have somebody for and somebody against.

41:04 And that makes it feel like it's 50-50.

41:05 And it's not 50-50.

41:07 It's like 1,000 to 1 or something.

41:10 It doesn't seem like it really needs like this other side to say, well, here's the other side of the argument.

41:15 This guy, he only works for the – you know, this coal company.

41:18 But he's really studied it.

41:19 It'll be fine.

41:20 You should listen to him.

41:21 No, that is – it gets you tearing your hair out when you – the people that they put up on television.

41:26 Yeah, for the against case, like it's in a debate on physical reality.

41:30 Yeah.

41:31 Right.

41:31 So I'm with you.

41:32 I think this is an important – this might be the fundamental challenge of our generation, our generations, you know, probably multi-generational actually.

41:41 But, you know, what are some of the things you think we can do as just citizens of the world?

41:47 And what do you think we can do as people with the magical wand of software development where we can actually make things that analyze change and so on?

41:56 I definitely think getting, I guess, politically active is important.

41:59 So, you know, writing to congresspeople, attending protests, getting involved in, you know, community organizations, whether they're against a pipeline that's being built or they're encouraging people to divest their money from fossil fuel companies or whatever it may be.

42:12 I think it's got to just be that groundswell of kind of grassroots activism that gets things changed because, I mean, there are a lot of vested interests in keeping things the way they are.

42:24 I mean, the biggest companies in the world are fossil fuel companies like, you know, ExxonMobil and companies like that.

42:29 So, it's a formidable opponent and it really needs a massive grassroots effort.

42:34 And I guess the more I've got to know about the issue, the more I've kind of got involved in those types of things.

42:41 And I certainly, when I first started as a climate scientist, I wasn't politically aware or active at all.

42:49 But all the, I guess, organizations that I've been involved with as I've become more active are, you know, really crying out for help with IT stuff, whether it's their website, whether they want to analyze some voting data or whatever the case may be.

43:03 Like, if you rocked up at a grassroots organization doing, you know, climate activism and said, hey, I've got a bunch of IT skills, they'll fall over themselves with gratitude and with things that you can help with.

43:15 And you'll be up to your eyeballs in more things than you've got time to do.

43:19 But yeah, no, definitely, I think developers and people with IT skills have a particular role to play just because they're such important skills.

43:28 And these organizations are just crying out for help with that kind of stuff.

43:31 So yeah, if you want to get involved, there's absolutely ways to put your skills to good use.

43:36 Yeah, that sounds like a really good thing to do.

43:38 Certainly, if you gave a couple of hours a week or something to one of these organizations, and if they really are totally missing the software side of the story, right, they don't have a lot of people to help out there, you could probably make a pretty big difference there.

43:52 Oh, yeah, absolutely.

43:52 So how much of this do you think is like a political fight versus an economic fight?

43:57 I mean, like every time you take an action or you buy something or you don't buy something, you're kind of voting with your dollars or your pounds or whatever, right?

44:06 And do you think it's more important to act as consumers or to push on the political side?

44:13 Or where do you think the leverage points are?

44:15 I mean, I think it's both.

44:16 I mean, I think I used to think that it was just definitely talk to politicians and get them to change their minds.

44:21 But if you read, you know, the writing of major climate activists like Bill McKibben and stuff like that, he has a whole book about the fact that he started all his campaigning in Washington thinking he had to be there to, you know, tell the politicians.

44:35 And then in the end, he comes to the realization that it's corporations that rule the world.

44:38 And that's why 350.org, his organization, focused so much on divestment and on actually thinking about where you're spending your money.

44:47 Because at the end of the day, maybe where you spend your dollars is more important than what you write down at the ballot box in terms of the impact it has.

44:55 Right. And sure. And once the voting is kind of done, like things are set for a while.

44:59 Right. And so but you buy stuff every day, you consume things or don't every day.

45:04 That's pretty interesting.

45:05 Here in the U.S., I don't think there's a lot of positive policy that's going to be passed in the next two years on climate stuff.

45:13 But I feel like we still get many, many choices on what we buy, what we don't buy, where we get our energy from, things like that.

45:21 So there's still lots of things that people can do, but I totally agree with you on donating some time to activist groups.

45:27 So maybe what do you think is the most exciting or encouraging development in the last couple of years around sort of positive change to fight climate change?

45:39 And then maybe like what do you think is like a setback that we've had that is unfortunately.

45:44 The most exciting would definitely be just the growth in renewable energy.

45:48 So I'm sure it's similar in the U.S., but here in Australia in particular, the growth of people having solar panels on their own roofs and stuff has been huge.

45:58 And it's kind of in spite of government efforts to kind of slow it down, if you like, because obviously, you know, energy companies have a lot of lobbyists and things like that.

46:08 But in spite of that, renewables are just going from strength to strength.

46:12 So that's definitely the most exciting.

46:14 I would definitely agree with you that like, you know, a little bit with the politics versus dollars, like renewable energy is becoming the financially smart choice.

46:22 And once that happens, like it's just forget the politics, right?

46:26 It's going to solve itself at that point.

46:28 But, you know, we can slow or hasten it for sure through politics.

46:33 For your question of what was the most discouraging thing, it's probably the ability of vested interests to slow things down.

46:39 It almost feels like, sometimes, with climate change we will eventually get there and we will eventually reduce our emissions significantly.

46:49 But a slow victory is kind of a loss because of all the heat that will have accumulated in the climate system in the time it took to get there.

46:58 We have to win and we have to win fast.

47:01 And the fact that we're potentially winning, but winning incredibly slowly is a big problem.

47:07 It's both encouraging and really frustrating at the same time.

47:10 Yeah, so winning slowly is not really an option, but the ability of vested interests to slow things down makes it feel like it could be a very slow victory, which would be really not a victory at all in the end.

47:23 Right.

47:24 All right, well, maybe we'll leave it there with that for the climate science stuff.

47:30 And let me just ask you the final questions that I ask everyone who's on the show.

47:34 So if you're going to write some Python code, what editor do you open up?

47:39 I'm a simpleton.

47:40 It'll be like TextWrangler or gedit or a simple graphical text editor like that.

47:46 I used to be kind of self-conscious about that.

47:49 And then I was teaching at a Software Carpentry workshop.

47:51 And one of the helpers was a core Python developer.

47:53 And he told me that he uses simple graphical editors like that too.

47:56 Ever since he told me that, I've felt really good about myself.

47:59 Hey, if the core developers can do it, you could definitely do it.

48:03 Yeah, exactly.

48:03 All right.

48:04 And a notable PyPI package.

48:06 I mean, you named a few that are involved in climate science.

48:10 Yeah, I thought I might actually give a shout out to a little one, unknown one.

48:14 I use GitPython.

48:15 I'm not sure if you've ever used that.

48:17 It's basically just a hook to Git.

48:19 But basically...

48:20 G-I-T Python?

48:21 Yeah.

48:21 Yep.

48:22 All one word, G-I-T Python.

48:23 And so basically I use it because a lot of...

48:26 With these NetCDF files that we use that have the metadata in them, some of the tools that

48:31 have been built kind of in the global history attribute of these files, it keeps a record

48:36 of what was entered at the command line to produce this file.

48:40 And so I can do that.

48:42 Basically, I can have a script that at the end puts in the history attribute.

48:47 At the command line, it was Python, the name of the script, and then whatever input arguments

48:51 it was.

48:52 So I have a complete record of what was entered at the command line to produce this file.

48:56 And then with GitPython, I can also have...

48:59 You know, every commit in Git has a unique 40-character string associated with it.

49:03 I can basically put the first seven or so digits of that so I know which version of the code

49:08 was executed.

49:09 So yeah, I actually use GitPython.

49:11 Yeah, that's awesome.

49:12 Yeah, GitPython helps a lot just with reproducibility down to which version of that script did I use.

49:17 Yeah, and the arguments as well, which is really important.
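
The provenance record described here can be sketched in a few lines of Python. This is a minimal illustration, not the guest's actual code: the function names (`get_short_hash`, `new_history_entry`, `prepend_history`) and the exact layout of the history line are assumptions made for the example. The only GitPython calls used are `git.Repo(...)` and `repo.head.commit.hexsha`, which returns the 40-character commit hash.

```python
import datetime
import sys


def get_short_hash(repo_dir=".", length=7):
    """Return the first `length` characters of the current commit's hash."""
    # GitPython is imported here so the formatting helpers below
    # still work on a machine where it is not installed.
    import git

    repo = git.Repo(repo_dir, search_parent_directories=True)
    return repo.head.commit.hexsha[:length]


def new_history_entry(git_hash, argv=None, when=None):
    """Build one history line: timestamp, command line, code version.

    The layout of the line is an assumption for this sketch.
    """
    argv = sys.argv if argv is None else argv
    when = when or datetime.datetime.now()
    timestamp = when.strftime("%a %b %d %H:%M:%S %Y")
    command = " ".join(["python"] + list(argv))
    return f"{timestamp}: {command} (Git hash: {git_hash})"


def prepend_history(entry, old_history=""):
    """Put the newest entry first, keeping any earlier history below it."""
    return f"{entry}\n{old_history}" if old_history else entry
```

At the end of an analysis script, something like `prepend_history(new_history_entry(get_short_hash()), old_history)` would give the text to store in the output file's global history attribute, using whichever NetCDF library the script already relies on. Keeping the older entries below the new one means the file carries the full chain of commands that produced it.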

49:20 Awesome.

49:20 All right.

49:21 So we've heard a lot about the tools, a lot about how we can contribute, maybe some of the

49:26 problems.

49:26 What's the final call to action?

49:29 People are interested, how do they get further involved or do something along those lines?

49:33 I mean, I can give a shameless plug for myself.

49:36 So I have a blog, drclimate.wordpress.com, where I talk a lot about Python.

49:40 I talk about research best practice in general, but a lot of the time that means I'm focusing

49:45 on Python, basically, in climate science.

49:48 So if people want to subscribe to that, that's a good way to kind of keep up to date with things.

49:52 And they can follow you on Twitter, right?

49:53 Yeah, @drclimate as well.

49:55 Yeah, @drclimate, which I'll put in the show notes, of course.

49:58 Just to kind of, it'd be good to have, I guess, more people in the climate science Python discussion,

50:04 if you like.

50:05 I feel like through some of my involvement in Software Carpentry with other disciplines, particularly in the biosciences, bioinformatics and stuff like that, the community around R and those languages is huge.

50:17 And they seem to be a lot further down the track in terms of dealing with the reproducibility crisis and releasing packages that other people use.

50:25 I feel like climate science is a smaller community, but we need that strong sense of community around Python to really help with some of those challenges.

50:33 Right.

50:34 Maybe some people who could actually convert something into a package that could be reused and help getting it on PyPI or on GitHub.

50:42 Maybe those type of contributions could be helpful as well.

50:44 Oh, yeah.

50:45 Absolutely.

50:45 Yeah.

50:46 There will be a lot of people that have very interesting code from a research perspective that would need a bit of assistance in actually releasing it.

50:54 Sure.

50:54 All right.

50:55 Well, that sounds great.

50:56 Thank you so much for all your thoughts and sharing what you guys are up to in the climate space.

51:00 Oh, no worries.

51:01 Thanks.

51:01 Thanks a lot for inviting me on the show.

51:03 You bet.

51:03 Talk to you later, Damien.

51:04 This has been another episode of Talk Python To Me.

51:07 Today's guest has been Dr. Damien Irving, and this episode has been brought to you by ParkMyCloud and Rollbar.

51:13 Do you hear that sucking noise?

51:16 That's your cloud provider making you pay for your idle instances.

51:19 Turn on ParkMyCloud, plug the leaks, and save money.

51:22 Visit talkpython.fm/park to get started.

51:25 Rollbar takes the pain out of errors.

51:28 They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course.

51:35 As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome.

51:43 Are you or a colleague trying to learn Python?

51:46 Have you tried books and videos that just left you bored by covering topics point by point?

51:50 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course to experience a more engaging way to learn Python.

51:59 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

52:07 Be sure to subscribe to the show.

52:09 Open your favorite podcatcher and search for Python.

52:11 We should be right at the top.

52:12 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

52:21 This is your host, Michael Kennedy.

52:23 Thanks so much for listening.

52:25 I really appreciate it.

52:26 Now get out there and write some Python code.
