#93: Spreading Python through the sciences with Software Carpentry Transcript
00:00 You often hear that we need to teach computer science as a foundational skill. Why? Well, I'm not actually sure many of the leaders pushing this forward have great answers other than jobs. But it is fundamentally important that we do teach programming as a core skill. The reason, I believe, is that whatever your specialty, be that biology, psychology, geo surveys, whatever, programming will supercharge that skill. And that's why I'm excited to introduce you to Software Carpentry and Jonah Duckles. They are bringing these skills and more to scientists and educators throughout the globe. This is Talk Python To Me, recorded December 6, 2016.
00:41 I'm a developer in many senses of the word, because I make these applications, but I also use these verbs to make this music. I construct it just like when I'm coding another software design. In both cases, it's about design patterns. Anyone can get the job done; it's the execution that matters. I have many interests.
01:00 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode has been sponsored by Rollbar and MongoDB. I want to say a special thank you to MongoDB University for joining Talk Python To Me as a sponsor. Thank both of them for supporting the show by checking out what they have to offer during their segments. Jonah, welcome to Talk Python.
01:34 Thanks for having me.
01:35 It's great to have you here. I'm super excited to talk about Software Carpentry and bringing programming skills to more than just programmers: researchers, scientists, and so on. It's gonna be fun.
01:47 Yeah, I'm excited. Michael, thank you for reaching out. I'm excited to share what Software Carpentry is up to and how we can build new communities that support researchers all over the world.
01:59 Yeah, it's super cool what you're talking about and what you're doing. Before we get into that, though, let's talk about you. What's your story? How did you get into programming and Python?
02:07 I've been using Linux and Unix since, I don't know, probably my freshman year of high school. I've always kind of dabbled in open source software and tools, and I did a master's in landscape ecology, really learning GIS skills and learning how to apply GIS skills to remote sensing problems. I started using ArcGIS in grad school, and they had this sort of Python ArcGIS scripting object. It was a really powerful tool for me to use, but I quickly hit up against the limits of how that scripting object worked. So I really just dived into Python and the open source geospatial world around Python, and really developed my skills in graduate school around applying Python to geospatial analysis problems. So that was really the motivating factor.
03:02 Yeah, that's really interesting. Geospatial stuff is fascinating. There's so much data these days, right?
03:07 Yeah. And I went on from graduate school, and I was actually doing agricultural intelligence work, taking freely available imagery coming down from satellites like Landsat and MODIS, and writing Python scripts and R scripts and bash scripts, whatever we could cobble together, to do analysis of tabular data and what we saw from remote sensing data. Really, my chops as an applied programmer were developed in that job post graduate school, where we were on deadlines, trying to meet reporting deadlines and things. That's really where I became more familiar with the language, trying to understand the nuances of Python, and the power and the community around Python and the tools that were developed there.
03:54 Oh, yeah, that's really cool. Taking the satellite data, I think we're gonna see a lot of that, especially with the Internet of Things stuff, right? The ability to take geospatial data, maybe some small bits of monitoring here and there, or even control systems. It's gonna be fun to see how that goes in all sorts of various areas.
04:12 Yeah, that's really a huge area that my organization, Software Carpentry, can help people with: to begin to understand how to eat the elephant and build software tools that can take huge streams of data coming from disparate sources, and to start thinking about how you might build software, how you might take just enough skills as a software engineer and apply them to your research problem. So you can start to build the tools that you need to answer the research questions that you have. And we're seeing this in so many different research communities all over the world.
04:49 Yeah. So that gets us a little bit to Software Carpentry and what Software Carpentry is, right? So why don't we talk a little bit about what Software Carpentry is, this project that you guys have going at software-carpentry.org.
05:02 Yeah, Software Carpentry was really founded way back in 1997 by Greg Wilson, when he was a newly minted PhD in computer science. He showed up at Los Alamos National Lab as a postdoctoral researcher, and researchers were lined up out the door going, there's a computer scientist here that can help us with our research problems. He was sort of inundated and overwhelmed, and realized that these researchers had a lot of motivation and a lot of excitement to become better programmers, but really didn't have any resources or tools to help them on their way. And so Software Carpentry was born of that need, of these researchers expressing a need for research skills. The first iteration of it was a five-day workshop to teach you all the things. That went well for a lot of people, but by the end of the third day, people were kind of passing out in their chairs and unable to retain any more information.
06:05 And the brain was full.
06:07 It was really overwhelming for the participants. And so about 2012, via some work from people around the Software Carpentry community, particularly some people in Wisconsin as part of The Hacker Within project, we started saying, how can we reboot this? How can we create a really compelling two-day workshop, where we train instructors on good adult teaching pedagogy? How do we take really high quality lessons, and how do we deliver them with trained instructors in impactful, engaging, and exciting workshops? And so that's really the work of what we do now. We curate these lessons; they're open source, under the CC BY (Creative Commons Attribution) license. We have a community of instructors around the world that are trained; about 900 people have gone through our training so far and become certified instructors. And we're really about capacity building at research organizations, to build teams of instructors who talk about how they teach, try to push the envelope to always improve how they're teaching, develop new lessons, and really push the skills that researchers are demanding in data-driven research. A lot of this is the tools of software engineering, how to build software. But we start at a very basic level, because researchers really need that foundational explanation of how these things fit together.
07:35 That's really great. I feel like, certainly when you're in research, knowing some programming is really, really empowering. I used to work at a place with a bunch of cognitive science PhDs. Not everybody there had some level of programming, but many of the people, the professors and so on, would do really advanced work. Maybe it was in MATLAB or something, and we'd have to take their MATLAB algorithms and convert them to real programming languages. But it made what they did more or less possible. So there's this whole learn-to-code movement, and right now we're recording during the Hour of Code week, which is pretty interesting. All these people in the world are learning to code. And I think the biggest takeaway, or the biggest benefit, that seems to get lost is that the ability to do even a little bit of programming can superpower whatever your specialty is. If you're a psychologist, you can analyze data like nobody's business compared to people, other PhDs or whatever, who can't. Similarly in biology or agriculture, even for farmers, right? So I think this is really great. What kind of stuff do you guys cover? We'll go into more detail later, but what are the general areas that you teach? What do people learn when they come work with you guys?
08:56 So we talk about having a flagship workshop, and these are our base lessons. The target audience originally for this was scientists who were coding, but were coding poorly. But as it turns out, there are two communities that come to our classes. It's about a 60/40 split. About 60% of people coming to our workshops say they have not coded, but they know that they have a research problem where code could help them be more productive and more efficient at what they do. And about 40% come and say, I've been coding as a practicing scientist for a long time, but nobody stopped and taught me how I should be doing it, and that's why I'm here. You talk about this researcher who's got this really complex MATLAB code. The reality is that, for a lot of researchers and a lot of scientists, there's no intentional time spent to say, this is how you build software, this is how you build programs. They're just thrown in and told to sink or swim as part of the academic process. And so our goal with our flagship two-day workshop is to give you a rope to help pull yourself out of the water, to help you build a mental model: there's this ecosystem of open source tools out there, and they can fit together in ways that you can build on. It's almost like learning that here's a bucket of Legos, let's put the things together. What we teach in this foundational workshop is the Unix shell and how to repeat tasks in the Unix shell. We'll go into more depth about the specifics of these things, but basically: how to be more efficient at repeating tasks in the Unix shell, and then how to extract ad hoc shell commands into a script that you can save and then call again and again.
So those are the two major goals of that lesson. It takes about a half day. The Python lesson takes about another half day, sometimes another full day. We're not teaching you all the data types in Python. We're basically trying to teach you that Python is a nice high-level language, that it's a good glue for a lot of tools that are out there, and that it's a great way to start thinking about how to build software that tackles your research challenge. A lot of listeners to this podcast might be surprised by this, but a lot of researchers are never told that there are these things called functions that you can wrap code with, where you can make a thing that's callable in other places. So a lot of researchers write very procedural code, and we teach them about functions: that functions are callable, that functions should be small. We teach them some of these aspirational things. We build an ideal for them, so that their mental model, as they're building software and eating that elephant around the research problem, allows them to think, okay, this function that's going across three pages in my editor, maybe I could break that into four or five functions. So we're trying to inspire those kinds of ways of thinking while we're also trying to transfer skills to them.
12:15 One of the things that seems to me to be the overarching theme there is writing software that doesn't just solve a problem, but becomes reusable, right? What I saw a lot of the time, let's just take my cognitive science folks as an example, is they'd write code that did amazing stuff, but it would have all sorts of hard-coded specifics for the exact problem they were trying to solve. Like you said, it wasn't typically broken into a lot of functions, and there was no concept of reusability or testability. It was like, well, this is what I did, these are the steps, and I made it do the steps. And they might be really advanced, but it's like moving from there to, oh, here's a bunch of things I could put into a package that I could then open source, and 1,000 people can work on it, not just me. Or, here's a thing I could unit test. Like, what is that? Right?
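A minimal sketch of the idea discussed above, assuming a made-up analysis that averages numeric readings from text lines. The names `parse_readings`, `summarize`, and `report` are hypothetical illustrations, not code from the Software Carpentry lessons. The point is only that small, callable functions with no hard-coded inputs can be reused and checked one at a time.

```python
from statistics import mean


def parse_readings(lines):
    """Convert text lines to floats, skipping blank lines and '#' comments."""
    stripped = (line.strip() for line in lines)
    return [float(s) for s in stripped if s and not s.startswith("#")]


def summarize(values):
    """Return (count, mean); the mean is None when there are no values."""
    if not values:
        return 0, None
    return len(values), mean(values)


def report(name, lines):
    """Glue the small pieces together for one named data source."""
    count, average = summarize(parse_readings(lines))
    return f"{name}: n={count}, mean={average}"
```

Because nothing is hard-coded, the same three functions work for any list of lines, for example `report("site_a", open("site_a.txt"))`, where `site_a.txt` is a hypothetical file name.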
13:12 Yeah, yeah. And in our Unix shell lesson, we're trying to teach the Unix small-tools philosophy a bit, too. We hope that rubs off in the other areas. We're teaching wc to count lines, right? We teach them to read the man page of this thing called wc, find what the different flags do, and make these magical space-delimited incantations at the bash prompt. And to think about it: they wrote one tool that just counts lines, words, and characters. That's all it does, and that's all it needs to do. That's enough, and we can build something with it. We can write a script, with the abstraction layer appropriate for our research question, that uses wc, that uses grep, that uses sort, and plugs a few of these things together, so that we can say, okay, we can now apply this to the data that's coming in from our sequencer, the data that's coming down from our satellite, the data that's coming in from our time series from the sensor that we've put out there. By thinking about those abstractions, we're hoping to give them the superpowers they need, like you were saying, to ask more complicated research questions of the data that's all around them.
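The small-tools idea described above can be sketched in Python as well. This is an illustration only: `count_text` and `matching_lines` are hypothetical names that loosely imitate what `wc` and `grep` do for plain text, and they are not replacements for the real tools.

```python
def count_text(text):
    """Return (lines, words, characters), in the spirit of `wc`.

    Lines are counted as newline characters; characters are counted as
    Python string characters, not bytes.
    """
    return text.count("\n"), len(text.split()), len(text)


def matching_lines(text, needle):
    """Keep only the lines containing `needle` (plain substring, not a regex)."""
    return [line for line in text.splitlines() if needle in line]


def sorted_matches(text, needle):
    """Plug two small pieces together, like `grep needle | sort`."""
    return sorted(matching_lines(text, needle))
```

Each function does one small job, so the combination in `sorted_matches` stays short and each piece can be checked independently.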
14:38 So speaking of the data that's around them, you've got a nice breakdown of different areas of study, right? Astronomy, particle physics, biology, and what they might need to know, and so on. Do you want to maybe talk about that a minute?
14:52 So the astronomy community is really training their people to be prepared for this data-driven future. It's really exciting and empowering to see, and we see participation in Software Carpentry from places like the Space Telescope Science Institute, the University of Washington, and other astronomy communities at the University of Illinois.
15:12 Yeah, the stuff they do in astronomy is no longer just people going out, grabbing a little light telescope, staring at the sky, and taking notes on a notepad. It's a serious data-driven thing, right? There's tons of machine learning. You almost have to be a data scientist to be an astronomer these days.
15:33 Yes, you do. And we've had a collaboration with some folks in Italy, astronomers who actually want to formally train their PhDs in astronomy with data-driven, data science skills, so that they're marketable, because they realize there are only so many jobs in astronomy.
15:53 That's a good point, too.
15:54 And that's a theme we're seeing in a lot of disciplines. People are saying, hey, there's this huge demand for data-driven skills. We know that we're not going to be able to give jobs to all these people we're training, so how can we make sure that their training is relevant to other areas as well?
16:14 That's a great aspect. Yeah, you were about to talk about particle physics.
16:17 Yeah, so the particle physics community. Everybody's heard of the Large Hadron Collider at CERN and all of these kinds of activities, which have gone on for a long time. I did my undergraduate in physics, and I remember they were designing the sensors back when I started my bachelor's degree, 20 years ago or so. Now they have these huge data streams, and they've really built a system of tiers that they use for the processing. That is built into their community and their training very well. It's actually a good example. There aren't a tremendous amount of needs there, because their data streams are so systematized, but there's always the need for people with new skills, building new tools and new analysis software. And so we've seen some interest from the particle physics community in preparing new graduate students to have those baseline skills that are necessary. Moving on to biologists: biologists have sequencing data coming out of their ears, noses, wherever they can find it. The rate at which sequencing data can be generated far surpasses most grad labs' abilities to process it. This is an acute need in biology: to take people who have really good experimental design and lab skills and train them in data-driven research. I saw a really hilarious tweet yesterday, which is that bioinformatics is just advanced bash, right? So you're actually learning really advanced bash to string together, and that's one perspective, a lot of tools that other people have written. But through Software Carpentry, what we want to do is help people understand that they can build software that meets their needs, and that they can build these small tools that fit into the open source ecosystem.
And so about 50% of the people who attend our workshops, and there have been about 20,000 people who've attended our workshops, are biologists of some sort.
18:31 Wow, that's really interesting. I wonder if, partly, that's because the astronomers and the particle physicists, from a very early place in their study, have to basically become programmers, or they just can't process it, right? Whereas in biology, I feel like you can go a little farther before you actually get forced into code.
18:51 Yeah, I think that's right. I think you can get so far with pipetting, doing PCRs, getting some sequence data, and kind of swirling it around in Excel for a couple of samples. And then an advisor or funding agency says, you've got to do 500, 5,000, 10,000 samples for us to believe this. And then they go, what? I can't do that with the skills I have. What they can collect now took, well, 10 or 20 years ago, a PhD, a postdoc, and some time as a faculty member. I might be exaggerating there on 10, but the rate at which they can get data is exponential at the moment. And they have problems. Never mind the fact that their people aren't trained in programming; they just have computational limits to what they can accomplish, even if they could use the computing power available to them. So there are really interesting questions in biology, and there are really interesting communities being developed. We're working with groups all over the world that are building networks of people who have the expertise spread across the network in order to answer the big research challenges they have. One is this group in Europe, which is a group of 20 countries building bioinformatics infrastructure, and they really see that as the sequencing facilities, the computational facilities, and the people. We're participating in how they build the network of people to be a support network that teaches each other the skills necessary, and is agile and adaptable to those skills as they're changing. Because, for your listeners who are in the bioinformatics community, every time you turn around, there's a new hot tool that you should be using, and you should drop your whole previous workflow. There's a joke in the community that that's how fast things are changing and moving. That sounds familiar, actually.
22:28 Yeah. So, ecologists. If you think about field biologists and ecologists, people who go out into the ecosystem and try to make observations, they have these fantastic abilities now to buy cameras, to get microphones that sit out in the field and listen to the ecosystem for days, weeks, months, measuring time series of pretty much any parameter. You can find a sensor that will measure that parameter and gather all that together. So they're really in this data-rich era where they have no problem acquiring data. They really have a challenge when they sit down and say, okay, I've made these observations across these different treatments, over these different sites, for this period of time, and I'd really like to build some insight from it. They really don't have great communities around that in ecology, teaching each other how to do this kind of analysis. And so Software Carpentry has really been trying to help people think about how to build tools, and one of the applications in ecology is really, how do I build software that can help me smash all this data together and get some insight from it? We don't have all the answers. But if you think about an ecologist or a field biologist, they're usually people who like to spend time out in the ecosystem; they do a lot of field studies. And now, in the same way that biologists are getting more data than they know what to do with, ecologists are sitting there getting data from different vendors, and each vendor wants you to buy a $5,000 license key with a dongle that plugs into your laptop so you can actually process the data. And I'm not even kidding. That's what they want you to do. And you've got seven of these vendors for one observation site, and you sit down and you go, okay, how do I even start eating this elephant? You really do need to build software that is special purpose for the experiment if you're going to do that right.
24:32 Right. It sounds like an ecosystem or group that's really ripe for some solid open source stuff to come in and go, we don't need a dongle. Let's do this.
24:43 Yep. Yeah. And there's so much vendor lock-in in that space. It's completely nuts. So there are ecologists who are really owning the process, trying to own their data better. And that's really the kind of empowerment that learning software-building skills gives you: you feel like you can tackle that challenge, and you feel like you own your data and your processing of your data.
25:11 You feel like you've become a creator. I can change this; I don't have to just take what they give me. That's great. So one final group that you mentioned, maybe worth covering, is archivists, and scanning manuscripts and digitizing the things, all the things.
25:25 Yeah, this is broadly digital humanities, and this is one particular case of digital humanities. If you want to be a student of manuscripts, an expert on manuscripts, you literally have to travel to the places, sometimes monasteries and libraries, where these manuscripts are held. You have to petition for time to sit down and spend some number of days or hours analyzing a manuscript. You can petition to take pictures of it, all different kinds of things. This particular researcher, whom I did some work with back when I was working for a university, said, what I'm going to do is petition for that time, and I'm going to create a dataset that everyone can benefit from. I'm just going to create the dataset. I don't have the tools to quite process it yet in every way I want to process it, but I'm going to create this dataset and hope that it becomes a model of a way that we can go digitize manuscripts and create datasets that build a generation of digital archivists and manuscript experts. And so he goes in with this camera rig that he's built. He moves a point-source light around, and he projects different patterns in order to see how the page is not necessarily flat. He can see all the undulations in the surface, and he can build a kind of elevation model of the manuscript itself.
26:55 Is that to try to, like, flatten it to protect the bank?
26:57 No, well, it's not necessarily to flatten it. It's just to understand, when you're looking at it, that it isn't flat. Okay. And as you're interpreting it, there are all sorts of crazy things that happened on these manuscripts. Monks would carve their initials into the vellum. So this is skin; they'd carve their initials into the vellum in hopes that they would go to heaven because their initials were on page 23 of a particular manuscript. Finding those scratchings or those inscriptions is really hard to do, even when you're sitting in the room. Imagine you've just walked into a dimly lit monastery with poor lighting, and you're trying to see where the stuff is. The dataset he's created really makes it easier than it has ever been to look for this stuff. Okay. And so there's a really interesting follow-on from this. All of the humanities are doing the same thing as all these other disciplines did, and saying, you know what, we can collect data, or we can gather data and put it in one place. How do we begin to ask really interesting research questions of it? How can we build our own tools? And how can we empower our next generation of researchers to be prepared to build tools that help us understand this stuff better?
28:14 Yeah, that definitely gives us a good sense of how the different disciplines might benefit from what you guys are teaching, and certainly the general reusable software story is super cool. So you mentioned tools. What are some of the right tools to be working with? How often have there been research mistakes or false results based on using the wrong tools or messing this up?
28:41 Yeah, I mean, that stuff happens all the time. I can't remember exactly what journal it came out in, but there was this recent study that said 20% of gene expression datasets in major peer-reviewed papers had errors in them, introduced by cutting and pasting gene names into Excel. If you've ever seen gene names, they have these crazy names that have numbers and letters, and some of them look like dates. And if you've ever cut and pasted data into Excel, you know that Excel just starts making up data types as it sees data. So there have been actual problems in gene analysis based on these gene naming errors. By cutting and pasting vectors of gene names into Excel, you mangle your gene vectors, and they don't mean what you think they mean anymore. And then you draw upon those, and you make conclusions from those false gene names, because Excel mangled them as you pasted them.
29:47 Yeah, that sounds painful.
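As a hedged illustration of the point above, here is one way to read gene identifiers in Python so that they stay plain text. The sample data is made up for this sketch; `SEPT2` and `MARCH1` are used only because they are the kind of gene symbol that looks like a date to a spreadsheet. The function name `read_gene_counts` is hypothetical. The standard-library `csv` module returns every field as a string and does no type guessing, so conversion happens only where the code asks for it.

```python
import csv
import io

# Made-up sample data standing in for a real expression table.
SAMPLE = "gene,count\nSEPT2,14\nMARCH1,3\nTP53,27\n"


def read_gene_counts(handle):
    """Read gene names as text and counts as integers.

    Only the 'count' column is converted; gene names are left untouched.
    """
    reader = csv.DictReader(handle)
    return {row["gene"]: int(row["count"]) for row in reader}


counts = read_gene_counts(io.StringIO(SAMPLE))
```

With a real file, the same function could be called as `read_gene_counts(open("expression.csv", newline=""))`, where `expression.csv` is a hypothetical file name.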
29:48 Yeah, that's kind of a model of how we're not always using the right tools. The tools that we're using are the tools that we grew up with in graduate school. It's this self-perpetuating system where the professors learned to do their bioinformatics analysis in Excel, maybe, or in other tools, and they're teaching the next generation to do it in that same way. So our approach is not to say, okay, you should never use Excel, or you shouldn't use this tool or that tool. It's to ask: what is a set of skills and tools that we can teach you, that we can really empower you to own as a part of your process, so that you are a more effective and impactful researcher? The three broad themes of that are these. How do you repeat tasks? We say, how do you iterate over directories full of files? That's a pretty common thing, and if you have that superpower, you're able to increase your research throughput. How can you share your code and methods with other researchers? And, we don't go all the way into unit testing, but how can you have some level of assertion that the function you wrote does the things you think it does? That is an important concept. That's almost an ethical concept, right? To be a responsible researcher, you have to understand that your code is doing what you think it's doing. And how can you begin to ask that question of your code?
31:19 Sure. And code as part of your research project, as part of your paper and your conclusions, is becoming more of a thing, right? There are places you can submit your code to, to guarantee that it hasn't changed from when you wrote the paper, and all sorts of stuff like that. And making sure it works is just part of that, I think.
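Two of the themes mentioned above, iterating over a directory full of files and asserting that a function does what you think it does, can be sketched together. This is an illustration under assumptions: the function names and the `*.txt` pattern are hypothetical, and the demo builds its own temporary files so that it does not depend on any real dataset.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count the lines in one text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_in_dir(directory, pattern="*.txt"):
    """Apply count_lines to every file in `directory` matching `pattern`."""
    files = sorted(Path(directory).glob(pattern))
    return {path.name: count_lines(path) for path in files}


def demo():
    """Build a tiny known dataset and assert the result matches expectations."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.txt").write_text("x\ny\n", encoding="utf-8")
        (Path(tmp) / "b.txt").write_text("z\n", encoding="utf-8")
        (Path(tmp) / "notes.md").write_text("ignored\n", encoding="utf-8")
        result = count_lines_in_dir(tmp)
        # The assertion is the "does it do what I think it does" check.
        assert result == {"a.txt": 2, "b.txt": 1}
        return result
```

Running the function on a small input whose answer is known in advance is not full unit testing, but it is a first step toward trusting the same function on real data.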
31:41 Yeah, there are huge problems there. Things like containers purport to make some of this easier, things like Docker containers. I spent some time working for an academic library, and if you start thinking about it: okay, the library has to archive this project. And it's using, let's say, Python 2.7. It should be using Python 3, but it's not; it's still using Python 2.7, and a whole chain of dependencies with custom-compiled Fortran, right? And you say, okay, we're going to archive this paper, and you don't archive that runtime environment in some way. Then 20 years from now, you want to run Python 2.7 with all that huge dependency tree, and you want to get the same result. How do you even do that? That's not a problem we're trying to solve with our workshops at the moment. But that's a conversation that's happening in the broader scientific community: how do we articulate what we did in the process of running this code, in ways that we can reproduce it at a later date?
32:43 Yeah, that's a really interesting problem. I mean, it's one thing to say reproduce it across different systems or whatever. It's another to say reproduce it 20, 50, 100 years from now.
32:53 Yep. And should we even be trying to do that? If you think about libraries, libraries are these hard drives, if you want to use the computationally inclined metaphor. A file begins at one cover and ends at another cover, and they put them on shelves, which are volumes within your hard drive or whatever. It's a format they've settled upon. Some of them, when you open them up, have really big glossy pages that spill out, that you can fold out or whatever. But there's this constraint: a library says, we will archive these things called books, and we will put them on shelves. We're still having that conversation about what the digital asset is that we will archive and put on the shelf. And it's got to be simple, like a book: it starts at this cover, it ends at that cover. The problem is that everything between the covers is so damn complicated that nobody can figure out what we're going to put on the shelf.
33:55 Well, it goes 00110001. But other than that, we really don't agree a lot. I think I feel like partly the problem is, the world is just changing so fast, digitally right now. And I feel like, as we go farther in the future, there will be some stability, some underlying stability, I don't think the change will stop, and innovation will stop. But I think there will be some things okay, this has been around for 50 years, and is continuing to work like we can probably go with this for like it has to be ultra stable. That's a lot about what the technical bits of these researchers need and what you teach, and so on. But one of the really important parts is not just to teach classes to people, but actually to build a community. Right. Do you want to talk about that a
34:40 bit? Yeah. So we train the instructors in our community. They come to a workshop and they teach. Our learners are the people coming to get these skills, and we collaborate on open source lessons online. And so those three constituencies, the people who work on lessons, the instructors in our community, and the learners in our community, are really how we start to build community, so that we can have, first and foremost, an impactful workshop that shows off really high quality teaching, and then build a support structure and a community of practice at the institution, at the organization, so that you can sustain this kind of future learning: more development of the lessons that are needed, more talking about how to be an impactful instructor, turning instructors into learners of new topics, turning learners into instructors, and really giving people avenues to enter and contribute in many different ways. And so the way we do that is we invite people who have technical skills, but do not have our instructor certification, to become helpers at our workshops. A helper is really just someone who helps scale the instructor. Remember, we're really doing hands-on workshops here. We're doing type-along pedagogy. We're at the Unix shell, everybody's got a shell prompt open, and they're typing along. And they're going through the pain that is understanding a space-delimited Bash shell environment, or a Python environment that gets really picky if you don't put the tabs in the right place. It throws up a screen of error messages, and if you've never seen them before, or if you have seen them before but never stopped to interpret them, and you're looking at them, you can use help.
And you can use guidance with this, and our helpers are really in the room to help our learners go through that process of learning what an error message means, trying to look critically at the incantation they typed in, and figuring out what might be wrong with it. Programming is one of those things where you basically have to do it to learn. You can watch and watch and watch, and you can get more ready to try, but until you've actually tried and worked through some kind of problem, you're not there. So I think it's great that you're doing the hands-on stuff. And so these guys basically look over everyone's shoulders and see how they're going, walk around the room, something like that? Right, exactly. And that's their role in the workshop. But what's also happening when they're in the workshop is they're seeing an instructor who's engaged and having a good time teaching. Our helpers are typically people who have a little bit of these technical skills, and we try to request that our helpers consider becoming instructors and going through our instructor training process. And so that's the entry method for a lot of people to become instructors in our community: first become a helper. A lot of our learners, too, see this as a way that they can spread these skills to their research lab, or to their particular research community that they go to a national conference with, and they aspire to become helpers, and then instructors as well. And so, with these different avenues of onboarding into our community, we have really built a vibrant community that gives everybody a role in making those workshops really impactful, really fun, really exciting. And we also invite people to contribute, via Git and GitHub, to our lessons online, which are all Creative Commons Attribution. We have a core set of about 10 lessons.
And we have people who are thinking about and building other lessons all the time, and we're having an ongoing community conversation about which lessons are the appropriate ones to be included in our community, and which should have their own branding and spin off. We've had a recent group called Library Carpentry, which has really created a stack of carpentry skills that aren't programming, but are appropriate for a librarian doing the work of a 21st-century data-driven librarian. And they've built a vibrant community around that set of lessons. But they share the same inviting structure of offering workshops to learners and training people as instructors, and we've agreed to do that in the same way. And so that gives you a sense that, overall, our community is really a place to share skills and how you teach, and to have conversations about how you teach skills workshops. Yeah, that
39:13 sounds really, really great. Not everybody from some research group can go, so maybe the principal researcher and one or two other people go, and they can take that skill back. And because the materials are Creative Commons, they could actually go and do a little workshop for their people or something like that, right?
39:30 Yeah. And we have professors all over the world who use our workshop lessons as the foundation for the first week or two of skills that they want their students to have in their semester-long courses, before they take a deep dive into a particular application of the skill. Right?
39:49 Okay. Sure. So if you're going to teach a class on astronomy, maybe you'll use that to like bootstrap it a little bit.
39:55 Exactly. Yeah. Hey, everyone,
let me take just a quick moment and tell you about a new sponsor of the show, MongoDB University. MongoDB is one of the fastest growing job skills on the market. Long ago, when I was getting into MongoDB, I took one of their free courses at MongoDB University. It was a great way to get up and running with my first app. MongoDB University offers free seven-week courses on MongoDB, designed to teach you everything you need to know about how to build a MongoDB-based app. This course will cover basic installation, JSON, schema design, querying, inserting data, indexing, and working with the Python driver. After completing this course, you should have a good understanding of how applications are built on top of MongoDB using Python. Plus, you'll have a great foundation for preparing for the MongoDB developer certification exam. I hope you can join me as a MongoDB University alumnus and sign up for the free seven-week course at talkpython.fm/mongo.
40:49 We feel strongly that lessons that are taught by many people are much better than lessons that are taught by a single person. There's this famous adage that no battle plan survives first contact with the enemy. The teaching corollary to that is no lesson plan survives first contact with the student. And so lessons that have been taught by many people to many students, and are continuously improved through an open contribution model, an open source model, are better lessons than if you sat down and said, "I'm going to teach you shell today, and these are the things you should know about the shell." And so over time, we've built this set of core lessons that have been taught a lot, and people have had conversations about them. They're not perfect. As anyone who's contributed to open source knows, just because a project has a lot of excitement around it and a lot of people contributing to it doesn't mean that everything about it is perfect, right? There are always things that crop up because of the nature of multi-party contribution systems. But overall, there is a deep conversation happening about what should be there and what shouldn't be there, what the ideals of that community are, and what they're trying to advance through that project, and our lessons
42:08 embody that. Yeah, and it's cool. And you guys have a retrospective or something after a bunch of people teach in a given week, something like that? Yeah,
42:15 yeah, we have these debriefing sessions where everyone in the world who's taught a workshop in the last week or so will come to an online video call and talk about what worked for them and what didn't work for them. Maybe it was their first time teaching Git; they'd taught Python before, but they'd never taught Git, and they say, "You know, I just failed spectacularly when I was trying to tell people how to resolve conflicts." Right? I can't even resolve my own conflicts in Git very well, and it drives me...
42:40 I'm still confused about rebase.
42:44 Our lesson doesn't get into rebase. But, you know, maybe our advanced lesson
42:50 that'll come up eventually. Yeah. One thing I'd like to bring it back to is just a concrete example of all of this. I recently had on Jake VanderPlas from the University of Washington and the eScience Institute up there, and he's an astrophysicist. That was Episode 81. And you guys are actually doing a lot of work with them up there. I think they're doing some really cool stuff merging academics and science with real data science and programming
43:16 at the University of Washington. Yeah, with the eScience Center. We've trained a ton of instructors up there. I've been in the shop a little over a year now, and when I first arrived, they were so enthusiastic and excited to get more of their people trained as instructors, because it's really great for their careers. Being affiliated with the eScience Center and teaching workshops to people, they've really created this buzz on their campus about eScience being for everyone and all disciplines. They're running quarterly workshops, and they have something like 25 instructors on campus now, so they can run workshops whenever they want to. And they've really got their postdocs and graduate students who are affiliated with the eScience Center excited about the community building, and about how, in the process of teaching other people the skills that you have as a data-driven researcher, you get better at your own skills. And so that has been a really neat, evolving story. They're always pushing into new research communities, and we're always hearing that they've got instructors coming from different disciplines. And they're really building this neat center on campus, which is the literal place you come to ask questions about data-driven research and the skills that you need to tackle research problems. And Software Carpentry has become the entryway into that conversation, to say, hey, we've got this workshop, it's a way to begin to learn these skills, and we've got these other programs and structures as well that can
44:50 support you. Yeah, that sounds really cool. I think that whole eScience Institute sounds great. Like I said, I wish that it was around when I was in grad school, but you know, it's a different time. This all sounds good. When people want to get involved, what do they do?
45:05 We have a couple of ways to get involved. software-carpentry.org/join is a page that has a couple of links on it about how to join our mailing list, how to get our newsletter, and what's going on in terms of our community events and calendars. We typically have a monthly community call that people can attend, and that's a way to really become a volunteer and a contributor to the community. We also have a partnership program, or membership program: if you are at an organization that would like to bring Software Carpentry instructor training and capacity-building services to your organization, you can become a member organization. What's an example of a member organization? Is that a university, or what is that? Yeah, so it can be everything from a research lab to a university. For example, the eScience Institute is a member organization of Software Carpentry. What we do with them is make sure they get a consistent amount of instructor training every year, and we give them a seat on what we call our Advisory Council, which is a body that helps set the direction of our priorities in terms of lesson development, how we run the community, how we handle things like what our code of conduct is, and how we spread and grow the community and where we spread it to. So we really give people a seat at the community steering wheel through the Advisory Council. Oh, yeah, excellent. Yeah. So partner organizations range from a pan-European collective of 20 countries trying to enhance bioinformatics infrastructure for all of Europe, at that multinational scale, all the way down to one particular researcher's graduate lab, who wants every one of his or her graduate students to go through Software Carpentry training, and wants every one of them to be an instructor in our community by the time they finish their PhD.
And so it's a broad scale, but the really common thread is that they want to build capacity around these kinds of data-driven
47:08 research skills. Sounds great. So where are you guys going? What does the future for Software Carpentry look like?
47:14 Yeah, right now we're in a mode where we're trying to plant new instructional communities all over the world. We're trying to spread into developing-world countries, and we're trying to figure out whether our business model works in those places. Right now, our business model is really these paying partner organizations, and helping to bring workshops to organizations. So we're in this expansion of the instructional community: we want to train as many instructors as we can around the world. To do that, we've got to build these memberships and partnerships in order to support our organization long term. As we have these vibrant instructional communities around the world, we really want to support the growth of the lessons that people need. So I have this dream of a big chart on the wall where you walk into graduate school, and you look at the wall, and it's seven or eight workshops, you know, with a dotted line through them that tells you: start here, and over here you'll have some machine learning and data visualization skills, right? Or if you want to go into a different discipline, if you really want to get into all the tooling and continuous integration and all those kinds of things, if you really want to get smart on the software engineering tools side of things, follow this dotted line. Yeah, that's going to take a long time. But the first thing we need to do is get more people in the community talking about the lessons that we have, creating lessons that we don't have, and creating that conversation. And then we can step into this rapid expansion of the number of lessons we have. And so that's the goal at the moment. Our overall goal is really, you know, we're a nonprofit, and basically in our formation statement we've said that once everybody has all these skills, we should cease to exist, right? We're a long, long way away from
49:06 aspire to a world where we're not needed. Exactly.
49:09 Yeah. So that's our long term goal is to go out of existence. That being said, there's there's a lot of work to do between here and there.
49:18 And yeah, that's a great, great goal. You said you guys are a nonprofit, but you've still got to make money to support some of your efforts and so on. What does the business model look like? You talked about the member organizations; they're somehow involved, right?
49:30 Yeah. So member organizations pay an annual fee, and they're typically universities, national labs, and research networks. They pay this annual fee both to get instructor training and to help steer the community and the resources we have as a community. So, for example, every workshop we run has an assessment survey, a pre-workshop and post-workshop assessment that can help you understand the boost that the workshop gave the workshop attendees. These are all things that, if you were someone sitting at an organization trying to build a training arm, would be very expensive for you to build. But we try to spread that cost across a number of organizations with a modest annual fee, and we build those kinds of services and structures into the lessons that we support. And so that's one part of our revenue model. The other part is, when we bring workshops to new communities, we charge a fee to help get our instructors there. We don't pay for the travel, but we do the coordination to bring instructors to your particular site. So, for example, if the Department of Physics and Astronomy said, "I'd really like to have a Software Carpentry workshop in the spring of 2017," they would contact us, and we would find some instructors who were appropriate for teaching the skills and the lessons that they wanted for that constituency. They would pay us a finder's fee and a workshop fee, and they would pay the travel for those instructors. All of our instructors are volunteers, and these volunteer instructors really get a tremendous benefit. Maybe we can link it in the show notes.
There's an impact of instructor report, which is 40 pages of qualitatively coded responses from our 265 respondents at the time, about a year ago, telling us how being an instructor in the community has boosted their career, has boosted their science, has boosted their technical capabilities. And it really is a virtuous cycle.
51:38 Yeah, that's cool. It definitely would make you stand out from all the other graduating scientists to say, "Oh, yeah, and I teach this all around the world." You know, I've done a lot of training in the software developer space, in person and virtual and whatnot, and it makes a huge difference: the connections you build from meeting all the students, the experience you build, the depth of knowledge you get compared to just knowing how to do something with code or the tools, but really understanding it, things like that. It's great. Now, I saw on your site that you do have some videos that people can go watch. That's not the main thing you guys do. It's not video training. These are mostly live, right?
52:15 Live. The videos are really demonstrations of some of our senior instructors' pedagogical and classroom management practices. Yes, they are teaching the topics and the skills. And yes, they are exhaustive in the sense that the videos are the full workshop or full modules of the workshop. But we really feel strongly about the live instruction and the live experience, plus having helpers available to help you in the room. We use a system of sticky notes, where you put a green sticky note on the back side of your laptop screen if you've solved the problem we've asked you to solve and it's working, and you put a red sticky note up if you haven't been able to solve it or you have a question. And our helpers in the room, we have one helper for every 10 people in the room, are going around the room and helping you with your problems as they arise.
53:10 That's really clever. I've done a lot of the workshop stuff that you're talking about, but we never had sticky notes. That's a great idea.
53:16 That's something that we've been doing for quite some time. And we've even had some people create a third sticky note, a blue one, which is speed up or slow down, or I think it's slow down, right? So you're talking too fast, or you're teaching too fast. And so, you know, if you look out at a room of 40 people and 10 people have a blue one up, you know to pace it down a little bit, those kinds of things. So yeah, it's a great system. We're really trying to have a lot of fun as instructors and share our passion for these tools, and really to share our experience of how we learned. For me, as a graduate student, I can remember having a stack of Landsat files, satellite image files, and needing to apply a particular Unix command-line tool to them. You know, when you write that for loop that gets you to iterate over them, and you can go have a coffee, and it runs for three more days and then gets all that work done, you're like, oh my God, how do people do it without
54:13 it until we have a test? They just can't solve those problems, basically. Right? That's right.
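A minimal Python sketch of the batch loop described above: run one command-line tool over every matching file in a directory, then walk away while it works. The function name `run_on_each` and the `*.tif` pattern are hypothetical, and the tool that was applied to the Landsat files is not named in the conversation, so the command is left as a parameter.

```python
import subprocess
from pathlib import Path


def run_on_each(directory, pattern, command):
    """Run `command` once per matching file, with the file path appended
    as the final argument. Returns {file name: exit code}."""
    results = {}
    # sorted() keeps the processing order predictable between runs
    for path in sorted(Path(directory).glob(pattern)):
        completed = subprocess.run(
            [*command, str(path)],  # e.g. ["some_tool", "--flag", "scene1.tif"]
            capture_output=True,
            text=True,
        )
        # A non-zero exit code marks a file that needs a second look
        results[path.name] = completed.returncode
    return results
```

Calling `run_on_each("scenes", "*.tif", ["some_tool", "--flag"])` would run `some_tool --flag` once per image under `scenes`; `some_tool` is a placeholder, not a real program. Recording exit codes rather than stopping at the first failure suits a job that runs unattended for days.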
54:17 Right. So when you can share that passion and enthusiasm about how you solved some particular problem. For me, that is really what's so exciting about teaching. I came up as an instructor in the community. I taught something like 11 workshops in two years as an instructor in the community. And really, it's just a lot of fun. And you give back you build up people. You're impacting and empowering them to do really awesome research.
54:45 Wow, that's awesome. Well, I think we might be getting to the end of the show. We're more or less out of time, but it sounds like such a cool project, and the fact that the content is open source I think is really great. So nice work. Yeah. Before we hit the road, let me ask you two final questions. Of course, I always ask them. Uh huh. There are now over 90,000 PyPI packages out there. It's a crazy number, and it just keeps growing. So what are some of your favorites that you think people should know about
55:14 these days? I do a lot of preparing documents, like legal agreements and things like that. So in order to do that, I use Cookiecutter a whole heck of a lot, and Jinja templates to fill in documents.
55:29 All right, so that's interesting. I know about Cookiecutter. It's like templates for building projects: you can build a Django project or some other type of project, and you can pre-bake that in a Cookiecutter template. How does that work with documents?
55:41 For every agreement that I create, and maybe this isn't the right way to do it, but this is the way I do it, I create a little YAML slug that has all the things I want to fill into that document. And then Cookiecutter places everything in the right directory for that agreement. Every agreement is a little bit different, so I have to create a new directory with the template every time, because the template isn't necessarily the same for every single agreement; the template can change a little bit. But the hard part, the arduous part if I did this manually, was filling in all those fields. And so Cookiecutter drops everything in place. It drops in the Markdown template, which I use Pandoc to convert into an ODT or a DOCX or a PDF, whatever people want. And so Cookiecutter for me is the way that, when I have any partnership or any membership I'm going to sign, I say cookiecutter and the name of the cookiecutter directory, and it prompts me to fill in all those YAML fields.
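The fill-in-the-fields step described above can be sketched in a few lines of Python. This is not the guest's actual setup: it swaps in the standard library's `string.Template` for Cookiecutter and Jinja, uses a plain dict where he uses YAML, and the template wording, field names, and values are all hypothetical.

```python
from string import Template

# Hypothetical Markdown template; $name marks a field to fill in.
AGREEMENT_TEMPLATE = Template(
    "# Membership Agreement\n"
    "\n"
    "This agreement is between Software Carpentry and $organization.\n"
    "It takes effect on $start_date and is signed by $signatory.\n"
)


def render_agreement(fields):
    """Return the Markdown text with every field filled in.

    substitute() raises KeyError when a field is missing, so a
    half-filled agreement is never produced silently.
    """
    return AGREEMENT_TEMPLATE.substitute(fields)


# Stand-in for the YAML fields; these values are made up for illustration.
example_fields = {
    "organization": "Example University",
    "start_date": "2017-01-01",
    "signatory": "A. Researcher",
}
markdown_text = render_agreement(example_fields)
```

Writing `markdown_text` to a file such as `agreement.md` and running `pandoc agreement.md -o agreement.docx` would give the Markdown-to-DOCX conversion mentioned in the conversation; the file names here are assumptions.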
56:44 Wow, that's really, really, really cool. And for editing, if you're going to write some Python code, or any code really, what editor do you open up?
56:52 I'm a vi kind of guy. I've never, you know, achieved vi ninja skills with commands four or five letters long or anything like that. But things like delete inside quotations or parentheses are just magical to me. And I've learned a couple of things that help me move around more efficiently in vi than I can anywhere else. And it's on every frickin' system anywhere. No matter if your terminal emulator is working right or not, it just works. So back in the dark ages of telnetting into systems where you didn't even know what the system was, vi would be there to hold under your pillow at night. So yeah, I've just always gone for that. That's cool. And a lot of the popular editors are not vi, but many of them have vi bindings as well, so you can carry it along even if you go somewhere else. You can read your Gmail with J and K to scroll up and down, you know, just to feel like you're at home. That's right, I
57:54 think I was just doing that right before we started talking. Awesome. All right, so final call to action. If people want to get involved, if they want to learn more, what's the story?
58:02 You can subscribe to our newsletter. It's a monthly newsletter at the moment; it may be slightly more frequent in the future. Also, on our join page that I linked to before, we have a pretty active discuss mailing list for people who are interested in pedagogical discussions and how to teach programming skills. You can host a workshop at your organization, and you can go to our website and find out how to host a workshop. And you can also take the next step, which is to really get your organization to become a member organization and build a community of instructors at your local organization. Feel free to reach out to me
58:40 directly on Twitter or email. And I'll put those links and the contact info in the show notes. That's really cool. Okay, excellent. Awesome. Well, I'm really glad to see what you guys are doing to create more creators and fewer consumers in the world of science and software. So nice work. Very cool stuff.
58:57 Well, thank you, Michael, for the time, and I'm excited to contribute. I know this show's a little bit off the beaten path in terms of talking tools and Python systems, but this is an important foundation that can grow the community of contributors to other Python projects and other parts of the Python ecosystem. So I'm excited to have had the time to have this conversation with you.
59:21 Yeah. Well, it was great to have you here. Thank you, Jonah, and see you later.
59:24 All right, cheers.
59:25 This has been another episode of Talk Python To Me. Today's guest has been Jonah Duckles, and this episode has been sponsored by Rollbar and MongoDB University. Thank you both for supporting the show. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Get the skills you need to build your Python apps on top of the most successful and in-demand document database at MongoDB University. Take a free class by visiting talkpython.fm/mongo. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course, to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. And be sure to subscribe to the show: open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smixx. Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music. You can browse the tracks he has for sale on iTunes and listen to the full-length version of the theme song. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Smixx, let's get out of here.
01:01:12 standing with my boys
01:01:15 having been sleeping. I've been using lots of rest got the mic back