#154: Python in Biology and Genomics Transcript
00:00 Michael Kennedy: Python is often used in big data situations. One of the more personal sources of large datasets is our very own genetic code. Of course, as Python grows stronger in data science, it's finding its way into biology and genetics. In this episode you'll meet Ian Maurer. He's working to help make cancer a thing of the past. We'll dig into how Python is part of that journey. This is Talk Python to Me, Episode 154, recorded February 9th, 2018. Welcome to Talk Python to Me, a weekly podcast on Python, the Language, the Libraries, the Ecosystem and the Personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Codacy. Learn how they make code reviews better by checking out what they're offering during their segment. Ian, welcome to Talk Python.
01:06 Ian Maurer: Hi Michael. Thanks for having me on.
01:08 Michael Kennedy: Yeah, I'm really glad to have you on to talk about Python in biology and genomics. These are two areas I've wanted to do a show on for a long time, but I just haven't managed to get the right stuff all lined up, so I'm really excited to see how Python is playing a role here. I think it's just another cool example of how Python is being used in all these really varied ways.
01:33 Ian Maurer: It's really been taking off the last few years and it's gone really well with what we're trying to get done at our company.
01:40 Michael Kennedy: Awesome, so let's dig into that, but first, let's hear your story. How did you get into programming and Python?
01:45 Ian Maurer: I started programming when I was 13. My parents got me a Commodore 64. I started learning BASIC and trying to make my own games and things like that. I went to school for programming, computer engineering. Learned a lot of C and even Pascal at the time, to date myself a little. After graduating, I worked at a defense contractor in their logistics department, where we built some SGML- and XML-based tools for documenting the complex systems they have. Part of that was parsing those files, and we were actually using a library that was built in Python. And I kind of fell in love because of the REPL. Being able to open up the REPL and explore and play with the information right there was what hooked me. Ever since then I'd kind of been doing it as a hobby, hoping that it would take off in the web space, and it did, with Django following on after Rails, but it just never worked out for me. I was always doing Java-based development for e-commerce sites and other stuff.
02:48 Michael Kennedy: Java again! No!
02:51 Ian Maurer: Yeah, always Java. So I was doing Java. I still like Java. I still consider myself a good Java developer. When I joined my current company, we did a couple of small things in Python, and those kind of took off a little bit, and we were able to just double down and add some more features. And, really, since that time, with bioinformatics and other stuff we'll talk about, Python's really taken off, and it's actually made sense to use it as one of our core languages for some of our products.
03:19 Michael Kennedy: That's cool. So it's finally grown into this place where it's not just, "Oh, I'd like an excuse to use it." But it really makes sense, right?
03:27 Ian Maurer: Yup, exactly right. It actually fills the niche, and it's really taken over from Perl in a lot of ways in the bioinformatics space. It sits alongside R and has really gotten a lot of mindshare in this bioinformatics world.
03:41 Michael Kennedy: There's probably some infographic of Perl, sorry, R and Python duking it out over some sort of data science crown. I don't know. We'll see where that goes, but they're both doing really well, and it's nice to see Python growing so quickly. If you look at the growth of Python, there's this huge jump in its popularity. It's always been growing, which is kind of amazing, but there's this sort of inflection point where it starts growing faster, around 2012, which I feel like is where the data science stuff really started to kick in for Python.
04:13 Ian Maurer: Yeah, NumPy and scikit-learn and Jupyter, Pandas. Some of these core lang--
04:18 Michael Kennedy: All the machine learning.
04:20 Ian Maurer: Yeah, all of the machine learning stuff. All those things have really gotten some mindshare together, and we're kind of riding that wave, and it's really great. I think you might have said this in one of your previous podcasts: the fact that people who are in data science want to learn a general-purpose language too, one they can use to make themselves a little more marketable, is another feather in the cap for Python over some of the other languages.
04:42 Michael Kennedy: It definitely is. Awesome, so this sounds like a really interesting way of getting into it. You came at it from the computer engineering perspective. Very nice. I think maybe the first place to start this discussion really is to talk about the biology and your company, the problem space you guys are working in, so that when we talk about all the tools and the way Python's solving the problems, people know the context. So maybe tell us a bit about what you do day-to-day.
05:11 Ian Maurer: Yup, so I lead development for a company called GenomOncology out in Cleveland. I'll talk more about Cleveland later, but I lead our software design, development, testing and deployments. We were founded in 2012, and that timing is important because around 2011, some of the big NGS platforms, Next Generation Sequencing platforms, came out. These include things like Illumina and Ion Torrent. Why those are important is because the Human Genome Project, which you might have heard of, kind of wrapped up between 2000 and 2003--
05:43 Michael Kennedy: When did that start? Like late '90s?
05:45 Ian Maurer: Mid '90s, late '90s. It took a few years for sure. It took about $3 billion to complete. And that was basically just mapping a first draft, kind of, of the human genome. It basically says, "These are all the variants that, quote, unquote, a typical human is made up of." That took about $3 billion to do, and now we're talking about less than a thousand dollars. Just as Moore's Law applies to computer chips, there's kind of a Moore's Law effect here, and I've even read some analyses where the cost of genomics is falling faster than Moore's Law. That drop in price is what allows us to apply these technologies for lots of different reasons. My favorite part, obviously, is the work that we're doing around helping people with cancer: using genomics to help people find clinical trials, find therapies, and hopefully improve their odds at fighting that disease.
06:40 Michael Kennedy: It's definitely one of the great challenges of our time. We've sort of solved, to a large degree, the problems that were really bad for humanity. Now cancer is one of the major things that people have to deal with. It used to be you might be hungry, you might be eaten by a wolf. Now, you live a long, healthy life until you get some kind of bad news. How much is cancer a genetic problem versus other types of problems? You guys are building genetics tools. How does this all fit together?
07:16 Ian Maurer: Caveat: I'm not a molecular pathologist, and I'm not a bioinformatics person. But cancer is a disease of the genome. You have your genome, 23 chromosomes, 23 pairs of chromosomes. Chromosome 1 alone has about 2,000 genes and about 250 million base pairs. That's kind of the scope of the data that we have. Cancer is really mutations within that genome causing things to break down in a certain way. There's a book called One Renegade Cell that we kind of make all of our new employees read, and it really walks you through the analogies of the sticky gas pedal and the cut brake line. Basically, if you were to have a little cut on your finger, the cells around that cut would know to multiply and grow and cover over that cut, and then they know how to stop, which is really an amazing feat.
08:14 Michael Kennedy: It's actually unbelievable that the machine that is a human works. Or any form of animal, really. It's incredible.
08:20 Ian Maurer: It's all these individual cells. There are different signals throughout the cell, and those signaling routes are called pathways. What ends up happening is those pathways stop working in some fundamental way, and the way they stop working is through mutations. Those mutations can occur due to some environmental factor, like smoking or UV light, or some other mechanism that causes the mutation to happen. Then, from there, one cell ends up growing and taking over the space of the other cells. So a lot of the drugs and therapies out there, especially the targeted, personalized therapies, go after those individual cells that are kind of going rogue and try to get rid of them so that the healthy cells can do their thing. Our company is basically in the business of helping people, helping oncologists, pathologists and other folks in the health care industry, identify what these mutations are, figure out what they mean, and then help their patients: get them on a clinical trial or prescribe them a therapy.
09:28 Michael Kennedy: Right, if you understand the actual genetics that's causing the problem then maybe there's a better, more focused sort of treatment, right?
09:36 Ian Maurer: Exactly right.
09:37 Michael Kennedy: If you look at chromosomes, we talk about big data all the time, right?
09:42 Ian Maurer: Yup.
09:43 Michael Kennedy: There are only 23 pairs, so that's no big deal. But they're actually made up of a lot of stuff. Take us through the sort of big data story. Just the scale of the data around chromosomes and genetics, I guess, is the better way to put it.
09:56 Ian Maurer: So there are 23 pairs of chromosomes in a "normal" human being. You have about three billion base pairs across all those chromosomes, and they get labeled 1 to 22, and then X and Y for the sex chromosomes. Within those three billion base pairs, there are about 21,000 to 24,000 of what we call genes, and genes are the things that actually code for proteins. Proteins are the things that actually make the whole system work. So the genes, the actual DNA part of it, are made of base pairs, and there are four bases: As, Cs, Gs and Ts. Those go in sets of three, if you remember from biology, and each set of three goes on to become an amino acid. And then the average person has about 30 million or so variants present. So one of the tricks we use, obviously, in this space, is that we don't actually record all three billion base pairs; we just record the delta, which makes it a lot less data. The other part of making it a lot less data is focusing on specific genes. In cancer, depending on the disease type, there might only be three or four genes that matter, or maybe 50 genes that matter, but across all the different types of cancer there might be about 800 or 900 genes that matter. Our types of tests and sequencing really focus in on those smaller regions, which makes the data faster to manage. That collection of three billion base pairs, As, Cs, Gs and Ts, is what's called the reference genome. The reference genome is what everybody gets compared against. So when sequencing is done on a tumor or on a normal cell, the deltas, the variants, are what actually get captured and recorded, and we also record them in the context of which genes they fall in. Genes actually don't make up a huge amount of the genome. It's a very small portion of the genome that actually codes for proteins.
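To make the "record the delta" idea concrete, here's a minimal sketch, assuming single-base substitutions only (no insertions or deletions) and two already-aligned sequences of the same length. Real pipelines store variants in formats such as VCF; the function names and sequences here are hypothetical.

```python
from typing import Dict


def diff_against_reference(reference: str, sample: str) -> Dict[int, str]:
    """Return {1-based position: sample base} wherever the sample differs.

    Assumes equal-length, already-aligned sequences with substitutions only.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        pos: alt
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    }


def rebuild_sample(reference: str, delta: Dict[int, str]) -> str:
    """Reconstruct the full sample sequence from the reference plus the delta."""
    bases = list(reference)
    for pos, alt in delta.items():
        bases[pos - 1] = alt
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
delta = diff_against_reference(reference, sample)
print(delta)                             # {5: 'T', 10: 'A'}
print(rebuild_sample(reference, delta))  # ACGTTCGTAA
```

Ten bases shrink to two stored entries here; because only a small fraction of positions differ from the reference, the same trick is what keeps whole-genome data manageable.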
11:53 Michael Kennedy: Interesting, so is there like a bunch of basically instructions that are just off? They just don't do anything?
12:01 Ian Maurer: They call it junk DNA. Now, that doesn't necessarily mean it is junk; it's just not understood at this time, or it doesn't code for protein but maybe does other things. There are things like methylation and other factors that affect the coding parts of it, and there are lots of theories of how that happens. Some of it's through evolution: pieces just kind of fall out of use and don't actually matter anymore in the human species. But there are other theories that maybe some of it isn't junk, as well. And then even within the genes themselves, there are exons and introns. The exons are the stretches within the gene that actually get kept, spliced together, and turned into the RNA, and then that goes and codes the protein. The introns are the parts in between each of the exons. So understanding how the whole map works, understanding how to sequence the data, get the data off the sequencer and then keep track of all that data is interesting. And one of the things that might be interesting to your listeners, just because of the whole Python 2, Python 3 thing, is that these reference genome builds get released over time. The one that's currently the main one people use in the clinical setting is called GRCh37, and that was released in February of 2009. Lots of tools and things were built off of that version of the reference genome. Well, over time they learned new things and added new regions. It's a very dynamic map. Then in 2013, almost five years ago now, they released GRCh38. The whole industry hasn't moved over to this new version of the reference genome yet, because you've got to update all your tools and all your databases, and it's a tricky thing to do.
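The exon/intron layout and the three-bases-per-amino-acid idea can be illustrated with a toy gene. This sketch assumes made-up exon coordinates and a deliberately tiny codon table that covers only the codons it uses; real work would rely on annotated transcripts and the full standard genetic code.

```python
from typing import List, Tuple

# Deliberately partial codon table: only the codons used in the toy example below.
CODON_TABLE = {
    "ATG": "Met", "GTG": "Val", "GAG": "Glu", "AAA": "Lys", "TAA": "Stop",
}


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Keep exon segments (0-based, end-exclusive) and drop the introns between them."""
    return "".join(gene[start:end] for start, end in exons)


def translate(coding: str) -> List[str]:
    """Read the spliced sequence three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding) - len(coding) % 3, 3):
        amino_acid = CODON_TABLE[coding[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


# Toy gene: exon 1 + intron + exon 2 (all sequences made up for illustration).
gene = "ATGGTG" + "CCCCC" + "GAGAAATAA"
mrna = splice(gene, [(0, 6), (11, 20)])
print(mrna)             # ATGGTGGAGAAATAA
print(translate(mrna))  # ['Met', 'Val', 'Glu', 'Lys']
```

The intron never reaches `translate`, which mirrors the point that only the exons end up coding the protein.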
13:44 Michael Kennedy: That's a major incompatibility. How interesting. So you talk about this reference genome, and about three billion base pairs make up a person. How much of that is consistent across every single person, and how much of it is different? 'Cause I feel like we look at people and we all look quite varied. But then you also hear things like, well, your DNA's 1.5% different than, say, a chimpanzee's or something like that. So give me a sense for, when you say, "I'm going to save the delta," what does that look like?
14:15 Ian Maurer: Usually, an average person has about 1% variation from the three billion. So about 30 million base pairs will be different across different people. Obviously, the numbers go up and down, and there are prevalence frequencies. A lot of the databases that are out there and available for people to consume as part of their process actually say, "We sampled a thousand people, and 20% of folks had a G in this spot, and another percentage had an A in that spot." That's a big part of understanding what some of these variants are. One of the things we show in our tools for cancer is that prevalence, that allele frequency, because doctors are interested in it. If the frequency is 50%, well, there's no way that's actually a cancer-causing variant, because people would be born with cancer, and it just doesn't really work that way. It wouldn't be a viable situation. So one of the data points they like to look at is how often this variant actually happens in the wild, in the human population. So, it's a very interesting stat.
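Here's a minimal sketch of that reasoning, assuming each variant already carries a population allele frequency looked up from a reference database. The 1% cutoff and the variant names are hypothetical placeholders, not clinical guidance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str             # hypothetical identifier for illustration
    population_af: float  # fraction of sampled alleles carrying this variant


def rare_variants(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    """Keep variants rarer than max_af; very common ones are unlikely to drive cancer."""
    return [v for v in variants if v.population_af < max_af]


candidates = [
    Variant("var_common", 0.50),
    Variant("var_rare", 0.0001),
    Variant("var_unseen", 0.0),
]
print([v.name for v in rare_variants(candidates)])  # ['var_rare', 'var_unseen']
```

A variant seen in half the population is filtered out, matching the argument that something that common can't be what causes cancer.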
15:18 Michael Kennedy: Yeah, for sure. Okay, interesting. So maybe let's talk about how you actually do the sequencing at a high level, then we'll get into the tools and the Python code that you actually write and how that's working there. Give us the general pipeline. How do you go from a swab on the cheek, or whatever it is, to "Here's your printout. Your A-C-G-T-A-C is you"?
15:46 Ian Maurer: There have been older technologies that work in smaller regions and can do things like that. There's a thing called Sanger Sequencing. But as I said earlier, one of the major changes came in 2011, with next-generation sequencing. That basically takes raw data right from a blood sample or a tumor sample. They put it in this machine called a sequencer, and then, through either the chemistry or the light signals in that machine, and once again, this isn't my area of expertise, they're able to analyze it and basically do what are called reads. So they're reading 65 base pairs across, or what have you, reading out As, Cs, Gs and Ts, and writing that to a file. That's written to either a FASTA file or a FASTQ file, which has quality scores associated with it. So all these raw reads are happening, and it's basically like little snippets of a book. It's like someone's cut a book up into little fragments, thrown them up in the air, and then you try to figure out how to reassemble it. So that tells us--
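A FASTQ record is four lines: an `@` header, the bases, a `+` separator, and one quality character per base. The sketch below parses records and decodes the qualities, assuming the common Phred+33 encoding used by modern Illumina output; the read name and bases are made up.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record."""
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


records = ["@read1", "ACGT", "+", "II#5"]
print(list(parse_fastq(records)))  # [('read1', 'ACGT', [40, 40, 2, 20])]
```

A score of Q means an estimated error probability of 10^(-Q/10), so Q40 is roughly a 1-in-10,000 chance the base call is wrong, while Q2 is close to useless.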
16:44 Michael Kennedy: Sounds awkward.
16:46 Ian Maurer: Yeah, so that process is not something we do at my company, but that process is what we call alignment, where we take the book and basically try to tape it back together. The way they do that is by comparing the reads against the reference genome itself. Through math and algorithms, and some machine learning now, they're able to align all the reads against the reference genome. Those get stored in a file called a SAM file, and really it's just a listing of all the aligned reads in a line-based text format. Then those files get compressed into what's called a BAM file. There are tools, open source tools and tools like ours, that allow you to visualize that alignment and really get a good understanding of: Do the reads line up? Do the variants look right? Is the quality there? And do you believe the actual calls that are being made? The next step after alignment is what's called variant calling. Some additional software, once again, stuff we don't actually do, goes through these alignment files and makes a decision: "Yup, I read through this BAM file or SAM file, and I believe that at this position on this chromosome it's an A and not a T." And, obviously, since you have two copies of each chromosome, you might have half of the reads being As and half of them being Ts, and things like that. Cancer's a little bit different, because you have a mixture of tumor cells co-mingled with normal cells. So you might actually get allele frequencies, what we call variant allele frequencies, or VAFs, that are not 0.5 or 1, but something in between.
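The VAF itself is simple arithmetic: reads supporting the alternate allele divided by all reads covering the position. The second function adds one common simplification, assuming a clonal, heterozygous mutation in a diploid region with no copy-number change, where the expected tumor VAF is about half the tumor purity. The read counts and purity value are made up.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("position has no coverage")
    return alt_reads / total_reads


def expected_somatic_vaf(tumor_purity: float) -> float:
    """Expected VAF of a clonal heterozygous mutation in a diploid region.

    Only tumor cells carry it, and only on one of their two copies, so the
    expected VAF is purity / 2 (assumes no copy-number changes).
    """
    if not 0.0 <= tumor_purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return tumor_purity / 2


print(variant_allele_frequency(50, 100))  # 0.5, consistent with a heterozygous germline call
print(variant_allele_frequency(18, 100))  # 0.18, in between, as in a mixed tumor sample
print(expected_somatic_vaf(0.4))          # 0.2
```

In a sample that is 40% tumor, a tumor-only mutation would be expected around 0.2 rather than 0.5, which is the "something in between" described above.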
18:23 Michael Kennedy: Because it could be that the actual mutation causing the cancer is only in some of the cells, so some of them have one value there and others have another, right?
18:30 Ian Maurer: Right. You have the original normal cells, and then you have these clones of tumor cells, the actual cancer-causing cells, that are now growing and spreading in that region.
18:39 Michael Kennedy: Right, so that gives you more or less, "Here's what we think the genetics is." And then you have to analyze it, right?
18:45 Ian Maurer: Right, and this is really where we come into play. Our company started in 2012 because this NGS data was starting to overwhelm pathologists and physicians with lots of genomic and molecular data, and the belief of our company is that all medicine is going to be molecular in the future. Really understanding what those variants mean, especially in the context of cancer, is where we're focusing our energies. That includes things like annotating the variants and helping people understand how often they happen in the population. Have there been papers out there saying this variant's pathogenic or benign? There are some prediction models that people have written to say, "This variant will cause the protein to degrade in some known way," the stuck gas pedal or the broken brake line analogy. And then, from there, we're able to do decision support. There are FDA-approved drugs available. There are clinical trials available. These things have very complicated eligibility criteria, and our software helps doctors make sense of all of this disparate data, bring it all together and say, "Oh, yeah, for this patient, given these mutations and maybe some other tests and some other data about the person themselves, we can say that this clinical trial's best for you, or this therapy would work for you. The FDA's approved it for you." And one of the interesting things happening, which supports the whole idea that cancer is a disease of genetics and not a disease of a particular organ, is that these drugs are getting approved for lung cancer with a specific variant, and that drug might work for a melanoma patient with the same variant, or vice versa. I might be getting the example wrong, but you get the point. It's the specific mutation that matters. The fact that you have V600E on BRAF is the most important part, not the fact that it was on your skin.
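The point that the mutation matters more than the tissue can be sketched as a lookup keyed on gene and protein change rather than on tumor site. Everything here, the knowledge-base structure and the entry names, is hypothetical; a real decision-support system also weighs eligibility criteria, evidence levels, and approvals, which this sketch ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: (gene, protein change) -> matching options.
KNOWLEDGE_BASE: Dict[Tuple[str, str], List[str]] = {
    ("BRAF", "V600E"): ["example therapy A", "example trial 123"],
}


def match_options(gene: str, protein_change: str) -> List[str]:
    """Look up options by the variant itself; tumor site is not part of the key."""
    return KNOWLEDGE_BASE.get((gene.upper(), protein_change.upper()), [])


# The same variant found in two different tumor types yields the same matches.
patients = [
    {"tumor_site": "skin", "gene": "BRAF", "variant": "V600E"},
    {"tumor_site": "lung", "gene": "braf", "variant": "v600e"},
]
for p in patients:
    print(p["tumor_site"], match_options(p["gene"], p["variant"]))
```

Because the tumor site never enters the key, the skin and lung patients get identical results, which is the tumor-agnostic idea in its simplest form.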
20:36 Michael Kennedy: That's pretty interesting. Understanding at this level is really powerful. So, let's talk about the software stack. Maybe at a high level first, and then we can dig into some of the tools. What kind of software are you guys writing to solve these problems and where's Python fit in?
20:50 Ian Maurer: We started off with a research application that we used to get the company started. Then we built our first clinical app for pathologists, and that was all built using Java and GWT. Google Web Toolkit is a Java-based tool that compiles to JavaScript. We don't really have any JavaScript wizards in house, and we've always been Java-based. While that was getting built, we actually partnered up with a team at Vanderbilt University called My Cancer Genome, and they have a website for people who are looking for information about genetics and cancer. While the rest of my team was building our first couple of products, I built a curation tool for them, and I built it with the Django admin. Django has this great admin tool, so I was able to whip together a nice content management tool for them so they could get rid of the SharePoint solution they were running at the time.
21:41 Michael Kennedy: Anything that gets rid of SharePoint, that's a good thing.
21:43 Ian Maurer: That was the thinking there, so--
21:44 Michael Kennedy: You can hold your head high that day. "We turned down SharePoint."
21:48 Ian Maurer: Right, so, yeah, and having the user interface for that. We've since evolved that tool, and now it's managing not just basic content management stuff for My Cancer Genome, but basically all of our knowledge, in what we call our Knowledge Management System. Then we built a Django REST Framework API on top of that, using Tom Christie's tool to build out a REST API. Now you can hit the API and get back specific information from a feature called Match in our software. Given a patient's information and demographics, and whatever biomarkers they might have, you hit our API and we'll give you back, "Hey, this is a good clinical trial within 50 miles of the patient. Here's a good trial that you could maybe put them on. Here's a therapy that's approved by the FDA."
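The "within 50 miles of the patient" filter is a good example of the kind of calculation such an API might run. Here's a minimal sketch using the haversine great-circle distance; the trial records, coordinates, and function names are hypothetical, and this isn't a description of how Match is actually implemented.

```python
import math
from typing import List, NamedTuple

EARTH_RADIUS_MILES = 3958.8


class TrialSite(NamedTuple):
    trial_id: str  # hypothetical identifier
    lat: float
    lon: float


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_within(patient_lat: float, patient_lon: float,
                 sites: List[TrialSite], max_miles: float = 50.0) -> List[str]:
    """Return trial IDs whose site lies within max_miles of the patient."""
    return [s.trial_id for s in sites
            if haversine_miles(patient_lat, patient_lon, s.lat, s.lon) <= max_miles]


sites = [
    TrialSite("trial-near", 41.60, -81.60),  # roughly 8 miles from the patient below
    TrialSite("trial-far", 40.00, -83.00),   # over 100 miles away
]
print(sites_within(41.50, -81.69, sites))  # ['trial-near']
```

In a Django REST Framework service, a function like this would typically sit behind a view that first narrows trials by biomarker and then applies the distance cutoff.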
22:37 Michael Kennedy: That sounds really powerful, and there are some cool tools involved in there. You talked a little bit about user interfaces. Is that all Java, or are you doing some UI in Python?
22:47 Ian Maurer: I've heard your recent stuff about UIs. Yeah, all our UIs are in Google Web Toolkit right now. We are doing the new version of the My Cancer Genome website using React, so that's one piece of JavaScript that we've started to use. But, for the most part, we're building out strong APIs with Python, and our UIs are still in Java and GWT.
23:09 Michael Kennedy: Sounds good. I've heard a lot of good things about React, but I haven't done anything with it, so I can't speak too much to it. But, yeah, cool. This portion of Talk Python to Me is brought to you by Codacy. If you want to improve code quality, prevent bugs and security issues from making it into production and at the same time speed up your code review process by 20%, then you need to try Codacy. That's C-O-D-A-C-Y. Codacy makes it easy to track code quality and identify and fix issues by automatically analyzing your commits and pull requests with all the most widely used static analysis tools. Codacy helps great teams build great software. Join companies like Delivery Hero, PayPal, Samsung, and more. Try your first code review by visiting talkpython.fm/codacy and linking your GitHub or Bitbucket account. You can also just click on the Codacy link in the show notes. All right, so let's talk about some of the tools that you're using. You talked about Django REST Framework. That's Tom Christie's framework. I had him on the show a while ago as well. It basically layers a REST API on top of Django. So maybe tell people how you're using that, like, what it's doing for you.
24:22 Ian Maurer: One of the key things that we do is annotation, and one of the annotations people want is, "Okay, where is this variant?" Where is it in the context of the whole genome? That's called the g dot. Or where is it in the context of the coding region of a gene? That's called the c dot. Or where does it end up once it goes from a c dot to a p dot, which is the protein, so the actual amino acids? So g dot, c dot, p dot. That nomenclature is called HGVS. Our API houses all of our knowledge, but it also calculates annotations for people. Some great libraries to use, which aren't our own, are called biocommons and HGVS, and those two libraries are open source on GitHub and do a really good job of doing those calculations. So if you're trying to understand how to get into genetics, I'd look at those libraries. There's also a library called Biopython. We don't use that, but it's also really good. And then from a bioinformatics perspective, we use that whole stack. On top of our API, we've built out some user interfaces that use, actually, Jupyter and Bokeh and Pandas and NumPy. So, I actually take that back. Our genome analytics platform, the major part of it, the container part of it, is written in GWT, but it's actually calling out and bringing in Bokeh plots as well. Bokeh's being used on the backend, using Pandas to calculate these grid plots, and then we're rendering them in our front end.
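The c.-to-p. step rests on simple codon arithmetic: coding position n falls in codon (n - 1) // 3 + 1. This toy sketch uses the well-known BRAF example, c.1799T>A, which gives p.Val600Glu (usually written V600E), and only handles single-base substitutions inside the coding sequence. It is not the biocommons hgvs API, which also deals with transcripts, intronic positions, and indels; the function names here are hypothetical.

```python
from typing import Tuple


def codon_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding (c.) position to (codon number, base within codon 1-3)."""
    if c_pos < 1:
        raise ValueError("only positions within the coding sequence are handled")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def c_substitution(c_pos: int, ref: str, alt: str) -> str:
    """Format a coding-DNA substitution in HGVS style, e.g. c.1799T>A."""
    return f"c.{c_pos}{ref}>{alt}"


def p_substitution(codon: int, ref_aa: str, alt_aa: str) -> str:
    """Format a protein substitution with three-letter codes, e.g. p.Val600Glu."""
    return f"p.{ref_aa}{codon}{alt_aa}"


codon, base = codon_position(1799)
print(c_substitution(1799, "T", "A"))       # c.1799T>A
print(codon, base)                          # 600 2
print(p_substitution(codon, "Val", "Glu"))  # p.Val600Glu
```

Position 1799 is the middle base of codon 600; changing that T to an A turns GTG (valine) into GAG (glutamic acid), which is how one c. change becomes the p. change.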
25:51 Michael Kennedy: Yeah, that's really cool. I've never had a chance to do anything with Bokeh, but that's where you basically do the sciency visualization stuff on the server in Python and it just transfers over to the web front end. Is that right?
26:04 Ian Maurer: Yeah, so it's generating the JavaScript for you, 'cause, once again, we don't have the JavaScript chops in house. You're basically running pure Python, using Pandas dataframes. Then you configure your Bokeh plot using the Python library, and it renders it and streams out HTML and JavaScript, and you can just kind of embed it in an iframe, or what have you, in your UI. It works great.
26:28 Michael Kennedy: It sounds really great. Like you don't have to be in the charting business.
26:32 Ian Maurer: Exactly right.
26:33 Michael Kennedy: Those are live, right? They're not just like PNGs or something.
26:36 Ian Maurer: You can definitely work with them dynamically right there. You can also use them to generate PNGs if that's what you need, and some of our clients do need that, to include in their research papers, if that's what they're using our tools for. But yeah, it's got lots of different use cases. Python keeps coming up with great libraries for visualization, and there are lots of different options, but Bokeh's worked out well for us.
26:59 Michael Kennedy: Yeah, it's kind of becoming a paradox of choice. As soon as you learn something you're happy with, you're like, "That looks better. Maybe I should do that." And, of course, it's a constant treadmill sort of thing. So one of the tools you're using that didn't surprise me, and that I think is interesting and want to hear more about, is spaCy. I don't even think I've mentioned spaCy on the podcast before. Tell us about spaCy.
27:22 Ian Maurer: Yeah, so we've really just done a proof of concept at this point, using Natural Language Processing. One of the major challenges in our space, and IBM and a few other big companies are spending lots of money to try to tackle this problem, is that in a lot of these EHRs and EMRs, people are recording their notes about patients in free text. The challenge with that, obviously, is that it's unstructured and it's hard to do anything with. We're not really in the business of major machine learning. We're more in the workflow and tools business. We help people solve problems in a more pragmatic way. We're a small company. We can't spend billions of dollars. But what we're doing is taking spaCy and using it to parse some of these free-text files and basically make recommendations to people. That means doing things like what's called named entity recognition. Entity recognition means I'm reading this Wikipedia article and finding all the proper nouns in it: Barack Obama did this in Detroit, Michigan, or whatever. Those would all be named entities, and spaCy is a great tool for extracting entities like that. We've trained spaCy to find named entities based on our ontologies, the data within our KMS.
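spaCy's own pattern matching (for example its `PhraseMatcher` or `EntityRuler` components) is what's described here. To keep this sketch dependency-free, it swaps in a plain-Python dictionary lookup over tokens that shows the same idea: tag spans of text that match terms from an ontology, preferring the longest match. The ontology entries and labels are hypothetical, and this is not spaCy's API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical ontology: lowercase term (possibly multi-word) -> entity label.
ONTOLOGY: Dict[str, str] = {
    "braf": "GENE",
    "estrogen receptor": "BIOMARKER",
    "melanoma": "DISEASE",
}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (matched text, label) pairs, preferring the longest term at each position."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(term.split()) for term in ONTOLOGY)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(lowered[i:i + n])
            if term in ONTOLOGY:
                entities.append((" ".join(tokens[i:i + n]), ONTOLOGY[term]))
                i += n
                break
        else:
            i += 1  # no term starts here; move to the next token
    return entities


note = "Melanoma with BRAF mutation; estrogen receptor not assessed."
print(find_entities(note))
# [('Melanoma', 'DISEASE'), ('BRAF', 'GENE'), ('estrogen receptor', 'BIOMARKER')]
```

A trained statistical model goes further by recognizing terms it was never given explicitly, but the dictionary version shows how an existing knowledge base can seed entity recognition.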
28:35 Michael Kennedy: Right. These are our important words. Go see if they say this. Something like that?
28:39 Ian Maurer: Exactly right. There's a pattern-matching framework within spaCy that's really easy to use, and the other thing we use it for is classification. We've trained the models so that when they read a sentence that says, "Estrogen receptor, strongly expressed," that actually means something. We want that to mean ER positive in our use case, in our vernacular, and that means something to our RN customers. What we then do is present it to them and say, "Hey, we saw this sentence, and we think it says this. Do you agree, yes or no?" If they say yes, we keep that piece of information and use it to further train our model to make it better over time. We don't really think we can get rid of the human in the loop at this point, because we're just at the start of this thing and we want to make sure we get the right answer 100% of the time. But what we want to do is make it so they don't have to spend half an hour reading through a document when we can just scan it for them and say, "Here are the interesting parts. Please go ahead and just confirm it."
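The suggest-then-confirm loop can be sketched without any machine learning at all. This version swaps in a simple regular-expression rule where the described system uses a trained spaCy model, and the label name (`ER positive`) and data structures are hypothetical. What it shows is the workflow: suggest a label, ask a reviewer, and keep only confirmed pairs as future training data.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule standing in for a trained classifier.
ER_POSITIVE = re.compile(
    r"estrogen receptor[^.]*\b(strongly|positive|expressed)\b", re.IGNORECASE
)


def suggest_label(sentence: str) -> Optional[str]:
    """Suggest a structured label for a free-text sentence, or None."""
    return "ER positive" if ER_POSITIVE.search(sentence) else None


training_data: List[Tuple[str, str]] = []


def review(sentence: str, reviewer_agrees: bool) -> Optional[str]:
    """Record a reviewer's yes/no; only confirmed suggestions become training data."""
    label = suggest_label(sentence)
    if label is not None and reviewer_agrees:
        training_data.append((sentence, label))
        return label
    return None


review("Estrogen receptor, strongly expressed.", reviewer_agrees=True)
review("Estrogen receptor not expressed.", reviewer_agrees=False)  # naive rule misfires
print(training_data)  # [('Estrogen receptor, strongly expressed.', 'ER positive')]
```

The second sentence shows why the reviewer stays in the loop: the crude rule suggests the wrong label for a negated statement, and only the human "no" keeps it out of the training data.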
29:40 Michael Kennedy: That's pretty wild. I feel like this whole machine learning, AI business is deeply reaching into medicine and things like that. This is just another super interesting example I hadn't even thought of. In terms of oncology, there's also analyzing, say, scans, like pictures, and having the machine say, "No, that looks like cancer to me," kind of doing what a radiologist might do, right?
30:05 Ian Maurer: Exactly.
30:06 Michael Kennedy: Yeah, it's pretty amazing.
30:06 Ian Maurer: We like spaCy a lot. I originally tried playing with NLTK a few years ago and ran into some barriers. That's an older project. spaCy is really modern in that it follows a lot of the best practices with Python. The documentation's really good, it performs really well out of the box, and I was able to pull together a really good demonstration in just a few weeks. So, I highly recommend it.
30:28 Michael Kennedy: That's really cool. Definitely, when you go visit spaCy.io, it really looks appealing and polished. I was wondering what made you choose spaCy over NLTK, but it's actually pretty obvious straight away, isn't it?
30:45 Ian Maurer: They're doing a really good job as a small open source company. I think there are maybe two people working there, from what I can tell. They've basically open sourced their core product, and they're selling ancillary products on top of it, plus consulting services. It seems like a great project.
31:02 Michael Kennedy: Yeah, that's really cool, and I'm definitely going to look at it more, 'cause I'm always fascinated by how people are building really interesting business models on top of some kind of successful open source thing. So, yeah, another cool example. So, you're building some interesting CLI tools and you guys are using Click, which is pretty common. That's from Armin Ronacher, who made Flask. You're also using PEX. I think there's a little less awareness of that one. Tell us about PEX. It's really interesting.
31:32 Ian Maurer: Click's great. There are obviously lots of great ways of building command line tools in Python, and I've been doing that for a long time, but Click's really easy to use. But what we find is, how do we get this to our clients? We do a lot of things with Docker, and when we're setting up servers, using Docker is great. But we also now have command line tools that we're trying to distribute to people, and pushing things up to PyPI, having them pull things down using pip, and having them set up virtual environments sometimes gets a little bit difficult for some of our end users, who might not be day-to-day Python developers. With PEX, you're able to build the whole module together with the virtual environment baked in. When you deliver it to them, it just runs. And you can build for different platforms. On one of my projects we have a little Docker container that builds it for Linux and then builds it for macOS, and we were able to share it out to people so they could use the tools without having to go through the whole virtual environment setup.
32:33 Michael Kennedy: That's really cool. So I think PEX is the one that takes everything, zips it up, and then it turns out Python can execute zip files and run from there, right? Which is pretty wild. Do you know if that entirely eliminates the dependency on Python? Like, if I had a blank machine? Or is it just the packages, and it still relies on a base Python being there?
32:55 Ian Maurer: Someone asked me that just the other day. I actually think it's just the libraries, because it doesn't seem to be that big of a file. It's not like when you download Eclipse and you get the whole jar with it.
33:03 Michael Kennedy: Yeah, yeah, yeah.
33:04 Ian Maurer: You get the whole Java JDK with it. I don't think so.
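PEX builds on the fact Michael mentions: Python can run a zip archive that has a `__main__.py` at its root. The standard library's `zipapp` module exposes that same mechanism, so this sketch uses it instead of PEX (unlike PEX, `zipapp` does not resolve or bundle third-party dependencies for you). The tiny argparse-based `annotate` command is a hypothetical stand-in for a Click tool. The resulting `.pyz` is still launched by an existing interpreter, which matches Ian's guess that the bundle carries the libraries rather than Python itself.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Source for a tiny, hypothetical command-line tool.
MAIN_PY = '''\
import argparse

def main():
    parser = argparse.ArgumentParser(prog="annotate")
    parser.add_argument("variant", help="variant to look up")
    args = parser.parse_args()
    print(f"annotating {args.variant}")

if __name__ == "__main__":
    main()
'''

def build_and_run(variant):
    """Package the tool as a single .pyz file, then run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "app"
        src.mkdir()
        # zipapp needs a __main__.py at the archive root (or a `main=` argument).
        (src / "__main__.py").write_text(MAIN_PY)
        target = Path(tmp) / "annotate.pyz"
        zipapp.create_archive(str(src), str(target))
        # One file to ship, but it still needs an installed Python to execute.
        result = subprocess.run(
            [sys.executable, str(target), variant],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Tools like PEX and shiv add dependency resolution on top of this, installing third-party packages into the archive; PyInstaller goes further and bundles an interpreter, which is the bigger problem Michael mentions next.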
33:08 Michael Kennedy: It's pretty cool. I've been playing with PyInstaller and it's pretty nice as well. It bundles things so there's no dependency on an installed Python. It's also more problematic, because it's trying to solve a bigger problem. I was just thinking, oh, maybe PEX is going to be nice. Another thing that I think is really cool around this stuff, just as a shout-out, that I've been playing with a lot lately is this thing called Gooey, G-O-O-E-Y. Have you heard of this?
33:32 Ian Maurer: I did see your little prototype up on GitHub I think.
33:34 Michael Kennedy: Yeah, so you take something like this and just throw a little UI on top of it, with drop-downs instead of command line arguments. It's pretty cool.
33:42 Ian Maurer: Right.
33:42 Michael Kennedy: So, another thing that you're using is aiohttp. Tell us, are you using the server or the client component of that?
33:49 Ian Maurer: Client, for high-throughput annotation. One of our clients basically paid millions of dollars for this high-throughput system to go through the whole alignment and variant calling process. They're trying to do thousands of cases per week, or whatever they're doing, and they need annotations from our KMS, our Knowledge Management System, to keep up with that. So the challenge was, okay, how do I keep up with them? The first version of my software had trouble. We were trying to parallelize things with multiprocessing, and it worked, but once I actually played with aiohttp and asyncio, really understood how to program in that paradigm, and really looked for the I/O bottlenecks and worked around them, redesigning that tool, which we call our annotator, became much easier. So now I basically have five stages in my little program, with queues in between them. Basically, what an annotator does is really just read a file, make a call to an API--
34:56 Michael Kennedy: A remote API over a service, right?
34:57 Ian Maurer: Exactly right. And then inject that data into the stream, and then write it out to disk. So you've got basically, let's just say, three spots where you can leverage asyncio: reading from the original file, making the call to the HTTP server, and writing out to disk. And this whole framework allows me to do all three of those things. It kind of just magically balances itself with regard to how much it's reading from the disk, how much it's writing to the disk, and how much it's calling the API. The only thing you have to do is make sure you don't call your API too much, unless you want to take down your server. Our server, on the other end, is highly parallelized using Celery and Redis, and it can scale up because we've thrown lots of hardware at it. So we're able to keep up with that multi-million dollar hardware solution with Python 3 and asyncio, and it's been great.
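Here's a minimal sketch of the read, annotate, write pipeline with `asyncio.Queue` between stages. It's an illustration under stated assumptions, not Ian's annotator: it has three stages rather than five, the hypothetical `fake_annotate` sleeps instead of making an aiohttp request to a real API, and in-memory lists stand in for the input and output files. A semaphore caps the number of in-flight API calls, and bounded queues provide the self-balancing backpressure he describes.

```python
import asyncio

SENTINEL = None  # marks end-of-stream between stages

async def reader(records, out_q):
    """Stage 1: stand-in for reading records from the input file."""
    for record in records:
        await out_q.put(record)  # blocks when the queue is full (backpressure)
    await out_q.put(SENTINEL)

async def fake_annotate(record):
    """Hypothetical stand-in for an aiohttp call to the annotation API."""
    await asyncio.sleep(0.01)
    return record + "\tannotated"

async def annotator(in_q, out_q, max_in_flight):
    """Stage 2: start API calls as records arrive, capped by a semaphore."""
    sem = asyncio.Semaphore(max_in_flight)

    async def call_api(record):
        async with sem:  # don't hammer the server
            return await fake_annotate(record)

    while True:
        record = await in_q.get()
        if record is SENTINEL:
            break
        # Pass the running task downstream so output keeps input order.
        await out_q.put(asyncio.create_task(call_api(record)))
    await out_q.put(SENTINEL)

async def writer(in_q, sink):
    """Stage 3: stand-in for writing annotated records to disk."""
    while True:
        task = await in_q.get()
        if task is SENTINEL:
            break
        sink.append(await task)

async def run_pipeline(records, max_in_flight=5, queue_size=10):
    q1 = asyncio.Queue(maxsize=queue_size)
    q2 = asyncio.Queue(maxsize=queue_size)
    sink = []
    await asyncio.gather(
        reader(records, q1),
        annotator(q1, q2, max_in_flight),
        writer(q2, sink),
    )
    return sink
```

Calling `asyncio.run(run_pipeline(records))` returns the annotated records in input order. Passing tasks rather than finished results through the second queue is one way to preserve ordering while still overlapping the API calls; a production version would also need error handling and retries.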
35:48 Michael Kennedy: And probably like what? One thread?
35:50 Ian Maurer: Yeah, basically one process running, and it's doing the job. We can scale that one program out across multiple processes if we want, but it's really pretty high-performance, and our client's pretty happy with it.
36:02 Michael Kennedy: That's really awesome. Yeah, 'cause so much of the time, programs that are slow are actually just waiting on some other part of the system. They're waiting on the web service. They're waiting on disk. They're waiting on whatever, right? And so this lets them be productively waiting, basically.
36:15 Ian Maurer: It's definitely a paradigm shift. You have to think through how this async method is calling that other async method and really understand how it all fits together, and it can definitely bend your brain a little bit if you're not used to it. But once you actually figure it out, it's kind of a superpower, and it's really great.
36:31 Michael Kennedy: Yeah, and as far as superpowers go, the actual change in the programming model is pretty mellow. It's not that different from making serial requests.
36:41 Ian Maurer: Yeah, you've just got those couple of keywords, async and await, and once you figure those out, it's kind of easy from there. Then it's really about using queues. And then you get into the whole queuing theory and lean manufacturing kind of stuff, trying to understand how to remove the bottlenecks from your system so that things go as fast as they possibly can. If you have that background and mentality, it's really cool.
37:04 Michael Kennedy: Yeah, it's cool. But of course any time you're thinking about concurrency, it can definitely sort of bend your mind, like you said.
37:09 Ian Maurer: Yeah, exactly.
37:10 Michael Kennedy: So, speaking about concurrency. Another thing that you guys are using that's really cool is Channels and Celery and Redis. Channels, is that like Django Channels?
37:18 Ian Maurer: Yeah, Django Channels. So one of our tools actually has a sync mode. In the oncology space, one of the big things that happens with challenging cases is they go to what's called a Tumor Board. Some of the bigger hospitals have a Tumor Board where basically all the experts at that hospital get together, and they could even WebEx in experts from other hospitals, to help with rare cases. There's a case, there's a variant, they don't know what it means. What do they do about it? That's what they call a Tumor Board. We built software for that, and one of our modes is sync mode, so they don't actually have to have a WebEx. They can just go to our app, and everybody's in the app at the same time. There's a leader, and as that person moves around from one page of the app to another, everyone else follows. That's sync mode, and it's done using web sockets. And if you know anything about Django and its history, Django was originally built on WSGI, and that's a synchronous protocol.
38:16 Michael Kennedy: Yeah, all the popular ones are. They still haven't found a way, really, around it.
38:19 Ian Maurer: Godwin, Andrew Godwin?
38:21 Michael Kennedy: Yeah, Andrew Godwin, yeah.
38:22 Ian Maurer: He added this capability to Django, which is basically this little side project to Django called Channels. He created another interface between your web server, like Apache or nginx, and Django, called ASGI, I think is what he called it. It's asynchronous, and that enabled us to do web sockets. And web sockets are what allow us to do the synchronized movement between different people on our application. So if one person clicks a link and jumps to another page, all the other people on the app jump along with them. Really, the main goal of this is to let people dynamically work with the genomic information at their fingertips, rather than having a bunch of people on their phones Googling what these variants mean. So they're all working together on a single call.
39:10 Michael Kennedy: So you kind of added a Google Docs-equivalent type of experience to your app, right? Everybody fires up your app and they have this sort of guided experience together.
39:23 Ian Maurer: Yeah, that's a really good analogy.
39:25 Michael Kennedy: Yeah, I think more apps need that. I think that's really awesome. How hard was it to do the Channels code and to add this stuff together?
39:33 Ian Maurer: Well, the Channels part was easy. It basically just worked out of the box; we were able to send messages from one thing to the other. But getting actual communication going from one instance to the other is tricky. It's managing state, and how do you change from one user to another and make sure the experience is smooth. That's always tough. And as you add new features, you have to make sure that the sync works across those new features, right?
39:57 Michael Kennedy: Right, right. We've got this new visualization, but it only shows up for the leader, not for you.
40:00 Ian Maurer: Those are always fun. But the actual Channels plumbing, even though it's kind of cutting-edge code, in beta or what have you, works really well. When you actually set this up, you end up with your web server, nginx; what's called an interface server, which is basically an instance of your Django app; the Redis channel in between; and then workers. The workers are basically other instances of your Django app, doing the actual work of responding to either a plain old HTTP request or one of these web socket requests. And all that plumbing just worked great.
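The core of sync mode can be sketched without Django at all: the leader's navigation events are broadcast to every member of a group, and each follower applies them. Below is a minimal in-process asyncio analogue of a channel layer's group broadcast; in a real Channels deployment, the Redis channel layer plays this role across processes. `SessionGroup`, the message shape, and the page names are hypothetical, not the Channels API.

```python
import asyncio

class SessionGroup:
    """In-memory stand-in for a channel-layer group: fan messages out to members."""

    def __init__(self):
        self.members = {}  # member name -> asyncio.Queue of incoming messages

    def join(self, name):
        inbox = asyncio.Queue()
        self.members[name] = inbox
        return inbox

    async def broadcast(self, message, sender=None):
        # Everyone except the sender gets the message, like followers of a leader.
        for name, inbox in self.members.items():
            if name != sender:
                await inbox.put(message)

async def follower(name, inbox, history, expected):
    """Apply the leader's 'navigate' messages by recording each page shown."""
    for _ in range(expected):
        message = await inbox.get()
        if message["type"] == "navigate":
            history[name].append(message["page"])

async def sync_mode_demo(pages, followers=("dr_a", "dr_b")):
    group = SessionGroup()
    group.join("leader")
    inboxes = {name: group.join(name) for name in followers}
    history = {name: [] for name in followers}
    tasks = [
        asyncio.create_task(follower(name, inboxes[name], history, len(pages)))
        for name in followers
    ]
    for page in pages:  # the leader clicks through the case
        await group.broadcast({"type": "navigate", "page": page}, sender="leader")
    await asyncio.gather(*tasks)
    return history
```

Every follower ends up on the same sequence of pages as the leader. The hard parts Ian mentions next, such as state hand-off and keeping new features in sync, live above this plumbing.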
40:38 Michael Kennedy: How cool. Yeah, it sounds really fun. I haven't had a chance to use it, but it definitely looks really cool.
40:44 Ian Maurer: Yup.
40:45 Michael Kennedy: All right. Well that sounds like quite the list of cool projects and technologies you're getting to put together there. Must be fun to work on.
40:51 Ian Maurer: It's great. Having a purpose, working on something that's not online marketing or e-commerce or whatever I was doing in a past life, is great. It's great working on something that I think is going to make a difference.
41:03 Michael Kennedy: Yeah, definitely. Trying to make people healthier and live fuller lives is way better than trying to optimize that click rate, or pulling in one more piece of data to piece together that, no, this person is actually that other person and they're in this demographic, right?
41:18 Ian Maurer: Right, yeah, exactly.
41:19 Michael Kennedy: Just some other thing that nobody needs.
41:22 Ian Maurer: Online stalking is not something I'm interested in, no.
41:24 Michael Kennedy: No, for sure. Cool, so you actually have a couple of somewhat related open source libraries. You want to talk about those a bit?
41:31 Ian Maurer: One of the libraries that's out there is called attrs, and I think it's actually the basis of the new data classes in Python 3.7. There was this original project called attrs, which is a really great project, and it lets you define your classes and get a bunch of boilerplate Python stuff for free, for comparisons and string representations and things like that.
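To make the boilerplate point concrete, here's the standard-library `dataclasses` version (attrs offers similar, and in places richer, options). With `frozen=True`, the generated class gets `__init__`, `__repr__`, `__eq__`, and a `__hash__` consistent with equality, and its instances are immutable. The `Variant` class and its fields are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    gene: str
    protein_change: str

v1 = Variant("BRAF", "V600E")
v2 = Variant("BRAF", "V600E")

print(v1)             # Variant(gene='BRAF', protein_change='V600E')
print(v1 == v2)       # True: __eq__ compares field by field
print(len({v1, v2}))  # 1: equal objects hash the same
# v1.gene = "KRAS"    # would raise dataclasses.FrozenInstanceError
```

Writing those four methods by hand, and keeping `__eq__` and `__hash__` in agreement, is exactly the kind of detail Michael notes is easy to overlook.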
41:55 Michael Kennedy: Right. Implements like, say, hashing correctly and all that kind of weirdness that you can overlook, yeah.
42:00 Ian Maurer: The problem I was trying to solve at the time was that I wanted an immutable way of reading a YAML file, getting a nested Python object, and not having to munge dictionaries, 'cause when you start writing code against dictionaries, things quickly get kind of nasty with nested dictionary references and things like that. So what I was looking for was a way to roundtrip to YAML. In Java there's a library called Jackson that'll do that; it'll roundtrip to JSON or what have you. Python obviously does a good job of roundtripping from dictionaries to YAML, so what I wanted was an actual object model. attrs is really good, but it had a different mental model, and I wanted something more like the Django ORM. I have a lot of use cases where I basically want to say, "I want to call this a string field and I want it to always have this validator and this converter." What attrs lets you do is, when you define your fields, you can say it's got this converter and this validator, and I just wanted some templatized versions so I didn't have to keep saying it over and over again. I also wanted this magical transformation, and that's what the Related project does.
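Here's a rough sketch of the "templatized field" idea using only the standard library: a hypothetical `string_field()` helper stores a converter and a validator in `dataclasses.field` metadata, and a small mixin applies them after `__init__`. This is not the Related or attrs API (attrs has its own `converter=` and `validator=` arguments); it only shows the pattern of declaring a rule once and reusing it.

```python
from dataclasses import dataclass, field, fields

def string_field(*, upper=False, required=True):
    """Hypothetical reusable field template: strip (and optionally upper-case) text."""
    def convert(value):
        value = str(value).strip()
        return value.upper() if upper else value

    def validate(name, value):
        if required and not value:
            raise ValueError(f"{name} must not be empty")

    return field(metadata={"convert": convert, "validate": validate})

class Validated:
    """Mixin that runs each field's converter and validator after __init__."""

    def __post_init__(self):
        for f in fields(self):
            convert = f.metadata.get("convert")
            validate = f.metadata.get("validate")
            value = getattr(self, f.name)
            if convert:
                value = convert(value)
                # object.__setattr__ bypasses the frozen-instance guard.
                object.__setattr__(self, f.name, value)
            if validate:
                validate(f.name, value)

@dataclass(frozen=True)
class Gene(Validated):
    symbol: str = string_field(upper=True)
    name: str = string_field(required=False)
```

With this, `Gene(" braf ", "B-Raf")` normalizes the symbol to `"BRAF"`, while `Gene("   ", "x")` raises `ValueError` because the symbol is required. Each new string field reuses the same rules by calling `string_field()` instead of repeating converter and validator arguments.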
43:06 Michael Kennedy: Related. It looks really cool, and it does look like you're working in the style of the Django ORM or MongoEngine or one of these types of things, where you define what the object actually is. Can you have nested objects?
43:22 Ian Maurer: Basically, if you declare a class A, it can definitely relate to a class B as a child object, or it could have a list of Bs or a map of Bs. It fully knows how to render that object model to and from a dictionary, and it does the whole serialization and deserialization for you.
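For the nested case Ian describes (a child object, a list of children, a map of children), a standard-library sketch of dict roundtripping could look like this. `dataclasses.asdict` handles serialization; the hypothetical `from_dict` helper walks the type hints for the way back (it needs Python 3.8+ for `typing.get_origin`/`get_args`). JSON stands in for YAML here only because PyYAML isn't in the standard library; the plain dict is what you'd hand to a YAML dumper. This is illustrative, not how Related is implemented, and the class names are made up.

```python
import json
import typing
from dataclasses import dataclass, asdict, fields, is_dataclass
from typing import Dict, List

@dataclass
class Drug:
    name: str

@dataclass
class Trial:
    trial_id: str
    lead_drug: Drug          # a child object
    arms: List[Drug]         # a list of children
    sites: Dict[str, Drug]   # a map of children

def from_dict(cls, data):
    """Rebuild a (possibly nested) dataclass from plain dicts and lists."""
    hints = typing.get_type_hints(cls)
    return cls(**{f.name: _convert(hints[f.name], data[f.name]) for f in fields(cls)})

def _convert(tp, value):
    """Convert one value according to its declared type."""
    if is_dataclass(tp):
        return from_dict(tp, value)
    origin, args = typing.get_origin(tp), typing.get_args(tp)
    if origin is list:
        return [_convert(args[0], v) for v in value]
    if origin is dict:
        return {k: _convert(args[1], v) for k, v in value.items()}
    return value  # plain values such as str pass through unchanged

trial = Trial(
    trial_id="TRIAL-001",
    lead_drug=Drug("drug-a"),
    arms=[Drug("drug-a"), Drug("drug-b")],
    sites={"cle": Drug("drug-b")},
)
text = json.dumps(asdict(trial))         # object -> dict -> text
restored = from_dict(Trial, json.loads(text))  # text -> dict -> object
```

After the roundtrip, `restored == trial`, and nested values come back as `Drug` instances rather than raw dictionaries, which is the point of having an object model instead of munging dicts.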
43:44 Michael Kennedy: That's sweet. So, yeah, people should definitely check this out if they're working with Python and YAML; it looks like a cool project. So, the other one's called Rigor?
43:53 Ian Maurer: Obviously, it's very important to us to have the right answers for people.
43:58 Michael Kennedy: Yeah, answers have consequences.
44:00 Ian Maurer: The most important thing about my job is making sure we get people the best data, the most relevant data, the most up-to-date data, and one of the key things we've got to do for that is testing. We spend a lot of time testing by hand. We do a lot of unit testing; we believe in the testing pyramid at my company. But one of the things I like to make sure we have is an end-to-end test, or an integration test, or a functional test, however you want to describe it. In our Java space we use a tool called Cucumber. What Cucumber lets you do is basically declare your tests in a given/when/then, English-style DSL. That allowed our product team, our product specialists, who are basically non-developers but who understand the science and know how to use and test the software, to describe how a function should work: given some state, when I do some function, then I should get some result. What I wanted was something like that on our API side, but I didn't want to go through the whole pain of glue code, where people actually have to write code that runs behind the DSL. And since HTTP is kind of its own language in itself, I decided to shortcut it and just build out a simple YAML-based approach. That's kind of where this Related project came from. You write out a YAML file that describes your steps, and the steps describe what you make requests to and the response you get back. It allows us to build out a suite of hundreds or thousands of tests, testing the software to make sure it gives the same answer every time, so that people know, when they make changes, they're not breaking anything. It does that using asyncio, because I wanted it to run fast, and then we use this thing called JMESPath, which transforms the response that comes back, and that transformation allows the tests to not be fragile.
So, one of our rules for APIs is we don't let you change a field or remove a field without some major consequences. But if you add a field, it's usually not a problem. It can break your tests, though, if you have very specific tests that list all the fields and expect an exact match, like a string comparison.
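The "only check the fields you care about" idea can be sketched in plain Python. Rigor uses JMESPath expressions for this; to stay within the standard library, the hypothetical `select()` helper below only understands dotted paths with numeric list indexes, a tiny subset of what JMESPath can express. The `step` dictionary shows roughly what a YAML test step might parse into; its keys are assumptions, not Rigor's actual schema.

```python
def select(data, path):
    """Follow a dotted path such as 'variants.0.gene' through dicts and lists."""
    for part in path.split("."):
        data = data[int(part)] if isinstance(data, list) else data[part]
    return data

def check_step(step, response):
    """Compare only the fields a step cares about; extra fields are ignored."""
    failures = []
    for path, expected in step["expect"].items():
        try:
            actual = select(response, path)
        except (KeyError, IndexError, ValueError, TypeError):
            failures.append(f"{path}: missing")
            continue
        if actual != expected:
            failures.append(f"{path}: expected {expected!r}, got {actual!r}")
    return failures

# Roughly what a parsed YAML step might look like (hypothetical schema).
step = {
    "request": {"method": "GET", "path": "/variants/BRAF-V600E"},
    "expect": {"gene": "BRAF", "variants.0.classification": "pathogenic"},
}
```

A response that gains new fields still passes, because only the selected paths are compared. Changing or removing an expected field shows up as a failure, which mirrors the API rule Ian describes.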
46:06 Michael Kennedy: I'm just expecting this exact string or this exact JSON document back. Yeah, it's too much, yeah.
46:12 Ian Maurer: So, with JMESPath we're able to filter it down and say, "Yup, I only care that these three fields match exactly as I expect, and if so, it's correct." I was going to open source this thing a few months ago, and then I heard on one of your other shows, I think, that the Tavern CI project had been released, and it's very similar. So people should definitely check that one out. Both our project and that project were kind of based on the idea of PyRestTest, which seems to have been abandoned. It's a nice project, but it just didn't have a few things that we needed. I'd say the reason to choose our project over Tavern CI would be this JMESPath thing. We also have API coverage for Swagger. We define all of our APIs with the OpenAPI Specification, otherwise known as Swagger, which we still call Swagger. So we can tell you, "Oh, you've got a hundred percent coverage of all your API endpoints and their variables." And we also included the Cucumber HTML reporting tool called Cucumber Sandwich, which brings up a nice, pretty HTML view of your tests and shows you how all your steps ran and things like that.
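At its simplest, endpoint coverage against a Swagger/OpenAPI spec is set arithmetic over (method, path) pairs. Here's a minimal sketch: the `spec` dictionary follows the OpenAPI `paths` then HTTP-method layout, but the endpoints are made up, and Rigor's actual coverage report also tracks parameters, which this sketch ignores.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

def spec_operations(spec):
    """All (METHOD, path) operations declared under the spec's 'paths' key."""
    return {
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method.lower() in HTTP_METHODS  # skip keys like 'parameters'
    }

def coverage(spec, tested):
    """Return (percent covered, sorted list of untested operations)."""
    declared = spec_operations(spec)
    covered = declared & set(tested)
    percent = 100.0 * len(covered) / len(declared) if declared else 100.0
    return percent, sorted(declared - covered)

# Hypothetical spec fragment and the operations a test suite exercised.
spec = {
    "paths": {
        "/variants/{id}": {"get": {}, "parameters": []},
        "/genes": {"get": {}, "post": {}},
    }
}
tested = [("GET", "/variants/{id}"), ("GET", "/genes")]
```

Here `coverage(spec, tested)` reports two of the three declared operations as covered, with `("POST", "/genes")` listed as untested.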
47:17 Michael Kennedy: Yeah, the graphical output really is nice, and colorful. You can tell you'd get info out of it right away.
47:22 Ian Maurer: Yup, it's great.
47:22 Michael Kennedy: Very cool, and you can see how Related fits in there perfectly.
47:25 Ian Maurer: Yes, exactly right.
47:26 Michael Kennedy: I also saw you're using aiohttp, so it's all async, nice and quick.
47:30 Ian Maurer: Yeah, so I used aiohttp in this little Rigor test thing so I could do parallel testing to speed up our test suite, 'cause if you have to run the tests sequentially, it's going to take a lot longer than if you run them all in parallel. It takes three to five times less time when you turn the concurrency on with our test suite for all of our API endpoints.
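Running independent API tests concurrently, with a cap on how many are in flight at once, takes only a little asyncio. In this sketch the hypothetical `run_test` sleeps instead of issuing an aiohttp request, so the timings only illustrate the effect; the three-to-five-times speedup Ian quotes depends on his real endpoints and latencies.

```python
import asyncio
import time

async def run_test(name, latency=0.05):
    """Hypothetical test case: pretend to call an endpoint and check the result."""
    await asyncio.sleep(latency)  # stand-in for network I/O
    return name, True

async def run_suite(names, concurrency):
    """Run all tests, allowing at most `concurrency` of them at a time."""
    sem = asyncio.Semaphore(concurrency)

    async def limited(name):
        async with sem:
            return await run_test(name)

    # gather returns results in the same order as `names`.
    return await asyncio.gather(*(limited(n) for n in names))

def timed(names, concurrency):
    """Return (results, elapsed seconds) for one run of the suite."""
    start = time.perf_counter()
    results = asyncio.run(run_suite(names, concurrency))
    return results, time.perf_counter() - start
```

Comparing `timed(names, 1)` with `timed(names, 5)` shows the I/O waits overlapping. Keeping the cap configurable matters for the same reason as in the annotator: too much concurrency can overload the server under test.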
47:50 Michael Kennedy: Very, very nice. All right, so, I think maybe we'll leave it there for the genomic stuff. But that was a really interesting look at how you're using Python to address these major problems, and I've got to commend you: it sounds like you've got a bunch of really cool tools and systems put together. So, nice work.
48:08 Ian Maurer: Thank you. Python's got a great ecosystem and a great community, with so many great tools. It makes getting stuff done really fast and easy.
48:16 Michael Kennedy: The paradox of choice is a real thing that continues to vex people building stuff like this, 'cause you build it all out and you're like, "Oh, but there's some other REST API thing. Maybe I should use API Star instead of Django REST Framework, 'cause Tom Christie's now working on that, right?" But you've got to just put a stake in the ground and say we're building something productive here.
48:36 Ian Maurer: Always lots of new toys to play with. And it can get distracting.
48:39 Michael Kennedy: Another thing that we want to touch on is, there's some kind of event going on in your city. Is that right?
48:44 Ian Maurer: Yeah. PyCon is coming here.
48:46 Michael Kennedy: Yeah, in May, is that May 7th, I think? Yeah, so beginning of May.
48:51 Ian Maurer: Yeah, it'll be right here in downtown Cleveland, which is a great city. I've been here 18 years. It's about two blocks away from my office, so I'll be able to just stroll right over there at the end of the day. It's great. Cleveland's awesome, so people should definitely take advantage of some of the sights when they're here.
49:06 Michael Kennedy: I absolutely think so as well. Quick correction: it's May 9th, not May 7th, but basically the same timeframe, more or less. I'm looking: can I still register? I think I can. It's not sold out yet. Maybe it will be by the time people hear this. So, one of the things I wanted to touch on with you has maybe two parts. One is, what advice do you have for getting the most out of the conference itself, like within the walls of the convention center? And then, people are going to be in your town, a bunch of folks traveling here together for the conference. What would you recommend they do to get the most out of Cleveland?
49:48 Ian Maurer: I haven't been to a PyCon since 2005, I think, from what I figured out. I think maybe it was in Dallas or something like that.
49:55 Michael Kennedy: I bet it's a really different experience now.
49:58 Ian Maurer: I'm excited to check it out. It's going to be great to go. You know, obviously everything's online. So if you've never been and you've never noticed PyCon on YouTube, definitely check that out. What that should do is give you confidence that you can miss some of the talks that maybe you're not super interested in and spend more time in the hallway track, talking and meeting some folks in the community, since the Python group does a great job of getting all those videos online.
50:23 Michael Kennedy: Within, like, a day, so you could almost watch it while you're at the conference if you really felt like, "Oh geez, I wish I'd seen that."
50:28 Ian Maurer: That's my recommendation there. And then, if you're downtown and you're staying downtown, there are some great restaurants over on East 4th Street. There's Lola by Michael Symon, the Iron Chef. There's another one called Greenhouse Tavern. There's the House of Blues, which might have a concert that night. There's the Rock Hall, which has some special events sometimes, and if you're a rock and roll fan, that's definitely a place to check out. The Indians are in town. I checked. The Indians are in town that weekend. They're playing the Royals. So, if you're a baseball fan, that's a few blocks away.
50:56 Michael Kennedy: Yeah, that's really cool. So if people are in town, they could obviously drop in and see that. And if they're traveling from, say, outside the country, and I know tons of people come from all over the world, when do you get to see a professional baseball game? This might be your chance. Take a couple hours, skip the conference, and go watch it, right?
51:12 Ian Maurer: Yeah, the Indians have been good the last few years, so it should be a good team. And then there are some other areas to check out too. On the west side, there's Ohio City, the West Side Market, lots of breweries, micropub types of things. Definitely check those out. There's Playhouse Square, which is maybe another six or seven blocks away. That's actually the largest performing arts center in the United States outside of New York City. And then University Circle, which is a few miles away. That's not as easy to get to.
51:43 Michael Kennedy: There's Lyft, or Uber. Like you could get there pretty easy, right?
51:47 Ian Maurer: Exactly right. So, yeah, Cleveland's a pretty easy town to get in and out of, with lots of great restaurants and lots of great things to do.
51:52 Michael Kennedy: Ah, that sounds really fun. But first of all, I definitely want to second what you said about the hallway track. I may take that track too much when I go to conferences, but I find I skip a lot of the talks and really just try to experience being with people. When you go to a talk, it's great, but you sit quietly and you watch a great presentation. You experience it there, right? But you don't really interact with anyone near you, or with the presenter, much at all. The hallway track is just hanging out talking to people, and if you find yourself in an interesting situation, just take advantage of that, because you can always, like you said, go watch the thing you would have gone to see on YouTube. The other thing they do really well there is open spaces. I find the open spaces have more participation and engagement than the main talks, and they're not recorded. If it's like the last two years, there'll be a big board where people put up index cards saying, in this room at this time we're going to meet, and it's kind of an undirected group conversation about something amazing. So definitely take advantage of those as well.
52:58 Ian Maurer: That's great. Yeah, and if you want to connect at PyCon, just send me an email and I'll look for you there.
53:03 Michael Kennedy: Yeah, very cool. And do take advantage of some of these fun things that Ian pointed out. The worst thing about traveling is if you just get in a taxi to a plane to another taxi to a hotel to a conference center, and then you pop those off the stack again. You want to be able to go, "I was in Cleveland and I saw this amazing thing." Same thing wherever you go: try to take advantage of that. So, that's great.
53:32 Ian Maurer: That's great. Yup.
53:34 Michael Kennedy: Yeah. Awesome. All right, well, we're down to the two questions, so I'm going to hit you with those. First of all, if you're going to write some Python code, what editor do you run?
53:42 Ian Maurer: I've converted to PyCharm. It's great. I use the Vim editor mode, and it's a great environment. I love using it every day.
53:50 Michael Kennedy: Yeah, awesome. It's definitely kind of overwhelming when you get started, right?
53:53 Ian Maurer: Yeah. There are a lot of great tools, like the integration with pytest and the integration with the Vim and Markdown editors. It's a really good tool, though.
54:00 Michael Kennedy: Yeah, it is. Once you get used to using the features, it's hard to imagine not using them. Awesome, okay. And then, notable PyPI package?
54:07 Ian Maurer: All right. I'm going to go with DeepVariant by Google. I haven't used this, and I probably won't ever use it, but it's just such an interesting use of AI. They're actually taking those BAM pileups that I described and basically using image-recognition-type AI to actually determine and make variant calls. So what used to be done by somebody with a way bigger brain than me, doing these calculations with math and trying to figure out the right determination of what a variant is, is now being superseded by this really interesting Google project. So, DeepVariant is the name of it.
54:41 Michael Kennedy: Okay. That sounds really cool and just another one of those AIs creeping in to solve these tricky problems.
54:47 Ian Maurer: Exactly right.
54:48 Michael Kennedy: Yeah, very cool. All right, well definitely interesting choices and thanks for sharing everything. Any final call to action? People want to get involved in biology, genomics, Python. How do they get started?
55:00 Ian Maurer: There's a website called Biostars. There are lots of interesting topics up there. It's very much a Stack Overflow-type clone, I would say. And then there's Stack Overflow itself, with lots of conversation there. Feel free to reach out to me if you're interested in learning more. And Python is just a great ecosystem, and there are so many cool tools to play with.
55:21 Michael Kennedy: Yeah, I totally agree. So, one of the challenges I see for people getting started in this space, is they're not researchers or doctors. Where do they get the data? Do you know of any good open place to get some data to work with?
55:33 Ian Maurer: Lots of the research that's out there is funded by the U.S. government or European governments. You know, NCBI is a website. I can't tell you what the acronym stands for right now. They've got tools, and there are data sets out there, such as TCGA, which is The Cancer Genome Atlas. There's a project called GENIE, which we were involved with, helping them analyze their data, and they've got lots of cancer data out there. There are lots of tools, so search for keywords like VCF and BAM and SAMtools; there are lots of different keywords to search for, and you'll find lots of different data sets. It really just depends on what kind of analysis you're looking to do. You'll also find a bunch of Jupyter notebooks out there. People are doing their analyses in Jupyter notebooks and then posting them to the web for people to follow along with. And really, I've learned all this stuff in the last five years. It's not insurmountable. It's just a matter of having a goal, trying to reach that goal, and solving a problem.
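Since VCF comes up as a search term: it's a tab-separated text format, so a single data line can be pulled apart with plain Python. This sketch handles only the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and skips header lines and per-sample columns; for real data, a dedicated VCF parsing library is the safer choice. The example record in the usage note is illustrative.

```python
def parse_vcf_line(line):
    """Parse one VCF data line into a dict of the eight fixed columns."""
    if line.startswith("#"):
        return None  # meta-information or column-header line
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for entry in info.split(";"):
        if entry == ".":
            continue  # '.' means the INFO column is empty
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),  # VCF positions are 1-based
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }
```

For instance, `parse_vcf_line("7\t140453136\t.\tA\tT\t50\tPASS\tDP=100;SOMATIC")` yields `pos=140453136`, `alt=["T"]`, and `info={"DP": "100", "SOMATIC": True}`. INFO values stay as strings here because their types are declared in the VCF header, which this sketch doesn't read.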
56:30 Michael Kennedy: That's cool.
56:31 Ian Maurer: And it's great.
56:32 Michael Kennedy: Yeah, and solve the problems one at a time and eventually have this big tool chest, right?
56:36 Ian Maurer: Exactly right.
56:37 Michael Kennedy: All right, well, Ian, thanks for being on the show. It was great to talk with you and learn all about this stuff.
56:41 Ian Maurer: That's great, thanks Michael. Really glad to be here.
56:44 Michael Kennedy: This has been another episode of Talk Python to Me. Today's guest was Ian Maurer, and this episode has been brought to you by Codacy. Review less, merge faster with Codacy. Check code style, security, duplication, complexity, and coverage on every change while tracking code quality throughout your sprints. Try them at talkpython.fm/codacy, C-O-D-A-C-Y. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. And, if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now, get out there and write some Python code.