#178: Coverage.py Transcript
00:00 Michael Kennedy: You know you should be testing your code, right? How do you know whether it's well tested? Are you testing the right things? If you're not using code coverage, chances are you're guessing. But you don't need to guess. Just grab coverage.py, maintained by our guest this week, Ned Batchelder. This is Talk Python To Me, Episode 178, recorded September 10th, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @MKennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @TalkPython. This episode is brought to you by Brilliant.org and Manning. Please check out what they're offering during their segments. It really helps support the show. Ned, welcome to Talk Python.
00:59 Ned Batchelder: Hi, thanks Michael. It's great to be here.
01:01 Michael Kennedy: Yeah, it's great to finally have you on the show. I cannot believe we are at Episode 178 and you have not been a guest on the show. How did this happen?
01:08 Ned Batchelder: I know, you're doin' somethin' wrong over there. No, you're doing something great over there. You got to Episode 178 which is astounding. Anyone who says, I'm going to do a thing and then does it 178 times, clearly is doing something right.
01:19 Michael Kennedy: Yeah, we're comin' up on, I think on three years. Yeah, but actually maybe over three years, so I got to do some quick math but yeah, it's been goin' for a while and it's really fun. I'm just absolutely loving it and we're going to dig into a project that you've been actually working on more than three years, right?
01:34 Ned Batchelder: Yeah so coverage.py is a project that I've been maintaining for 14 years which seems crazy.
01:41 Michael Kennedy: That is really amazing and you know, kudos to you for doin' that, that's great.
01:44 Ned Batchelder: I think what I realized about myself is that I'm very inertial. It's hard for me to start new things and then it's also hard for me to stop old things.
01:51 Michael Kennedy: Once you get 'em rollin', they just keep going.
01:53 Ned Batchelder: That's right.
01:54 Michael Kennedy: That's not a bad trait at all. So before we get into all the details of code coverage and so on, let's just get a little background on you. How did you get into programming and Python?
02:03 Ned Batchelder: Well, so I've got kind of an unusual story that way. So I'm fairly old for the Python world. I'm 56 years old and I got into programming because my mother was a software person too. She was a programmer in the 1960s and '70s, '80s and '90s, I guess until she retired.
02:21 Michael Kennedy: Back when programming was really hard. There was no internet, not many books.
02:25 Ned Batchelder: Programming was different back then, yeah. There was definitely no internet. The books were in a big, huge three-ring binder at the other side of the room, etc., etc., but the cool thing was that she would bring home some of those books. So I remember as a kid, looking through the IBM 360 programmer's manuals and you know, puzzling over what this stuff might mean and so I sort of come by programming naturally. I've been doing it for a while. I joke that it's the only skill I've got so it's a good thing that people will hire me to do it and I got into Python probably in the year 2000, maybe 1999. I'd been working at Lotus on Lotus Notes, which is a collaboration environment.
03:01 Michael Kennedy: Was that in C++ before?
03:03 Ned Batchelder: Well Lotus Notes is written in C but the reason, the way I got to Python was that Lotus Notes, with its access controls and collaboration controls, someone said, oh you should look at this thing called Zope. It also does stuff like that. And I looked at Zope and Zope, I thought, you know, Zope's kind of cool. I don't really need that but this Python thing it's written in seems kind of interesting and so basically from that point on, when I had a choice of tools for writing some little tools or some scripting or some automation, I would reach for Python and that's just grown and grown since then. Now I guess I've been using it for 18 years or so.
03:35 Michael Kennedy: Oh that's really cool, and Python itself has grown with you, right? I mean, Python of 18 years ago is not Python of 2018.
03:41 Ned Batchelder: Yeah and it's funny, I don't even know what version of Python that was. It might've been a 1.x I guess. Yeah and Python has definitely grown. I feel like I sort of made a technology choice there and it's worked out very well. I've watched Python grow into at least two new major niches since then, into web dev and now into data science. I feel much more comfortable in web dev. I feel a little bit like I'm getting left behind with the data science and machine learning and even just hanging out where people ask questions about things. It's very clear that the center of interest is outside my expertise now, so I've got a lot to learn. It's interesting that you can be an expert in Python after 18 years and be a beginner at the things that people want to use Python for.
04:27 Michael Kennedy: Yeah, that's super interesting. Yeah, like if somebody said, hey Michael go do this plot with Matplotlib and get this data loaded up with pandas, like I'm pretty sure I could not do that without documentation or examples in front of me 'cause I spend most of my time writing web and database code.
04:41 Ned Batchelder: Yeah and I have done Matplotlib and a little bit of notebooks and I am like the typical, I'm on Stack Overflow. I'm just searching for stuff. I see a chunk of code, I don't know what it means, I paste it in, it seems to work and we're done, you know, and I would love to have a deeper foundational understanding but I don't have day-to-day problems that need those tools so there's not much chance for me to really get that learning.
05:05 Michael Kennedy: Yeah, I feel like it's sort of bimodal now. There's these two big areas that Python is really being used a lot in at least and you know. Did you catch Jake VanderPlas' keynote at PyCon 2017 about Python being a Mosaic?
05:20 Ned Batchelder: I didn't, 2017 was the PyCon I missed of the last decade of PyCons.
05:26 Michael Kennedy: Oh, that was a good one. Well basically it was, look, there are all these different ways that people are using Python and their goals and their entire purpose of using Python may be very different than the person you're sitting next to, but if you learn to appreciate it, it just makes it richer. And I thought it was a really great way to sort of say like, look at all these things people are doing. They have different motivations and whatnot but it's just as valid Python, it's just for a different use case.
05:53 Ned Batchelder: Right, exactly.
05:54 Michael Kennedy: Yeah, I feel like we're definitely there.
05:55 Ned Batchelder: Yeah, one of the things I do is organize the Boston Python user group and we have project nights every month and we just get a big room with a lot of round tables and we put labels on each table. So there's a table labeled beginners and a table labeled web and then we've got data and then we've got science and we've got hardware and it's just really interesting to see the variety of uses that people are putting Python to. There was a woman who came last month and was sitting at the beginners table and towards the end of the night, I was asking her more about what she wanted to do and she mentioned biology and I said, oh at the beginning of the night, you should've stood up and said, I'm doing biology. We could've found you some biologists to talk to and she laughed like I was joking, but then I introduced her to the four or five biologists across the room who were doing biology with Python. So it's really a very rich ecosystem of expertise and individual domains, which is fascinating.
06:49 Michael Kennedy: One of the things I love about Python is a lot of people seem to come to it with another expertise, kind of like you were just saying, right? Like if you're a C++ developer, there's a good chance you may be a developer first, right, but if you're doing Python, you may be something else first that uses Python and I think that just makes us a richer community.
07:08 Ned Batchelder: Mm hmm, I think that's why it's doing so well in data science: for whatever reason, it's the kind of language and environment that those types of people can succeed in.
07:17 Michael Kennedy: Yeah absolutely. So you mentioned the Boston user group. This is a global podcast. The internet doesn't have a zip code or whatever, but for people generally in the northeast, like do you want to just tell them really quickly about it so they can find it if they don't know?
07:30 Ned Batchelder: Sure, so the Boston Python user group is a big group. We run events twice each month, generally. A project night which is basically a two-hour unstructured hackathon with some sorting by topic like I just described and then most months, we also run a presentation night where we try to find people to give talks. We did lightning talks for August. I'm working on grooming a web scraping talk for September and we're going to have science talks for November and we're very friendly. We're big and open. We're on Meetup.com or BostonPython.com and if you're anywhere around, come and see us, and we've had people travel as long as three hours to get to events, so all of New England is kind of in scope.
08:11 Michael Kennedy: Yeah absolutely. Well I guess it depends on the time as well, right, rush hour and all that. Three hours could be not far away in certain parts of Boston.
08:18 Ned Batchelder: That's true, the really memorable one was the father and son who took a three hour bus ride down from Maine and that kid was 13 and he was one of the smartest people I've ever met and they were going to leave and get home at like 2:00 in the morning, based on the bus schedule to have attended. They only came once and I mean, I don't blame them but it was very impressive.
08:36 Michael Kennedy: No, that's cool. Is there any way to remotely attend? Any streaming options?
08:39 Ned Batchelder: So we've never managed to be routine about videoing the presentations which is really unfortunate because even in Boston, we have 8000 people on the Meetup group and we can only fit 120 people in the room and we always have a waiting list so lots of people would like to see video but we have just never managed to find the staff to make it a regular thing.
09:01 Michael Kennedy: It's almost got to be somebody's, their responsibility, their role to just do that, right?
09:06 Ned Batchelder: Yes absolutely, yep and they got to show up every time.
09:09 Michael Kennedy: Yeah so another thing that you do and I'm also super passionate about has to do with online education, right?
09:14 Ned Batchelder: Right yeah, my day job is at edX. edX.org is the website that was founded by Harvard and MIT and puts university level courses online. We've got, I don't know, 2000, 3000 courses from 130 different institutions at this point and it's all Python and Django and it's all open source which is the thing that really appeals to me because I'm an open source guy. I actually work on the open source team here at edX, so we are encouraging and enabling other people to use our software to do online education. There's about 1000 other sites besides edX.org that use Open edX to do their education which is thrilling because as great as Harvard and MIT courses are, there's all sorts of other kinds of education that those institutions will never provide. We just recently discovered there was a website in Indonesia which has something like 150 different courses all very, very focused on specific skills that might lift someone out of poverty. You know, how to be a maid, how to do hairdressing, how to raise chickens, how to fix small engines, how to catch fish, like just all sorts of things.
10:19 Michael Kennedy: Super practical vocational type things, huh?
10:19 Ned Batchelder: Super practical, vocational, in Indonesian, for Indonesians, and edX.org, however many courses we get, is never going to deliver those courses, so having the software be open source, you know, we give away education on edX.org and we give away the software to give away education to the rest of the world and there's 1000 sites out there that are using it which is really, really gratifying.
10:43 Michael Kennedy: Oh that's awesome and it sounds like a great project and it's mostly Python and Django?
10:54 Michael Kennedy: Very cool.
10:55 Ned Batchelder: And we're hiring, if you know anyone. Tell 'em Ned sent you, there's a referral bonus.
10:59 Michael Kennedy: Do you guys have remote positions or it's got to be in Boston?
11:03 Ned Batchelder: The easiest thing to say is let's say it's got to be in Boston. We are not super good at remote which is something I wish we could get better at, but that's the reality of the situation today, yeah.
11:13 Michael Kennedy: Yeah so if people are listening and they want a cool Python job in the general Boston area or they're willing to get there? Yeah awesome.
11:21 Ned Batchelder: Get in touch, we'll see what we can do.
11:23 Michael Kennedy: Yeah that's really great, it sounds super fun. Okay, so let's talk about this brand new project that you just started called Coverage.py.
11:31 Ned Batchelder: That's right, this is a podcast from December of 2004.
11:34 Michael Kennedy: Exactly, all right, so let's talk about what is code coverage and what is this project?
11:39 Ned Batchelder: Okay well and let me just start with one thing which is I didn't actually start this project. This project was started by a guy named Gareth Rees and back in 2004, I was working on a different Python thing and I wanted to use some code coverage on it and I found this thing called coverage.py and it worked almost exactly the way I wanted and where it didn't, I tried to make a change and I tried to get it to Gareth and Gareth didn't seem to be reachable, so I just sort of published it with my change and 14 years later, I'm the maintainer of coverage.py.
12:11 Michael Kennedy: That's how open source works, right?
12:12 Ned Batchelder: That's how open source works, yeah. I'm mulling the idea of doing a live talk called Lies People Believe About coverage.py, one of which is that I started it.
12:23 Michael Kennedy: Yeah, perfect.
12:24 Ned Batchelder: Okay but to answer your question, so what is code coverage? So the idea of code coverage is you've got some product code that you've written, meaning the code you actually want to write and then to make sure that that code works, you write some tests and I'll give the entire audience the benefit of the doubt and say, you have written tests, but now you need to know, are the tests actually doing their job which is proving that your code works and one way to test your tests, essentially is to observe the tests running and see if all of the lines of your product code were executed by the tests because if there's a line of code that isn't executed when you run your entire test suite, then there's no way that line of code can be tested. The converse isn't true. If the line of code is run, it still might not be properly tested but if it isn't run, then it's definitely not properly tested.
13:14 Michael Kennedy: Absolutely if it was never executed, you know nothing about it, absolutely.
13:18 Ned Batchelder: That's right, there's no way you know how that line of code works. So the code coverage in general is the automation of that process which is, it is a tool that can observe a program being run and can tell you what lines of code were run in the program and notice in that sentence, I didn't say anything about tests. Coverage doesn't know anything about what a test is, it's just that typically the program you want to watch is your code while the tests are being run, but you could run coverage for any reason to know what parts were run or not.
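To make the idea of "observing a program and recording which lines ran" concrete, here is a toy line tracer built on the standard library's sys.settrace hook. It is only a sketch of the general technique, not how coverage.py is implemented (its real tracer is far more careful and much faster); trace_lines and classify are hypothetical names chosen for illustration.

```python
import sys
from collections import defaultdict


def trace_lines(func, *args, **kwargs):
    """Run func(*args, **kwargs) under a line tracer.

    Returns a dict mapping each source filename to the set of line numbers
    that executed. Toy sketch: it only sees code run in the current thread.
    """
    executed = defaultdict(set)

    def tracer(frame, event, arg):
        # "line" events fire just before each new line of code runs.
        if event == "line":
            executed[frame.f_code.co_filename].add(frame.f_lineno)
        # Returning the tracer keeps line tracing on inside this frame.
        return tracer

    previous = sys.gettrace()  # restore any existing tracer (e.g. a debugger)
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return dict(executed)


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    lines = trace_lines(classify, 5)[classify.__code__.co_filename]
    first = classify.__code__.co_firstlineno
    # Mark each body line of classify as run or not run.
    for offset in range(1, 4):
        status = "run" if first + offset in lines else "NOT RUN"
        print(f"line {first + offset}: {status}")
```

Calling classify(5) never reaches the return "negative" line, so a "test suite" consisting only of that call leaves that line unexecuted, which is exactly the gap a coverage report points at.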
13:48 Michael Kennedy: Right, this is typically most spoken about in terms of unit testing and other types of tests, but one example that comes to mind right away is I've got some app, it's been handed down from person to person and somehow it arrives in my lap and they're like, Michael, you've got to now add a feature or maintain this thing and it's a big scrambled mess and the person who knows all about it is gone. Maybe I just want to know, when it does its job, does it ever even call this function, right?
14:16 Ned Batchelder: Exactly.
14:17 Michael Kennedy: Like there could be all sorts of code in there that is just, nobody wanted to remove it 'cause they didn't know for sure it was okay but if you can run the coverage and say actually no, it's never executed, let's delete it, then you're golden.
14:27 Ned Batchelder: That's right so long as you are sure that you know how to fully exercise the code.
14:31 Michael Kennedy: Yes, you've got to, that is another thing but I mean, I've spent hours trying to understand what a particular function does and how it influences like a big program just to realize that actually, the reason any changes I'm making to this section or try to make it do a thing, have no effect because it's not being called, right? It's super frustrating.
14:52 Ned Batchelder: Right so coverage can be used for that but like you say, the 99.9% use case for any code coverage tool, including coverage.py is for it to observe your test suite being run and then to tell you about your product code, which lines were run and which lines weren't. The idea being that the lines that weren't, that's what you focus in on and you think about how can I write a test to make that line of code be run and you'll gradually increase the coverage of your tests.
15:18 Michael Kennedy: Or you make a conscious decision this part we don't care to test potentially.
15:22 Ned Batchelder: Right that's, you can also decide that, yes.
15:24 Michael Kennedy: But I think the important thing is even if you're in that place, there is a core reason your application exists. There is a thing that it does and there's stuff that supports it doing that, right? If you were writing a stock decision application, the stock decision engine had better have a good bit of coverage on it or you failed with your test, right? Testing the login like crazy doesn't help the core engine do anything better, right?
15:48 Ned Batchelder: You could decide you're going to increase coverage on the parts you care about and coverage doesn't really have any opinions about this. It's designed to just tell you something about your code. I've found over the years, I am drawn to projects that are all about helping developers understand their world better and coverage is one of those ways, right? You wrote something and you wrote something to test it and you thought it was testing it and oh, is it? Well, coverage can tell you whether it is.
16:16 Michael Kennedy: Yeah, I really, I really love that, the way it works. So I guess maybe we could talk a little bit about how you run it? Like is this something you put in continuous integration? Is it a command line tool? Like where does it work?
16:28 Ned Batchelder: Right so the simplest thing is it is a command line tool and you can run it in your continuous integration on Travis or something or you can just run it from the command line. The design is that the coverage command has a number of subcommands, one of which is run and when you type, coverage run, anything you could put after the word python, you could put after coverage run. So if you used to run your program by saying python prog.py, then you can say coverage run prog.py and it will run it the same way that Python would've run it but under observation. That will collect a bunch of data and then if you type, coverage report, it will give you a report and for every file that got executed, it will tell you things like how many statements there are, how many got executed, how many didn't get executed and therefore, what percentage of them were executed and then a big total at the end.
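The numbers in that report are simple arithmetic over two sets of line numbers: the statements in a file and the ones that ran. Here is a hedged sketch of that arithmetic, not coverage.py's own code; report_row, total_row and Row are hypothetical names, and the fields loosely echo the text report's Name, Stmts, Miss and Cover columns. This sketch computes the total from the summed counts rather than averaging the per-file percentages, so a small file doesn't weigh as much as a big one.

```python
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    name: str
    stmts: int
    miss: int
    cover: float        # percentage of statements executed
    missing: list[int]  # line numbers that never ran


def percent(stmts: int, miss: int) -> float:
    # Assumption: a file with no statements counts as fully covered.
    return 100.0 if stmts == 0 else 100.0 * (stmts - miss) / stmts


def report_row(name: str, statements: Iterable[int], executed: Iterable[int]) -> Row:
    stmt_lines = set(statements)
    missing = sorted(stmt_lines - set(executed))
    return Row(name, len(stmt_lines), len(missing),
               percent(len(stmt_lines), len(missing)), missing)


def total_row(rows: list[Row]) -> Row:
    # Sum the raw counts first, then compute a single percentage.
    stmts = sum(r.stmts for r in rows)
    miss = sum(r.miss for r in rows)
    return Row("TOTAL", stmts, miss, percent(stmts, miss), [])


if __name__ == "__main__":
    rows = [
        report_row("prog.py", statements=[1, 2, 3, 4], executed=[1, 2, 4]),
        report_row("util.py", statements=range(1, 21), executed=range(1, 21)),
    ]
    print(f"{'Name':<10} {'Stmts':>5} {'Miss':>5} {'Cover':>7}")
    for r in rows + [total_row(rows)]:
        print(f"{r.name:<10} {r.stmts:>5} {r.miss:>5} {r.cover:>6.0f}%")
```

Here the total comes out near 96% (23 of 24 statements), whereas averaging the two file percentages would give 87.5%. The real coverage report -m adds a Missing column listing unexecuted lines, which the missing field stands in for.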
17:16 Michael Kennedy: Yeah that's really cool. So I could say like, coverage -m unittest and some file or something like this?
17:23 Ned Batchelder: Yeah, coverage run -m unittest, yes I'll just leave it at that, yes.
17:30 Michael Kennedy: It is also a pytest plugin, right?
17:32 Ned Batchelder: There are plugins for pytest and for nose and I'll be perfectly honest with you, I'm not a huge fan of the plugins because it's just another bunch of code between you and me, sort of. Like I don't understand exactly how those plugins work so it's hard for me to vouch for them doing what you want them to do, but yes, there are plugins for pytest. So there's pytest-cov and when you install it, you now have --cov. A few, about a half dozen --cov options to pytest to say things like, I want you to only look at these modules or I want you to produce this kind of report at the end. So it's much more convenient in that you can get all of the coverage behavior in one pytest run, rather than having coverage run pytest and then coverage producing a report as two separate commands but like I said, there's a trade-off there in that if the plugin isn't doing what you want, then it's a little bit trickier: now you've got to figure out is it the plugin, or is it coverage or is it my script that's doing it, right, it's one more variable in there, so.
18:33 Michael Kennedy: Yeah it just makes it a little more complex. Okay, interesting. Another place that I really like to run coverage.py is from PyCharm, if you've ever seen the integration there but that is just incredible. You right click on some part of your code and say run this with coverage and then you get a report in PyCharm but the editor itself actually colors each line based on the coverage which I think is a really nice touch.
18:55 Ned Batchelder: Yeah, PyCharm is really an amazing IDE, which I don't actually use 'cause I'm old. There's also like a Vim plugin I believe to do similar things to get the coverage data and to display it to you in Vim and probably for Emacs as well.
19:09 Michael Kennedy: Oh that's cool.
19:10 Ned Batchelder: Yeah the more convenient we can make it for people to see the information, the better off it's going to be, the tighter feedback loop you've got and right there in the editor is the best place to see it because that's where you're going to have to be dealing with the code anyway.
19:23 Michael Kennedy: Right, you're not trying to correlate some like report back to some file, right? You just look at it like oh, why is that red? That should be green. What's going on here?
19:30 Ned Batchelder: Right exactly and coverage will produce HTML reports that are colored red and green and actually have a little bit of interaction if you want to focus in on things, but you know, if other IDEs or whatever can produce displays that make it more convenient for people then more power to them. Coverage has an API, a Python API, which is I assume what PyCharm is built on although they could be also just doing subprocess launches and things like that. So yeah, I'm happy to have people get access to the power of that coverage measurement however they're most happy with.
20:02 Michael Kennedy: Yeah that's really great. So one of the things that I kind of hinted at before with the core trading engine and certainly with say like your test code, you probably don't care about looking at the analysis of the coverage of your test code. You would like to see the analysis of your code under test. So how do you exclude some bits of code from being analyzed?
20:22 Ned Batchelder: All right so I'll answer your question first and then I will challenge the premise of the question.
20:27 Michael Kennedy: Okay fair, sure, good.
20:29 Ned Batchelder: So coverage gives, there's a bunch of controls that you can use with coverage to say basically to focus its attention on the code you want. If you run coverage just the way I started with, it will tell you about a lot more than what you care about because it's going to tell you about every library you've imported even if it's not your code and that's partly because that's the way it used to be and partly because it's hard to know what counts as a third party library versus your code if you're running your code in an installed setting and so on and so forth. So there are controls to focus coverage in around the code you want. The simplest is the source option and the source option says, I'm only interested in any source code that you find from this tree downwards and for instance, source equals . is often a very good choice because it means, I'm in the current working directory, here's my code, don't tell me about any code you find anywhere else. More than that, you can say things like, omit these file paths or only include these file paths so there's a lot of controls there to focus coverage in and that's because an automated tool is great unless you constantly know more than the automated tool is telling you. Like if the automated tool is like a noisy kindergartner, yammering at you, and you have to keep thinking to yourself, shut up about that, I know that's not what I'm concerned about, then the tool is not useful.
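For reference, the source and omit controls can be written into a .coveragerc file so they apply to every run. A minimal sketch, assuming your code sits under the current working directory; the omit patterns are placeholders, not recommendations:

```ini
# .coveragerc (coverage.py's default configuration file name)
[run]
# Only measure code found under the current working directory.
source = .
# Skip files matching these glob patterns (placeholder paths).
omit =
    */migrations/*
    setup.py
```

For a one-off run, the command-line equivalent of the source setting is coverage run --source=. prog.py.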
21:56 Michael Kennedy: Yeah. This portion of Talk Python To Me is brought to you by Brilliant.org. Many of you have come to software development and data science through paths that did not include a full on computer science or mathematics degree, yet in our technical field, you may find you need to learn exactly these topics. You could go back to university but then again, this is the 21st century and we do have the internet. Why not take some engaging online courses to quickly get just the skills that you need? That's where Brilliant.org comes in. They believe that effective learning is active. So master the concepts you need by solving fun, challenging problems yourself. To get started today, just visit talkpython.fm/brilliant and sign up for free and don't wait either. If you decide to upgrade to a paid account for guided courses and more practice exercises, the first 200 people that sign up from Talk Python will get an extra 20% off an annual premium subscription. That's talkpython.fm/brilliant. People would stop using it right? Like if it becomes too noisy and too annoying, then you're like, well it would've been helpful but it's just, there's so much noise, forget this thing.
23:03 Ned Batchelder: Right, there's actually useful information in it that they can't see or whatever. So it shouldn't be that the way you use coverage is you run it, you get a report and then you skip over all the stuff you know you don't care about and hopefully see the stuff you do care about. So there's a lot of controls in coverage to let you be the smart one in the room and let it be the savant about the one thing that it is smarter than you about which is what code got run. So in addition to being able to exclude or include file paths and modules, you can actually put comments in your code to tell coverage this line isn't run but I don't care. Like don't tell me about this line anymore. I'm okay with it not being run.
23:47 Michael Kennedy: Yeah one of the examples you gave was the repr method, dunder repr, like do you really want to write a test that instantiates an object and just prints it just to get that function, right, like no.
23:57 Ned Batchelder: And I've got lots of dunder reprs in the coverage.py code and they're only ever run when I'm debugging coverage.py. I'm in a debugger and I want to see what...
24:05 Michael Kennedy: You want to see the string representation better than this type at that address.
24:09 Ned Batchelder: Yeah that's right. So I write a dunder repr and I don't want to be told that it's not being executed by the test suite because I don't run it in my test suite and in addition to putting comments on lines, what you can actually do with coverage is there's a coverage rc file to configure coverage and you can actually specify a list of regexes and any line that matches one of those regexes will be excluded from coverage measurements and that's how the comment works. The comment is a regex pattern by default in that setting but for instance, when I run coverage, I put like def space dunder repr as one of the regexes and that will match all my reprs and then they'll all be excluded from coverage and I don't need to worry about them anymore.
24:49 Michael Kennedy: Yeah, that makes a lot of sense. So you could put a comment # pragma: no cover but you might not want that in your code all over the place.
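Here's a rough sketch of that mechanism: each source line is checked against a list of regular expressions, and matching lines are treated as excluded. The patterns and the excluded_lines helper are illustrative assumptions, not coverage.py's internals. In particular, coverage.py also excludes the whole block when a matching line introduces one (such as a def), which this line-by-line sketch does not attempt.

```python
import re

# Illustrative patterns: a "no cover" pragma and any __repr__ definition.
EXCLUDE_PATTERNS = [
    r"#\s*pragma:\s*no\s*cover",
    r"def\s+__repr__",
]


def excluded_lines(source: str, patterns: list[str]) -> set[int]:
    """Return 1-based line numbers whose text matches any pattern."""
    # OR the patterns together into one compiled regex.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return {
        lineno
        for lineno, text in enumerate(source.splitlines(), start=1)
        if combined.search(text)
    }


SAMPLE = '''\
class Point:
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"Point({self.x})"

def debug_dump(p):  # pragma: no cover
    print(p)
'''

if __name__ == "__main__":
    print(sorted(excluded_lines(SAMPLE, EXCLUDE_PATTERNS)))
```

Only the two marker lines match here (line 5, the __repr__ definition, and line 8, the pragma comment); the bodies beneath them are where coverage.py's block-level exclusion goes further than this sketch.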
24:58 Ned Batchelder: Right, you'd have to remember to put it in and etc, yeah.
25:02 Michael Kennedy: And people who don't care about it, they're like, why is this in this code? Or you know, they write their code and they don't add it. Sometimes it's better to have it separate, yeah.
Ned Batchelder: Right, and what I actually have done, because I'm a little bit obsessive about it, in coverage.py itself is I have a half dozen or 10 different comment syntaxes that I'll use to exclude lines from coverage because they're being excluded for different reasons. Like a line that only runs on Jython, I will exclude from coverage 'cause I'm not doing coverage measurement under Jython even though I want to have a little bit of Jython support in the code and so I'll have a comment that says only Jython and then I know why it's been excluded.
Michael Kennedy: That's cool.
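In a .coveragerc file, that kind of reason-specific list goes under the exclude_lines setting. A hedged sketch: the "pragma: only jython" marker is a hypothetical stand-in for the reason-specific comments described here, not coverage.py's actual configuration. Setting exclude_lines replaces the default pattern, so the standard pragma is re-added explicitly; recent coverage.py versions also offer exclude_also, which appends to the defaults instead.

```ini
# .coveragerc
[report]
exclude_lines =
    # Re-add the default pragma, since exclude_lines replaces the defaults.
    pragma: no cover
    # Debug-only string representations.
    def __repr__
    # Hypothetical reason-specific marker for Jython-only code.
    pragma: only jython
```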
25:40 Ned Batchelder: It's easy because I can just make a list of a dozen regexes and it's all...
25:43 Michael Kennedy: You put it into that configuration file, yeah. Nice, another thing that I thought was interesting was sort of the converse of this, is to explicitly include some files because if you say, look here and you run this one particular file, it only looks at the actual files that were loaded and the modules that were loaded, not the stuff lying next to it that maybe should've been reported on but nobody ever touched.
26:10 Ned Batchelder: Right and that was one of the failings of early versions of coverage.py, that the only thing it knew about your code is code that got run and so, for instance, if there was a particular file in your source tree that was never executed at all. I mean, forget about lines not being executed. The entire file was never executed. It wouldn't show up in the coverage report because coverage had never heard about it, right? If you run a line of code in a file, coverage knows about the file and then it can see all the lines that weren't run but if you never ran any of them, coverage never heard of the file. So that feature's now in coverage.py. If you give it a source option, then it has a tree to search and it can look for all of the importable Python files that it never heard of and tell you about those zero percent files.
26:52 Michael Kennedy: Right, that's kind of like the example of me saying, here's this method that's never been run, and like, not even the module is imported in the main bit of code, right?
27:00 Ned Batchelder: Right and by the way, there's a really cheap low tech way to do file level coverage which is you delete all your .pyc files and then you exercise all your code and any .py file that doesn't have a .pyc file was never imported.
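The idea behind reporting those zero percent files can be sketched in a few lines: walk the source tree, list the Python files, and subtract the ones the tracer ever saw. This is an illustrative sketch with a hypothetical unmeasured_files helper, not coverage.py's actual logic, which also has to decide what counts as importable.

```python
from pathlib import Path


def unmeasured_files(source_root: str, measured: set[str]) -> list[Path]:
    """Return .py files under source_root that never appeared in measured.

    measured holds file paths the tracer recorded; both sides are resolved
    to absolute paths so relative and absolute spellings compare equal.
    """
    seen = {Path(p).resolve() for p in measured}
    return sorted(
        path.resolve()
        for path in Path(source_root).rglob("*.py")
        if path.resolve() not in seen
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        for name in ("app.py", "helpers.py", "legacy.py"):
            (Path(root) / name).write_text("x = 1\n")
        # Pretend the test run only ever touched app.py and helpers.py.
        ran = {str(Path(root) / "app.py"), str(Path(root) / "helpers.py")}
        for path in unmeasured_files(root, ran):
            print(f"{path.name}: 0% (never executed)")
```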
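On Python 3 the cached bytecode lives in __pycache__ directories rather than next to the source, but the same low-tech trick can be scripted: importlib.util.cache_from_source returns the cache path the interpreter would use for a given .py file. A sketch under those assumptions, with a hypothetical never_imported helper. It only works if bytecode writing hasn't been disabled (for example via PYTHONDONTWRITEBYTECODE), and a script run directly as the main program is never cached, so it will always be listed.

```python
import importlib.util
from pathlib import Path


def never_imported(source_root: str) -> list[Path]:
    """List .py files under source_root that have no cached .pyc file.

    Run this after clearing old caches and exercising the code: any source
    file without a .pyc was (very likely) never imported.
    """
    missing = []
    for path in sorted(Path(source_root).rglob("*.py")):
        # The .pyc path Python would write for this source file.
        cached = Path(importlib.util.cache_from_source(str(path)))
        if not cached.exists():
            missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in never_imported(root):
        print(path)
```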
27:16 Michael Kennedy: Yeah right, okay that's pretty interesting.
27:19 Ned Batchelder: Very low tech. Oh and I forgot to challenge the premise of your earlier question.
27:24 Michael Kennedy: Yeah, yeah let's get back to that.
27:24 Ned Batchelder: Yeah so you said you don't want 'em to do coverage measurement of your tests but it actually can be very useful to do coverage measurement of your tests if only because the way test runners work, it's really easy to make two tests that accidentally have the same name. You know, oh I like that test. I want to do one kind of like it. I'll copy it and I'll paste it and I'll forget to change the name and now I actually only have one test. If I look in the code, it looks like there's two but there's really only one.
27:51 Michael Kennedy: Yeah, yeah it's so easy to do that.
27:53 Ned Batchelder: And so if you do coverage measurement of your test files also, then you'll see those cases.
27:58 Michael Kennedy: Interesting, okay. I guess, yeah, I never really thought about that. That's pretty valid. You know, I was thinking more that you might, say, ignore a particular test, and you don't want that to break the build 'cause it drops coverage below some percentage or something like that. But yeah, that's a very good point, 'cause I feel like test code is probably some of the most copy-and-pasted code there is.
28:18 Ned Batchelder: Exactly, exactly and you never actually use the function name directly so you'd never know, right, the name of the test function doesn't matter.
28:28 Michael Kennedy: Yeah exactly, so that's a super good, all right, I accept defeat on this one. No, that's a really good point.
28:35 Ned Batchelder: All right, well score one for me. The other thing, when people say they want to exclude their tests from coverage, often it's because they've set a goal for their coverage measurement like we need 75% coverage and those goals are really completely artificial. Like how did you choose 75%? Like why, like a quarter of your code doesn't need to be tested? Why is that okay, right? So the way I look at it, the coverage number has no meaning except that lower numbers are worse. That's the only meaning to the number. So if someone says like how much coverage should I have, there's no right answer to that.
29:11 Michael Kennedy: Yeah you know I guess probably your test code has a pretty high coverage rate relative to your other code so you're only helping yourself in that number.
29:21 Ned Batchelder: That's right, that's right, you could game the system.
29:23 Michael Kennedy: Exactly, I guess one of the reasons I was thinking about excluding it is, I just don't want to see it in the report. Like I don't need to see the test coverage but your example of this copy and paste here actually is pretty valid, I think.
29:33 Ned Batchelder: Right well and another option in the coverage.py reports is to exclude files that are 100%, right? Which again, lets you focus in on where the problems are. Like you don't need to think about files that have 100% coverage. What you need to think about is the ones that are missing some coverage 'cause you need to go and look at those lines and write some tests for those files. So if all your tests have 100% coverage, then include them and exclude the 100% files from the report.
29:58 Michael Kennedy: Right, and hopefully that is a smaller list. You could always use the pragma stuff on, say, your test code. You say, well, these three parts, I'm having a hard time getting them to run for whatever reason, and just tell it to not report on that. It'll hit 100 and it drops out of the list, right?
30:16 Ned Batchelder: Right, exactly, yeah. The other thing about test code I find is that in a full, mature test suite, you've got a significant amount of engineering happening, if not in the tests themselves, then in the helpers that you have written for your tests. And it's not that there's code in there that isn't being run and should be run, but there might be code in there that isn't being run and therefore you can delete it.
30:40 Michael Kennedy: Right, yeah, that same conversation. I do feel like a lot of times people treat their test code with less, what's the right word, professionalism or attention. They're like, well, this is test code, so it doesn't matter that this big block of code is repeated 100 times; why would I ever extract a method for that? There's just less attention to the architecture and patterns there, and I feel like that would help.
31:04 Ned Batchelder: Right, and so I agree with you that repetition should be removed from tests as it is from elsewhere, but I've also heard people feel passionately that tests should be repetitive, that each test should be readable all by itself, so I'll give them the benefit of the doubt that maybe that's what they want, and that's fine too.
31:25 Michael Kennedy: Yeah sure and if that's a conscious decision then that's fine I think but a lot of people just do it because well they wrote one test and then they copied it and then they edited it and they copied it and then they edited it, you know what I mean?
31:36 Ned Batchelder: Yeah exactly right and they don't have time to make the tests nice 'cause who cares about the tests.
31:41 Michael Kennedy: Yeah, exactly, until you change the thing under test and all of a sudden it's so hard to get it to run because you made poor decisions when writing the test, and then you claim unit testing is too hard and you don't want to do it, and so on.
31:55 Ned Batchelder: Yeah, writing tests is real engineering, with different problems than writing your product code, and those problems need to be paid attention to, which is probably a whole other episode.
32:06 Michael Kennedy: Yeah, I've certainly heard people say that they didn't really understand, you know, things like object-oriented programming and other proper design patterns until they started writing tests and trying to make their code more flexible. Like, how do I actually get in between the data access layer and my middle-tier logic and test that without having a database, and things like that? It really makes you think.
32:29 Ned Batchelder: Yeah they do because it's a second use of your code and your code's going to be way better designed if you consider more than one use of the code.
32:37 Michael Kennedy: Yeah, absolutely.
32:38 Ned Batchelder: Yeah testability is a great topic.
32:40 Michael Kennedy: Yeah, actually, I totally agree, and I think what you just said right there is one of the main reasons to test: the architecture that comes out of it. All right, so you spoke a little bit about these config files, and in these config files I can put regular expressions which will, you know, limit these sections. Oh, and one point I did want to make really quick about that: when you put one of those comments on, like, a branch or some kind of structure like a class, everything underneath it will be excluded. You don't have to put it on 100 lines, right? You put it at sort of the root node.
33:10 Ned Batchelder: Yeah, exactly. Basically, if you put it on a line with a colon, then that entire clause is excluded. One thing that sounds like it might be a generalization of that, but doesn't work, is putting one of those comments at the top of a file: it doesn't exclude the entire file. There's a request to do that. That seems like a good idea, but we haven't gotten to it yet.
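The rule Ned describes, that an exclusion comment on a line ending in a colon excludes the whole clause, can be sketched with a regular expression and an indentation scan. This is a simplified, hypothetical reimplementation for illustration, not how coverage.py does it: the pattern is a loose stand-in for the default `# pragma: no cover` matcher, and splitting on `#` ignores the case of `#` inside string literals.

```python
import re

# Loose stand-in for the default "# pragma: no cover" exclusion pattern.
EXCLUDE = re.compile(r"#\s*pragma:?\s*no\s*cover", re.IGNORECASE)


def excluded_lines(source):
    """Return the 1-based line numbers a pragma comment excludes."""
    lines = source.splitlines()
    excluded = set()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1  # i is now the 1-based number of `line`
        if not EXCLUDE.search(line):
            continue
        excluded.add(i)
        code = line.split("#", 1)[0].rstrip()
        if not code.endswith(":"):
            continue
        # A block-opening line: also exclude everything indented deeper.
        indent = len(line) - len(line.lstrip())
        while i < len(lines):
            nxt = lines[i]
            if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= indent:
                break  # a dedent ends the clause
            if nxt.strip():
                excluded.add(i + 1)
            i += 1
    return excluded


SAMPLE = """\
def debug_dump(obj):
    if DEBUG:  # pragma: no cover
        print("dumping", obj)
        log(obj)
    return obj
"""
skipped = excluded_lines(SAMPLE)
print(sorted(skipped))
```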
33:30 Michael Kennedy: Okay cool, so but back to config files. There's more stuff that you can put in there. Like what are useful things that people are doing there beyond just exclusion?
33:39 Ned Batchelder: Yeah so there's some basic options like one of the things that coverage.py can do is to measure not just which statements got executed but which branches got taken. So for instance, if you write an if statement, it can tell you whether both the true case and the false case were executed and you might say, well, I can tell that by looking at the code in the true case and the code in the else clause but not every if statement has an else clause. So if you have an if with a statement in it but there's no else clause, you can't tell just by looking at individual lines that are executed whether the false case of the if was executed.
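A minimal sketch of why line data alone can't answer that question: record (previous line, current line) pairs, often called arcs, with `sys.settrace`. Calling the hypothetical `clamp` only with a negative number executes every line, yet the arc from the `if` straight to the `return` never appears. Tracking arcs is the general idea behind branch measurement; this is an illustration, not coverage.py's tracer.

```python
import sys


def record_arcs(func, *args):
    """Run func(*args) and return the (from, to) line pairs it took.

    Lines are relative to the `def` line; -1 stands for entering or leaving.
    """
    code = func.__code__
    first = code.co_firstlineno
    arcs = set()
    last = -1

    def tracer(frame, event, arg):
        nonlocal last
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            line = frame.f_lineno - first
            arcs.add((last, line))
            last = line
        elif event == "return":
            arcs.add((last, -1))
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return arcs


def clamp(x):
    if x < 0:  # no else clause
        x = 0
    return x


negative_only = record_arcs(clamp, -5)
positive_only = record_arcs(clamp, 5)
# Every line ran, but the "condition false" arc (1 -> 3) was never taken.
lines_hit = {to for _, to in negative_only if to != -1}
print(sorted(negative_only), sorted(positive_only))
```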
34:12 Michael Kennedy: Right, did you effectively skip that if block.
34:15 Ned Batchelder: Right, have you ever actually skipped that statement in the if? Branch coverage is what can tell you that, and that's one of the options you can set in the config file as well as on the command line. I mean, basically, just to go back to the plugins question, the original reason I wrote support for the .coveragerc was because I was adding features to coverage and people were using the test runner plugins as their UI, so I didn't have any way to let them try new features in coverage until the plugins were updated.
34:45 Michael Kennedy: I see.
34:46 Ned Batchelder: So I added support for an rc file so you could control coverage even under a plugin, from a thing that I could actually add features to. So all the features of coverage are controllable from the rc file. A big thing that gets put in there: one of the complicated scenarios for coverage is if you have tests that run in parallel, either because you're running them on separate machines, or you're just running them in separate processes, or because you ran the 2.7 tests separately from the 3.6 tests, and then you want to take all that data and combine it back together to get one coverage report. Under those scenarios, there are often cases where, oh, on my CI system the code was in this directory, but back home, where I'm going to write the report, it's all in this other directory, and coverage has to somehow know that those two directories are equivalent. Those file path mappings go into the .coveragerc file, for instance.
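coverage.py keeps these equivalences in a `[paths]` section of its config file. The sketch below only illustrates the idea: rewrite every recorded path whose prefix matches a glob-style alias onto one canonical prefix, then union the line sets. The function names, paths, and alias patterns are hypothetical, and real-world details (Windows separators, case sensitivity) are ignored.

```python
import fnmatch
import posixpath


def remap(path, aliases, canonical):
    """Replace the first alias-matching prefix of path with canonical."""
    parts = path.split("/")
    for pattern in aliases:
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            if fnmatch.fnmatchcase(prefix, pattern):
                rest = "/".join(parts[i:])
                return posixpath.join(canonical, rest) if rest else canonical
    return path  # no alias matched: keep the path as recorded


def combine(runs, aliases, canonical):
    """Merge {path: set_of_lines} dicts from several runs under remapped names."""
    merged = {}
    for run in runs:
        for path, lines in run.items():
            merged.setdefault(remap(path, aliases, canonical), set()).update(lines)
    return merged


ci_run = {"/ci/build/1234/src/app/core.py": {1, 2, 3}}
local_run = {"/home/ned/proj/src/app/core.py": {3, 4}}
aliases = ["/ci/build/*/src", "/home/*/proj/src"]
merged = combine([ci_run, local_run], aliases, "src")
print(merged)
```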
35:40 Michael Kennedy: I see so that's cool.
35:41 Ned Batchelder: So that during the combination process, it can remap file names to get everything to make sense.
35:47 Michael Kennedy: Yeah, that makes a lot of sense. So speaking of stuff in parallel, what's the story around threading, asyncio, multiprocessing, or even if I just want to start a subprocess? Does it have support for that? Do I have to do anything special?
36:00 Ned Batchelder: Yeah, so that gets complicated too. There are three kinds of concurrency, no, four kinds, that coverage.py supports right out of the box: threading, multiprocessing, gevent, and green, I forget which, one of the other ones, I forget what it's called. There's code in coverage.py that's specifically designed to see that that's happening and do the right thing. You have to tell coverage which one you're going to use, but once you do that in your .coveragerc, it knows how to handle it. If you're running your own subprocesses, it gets a little bit trickier, because you've got Python code in your main process and then you're going to spawn a subprocess, which is a whole new Python process that's sort of jumped outside of what coverage is watching. There's a little bit of support for getting coverage started in that subprocess. You have to do some manual setup, and that's covered in the docs. I would like to make that more automatic, but it feels a little intrusive to have coverage start on every Python process you ever start in the future. I'd rather be conservative about that than suddenly be in the middle of something where people didn't expect it. So we try to support all those different kinds of concurrency. I just got a report that Qt threads don't work, which doesn't surprise me, 'cause they're C threads and so, how do I get...
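One concrete reason threads need special handling: `sys.settrace` installs a trace function only for the calling thread. The standard library's `threading.settrace` registers one to be installed in threads started afterwards. The sketch below, with a made-up `worker` function, shows a tracer seeing nothing inside a new thread until that hook is also set; it illustrates the problem, not coverage.py's own thread support.

```python
import sys
import threading


def worker(results):
    total = 0
    for i in range(3):
        total += i
    results.append(total)


def worker_lines_seen(hook_new_threads):
    """Run worker in a new thread and return the worker line numbers traced."""
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is worker.__code__ and event == "line":
            seen.add(frame.f_lineno)
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)  # affects the current thread only
    if hook_new_threads:
        threading.settrace(tracer)  # handed to sys.settrace in each new thread
    try:
        results = []
        thread = threading.Thread(target=worker, args=(results,))
        thread.start()
        thread.join()
    finally:
        sys.settrace(old)
        if hook_new_threads:
            threading.settrace(None)
    return seen


without_hook = worker_lines_seen(False)
with_hook = worker_lines_seen(True)
print(len(without_hook), len(with_hook))
```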
37:15 Michael Kennedy: Right, it doesn't carry that process across, yeah.
37:18 Ned Batchelder: Right and it gets very fiddly. Some of the worst code in coverage.py is in the way we support those kinds of concurrencies because we need to get involved at the very beginning of a thread or the very beginning of a process and there isn't always support for me to just say, hey next time you start a process, why don't you run a little bit of my code first.
37:37 Michael Kennedy: They're like, no that's not your process.
37:40 Ned Batchelder: Yeah we do some very invasive things there. If you're interested, go and look at multiproc.py.
37:45 Michael Kennedy: Yeah, how interesting. This portion of Talk Python To Me is brought to you by Manning. Python has become one of the most essential programming languages in the world, with the power and flexibility and support that others can only dream of, but it can be tough learning it when you're just starting out. Luckily, there's an easy way to get involved. Written by MIT lecturer Ana Bell and published by Manning Publications, Get Programming: Learn to Code with Python is the perfect way to get started with Python. Ana's experience as a teacher of Python really shines through as you get hands-on with the language without being drowned in confusing jargon or theory. Filled with practical examples and step-by-step lessons, Get Programming is perfect for people who want to get started with Python. Take advantage of a Talk Python exclusive 40% discount on Get Programming by Ana Bell. Just visit talkpython.fm/manning and use the code BellTalkPy, that's BellTalkPy, all one word. One thing that the documentation talks about is this thing called sitecustomize.py.
38:47 Ned Batchelder: Right exactly.
38:48 Michael Kennedy: I had never even heard of this. You're like, oh, you can just put it in here, and I'm like, wait, what is this? Tell everyone about this. It's all sort of part of the startup of Python itself, right?
38:56 Ned Batchelder: Right exactly. So this is about if you're going to run a Python program, how can I make it so that my code runs before your program even starts, right? Because if you're going to run a subprocess and it's a Python program, I want coverage to start before your code starts. One way to do that is for you to change your code so that instead of launching your Python program, you launch it with coverage but that gets very invasive. No one wants to do that. You're not going to run your product by launching coverage so your test would have to be...
39:26 Michael Kennedy: Probably not.
39:27 Ned Batchelder: Yeah, so I looked around for ways to have code run before your main program, and there's no sort of built-in support for it. Perl actually has a command line switch that says run this program but first run something else. Python doesn't, so there are two ways to do it. One is that when you run a program in Python, if it finds a file called sitecustomize.py, it will run it, but I don't know exactly what that file is for. It sounds like the kind of thing that would be recommended against these days, like if you need something for your...
40:01 Michael Kennedy: Seems weird.
40:02 Ned Batchelder: Right, don't put it into a weird file in your site-packages; somehow put that in your main program. But it's there, so one of the ways to get coverage to run before your subprocess starts is by putting something inside sitecustomize.py.
40:14 Michael Kennedy: And can you make that just a file alongside your working directory, the top level of your working directory, or does it have to go somewhere else?
40:21 Ned Batchelder: Honestly, I'm not quite sure. When I have to do this, I use the second technique, because changing a file is scarier than creating a new file, and the second way only involves creating a new file, even though it's perhaps an even more obscure way to get code to run before the start of Python. We don't have to go through all the details, but if you go into your site-packages directory and look for .pth files, path files, they have this very bizarre semantic: if a line in that file starts with the word import, then Python will execute that line, even if the line has way more stuff than just an import statement.
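A minimal sketch of the `sitecustomize.py` hook: write one into a temporary directory, put that directory at the front of `PYTHONPATH`, and check that a child interpreter ran it before its own code. The `STARTUP_HOOK` variable name is made up. coverage.py's documentation on measuring subprocesses describes putting `import coverage; coverage.process_startup()` in such a file together with a `COVERAGE_PROCESS_START` environment variable; the hook here just sets a variable so the sketch runs without coverage.py installed.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

HOOK = 'import os\nos.environ["STARTUP_HOOK"] = "ran"  # runs before the main program\n'

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sitecustomize.py").write_text(HOOK)
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Front of the path, so this file shadows any system-wide sitecustomize.
    env["PYTHONPATH"] = tmp if not existing else tmp + os.pathsep + existing
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('STARTUP_HOOK'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

child_output = child.stdout.strip()
print(child_output)
```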
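The `.pth` behavior can be demonstrated with the standard library's `site.addsitedir()`, which processes `.pth` files in a directory the same way the `site` module does for site-packages at startup: a line starting with `import` is executed, and other lines are treated as path entries. The file name and the `PTH_HOOK` variable are made up; this is a sketch of the mechanism, not coverage.py's installer.

```python
import os
import site
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "startup_hook.pth").write_text(
        # Executed, because the line starts with "import", even though it does more.
        'import os; os.environ["PTH_HOOK"] = "ran"\n'
        # Not executed: treated as a directory, and skipped since it doesn't exist.
        "not_a_real_dir\n"
    )
    saved_path = list(sys.path)
    try:
        site.addsitedir(tmp)
    finally:
        sys.path[:] = saved_path  # undo the sys.path entries addsitedir added

pth_result = os.environ.get("PTH_HOOK")
print(pth_result)
```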
41:01 Michael Kennedy: Weird.
41:02 Ned Batchelder: Yeah super weird, it's super weird.
41:04 Michael Kennedy: And you can name it anything just like...
41:06 Ned Batchelder: Yeah, anything .pth. This is all part of how site-packages and the path get set up. Just a little bit of backstory: working on coverage is fascinating, because the programming itself is interesting, but you also get to discover all sorts of really weird dark corners of the Python world. I've gotten in the habit of, every time a new alpha of CPython is announced, getting it as soon as I can, building it, and running the coverage.py test suite, and some of the core developers are used to it, like, we better get the alpha out there so that Ned can run the coverage test suite and tell us what we broke. So I think 3.6 RC3 was because of me, because of a bug I reported from the coverage.py test suite. Yeah, there are a lot of weird dark corners and a lot of kind of gross hacks to make things work. It's very difficult. For instance, I said coverage run works just like Python. Well, it's not easy to make a program that will run your Python program just the way Python runs your Python program, and there are some extensive tests in the coverage.py test suite that actually run the two side by side and then do a bunch of comparisons of the environment in which the code finds itself, to try to assert that it does run it the same way.
42:15 Michael Kennedy: Yeah I don't envy the job of keeping that compatibility working.
42:20 Ned Batchelder: Yeah well luckily we have a good test suite.
42:23 Michael Kennedy: Yeah well I bet it has good code coverage as well.
42:26 Ned Batchelder: Yeah well the sad thing is that there's code inside coverage.py that because it's at the very heart of how coverage measurement gets done, cannot itself be coverage measured. So my code coverage is at about 94%.
42:38 Michael Kennedy: That's still pretty good.
42:38 Ned Batchelder: That's still pretty good.
42:40 Michael Kennedy: But yeah, that's quite ironic.
42:42 Ned Batchelder: It is ironic, you'd think like me of all people, I'd have 100% test coverage and I've thought about really weird hacky ways of trying to get at that stuff but it's just not worth it.
42:50 Michael Kennedy: Yeah I totally hear you. So you spoke a little bit about the complexities and challenges of making it work. Let's dive in a little bit to how this actually works, right? It seems like magic that I type your command line thing and then numbers all spit out of well here's exactly how your code ran, like how does that work?
43:08 Ned Batchelder: Yeah, so at the very heart of it is a feature of CPython called the trace function. If you look in the sys module, there's a function called settrace, and you give it a function and it will call your function for every line of Python that gets executed, and that function can do whatever it wants. It is the basis for debuggers, for profilers, and for coverage measurement tools, and for some other things, like tools that will let you run your program and have it just print every line of code that gets run, so you can get a global log of everything that happened.
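A minimal sketch of the `sys.settrace` mechanism: install a trace function, call some code, and log every `line` event. The helper `trace_lines` and the `greet` function are made up. The installed function is called for each new frame, and whatever it returns becomes that frame's local tracer, so returning the tracer itself keeps line events flowing.

```python
import sys


def trace_lines(func, *args):
    """Call func(*args) with a trace function that logs every line it runs."""
    log = []

    def tracer(frame, event, arg):
        if event == "line":
            log.append((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep receiving events for this frame

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old)
    return result, log


def greet(name):
    message = "Hello, " + name
    return message.upper()


value, lines = trace_lines(greet, "Ned")
print(value, lines)
```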
43:42 Michael Kennedy: Right some of those like sort of inspectors that will print out like an execution of your code as it runs like the actual lines that are running. Yeah, those are pretty neat.
43:50 Ned Batchelder: Yeah, exactly. Doug Hellmann wrote a full one called Smiley.
43:52 Michael Kennedy: Smiley, okay.
43:53 Ned Batchelder: It does some cool things; yeah, it spies on your code. So at the heart of it is that trace function. For instance, if you've ever had to break into a debugger from your code and typed import pdb; pdb.set_trace(), the reason it's called set_trace instead of what it should be called, you know, break into the debugger, is because that's the function where pdb calls sys.settrace to get its trace function in place, so that it will get told as lines get executed, and then you're debugging. So that's the core CPython feature, and you can go and write, you know, 10 lines of code that do interesting stuff with the trace function. It's actually pretty simple. Coverage.py has way more lines of code than that, but at its heart, that's what it's doing. It sets a trace function, then it runs your code, and as its trace function gets invoked with, you know, this line got executed, this line got executed, etc., it records all that data and dumps it into a data file, and that's the run phase of coverage.py. All that raw data then gets picked up when you say coverage report. It reads that data and analyzes your source to figure out what could have been run, right? In that first phase, all we know is what got run. The analysis phase is: let's look at the source code. How many lines are there? What could get run? And then essentially it just does a big set subtraction. Here are the lines that could've been run. Here are the lines that did get run. What's left over are the lines that didn't get run. I mean, conceptually...
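To make the run-then-analyze split concrete, here is a toy version for a single function: a trace function records executed lines (the run phase), the compiled line number table supplies the lines that could run (the analysis phase), and a set subtraction gives the misses. The names `measure` and `classify` are hypothetical; real coverage.py works per file, handles nested code objects, and does far more bookkeeping.

```python
import dis
import sys


def measure(func, *args):
    """Return (executed, missed) line numbers for one call of func."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    # Run phase: record what actually executed.
    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)

    # Analysis phase: what could have executed, per the line number table.
    possible = {line for _, line in dis.findlinestarts(code) if line is not None}
    # The `def` line never gets a line event (3.11+ puts a RESUME there).
    possible.discard(code.co_firstlineno)
    return executed, possible - executed


def classify(n):
    if n > 0:
        label = "positive"
    else:
        label = "non-positive"
    return label


ran, missed = measure(classify, 5)
first = classify.__code__.co_firstlineno
print(sorted(n - first for n in missed))
```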
45:18 Michael Kennedy: That's interesting, one thing that you did talk about in how it works, in the documentation, is that you actually use the pyc files as part of this, right? You've got to look in there for the line table, the line numbers and stuff which is kind of interesting.
45:31 Ned Batchelder: Yeah, exactly, and that's one of the things that's actually changed a number of times over the course of coverage.py's life: how do we know what could have been executed? The question of what did get executed is kind of straightforward, 'cause we get it from the trace function. There's not much we can do there. I mean, there's a lot to do there, but there's essentially only one way to know what got run. Figuring out what could have been run, there's a bunch of different ways. As you mentioned, compiled into .pyc files is a line number table that tells Python which lines are executable, and that's how Python actually decides when a new line has been reached. The interpreter is executing bytecodes, and it calls the trace function when the line number table says this bytecode is the first bytecode on a new line.
46:17 Michael Kennedy: Right, because if I write, like, print('Ned', 'Batchelder'), that's actually multiple instructions. If I disassembled that, it'd be like load the first string onto the stack, load the second string, execute the function, get the return, all that type of stuff, right?
46:36 Ned Batchelder: That's right, so the trace function only gets called for lines, and so we're looking at the same information the Python interpreter uses for deciding where the lines are. It gets more involved than that because, for instance, for branch coverage, we have to decide where the branches are, and we do that by analyzing the Abstract Syntax Tree. We used to do it by analyzing the bytecode, back when I thought, oh well, the bytecode will have all the ifs and jumps I need, but it doesn't, and it never did. When the async keywords got added, they were completely different from everything else, and I said, okay, fine, we're getting rid of all this bytecode analysis and I'm going to do an AST analysis, 'cause I understand the AST, whereas I don't understand the bytecode.
47:13 Michael Kennedy: All right, so it really does support async and await?
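A small experiment, assuming CPython, to see the line number table at work: the single `print("Ned", "Batchelder")` line below compiles to several bytecode instructions, but the trace function receives only one `line` event for it. `dis.findlinestarts()` exposes where each line starts in the bytecode; the function `shout` and the helper names are made up, and exact instruction names vary between Python versions.

```python
import dis
import io
import sys
from contextlib import redirect_stdout


def shout():
    print("Ned", "Batchelder")


def instructions_on(code, lineno):
    """Count bytecode instructions the line table attributes to lineno."""
    starts = dict(dis.findlinestarts(code))
    current, count = None, 0
    for ins in dis.get_instructions(code):
        if ins.offset in starts:
            current = starts[ins.offset]  # a new line begins at this instruction
        if current == lineno:
            count += 1
    return count


def count_line_events(func, lineno):
    """Return how many 'line' trace events fire for lineno during func()."""
    hits = 0

    def tracer(frame, event, arg):
        nonlocal hits
        if frame.f_code is not func.__code__:
            return None
        if event == "line" and frame.f_lineno == lineno:
            hits += 1
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        with redirect_stdout(io.StringIO()):  # swallow the printed output
            func()
    finally:
        sys.settrace(old)
    return hits


target = shout.__code__.co_firstlineno + 1  # the print line
instructions_on_line = instructions_on(shout.__code__, target)
line_events = count_line_events(shout, target)
print(instructions_on_line, line_events)
```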
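A minimal sketch of finding branch points from the AST rather than the bytecode, using only the standard `ast` module: for each `if`, report where the true path and the false path go. With no `else`, the false path falls through to the next statement in the same block; `None` marks falling off the end of a block, which a real tool would resolve further (to a loop top or the function exit, say). The `if_branches` name and the sample are hypothetical.

```python
import ast


def if_branches(source):
    """Map each `if` line to (true_target_line, false_target_line)."""
    branches = {}
    for parent in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(parent, field, None)
            if not isinstance(stmts, list):
                continue  # e.g. the `body` of a lambda is an expression
            for i, stmt in enumerate(stmts):
                if not isinstance(stmt, ast.If):
                    continue
                true_target = stmt.body[0].lineno
                if stmt.orelse:
                    false_target = stmt.orelse[0].lineno
                elif i + 1 < len(stmts):
                    false_target = stmts[i + 1].lineno  # no else: fall through
                else:
                    false_target = None  # falls off the end of the block
                branches[stmt.lineno] = (true_target, false_target)
    return branches


SAMPLE = """\
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    else:
        return x
"""
branch_map = if_branches(SAMPLE)
print(branch_map)
```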
47:15 Ned Batchelder: Yeah you know, I have to be honest, when that first came out, I didn't know anything about async and await. I didn't understand how to write programs that use them and I sort of took the low tech, lazy maintainers approach which is I'll wait for the bug reports to come in about how it doesn't work for those things and no one has ever said it doesn't work, so I guess it works?
47:35 Michael Kennedy: Yeah all right, that'll do it. Cool, so I guess one question is, calling functions in Python is not one of the fastest things that Python does relative to say other languages. So if you were calling this settrace function on every single line, like how does that not just stop the execution basically?
47:52 Ned Batchelder: Yeah, so, well, the really simple answer is that if you write a Python function and set it as your trace function, then yeah, you're going to have a significant performance hit, because it's going to call a Python function for every line of your code, and if that Python function does anything interesting at all, then it's going to do a lot of work for every line of your code. Inside coverage, we actually have two implementations of the trace function, one in Python and one in C, and the one that's in C is there to make it go faster. So there's a bunch of work to try to do as little work as possible on each call. For instance, when we get told that we're calling a new function, we do a bunch of work to figure out whether this function, or this file rather, is a file that we're interested in at all, and if it's not...
48:37 Michael Kennedy: Have you already figured out how much coverage it has? Like if it's 100 right, then don't do it again.
48:41 Ned Batchelder: Well, we don't know that. At this point we're in the run phase; we have no idea about numbers like that. All we know is this file's getting executed. Is our current configuration telling us that that file is interesting, or is it not interesting? If it's not interesting, then we try to quickly get out of that function and set some bookkeeping so that the trace function doesn't get called again until we return from the function, for instance. So there's a lot of work to try to make the trace function as fast as possible, because especially for mature test suites, for the types of people that are running coverage, the one thing I can tell you about all of those test suites is that the developers wish they ran faster. I don't need to know anything about your test suite: if you've got a lot of tests, you want them to run faster. So the last thing I want to do is make them go even slower. I don't know what the typical overhead of coverage on a test suite is, 'cause it's going to vary so much depending on what else those tests are doing, but the C code is there to try to make it go as fast as possible.
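A pure-Python sketch of the "get out quickly" idea, relying on standard `sys.settrace` rules: the global trace function is called once for each new frame, and if it returns `None`, that frame produces no `line` events at all. So the interesting-or-not decision is paid per call instead of per line. The file names and the `make_tracer` helper are made up; coverage.py's C tracer does its own bookkeeping on top of this.

```python
import sys


def make_tracer(is_interesting, recorded):
    def local_trace(frame, event, arg):
        if event == "line":
            recorded.add((frame.f_code.co_filename, frame.f_lineno))
        return local_trace

    def global_trace(frame, event, arg):
        # Called once per new frame ("call" event), not once per line.
        if is_interesting(frame.f_code.co_filename):
            return local_trace
        return None  # no line events at all for this frame

    return global_trace


# Two functions compiled under different, made-up filenames.
ns = {}
exec(compile("def mine():\n    a = 1\n    return a\n", "my_project/app.py", "exec"), ns)
exec(compile("def lib():\n    b = 2\n    return b\n", "site-packages/lib.py", "exec"), ns)

recorded = set()
old = sys.gettrace()
sys.settrace(make_tracer(lambda name: name.startswith("my_project/"), recorded))
try:
    ns["mine"]()
    ns["lib"]()
finally:
    sys.settrace(old)

print(sorted(recorded))
```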
49:38 Michael Kennedy: I see, yeah that makes a lot of sense.
49:40 Ned Batchelder: The pure Python implementation is there for, for instance, PyPy which can't run C extensions.
49:45 Michael Kennedy: Right, but after a while it'll start to speed up, as it decides the code has to be JITted after like 1000 lines or something.
49:52 Ned Batchelder: I guess so. PyPy is still in the magic category for me, and I don't know whether a trace function will actually get the JIT applied. What I do know is that a trace function is not run like regular code, and I don't know if PyPy can JIT a trace function, for instance.
50:14 Michael Kennedy: Yeah, I have no idea either.
50:16 Ned Batchelder: Yeah, I don't know. There are a few speed ups in there specifically for PyPy. Basically when a PyPy developer tells me, hey why don't you say this magic thing that you don't understand and PyPy will go a little bit faster and I'm like, great sure, I'll put it in.
50:30 Michael Kennedy: Yeah, PyPy is interesting. So I guess, you know, that's a good segue over to which runtimes are supported. You mentioned Jython, you mentioned PyPy, obviously CPython. What versions of CPython, what else?
50:42 Ned Batchelder: Yeah so the current release version of coverage.py is 4.5.1 and that supports Python 2.6 and Python 3.3 and up. So 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, 3.7. The tip of the repo now, I'm already working on coverage 5.0 alphas and I've dropped support for 2.6 and for 3.3, but the released version can do 2.6. For a while I was very proud of supporting 2.3 through 3.3 which is kind of an interesting trick but I don't have to do that anymore. So PyPy both 2 and 3 are supported. I have run tests on Jython and IronPython. They can't do all of it because they can't do the analysis phase 'cause they don't have AST but you can run your code under Jython and IronPython for the measurement phase and then use CPython to do the analysis phase.
51:37 Michael Kennedy: I see.
51:38 Ned Batchelder: I don't know how well that works. I haven't heard a lot from people about it. I don't know how much those Jython and IronPython are getting used these days.
51:44 Michael Kennedy: There's another one called Python.NET. I actually haven't used it for anything, but I think it's a successor type of thing. A little bit different than IronPython, but similar.
51:51 Ned Batchelder: Is it, I need to look into this. I need more complication in my life.
51:57 Michael Kennedy: Yeah, I haven't done a lot with it, but Python.NET seems like a more maintained, different take on what IronPython was, but I've pretty much exhausted my knowledge of it now.
52:07 Ned Batchelder: All right, cool.
52:08 Michael Kennedy: My question earlier was, what are your plans on supporting old versions, right, like so if CPython stops supporting 2.7, are you going to stop supporting it or are you going to keep...
52:18 Ned Batchelder: Which it will.
52:19 Michael Kennedy: Yeah that's certainly going to happen soon enough or you know, say 3.4, like what's the timing? Do you try to stay farther behind? What's the thing?
52:26 Ned Batchelder: Like I said at the top of the show, I'm pretty inertial so I tend to support things for longer. So my philosophy about coverage.py development is that I want coverage.py to be available for anyone who's trying to make their code work better for Python and the environments they care about, I will care about. So for instance, I first ported coverage.py to 3.0 in 2009 which is pretty early and it was pretty rough around the edges but I thought, you know, if people are going to start porting their libraries to Python 3.0, it'd be super useful if coverage.py were there to help them understand how their tests were doing. So I tried really hard to get ahead of it and for that reason also, coverage.py has no installation prerequisites because if I pre-require a library, then that library has to get into a new environment before coverage.py can and then we've got a chicken and egg problem. It also gives me a principled reason to have written my own HTML template engine which is fun.
53:26 Michael Kennedy: That sounds like a pretty challenging thing. Also challenging, another runtime, if you want to call it that, I guess, that seems interesting is Cython. Does this have any, but it seems to me, my first guess is that it wouldn't work with Cython but what's the story there?
53:41 Ned Batchelder: I should really know more about Cython and in fact, it's been proposed to me that instead of writing C code for the trace function, I should write Cython for the trace function which I've just been resistant to because like I said, I'm inertial. So my understanding of Cython is that it compiles to C code and that when you're running, you're running C code so you need a C coverage measurement tool in order to understand what's happening there?
54:04 Michael Kennedy: Yeah it compiles to a C file which then compiles, at least on Mac, to a .so file and the .so file is running.
54:12 Ned Batchelder: Right, coverage.py won't be able to see any of that execution. A C coverage tool could, but then you'd have to figure out how to map that back to the Cython code that you're actually interested in, which actually brings up an interesting point. One of the things that coverage.py has gotten in its 4.0 releases is plugins, so that you can do coverage measurement of things that are not directly Python but result in Python. For instance, there's a Django coverage plugin that can measure the coverage of your Django templates, because Django templates themselves have if statements and for loops in them, so that's code. You'd like to know if there's coverage there, and so there's a plugin, a Django coverage plugin, that works with coverage.py to basically understand how Django executes the templates, take the raw Python information about the template execution, and map it back to the lines in the Django template, so that you can get sort of a red-green Christmas tree of your Django template.
55:18 Michael Kennedy: Oh that's cool.
55:19 Ned Batchelder: Yeah it's very cool and it was getting a little stale but I've just gotten a new maintainer to take it over and so it looks, they just had a release today actually.
55:27 Michael Kennedy: Oh great, all right, it's interesting. It seems to me like there's probably enough information there, but that it's really not super straightforward, so for Cython, maybe...
55:35 Ned Batchelder: For Cython would do it.
55:36 Michael Kennedy: Yeah, I mean, the C file is huge relative to the size of the Python code, but I do believe it has line numbers that map back to the Python, so it's possible, I guess.
55:46 Ned Batchelder: The reason that reminded me of the plugins was because I also wanted to make a Mako plugin. Mako is another HTML templating engine, and Mako and Django work completely differently. Mako actually compiles your template to a Python file and then runs the Python file, whereas with Django templates, there's Python code that's actually running on an Abstract Syntax Tree of the template. So in that sense, Mako templates are kind of compiled to Python and Django templates are kind of interpreted in Python, and so the plugin technologies had to be completely different. The compiled Mako template has pointers back to the lines in the template, but they're a little bit inaccurate, and so I was frustrated at not being able to close that loop because of a limitation in Mako. And Mako seems kind of unmaintained at this point, so it didn't go anywhere.
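A rough sketch of the compiled-template approach: a hypothetical template compiler emits Python source plus a table mapping generated lines back to template lines, and a trace function records which generated lines ran. The template, the generated code, `LINE_MAP`, and `render_with_template_coverage` are all invented for illustration; this is not how Mako or the Django coverage plugin actually work internally.

```python
import sys

# Hypothetical template (Django-like syntax, shown for reference only):
#   1  <h1>Hello</h1>
#   2  {% if user %}
#   3    <p>Welcome back, {{ user }}</p>
#   4  {% else %}
#   5    <p>Please log in</p>
#   6  {% endif %}

# Hypothetical Python a template compiler might emit for it.
GENERATED = """\
def render(user):
    out = []
    out.append("<h1>Hello</h1>")
    if user:
        out.append("<p>Welcome back, " + user + "</p>")
    else:
        out.append("<p>Please log in</p>")
    return "".join(out)
"""

# Generated line -> template line: the "pointers back" to the template.
# Generated line 6 ("else:") is omitted because in CPython it typically has
# no bytecode of its own, so it never produces a line event.
LINE_MAP = {3: 1, 4: 2, 5: 3, 7: 5}

def render_with_template_coverage(user):
    """Render, returning (html, set of template lines that executed)."""
    namespace = {}
    exec(compile(GENERATED, "<template>", "exec"), namespace)
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != "<template>":
            return None  # ignore frames outside the compiled template
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        html = namespace["render"](user)
    finally:
        sys.settrace(None)
    return html, {LINE_MAP[n] for n in hit if n in LINE_MAP}
```

Calling it with a non-empty user reports template lines 1, 2, and 3; with an empty user, lines 1, 2, and 5. If the map's pointers were off by a line, the report would be wrong in just the way Ned describes for Mako.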
56:34 Michael Kennedy: Yeah, yeah interesting. It's cool that those are there though if people want to use 'em.
56:38 Ned Batchelder: Yeah, yeah it's very gratifying.
56:40 Michael Kennedy: Yeah, so last question before we run out of time 'cause that's where we are, what's next? What are the upcoming features?
56:46 Ned Batchelder: Right, so as I mentioned, I've already released 5.0 alpha 2. The big feature that's coming up is something I colloquially call "who tests what": instead of just telling me that a line got run, tell me which tests actually ran that line. People are interested in this for all sorts of reasons, but it can require some significant changes to the core of coverage.py. It's going to present some challenges, in that if you have 1000 tests, you have to collect roughly 1000 times as much data, because it's essentially as much data as if you did a separate coverage run for every single one of your tests. There are some tools out there that already do this by doing exactly that, running coverage independently around each test, but coverage.py will do it differently because it's all bundled together. So there's an alpha out now, alpha 2, which has switched the data format from a JSON file to a SQLite database. I needed the SQLite database because if I'm going to dump 1000 times more data, I want to dribble it into a database, and I want to give you a database that you can query, because I don't know how people are going to use the data.
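Here is a minimal standard-library sketch of the "who tests what" idea: each test runs under its own line tracer, and every (test, file, line) triple goes into SQLite so it can be queried afterward. The `line_hits` table and the function names are hypothetical, and coverage.py's actual database schema is different, so treat this as an illustration of the idea rather than its implementation.

```python
import sqlite3
import sys

def lines_run_by(test):
    """Run one test function under a line tracer; return {(filename, lineno)}."""
    hits = set()

    def tracer(frame, event, arg):
        if event == "line":
            hits.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(None)
    return hits

def collect_per_test(tests, db):
    """Record which lines each test executed in a hypothetical line_hits table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS line_hits ("
        " test TEXT, file TEXT, line INTEGER, UNIQUE (test, file, line))"
    )
    for test in tests:
        rows = [(test.__name__, f, n) for f, n in lines_run_by(test)]
        db.executemany("INSERT OR IGNORE INTO line_hits VALUES (?, ?, ?)", rows)
    db.commit()

def which_tests_ran(db, filename, line):
    """Answer 'which tests ran this line?' with a plain SQL query."""
    cur = db.execute(
        "SELECT test FROM line_hits WHERE file = ? AND line = ? ORDER BY test",
        (filename, line),
    )
    return [row[0] for row in cur]

# Tiny example: code under test plus two tests.
def absolute(n):
    if n < 0:
        return -n
    return n

def test_positive():
    assert absolute(3) == 3

def test_negative():
    assert absolute(-3) == 3

db = sqlite3.connect(":memory:")
collect_per_test([test_positive, test_negative], db)
```

Because the data lives in a database, questions nobody anticipated (which lines are run by exactly one test, which test touches the most files) are just more SQL, which is the flexibility Ned is after.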
57:49 Michael Kennedy: Right, do indexes on it and all that stuff.
57:51 Ned Batchelder: I have no idea how to report on this data, because I'm not going to make an HTML file that lists, for every line of your source code, the names of the 200 tests that ran it. So we need to get this out there, into people's hands, and start seeing what people do with it. That's the big feature that's coming up.
58:06 Michael Kennedy: That's cool, one of the most interesting things I think you could do with that, just listening to you describe it, would be if I have 1000 tests and I change a little bit of my Python code and only three of those tests actually interact with those lines, theoretically, I could just run those three tests to retest it, not all 1000, which would be dramatically awesome.
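Michael's idea of rerunning only the tests that touch changed lines can be sketched on top of per-test line data. The parser below reads unified-diff hunk headers to find which old-file lines a change touched, then selects tests whose recorded lines intersect them. The function names and the `coverage_by_test` dict shape are assumptions; a real tool would also have to handle multiple files, moved code, new files, and changes to imports or configuration.

```python
import re

# Matches hunk headers like "@@ -10,3 +10,3 @@"; group 1 is the old-file start line.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")

def changed_old_lines(diff_text):
    """Old-file line numbers touched by a unified diff of a single file."""
    changed = set()
    old_line = None  # None until the first hunk header is seen
    for raw in diff_text.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            old_line = int(header.group(1))
        elif old_line is None or raw.startswith("\\"):
            continue  # file headers before the first hunk, or "\ No newline" markers
        elif raw.startswith("-"):
            changed.add(old_line)  # removed or modified line
            old_line += 1
        elif raw.startswith("+"):
            # Pure insertion: conservatively blame the old line just before it.
            changed.add(max(old_line - 1, 1))
        else:
            old_line += 1  # context line
    return changed

def select_tests(coverage_by_test, filename, diff_text):
    """Tests whose recorded (filename, line) pairs intersect the diff."""
    touched = {(filename, n) for n in changed_old_lines(diff_text)}
    return sorted(name for name, lines in coverage_by_test.items() if lines & touched)
```

Blaming the preceding line for pure insertions errs toward running more tests; missing a relevant test is usually a worse failure than running an extra one.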
58:26 Ned Batchelder: Right, exactly. There are also some tools out there that do that now, too. Kirk Strasser has one, and the name of the tool escapes me at the moment, but he's got his own trace function that essentially serves as a mini coverage measurement of his own to get at that information. The thing I'd be interested to experiment with, with that information, is: what if my code is over-tested? If my test suite is taking too long, maybe that's because I've got 100 tests all getting at the same information, and if I can reduce it to 10 tests, then it would take a tenth of the time. I honestly have no idea whether who-tests-what will give me the information I'd need for that, but it'd be interesting to play around with.
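One way to start exploring the over-testing question is a greedy set-cover heuristic: repeatedly keep the test that adds the most not-yet-covered lines, and flag whatever is left over as candidates for review. This is a standard heuristic, not a coverage.py feature, and the function names are hypothetical. Identical line coverage does not mean identical checking, since two tests can run the same lines while asserting different things.

```python
def greedy_covering_subset(coverage_by_test):
    """Greedily choose tests until their union matches the whole suite's coverage."""
    remaining = set().union(*coverage_by_test.values())
    candidates = dict(coverage_by_test)
    chosen = []
    while remaining:
        # Most newly covered lines wins; iterating in sorted order makes
        # max() break ties by test name.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & remaining))
        remaining -= candidates.pop(best)
        chosen.append(best)
    return chosen

def possibly_redundant(coverage_by_test):
    """Tests adding no coverage beyond the greedy subset: review them, don't auto-delete."""
    keep = set(greedy_covering_subset(coverage_by_test))
    return sorted(set(coverage_by_test) - keep)

coverage_by_test = {"t_a": {1, 2, 3}, "t_b": {1, 2}, "t_c": {3, 4}, "t_d": {4}}
```

Here `t_a` and `t_c` together cover every line the suite covers, so `t_b` and `t_d` are flagged. Greedy set cover is not guaranteed to find the smallest subset, but it is cheap and usually close enough for a first look.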
59:07 Michael Kennedy: Yeah it sounds really positive.
59:08 Ned Batchelder: Yeah I'm looking forward to having the data at least.
59:10 Michael Kennedy: Yeah cool, all right. Let me ask you the two final questions before you get out of here. You hinted at this at the beginning, if you're going to write some Python code, what editor do you use?
59:20 Ned Batchelder: I use Vim and I mentioned earlier that I am old and I do not use Vim because I am old. I've only been using Vim for about 10 years, so I got to it late in life but it really suits my low tech mentality, I think.
59:34 Michael Kennedy: Oh, beautiful. And beyond coverage.py, what's a notable PyPI package that maybe people haven't heard of, but you're like, this is awesome, you should know about this?
59:46 Ned Batchelder: Yeah, so I'll tell you the one. This is the very last thing that happened to me at PyCon this year. I was literally almost dragging my suitcase out of the convention center to go catch a plane when I stopped by the Pylint sprint and heard about a package called check-manifest. It only does one little thing, but it's a thing that no one cares about enough to get right themselves, so it's great to have a helper: it tells you whether the MANIFEST.in that you wrote for your setup.py includes all of the files from your working tree or not.
01:00:19 Michael Kennedy: Okay so really sort of a check on your package before you ship it off.
01:00:24 Ned Batchelder: Exactly, and packaging is one of those things that, like, everyone hates. It's no one's first love. No one wants to think about it. It's very confusing. What's a MANIFEST.in? Why is that different from package data? I don't get it. And so check-manifest just does one little thing, and it's beautiful, and I had never heard of it before, and it seems like people should be screaming it from the rooftops.
01:00:43 Michael Kennedy: Yeah awesome, well that's exactly the thing I'm looking for, thanks for sharing that.
01:00:46 Ned Batchelder: Oh, and a second library is tqdm, T-Q-D-M, which is a progress bar library and is very cool.
01:00:52 Michael Kennedy: Yeah, I really like that one. I've been using those types of progress bars lately and they're pretty cool, nice. Okay, final call to action. People are excited about coverage, maybe they're even excited about edX. What do you want to leave folks with?
01:01:03 Ned Batchelder: Read about coverage. I've got some docs that I think are good but that's 'cause I wrote them, so.
01:01:07 Michael Kennedy: I can confirm that. They were very good. I went through the docs to do a lot of research for this show, and they were at the right level: they told me what I wanted to know, but weren't so long that I couldn't get through them. That was perfect.
01:01:16 Ned Batchelder: All right, good, that's good to hear. I hang out on the Python IRC channel and I love to see people there. I think it's a great way to connect with people. I like to think of it as a nice IRC channel, so if you've been to IRC and didn't like it in the past, try the Python IRC channel on Freenode. There's also BostonPython.com and OpenEdX.org, and I'm NedBat on Twitter, so you can follow me. I've got a blog that I've been running for far too long if you want to read what I thought about 16 years ago. Get in touch; I like hearing from people.
01:01:46 Michael Kennedy: Yeah that's awesome. The internet is written in ink, right? All that stuff's still there. So I definitely find your blog interesting. There's some topics in there I would love to have you back on to talk about, but for this one, we're going to have to just leave it here, I think. So thanks for being on the show, Ned.
01:01:59 Ned Batchelder: Sure, thank you Michael, this was great.
01:02:00 Michael Kennedy: You bet, bye.
01:02:01 Ned Batchelder: Bye.
01:02:03 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest on this episode has been Ned Batchelder and it's been brought to you by Brilliant.org and Manning. Brilliant.org wants to help you level up your math and science through fun guided problem solving. Get started for free at talkpython.fm/brilliant. Learning Python doesn't have to be overwhelming or intimidating. Check out Get Programming by Ana Bell from Manning. Just visit talkpython.fm/manning and use the code BellTalkPy to get 40% off. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new 100 Days of Code in Python and if you're interested in more than one course, be sure to check out the everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thank you so much for listening. I really appreciate it. Now get out there and write some Python code.