
#63: Validating Python tests with mutation testing Transcript

Recorded on Wednesday, Jun 15, 2016.

00:00 Do you think it's a good idea to test your software? Do you write unit tests or other automated verification for code? I think most of us do these days. A key question is how do you know whether your tests sufficiently verify your code? The standard answer is code coverage.

00:00 But there is a difference between executing code (which code coverage measures) and truly verifying it.

00:00 On this episode, we'll talk with Austin Bingham. He created a mutation testing framework for Python that goes beyond code coverage to actually perform this verification. It's a fresh and powerful idea. I hope you enjoy it!

00:00 This is Talk Python To Me, episode 63, recorded June 15th, 2016.

00:00 [music intro]

00:00 Welcome to Talk Python To Me, a weekly podcast on Python- the language, the libraries, the ecosystem and the personalities.

00:00 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and Snap CI, thank them for supporting this show on Twitter via @hired_hq and @snap_ci.

00:00 Hey everyone. We have an interesting deep dive into the world of Python testing and Python internals today. Before we chat with Austin about mutation testing and his Python library called Cosmic Ray, I have a few goodies to give away to a lucky couple of listeners. First, Austin and his co-author Rob are giving away a copy of their book The Python Apprentice, as well as two free passes to their online Python course. As always, just visit talkpython.fm and make sure you are a friend of the show to be eligible to win. I'll pick three lucky winners next week. Now, let's meet Austin.

02:01 Michael: Austin, welcome to the show.

02:02 Austin: Thanks Michael. I'm really glad to be here.

02:05 Michael: Yeah, I'm super excited to share this mutation testing idea that you have sort of manifested in Python, that's really cool. We'll talk a lot about that. Before we get into it though, what's your story, how did you get into Python and programming?

02:16 Austin: Well, how I got into programming was when I was quite young, I guess around ten years old. We had a computer around the house, it was an IBM 80 or something along those lines, I forget exactly the model, and it could be programmed in BASIC, and that really caught my attention. My parents got me some magazines and so forth that taught me how to do more complicated things than I could figure out on my own, and it sort of took off from there. As for Python, I have tried to figure out where I first started using it, and to be honest I am not entirely sure. I think it was around graduate school though, so this would have been in the late 1990s.

02:52 Michael: Nice. What did you study in grad school?

02:54 Austin: That was software engineering, at the University of Texas at Austin. We were doing all sorts of stuff related to artifact traceability and large scale software systems, and somewhere in there Python showed up for a build system or something along those lines, and it really caught my attention. It started there and grew and grew with my career, and it has shown up everywhere since then to a larger and larger degree, so it's something that I have really enjoyed using for the past, I guess, 20 years at this point or so.

03:23 Michael: Yeah, that's definitely a while, that's almost from the beginning right, not quite but it's pretty close.

03:27 Austin: Not quite at the very beginning but yes, it's been a long time, yeah.

03:31 Michael: Yeah, awesome. So, we were in Oslo, Norway together last week with a bunch of other speakers and developers at NDC, the Norwegian Developers Conference, and I would say that you and I were the guys carrying the Python flag; we were kind of the Python guys in a sea of other types of folks, right?

03:55 Austin: Yes, that's very true. Traditionally Python doesn't have a large footprint at that conference, so you and I definitely were the diplomats, the ambassadors for Python, I think. But I was surprised at how much interest there was in it among the other delegates; a lot of people have some glancing experience with it and they were, I think, interested to learn more about it. So I think it's a growing topic of interest even at conferences and venues that are not traditionally Python heavy.

04:25 Michael: Yeah, I have done Python talks at several conferences that were I would say decidedly not Python conferences, and they've been received really, really well and I think it's just one more manifestation, one more piece of evidence that Python is really a growing ecosystem.

04:40 Austin: Yes, yeah, very much so, and it's just getting more popular every year, it's incredible.

04:46 Michael: Yeah, it is- I say this a lot on the show, but it's amazing to me that the language grew at a pretty respectable but not insane rate for a really long time, kind of germinated, and then, you know, just caught fire in the 2000s. It's cool.

05:06 Austin: Now is the time for Python, yeah.

05:06 Michael: It is definitely the time. The reason I brought up NDC is that you had a really cool presentation there on this concept, which is a general programming concept, it's available in Java and maybe some other languages, I am not sure, called mutation testing. So, I've done a lot of unit testing and other kinds of testing, and I've heard of genetic algorithms, so maybe it's related to genetic algorithms- actually I know that it's not, necessarily. Why don't you tell us what mutation testing is? I want to share it with the audience.

05:34 Austin: Sure. You can think of mutation testing, in essence, as a test for your tests. The main goal of mutation testing is to gauge the effectiveness of your existing tests. So if you take the theoretically perfect standpoint that you have a test suite that tests 100% of your functionality, at least in principle, then mutation testing can tell you whether your test suite actually does test your functionality. You can find holes in your test suite, and it can also help you find code in your code base that isn't tested and maybe can just be removed, because it doesn't actually contribute to any real functionality. Mutation testing, as you said, has nothing to do with genetic algorithms; it doesn't try to search out failing test cases or anything like that. It's a very dumb algorithm: it systematically makes small modifications.

06:28 Michael: It's kind of exhaustive, right.

06:31 Austin: Exactly, right, it's an exhaustive search through a pretty large space to try to trick your tests into passing a mutant. The basic idea is that you make these very small changes to your code base and then run your test suite, and if your test suite passes, then we say that the mutant survived. This is what you don't want; it means that your test suite is incapable of detecting, or doesn't have the fidelity to detect, the change you've made, which we consider an error.

06:58 Michael: Ok, so before we get into that though, you said something I thought was interesting: you could have 100% code coverage and yet your tests are not doing their thing. So I think there are layers, or levels rather, of verifying your tests. Writing tests is level 1: you have tests, they exist. Level 2 of enlightenment would be that you have a significant amount of code coverage, because without code coverage you could have a thousand tests but they could all be about some small, useless part of your app, and an important core section might not actually be tested. But this supposes you are at level 2 enlightenment, right: you have tests, and you have good, maybe not 100%, but pretty solid code coverage. And now you want to ask whether this is actually- there is a difference between executing code and verifying code, right?

07:51 Austin: Right, and this is a really important distinction that mutation testing gets to the heart of. As you say, you could have 100% coverage in the sense that your test suite causes 100% of your instructions to be executed, however you define that instruction set, but traditional coverage doesn't tell you whether or not your tests are verifying the functionality. So you could have a glaring defect in your program that your test suite is exercising but not actually verifying. Mutation testing goes to the next level and tries to tell you if your test suite is actually verifying functionality, if it's capable of detecting actual errors, and if it's not, then we say that you need a more powerful test suite. That's the whole point of mutation testing; it's an adjunct to an existing test suite. And just to add on to that, this question of 100% coverage, no matter how you slice it, is a really hard thing for most projects to achieve; in fact, most projects in the world don't have anything close to 100% coverage. But mutation testing can still be useful even on systems that don't have 100% coverage; initially it's just going to throw a lot of flags telling you that you have problems where you simply don't have tests yet. It's not a technique that can only be used on systems that already have 100% coverage, that's my point.

09:18 Michael: That's good. Because that means it would have been excluded from quite a wide bit of [indiscernible] [laugh]

09:25 Austin: Yeah, nobody would be able to use it, that's the truth, yeah.

09:27 Michael: Right, ok. There are some really cool ideas here. You talked about mutants; this is the idea of changing your program, introducing some mutation: almost randomly finding a spot, making a change, and seeing what the effect is, because theoretically you should be able to detect this change, which presumably broke something, right?

09:49 Austin: Yeah, that's exactly the case. The mutations we are talking about, the modifications we are talking about, are typically very small. The example is replacing a relational operator: if I have some line of code that says x is less than 1, the mutation would be to change that to x is greater than 1, for example, make that one small change, and then run the test suite again. So these changes are very small, but the point is that the changes should all, in principle, be detectable by a sufficiently powerful test suite.
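To make that concrete, here is a minimal sketch of a relational-operator mutation and a test that kills it (the function names are illustrative, not output from any real tool):

```python
# Original code under test.
def is_small(x):
    return x < 1

# The mutant a tool might generate: `<` replaced by `>`.
def is_small_mutant(x):
    return x > 1

# A test that kills this mutant: it passes against the original
# function but fails if is_small is replaced by the mutant.
def test_is_small():
    assert is_small(0)      # 0 < 1 is True, but 0 > 1 is False: mutant killed
    assert not is_small(2)  # 2 < 1 is False, but 2 > 1 is True: also kills it
```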

10:24 Michael: Ok, that sounds like it would be true most of the time, but I think there might be some cases where it might not be detectable. Before we get into that though, I want to clarify for the listeners: you are not the one doing the mutations, right? Like, as a developer, that's not you?

10:42 Austin: Correct. The whole point of the mutation testing tool is that it will do the hard work, the boring work, of ploughing through your code, finding the places that can potentially be modified, modifying them, and then running your test suite. So in principle you should be able to point the tool at your code, tell it what your test suite is, walk away for probably a very long time, and come back to get your results. And yeah, this takes away all the drudgery associated with that, and it typically gets you some really interesting results in the end.

11:16 Michael: Yeah, that is really cool. So maybe we could think about what mutation testing tells us, because sometimes you might make a change and your code will then fail the test, right? So if I am testing that I select a user and assert that the count of users I got back equals 1, and your framework changes that to not-equal, for example, obviously that test would fail. But it could also change things that I don't detect.

11:44 Austin: Yes, that's true. You can have results where your test suite passes a mutant, and then you go examine the code and you realize that there is really no way to write a realistic test that would detect that change. This is a class of mutants, if you read the literature, called equivalent mutants. An equivalent mutant is exactly that: a mutant that is functionally equivalent. It's still a mutant, it's been changed, but it's still functionally equivalent to the original program, and for some reason or other, and this is a very language specific thing, you simply cannot detect it. This is one of the really tricky, difficult aspects of mutation testing: ferreting out and somehow avoiding these equivalent mutants, yes.

12:29 Michael: Yeah, that's interesting. So obviously, if we mutated and then the test failed, that's an upvote for our test, right? We've made a change to the code, we rerun the tests, the tests said your code has changed, it's no longer good. But then sometimes it might not come back- is it possible that if you change some kind of while loop condition, it could just go forever?

12:53 Austin: That is entirely possible, and this is yet another class of complexities that we have to deal with in mutation testing. The canonical example is what you said: mutants that go into an infinite loop. For example, one mutation would be to change a break to a continue, and if you do that, then you typically create a situation where an infinite loop is very likely, because you have taken a place where your code is in the exit condition, where it wants to break, and you've said don't break, continue the loop, and it's going to stay in the exit condition and just continue forever. So that kind of mutant falls into the category that we call incompetent. And I should back up and say there are three main categories for mutants once you've run your test suite and have some results. You talked about, just a second ago, the case where the test suite fails, and in that case your test suite has killed the mutant: your test suite has failed, indicating that it knows you've made a change.

13:53 The other broad category is that your mutant survives, that is, your test suite passes, and this is where we start to look for weaknesses in our test suite. The third category is this category of incompetent mutants. Most incompetent mutants fail immediately by throwing an exception, failing to compile, or doing something along those lines, something catastrophic that prevents them from even being run under the test suite. We still count these as killed; they go into that same bucket, and this is good. But there are some incompetent mutants that do things like you say: they run forever, or maybe run for a very long time, so long that we don't really want to wait to see if they stop. So this area is a difficult one; it's one that you have to address on a practical level when you develop tools to do mutation testing, this problem of incompetence. And when you start looking into the theory of detecting incompetent mutants, you run smack into Alan Turing's famous proof of the halting problem, which says that you cannot look at a program and determine a priori whether it is going to stop running at some point in the future. That is the problem you face with incompetent mutants in mutation testing.
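A small sketch of how a break-to-continue mutation can turn a terminating loop into an infinite one (the functions here are hypothetical, purely for illustration):

```python
import itertools

# Original: stop scanning as soon as we see an even number.
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break
    return n

# Mutant: `break` replaced by `continue`. Against a finite list this
# merely returns the wrong value, but against an unbounded iterator
# the loop never terminates: an incompetent mutant that can only be
# caught with a timeout.
def first_even_mutant(numbers):
    for n in numbers:
        if n % 2 == 0:
            continue
    return n

print(first_even(itertools.count(1)))    # prints 2 and stops
# first_even_mutant(itertools.count(1))  # would hang forever
```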

15:06 Michael: Yeah, there is not even much reasoning about it, because it's just a brute force method anyway.

15:11 Austin: Correct, yeah.

15:12 Michael: Yeah, that's a big challenge. Can you give me some idea of how frequently that category shows up? Is it like 0.1%, 5%, 10%?

15:23 Austin: It's a tough question to answer on a global scale, because I haven't run mutation testing on every program, but in my experience it's a relatively small amount: far less than 1% of mutations become incompetent. They are not a huge problem in practice, because the strategies we use to deal with them are really simple. Basically, what we do is time out: we establish, using one method or another, a timeout for your test suite, and if it takes longer than the timeout, we just count that mutant as incompetent. We say that mutant is in an infinite loop, or in a huge loop, and it didn't get to run.

16:00 Michael: If you consider performance part of your feature set, I mean, it's failing anyway, right?

16:04 Austin: Right, yeah, it's clearly problematic at that point.

16:08 Michael: Ok, interesting. And you actually have two ways of timing out: you could just say, well, we are never going to run tests for more than 5 minutes, but you had a cool thing to do with baselines as well, right?

16:18 Austin: With the tools that we have right now for doing mutation testing in Python, the approach we take is, as you said, twofold. One way is to let the user provide the timeout; they can just provide an absolute timeout and we'll honor that. Or we can run the test suite over the unmutated code, time that, and use it as a baseline, and then let the user provide some multiplier, say 2 or 3. If a mutant's test suite run takes longer than n times the baseline timing, we consider that mutant incompetent and we kill it off. So this is our really simple but generally very effective approach to dealing with the halting problem in practice.
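Here is a minimal sketch of that baseline-multiplier idea (illustrative only, not Cosmic Ray's actual implementation), using a subprocess timeout:

```python
import subprocess
import time

def baseline_seconds(test_cmd):
    """Time one run of the test suite over the unmutated code."""
    start = time.monotonic()
    subprocess.run(test_cmd, check=False)
    return time.monotonic() - start

def run_against_mutant(test_cmd, baseline, multiplier=3):
    """Run the suite against a mutant; a timeout means 'incompetent'."""
    try:
        result = subprocess.run(test_cmd, timeout=multiplier * baseline)
    except subprocess.TimeoutExpired:
        return "incompetent"   # likely an infinite (or enormous) loop
    return "killed" if result.returncode != 0 else "survived"

# Usage: first measure baseline = baseline_seconds(["pytest", "-q"]),
# then call run_against_mutant for each mutant with that baseline.
```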

16:56 Michael: Yeah, it's way easier than proving it, right?

16:58 Austin: Yeah, definitely, that would be really difficult to do.

17:02 Michael: Yeah, nice. So we have obviously the case where the mutant is killed, and we have this incompetent mutant which we kind of can't really deal with, but then we have the more challenging case, maybe the interesting case you would say, where you've changed the code, you run the tests, and the tests still pass, right? So there are a couple of conclusions you can draw from this, yeah?

17:26 Austin: Well, yeah, if you've made the mutation and the tests still pass, then you have a couple of things to look at. One possibility, the standard thing that happens in that case, is that you just don't have enough tests; you need more or better tests, because you have some change that was undetected by your test suite. The other case, less common but still quite common, is that you have code in your program that doesn't need to be there anymore. It's extra code, it doesn't contribute to any functionality, so in that case your test suite is perfectly good, because it's testing the things it needs to be testing, the important functionality, but you've got bits of code that can be mutated and aren't being tested. You should get those bits of code out; if you view code as a liability rather than something important to keep around, just get rid of it at that point. The third possibility, and this is really a subcategory of the first, is that we come back to this notion of equivalent mutants: they have been changed, our tests haven't detected them, but there is no practical way to write a test for them. There are all sorts of interesting examples of these; they are a bit difficult to describe without showing some code.

18:43 Michael: Speaking of showing, all the videos of the sessions including yours and mine from NDC will be online shortly, and so as soon as they are online, I'll put the link to your presentation, so people can go back and see it but yeah, it is tough to talk about code examples on audio, right.

19:01 Austin: Yeah, it's quite difficult. But for Python I think I can probably describe the __main__ example.

19:08 Michael: Yeah, go for it.

19:08 Austin: So one equivalent mutant that in retrospect is quite obvious, but that I hadn't anticipated, is the standard idiom in Python of using __name__ == "__main__" to set up your main block when you are writing a program. If you've got that in your program and you have any kind of code in that block that can be mutated, the mutation testing suite will mutate that code. But of course, that block is never executed in a test, because it's not accessible inside the test: __name__ is never equal to "__main__" in that case. So we have this really interesting case: a whole body of code that is really important to your program in a way, but that cannot be tested and never will be tested. That's the flavor of at least some equivalent mutants. But one of the joys of equivalent mutants is that when you find them, you have these a-ha moments almost every time, because they are surprising, they are interesting, and they kind of make you scratch your head a little bit. It's one of the, I guess you might say, strange joys of mutation testing.
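A minimal sketch of that situation (the module and function names are hypothetical):

```python
# mymodule.py
def double(x):
    return 2 * x

if __name__ == "__main__":
    # Any mutation made inside this block survives automatically:
    # when the test suite imports this module, __name__ is
    # "mymodule", not "__main__", so this code never executes
    # under test -- a classic equivalent mutant.
    print(double(3))
```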

20:13 Michael: [laugh] Yeah, it does sound pretty interesting, definitely it gives you some insight you probably wouldn't normally get.

20:13 [music]

20:13 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

20:13 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

20:13 Typically, candidates receive 5 or more offers in just the first week and there are no obligations ever.

20:13 Sounds awesome, doesn't it? Well, did I mention the signing bonus? Everyone who accepts a job from Hired gets a $1,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000!

20:13 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

20:13 [music]

21:17 Michael: One example that I was thinking of when you were talking about the categories is logging, right? So maybe you've got some code that says if this, then log this thing, else log that. And you know, does it really make sense to write the test that detects what you're logging-

21:33 Austin: Right, that's a really good point. A large category of equivalent mutants is exactly of that flavor: the changes caused by the mutation are things that you in principle could test for, but you never would, because there is no reason to do it. There is maybe no business reason to do it, maybe just no practical reason to do it, depending on what values are driving your project. So you end up never writing tests for those, and one of the challenges of writing mutation testing tools is allowing your user to specify, in some way, shape, or form, that mutations should not be performed on certain bodies of code, for various reasons. This is something that all the tools that do mutation testing have to account for in some way, shape, or form.

22:19 Michael: Ok. So I am going to hold my question on how you deal with that until we get to your framework, because that is a really interesting problem and I want to dig into it. But before we do, could you give us- you talked about the basic changes, like if you've got a less-than, change that to a greater-than. There is a whole variety of different types: there are language-agnostic changes, and there are changes you can make that affect object-oriented programming. Can you give us a sense of those?

22:51 Austin: Sure, yeah. This is actually one of the areas where there is active research into mutation testing; it's not a huge group of people doing this research, but the research that is going on is to a large degree about which kinds of mutations we should actually be performing. So, as you mentioned, some mutations are language-agnostic in the sense that they apply to almost all programming languages you can imagine. A typical example would be something like replacing a constant: if the tool found a constant 4 in your code, the mutation testing suite might change that to 5 or 19 or -6 or something like that. This is an obvious change, a blatant change, and it should be obviously testable. Other examples include things like replacing arithmetic operators, removing or adding unary operators, and replacing relational operators, which we talked about earlier. All of these are broadly applicable: you can see them being applied in a functional language or an OO language or whatever kind of language you happen to be working in.

24:01 But some research has looked into mutations that are specific to, for example, object-oriented languages. Not Python, but a lot of object-oriented languages have access modifiers: private, public, protected, and so forth. One really clever and interesting mutation is to replace public with private or vice versa, to basically go in and mess with the access modifiers and see if that is detected by the tests. You can end up with results that can't compile; in C++, for example, if you change public to private, that would probably break compilation of many programs. But changing private to public? It's hard to say. That's actually very difficult to test for. Other examples of language-specific mutations include, for instance, changing base class order. This is another one that can have really dramatic effects on what your program does, or, in other cases, absolutely no effect whatsoever. You can see how changing base class order in Python, for example, could be an almost completely undetectable change to your program; the only way to detect it would be if you had a test that was checking the base class order, checking the MRO for the class, and of course nobody is going to write that test, and I am not advocating that anybody write that kind of test.
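A small illustration of why a base-class-order mutation can be dramatic or invisible, depending on the classes involved (the classes here are hypothetical):

```python
class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

# Original: A comes first in the MRO, so greet() resolves to A's.
class C(A, B):
    pass

# Mutant: base class order swapped. greet() now resolves to B's,
# a dramatic change if anything calls greet(). If A and B shared no
# method names, the swap would be invisible without checking C.__mro__.
class C_mutant(B, A):
    pass

assert C().greet() == "A"
assert C_mutant().greet() == "B"
```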

25:18 Michael: It's not the kind of code you want to write, that's for sure.

25:20 Austin: Right, it's a complete waste of time. [laugh] That's, I think, a fairly enlightening example of the kinds of problems you face doing mutation testing: as elegant and interesting and straightforward as the approach sounds, there are these really difficult edge cases you have to deal with. There are also mutations for functional languages, and the classic example there is that a lot of Haskell-like and F#-like languages have pattern matching on their functions, and changing the order of the patterns is a common mutation you might perform on a language like that. That again is another area where sometimes changing the order makes a huge difference, and sometimes it makes absolutely no difference; and in those cases I actually don't know how you would test for them, because they are undetectable unless you have introspection, you have reflection capabilities, and you actually go in and do the kinds of tests I talked about a second ago that you would never write. So it's a fascinating field to dig down into, and the papers are pretty accessible if you want to read about these kinds of things as well.

26:21 Michael: It sounds to me like one of the major challenges that you are going to run into for almost any reasonable size program is that it's going to be really slow, right? Because you are looking at basically every permutation of all the operators, and, you know, inheritance, methods; it's a crazy number of things in play here, right?

26:44 Austin: That's absolutely true. And I think the single biggest practical problem with mutation testing, the single biggest practical roadblock to using it, is that it takes a long, long time to do. Consider the possibility of having dozens or a hundred operators, the kinds of mutations you might make in your code, and a large code base; you know, hundreds of thousands of lines of code is not uncommon in valuable systems. Then consider the fact that a test suite might take a considerable amount of time to run. So you have this triply nested loop: the operators, the places those operators can be applied, and the amount of time it takes to run your test suite. If you do the math, you can find systems where that adds up to years. I mean literally, it's not something you can do on a practical basis for all your code, in any way, shape, or form.

27:37 But there are some strategies we can apply to try to deal with that. The most basic strategy is simply to parallelize. For all the problems we have with long runtimes in mutation testing, the saving grace, perhaps, is that it's embarrassingly parallel: you can run each mutation's test suite run in a completely separate process, all at the same time if you want to, and they won't affect each other's results. So you could in principle go to Azure or Amazon and rent 10,000 or 100,000 machines for 5 minutes, or whatever they let you get, run all your tests, and be done with it. But that's probably not economically feasible for most people. As for other approaches to dealing with this, well, there are not that many other approaches that I am aware of, but one is another form of baselining. We talked earlier about baseline timeouts, when to kill the test suite and call it incompetent. Another kind of baselining you can do is to run the full test suite, with all your operators, over all your code and get those results, and then, as you start making changes to your code base, only run the tests that you know exercise modified code, code that you've changed. That way you can drastically reduce the number of tests you need to run, and that drastically reduces the number of operators that get applied, the amount of code that can potentially be modified, and so forth.

29:00 Also, you can tell your mutation testing system to only mutate code that was modified. So basically we are analyzing deltas, analyzing our git diffs so to speak, and saying only run the tests that we know could possibly have an impact on, or be impacted by, the changes that were made. This is a heuristic approach, because of course it's not watertight: you could make changes to your code that influence the code paths that your tests are exercising, and if you purely used this kind of baselining to determine which tests to run and what to test, then, at least in principle, you would be missing things. So you have to do occasional re-baselinings to make sure that you've kept up with all the changes. It also assumes that you have some way of correlating your tests with lines of code, and this is where mutation testing and traditional coverage analysis tools can come into play, where they work together. Now you can say: ok, I take the coverage analysis information, I know which tests exercise which lines of code, I can compare that to the deltas and determine which tests need to be run by the mutation testing suite.
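A minimal sketch of that test-selection idea (the data structures are hypothetical; a real tool would get the line-to-test map from a coverage tool such as coverage.py and the deltas from git):

```python
# Hypothetical inputs: which lines each test exercises, and which
# lines the latest commit touched.
coverage_map = {
    "test_login":  {("auth.py", 10), ("auth.py", 11)},
    "test_report": {("report.py", 40)},
}
changed_lines = {("auth.py", 11)}

def tests_to_rerun(coverage_map, changed_lines):
    """Select only the tests whose covered lines intersect the delta."""
    return [test for test, lines in coverage_map.items()
            if lines & changed_lines]

print(tests_to_rerun(coverage_map, changed_lines))  # ['test_login']
```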

30:09 Michael: Right, it's like an inverted code coverage, right? So, if I look at this test, what part of the code in my real app was changed or somehow affected by running this test? So you could just focus on, say, ten lines of code, or probably way more than that, but focus on that area, right?

30:29 Austin: Exactly, the point is to drastically reduce the space, and I think in principle you can get this down to where things run fast enough that you could do it on every commit, or on bundles of commits, rather than once a week or something along those lines. That may or may not be desirable, but it's an interesting goal from a tool developer's point of view.

30:49 Michael: Yeah, it's definitely an interesting goal. One of the things I was thinking of as you said this was: is this a thing that needs to run on, say, every check-in, or every time you want to run your tests? Because if you have a good set of tests, hopefully your tests are actually catching your bugs, and this feels to me like a validation of your tests. It seems like it could theoretically run less often and still be really valuable.

31:17 Austin: I think in practice you're right that it doesn't need to run on every check-in. But say you are working on a team that wants perfect code coverage, and you have a policy on a legacy code base that any change you make needs to be backed up by tests, which is a common thing to do with existing legacy systems that are trying to improve their lot in this world. You might have that policy on every commit: whatever changes you made need to be backed up by tests. This is a good way to verify that, and not just to verify that you've written tests, but to verify that the tests you've created actually test the functionality correctly. So if you can make mutation testing fast enough, you can actually enforce that kind of constraint in a pretty strong way, and that's an interesting thing.

32:00 Michael: Yeah, that is quite interesting, because my experience is there is a massive difference among team members in their level of embracing testing and how much they run the tests. Some people are really into it, and some people only run them if something is making them do it, basically.

32:17 Austin: Yeah, that's very true, and so now you have a new stick to beat people around the head with: you have mutation testing in place. [laugh]

32:27 Michael: Nice. So let's bring this down to Python, let's make it concrete. Let's talk about this thing called Cosmic Ray that you created.

32:34 Austin: Ok, yeah, Cosmic Ray, as you just hinted, is a mutation testing tool for Python. I should say it's not the first mutation testing tool for Python; there were a few available when I started writing it, but they didn't quite work the way I wanted, or they were unmaintained. This started out almost as just a fun thing to do, and it turned out to be a really fascinating project all around. Cosmic Ray is a system for searching through your Python code, finding places to mutate, making the mutations, and then running your test suite. It's a fairly young project, and it has quite a bit of work left to be done on it, but it has produced some results already, so it's looking quite promising. It's about a year and a half old at this point, I think, and has really only been used by me and a few close, trusted friends, but it's open source, it's on GitHub, and anybody who wants to try it, make contributions, or give any feedback is more than welcome, and in fact encouraged, to go take a look at it.

33:39 Michael: Yeah, awesome, and I'll be sure to link to the github repo and things like that, and it's on PyPi of course, right?

33:45 Austin: Honestly, I am not sure. I think it is, but the last time I pushed it up to PyPI- [laugh]

33:52 Michael: Let me see- yes it is, it's cosmic_ray on PyPI.

33:58 Austin: Ok, saved. I guess the interesting part for a lot of people is going to be how Cosmic Ray works internally.

34:05 Michael: Yeah, absolutely, and there is some really amazing stuff in there. Before we get into that, can you really quickly tell me what I need to do? Like, if I've got some Python app with some tests, you know, I'm using pytest or something like that, what are my steps to apply this?

34:23 Austin: The steps are pretty straightforward. First, identify the parts of your code that you want to mutation test; very often you'll have some part of your code that has a good test suite, is heavily, thoroughly tested, and is central to the functioning, and other parts that aren't, and you can use Cosmic Ray to slice and dice the parts you do and do not want to test. So if you just want to take it for a spin, identify some module that you are interested in.

34:50 Michael: Because you want it to happen in a short amount of time, right?

34:54 Austin: Well, that's one of the other reasons, yeah. You'll get more bang for your buck if you just test drive this on something small; if you want to run this over all of Django, forget it, it's not going to work in any practical sense, but if you want to run it over a single module in Django or some other package, then you'll have more luck. That's been my experience with this so far, at least. You'll also need a test suite; right now we only support the standard library unittest and pytest as the suites, but there is a plugin system for other testing systems, and if you feel you need one supported, they are pretty easy to add. You point Cosmic Ray at your module and at your test suite, and you pass it a few other parameters, things having to do with timeouts and so forth, and it will build up a work order, basically the list of things it's going to do, and put that in a little database. Then you'll need to set up Celery; Celery is a task distribution queue that runs on top of RabbitMQ by default, and we use Celery to distribute work out to workers that actually do the mutation, run the test suite, and then send results back. So you'll have workers sitting on your Celery queue, and then you tell Cosmic Ray to run the work order it has built, and it will start doling out work to these workers and collecting the results back into the little database it's got. That's the short version of what you need to do. Once you have results back, you start analyzing them and trying to figure out what Cosmic Ray is telling you.
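For flavor, here is a minimal sketch of that Celery work-distribution pattern (this is not Cosmic Ray's actual task code; the broker URL, task name, and result shape are assumptions for illustration):

```python
# tasks.py -- start a worker with: celery -A tasks worker
from celery import Celery

app = Celery("tasks", broker="amqp://localhost", backend="rpc://")

@app.task
def run_mutation_job(module_name, operator_name, occurrence):
    """Apply one mutation, run the test suite, classify the outcome.

    A real tool would mutate the module's AST at the given occurrence
    and run the tests here; this stub only shows the task's shape.
    """
    result = "survived"  # placeholder: killed / survived / incompetent
    return {"module": module_name, "operator": operator_name,
            "occurrence": occurrence, "result": result}

# The coordinator fans jobs out over the queue and collects results:
#   pending = [run_mutation_job.delay("mymod", "FlipLessThan", i)
#              for i in range(num_mutation_sites)]
#   outcomes = [job.get() for job in pending]
```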

36:20 Michael: Right, you look at those three categories and you decide what to ignore and what not to ignore.

36:20 [music]

36:20 Gone are the days of tweaking your server, merging your code, and just hoping it works in your production environment. With SnapCI's cloud based, hosted continuous delivery tool you simply do a git push and they autodetect and run all the necessary tests through their multistage pipelines. If something fails, you can even debug it directly in the browser.

36:20 With one click deployment that you can do from your desk or from 30,000 feet in the air, Snap offers flexibility and peace of mind. Imagine all the time you'll save. Thank SnapCI for sponsoring this episode by trying them for free at snap.ci/talkpython.

36:20 [music]

37:24 Michael: And is there a way to flag and say, this thing you've detected here, I want to ignore that?

37:31 Austin: Not yet, and this is actually one of the big open areas for development: how do we let users specify exceptions effectively, how do we let them say, don't make this particular mutation on this line of code, or, even more coarsely, don't make any mutations in this whole region of code. We need that kind of thing because of the problem of equivalent mutants and so forth, which we have no real solution to. Right now there is some thought about the direction to take this. If you look at tools like Pylint, they have great systems for putting essentially directives into comments in your code, telling Pylint, please don't apply rule such-and-such to this line of code. We could probably apply the same kind of technique to Cosmic Ray, but I am not sure yet if that's better than having some extrinsic description of the exceptions. It's basically an open question, and if anybody has ideas or wants to take a swing at it, this really is one of the big things we need to sort out. Soon.
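For comparison, the first directive below is real Pylint syntax; the second shows what a Cosmic Ray equivalent might look like (that directive is purely hypothetical, no such syntax existed at the time of this conversation):

```python
import logging

log = logging.getLogger(__name__)
state = "ready"

# A real Pylint directive: suppress the eval-used warning on this line.
x = eval("1 + 1")  # pylint: disable=eval-used

# A *hypothetical* mutation-testing directive in the same spirit; this
# is not real Cosmic Ray syntax, just a sketch of the idea.
log.debug("state=%s", state)  # no-mutate
```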

38:28 Michael: Let's look inside, basically you point Cosmic Ray at your module, and you say go shred this thing and for every shred that you create, go run the unit test, right?

38:41 Austin: That's exactly right.

38:42 Michael: Walk us through the internals there, there is some interesting stuff you are doing?

38:45 Austin: Well, at the core of all of this is the standard library module ast; AST is an acronym for abstract syntax tree. An abstract syntax tree is just a programmatic structure describing a program, the syntax in your source code. When Python parses your source code, it produces an abstract syntax tree, and then you can access this, looking at the different nodes in the tree, the different parts of your program, and not just look at them but also change them. So what ast allows us to do in Cosmic Ray is load up your source code, we literally read your source code from your .py file, and pass it into a parse function, which parses the source code into the abstract syntax tree. Then ast has other components which allow us to walk down that tree and, if we want, make changes. As for the details of exactly how it does that, it might be difficult to talk about operators and things like that in too great detail here.

39:42 Michael: Yeah, so well basically you get this abstract syntax tree and then you start applying your transformations to it, right, your mutations if you will, yeah?

39:53 Austin: Well, that's the fundamental idea, yes. You have the AST, you find the place where you want to make a modification, and you make the modification to it; there is support in the ast module for doing that kind of work. Once you've modified the AST, you need to make it available to your test suite, you need to make it importable, and that is a whole other, second-level trick.

40:13 Michael: Yeah. It's one thing to say, hey Python, run this module; it's another to load up an individual AST and then turn that into something executable, right?

40:23 Austin: Exactly, yeah. That was sort of the second big phase of work in building Cosmic Ray: figuring out how to do that. Once you have a modified AST, you can pass it to the built-in compile function, and that spits out what's called a code object, and it's this kind of thing that modules can use, so to speak; we can execute it to populate the module. Figuring out how to make your modified AST available through a standard import was a big goal of Cosmic Ray. We didn't want people to have to modify their test suites to do mutation testing; we wanted the test suites to just naturally import the module and get the right one. So we had to do a lot of investigation into how Python does this. At the core, there are three main moving parts to how Python does imports, how it lets you control imports.
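Before the import machinery details, here is a minimal sketch of the parse-mutate-compile pipeline described so far, using the standard ast module (a toy operator for illustration, not Cosmic Ray's actual operator code):

```python
import ast

source = "def is_small(x):\n    return x < 1\n"

# Parse the source text into an abstract syntax tree.
tree = ast.parse(source)

class FlipLessThan(ast.NodeTransformer):
    """A toy mutation operator: replace every `<` with `>`."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

mutant_tree = ast.fix_missing_locations(FlipLessThan().visit(tree))

# Compile the mutated tree to a code object and execute it into a
# namespace, just as a module body would be executed.
namespace = {}
exec(compile(mutant_tree, "<mutant>", "exec"), namespace)
print(namespace["is_small"](0))  # False under the mutant; True originally
```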

41:17 The first moving part is called a finder. A finder is an object, typically a class instance, though it can be a function or a class, that is responsible for telling Python that it knows how to load a module, given that module's name. So Python will ask the finder: I've been asked to import foo, do you know how to do anything with foo? And the finder can say yes or no. If a finder does know how to load something, it returns what is called a loader, and the loader is then responsible for populating essentially the shell of a module. Python will make the empty shell of the module, pass it to the loader, and say: ok, now you populate this with the names, the functions, the name bindings, all that kind of stuff that comes from the module you are supposed to be loading.

42:06 What we do in Cosmic Ray is have our own custom finder, and that finder is given the modified AST and is told the name of the module. If it's then asked by Python, do you know how to load that module, it will say yes, and then it hands back a loader. We have a custom loader which also has this AST, and the custom loader is able to compile the AST and use that compiled AST to populate the shell module; then that shell module is passed back to Python and is naturally imported so that everybody can use it. The last moving part in this whole system is something called sys.meta_path; if you import sys, you'll see it has an attribute called meta_path. meta_path is just a list of finders, and when Python wants to import something, and some experts might tell me that I am a little bit wrong on the details but this is effectively correct, Python marches down the meta_path asking each finder in order, do you know how to load this name, and the first finder that responds is the one that wins.

43:08 So what we do is we take our custom finder, we populate it with its AST and its name, and we stick it at the front of meta_path inside our worker processes. These worker processes are then able to hijack the import system, in a sense, and put these mutated ASTs directly into place so that nobody has to know they are there, but they get imported naturally by whoever wants to use them. So that's the long and the short, I guess, of how we stick mutated ASTs into Python programs.
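Here is a minimal sketch of that hijack using the importlib machinery (assuming a mutated AST like the one produced above; the module name is illustrative, and Cosmic Ray's real finder differs in detail):

```python
import ast
import importlib.abc
import importlib.util
import sys

class ASTFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve a single module name from an in-memory (mutated) AST."""

    def __init__(self, name, tree):
        self._name, self._tree = name, tree

    def find_spec(self, fullname, path, target=None):
        if fullname != self._name:
            return None  # not ours; let the next finder on meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None      # tell Python to create the default empty shell

    def exec_module(self, module):
        # Compile the AST and execute it to populate the module shell.
        exec(compile(self._tree, f"<mutant:{self._name}>", "exec"),
             module.__dict__)

mutant_tree = ast.parse("ANSWER = 42")  # stand-in for a mutated AST
sys.meta_path.insert(0, ASTFinder("mymod", mutant_tree))

import mymod                 # resolved by our finder, not the filesystem
print(mymod.ANSWER)          # 42
```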

43:34 Michael: Yeah, you really have to dig deep down inside the guts of Python, take the red pill and not the blue pill, right?

43:41 Austin: Yeah, there was a lot of PEP archaeology and stuff to get to the bottom of this. But in the end it's very elegant and powerful. One of the joys of this project was learning all this stuff that I may never apply again, but I feel like I've reached the next level of my Python expertise, in a sense.

43:58 Michael: Yeah, that's really cool, and it's awesome because you don't change your code to make this happen, right? It does what it has to do to basically take over.

44:08 Austin: Exactly. Cosmic Ray works at a deep enough level that neither your test code nor your code under test needs to be modified to use it; it should work transparently in all ways. That was a big goal of the project.

44:25 Michael: You talked about Celery, and Celery is really awesome; there are a couple of other really cool projects that you built upon, and one of them was this thing called TinyDB.

44:37 Austin: Yeah, TinyDB is what its name says: it's a tiny database, a little embedded, file oriented JSON database that you can import into your Python and use with basically no configuration, so it was exactly what I was looking for when I needed a database for Cosmic Ray. We use the database for keeping track of the work order I described earlier: the first thing you do in a mutation testing run is figure out what it is you are going to do and write all that down, and we write that into the database. Then, as the results arrive back via Celery, we stick the results back into this database. So TinyDB is something that has worked out really well for us so far, and it was, as I said, super easy to use, and it has stuck around so far. I have a feeling it's going to end up being a bottleneck in larger projects, but that's a gut feeling, I don't have any evidence to indicate that. If it has to be replaced, then we'll start looking at something like SQLite, or maybe we'll give the user the power to specify MongoDB or whatever they want. But TinyDB is really worth looking at, I think, if you don't have really sophisticated database needs and you want something that just works; it's a really beautiful little program that worked out of the box with really no reading on my part whatsoever.
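A minimal sketch of using TinyDB that way (the record shapes are hypothetical, not Cosmic Ray's actual schema):

```python
from tinydb import TinyDB, Query

db = TinyDB("work_order.json")  # the whole database is one JSON file

# Write down the planned work, one record per pending mutation.
db.insert({"module": "mymod", "operator": "FlipLessThan",
           "occurrence": 0, "result": None})

# As a worker reports back, record the outcome for that job.
Job = Query()
db.update({"result": "killed"},
          (Job.module == "mymod") & (Job.occurrence == 0))

# Later, list every mutant that survived the test suite.
print(db.search(Job.result == "survived"))
```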

45:56 Michael: That's lovely. I really like to use SQLite and SQLAlchemy together, and those work really well in sort of an equivalent way, but I am a huge fan of the document databases.

46:08 Austin: One of the big selling points, what made me stick with TinyDB, is that it literally is a JSON file. I can open it up in Emacs and just look at it, and I don't have to have any extra tools to examine its contents. I think that JSON nature is also what's going to be its downfall, that's what makes me think it's not going to last that long for this project, but it's been a real selling point: I can run my tests as I am testing Cosmic Ray, which as you might imagine is a real challenge, and then see what's in the database really, really easily, so my cycle time has been really short by using TinyDB.

46:45 Michael: Yeah, that's cool, and it's 100% Python, according to GitHub?

46:48 Austin: That sounds right, yeah, I don't remember any compilation happening when I used it, yeah.

46:53 Michael: Nice, yeah, it has a thousand stars so it's done pretty well; I definitely want to check it out. The other one was docopt?

46:59 Austin: Yeah. docopt is one of my current favorite packages, not just for Python but for lots and lots of languages. docopt is a tool for building command line parsers, but unlike things like argparse or the other standard tools for doing this, it takes kind of a backwards approach: you provide it with a string which is the POSIX standard help output that you would get from any program, you know, stating your usage, program name, option names and all that kind of stuff, the text information somebody gets when they type program -h. You give that string to docopt, and from that it generates a parser that can then parse command line arguments. So you never have to think really hard about building up these parser objects yourself; everything is done magically, and all you need to think about is how your pretty help message is going to look.

47:55 Michael: Which you've got to write anyway.

47:56 Austin: Which you have to write, or have generated by some other tool, but this has the neat effect that embedded in your code somewhere is your full help message, which is great documentation, not just for your users but also for other programmers who read your code. It solves a really annoying problem that every programmer in the world has, which is writing parsers for command line arguments, in a really sleek way. One of the interesting things, which I didn't know until I looked at docopt, is that there actually is a POSIX standard for these help messages, so it can rely on an actual existing standard for defining these things, which is really cool.

48:37 Michael: That is cool. Actually, learning about docopt was the first time I had ever heard that there was a standard for this. Wait, there is a standard for help messages? Interesting.

48:47 Austin: I highly recommend that anybody who has to write command line tools and who hasn't tried docopt take a look at it; it's really addictive, and you can produce really, really powerful command line parsers, things like you have with git, sub-command based tools. I guess the other interesting thing about docopt is that while it was originally written in Python, the canonical implementation is Python, it now exists for something like 30 languages. So if you are a sometimes C# developer, sometimes Java developer, sometimes whatever developer, you can continue using docopt in those languages as well. It's a neat project from that point of view; it's something that you don't see a lot of.

49:22 Michael: That definitely means the idea of it resonated super well, right?

49:24 Austin: Yeah. It did.
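To ground the discussion above, here is a minimal sketch of docopt in action (the tool name and options are hypothetical):

```python
"""mutate - a toy mutation-testing front end.

Usage:
  mutate run <module> [--timeout=<sec>]
  mutate report
  mutate (-h | --help)

Options:
  -h --help        Show this help.
  --timeout=<sec>  Per-run timeout in seconds [default: 60].
"""
from docopt import docopt

if __name__ == "__main__":
    # The parser is generated from the docstring above; that's all it takes.
    args = docopt(__doc__)
    # e.g. for `mutate run mymod`:
    # {'run': True, '<module>': 'mymod', '--timeout': '60', ...}
    print(args)
```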

49:27 Michael: Ok, so we are getting kind of near the end of the show now, I wanted to ask you- you have a company called Sixty North, right?

49:35 Austin: That's correct, yes.

49:36 Michael: Yeah, you and Robert Smallshire is that right?

49:39 Austin: Yeah, that's right.

49:41 Michael: Yeah, you guys are up in Norway which is why I ran into you, although we also seem to run into each other in London, so what do you guys do there?

49:49 Austin: At Sixty North we do do a lot of Python work; we do consulting, training, and some development as well. We've made some courses for Pluralsight: if you go to Pluralsight and look for the Python training courses, we have Python Fundamentals as our first course, and Python Beyond The Basics, which is sort of the next step, the intermediate level, is there too. We are working on a third one, Advanced Python, I think, is the official name, and that will be out by the end of the year, hopefully.

50:22 Michael: Ok, you and I we are both very passionate about online courses, tell me what's in your intermediate and your advanced courses?

50:28 Austin: Oh, I'd have to stretch my brain to remember exactly the contents of those courses, but I know in the intermediate course we start getting into things like decorators, class properties, some of the details of classes beyond just, you know, functions and methods-

50:42 Michael: How you define a class and add fields to it?

50:46 Austin: Yeah, getting beyond that. A couple of things that are beyond the basics; you'd be surprised at how many things go into the basics course that are really basic, and I mean the course is quite long and still barely scratches the surface of Python, so anything like, as I mentioned, decorators or-

51:03 Michael: Probably lambda expression type things?

51:05 Austin: I think lambda's in there, context managers, implementing a lot of the dunder magic methods, that kind of stuff is in the intermediate. And then the advanced course is where you start to get into things like what we talked about earlier, finders and loaders, or you start getting into metaclasses, things that we classify, to a degree, as things you might do once a year, as opposed to the things you do every day as a professional Python programmer. I mean, finders and loaders: I programmed Python for 20 years and never used them, but it's an interesting and important part of the language, so it needs to be in there somewhere.

51:37 Michael: Yeah, and once you understand it, maybe you don't use it often, but knowing the mechanics helps you understand a lot of other things at that level.

51:45 Austin: Yeah, and you have that in your pocket, and so that might be the most elegant solution for some particular problem you face, rather than some horrible hack you would have to come up with otherwise. So the advanced stuff is for people who are using Python a lot, who need to find the best solutions, and who want to really understand the inner workings of the Python runtime.

52:04 Michael: Yeah, cool. So if you have a Pluralsight subscription, go over there and type Python in the search box; you'll find Austin.

52:09 Austin: Yeah, [laugh]

52:13 Michael: Nice. And you also write some books too?

52:15 Austin: We do have some books, yeah. The books are based on the same material as the Pluralsight courses, and the first one, which is I think 90% done now, is on Leanpub; it's called The Python Apprentice. The second and third books, The Python Journeyman and The Python Master, are in the works and will be published probably not this year, but soon. They're on Leanpub, so you can get the early version and we'll keep sending you updates as we make updates to the books. If you prefer books, these are available as, I think, PDF, mobi, and EPUB on the Leanpub site.

52:51 Michael: Nice, and that is self publishing, right?

52:53 Austin: That is self publishing, yes.

52:55 Michael: Very cool, I am a big fan of self publishing, so I like to see when people are succeeding with that, that's great. I'll be sure to link to all those things in the show notes as well.

53:04 Austin: Ok, that would be great.

53:05 Michael: Yeah, absolutely, very interesting. Definitely cool. Two more questions before I let you go: what is your favorite PyPI package? I saw the other day there are over 80,000 distinct packages out there, which is an insane number. There's got to be something that you've had exposure to that you want to share, like, oh, you should check this out.

53:26 Austin: Well, it is going to feel like a bit of a cheat, but docopt. docopt is one that, once I learned about it, I started using on almost every project I work on. But I know that it's not that well known, it is not as well known as I think it should be, so I'll just put a second vote in for docopt. For my money, that's the tool I keep going back to on PyPI every time, and it should be more widely known and more widely used, because it's awesome.

53:55 Michael: Yeah. That's awesome, and I'll throw one in for Cosmic Ray; that's pretty awesome and very interesting to check out.

54:04 Austin: Thanks.

54:05 Michael: And then, you mentioned Emacs earlier, but if you are going to write some Python code what do you typically open up?

54:10 Austin: Well, the short answer is Emacs. I've been using Emacs for almost as long as I've been using Python, I think, and it's in my fingers to a degree. If I know that I am working on just a dedicated Python project, then PyCharm is a wonderful IDE, and it's got a lot of power that Emacs doesn't have when it comes to working with Python, that Emacs doesn't have yet, I should say. It's really great for pure Python editing. I guess the reason I stick with Emacs is stubbornness to a degree; I am old and don't want to change. But I am also very often working on multiple languages at the same time in any given project, you know, everything from JavaScript to Python 2 to whatever happens to be part of that project, and I find that Emacs makes it easier for me to do that, or at least it's the best for that kind of work from what I can tell. And honestly, Emacs as a Python IDE is pretty good; you can do all sorts of fancy stuff in there if you want to spend the time to configure it, and if you use a packaged Emacs configuration like Spacemacs, you'll find that you get pretty sophisticated support for things like completion right out of the box, you get Jedi support and things like that. So I try not to recommend Emacs to people new to Python, because that's a whole other level of complexity, but Emacs as a way of life is an interesting place to be.

55:39 Michael: So, any final calls to action for our listeners? You've got the mic.

55:44 Austin: Any more calls to action...

55:45 Michael: Are you looking for contributors to your projects?

55:50 Austin: Certainly Cosmic Ray could use some people who are willing to put in some work. We have, of course, the GitHub issues page, where I keep track not just of defects but also of the higher level issues that need to be addressed. I mentioned earlier that we have this pressing need for being able to embed exceptions and processing instructions in your code, so that Cosmic Ray knows not to do certain kinds of mutations; that's a big project that somebody might be able to take on. The two other big topics I can think of are, first, support for different kinds of modules: right now Cosmic Ray can only work against modules that are written in pure Python code, so .py files, but of course there are plenty of other, more exotic kinds of modules out there. Cosmic Ray needs to either gracefully skip over those other kinds or learn how to process them, and there is no support for that right now, which is a big limiting factor. The other is, and this is more of a researchy thing, the integration with coverage testing that I talked about earlier: being able to take output from, say, coverage.py and use that to narrow down the scope of Cosmic Ray mutation testing runs and make it a more practical tool. But really, go to the issues page on GitHub and look, and you'll see the nature of the things that are going on. That would be my call to action, I guess, for Cosmic Ray.

57:06 Michael: All right, fantastic. I'll put a link to the GitHub repo in the show notes. So Austin, it's been really fun to talk about this idea of mutation testing; I think it's a really interesting evolution, if you will, of all the testing tools, right? I can see a place where this algorithm gets tuned, and the various optimizations you talked about get in there, and this could be a big part of day-to-day work. It's cool.

57:33 Austin: Cool, I am glad you think that, and thanks for having me on the show to talk about it; it's something I really enjoy talking about in public, so [laugh]

57:41 Michael: Yeah, you bet. Thanks for being on the show and it was great to see you last week, take care.

57:45 Austin: It was great seeing you last week, bye Mike.

57:45 This has been another episode of Talk Python To Me.

57:45 Today's guest was Austin Bingham. This episode has been sponsored by Hired and Snap CI. Thank you guys for supporting the show!

57:45 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $2,000 USD.

57:45 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at snap.ci/talkpython

57:45 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

57:45 You can find the links from this episode at talkpython.fm/episodes/show/63

57:45 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.

57:45 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.

57:45 This is your host, Michael Kennedy. Thanks for listening!

57:45 Smixx, take us out of here.
