
#63: Validating Python tests with mutation testing Transcript

Recorded on Wednesday, Jun 15, 2016.

00:00 Do you think it's a good idea to test your software? Do you write unit tests or other automated verification for code? I think most of us do these days. A key question is how do you know whether your tests sufficiently verify your code? The standard answer is code coverage.

00:00 But there is a difference between executing code (which code coverage measures) and truly verifying it.

00:00 On this episode, we'll talk with Austin Bingham. He created a mutation testing framework for Python that goes beyond code coverage to actually perform this verification. It's a fresh and powerful idea. I hope you enjoy it!

00:00 This is Talk Python To Me, episode 63, recorded June 15th, 2016.

00:00 [music intro]

00:00 Welcome to Talk Python To Me, a weekly podcast on Python- the language, the libraries, the ecosystem and the personalities.

00:00 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and Snap CI, thank them for supporting this show on Twitter via @hired_hq and @snap_ci.

00:00 Hey everyone. We have an interesting deep dive into the world of Python testing and Python internals today. Before we chat with Austin about mutation testing and his Python library called Cosmic Ray, I have a few goodies to give away to a lucky couple of listeners. First, Austin and his co-author Rob are giving away a copy of their book The Python Apprentice, as well as two free passes to their online Python course. As always, just visit talkpython.fm and make sure you are a friend of the show to be eligible to win. I'll pick three lucky winners next week. Now, let's meet Austin.

02:01 Michael: Austin, welcome to the show.

02:02 Austin: Thanks Michael. I'm really glad to be here.

02:05 Michael: Yeah, I'm super excited to share this mutation testing idea that you have sort of manifested in Python, that's really cool. We'll talk a lot about that. Before we get into it though, what's your story, how did you get into Python and programming?

02:16 Austin: Well, how I got into programming was when I was quite young, I guess around ten years old. We had a computer around the house, it was an IBM 80 or something along those lines, I forget exactly the model, and it could be programmed in BASIC, and that really caught my attention. My parents got me some magazines and so forth that taught me how to do more complicated things than I could figure out on my own, and it sort of took off from there. As for Python, I have tried to figure out where I first started using it, and to be honest I am not entirely sure. I think it was around graduate school though, so this would have been in the late 1990s.

02:52 Michael: Nice. What did you study in grad school?

02:54 Austin: That was software engineering, at the University of Texas at Austin. We were doing all sorts of stuff related to artifact traceability and large scale software systems, and somewhere in there Python showed up for a build system or something along those lines, and it really caught my attention. It started there and grew and grew with my career, and it has shown up everywhere since then to a larger and larger degree, so it's something that I have really enjoyed using for the past, I guess, 20 years at this point or so.

03:23 Michael: Yeah, that's definitely a while, that's almost from the beginning right, not quite but it's pretty close.

03:27 Austin: Not quite at the very beginning but yes, it's been a long time, yeah.

03:31 Michael: Yeah, awesome. So, we were in Oslo, Norway together last week with a bunch of other speakers and developers at NDC, the Norwegian Developers Conference, and I would say that you and I were the guys carrying the Python flag; we were kind of the Python guys in a sea of other types of folks, right?

03:55 Austin: Yes, that's very true. Traditionally Python doesn't have a large footprint at that conference, so you and I definitely were the diplomats, the ambassadors for Python, I think. But I was surprised at how much interest there was in it among the other delegates; a lot of people have some glancing experience with it and they were, I think, interested to learn more about it. So I think it's a growing topic of interest even at conferences and venues that are not traditionally Python heavy.

04:25 Michael: Yeah, I have done Python talks at several conferences that were I would say decidedly not Python conferences, and they've been received really, really well and I think it's just one more manifestation, one more piece of evidence that Python is really a growing ecosystem.

04:40 Austin: Yes, yeah, very much so, and it's just getting more popular every year, it's incredible.

04:46 Michael: Yeah, it is- I say this a lot on the show, but it's amazing to me that the language grew at a pretty respectable but not insane rate for a really long time, kind of germinated, and then, you know, just caught fire in the 2000s. It's cool.

05:06 Austin: Now is the time for Python, yeah.

05:06 Michael: It is definitely the time. The reason I brought up NDC is that you had a really cool presentation there on this concept, which is a general programming concept, it's available in Java and maybe some other languages, I am not sure, called mutation testing. So, I've done a lot of unit testing and other kinds of testing, and I've heard of genetic algorithms, so maybe it's related to genetic algorithms- actually I know that it's not, necessarily. Why don't you tell us what mutation testing is? I want to share it with the audience.

05:34 Austin: Sure. You can think of mutation testing, in essence, as a test for your tests. The main goal of mutation testing is to gauge the effectiveness of your existing tests. So if you take the theoretically perfect standpoint that you have a test suite that tests 100% of your functionality, at least in principle, then mutation testing can tell you whether your test suite actually does test your functionality. You can find holes in your test suite, and it can also help you find code in your code base that isn't tested and maybe can just be removed, because it doesn't actually contribute to any real functionality. Mutation testing, as you said, has nothing to do with genetic algorithms; it doesn't try to search out failing test cases or anything like that. It's a very dumb algorithm: it systematically makes small modifications.

06:28 Michael: It's kind of exhaustive, right.

06:31 Austin: Exactly, right, it's an exhaustive search through a pretty large space to try to trick your tests into passing a mutant. The basic idea is that you make these very small changes to your code base and then run your test suite, and if your test suite passes, then we say that the mutant survived. This is what you don't want; it means that your test suite is incapable of detecting, or doesn't have the fidelity to detect, the change you've made, which we consider an error.

06:58 Michael: Ok, so before we get into that though, you said something I thought was interesting: you could have 100% code coverage and yet your tests are not doing their thing. So I think there are layers, or levels rather, of verifying your tests. Writing tests is level 1: you have tests, they exist. Level 2 of enlightenment would be that you have a significant amount of code coverage, because without code coverage you could have a thousand tests but they could all be about some small, useless part of your app, and an important core section might not actually be tested. But this supposes you are at level 2 enlightenment, right: you have tests, and you have good, maybe not 100%, but pretty solid code coverage. And now you want to ask whether this is actually- there is a difference between executing code and verifying code, right?

07:51 Austin: Right, and this is a really important distinction that mutation testing gets to the heart of. As you say, you could have 100% coverage in the sense that your test suite causes 100% of your instructions to be executed, however you define that instruction set, but traditional coverage doesn't tell you whether or not your tests are verifying the functionality. So you could have a glaring defect in your program that your test suite is exercising but not actually verifying. Mutation testing goes to the next level and tries to tell you if your test suite is actually verifying functionality, if it's capable of detecting actual errors, and if it's not, then we say that you need a more powerful test suite. That's the whole point of mutation testing; it's an adjunct to an existing test suite. And just to add on to that, this question of 100% coverage, no matter how you slice it, is a really hard thing for most projects to achieve; in fact, most projects in the world don't have anything close to 100% coverage. But mutation testing can still be useful even on systems that don't have 100% coverage; initially it's just going to throw a lot of flags telling you that you have problems where you simply don't have tests yet. It's not a technique that can only be used on systems that already have 100% coverage, that's my point.

09:18 Michael: That's good. Because that means it would have been excluded from quite a wide bit of [indiscernible] [laugh]

09:25 Austin: Yeah, nobody would be able to use it, that's the truth, yeah.

09:27 Michael: Right, ok. There are some really cool ideas here. You talked about mutants; this is the idea of changing your program, introducing some mutation: almost randomly finding a spot, making a change, and seeing what the effect is, because theoretically you should be able to detect this change, which presumably broke something, right?

09:49 Austin: Yeah, that's exactly the case. The mutations we are talking about, the modifications we are talking about, are typically very small. The example is replacing a relational operator: if I have some line of code that says x is less than 1, the mutation would be to change that to x is greater than 1, for example, make that one small change, and then run the test suite again. So these changes are very small, but the point is that the changes should all, in principle, be detectable by a sufficiently powerful test suite.
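To make that concrete, here is a minimal sketch of a relational-operator mutation and a test that kills it (the function names are illustrative, not output from any real tool):

```python
# Original code under test.
def is_small(x):
    return x < 1

# The mutant a tool might generate: `<` replaced by `>`.
def is_small_mutant(x):
    return x > 1

# A test that kills this mutant: it passes against the original
# function but fails if is_small is replaced by the mutant.
def test_is_small():
    assert is_small(0)      # 0 < 1 is True, but 0 > 1 is False: mutant killed
    assert not is_small(2)  # 2 < 1 is False, but 2 > 1 is True: also kills it
```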

10:24 Michael: Ok, that sounds like it would be true most of the time, but I think there might be some cases where it might not be detectable. Before we get into that though, I want to clarify for the listeners: you are not the one doing the mutations, right? Like, as a developer, that's not you?

10:42 Austin: Correct. The whole point of the mutation testing tool is that it will do the hard work, the boring work, of ploughing through your code, finding the places that can potentially be modified, modifying them, and then running your test suite. So in principle you should be able to point the tool at your code, tell it what your test suite is, walk away for probably a very long time, and come back to get your results. And yeah, this takes away all the drudgery associated with that, and it typically gets you some really interesting results in the end.

11:16 Michael: Yeah, that is really cool. So maybe we could think about what mutation testing tells us, because sometimes you might make a change and your code will then fail the test, right? So if I am testing that I select a user and assert that the count of users I got back equals 1, and your framework changes that to not-equal, for example, obviously that test would fail. But it could also change things that I don't detect.

11:44 Austin: Yes, that's true. You can have results where your test suite passes a mutant, and then you go examine the code and you realize that there is really no way to write a realistic test that would detect that change. This is a class of mutants, if you read the literature, called equivalent mutants. An equivalent mutant is exactly that: a mutant that is functionally equivalent. It's still a mutant, it's been changed, but it's still functionally equivalent to the original program, and for some reason or other, and this is a very language specific thing, you simply cannot detect it. This is one of the really tricky, difficult aspects of mutation testing: ferreting out and somehow avoiding these equivalent mutants, yes.

12:29 Michael: Yeah, that's interesting. So obviously, if we mutated and then the test failed, that's an upvote for our test, right? We've made a change to the code, we rerun the tests, the tests said your code has changed, it's no longer good. But then sometimes it might not come back- is it possible that if you change some kind of while loop condition, it could just go forever?

12:53 Austin: That is entirely possible, and this is yet another class of complexities that we have to deal with in mutation testing. The canonical example is what you said: mutants that go into an infinite loop. For example, one mutation would be to change a break to a continue, and if you do that, then you typically create a situation where an infinite loop is very likely, because you have taken a place where your code is in the exit condition, where it wants to break, and you've said don't break, continue the loop, and it's going to stay in the exit condition and just continue forever. So that kind of mutant falls into the category that we call incompetent. And I should back up and say there are three main categories for mutants once you've run your test suite and have some results. You talked about, just a second ago, the case where the test suite fails, and in that case your test suite has killed the mutant: your test suite has failed, indicating that it knows you've made a change.

13:53 The other broad category is that your mutant survives, that is, your test suite passes, and this is where we start to look for weaknesses in our test suite. The third category is this category of incompetent mutants. Most incompetent mutants fail immediately by throwing an exception, failing to compile, or doing something along those lines, something catastrophic that prevents them from even being run under the test suite. We still count these as killed; they go into that same bucket, and this is good. But there are some incompetent mutants that do things like you say: they run forever, or maybe run for a very long time, so long that we don't really want to wait to see if they stop. So this area is a difficult one; it's one that you have to address on a practical level when you develop tools to do mutation testing, this problem of incompetence. And when you start looking into the theory of detecting incompetent mutants, you run smack into Alan Turing's famous proof of the halting problem, which says that you cannot look at a program and determine a priori whether it is going to stop running at some point in the future. That is the problem you face with incompetent mutants in mutation testing.
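A small sketch of how a break-to-continue mutation can turn a terminating loop into an infinite one (the functions here are hypothetical, purely for illustration):

```python
import itertools

# Original: stop scanning as soon as we see an even number.
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break
    return n

# Mutant: `break` replaced by `continue`. Against a finite list this
# merely returns the wrong value, but against an unbounded iterator
# the loop never terminates: an incompetent mutant that can only be
# caught with a timeout.
def first_even_mutant(numbers):
    for n in numbers:
        if n % 2 == 0:
            continue
    return n

print(first_even(itertools.count(1)))    # prints 2 and stops
# first_even_mutant(itertools.count(1))  # would hang forever
```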

15:06 Michael: Yeah, there is not even much reasoning about it, because it's just a brute force method anyway.

15:11 Austin: Correct, yeah.

15:12 Michael: Yeah, that's a big challenge. Can you give me some idea of how frequently that category shows up? Is it like 0.1%, 5%, 10%?

15:23 Austin: It's a tough question to answer on a global scale, because I haven't run mutation testing on every program, but in my experience it's a relatively small amount: far less than 1% of mutations become incompetent. They are not a huge problem in practice, because the strategies we use to deal with them are really simple. Basically, what we do is time out: we establish, using one method or another, a timeout for your test suite, and if it takes longer than the timeout, we just count that mutant as incompetent. We say that mutant is in an infinite loop, or in a huge loop, and it didn't get to run.

16:00 Michael: If you consider performance part of your feature set, I mean, it's failing anyway, right?

16:04 Austin: Right, yeah, it's clearly problematic at that point.

16:08 Michael: Ok, interesting. And you actually have two ways of timing out: you could just say, well, we are never going to run tests for more than 5 minutes, but you had a cool thing to do with baselines as well, right?

16:18 Austin: With the tools that we have right now for doing mutation testing in Python, the approach we take is, as you said, twofold. One way is to let the user provide the timeout; they can just provide an absolute timeout and we'll honor that. Or we can run the test suite over the unmutated code, time that, and use it as a baseline, and then let the user provide some multiplier, say 2 or 3. If a mutant's test suite run takes longer than n times the baseline timing, we consider that mutant incompetent and we kill it off. So this is our really simple but generally very effective approach to dealing with the halting problem in practice.
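Here is a minimal sketch of that baseline-multiplier idea (illustrative only, not Cosmic Ray's actual implementation), using a subprocess timeout:

```python
import subprocess
import time

def baseline_seconds(test_cmd):
    """Time one run of the test suite over the unmutated code."""
    start = time.monotonic()
    subprocess.run(test_cmd, check=False)
    return time.monotonic() - start

def run_against_mutant(test_cmd, baseline, multiplier=3):
    """Run the suite against a mutant; a timeout means 'incompetent'."""
    try:
        result = subprocess.run(test_cmd, timeout=multiplier * baseline)
    except subprocess.TimeoutExpired:
        return "incompetent"   # likely an infinite (or enormous) loop
    return "killed" if result.returncode != 0 else "survived"

# Usage: first measure baseline = baseline_seconds(["pytest", "-q"]),
# then call run_against_mutant for each mutant with that baseline.
```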

16:56 Michael: Yeah, it's way easier than proving it, right?

16:58 Austin: Yeah, definitely, that would be really difficult to do.

17:02 Michael: Yeah, nice. So we have obviously the case where the mutant is killed, and we have this incompetent mutant which we kind of can't really deal with, but then we have the more challenging case, maybe the interesting case you would say, where you've changed the code, you run the tests, and the tests still pass, right? So there are a couple of conclusions you can draw from this, yeah?

17:26 Austin: Well, yeah, if you've made the mutation and the tests still pass, then you have a couple of things to look at. One possibility, the standard thing that happens in that case, is that you just don't have enough tests; you need more or better tests, because you have some change that was undetected by your test suite. The other case, less common but still quite common, is that you have code in your program that doesn't need to be there anymore. It's extra code, it doesn't contribute to any functionality, so in that case your test suite is perfectly good, because it's testing the things it needs to be testing, the important functionality, but you've got bits of code that can be mutated and aren't being tested. You should get those bits of code out; if you view code as a liability rather than something important to keep around, just get rid of it at that point. The third possibility, and this is really a subcategory of the first, is that we come back to this notion of equivalent mutants: they have been changed, our tests haven't detected them, but there is no practical way to write a test for them. There are all sorts of interesting examples of these; they are a bit difficult to describe without showing some code.

18:43 Michael: Speaking of showing, all the videos of the sessions including yours and mine from NDC will be online shortly, and so as soon as they are online, I'll put the link to your presentation, so people can go back and see it but yeah, it is tough to talk about code examples on audio, right.

19:01 Austin: Yeah, it's quite difficult. But for Python I think I can probably describe the __main__ example.

19:08 Michael: Yeah, go for it.

19:08 Austin: So one equivalent mutant that in retrospect is quite obvious, but that I hadn't anticipated, is the standard idiom in Python of using __name__ == "__main__" to set up your main block when you are writing a program. If you've got that in your program and you have any kind of code in that block that can be mutated, the mutation testing suite will mutate that code. But of course, that block is never executed in a test, because it's not accessible inside the test: __name__ is never equal to "__main__" in that case. So we have this really interesting case: a whole body of code that is really important to your program in a way, but that cannot be tested and never will be tested. That's the flavor of at least some equivalent mutants. But one of the joys of equivalent mutants is that when you find them, you have these a-ha moments almost every time, because they are surprising, they are interesting, and they kind of make you scratch your head a little bit. It's one of the, I guess you might say, strange joys of mutation testing.
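A minimal sketch of that situation (the module and function names are hypothetical):

```python
# mymodule.py
def double(x):
    return 2 * x

if __name__ == "__main__":
    # Any mutation made inside this block survives automatically:
    # when the test suite imports this module, __name__ is
    # "mymodule", not "__main__", so this code never executes
    # under test -- a classic equivalent mutant.
    print(double(3))
```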

20:13 Michael: [laugh] Yeah, it does sound pretty interesting, definitely it gives you some insight you probably wouldn't normally get.

20:13 [music]

20:13 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

20:13 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

20:13 Typically, candidates receive 5 or more offers in just the first week and there are no obligations ever.

20:13 Sounds awesome, doesn't it? Well, did I mention the signing bonus? Everyone who accepts a job from Hired gets a $1,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000!

20:13 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

20:13 [music]

21:17 Michael: One example that I was thinking of when you were talking about the categories is logging, right? So maybe you've got some code that says if this, then log this thing, else log that. And you know, does it really make sense to write the test that detects what you're logging-

21:33 Austin: Right, that's a really good point. A large category of equivalent mutants is exactly of that flavor: the changes caused by the mutation are things that you in principle could test for, but you never would, because there is no reason to do it. There is maybe no business reason to do it, maybe just no practical reason to do it, depending on what values are driving your project. So you end up never writing tests for those, and one of the challenges of writing mutation testing tools is allowing your user to specify, in some way, shape, or form, that mutations should not be performed on certain bodies of code, for various reasons. This is something that all the tools that do mutation testing have to account for in some way, shape, or form.

22:19 Michael: Ok. So I am going to hold my question on how you deal with that until we get to your framework, because that is a really interesting problem and I want to dig into it. But before we do, could you give us- you talked about the basic changes, like if you've got a less-than, change that to a greater-than. There is a whole variety of different types: there are language-agnostic changes, and there are changes you can make that affect object-oriented programming. Can you give us a sense of those?

22:51 Austin: Sure, yeah. This is actually one of the areas where there is active research into mutation testing; it's not a huge group of people doing this research, but the research that is going on is to a large degree about which kinds of mutations we should actually be performing. So, as you mentioned, some mutations are language-agnostic in the sense that they apply to almost all programming languages you can imagine. A typical example would be something like replacing a constant: if the tool found a constant 4 in your code, the mutation testing suite might change that to 5 or 19 or -6 or something like that. This is an obvious change, a blatant change, and it should be obviously testable. Other examples include things like replacing arithmetic operators, removing or adding unary operators, and replacing relational operators, which we talked about earlier. All of these are broadly applicable: you can see them being applied in a functional language or an OO language or whatever kind of language you happen to be working in.

24:01 But some research has looked into mutations that are specific to, for example, object-oriented languages. Not Python, but a lot of object-oriented languages have access modifiers: private, public, protected, and so forth. One really clever and interesting mutation is to replace public with private or vice versa, to basically go in and mess with the access modifiers and see if that is detected by the tests. You can end up with results that can't compile; in C++, for example, if you change public to private, that would probably break compilation of many programs. But changing private to public? It's hard to say. That's actually very difficult to test for. Other examples of language-specific mutations include, for instance, changing base class order. This is another one that can have really dramatic effects on what your program does, or, in other cases, absolutely no effect whatsoever. You can see how changing base class order in Python, for example, could be an almost completely undetectable change to your program; the only way to detect it would be if you had a test that was checking the base class order, checking the MRO for the class, and of course nobody is going to write that test, and I am not advocating that anybody write that kind of test.
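A small illustration of why a base-class-order mutation can be dramatic or invisible, depending on the classes involved (the classes here are hypothetical):

```python
class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

# Original: A comes first in the MRO, so greet() resolves to A's.
class C(A, B):
    pass

# Mutant: base class order swapped. greet() now resolves to B's,
# a dramatic change if anything calls greet(). If A and B shared no
# method names, the swap would be invisible without checking C.__mro__.
class C_mutant(B, A):
    pass

assert C().greet() == "A"
assert C_mutant().greet() == "B"
```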

25:18 Michael: It's not the kind of code you want to write, that's for sure.

25:20 Austin: Right, it's a complete waste of time. [laugh] That's, I think, a fairly enlightening example of the kinds of problems you face doing mutation testing: as elegant and interesting and straightforward as the approach sounds, there are these really difficult edge cases you have to deal with. There are also mutations for functional languages, and the classic example there is that a lot of Haskell-like and F#-like languages have pattern matching on their functions, and changing the order of the patterns is a common mutation you might perform on a language like that. That again is another area where sometimes changing the order makes a huge difference, and sometimes it makes absolutely no difference; and in those cases I actually don't know how you would test for them, because they are undetectable unless you have introspection, you have reflection capabilities, and you actually go in and do the kinds of tests I talked about a second ago that you would never write. So it's a fascinating field to dig down into, and the papers are pretty accessible if you want to read about these kinds of things as well.

26:21 Michael: It sounds to me like one of the major challenges that you are going to run into for almost any reasonable size program is that it's going to be really slow, right? Because you are looking at basically every permutation of all the operators, and, you know, inheritance, methods; it's a crazy number of things in play here, right?

26:44 Austin: That's absolutely true. And I think the single biggest practical problem with mutation testing, the single biggest practical roadblock to using it, is that it takes a long, long time to do. Consider the possibility of having dozens or a hundred operators, the kinds of mutations you might make in your code, and a large code base; you know, hundreds of thousands of lines of code is not uncommon in valuable systems. Then consider the fact that a test suite might take a considerable amount of time to run. So you have this triply nested loop: the operators, the places those operators can be applied, and the amount of time it takes to run your test suite. If you do the math, you can find systems where that adds up to years. I mean literally, it's not something you can do on a practical basis for all your code, in any way, shape, or form.

27:37 But there are some strategies we can apply to try to deal with that. The most basic strategy is simply to parallelize. For all the problems we have with long runtimes in mutation testing, the saving grace, perhaps, is that it's embarrassingly parallel: you can run each mutation's test suite run in a completely separate process, all at the same time if you want to, and they won't affect each other's results. So you could in principle go to Azure or Amazon and rent 10,000 or 100,000 machines for 5 minutes, or whatever they let you get, run all your tests, and be done with it. But that's probably not economically feasible for most people. As for other approaches to dealing with this, well, there are not that many other approaches that I am aware of, but one is another form of baselining. We talked earlier about baseline timeouts, when to kill the test suite and call it incompetent. Another kind of baselining you can do is to run the full test suite, with all your operators, over all your code and get those results, and then, as you start making changes to your code base, only run the tests that you know exercise modified code, code that you've changed. That way you can drastically reduce the number of tests you need to run, and that drastically reduces the number of operators that get applied, the amount of code that can potentially be modified, and so forth.

29:00 Also, you can tell your mutation testing system to only mutate code that was modified. So basically we are analyzing deltas, analyzing our git diffs so to speak, and saying only run the tests that we know could possibly have an impact on, or be impacted by, the changes that were made. This is a heuristic approach, because of course it's not watertight: you could make changes to your code that influence the code paths that your tests are exercising, and if you purely used this kind of baselining to determine which tests to run and what to test, then, at least in principle, you would be missing things. So you have to do occasional re-baselinings to make sure that you've kept up with all the changes. It also assumes that you have some way of correlating your tests with lines of code, and this is where mutation testing and traditional coverage analysis tools can come into play, where they work together. Now you can say: ok, I take the coverage analysis information, I know which tests exercise which lines of code, I can compare that to the deltas and determine which tests need to be run by the mutation testing suite.
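A minimal sketch of that test-selection idea (the data structures are hypothetical; a real tool would get the line-to-test map from a coverage tool such as coverage.py and the deltas from git):

```python
# Hypothetical inputs: which lines each test exercises, and which
# lines the latest commit touched.
coverage_map = {
    "test_login":  {("auth.py", 10), ("auth.py", 11)},
    "test_report": {("report.py", 40)},
}
changed_lines = {("auth.py", 11)}

def tests_to_rerun(coverage_map, changed_lines):
    """Select only the tests whose covered lines intersect the delta."""
    return [test for test, lines in coverage_map.items()
            if lines & changed_lines]

print(tests_to_rerun(coverage_map, changed_lines))  # ['test_login']
```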

30:09 Michael: Right, it's like an inverted code coverage, right? So, if I look at this test, what part of the code in my real app was changed or somehow affected by running this test? So you could just focus on, say, ten lines of code, or probably way more than that, but focus on that area, right?

30:29 Austin: Exactly, the point is to drastically reduce the space, and I think in principle you can get this down to where things run fast enough that you could do it on every commit, or on bundles of commits, rather than once a week or something along those lines. That may or may not be desirable, but it's an interesting goal from a tool developer's point of view.

30:49 Michael: Yeah, it's definitely an interesting goal. One of the things I was thinking of as you said this was: is this a thing that needs to run on, say, every check-in, or every time you want to run your tests? Because if you have a good set of tests, hopefully your tests are actually catching your bugs, and this feels to me like a validation of your tests. It seems like it could theoretically run less often and still be really valuable.

31:17 Austin: I think in practice you're right that it doesn't need to run on every check-in. But say you are working on a team that wants perfect code coverage, and you have a policy on a legacy code base that any change you make needs to be backed up by tests, which is a common thing to do with existing legacy systems that are trying to improve their lot in this world. You might have that policy on every commit: whatever changes you made need to be backed up by tests. This is a good way to verify that, and not just to verify that you've written tests, but to verify that the tests you've created actually test the functionality correctly. So if you can make mutation testing fast enough, you can actually enforce that kind of constraint in a pretty strong way, and that's an interesting thing.

32:00 Michael: Yeah, that is quite interesting, because my experience is there is a massive difference among team members in their level of embracing testing and how much they run the tests. Some people are really into it, and some people only run them if something is making them do it, basically.

32:17 Austin: Yeah, that's very true, and so now you have a new stick to beat people around the head with: you have mutation testing in place. [laugh]

32:27 Michael: Nice. So let's bring this down to Python, let's make it concrete. Let's talk about this thing called Cosmic Ray that you created.

32:34 Austin: Ok, yeah, Cosmic Ray, as you just hinted, is a mutation testing tool for Python. I should say it's not the first mutation testing tool for Python; there were a few available when I started writing it, but they didn't quite work the way I wanted, or they were unmaintained. This started out almost as just a fun thing to do, and it turned out to be a really fascinating project all around. Cosmic Ray is a system for searching through your Python code, finding places to mutate, making the mutations, and then running your test suite. It's a fairly young project, and it has quite a bit of work left to be done on it, but it has produced some results already, so it's looking quite promising. It's about a year and a half old at this point, I think, and has really only been used by me and a few close, trusted friends, but it's open source, it's on GitHub, and anybody who wants to try it, make contributions, or give any feedback is more than welcome, and in fact encouraged, to go take a look at it.

33:39 Michael: Yeah, awesome, and I'll be sure to link to the github repo and things like that, and it's on PyPi of course, right?

33:45 Austin: Honestly, I am not sure. I think it is, but the last time I pushed it up to PyPI- [laugh]

33:52 Michael: Let me see- yes it is, it's cosmic_ray on PyPI.

33:58 Austin: Ok, saved. I guess the interesting part for a lot of people is going to be how Cosmic Ray works internally.

34:05 Michael: Yeah, absolutely, and there is some really amazing stuff in there. Before we get into that, can you really quickly tell me what I need to do? Like, if I've got some Python app with some tests, you know, I'm using pytest or something like that, what are my steps to apply this?

34:23 Austin: The steps are pretty straightforward. First, identify the parts of your code that you want to mutation test; very often you'll have some part of your code that has a good test suite, is heavily, thoroughly tested, and is central to the functioning, and other parts that aren't, and you can use Cosmic Ray to slice and dice the parts you do and do not want to test. So if you just want to take it for a spin, identify some module that you are interested in.

34:50 Michael: Because you want it to happen in a short amount of time, right?

34:54 Austin: Well, that's one of the other reasons, yeah. You'll get more bang for your buck if you just test drive this on something small; if you want to run this over all of Django, forget it, it's not going to work in any practical sense, but if you want to run it over a single module in Django or some other package, then you'll have more luck. That's been my experience with this so far, at least. You'll also need a test suite; right now we only support the standard library unittest and pytest as the suites, but there is a plugin system for other testing systems, and if you feel you need one supported, they are pretty easy to add. You point Cosmic Ray at your module and at your test suite, and you pass it a few other parameters, things having to do with timeouts and so forth, and it will build up a work order, basically the list of things it's going to do, and put that in a little database. Then you'll need to set up Celery; Celery is a task distribution queue that runs on top of RabbitMQ by default, and we use Celery to distribute work out to workers that actually do the mutation, run the test suite, and then send results back. So you'll have workers sitting on your Celery queue, and then you tell Cosmic Ray to run the work order it has built, and it will start doling out work to these workers and collecting the results back into the little database it's got. That's the short version of what you need to do. Once you have results back, you start analyzing them and trying to figure out what Cosmic Ray is telling you.
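For flavor, here is a minimal sketch of that Celery work-distribution pattern (this is not Cosmic Ray's actual task code; the broker URL, task name, and result shape are assumptions for illustration):

```python
# tasks.py -- start a worker with: celery -A tasks worker
from celery import Celery

app = Celery("tasks", broker="amqp://localhost", backend="rpc://")

@app.task
def run_mutation_job(module_name, operator_name, occurrence):
    """Apply one mutation, run the test suite, classify the outcome.

    A real tool would mutate the module's AST at the given occurrence
    and run the tests here; this stub only shows the task's shape.
    """
    result = "survived"  # placeholder: killed / survived / incompetent
    return {"module": module_name, "operator": operator_name,
            "occurrence": occurrence, "result": result}

# The coordinator fans jobs out over the queue and collects results:
#   pending = [run_mutation_job.delay("mymod", "FlipLessThan", i)
#              for i in range(num_mutation_sites)]
#   outcomes = [job.get() for job in pending]
```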

36:20 Michael: Right, you look at those three categories and you decide what to ignore and what not to ignore.

36:20 [music]

36:20 Gone are the days of tweaking your server, merging your code, and just hoping it works in your production environment. With SnapCI's cloud based, hosted continuous delivery tool you simply do a git push and they autodetect and run all the necessary tests through their multistage pipelines. If something fails, you can even debug it directly in the browser.

36:20 With one click deployment that you can do from your desk or from 30,000 feet in the air, Snap offers flexibility and peace of mind. Imagine all the time you'll save. Thank SnapCI for sponsoring this episode by trying them for free at snap.ci/talkpython.

36:20 [music]

37:24 Michael: And is there a way to flag and say, this thing you've detected here, I want to ignore that?

37:31 Austin: Not yet, and this is actually one of the big open areas for development: how do we let users specify exceptions effectively, how do we let them say, don't make this particular mutation on this line of code, or, even more coarsely, don't make any mutations in this whole region of code. We need that kind of thing because of the problem of equivalent mutants and so forth, which we have no real solution to. Right now there is some thought about the direction to take this. If you look at tools like Pylint, they have great systems for putting essentially directives into comments in your code, telling Pylint, please don't apply rule such-and-such to this line of code. We could probably apply the same kind of technique to Cosmic Ray, but I am not sure yet if that's better than having some extrinsic description of the exceptions. It's basically an open question, and if anybody has ideas or wants to take a swing at it, this really is one of the big things we need to sort out. Soon.
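For comparison, the first directive below is real Pylint syntax; the second shows what a Cosmic Ray equivalent might look like (that directive is purely hypothetical, no such syntax existed at the time of this conversation):

```python
import logging

log = logging.getLogger(__name__)
state = "ready"

# A real Pylint directive: suppress the eval-used warning on this line.
x = eval("1 + 1")  # pylint: disable=eval-used

# A *hypothetical* mutation-testing directive in the same spirit; this
# is not real Cosmic Ray syntax, just a sketch of the idea.
log.debug("state=%s", state)  # no-mutate
```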

38:28 Michael: Let's look inside, basically you point Cosmic Ray at your module, and you say go shred this thing and for every shred that you create, go run the unit test, right?

38:41 Austin: That's exactly right.

38:42 Michael: Walk us through the internals there, there is some interesting stuff you are doing?

38:45 Austin: Well, at the core of all of this is the standard library module ast; AST is an acronym for abstract syntax tree. An abstract syntax tree is just a programmatic structure describing a program, the syntax in your source code. When Python parses your source code, it produces an abstract syntax tree, and then you can access this, looking at the different nodes in the tree, the different parts of your program, and not just look at them but also change them. So what ast allows us to do in Cosmic Ray is load up your source code, we literally read your source code from your .py file, and pass it into a parse function, which parses the source code into the abstract syntax tree. Then ast has other components which allow us to walk down that tree and, if we want, make changes. As for the details of exactly how it does that, it might be difficult to talk about operators and things like that in too great detail here.

39:42 Michael: Yeah, so well basically you get this abstract syntax tree and then you start applying your transformations to it, right, your mutations if you will, yeah?

39:53 Austin: Well, that's the fundamental idea, yes. You have the AST, you find the place where you want to make a modification, and you make the modification to it; there is support in the ast module for doing that kind of work. Once you've modified the AST, you need to make it available to your test suite, you need to make it importable, and that is a whole other, second-level trick.

40:13 Michael: Yeah. It's one thing to say, hey Python, run this module; it's another to load up an individual AST and then turn that into something executable, right?

40:23 Austin: Exactly, yeah. That was sort of the second big phase of work in building Cosmic Ray: figuring out how to do that. Once you have a modified AST, you can pass it to the built-in compile function, and that spits out what's called a code object, and it's this kind of thing that modules can use, so to speak; we can execute it to populate the module. Figuring out how to make your modified AST available through a standard import was a big goal of Cosmic Ray. We didn't want people to have to modify their test suites to do mutation testing; we wanted the test suites to just naturally import the module and get the right one. So we had to do a lot of investigation into how Python does this. At the core, there are three main moving parts to how Python does imports, how it lets you control imports.
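Before the import machinery details, here is a minimal sketch of the parse-mutate-compile pipeline described so far, using the standard ast module (a toy operator for illustration, not Cosmic Ray's actual operator code):

```python
import ast

source = "def is_small(x):\n    return x < 1\n"

# Parse the source text into an abstract syntax tree.
tree = ast.parse(source)

class FlipLessThan(ast.NodeTransformer):
    """A toy mutation operator: replace every `<` with `>`."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

mutant_tree = ast.fix_missing_locations(FlipLessThan().visit(tree))

# Compile the mutated tree to a code object and execute it into a
# namespace, just as a module body would be executed.
namespace = {}
exec(compile(mutant_tree, "<mutant>", "exec"), namespace)
print(namespace["is_small"](0))  # False under the mutant; True originally
```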

41:17 The first moving part is called a finder. A finder is an object, typically a class instance, though it can be a function or a class, that is responsible for telling Python that it knows how to load a module, given that module's name. So Python will ask the finder: I've been asked to import foo, do you know how to do anything with foo? And the finder can say yes or no. If a finder does know how to load something, it returns what is called a loader, and the loader is then responsible for populating essentially the shell of a module. Python will make the empty shell of the module, pass it to the loader, and say: ok, now you populate this with the names, the functions, the name bindings, all that kind of stuff that comes from the module you are supposed to be loading.

42:06 What we do in Cosmic Ray is have our own custom finder, and that finder is given the modified AST and is told the name of the module. If it's then asked by Python, do you know how to load that module, it will say yes, and then it hands back a loader. We have a custom loader which also has this AST, and the custom loader is able to compile the AST and use that compiled AST to populate the shell module; then that shell module is passed back to Python and is naturally imported so that everybody can use it. The last moving part in this whole system is something called sys.meta_path; if you import sys, you'll see it has an attribute called meta_path. meta_path is just a list of finders, and when Python wants to import something, and some experts might tell me that I am a little bit wrong on the details but this is effectively correct, Python marches down the meta_path asking each finder in order, do you know how to load this name, and the first finder that responds is the one that wins.

43:08 So what we do is we take our custom finder, we populate it with its AST and its name, and we stick it at the front of meta_path inside our worker processes. These worker processes are then able to hijack the import system, in a sense, and put these mutated ASTs directly into place so that nobody has to know they are there, but they get imported naturally by whoever wants to use them. So that's the long and the short, I guess, of how we stick mutated ASTs into Python programs.
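Here is a minimal sketch of that hijack using the importlib machinery (assuming a mutated AST like the one produced above; the module name is illustrative, and Cosmic Ray's real finder differs in detail):

```python
import ast
import importlib.abc
import importlib.util
import sys

class ASTFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve a single module name from an in-memory (mutated) AST."""

    def __init__(self, name, tree):
        self._name, self._tree = name, tree

    def find_spec(self, fullname, path, target=None):
        if fullname != self._name:
            return None  # not ours; let the next finder on meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None      # tell Python to create the default empty shell

    def exec_module(self, module):
        # Compile the AST and execute it to populate the module shell.
        exec(compile(self._tree, f"<mutant:{self._name}>", "exec"),
             module.__dict__)

mutant_tree = ast.parse("ANSWER = 42")  # stand-in for a mutated AST
sys.meta_path.insert(0, ASTFinder("mymod", mutant_tree))

import mymod                 # resolved by our finder, not the filesystem
print(mymod.ANSWER)          # 42
```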

43:34 Michael: Yeah, you really have to dig deep down inside the guts of Python, take the red pill and not the blue pill, right?

43:41 Austin: Yeah, there was a lot of PEP archaeology and stuff to get to the bottom of this. But in the end it's very elegant and powerful. One of the joys of this project was learning all this stuff that I may never apply again, but I feel like I've reached the next level of my Python expertise, in a sense.

43:58 Michael: Yeah, that's really cool, and it's awesome because you don't change your code to make this happen, right? It does what it has to do to basically take over.

44:08 Austin: Exactly. Cosmic Ray works at a deep enough level that neither your test code nor your code under test needs to be modified to use it; it should work transparently in all ways. That was a big goal of the project.

44:25 Michael: You talked about Celery, and Celery is really awesome; there are a couple of other really cool projects that you built upon, and one of them was this thing called TinyDB.

44:37 Austin: Yeah, TinyDB is what its name says: it's a tiny database, a little embedded, file oriented JSON database that you can import into your Python and use with basically no configuration, so it was exactly what I was looking for when I needed a database for Cosmic Ray. We use the database for keeping track of the work order I described earlier: the first thing you do in a mutation testing run is figure out what it is you are going to do and write all that down, and we write that into the database. Then, as the results arrive back via Celery, we stick the results back into this database. So TinyDB is something that has worked out really well for us so far, and it was, as I said, super easy to use, and it has stuck around so far. I have a feeling it's going to end up being a bottleneck in larger projects, but that's a gut feeling, I don't have any evidence to indicate that. If it has to be replaced, then we'll start looking at something like SQLite, or maybe we'll give the user the power to specify MongoDB or whatever they want. But TinyDB is really worth looking at, I think, if you don't have really sophisticated database needs and you want something that just works; it's a really beautiful little program that worked out of the box with really no reading on my part whatsoever.
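A minimal sketch of using TinyDB that way (the record shapes are hypothetical, not Cosmic Ray's actual schema):

```python
from tinydb import TinyDB, Query

db = TinyDB("work_order.json")  # the whole database is one JSON file

# Write down the planned work, one record per pending mutation.
db.insert({"module": "mymod", "operator": "FlipLessThan",
           "occurrence": 0, "result": None})

# As a worker reports back, record the outcome for that job.
Job = Query()
db.update({"result": "killed"},
          (Job.module == "mymod") & (Job.occurrence == 0))

# Later, list every mutant that survived the test suite.
print(db.search(Job.result == "survived"))
```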

45:56 Michael: That's lovely. I really like to use SQLite and SQLAlchemy together, and those work really well in sort of an equivalent way, but I am a huge fan of the document databases.

46:08 Austin: One of the big selling points, what made me stick with TinyDB, is that it literally is a JSON file. I can open it up in Emacs and just look at it, and I don't have to have any extra tools to examine its contents. I think that JSON nature is also what's going to be its downfall, that's what makes me think it's not going to last that long for this project, but it's been a real selling point: I can run my tests as I am testing Cosmic Ray, which as you might imagine is a real challenge, and then see what's in the database really, really easily, so my cycle time has been really short by using TinyDB.

46:45 Michael: Yeah, that's cool, and it's 100% Python, according to GitHub?

46:48 Austin: That sounds right, yeah, I don't remember any compilation happening when I used it, yeah.

46:53 Michael: Nice, yeah, it has a thousand stars so it's done pretty well; I definitely want to check it out. The other one was docopt?

46:59 Austin: Yeah. docopt is one of my current favorite packages, not just for Python but for lots and lots of languages. docopt is a tool for building command line parsers, but unlike things like argparse or the other standard tools for doing this, it takes kind of a backwards approach: you provide it with a string which is the POSIX standard help output that you would get from any program, you know, stating your usage, program name, option names and all that kind of stuff, the text information somebody gets when they type program -h. You give that string to docopt, and from that it generates a parser that can then parse command line arguments. So you never have to think really hard about building up these parser objects yourself; everything is done magically, and all you need to think about is how your pretty help message is going to look.

47:55 Michael: Which you've got to write anyway.

47:56 Austin: Which you have to write, or have generated by some other tool, but this has the neat effect that embedded in your code somewhere is your full help message, which is great documentation, not just for your users but also for other programmers who read your code. It solves a really annoying problem that every programmer in the world has, which is writing parsers for command line arguments, in a really sleek way. One of the interesting things, which I didn't know until I looked at docopt, is that there actually is a POSIX standard for these help messages, so it can rely on an actual existing standard for defining these things, which is really cool.

48:37 Michael: That is cool. Actually, learning about docopt was the first time I had ever heard that there was a standard for this. Wait, there is a standard for help messages? Interesting.

48:47 Austin: I highly recommend that anybody who has to write command line tools and who hasn't tried docopt take a look at it; it's really addictive, and you can produce really, really powerful command line parsers, things like you have with git, sub-command based tools. I guess the other interesting thing about docopt is that while it was originally written in Python, the canonical implementation is Python, it now exists for something like 30 languages. So if you are a sometimes C# developer, sometimes Java developer, sometimes whatever developer, you can continue using docopt in those languages as well. It's a neat project from that point of view; it's something that you don't see a lot of.

49:22 Michael: That definitely means the idea of it resonated super well, right?

49:24 Austin: Yeah. It did.
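To ground the discussion above, here is a minimal sketch of docopt in action (the tool name and options are hypothetical):

```python
"""mutate - a toy mutation-testing front end.

Usage:
  mutate run <module> [--timeout=<sec>]
  mutate report
  mutate (-h | --help)

Options:
  -h --help        Show this help.
  --timeout=<sec>  Per-run timeout in seconds [default: 60].
"""
from docopt import docopt

if __name__ == "__main__":
    # The parser is generated from the docstring above; that's all it takes.
    args = docopt(__doc__)
    # e.g. for `mutate run mymod`:
    # {'run': True, '<module>': 'mymod', '--timeout': '60', ...}
    print(args)
```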

49:27 Michael: Ok, so we are getting kind of near the end of the show now, I wanted to ask you- you have a company called Sixty North, right?

49:35 Austin: That's correct, yes.

49:36 Michael: Yeah, you and Robert Smallshire is that right?

49:39 Austin: Yeah, that's right.

49:41 Michael: Yeah, you guys are up in Norway which is why I ran into you, although we also seem to run into each other in London, so what do you guys do there?

49:49 Austin: At Sixty North we do do a lot of Python work; we do consulting, training, and some development as well. We've made some courses for Pluralsight: if you go to Pluralsight and look for the Python training courses, we have Python Fundamentals as our first course, and Python Beyond The Basics, which is sort of the next step, the intermediate level, is there too. We are working on a third one, Advanced Python, I think, is the official name, and that will be out by the end of the year, hopefully.

50:22 Michael: Ok, you and I we are both very passionate about online courses, tell me what's in your intermediate and your advanced courses?

50:28 Austin: Oh, I'd have to stretch my brain to remember exactly the contents of those courses, but I know in the intermediate course we start getting into things like decorators, class properties, some of the details of classes beyond just, you know, functions and methods-

50:42 Michael: How you define a class and add fields to it?

50:46 Austin: Yeah, getting beyond that. A couple of things that are beyond the basics; you'd be surprised at how many things go into the basics course that are really basic, and I mean the course is quite long and still barely scratches the surface of Python, so anything like, as I mentioned, decorators or-

51:03 Michael: Probably lambda expression type things?

51:05 Austin: I think lambda's in there, context managers, implementing a lot of the dunder magic methods, that kind of stuff is in the intermediate. And then the advanced course is where you start to get into things like what we talked about earlier, finders and loaders, or you start getting into metaclasses, things that we classify, to a degree, as things you might do once a year, as opposed to the things you do every day as a professional Python programmer. I mean, finders and loaders: I programmed Python for 20 years and never used them, but it's an interesting and important part of the language, so it needs to be in there somewhere.

51:37 Michael: Yeah, and once you understand it, maybe you don't use it often, but knowing the mechanics helps you understand a lot of other things at that level.

51:45 Austin: Yeah, and you have that in your pocket, and so that might be the most elegant solution for some particular problem you face, rather than some horrible hack you would have to come up with otherwise. So the advanced stuff is for people who are using Python a lot, who need to find the best solutions, and who want to really understand the inner workings of the Python runtime.

52:04 Michael: Yeah, cool. So if you have a Pluralsight subscription, go over there and type Python in the search box; you'll find Austin.

52:09 Austin: Yeah, [laugh]

52:13 Michael: Nice. And you also write some books too?

52:15 Austin: We do have some books, yeah. The books are based on the same material as the Pluralsight courses, and the first one, which is I think 90% done now, is on Leanpub; it's called The Python Apprentice. The second and third books, The Python Journeyman and The Python Master, are in the works and will be published probably not this year, but soon. They're on Leanpub, so you can get the early version and we'll keep sending you updates as we make updates to the books. If you prefer books, these are available as, I think, PDF, mobi, and EPUB on the Leanpub site.

52:51 Michael: Nice, and that is self publishing, right?

52:53 Austin: That is self publishing, yes.

52:55 Michael: Very cool, I am a big fan of self publishing, so I like to see when people are succeeding with that, that's great. I'll be sure to link to all those things in the show notes as well.

53:04 Austin: Ok, that would be great.

53:05 Michael: Yeah, absolutely, very interesting. Definitely cool. Two more questions before I let you go: what is your favorite PyPI package? I saw the other day there are over 80,000 distinct packages out there, which is an insane number. There's got to be something that you've had exposure to that you want to share, like, oh, you should check this out.

53:26 Austin: Well, it is going to feel like a bit of a cheat, but docopt. docopt is one that, once I learned about it, I started using on almost every project I work on. But I know that it's not that well known, it is not as well known as I think it should be, so I'll just put a second vote in for docopt. For my money, that's the tool I keep going back to on PyPI every time, and it should be more widely known and more widely used, because it's awesome.

53:55 Michael: Yeah. That's awesome, and I'll throw one in for Cosmic Ray; that's pretty awesome and very interesting to check out.

54:04 Austin: Thanks.

54:05 Michael: And then, you mentioned Emacs earlier, but if you are going to write some Python code what do you typically open up?

54:10 Austin: Well, the short answer is Emacs. I've been using Emacs for almost as long as I've been using Python, I think, and it's in my fingers to a degree. If I know that I am working on just a dedicated Python project, then PyCharm is a wonderful IDE, and it's got a lot of power that Emacs doesn't have when it comes to working with Python, that Emacs doesn't have yet, I should say. It's really great for pure Python editing. I guess the reason I stick with Emacs is stubbornness to a degree; I am old and don't want to change. But I am also very often working on multiple languages at the same time in any given project, you know, everything from JavaScript to Python 2 to whatever happens to be part of that project, and I find that Emacs makes it easier for me to do that, or at least it's the best for that kind of work from what I can tell. And honestly, Emacs as a Python IDE is pretty good; you can do all sorts of fancy stuff in there if you want to spend the time to configure it, and if you use a packaged Emacs configuration like Spacemacs, you'll find that you get pretty sophisticated support for things like completion right out of the box, you get Jedi support and things like that. So I try not to recommend Emacs to people new to Python, because that's a whole other level of complexity, but Emacs as a way of life is an interesting place to be.

55:39 Michael: So, any final calls to action for our listeners? You've got the mic.

55:44 Austin: Any more calls to action...

55:45 Michael: Are you looking for contributors to your projects?

55:50 Austin: Certainly Cosmic Ray could use some people who are willing to put in some work. We have, of course, the GitHub issues page, where I keep track not just of defects but also of the higher level issues that need to be addressed. I mentioned earlier that we have this pressing need for being able to embed exceptions and processing instructions in your code, so that Cosmic Ray knows not to do certain kinds of mutations; that's a big project that somebody might be able to take on. The two other big topics I can think of are, first, support for different kinds of modules: right now Cosmic Ray can only work against modules that are written in pure Python code, so .py files, but of course there are plenty of other, more exotic kinds of modules out there. Cosmic Ray needs to either gracefully skip over those other kinds or learn how to process them, and there is no support for that right now, which is a big limiting factor. The other is, and this is more of a researchy thing, the integration with coverage testing that I talked about earlier: being able to take output from, say, coverage.py and use that to narrow down the scope of Cosmic Ray mutation testing runs and make it a more practical tool. But really, go to the issues page on GitHub and look, and you'll see the nature of the things that are going on. That would be my call to action, I guess, for Cosmic Ray.

57:06 Michael: All right, fantastic. I'll put a link to the GitHub repo in the show notes. So Austin, it's been really fun to talk about this idea of mutation testing; I think it's a really interesting evolution, if you will, of all the testing tools, right? I can see a place where this algorithm gets tuned, and the various optimizations you talked about get in there, and this could be a big part of day-to-day work. It's cool.

57:33 Austin: Cool, I am glad you think that, and thanks for having me on the show to talk about it; it's something I really enjoy talking about in public, so [laugh]

57:41 Michael: Yeah, you bet. Thanks for being on the show and it was great to see you last week, take care.

57:45 Austin: It was great seeing you last week, bye Mike.

57:45 This has been another episode of Talk Python To Me.

57:45 Today's guest was Austin Bingham. This episode has been sponsored by Hired and Snap CI. Thank you guys for supporting the show!

57:45 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $2,000 USD.

57:45 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at snap.ci/talkpython

57:45 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

57:45 You can find the links from this episode at talkpython.fm/episodes/show/63

57:45 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.

57:45 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.

57:45 This is your host, Michael Kennedy. Thanks for listening!

57:45 Smixx, take us out of here.
