#67: Property-based Testing with Hypothesis Transcript
00:00 Let's talk about your unit testing strategy. How do you select the tests you write or do you even write tests? Typically, when you write a test you have to think of what you are testing and the exact set of inputs and outcomes you're looking for. And there are strategies for this. Try to hit the boundary conditions, the most common use-cases, seek out error handling and so on.
00:00 We all do this to varying degrees of success. But what if we didn't have to do this? What if there was some kind of way to express the relationship between inputs and outputs, so that your tests could explore the problem space themselves?
00:00 Well, there is a way and it's called property-based testing. This week you'll learn about Hypothesis, the most popular property-based testing system for Python, created by David MacIver.
00:00 This is Talk Python To Me, episode 67, recorded Monday July 11th 2016.
00:00 [music intro]
00:00 Welcome to Talk Python To Me, a weekly podcast on Python- the language, the libraries, the ecosystem and the personalities.
00:00 This is your host, Michael Kennedy, follow me on Twitter where I am at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.
00:00 This episode is brought to you by Hired and SnapCI. Thank them for supporting the show on Twitter via @hired_hq and @snap_ci.
01:42 Michael: David, welcome to the show.
01:42 David: Thanks Michael, it's a pleasure to be here.
01:44 Michael: I am really excited to talk about this new angle on testing that I honestly just learned about a few weeks ago; I am really thrilled to talk about property based testing, where the computer can sort of do more of the heavy lifting and more of the automation for us, right?
02:00 David: Yep, that's the idea. Sometimes when I am feeling particularly salty I describe it as real automated testing.
02:07 Michael: Yes, yeah, we'll get into why that is and what that is; but before we do, let's talk about your story, how did you get started in programming?
02:14 David: So I actually got started in programming for very selfish reasons, simply for the money. I got out of university with a math degree and I didn't really want to stay in academia, I was completely lost in anything fewer than 10 dimensions, so I had to figure out what to do that would actually work for me, and the options that came up were banking or software, and software seemed the lesser of two evils.
02:40 Michael: I would say you made a good choice.
02:41 David: And I found a nice little company that was happy to take a chance on me, and let me learn to program on the job, and it turned out I quite like this programming thing. And ironically, having gotten into it for the money, I've spent a lot of the last year and a half doing work entirely for free, but hey, it's been fun.
02:58 Michael: You know, that's a really interesting sort of view of things, and I kind of got into programming not for the money exactly, but for the opportunities. I also was working on my PhD in math and I looked at the jobs and at how difficult and limited they were. Like you said, you could basically go work at a bank or an insurance company, or you can be in academia, and it's either super boring, like optimizing a little bit of some kind of insurance policy, or it is extremely competitive. I decided to go into computers as well, sort of after the fact, and I had a similar experience. I started working for a company that was willing to take a chance on me; I basically said look, I don't know much programming, but this one thing that you need is the one thing I can actually do really well, and if you are willing to just let me do this one thing and give me some support, I can do a great job for you. And they did, and there's been no looking back, it's great. So yeah, very cool that you got to start that way. Where did Python come along?
04:05 David: Python originally came along just for another job. I've always been sort of a programming languages polyglot, and I tend to just pick up languages either when they seem interesting or when I need them for a particular problem or particular job; in Python's case it was a job. I had been working at a company doing Ruby for a couple of years, but when I was looking to move on, the job that seemed most interesting was one using Python. Python seemed like a perfectly nice language, I was very happy to learn it, and so I did; I learned it in advance for the job rather than starting from day one with zero Python knowledge, and Hypothesis was actually my learning-Python project. So the version of Hypothesis you see today isn't my learning-Python project, it's been essentially completely rewritten since then; I wrote basically a prototype back in 2013 when I was first learning Python, and then at the beginning of 2015, when I had left another job and had some time on my hands, I got frustrated that no one had improved on this silly little prototype I wrote, so I took sort of the core idea and turned it into a real thing.
05:16 Michael: That's cool, yeah. And looking on GitHub it's got over a thousand stars, it's definitely growing. How long has it been actively used? You said there was the learning project and then you revamped it - when was that revamping, and when did it kind of take off?
05:31 David: So, people were actively using even the prototype which was sort of what prompted me to do this because-
05:39 Michael: Oh gosh, people depend on this thing-
05:39 David: Yeah, exactly, it was terrible at that point, but it managed to be less terrible than all the alternatives. I don't think it had a lot of users at that point, maybe five or something like that. I know someone at Twitter told me a couple of months ago that they were just updating from Hypothesis 1.1.4, which I think wasn't the 2013 version, I think it was the first patch release I put out at the beginning of 2015. But there were definitely some people using the ridiculous early prototype. In terms of serious usage, basically the first three months of 2015 were what it took me to go from the 0.1.x series to the 1.0 series, and by that point there were definitely a couple of significant users. I think Rob Smallshire was one of the early users of the 1.0 series, and he was on the show recently, which is why I mention him. And there were another handful.
06:45 Michael: Yeah the mutation testing show.
06:48 David: Yeah, exactly. And there were another handful. Then 1.3, which I think happened April 2015, is probably the first one that is really recognizable as sort of modern Hypothesis, and I think that's when the traction started, when it really started building momentum, because early Hypothesis, even the 1.0 series, was still a bit weird, and 1.3 was the point at which all of the visible weirdness went away. I think it just became a really good, useful system and people stopped having to scratch their heads whenever they got started.
07:23 Michael: Yeah.
07:25 David: I mean people still scratch their heads a bit but mostly because they are new to property testing rather than because Hypothesis is weird.
07:30 Michael: Yeah, it's really interesting idea and I definitely want to talk about it. You know, maybe we should take just a little bit of a step back and talk about testing and property based testing in general, then we can dig into Hypothesis itself. So I have a lot of listeners that have a diverse background, so there's many programmers but also scientists and data scientists and so on, so maybe not everyone fully gets like the benefits of testing, so could you maybe just touch on like three reasons why we care about testing then we'll dig into what is property testing and so on.
08:04 David: Sure. So, I can give you one reason I care about testing, which is that mainly it's about confidence. I was having a good discussion with some people recently about the fact that I do something slightly weird in releasing Hypothesis, which is that when someone finds a bug I go: here is the bug fix, I've done a patch release, ok, a new version is out, and that's fine, you can use it and install it. And the reason I can do that is because I've got an incredibly comprehensive build and automated tests, so if the build passes and all the tests pass, then I can be basically sure that this code is, if not perfect, at least working as well as I have known previous versions to work, and I can rely on it.
08:49 And, for me that's sort of the big thing about testing, is that it gives me a reproducible idea of how well my software is working, and I am free to make changes to it, I am free to push out new releases, because I know these releases aren't broken. These days, trying to do any sort of software without that level of confidence feels really upsetting and alien to me. I mean, I absolutely have some small hack projects which are terribly tested, but that's because they have a single user who is me and I don't really care if they break, because I'll just fix them when they break; but for anything where I want other people to be able to depend on it, tests are amazing for that.
09:33 Michael: Yeah, I mean if you do an open source project, having tests supporting it makes it much more likely that people will have confidence in using your library, right. If you have an actual product or website that you are shipping, it would be really nice to be able to say, you know, this morning I came in and there was this cool feature somebody wanted, it took me about an hour, I added it, it has the tests to support it, let's push it live, right - rather than we are going to queue up all of our work and hand it off to QA, and in two months we'll get together over the weekend and take the site down and upgrade it. Those are super different philosophies, and the first one deeply depends on automated testing and automated builds, I think.
10:14 David: Ideally you wouldn't do the first one either, because the problem with adding features that quickly is that you need to then support the features indefinitely. So I tend to do very rapid releases for bug fixes but I am much more cautious about adding new features.
10:36 Michael: Yeah, I guess there are two parts to it, right. The part that I was expressing is the ability to make changes to code with confidence quickly, and then there is the part that product design and evolution needs to not be done in haste, right, like, it needs to be carefully thought through, yeah. All right, cool, so - we know what unit testing is and automated testing, but what is this property based testing? What's the idea there?
11:03 David: So, one thing where I slightly differ in my framing of this from other people is that I don't regard property based testing as distinct from unit testing. I think property based testing is an extension to your normal testing tools: a property based test could be a unit test, it could be an integration test, or it could be a functional test, however you want to draw these boundaries. Property based tests versus example based tests is sort of orthogonal to the usual splits that people do for testing.
11:36 Michael: Right, it's like a layer that you can lay upon any of your testing layers, right, so if you are doing integration tests you can add property based testing to that, to get better coverage. And so I guess we need to have some terminology here that people may or may not be familiar with, like the traditional unit tests with the three As, you know you write out your test and you say here is the arrange, I am going to setup my data, I am going to act, I am going to pass in this number and expect this individual specific value back, and then I am going to assert that that's true. That is referred to as example based testing, right?
12:17 David: Yes. That's certainly the term I use and it seems to be reasonably standard, like a number of people have independently invented it so I think that was probably the best terminology to use.
12:26 Michael: Yeah, when I first learned about property based testing, I didn't know how to refer to it distinctly from the regular, sort of traditional unit testing - what we are calling example based testing - I didn't know what the terminology for that was, so it was hard to compare. But once I realized that was how it was done before, like here is one specific example for the test, this input gives this value, whereas property based testing tries to cover a range or a set of constraints, almost like rules, right?
12:57 David: Yeah, exactly, so it looks very like what you were describing, the example based test, but you just change the steps slightly. So for the first step, where you would set up the data yourself, instead you describe to the testing library - Hypothesis in this case - how to set up the data that you want. So instead of saying I want this string, you say get me a string here; instead of saying I want this user, you say get me a user object here, that sort of thing. And then at the end, instead of being able to say you should get this exact value, you can say you should get this exact value if the exact value is one of the things you passed in, or something you can compute from it.
13:44 Michael: Right, if I get a user that's not registered, a non-paying user, I should be able to ask the question, is this a paying user - no. Something like that.
13:52 David: Yes exactly.
13:52 Michael: Yeah, ok.
13:53 David: And so your final step sometimes becomes a bit more specific. Often good example based tests are already quite general in terms of how they do their final asserts, since you are not repeating things and are reusing values from previous steps, so sometimes you can just take an example based test and more or less just replace the first bit and then you've got the property based test. But sometimes there are other things you end up wanting to test where you could do an example based test, but it's often not worth doing if you've only got a very small range of examples to assert on.
14:27 Michael: Right, a single example really would cover it. So I think it might be worth discussing - it's hard to talk about code, but just to give people an idea of what this is like. So we install the Hypothesis package, and then suppose I have some kind of test method and it's just calling a function, and the function is supposed to return an even number, so I am doing something like deriving from TestCase and asserting that if I pass it 5, it returns me the number 4, like the closest even number or something. How would I change that code to become something property based with Hypothesis?
15:08 David: So you would start by adding a given decorator, which is the main entry point to Hypothesis, and that describes essentially how to call your test function with a bunch of arguments that are your test data. So you would say given integers, for example; integers is a standard strategy that Hypothesis provides for generating integers. Now you've got your test function which has an extra argument, and it's being passed integers via that argument; you would then call the method that you are testing, and look at the result.
15:59 Michael: Right, and would you like add a parameter or something to that method or something like this?
16:03 David: Yes, exactly, you would add a parameter for the integer you want passed in, and you would tell given to pass the parameter either by name or positionally, as you prefer.
16:16 Michael: Right, ok.
16:07 David: So then you call your function with the integer you got passed in and you look at the value. The first thing you would do is assert that the value is always even, so value % 2 == 0, and then if you wanted it to always be the closest even number, then what you could do is assert that the absolute value of the difference between your argument and your return value is at most 1.
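To make that concrete, here is a minimal sketch of the test being described, assuming a made-up function closest_even as the code under test; the given decorator and the integers strategy are standard Hypothesis, everything else is illustrative:

```python
# A minimal sketch of the test described above. closest_even is a made-up
# stand-in for whatever function you are actually testing.
from hypothesis import given
from hypothesis import strategies as st

def closest_even(n):
    """Toy implementation: return n itself if even, otherwise the even number below it."""
    return n if n % 2 == 0 else n - 1

@given(st.integers())
def test_closest_even(n):
    result = closest_even(n)
    assert result % 2 == 0        # the result is always even
    assert abs(result - n) <= 1   # and never more than 1 away from the input
```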
16:39 Michael: Ok. And then I don't change anything else, right? Like, that's one of the beautiful things about Hypothesis - it's not that instead of using nose you now have to use some special Hypothesis test runner or whatever, right, you just run the test unchanged basically, yeah?
16:54 David: Absolutely. It works with essentially any of the testing frameworks without you having to use its own test runner. It's a little annoying in unittest, because when it fails it will print out the falsifying examples sort of in the middle of your test runs, because unittest doesn't have a good way of hooking into test output.
17:12 Michael: Ok, which one would you say is the best to connect it to? pytest?
17:16 David: pytest is probably the best. pytest has a plugin for it, but that does almost nothing, so it works about the same with pytest and with nose. And unittest isn't the least supported; unittest works fine other than the printing issue. The least supported is probably the Twisted test runner, because the asynchronous support in Hypothesis isn't quite there.
17:45 Michael: Right, that's the most different.
17:46 David: You can do it ok with some hooks to basically turn the asynchronous test into a synchronous test, so if you want to test asynchronous code from pytest or nose that will mostly work. But - I'm temporarily blanking on what the Twisted test runner is called - its assumptions about how tests work are slightly different from Hypothesis' assumptions about how tests work.
18:09 Michael: Right, ok, sure. You know, I think I've got a pretty good handle on how it works, but a lot of people are hearing this for the first time: when you run the test, basically the decorator wraps your test function, it looks at the description and the parameters, and then it comes up with ways in which it tries to break your test, and it executes it with many different values, right?
18:31 David: Yes, that's exactly right. It doesn't so much come up with ways to break your test, it's just trying a large range of values, many of which are quite nasty; so it doesn't currently look inside your test and pick out what might break things, but it has pretty good heuristics for figuring out values that are likely to break things.
18:49 Michael: Right, and you know, I think one of the really important and maybe - it's hard to make generalizations for people, but - under-tested parts of code is the error handling and the off-by-one errors, right on the boundary of good and bad. It seems like Hypothesis is really good at looking for that, and unusual inputs and values in general, right?
19:13 David: Yeah, definitely, there are a lot of things that Hypothesis will try almost every time which people sort of forget about. I don't know if this is actually true, but I often joke that about a third of the bugs you'll find when running Hypothesis on your code is that you forgot something could be empty. The third is a made-up number, but I definitely see this pattern a lot. It's also good at finding unicode bugs, because people often forget about particularly weird unicode edge cases, although I recently found a bunch of bugs where it needs to be better at finding them.
19:44 Michael: Right, ok. Yeah so for example if I have like some kind of scientific thing, that expects numbers, you know, maybe it assumes that they are all positive and Hypothesis would of course pass negative numbers or zero, but it also might pass like not a number, infinity, all sorts of funky inputs that people typically don't test for in example based testing or if they even have these at all, right?
20:08 David: One problem I found is that a lot of people who are doing floating point math don't actually care enough about getting the right answer to make using Hypothesis really valuable. Like, if you do care about it and you are prepared to do a numerical stability analysis on your code, then Hypothesis is great, but for the most part I think people who are using floating point kind of want to pretend that they are not, and the edge cases that Hypothesis would tell them about are somewhat unwelcome.
20:41 Michael: Right, so what if I have some kind of scientific thing and I know it only expects, say, positive integers, but not infinity, something like this - is there a way in the given decorator to say I want to give it integers, but only in this range or something like that?
20:59 David: Yes, so the strategy functions all come with a huge number of parameters. The integers function accepts both min_value and max_value parameters and you can just specify those as you want, so you could say min_value=1 if you only want positive integers. Similarly the floats strategy has both min and max value, and you can also tell it don't give me infinity or don't give me NaN, so you could do something like floats(min_value=0, allow_infinity=False, allow_nan=False).
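Here is a small sketch of what that looks like in practice; min_value, allow_infinity and allow_nan are standard arguments on the Hypothesis strategies, while the test bodies are just placeholders:

```python
# A sketch of constraining strategies as described above; the assertions are
# placeholders for whatever property you actually care about.
from hypothesis import given
from hypothesis import strategies as st

@given(st.integers(min_value=1))
def test_only_positive_integers(n):
    assert n >= 1

@given(st.floats(min_value=0, allow_infinity=False, allow_nan=False))
def test_only_finite_nonnegative_floats(x):
    assert x >= 0
```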
21:36 Michael: Yeah, right, ok, that's really cool.
21:36 [music]
21:36 This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to 3500 companies who will compete to work with you.
21:36 Take it from one of Hired's users who recently got a job and said, "I had my first offer on Thursday, after going live on Monday, and I ended up getting eight offers in total. I've worked with recruiters in the past but they've always been pretty hit and miss, I've tried LinkedIn but I found Hired to be the best. I really liked knowing the salary upfront, privacy was also a huge seller for me."
21:36 Sounds awesome, doesn't it? Wait until you hear about the signing bonus- everyone who accepts a job from Hired gets a $1000 signing bonus, and as Talk Python listeners it gets way sweeter. Use the link hired.com/talkpythontome and hired will double the signing bonus to $2000. Opportunity is knocking, visit hired.com/talkpythontome and answer the door.
21:36 [music]
22:45 Michael: And another thing that I thought was nice - I don't know if this works for every strategy, you'll have to tell me - is if I have a set of whatever, let's just stick with numbers for a moment, integers, and I want to somehow control them in a way that is not just min and max, like I want only Fibonacci numbers or prime numbers or something like this. You can do a .filter and then add a lambda expression that will further say, you are going to give me a bunch of stuff but actually these are the only ones I'll allow you to give me, something like this, right?
23:17 David: So you can do that with every strategy, but for a lot of the examples you just gave, you shouldn't. Because the problem with filter is that it's not magic; essentially the way it works is by generating values and then throwing them away and trying again if they don't pass the lambda expression. So if you've got something like the Fibonacci numbers and you were trying to filter by a test of is-this-a-Fibonacci-number, then basically the only numbers you'd ever really find are the really small ones, because higher up they're too sparse. What you can also do is, instead of filtering, you can map; so you could instead generate a positive integer, and then map that through a give-me-the-nth-Fibonacci-number function, so you would start by generating, say, 10, it gets mapped through, and you would get the 10th Fibonacci number.
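As a rough sketch of that filter-versus-map point, something like the following; the fib helper is made up for illustration, while .filter() and .map() are the Hypothesis strategy methods being discussed:

```python
# Filter versus map for generating Fibonacci numbers, as discussed above.
from hypothesis import strategies as st

def fib(n):
    """Return the nth Fibonacci number (0, 1, 1, 2, 3, ...)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

fib_values = {fib(i) for i in range(90)}

# Wasteful: most generated integers fail the filter, especially large ones.
filtered_fibs = st.integers(min_value=0).filter(lambda n: n in fib_values)

# Better: generate a small index and map it to the nth Fibonacci number.
mapped_fibs = st.integers(min_value=0, max_value=80).map(fib)
```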
24:17 Michael: Yeah, some huge number there. Yeah, can you take like a bounded set of like here is 214 things that I think are valid inputs and please start from here, pick one of these?
24:26 David: Yeah, absolutely, the strategy is called sampled_from in Hypothesis; you just give it an arbitrary collection and it will give you values from that collection.
24:33 Michael: Can you compose or combinatorially combine these? Let's say I have 3 sets, one's got 5 elements, one's got 2 and another's got 3. Can I say I want a sample from this and this and this, and who knows how many combinations it would have to work out, but it would figure that out and sort of pick some from here and there and test them together?
24:55 David: So if you want one from each, then you can use the tuples strategy, where you basically just pass in a sequence of n strategies and it will give you a tuple of n elements, with the i-th element drawn from the i-th strategy.
25:08 Michael: Ok.
25:09 David: So in that case you could use something like tuples(sampled_from(...), sampled_from(...), sampled_from(...)).
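For instance, a sketch along those lines might look like this; the three collections are made-up examples, while tuples and sampled_from are the Hypothesis strategies being described:

```python
# Combining sampled_from with tuples, as described above.
from hypothesis import given
from hypothesis import strategies as st

sizes = ["S", "M", "L", "XL", "XXL"]   # 5 options
sides = ["left", "right"]              # 2 options
colors = ["red", "green", "blue"]      # 3 options

@given(st.tuples(st.sampled_from(sizes),
                 st.sampled_from(sides),
                 st.sampled_from(colors)))
def test_every_combination_is_valid(combo):
    size, side, color = combo
    assert size in sizes and side in sides and color in colors
```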
25:13 Michael: Right, ok, cool, and then what are some of the other strategies like we are talking about numbers a lot, but there is more to it than that, right?
25:21 David: Yes, so there are strategies for most of the built-in types. You've got unicode, there is datetime, although the datetime one is in an extra package because it depends on pytz, because no one actually works with datetimes without pytz; you've got all the collection types, so you've got the tuples one I mentioned, but you can also generate dicts of things, you can generate sets and frozen sets, and you've got unicode and byte string generators of course.
25:48 Michael: Permutations, right?
25:49 David: Yes, yeah, you can generate permutations of a given list. I don't think many people use that one, but I've occasionally found it super useful.
25:56 Michael: Right, when you need it I am sure you are like this is exactly what I need for things, you know, but not so often, right?
26:01 David: Yeah, and then, those are most of the primitive strategies, I think there are a few more, but essentially if it's in the standard library you can probably generate it. And then on top of that you've got various strategy composition functions, which is things like the map and filter I mentioned; you've also got the composite decorator, which lets you sort of chain together a bunch of data generators and generate whatever data you like. Then you've got builds, which is a bit of a simpler version of that, and there is also special Django support, again in an extra package, so that it can generate more or less arbitrary Django models without you having to specify anything in particular. You can say models of my model class and it will do the rest; you can customize how it does the rest, but by default it should work reasonably well out of the box.
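A small sketch of the composite decorator mentioned there, chaining simpler strategies into a custom one; the User shape is made up for illustration, while st.composite, draw, text and integers are standard Hypothesis:

```python
# Building a custom strategy with the composite decorator, as described above.
from dataclasses import dataclass
from hypothesis import given
from hypothesis import strategies as st

@dataclass
class User:
    name: str
    age: int

@st.composite
def users(draw):
    name = draw(st.text(min_size=1))
    age = draw(st.integers(min_value=0, max_value=120))
    return User(name=name, age=age)

@given(users())
def test_generated_users_look_sane(user):
    assert user.name              # non-empty name
    assert 0 <= user.age <= 120
```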
26:56 Michael: That's cool. So if I had a users table, and I wanted to sort of replace that and not actually talk to the database, but I want to test the code that works with those, I could have some kind of thing that would randomly generate users that match my data model, yeah?
27:12 David: It actually does talk to the database; it works with the normal Django test framework, and the model objects it generates are persisted in the database.
27:18 Michael: Ok, interesting.
27:18 David: Yeah, so this is real Django models rather than any sort of mocking.
27:24 Michael: I see. Ok, very cool. One thing that I thought was neat is, say you give this strategy - so if you are given something from the set of integers and something from the set of floating points where you do a filter, like some property on your floating points - it's going to randomly pick a bunch of combinations and try them, and that's cool. But maybe you are coming from a particular example based testing scenario where this combination of numbers for whatever reason is super important to test, and it has to be this. So in addition to all the randomness that you are doing to sort of explore the problem space, you can add an example decorator, right, and say, also include this particular case that I was testing before - say if you are upgrading to property based testing.
28:15 David: Yes, absolutely. And the example decorator is one that, as far as I know, is unique to Hypothesis, but was in retrospect obviously a good idea. The original use case was more for people who wanted to include the examples that Hypothesis had found for them in their source code, but it's found a whole bunch of other applications. One is the one you mentioned, making it much more comfortable to transition from example based testing; another thing that I've also been doing, and I've seen a couple of other people doing, is using example to ensure that the test always gets the maximum coverage that the test can reach. So if there is some line that your test covers 80% of the time, then you just add an example that touches that line, and that means your coverage isn't varying from test run to test run as a result of Hypothesis, if you do that.
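As a rough sketch of that, an example decorator pins specific inputs alongside the generated ones; to_cents here is a made-up function, and the particular pinned values are illustrative:

```python
# Pinning specific cases with the example decorator, as described above.
from hypothesis import example, given
from hypothesis import strategies as st

def to_cents(dollars, cents):
    return dollars * 100 + cents

@given(st.integers(min_value=0), st.integers(min_value=0, max_value=99))
@example(19, 99)   # the exact case an old example-based test used
@example(0, 0)     # a boundary case we always want exercised
def test_to_cents_round_trips(dollars, cents):
    total = to_cents(dollars, cents)
    assert total // 100 == dollars
    assert total % 100 == cents
```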
29:14 Michael: Interesting. Yeah, that is really cool. Can you use it as a negation operator, like do all this stuff but don't send it this example, or is there some way to say that? I am not sure that would make any sense, but-
29:25 David: You can't do that with example, but that's what the assume function is for; it basically gives you a way of filtering out some classes of examples that you don't really care about.
29:39 Michael: I see, is that like a precondition sort of concept? So I could say, if I was automatically generating a user, assume that the user was created today, or assume that their age is over 18, or something like that.
29:53 David: Yeah, exactly. It's essentially a precondition that you can put anywhere in the test, so sometimes you would put it in the middle of the test, when some previous operation produces a result that you don't care about. So you could say you've got a list of users and you did a bunch of calculations, and you could then assume that, I don't know, at least one of the users passed through the code you just ran and you haven't sort of unregistered all the users or something.
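A minimal sketch of assume as a precondition might look like the following; the User model and the drinking-age rule are invented for illustration, while assume, given and builds are Hypothesis:

```python
# Using assume() as a precondition inside a test, as described above.
from dataclasses import dataclass
from hypothesis import assume, given
from hypothesis import strategies as st

@dataclass
class User:
    name: str
    age: int

users = st.builds(User,
                  name=st.text(min_size=1),
                  age=st.integers(min_value=0, max_value=120))

def can_purchase_alcohol(user):
    """Toy domain rule used only for this sketch."""
    return user.age >= 21

@given(users)
def test_purchase_rule_for_adults(user):
    assume(user.age >= 18)   # only consider adult users; others are discarded
    assert can_purchase_alcohol(user) == (user.age >= 21)
```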
30:26 Michael: Yeah, sure. Ok, I think that's really useful and helpful. Does Hypothesis ever look at the code or the code coverage or anything like this, or does it just work black box style - I am going to put stuff in and see if it crashes?
30:40 David: So it currently just works black box style. I've got most of the pieces for making it work in a less black box style; in fact I've got a separate library I wrote called glassbox, which is designed for adding sort of glass box elements to your testing, using some coverage information. A lot of Hypothesis' internals look quite a lot like the internals of a fuzzer called American Fuzzy Lop, which has this really nice coverage based metric, but the reason the coverage stuff has never made it into Hypothesis is that coverage metrics work really well when you've got some sort of long running fuzzing process. So if you are prepared to run your tests for a day, then coverage metrics are amazing and will do a really good job of exploring your code and seeing what is going on, but if you are sort of upset if your tests take as long as ten seconds to run, then coverage starts to become less useful, because it doesn't really have the time to actually figure things out. I think I've mostly figured out a way of making use of coverage in a way that works ok within time constraints, but it hasn't made it into anything in production yet.
32:03 Michael: Ok. That sounds like a cool direction.
32:05 David: However, I've been saying that for about six months, so I wouldn't hold your breath waiting for this features.
32:11 Michael: Yeah, of course. So, it seems like it would be interesting for Hypothesis to remember scenarios that came up that failed; like, is there a way to save those examples, so if you ever discover one, you know, save it to something that it will know to replay, or-
32:30 David: It actually does that out of the box.
32:31 Michael: Ok.
32:32 David: Hypothesis has a database of examples - I say database, it's basically an exploded directory format; it is designed so you can check it into git if you want. Whenever a test fails, what Hypothesis does is, well, first of all it shrinks the example, so it turns the example into a simpler one, and while doing that, every example it sees fail is saved in this database, so that when you rerun the test it replays these examples, starting from the simplest and working its way upwards. So basically one of the things that I made really sure of in Hypothesis is that all this randomness is the good kind of randomness, because it finds bugs but it never forgets bugs, and if the test fails, then when you rerun it the test will continue failing until you fix the bug.
33:25 Michael: Yeah, because if it didn't remember - I guess, suppose the number 13 for some reason broke your code and you just said the strategy was integers, how would it know to try 13 again if it's really ranging out over that many numbers? So ok, because it did seem to keep failing consistently, and I kind of clued in that that was a little odd; I was like, oh it's really smart, but I didn't put together how it's smart, of course.
33:52 David: For a lot of tests the failure is common enough that even without the database Hypothesis would be able to do this and would be able to find the failure again each time. One of the annoying things that happens there is that sometimes, if it didn't have a database, it would be finding different bugs each time, because you don't always shrink to sort of the globally minimal failure, and sometimes there are two bugs, and if you are starting from one you'll shrink one way and starting from the other you'll shrink the other way; so in many ways that's almost more annoying than failing unreliably, because you don't know whether you've just changed the bug or whether you introduced a new one or what.
33:52 [music]
33:52 Gone are the days of tweaking your server, merging your code and just hoping it works in your production environment. With SnapCI's cloud based, hosted continuous delivery tool you simply do a git push and they autodetect and run all the necessary tests through their multistage pipelines. If something fails, you can even debug it directly in the browser.
33:52 With the one click deployment that you can do from your desk or from 30 000 feet in the air, Snap offers flexibility and ease of mind. Imagine all the time you'll save. Thank SnapCI for sponsoring this episode by trying them for free at snap.ci/talkpython.
33:52 [music]
35:30 Michael: One of the things I think is interesting is that the scenarios we've been talking about so far have been: I give you two numbers, something happens on the other side, it gives you the right answer or the wrong answer, and the test detects that. But one of the things that seems very powerful is this sort of stateful testing with a set of steps. Let's suppose I have a board game, and I've got to move things around the board in a certain way, and some properties are supposed to always hold - like the number of chips on the board is always the same, even if a chip comes off one player and has to go back on for the other player, right? Who knows, I have no idea what game this is, but you could write a test to say do all these operations and keep asserting that this is true, and if it fails it gives you a really nice reproducible output, right, a set of steps, almost, right?
36:25 David: The output format is a little idiosyncratic, so unfortunately it's not something you can just copy and paste into a test currently. But yeah, the stateful testing is really cool; one of the reasons I don't push it quite as hard as I could is that I don't feel like we've got good workflows for this right now. Because the stateful testing is exploring an incredibly large space, you do want something that's more like the fuzzing workflow I talked about, where you set it running for 24 hours or whatever. You can use it in your CI, and I am using it in my CI in a few places, and at least one other project I know of is using it, but I wrote those tests, so that doesn't really count. But I think it's not quite there yet - it's certainly not as usable as the rest of the library, and I think it needs some more work before it gets there, but I am really excited about it and I do want to spend more time developing that.
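To give a flavor of the stateful testing being discussed, here is a minimal sketch loosely based on the board-game idea above; RuleBasedStateMachine, rule and invariant come from hypothesis.stateful, while the two-player chip game itself is entirely made up:

```python
# A tiny stateful test: however chips move between players, the total is conserved.
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

class ChipGame(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.chips = {"player_a": 10, "player_b": 10}

    @rule(n=st.integers(min_value=1, max_value=5))
    def move_to_a(self, n):
        moved = min(n, self.chips["player_b"])
        self.chips["player_b"] -= moved
        self.chips["player_a"] += moved

    @rule(n=st.integers(min_value=1, max_value=5))
    def move_to_b(self, n):
        moved = min(n, self.chips["player_a"])
        self.chips["player_a"] -= moved
        self.chips["player_b"] += moved

    @invariant()
    def total_is_conserved(self):
        assert sum(self.chips.values()) == 20

# Hypothesis turns the state machine into a unittest-style test case.
TestChipGame = ChipGame.TestCase
```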
37:27 Michael: Yeah, the possibilities of it are really amazing, it's just like you said, it's such a problem space that how are you going to find it, right?
37:36 David: In many ways, my crack about Hypothesis being true automated testing is really only true for the stateful testing, because one of the things I emphasize sometimes is that Hypothesis doesn't write your tests for you; you are still writing your tests, it's just that Hypothesis is doing some of the heavy lifting in terms of the boring work of coming up with examples. But with the stateful testing that's almost no longer true; at that point Hypothesis is almost writing tests for you.
38:05 Michael: Right, yeah. That is actually really cool. But I think even so, that level of automation that you talked about that is already really good at, is super helpful because you coming up with those examples is hard, and like I talked about earlier, I think coming up with the examples that are just inside the working realm and just on the edge of the error conditions or these weird inputs, you know, those are hard to come up with and you just, I think there is just fatigue as well, like ok, I've tested three cases like that's probably good enough, let's just move on to building new features, right, whereas property based testing will sort of explore that space for you automatically, right?
38:41 David: Yeah, yeah, absolutely. The fact that you are still writing the test isn't intended to take away from what Hypothesis is doing. The coming up with examples part is both the most time consuming part and the most boring part. And also, it's the bit that people are really bad at, so having software which can take this boring, difficult task and just do it for you is amazing, and I get really frustrated when I have to write tests where I don't have this capability available.
39:11 Michael: Yeah, I'm sure you do. And the fact that the problems that it finds get permanently remembered, that's really cool. So do you recommend people check their .hypothesis folder into git?
39:23 David: It's designed so that you can do that, but I mostly don't do it myself. I generally think that you are probably better off writing out example decorators for any example you specifically want to be remembered. The main problem with the example database is that its format is quite opaque: you can't really look at the files in there and say ah, I know what example this corresponds to. So, even though from a computer's point of view it will remember what you need and it will do what you want, from a human's point of view, you probably want to be more explicit than that. One of the things I'd like to work on at some point, but haven't found the time or, bluntly, the customers for doing this work, is better sort of centralized management of test examples, so that you can have your cake and eat it too: rather than checking it into git you can have a nice management interface where you can see what Hypothesis has run and get both the persistence and the visibility. But that's a some-point-in-the-future project, it's not anything short term right now.
40:35 Michael: Yeah, well, even something automated that would take the Hypothesis database and inject the examples as example decorators into your code would be cool.
40:44 David: Yeah, but the major problem is that there is no real way of taking a Python object and going from that to a literal that you can just copy and paste into your code and have it produce that object. You can do it for really simple things, so for a lot of the built-in types the repr will do that.
41:05 Michael: Yeah, if the repr output is basically parsable it's probably ok, but that's often not the case, especially for custom types.
41:12 David: Yeah, I would say it's almost never the case for custom types. One of my idiosyncrasies as a programmer is that I do try to make sure that all my types have good reprs that will evaluate to the thing you started with. But this is very rarely the case in the wild. One of the cute little details in Hypothesis, that I spent far too much time on for what it's worth, is that almost any of the strategies you get out of the standard Hypothesis strategies will give you a very nicely formatted repr that will exactly reproduce the strategy, and this is true even up to the point that if you filter by a lambda, it will give you the source code of the lambda in the repr for the strategy. It's all a bit ridiculous and I really don't recommend emulating it, but every time I see it, it makes me smile.
42:01 Michael: Yeah, I'm sure. That's cool. So one thing that I seemed to hear a lot about when I was looking into property based testing is that there seems to be a set of patterns that people come across, that seem to repeat themselves, that property based testing serves well, yeah? Can you talk about some of those?
42:21 David: You mean like sort of standard styles of tests that...?
42:26 Michael: Well, I am thinking like, one of the things people often say is a really good type of thing to turn this type of system onto is serialization and deserialization. Or upgrading from legacy code to a rewrite: you could pull in the old library and always ensure that the same inputs get the same outputs in the old system and the new system. Or if I have a really simple algorithm I am optimizing, using the simple slow version to verify the fast version, things like this.
42:57 David: Yeah, absolutely. So the absolute best thing to test with property based testing in general is "these two things should always give the same answer". Because it covers such a wide range of behaviors and gives you so many opportunities to get things wrong, and particularly the optimized and naive versions of an algorithm are great, because often they are very different styles of algorithm, so what you are essentially testing for is: have I made the same mistake in both of these things? And usually you'll make different mistakes, and so a test failure is either a bug in your optimized one or your naive one-
43:39 Michael: Right, so it's almost like double check accounting, it doesn't necessarily mean your new one is wrong, but some things need to be looked at and who knows, yeah?
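A minimal sketch of that two-implementations pattern, using made-up naive and optimized sum-of-squares functions as the pair under test:

```python
# Testing that an optimized implementation agrees with a naive one.
from hypothesis import given
from hypothesis import strategies as st

def sum_of_squares_naive(n):
    return sum(i * i for i in range(n + 1))

def sum_of_squares_fast(n):
    # Closed-form formula for 0^2 + 1^2 + ... + n^2.
    return n * (n + 1) * (2 * n + 1) // 6

@given(st.integers(min_value=0, max_value=10_000))
def test_fast_matches_naive(n):
    assert sum_of_squares_fast(n) == sum_of_squares_naive(n)
```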
43:46 David: Yeah. One of the things that I've been trying to do with the newish Hypothesis website, hypothesis.works, is gather a family of these different properties, because relatively few of them have been written down. There are blog posts scattered across the internet with some of them, and there are a few really good prior articles, but a lot of them either haven't been written down, or start with a load of category theory - and I am not against category theory, but I don't really use it myself and I think it tends to scare people off - so most of the time I am just starting with here is a concrete problem, let's solve it in Python, here's how you test it with Hypothesis.
44:37 Oh, there is one pattern that I've noticed recently, which is either original to me or is an independent reinvention of something no one else has written down before, but I really like it. This style of testing is, rather than "these two things should give the same answer", it's "if I change this data, then the result should move in this direction". So, you generate some data, you run a function on it, you then change the data in some way and you run it again, and the change in the output should in some way be reflective of the change in the input. I originally came up with this for optimization problems, where you run the optimizer and you make some change which should make the problem harder, and you assert that the score of the output doesn't get better, or you make the problem easier and you assert the score of the output doesn't get worse. But I also had a nice example recently with binary search, which is: if you run a binary search and then you insert an extra copy of the value at the point that it found, then this shouldn't change the output of the binary search, because it's only sort of shifting stuff to the right.
45:54 Michael: Right, exactly.
45:54 David: And sort of in general, looking for things where the function should move in predictable ways, and catching it when it ends up moving in ways that you didn't expect.
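As a sketch of that binary search property, using Python's bisect_left as the search under test (the property is the one described above; the exact formulation is illustrative):

```python
# Inserting another copy of the value at the found position should not
# change where a left-biased binary search lands.
from bisect import bisect_left
from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()), st.integers())
def test_inserting_duplicate_does_not_move_search(values, x):
    data = sorted(values)
    index = bisect_left(data, x)
    data.insert(index, x)                   # add an extra copy of the value
    assert bisect_left(data, x) == index    # the search result is unchanged
```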
46:01 Michael: Ok. Interesting. Like, I increased the tax rate in my ecommerce system and the price went down - what happened? Or the price didn't change - oh oops, we are not actually including the tax, who knows.
46:11 David: Yeah, exactly.
46:13 Michael: Ok. Very interesting. So I think property based testing and Hypothesis are really exciting, in a couple of ways: I think it means that testing is more effective, and I think that it's less work to write those tests, so that's like a perfect combination. Have you done anything, just as a proof of concept, like grab some popular open source project that has good test support, convert its tests and find new bugs or anything like that?
46:43 David: I haven't personally. One of the problems here is that it's like with any testing, it's very hard to test a project you don't understand, so I generally do not go into other people's projects and try to write tests for them, I've done it once or twice, a customer paid me to do some testing work on Mercurial and add some tests on that which was interesting.
47:08 Michael: Did you find any bugs?
47:09 David: Yes, so I found a bunch of bugs actually. None of them particularly critical, but some encoding ones came out - Mercurial has a bunch of internal encoding representations; for some reason Mercurial has 3 different JSON encoders, and we found bugs in one of them. And there is some stuff where Mercurial wants to represent things as UTF-8b, which is a way of taking arbitrary binary data and turning it into valid UTF-8 text and back. And that had a bunch of bugs. I don't think it's used very widely, but it still had a bunch of bugs.
47:46 Michael: Yeah.
47:48 David: And then we used the stateful testing for sort of validating repository operations and found two interesting bugs in the hg shelve extension - hg shelve is basically git stash for Mercurial, I am sure someone will be mad at me for saying that, but that's basically what it is. I can't remember exactly, but I think one of them was that the set of valid shelf names was narrower than the set of branch names, so you could create a branch which had a name that wasn't a valid shelf name, and then when you tried to shelve, it would default to using the branch name for the shelf name and everything would go wrong. And the other was something like: you delete a file, then you create a new file with the same name which is untracked, and then you try to shelve stuff, and the shelve extension gets confused.
48:54 Michael: Wow, that's really interesting. I am sure that lots of those types of tests were already there, but just those particular cases weren't found. There is an example of this - there was a talk at PyCon this year by Matt Bachmann, and he talked about property based testing on his project, which actually uncovered a bug in dateutil, because his project was depending on dateutil and his system sent some kind of insane datetime, like, you know, before BC or something like this, and it freaked it out, which they then fixed in dateutil, which is really cool.
49:31 David: Yeah, I've occasionally thought about making the Hypothesis dates a little more restricted by default, because by and large no one really cares about the representation of 9000 BC or 9000 AD, because they would have been using a different calendar anyway, but it does come up with these fun bugs, so I've left it in for now.
49:55 Michael: Ok, cool.
49:56 David: I think I vaguely recall the bug in question; I think that one might have actually been a slightly more reasonable date - I think it was first-century AD dates where it went wrong, or something - I vaguely recall it being one where if you try to represent the year 99, then it would assume that you meant 1999 or something like that.
50:15 Michael: Yeah, it's cool. That's why I kind of asked you about the open source projects, because I think, you know, it would be "fun" to see the result of somebody taking all the tests for, say, the top 100 most popular PyPI packages, going and looking at their test suites and converting the example based testing to property based testing, and just seeing what that spits out. Yeah.
50:41 David: In general, what I would recommend in that space is more: if you are working on a popular Python package and you want to give that a try, pop into IRC or send me an email and I would be very happy to help you out. Because I really do think that what you need for doing that is more experience with the project than it is experience with Hypothesis.
51:01 Michael: Of course, understanding the domain of what you are trying to actually test, right. Ok, cool. Well, we know quite a bit about property based testing now, but maybe tell me a little bit about yourself - what else do you do in the programming space, for work and things like this?
51:15 David: So this is more or less my day job. I haven't been working as actively on Hypothesis in the last couple of months, because I've been sort of researching related things, but basically I do independent R&D on testing tools, and I do some consulting and training around that, either helping people to use Hypothesis or helping people to improve their testing in other ways. Historically I've done more sort of back end data engineering, but once I got into Hypothesis properly and found that I really like working with this sort of thing and there is demand for it, that's mostly what I've been doing.
51:48 Michael: Yeah, I can see a huge demand from companies that have large projects that have tests, but not this kind of tests; like, you know, it might be really useful to spend two weeks with you to just give it another go with a different system, right, with Hypothesis.
52:04 David: Yeah. I wish; there is huge demand for Hypothesis, but I've currently got the thing that I think most new businesses have in their first year, where the best way to get new customers is to have existing customers, and it turns out that sales and marketing are hard. So right now I would say that I am not quite there yet; it may take a few years to get there.
52:31 Michael: No, but it's a cool project and I definitely can see it growing in the future, because it's solving a real problem and it solves it in a better way than what we are doing today.
52:40 David: Hypothesis itself is getting plenty of demand; I think the PyPI stats are broken right now, but certainly when they were last working it was getting quite a respectable number of downloads compared to projects that I had thought of as being much more popular than Hypothesis.
52:59 Michael: That's awesome. Well yeah, congratulations on that, that's cool.
53:01 David: Thank you.
53:02 Michael: All right, we are getting near the end of the show, I have two questions I always ask my guests, first of all, if you are going to write some code, what editor- not specifically Python, but in general as well, like what editor do you open up?
53:13 David: Basically, Vim, I've been experimenting with using Windows recently and I was trying PyCharm but I've ended up mostly just going back to Vim even on Windows.
53:26 Michael: All right, ok, cool. And of all the PyPI packages out there - there are over 80,000, and there are a bunch we all have exposure to that aren't necessarily mainstream - what one would you say is like, wow, this is really a gem that I've found that people don't know about? In addition to Hypothesis, so in addition to pip install hypothesis, which is cool, what else would you recommend?
53:47 David: I don't really have any niche packages I can recommend in that regard; I really like pytest, and Coverage is an exceptionally good piece of work, but both of these are quite well known and unsurprising. There is actually a package, by Matt Bachmann I think, called diff-cover that I haven't used myself, but it looks really good for setting up your CI so that, when you want say 100% coverage and it's very hard to get to 100% coverage in a single leap, what you do is you set up a ratchet and just say: I don't care what the coverage currently is, but you can never make it worse. And diff-cover is a nice little tool that is designed to help you configure that sort of ratcheting, on coverage or on PEP 8 checks or things like that.
54:48 Michael: Nice. So you basically enforce a ratchet on coverage - it always needs to get better.
54:58 David: I don't currently use that because I already have 100% coverage, but if I were coming into someone's existing project which was large and had a slightly less good state I would absolutely recommend checking something like this out.
55:14 Michael: That's excellent, thanks for the recommendation. All right, any final calls to action, how do people get started with Hypothesis?
55:19 David: So for getting started with Hypothesis, what I would really recommend is just checking out the hypothesis.works website and sort of getting a feel for it and trying it out, and then if you are a company who wants to improve your testing with Hypothesis, I would very strongly recommend hiring me to come in and either do a consult or run a training workshop. One of the workshops I run is basically an exploratory thing where you stick me in a room with ten devs for a day and we figure out how to write a whole bunch of new tests for your software, and we usually find some interesting bugs - particularly in the unicode handling, most of the times we've done it - but also in other weird edge cases that you wouldn't necessarily test.
56:05 Michael: Yeah, that's awesome, I am sure you do find some weird ones and that's great. All right, well David thanks so much for being on the show, it's been great to talk to you.
56:10 David: Thank you very much Michael.
56:12 Michael: Bye.
56:13 David: Bye.
56:13 This has been another episode of Talk Python To Me.
56:13 Today's guest was David MacIver and this episode has been sponsored by Hired and Snap CI. Thank them both for supporting the show!
56:13 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity presented right upfront and a special listener's signing bonus of $2,000.
56:13 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at snap.ci/talkpython
56:13 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. If you're looking for something a little more advanced, try my write pythonic code course at talkpython.fm/pythonic.
56:13 You can find the links from the show at talkpython.fm/episodes/show/67
56:13 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.
56:13 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music. You can browse his tracks for sale on iTunes and listen to the full length version of the theme song.
56:13 This is your host, Michael Kennedy. Thanks for listening!
56:13 Smixx, take us out of here.