#67: Property-based Testing with Hypothesis Transcript

Recorded on Monday, Jul 11, 2016.

00:00 Let's talk about your unit testing strategy. How do you select the tests you write, or do you even

00:04 write tests? Typically, when you write a test, you have to think about what you are testing and the

00:09 exact set of inputs and outcomes that you're looking for. And there are strategies for this.

00:13 Try to hit the boundary conditions, test the most common use cases, seek out error handling,

00:18 things like that. We all do this to varying degrees of success. But what if we didn't have

00:23 to do this? What if there is some kind of way to express relationships between inputs and outputs,

00:27 but your test could explore the problem space independently on its own? Well, there is a way,

00:33 and it's called property-based testing. This week, you'll learn about Hypothesis,

00:37 the most popular property-based testing system for Python created by David MacIver.

00:42 This is Talk Python to Me, episode 67, recorded Monday, July 11th, 2016.

00:48 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

01:18 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm

01:22 @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show

01:28 on Twitter via @talkpython. This episode is brought to you by Hired and SnapCI. Thank them for

01:34 supporting the show on Twitter via @Hired_HQ and @Snap_CI. David, welcome to the show.

01:43 Thanks, Michael. It's a pleasure to be here.

01:44 I'm really excited to talk about this new angle on testing that I honestly just learned about a few

01:51 weeks ago. I'm really thrilled to talk about property-based testing, where the computer can

01:56 sort of do more of the heavy lifting and more of the automation, honestly, for us, right?

02:00 Yep, that's the idea. Sometimes when I'm feeling particularly salty, I describe it as real automated

02:05 testing.

02:06 Yes, yeah. And we'll get into why that is and what that is. But before we do, let's talk about

02:11 your story. How did you get started in programming?

02:13 So I actually got started in programming for very selfish reasons, which was simply for the money.

02:17 I got out of university with a maths degree and didn't really want to stay in academia,

02:23 but was completely lost in anything fewer than 10 dimensions. So I had to figure out what to do

02:30 if that would actually work for me. And the options that came up were banking or software,

02:34 and software seemed the lesser of the two evils.

02:38 I would say you made a good choice.

02:40 And I found a nice little company that was happy to take a chance on me and let me learn to program

02:46 on the job. And it turned out I quite liked this programming thing. And ironically, having gotten to

02:53 it for the money, I've been spending a lot of the last year and a half doing work entirely for free.

02:57 But hey, it's been fun.

02:58 You know, that's a really interesting sort of view of things. And I kind of got into programming,

03:05 not I didn't think of it as for the money, but for the opportunity. I also was working on my PhD in

03:10 math. And I looked at the jobs and how difficult and limited the jobs were for people coming

03:16 out of university, like you said, you can basically go work at like a bank or an insurance company,

03:20 or you can be in academia. And it's, it's either super boring, you're working, you know,

03:27 like optimizing a little bit of some kind of insurance policy, or it's, it's extremely competitive.

03:33 And I decided to go into computers as well, sort of after the fact. And I had a similar experience

03:39 that I started working for a company that was willing to take a chance on me. I basically said,

03:45 look, I don't know much programming, but this one thing that you need, this is the one thing I can

03:50 actually do really well. You know, if you're willing to just let me do this one thing,

03:53 and then you give me some support, I can do a great job for you. And they did. And,

03:57 you know, it's no, no looking back. It's great. So yeah, very cool that you got that start that way.

04:02 Where'd Python come along?

04:04 Python originally came along just for another job. I've always been sort of a programming languages

04:10 polyglot and tend to just pick up languages either when they seem interesting or when I need them for

04:15 a particular problem or a particular job. In Python's case, it was a, it was a job. I had

04:22 been working in a company doing Ruby actually for a couple of years. But when I was looking to move on,

04:30 the job that seemed most interesting was one using Python. Python seems like a perfectly nice language.

04:35 I was very happy to learn it. So I did. I learned it in advance of the job rather than actually

04:40 starting from day one with zero Python knowledge. And amusingly, Hypothesis was actually my learning

04:46 Python project originally. Wow.

04:49 So the version of Hypothesis you see today isn't my learning Python project.

04:54 It has been essentially completely rewritten since then. I wrote basically a prototype back in 2013,

04:59 when I was first learning Python. And then beginning of 2015, when I left another job and had some time on

05:08 my hands, I got frustrated that no one had improved on this silly little prototype I wrote. So I took

05:13 sort of the core idea and turned it into a real thing.

05:15 That's cool. Yeah. And it, you know, looking on GitHub, it's got over a thousand stars. It's,

05:19 it's definitely going, how long has it been sort of actively used or, you know, you said you sort of

05:26 did it as a learning thing and then you revamped it. Like when was that revamping and when is it kind

05:30 of taken off?

05:31 So people were actively using even the prototype, which is sort of what prompted me to do this because

05:37 Oh my gosh, people depend on this thing.

05:39 Yeah, exactly. It's, it was terrible at that point, but it managed to be less terrible than

05:45 all the alternatives. So I don't think it had a lot of users then, but I think it had maybe

05:49 five or something like that. I know that someone at Twitter once told me a couple of months ago that

05:55 they were just updating from Hypothesis 0.1.4, I think, which I think wasn't the 2013 version.

06:03 I think that was the first patch release I put out beginning of 2015, but there were definitely

06:08 some people using the ridiculous early prototype. In terms of serious usage, basically the first

06:16 three months, I think of 2015 were what it took me to go from the 0.1.x series to the 1.0 series.

06:25 And by that point, there were definitely a couple of significant users. I think Rob Smallshire was one of the early users of the 1.0 series, and you had his colleague Austin on recently, which is why I mention him. And there were another handful.

06:44 Yeah, yeah. The mutation testing show. Yeah, exactly. And there were another handful then. But 1.3, which I think happened April 2015 was probably, it's the first one that is really recognizable as sort of modern hypothesis. And I think that's when sort of the traction started

07:01 really, sort of when it really started building momentum, because early hypothesis, even in the 1.0 series was still a bit weird. And 1.3 was the point at which all of the visible weirdness went away. And I think it just became a really good, useful system. And people stopped having to scratch their head whenever they got started.

07:22 Yeah. I mean, people are still scratching their head a bit, but mostly because they're new to property-based testing rather than because hypothesis is weird.

07:29 Yeah, it's a really interesting idea. And I definitely want to talk about it. You know, maybe we should take just a little bit of a step back and talk about testing and property-based testing in general, and then we can dig into hypothesis itself.

07:43 So I have a lot of listeners that have a diverse background. So many programmers, but also scientists and data scientists and so on. So maybe not everyone fully gets the benefits of testing. So could you maybe just touch on three reasons why we care about testing, and then we'll dig into what property-based testing is and so on?

08:03 Sure. So I can give you one reason I care about testing, which is mainly it's about confidence. I was having a good discussion with some people recently about the fact that I do something slightly weird in releasing Hypothesis, which is that someone reports a bug. I go, here's a bug patch. I've done a patch release. Okay. New version is out. And that's fine. You can use it and install it.

08:28 And the reason I can do that is because I've got an incredibly comprehensive build and automated test. So if the build passes and if all the tests pass, then I can be basically sure that this code is, if not perfect, it's at least working as well as I've known previous versions to work and I can rely on it.

08:49 And for me, that's sort of the big thing about testing is that it gives me a reproducible idea of how well my software is working. And I'm free to make changes to it. I'm free to push out new releases because I know those releases aren't broken.

09:07 These days, trying to do any sort of software without that level of confidence feels really upsetting and alien to me.

09:16 I mean, I absolutely have some small hack projects which are terribly tested, but that's because they have a single user who is me and I don't really care if they break because I'll just fix them when they break.

09:28 But for anything where I want other people to be able to depend on me, tests are amazing for that.

09:33 Yeah, and if you're doing an open source project, having tests supporting it make it much more likely that people will have confidence in using your library, right?

09:42 If you have an actual product or website that you're shipping, it would be really nice to be able to say, you know, this morning I came in and there was this cool feature somebody wanted.

09:54 It took me about an hour. I added it. It has the test to support it. Let's push it live, right?

09:59 Rather than we're going to queue up all of our work and hand it off to QA and in two months we'll get together over the weekend and take the site down and upgrade it, right?

10:08 Like those are super different philosophies and the first one deeply depends on automated testing and automated builds, I think.

10:14 Although ideally you wouldn't do the first one either because the problem with adding features that quickly is that you need to then support the features indefinitely.

10:25 Sure.

10:25 Features are made in haste and regretted at leisure. So I tend to do very rapid releases for bug fixes, but I'm much more cautious about adding new features.

10:36 Yeah, I guess there's two parts to it, right? The part that I was expressing is like the ability to make changes to code with confidence quickly.

10:45 And then there's the part of product design and evolution needs to be, you know, not done in haste, right?

10:51 Like it needs to be carefully thought through. Yeah. All right, cool. So we know what unit testing is and automated testing, but what is this property-based testing? What's the idea there?

11:03 So one thing where I slightly differ in my framing of this from other people is that I don't regard property based testing as distinct from unit testing.

11:13 I think property based testing is an extension to your normal testing tools, but a property based test could be a unit test or it could be an integration test or it could be a functional test.

11:25 However, you want to draw those boundaries. It's property based tests versus example based tests are sort of orthogonal to the usual splits that people do for testing.

11:36 Right. It's like a layer that you can lay upon any of your testing layers, right? So if you're doing integration tests, you can add property based testing to that to get better coverage.

11:50 Yeah. Yeah. Yeah. And so I guess we need to have some terminology here that people may or may not be familiar with, like the traditional unit test with the three A's.

12:00 You know, you write out your test and you say, here's the arrange: I'm going to set up my data. I'm going to act: I'm going to pass in this number and expect this individual specific value back.

12:10 And then I'm going to, you know, assert that that's true. Yeah. That is referred to as example based testing, right?

12:16 Yeah. That's certainly the term I use. And it's, it seems to be reasonably standard. Like a number of people have independently invented it. So I think that's probably the best terminology to use.

12:25 Yeah. When I first learned about property-based testing, I didn't know how to refer to it distinctly. I didn't know what to call the regular, sort of traditional unit testing, which we're calling example-based testing.

12:38 I didn't know like what the terminology for that was. It was hard to compare it. But once I sort of realized that that was how it was done before, it's like, here's one specific example of a test.

12:49 This number equals this value, but property based testing tries to cover a range or a set of constraints, almost like rules, right?

12:57 Yep, exactly. So it looks very like what you were describing, the example based test, but just you replace the steps slightly.

13:04 So the first step where you would set up the data yourself, instead you sort of describe to the testing library hypothesis in this case, how to set up the data that you want.

13:15 So instead of saying, I want this string, you say, give me a string here. Instead of saying, I want this user, you say, give me a user object here, that sort of thing.

13:26 And then at the end, instead of being able to say, you should get this exact value, you, well, I mean, sometimes you can say you should get this exact value, if the exact value is one of the things you passed in or something you can easily compute from it.

13:43 Right. If I get a user that's not registered, not a paying user, I should be able to ask the question, is this a paying user? No. Something like this.

13:51 Yes, exactly. Yeah. Okay.

14:21 Could do an example based test, but it's often not worth doing if you just got a very small range of examples.

14:26 Right. A single example really would cover it.

14:29 So I think it might be worth to discuss, you know, it's hard to talk about code, but just to give people an idea of what this is like.

14:38 So we install hypothesis, the package, and then suppose I've, let's just take an example of, I have some kind of method and it's just calling a function and that function is supposed to return an even number.

14:50 And so I'm, I'm doing something like deriving from test case.

14:53 So self dot assert this, if I pass it five, it returns me like the number four, like the closest even number or something.

15:01 Like how would I change that code to, to become something property based with hypothesis?
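
For reference, the example-based starting point being described might look roughly like the sketch below. The function name closest_even and its round-down behaviour are assumptions made for illustration; the conversation only says that passing five returns four.

```python
import unittest


def closest_even(n: int) -> int:
    """Hypothetical function under test: round n down to an even number."""
    return n - (n % 2)


class ClosestEvenTest(unittest.TestCase):
    def test_five_gives_four(self):
        # Arrange: one hand-picked input.
        value = 5
        # Act: call the function under test.
        result = closest_even(value)
        # Assert: compare against one hand-picked expected output.
        self.assertEqual(result, 4)
```

This checks exactly one input and one output, which is what makes it an example-based test.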

15:07 So you would start by adding a given decorator, which is the main entry point to Hypothesis, and which describes essentially how to call your test function with a bunch of arguments that are your test data.

15:21 So you'd say given integers, for example. Integers is a standard strategy that Hypothesis provides for generating integers. The clue's in the name.

15:31 But now you've got your test function, which has an extra argument and is being passed in integers via that argument.

15:39 You would then call your test, your method that you're testing and look at the result.

15:47 Right. And would you like add a parameter or something to that method or something like this?

15:51 Yes, exactly. You would add a parameter for the integer you want passed in, and you would tell given to pass the parameter either by name or positionally, as you prefer.

16:05 Right. Okay.

16:05 But yeah, so then you call your function with the integer you've been passed in and you look at the value.

16:13 The first thing you would do is you would assert that the value is always even.

16:19 So value mod two equals equals zero.

16:21 And then if you wanted that it was always the closest even number, then what you could do is say assert that the absolute value of the difference of your argument and your return value is less than or equal to one.
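
Put together, the property-based version described in this exchange might look like the following sketch. The function closest_even is a hypothetical stand-in for the code under test; given and strategies.integers are the Hypothesis entry points named in the conversation.

```python
from hypothesis import given, strategies as st


def closest_even(n: int) -> int:
    """Hypothetical function under test: round n down to an even number."""
    return n - (n % 2)


@given(st.integers())
def test_closest_even(n):
    result = closest_even(n)
    # Property 1: the result is always even.
    assert result % 2 == 0
    # Property 2: the result is never more than one away from the input.
    assert abs(n - result) <= 1
```

The test takes one extra parameter, n, which Hypothesis fills in with many different integers each time the test runs.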

16:36 Okay.

16:38 And then I don't change anything else.

16:40 Right. Like that's one of the beautiful things about hypothesis is it's not like, well, now use instead of using nose, you now use hype test or whatever.

16:49 Right.

16:49 It's just you run the test like unchanged basically.

16:52 Yeah.

16:53 Absolutely. It works with essentially any of the testing frameworks without you having to use its own test runner.

16:59 It's a little annoying in unit test because when it fails, it will print out the falsifying example sort of in the middle of your test runs because unit test doesn't have a good way of hooking into test output sensor plane.

17:11 Okay. Which one would you say is the best to connect it to?

17:14 pytest?

17:15 pytest is probably the best.

17:16 pytest has a plugin for it, but that does almost nothing.

17:21 So it works maybe 10% better with pytest than with nose.

17:26 And unit test is, or unit test isn't the least supported.

17:32 Unit test works fine other than the printing issue.

17:34 The least supported is probably Twisted's test runner because the asynchronous support and hypothesis isn't quite there.

17:43 Right. That's the most different.

17:45 You can do it okay with some hooks to basically turn the asynchronous test into a synchronous test.

17:51 So if you want to test asynchronous code from pytest or nose, that will mostly work.

17:56 But the, I'm temporarily blanking on what the Twisted test runner is called, but its assumptions about how tests work are slightly different from Hypothesis's assumptions about how tests work.

18:07 Right. Okay. Sure.

18:09 You know, I think I got a pretty good handle on how it works, but a lot of people are hearing this for the first time.

18:15 When you run the test, basically the decorator wraps your test function and then it looks at the description and the parameters.

18:22 And then it comes up with ways in which it tries to break your test and it passes and executes it with many different values, right?

18:30 Yep. That's exactly right.

18:31 It doesn't so much come up with ways to break your test.

18:34 It just tries it with a large range of values, many of which are quite nasty.

18:38 So it doesn't currently sort of look inside your test and figure out what might break things, but it has pretty good heuristics for figuring out values that are likely to break things.
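
To make the mechanism concrete, here is a deliberately simplified toy version of the idea, written with only the standard library. It is not how Hypothesis is implemented; the names toy_given and nasty_integers, the 30% bias, and the list of edge cases are all invented for illustration.

```python
import random


def toy_given(generate, runs=100, seed=0):
    """Toy decorator: call the test many times with generated values."""
    def decorator(test):
        def wrapper():
            rng = random.Random(seed)
            for _ in range(runs):
                value = generate(rng)
                try:
                    test(value)
                except AssertionError as exc:
                    # Report which input broke the property.
                    raise AssertionError(f"Falsifying example: {value!r}") from exc
        wrapper.__name__ = test.__name__
        return wrapper
    return decorator


def nasty_integers(rng):
    """Mostly ordinary integers, with a bias towards awkward edge cases."""
    if rng.random() < 0.3:
        return rng.choice([0, 1, -1, 2**31, -2**31])
    return rng.randint(-1000, 1000)


@toy_given(nasty_integers)
def test_abs_is_non_negative(n):
    assert abs(n) >= 0
```

The test body never looks inside the code under test. It is simply run against a broad mix of inputs, with the generator weighted towards values that tend to cause trouble.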

18:48 Right. And, you know, I think one of the really important and maybe under, you know, it's hard to make generalizations for people, but under tested parts of code is the error handling and the off by one errors and just like right on the boundary of good and bad.

19:04 And it seems like hypothesis is really good at like looking for that and unusual inputs and values in general, right?

19:11 Yeah, definitely. There are a lot of things that Hypothesis will try most times, which people sort of forget are a thing.

19:18 I don't know if this is actually true, but I often joke that about a third of the bugs that you'll find when running hypothesis on a new code base is that you forgot that something could be empty.

19:27 Yeah.

19:27 And the third is a made up number, but I've definitely seen this pattern a lot.

19:30 It's good at finding Unicode bugs because people often forget about particularly weird Unicode cases.

19:37 Although I found recently a class of bugs where it needs to be better at finding them.

19:41 Right. Okay.

19:43 Yeah. So for example, if I have like some kind of scientific thing that expects, you know, numbers, you know, maybe it assumes that they're all positive and hypothesis would of course pass negative numbers or zero.

19:56 But it also might pass like not a number, infinity, all sorts of funky inputs that people typically don't test for in example-based testing or if they even have these at all, right?

20:06 One problem I found is that a lot of people who are doing floating point maths don't actually care enough about getting the right answer to make using hypothesis really valuable.

20:18 Like if you do care about it and you're sort of prepared to do a numeric stability analysis on your code, then hypothesis is great.

20:26 For the most part, I think people who are using floating point kind of want to pretend that they're not.

20:33 And the edge cases that hypothesis will tell them about are somewhat unwelcome.

20:40 Right. So what if I have like some kind of scientific thing and I know it only expects, say, positive integers, but not infinity, something like this.

20:50 And is there a way in the given decorator to say, I want to give it integers, but only in say this range or something like that?

20:58 Yeah. So the strategy functions all come with a huge number of knobs to twiddle.

21:04 So the integers function accepts both min value and max value parameters.

21:08 And you can just specify those as you want.

21:13 So you can say min value equals one if you only want the strictly positive integers.

21:17 Similarly, the floats strategy has both min and max value.

21:23 And you can also tell it, don't give me infinity or don't give me NaN.

21:27 So you could do something like floats, min value equals zero, allow infinity equals false, allow NaN equals false.
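
Written out, the bounded strategies mentioned here look like the sketch below. The parameter names min_value, allow_infinity and allow_nan are the ones Hypothesis uses; the two test functions and the properties they check are made up for illustration.

```python
import math

from hypothesis import given, strategies as st


@given(st.integers(min_value=1))
def test_only_positive_integers(n):
    # Every generated integer respects the lower bound.
    assert n >= 1


@given(st.floats(min_value=0, allow_infinity=False, allow_nan=False))
def test_sqrt_of_finite_non_negative_float(x):
    # No NaN, no infinity, nothing below zero.
    assert math.isfinite(x) and x >= 0
    assert math.sqrt(x) >= 0
```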

21:34 Right. Okay. Yeah, that's really cool.

21:49 This portion of Talk Python to me is brought to you by Hired.

21:51 Hired is the platform for top Python developer jobs.

21:54 Create your profile and instantly get access to 3,500 companies who will work to compete with you.

21:59 Take it from one of Hired's users who recently got a job and said, I had my first offer on Thursday after going live on Monday, and I ended up getting eight offers in total.

22:07 I've worked with recruiters in the past, but they've always been pretty hit and miss.

22:11 I tried LinkedIn, but I found Hired to be the best.

22:14 I really liked knowing the salary up front.

22:16 Privacy was also a huge seller for me.

22:19 Sounds awesome, doesn't it?

22:20 Well, wait until you hear about the signing bonus.

22:22 Everyone who accepts a job from Hired gets $1,000 signing bonus.

22:25 And as Talk Python listeners, it gets way sweeter.

22:28 Use the link Hired.com slash Talk Python to me, and Hired will double the signing bonus to $2,000.

22:33 Opportunity's knocking.

22:35 Visit Hired.com slash Talk Python to me and answer the door.

22:45 And another thing that I thought was nice on these, I don't know if this works for every strategy you'll have to tell me,

22:50 is if I have a set of whatever, like let's just stick with numbers for a moment, integers,

22:56 and I want to somehow control them in a way that is not just min-max or something like that.

23:02 Like I want only, I don't know, Fibonacci numbers or prime numbers or something like this.

23:07 Like you can do a dot filter and then add a lambda expression that will further, like say, you're going to give me a bunch of stuff,

23:13 but actually these are the only ones I'll allow you to give me.

23:16 Something like this, right?

23:18 So you can do that with every strategy, but for a lot of the examples you just gave, you shouldn't.

23:22 Because the problem with filter is that it's not magic.

23:26 Essentially the way it works is by generating values and then throwing them away and trying again if they don't pass the lambda expression.

23:36 So if you've got something like the Fibonacci numbers and you were trying to filter by a test is a Fibonacci number,

23:45 then you would basically, the only numbers you'd ever really find from there are the really small ones probably because high up they're too sparse.

23:53 But what you can also do is you can, instead of filtering, you can map.

23:58 So you could instead generate a positive integer and then say map that by a give me the nth Fibonacci number function.

24:08 So you would start by generating, say, 10.

24:13 This would get mapped through and you would get the 10th Fibonacci number.
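
A sketch of the map approach just described: generate an index, then map it through a function that returns the nth Fibonacci number. The helper fib, the upper bound of 90 and the perfect-square check are assumptions added for illustration.

```python
import math

from hypothesis import given, strategies as st


def fib(n: int) -> int:
    """Return the nth Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def is_perfect_square(n: int) -> bool:
    root = math.isqrt(n)
    return root * root == n


# Every generated value is a Fibonacci number by construction,
# so nothing is generated and then thrown away as with filter.
fibonacci_numbers = st.integers(min_value=1, max_value=90).map(fib)


@given(fibonacci_numbers)
def test_values_are_fibonacci(x):
    # A positive integer x is a Fibonacci number exactly when
    # 5*x*x + 4 or 5*x*x - 4 is a perfect square.
    assert is_perfect_square(5 * x * x + 4) or is_perfect_square(5 * x * x - 4)
```

Using filter with an "is this a Fibonacci number" predicate over plain integers would reject almost every candidate, which is the sparseness problem raised above.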

24:16 Yeah, some huge number there.

24:18 Yeah, can you take like a bounded set of like here's 214 things that I think are valid inputs and please start from here.

24:25 Pick one of these.

24:26 Yep, absolutely.

24:27 The strategy is called sampled from in hypothesis.

24:30 Right.

24:31 You just give it an arbitrary collection and it will give you values from that collection.

24:34 And can you compose or combinatorially combine these?

24:38 Like let's say have like three sets.

24:41 One's got five, one's got two, and another's got three.

24:44 Yep.

24:45 Can I say I want to sample from this and this and this and who knows how many combinations that would have to be able to work that out?

24:52 But, you know, like it would figure that out and sort of pick some from here and there and test them together.

24:56 So if you want one from each, then you can use the tuples strategy, which you can basically just pass a sequence of n strategies and it will give you a tuple of n elements with the sort of ith element drawn from the ith strategy.

25:08 Okay.

25:10 So in that case, you could use something like tuples sampled from, sampled from, sampled from.
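
As a sketch, combining three collections of five, two and three items with tuples and sampled_from could look like this. The collections themselves are invented for illustration.

```python
from hypothesis import given, strategies as st

# Three hypothetical collections with five, two and three members.
SIZES = ["xs", "s", "m", "l", "xl"]
GIFT_WRAPPED = [True, False]
COLOURS = ["red", "green", "blue"]

combinations = st.tuples(
    st.sampled_from(SIZES),
    st.sampled_from(GIFT_WRAPPED),
    st.sampled_from(COLOURS),
)


@given(combinations)
def test_each_element_comes_from_its_own_collection(combo):
    size, gift_wrapped, colour = combo
    assert size in SIZES
    assert gift_wrapped in GIFT_WRAPPED
    assert colour in COLOURS
```

Each generated tuple takes its first element from the first collection, its second from the second, and so on.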

25:13 Right.

25:14 Okay, cool.

25:15 And then what are some of the other strategies?

25:18 Like we're talking about numbers a lot, but there's more to it than that, right?

25:20 Yeah.

25:21 So there's strategies for most of the built-in types.

25:24 You've got, there's unicode, there's datetime, although the datetime one's in an extra package because it depends on pytz because no one actually works with datetimes without pytz.

25:35 You've got all the collections types.

25:39 So you've got the tuples one I mentioned, but you can also generate dicts of things.

25:44 You can generate sets, frozen sets.

25:45 You've got unicode and byte string generators, of course.

25:48 Permutations, right?

25:50 Oh, yes.

25:50 Yep.

25:50 You can generate permutations of a given list.
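
A small sketch of the permutations strategy: it takes a list and generates reorderings of it. The list contents and the property checked here are made up for illustration.

```python
from hypothesis import given, strategies as st

ORIGINAL = [3, 1, 4, 1, 5, 9, 2, 6]


@given(st.permutations(ORIGINAL))
def test_sorting_ignores_input_order(shuffled):
    # Whatever order the elements arrive in, sorting gives the same result.
    assert sorted(shuffled) == sorted(ORIGINAL)
```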

25:53 I don't think many people use that one, but I've occasionally found it super useful.

25:57 Right.

25:57 When you need it, I'm sure you're like, this is exactly what I need.

26:00 Yeah, yeah, exactly.

26:01 But not so often, right?

26:02 Yep.

26:02 And then you've got, those are sort of most of the primitive strategies.

26:07 I think there are a few more of them I'm temporarily blanking on, but essentially if it's in the standard library, you can probably generate it.

26:13 Okay.

26:14 And then on top of that, you've got various strategy composition functions, which is for things like the map and filter I mentioned.

26:24 You've also got the composite decorator, which lets you sort of chain together a bunch of strategy generators and generate whatever data you like.

26:32 And you've got builds, which is a bit of a simpler version of that.
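
A sketch of both composition tools. The User class, the age bounds and the list_and_index strategy are hypothetical; builds and composite are the Hypothesis functions named above.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class User:
    name: str
    age: int


# builds: call User(...) with each argument drawn from its own strategy.
users = st.builds(
    User,
    name=st.text(min_size=1),
    age=st.integers(min_value=0, max_value=120),
)


# composite: chain draws together, so a later draw can depend on an earlier one.
@st.composite
def list_and_index(draw):
    items = draw(st.lists(st.integers(), min_size=1))
    index = draw(st.integers(min_value=0, max_value=len(items) - 1))
    return items, index


@given(users)
def test_generated_users_are_valid(user):
    assert isinstance(user, User)
    assert 0 <= user.age <= 120


@given(list_and_index())
def test_index_is_always_in_range(pair):
    items, index = pair
    assert 0 <= index < len(items)
```

builds covers the common case of constructing an object from independent pieces, while composite is useful when one generated value constrains another, as with the index here.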

26:36 And there's also, there's special Django support in, again, in an extra package so that it can generate more or less arbitrary Django models without you having to specify anything in particular.

26:48 Right.

26:49 You just say models from my model class and it will do the rest.

26:53 You can override any of the rest it does, but by default it works reasonably well out of the box.

26:57 That's cool.

26:58 So if I had like a users table and I wanted to sort of replace that and not actually talk to the database, but I want to test the code that works with those, I could have some kind of, some kind of thing that would randomly generate users that match my data model.

27:11 Yeah.

27:12 It actually does talk to the database.

27:13 It works with the normal Django testing and the model objects it generates are persisted in the database.

27:18 Okay.

27:19 Interesting.

27:19 Yeah.

27:20 So this is real Django.

27:21 Yeah.

27:22 Rather than any sort of mocking.

27:24 I see.

27:25 Okay.

27:25 Well, very cool.

27:26 Very cool.

27:28 One thing that I thought was neat is, is you can say at given, you give it this strategy.

27:32 You say like, so if you're given an, you know, some set of, you know, something from the set of integers and something from the set of floating points where, and do like a filter, like, you know, some, some property on your floating points.

27:44 It's going to randomly pick a bunch of combinations and try them.

27:48 And that's cool.

27:49 But maybe, you know, maybe you're coming from a particular example-based testing scenario where like this combination of numbers for whatever reason is super important.

27:58 It's important to test.

27:59 And it has to be this.

28:00 And so like, in addition to all the randomness that you're doing to sort of explore the problem space, you can say, you can give an example decorator, right?

28:09 And say, and also include this particular test that I was testing before, say, if you're like upgrading to property-based testing.

28:16 Yep, absolutely.

28:16 The example decorator is sort of one of those little touches that as far as I know is unique to Hypothesis, but was in retrospect, obviously a good idea.

28:25 The original use case was more for people who wanted to include the examples that Hypothesis had found for them in their source code.

28:34 Mm-hmm.

28:35 But it found a whole bunch of other applications.

28:38 Like one is the one you mentioned of making it much more comfortable to transition from example-based testing.

28:43 Another thing that I sometimes do, and I've seen a couple other people doing, is that you use example to ensure that the test always gets the maximum coverage that the test can reach.

28:57 So if there's some line that your test covers 80% of the time, then you just add an at example that touches that line and so means that your coverage isn't varying from test run to test run as a result of Hypothesis if you do that.
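
A sketch of that coverage use of the example decorator. The function describe and its rarely reached branch are invented for illustration.

```python
from hypothesis import example, given, strategies as st


def describe(n: int) -> str:
    """Hypothetical function with a branch that random integers rarely reach."""
    if n == 1_000_003:
        return "special"
    return "ordinary"


@given(st.integers())
@example(1_000_003)  # always run this input, so the rare branch is always covered
def test_describe_always_returns_a_label(n):
    assert describe(n) in {"special", "ordinary"}
```

The explicit example runs on every test run in addition to the generated values, so coverage of the special branch no longer depends on what Hypothesis happens to generate.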

29:14 Interesting.

29:15 Yeah, that's really cool.

29:16 Can you use it as a negative, a negation operator to like do all this stuff but don't send it this example?

29:22 Or is there some way to say this?

29:24 I'm not sure that that would make any sense.

29:26 You can't do that with example.

29:27 That's what the assume function is for.

29:30 It basically gives you a way of filtering out some classes of example as not good examples that you don't really care about.

29:39 I see.

29:39 Is that like a precondition sort of concept?

29:42 So I could say like if I was generated an automatic user, I would say and assume the user was generated today or assume that their age is over 18 or something like that.

29:53 Yeah, exactly.

29:54 It is essentially a precondition, although you can put it anywhere in the test.

29:59 So sometimes you would put it in the middle of a test when some previous operations produced a result that you don't care about.

30:05 So say you've got a list of users and you did a bunch of calculations. You would then assume that, I don't know, at least one of the users passed through the code you just ran, and you haven't sort of unregistered all the users or something.
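
A sketch of assume used in the middle of a test, in the spirit of the scenario above. The list of ages, the threshold of 18 and the property being checked are assumptions chosen for illustration.

```python
from hypothesis import assume, given, strategies as st


@given(st.lists(st.integers(min_value=0, max_value=120)))
def test_mean_adult_age_is_in_range(ages):
    # Some intermediate computation on the generated data.
    adults = [age for age in ages if age >= 18]
    # Mid-test precondition: discard examples where nobody is left.
    assume(len(adults) > 0)
    mean_age = sum(adults) / len(adults)
    assert 18 <= mean_age <= 120
```

When the assumption fails, Hypothesis abandons that example and tries another one rather than reporting a failure.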

30:26 Yeah, sure.

30:27 Okay.

30:28 I think that's really useful and helpful.

30:31 Does Hypothesis ever look at the code or the code coverage or anything like this?

30:35 Or does it just work black box style?

30:38 I'm going to put stuff in and see if it crashes.

30:41 So it currently just works black box style.

30:44 I've got most of the pieces for making it work in a less black box style.

30:51 In fact, I've got a separate library I wrote called Glassbox, which is designed for adding sort of glass box elements to your testing, which uses some coverage information.

31:01 A lot of Hypothesis internals look quite a lot like the internals of a fuzzer called American Fuzzy Lop, which has this really nice coverage-based metric.

31:11 But the reason why the coverage stuff hasn't ever made it into Hypothesis is because coverage metrics work really well when you've got some sort of long-running fuzzing process.

31:22 So if you're prepared to run your tests for a day, then coverage metrics are amazing and will do a really good job at poking inside the code and seeing what is going on.

31:35 But if you get upset when your tests take as long as 10 seconds to run, then coverage starts to become less useful because it doesn't really have the time to actually figure things out.

31:47 I think I've mostly figured out a way of making use of coverage in a way that works okay within those time constraints, but it hasn't made it into anything production-like yet.

31:58 Okay.

31:58 Well, that sounds like a cool direction.

32:00 However, I have been saying that for about six months, so I wouldn't hold your breath on waiting for those features.

32:05 Yeah, of course, of course.

32:06 So it seems that it would be interesting for Hypothesis to remember scenarios it came up with that failed.

32:15 Like, is there a way to say, try all these examples, and if you ever discover one, you know, save it to something that you're going to know to replay?

32:23 It actually does that out of the box.

32:25 Okay.

32:26 Hypothesis has a database of examples.

32:29 Well, I say database.

32:30 It's basically an exploded directory format that is designed so you can check it into Git if you want.

32:35 But whenever a test fails, then what Hypothesis does is, well, first of all, it shrinks the example.

32:42 So it turns the example into a simpler one.

32:47 And then in the process of doing that, every example it sees fail is saved in this database so that when you rerun the test, it replays those examples, starting from the simplest and working its way upwards.

33:00 So, basically, one of the things that I've made really sure of in Hypothesis is that although it's random, it's sort of the good kind of randomness, because it finds bugs but it never forgets bugs.

33:13 If the test fails, then when you rerun it, the test will continue failing until you fix the bug.

33:19 Yeah, because if it didn't remember it, I guess, like, suppose the number 13, for some reason, broke your code, and you just said the strategy was integers.

33:26 Like, how would it know to try 13 again if it's, like, really going that, you know, out to that many numbers and so on?

33:34 So, okay.

33:34 Because it did seem to keep failing consistently.

33:36 And I guess I kind of clued in, like, that is a little odd that it seemed just, I'm like, oh, it's really smart.

33:42 Yeah.

33:43 But I didn't put it together how it's smart, of course.

33:46 For a lot of tests, the failure is common enough that even without the database, Hypothesis would be able to find a failure consistently each time.

33:56 But one of the annoying things that happens there is that sometimes, if it didn't have the database, it would often be finding different bugs each time.

34:05 Because you don't always shrink to sort of the globally minimal failure.

34:09 And sometimes there are two bugs, and starting from one will shrink one way, and starting from another will shrink the other way.

34:15 So, in many ways, like, that's almost more annoying than it failing unreliably.

34:19 Because you don't know whether you've just changed the bug or whether you've introduced a new one or what.

34:25 Gone are the days of tweaking your server, merging your code, and just hoping it works in your production.

34:45 With SnapCI's cloud-based, hosted, continuous delivery tool, you simply do a git push, and they auto-detect and run all the necessary tests through their multi-stage pipelines.

34:55 Something fails?

34:57 You can even debug it directly in the browser.

34:59 With a one-click deployment that you can do from your desk or from 30,000 feet in the air, Snap offers flexibility and ease of mind.

35:07 Imagine all the time you'll save.

35:10 Thank SnapCI for sponsoring this episode by trying them for free at snap.ci.com.

35:14 One of the things I think is interesting is the scenarios we've been talking about so far.

35:30 They've been, I give you two numbers, something happens.

35:33 The other side, it gives you the right answer, the wrong answer, and the test detects that.

35:39 But I think one of the things that seems very powerful is for sort of stateful testing and these set of steps.

35:47 Like, let's suppose I have a board game, and I've got to move things around the board in a certain way,

35:53 and some property is supposed to always hold like, you know, the number of chips on the board is always the same,

36:00 even if one comes off from one player and another has to go back on from the other player, right?

36:04 Who knows? I have no idea what game this is.

36:06 But, like, you could write a test to say, do all these operations, and then keep asserting that this is true.

36:13 And if it fails, it gives you a really nice reproducible output, right?

36:17 A set of steps almost, right?

36:19 The output format is a little idiosyncratic, so it's not, unfortunately, something you can just copy and paste into a test currently.

36:26 But, yeah, the stateful testing is really cool.
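
A rough sketch of the board-game idea using Hypothesis's rule-based state machines. The game itself (two players, ten chips, take and return moves) is entirely hypothetical; `RuleBasedStateMachine`, `rule`, and `invariant` come from `hypothesis.stateful`.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

TOTAL_CHIPS = 10  # assumption: a made-up game with a fixed number of chips
players = st.sampled_from(["a", "b"])


class BoardGame(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.on_board = TOTAL_CHIPS
        self.in_hand = {"a": 0, "b": 0}

    @rule(player=players)
    def take_chip(self, player):
        # A player removes a chip from the board, if any are left.
        if self.on_board > 0:
            self.on_board -= 1
            self.in_hand[player] += 1

    @rule(player=players)
    def return_chip(self, player):
        # A player puts one of their chips back on the board.
        if self.in_hand[player] > 0:
            self.in_hand[player] -= 1
            self.on_board += 1

    @invariant()
    def chips_are_conserved(self):
        # Checked after every step, whatever sequence of rules was chosen.
        assert self.on_board + sum(self.in_hand.values()) == TOTAL_CHIPS


# Exposes the machine as a unittest-style test case for a test runner to collect.
TestBoardGame = BoardGame.TestCase
```

Hypothesis picks the sequence of rule calls itself, and on failure it reports a shrunk sequence of steps that breaks the invariant.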

36:28 One of the reasons I don't push it quite as hard as I could is that I don't feel like we've got very good workflows for this right now,

36:36 because the stateful testing sort of, it's exploring an incredibly large search space.

36:42 So you do want something that's more like the fuzzing workflows I talked about, where you do want to set it running for 24 hours or whatever.

36:50 You can use it in your CI, and I'm using it in my CI in a few places, and at least one other project I know of is using it, but I wrote those tests.

36:59 That's material.

37:02 But the, I think it's not quite there yet in terms of being as, it's certainly not as usable as the rest of the library.

37:10 And I think it has some more work required before it gets there.

37:17 But I'm really excited about it, and I do want to sort of spend more time developing that.

37:21 Yeah, the possibilities of it are really amazing.

37:24 It's just, like you said, it's such a problem space that how are you going to find it, right?

37:30 In many ways, my crack about hypothesis being true automated testing, it's really only true for the stateful testing.

37:37 Because one of the things I emphasize sometimes is that hypothesis doesn't write your test for you.

37:45 You're still writing your tests.

37:46 It's just that Hypothesis is doing some of the heavy lifting in terms of the boring work of coming up with examples.

37:52 But the stateful testing, it's sort of, that's almost no longer true.

37:56 At that point, hypothesis is almost writing tests for you.

37:59 Right, yeah, that is actually really cool.

38:01 But I think even so, that level of automation that you talked about that it's already really good at is super helpful.

38:07 Because coming up with those examples is hard.

38:12 And like I talked about earlier, I think coming up with examples that are just inside the working realm and just on the edge of the error conditions or these weird inputs, you know, those are hard to come up with.

38:23 And you just, I think there's just fatigue as well.

38:26 Like, okay, I've tested three cases.

38:27 Like, that's probably good enough.

38:28 Let's just move on to building new features, right?

38:31 Whereas property-based testing will sort of explore that space for you automatically, right?

38:36 Yeah, yeah, absolutely.

38:36 The fact that it's, the fact that you're still writing the test isn't intended to take away from what hypothesis is doing.

38:43 The coming up with examples part is both the most time-consuming part and the most boring part.

38:49 And also, it's the bit that people are really bad at.

38:52 So having software which can take this boring, difficult task and just do it for you is amazing.

38:59 And I get really frustrated now when I have to write tests where I don't have this capability.

39:06 Yeah, I'm sure you do.

39:07 And the fact that the problems that it finds get permanently remembered, that's really cool.

39:13 So do you recommend people check in their .hypothesis folder into Git?

39:17 It's designed so that you can do that.

39:19 But I mostly don't do it myself.

39:22 I generally think that you're probably better off writing @example decorators for any example you specifically want to be remembered.

39:31 The main problem with the example database is that its format is quite opaque.

39:37 You can't really look at a file in there and say, I know what example this corresponds to.

39:41 So even though from a computer's point of view, it will remember what you need and it will do what you want,

39:48 from a human's point of view, you probably want to be more explicit than that.

39:52 One of the things I'd like to work on at some point but haven't sort of found the time or bluntly the customers for doing this work

40:03 is better sort of centralized management of test examples so that you can have your cake and eat it too.

40:10 And rather than checking it into Git, you can have a nice management interface where you can see what hypothesis has run

40:16 and get both the memory and the visibility.

40:22 But that's sort of a, at some point in the future project.

40:26 It's not anything on the short-term roadmap.

40:29 Yeah.

40:30 Well, even something automated that would take the Hypothesis database and inject the entries as @example decorators into your code would be cool.

40:38 Yeah.

40:38 Because the major problem is that there's no real way of taking a Python object and going from that to a literal that you can just copy and paste into your code and it will produce that object.

40:52 You can do it for really simple things.

40:54 So for a lot of the built-in types, the repr will do that.

41:00 Yeah.

41:00 Basically, the repr output is parsable.

41:02 It's probably okay, but that's often not the case, especially for custom types.

41:06 Yeah.

41:06 I would say it's almost never the case for custom types.

41:09 One of my idiosyncrasies as a programmer is that I do try to make sure that all my types have good reprs that will evaluate to the thing you started with.

41:18 But this is very rarely the case in the wild.

41:21 Yeah.

41:22 Sort of one of the cute little details in hypothesis that I spent far too much time on for what it's worth.

41:28 But almost any of the strategies you get out of the standard Hypothesis strategies will give you a very nicely formatted repr that will exactly reproduce the strategy.

41:40 And this is true even up to the point of if you filter by a lambda, it will give you the source code of the lambda in the repr for the strategy.
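
A short sketch of the strategy repr behavior mentioned here. It assumes the repr of a simple built-in strategy can be evaluated against the names in `hypothesis.strategies`; the lambda case is only printed, since the exact text depends on whether the lambda's source is available.

```python
from hypothesis import strategies as st

bounded = st.integers(min_value=0, max_value=10)
text = repr(bounded)

# Evaluate the repr with the strategies module's names in scope
# (copied into a fresh dict so the module itself is not modified).
rebuilt = eval(text, dict(vars(st)))

# A filtered strategy; its repr includes a rendering of the lambda.
evens = st.integers().filter(lambda n: n % 2 == 0)

if __name__ == "__main__":
    print(text)
    print(repr(rebuilt))
    print(repr(evens))
```

Using `eval` on a repr is only reasonable here because the string was produced locally by the library, not read from untrusted input.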

41:48 Oh, that's great.

41:50 It's all a bit ridiculous and I really don't recommend emulating it, but every time I see it, it makes me smile.

41:55 Yeah, yeah, I'm sure.

41:58 That's cool.

41:58 So, you know, one thing that I seemed to hear a lot about when I was looking into property-based testing was that there's a set of patterns people come across, that seem to repeat themselves, where property-based testing serves really well.

42:13 Yeah.

42:14 Can you talk about some of those?

42:16 Do you mean like sort of standard styles of tests that?

42:20 Well, I'm thinking like one of the things people often say is a really good type of thing to turn this type of system onto is serialization and deserialization.

42:32 Or upgrading from like a legacy code to a rewrite.

42:36 You could be able to like pull an old library and always ensure that the same inputs here get the same outputs and the same thing in the new system.

42:43 Or if I have a really simple algorithm, I'm optimizing using the simple slow version to verify the fast version.

42:51 Things like this.
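
A minimal sketch of the serialization round-trip pattern, using the standard library's `json` module as the system under test. The `json_values` strategy is an assumption about what counts as JSON-shaped data for this illustration; NaN and infinities are excluded because NaN never compares equal to itself and infinities are not valid JSON.

```python
import json

from hypothesis import given, strategies as st

# Assumed model of JSON-compatible data: scalars, then nested lists and dicts.
json_values = st.recursive(
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=10,
)


@given(json_values)
def test_json_round_trip(value):
    # Encoding and then decoding should give back an equal value.
    assert json.loads(json.dumps(value)) == value
```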

42:51 Yeah, absolutely.

42:52 So sort of the absolute best thing to test with property-based testing in general is these two things should always give the same answer.

43:00 Because it covers such a wide range of behaviors and gives you so many opportunities to get things wrong.

43:11 And particularly for the sort of the optimized and naive version of an algorithm is great because often they're very different styles of algorithm.

43:19 So what you're essentially testing for is have I made the same mistake on both of these things?

43:25 And usually you'll make different mistakes.

43:29 And so a test failure is either a bug in your optimized one or your naive one.
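
A sketch of the "naive version checks the optimized version" pattern. Both de-duplication functions are hypothetical examples written for this illustration, one quadratic and obviously correct, one using a set for speed.

```python
from hypothesis import given, strategies as st


def dedupe_naive(items):
    """Slow but simple: keep the first occurrence of each item (O(n^2))."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def dedupe_fast(items):
    """Faster version: track seen items in a set (O(n) expected)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


@given(st.lists(st.integers()))
def test_fast_matches_naive(items):
    # A failure here means one of the two implementations is wrong.
    assert dedupe_fast(items) == dedupe_naive(items)
```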

43:32 Right.

43:33 It's almost like double-entry accounting.

43:34 It doesn't necessarily mean your new one is wrong, but something needs to be looked at and it's who knows.

43:40 Yeah.

43:40 Yep.

43:41 One of the things that I've been trying to do with the new-ish hypothesis website, hypothesis.works, is gather a family of these different properties.

43:51 Because relatively few of them have been written down.

43:56 And there are blog posts scattered across the internet with some of them.

43:59 And there are a few really good prior articles.

44:01 But a lot of them are either folklore, which hasn't been written down, or start with this long diatribe about category theory.

44:11 Yeah.

44:12 And I'm not against category theory, but I don't really use it myself.

44:18 And I think it tends to scare people off.

44:20 So most of the time I'm just starting with, here's a concrete problem.

44:25 Let's solve it in Python.

44:26 Here's how you test it with hypothesis.

44:28 Oh, there is one pattern that I've noticed recently, which is either original to me or is an independent reinvention that no one else has written down before.

44:38 But I really like this as a style of testing, which is rather than these two things should give the same thing.

44:46 It's if I change this data, then this should move in this direction.

44:49 Okay.

44:50 So you generate some data, you run a function on it, and you then change the data in some way, and you run it again.

45:01 And the change in the output should in some way be reflective of the change in the input.

45:09 I originally came up with this for sort of optimization problems, where you run the optimizer and you make some changes, which should make the problem harder.

45:19 And you assert that the score of the output doesn't get better.

45:23 Or you make the problem easier and you assert that the score of the output doesn't get worse.

45:27 But I also had a nice example recently with binary search, which is if you run a binary search, then you insert an extra copy of the value at the point that you search to.

45:39 Then this shouldn't change the output of the binary search, because it's only sort of shifted stuff to the right.
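
A sketch of the binary search property just described, using the standard library's `bisect.bisect_left` as the search under test (an assumption; the conversation does not name a specific implementation).

```python
import bisect

from hypothesis import given, strategies as st

sorted_lists = st.lists(st.integers()).map(sorted)


@given(sorted_lists, st.integers())
def test_inserting_at_found_index_keeps_result(xs, value):
    index = bisect.bisect_left(xs, value)
    # Insert an extra copy of the value at the position the search returned.
    bigger = xs[:index] + [value] + xs[index:]
    # Everything else only shifted right, so the search should land on the same index.
    assert bisect.bisect_left(bigger, value) == index
```

The test never states what the right index is; it only checks how the answer responds to a change in the input.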

45:48 Right. Exactly.

45:49 And so sort of in general, looking for things where functions should move in predictable ways and end up moving in ways that you didn't expect.

46:00 Okay. Interesting.

46:01 Like I increased the tax rate in my e-commerce system and the price went down.

46:05 What happened?

46:05 Or the price didn't change, for example.

46:08 Oh, whoops.

46:08 We're not actually including the tax.

46:09 Who knows?
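
A sketch of the tax example as a "change the input, check the direction of the output" test. The `total_cents` pricing function, with integer cents and a rate in basis points, is a hypothetical stand-in for an e-commerce calculation.

```python
from hypothesis import given, strategies as st


def total_cents(net_cents: int, tax_basis_points: int) -> int:
    """Hypothetical price calculation: net price plus tax, rounded down."""
    return net_cents + net_cents * tax_basis_points // 10_000


@given(
    net=st.integers(min_value=0, max_value=10**9),
    rate=st.integers(min_value=0, max_value=10_000),
    bump=st.integers(min_value=0, max_value=10_000),
)
def test_raising_tax_never_lowers_total(net, rate, bump):
    # Raising the tax rate must not make the total go down.
    assert total_cents(net, rate + bump) >= total_cents(net, rate)
```

A stricter variant could also require the total to go up for a large enough price and rate change, which would catch the case where tax is not being applied at all.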

46:10 Yep.

46:11 Exactly.

46:11 Yeah.

46:12 Very interesting.

46:13 So I think property-based testing and hypothesis is really exciting.

46:19 I think it, in a couple of ways, I think it means that testing is more effective.

46:25 And I think that it's less work to write those tests.

46:28 So that's like a perfect combination.

46:30 Have you done anything like, just as a proof of concept, like grab some popular open source project that has good test support and like convert its test to hypothesis tests and found new bugs or anything like that?

46:42 I haven't personally.

46:43 One of the problems here is that, as like with any testing, it's very hard to test a project you don't understand.

46:50 So I generally don't go into other people's projects and try and write tests for them.

46:58 I've done it once or twice.

47:00 A customer paid me to do some testing work on Mercurial and add some tests to that, which was interesting.

47:06 Did you find any bugs?

47:07 Yeah.

47:07 So I found a bunch of bugs, actually.

47:09 Wow.

47:09 None of them particularly critical.

47:12 But some of the, so the encoding one came up.

47:16 Mercurial has a bunch of internal encoding representations.

47:20 There is, for some reason, Mercurial has three different JSON encoders.

47:24 And we found bugs in one of them.

47:27 And there is some stuff where Mercurial wants to represent things as UTF-8B, which is a way of taking arbitrary binary data and turning it into valid UTF-8 encoded text and backwards.

47:37 And that sort of hit, that had a bunch of bugs.

47:40 I don't think that's used very widely, but it still had a bunch of bugs.

47:44 Yeah.

47:45 And then we used the stateful testing for sort of validating repository operations and found two interesting bugs in the hg shelve extension. hg shelve is basically git stash for Mercurial.

48:00 I'm sure someone will be mad at me for saying that, but that's basically what it is.

48:05 And I can't remember exactly, but I think one of them was that the set of valid shelf names was narrower than the set of valid branch names.

48:13 And so you could create a branch which had a name that wasn't a valid shelf.

48:20 And then when you try to shelve stuff, it would default to using the branch name for the shelf name and everything would go wrong.

48:27 And the other was, it was something like if you create a file, then immediately delete it.

48:34 And without committing the delete and then try to shelve stuff.

48:39 No, sorry.

48:41 You delete the file, then you create a new file with the same name, which is untracked.

48:47 And then you try and shelve stuff.

48:48 And the shelve extension gets itself into a complete tangle.

48:51 Wow.

48:52 It seems really interesting.

48:53 I'm sure lots of those types of tests were already there, but just those particular cases weren't found.

48:58 There was an example, there was a talk at PyCon this year by Matt Bachmann.

49:04 And he talked about how property-based testing on his project actually uncovered a bug in dateutil.

49:10 Because his project was depending on dateutil and, you know, your system sent some kind of insane date time, like, you know, before BC or something like this.

49:23 And it freaked it out.

49:24 Yeah.

49:24 So, which they then fixed in dateutil, which is really cool.

49:28 Mm-hmm.

49:28 Yeah.

49:30 The, I've occasionally thought about making the hypothesis dates a little more restricted by default.

49:37 Because, by and large, no one really cares about the representation of 9000 BC or 9000 AD.

49:45 Because they were, they'll be using a different calendar or they were using a different calendar then anyway.

49:49 But it does come up with these fun bugs.

49:51 So, I sort of, I've left it in for now.

49:53 Okay.

49:53 Yeah.

49:53 Cool.

49:53 I think, I vaguely recall the bug in question.

49:57 I think that one might have actually been slightly more reasonable dates.

50:01 Like, I think it was first century AD that went wrong or something.

50:04 I vaguely recall there being one where if you try and represent the year 99, then it assumed

50:11 that you meant 1999 or something like that.
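
A small sketch related to the date discussion. The first test lets Hypothesis pick any date in the range `st.dates()` supports, including early years such as 99, and checks an ISO round trip with the standard library. The second shows how a test could opt into a narrower range; the 1900 to 2100 bounds are an arbitrary assumption.

```python
from datetime import date

from hypothesis import given, strategies as st

# Assumption: a project that only cares about "modern" dates.
modern_dates = st.dates(min_value=date(1900, 1, 1), max_value=date(2100, 12, 31))


@given(st.dates())
def test_iso_round_trip_for_any_year(d):
    # Early years must survive formatting and parsing unchanged,
    # for example year 99 must not come back as 1999.
    assert date.fromisoformat(d.isoformat()) == d


@given(modern_dates)
def test_modern_dates_stay_in_range(d):
    assert 1900 <= d.year <= 2100
```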

50:13 Yeah.

50:14 It's cool.

50:14 That's what I kind of asked you about the open source projects.

50:17 because I think, you know, it would be fun to take, fun in quotes, to see the results of somebody taking all the tests for, say, like the top 100 most popular PyPI packages.

50:29 Go and look at their test suite and convert their example-based testing to property-based testing and just see what that spits out.

50:37 Yeah.

50:37 In general, what I would recommend in that space is more, if you're working on a popular Python package and you want to give that a try, pop into IRC or send me an email and I'd be very happy to help you out.

50:48 Okay.

50:48 Because I think, I really do think that what you need for doing that is more experience in the project than it is experience with hypothesis.

50:58 Of course.

50:59 Understanding the domain of what you're trying to actually test, yeah.

51:02 Okay.

51:03 Cool.

51:03 Well, we know quite a bit about property-based testing now, but maybe tell me a little bit about yourself.

51:08 What else do you do in the programming space for work and things like this?

51:13 So this is more or less my day job.

51:15 I haven't been working too actively on Hypothesis in the last couple of months because I've been sort of researching some related things.

51:21 But basically, I do independent R&D on testing tools and I do consulting and training around that, either helping people to use Hypothesis or helping people to improve their testing in other ways.

51:33 Historically, I've done more sort of back-end data engineering.

51:39 But once I sort of got into hypothesis properly and found that I really liked working on this sort of thing and there was demand for it, that's mostly what I've been doing.

51:47 Yeah.

51:47 I can see a huge demand of companies that have large projects that have tests, but not this kind of test.

51:53 They're like, you know, it might be really useful to spend two weeks with you just giving another go with a different system, right?

52:01 With hypothesis.

52:02 Yeah.

52:03 Well, I wish there were a huge demand.

52:04 There's huge demand for hypothesis.

52:06 I've currently got the thing that I think most new businesses have in their first year where the best way to get new customers is to have existing customers.

52:15 So it turns out that sales and marketing are hard.

52:20 So right now, I would say that I'm experiencing demand, but not huge.

52:25 Right.

52:25 Okay.

52:25 It may take a few years to get there.

52:29 Yeah, but it's a cool project and I definitely can see it growing in the future because it's solving a real problem and it solves it in a better way than what we're doing today.

52:38 Well, I mean, hypothesis itself is getting plenty of demand.

52:41 I think PyPI stats are broken right now, but certainly when they were last working, it was getting quite a respectable number of downloads compared to projects that I'd really thought of as being much more popular than they were or much more popular than hypothesis.

52:56 That's awesome.

52:58 Well, yeah, congratulations on it.

52:59 That's cool.

52:59 Thank you.

53:00 All right.

53:00 We're getting near the end of the show.

53:01 I have two questions I always ask my guests.

53:04 First of all, if you're going to write some code, what editor, specifically Python, but in general as well, what editor do you open up?

53:11 Basically, Vim.

53:12 I've been experimenting with using Windows as my primary operating system recently, and I was trying PyCharm with Vim mode, but I've ended up mostly just going back to Vim even on Windows.

53:24 Right.

53:24 Okay, cool.

53:25 And of all the PyPI packages out there, you know, what one, maybe there's, you know, there's over 80,000.

53:31 There's a bunch that we all have exposure to that aren't necessarily mainstream, but you're like, wow, this is really a gem that I found that people didn't know about.

53:39 In addition to hypothesis, so in addition to pip install hypothesis, which is cool, what else, what would you recommend?

53:45 Most of my, I don't really have any niche packages I can recommend in that regard.

53:52 I really like pytest, and coverage is an exceptionally good piece of work.

53:57 Both of these are quite well known and unsurprising opinions.

54:04 There's actually a package by Matt Bachmann, I think, called diff-cover that I keep meaning to try. I haven't used it myself, but it looks really good for putting in your CI.

54:17 So when you want, say, 100% coverage, it's very hard to get to 100% coverage in a single leap.

54:25 So what you do is you set ratcheting and just say, I don't care what the coverage currently is, but you can never make it worse.

54:33 And so diff-cover is just a nice little tool that is designed to help you configure that ratcheting on coverage or on pep8 checks or things like that.
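
A generic sketch of the ratcheting rule itself: coverage may go up or stay the same, but a build that lowers it fails. This is not how diff-cover is implemented or configured; `check_ratchet` and its percentages are hypothetical.

```python
def check_ratchet(previous_percent: float, current_percent: float) -> float:
    """Return the new baseline, or raise if coverage dropped below the old one."""
    if current_percent < previous_percent:
        raise ValueError(
            f"coverage dropped from {previous_percent}% to {current_percent}%"
        )
    # The baseline only ever moves upwards.
    return current_percent
```

In a CI job, the returned baseline would be stored and passed in as `previous_percent` on the next run.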

54:46 Nice.

54:47 So you can basically enforce the rule, like coverage always needs to get better as our project grows.

54:52 Or at least never gets worse.

54:53 Right.

54:54 Okay.

54:55 But I don't currently use that because I already have 100% coverage and flake8 cleanness on all my main projects.

55:01 But if I were coming into someone's existing project, which was large and had a slightly less good state, I would absolutely recommend checking something like this out.

55:11 Oh, that's excellent.

55:11 Thanks for the recommendation.

55:12 All right.

55:13 Any final call to actions?

55:14 How do people get started with Hypothesis?

55:16 So getting started with Hypothesis, what I would really recommend is just checking out the hypothesis.works website and sort of getting a feel for it, trying it out a bit.

55:27 And then if you are a company who wants to improve your testing with Hypothesis, I would very strongly recommend hiring me to come in and either do a consult or run a training workshop.

55:39 One of the workshops I run is basically an exploratory thing where you stick me in a room with 10 devs for a day and we figure out how to write a whole bunch of new tests for your software.

55:51 And we usually find some interesting bugs, particularly in the Unicode handling.

55:57 Well, most of the time we've done it, but also in other weird edge cases that you wouldn't necessarily have thought to test.

56:02 Yeah, that's awesome.

56:03 I'm sure you do find some weird ones.

56:05 That's great.

56:06 All right.

56:06 Well, David, thanks so much for being on the show.

56:08 It's been great to talk to you.

56:08 Thank you very much, Michael.

56:09 It's been a pleasure.

56:10 Yeah, bye.

56:10 Bye.

56:11 This has been another episode of Talk Python to Me.

56:15 Today's guest was David MacIver, and this episode has been sponsored by Hired and SnapCI.

56:20 Thank them both for supporting the show.

56:22 Hired wants to help you find your next big thing.

56:24 Visit hired.com/talkpythontome to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000.

56:34 SnapCI is modern, continuous integration and delivery.

56:37 Build, test, and deploy your code directly from GitHub, all in your browser with debugging, Docker, and parallels included.

56:43 Try them for free at snap.ci/talkpython.

56:46 Are you or a colleague trying to learn Python?

56:49 Have you tried books and videos that just left you bored by covering topics point by point?

56:53 Well, check out my online course, Python Jumpstart, by building 10 apps at talkpython.fm/course to experience a more engaging way to learn Python.

57:02 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/Pythonic.

57:10 You can find the links from this episode at talkpython.fm/episodes/show/67.

57:16 Be sure to subscribe to the show.

57:18 Open your favorite podcatcher and search for Python.

57:20 We should be right at the top.

57:21 You can also find the iTunes feed at /itunes.

57:24 Google Play feed at /play.

57:27 And direct RSS feed at /rss on talkpython.fm.

57:30 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

57:35 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.

57:42 You can browse his tracks he has for sale on iTunes and listen to the full-length version of the theme song.

57:47 This is your host, Michael Kennedy.

57:50 Thanks so much for listening.

57:51 I really appreciate it.

57:52 Smix, let's get out of here.

57:55 Stating with my voice, there's no norm that I can feel within.

57:58 Haven't been sleeping, I've been using lots of rest.

58:01 I'll pass the mic back to who rocked it best.
