00:00 Michael Kennedy: How do you build reliable software with fewer bugs? Yes, unit testing is part of that. But did you know that code reviews often play a key role in this process, and come with many more benefits on top of just bug detection. This week, you'll meet Dougal Matthews. It's Talk Python to Me, Episode 102, recorded February 28, 2017. Welcome to Talk Python to Me, a weekly Podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I am @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Rollbar and Hired. Thank them both for supporting the show, check them out at @rollbar and at @hired_hq on Twitter, and tell them thank you. Dougal, welcome to Talk Python.
01:18 Dougal Matthews: Hi, thanks, it's an honor to be here.
01:19 Michael Kennedy: Yeah, it's great to have you here. I'm really excited to a bring slightly different view of technology here. Often, we talk about how you execute this code with the API you call. But let's take a step a little bit back from just writing code and talk about how we work as a team, how we contribute to open source, contribute to our company's code, and things like that, through this code review, whole conversation that you have here.
01:43 Dougal Matthews: Yeah, I spend so much time doing code review, just as part of my daily job. It seems like such an important thing to actually talk about and see how we can improve.
01:51 Michael Kennedy: Yeah, I feel like people don't talk about it very much. That's why I was drawn to a talk you gave at EuroPython that we're going to talk about. You know, it's great that you're talking about this here, because it can actually either be very beneficial to bring in people, to level them up as developers, or it can be confrontational and actually upsetting, right? So, I think having this as a thing you study and a thing that you do meaningfully, not just, hey, I have to review your code, somebody said so, seems like a good idea.
02:19 Dougal Matthews: Absolutely. At the EuroPython talk, I actually did a quick sort of raise-of-hands to see what people thought about code reviews, and I think it was only about 20% of the room said they looked forward to or enjoyed the code review process. And given how much time people spend doing code review, which was several hours a day, if not more in some cases, then that's just something we need to do better at.
02:41 Michael Kennedy: Yeah, we definitely do. Before we get to that, though, let's start with your story, how did you get into programming?
02:46 Dougal Matthews: I got a fairly common story, I think. I started messing around with computers when I was young. The first thing I'd done, really, was learning HTML, and, I guess, CSS later on, because I first encountered it in, sort of, websites, and I looked at the View Source; and just with that, it kind of lead me down this rabbit hole. My dad actually done something really great when I was a teenager, he embraced it. He was running his own company at the time, and not many people actually had websites for their small businesses, so,
03:14 Michael Kennedy: What year was this, do you remember?
03:16 Dougal Matthews: It must have been around 2000.
03:18 Michael Kennedy: Okay, yeah.
03:19 Dougal Matthews: Maybe a couple of years before.
03:20 Michael Kennedy: Yeah, the web was certainly well-established, like, the whole .com boom was going crazy around that time; but it was still not a thing that, if you had a business, you had a website, right? Like, that was not a one-to-one mapping. It was, if you were doing internet stuff, or you were a big company, like Coke, or Chevrolet, or something, you would of course have one, right?
03:40 Dougal Matthews: Yeah, absolutely. I think at that point, it was really out of reach for your regular small companies. So my dad just basically got me to put together a simple website for him. And honestly, it was fairly terrible, but it was my first kind of step into that world, in that direction, and I just loved the idea of web development. And I just
03:58 Michael Kennedy: Yeah. I think it's interesting that the whole View Source thing brought so many people into programming, right? It used to be we would go to a store, we'd by a box, it would have compiled machine instructions on a disk; we'd put those in our computers and run them, and then the web came along and it did something interesting: you could pretty much just look and study it, right?
04:19 Dougal Matthews: Yeah, and that actually reminds me of one time, I was just starting and I was trying to figure out why something wasn't working. And actually, I didn't have access to the computer all the time, so what I had done was I printed about 20 pages of HTML and just sat down and studied it. It wasn't a very good approach, but,
04:35 Michael Kennedy: I've tried that as well, it is extremely hard to get anything out of it; I did it with C++. Awesome, okay, so you created the site, what language was that in, what technology?
04:43 Dougal Matthews: It was purely static, so it was just HTML, I'm not even sure if I was using CSS at the time; it might have been a bit early in my learning at that point. But fairly quickly, I moved on to doing PHP. I think it just because you seen it everywhere, all the websites ending with .php. Like, I guess Facebook, later on, then they were everywhere at time. I never really felt quite at home with PHP, it wasn't really the language for me. I'm not really quite sure what it was, I mean, it was just a large number of things, it always felt a bit strange, and I was trying to do more, like, I was trying to do more object-oriented and think about my code more from an engineering point of view. And eventually, I stumbled on a video by Jacob Kaplan-Moss, when he was introducing Django; it was very early on, it was before Django 1.0 and I just picked it up and tried it out, I just loved it, it was great. And that just kind of led me down that path to learning Python. I remember writing a blog post years ago, that Django is just Python because it's really, it was Python that I really liked. I mean I do enjoy Django, but it was Python and the whole ecosystem and everything beyond that, and it just kept on going.
05:48 Michael Kennedy: Yeah, I hear that, a story sort of like that over and over, in that, the guys around Django were very just enthusiastic, right? And their enthusiasm wore off on anybody who came in touch with them, and Django just really grew, partly because it's really great, and partly because it's Python, but partly, just the enthusiasm of getting started with this new world, right?
06:09 Dougal Matthews: Yeah, absolutely. And I had gone from some hacking on WordPress sites and various PHP frameworks to this, and suddenly, the elegance of Django and Python, just defining models and having the admin, and all that just there, was just amazing, mind-blowing, really.
06:23 Michael Kennedy: You mean there is objects and architecture and factoring, oh, my goodness. Yeah, yeah, very cool, so you said you have some strong ties to EuroPython; what's the story there, are you working with the conference somehow?
06:34 Dougal Matthews: The reason I wanted to mention EuroPython, I mean, there's actually several reasons, but when I started with Django, I just thought I need to get involved with this, but there's nobody local that I know, I can't find anyone that's doing anything with Python or Django. So, as I approached June at the time, I found all the money I had, and got myself to EuroPython in 2009, and that's the first EuroPython that I went to, and I've been every year since. And I also went to the first, and actually, the only EuroDjango con, because after that, it got changed to Django Con EU. Yeah, and it's just, I've met so many people there, and slowly you start to do some small things to help out with EuroPython. So I'm on the program committee this year, and I'll actually take the moment to point out that the EuroPython dates for this year are the 9th to 16th of July; these are tentative, but I expect they will be finalized by time this is released.
07:24 Michael Kennedy: Oh, yeah, that's really great, thanks. So, if you're anywhere near Europe, or you want to go to Europe, right, check it out, right? What's the URL?
07:31 Dougal Matthews: I think it's just,
07:32 Michael Kennedy: It's, like, EuroPython,
07:34 Dougal Matthews: EuroPython.eu is the website. And this year it's gonna be in Romani in Italy.
07:39 Michael Kennedy: Oh, that sounds awesome. Okay, excellent, well, there is a possibility I'll make it there, but it's very far away, so we'll see. But that's awesome, if people can make it, they should go. I think that Python conferences are something really, really special. Like, there is a different way that people interact with each other there than a lot of typical tech conferences. So, if you can get there, check it out.
08:02 Dougal Matthews: Yeah, I agree, there's a long-running phrase of come for the language, and then stay for the community, and I think that really resonates with me.
08:08 Michael Kennedy: Yeah, absolutely. So, you're doing a lot with EuroPython, a lot with Django, what do you do for your job today?
08:15 Dougal Matthews: So, I am currently working on OpenStack. I am employed by Reta and I work on the OpenStack project full time. So, I am focusing primarily on making OpenStack easier to use and install. And I'm also working on a work-flow service within OpenStack, which is a quite an interesting project.
08:32 Michael Kennedy: I think that's really good. You know, OpenStack is really a great example of Python doing major stuff, and it's a great contribution; but it is a little bit hard to install, especially if I just want to run it natively on my Mac or something, it's challenging, right?
08:46 Dougal Matthews: Oh, absolutely, I mean it's, I think the problem with OpenStack, or one of the problems with OpenStack is there is such a wide range of different projects, there's no, no two OpenStack installations are actually the same: some people care about containers, or some people care about VMs, some people care about storage, that kind of thing, and they're all very different.
09:06 Michael Kennedy: Yeah, I mean you're basically installing AWS in a sense, right?
09:10 Dougal Matthews: Yeah, exactly.
09:10 Michael Kennedy: Which is quite the thing. And by the way, S3 is entirely down, and the internet is in chaos today, so that's quite something. I'm really glad our recording is working, that's great. Alright, so let's talk about code reviews. So, first of all, let's start with who should be doing code reviews, like, are these big enterprises, is this open source, like, who should care about this?
09:35 Dougal Matthews: I think everybody should be doing code review. Really, the only reason I can think of to not do code review, and that would be that you are the sole developer on a project, and obviously, then, it's kind of tricky to have a peer review. Most of the time, I am involved in open source projects and that kind of environment, but I think code review is something that should be in the enterprise in all companies as well. And I think, actually, it's more common in open source, just because of the nature of how people communicate and collaborate.
10:03 Michael Kennedy: Yeah, okay, so I tend to agree with you, I mean, I guess, if you're on a small team, you work at, like, a small company, and you have one project, and maybe there are some other developers, but they have their other project, and you don't really cross over. It might be a little bit tough to do code review, especially if one guy is a C++ person, and another person is a Python person or something, where they don't know, deeply, what they're looking at.
10:24 Dougal Matthews: That's a fair comment, but the, you're missing out on an opportunity when you're looking at situations like that. Perhaps the Python person becomes ill or leaves the company, if nobody else knows what they've been doing, then you're in trouble. At least if they had a light code review process, it might not need to be as in depth as in some other situations, they will have some idea of what's going on.
10:43 Michael Kennedy: Yeah, I think that's a really good point. You also actually pointed out some scenarios where it might make sense, even as a solo developer, to do your own code reviews.
10:52 Dougal Matthews: Yeah, so, this is kind of an interesting thing which I just stumbled on by mistake. I do some work one of the documentation projects used in Python, and it's called MK Docs. And it's a fairly popular project, there quite a lot of people watch the repo. And I was just hacking away on it, and I opened a pull request onto the project, just because I wanted to trigger the Travis builds, and make sure things passed and were working before I merged it, just avoiding myself making any errors. But in the process of doing that, I left the pull request open for maybe six hours or a 1/2 day, and somebody came along and actually made a comment on it, and made a suggestion. So it's just, by including that process in something I am working on, even if I am planning to merge it myself anyway, it gives other people an opportunity to get involved. And it actually helped me find new contributors as well.
11:40 Michael Kennedy: Oh, yeah, that's really cool. So, especially if you have open source, but really, if you have a repo that other people are watching, going through the whole workflow, even if it's just a cursory sort of review, lets them know there is this milestone, like, a feature is fixed and here is the code that fixes that feature, right?
11:57 Dougal Matthews: Yeah, and also if you have your CI integrated and that, it will avoid you making any silly mistakes, where you've accidentally missed a file or added a file, because run will be, sorry, your test will run against that pull request as well.
12:10 Michael Kennedy: Yeah, yeah, absolutely. So do you follow or subscribe to the GitFlow style, with, like, feature branches and stuff and PRs, or like, how do you guys work together at OpenStack?
12:22 Dougal Matthews: We basically do master development; every patch goes into master, and then we post stable branches for releases that are still receiving updates. It's a much lighter work flow than something like GitFlow. I find that something like GitFlow might work well in a company where you can force everyone to follow the same conventions easily; but in an open source project, I don't think long-lasting branches tend to work very well, because they just get out of sync and there is different problems going on, or the ... about. So I prefer everything just to go into master as soon as possible, but keeping master as stable as possible as well.
12:57 Michael Kennedy: Yeah, I think that's a real interesting point, and I've worked both ways; I worked in a company where we basically agreed to do GitFlow stuff, so every change has it's own branch, like, a feature branch or a bug-fix branch; and then you do a pull request back to the dev or main or whatever, to have someone else review it on the way back in. But I've also worked, basically we've got a dev branch and a releases branch, and we just push, you know, merge over to the releases every now and then. I don't know, I'm torn; like, they both have their advantages. One of the things I see about the second one, the latter one, like, what you described you're doing is, the closer you are to dev on your branch or what you're doing, the less merge pain you're gonna have. Like, a friend of mine describes, sort of, being branched as credit card debt: a little bit of credit card debt is fine; if you let I build up for six months, there is gonna be a reckoning when you try to fix it, right?
13:51 Dougal Matthews: Yeah, absolutely, and I think your people are using longer-run branch, and long as you're re-basing, then maybe you're avoiding most of the problems.
13:58 Michael Kennedy: That's right, so, if you take the dev branch and merge it into your feature branch every now and then, basically to keep that time, the work that's been done from the time you're working on your code, until what you're gonna check in minimum, that's great. Okay, so, in my mind, I guess code reviews are obviously about finding bugs, or making sure that you follow certain procedures for, say, security reasons. But you actually said that there is a lot to do with, like, code quality, and mentoring, and a lot of things, so, like, what are some of the benefits?
14:33 Dougal Matthews: Yeah, so this is actually quite an interesting point. There is a paper which is called Expectations, Outcomes, and Challenges of Modern Code Review. This is one of the first academic papers I read about code review, and it's kind of what lead me down this rabbit hole. At least I hope I am remembering the right paper; it might, this data might be from another one.
14:50 Michael Kennedy: We'll put all the papers, the links to the papers, so that people can, like, find the references.
14:54 Dougal Matthews: Somebody had done some research with Microsoft, and what they done is they done a questionnaire for the developers before and after the code review process, and they were just finding out why they were doing code review, what they were expecting to get out of code review, and the motivations. And while most of the developers actually stated their primary motivation was to find bugs and find problems, when they took the outcomes of code review, so they looked at the comments and all the different feedback, that was in the code review, then they manually classified it based on all the different things that were coming out of it, they actually found out the bug finding was much smaller, while bug finding was still important, it was a much smaller piece of the overall code review process, and there were a lot of other benefits that you get out of it as well.
15:38 Michael Kennedy: I see, yeah, they probably start out the same sort of mental model, like, okay, how well did we do finding bugs? And, like, oh, wait, all these other good things came out of it, wow.
15:46 Dougal Matthews: Yeah, yeah, and I think this is part of why a lot of people don't enjoy code reviews, because if you're going in just to find bugs, you're going in with a negative stance, you're expecting what can I find that's wrong with this, what can I find that's broken? It's not, it's a slightly different process, but what they actually found was the biggest outcome was code improvement. So this is kind of subtle to a point, because the code that was up for review was valid and it worked. But actually, based on the feedback and the duration it went on, the code review, they ended up with a better solution in the end. And then there is understanding and sharing knowledge between the teams, so it's about having awareness of what other people are doing on different projects or different areas of the code base.
16:25 Michael Kennedy: Okay. Interesting, yeah, so, you had some interesting quotes, like one from Jeff Atwood, the creator of Stack Overflow about the value. And there is also one from Steve McConnell, right?
16:37 Dougal Matthews: So, yeah, ... who a lot of people will know, he wrote a blog about code review a while ago, and he basically said that the hardest thing about doing code review is finding somebody that you trust enough to do the code review with. But once you actually get into the process, you'll find that every minute spent in code review is paid back tenfold, which is really a huge payback for your time investment.
16:56 Michael Kennedy: Yeah, that sounds like a good investment.
16:58 Dougal Matthews: And then, the one from Steve McConnell, so he was basically, in his book, Code Complete, he compared the detection rate, so how often you are finding bugs in your reviews, and in their study, they found out that unit tests has a 25% chance of catching bugs, function tests had a 35% chance, and integration tests, 45% chance, but then code reviews were actually between 55 and 60%. So I'm not sure that I completely buy those numbers, but, I mean, even seeing that code review is just as valid as you unit test is quite something. I mean, I know I couldn't live without my unit tests.
17:33 Michael Kennedy: Yeah, it certainly is interesting. And that book, Code Complete, by Steve McConnell is really, it's quite a book; it really has a lot of wisdom, a lot of things, like, backed up by studies and numbers, and so if you haven't seen that book, and you're kind of into this process stuff a little bit, definitely check it out. So, the fact that you get better bug coverage than unit testing, that's pretty interesting. Also, how do you feel, like, unit testing contributes to mentoring and better understanding, and of team understanding, versus, say, like, code review; I feel like you would get more of that out of a code review, right, because you're actually talking, not just writing tests.
18:11 Dougal Matthews: Yeah, absolutely, so, one of the things I covered in my talk was the issues with sort of the quotes I gave, they are very focused on bug detection and detection rate. And, as you said, there is a lot more to code review that I could cover as well. I think the reason that paper covered bugs is, bugs are sort of an absolute thing. Something is either a bug or it isn't, so it's very easy to measure, you can see how many bugs that you reported. Whereas all of the other benefits, they tend to be a lot softer and a lot harder to actually get a tangible measurement of.
18:40 Michael Kennedy: Yeah, like that example of what if the Python guy gets hit by a bus, right? Your company is more anti-fragile if you do these kinds of things, right, it can withstand these problems, whereas, I guess it's obvious, like, the user can't log in, and you get a 500 error on the server. Okay, well, that's a bug, obviously, right? But you said, we could have saved this with a test, but also, our team works better together, our team has team has a better understanding of what each other is doing and the trade-offs they're making, and so on. So that's, you can't really test for that.
19:12 Dougal Matthews: No, not at all.
19:28 Michael Kennedy: This portion of Talk Python to Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors, eh. Relying on users to report errors, digging through log files trying to debug issues, or a million alerts just flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you'll get the contexts, insight, and control that you need to find and fix bugs faster. It's easy to install. You can start tracking production errors and deployment in eight minutes or even less. Rollbar works with all the major languages and frameworks, including the Python ones, such as Django, Flask, Pyramid, as well as Ruby, Javascript, Node, iOS, and Android. You can integrate Rollbar into you existing workflow, send error alerts to Slack or Hip Chat, or even automatically create issue in Jira, Pivotal Tracker, and a whole bunch more. Rollbar has put to get a special offer for Talk Python to Me listeners. Visit rollbar.com/Talk PythontoMe, sign up and get the bootstrap plan free for 90 days. That's 300,000 errors tracked, all for free. But, hey, just between you and me, I really hope you don't encounter that many errors. Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacarts, Zendesk, Twitch, and more. Give Rollbar a try today. Go to rollbar.com/Talk PythontoMe. Yeah, so how do you feel like automation fits into this? So, like, continuous integration and whatnot.
20:54 Dougal Matthews: So, I really strongly believe that you should automate as much as possible. Anything that can be automated and isn't is essentially wasting your time, and for most people, their time is their most precious thing. So this means that you should be automating you linting, so this is, like, your ... checking, or your Flake8.
21:13 Michael Kennedy: Yeah, you had an interesting quote where you said something to the effect of this whole thing often gets set up as an adversarial sort of relationship, like, okay, these guys sitting on the side of that table, you sit here, and we're gonna review and tell you what you did wrong and how you need to fix it. And one of the things that code reviews are supposed to do is say, like, well, you're not following this coding convention, or you didn't do this thing, you didn't document this or name this correctly, or whatever, right? And if that comes from a human, it seems super nitpicky, but if it comes from a machine, everybody is sort of subjected to the thing absolutely, uniformly, so you're like, oh, yeah, Flake8 told me that should have an underscore, and not camel case, or whatever, right?
21:57 Dougal Matthews: Yeah, absolutely. The feedback from an automated machine, so, in terms of, also including, verifying the tests are running, it doesn't feel negative, it doesn't feel like the code is judging you. But sometimes, if somebody comes along and reviews something that you have been spending a lot of time on that, but then they just point out different small little things that they thing would be slightly nicer formatting, then that can be infuriating.
22:19 Michael Kennedy: Yeah, I think it's infuriating, but also, it can be, anyway, but I think the other part of that is that there is, if your job is to be the reviewer, you don't want to spend your time going, yeah, you named that wrong, yeah, that needs a comment, yeah, that needs a doc string. Like, that is not, that is, like, a job that robots could do, and should do, right? You should say, like, no, no, I think this algorithm is, when we scale up, this is not gonna scale, or this is a state-ful thing you're doing, that means we can't load balance it, so we need to do that different. Like, that is the kind of thing people should be contributing to, not spending the energy on, camel case, camel case, doc string, right?
22:56 Dougal Matthews: Yeah, so, focusing your time on the design and the architecture, and the actual solution to the problem, is just so much more productive.
23:03 Michael Kennedy: Yeah, and so, you said you were trying to rename this process to, like, something other that a review, right?
23:08 Dougal Matthews: Yeah, I have, sort of, a bit of an issue with code review. I think review has a slightly negative feeling about it. You think of something like a performance review at work, that's rarely a good thing, if your manager says, okay, we need to have a performance review. I much prefer to think that it's going to be a code collaboration or a code discussion; it's something that you're working on with someone else, and I think you should try to encompass that more. I don't realistically think we'll ever change the name at this point, but that's how I perceive it in my head at least.
23:37 Michael Kennedy: Yeah, yeah, that's cool. I think code collaboration, to me, maybe code conversation, right, something like this, right?
23:43 Dougal Matthews: Yeah, that's nice.
23:44 Michael Kennedy: Yeah, I mean, certainly the open source space, I feel like, these can almost be a conversation starter, or an introduction to somebody maintaining an important project. Like, you can go look at some code for, okay, do some work, and then your pull requests, like, starts the conversation, like, here's how I think I can both contribute to your project and improve it; and then you get to know that person, a little bit, at least, by going through and reviewing their code, back and forth, and well, you didn't think about this in our project, and you're gonna have to account for it, and so on, right?
24:18 Dougal Matthews: Yeah, I think it's real important when you submit a change to a project that you state that you want to have feedback. One of the pull request that I had on MK Docs, for example, somebody said, I just want you to merge this, and I was like, well, you've only just opened this pull request, we're only just starting to talk about it, I never seen this before. So this is like the start of the conversation, and I think both people need to go into it at that point, whereas sometimes somebody has maybe been working for a week on something, and they feel like when they opened the pull request, they kind of feel like they're done, they feel like they're finished, and that's not the case.
24:52 Michael Kennedy: Yeah, I know, that's the beginning, right? Beginning of the conversation. What would you recommend to have made that person's experience better? Should they have opened and issue and said, I am going to do a pull request, here are my intentions, here is the problem I've identified, and what I think I'm going to do. I'm sort of personally torn because, on one hand, actions speak much louder that words. Like, here is the code that I have written that will fix this for you, puts you in a different place than somebody who opens an issue and, sort of, almost complaining, right? So, they come with more credibility, but at the same time, the put a lot of energy into it, and they've gone on a path which is incongruent with what you're doing, right? So what do you think?
25:28 Dougal Matthews: Yeah, so, I think in this case, you just need to judge it yourself, really. There isn't a standard answer. Sometimes what you're working on seems fairly obvious, because it's fairly small and contained, and then you should just go ahead and do the pull request. But then if you feel like maybe there is more discussion, you're not sure about how it should work, then it should go towards an issue first. I do think that sometimes it is actually easier to express what you want to do in correlate, you're just asking around. So I think if you also to go straight to the pull request, but you should always have the expectation that this is a very first draft, and it will change.
26:02 Michael Kennedy: Yeah, I think maybe think of it as like a draft of a paper sort of thing is a pretty interesting way to think about code, right? Okay, so, when your authoring changes, don't start with the code. Well, what should you start with?
26:15 Dougal Matthews: You should either be starting, well, you can start with the code if you want to, but it's always good to start with a discussion first on an issue or some bug report somewhere, where you can actually flesh out what it's gonna be. But now, it isn't always immediately that it's going to be much smaller change in it.
26:29 Michael Kennedy: Yeah, also, understanding the context of where that person is coming from, where the project is, right? Like, on one hand, you might, it would be very easy to look at one algorithm an go, this algorithm does everything in memory, it loads it up, all this data at once, and it's gonna be super slow, we can't do this. Well, that's true if you have a billion records. But if you have 20, it's totally fine, and the billion record solution would actually make it worse, right? So understanding context, I think, I super important.
26:55 Dougal Matthews: Yeah, and when you actually provide a change, it would be useful to tell people, both why you are doing something, sorry, what you are doing and why you are doing it. I quite often see pull messages or pull requests that only really cover one of the two. And I really think you need to understand both, as a reviewer, to really give a thorough review to that code.
27:14 Michael Kennedy: Yeah, for sure, I think, obviously, you have to keep that in mind no matter what. But I feel like, in open source, you don't necessarily know how people are using your code, as opposed to, like, within your company, like, well, it's that website, we know what the traffic is, we know how that's gonna look, right? So, it seems almost more important in the open source, open space world, rather than confined behind your company's wall.
27:36 Dougal Matthews: Yeah, absolutely. Say, for example, you need to add a new method to a function for some reason. If you just add a new pull request added perambulatory to this function, then that's not very useful, you need to also explain why you are doing that, what was the actual advantage to this as well.
27:52 Michael Kennedy: Yeah, and how your not going to break everybody's code by doing it.
27:54 Dougal Matthews: Yep.
27:55 Michael Kennedy: Another thing, another piece of interesting research that you pulled out was something from Mozilla was, people actually find more issues and give better feedback on small changes, small PRs, and so on, rather than large ones, even though the large ones surely have more things to address.
28:12 Dougal Matthews: This happens because reviews are just overwhelmed by a big change, and I actually seen someone tweeted about this. Basically, they decided that, they should just submit the changes, because people do less thorough review, and the do occur more easily.
28:25 Michael Kennedy: Sneaking it by.
28:27 Dougal Matthews: Yeah, that's obviously not the idea here, so if you can just keep your changes really self-contained to one particular thing, one particular issue, and ideally touching less files within the project, although, obviously sometimes you need to touch multiple files, and it just makes it easier for someone to review, to get their mind around it, and really understand what's going on. And I think that's why the smaller changes then get much more valuable review, whereas the bigger changes tend to nitpick at different things that they've maybe not been quite able to wrap their head around everything that's going on in this review.
28:57 Michael Kennedy: Yes, so do you recommend for having one concept per check-in type of thing, rather than, I did all my work for today, so I'm done, so I checked it in, but, like, here is this little change I made, then I reformatted this thing, and then I addressed this performance thing over here, rather that just all in one shot, like, would that get better feedback?
29:17 Dougal Matthews: I'd definitely go for splitting that up into multiple reviews. I actually find that, you mentioned that you reformatted something in that example, and I actually think that that can make it harder to review something, because the diff could be 10 times bigger, just because they also reformatted the code at the same time. To do that was two individual changes, and you can actually see, okay, this is where they actually made the functional change, this one is just laid out, okay, they're both fine. Something like this can depend a bit based the tool they are using. So if you're using a get or pull request, you just have to commits this in your pull request, then that would be fairly easy to review in that case. But something like Gerrit that only allows one patch at a time, it's a bit trickier.
29:56 Michael Kennedy: Yeah, sure. What do you think, what contribution to this do you think Github has made? How important is all this well known, commonly used infrastructure that is connect to Github.com for code review, code collaboration, do you think the tools there are decent?
30:12 Dougal Matthews: I think they're reasonably good. They've actually made some changes over the last, I can't remember when exactly, but I think over the last six months or so, they added a whole bunch of features, and at least some of them seem to be inspired by Gerrit, and maybe other code review tools as well. I think they're improving the process. Originally, I think, I was almost too simple, but actually, for small projects, it seems to work quite well. When you see larger projects on Github, using pull request as their primary code review, they tend to come up with a very complicated system on top of that, with labels, or maybe, I think Facebook has bots on all the different projects that are, like, progressing things on, so, marking them as failed or not. So yeah, I think they've done a great job in making code review accessible to everyone, to at least a degree, but, yeah, I still think there is space for improvement there.
30:59 Michael Kennedy: Yeah, it's interesting, I mean, it's not necessarily true that the code that, say, that you guys are using for OpenStack, or Python is using for CPython, is necessarily the best same set of tools for people with a 1000-line project. Alright, like, there is overhead in these bigger things that the smaller ones would actually suffer from it being there. So it's always a tension, right.
31:23 Dougal Matthews: Yeah, absolutely. So, we use Gerrit for OpenStack, and it's a Python open source project, so that's quite nice. In all honesty, it is very hard to use, and it's also extremely ugly, and it takes a long time to just get used to using it. But once you do get to that point, I find it really good. Whereas, Github, I feel like that's maybe the opposite, whereas it's really easy to use, but you start to hit all the limitations of it fairly quickly.
31:49 Michael Kennedy: Yeah, it think that's totally correct. I mean, even simple stuff like nested issues or to-dos, right? I'm not even sure if Github supports those, but it can be like we want to do this thing, but there is actually five things we have to do to make that happen, right, that could be important. Okay, now, you talked about how, if you were the reviewer, you shouldn't see this as I'm assigning a grade or judging this person's code, but maybe thinking of it as like a shared responsibility with the author. And you had a nice picture of bunnies or guinea pigs or something.
32:22 Dougal Matthews: Yeah, it was guinea pigs, I think, I found that on Flickr. Yeah, I think it's really important when you're reviewing something that you take on ownership of that code as well, because I think, effectively, I think the author of the code and the person who's approved it and merged it are just as responsible for it ultimately. This may not work in practice sometimes, because people that submit the changes, it gets merged, they'll disappear, on an open source project, for example. You should really be trying to share that responsibility, and I think this sort of highlights an issue that you would see in something like git-blame, because you sometimes would use git-blame to look through the history of a project and see who has written what, and that will only show you the author and the sometimes the merger, depending on how you are doing things. It won't show you everyone that's involved in the code review process as well. And I think surfacing all these different people is useful as a sign, I think git-blame is a terrible name for the command in general. But I think just having something which sort of helps you find everyone that knows about this, it's more about finding someone that has the knowledge here, rather someone to blame.
33:24 Michael Kennedy: Yeah, I agree. It's certainly blame, well, sometimes it feels appropriate, when you're like, alright, who, what is this, who wrote this? It's not a real positive thing, right? And you're right, it only captures the final commit, which I could have just been the person who reformatted it because there was a problem, right? It doesn't even necessarily mean they created it.
33:43 Dougal Matthews: Yeah, or moved the code around.
33:44 Michael Kennedy: Yeah, yeah, something like that. You also said that code contributions are like puppies.
33:50 Dougal Matthews: Yeah, so that, the picture of the puppies is actually my dog when she was a only couple of months old. But it's quite a nice quote, I don't know where it came from, so I can't attribute it, frankly, but somebody said that code contributions are like puppies, everyone loves them, but you have to look after them, you've gotta walk them, you've gotta feed them, you've gotta care for them. And this is the same for any code contributions that you receive. You need to make sure that they continue to work, you need to make sure that they are documented, and so on, until you can depricate it, and that, often, is quite hard to do in open source projects.
34:22 Michael Kennedy: Absolutely, well, and, I think the puppy analogy is nice, like, everybody loves them, and they're cool when they're nice and cute, but when they're four years old and it's raining and you're busy, do you really want to walk them, right? Or do you want to leave the party early because the dog needs to go out? It's easy to say, well, I just want to change this behavior a little bit. I just want to add this function, I just want to like add an overload or a default parameter here, and think, how hard can that be? It's, like, two lines of code, I'll check it in. Well, there's the documentation, there's that video you recorded, showing people how to use it, that you have to rerecord. There's this thing you have to redeploy, there's this, like, they can, the smallest changes can expand in, like, large ways, right? So, having somebody who knows that, like, is reviewing, going, you don't really fully appreciate the cascading changes that this small change is gonna have, right?
35:11 Dougal Matthews: Yeah, I think it's something that open source developers in particular should feel lucky they can reject code more, basically on this basis. I think people feel quite often pressured to accept contributions.
35:25 Michael Kennedy: I worked hard on it, and so why don't you want to accept it? It works, it passes the test.
35:28 Dougal Matthews: Yeah, but then they are responsible for looking after that code for forever.
35:34 Michael Kennedy: Yeah, yeah. This portion of Talk Python to Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to 3,500 companies who will work to compete with you. Take it from one of Hired's users who recently got a job and said, "I had my first offer on Thursday. After going live on Monday, I ended up getting eight offers in total. I've worked with recruiters in the past, but they've always been pretty hit-and-miss. I tried LinkedIn but I found Hired to be the best." "I really like knowing the salary up front. Privacy was also a huge seller for me." Sounds awesome, doesn't it? Well, wait until you hear about the signing bonus. Everyone who accepts a job from Hired gets $1,000 signing bonus, and as Talk Python listeners, it gets way sweeter. Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000. Opportunity is knocking. Visit hired.com/talkpythontome and answer the door. You know, I had a recent experience with this. I suggested some change to Pyramid, the web framework, and they're like, great, and if you're willing to be the maintainer of this part of that feature for two years, then we're totally happy to accept that. Is there a simpler way we can solve this problem? So, they subscribe to the puppy philosophy.
37:00 Dougal Matthews: Yeah, well, it's nice that they were willing to give you that opportunity, because actually, it is quite hard to get to the point of trust with people sometimes because a lot of people might promise to support something for two years, but then they'll just disappear shortly after.
37:15 Michael Kennedy: That's right, that's right. The other thing that I thought was pretty insightful was you said that not just senior devs should be reviewing code. It shouldn't be a top-down, so like, I'm above you, and so I look down on your stuff and I tell you whether it can be approved but also juniors should do reviews, maybe even a senor's check-ins, right?
37:34 Dougal Matthews: This is a mistake which I have seen in a couple companies. Not recently, actually, so hopefully it is a trend that is dying out. I've seen cases where only the seniors were allowed to review the code, and the juniors would then be subjected to the review, but they would never get the chance to review themselves. There was actually a study which found that senior developers were actually doing slightly better code reviews, just because of their experience. And they also found that junior developers most quickly improved, and most quickly became on par with the other seniors, by doing code reviews themselves. I think that really highlights back to all the other benefits that come out of code review, all of the shared knowledge and shared understanding, that's how the juniors become seniors.
38:15 Michael Kennedy: Yeah, absolutely, I totally think that's a great idea and a great point. Sure they'll find fewer bugs, potentially, because they don't really know as well to look for them. Maybe their suggestions will be pushed back upon, like, I suggest you rewrite this in a simple way; well, no, no, it won't scale, we have a million users, and so on. But even just having that conversation, making them actively think about it and propose those those changes, whether or not they actually make it through, the value is already being gained from that, right?
38:41 Dougal Matthews: Yeah, absolutely.
38:41 Michael Kennedy: Okay. Very nice. You also said, having a review checklist might help. It seems to me like that could depersonalize it a little bit. Like, look, these are the things we'll check for, so we're goin down the list, and it's not me being a jerk to you, this is a list, it applies to me, too.
38:56 Dougal Matthews: A code review checklist can be very beneficial because it gets everyone really on the same page, and everyone is kind of reviewing to the same standard and the same expectations. And the one thing you need to be careful with is that you don't include anything which could be automated. Because there is a temptation, it's easier to write something on, sort of, a review policy, than it is to automate it. So you just need to make sure that that doesn't happen. But otherwise, I think it's a really good idea for projects to explicitly state what they're expecting. And this can be beneficial for the reviewer and also somebody who is submitting something for review, because they know what they are working towards.
39:28 Michael Kennedy: Yeah, that's cool. Certainly, it seems like you can just build the checklist and go, okay, what of this can we automate? And we're not gonna use this checklist until we've gotten everything that can be automated automated.
39:38 Dougal Matthews: Yeah, those are two good ideas, that can be a really nice way to build up your priorities.
39:41 Michael Kennedy: Yeah, for sure. Like automated linting and things like that we already talked about the bots providing the nitpicky stuff, and so on. You also suggested possibly having multiple reviewers, not a one-on-one situation.
39:53 Dougal Matthews: Yeah, I think a lot of people actually assume that code review is essentially one-to-one. You have one person doing the altering and one person doing the reviewing. But you can get real benefits from having multiple reviewers. So this is something I hadn't done before, working on OpenStack. And in OpenStack, every change has to have at least two core reviewers. And also, then, they'll also have other reviewers, or just community members that are interested. So every change is, I actually looked at the numbers at one point, every change on average has between three and five people reviewing it. So not a huge number, but only two of those were gonna be in-depth reviews. And it's just really surprising to me how often I will review something, and someone else would come along and review, and actually see something that I missed. It helps multiply the benefits of the shared knowledge and the shared understanding. It also takes the pressure off code review. So you can not feel like you're the only person who is verifying this, you don't feel as responsible, but you kind of share that more.
40:51 Michael Kennedy: It's not like you're the goalie, and if it goes past you it's over, right?
40:55 Dougal Matthews: Yeah, exactly.
40:55 Michael Kennedy: Interesting. We definitely talked about the short reviews. Do you think it would be beneficial to kind of reverse that as well, and have multiple contributors to, like, if I'm submitting a patch, this probably works better in a corporate space but if I'm submitting a patch, and it's getting reviewed, I also have to sit in on the review, maybe be a reviewer, maybe just observe someone else's code being reviewed as well. Like, having that other side of the story, not be one to many.
41:26 Dougal Matthews: That sounds a little bit like doing pair programming, and maybe in an asynchronous way. So you have multiple people contributing to the project. And I think that that can work well, but it does probably require that you know people a bit more. So this is something that I do see happen in OpenStack fairly often. And some of my colleagues, if I see that that they've got a review up, and then there is a small addition or something else that I can make to it, I'll just issue a change to that. I guess it also requires the tooling to support it. It would be quite tricky in Github, for example, because you'd have to send a pull request to the branch which is the original pull request, and it becomes messy.
42:01 Michael Kennedy: Yeah, yeah, I'm certainly thinking this is a place where you guys sit down over, like, a shared projector, and have this conversation. You know, something like that, peer open source, where it all asynchronous I'm not sure if that makes any sense.
42:13 Dougal Matthews: I work at home remotely, so I always think about everything as being asynced and whatnot.
42:20 Michael Kennedy: Yeah, yeah, same here, I work from my house as well. Okay, all that stuff is really interesting. Now, we talked a little bit on the tools you might use to make this work. So we've got Gerrit, which is open source Python, it's like a web app that you install. Do you, like, connect it to your Git repo and it watches the branches? Or, how does that work?
42:40 Dougal Matthews: There are integrations for Github, which, I've not used them very often. But the way you interact with Gerrit is with a git plugin, which is also actually written in Python. So you pick install a project called git-review, and that gives you a new command, which is "git review". And there's a little configuration file in your code repository, and then that will send the review up for review to the destination of the Gerrit.
43:07 Michael Kennedy: Okay.
43:07 Dougal Matthews: And you will have an account with Gerrit already, with your SSH key in there, so it knows who you are.
43:13 Michael Kennedy: Nice, I see, so you do, like, a push to that origin or something, right? Then it has it. You don't necessarily have to hook into the full code repository.
43:21 Dougal Matthews: Yeah, so the git review, you, it does a couple of different things, the command. But right now, what it does is it sets up the remote for you, and fishes and creates the branch for you to change on the remote one, and gives you, like, a review number, and that kind of thing.
43:33 Michael Kennedy: Yeah, yeah, very cool. And then, Github, obviously we talked about that a lot. They just added a full, proper code review where you can go to a check-in, and, like, expand it out and comment. So Github is coming along. There also it Bitbucket. Probably a lot of people use Jira; I don't know the source, you said, relative to Github. But is seems like, at least, issue and review stuff happens a lot in Jira.
43:52 Dougal Matthews: I'm not sure it Jira has code review built in, because I've not really used it. The one I do hear good things about is Phabricator, so this is actually written in PHP, I think, and I also think it came out with Facebook originally. I've heard of some people really having some success with that, but again, I've never actually used it myself. The only two that I have recent and extended experience with are Github and Gerrit.
44:15 Michael Kennedy: Sure. Okay, excellent, so, some tools out there. I know, actually, in this whole space, there's, like, a billion tools, right, there's a whole bunch of small things. But these are some of the big players in our space. Very cool. I think we kind of getting toward the end of the show, we're running low on time. Let me wrap it up with my final two questions, as always. So, if you're gonna write some Python code, or any code, what editor do you open?
44:35 Dougal Matthews: I use Vim all the time. Yeah, I work in so many remote servers and VMs that I have some scripts to kind of set up my Vim environment on that machine, and then it's just great.
44:48 Michael Kennedy: Oh, nice, yeah, that's super portable, awesome. Okay, and, favorites from the PyPI package? Have we passed 100,000? I'm gonna look while you're thinking. We're definitely getting up there, it is at 99,706. I think tomorrow we'll be at 100,000 packages. So there are, rounding up, 100,000 packages in PyPI. What ones do you recommend for people to check out that are awesome.
45:12 Dougal Matthews: So, I think some of my favorites are the fairly common ones people know about, so I picked a couple of more obscure ones which I think are just quite nice and have been useful in my development process. So Vulture is one that you actually covered in Python Bytes. You referenced one of my blog posts about it, and basically it's a tool for finding dead code, so it finds functions which are never used and that kind of thing, it's just really useful.
45:37 Michael Kennedy: I think that's a really cool project and I'm glad we that talked about it, because nothing is more frustrating to me than spending half an hour trying to understand what a function does in the context of an app, to realize, oh, I don't understand it because it doesn't do anything, right, like, it's super cool, so, very nice.
45:53 Dougal Matthews: It's amazing how often I've found projects which have a function and there is a unit test for it, but it isn't used anywhere else but because there is a unit test, everything that inspects the code thinks that it is being used. But with Vulture, you can exclude any uses in your test cases, for example, and then it will highlight it, and so that works quite nicely.
46:12 Michael Kennedy: Very cool, very cool. Alright, what's the next one.
46:14 Dougal Matthews: The next one that I have been using recently is called pipdeptree. So that's one word. And what you can do with pipdeptree is you can provide it the name of a package which is installed in your virtual land, and it will actually give you the tree of requirements for that, so you can see where another package came from, why it's there, and so on.
46:31 Michael Kennedy: Right. You might be looking at a virtual environment via "pip list", and going, why do I have beautiful soup installed? Like, I didn't install that, right? And this would tell you, right? This installed because of this, which installed that because it needed that or whatever.
46:41 Dougal Matthews: Yeah, and it prints it out in a nice sort of tree structure. And then it's got a some other good things, like, you can do the reverse of that, so you can have it tell you all the requirements for a package, or you could have it trace where something from, like, the opposite end. So, from the leaf node of the tree, rather than the trunk.
46:57 Michael Kennedy: Oh, that sounds awesome, I didn't know about that, sounds amazing. And the final one you said is entry_point_inspector?
47:01 Dougal Matthews: Yeah, so this is probably the smallest one, and I guess this is because I do quite a lot of Python packaging. But if you have, so say for example, when you install pipdeptree, it creates the "pip deptree" command for you that you can use in your terminal. But actually trying to find out where these commands come from is quite difficult. So entry_point_inspector will actually inspect the Python entry points that are defined in different setup.py libs, and all this amounts, you can see where all these commands are coming from, where they're defined, and what packages they're coming from.
47:33 Michael Kennedy: That sounds really cool, too. These are all great. Awesome, thank you for sharing them. Final call to action, what should people do after listening to this, they're all inspired about code review?
47:41 Dougal Matthews: If people are inspired about code review, they should just think about how they can improve their process. They should start thinking about what they are doing, build improvement into your code review process. So every now and then, have some kind of retrospective and think about what is working and what isn't working, and just try to improve that.
47:57 Michael Kennedy: Yeah, remember, remember why you're doing it, right?
47:59 Dougal Matthews: Yeah.
48:00 Michael Kennedy: Absolutely, okay. And then, EuroPython is coming up, right?
48:03 Dougal Matthews: Yeah, so I did touch on this on this earlier but EuroPython is going to be this year in July from the 9th to the 16th. These dates are officially tentative, but I think they will be finalized by the time this episode is out. I have to give that disclaimer. And people should start thinking about talks. So we will have a couple papers coming out very soon as well. I love people to speak if people have never spoken before, I'd be happy to speak to them about ideas and help them. So find something that they want to talk about, and that would be good.
48:32 Michael Kennedy: Yeah, that would be great. It's a very welcoming community, so a good place to get started with public speaking, right?
48:37 Dougal Matthews: Yeah, absolutely. It's where I first started speaking.
48:40 Michael Kennedy: Well, Dougal, thank you so much for being on the Podcast and sharing all these ideas with us. It's been great.
48:43 Dougal Matthews: Thank you, I really enjoyed it.
48:44 Michael Kennedy: You bet, bye. This has been another episode of Talk Python to Me. Today's guest has been Dougal Matthews, and this episode has been sponsored by Rollbar and Hired. Thank you both for supporting the show. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed, until your users complain, of course. As Talk Python to Me listeners track a ridiculous number of errors for free at rollbar.com/Talk PythontoMe. Hired wants to help you find your next big thing. Visit hired.com/Talk PythontoMe to get five or more offers with salary and equity presented right out front, and a special listener sign-in bonus of $2000. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point. Well, check out my online course, Python Jumpstart, by building 10 apps at talkpython.fm/course to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/Pythonic. Be sure to subscribe to the show. Open your favorite pod catcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss, on talkpython.fm. Our theme music is Developers, Developers, Developers, by Cory Smith, who goes by Smix. Cory just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music. You can browse his tracks he has for sale on iTunes, and listen to the full version of the theme song. This is you host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Smix, let's get out of here.