
#266: Refactoring your code, like magic with Sourcery Transcript

Recorded on Thursday, May 21, 2020.

00:00 Refactoring your code is a fundamental step on the path to professional and maintainable software. We rarely have the perfect picture of what we need to build when we start writing code, and attempts to over-plan and over-design software more often lead to analysis paralysis than to ideal outcomes. Join me as I discuss refactoring with Brendan Maginnis and Nick Thapen, as well as their tool Sourcery, which adds automatic refactoring to the popular Python editors. This is Talk Python To Me, Episode 266, recorded May 21, 2020.

00:46 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is sponsored by Datadog and Linode. Please check out what they're offering during their segments. It really helps support the show. Brendan, Nick, welcome to Talk Python To Me. Thank you very much.

01:13 Thank you for having us.

01:14 Yeah, it's great to have you here. I'm such a huge fan of refactoring and code quality and all these ways of taking living software and making it evolve, right? I think long gone are the days of, we have to plan this perfectly, and then we're going to build the perfect thing that we've thought up, right? Having this idea of continuously evolving and improving code just frees you from worrying about trying to get it all right, and you can just get started. So I'm really excited to talk about Sourcery and refactoring with you guys. Awesome. Yeah, we think iteration's super important as well, sort of trying to get a skeleton of the thing up and running, and then sort of tidying up later. Yeah, absolutely. It's kind

01:52 of how we like to work. No one really understands the domain when they first write the code anyway. You have to write the code, find out all the mistakes that you've made, and then tidy it up, clean it up. Over time, like you say, it evolves into what looks to be a nice quality code base that solves the needs of the users.

02:13 Yeah, I think that's a super good point, because you don't fully understand it until you've gotten mostly into it. If you try to understand it all up front, it's just so much work. Even if you get it right, it's so much work to get to that point that you might as well have just written it three times. So that's awesome. But before we jump more into that, let's just get your stories. Nick, I guess we'll start with you first. How did you get into programming and Python? So

02:36 the first programming I did was back in school on these graphics calculators we had back in the 90s. I remember reading a book about complex numbers and being super proud of getting my calculator to draw a fractal. I think it took about 12 hours to draw the Mandelbrot set on my little calculator. Like

02:54 the Mandelbrot set or something cool like that. Could you zoom in,

02:57 I think you could zoom in, but it would take another 12 hours to show you.

03:01 I only do it twice, because then the battery would run out pretty much

03:04 I did little bits of BASIC and stuff, but I never did programming in a serious way until joining a software company after university. What did you study at university? So it was maths and some philosophy. Okay.

03:14 That's a cool combination. Actually.

03:15 Yeah. And there's like a lot of logic, which I guess is quite key to, to programming. Then the first languages I learned in the software company were this mainframe language called IBM RPG, I don't know, if you've

03:26 come across that one. I could not look at RPG source code and tell you what it is like, I couldn't identify it from code. We have heard of it. That's incredible.

03:33 So it's like punch cards. Everything has to be in the right column, kind of thing. Okay. Because there was this finance company; Brendan worked there as well.

03:41 It sounds useful, but I think I'll stick with Python. Yeah, this guy

03:45 had this codebase from the 80s. Okay, cool. And he had to break him in the green screen terminal. So I didn't put me off.

03:50 Yeah, if you go through that you're definitely good for this industry. If you can make it through that. Steve, you'll be good.

03:55 That was Delphi, which I also look back on as not being super amazing. And then Java. And then it was only when I joined Imperial and started doing sort of machine learning that I got into Python, I guess in the mid 2010s. Well,

04:08 you like what is it this language have semicolons? What happened to it?

04:11 Yeah, I mean, Python just seems like a breath of fresh air, really. When I'm doing Java, it seems so verbose.

04:17 Like, why can't I read this? There are symbols all over; what is it trying to tell me? Yeah, yeah, curly braces and parentheses and semicolons. And the

04:24 way they do the libraries and things actually, it seems to be all done in a very verbose way. Yeah, indeed. Lots of boilerplate. Nice,

04:30 cool. Well, Brendan? Yeah, the first programming I did was in the final year of my master's degree. We had a project to do, which was to simulate various mathematical equations. The ones that I chose were the heat equation, where it would simulate one wall being hot and another wall being cold, with various obstacles between the two walls. Then, after it ran, it would show a chart of the temperature at different places in the room. And the other thing I simulated was the Schrödinger equation from quantum mechanics.

05:10 Yeah, quantum mechanics is so interesting. And it's just like, it seems like such a weird Twilight Zone alternative reality. And yet it's, it seems to be applying to real reality. It's such a weird world. I love quantum mechanics.

05:22 Yeah. So it's amazing how accurate the predictions are. And yet, when you try to understand it, it just has no, no similarity to reality,

05:32 making that up, aren't you? There's no way this is real.

05:34 Yeah. When you get so small, it doesn't match up with what you understand. How can this object be in two places at the same time? What does it even mean? Exactly. Yeah. Well, with the equation, I was just simulating a single object and the waveform of it, so it was quite simple. But the way they did it was: here's MinGW, go and program it in C, and then just enter software into it. And we knew nothing about how to program. So my programs were just hundreds of lines of statements. Absolutely no functions. Nothing. No, yeah, yeah.

06:14 I think you probably had some memory leaks in there as well, as it was written in C.

06:17 Oh, yeah, the whole thing. Yeah, whole thing was pretty terrible. But you get to the end of the project, and you simulate it correctly, and you think I can program now, I know how to program.

06:30 Even if that's not entirely true, I think you do come away with this feeling of like, Oh, my gosh, look, what I built like, this is so awesome. Even if it's not really it, just like that feeling of creating that thing is so cool. In those days when you're getting started.

06:44 Absolutely. I think that Yeah, the first time that I managed to output the heat equation on onto the screen where you see the different temperature across the room. So they provided us with a charting library to use. That was amazing. So Wow, I can throw down. I can do anything. I know computers now.

07:05 Yeah, super

07:06 cool. So, yeah, after I finished university, I joined the same company as Nick. I learned RPG, I learned LCA, I learned Java. And then I achieved quite a senior role in the company as part of the architecture team, and I managed to introduce Scala to the company. We did a language comparison, and it was between JVM languages. One of the languages was actually Jython, but we weren't going to go for that or JRuby, because they weren't first-class languages on the platform. So really, the main contenders were Java, Scala, and Clojure. We had someone come in and talk to us about Scala, who was really, really amazing, and he convinced us to use it. So I ended up leading the project to bring Scala into the company and re-architect a lot of the systems. When I left that company, I was very much down the functional programming route. I did some Clojure, which is very functional. I did some Haskell. But then I joined Nick at Imperial College London. That's where he said he got into Python, and that's where I got into it as well. I joined a reading group that was all about deep learning. I went to the reading group, and we read a paper on the Differentiable Neural Computer, and I had no understanding of what it meant. I was like, okay, I'm going to go home, and I'm going to implement it. So I went home, and I cracked out some Python and cracked out TensorFlow, and it took me weeks and weeks to implement it, because I had no idea what was going on. That was the start. I just started implementing more and more of these papers as I went to more and more reading groups, and that tended to be all of the Atari-playing reinforcement learning algorithms, which were just really fascinating to me. Okay, cool. Learn how to play Breakout and learn how to play Pong, right?

09:21 Maybe some Pitfall in

09:22 there. I've never tried it on Pitfall, actually. That was a good one.

09:27 Nice, I'm really fascinated with these ways to train AIs around, like, video games. I mean, I'm blanking on one of the options, but there's a handful of libraries where you can kind of plug your AI into a virtual world, so it has somewhere to interact and things to interact with. And yeah, that's fascinating.

09:47 Yeah, I really just wanted to touch tonight and that's why I was doing it every week. And then just around that time AlphaGo came out as well, and it was like a shock to the world. So, like, this, the hardest game that humanity can play, is beaten by a computer.

10:05 Yeah. And it's one of the first might have been the first real AI opponent that used like strategizing rather than just deep exploration of the path, right? Like the, the chess one is like, well, I can hold 12 steps ahead in every direction. In my mind. That's more than the chess masters. So right, like, we'll just play them all out all the possible futures out, and then go down the best one right that but that's not how AlphaGo works, which is, I think part of the magic.

10:34 Yeah, it's cool. Yeah. That's the interesting bit. Yeah.

10:38 Yeah. Yeah, it was just fascinating the way they did that. I ended up writing a version of AlphaGo to play Connect Four. Okay. Which is actually pretty strong. That was a really fun project as well. I mean, it can beat me. So. Yeah, very strong. Yeah, I'm average.

10:59 Yeah, how cool. How cool. All right. So are you both working on the same project yet again? What do you do day to day? What are you doing now? We're both working on Sourcery

11:07 full time. We kind of started working on it back at the end of 2018. From Brendan's flat, slide 10. Up and heavy still SPM is eating cereal. Sit down and code away. Yeah, we're kind of totally focused on making Sourcery as good a refactoring tool as can be, really.

11:26 Yeah. Cool. We'll get into what Sourcery is more later. But what's the quick elevator pitch, before we dive into just some more general software stuff?

11:33 I guess we've been pitching it as kind of Grammarly for code. If you know what Grammarly is, it improves the style of your writing without changing the meaning or the content. I guess that's what refactoring is: you improve the quality and the structure without changing the functionality. And that's what we aim to do. So as you're writing code, we analyze it, and we suggest refactoring improvements, sort of as you go.

11:55 So maybe I've got a loop, and I'm doing some kind of accumulation into a list, and it could say, you know what, that could just be a list comprehension. Yeah, exactly. Exactly. Yeah. Okay, that sounds awesome. It's easy to just get focused on writing code and not really worry about the quality. It's equally easy to get super obsessed with the quality and not just get the thing done, right? I kind of see a bimodal distribution. One group is like, I don't give a crap, I'm always at work, so I'm just going to write, write, write. Yeah. And, you know, that's one group of people's philosophy. The others are really slow and super meticulous to get it just right, because they want to write it the best way. It sounds to me like the tool would let people find a middle ground, right? Write it a little more loose and free, but then it says, oh, by the way, that thing you just wrote, actually, we could make that way better if you just let us.
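[Editor's sketch, not from the episode: the loop-to-comprehension suggestion described above might look roughly like this. The function names are made up for illustration, and this shows the general idiom only, not Sourcery's actual output.]

```python
def squares_of_evens_loop(numbers):
    """Accumulate into a list with an explicit loop (the 'before' shape)."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_comprehension(numbers):
    """Same behavior written as a list comprehension (the 'after' shape)."""
    return [n * n for n in numbers if n % 2 == 0]


# Both versions should agree on any input.
print(squares_of_evens_loop(range(6)) == squares_of_evens_comprehension(range(6)))
```

[The key property of a refactoring is visible here: both functions return the same list for the same input; only the structure changes.]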

12:51 Yeah, that's kind of the ideal. And that's some of the feedback we've been getting: that it lets people write code a bit more quickly, without worrying so much about the quality, and it will kind of tidy it up for them. Okay, cool.

13:00 And then the other aspect of it is, some people don't actually know what good quality code is. If you're starting out in Python, you may be able to write the solution, but you don't necessarily know how to write it well. You may not even know about list comprehensions yet. Right, right, right. And the benefit of using Sourcery in that case is that it can teach you the Pythonic way of writing code.

13:31 This portion of Talk Python To Me is brought to you by Datadog. Are you having trouble visualizing bottlenecks and latency in your apps, and you're not sure where the issue is coming from or how to solve it? With Datadog's end-to-end monitoring platform, you can use their customizable built-in dashboards to collect metrics and visualize app performance in real time. Datadog automatically correlates logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python application. Plus, their service map automatically plots the flow of requests across your application architecture, so you understand dependencies and can proactively monitor the performance of your apps. Be the hero that got your app back on track at your company. Get started today with a free trial at talkpython.fm/datadog.

So maybe you don't necessarily know the idioms. I think this is one of the challenges of Python in particular. All languages have this problem, but Python suffers more than others from it. One reason is that it's so easy to learn that people feel like they can learn it in a weekend, and then they just go write code in it, right? Because, like, it's a simple little language, there's not a whole lot to it. Which I think is actually not true, right? I still feel like I'm learning Python every day. I'm like, oh, I didn't know this, or I should have done this, right? There's just so much nuance and detail to it. But it's easy for people to come from a language like C or Java or something else and just do Java-style programming there, or C-style programming there, and, like you said, not even know that there's this other component, right? They say, oh, I think it's crappy, it doesn't have, you know, a numerical for loop. This is crummy, right? When really a better way to do it would be to use enumerate on the collection, where you get the index and the item, right? You don't have to go back. If you didn't know that, that's multiple layers. One, you have to know there's a for-in loop. Two, you have to know that enumerate is a thing. And then you've got to know about tuple unpacking. That's a pretty complex set of topics if you're like, well, I spent the weekend, I kind of know it, let's finish this project. Yeah, that philosophy?
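[Editor's sketch, not from the episode: the three layers mentioned above (the for-in loop, `enumerate`, and tuple unpacking) can be seen side by side in a small example. The function names are illustrative only.]

```python
def label_c_style(items):
    """C-style habit: loop over indices, then index back into the list."""
    labels = []
    for i in range(len(items)):
        labels.append(f"{i}: {items[i]}")
    return labels


def label_with_enumerate(items):
    """Idiomatic version: enumerate yields (index, item) pairs,
    and `i, item` unpacks each pair in the loop header."""
    return [f"{i}: {item}" for i, item in enumerate(items)]


print(label_with_enumerate(["spam", "eggs"]))
```

[The second version never indexes back into `items`, which is the "you don't have to go back" point made in the conversation.]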

15:39 Absolutely. I mean, like you say, the Python language is so beautifully simple that most people can pick it up in a week to a month's time. But then you've got all of these idioms, and they're not just the different bits of syntax or enumerate. There are multiple libraries all over the place: the libraries that are built into the Python standard library, but also all the libraries out there that you need to accomplish more complicated tasks. Yeah, just finding out what the best library is for the job is a difficult piece of research. Often it is. Sometimes it's something built in, like itertools or something like that. Other times, it's something you've never heard of, because there are 200,000 options, and they all have their behaviors and whatnot. But if you would grab it, that would take 20 lines down to one, and probably be faster at the same time. It's incredible, right? That's kind of what I was thinking. I've never done learning Python, because, oh, there's this other standard library module I discovered, or I was using Counter in this way, or whatever it is, right? There are just all these options. Yeah, we've definitely come across that.
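[Editor's sketch, not from the episode: one small instance of a standard library module shrinking hand-written code is `collections.Counter`, which is mentioned in passing above. The helper names are hypothetical.]

```python
from collections import Counter


def count_words_by_hand(words):
    """Manual tally with a plain dict."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def count_words(words):
    """Same tally using the standard library's Counter."""
    return Counter(words)


words = ["spam", "eggs", "spam"]
print(count_words(words).most_common(1))
```

[`Counter` is a `dict` subclass, so code that only reads the counts can usually switch to it without other changes.]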

16:47 Sorry, come?

16:48 Yeah, yeah, I can imagine I can imagine coming from an academic space, I'm sure you see that a lot there as well, because there's probably a lot of people who don't see themselves as a developer. But there's Yeah,

17:02 touching Python, writing code that you kind of write once and then don't look at again. It's kind of, I guess, done with a different aim in mind. You know, they're not so worried about reuse by other people. Right? Right,

17:14 it's actually been quite a challenge for us, because we learnt Python together, effectively at the same time, and we've never written Python in a large code base apart from our own. So we've had to learn all of these things ourselves. Over time, we've rewritten bits of the system where we found out, oh, we can use this feature of Python that makes it so much better. At one point, we integrated mypy into our code base, and that was a big improvement. Like you say, you slowly learn about these options. I think a lot of people out there who are learning Python probably fall into that bracket as well. They're learning it as their first language. They're not learning it with people who can guide them on how best to use it. And it's really hard on your own. I mean, we're experienced developers, and I think we're finding it really hard ourselves.

18:12 Yeah, to do it right, to take full advantage of it. It is. So it's cool to have extensions for the IDE that will sort of, not quite be a pair programming partner, but someone to sit there and say, you know what, that actually is not the right way to do it. But it's super easy to fix, and I can take care of that for you, right?

18:28 Oh, yeah,

18:30 exactly. That's awesome. So we talked about code quality, and that's a little bit in the eye of the beholder. It's also a little bit of a trade-off. Like, Nick, you talked about this concept of, I'm going to write a script and get an answer and never run it again, right? That has a different threshold for code quality than, you know, the core trading engine at a bank. It would probably be improper to put that much energy into a script that's going to be run once, right? You should just write the thing and get it to work and not worry too much about it. But at the same time, if you're building something to be reused, something important that will be run by lots of people, you really want to get it right. So I think there's a spectrum, and people have got to figure out where they live on it. But no matter where you live, you would like to have better code quality rather than worse, whatever applies to your situation, right? Oh, definitely.

19:23 It's kind of hard to quantify what code quality is. Sometimes you know it when you see it. Yeah, I guess for me it was reading the book Clean Code. It's by

19:29 Bob Martin. Robert C. Martin. Yeah.

19:31 Yeah. Everybody crystallized it for me. I was always trying to get the graduate, Neil computer worked out to read it. And eventually one of them stole it. So I think, I guess, I mean, they liked it. You want Yeah, it worked.

19:44 you convert it and they stole your book.

19:45 But it's the real real core of his disciples. He goes easy to read and understand. It just reads like a sort of story. Does this notice and this Yeah,

19:53 I really love this idea of clean code and the stuff that Bob Martin talks about. He's got some really good ideas. I don't totally agree with everything he says, but I think there are a lot of good lessons to take from what he's doing. Sure. Yeah, one of the really interesting ideas comes from one of his contemporaries, Martin Fowler, way back at the origins of refactoring. I remember reading the book called Refactoring in 1999 or something like that, and just going, my mind is blown, right? I've had this problem of bad code quality, and I've had this problem of trying to write it well, or to fix it. And then I realized, reading what he was talking about, oh, there's this way to take the bad stuff you've already put down, like sediment in the crystal, and turn that into something that can be improved and grown over time. I just remember it really changed my way of thinking about programming, digging into refactoring. So I'm just such a huge fan. How did you guys come across it?

20:51 It's an interesting one, because the code base we worked on at our old company was so huge and difficult to change. Often, we didn't even try refactoring it. You just kind of took a surgical approach: you went in and tried to understand it and made the small changes you could. Please don't break

21:05 just a new feature without breaking

21:07 organs. Brendan, you actually sort of took the other approach and said, like, okay, I'll just rewrite this whole bit.

21:14 Yeah, well, I mean, it was a risky approach, given that the system was not under test at all. So I would only do that on front-end components or the UI. But yeah, sometimes it was just a case of, I don't understand what's going on here, but I can see what the functionality currently is, so I'm just going to reimplement it from scratch.

21:37 Right. You know, if people are in that space, there's the whole area of what Michael Feathers talked about with legacy code: how to take these huge systems that are hard to change, that you don't necessarily know, that don't have tests, and how to break off little bits that are maintainable. That's such a cool book, Working Effectively with Legacy Code. I really do enjoy sort of tinkering with code, tracking down bugs, sort of improving it, making it a little better. When I see a blank sheet of paper or a blank screen, I kind of find that quite difficult to start on; I think Brendan's a bit better at that. So I super enjoy refactoring. And I guess one of the things we're trying to achieve with Sourcery is, if there's a machine that can do the refactoring for you, you can be less worried about it being under test, because you know it's done proper analysis. Because whenever I do refactoring, I break something. Nice. I would lean heavily on the tests. It almost never seems like a good idea to me to do a refactoring manually if there's some sort of tool-based way in which it can happen, right? You just never know what little thing is going to go. There's that one cron job that we had that was also using that; now, apparently, it's not going to take it anymore. And with Python, you don't have compiling, right? So you're not going to catch the obvious stuff, like, I moved this function over here. It's just like, well, don't run that part. It's gonna crash

22:57 is one of the challenges with Python. And I mean, the way we've approached it with the Sourcery code base is testing and mypy type annotations. Yeah, I think they give enormous confidence when you're refactoring the code. So nowadays, I don't do crazy refactors, crazy rewrites like that. I incrementally improve through small changes. Yeah, I've experimented with that once and realized that's not the way to go. The real key is having those tests and the type annotations: you can move something anywhere in the system, and you'll get told about all of the errors that you now have. Then you can go and fix all of those, and then you can do the next refactoring, and build it up through the

23:49 Most of our bugs are in the bits we hadn't added mypy to.

23:53 So I totally expected to hear automation from you guys, and applying Sourcery back unto itself and things like this. We'll dig into the features in a second. But I didn't expect to hear typing and mypy. I'm personally a huge fan of the type annotations in Python. I think they make working with Python code so much easier. You don't, you know, annotate everything. But in certain places, like, this function returns one of these, just knowing that you actually expect it to return one of these is super helpful. And it'll light up the editors as well, right? They can all of a sudden give you autocomplete for what they didn't see before. Now they know, oh, here are the five things you can do with what you got back. Perfect. How does mypy fit into your world? That's pretty interesting.

24:37 be doing with that. We really only use it internally, because most code out there doesn't actually have mypy

24:44 or type annotations, right? Even if it has some type annotations, there's sort of a chain of annotations that have to be consistent. mypy is a stronger level than just saying, oh, this function happens to return a list.

24:55 Yeah, I mean, one of the great things about type annotations is you can just look at a function, at the definition of it, and understand the interface. You don't need to read the code. Without those type annotations, you have to actually read the code and say, oh, actually, this is a string, and this is an integer, and it returns a list of integers or something like that. With the type annotations, you don't. Yeah, that's a really good point. I like that. So I think type annotations as a form of documentation are really, really powerful, just from a readability point of view. Then you get all of the security as well of knowing when you've broken the code, or when you've not called something correctly. One of the things that is really important to us with Sourcery is that we never break other people's code. So we have to have extremely strong gut.
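[Editor's sketch, not from the episode: the "string, integer, returns a list of integers" point can be illustrated with a small function written with and without annotations. The function and parameter names are made up; a checker such as mypy would read the annotated signature, but nothing here depends on mypy being installed.]

```python
from typing import List


def parse_scores_untyped(text, sep):
    # The reader has to study the body to learn that text and sep are
    # strings and that the result is a list of integers.
    return [int(part) for part in text.split(sep)]


def parse_scores(text: str, sep: str = ",") -> List[int]:
    # The signature alone documents the interface; a type checker can
    # flag a call such as parse_scores(42) before the code ever runs.
    return [int(part) for part in text.split(sep)]


print(parse_scores("3,1,4"))
```

[At runtime the two functions behave identically; the annotations are documentation for readers and input for static tools.]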

25:49 Yeah, so let's take a step back. Why don't you tell people what Sourcery is and how people use it? You mentioned that it's a plugin for IDEs, but give us a little bit more detail. Then we can talk about how you keep from breaking people's code, which people probably appreciate.

26:06 It acts as a plugin to your IDE, so we've got plugins for VS Code and PyCharm. As you're coding away, it sits there reading your code and analyzing it. If it identifies a change to a function you're working on that will improve the code quality, it'll suggest it to you. You can review that suggestion, and if you like it, you can accept it. It'll apply that change inline and you carry on coding. It works almost seamlessly in your workflow.

26:42 Okay, that sounds awesome. Does it change the way the editors work in other ways? Like, for example, does it change the autocomplete or things like that? Or is it really more like the code intentions, like the little light bulb in PyCharm?

26:56 Yeah, it's kind of exactly like a code intention in PyCharm. That's kind of the thing we've gone with. Yeah. So it does a little underline in both.

27:02 So as you're going along, you're watching Oh, there's a little pop up, I should go see what this is about. Yeah,

27:06 exactly. And it runs locally on your machine. I guess that was quite a concern for people: they didn't want their code being sent to the cloud. When we first started it, we were going to do it kind of as a service in the cloud, and we kind of had to do a pivot and get it running locally on the machine. Okay.

27:24 So somehow behind this, like, how does it make decisions about refactorings? Is it like an AI based thing? Is it pattern matching? Like, what is it doing inside?

27:35 So at the moment, there's no machine learning or AI. The way it works is essentially pattern matching. It works at the level of the abstract syntax tree, so it takes the code and parses it into a data structure. That data structure will have, for instance, an If node, and within that If node, it will have a function call. And this node will also have a list of statements. It looks at those nodes, and it looks for the patterns, like you say. So, for instance, it might look for a for loop that has an if statement within it that appends to a list. And then it says,

28:22 okay, like that's a list comprehension waiting to be made right there. Yeah,

28:25 exactly. And I guess,

28:27 the clever bit is it kind of has lots of tiny little patterns of improvements it can do, and it can compose those together into, like, a bigger refactoring. And it's guided by a load of code metrics we've kind of incorporated into it, so we can get into those later.
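[Editor's sketch, not from the episode: a toy version of the pattern described above (a for loop whose body is an if statement that appends to a list) can be written with Python's standard `ast` module. This is only an illustration of AST pattern matching in general; Sourcery's real implementation is not described in the episode, and the function names and sample source here are hypothetical.]

```python
import ast

SOURCE = '''
def evens(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n)
    return result
'''


def is_append_call(stmt):
    """True if stmt is an expression statement of the form <obj>.append(...)."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Attribute)
        and stmt.value.func.attr == "append"
    )


def find_comprehension_candidates(source):
    """Return line numbers of for loops whose whole body is `if ...: x.append(...)`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.For) or node.orelse:
            continue
        if len(node.body) != 1 or not isinstance(node.body[0], ast.If):
            continue
        if_node = node.body[0]
        if if_node.orelse:
            continue
        if len(if_node.body) == 1 and is_append_call(if_node.body[0]):
            hits.append(node.lineno)
    return hits


print(find_comprehension_candidates(SOURCE))
```

[A real tool would also need to confirm that the list being appended to is created just before the loop and not used in ways that make the rewrite unsafe; this sketch only matches the shape.]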

28:41 like the cyclomatic complexity and some of those types of things.

28:45 Yeah, so we do cyclomatic. We also use cognitive complexity, which I think is like a trademark of SonarQube or something, but it's a different metric. We use a few we've written ourselves. And so it can kind of chain together little refactorings to do something bigger. So, for example, have you seen the Gilded Rose refactoring kata? No, tell people about it. So it's kind of this big fantasy, let's

29:11 take a step back, what's a kata.

29:13 So a coding kata is a coding exercise to improve your programming. Right, right. And there are various ones the first ran into that. And Gilded Rose is like this big, complicated set of nested ifs, basically. And it's all about this fantasy inn. I think it takes maybe an hour to kind of manually sort the code out and refactor it. That's sort of an exercise people do. And our sort of aim when we started Sourcery was, this was like our initial target problem. So we can kind of do all that work at once by chaining together lots of little refactorings. So it can take the sort of complex mess of spaghetti code and turn it into something understandable,

29:51 right? Instead of having to say, okay, here's a little if statement that could be improved, and then apply it again and say, well, now that we have this code, there's another thing we could improve, and then apply it again, it will chain those all together and go, actually, we could roll this all up.

30:05 Exactly, yeah. Because when you're doing manual refactoring, that's kind of what often happens. You sort of do one thing, and then you realize, oh, now I can do this. You might have an aim in mind, or you might not. And then you start chaining these things together, and in the end, you're like, oh, now it's an understandable codebase.

30:19 Yeah, yeah, very cool. Oh, that sounds super, super useful. I know that some refactoring is built into certain tools; PyCharm has certain refactorings. But they don't seem to take this more holistic approach, right? They're like, oh, this list comprehension could be expanded to a for loop if you need it, or something like that. But that's kind of as far as it goes.

30:38 Yeah. And they're very developer driven. You have to know you want to do them. Yeah, you have to know where you can do them. And then you have to do them. So they're very useful if you know you want to do something, because it'll do it for you and, right, it won't make mistakes. But they don't tell you if it's a good idea or not. So our idea is we're suggesting things that we think are good ideas to actually change.

30:59 This portion of Talk Python To Me is brought to you by Linode. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale that you need to take your project to the next level. With 11 data centers worldwide, including their newest data center in Sydney, Australia, enterprise-grade hardware, S3-compatible storage, and the Next Generation Network, Linode delivers the performance that you expect at a price that you don't. Get started on Linode today with a $20 credit and you get access to native SSD storage, a 40 gigabit network, industry-leading processors, their revamped Cloud Manager at cloud.linode.com, and root access to your server, along with their newest API and a Python CLI. Just visit talkpython.fm/linode when creating a new Linode account, and you'll automatically get $20 credit for your next project. Oh, and one last thing: they're hiring. Go to linode.com/careers to find out more. Let them know that we sent you.
Let me share one of my favorite concepts from refactoring, then ask you about some of your favorite refactorings. So there are all these different refactorings, even from the early days, that Martin Fowler talks about. Like, okay, there's a God object, and here's how you break it down, or there's a function that's too long, here's what you do, and so on. Those are all kind of interesting, but the most interesting concept around all that stuff, to me, was the concept of a code smell. Right? There's something wrong with this. It works, but your nose kind of turns up when you look at it. You're like, there's something wrong with this part of my code, or code I probably inherited from somebody else, right? And the other thing was, he would talk about comments and say, often comments have value, but a lot of times they're really just deodorant for these code smells. Like, this is really hard to understand, because it's written badly. So let me write a comment that tells people what it really means. Yeah, and then just leave the bad stuff there. Right, it kind of deodorizes the code smell a little bit. And that's this idea of, if you have those comments, it's an underlying signal that you should start thinking of applying these different refactorings. So my question to you, having put that out there, is: what are some of the favorite refactorings that you guys are seeing possible with this deeper integration? Obviously, for loop to list comprehension, list comprehension to for loop, those are pretty straightforward. But it sounds like there might be things you just really love, or there might be some more interesting, larger refactorings. The main

33:27 code smell that I think Sourcery does really well with is eliminating duplicate code within a function, and in particular, within different branches of a complicated set of conditions. So you may have the same body of code in two different places. And Sourcery can restructure the code until there's just a single condition that applies for that block of code. That's cool.
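As a rough illustration of the kind of change described here (not Sourcery's actual output), the sketch below shows a statement duplicated in both branches being hoisted out of the conditional. The function names and the order dictionary are hypothetical, invented for this example.

```python
# Hypothetical example: the same statement appears in every branch.
def ship_before(order):
    if order["express"]:
        order["fee"] = 10
        order["status"] = "queued"  # duplicated in both branches
    else:
        order["fee"] = 4
        order["status"] = "queued"  # duplicated in both branches
    return order


# After hoisting: the conditional only controls what actually differs.
def ship_after(order):
    order["fee"] = 10 if order["express"] else 4
    order["status"] = "queued"
    return order
```

Because the duplicated line ran in every branch, it always ran, so moving it out of the conditional leaves behavior unchanged while making it clearer what the condition controls.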

33:54 I think anything that lets me delete lines of code is always very pleasing. So yeah,

33:58 yeah, yeah. and delete conditionals as well. All right, if possible, or simplify them. Yeah, I feel like the more the words legacy gets applied to a code base, the less you want to do those kinds of things. Like I'm pretty sure these things, these three things are the same. But I don't want to be responsible for what happens if I misunderstand that these are actually slightly different, and try to refactor my version. So as much as you can get software does you go Actually, no, this is totally safe. We got you,

34:23 especially, it's interesting, because one of the things that has happened is we've suggested refactorings to people and they've gone, this is incorrect. And it's been these examples of eliminating duplicate code and simplifying expressions. It's kind of, once you work through the logic... Yeah.

34:45 We've discovered that Sourcery is actually correct. And it turns out your code was either very confusing or possibly had a bug in it, which you have now identified. Right, right. You thought these two things were doing different stuff, but in fact, the effect of this is not what you had in your mind. Its actual effect is different. Right. Yeah, you misunderstood what was actually happening. And your mental model didn't match the refactoring result. But that's because the code wasn't actually doing that.

35:13 Exactly. And so actually, once it's done that refactoring, you can say, oh, actually, there is a bug in my code. I can fix it. But you have to have the trust in Sourcery to know that it's correct before you're willing to take that step. So it takes a little bit of usage to build up that trust.

35:33 So how do you guys check that your refactorings are valid?

35:37 So like we said, there's, we have a library of smaller refactorings. And then we have search engine that composes those together. So the important thing is making sure each of those individual refactorings in the library is correct, right,

35:52 right, because composing a bunch of things that are correct, is not going to break anything.

35:56 Exactly. Yeah. So the challenge is to try and make sure that those individual ones are correct. So we have lots and lots of tests. And those tests are of the form: here is a piece of source code, and here is the expected refactored source code. And for each refactoring, there's a multitude of those. But there's also a multitude of tests of the form: here's a piece of source code that looks similar to these other bits that you have refactored, but you shouldn't refactor it, because if you do make the change, you'll break the code. And so that's not allowed. Yeah, it's not a true refactoring. And it will tend to be things like, you're calling a function, and so you can't swap these statements, because one of them actually involves a global variable. So we have an awful lot of analysis which determines which statements in a function depend on the other statements. Yeah, sounds interesting. It turns out it's the hardest problem that we've tried to solve.

37:01 I can imagine. Have you guys looked at using things like Hypothesis or other property-based testing, where it's like, here's a block of code,

37:10 apply a refactoring to it, feed a bunch of inputs to both versions, and see that you get the same outputs, or things like that? That's a future plan that we have. We do have a second form of testing that we do at the moment, which is, as part of our build process, we run Sourcery over a whole bunch of popular open source libraries and refactor them. And then once they've been refactored, we run their own tests over them to check we haven't broken any of their code. Because people have already written a ton of tests for SQLAlchemy, for Requests, or whatever. Right, exactly. So, I mean, when we first introduced it, it identified lots of issues. And since then, it's stopped us releasing any new bugs, as far as we're aware.
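The idea of feeding the same inputs to both versions and comparing outputs can be sketched in a few lines. This is only an illustration: it uses the standard library's random module as a stand-in for a property-based testing library such as Hypothesis, and the two total functions and the behaves_the_same helper are hypothetical names invented here, not anything from Sourcery.

```python
import random


# Hypothetical "before" version: explicit loop with a conditional.
def total_before(prices):
    total = 0
    for price in prices:
        if price > 0:
            total += price
    return total


# Hypothetical "after" version: the same logic as a generator expression.
def total_after(prices):
    return sum(price for price in prices if price > 0)


def behaves_the_same(original, refactored, trials=200, seed=0):
    """Run both versions on the same random inputs and compare the outputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        prices = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        # Pass copies so neither function can affect the other's input.
        if original(list(prices)) != refactored(list(prices)):
            return False
    return True
```

A check like this can only show that two versions agree on the inputs that were tried; it does not prove the refactoring is correct in general.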

38:01 That's a pretty good way to just hit it with enough information that it's going to catch any issues. Have you found errors in other libraries because of this and gotten back to them? Like, you know what? We thought our stuff was broken, but actually your stuff is broken. I mean, that'd be cool.

38:19 I mean, sometimes the tests are failing on master. After a while, we just decided to pick a tag and stick with it, and be done with it. But actually, we found that there's just a lot of code to review. So we tend not to review it at the moment.

38:37 Yeah, it's not your job to check all open source libraries for correctness. Right?

38:42 Exactly. And, like, some of the libraries that we do run it on, like SQLAlchemy, are absolutely enormous. There are hundreds of files and multiple drivers for different database backends. So it takes a long time to assess. Yeah, yeah, we just rely on the tests to tell us whether we're good to release or not. Yeah, it's cool. But yeah, the Hypothesis-based testing is a very interesting idea that we have thoughts about. So the way we were considering doing it was exactly how you described: you write a piece of code, you put in some inputs, you record the outputs, and then you run Sourcery over it. And the way we're going to do it is to also write a generator for source code that takes an initial piece of code and applies random mutations to it. So here's a piece of code that Sourcery should refactor. Let's apply a bunch of random mutations to it, then run Sourcery over it and check that the inputs and outputs are the same again. So yeah, that generator would be quite interesting to write and is something that we're considering for the future. Okay.

39:56 Yeah, it sounds cool. It sounds like it would take forever to run, but it sounds like a cool project. Maybe you don't run it on every save. Nice. All right, well, let's see, there's a bunch of things I want to ask you about, but I don't want to go over too much in time. So I guess one area that looks interesting to me is, we've talked about this being a plugin for an editor that's interactive. You also talked about just applying it to open source libraries. And on the Sourcery homepage, I see that there's a, you know, get an instant quality report for your Python code base, like just point it at your repo, and it'll give you some answers. So is there more stuff that it does than just be a plugin, like a CLI, a way to use

40:37 it? So it is also available as a GitHub bot. You install the Sourcery bot into your GitHub repo. And every time you do a pull request, Sourcery will review that pull request. And if it finds any improvements to the code, or to any other files that have been touched by the pull request, then it will create a pull request on top of that saying, here are the changes that you can make to improve it. Yeah, then you can just merge that pull request in straight away.

41:10 when you first install it. Also, it can refactor the whole library kind of all at once. Very

41:15 cool.

41:16 All right. So let's talk about pricing. So this is something that is free for some people, but it's not free for some other people. What's the story with the whole business model and the open source side of things? What are you guys offering here? Because it sounds really useful to a lot of people. But at the same time, you are charging some folks for it. So that might influence people's opinion on how they feel about it.

41:39 Yeah, so the plugins are free. At the moment, we think in the future we'll probably introduce a premium version and still have a free version. I see. So if I'm sitting here and I want to write Python code on my MacBook, I can just go get it for free. I don't pay anything. Just go get it for free right now, not pay anything. Whether it's open source or closed source or anything. I could be working for a bank, even.

41:59 You could. We have had users from banks working on it, or using it. So yeah, right on. But there is some business model

42:05 where you guys charge money for something. So what is that side of things? Yes. So for the code review, it's free for open source again. If you want to use it on a closed source repository, there's a small charge per developer per month, basically. And that's something we've only just released in the last few days. All right, cool.

42:24 So basically, if I'm going to apply it to my code base as an autonomous bot type of thing, and I'm open source, it's 100% free, right? So if I was taking care of Requests, or SQLAlchemy, or Flask, or whatever, I could just plug it into the Flask repo on GitHub, and all of a sudden it would solve those problems of, like, did you seriously just give me a for loop that appends to a list? Right?

42:48 Yes, absolutely. So yeah, it's particularly useful for people maintaining large open source libraries, because they'll get a lot of pull requests, and they may come in at various standards. So it does the initial code review for the maintainer of the project. There's another way of trying it out, if you have a GitHub account. And that is simply to star our public repo. Our GitHub bot will find your most popular Python repository and send you a pull request to refactor your code base. So it's as simple as clicking a star and you'll get a pull request.

43:30 Oh, wow. Okay, cool. Yeah, I mean, it seems to me really useful to have it just built into GitHub, automatically looking over the code. Because I know you all have worked with different groups of people at different companies, with different languages. My experience has been that, in how much people care about code quality, refactoring, testing, maintainability, patterns, and all that kind of stuff, there's a massive spectrum on any given team. To some people, it really matters, and to others, those failing tests and that failing build are just a nuisance, and it's, how do I turn off the build so I don't have to hear about it again? Right. And so having it as part of the repo means it kind of applies to everyone, or at least it makes suggestions for everyone. Whereas if it's just in the editor, there are going to be the people who love it, and the people who are just like, how do I uninstall this or disable this? Because it's just, I wrote my code, and I don't want it, you know what I mean? It doesn't matter how much advocacy there is, there's going to be that. So having it be external is pretty cool.

44:33 Yeah, we started out doing it in the editor, because we thought that was the way to really make you write code faster and to kind of be Hackett about it. It does the change straight away. But like I was saying, that's definitely why we introduced the code review, because not everyone's at the same level, and that kind of brings things up to a level.

44:53 Yeah, so it gives you a benefit as a beginner programmer in the team, because you get those code reviews. But also, as an experienced developer, it saves you time during the code review, because there's already a tool doing the simple steps. It's not dealing with the architectural elements, but it's making sure each function is nicely written. Yeah, I

45:14 think that makes a lot of sense. Because it doesn't matter how good you are, you don't want to have to go think of the implications throughout the whole code base.

45:21 Absolutely. We're just

45:22 gonna look at it and say, the tool said it's good, so it's good.

45:26 Press merge.

45:27 Yeah, beautiful. But I do think it's really important in the editor, because it teaches you the idiomatic, the Pythonic ways of writing things. Like, I had no idea that I could create a for loop with enumerate that had tuple unpacking, instead of trying to do a for over a range and then pulling out the item, and things like that. So it seems like a really cool combination.
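For readers who have not seen the idiom mentioned here, a minimal sketch follows. The label functions are hypothetical names used only for illustration; enumerate itself is a Python built-in.

```python
# Looping over indexes with range(len(...)) and pulling the item out by hand.
def label_before(names):
    labels = []
    for i in range(len(names)):
        labels.append(f"{i}: {names[i]}")
    return labels


# enumerate yields (index, item) pairs, which tuple unpacking splits
# directly in the for statement.
def label_after(names):
    labels = []
    for i, name in enumerate(names):
        labels.append(f"{i}: {name}")
    return labels
```

Both versions build the same list; the second avoids the manual indexing step.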

45:49 Yeah, I think definitely that educational side is something I want to focus on more, like improving our documentation. So I wrote a blog post recently with a few little refactorings we're doing and why we think they're a good idea, as opposed to just what

45:58 we've done. Is it the one that is called Python Refactorings, Part 1? Yeah, yeah, I looked through that. So maybe you could give us a couple of refactorings out of there that you like? Code

46:07 hoisting is one of the best things, because anytime you've got duplicate code, you've got a way you can introduce mistakes really easily. So that would be like, maybe you have the same code in an if/else statement? Yes, and it's just duplicated. So often people will write a bit at the end that is the same thing in the if and the else, and in the elifs even, maybe. Because it happens in every branch, it means it always happens. So it doesn't actually need to be in the conditional at all. And if you take it out, exactly, just put it at the end, it also becomes more clear what the conditional's doing, what it's controlling, because it hasn't got this extraneous thing. And

46:39 another one you have in there is converting from a for loop, which does a yield, to a yield from that collection directly, which is pretty nice. I mean, it might even apply to code that was written long ago, before yield from was introduced to the language, but yield was there. And you could say, hey,

46:55 look, yeah, for sure. Naka, she got quite this old way can be gone. But if you come into there didn't realize you could do that. So it's like, a lot, a lot of long reading every kind of Pep. Right, really, and seeing everything I can do. changes my same to us.
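The for-loop-to-yield-from conversion discussed above looks roughly like this. The walk functions are hypothetical examples; the yield from syntax has been part of Python since version 3.3.

```python
# Older style: re-yield every item of each inner iterable by hand.
def walk_before(groups):
    for group in groups:
        for item in group:
            yield item


# With yield from, the generator delegates to the inner iterable directly.
def walk_after(groups):
    for group in groups:
        yield from group
```

For plain iteration like this the two generators produce the same sequence of items.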

47:10 I know, it's so bizarre. I'm sure there's stuff that I don't read as well. You know, if I had to pick a single most favorite, absolutely-love-it refactoring, it has to be converting a deeply nested set of code to something with guard clauses. So it's flat, right? Instead of going, if this is true, then if this is true, while this is true, if this is true, and you end up starting on column 40 to write your code, you negate them all and, like, return early or break out early or something. Yeah, it's just so much cleaner. So

47:46 avoiding

47:47 that case is out, that case is out, that case is out. Now I focus on the essence.
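A small sketch of the guard-clause refactoring described above. The discount functions and the shape of the user and cart arguments are assumptions made up for this example.

```python
# Deeply nested: the real work sits three levels in.
def discount_nested(user, cart):
    if user is not None:
        if user["active"]:
            if cart:
                return 0.1 * sum(cart)
    return 0.0


# Guard clauses: negate each condition and return early, so the
# essential logic ends up unindented at the bottom.
def discount_flat(user, cart):
    if user is None:
        return 0.0
    if not user["active"]:
        return 0.0
    if not cart:
        return 0.0
    return 0.1 * sum(cart)
```

Each guard rules one case out, so by the last line the reader only has to think about the case that matters.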

47:51 avoiding nesting is one of our core code metrics. One of the things I think we didn't touch on is how you get the computer to realize that there's a code smell. Writing good code is one thing, but how you get a computer to know is quite difficult. So there are, yes, metrics like cyclomatic complexity. What's that about? It's about avoiding conditionals, basically.

48:13 number of decisions. Yeah, how many? How many branches? Would you have potentially go down? Right?
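To make "number of decisions" concrete, here is a simplified approximation of cyclomatic complexity built on the standard library's ast module. It is a sketch only: the decision_count name and the exact set of node types counted are choices made for this example, and real tools (including, presumably, Sourcery) may count differently.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
_DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp,
              ast.ExceptHandler, ast.comprehension)


def decision_count(source):
    """Return 1 plus the number of branching constructs found in the source."""
    count = 1  # a single straight-line path exists even with no branches
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths
            count += len(node.values) - 1
    return count
```

Passing the source of a function with one if, one and, and one for loop would give a count of 4 under these rules, while straight-line code gives 1.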

48:18 There's kind of this enhanced version of it we've looked at, called cognitive complexity, which is trying to get an idea of how hard something is to hold in your head. And that really penalizes nesting,

48:27 how many variables are at play? How many other things like that as well, right? Probably,

48:32 yeah. So that penalizes nesting most of all, so that's how Sourcery knows that nesting is a bad idea. And then we've written metrics about

48:43 Oh, yeah, I'd never really thought about it that way. But that's exactly the problem. The reason it sucks so much is that the next test is piled on as an and, and, and, and this and that, on all the stuff that you've nested yourself into. You've got to think of all those at the same time while you're in here, because the

48:59 number of things you have to hold in your head, you know, a human can only hold six or seven things in their head at once. Yeah, maybe if you're exceptional, you can do eight. So like, some of our metrics are sort of focusing on how many variables you have to be thinking of, and how many conditionals you have to be thinking of, yeah, when you're sort of halfway down the function, it's gone off to the right somewhere,

49:17 So we actually call that the working memory metric. Yeah. Oh, cool. That specifically measures the number of variables that are in scope at the current time. So we think, yeah, if you're reading the code from top to bottom, by the time you've got to the 10th line of code, if you've got seven variables in your head, then you don't understand the function anymore. You can't keep all that in your head and understand the next

49:44 scroll, scroll back and forth, instead of just reading it. Yeah,

49:48 yeah, there's this really interesting saying from a friend of mine who talked about it once. When you write code, I guess, debugging code is harder than writing code. So if you write code at the very limit of what you're able to write, the most complicated stuff you can do, you probably can't debug it. Yeah. Because trying to think through it is actually more complex, and you've just pushed it over your limit. And so anytime you can dial that back a bit through refactorings, or other stuff, like, you know what, that should be three functions, then you won't have to think about

50:23 so harshly. And like the most three

50:25 things in this part

50:26 so much. But an interesting figure we found in a scientific paper that analyzed developers was that they spend 70%, or we spend 70%, of our time trying to understand the code, and only 5% of the time actually typing. Yes, that 70% of the time is what you really need to cut down on by making code more readable through refactoring.

50:44 And you have the move. That's kind of the whole Zen of Python. Right. I think that's why it's a popular language is because it's like clean to read. presents itself. Well, right. So don't undo that by writing bad code, I guess.

50:57 Definitely. Yeah.

50:59 All right. Well, I think this is probably a good place to leave it, you guys. It looks like a really cool project. If people are using PyCharm, or they're using VS Code, they could just go get the plugin and give it a try. Right? Yeah, for

51:10 sure. Just search for Sourcery in the marketplace of the IDE.

51:14 Yeah. Okay. So you go to the plugin marketplace in PyCharm, or you do the extensions in VS Code, and it'll just be

51:21 straight with you.

51:22 As in computer source, not as in Gandalf. Yeah.

51:27 Yeah. And also, if you go to our website, it has further instructions for installing both the plugins and using it on GitHub. And also, if people have an open source GitHub repo, they should just drop it in there, and it'll give them some ideas. Yeah, give it a try. Absolutely. And there are links to our documentation as well. Very cool.

51:49 All right. Now, before I let you out of here, I've got the two questions I always ask on the show. So let's be quick, since there's two of you. Brendan, how about you go first. If you're going to write some Python code, what editor do you use? I use Vim. I ended up with wrist injuries from refactoring code using Ctrl and Shift and the arrow keys too much, so I decided to learn Vim. I've ended up with that as well, a long time ago, and had to rejuggle a lot of stuff, like funky heavy keyboards and all sorts of things. And yeah, I try to use hotkeys rather than the mouse a lot.

52:23 Yeah, it was turning into a real issue. So I had to learn Vim, which slowed me down by about 10 times for 10 weeks. But now I feel as though it's, yeah, it's magic. And the nice thing is, it's awesome.

52:36 Nick, how about you write Python code? Well,

52:38 I use PyCharm. I'm a bit visually impaired, and their high contrast mode is just really, really good. I've dabbled in VS Code a bit. I really like how it starts up super quick, but it's a little difficult to see. So I've made the switch.

52:50 I can imagine that'll definitely push you over the edge. All right, then a notable PyPI package, maybe not something that everyone necessarily knows. But it's like, oh, cool, I found this the other day, and you should check it out. Any ideas, recommendations?

53:01 Ours is mostly one that we use, this package called Nuitka, spelled N-U-I-T-K-A. It takes your Python code, compiles it into C, and then compiles the C code and creates an executable. And without that package, Sourcery just wouldn't exist as a locally running project, because you'd have all sorts of deploy issues. We'd just be delivering all of our source code with the plugins and the extensions. So you're packaging it

53:37 up. You're packaging up Sourcery with Nuitka.

53:40 Yeah, exactly. Yeah. So interesting. Okay, yeah, that's fantastic. It builds in the version of Python that you're using. And it reads all the imports to work out which bits of the code it needs to compile, and it compiles the whole thing down. And it works on Mac, Windows, and Linux. It's awesome.

54:00 It's magnificent. Very cool. I had Kay Hayen from the project on for Episode 174, which is like a year and a half, two years ago, I don't know, quite a long while ago. But yeah, super cool. I didn't realize that it was so flexible at packaging up apps. I thought of it more as like Cython, like, this little bit we could make faster. So that's good to hear. Very nice. All right, final call to action. People are interested in Sourcery. They're interested in refactoring. What do you tell them?

54:28 Try it out now.

54:31 If you have a GitHub account, you can star our repo and try it out in five seconds. Or you can install it and get all of your pull requests refactored. Or if you're using VS Code or PyCharm, go and install it right now. Try it out, get your code refactored as you work, and let

54:49 us know I mean, we're really keen to get feedback from people and keep on making it better and better, basically,

54:54 awesome. Do you guys have like a GitHub repo? Or how should they give you feedback or say, You know my favorite refactoring is Whatever you guys don't do how did they make that happen?

55:03 Yeah, we've got the Sourcery repo, where you can raise issues, or just email us. We read and answer every email at hello@sourcery.ai, and our GitHub repo is sourcery-ai/sourcery. Cool. Okay, awesome. Well, Brendan, Nick, thank you both for being here and creating this cool project. Looks awesome.

55:22 Thanks very much. Thank you very much for having us, Michael. Yep, you bet. Bye bye. Bye. Bye. Bye.

55:28 This has been another episode of Talk Python To Me. Our guests in this episode were Brendan Maginnis and Nick Thapen, and it's been brought to you by Datadog and Linode. Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. They'll throw in a free T-shirt with your free trial. Start your next Python project on Linode's state-of-the-art cloud service. Just visit talkpython.fm/linode, that's L-I-N-O-D-E. You'll automatically get a $20 credit when you create a new account. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Get out there and write some Python code.
