Refactoring your code, like magic with Sourcery

Episode #266, published Fri, May 29, 2020, recorded Thu, May 21, 2020


Refactoring your code is a fundamental step on the path to professional and maintainable software. We rarely have the perfect picture of what we need to build when we start writing code, and attempts to overplan and overdesign software often lead to analysis paralysis rather than ideal outcomes.

Join me as I discuss refactoring with Brendan Maginnis and Nick Thapen as well as their tool, Sourcery, to automate refactoring in the popular Python editors.

Episode Deep Dive

Guests Introduction and Background

Nick Thapen and Brendan Maginnis are co-founders of Sourcery, an automated refactoring tool for Python. Both started their careers at the same finance software company, working on legacy systems and cutting their teeth on IBM RPG, Delphi, and later Java. Brendan went on to introduce Scala there and explored functional languages such as Clojure and Haskell. Both later joined Imperial College London, where machine learning work brought them to Python. This shared background in large, hard-to-change codebases shaped their interest in code quality and developer tools, and led them to build Sourcery to simplify and automate the often tricky process of refactoring Python code.

What to Know If You're New to Python

If you're just beginning with Python and want to follow along with the ideas of refactoring and code quality:

Understand basic Python syntax (loops, functions, and conditionals).
Familiarize yourself with abstract syntax trees (ASTs) conceptually, as Sourcery relies on AST-level transformations.
Recognize the concept of “Pythonic” code, which emphasizes readability and simplicity.
Have a basic editor or IDE (e.g., VS Code, PyCharm) set up with Python.

Key Points and Takeaways

Refactoring as an Ongoing Process: Refactoring isn’t a one-and-done event; it’s a continual process of improving existing code without changing its functionality. A major motivation behind Sourcery is to make refactoring proactive, suggesting improvements as soon as you write code.
- Links and tools:
  - Refactoring by Martin Fowler
  - Clean Code by Robert C. Martin
Sourcery’s Core Approach: Sourcery analyzes your Python code’s abstract syntax tree (AST) to detect patterns that can be improved and then suggests refactorings in real time. It goes beyond simple transformations and can chain multiple small fixes, resulting in significantly cleaner code.
- Links and tools:
  - Sourcery Plugin for PyCharm
  - Sourcery Extension for VS Code
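To make the AST idea concrete, here is a minimal sketch using Python's standard `ast` module. It is not Sourcery's implementation: the `find_append_loops` helper and the `SAMPLE` source are hypothetical, and the check only recognises the simplest case, a for-loop whose entire body is a single `.append()` call (a typical list-comprehension candidate).

```python
import ast
from typing import List

# Hypothetical input: a loop that accumulates into a list.
SAMPLE = """
result = []
for x in items:
    result.append(x * 2)
"""

def find_append_loops(source: str) -> List[int]:
    """Return line numbers of for-loops whose whole body is one `.append(...)` call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only consider plain for-loops with a single statement and no else-branch.
        if not isinstance(node, ast.For) or len(node.body) != 1 or node.orelse:
            continue
        stmt = node.body[0]
        if (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Attribute)
            and stmt.value.func.attr == "append"
        ):
            hits.append(node.lineno)
    return hits

print(find_append_loops(SAMPLE))  # [3]
```

A real tool would also need to confirm that the list is freshly created and not used in between, which this sketch deliberately leaves out.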
Plugin and GitHub Workflow: While Sourcery works as a local editor plugin, it also offers a GitHub integration. When you open a pull request, Sourcery can propose a follow-up PR with its suggested refactorings, so the whole team benefits from the same improvements.
- Links and tools:
  - GitHub Sourcery Bot
Teaching Pythonic Patterns: For developers new to Python, seeing Sourcery’s suggestions can teach Pythonic idioms like list comprehensions, enumerate(), early returns, and more. Developers who might otherwise stick to a Java or C style can learn idiomatic Python incrementally.
- Links and tools:
  - enumerate() documentation
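As an illustration of the kind of idiom being discussed, the sketch below contrasts a C/Java-style index loop with the enumerate() and list comprehension form. The function names and sample data are made up for this example and are not taken from Sourcery's output.

```python
names = ["ada", "grace", "linus"]

# C/Java-style: index-based loop and manual accumulation.
def label_c_style(items):
    labels = []
    for i in range(len(items)):
        labels.append(str(i) + ": " + items[i])
    return labels

# Idiomatic Python: enumerate() yields (index, item) pairs, tuple unpacking
# splits each pair into two names, and a list comprehension builds the result.
def label_pythonic(items):
    return [f"{i}: {item}" for i, item in enumerate(items)]

print(label_pythonic(names))  # ['0: ada', '1: grace', '2: linus']
```

Both functions return the same list; the second simply removes the bookkeeping a reader would otherwise have to follow.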
Strong Testing Practices: Sourcery’s creators emphasize tests and type hints (via mypy) to ensure that refactoring is safe. They also run Sourcery’s refactorings against real-world open-source libraries like SQLAlchemy and Requests to confirm no functionality is broken.
- Links and tools:
  - mypy
  - SQLAlchemy
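One way to picture "tests plus type hints make refactoring safe" is a small behavior check that pins the old and new versions of a function against each other. This is only a sketch under assumptions: the two `total_positive_*` functions and the test cases are invented here, and the annotations are the sort of thing a checker such as mypy could verify separately.

```python
from typing import List

def total_positive_before(values: List[int]) -> int:
    """Original version: explicit loop and accumulator."""
    total = 0
    for v in values:
        if v > 0:
            total = total + v
    return total

def total_positive_after(values: List[int]) -> int:
    """Refactored version: same behavior, expressed with sum() and a generator."""
    return sum(v for v in values if v > 0)

def check_same_behaviour() -> None:
    # The refactoring is only acceptable if every case gives identical results.
    cases = [[], [1, 2, 3], [-1, -2], [0, 5, -5, 10]]
    for case in cases:
        assert total_positive_before(case) == total_positive_after(case)

check_same_behaviour()
```

With a test runner such as pytest, the check would normally live in its own `test_` function rather than being called inline.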
Identifying “Code Smells”: The tool relies on metrics like cognitive complexity and duplicate code detection to identify code smells. This goes beyond just counting lines or branches, factoring in how many nested variables and conditionals a developer would have to track mentally at once.
- Links and tools:
  - Cognitive Complexity (SonarSource documentation)
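The intuition that nesting costs more than flat branching can be sketched with a toy score. This is not the SonarSource cognitive complexity definition and not Sourcery's metric: `nesting_score` is a hypothetical helper that charges each `if`, `for`, `while`, or `try` one point plus its nesting depth, and it ignores details such as `elif` chains and boolean operators.

```python
import ast

BRANCHING = (ast.If, ast.For, ast.While, ast.Try)

def nesting_score(source: str) -> int:
    """Toy score: each branching statement costs 1 plus its nesting depth."""
    def visit(node: ast.AST, depth: int) -> int:
        score = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, BRANCHING):
                score += 1 + depth
                score += visit(child, depth + 1)
            else:
                score += visit(child, depth)
        return score
    return visit(ast.parse(source), 0)

NESTED = """
def f(order):
    if order:
        if order.paid:
            for item in order.items:
                ship(item)
"""

FLAT = """
def f(order):
    if not order:
        return
    if not order.paid:
        return
    for item in order.items:
        ship(item)
"""

print(nesting_score(NESTED), nesting_score(FLAT))  # 6 3
```

Both snippets contain the same three branching statements, but the nested version scores twice as high because each level adds to what the reader must hold in mind.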
Guard Clauses vs. Deep Nesting: One of the most praised refactorings is converting deeply nested if-statements into simpler guard clauses that “return early.” This flattening makes the code more readable and reduces the mental overhead of tracking multiple nested conditions.
- Links and tools:
  - Martin Fowler on Refactoring Guard Clauses
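A before-and-after sketch of the guard clause refactoring follows. The shipping example, the dictionary keys, and the prices are all invented for illustration; the point is only that both functions return the same results while the second needs no nesting.

```python
def shipping_cost_nested(order):
    # Deeply nested version: the "happy path" is buried three levels down.
    if order is not None:
        if order["items"]:
            if order["country"] == "UK":
                return 5
            else:
                return 15
        else:
            return 0
    else:
        return 0

def shipping_cost_guarded(order):
    # Guard clauses: handle the edge cases first and return early.
    if order is None:
        return 0
    if not order["items"]:
        return 0
    if order["country"] == "UK":
        return 5
    return 15

print(shipping_cost_guarded({"items": ["book"], "country": "UK"}))  # 5
```

Each guard can be read and forgotten in turn, which is where the reduced mental overhead comes from.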
From For-Loops to Yield From: Sourcery can suggest advanced Python features like yield from to replace verbose for-loops that yield each item. This is especially useful for streaming or generator-based code.
- Links and tools:
  - PEP 380: Syntax for Delegating to a Subgenerator (“yield from”)
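The loop-to-`yield from` change can be sketched as below. The generator names and input data are hypothetical; for plain iteration the two generators produce the same items, although `yield from` additionally forwards `send()` and `throw()` to the inner generator as described in PEP 380.

```python
def flatten_verbose(batches):
    # Inner loop exists only to re-yield each item.
    for batch in batches:
        for item in batch:
            yield item

def flatten_delegating(batches):
    # `yield from` delegates to the inner iterable directly.
    for batch in batches:
        yield from batch

data = [[1, 2], [3], [], [4]]
print(list(flatten_delegating(data)))  # [1, 2, 3, 4]
```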
Automated vs. Manual Refactoring: While manual refactoring is possible, it’s riskier if you don’t have comprehensive test coverage. Automated tools like Sourcery reduce the potential for introducing bugs, especially in large or legacy codebases where developer familiarity is low.
- Links and tools:
  - Web Search: “Legacy Code Michael Feathers”
Packaging and Deployment with Nuitka: Sourcery uses Nuitka to compile Python to C, bundling everything into a standalone executable. This makes installing and distributing Sourcery as a local tool easier and sidesteps Python version mismatches.
- Links and tools:
  - Nuitka

Interesting Quotes and Stories

"No one really understands the domain when they first write the code. You have to write the code, find all the mistakes, and then tidy it up." – Brendan Maginnis

"We’re basically Grammarly for Python code. We improve the style and structure without changing the functionality." – Nick Thapen

"Sometimes you see a suggestion from Sourcery, and you think 'this must be wrong.' But once you look closer, it turns out your code was the one that needed fixing." – Brendan Maginnis

Key Definitions and Terms

Abstract Syntax Tree (AST): A tree representation of source code that tools like Sourcery use to analyze structure and semantics.
Code Smell: An indicator that something may be wrong in your code’s design, even if it works correctly.
Guard Clause: A short conditional check at the start of a function or block, returning early if certain conditions are not met, thereby reducing nesting.
Cognitive Complexity: A metric that measures how complicated a piece of code is to understand by counting nesting, conditionals, and repeated logic.

Learning Resources

Here are a few curated courses and references to deepen your Python skills and improve code quality:

Python for Absolute Beginners: A thorough, hands-on introduction to Python.
Write Pythonic Code Like a Seasoned Developer: Learn about idiomatic and "Pythonic" coding patterns.
Rock Solid Python with Python Typing: Dive into Python’s type hints and how they can make refactoring safer.
Getting Started with pytest: Boost your refactoring confidence with solid tests.

Overall Takeaway

Refactoring is a continuous journey, not just a cleanup you do once. Tools like Sourcery streamline this process by suggesting and automating Pythonic improvements without altering your code’s logic. By integrating best practices, strong tests, and a clear focus on readability, you can ensure your code evolves gracefully. Whether you're a beginner or an experienced developer, embracing refactoring leads to cleaner, more maintainable code, freeing up your time to focus on building great features rather than slogging through technical debt.

Links from the show

Guests

Brendan Maginnis: @brendan_m6s
Nick Thapen: @nthapen

Sourcery
Sourcery: sourcery.ai
Sourcery on Twitter: @sourceryai
VS Code and PyCharm Plugins: sourcery.ai/editor
GitHub Bot: sourcery.ai/github
For an instant demo ⭐ this repo, and Sourcery will refactor your most popular Python repo: github.com/sourcery-ai/sourcery

Python Refactorings article: sourcery.ai/blog

Nuitka
Talk Python episode: talkpython.fm
Nuitka site: github.com

Gilded Rose Kata: github.com
Episode #266 deep-dive: talkpython.fm/266
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Refactoring your code is a fundamental step on the path to professional and maintainable software.

00:04 We rarely have the perfect picture of what we need to build when we start writing code,

00:09 and attempts to overplan and overdesign software more often lead to analysis paralysis than to ideal outcomes.

00:16 Join me as I discuss refactoring with Brendan Maginnis and Nick Thapen,

00:21 as well as their tool, Sourcery, which adds automatic refactoring in the popular Python editors.

00:28 This is Talk Python To Me, episode 266, recorded May 21st, 2020.

00:32 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:52 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:56 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:02 This episode is sponsored by Datadog and Linode.

01:05 Please check out what they're offering during their segments. It really helps support the show.

01:09 Brendan, Nick, welcome to Talk Python To Me.

01:12 Thank you very much.

01:13 Thank you for having us.

01:14 Yeah, it's great to have you here.

01:16 I'm such a huge fan of refactoring and code quality and all these ways of taking living software and making it evolve, right?

01:26 I think long gone are the days of we have to plan this perfectly, and then we're going to build the perfect thing that we've thought up, right?

01:33 And so having this idea of continuously evolving and improving code, it just frees you from worrying about trying to get it all right and you can just get started.

01:40 And so I'm really excited to talk about Sourcery and refactoring with you guys.

01:43 Awesome. Yeah, we think iteration is super important as well.

01:45 Sort of trying to get a skeleton of the thing up and running and then sort of tidying up later.

01:51 Yeah, absolutely.

01:52 That's kind of how we like to work.

01:53 No one really understands the domain when they first write the code anyway.

01:57 You have to write the code, find out all the mistakes that you've made, and then tidy it up, clean it up.

02:03 Over time, like you say, it evolves into what looks to be a nice quality code base and solves the needs of the users.

02:13 Yeah, I think that's a super good point because understanding, you don't fully understand it until you've gotten mostly into it.

02:20 But if you tried to understand it all, it's just so much work.

02:23 Even if you get it right, it's so much work to get to that point that you might as well have just written it three times.

02:27 So that's awesome.

02:29 But before we jump into more of that, let's just get your stories.

02:33 And Nick, I guess we'll start with you first.

02:34 How did you both get into programming and into Python?

02:36 So I guess the first program I did was back in school on these graphics calculators we had back in the 90s.

02:42 And I remember reading a book about complex numbers and being super proud of getting my calculator to draw a fractal.

02:47 I think it took about 12 hours to draw it.

02:50 But I ended up with a Mandelbrot set on my little calculator.

02:54 Like the Mandelbrot set or something cool like that.

02:56 Could you zoom in?

02:57 I think you could zoom in, but it would take another 12 hours to show you.

03:00 You could only do it twice because then the battery would run out.

03:03 Pretty much.

03:04 I did a little bit of BASIC and stuff, but I never did programming in a serious way until joining a software company after university.

03:10 What did you study at university?

03:12 So it was maths and philosophy.

03:13 Okay.

03:13 That's a cool combination, actually.

03:15 Yeah.

03:15 And there's like a lot of logic, which I guess is quite key to programming.

03:19 And then the first language I learned in the software company was this mainframe language called IBM RPG.

03:24 I don't know if you've come across that one.

03:27 I could not look at RPG source code and tell you what it is.

03:30 Like I couldn't identify it from code.

03:32 Yeah, I've heard of it.

03:33 That's incredible.

03:33 It's sort of, it's like punch cards.

03:35 Everything has to be in the same, in the right column kind of thing.

03:38 Okay.

03:39 Because it was this finance company.

03:40 Brendan went there as well.

03:41 It sounds as useful, but I think I'll stick with Python.

03:43 Yeah.

03:43 I had this code base from the eighties.

03:46 Okay, cool.

03:47 And you had to program it in a green screen terminal.

03:48 So that didn't put me off.

03:50 Yeah.

03:50 If you go through that, you're definitely good for this industry.

03:52 If you can make it through that, Steve, you'll be good.

03:55 And then it was Delphi, which I also look back on as not being super amazing.

03:59 And then Java.

04:01 And then it was only when I joined Imperial and started doing sort of machine learning that I got into Python, I guess, mid 2010s.

04:08 Cool.

04:08 You're like, why does this language have semicolons?

04:10 What happened to it?

04:11 Yeah.

04:11 I mean, after Java, it just was like a breath of fresh air, really.

04:14 And when I had to go back to doing Java, it seemed so verbose.

04:17 Yeah.

04:17 You're like, why?

04:18 I can't read this.

04:18 There's symbols all over what it's trying to tell me.

04:21 Yeah.

04:21 Curly braces and parentheses and semicolons.

04:24 Yeah.

04:24 And the way they do the libraries and things, actually, it seems to be all done in a very verbose way.

04:28 Yeah.

04:28 Indeed.

04:29 Lots of boilerplate.

04:30 Nice.

04:30 Cool.

04:30 Well, Brendan, how about you?

04:31 The first programming I did was in the final year of my math degree.

04:36 So we had a project to do, which was to simulate various mathematical equations.

04:44 So the ones that I chose were the heat equation.

04:47 And it would simulate one wall being hot and another wall being cold and various obstacles between the two walls.

04:57 And then after it ran, it would show this chart of the temperature at different places in the room.

05:03 And the other thing I simulated was the Schrödinger equation, which is quantum mechanics.

05:10 Yeah.

05:10 Quantum mechanics is so interesting.

05:12 And it's just like, it seems like such a weird Twilight Zone alternative reality.

05:17 And yet it seems to be applying to real reality.

05:20 It's such a weird world.

05:21 I love quantum mechanics.

05:22 Yeah.

05:23 It's amazing how accurate the predictions are.

05:26 And yet when you try to understand it, it just has no similarity to reality.

05:32 Just making that up, aren't you?

05:33 There's no way this is real.

05:35 When you get so small, it doesn't match up with what you understand.

05:39 Yeah, for sure.

05:41 How can this object be in two places at the same time?

05:43 What does that even mean?

05:45 Exactly.

05:45 Yeah.

05:47 Well, the equation, I was just simulating a single object in the waveform of it.

05:52 So it was quite simple.

05:54 But the way they did it was, here's MinGW, go and program it in C, and then they just sent us off to go and do it.

06:03 And we knew nothing about how to program.

06:06 So my programs were just hundreds of lines of statements, absolutely no functions, nothing, no structure.

06:13 Yeah.

06:14 I think we probably all got-

06:15 Probably some memory leaks in there as well if it was written in C.

06:17 Oh, yeah.

06:18 The whole thing, yeah.

06:19 The whole thing was pretty terrible.

06:21 But you get to the end of the project and you simulate it correctly and you think, I can program now.

06:28 I know how to program.

06:30 Even if that's not entirely true, I think you do come away with this feeling of like, oh my gosh, look what I built.

06:36 Like, this is so awesome, even if it's not really.

06:39 It's just like that feeling of creating that thing is so cool in those days when you're getting started.

06:44 Absolutely.

06:45 I think that, yeah, the first time that I managed to output the heat equation onto the screen where you see the different temperature across the room,

06:55 they provided us with a charting library to use.

06:58 That was amazing.

06:59 And I was like, wow, I can program.

07:01 I can do anything.

07:03 I know computers now.

07:04 Yeah.

07:05 Yeah, yeah.

07:06 Super cool.

07:07 So, yeah.

07:08 After I finished university, I joined the same company as Nick.

07:12 I learned RPG and I learned Delphi.

07:15 I learned Java.

07:16 And then I achieved quite a senior role in the company.

07:21 I was part of the architecture team.

07:24 And I managed to introduce Scala to the company, which, yeah, we did a language comparison and it was between JVM languages.

07:32 So, one of the languages was actually Jython.

07:38 We didn't go for that or JRuby because they weren't first-class languages on the platform.

07:43 So, really, the main contenders were Java, Scala and Clojure.

07:49 And we had someone come in and talk to us about Scala, who was really, really amazing.

07:54 And he convinced us to use it.

07:58 So, yeah, I ended up leading the project to bring Scala into the company and re-architect a lot of the systems in that.

08:07 When I left that company, I was very much down the functional programming route.

08:12 So, I did some more Clojure, which is very functional.

08:16 I did some Haskell.

08:18 But then I joined Nick at Imperial College London.

08:22 And that's where he said he got into Python.

08:28 That's where I got into it as well.

08:30 So, I joined a reading group that was all about deep learning.

08:35 And I went to the reading group and we read a paper about the Differentiable Neural Computer.

08:42 And I had no understanding what it meant.

08:44 And I was like, OK, I'm going to go home and I'm going to implement it.

08:48 So, I went home and I cracked out some Python, cracked out TensorFlow.

08:53 And it took me weeks and weeks to implement it because I had no idea what was going on.

08:59 That was the start.

09:02 So, I just started implementing more and more of these papers as I went to more and more reading groups.

09:06 And it tended to be all of the Atari playing reinforcement learning algorithms, which is really fascinating to me.

09:16 OK, cool.

09:17 So, I learned how to play Breakout and learned how to play Pong.

09:20 Right.

09:21 Maybe some Pitfall in there.

09:23 I never tried it on Pitfall, actually.

09:25 That was a good one.

09:26 Nice.

09:28 I'm really fascinated with these ways to train AIs around video games.

09:33 I mean, I'm blanking on one of the options.

09:36 But there's a handful of libraries that you can kind of plug your AI into the virtual world.

09:42 So, it has somewhere to interact with and things to interact with.

09:46 And yeah, that's fascinating.

09:47 Yeah.

09:48 I really did find it fascinating.

09:50 That's why I was doing it every week.

09:52 And then just around that time, AlphaGo came out as well.

09:56 And it was like a shock to the world.

09:59 Suddenly, the hardest game that humanity can play is beaten by a computer.

10:05 Yeah.

10:05 And it's one of the first, it might have been the first real AI opponent that used like

10:12 strategizing rather than just deep exploration of the paths.

10:17 Right.

10:18 Like the chess one is like, well, I can hold 12 steps ahead in every direction in my mind.

10:23 That's more than the chess master.

10:26 So, right.

10:26 Like we'll just play them all out, all the possible futures out and then go down the best

10:31 one.

10:31 Right.

10:31 But that's not how AlphaGo worked, which is, I think, part of the magic.

10:34 Yeah, it's got intuition in a sense.

10:36 That's the interesting bit.

10:37 Yeah.

10:38 Yeah.

10:38 It was just fascinating the way they did that.

10:41 I ended up writing a version of AlphaGo to play Connect 4.

10:46 Okay.

10:46 Which is actually pretty strong.

10:49 That was a really fun project as well.

10:52 I mean, it can beat me.

10:53 You're a very strong player.

10:56 That's awesome.

10:56 Yeah.

10:57 I'm an average player out there.

10:59 Yeah.

11:00 Yeah.

11:00 How cool.

11:01 All right.

11:01 So you're both working on the same project yet again?

11:04 What are you doing day to day?

11:05 What are you doing now?

11:06 We're both working on Sourcery full time.

11:08 We kind of started working on it back at the end of 2018 from Brendan's flat.

11:12 So I'd turn up and he'd be still in his pajamas eating cereal.

11:16 Sit down and code away.

11:19 Yeah.

11:20 We're kind of totally focused on making Sourcery as good a refactoring tool as can be really.

11:26 Yeah.

11:26 Cool.

11:27 We'll get into what Sourcery is more later, but what's the quick elevator pitch before

11:30 we dive into just more general software stuff?

11:33 I guess we've been pitching it as kind of Grammarly for code.

11:36 And if you don't know what Grammarly is, it's like it improves the style of the code without

11:41 changing the sort of meaning or the content.

11:43 I guess that's what refactoring is.

11:44 You improve the quality and the structure without changing the functionality.

11:48 And that's what we aim to do.

11:50 So as you're writing it, we analyze it and we suggest refactoring improvements sort of

11:55 as you go.

11:55 So maybe I got a loop and I'm doing some kind of accumulation into a list and it could say,

12:01 you know what, that could just be a list comprehension.

12:03 Yeah, exactly.

12:04 Yeah.

12:04 Okay.

12:05 That sounds awesome.

12:06 It's easy to just get focused on writing code and then not really worrying about the quality.

12:12 It's also equally easy to get super obsessed with the quality and not just get the thing

12:19 done.

12:19 Right.

12:20 So I kind of see like these two bimodal distributions.

12:23 One's like, I don't give a crap.

12:24 I don't care as long as it works.

12:25 I'm just going to write, write, write.

12:26 Yeah.

12:27 And, you know, that's one group of people's philosophy.

12:30 The other is they're like really slow and super meticulous to get it just right because

12:36 they want to write it the best way.

12:38 And it sounds to me like the tool would let people kind of find a middle ground, right?

12:44 Like write it a little more loose and free.

12:47 But then it says, oh, by the way, that thing you just wrote, actually, we could make that

12:50 way better if you just let us.

12:51 Yeah, that's kind of the idea.

12:52 And that's some of the feedback we've been getting.

12:53 That people write code a bit more quickly without worrying so much about the quality and

12:57 it'll kind of tidy up for them.

12:59 Okay, cool.

13:00 And the other aspect of it is some people don't actually know what good quality code

13:07 is.

13:07 So if you're starting out in Python, you may be able to write the solution, but you won't

13:13 necessarily know how to write it well.

13:15 So you may not even know about list comprehensions yet.

13:19 Right, right, right.

13:20 And the benefit of using Sourcery in that case is it can teach you the Pythonic way of writing

13:29 code.

13:29 This portion of Talk Python To Me is brought to you by Datadog.

13:34 Are you having trouble visualizing bottlenecks and latency in your apps and you're not sure

13:38 where the issue is coming from or how to solve it?

13:40 With Datadog's end-to-end monitoring platform, you can use their customizable built-in dashboard

13:46 to collect metrics and visualize app performance in real time.

13:49 Datadog automatically correlates logs and traces at the level of individual requests, allowing

13:55 you to quickly troubleshoot your Python application.

13:57 Plus, their service map automatically plots the flow of requests across your application

14:02 architecture so you understand dependencies and can proactively monitor the performance

14:07 of your apps.

14:08 Be the hero that got your app back on track at your company.

14:12 Get started today with a free trial at talkpython.fm/datadog.

14:17 So maybe you don't necessarily know the idioms.

14:19 I think one of the challenges of Python in particular, I mean, all languages have this problem, but

14:25 Python suffers more than others from it.

14:28 One is that it's so easy to learn that people feel like they can learn it in a weekend and

14:34 then they just go write code in it, right?

14:36 Because like, look, it's a simple little language.

14:38 There's not a whole lot to it.

14:40 Which I think is actually not true, right?

14:42 I still feel like I'm learning Python every day.

14:44 I'm like, I didn't know, or I should have done this, right?

14:48 Like, there's all these things.

14:49 There's just, there's so much nuanced detail to it.

14:52 But it's easy for people to come from a language like C or Java or something else and just do

14:58 the Java style programming there or the C style programming there.

15:03 And not, like you said, not even know that there's this other component, right?

15:08 They say, oh, Python's crappy.

15:10 It doesn't have, you know, a for, a numerical for loop.

15:14 This is crummy, right?

15:15 When really a better way to do it would be to use enumerate on the collection where you get the

15:20 index and the item, right?

15:22 You don't have to go back.

15:22 If you didn't know, like that's multiple layers.

15:25 One, you have to know there's a for-in loop.

15:26 And then two, you have to know that enumerate is a thing.

15:29 And then you've got to know about tuple unpacking.

15:30 And like, that's a pretty complex set of topics.

15:34 If you've, well, I spent the weekend now, I kind of know, let's, let's finish this project.

15:38 Yeah.

15:38 If you took that philosophy.

15:39 Absolutely.

15:40 I mean, like you said, learning the Python language, it's so beautifully simple that

15:46 most people can pick it up in a week to a month's time.

15:50 But then you've got all of these idioms and they're not just the different bits of syntax

15:56 or the enumerate.

15:58 There's multiple libraries all over the place.

16:00 And they're the libraries that are built into the Python main library.

16:05 But there are also all of the libraries out there that you need to accomplish more complicated

16:11 tasks.

16:11 Yeah.

16:11 Just finding out what the best library is for the job is a difficult piece of research often.

16:17 It is.

16:17 Sometimes it's something built in, like itertools or something like that.

16:21 Or other times it's something you've never heard of because there's 200,000 options and

16:26 they all have their, their behaviors and whatnot.

16:28 But if you would grab it, that would take 20 lines down to one and probably be faster at the

16:33 same time.

16:34 It's like, it's incredible.

16:35 Right.

16:35 That's kind of what I was thinking.

16:36 I'm like, I'm never done learning Python because like, oh, there's this other standard

16:40 library module I discovered where I wasn't using Counter in this way or, you know, like

16:45 whatever it is, right?

16:45 There's just all these options.

16:46 Yeah.

16:46 We've definitely come across that.

16:47 Sorry.

16:48 Come on.

16:48 Yeah.

16:49 I can imagine.

16:50 I can imagine.

16:50 And coming from an academic space, I'm sure you see that a lot there as well.

16:54 Cause there's probably a lot of people who don't see themselves as a developer, but they're

17:01 still touching Python, writing code.

17:03 It's full of code you kind of write once and then don't, don't look at again.

17:07 So it's kind of, I guess done with a different aim in mind.

17:10 You're not, they're not so worried about reuse by other people often.

17:13 Right.

17:14 Right.

17:14 It's actually been quite a challenge for us because we learned Python together effectively

17:20 at the same time.

17:22 And we've never written Python in a large code base apart from our own.

17:29 So we've had to learn all of these things ourselves.

17:33 So over time, we've rewritten bits of the system where we found out, oh, we can use this,

17:39 this feature of Python that makes it so much better.

17:43 And at one point we integrated mypy into our code base and that was a big improvement.

17:51 And as you say, you slowly learn about these options.

17:53 And I think a lot of people out there who are learning Python probably fall into that bracket

17:58 as well.

17:58 They're learning it as their first language.

18:01 They're not learning it with people who can guide them how best to use it.

18:05 And it's really hard on your own.

18:07 I mean, we're experienced developers and I think we're finding it really hard ourselves.

18:12 Yeah.

18:12 To do it right and to take full advantage of it, it is.

18:15 So it's cool to have like extensions for the IDE that will sort of not quite be a pair

18:20 programming partner, but someone to sort of sit there.

18:22 But you know what, that actually is not the right way to do it, but it's super easy to fix

18:26 and I can take care of that for you, right?

18:28 Exactly.

18:28 Yeah.

18:29 That's exactly what we...

18:31 That's awesome.

18:31 So we talked about code quality and, you know, that's a little bit in the eye of the

18:37 beholder.

18:37 It's also a little bit in the trade-off.

18:38 Like Nick, you talked about this concept of I'm going to write a script and get an answer

18:42 and never run it again, right?

18:44 That has a different threshold for code quality than, you know, the core trading engine at a bank,

18:51 right?

18:52 Like it would probably be improper to put that much energy into that script that's going

18:59 to be run once, right?

19:00 You should just write the thing and get it to work and not worry too much about it.

19:03 But at the same time, if you're building something to be reused and is important, it's going

19:07 to be run by lots of people.

19:08 You really want to get it right.

19:10 And so I think there's this spectrum and people got to like figure out where they live on it.

19:14 But no matter where you live, you would like to have better code quality rather than less

19:19 good code quality by like whatever applies to your situation.

19:22 Right.

19:22 Oh, definitely.

19:23 And it is kind of hard to quantify what code quality is.

19:25 Sometimes, you know, when you see it, I guess it was reading the book Clean Code.

19:29 I think it's by Bob Martin.

19:30 Robert C. Martin.

19:30 Yeah.

19:31 Yeah.

19:31 It really crystallized it for me.

19:32 And I was always trying to get the graduates at the old company I worked at to read it.

19:36 And eventually one of them stole it.

19:37 So I think, I guess that means they liked it.

19:41 You won.

19:41 Yeah.

19:41 It worked.

19:43 You converted and they stole your book.

19:45 I guess the real core of it is just high quality code is easy to read and understand.

19:50 It just reads like a sort of story.

19:52 Does this and this and this.

19:53 Yeah.

19:53 I really love this idea of clean code and the stuff that Bob Martin talks about.

19:57 He's got some really good ideas.

19:59 I don't totally agree with everything he says, but I think there's a lot of good lessons

20:02 to take from what he's doing.

20:04 Yeah.

20:04 Sure.

20:05 Yeah.

20:05 One of the really interesting ideas that comes from one of his contemporaries, Martin Fowler,

20:10 way back at the origins of refactoring.

20:13 I remember reading the book called Refactoring in 1999 or something like that, just going,

20:20 my mind is blown, right?

20:22 I've had this problem of bad code quality and I've had this problem of trying to write it

20:27 well or to fix it.

20:28 And then I realized, you know, reading what he was talking about, like, oh, there's this

20:32 way to take the bad stuff you've already put down as like sediment in the software.

20:38 It's crystallized and like turn that into something that can be improved and grown over

20:42 time.

20:42 And I just, I remember it really changed my way of thinking about programming, like digging

20:47 into refactoring.

20:48 So I'm just such a huge fan.

20:49 How do you guys come across it?

20:51 It's an interesting one because the code base we worked on our old company was so huge and

20:56 difficult to change.

20:57 Often we didn't even try refactoring it.

20:59 It's just kind of, you did a surgical approach.

21:01 You went in and tried to understand it and made the smallest changes you could.

21:04 Please don't break.

21:05 Just take the new feature without breaking.

21:07 Or I guess, Brendan, you actually sort of took the other approach and said like, okay, I'll

21:12 just rewrite this whole bit.

21:13 Yeah.

21:15 Well, I mean, it was a risky approach given that the system was not under test at all.

21:21 So I would only do that on front end components.

21:25 So the UI, but yeah, sometimes it was, it was just the case of, I don't understand what's

21:31 going on here.

21:32 I can see what the functionality currently is.

21:34 So I'm just going to re-implement it from scratch.

21:37 You know, if people are in that space, like the whole area of what Michael Feathers talked

21:42 about with legacy code and like how to take these, these huge systems that are hard to

21:48 change that you don't necessarily know.

21:49 They don't have tests and how to like break off little bits that are maintainable.

21:53 That's such a cool book working effectively with legacy code.

21:56 I really do enjoy sort of tinkering with code, tracking down bugs, sort of improving it, making

22:01 it a little better.

22:01 I think when I see a blank sheet of paper or blank screen, I kind of sometimes find it difficult

22:06 to start.

22:06 I think Brendan's a bit better at that.

22:08 So I, yeah, I super enjoy refactoring.

22:10 And I guess it's one of the things we're trying to kind of achieve with Sourcery is sort of if

22:16 there's a machine that can do the refactoring for you, you can be less worried about it

22:20 being under test because, you know, it's done proper analysis.

22:23 Because whenever I do refactoring, I break something.

22:26 And I would lean heavily on the tests.

22:29 It almost never seems like a good idea to me to do a refactoring manually

22:33 if there's some sort of tool-based way in which it will happen, right?

22:39 It's just, you never know what little thing you're going to, you're like, oh, there's that

22:42 one, that cron job thing that we had that was also using that.

22:46 Now, apparently it's not going to take it anymore.

22:48 And with, with Python, you don't have compiling, right?

22:50 So you're not going to catch the obvious stuff.

22:52 Like I move this function over here.

22:53 It's just like, well, don't run that part.

22:55 It's going to crash.

22:56 That is one of the challenges with Python.

22:58 And I mean, the way we've approached it with our Sourcery code base is testing and

23:07 mypy type annotations.

23:09 Yeah.

23:09 I think they give enormous confidence when you're refactoring the code.

23:13 So nowadays I don't do crazy refactors, crazy rewrites like that.

23:18 I incrementally improve through small changes.

23:23 But yeah, I've experimented with that once and realized it's not the way to go.

23:28 But yeah, the real key is having those tests and the type annotations.

23:35 You can move something anywhere in the system and you'll get told about all of the errors

23:41 that you now have.

23:42 And then you can go and fix all of those.

23:44 And then you can do the next refactoring and build it up through there.

23:49 Most of our bugs are in the bits we hadn't added mypy to.

23:53 So I totally expected to hear automation from you guys and applying Sourcery back unto itself

24:00 and things like this.

24:01 And we'll dig into the features in a second.

24:03 But I didn't expect to hear typing and mypy.

24:06 I'm personally a huge fan of the type annotations in Python.

24:09 I think they make working with Python code so much easier.

24:13 You don't annotate everything, but certain places like this function returns one of these,

24:20 just knowing like actually it expects to return one of these.

24:22 It's super helpful and it'll light up the editors as well, right?

24:26 They can all of a sudden give you autocomplete for what they weren't sure before.

24:29 But now they know, oh, here's the five things you can do with what you got back.

24:32 Perfect.

24:33 How does mypy fit into your world?

24:36 I think that's pretty interesting.

24:37 What are you doing with that?

24:38 We really only use it internally because most code out there doesn't actually have mypy

24:43 or type annotations.

24:45 Right.

24:45 Even if it has some type annotations, there's like sort of a chain of annotations that have

24:51 to be consistent.

24:52 Like mypy is a stronger level than just saying, oh, this function happens to return a list.

24:56 Yeah.

24:56 I mean, one of the great things about type annotations is you can just look at a function

25:02 at the definition of it and understand the interface.

25:06 You don't need to read the code.

25:08 Without those type annotations, you have to actually read the code and say, oh, actually

25:13 this is a string and this is an integer and it returns a list of integers or something

25:19 like that.

25:19 Yeah.

25:20 With the type annotations.

25:21 Yeah.

25:21 That's a really good point.

25:22 I like that.

25:23 So I think type annotations are a form of documentation.

25:27 They're really, really powerful just from a readability point of view.

25:32 But then you get all of the security as well of knowing when you've broken the code or when

25:37 you've not called something correctly.
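
To make the point about annotations concrete, here is a minimal sketch. The function `parse_ids` is hypothetical and not from the episode; it only illustrates how a signature can document the interface and how a checker such as mypy could be expected to flag a bad call without running the code.

```python
from typing import List


def parse_ids(raw: str, sep: str = ",") -> List[int]:
    """Split a delimited string and convert each non-empty piece to an int."""
    return [int(piece) for piece in raw.split(sep) if piece.strip()]


ids = parse_ids("3, 14, 15")
print(ids)

# A type checker such as mypy would be expected to reject the call below,
# because an int is passed where the signature asks for a str:
# parse_ids(42)
```

Reading only the `def` line tells you the argument is a string and the result is a list of integers, which is the "you don't need to read the code" benefit described above.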

25:39 One of the things that is really important to us with Sourcery is that we never break other

25:45 people's code.

25:46 So we have to have extremely strong gar...

25:50 Yeah.

25:50 So let's take a step back and why don't you tell people about what Sourcery is?

25:54 How do people use it?

25:55 So you mentioned that it's a plugin for IDEs, but give us a little bit more detail and then

25:59 we can talk about how you keep from breaking people's code, which is probably...

26:03 People probably appreciate that.

26:04 If you...

26:06 It acts as a plugin to your IDE.

26:09 So we've got plugins for VS Code and PyCharm.

26:13 And as you're coding away, it sits there reading your code and analyzing it.

26:18 And if it identifies a change to a function that you're working on that will improve the

26:26 code quality, it'll suggest it to you.

26:28 And you can review that suggestion.

26:30 And if you like it, you can accept it.

26:33 And it'll apply that change in line.

26:35 And you carry on coding it.

26:39 It works almost seamlessly in your workflow.

26:41 Okay.

26:42 That sounds awesome.

26:43 Does it change the way the editors work in other ways?

26:46 Like, for example, does it change the autocomplete or things like that?

26:51 Or is it really more like the code intentions, like the little light bulb in PyCharm?

26:56 Yeah, it's kind of exactly like a code intention in PyCharm.

26:58 That's kind of the thing we've gone with.

27:00 Yeah.

27:01 So yeah, it does a little underline.

27:02 Cool.

27:02 So as you're going along, you're watching...

27:04 Oh, there's a little pop-up.

27:05 I should go see what this is about.

27:07 Yeah, yeah, exactly.

27:08 And it runs locally on your machine.

27:09 I guess that was quite a sort of concern for people.

27:11 They didn't want their code being sent to the cloud.

27:13 I can't imagine why.

27:14 So when we first started it, we were going to do it kind of as a service in the cloud.

27:20 And we kind of had to do a pivot and get it running locally on the machine.

27:23 Okay.

27:24 So somehow behind this, how does it make decisions about refactorings?

27:28 Is it like an AI-based thing?

27:32 Is it pattern matching?

27:32 Like, what is it doing inside?

27:34 So at the moment, there's no machine learning or AI in it.

27:39 The way it works is it is essentially pattern matching.

27:43 So it's looking for...

27:44 Well, it works at the level of the abstract syntax tree.

27:48 So it takes the code and it parses it into a data structure.

27:53 And that data structure will have, for instance, an if node.

27:58 And within that if node, it will have a function call.

28:03 And the if node will also have a list of statements.

28:07 It looks at those nodes and it looks for the patterns, like you say.

28:12 So, for instance, it might look for a for loop that has an if statement within it that appends to a list.

28:20 And then it says, okay.

28:22 Like, that's a list comprehension waiting to be made right there.

28:25 Yeah.

28:26 Exactly.
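
The kind of AST pattern described here can be sketched with Python's standard `ast` module. This is only an illustration under simplifying assumptions, not Sourcery's implementation: the helper names `is_append_loop` and `find_append_loops` are made up for this sketch, and a real tool would also have to check things like how the list was initialized.

```python
import ast

SOURCE = '''
def evens(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n)
    return result
'''


def is_append_loop(node: ast.AST) -> bool:
    """True for a for loop whose only statement is an if that only appends."""
    if not isinstance(node, ast.For) or node.orelse or len(node.body) != 1:
        return False
    inner = node.body[0]
    if not isinstance(inner, ast.If) or inner.orelse or len(inner.body) != 1:
        return False
    stmt = inner.body[0]
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Attribute)
        and stmt.value.func.attr == "append"
    )


def find_append_loops(source: str) -> list:
    """Return the line numbers of for loops matching the pattern."""
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree) if is_append_loop(node)]


print(find_append_loops(SOURCE))
```

The loop it finds in `evens` is the classic candidate for `result = [n for n in numbers if n % 2 == 0]`.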

28:26 And I guess the clever bit is it kind of has these little, lots of little tiny little patterns of improvements it can do.

28:33 But it can compose those together into like a bigger refactoring.

28:36 And it's guided by a load of code metrics we've kind of incorporated into it.

28:40 So we can get into those later.

28:41 Like cyclomatic complexity and some of those types of things?

28:45 Yeah.

28:46 So we don't use cyclomatic.

28:48 We do use cognitive complexity, which I think is like a trademark of SonarQube or something.

28:52 But it's a different metric.

28:54 And we use a few we've written ourselves.

28:56 And so it can kind of chain together little refactorings to do something bigger.

29:00 So, for example, on the, I don't know if you've seen the Gilded Rose refactoring kata.

29:06 No, tell people about it.

29:07 So it's kind of this big fantasy.

29:11 Wait, let's take a step back.

29:12 What's a kata?

29:13 So a coding kata is a coding exercise to improve your programming.

29:17 Right, right.

29:18 And there are various ones of these floating around the internet.

29:20 And Gilded Rose is like this big, complicated set of nested ifs, basically.

29:24 And it's sort of about this fantasy inn, I think.

29:28 And it takes maybe an hour to kind of manually sort the code out and refactor it.

29:33 And it's sort of an exercise people do.

29:34 And our aim when we started Sourcery was to, this was like our initial target problem.

29:40 So it can kind of do all that work at once by chaining together lots of little refactorings.

29:45 So it can take the sort of complex mess of spaghetti code and then turn it into something understandable.

29:51 Right.

29:52 Instead of having to say, okay, here's a little if statement that could be improved and then apply it again and say, well, now that we have this code, there's another thing we can improve, then apply it again.

30:02 It'll like chain those all together and go, actually, we could roll this all up.

30:05 Exactly.

30:06 Yeah.

30:06 Because when you're doing manual refactoring, that's kind of what often happens.

30:08 You sort of do a little thing and then you realize, oh, now I can do this.

30:11 And you might have an aim in mind or you might not.

30:14 And then you start chaining these things together.

30:16 And in the end, you're like, oh, now it's an understandable code base.

30:19 Yeah.

30:20 Very cool.

30:21 Well, that sounds super, super useful.

30:23 I know that some refactorings are built into certain tools, like PyCharm has certain refactorings, but they don't seem to take this more holistic approach.

30:31 Right.

30:32 They're like, oh, this list, this list comprehension could be expanded to a for loop if you need it or something like that.

30:37 But that's kind of as far as it goes.

30:38 Yeah.

30:39 And they're kind of very developer driven.

30:40 You have to know you want to do them.

30:41 Yeah.

30:42 And you have to know where you can do them and then you have to do them.

30:44 So they're very useful if you know you want to do something because it'll do it for you.

30:47 And like, it won't make mistakes, but they don't sort of, they don't tell you if it's a good idea or not.

30:52 So our idea is we're kind of suggesting things that we think are good ideas to actually change.

31:56 Let me share one of my favorite concepts from refactoring and then ask you about some of your favorite refactorings.

32:02 So there's all these different refactorings, even in the early days that Martin Fowler talked about, like, okay, there's a God object and here's how you break it down.

32:10 Or there's a function that's too long.

32:12 Here's what you do and so on.

32:13 But those are all kind of interesting.

32:15 Like, the most interesting concept around all that stuff to me was the concept of a code smell.

32:20 Right?

32:20 Just like, there's something wrong with this.

32:24 Like, it works, but your nose kind of turns up when you look at it.

32:28 You're like, there's something wrong with this part of my code or code probably I inherited from somebody else.

32:33 Right?

32:34 And the other thing was he would talk about comments and say, often comments have value, but a lot of times they're really just deodorant for these code smells.

32:45 Like, this is really hard to understand because it's written badly.

32:47 So let me write a comment that tells people what it really means.

32:51 Yeah.

32:51 And then just leave the bad stuff there.

32:53 Right?

32:53 It kind of deodorizes the code smell a little bit.

32:55 And that's this idea of, like, if you have those comments, it's like this underlying thing of, like, you should start thinking of applying these different refactorings.

33:03 So my question to you with, like, sort of putting that out there is, what are some of the favorite refactorings that you guys are seeing possible with, like, this deeper integration, right?

33:14 Like, obviously, for loop to list comprehension, list comprehension to for loop.

33:18 Like, those are pretty straightforward, but it sounds like there might be either things you just really love or there might be, like, some more interesting, larger refactorings.

33:26 The main code smell that I think Sourcery does really well with is eliminating duplicate code within a function.

33:34 And in particular, within different branches of a complicated set of if expressions.

33:41 So you may have the same body of code in two different places.

33:46 And Sourcery can restructure the code until there's just a single condition that applies for that block of code.

33:54 That's cool.
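
As a small illustration of the duplicate-branch case described here (the function names and the shipping scenario are invented for this sketch), two branches with the same body can be folded under a single condition:

```python
def shipping_before(is_member: bool, has_coupon: bool) -> str:
    # The same body appears in two different branches.
    if is_member:
        return "free"
    elif has_coupon:
        return "free"
    else:
        return "standard"


def shipping_after(is_member: bool, has_coupon: bool) -> str:
    # One condition now guards the single copy of that body.
    if is_member or has_coupon:
        return "free"
    return "standard"


for member in (False, True):
    for coupon in (False, True):
        print(member, coupon, shipping_after(member, coupon))
```

With only two boolean inputs it is easy to check every combination by hand; in real code the branches are rarely this obvious, which is where tool support helps.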

33:54 Yeah, I think anything that lets me delete lines of code is always very pleasing.

33:57 So, yeah.

33:58 Yeah, yeah.

33:59 And delete conditionals as well, right, if possible, or simplify them.

34:03 You know, I feel like the more the word legacy gets applied to a code base, the less you want to do those kinds of things.

34:08 You're like, I'm pretty sure these three things are the same, but I don't want to be responsible for what happens if I misunderstand that these are actually slightly different.

34:17 And try to refactor it to a cleaner version.

34:19 So as much as you can get software that goes, actually, no, this is totally safe.

34:22 We got you.

34:23 It's actually quite interesting because one of the things that has happened is we've suggested refactorings to people and they've gone, this is incorrect.

34:35 And there's been these examples of eliminating duplicate code and simplifying if expressions.

34:42 It's kind of once you work through the logic.

34:43 Yeah.

34:44 We've discovered actually Sourcery is correct here.

34:49 And it turns out your code was either very confusing or possibly had a bug in it, which you have now identified.

34:56 Right, right.

34:57 You thought these two things were doing different stuff, but in fact, it has no...

35:00 The effect of this is not what you had in your mind its actual effect being, right?

35:05 Yeah.

35:05 You misunderstood what was actually happening and your mental model didn't match the refactoring result, but that's because it wasn't actually doing that.

35:13 Exactly.

35:14 And so actually, once it's done that refactoring, you can say, oh, actually, there is a bug in my code.

35:20 I can fix it.

35:22 But you have to have the trust in Sourcery to know that it's correct before you're willing to take that step.

35:30 So it takes a little bit of usage to build up that trust.

35:33 So how do you guys check that your refactorings are valid?

35:37 So like we said, we have a library of smaller refactorings, and then we have a search engine that composes those together.

35:47 So the important thing is making sure each of those individual refactorings in the library is correct.

35:53 Right, right.

35:53 Because composing a bunch of things that are correct is not going to break anything.

35:56 Exactly.

35:57 Yeah.

35:58 So the challenge is to try and make sure that those individual ones are correct.

36:04 So we have lots and lots of tests, and those tests are of the form.

36:09 Here is a piece of source code, and here is the expected refactored source code.

36:15 And for each refactoring, there's a multitude of those.

36:18 But there's also a multitude of tests of the form.

36:22 Here's a piece of source code that looks similar to these other bits that you have refactored,

36:27 but you shouldn't refactor it because if you do refactor it, or if you do make the change, you'll break the code.

36:34 It's got some gotcha in there, yeah.

36:37 It's not a true refactoring.

36:38 And it will tend to be things like you're calling a function, and so you can't swap these statements

36:45 because one of them is actually a global variable.

36:48 So we have an awful lot of analysis, which determines what statements in a function depend on the other statements.
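
A tiny example of why this dependency analysis matters. The names below are hypothetical; the point is only that two statements which look independent cannot be swapped when one of them calls a function that changes module-level state the other one reads.

```python
log = []


def next_id() -> int:
    """Appends to the module-level `log`, so calling it has a side effect."""
    log.append("called")
    return len(log)


def original() -> tuple:
    log.clear()
    seen_before = len(log)  # reads the global
    new_id = next_id()      # writes the global
    return seen_before, new_id


def swapped() -> tuple:
    log.clear()
    new_id = next_id()      # the write now happens first
    seen_before = len(log)  # so this read sees a different value
    return seen_before, new_id


print(original())
print(swapped())
```

The two functions return different tuples, so reordering those statements is a behavior change rather than a refactoring.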

36:57 Yeah, it sounds interesting.

36:58 It turns out to be the hardest problem that we've tried to solve.

37:01 I can imagine.

37:02 Have you guys looked at using things like hypothesis or other property-based testing,

37:06 where it's like, here's a block of code, apply refactoring to it, feed it a bunch of inputs to both versions, and see if as long as you get the same outputs,

37:16 or things like that?
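
The idea in this question can be sketched without any third-party library. This is a dependency-free stand-in that uses `random` rather than Hypothesis itself, and the functions `total_before`, `total_after`, and `behaves_the_same` are invented for illustration: feed the same generated inputs to the original and the refactored version and compare the outputs.

```python
import random


def total_before(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total


def total_after(values):
    # Candidate refactoring of total_before.
    return sum(v for v in values if v > 0)


def behaves_the_same(f, g, trials=200, seed=0):
    """Compare f and g on randomly generated lists of small integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        if f(values) != g(values):
            return False
    return True


print(behaves_the_same(total_before, total_after))
```

Passing such a check is evidence rather than proof of equivalence; a property-based testing library would typically add smarter input generation and shrinking of failing cases.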

37:17 That's a future plan that we have.

37:20 We do have a second form of testing that we do at the moment, which is, as part of our build process,

37:26 we run Sourcery over a whole bunch of popular open source libraries and refactor them.

37:33 And then once they've been refactored, we run their tests over themselves

37:39 to check that we haven't broken any of their code.

37:42 Because people have already written a ton of tests for SQLAlchemy, for requests or whatever, right?

37:48 Exactly.

37:49 So, I mean, that has identified, I mean, when we first introduced it, identified lots of issues.

37:55 And since then, it's stopped us releasing any new bugs as far as we're aware.

38:02 That sounds like a pretty good way to just hit it with enough information that it's going to get caught in the issues.

38:07 Have you found errors in other libraries because of this and gotten back to them?

38:11 Like, you know what?

38:12 This is actually, we thought our stuff was broken, but actually your stuff is broken.

38:17 I mean, that'd be cool.

38:19 I mean, sometimes the tests are failing in master.

38:22 So after a while, we just decided to pick a tag and stick with it and be done with it.

38:28 But actually, we found that there's just a lot of bad code to review.

38:33 So we tend not to review it at the moment.

38:37 Yeah, it's not your job to check all open source libraries for correctness, right?

38:42 Exactly.

38:43 And some of the libraries like SQLAlchemy that we do run it on are absolutely enormous.

38:48 There's hundreds of files and multiple drivers for different database backends.

38:54 So it takes a long time to run those tests.

38:57 Yeah.

38:58 We just rely on the tests to tell us whether we're good to release or not.

39:01 Yeah, that's cool.

39:02 But yeah, the hypothesis-based testing is a very interesting idea that we have talked about.

39:09 So the way we were considering doing it was exactly how you talked about.

39:14 You write a piece of code and then you put some inputs and you check what the output is.

39:21 And then you run Sourcery over it.

39:23 And the way we were going to do it is actually also write a generator for source code that takes

39:29 maybe an initial piece of code and does random mutations to it to start off with.

39:35 So here's a piece of code that Sourcery should refactor.

39:38 Let's apply a bunch of random mutations to it and then run Sourcery over it.

39:46 Check the inputs and outputs are the same again.

39:48 So yeah, the generator for that would have been quite interesting to write.

39:53 And it is something that we're considering in the future.

39:56 Okay.

39:56 Yeah, it sounds cool.

39:57 It sounds like it would take forever to run, but it sounds like a cool project.

40:00 Maybe don't run it on every save.

40:03 All right.

40:04 Well, let's see.

40:06 There's a bunch of things I want to ask you about, but I don't want to go over too much

40:10 on time.

40:10 So I guess one area that looks interesting to me is we've talked about this being a plugin

40:16 for an editor that's interactive.

40:17 You also talk about just applying it to open source libraries.

40:21 And on the Sourcery homepage, I see that there's a get instant quality of your Python code base,

40:28 like just point it at your repo and it'll give you some answers.

40:31 So is there more stuff that it does than just be a plugin?

40:34 Yeah.

40:35 Or is there like a CLI way to use it?

40:38 So it is also available as a GitHub bot.

40:41 So you install the Sourcery bot into your GitHub repo.

40:48 And every time you do a pull request, Sourcery will review that pull request.

40:53 And if it finds any improvements to any of the files that have been touched by the pull request,

41:00 then it will create a pull request on top of that saying, here's the changes that you can make.

41:05 An improvement to it.

41:07 Yeah.

41:07 Then you can just merge that pull request in straight away.

41:10 When you first install it, also it can refactor the whole library kind of all at once.

41:15 Very cool.

41:16 All right.

41:17 So let's talk about pricing.

41:19 So this is something that is free for some people, but it's not free for some other people.

41:24 What's the story with the whole business model, open source side of things?

41:28 Like what is, what are you guys offering here?

41:30 Because it sounds really useful to a lot of people, but at the same time, you are charging some folks for it.

41:36 So that might, you know, that might influence people's opinion on how they feel about it.

41:40 Yeah.

41:40 So the plugins are free at the moment.

41:43 We think in the future, we'll probably introduce a premium version and still have a free version.

41:46 I see.

41:47 So if I'm sitting here and I want to write code on my MacBook on PyCharm, I can just go get it for free.

41:52 I don't have to pay anything.

41:53 Yeah.

41:53 You can just go get it for free right now and not pay anything.

41:55 And that's whether you're open source or closed source or anything.

41:58 I could be working for a bank even, huh?

41:59 You could.

42:00 We have had users from banks working on it or using it.

42:03 So yeah.

42:04 Right on.

42:04 But there is some business model where you guys charge money for something.

42:08 So what are you, what is that side of the things?

42:10 Yes.

42:10 So for the code review, it's free for open source again.

42:13 But if you want to use it on a closed source repository, there's a small charge per developer

42:18 per month, basically.

42:19 And that's something we've only just released in the last few days, basically.

42:23 All right.

42:24 Cool.

42:24 So basically, if I'm going to apply it to my code base as an autonomous bot type of thing

42:30 and I'm open source, it's 100% free.

42:33 Right.

42:33 So if I was taking care of requests or SQLAlchemy or Flask or whatever, I could just plug it

42:38 into the Flask repo on GitHub.

42:40 And all of a sudden it would solve those problems of like, you didn't seriously just give me

42:45 a for loop that appends to a list, did you?

42:46 Right.

42:48 Yeah, exactly.

42:49 So yeah.

42:50 Particularly useful for people maintaining large open source libraries because they'll get a

42:57 lot of pull requests and they may come in at various standards.

43:00 So it does the initial code review for the maintainer of the project.

43:05 There's another way of trying it out.

43:07 If you have a GitHub account and that is simply to star our public repo and the GitHub bot,

43:16 our GitHub bot will find your most popular Python repository and send you a pull request

43:23 to refactor your code base.

43:26 So it's as simple as click a star and you'll get a pull request.

43:30 Oh wow.

43:30 Okay, cool.

43:31 Yeah.

43:32 I mean, it seems to me really useful to have it just built into GitHub automatically looking

43:38 over the code because I don't know.

43:40 You all have worked with different groups of people at different companies, different languages.

43:43 My experience has been that people that care about code quality and refactoring and testing

43:49 and maintainability and patterns and all that kind of stuff.

43:53 There's a massive spectrum on any given team.

43:57 Some people, it really matters to them and others.

44:01 Those failing tests and that failing build is just a nuisance.

44:04 And how do I turn off the build so I don't have to hear about it again?

44:07 Right?

44:07 And so having it as part of the repo means that it kind of applies to everyone.

44:15 At least it suggests for everyone.

44:17 Whereas if it's just in the editor, there's going to be the people who love it and the people who just

44:22 like, how do I uninstall this or disable this?

44:24 So it doesn't, because it's just, I wrote my code and I don't want it to, you know what I mean?

44:28 Like there's just, it doesn't matter how much advocacy there is.

44:31 There's going to be that.

44:31 And so having that kind of external is pretty cool.

44:33 Yeah.

44:34 Like we started doing in the editor because we thought that was kind of the way to really

44:37 make you write code faster and kind of you hack it a bit and it straight away does the change.

44:42 But like you're saying, definitely that's why we introduced the GitHub plugin because the

44:47 code review, because not everyone's at the same level.

44:51 It kind of brings things up to a level.

44:53 Yeah.

44:53 So it's got, it gives you the benefit as a beginner programmer in the team.

44:57 So you get those code reviews, but also as the experienced developer, it saves you time

45:03 doing the code review because there's already a tool doing the simple steps.

45:08 It's not dealing with the architectural elements of it, but it's making sure each function is

45:13 nicely written.

45:14 Yeah.

45:14 I think that makes a lot of sense because it doesn't matter how good you are.

45:18 You don't want to have to go think of the implications throughout the whole code base.

45:21 Absolutely.

45:22 And we're just looking to say the tool set is good.

45:24 So it's good.

45:26 Press merge.

45:27 Yeah.

45:28 Beautiful.

45:28 But I do think it's really important that it's in the editor because it teaches you, maybe it

45:33 could teach you.

45:33 It teaches you the idiomatic, the Pythonic ways of writing things.

45:36 You're like, I had no idea that I could create a for loop with enumerate that had tuple unpacking

45:41 instead of like trying to do a for over range and then pulling out the item and things like

45:46 that.

45:46 So it seems like a really cool combination.
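
For readers who have not seen the idiom mentioned here, a short sketch with made-up data: the first loop walks a range of positions and pulls out each item, the second uses `enumerate` with tuple unpacking.

```python
names = ["ada", "grace", "linus"]

# Index-based version: loop over positions, then look the item up.
numbered_before = []
for i in range(len(names)):
    numbered_before.append(f"{i}: {names[i]}")

# enumerate yields (index, item) pairs, unpacked in the loop header.
numbered_after = []
for i, name in enumerate(names):
    numbered_after.append(f"{i}: {name}")

print(numbered_after)
```

Both loops build the same list; the second avoids the separate indexing step.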

45:49 Yeah.

45:49 I think definitely that educational thing is something we want to focus on more, like improving

45:52 our documentation.

45:53 So I wrote a blog post recently with a few little refactorings.

45:56 We're doing like why we think they're a good idea as opposed to just what we've done.

45:59 Is it the one that is called Python refactorings part one?

46:02 Yeah.

46:02 I looked through that.

46:03 So maybe you could give us a couple of the refactorings out of there that you like.

46:07 Code hoisting is like one of the best things because anytime you've got duplicate code,

46:11 you've got a way you can introduce mistakes really easily.

46:14 So that would be like maybe you have the same code in an if and an else statement.

46:18 Yes.

46:18 And it's just duplicated.

46:19 So often people will write sort of a bit at the end of the same thing in the if and the

46:24 else or in loads of elifs even maybe because it happens in every branch.

46:28 It means it always happens.

46:29 So it doesn't actually need to be in the condition at all.

46:31 And if you take it out.

46:32 Exactly.

46:33 Just put it at the end.

46:34 It also becomes kind of more clear what the conditional is doing, what it's controlling,

46:37 because it hasn't got this extraneous thing in it.
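
A minimal sketch of the hoisting idea described here, using an invented checkout example: the statement that appears at the end of every branch is moved out of the conditional so it is written once.

```python
def checkout_before(is_member: bool) -> list:
    steps = []
    if is_member:
        steps.append("apply member discount")
        steps.append("send receipt")  # duplicated in both branches
    else:
        steps.append("offer membership")
        steps.append("send receipt")  # duplicated in both branches
    return steps


def checkout_after(is_member: bool) -> list:
    steps = []
    if is_member:
        steps.append("apply member discount")
    else:
        steps.append("offer membership")
    steps.append("send receipt")  # hoisted: it always happens
    return steps


print(checkout_after(True))
```

After hoisting, the conditional contains only what actually differs between the two cases.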

46:39 Another one you have in there is converting from a for loop, which does a yield to a yield

46:45 from that collection directly, which is pretty nice.

46:48 I mean, it might even apply to code that was written long ago before yield from was introduced

46:52 to the language, but yield was there.

46:54 And you could say, hey, look.

46:55 Yeah, for sure.
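
The `yield from` refactoring mentioned here looks like this in a small, made-up generator. `yield from` was added in Python 3.3 (PEP 380), so older code often still has the explicit inner loop.

```python
def flatten_before(groups):
    for group in groups:
        for item in group:
            yield item


def flatten_after(groups):
    for group in groups:
        yield from group  # delegates to the inner iterable


print(list(flatten_after([[1, 2], [3], []])))
```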

46:56 I actually got quite a few comments.

46:57 This old way could be gone.

46:58 Quite a few comments that they didn't realize you could do that.

47:00 So it's like a lot of people aren't reading every kind of PEP.

47:03 What?

47:04 Really?

47:05 And seeing everything they can do.

47:07 Strange as it may seem to us.

47:10 I know.

47:11 It's so bizarre.

47:11 I think I'm sure there's PEPs that I don't read as well.

47:13 You know, if I had to pick a single most favorite absolute love it refactoring, it has to be

47:21 convert like a deeply nested set of code to something with guard clauses.

47:28 So it's like flat, right?

47:30 Instead of going, if this is true, then if this is true, while this is true, if this is true,

47:36 and you end up like writing, starting on column 40 to write your code, if you negate them all

47:42 and like return early or break out early or something, it's just so much cleaner.

47:46 Yeah, so avoiding nesting is...

47:47 Okay, that case is out.

47:48 That case is out.

47:49 That case is out.

47:50 Now I focus on the essence.
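A sketch of the guard clause idea, assuming a hypothetical `order` dictionary with `items` and `address` keys (none of this comes from the episode). The nested version and the flattened version behave the same; the second rules out each bad case early and leaves the real work at the top indentation level.

```python
def shipping_cost_nested(order):
    # Every check pushes the real work one level further to the right.
    if order is not None:
        if order["items"]:
            if order["address"]:
                return 5.0 + 0.5 * len(order["items"])
            else:
                return None
        else:
            return None
    else:
        return None


def shipping_cost_guarded(order):
    # Negate each condition and return early: "that case is out".
    if order is None:
        return None
    if not order["items"]:
        return None
    if not order["address"]:
        return None
    # Only the essential case is left.
    return 5.0 + 0.5 * len(order["items"])
```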

47:51 Avoiding nesting is one of our core kind of code metrics.

47:54 Some of the other things I think we didn't touch on is how you get the computer to realize

47:58 that there's a code smell.

47:59 It's like writing good code metrics is quite...

48:02 How do you get a computer to know?

48:03 It's quite difficult.

48:04 So there's these metrics like cyclomatic complexity, which...

48:09 What's that about?

48:10 It's about avoiding conditionals, basically.

48:13 Number of decisions.

48:14 Yeah.

48:15 How many branches would you potentially go down, right?
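To make "number of decisions" concrete, here is a rough sketch that counts decision points with Python's standard `ast` module. It is a simplification for illustration only: the set of node types is an assumption of this sketch, and it is neither the full cyclomatic complexity definition nor how Sourcery computes its metrics.

```python
import ast

# Node types treated as decision points in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def rough_cyclomatic_complexity(source):
    """Return 1 plus the number of decision points found in the source."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand.
            decisions += len(node.values) - 1
    return 1 + decisions


example = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''

print(rough_cyclomatic_complexity(example))
```

The example function has two `if` statements, one `for` loop, and one `and`, so the count is four decisions plus one for the straight-through path.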

48:18 There's kind of this enhanced version of it we've looked at called cognitive complexity,

48:22 which is trying to get to an idea of how hard something is to hold in your head.

48:26 And that really penalizes nesting.

48:28 How many variables are at play?

48:29 How many other things like that as well, right?

48:31 Probably.

48:32 Yeah.

48:32 So like that sort of penalizes nesting most of all.

48:35 So that's kind of like how Sourcery knows not to...

48:38 Like nesting is a bad idea.
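One way to picture a metric that penalizes nesting is to charge each control structure one point plus its current depth. The sketch below does that with the standard `ast` module. It is loosely inspired by the idea of cognitive complexity, not an implementation of it (for instance, it treats `elif` as extra nesting), and it is not Sourcery's metric.

```python
import ast

# Control structures that open a new level of nesting in this sketch.
NESTING_NODES = (ast.If, ast.For, ast.While)


def nesting_penalty(node, depth=0):
    """Sum (1 + depth) for every control structure, so deeper ones cost more."""
    total = 0
    for child in ast.iter_child_nodes(node):
        if isinstance(child, NESTING_NODES):
            total += 1 + depth
            total += nesting_penalty(child, depth + 1)
        else:
            total += nesting_penalty(child, depth)
    return total


nested = '''
def f(a, b, c):
    if a:
        if b:
            if c:
                return 1
    return 0
'''

flat = '''
def f(a, b, c):
    if not a:
        return 0
    if not b:
        return 0
    if not c:
        return 0
    return 1
'''

print(nesting_penalty(ast.parse(nested)), nesting_penalty(ast.parse(flat)))
```

Both functions contain three `if` statements, but the nested one scores 1 + 2 + 3 while the guard clause version scores 1 + 1 + 1.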

48:41 And then we've written metrics about...

48:44 Oh, I never really had thought about it that way.

48:46 But that's exactly the problem is like the reason it sucks so much is like that next test

48:50 is piled on as an and, and, and, and, and this.

48:54 And that all the stuff that you've nested yourself into, you've got to think of like all

48:57 those at the same time while I'm in here.

48:59 Yeah.

48:59 Cause the number of things you have to hold in your head, you know, a human can only hold

49:02 six or seven things in their head at once.

49:04 Yeah.

49:05 Maybe if you're exceptional, you can do eight.

49:06 So like some of our metrics is sort of focusing on how many variables you have to be thinking

49:12 of and how many conditionals you have to be thinking of when you're sort of halfway

49:15 down the function and it's gone off to the right somewhere.

49:17 So we actually call that the working memory metric.

49:20 Yeah.

49:20 Oh, cool.

49:21 That specifically measures the number of variables that are in scope at the current

49:26 time.

49:26 So we think you have, if you're reading the code from top to bottom, by the time you've

49:32 got to the 10th line of code, if you've got seven variables in your head, then you don't

49:37 understand the, you don't understand the function anymore.

49:40 You can't keep all that in your head and understand the next page.

49:44 So we keep having to scroll back and forth instead of just reading it.
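The episode does not spell out how the working memory metric is defined, so the following is only a guess at the flavor of such a measure, with a hypothetical function name. It treats a variable as "in your head" from the first line where it appears to the last, and reports the peak count across all lines. Note that it counts every name it sees, including builtins such as `len`.

```python
import ast


def peak_live_variables(source):
    """Return the largest number of names a reader must track at any one line."""
    tree = ast.parse(source)
    first_seen = {}  # name -> first line where it appears
    last_seen = {}   # name -> last line where it appears
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            name, line = node.arg, node.lineno
        elif isinstance(node, ast.Name):
            name, line = node.id, node.lineno
        else:
            continue
        first_seen[name] = min(first_seen.get(name, line), line)
        last_seen[name] = max(last_seen.get(name, line), line)
    if not first_seen:
        return 0
    lines = range(min(first_seen.values()), max(last_seen.values()) + 1)
    return max(
        sum(1 for name in first_seen if first_seen[name] <= line <= last_seen[name])
        for line in lines
    )


example = '''
def total_price(price, quantity, tax_rate):
    subtotal = price * quantity
    tax = subtotal * tax_rate
    total = subtotal + tax
    return total
'''

print(peak_live_variables(example))
```

In the example the peak is on the `subtotal` line, where `price`, `quantity`, `tax_rate`, and `subtotal` are all still in play.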

49:48 Yeah.

49:49 Yeah.

49:49 There's this really interesting saying from a friend of mine that talked about, it went

49:54 something like when you write code, I guess debugging code is harder than writing code.

50:01 So if you write code at the very limit of what you're kind of able to write and do and like

50:07 the most complicated stuff you can do, you probably can't debug it because trying to think through

50:11 it actually is like more complex, and you kind of just pushed it over your limit.

50:15 And so there's anytime you can kind of dial that back a bit through refactorings or other

50:20 stuff, like, you know what, that should be three functions.

50:22 Then you won't have to think about so hard.

50:24 Sure.

50:24 And like the most three things in this part, so much about it.

50:26 The most interesting figure we found in a scientific paper that analyzed developers was sort of,

50:31 they spend 70%, or we spend 70%, of our time trying to understand the code and only 5% of the

50:37 time actually typing.

50:38 So it's that 70% of the time you really need to cut down on by making it more readable and

50:43 refactoring.

50:44 And you have the, that's kind of the whole Zen of Python, right?

50:47 I think that's why it's a popular language is because it's like clean to read presents itself

50:52 well.

50:53 Right.

50:53 So don't undo that by writing bad code, I guess.

50:56 Definitely.

50:57 Yeah.

50:58 All right.

50:59 Well, I think this is probably a good place to leave it.

51:02 You guys, it looks like a really cool project.

51:04 If people are using PyCharm or they're using VS Code, they could just go get the plugin and

51:09 give it a try.

51:09 Right.

51:09 Yeah, for sure.

51:10 Just search for Sourcery in the marketplace of the IDE.

51:14 Yeah.

51:14 Okay.

51:15 So you get it like as a, you go to the plugin marketplace in PyCharm or you do the extensions

51:19 in VS Code and it'll just be in there.

51:21 Sourcery with a U.

51:22 As in computer source.

51:24 Yeah.

51:24 Not as in Gandalf.

51:25 Yeah.

51:26 Yeah.

51:28 Yeah.

51:28 And also if you go to our website, it has full instructions for installing both the plugins

51:34 and using it on GitHub.

51:36 And it also has.

51:38 Right.

51:39 And if people have an open source GitHub repo, they should just drop it in there and it'll give

51:42 them some ideas, huh?

51:43 Yeah.

51:43 Give it a try.

51:44 Absolutely.

51:44 And links to our documentation as well.

51:48 Very cool.

51:48 All right.

51:49 Now, before I let you out of here, I've got the two questions I always ask at the end of

51:53 the show.

51:53 So we'll just be quick since there's two of you.

51:55 Brendan, how about you go first?

51:56 You're going to write some Python code.

51:57 What editor do you use?

51:58 I use Vim nowadays.

52:00 I ended up with wrist injuries from refactoring code using control and shift and the arrow keys

52:08 too much.

52:09 So I decided to learn Vim.

52:10 I've ended up with that as well a long time ago and had to like rejuggle a lot of interesting

52:16 stuff.

52:16 Have like funky, curvy keyboards and all sorts of stuff.

52:20 And yeah.

52:21 Try to use hotkeys rather than mouse a lot.

52:23 Yeah.

52:23 It was turning into a real issue.

52:25 So I had to learn Vim, which slowed me down by about 10 times for 10 weeks.

52:31 But now I feel as though it's magic under my fingertips.

52:35 That's awesome.

52:36 Nick, how about you?

52:37 Write Python code?

52:38 I use PyCharm at the moment.

52:39 So I'm a bit visually impaired and the high contrast mode is just really, really good.

52:43 Dabble with VS Code a bit.

52:45 I really like how it starts up super quick, but it's a little difficult to see.

52:49 So I've made the switch.

52:50 I can imagine that'll definitely push you over the edge.

52:52 All right.

52:52 Then notable PyPI package, maybe not something that everyone necessarily knows, but is like,

52:57 oh, cool.

52:58 I found this the other day and you should check it out.

53:00 Any ideas, recommendations?

53:01 Amazing one that we've used is this package called Nuitka, which is spelled N-U-I-T-K-A.

53:10 And it takes your Python code, cross-compiles it into C, and then compiles the C code and creates an executable.

53:19 And without that package, Sourcery just wouldn't exist as a locally running project because you'd have all sorts of deploy issues.

53:32 We'd just be delivering all of our source code with the plugins and the extensions.

53:36 Interesting.

53:37 So you're packaging it up.

53:38 You're packing up Sourcery with Nuitka, huh?

53:40 Yeah.

53:40 Exactly.

53:40 Yeah.

53:41 How interesting.

53:43 Okay.

53:43 Yeah.

53:44 It's fantastic.

53:45 It builds in the version of Python that you're using and it reads all the imports to work out which bits of the code it needs to compile.

53:54 It compiles the whole thing down and it works on Mac, Windows, and Linux.

53:59 That's awesome.

53:59 It's magnificent.

54:01 Very cool.

54:01 I had Kay Hayen from the Nuitka project on for episode 174, which is like a year and a half,

54:09 two years ago.

54:10 I don't know.

54:11 Quite a long while ago.

54:12 But yeah, that's super cool.

54:14 I didn't realize that it was so flexible in packaging up apps.

54:17 I thought of it more as like Cython, like this little bit we can make faster.

54:20 So that's good to hear.

54:22 Very nice.

54:23 All right.

54:23 Final call to action.

54:24 People are interested in Sourcery.

54:26 They're interested in refactoring.

54:27 What do you tell them?

54:28 Try it out now.

54:29 If you have a GitHub account, you can star our repo and try it out in five seconds.

54:36 Or you can install it and get all of your pull requests refactored.

54:43 If you're using VS Code or PyCharm, go and install it right now.

54:47 Try it out.

54:47 Get your code refactored as you work.

54:49 And let us know.

54:50 I mean, we're really keen to get feedback from people and keep on making it better and better,

54:53 basically.

54:54 Awesome.

54:54 Do you guys have like a GitHub repo?

54:55 Or how should they give you feedback or say, you know, my favorite refactoring is whatever

55:01 you guys don't do.

55:01 How do they make that happen?

55:03 Yeah, we've got the Sourcery AI repo where you can raise issues.

55:05 Or just email us.

55:07 And our GitHub repo is sourcery-ai slash sourcery.

55:16 Very cool.

55:16 Okay, there.

55:17 Awesome.

55:18 Well, Brendan and Nick, thank you both for being here and creating this cool project.

55:22 Looks awesome.

55:22 Thanks very much.

55:23 Thank you very much for having us, Michael.

55:25 Yep, you bet.

55:26 Bye-bye.

55:26 Bye.

55:27 This has been another episode of Talk Python To Me.

55:31 Our guests on this episode were Brendan Maginnis and Nick Thapen.

55:35 And it's been brought to you by Datadog and Linode.

55:38 Datadog gives you visibility into the whole system running your code.

55:42 Visit talkpython.fm/datadog and see what you've been missing.

55:46 They'll throw in a free t-shirt with your free trial.

55:48 Start your next Python project on Linode's state-of-the-art cloud service.

55:52 Just visit talkpython.fm/Linode, L-I-N-O-D-E.

55:57 You'll automatically get a $20 credit when you create a new account.

56:00 Want to level up your Python?

56:03 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

56:08 Or if you're looking for something more advanced, check out our new async course that digs into

56:13 all the different types of async programming you can do in Python.

56:16 And of course, if you're interested in more than one of these, be sure to check out our

56:20 Everything Bundle.

56:20 It's like a subscription that never expires.

56:23 Be sure to subscribe to the show.

56:24 Open your favorite podcatcher and search for Python.

56:27 We should be right at the top.

56:28 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

56:33 and the direct RSS feed at /rss on talkpython.fm.

56:37 This is your host, Michael Kennedy.

56:39 Thanks so much for listening.

56:41 I really appreciate it.

56:42 Now get out there and write some Python code.

56:43 I'll see you next time.
