
#18: Python Anti-patterns and other mistakes Transcript

Recorded on Wednesday, Jul 1, 2015.

00:00 Often, the most important lessons we learn are what not to do.

00:03 Today, we'll be talking about bad Python code and Python anti-patterns with Andreas Dewes.

00:09 It's show number 18, recorded Thursday, July 1st, 2015.

00:13 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the library,

00:43 the ecosystem, and the personalities.

00:45 This is your host, Michael Kennedy.

00:46 Follow me on Twitter, where I'm @mkennedy.

00:49 Keep up with the show and listen to past episodes at talkpythontome.com,

00:53 and follow us on Twitter, where we're @talkpython.

00:55 This episode, we'll be talking to Andreas Dewes from Quantified Code about Python anti-patterns.

01:02 This episode is brought to you by CodeShip and Hired.

01:06 Thank them for supporting the show on Twitter via @CodeShip and @Hired_HQ.

01:13 I don't have any major news for you this week, so let me introduce Andreas so we can get right to the show.

01:19 Andreas is a physicist and software developer with a passion for code quality.

01:23 He leads the development of algorithms and software at Quantified Code.

01:27 Andreas, welcome to the show.

01:29 Thanks.

01:30 Thanks for having me here.

01:32 It's great to have you here.

01:33 We're going to talk about design patterns, both good design patterns and bad design patterns or anti-patterns,

01:40 with the stuff you guys have going on at Quantified Code.

01:43 Exactly.

01:44 So you've built a bunch of awesome tools, both visual and static analysis, and we're going to talk about that.

01:50 But before we get into it, let's just take a step back down memory lane and, you know,

01:55 tell everyone how you got started in Python and programming and what's your story?

01:58 How did you get here?

01:59 So my background is actually not in software development, but in physics.

02:04 And I made my first contact with Python while I was doing my PhD, which was about, like,

02:10 experimental quantum computing.

02:12 So we did experiments with microwaves and lots of signal processing.

02:19 And when I arrived at our lab, we had, like, a lot of equipment that we needed to, like, control

02:25 using programs, and we had a lot of data that we needed to analyze, to visualize, and to process.

02:32 And we did that using various programming languages, for example, MATLAB and C and C++.

02:38 And we were also using tools like LabVIEW that help you to, like, define interfaces, graphical user

02:46 things, and, like, control your measurement equipment.

02:50 And then I remember when I came to the lab, this was basically a huge mess because we used, like,

02:57 four different programming languages and the graphical programming interface of LabVIEW, which is nice in principle,

03:04 but which makes it really, really difficult to write complex software systems.

03:09 And so I said, hey, there has to be a better way to do this.

03:13 And one of our postdocs, he used Python in his job in the U.S., and he introduced me to it.

03:19 And so I started, like, getting interested in that, and I tried to, like, replace more and more of our software stack with Python.

03:26 And after about one year or half a year, I think, we had replaced most of the data acquisition and data analysis things we had in the lab with Python programs.

03:37 So that was kind of, like, my first contact with that language.

03:41 And that's also when I saw that it's really powerful and you can basically do anything with it.

03:48 And, yeah.

03:49 Yeah, that's great.

03:51 I think a lot of people come to Python for the data science component of it.

03:55 And I think, like you were saying, the fact that it's a full stack, legitimate, end-to-end programming language means you don't have to just live within those little shells of what MATLAB will do or LabVIEW will do.

04:07 But you can bring it all together.

04:08 Yes, exactly.

04:09 Yeah.

04:10 And, I mean, back in 2009, when I started using it, the tooling was already pretty good.

04:15 But if you have a look at it now in 2015, it's really amazing what kind of an ecosystem people have created.

04:23 You know, for example, the IPython notebook, NumPy, SciPy, and all these other tools that we can use to, as you say, do almost anything that we can imagine.

04:32 So it's really, really amazing to see.

04:35 Yeah, it is really amazing.

04:36 And it's great.

04:38 And I think the thing that, among other things, that surprises me is it's free.

04:42 You look at MATLAB and Mathematica, you pay tons of money to be locked into these, like, somewhat confusing.

04:47 Yeah, exactly.

04:48 And it's the power of open source.

04:50 So I think, yeah.

04:52 You're doing all this cool programming and getting into Python for your work at Quantified Code.

04:57 This is a company you co-founded.

04:58 Who did you co-found it with?

04:59 And what do you guys do there?

05:01 So I co-founded it with an old friend of mine.

05:04 And we're doing static code analysis for Python, but also other programming languages.

05:09 And the reason we started this company was that I saw during my PhD that, hey, it's really hard to write good software because I didn't have a background in software development.

05:18 So I wrote lots of very, very bad code.

05:21 And I made a lot of mistakes.

05:23 And I thought it would be really nice to have, like, a system, an automated system that would tell me how to write good code.

05:29 So that's when I got interested in all these things like static code analysis.

05:32 And I started looking into what kind of tools existed for Python and other languages.

05:37 And I found that most of the tools that we're using today are still not good enough or they could be much better.

05:43 And so that's kind of our mission to, like, make these tools much better and allow anyone to, like, have access to really high-quality, state-of-the-art code analysis.

05:53 Yeah, that's great.

05:54 One of the problems with a lot of static code analysis is it's very good at finding the small rules.

05:59 Like, hey, you should have white space between these things.

06:03 You should have a new line here.

06:04 Your line shouldn't be this long.

06:06 Your variables should be named this way.

06:07 Yeah, exactly.

06:08 But that's, like, the stuff that robots can do, you know?

06:10 The stuff that's really valuable is the hard-learned lessons of if you write your code this way, it's easier to maintain.

06:17 If you write your code that way, it will actually run faster or it will be less error-prone.

06:22 And that's normally what you get by working with kind of a mentoring apprenticeship, you know, working in a team.

06:28 But it's also the whole concept of design patterns, right?

06:32 Yes, exactly.

06:34 I mean, design patterns certainly help people to become productive and to use solutions that are better proven, so to say, and tested in the field.

06:43 I mean, a design pattern in that sense is just like a reusable solution that you can take and implement in your own code.

06:51 And, I mean, I wasn't even aware of that, but I also made use of design patterns, which I discovered, like, from the internet for the work I did during my PhD.

06:59 I used, for example, the observer pattern and the command pattern and other things, which I just, like, yeah, intuitively could adapt to my own needs and use in my projects.

07:13 And the great thing about this is also that it gives you, like, a common language to talk to other people, you know, because if you say, hey, I got this class, which is, like, receiving a notification from this other class and then I was doing some things.

07:27 I mean, it's okay, but you could just say, hey, I'm using this observer class and then everyone kind of knows what you mean, you know?

07:33 Yeah, I think that is one of the absolute really strong powers of design patterns.

07:39 Obviously, one is to know the solutions, but the other is to have larger building blocks as you think about your application mentally.

07:46 Yes, exactly.

07:47 All right.

07:47 Like you say, if I go to you and I describe, okay, what I'm going to do is I'm going to have a variable and the variable is going to be shared.

07:53 There's only going to be one of them and it's going to be shared across all parts of the application.

07:57 And so when one part changes it, the other part will see that same change, I could have this conversation or I could just say I'm using the singleton pattern.

08:05 Yes, exactly.

08:06 You know what I'm talking about.

08:07 You know what the positive side of that is.

08:09 You know what the negative side of that is.

08:11 You know singleton pattern is a challenge for unit testing.

08:14 And so is that a tradeoff you want to make?

08:16 And it's not just helpful for working in a team.

08:19 I can tell you that.

08:20 You could tell me that.

08:21 But I think it's also helpful just for you internally as you think about how your code is structured.

08:26 Yeah, absolutely.

08:27 We've all written a lot of code and we may or may not have given it good attention.

08:32 We may have written it badly.

08:34 We may have written it well.

08:36 We might have been in a hurry.

08:37 And so, or, you know, much more common, we've gotten it from somebody else.

08:41 And we have to deal with it.

08:43 And I think understanding these design patterns is really interesting, but also understanding when they're needed.

08:50 And so there's this related idea of this concept called code smells.

08:55 I think it was introduced by Martin Fowler, but I'm not entirely sure.

08:59 Are you familiar with it?

08:59 Yes, it's possible.

09:00 Yeah.

09:01 I'm not 100% sure.

09:02 I mean, design patterns were introduced by the Gang of Four in 1994, but I'm not sure where the term code smells comes from.

09:10 Yeah, certainly Martin Fowler wrote a lot about it.

09:12 And I think it's a very interesting way to think of things.

09:16 So I'll put a link to some code smells in the show notes.

09:20 And I think kind of what your code does is it sort of looks for code smells.

09:24 It says you're doing these things that are not necessarily going to make your code not run, but it's not the best way to do it.

09:30 Yes, exactly.

09:31 I mean, what a static code analyzer does is basically that it looks for patterns in the code that can either produce bugs or are quality problems of the code.

09:43 And so for us, a code smell can be categorized by how severe it is.

09:49 So it might make your code crash or it might just be an annoyance to another programmer who takes longer to understand the code.

09:57 And also by various categories like does it affect the performance?

10:01 Does it affect the security of the code?

10:03 Does it affect the maintainability?

10:05 So there are lots and lots of different types of code smells in that sense.

10:10 Yeah, and if you think back, you know, just thinking of the code smells that I can remember, there's some good ones like duplicated code.

10:18 Oh, yeah.

10:20 That one's common, right?

10:22 Yes.

10:23 Or feature envy or, you know, long method.

10:27 Or there's a bunch of, you know, too many parameters.

10:32 Yes.

10:32 I mean, this we're seeing a lot in the code we analyze.

10:35 Very complex functions with a lot of parameters, functions that have too many responsibilities and are really hard to understand.

10:43 So, I mean, I think that's really the number one performance or maintainability killer in software today: complexity.

10:51 I agree.

10:51 One of the really interesting code smells, I think, are comments.

10:57 And so you think, you know, comments, I'm supposed to comment my code.

11:00 That's a good thing.

11:01 A lot of times programmers will use comments where they should have just written better code.

11:06 Yes, exactly.

11:07 They'll write a function that has a bad name and it's not well written.

11:11 So it's hard to understand.

11:12 And they'll put a nice little comment to describe what it does because it's not well written, you know, where the real fix would be to refactor, to have a proper descriptive name and to be cleaner in smaller parts, right?

11:27 Yeah, absolutely.

11:28 Yeah.

11:29 Comments are kind of like deodorant for code smells, right?

11:32 That's a good comparison, yeah.

11:35 Okay, so that brings us to how I kind of got connected to you guys or learned about you guys is through this thing you wrote called The Little Book of Python Anti-Patterns.

11:45 Yes.

11:46 Can you tell us about that?

11:47 Yeah, sure.

11:48 So this was a project we started because we wanted to get an overview of what problems you would find in Python code.

11:57 And so we started collecting problem types and code smells and issues from different sources, for example, linters like PyLint or PyFlakes that already have like a collection of a few hundred of those anti-patterns.

12:12 But also from sources such as Stack Overflow and articles, blog posts, and also from people and from code that we were reading.

12:20 And so we like collected all this together and then we worked with a freelancer in the beginning who helped us like write a large part of the initial text.

12:28 And we bundled this whole collection of anti-patterns together into a book so that people could just like learn about different things they should avoid in their Python code.

12:40 So it's not like learning from good code, but learning from bad code in this sense.

12:44 Yeah.

12:45 You know, it's related to that code smells idea.

12:47 Like I see all these problems and if I know to avoid them, well, either you know the right design pattern and the right style or you just know you need to go do some research and don't do it this way.

12:58 Yeah, exactly.

13:00 Yeah.

13:00 Yeah.

13:00 In your book, you do have sort of recommendations for most of the anti-patterns, right?

13:06 Yes.

13:07 So we always, if we describe the problem, we always try to like give at least one or two solutions to it.

13:13 I mean, sometimes there is no really unique solution or people would kind of like disagree if it is a problem at all or not.

13:24 So, but we are always pretty opinionated and we think like a good programmer should be a bit opinionated.

13:31 And so we try to like give our solution that we would actually solve the problem with.

13:35 Yeah, that's great.

13:37 And let me just tell everybody that the book is at docs.quantifiedcode.com/python-code-patterns.

13:44 And of course, I'll put that in the show notes as well.

13:46 Yes, that would be great.

13:47 I mean, we also have like a GitHub repository and the whole book is open source released under a Creative Commons license.

13:53 So we're always happy to see contributions.

13:56 And like, yeah, so if people are interested, they would be very welcome to like contribute.

14:03 Yeah, that's excellent.

14:04 Let me pull this up here.

14:06 So in your book, you've broken it into quick math, six sections.

14:11 You have, you sort of categorize the problems that people run into.

14:15 One of them is correctness.

14:16 One is maintainability.

14:18 Maintainability is one that catches up with you later.

14:21 Readability, security, performance, and then you have a special section on Django.

14:27 Yes.

14:28 So yeah, I mean, as I said before, we tried to like just categorize the issues or anti-patterns so that people could like go through individual sections.

14:39 And for example, see what type of problems really affect the correctness of their program and what type of anti-patterns only affect, for example, the readability or the performance.

14:50 So let's maybe look at a couple of them.

14:53 I'll just sort of scan through here and just tell people the kind of stuff that's in there.

14:57 Sure.

14:58 One that I think is somewhat common when you're new to programming in Python is this one you call bad except clauses order.

15:08 Can you talk about that?

15:09 Yes.

15:10 A bad except clauses order basically means that you have a sequence of multiple except clauses,

15:20 where the first one of those clauses is already more generic or general than the second one.

15:27 So basically your second except clause will never get executed.

15:31 And that's kind of really easy to get wrong.

15:33 It also happens to me a lot of the time.

15:36 Yeah.

15:37 Yeah.

15:37 It's not obvious, especially if you're going back and adding these additional except clauses.

15:41 So maybe you say try, you're going to do some code and then you say except Exception as x and then except TypeError as te.

15:51 Right?

15:52 That TypeError clause will never get run because Exception is the base class of TypeError.
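The clause-ordering problem described here can be sketched in a few lines. This is a minimal illustration, not code from the book; the function names and return strings are hypothetical. Python tries except clauses from top to bottom, so a broad clause placed first shadows any narrower clause after it.

```python
def handle_bad_order(value):
    """Anti-pattern: the generic clause comes first and shadows the specific one."""
    try:
        return int(value)
    except Exception:
        # Exception is a base class of TypeError, so this clause always wins.
        return "generic handler"
    except TypeError:
        # Unreachable: a TypeError was already caught by the clause above.
        return "TypeError handler"


def handle_good_order(value):
    """Fix: list the most specific exception first, the most generic last."""
    try:
        return int(value)
    except TypeError:
        return "TypeError handler"
    except Exception:
        return "generic handler"


# int(None) raises TypeError, int("abc") raises ValueError.
print(handle_bad_order(None))
print(handle_good_order(None))
print(handle_good_order("abc"))
```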

15:56 Even worse than this pattern is probably an empty except block without an exception type.

16:02 I think that's kind of the most evil Python anti-pattern that exists.

16:06 This episode is brought to you by CodeShip.

16:24 CodeShip has launched organizations, create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflow.

16:32 Maintain centralized control over your organization's projects and teams with CodeShip's new organizations plan.

16:38 And as Talk Python listeners, you can save 20% off any premium plan for the next three months.

16:44 Just use the code TALKPYTHON, all caps, no spaces.

16:47 Check them out at CodeShip.com and tell them thanks for supporting the show on Twitter where they're @CodeShip.

16:53 I have to admit, sometimes I use it, but it's really bad because it swallows up all your exceptions and it doesn't give you any useful information about what happens in your code.

17:08 Yeah, maybe if you're lucky, the empty except clause at least has like a logging statement or something.

17:15 Yeah, that's already good, yeah.

17:15 Yeah, but if it has except: pass, that's pretty bad.

17:20 That's really evil, yeah.

17:21 I spent many hours debugging stuff like this, yeah.

17:24 Most of the time I wrote that myself.
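To make the empty except clause concrete, here is a small sketch contrasting it with a narrower handler that logs and re-raises. The function names are hypothetical, and logging plus re-raising is only one reasonable fix, assumed here for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def ratio_swallowing(a, b):
    """Anti-pattern: a bare except with pass hides every error, even typos."""
    try:
        return a / b
    except:  # noqa: E722 (deliberately bad for the demonstration)
        pass  # the caller silently gets None and never learns why


def ratio_explicit(a, b):
    """Catch only the error you expect, record it, and let it propagate."""
    try:
        return a / b
    except ZeroDivisionError:
        logger.exception("division by zero for a=%r, b=%r", a, b)
        raise


# Both a real bug (wrong type) and a data problem vanish without a trace.
print(ratio_swallowing(1, 0))
print(ratio_swallowing("x", 2))
```

With the explicit version, an unexpected TypeError is not caught at all, so it surfaces immediately instead of costing hours of debugging.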

17:27 Yeah, I like to think of that one as the intern clause.

17:32 Yes.

17:33 Because they're like, oh, this isn't working.

17:35 Oh, I can just put this in here and my code is now reliable.

17:38 I'll be out of here in three months.

17:40 Let's see.

17:42 There's some other interesting ones.

17:44 __future__ import is not the first non-docstring statement.

17:49 What's the problem with that?

17:50 Yeah, so this is just a convention of the Python interpreter because you can use these so-called __future__ imports to import functionality that is usually not available in your Python version.

18:03 For example, one thing that is often used are absolute imports.

18:05 And the thing with the __future__ imports is that they have to be the first statement in your file.

18:11 So you cannot have other code and then afterwards a __future__ import because the Python interpreter needs to load these ones right at the beginning.

18:19 So, I mean, this is kind of an easy pattern to catch, but still, sometimes it happens that you have this in your code.

18:26 I mean, it's also something that you would notice really easily because your program wouldn't even run.
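The placement rule can be checked without creating files by handing source text to the built-in compile function. This is a sketch under the assumption of a Python 3 interpreter, where absolute_import is still accepted as a __future__ feature; the two source strings are made up for the demonstration.

```python
# A docstring may come before the __future__ import; other statements may not.
good_source = (
    '"""Module docstring."""\n'
    "from __future__ import absolute_import\n"
    "import os\n"
)

bad_source = (
    "import os\n"
    "from __future__ import absolute_import\n"
)

# Compiles fine: the __future__ import is the first non-docstring statement.
compile(good_source, "<good>", "exec")

# Fails at compile time, which is why the program would not even start.
try:
    compile(bad_source, "<bad>", "exec")
    rejected = False
except SyntaxError as error:
    rejected = True
    print("rejected:", error.msg)
```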

18:31 Yeah, for sure.

18:32 Another one that you can tell that people are coming from somewhere else and they haven't quite fully gotten the Pythonic idiomatic style is implementing Java style or C++ style getters and setters.

18:44 Oh, yeah.

18:44 Yeah, I like this one a lot.

18:45 We also see it often in our analysis.

18:48 And so this is basically when you write set_something and get_something functions in a class, which you would do in Java.

18:56 Whereas in Python, the appropriate or the correct way to do this would be to use so-called class properties.

19:02 There you can just access an attribute of a class using a function, which looks like an attribute access, so to say.

19:13 Right.

19:13 Just use the @property decorator.
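A short sketch of the two styles follows. The Temperature classes and the absolute-zero check are hypothetical examples chosen for illustration, not taken from the book.

```python
class TemperatureJavaStyle:
    """Anti-pattern: explicit getter and setter methods."""

    def __init__(self, celsius):
        self._celsius = celsius

    def get_celsius(self):
        return self._celsius

    def set_celsius(self, value):
        self._celsius = value


class Temperature:
    """Pythonic version: callers use plain attribute syntax."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation can be added later without changing any calling code.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


reading = Temperature(20.0)
reading.celsius = 25.5   # looks like attribute access, runs the setter
print(reading.celsius)
```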

19:15 Yeah.

19:16 Yes, exactly.

19:16 Yeah.

19:17 Yeah.

19:18 Then there's others that are less obvious to me.

19:21 One of them is if I'm not using defaultdict.

19:25 What's the story of that one?

19:27 Oh, so this is more a recommendation than a real anti-pattern.

19:30 Python has this great collections library, which contains many data structures that can help you to make your code easier to understand.

19:39 And the defaultdict is one of them.

19:41 And it's one of my favorite ones and the one I probably use most in my code.

19:46 And what it is, is a dictionary that basically initializes the value of a given key to something that you can pass in as a function.

19:56 So the typical use case for this would be that you have a dictionary with a list inside.

20:03 And so if you want to create an entry for some key that you haven't seen before, normally what you would need to do is to check if the key has been set in the dictionary before.

20:14 And if not, create a list, an empty list in that place.

20:18 And then append the element to the list.

20:20 Yeah, so it's like three or four lines of code, right?

20:22 But the defaultdict basically does, yes, so that's four lines of code.

20:25 And the defaultdict basically does that for you because you can just say defaultdict(list).

20:30 And then whenever you would like access a given value for a key in the dictionary, it will have created an empty list if there has been no value in it already.

20:40 So really, really useful.

20:42 And I did not know this when I started writing Python, but it really makes your code much more readable.
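The dictionary-of-lists case described above can be sketched like this. The sample pairs are invented for illustration.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# Manual version: check for the key, create the empty list, then append.
manual = {}
for key, value in pairs:
    if key not in manual:
        manual[key] = []
    manual[key].append(value)

# defaultdict version: list() is called automatically for unseen keys.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(dict(grouped))
```

One design point to keep in mind: merely reading a missing key from a defaultdict also creates the entry, so membership tests should use `in` rather than indexing.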

20:50 Right.

20:50 Does that come from the collections module?

20:52 defaultdict?

20:54 Yes, exactly.

20:55 Yeah.

20:55 Yeah.

20:56 Yeah.

20:56 So it's not immediately imported, but it's easy to get to.

20:59 Yes.

21:00 It's in the standard library.

21:01 Yeah.

21:02 So another one that you have, let's move on to the next section of maintainability.

21:06 And one of them I think is probably pretty common is you should not be using wildcard imports.

21:13 So from math import *.

21:16 It's so much easier to type.

21:19 Why shouldn't I just do that all the time?

21:20 Yeah.

21:21 It's also a mistake which I made like a lot in the beginning when I started writing Python code.

21:27 And it's pretty insidious because normally it doesn't break anything.

21:34 But in some circumstances, you can overwrite functions that you have imported before.

21:39 And so the behavior of your program changes in like really unexpected ways, I would say.

21:45 And so today I always try to use only qualified imports.

21:50 So either import the module and then use it as a name when I access the variable or explicitly name the things that I import from a module.
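As a sketch of how a wildcard import can overwrite an earlier name, consider the standard library modules math and cmath, which both define sqrt. The wildcard lines are shown only as comments; the runnable part uses the qualified style recommended above. The choice of these two modules is an assumption made for illustration.

```python
# Anti-pattern, shown as comments:
#   from math import *
#   from cmath import *   # silently rebinds sqrt, exp, log, ...
#   sqrt(4)               # now calls cmath.sqrt and returns a complex number

# Qualified imports make the origin of every name explicit.
import math
import cmath

real_root = math.sqrt(4)        # a float
complex_root = cmath.sqrt(4)    # a complex number
imaginary_unit = cmath.sqrt(-1) # fine for cmath; math.sqrt(-1) raises ValueError

print(type(real_root).__name__, type(complex_root).__name__)
```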

21:58 Yeah.

21:59 I'm with you on that.

22:00 I try to, if it's not too much writing, try to use the name so it's really clear where the type is coming from.

22:05 But sometimes, you know, it's like, well.

22:07 Yeah.

22:08 I know.

22:09 I do it sometimes too.

22:10 But, you know, more as like an explicit import.

22:12 Yeah.

22:13 So another area that you guys did a lot of work on was readability.

22:17 And I think one of the ones that's, I don't know how easy this is for your system to check automatically.

22:26 But certainly if you come from a language like Java or C# or C++, it's not in your way of thinking.

22:34 Right.

22:34 Even though it's one of the core tenets of Python.

22:38 And that's asking for permission instead of for forgiveness.

22:42 Asking for permission is the anti-pattern rather than for forgiveness.

22:45 Yes, exactly.

22:46 Yeah.

22:46 Can you tell people about that?

22:47 So usually when you write Python code, the preferred way is to like assume that everything will go as you expect and not perform any checks.

22:59 But rather catch an exception if some part of your program throws one.

23:05 And like handle that exception.

23:07 And then react to like behavior which you didn't anticipate, so to say.

23:12 A good example is always like when you're opening a file.

23:16 You could think of like checking if the path exists and then open the file at the path.

23:23 But I mean between the two calls, like the check of existence of the file and the opening, something could actually happen to the file.

23:29 And so your program could still crash and you would need an exception handler for that anyway.

23:35 So the preferred way here is to just assume that the file is there and try to open it.

23:39 And if you get an error because the file does not exist or something else is wrong, then handle that exception, so to say.
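The file example can be sketched as follows. The function names and the default return value are hypothetical, and FileNotFoundError assumes Python 3.

```python
import os


def read_asking_permission(path, default=""):
    """Check first, then open. The file can still vanish between the two calls."""
    if os.path.exists(path):
        with open(path) as handle:  # may still raise if the file was just removed
            return handle.read()
    return default


def read_asking_forgiveness(path, default=""):
    """Just try to open the file and handle the failure if it happens."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        return default


print(repr(read_asking_forgiveness("this-file-should-not-exist.txt")))
```

The second version has no window between the check and the use, and it needs only one code path for the failure case.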

23:46 Yeah.

23:46 I think that's pretty key.

23:48 Another example is just, you know, you're trying to think of all the use cases of how something could go bad.

23:53 And you end up with like five or six if tests.

23:57 And then finally you're going to actually perform the thing.

24:00 And you may have forgotten one of those.

24:02 Or there may be something else.

24:03 So imagine I'm going to call a web service.

24:04 Yeah, exactly.

24:05 I've got to make sure the network is on.

24:07 I've got to make sure that the address is well formed.

24:11 And then, you know, I've got to make sure the data I'm going to send is well formed.

24:15 And then it still may be the case that the DNS name doesn't look up.

24:17 I mean, there's so many things that could go wrong.

24:19 Just try it and somewhere higher up where it makes sense, catch it and handle it, right?

24:24 Yeah.

24:24 That's what you're recommending there.

24:26 Another thing which I did quite often in the beginning when I learned Python was to use, for example, the None value to like indicate that something special happened inside a function, for example,

24:39 that something didn't exist and the function couldn't do what it was supposed to do.

24:42 And then basically check for this None value in the code that called the function.

24:47 And this, of course, makes it also very complicated to like handle these kind of errors because now the function does not return only one type of value, but two different ones, like None, for example, and like a numerical value or something.

25:02 And that's also a very good case where you could like ask for forgiveness instead of permission by just like having a function that throws an exception when something goes wrong and then catch that exception in the calling block of the function instead of like looking for a magic None value or something.
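Here is a minimal sketch of the difference between returning a magic None and raising an exception. The price lookup, the UnknownItemError class, and the function names are all hypothetical.

```python
def find_price_or_none(prices, item):
    """Anti-pattern: returns a number or None, so every caller must check."""
    return prices.get(item)


class UnknownItemError(KeyError):
    """Raised when an item has no known price (hypothetical error type)."""


def find_price(prices, item):
    """Always returns a number; failure is signalled with an exception."""
    try:
        return prices[item]
    except KeyError:
        raise UnknownItemError(item) from None


def total(prices, items):
    # No None checks needed here; an unknown item propagates as an exception
    # to whichever caller is in a position to handle it.
    return sum(find_price(prices, item) for item in items)


prices = {"coffee": 3, "bagel": 2}
print(total(prices, ["coffee", "bagel"]))
```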

25:19 And the thing about that is that complexity that you're adding by having a return of None or a real value or some sentinel value that is indicating the wrong value means you propagate that complexity up through the callers and then it propagates up higher.

25:32 Exactly.

25:33 It's like you forced this style onto everybody up the chain.

25:38 That's bad.

25:38 So another one that you have that I think is pretty common for people who are new to Python, but quickly they get over it, is using an un-Pythonic loop or a non-Pythonic loop.

25:50 What's a non-Pythonic loop for the listeners?

25:53 So, I mean, many people, me included, that come, for example, from like a C background, they're used to writing loops where they initialize some loop variable like X.

26:05 They set it to zero.

26:06 Then they increase the value of that variable to, for example, the length of an array.

26:13 And inside the loop, they then fetch the value of the array element using the index operator, so to say.

26:23 And in Python, of course, you can just like iterate directly over a list, for example, and not use a range to like first get the indices of the list and only then retrieve the value inside the loop.

26:36 I mean, this is only a small thing, but it also makes your code much more readable and it like helps you to avoid errors because it involves fewer variables and fewer things that can go wrong, so to say.
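The two loop styles look like this side by side. The list contents are made up, and the enumerate variant is an addition for the case where the index really is needed.

```python
values = [10, 20, 30]

# C-style habit: loop over indices, then fetch each element by index.
doubled_by_index = []
for i in range(len(values)):
    doubled_by_index.append(values[i] * 2)

# Pythonic: iterate over the elements directly.
doubled_direct = []
for value in values:
    doubled_direct.append(value * 2)

# If the position is actually needed, enumerate provides it alongside the value.
labelled = []
for position, value in enumerate(values):
    labelled.append((position, value))

print(doubled_direct, labelled)
```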

26:47 Yeah, that makes a lot of sense.

26:49 So the next section, security, is actually pretty simple.

26:53 Yes, I have to say we have to work on a security section because right now we only have, I think, one entry, which is the use of exec.

27:02 And in our analysis tool, we flagged that by default because it's not always an anti-pattern or something that you should avoid.

27:10 But it's, of course, a huge, huge opening in your system and it can like be used easily to like execute like untrusted code.

27:21 So I think it's not generally an anti-pattern, but if you use it, you should definitely be really careful with it and like think a lot about which code or which things you're passing into exec.

27:33 Yeah, absolutely.

27:35 Because unfortunately, Python does not have a real sandbox as well, a safe sandbox.

27:39 So, I mean, there are some things that you can do to make it a bit more safe to use exec, but there's nothing that can prove like 100% security in that case.

27:49 Yeah, to a large degree, it comes down to how much do you trust that string input that you're passing there, right?

27:55 Yes, exactly.

27:56 Yeah.

27:56 And if it's entered into a community forum, it probably shouldn't be exec'd.

28:02 Yes.

28:03 It would be basically the Python equivalent of a SQL injection attack.
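A harmless sketch shows why the trust placed in the input string matters: exec runs whatever code the string contains. The variable names are hypothetical. The second half uses ast.literal_eval, which is not mentioned in the conversation; it is shown here only as one standard-library option for the narrow case where the input is supposed to be a plain literal.

```python
import ast

audit_log = []

# Imagine this string arrived from a form field instead of being hard-coded.
user_input = "audit_log.append('arbitrary code ran')"

# exec runs the string as code, with access to whatever names it is given.
exec(user_input, {"audit_log": audit_log})
print(audit_log)

# For plain data such as numbers, strings, lists, and dicts, literal_eval
# parses the text without executing any code.
parsed = ast.literal_eval("[1, 2, 3]")

try:
    ast.literal_eval(user_input)
    literal_rejected = False
except ValueError:
    literal_rejected = True  # a function call is not a literal
```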

28:09 Yes.

28:10 Something to that effect.

28:11 Okay.

28:12 Some other ones that were interesting in your performance area.

28:16 You have one using key in list to check if a key is contained in a list.

28:21 What's the story of that one?

28:22 Yeah.

28:23 I have to say the performance section is also pretty small right now, but this pattern has to do with the complexity or the computational complexity of an operation.

28:33 Because if you, for example, try to find out if a certain value is in a long list, in our example, we have, for example, a list of several numbers, and you want to find out if, for example, a three is in that list, then the code that checks this will have a linear runtime.

28:51 So it will take longer, the longer the list is, so to say.

28:55 And if you would use a dict instead, where you have just a mapping between the values that you have in a list and a true value, for example, then you could perform the same check in constant time.

29:08 So order of one.
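The difference can be sketched with a rough timing comparison. The sizes and repeat count are arbitrary, the measured numbers will vary by machine, and the set is included as an assumption-labelled aside since it offers the same hash-based lookup as the dict described above.

```python
import timeit

numbers = list(range(100_000))
lookup = dict.fromkeys(numbers, True)  # value -> True mapping, as described above
as_set = set(numbers)                  # aside: a set gives the same kind of lookup

target = 99_999  # worst case for the list: the last element

found_in_list = target in numbers   # scans the list element by element
found_in_dict = target in lookup    # hash lookup, roughly constant time
found_in_set = target in as_set

list_seconds = timeit.timeit(lambda: target in numbers, number=100)
dict_seconds = timeit.timeit(lambda: target in lookup, number=100)
print(f"list: {list_seconds:.5f}s  dict: {dict_seconds:.5f}s")
```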

29:11 I mean, it's also like a small thing, but if you have really long, large data structures, which, for example, I had in my PhD when analyzing data, this can make a huge difference in the runtime.

29:20 You're right.

29:21 It can make a tremendous difference.

29:23 I'll tell you a real world story where this completely changed my perspective on this.

29:39 This episode is brought to you by Hired.

29:42 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

29:48 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.

29:57 Typically, candidates receive five or more offers in just the first week, and there are no obligations ever.

30:04 Sounds pretty awesome, doesn't it?

30:06 Well, did I mention there's a signing bonus?

30:08 Everyone who accepts a job from Hired gets a $2,000 signing bonus, and as Talk Python listeners, it gets way sweeter.

30:16 Use the link hired.com/talkpythontome, and Hired will double the signing bonus to $4,000.

30:24 Opportunity's knocking.

30:26 Visit hired.com/talkpythontome and answer the call.

30:29 I was working on this project involving eye tracking.

30:43 Like, not the letter I, but, you know, where you're looking.

30:46 And this eye tracking system was collecting data at 250 hertz, so 250 times a second.

30:52 And we were trying to do real-time analysis on this, which meant we had four milliseconds between the times that samples came in.

30:58 It had to, like, really be quick and efficient.

31:02 And we were doing some stuff that we had ported over from MATLAB.

31:06 And we were doing basically what you said.

31:08 We had kind of this, like, running buffer of data.

31:11 And we would have to seek back into it and find certain elements.

31:14 And then we would apply that forward over a certain amount of time.

31:18 And we ran it, and the code was too slow.

31:20 We're like, oh, no, it's too slow.

31:22 If we can't make it go four milliseconds, it's not going to be real-time.

31:27 It's just that simple, right?

31:28 And so there was just tons of complicated math.

31:32 And, like, we really don't want to try to optimize that.

31:36 We're like, it's like wavelet decomposition.

31:38 And so it was, like, on the very edge of even understanding the math.

31:42 And so optimizing it was a bad idea.

31:45 And after running some performance analysis, profiling stuff on the code,

31:49 it turned out 80% of the time was spent looking for an item in the list.

31:53 Oh, I see.

31:54 So we got it to go, like, five times faster by switching to a dictionary.

31:59 And it was, like, basic.

32:01 That's great.

32:02 That's a quick win, yeah.

32:03 It was so little work.

32:04 And we got to avoid optimizing wavelet decomposition math, which is fantastic.
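
The original eye-tracking code is not shown in the conversation, and it may not have been Python at all. As a hypothetical sketch of the same idea, a bounded running buffer can keep its samples in a dictionary keyed by sample id, so that seeking back does not scan a list (the class and method names are invented for illustration):

```python
from collections import OrderedDict


class SampleBuffer:
    """Bounded running buffer with average O(1) lookup by sample id."""

    def __init__(self, capacity):
        self.capacity = capacity
        # OrderedDict remembers insertion order, so the oldest sample
        # can be evicted cheaply while lookups stay hash-based.
        self._samples = OrderedDict()

    def add(self, sample_id, value):
        self._samples[sample_id] = value
        if len(self._samples) > self.capacity:
            # Drop the oldest sample to keep the buffer bounded.
            self._samples.popitem(last=False)

    def get(self, sample_id):
        # Returns None if the sample has already been evicted.
        return self._samples.get(sample_id)


buffer = SampleBuffer(capacity=3)
for sample_id in range(5):
    buffer.add(sample_id, sample_id * 0.004)  # 4 ms apart at 250 Hz
```

With a capacity of three, only the three most recent samples remain after five inserts; older ids simply return None.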

32:08 That's awesome, yeah.

32:10 Yeah, so it sounds really simple.

32:12 Yeah, no, that's a perfect example.

32:14 I mean, yeah, small things can make a big difference, yeah.

32:16 Yeah, they sure can.

32:17 That's cool.

32:18 The last section is Django.

32:20 And I'm more of a micro-framework guy.

32:22 I haven't done a lot of Django.

32:23 Probably have a show on that coming up at some point.

32:24 But maybe you could talk about some of the highlights there for us.

32:27 Yeah, so we added a Django section because a lot of people were using it in their projects.

32:31 I mean, a lot of the Python projects out there are web projects using Django or Flask or something else.

32:38 And so we wanted to help people also not only writing better Python code in general, but also getting more proficient at libraries like Django or Flask or anything else.

32:49 And so the Django section is basically organized in the same way as the main section.

32:54 So we have maintainability, security, correctness, etc.

32:57 And the only difference is that we are only talking about stuff that is really specific to the Django framework.

33:05 So we're telling you how to import modules, how to use certain fields and models, which variables you should have in your config and which variables you should not have in your config under some conditions.

33:17 And also something that is really helpful to many people is we show you which things you would need to change actually in your code if, for example, you wanted to migrate from Django 1.6 to Django 1.8.

33:29 Right. Oh, that's really helpful.

33:31 There's a lot of stuff in there that's not obvious.

33:33 And I think a lot of people learn Django by going to the tutorial and going through it and go, now I know Django.

33:38 Yeah, exactly.

33:39 I mean, Django is changing really fast.

33:41 I mean, the community is really awesome and they're putting out a lot of new features and new versions.

33:48 And I think it's good to have like this resource where you can see what has changed and how you would need to change your code in order to make it work with the latest version.

33:58 And I mean, normally you want to be on the latest version because it's more secure, has more features, and is probably better than the previous one.

34:06 Yeah, definitely.

34:08 When you're running a website, you definitely want to stay on the latest version of your framework.

34:11 Exactly.

34:12 Okay.

34:13 So this project, the book you said is on GitHub and your general GitHub is just, it's github.com/quantifiedcode, all lowercase.

34:23 Yeah.

34:23 Exactly.

34:24 Yeah.

34:25 So something I found up there when I went to check out your anti-patterns project on GitHub was you also have this repository called Ex Machina.

34:37 Ah, yeah.

34:38 Okay.

34:38 That's pretty cool.

34:39 Can you tell just people really briefly what that is?

34:41 It looks awesome.

34:42 I have to say this is something that my co-founder, Christoph, did because in the movie Ex Machina, there's a snippet of Python code that you can see on the screen.

34:52 And he just wrote down the code because he was interested in what it would do.

34:57 And it actually prints out an ISBN number of a book that is mentioned in the movie.

35:03 And he also ran it through our static analysis software and he found that actually it's not like PEP 8 compliant and there's some other things in it.

35:09 So it probably shouldn't be security code, which it was in the movie.

35:13 Sure.

35:14 Well, you know, the fact that you can take code from a movie and it even runs is already a good deal.

35:20 Yeah, it's great.

35:21 I mean, it's really good to see.

35:23 Awesome to see that Python is used in movies now.

35:26 Yeah, yeah.

35:26 Ex Machina looks really cool.

35:28 I have not seen it yet.

35:29 Yeah, great movie.

35:30 I would like to talk about it, but I don't want to say any spoilers.

35:33 Oh, yeah, yeah.

35:34 I'll put a link to the trailer on YouTube in the show notes for everyone.

35:38 They can check it out and go see it for themselves.

35:40 Awesome.

35:41 So another thing that you guys have up there is something called Code is Beautiful.

35:45 What's the story with that?

35:48 Exactly.

35:49 So in Quantified Code, we don't only do code analysis, but we're also thinking about new or better ways to visualize code.

35:58 Because for us, dealing with a large code base always involves getting an overview of the whole thing and seeing where, for example, problems are or where work would need to be done.

36:11 And so visualizing the code is a great way to do that.

36:14 And we always liked things like, for example, Cityscape visualizations, but we didn't find anything that was open source or available to the public, so to say.

36:24 And that's why we said, hey, let's develop some cool visualizations of Python code and give them for free to the community.

36:33 So we started this code, this open source project.

36:36 And we have currently three different visualizations of source code.

36:39 One is like a Cityscape visualization, which is like a 3D version of your code where you can, so to say, zoom in and see at a glance where, for example, the most complex parts of your code are.

36:50 That's really cool.

36:51 And you should point out to people that it's live.

36:54 You can rotate it and hover over these little buildings or skyscrapers, and it'll tell you what segment of your code that is.

37:00 What do the colors mean in your visualizations?

37:03 So in the visualization, currently, the color encodes the complexity of the code.

37:08 So it's basically the cyclomatic complexity divided by the number of lines.

37:13 That is fantastic.

37:14 So the cyclomatic complexity is just the number of branches in the code, so to say.

37:21 Roughly, you can think of it like the number of unit tests you would need to test a given piece of code.

37:27 Right.

37:27 How many decision points are there in a given function in a class or something like that.

37:31 So I can create this cool cityscape, and the red buildings are more bad.

37:37 And if I'm going to look at somebody's code, I guess maybe looking at the height and width of a building also tells me how much is monolithic.

37:47 Yeah, exactly.

37:47 So the area is the number of lines in a given file, and the height is just the total complexity of the file.
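
As a rough sketch of the three quantities described here (area, height, and color), assuming a simplified branch count as a stand-in for cyclomatic complexity. The function names and the set of node types counted are assumptions for illustration, not Quantified Code's actual implementation:

```python
import ast

# Node types treated as decision points. This is a simplification;
# real tools count more constructs than these.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 plus the number of branches."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, one per operator.
            complexity += len(node.values) - 1
    return complexity


def file_metrics(source):
    """Map one file's source to the cityscape quantities discussed above."""
    lines = len(source.splitlines())
    complexity = cyclomatic_complexity(source)
    return {
        "area": lines,            # number of lines in the file
        "height": complexity,     # total complexity of the file
        "color": complexity / lines if lines else 0.0,  # complexity per line
    }


example = (
    "def f(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        pass\n"
    "    return 0\n"
)
print(file_metrics(example))
```

The example function has one `if`, one `and`, and one `for`, so the approximate complexity is four over six lines.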

37:57 Yeah, okay.

37:58 Very cool.

37:59 And then you also have two other visualizations that are, I would say, more standard.

38:04 Yes, so they're like 2D, a bit more down-to-earth visualizations.

38:08 And, yeah, we have them because many browsers still don't support WebGL, and we wanted to have something that, like, everyone can use, so to say.

38:18 Right.

38:19 And they're all interactive with D3, right?

38:21 So you can sort of explore them and so on.

38:23 Yeah, exactly.

38:25 And they're also open source, so you can download them from GitHub, use them in your own projects, and modify them.

38:30 Excellent.

38:31 Yeah, you have a live demo that people can go check out, right?

38:33 And what are you analyzing?

38:34 Are you analyzing Django there, or what are you analyzing?

38:37 Oh, we have a lot of projects on our platform, about, like, 10,000 public GitHub projects that we constantly analyze.

38:45 And you can basically go there and see if your project has any problems or if you could improve anything.

38:50 And you can also get visualizations of all the popular Python projects there.

38:54 Oh, that's awesome.

38:56 Is there a way to hook that visualization into a continuous integration story?

39:00 So, like, every time I push to a branch, a build is kicked off, and then an analysis is kicked off and saved.

39:05 Have you guys done this?

39:07 Oh, yeah, you can do that.

39:07 Really?

39:07 You can just sign up for free on our website, and you can add your GitHub project.

39:11 And every time you make a commit, we will analyze your project, and then you can get the visualizations on our website.

39:17 And it's, of course, free for open-source projects.

39:19 Yeah, yeah, that's really cool.

39:21 So, that's quantifiedcode.com.

39:22 Yeah, very nice.

39:23 Yeah, I really like what you guys are doing there with that one.

39:27 Thanks.

39:28 Yeah.

39:28 Yeah.

39:29 So, I have a few more questions before we call it a show.

39:32 One is, what are the worst anti-patterns that you see?

39:37 Like, we went through a couple of them, but are there some that you go, oh, my gosh, if you do this, then it's really bad?

39:42 So, I mean, as I said before, the empty try-except statement is one of the worst things to debug, I think.
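
A small sketch of why this is hard to debug, assuming "empty" means an `except` clause with no exception type and no handling. The function names and the default port are hypothetical:

```python
import logging


def parse_port_silently(text):
    # Anti-pattern: a bare `except` with an empty body swallows every
    # error, including ones you never anticipated, and leaves no trace.
    try:
        return int(text)
    except:
        pass


def parse_port(text, default=8080):
    # Catch only the error you expect, and record that it happened.
    try:
        return int(text)
    except ValueError:
        logging.warning("invalid port %r, using default %d", text, default)
        return default


print(parse_port_silently("not-a-number"))  # None, with no hint why
print(parse_port("not-a-number"))           # falls back to the default
```

In the first version a typo, a wrong type, or even a keyboard interrupt inside the `try` block all disappear the same way, so the caller only sees a mysterious `None` much later.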

39:50 And apart from that, I think it's not the simple anti-patterns that are, like, killing software projects.

39:58 It's mostly, like, complexity and, like, changing code too fast and without having a solid idea of what you're doing and without, like, writing unit tests.

40:11 So, I think there's no really easy recipe to say, okay, if you avoid this and this and this, then you will write very good code.

40:19 But I think it's, like, mostly about, like, controlling your complexity and, like, constantly thinking about your code structure and thinking how you can improve it and also discussing it with other people and doing manual code reviews and automated code reviews.

40:35 So, yeah, for me, it's like, as a software developer, our daily job is to fight complexity.

40:41 Yeah, I think you're right.

40:43 One of the problems that people end up with is they end up with complex, highly cyclomatic, you know, high cyclomatic complexity and all that given to them.

40:52 And then they're like, well, if I had written this, it would be beautiful.

40:56 But, of course, I didn't.

40:57 And so now what do I do?

40:58 I'm kind of lost.

40:59 So, two books, other than just using your tools to understand it, two books I'd like to recommend to people.

41:04 One is called Refactoring to Patterns by Joshua Kerievsky.

41:09 Probably destroyed his name.

41:10 Sorry about that, Joshua.

41:11 But that's a really cool book of how I take, like, monolithic ugly code and, like, make it better.

41:16 And the other one is a book by… I can really recommend it.

41:19 Yeah.

41:19 Do you know the book?

41:20 Yeah, I know.

41:22 Yeah.

41:22 It's a good one.

41:23 The other one is Working Effectively with Legacy Code by Michael Feathers, which shows you how to take huge monolithic code and slowly break it into more maintainable pieces without breaking it, you know, while keeping it running, basically.

41:36 So, those are some good things to check out if you're interested.

41:39 There's also Clean Code.

41:41 I just don't remember the name of the author right now.

41:43 I don't know.

41:44 Do you remember?

41:44 That would be Robert C. Martin.

41:47 Yes.

41:48 Okay.

41:48 Uncle Bob.

41:49 Sometimes he goes by that.

41:50 Yeah.

41:51 So, that's also a great resource, not specifically in refactoring, but more generally in how to write good code.

41:56 And it's mostly written for Java, but it contains so much wisdom.

42:01 I think I read it, like, five times already.

42:03 So, every time I've learned something new.

42:05 Yeah.

42:05 He did a really good book there.

42:08 And I believe he might have written that with his son, Micah, as well.

42:10 So, there might be two authors.

42:11 But, yeah, it's very cool.

42:12 All right, Andreas.

42:14 This has been a really interesting conversation.

42:17 Before I let you off the hook… Yeah, thanks for the conversation.

42:19 You bet.

42:20 Before I let you off the hook, there's a couple of thoughts and questions I have for you.

42:26 Sure.

42:26 One, what's your… You know, there's a bunch of PyPI packages and libraries out there in the world.

42:32 Do you have any favorites that you'd like to recommend to people?

42:36 Mm-hmm.

42:36 So, right now, I use SQLAlchemy a lot.

42:40 And I think it's really, really a great Python package.

42:43 It's perfectly documented.

42:46 And it allows you to do so much magic with SQL.

42:51 And it has allowed… It has saved us so much time.

42:54 So, it's really, really a great project.

42:56 And if you're working with SQL databases, you should have a look at that, definitely.

43:02 I totally agree.

43:03 You know, if I'm working with a relational database, I wouldn't use anything else, basically.

43:08 That's how I feel about it, right?

43:11 We actually had Mike Bayer on show five to talk about SQLAlchemy as well.

43:15 And so, if people are listening now and they want to go get more info on it, we've got a whole show there.

43:19 That's awesome.

43:20 Yeah.

43:20 And what's your favorite editor?

43:22 I mostly use Sublime Text, I have to say.

43:25 But I'm not even a power user, so I'm pretty old-fashioned.

43:28 And sometimes I also use Vim to edit my code.

43:31 But I really don't use many key combinations or so.

43:35 So, I'm really… When it comes to developing, writing code, I'm… Yeah.

43:39 Yeah.

43:40 Cool.

43:40 Okay.

43:41 Excellent.

43:41 Yeah.

43:41 Sublime is great.

43:42 So, have you thought about selling an e-book version of your little book of anti-patterns?

43:48 Oh, I mean, right now we really don't see this as a commercial project.

43:53 And we just want to make this available to as many people as possible.

43:56 That's why we licensed it using a Creative Commons license.

44:01 So, basically, you can download it, modify it, and do whatever you want with it.

44:05 So, and right now I think we don't have the time to make an e-book version.

44:09 But if somebody would want to do it, he or she could do it.

44:13 Excellent.

44:14 Yeah.

44:15 Seems like you could write a cool script that would generate a little EPUB or something that you could throw on your Kindle.

44:21 I think the documentation tool, Sphinx, also supports e-book and mobi outputs.

44:29 So, I think it would be pretty easy to do.

44:31 Oh, yeah.

44:32 Cool.

44:32 It's another good package, which I could recommend, by the way, if you're writing documentation.

44:35 Yeah.

44:35 Yeah.

44:36 That's what's powering your docs and your book.

44:38 Great.

44:38 Yeah.

44:39 Okay.

44:40 And then why don't you just tell us a little bit about what you guys do at Quantified Code so people can check out whatever you're up to in your day job.

44:48 So, I mean, at Quantified Code, what we're doing now is mostly static code analysis.

44:53 But what our big vision is, so to say, is to change the way that people write software.

44:59 Because we think that today we're still using tools that are very similar to the ones that we used like 30 years ago.

45:07 We're still entering text using editors, and we're thinking of code as text files, and we're not making use of all the progress in software development and computer hardware as well that has been made in the last 30 years.

45:22 So, our mission is kind of like to think about new ways to interact with code, to analyze code, and to transform it, and to also make use of all the available data that is out there.

45:34 Because today, most of us develop software using version control systems like Git, et cetera.

45:39 And those systems, they generate an amazing amount of data about every interaction we have with a code base.

45:45 And we can make use of that and basically use that data to find out how you can become a better programmer, how you can be more effective at your job, how you can be more happy maybe, and how you can write code that is more robust.

46:00 So, this is basically what we're trying to do.

46:03 And the first step on this way is, so to say, to develop a new way to do code analysis and help people to avoid some of the most problematic things in their code.

46:16 That's really a great vision and a mission, and you guys are off to a good start.

46:21 Thanks.

46:21 Cool.

46:22 Of course.

46:23 It will take a while until we get there, I guess.

46:25 But, I mean, yeah, somebody should do it.

46:28 Yeah, if you're going to change the world and no one else is doing it, maybe that's you, right?

46:32 That's awesome.

46:33 Okay, well, Andreas, thank you so much for being on the show.

46:36 It's been great to talk.

46:37 Yeah, thanks for having me.

46:38 It was really great.

46:39 Yeah, you bet.

46:40 It's been great to talk about bad code with you.

46:42 Thanks.

46:43 All right.

46:44 See you later.

46:45 See you.

46:46 Bye-bye.

46:48 This has been another episode of Talk Python to Me.

46:50 Today's guest was Andreas Duvez, and this episode has been sponsored by CodeShip and Hired.

46:56 Thank you guys for supporting the show.

46:58 Check out CodeShip at CodeShip.com and thank them on Twitter via at CodeShip.

47:03 Don't forget the discount for listeners.

47:05 It's easy.

47:05 Talk Python, all caps, no spaces.

47:08 Hired wants to help you find your next big thing.

47:11 Visit Hired.com slash Talk Python To Me to get five or more offers with salary and equity

47:16 presented right up front and a special listener signing bonus of $4,000.

47:20 You can find the links from today's show at Talk Python To Me.com slash episodes slash show slash 18.

47:29 Be sure to subscribe to the show.

47:30 Open your favorite podcatcher and search for Python.

47:33 We should be right at the top.

47:35 You can also find the iTunes and direct RSS feeds in the footer of the website.

47:40 Our theme music is the song Developers, Developers, Developers by Corey Smith, who goes by Smix.

47:46 You can hear the entire song on our website.

47:48 This is your host, Michael Kennedy.

47:50 Thanks for listening.

47:50 Smix, take us out of here.

48:10 Developers, Developers, Developers, Developers, Developers, Developers, Developers, Developers.
