#18: Python Anti-patterns and other mistakes Transcript
00:00 Often the most important lessons we learn are what NOT to do. Today we are talking about BAD Python code and Python anti-patterns with Andreas Dewes.
00:00 This is show number 18, recorded Thursday, July 1st 2015
00:00 [music]
00:39 Welcome to Talk Python to Me, a weekly podcast on Python the language, the libraries, the ecosystem, and the personalities.
00:39 This is your host, Michael Kennedy. Follow me on twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpythontome.com and follow us on Twitter where we are @talkpython.
00:56 This episode we'll be talking to Andreas Dewes, from Quantified Code about Python antipatterns.
00:56 This episode is brought to you by Codeship and Hired. Thank them for supporting the show on twitter via @codeship and @hired_hq
00:56 I don't have any major news for you this week, so let me introduce Andreas so we could get right to the show.
01:19 Andreas is a physicist and software developer with a passion for code quality. He leads the development of algorithms and software at Quantified Code.
01:28 Andreas, welcome to the show.
01:30 Thanks, thanks for having me.
01:31 It's great to have you here. We are going to talk about design patterns- both good design patterns and bad design patterns or anti-patterns, with the stuff you guys have going on at Quantified Code.
01:45 Mhm, exactly.
01:46 So you've built a bunch of awesome tools, both visual and static analysis, and we are going to talk about that, but before we get into it, let's just take a step back down memory lane and, you know, tell everyone how you got started in Python and programming. What is your story, how did you get here?
01:59 So, my background is actually not in software development but in physics, and I made my first contact with Python while I was doing my PhD. It was about experimental quantum computing, so we did experiments with microwaves and lots of signal processing, and when I arrived at our lab we had a lot of equipment that we needed to control using programs, and we had a lot of data that we needed to analyze, visualize and process. We used various programming languages, for example Matlab, C and C++, and we also used tools like LabVIEW that help you define graphical user interfaces and control your measurements and equipment. I remember when I came to the lab there was basically a huge mess, because we used four different programming languages plus the graphical programming interface of LabVIEW, which is nice in principle but makes it really, really difficult to write software systems. So I said, "Hey, there has to be a better way to do this." And one of our [3:13] used Python in his job in the US and he introduced me to it, so I started getting interested in it and tried to replace more and more of our software stack with Python. After about one year, or half a year I think, we had replaced most of the data acquisition and data analysis things we had in the lab with Python programs. So that was the first contact for me with the language, and also when I saw that it is really powerful and that we can basically do anything with it...
03:50 Yeah, that's great. I think a lot of people come to Python for the data science component of it, and I think like you are saying, the fact that it is a full stack legitimate end to end programming language, means you don't just have to live within these little shells of what Matlab will do or Labview will do, but you can, you can bring it all together.
04:08 Yes, exactly, yeah. And, I mean, back in 2009 when I started using it, the tooling was already pretty good, but if you have a look at it now, in 2015, it's really amazing what kind of an ecosystem people have created, with for example the IPython Notebook, NumPy, SciPy, and all these other tools that we can use to, as you say, do almost anything that we can imagine. So it's really, really amazing to see.
04:35 Yeah, it is really amazing and it is great, and I think the thing that among other things that surprises me is it's free! You look at Matlab and Mathematica, you pay tons of money to be locked into these somewhat confusing-
04:48 Yeah exactly, and it's the power of open source, so- I think, yeah.
04:53 You are doing all these cool programming and getting into Python for your work in Quantified Code. This is a company you co-founded; who do you co-found it with and what do you guys do there?
05:01 So I co-founded it with an old friend of mine, and we are doing static code analysis for Python but also for other programming languages. The reason we started this company was that I saw during my PhD that it is really hard to write good software. I didn't have a background in software development, so I wrote lots of very, very bad code and made a lot of mistakes, and I thought it would be really nice to have an automated system that would tell me how to write good code. That's when I got interested in all these things, like static code analysis, and started looking into what kind of tools existed for Python and other languages, and I found that most of the tools that we are using today are still not good enough, or they could be much better. So that's kind of our mission: to make these tools much better and to give anyone access to really high-quality static code analysis.
05:53 Yeah, that's great. One of the problems with a lot of static code analysis is that it's very good at finding the small rules like, "Hey you should have white space between these things, you should have a new line here, your line shouldn't be this long, your variables should be named this way-"
06:08 Yeah, exactly.
06:08 But that is like the stuff that robots can do. You know, the stuff that's really valuable is the hard-learned lessons: if you write your code this way it is easier to maintain, if you write your code that way it will actually run faster or it will be less error-prone. That is normally what you get by working in kind of a mentoring apprenticeship, you know, working in a team, but it is also the whole concept of design patterns, right?
06:34 Yes, exactly. I mean, design patterns certainly help people to become productive and to use solutions that are proven, so to say, and tested in the field. A design pattern is just a reusable solution that you can take and implement in your own code. I wasn't aware of it at the time, but I also made use of design patterns, which I found on the internet, for the work I did during my PhD. I used for example the observer pattern and the command pattern and a lot of other things, which I could intuitively adapt to my own needs and use in my projects. And the great thing about this is also that it gives you a common language to talk to other people, because instead of saying, "Hey, I've got this class which is receiving notifications from this other class and is doing some things," you can just say, "I'm using the observer pattern," and then everyone kind of knows what you mean, you know?
07:34 Yeah. I think that is one of the absolute really strong powers of design patterns. Obviously one is to know the solutions but the other is to have larger building blocks as you think about your application.
07:47 Yes, exactly.
07:48 Like you say, if I go to you and I describe, "Ok, what I'm going to do is I am going to have a variable, and there is going to be only one of it, and it's going to be shared across all parts of the application, so when one part changes it the other parts will see that same change," I can have this conversation, or I could just say, "I am using the singleton pattern."
08:05 Yes, exactly.
08:06 You know, what I am talking about, you know what the positive side of that is, you know what the negative side of that is, you know singleton pattern is a challenge for unit testing and so is that a trade off you want to make- and it is not just helpful for working in the team, I can tell you that, or you can tell me that. But I think it is also helpful just for you internally, as you think about how your code is structured.
08:27 Yeah, absolutely.
08:28 We've all written a lot of code and we may or may have not given it good attention; we may have written it bad, we may have written it well, we might have been in a hurry- and so, or you know, much more common we've gotten it from somebody else. And we have to deal with it, and I think understanding these design patterns is really interesting, but also understanding when they are needed- and so there is this related idea of this concept called "Code smells" I think it was introduced by Martin Fowler but I am not entirely sure, are you familiar with this?
08:59 Yes, it's possible. I'm not 100% sure. I mean, design patterns were introduced by the Gang of Four in 1994, but I'm not sure whether the-
09:10 Yeah, certainly Martin Fowler wrote a lot about it and I think it's a very interesting way to think of things, you know, so I'll put a link to some Code Smells in the show notes and I think kind of what your code does is it sort of looks for code smells, it says, "You are doing these things that are not necessarily going to make your code not run but it is not the best way to do it."
09:31 Yes, exactly. I mean, what static code analysis does is basically look for patterns in the code that can produce either bugs or quality problems, and so for us like code [9:46] and we categorize it by how severe it is, so it might make your code crash or it might just be an annoyance to another programmer, who would take longer to understand the code, and also by various categories: does it affect the performance, does it affect the security of the code, does it affect maintainability. So there are lots and lots of different types of code smells in that sense.
10:09 Yeah and if you think back, you know just think of the code smells that I can remember there are some good ones like duplicated code-
10:20 Oh yeah, that's-
10:21 That one is common, right?
10:23 Yes.
10:23 Or feature envy, or long method, or- there is a bunch, you know- too many parameters...
10:32 Yes, I mean, this we are seeing a lot in the code we analyze: very complex functions with a lot of parameters, that have too many responsibilities and are really hard to understand. So, I mean, I think that is really the number one performance or maintainability killer today in software: complexity.
10:51 I agree. One of the really interesting Code Smells I think are comments. And so, you think, "You know, comments- I am supposed to comment my code, that's a good thing." A lot of times programmers will use comments where they should have just written better code.
11:07 Yes, exactly.
11:09 They write a function that has a bad name, and it is not well written so it is hard to understand, and then they put a nice little comment to describe what it does because it is not well written, you know. Where the real fix would be to write a well- to refactor to have a proper descriptive name and to be cleaner in smaller parts, right?
11:27 Yeah, absolutely.
11:29 Yeah, comments are kind of like deodorant for Code Smells, right?
11:34 Yeah, that's a good comparison.
11:36 Ok, so, that brings us to how I kind of got connected to you guys, or learned about you guys, is through this thing you wrote called the "Little book of Python anti-patterns"
11:46 Yes.
11:46 Can you tell us about that?
11:48 Yes, sure. So this was a project we started because we wanted to get an overview of what problems you would find in Python code. We started collecting problem types, code smells and issues from different sources, for example Pylint or Pyflakes, which already have a collection of a few hundred of those anti-patterns, but also from sources such as Stack Overflow, articles and blog posts, and from people and from code that we were reading. We collected all of this together, and we worked with a freelancer in the beginning to write a large part of the initial text, and we put this whole collection of anti-patterns together into a book so that people could learn about the different things they should avoid in their Python code. So it is not learning from good code, but learning from bad code in this sense.
12:45 Yeah. You know, it is related to that Code Smells idea like I see all these problems and if I know to avoid them- well, either you know the right design pattern and the right style or you just know you need to go do some research, and don't do it this way.
12:59 Yeah, exactly.
13:01 Yeah. In your book, you do have sort of recommendations for most of the anti-patterns, right?
13:06 Mhm. Yes. When we describe a problem we always try to give at least one or two solutions to it. I mean, sometimes there is no unique solution, or people would disagree on whether it is a problem at all, but we are always pretty opinionated, and we think a good programmer should be opinionated, so we try to give our solution, the way we would actually solve the problem...
13:36 Yeah, that's great. And let me just tell everybody that the book is at docs.quantifiedcode.com/python-code-patterns and of course I will put that in the show notes as well.
13:47 Yes, that would be great. I mean, we also have a GitHub repository, and the whole book is open source, released under a Creative Commons license, so we are always happy to see contributions, and if people are interested they would be very welcome to contribute.
14:04 Yeah, that's excellent. Let me take a step back here. In your book you've broken it into, quick math, six sections, where you categorize the problems that people run into: one of them is correctness, one is maintainability (maintainability is one that catches up with you later), then readability, security, performance, and then you have a special section on Django.
14:28 Yes. So yeah, I mean, as I said before, we try to like just categorize the issues or anti-patterns so that people could like go through individual sections, and for example see what type of problems really affect the correctness of the program, and what type of anti-patterns only affect for example readability or the performance.
14:51 So, let's maybe look at the couple of- I'll just sort of scan through here and just tell people the kind of stuff that's in there.
14:58 Sure.
14:59 One that I think is somewhat common when you are new to programming in Python is this one you call "bad except clauses order", can you talk about that?
15:10 Yes, bad except clauses order basically means that you write a sequence of multiple except clauses where the first one is already more generic or general than the second one. So basically, your second except clause will never get executed. It is really easy to get wrong, and it has also happened to me a lot of times.
15:37 Yeah, it's not obvious, especially if you are going back and adding these additional except clauses. So maybe you say try, you are going to do some code, and you say except Exception as x and then except TypeError as te, right, that second one will never get run because Exception is the base class of TypeError.
15:57 Even worse than this pattern is probably the empty except block, without any exception type. I think that is kind of the most evil Python anti-pattern that exists.
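To make the ordering problem concrete, here is a minimal sketch. The function names `classify` and `classify_bad_order` are made up for illustration and do not come from the book.

```python
def classify(value):
    """Add 1 to value, or describe which kind of error occurred."""
    try:
        return value + 1
    except TypeError:
        # Specific handler first: TypeError is a subclass of Exception.
        return "type error"
    except Exception:
        # Generic fallback last, so it only sees what was not handled above.
        return "other error"


def classify_bad_order(value):
    """Same logic with the clauses swapped: the anti-pattern."""
    try:
        return value + 1
    except Exception:
        # Catches everything, including TypeError.
        return "other error"
    except TypeError:
        # Never reached: the clause above already matched.
        return "type error"
```

Calling both functions with a string shows the difference: `classify("a")` reaches the `TypeError` handler, while `classify_bad_order("a")` ends up in the generic one.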
15:57 [music]
15:57 Codeship has launched organizations: create teams, set permissions for specific team members and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with Codeship's new organizations plan. And as Talk Python listeners, you can save 20% off any premium plan for the next three months. Just use the code TALKPYTHON. Check them out at Codeship.com and tell them thanks for supporting the show on Twitter where they are @codeship.
15:57 [music]
16:58 I have to admit, sometimes I use it, but it is really bad because it swallows up all your exceptions and it doesn't give you any useful information about what happens in your code. So,...
17:09 Yeah, maybe if you are lucky the empty except clause at least has a logging statement or something, but if it has except pass, that's pretty bad.
17:22 Many hours of debugging stuff like this. Also at the time I wrote they must have...
17:30 I like to think of that one as the intern clause, because they are like oh this is not working, I could just put this here and my code is now reliable. I'll be out of here in 3 months.
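As a rough illustration of the bare except discussed here, the sketch below contrasts swallowing every exception with catching one specific type and logging it. The function names and the default port are hypothetical, chosen only for this example.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port_silently(text):
    """Anti-pattern: a bare except hides every failure, even typos in the code."""
    try:
        return int(text)
    except:  # noqa: E722 - deliberately bad, swallows all exceptions
        pass  # the caller silently receives None


def parse_port(text, default=8080):
    """Catch only the error we expect and leave a trace of what happened."""
    try:
        return int(text)
    except ValueError:
        logger.warning("Invalid port %r, falling back to %d", text, default)
        return default
```

With the second version, anything other than a `ValueError` still propagates, so unexpected bugs stay visible instead of disappearing.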
17:30 Let's see, there is some other interesting ones; __future__ import is not the first non docstring statement. What is the problem with that?
17:50 Yes, so this is just a convention of Python, because you can use these so-called future imports to import functionality that is usually not available in your Python version. For example, one thing that is often used is absolute imports. And the thing with the future imports is that, apart from the module docstring, they have to be the first statement in your file, so you cannot have other code and then afterwards a future import, because Python needs to load these right at the beginning. So this is kind of an easy pattern to catch, but still, sometimes it happens that you have this in your code. I mean, it is also something that you would notice really easily, because your program wouldn't run.
18:31 Yeah, for sure. Another one where you can tell that people are coming from somewhere else and haven't quite fully gotten the Pythonic, idiomatic style is implementing Java-style or C++-style getters and setters.
18:43 Oh yeah, I like this one a lot, we also see it often in our analyses. This is basically when you write a set-something and a get-something function in a class, which you would do in Java, whereas in Python the correct way to do this would be to use so-called properties, where you access an attribute of a class using a function which looks like an attribute access, so to say.
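One way to see this rule in action without creating separate files is to hand two small source strings to the built-in `compile()`. This is only a sketch; the strings and the helper `compiles` are invented for the example.

```python
# A module where the future import directly follows the docstring: allowed.
GOOD_SOURCE = '''"""Module docstring."""
from __future__ import division

half = 1 / 2
'''

# A module where other code precedes the future import: rejected.
BAD_SOURCE = '''import os
from __future__ import division
'''


def compiles(source):
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True
```

Here `compiles(GOOD_SOURCE)` succeeds while `compiles(BAD_SOURCE)` fails, which matches the point that the program would not run at all.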
19:13 Right just use the property decorator, yeah?
19:17 Yes, exactly, yeah.
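A small sketch of the property approach mentioned above. The `Temperature` class and its validation rule are assumptions made up for illustration.

```python
class Temperature:
    """Stores a temperature; callers use plain attribute access."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        # Read access: looks like an attribute, runs like a getter.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Write access: validation lives here instead of in set_celsius().
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)
```

Usage then reads as `t = Temperature(20)` followed by `t.celsius = 25`, with no `get_celsius()` or `set_celsius()` calls in the calling code.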
19:19 Then there is others that are less obvious to me. One of them is if I am using, I am not using the defaultdict, what is the story with that one?
19:27 This is more a recommendation than a real anti-pattern, but I mean, Python has this great collections library which contains many data structures that can help you make your code easier to understand, and the defaultdict is one of them. It's one of my favorites and the one I probably use most in my code. What it is, is a dictionary that basically initializes the value of a given key to something that you can [19:55 pattern??] as a function. The typical use case would be that you have a dictionary with lists inside, and if you want to create an entry for some key that you haven't seen before, normally what you would need to do is check if the key has been set in the dictionary before and, if not, create an empty list in its place, and then [20:17] the element to that list.
20:20 Yeah, so it is like 3 or 4 lines of code, right?
20:23 The default dictionary basically, yes, so that is 4 lines of code, and the default dictionary basically does that for you, because you can just say defaultdict of list, and then whenever you access the value for a given key in the dictionary, it will already have created an empty list as the value.
20:23 So it is really, really useful. I did not know this when I started writing Python, but it really makes the code much more readable.
20:51 Right. Does that come from the collection's module? Defaultdict?
20:53 Yes exactly.
20:56 Yeah, so it is not immediately imported but it's easy to get to. Mhm, it's in a 21:01 library, yeah.
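The "dictionary with lists inside" use case can be sketched as follows. The sample data is invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (category, item) pairs to group by category.
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# Manual version: check for the key, create the list, then append.
manual = {}
for category, item in pairs:
    if category not in manual:
        manual[category] = []
    manual[category].append(item)

# defaultdict version: a missing key is initialized by calling list().
grouped = defaultdict(list)
for category, item in pairs:
    grouped[category].append(item)
```

Both loops build the same mapping; the second one simply drops the existence check.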
21:03 So, let's move on to the next section, maintainability. One of them I think is probably pretty common: you should not be using wildcard imports, so from math import *. It's so much easier to type, why shouldn't I just do that all the time?
21:22 Yeah. It's also a mistake which I made a lot in the beginning, when I started writing Python code, and it's pretty, it's [21:31] because normally it doesn't break anything, but in some circumstances you can overwrite functions that you have imported before, and so the behavior of your program changes in really unexpected ways, I would say. So today I always try to use only qualified imports: import a module and then use its name when I access a variable, or explicitly name the things that I import from a module.
21:59 Yeah, I am with you. I try to, if it is not too much writing, use the name so it is really clear where the type is coming from, but sometimes, you know, it's like, well...
22:09 Yeah, I know, I do it sometimes so, yeah.
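The overwriting risk can be shown without actually doing a wildcard import, by looking at which public names two standard-library modules share. This is a sketch; the helper `public_names` is hypothetical, and `math` and `cmath` are just one convenient pair.

```python
import cmath
import math


def public_names(module):
    """Names a wildcard import would pull in when the module has no __all__."""
    return {name for name in dir(module) if not name.startswith("_")}


# Every name in this set would be silently replaced by whichever
# "from ... import *" line happens to come last.
shared = public_names(math) & public_names(cmath)

# Qualified access keeps the two sqrt functions apart:
real_root = math.sqrt(4)      # a float
complex_root = cmath.sqrt(4)  # a complex number
```

With qualified names the reader can see which `sqrt` is meant; after two wildcard imports that information is gone.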
22:11 But you know, more is like an explicit import, yeah. So, another area that you guys did a lot of work on was readability. And one of the ones there, I don't know how easy this is for your system to check automatically, but certainly if you come from a language like Java or C# or C++, it is not in your way of thinking, right? Even though it's one of the core tenets of Python, and that's ask for forgiveness instead of permission. Asking for permission is the anti-pattern. Can you tell people about that?
22:48 So, usually when you write Python code, the preferred way is to assume that everything will go as you expect and not perform any checks up front, but rather catch an exception if some part of your program throws one, handle that exception, and then react to the behavior which you didn't anticipate, so to say. A good example is always opening a file: you could think of checking if the path exists and then opening the file at that path, but between the two calls, the check for existence of the file and the opening, stuff could actually happen, and so your program could still crash and you would still need an exception handler for that. So the preferred way here is to just assume that the file is there and try to open it, and if you get an error because the file does not exist or something else is wrong, then handle that exception, so to say.
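The file example can be sketched like this. Both function names and the `default` parameter are assumptions for illustration only.

```python
import os
import tempfile


def read_text_lbyl(path, default=""):
    """Ask for permission: check first, then open."""
    if os.path.exists(path):
        # The file can still vanish between the check and the open,
        # so this version is not actually safe without a try block.
        with open(path) as handle:
            return handle.read()
    return default


def read_text_eafp(path, default=""):
    """Ask for forgiveness: just try, and handle the failure."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        return default


# Small usage example in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.txt")
    fallback = read_text_eafp(missing, default="n/a")
```

Only `FileNotFoundError` is handled here; other problems, such as missing permissions, still surface as exceptions for a caller higher up to deal with.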
23:46 Yeah. I think that's pretty key. Another example is just you know, you trying to think of all the use cases or how something could go bad and you end up with like 5 or 6 if tests and then finally, you are going to actually perform the thing and you may have forgotten one of those, or there is maybe something else, so I imagine I'm going to call a web service-
24:05 Yeah, exactly-
24:05 I've got to make sure that network is on, I have got to make sure that the address is well formed, and then, you know, I've got to make sure my data I am going to send is well formed and then it is still maybe the case that the DNS name doesn't look up- I mean there is just so many things that could go wrong, just try it and somewhere higher up where it makes sense catch it and handle it, right?
24:24 Yeah.
24:25 That's what you are recommending there?
24:27 Another thing which I did quite often in the beginning when I learned Python was to use for example the None value to indicate that something special happened inside of a function, for example that something didn't exist and the function couldn't do what it was supposed to do, and then basically check for this None value in the code that called the function. And this of course makes it very complicated to handle these kinds of errors, because now the function does not return only one type of value but two different ones, like None for example and a numerical value or something. And that's also a very good case where you could ask for forgiveness instead of permission, by just having a function that throws an exception when something goes wrong, and then catch an exception and [25:14] work of the function instead of looking for a magic None value or something.
25:19 Right, and the thing about that is that the complexity you are adding by having a return of None or a real value, or some certain value that is indicating the wrong value, means you propagate that complexity up through the caller, and that propagates up higher. It's not just your little function that is hard, it's like you forced this style onto everybody up the chain. That's bad.
25:39 Yeah.
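A minimal sketch of the two styles described above, using a made-up lookup table and a hypothetical `UserNotFound` exception.

```python
class UserNotFound(LookupError):
    """Raised when a name is not in the (hypothetical) user table."""


USERS = {"ada": 36}  # invented sample data


def find_age_or_none(name):
    """Magic-value style: every caller has to remember to test for None."""
    return USERS.get(name)


def find_age(name):
    """Exception style: the function returns an int or raises."""
    try:
        return USERS[name]
    except KeyError:
        raise UserNotFound(name) from None
```

A caller of `find_age` can use the result directly and let `UserNotFound` travel up to whichever level knows how to react, instead of adding a None check at every step.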
25:40 So, another one that you have that I think is pretty common for people who are new to Python, but quickly they get over it, is using an unpythonic loop or non pythonic loop- what is a non pythonic loop for the listeners?
25:53 So, I mean, many people, me included, that come for example from a C background are used to writing loops where they initialize some loop variable like x, set it to 0, then increase the value of that variable up to, for example, the length of an array, and inside the loop they fetch the value of the array element using the index operator, so to say. In Python, of course, you can just iterate directly over a list, for example, and not use a range to first get the indices of the list and only then retrieve the value inside the loop. I mean, this is only a small thing, but it also makes your code much more readable and it helps you to avoid errors, because it involves fewer variables and fewer things that can go wrong, so to say.
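For comparison, here is the index-based loop next to direct iteration, as a short sketch with invented sample values. `enumerate` is shown as well for the case where the position really is needed.

```python
values = [10, 20, 30]

# C-style: loop over indices, then fetch each element by index.
total_indexed = 0
for i in range(len(values)):
    total_indexed += values[i]

# Pythonic: iterate over the elements directly.
total_direct = 0
for value in values:
    total_direct += value

# If the index is genuinely needed, enumerate yields (index, element) pairs.
positions = [(index, value) for index, value in enumerate(values)]
```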
26:48 Yeah, that makes a lot of sense. So, the next section, security, is actually pretty simple.
26:56 Yes, I have to say we have to work on the security section, because right now we only have, I think, one entry, which is the use of exec, and in our analysis tool we flagged that by default because it's not always an anti-pattern or something that you should avoid, but it is of course a huge opening in your system and it can easily be used to execute untrusted code. So I think it is not a general anti-pattern, but if you use it you should definitely be really careful with it and think a lot about which code or which things you are passing into exec.
27:34 Yeah, absolutely.
27:36 Because unfortunately, Python does not have a real safe sandbox. I mean, there are some things that you can do to make it a bit safer to use exec, but there is nothing that can provide 100% security in that case.
27:50 Yeah. To a large degree it comes down to how much you trust the string that you're passing there, right?
27:56 Yes, exactly.
27:56 Yeah. And if it is entered into a community form, it probably shouldn't be execed.
28:03 Yes. It would be the basically the Python equivalent of a SQL injection attack.
28:10 Yes.
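The conversation does not name an alternative to exec. One option from the standard library, when the input is only supposed to be a plain value such as a number, list or dict, is `ast.literal_eval`, which refuses anything that is not a literal. The wrapper `parse_user_value` below is a hypothetical helper, and this is a sketch rather than a sandbox.

```python
import ast


def parse_user_value(text):
    """Parse text as a Python literal; raise ValueError for anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # Function calls, attribute access, imports and so on end up here.
        raise ValueError(f"not a plain literal: {text!r}") from exc
```

For example, `parse_user_value("[1, 2, 3]")` gives back a list, while a string such as `"__import__('os').getcwd()"` is rejected instead of being executed. Very large or deeply nested input can still be a problem, so this narrows the risk rather than removing it.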
28:11 Something to that effect. Ok, some other ones that are interesting are in your performance area. You have one: using key in list to check if a key is contained in the list. What's the story with that one?
28:23 Yeah, the performance section is also pretty small now, but this pattern has to do with the computational complexity of an operation. If you try to find out if a certain value is in a long list, say you have a list of several numbers and you want to find out if a particular one is in that list, then the code that checks this will have a linear runtime, so it will take longer the longer the list is, so to say. If you use a dict or a set instead, where you just have a mapping between the values that you have in the list and, for example, a True value, then you can perform the same check in constant time. I mean, this is also a small thing, but if you have really long data structures, which for example I had in my PhD when analyzing data, this can make a huge difference in the runtime.
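A rough way to try this out is to time the same membership test against a list and against a set built from it. The sizes and repeat counts below are arbitrary choices for the sketch, and the actual timings depend on the machine.

```python
import timeit

values = list(range(100_000))  # invented sample data
lookup = set(values)           # same elements, hashed for fast membership
needle = 99_999                # worst case for the list: the last element

# Both containers give the same answer...
in_list = needle in values
in_set = needle in lookup

# ...but the list scans element by element, while the set hashes once.
list_seconds = timeit.timeit(lambda: needle in values, number=100)
set_seconds = timeit.timeit(lambda: needle in lookup, number=100)
```

Printing `list_seconds` and `set_seconds` side by side is usually enough to see the gap grow as the list gets longer.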
29:20 You are right, it can make a tremendous difference, I'll tell you a real world story where this completely changed my perspective on this.
29:20 [music]
29:20 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
29:20 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.
29:20 Typically candidates receive 5 or more offers in just the first week and there are no obligations ever.
29:20 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!
29:20 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.
29:20 [music]
30:39 I was working on this project involving eye tracking. Not the letter I but, you know, where you are looking. This eye tracking system was collecting data at 250 Hz, so 250 times a second. We were trying to do a real-time analysis on this, which meant we had 4 milliseconds between samples, so it had to be really quick and efficient. We were doing some stuff that we had ported over from Matlab, and we were doing basically what you said: we had kind of this running buffer of data, we had to seek back and find certain elements, and then we would apply that forward over a certain amount of time. We ran it and the code was too slow, and we were like, "oh no, it's too slow, and if we cannot make it run within 4 milliseconds it is not going to be real time." It is just that simple, right?
31:29 And there was just tons of complicated math. We really didn't want to try to optimize that. It was like wavelet decomposition, and I was on the very edge of even understanding the math, so optimizing it was a bad idea. After running some performance profiling on the code, it turned out 80% of the time was spent looking for an item in the list-
31:53 Oh! I see.
31:54 So we got it to go like 5 times faster by switching it to a dictionary. And it was like basically-
32:03 That's good, that's quick-
32:04 It was so little work, and we got to avoid optimizing math which is fantastic.
32:10 That's awesome, yeah.
32:11 Yeah, so it sounds really simple
32:13 Yeah, now that is a perfect example, I mean, small things can make a big difference.
32:17 Yeah, they sure can. That's cool. The last section is Django, and I am more of a micro framework guy, I haven't done a lot of Django, but maybe you could talk about some of the highlights there for us?
32:27 Yes, so we added the Django section because a lot of people were using it in their projects. I mean, a lot of the Python projects out there are web projects using Django or Flask or something else, and so we wanted to have people not only writing better Python code in general but also getting more proficient at libraries like Django or Flask or anything else. The Django section is basically organized in the same way as the main section, as we have maintainability, security, correctness etc., and the only difference is that we are only talking about stuff that is really specific to the Django framework. So we are telling you how to import modules, how to use certain fields in models, which variables you should have in your config and which variables you should not have in your config under some conditions. And something that is really helpful to many people is that we show you which things you would actually need to change in your code if, for example, you wanted to migrate from Django 1.6 to Django 1.8.
33:30 Right. That's really helpful. There is a lot of stuff in there that is non obvious and I think a lot of people learn Django by going to the tutorial and going through it and going, "Now I know Django"
33:40 Yeah, exactly, I mean Django is changing really fast, I mean the community is really awesome and they are putting a lot of new features into new versions, and I think it is good to have resources like these where you can see what has changed and how you would need to change your code in order to make it work with the latest version. And I mean, normally, you want to be on the latest version because it is more secure, has more features and is probably better than the previous one.
34:07 Yeah, definitely, when you are running a website you definitely want to stay on the latest version of your framework.
34:11 Exactly.
34:13 Ok, so this project, the book you said is on GitHub and your general GitHub is just, it's github.com/quantifiedcode, yeah?
34:23 Exactly.
34:24 Yeah, so something that I found up there, when I went to check out your anti-patterns project on GitHub, was that you also have this repository called "Ex Machina"?
34:37 Oh yeah.
34:39 That's pretty cool, can you just tell people really briefly what that is? It looks awesome.
34:42 I have to say this is something that Christoph did, because in the movie Ex Machina there is a snippet of Python code you can see on the screen, and he just wrote down the code because he was interested in what it would do, and it actually prints out the ISBN number of a book that is mentioned in the movie. And he also ran it through static analysis software and found that it actually is not PEP 8 compliant and there are some other things in it, so it probably shouldn't be security code, which it was in the movie.
35:14 Sure, well you know, the fact that you can take code from a movie and it even runs is already good.
35:22 Yes, it is really good to see that Python is used in movies now.
35:26 Yeah yeah Ex Machina looks really cool, I have not seen it yet-
35:30 Yeah, great movie, I would like to talk a little about it but I don't want to say any spoilers.
35:34 Yeah yeah, I'll put a link to the trailer on YouTube in the show notes for everyone, they can check it out and go see it for themselves. Awesome. So another thing that you guys have up there is something called "Code is beautiful", what is the story with that?
35:48 Exactly, so at Quantified Code we do not only do code analysis but we are also thinking about new or better ways to visualize code, because for us, dealing with large codebases always involves getting an overview of the whole thing and seeing where, for example, the problems are, like work that needs to be done. And so visualizing the code is a great way to do that, and we always liked things like, for example, cityscape visualizations, but we didn't find anything that was open source or available to the public, so to say. That is why we said, "Hey, let's develop some cool visualizations of Python code and give them for free to the community." So we started this open source project and we currently have 3 different visualizations of source code. One is a cityscape visualization, which is like a 3D version of your code where you can, so to say, zoom in and see at a glance where, for example, the most complex parts of your code are-
36:51 That's really cool and you should point out to people that it's live, you can rotate it and hover over these little buildings and skyscrapers, and it will tell you what segment of your code that is. What do the colors mean in your visualizations?
37:03 In the visualization, the color currently encodes the complexity of the code, so it is basically the cyclomatic complexity divided by the number of lines.
37:15 That is fantastic!
37:15 So cyclomatic complexity is just the number of branches in the code, so to say. Roughly, you can think of it as the number of unit tests you would need to test a given piece of code.
37:27 Right. How many decision points are there in a given function or class or something like that. So I can create this cool cityscape, and the red buildings are worse, and if I am going to look at somebody's code, I guess maybe looking at the height and width of a building also tells me like how much is like-
37:50 Yeah, the area is the number of lines in the given file and the height is just the total complexity of the file, so.
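The metrics described here can be approximated with a short sketch built on Python's standard `ast` module. This is a simplified illustration, not Quantified Code's implementation: the set of node types counted as branches, the choice to count only non-empty lines, and the function names are all assumptions.

```python
import ast

# Node types counted as decision points in this simplified sketch.
# Real analyzers use more detailed rules; this set is an assumption.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate complexity: 1 plus the number of branching nodes."""
    tree = ast.parse(source)
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    return 1 + branches


def complexity_density(source):
    """Complexity divided by the number of non-empty lines (the 'color')."""
    lines = [line for line in source.splitlines() if line.strip()]
    return cyclomatic_complexity(source) / len(lines)


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # two ifs and one for loop, plus 1
print(complexity_density(SAMPLE))
```

In the cityscape terms used above, the line count of a file would give the building's area, the total complexity its height, and the ratio of the two its color.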
37:58 Yeah, ok. Very cool. And then you also have two other visualizations that are I would say more standard-
38:06 Yes, they are sort of 2D, a bit more down-to-earth visualizations, and we have them because many browsers still don't support 38:13 and we want to have something that everyone can use, so to say.
38:19 Right. And they are all interactive with D3 right so you can sort of explore them and so on?
38:25 Yeah exactly, and they are also open source so you can download them from GitHub, use them in your own projects and modify them.
38:31 Excellent. Yeah you have a live demo, people can go and check out, right, and are you analyzing Django there, what are you analyzing?
38:37 We have a lot of projects on our platform, about 10,000 public GitHub projects that we constantly analyze, and you can basically go there and see if your project has any problems or if you could improve anything, and you can also get visualizations of all the popular Python projects there.
38:56 That is awesome. Is there a way to hook that visualization into a continuous integration story, so like every time I push to a branch a build is kicked off and then an analysis is kicked off and saved? Have you guys done that?
39:07 Oh yeah, you can do that. You can just sign up for free on our website and add your GitHub project, and every time you make a commit we will analyze your project and then you can get the visualizations on our website. And it is of course free for open source projects.
39:20 Yeah yeah that is really cool. So that is quantifiedcode.com? Very nice. I really like what you guys are doing there with that one.
39:27 Thanks.
39:28 So I have a few more questions before we call it a show: one is what are the worst anti-patterns that you see, like we have been through a couple of them, but are there some that you are like "oh my gosh if you do this then that's really bad".
39:43 So I mean, as I said before, the empty try except statement is one of the worst things in the world, I think. And apart from that, I think it is not the simple anti-patterns that are killing software projects, it's mostly complexity, and changing code too fast without having a solid idea of what you are doing and without writing unit tests. So there is no really easy recipe to say, "OK, if you avoid this and this and this, then you will write very good code," but I think it's mostly about controlling your complexity, constantly thinking about your code structure and how you can improve it, and also discussing it with other people and doing manual code reviews and automatic code reviews. So yeah, for me, as software developers our daily job is to fight complexity-
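For readers who have not seen the empty try/except anti-pattern, here is a minimal sketch of it next to a more careful alternative. The function names and the port-parsing scenario are hypothetical and serve only as an illustration.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port_silently(text):
    """Anti-pattern: the bare except swallows every error, including bugs."""
    try:
        return int(text)
    except:  # noqa: E722 (deliberately bad, shown for illustration)
        pass


def parse_port(text, default=8080):
    """Catch only the expected exception and record what happened."""
    try:
        return int(text)
    except ValueError:
        logger.warning("Invalid port %r, falling back to %d", text, default)
        return default
```

With invalid input the first version quietly returns `None`, which tends to surface later as an unrelated error, while the second version states which failure it expects and leaves a trace in the log.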
40:43 Yeah, I think you are right. One of the problems that people end up with is they end up with code with high cyclomatic complexity and all that, or it's given to them, and then they are like, "Wow, if I had written this it would be beautiful, but of course I didn't, and now what do I do? I'm kind of lost." So other than just using your tools, there are 2 books I would like to recommend to people. One is called "Refactoring to Patterns" by Joshua Kerievsky, I probably destroyed his name, sorry about that Joshua, but that is a really cool book on how to take monolithic, ugly code and make it better. And the other one is "Working Effectively with Legacy Code" by Michael Feathers, which shows you how to take a huge monolithic codebase and slowly break it into more maintainable pieces without breaking it, you know, while keeping it running basically. So those are some good things to check out if you are interested.
41:40 There is also "Clean Code" I just don't remember the name of the author, right now-
41:44 That would be Robert C Martin.
41:48 Yes, ok.
41:50 Uncle Bob, sometimes he goes by that name.
41:51 So that is also a great resource, not specifically on refactoring but more generally on how to write good code. It is mostly written for Java but it contains so much wisdom, I think I have read it like 5 times already. Every time I learn something new.
42:05 Yeah, he did a really good book there I believe he might have written that with his son Mike as well so there might be 2 authors, but yeah it is very cool.
42:05 All right Andreas, this has been a really interesting conversation, before I let you off the hook-
42:18 Thanks.
42:18 There are a couple of thoughts and questions I have for you. You know, there are a bunch of PyPI packages and libraries out there in the world, do you have any favorites that you would like to recommend to people?
42:36 So right now I use SQLAlchemy a lot. I think it is a really, really great Python package, it is perfectly documented, it allows you to do so much magic with SQL and it saves so much time. So it is really a great project, and if you are working with SQL databases you should definitely have a look at it.
43:02 I totally agree, you know, if I am working with a relational database I wouldn't use anything else basically, that's how I feel about it, right?
43:02 We actually had Mike Bayer on show 5 and if people are listening now they can go and get more info, we have got a whole show there it is awesome.
43:02 And what is your favorite editor?
43:23 I mostly use Sublime Text, I have to say, but I'm not even a power user. I am pretty old fashioned, so sometimes I also use Vim to edit my code, but I really do not use many key combinations when it comes to developing and writing code, yeah.
43:40 Yeah, cool. Ok, excellent, yes Sublime is great.
43:40 So have you thought about selling an e-book version of your "Little book of anti-patterns"?
43:49 No, I really don't see this as a commercial product, we just want to make this available to as many people as possible, and that is why we licensed it under a Creative Commons license, so basically you can download it, modify it and do whatever you want with it. So right now I don't think we have the time to make an e-book version, but if somebody wanted to do it, he or she could do it.
44:14 Excellent. Yeah. It seems like you could write a cool script that would generate a little epub or something...
44:22 I think the documentation tool, Sphinx, also supports ePub and Mobi outputs, so I think it would be pretty easy to do it.
44:32 Oh yeah.
44:31 Sphinx is another good package which I could recommend, by the way, if you are writing documentation.
44:36 Ok and then why don't you just tell us a little bit about what you guys do at Quantified Code so people can check out whatever you are up to in your day job?
44:47 So I mean at Quantified Code what we are doing right now is mostly static code analysis, but our big vision, so to say, is to change the way that people write software. Because we think that today we are still using tools that are very similar to the ones that we used like 40 years ago. You know, we are still entering text using editors, we are thinking of code as text files, and we are not making use of all the progress in software development and computer hardware that has been made in the last few years. So our mission is kind of to think about new ways to interact with code, to analyze code and to transform it, and also to make use of all the available data that is out there. You know, because today most of us develop software using version control systems that generate an amazing amount of data about every interaction we have with the code base. We can make use of that and basically use that data to find out how you can become a better programmer, how you can be more effective at your job, how you can be happier maybe, and how you can write code that is more robust. So this is basically what we are trying to do. And the first step on this way is, so to say, to develop a new way to do code analysis and help people avoid some of the most problematic things in their code.
46:17 That's really a great vision and mission, and you guys are off to a good start.
46:22 Thanks.
46:22 Cool, yeah of course.
46:24 It will take a while until we get there I guess, but I mean, somebody should do it.
46:28 Yeah, if you are going to change the world where no one else is doing it maybe it is up to you. That's awesome.
46:28 Ok, well, Andreas, thank you so much for being on the show, it's been great to talk-
46:39 Yeah thanks for having me, it was really great.
46:40 Yeah you bet, it's been great to talk about bad code with you. All right, see you later.
46:48 See you, bye bye.
46:49 This has been another episode of Talk Python To Me.
46:49 Today's guest was Andreas Dewes and this episode has been sponsored by CodeShip and Hired. Thank you guys for supporting the show!
46:49 Check out Codeship at codeship.com and thank them on twitter via @codeship. Don't forget the discount code for listeners, it's easy: TALKPYTHON
46:49 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.
46:49 You can find the links from the show at talkpythontome.com/episodes/show/18.
46:49 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.
46:49 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on our website.
46:49 This is your host, Michael Kennedy. Thanks for listening!
46:49 Smixx, take us out of here.