#25: Effective Python Transcript

Recorded on Wednesday, Aug 12, 2015.

00:00 What if you could bottle up all the wisdom and hard-fought experience of many expert Python developers and power up your own skills? That's what Brett Slatkin did and he put it in his book Effective Python.

00:00 This is episode number 25 of Talk Python To Me recorded Wednesday, August 12th 2015

00:00 [music]

00:00 Welcome to Talk Python to Me. A weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on twitter via @talkpython.

00:00 This episode is brought to you by Hired and Codeship. Thank them for supporting the show on twitter via @hired_hq and @codeship.

00:00 Another week, another book give-away! And just like last week's, this one is really an excellent one. All you have to do is make sure you're a "friend of the show" and you'll be in the running to win the free ebook version of Effective Python. Just visit talkpython.fm, click on Friends of the show, and boom: winning happens, ... sometimes if you are lucky.

00:00 Now let me introduce Brett so we can get right to the show. Brett Slatkin is the author of Effective Python (Addison-Wesley 2015). He's the engineering lead and co-founder of Google Consumer Surveys. He formerly worked on Google App Engine, the PubSubHubbub protocol, and managing Google's server fleet. Outside of his day job, he works on open source tools and writes about software, bicycles, and other topics on his personal website (also published to @haxor). He earned his B.S. in Computer Engineering from Columbia University in the City of New York. He lives in San Francisco.

01:54 Brett, welcome to the show.

01:57 Thanks for having me, Michael.

01:57 Yeah, I'm really excited to talk to you about your book, and the other stuff you have going on with Python. We are going to talk about this book that you recently released called Effective Python, and I was the big fan of the Effective series back in the C++ days, and so when I saw your book title, it was like, “Oh, this is going to be awesome”. So, it's going to be a fun conversation.

02:17 Yeah, definitely. Effective C++ was the best programming book that I ever had and so I totally agree with that. So that was 1997, or 1998 when it came out when I first read it, and being able to make the Python version of that was really a great honor for me to be able to write.

02:39 Yeah, I bet it was. That's great. Before we get into the details of your book though, maybe we could just talk a bit about how you got to where you are, how did you get into programming in Python?

02:50 Yeah, so I won't bore you with like the super long details of my history of writing code, but I've been writing code since I was about ten or eleven years old. My first job out of college was at Google, and I showed up for my first day of work and they said, "Ok, here is this Python code base you need to make it better. Here is the book and Alex Martelli sits right there- figure it out." So I had seen Python, I hadn't written any Python at all, and I went from basically knowing nothing about Python and then actually writing a book about it in just ten years which is pretty funny.

03:28 Yeah, that's great. The student becomes the master sort of thing, right?

03:32 Yes, I hope so, I hope it's worthwhile, but I think it was funny that my first impression of Python was from the BitTorrent code base, because Bram Cohen had this post all about Python and how it makes it more efficient, and that is why BitTorrent is so awesome. I don't think people realize BitTorrent was written in Python. But then Zope and Plone were code bases that I was really unhappy using when I tried Python for the first time, and that kind of turned me off. So then, when I showed up at Google and they said, "OK, use Python," I was cautiously optimistic, but it ended up turning out really well. So I can go really quickly through the kinds of projects that I used Python for at Google.

04:16 Yeah, people would love to hear that.

04:17 My first job there was kind of doing janitorial services for data centers. Google has a bunch of machines in the data centers, and those machines have a life cycle: they are built, they are repaired, they break. I worked on a bunch of tools that help that life cycle, primarily written in Python. And they got to scale that out in Google's kind of world: data centers, a lot of system-level code, a lot of networking and database access, workflows, all kinds of stuff like that.

04:51 So that was my first kind of big role. I did some other things similar to that around security, and securing the machines in the data center. And then, shortly after that, that was for a couple of years and then shortly after that, I heard of this really cool project that was being created called "App Engine" it just started. And I thought it was really- it had a lot of potential and so I went to talk to that team, 4 people at that point the original founding team, and I was like, "What can I do to get on this team, how can I work on this?"

05:23 And in my 20% time I built the App Engine dev app server. I don't know if you've ever used App Engine, but you spend a lot of your time using this development tool to actually write your apps. And so yes, I built that and they were like, "Oh, this is pretty good," and then they invited me to join the team, and then I helped build App Engine out and launch it. So it's a platform as a service, a cloud kind of system.

05:50 Yeah, that's really cool. And was Python the very first language, or was it Python and Go it launched with just a restricted set-

05:59 Yeah, so Python was the first language. Java came like a year and a half, or two years later, and then Go came a little bit after that, maybe two or three years after that. I think the first language was actually supposed to be JavaScript, server-side JavaScript in the old Netscape LiveScript scheme of things. But in 2008 that seemed like a completely ridiculous proposition, which is pretty funny looking at all the Node.js stuff that's happened in the last few years with V8; this was way before V8.

06:27 Javascript was still very slow, so anyway, so they went with Python because it was one of the main languages that Google had used and Google had a lot of expertise in Python in general. So, on the App Engine team I spent a lot of time working on Python infrastructure, Python APIs, just kind of the feel of what it was like to write apps on top of App Engine using Python.

06:48 And fixed a lot of bugs, did new APIs to go out to make sure they felt good, built infrastructure around task queues and MapReduce and offline processing and all kinds of different things. Did that for a few years, and then also started this project called "PubSubHubbub" in there, which was a real-time RSS project, for making RSS feeds real time, which is also kind of hilarious in hindsight. It's kind of like saying I made design pod or something like that, it's kind of old.

07:24 Yeah, that's pretty funny, but you know, that was the Google Reader days and all that kind of stuff, right?

07:28 The good old days.

07:30 I kind of miss those things though.

07:30 I do too, all the time. I've been using NewsBlur recently, which I believe is also a Python app, written in Django, but anyway, I miss Reader.

07:42 Yeah, definitely. So, that was a bunch of really cool stuff and we may have to come back and talk about that at some later date. But let's talk about your book today- it's called "Effective Python" what's the subtitle?

07:56 It's "59 specific ways to write better Python".

07:59 It's kind of based in the heritage of the Effective series, which, as we kind of hinted at the beginning, I think started out with Scott Meyers and Effective C++, right?

08:09 Yeah, he invented the format and, you know, he is really the guy who came up with this way of educating people, and all the other books are in his style.

08:20 Yeah, that's cool. So I remember when I was learning C++, I could sort of do stuff with it, I could write code with it and so on, but, after I read his book I felt like it really took my understanding and effectiveness to a new level, and so you are kind of trying to bring this to the Python developers, right?

08:38 Definitely. Yeah, that's the goal of the book, yeah.

08:43 It's a bit of an audacious goal, but yeah.

08:45 Yeah, I mean, I think it's really simple: I read an introductory Python book, and Effective Python is the book I would have given to myself as a second book, if I had had the chance. So there was a gap in my knowledge that took me years and years and years to fill, and this book is kind of a shortcut to get all that practical knowledge without having to pay the dues; so I wish I had had this book when I was starting out.

09:10 You know, we talk a lot about the concept of Pythonic code and idiomatic Python and stuff, and there is a lot of that concept in there, like this is the way you should do things in Python, even if you already know how you could accomplish it using your old idioms from, say, Java or C# or something.

09:27 Mhm.

09:29 But then there is another part, another angle to it; that's just sort of broadening your horizon, I think. So it's not just about writing Pythonic code, and I am thinking of things like how you would use, say, subprocess to manage child processes and parallelism from your Python app. This is not technically a Pythonic thing, but it definitely ups your game and what you can do.

09:53 Yeah, I think a lot of the advice is pure language construct kind of stuff. And then other things are like the subprocess items, but you know, it's important to remember that one thing that they say about Python is "batteries are included", right. And so the difference between Python as a language and Python as an ecosystem, or Python as a set of libraries you can use: when people think of Python they think of all of it together, the syntax, the language features that it has, and then the libraries that you know you can rely on to always be there. So those are tools you should have in your toolbox that are part of being a Python programmer, just like you would know the collections API if you are a Java programmer because it's basically all you write all the time.
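As a minimal sketch of the subprocess point, assuming only the standard library: the snippet below starts a child Python interpreter and reads its output. The child command and the variable names are made up for illustration and are not taken from the book.

```python
import subprocess
import sys

# Start a child process: here, another Python interpreter running a one-liner.
# The command itself is a hypothetical stand-in for any external program.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the child')"],
    stdout=subprocess.PIPE,
)

# communicate() waits for the child to exit and returns (stdout, stderr).
out, _ = proc.communicate()
text = out.decode("utf-8").strip()
print(text, proc.returncode)
```

Several such Popen objects can be started before calling communicate() on each, which is one way to get the parallelism mentioned above.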

10:35 Yeah, absolutely. I have many friends that work in other languages, and they often want to compare their language to Python. And like you said, I think that that's not even a conversation you can begin to have if you don't think about the entire standard library and the 60,000 PyPI packages. You know, you've got to take it as a whole, right, it's not just "here is how you do properties here and properties there", so this is either better or less good, right?

11:02 Yeah, totally. I mean, with a lot of these things you have so much... I like to use the word leverage to describe it; you get to stand on the shoulders of giants, there are a lot of different ways to say it. But yeah, things like NumPy, as an example of a library, it's a whole ecosystem unto itself, and trying to do the same in Java or C++ is extremely difficult.

11:23 Yeah, that's for sure. So I thought one way that would be fun to have a conversation about your book is to kind of go from section to section, or from chapter to chapter, and pick out a few interesting pieces of guidance from each one of them.

11:37 Yeah, definitely.

11:37 Yeah cool. So, you broke your book into six parts. And we'll just take them one at a time. So the first one is this concept of sort of the libraries and Pythonic thinking, right?

11:48 Yeah, that is the first part. It's actually 8 separate chapters, is how many there are, because it keeps going but yeah, the first one is Pythonic thinking.

11:59 Excellent. So did you start with that?

12:02 Yeah, I think it's kind of like the strange universe of Python syntax; you have to get used to doing things, even simple expressions. I remember when I first started writing Python, and the not operator as opposed to using the bang or exclamation point, getting used to where that not goes, and things like "is not": you could do "A is not B" or you could do "not A is B", and understanding that the Pythonic way is "A is not B", that is how you would say it. So you really need this base to start with so that all the other things make sense in context.
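A small sketch of the "is not" point, using made-up values: both spellings evaluate to the same thing, and the second is the one described above as Pythonic.

```python
# Hypothetical values, chosen only to compare the two spellings.
a = [1, 2]
b = [1, 2]   # equal contents, but a different object

# Parsed as not (a is b): correct, but reads awkwardly.
awkward = not a is b

# Same result, and it reads the way you would say it out loud.
pythonic = a is not b

print(awkward, pythonic)
```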

12:37 Right. I also think it is probably a good way to kind of shock the system a little bit, to prepare them for like this is a different world, probably, than you come from. And so, get prepared to think differently.

12:50 Yeah, I think that's a good way to say it: if you try to fit it into the way you write other languages, you are going to have a bad time, and you really need to embrace the Pythonic way of doing it. I had one person tell me once that my C++ code looks like Python; I thought that was a really great compliment, I guess. But you don't want your Python to look like C++.

13:13 Yeah, absolutely. Cool. So, the very first thing that you talk about, literally the first piece of guidance is to know what version of Python you are using, and I do a lot of training in Python and it's always a problem, someone is always running the wrong version of Python, or even worse, they are trying to use the package and they installed it to the wrong version of Python, then they are running the right one or something like that, right.

13:37 Yeah, that's a big problem, especially with the Python 3 move that is going on. You know, you can get confused really easily, and a lot of the features that people like to use, like even with statements, are actually a more recent development, so if you are on some old version, some versions are running Python 2.5 and you don't know it, and you try to do simple things like a with statement and it doesn't work. And so that could be very surprising. So it is always good to check your assumptions.
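One way to check that assumption from inside a program is the standard sys module. This is only a sketch; the helper name running_at_least is invented here for illustration.

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial).
print(sys.version_info)
# sys.version is the human-readable string, including build details.
print(sys.version)


def running_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor.

    Hypothetical helper; it relies on ordinary tuple comparison.
    """
    return sys.version_info >= (major, minor)


if not running_at_least(2, 6):
    print("This interpreter is older than expected.")
```

From a shell, `python --version` answers the same question for whichever interpreter is first on the path.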

13:59 Yeah, absolutely. So, sort of if we all know what we are talking about, now we can move forward, right?

14:04 Yeah. Exactly.

14:05 So what I think probably a lot of people are familiar with but some are not, is sort of working with sequences, and you say, "you should prefer using list comprehensions over chaining a map and a filter call together". What's the story there?

14:20 Yeah, so it's interesting, because if you look at the history of Python, people will talk about how, "oh I like Python because it's kind of functional and it is kind of object oriented, and it's kind of scripty", and so you'll still see examples or guides today. You know, you will be searching on the internet, you'll find some old guide on ActiveState from 2006 and it is like, "here is how you should use maps and filters", and even from 2003, like, you know, a long time ago. And those are useful tools, sometimes they are useful, but the list comprehension syntax is just so much more pithy.

14:54 I think that Python over the years has added more and more tools to maximize the readability of code, and to minimize the visual noise, the extra parentheses and brackets and various symbols that you need to express something. And so, I think that map and filter are kind of just antiquated tools that are in the language because they are hard to take out; you know, they are kind of keywords, built-in functions. If I had my way I would just take them out, because I think list comprehensions are better.

14:54 [music]

14:54 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

14:54 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive 5 or more offers in just the first week and there are no obligations, ever.

14:54 Sounds pretty awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!

14:54 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

14:54 [music]

16:34 But yes, the main thing is that, you know, if you have two ways of doing something, I think the Pythonic way of doing it is always the more clear and more explicit and obvious way of doing it. And so yes, that is why I think list comprehensions are definitely the way to go.

16:50 Yeah, that's cool, I totally agree. It's not exactly a more declarative way of programming but it is closer. That's nice.

16:57 Yeah. It's more obvious.
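To make the comparison concrete, here is a short sketch with a made-up list of numbers: the same "square the even values" operation written with chained map and filter, and then as a list comprehension.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # illustrative data

# map and filter chained together: lambdas and nested calls add visual noise.
with_map_filter = list(
    map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, numbers))
)

# The same computation as a list comprehension.
with_comprehension = [x ** 2 for x in numbers if x % 2 == 0]

print(with_comprehension)
```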

16:59 Sort of related to that, one of your other pieces of guidance was that you should use generators for large list comprehensions. It's so easy to switch from one to the other, but when should I use one, when shouldn't I, what's the story with that?

17:11 Yes, so I think that, you know, Python can use a lot of memory, is one big thing. And it depends on what you are trying to do. If you are using a small data set that entirely fits within memory, then you don't have to really think about this. You can just read everything as a list, and just work on lists. So if you are reading a file, like a CSV file, and you want to deal with it line by line or something like that, then reading the whole thing into memory is fine. But what always happens is you end up wanting to do something a little bit bigger than you expected.

17:41 Especially with a lot of the data processing stuff that people are doing in Python these days. And so, I am trying to change the culture a little bit to say, "Hey, why not use a generator if you can", because if you use a generator, then all of these memory problems go away. A generator returns one item at a time, it doesn't materialize the list, so the total memory space that is occupied is just the last thing that you returned from the generator; the rest of it gets cleaned up, garbage collected.

18:11 Yeah, that's fantastic. Especially if you are like chaining one to another to another and they are kind of building up. It's really efficient.

18:20 Yeah, and to that point, I mean, I think that if you start at the base level, if you want to switch to generator code, it is difficult, because you have to put generators all the way down to the leaves of your call stacks. But if you start with the generators at the leaves, then it's very easy to start saying "hey, you know, I'm going to turn this into a new streaming generator", and so creating those cascades of generators becomes very straightforward, and you can get fast executing code that is easy to follow and uses a low amount of memory, very cheaply, and it's very readable. So that's why I suggest that.
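A sketch of that cascade, assuming a small in-memory list standing in for a large file: swapping the square brackets for parentheses turns a list comprehension into a generator expression, and one generator can feed the next without any intermediate list being built.

```python
lines = ["alpha\n", "be\n", "gamma\n"]   # stand-in for a large file object

# List comprehension: the full list of lengths exists in memory at once.
lengths_list = [len(line.rstrip("\n")) for line in lines]

# Generator expression: same syntax with parentheses, values produced on demand.
lengths = (len(line.rstrip("\n")) for line in lines)

# Generators chain: this one pulls from `lengths` one item at a time.
squares = (n ** 2 for n in lengths)

first = next(squares)   # computes only the first item
rest = list(squares)    # drains the remaining items
print(first, rest)
```

Note that a generator can be consumed only once; after the last line above, both `lengths` and `squares` are exhausted.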

18:53 So the final language one that I wanted to sort of point out, is something that I always thought was weird, is the whole concept of the else statement on the loop.

19:05 Yeah.

19:05 And something that I recently learned, is that try/except/finally also supports else.

19:13 It does, yeah.

19:15 And you have really interesting guidance. You say, you know what, this else statement on loops, maybe not so much, but on exception handling this is very cool; it's like an alternative to the catch.

19:23 Yeah, definitely. Yeah, so I think else makes a lot of sense for try/except/finally. Because it is basically saying "try to do this, if the exception happens do this, and if no exception happens, then do this other thing", and that is what the else block is on a try/except. And so that makes things really clear, clearly delineating what will and will not happen inside your exception handling. And when you are doing exception handling, you want the try block to be as small as possible to narrow what you are catching, so that exceptions you didn't plan to catch are raised back up; that is a really important thing.
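The shape described above can be sketched as follows. The function load_key and its log argument are hypothetical names invented for this illustration; json.loads raising ValueError on bad input is standard library behavior.

```python
import json


def load_key(text, key, log):
    """Parse JSON text and return data[key]; hypothetical example function."""
    try:
        # Keep the try block small: only the call we expect might fail.
        data = json.loads(text)
    except ValueError:
        # Runs only when json.loads failed.
        log.append("invalid JSON")
        raise
    else:
        # Runs only when the try block raised nothing. A KeyError raised
        # here is NOT swallowed by the except clause above.
        log.append("parsed")
        return data[key]
    finally:
        # Always runs, even with the return or raise above.
        log.append("cleanup")


events = []
value = load_key('{"a": 1}', "a", events)
print(value, events)
```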

19:58 Yeah, that makes sense. And the other thing is, you know, you might think, "well, just the last bit of code", in your try block could be what would go in the else statement, but that doesn't account for the like early returns. And so, you know, if you are to do try do some stuff, if these just return, it would still run that else but then you would have to be really careful, you know, trying to put that cleanup all over the try part, so yeah, it is really nice.

20:24 Yeah, it's nice, and yeah, you can put the final cleanup stuff in the finally, and that always runs if you have an early return, and if you have an else with an early return, that will also still run the finally, so it's super nice. Back to that for loop part of it: yeah, I think that is one of the big things. When people first learn about that in Python, they are like, "oh this is so great, I can use it for doing this and that", and I think it is something to avoid. It's kind of like a shiny new toy and you want to use all the different parts of it, and you have to, like, stab yourself in the eye before you realize that it wasn't a good idea.

20:57 That thing hurts.

20:59 Yeah, it hurts. It's pointy, I shouldn't touch that. And I was talking to Guido about this one in particular, because I was asking what he thinks about this one, because it is kind of tough. I worked with Guido on App Engine for a few years, and I see him at PyCon from time to time, and so, you know, I felt bad saying, "hey, you shouldn't use this part of the language"; I just wanted to make sure it wasn't ridiculous to him. And he said that it is actually an implementation detail of how the CPython runtime is implemented, the way that the for/else block, the else with loops, works.

21:34 It has to do with the way these goto statements actually work in the Python interpreter, and so if you wrote that code, then it makes perfect sense, because the else has to do with the way a certain switch statement works. It's a switch statement, not a goto, it's a switch. But if you don't have that mental model, then it makes no sense at all, and, you know, I think that for Guido and maybe other people who've hacked on the core, they just get it and it makes sense to them, but for everyone else it does the opposite of what you would expect. And so, just because of that, I think it is too sharp and I think it is something you should avoid.
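For readers who have not met it, a sketch of the behavior in question: the else block on a loop runs only when the loop finishes without hitting break, which is the part many people find backwards. Both functions below are illustrative names, and the second shows one common alternative, a helper function with an early return.

```python
def find_with_else(items, target):
    for item in items:
        if item == target:
            result = "found"
            break
    else:
        # Runs only if the loop completed WITHOUT a break
        # (including when items is empty).
        result = "missing"
    return result


def find_with_helper(items, target):
    # Same behavior without loop-else: return early on a match.
    for item in items:
        if item == target:
            return "found"
    return "missing"


print(find_with_else([1, 2, 3], 2), find_with_else([1, 2, 3], 9))
```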

22:09 Yeah, I agree. All right, so the next chapter was "Functions", and there is a lot of good stuff in there. One of them that I am a fan of is the whole sort of concept of "it's easier to ask for forgiveness than permission". And you know, that sort of manifests in your recommendation of saying "prefer exceptions rather than returning None from functions".

22:30 Yeah, definitely. This one, I think that people are always looking for... I don't know if you've ever heard of a tri-bool; this was a joke about a tri-bool, a Boolean with three values. People are always dying for a tri-bool because it is true or false or, in some exceptional case, it's neither true nor false. And None in Python ends up playing that role a lot of the time. And that's a problem, because in Python both False and None evaluate to false; they are falsy values. And so what ends up happening is that it causes a lot of bugs in your code, because you say, "hey, x = this function call, and then if x, do this other thing, or if not x, then there is an error". But if you are returning None for errors, then that "if not x" statement is going to run both for the error and for legitimate falsy results like zero or an empty list.

23:22 And so, my whole point: I've been hit by this bug in production a whole bunch of times, I learned this one the hard way over the course of many years, and I'm still managing a code base that has a bunch of this and trying to take it out. And so my advice here is, if you have anything exceptional, just always raise an exception; that is what exceptions are for. I think people coming from other languages are used to exceptions costing a lot of CPU time, or somehow causing a lot of cache problems, and that is true for C++ and other languages, but in Python raising an exception has a little bit of cost to it, but it is not enough to matter, and the clarity is worth it.
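A minimal sketch of the bug and the fix, using a division helper as an assumed example (the function names are made up for this illustration): when None signals failure, a legitimate result of zero is mistaken for an error by "if not x".

```python
def divide_or_none(a, b):
    """Error-prone style: None stands in for 'something went wrong'."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


result = divide_or_none(0, 5)      # a perfectly valid result: zero
looks_like_error = not result      # True, although nothing went wrong


def divide(a, b):
    """Preferred style: raise, so callers cannot confuse 0 with failure."""
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError("Invalid inputs")


try:
    print(divide(0, 5))            # zero is handled as an ordinary value
    divide(1, 0)
except ValueError as error:
    print("error:", error)
```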

24:03 Not enough to introduce bugs into your code, right?

24:05 Yeah. So well it is fast but it doesn't work correctly. It's like all right, well if that is what you want then it's fine.

24:11 Yeah, that is not a trade-off you want to make, right?

24:13 Yeah.

24:15 So, another one you had around functions that I thought was good advice was, if you are going to return a lot of data, just make that function a generator rather than returning a list.

24:24 Yeah, and this kind of goes hand in hand with the generator expressions that I was talking about before; I think I kind of covered both of those when I was talking before. So, there are list comprehensions, which are a fast way of doing map and filter on lists, and then there are generator expressions, which are a way of doing exactly the same thing, but the data is generated one item at a time.

24:46 And then a generator function is not just one line, it's a full function that does this; so in what I was talking about before, I think I kind of conflated both of those together. Yeah, I think just having functions that return generators, like I was saying, let's kind of start with the leaves and move out to make things streaming, is really helpful.

25:04 Yeah, and it's so easy to just throw in yield instead of list append or whatever, right?
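The swap from list.append to yield can be sketched like this. The word-index task and both function names are assumptions chosen for illustration: the first version accumulates results in a list, the second yields them one at a time.

```python
def word_starts_list(text):
    """Return the starting index of every word as a list."""
    result = []
    if text:
        result.append(0)
    for index, char in enumerate(text):
        if char == " ":
            result.append(index + 1)
    return result


def word_starts(text):
    """Same logic as a generator: each append becomes a yield."""
    if text:
        yield 0
    for index, char in enumerate(text):
        if char == " ":
            yield index + 1


sentence = "Four score and seven"
# Callers that really need a list can still build one explicitly.
print(list(word_starts(sentence)))
```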

25:10 Yeah. Exactly. And it's very clear, and what is interesting is that, you know, JavaScript and other languages are adopting the yield semantics because it's such a powerful tool that people love to use, which is really great, and I am sure Python stole it from something else; I don't know exactly where yield is from originally. But it is a really great tool.

25:28 Yeah, it's really great. And it's one of those that I find a lot of people they haven't taken the time to really learn it and appreciate it, they know it's there, and they just kind of ignore it, right? But raising awareness is great.

25:43 Yeah, definitely, and it should be the default. I think you should plan to yield; if you are returning a sequence, question what you are doing.

25:54 Yeah, exactly. The thing better be pretty short. So, the last one I wanted to talk about in functions which I haven't necessarily decided my feelings on it yet- if you use keyword arguments that can help in the expressivity and clarity of functions.

26:12 Yes.

26:14 So, I think the actual recommendation was to use just keyword arguments if you get a chance sometimes, right? Can you maybe speak to that a bit?

26:21 Sure, well I was curious what part you don't necessarily agree with, because I'm curious? I mean I have a few different parts to my advice, but yes, keyword arguments are extremely powerful, they let you be explicit about your intention and I think that using them in general for optional parameters is really important, so if you have a function that takes an optional flag, then you should always use a keyword argument, it's a great way to do that-

26:51 That I totally agree with. I think where there is an optional thing, it definitely helps because it's not part of what you expect to pass all the time.

27:00 Yeah, because Python doesn't have polymorphism like other languages. In Java you can define the same function name taking three different sets of parameters, and, you know, C++ can do that too; in Python you can't do that. So the only way to deal with this is to basically add additional optional parameters to one function and have it kind of deal with the various types of input, or define a totally different function with a different name, which can get verbose, so-
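A sketch of that approach, with an invented function name and parameters: one function with optional keyword arguments covers what would be several overloads elsewhere, and naming the optional flags at the call site keeps the intent explicit.

```python
def build_url(host, port=80, secure=False):
    """One function instead of several overloads; hypothetical example."""
    scheme = "https" if secure else "http"
    return "%s://%s:%d" % (scheme, host, port)


# The defaults cover the common case.
print(build_url("example.com"))

# Optional behavior is passed by keyword, so the call documents itself.
print(build_url("example.com", port=443, secure=True))
```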

27:23 Right. Yeah, I'm not sure I disagree, I just haven't fully internalized that one yet. So the use case I was thinking of is, if you have something like PyCharm, or Python Tools for Visual Studio or something, and you hit open parenthesis, there is a nice listing of all the parameters without even looking at the docs, which for optionals I think still shows up, but if you just do the **kwargs-

27:48 Oh yeah, then you've got a problem

27:49 - then it is really like, Ok, what the heck can I pass

27:53 Yeah, that's totally true. So **kwargs, which basically is a catch-all that lets you catch any keyword arguments no matter what they are, including garbage, can be a problem. This is one of the things I've tried really hard to do in the book: I made sure that it applies to both Python 2 and Python 3, so it's not a Python 2 book, it's not a Python 3 book, it's the most overlapping subset that I could find of both, so it should be relevant to everyone. And this is a great example where Python 3 has specific language features with which your IDE will continue to work properly, whereas Python 2 does not.

28:28 So, in Python 2, you can't say these arguments can only be passed by keyword. And you want to do that in situations where you are passing two integer values and you don't know which is which, and so you want to use the keyword part in the function call to make it really clear that, "hey, this is the numerator and this is the denominator and it's not the other way around, so here is the label in the function call, numerator=5 and denominator=10", something like that.

28:56 Right.

28:56 And you want to enforce that behavior so that people can't call it by just passing 5 and 10, or 10 and 5 by accident. And so, in Python 3 you can just add the star into the arguments list and then that's enforced by the compiler and everything is great. And in Python 2 you can't do that. So **kwargs is a way to deal with that, and you are right, it breaks your IDE and makes it harder to understand the documentation. You have to make that trade-off yourself and say, "hey, how error-prone is this function, is it worth having these optional parameters that work this way?" And yeah, maybe there is a better design that you can have; maybe a helper class would be a better way to approach it than having **kwargs sitting there.
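Both variants can be sketched side by side. The function names are invented for illustration; the bare star in the first signature is Python 3 syntax, and the second function shows one way the **kwargs workaround could look, not necessarily how the book writes it.

```python
def ratio(*, numerator, denominator):
    """Python 3 only: everything after the bare * must be passed by keyword."""
    return numerator / denominator


def ratio_kwargs(**kwargs):
    """Emulation that also works on Python 2: unpack and validate by hand."""
    numerator = kwargs.pop("numerator")
    denominator = kwargs.pop("denominator")
    if kwargs:
        # Anything left over is a misspelled or unsupported argument.
        raise TypeError("Unexpected keyword arguments: %r" % sorted(kwargs))
    return numerator / denominator


print(ratio(numerator=5, denominator=10))

try:
    ratio(5, 10)    # positional call is rejected
except TypeError as error:
    print("rejected:", error)
```

The first form keeps the parameter names visible in the signature, which is the IDE and documentation concern raised above; the second hides them inside the function body.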

29:36 Right. It's definitely a cool language feature, it's just I think sometimes it gets overused, and you are like, "Ok, I've got to basically keep the documentation up for a while, so I know what my possibilities are".

29:46 You are totally right. I would say don't use **kwargs unless you are doing something like this; I think it's one of the few times you should use it. There are some other glue kind of infrastructure cases where it is good to use it, if you are doing generic wrappers of functions or something like that, but in general, try to stay away from it. And I am really happy that Python 3 has this extra feature for enforcing keyword arguments.

30:07 Yeah, that's cool. Ok, so now I definitely agree with you.

30:13 Ok.

30:13 [music]

30:13 This episode is brought to you by Codeship. Codeship has launched organizations: create teams, set permissions for specific team members and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with Codeship's new organizations plan.

30:13 And, as Talk Python listeners you can save 20% off any premium plan for the next 3 months. Just use the code TALKPYTHON.

30:13 Check them out at codeship.com and tell them "thanks" for supporting the show on Twitter where they are at @codeship.

30:13 [music]

31:04 So, let's move on to classes. One thing that Python supports is multiple inheritance. And speaking of sharp tools that hurt, I did a lot of C++ and even COM with insane templated multiple inheritance, so I kind of prefer single inheritance and keeping it really simple. You are saying that to some degree, and you are saying, "Look, this multiple inheritance stuff should be used for mix-ins".

31:34 Yeah, definitely.

31:35 Can you tell people what you mean by mix-ins in this case, and how that would go?

31:39 Yes. So a mix-in, to me, is a set of functionality that you can add to any class. Helpers for serialization, helpers for logging, helpers for doing some kind of introspection on functions, stuff like that. Those are the mix-ins: "Hey, I have this class, wouldn't it be nice if it automatically logged every function call, or every attribute access? Let me inherit from this mix-in utility to do that for me."

32:09 So I think those are the times where you want to use multiple inheritance: where the class structure doesn't actually matter. It's more like aspect-oriented programming, which I am not a huge fan of, but what I'm trying to say is that it is nice to be able to compose functionality, and mix-ins are a way of doing composition with class inheritance. If there were another way to express that composition in Python, that would be great, but the tool that we have is multiple inheritance.

32:36 Yeah. And I have seen some really beautiful mix-in code that, when done right, is nice.

32:41 Yeah, when it is done correctly it's really nice, and it's not brittle. I think a lot of the time multiple inheritance is really brittle and it breaks in weird ways when you refactor it, but mix-ins are built not to be brittle and that is why it works out. So it is about composition, primarily.
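
A small sketch of what mix-ins can look like in practice. The class names JsonMixin, ComparableMixin and Point are hypothetical, chosen only for illustration; neither mix-in defines __init__ or cares where it sits in a class hierarchy, which is what keeps this style from being brittle.

```python
import json


class JsonMixin:
    """Adds JSON serialization to any class that keeps its state in plain attributes."""

    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)


class ComparableMixin:
    """Adds value-based equality to any class that keeps its state in plain attributes."""

    def __eq__(self, other):
        return type(self) is type(other) and vars(self) == vars(other)


class Point(JsonMixin, ComparableMixin):
    # The class composes two independent behaviors through multiple inheritance.
    def __init__(self, x, y):
        self.x = x
        self.y = y


print(Point(1, 2).to_json())
print(Point(1, 2) == Point(1, 2))
```

Each mix-in only assumes that instances have attributes it can read through vars(), so either one can be dropped onto an unrelated class without touching the other.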

33:00 Nice. So, let's move on to the next section, metaclasses.

33:05 Yeah.

33:05 And in there, one thing that sort of stood out to me was you were talking about that you can register class existence with the metaclass- what is the story with that?

33:16 Yeah, so metaclasses are like the sharpest tool you can poke yourself with in Python. And it's kind of like, once you use it enough, you're like, "OK, I'm going to try this metaclass thing", and maybe you read something about metaclasses and some email filtering thing someone wrote with them and how awesome that was, and then you realize that is from 12 years ago and it's worth thinking about again. But it's a really powerful tool. I have three main ways I think you should use it in Python, and I don't think you should use it in other ways unless you really know what you are doing.

33:49 And registration is one of the big ones. It is really nice if you are creating infrastructure, and you are creating classes to represent database tables or sensors that self-register. You want the program to kind of sign itself up and initialize itself, so you don't have to write a bunch of boilerplate to say, "Hey, remember, configure this sensor, configure that sensor, configure this database row". A lot of programs end up having main functions that are just lists of registration calls, a hundred lines of registration, and then it says "go" after that. Metaclasses give you a really nice way of saying, "Hey, anytime anyone creates a class that is part of this hierarchy, register it in the central database of classes, or central dictionary of classes, so that when I start my program I can go through and do all the housekeeping I need to do for initialization".

34:41 So that is one thing: it lets you do initialization really easily, and it also lets you do lookups really easily. You say, "Hey, is there a class that does this in the system?" Yes there is, I know exactly where it is, it's right here, I've already imported it, I've already looked at it, I know what it is capable of. So anytime you see yourself doing that pattern, where you have one of something now and you are going to have 50 of them later and you need a registry of those things, there is no reason to explicitly register things. You should always just have them automatically register themselves, so that it is less error prone. And that's the big thing that I'm trying to advise you to do here. We're humans, we leave things out all the time, so helping people make fewer mistakes is really important, and metaclasses are a great way of helping people make fewer mistakes.
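
Here is a minimal sketch of the self-registration pattern, assuming a hypothetical Sensor hierarchy and a module-level registry dictionary (both names are made up for this example). The metaclass runs once for every class statement in the hierarchy, so nobody has to remember a separate registration call.

```python
registry = {}  # central dictionary of classes, keyed by class name


class RegisteredMeta(type):
    """Metaclass that records every concrete subclass as soon as it is defined."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # the root of the hierarchy has no explicit bases; skip it
            registry[name] = cls


class Sensor(metaclass=RegisteredMeta):
    def read(self):
        raise NotImplementedError


class Thermometer(Sensor):
    def read(self):
        return 21.5


class Barometer(Sensor):
    def read(self):
        return 1013.0


# Start-up housekeeping: walk the registry instead of a hand-written list.
for name in sorted(registry):
    print(name, registry[name]().read())
```

In Python 3.6 and later, the __init_subclass__ hook covers this particular use case without a metaclass; the metaclass form is shown here because it is the approach being discussed.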

35:34 Nice. The other sort of topic you talk a little bit about there is attributes. I think you talk about like private versus somewhat private, you know, the sort of underscore versus double underscore attributes in there, is that right?

35:48 Yeah, I think this is another common thing people have when they come from other languages: they have strict definitions of public and private, and Python doesn't have those things. It has rough equivalents, and the gist of it is you should just make everything public by default, in almost all cases, because people are going to reach in and make private things public anyway. Guido refers to this as "we are all consenting adults": you should let people reach into the class and use it however they want, and as long as you know you are getting yourself into trouble and you are willing to take on that risk, it is up to you.

36:28 So yeah, I think the whole idea is that Python is a dynamic language, so things are mutable and more fluid, and you don't have to have this kind of draconian public/private enforcement that maybe Java programmers have. It's interesting that other languages are so focused on not letting people do things by default, whereas Python is focused on letting anyone do anything with the class by default.
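
A short sketch of how the single and double underscore conventions behave, using a hypothetical Account class. A single underscore is only a naming convention, while a double underscore triggers name mangling, which makes access inconvenient rather than impossible.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # public: part of the API
        self._balance = balance   # single underscore: "internal" by convention only
        self.__pin = "0000"       # double underscore: stored as _Account__pin


acct = Account("ada", 100)

print(acct._balance)            # nothing stops a caller from reading this
print(hasattr(acct, "__pin"))   # the unmangled name does not exist on the instance
print(acct._Account__pin)       # but the mangled name is still reachable
```

This is why the advice above leans toward public attributes plus documentation: the double underscore mostly protects against accidental name clashes in subclasses, not against a determined caller.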

36:50 Yeah, that's interesting, just a language philosophy statement.

36:56 Yeah it is, and some people say that is why Python gets difficult for really large projects. Their argument for closed-by-default is that once the project has millions of lines of code, it is too late. I understand that, but I think a million-line Python project is a very large program; a lot of people have never worked on something that size and probably never will. So it's kind of like, one size does not fit all.

37:18 Yeah, that's a good point, I mean, I see a lot of people that learn design patterns and other things and they just want to apply them.

37:24 Exactly.

37:25 To whatever, right. But these design patterns maybe make sense on very large scales or certain circumstances, right?

37:31 Absolutely, that's why I think all that advice needs to be in context, you know. OK, you give me this advice: "Who are you, what are you doing, how big is the code base, how often do you have to do it?" I need to know those things because your design constraints are different than mine. So-

37:44 Right. "How similar are we?"

37:45 Yes.

37:45 So one of the sections I really liked, and I am looking forward to digging into it more, is the concurrency and parallelism section.

37:54 Yeah.

37:55 And you've drawn an interesting distinction between what you are calling concurrency and parallelism. Can you maybe speak to that first?

38:02 Yeah, I think this goes back to a great talk by Rob Pike, who is one of the creators of the Go programming language and also works at Google. He did a talk called "Concurrency Is Not Parallelism", which is a great talk to check out. But yeah, basically concurrency is a programming pattern: it's a way of doing multiple things at the same time, and having tools to let you do that at the programming level. Threads are one of those tools, coroutines are one of those tools. Parallelism is actually running two lines of execution on a processor at the same time. That's not necessarily a programming tool, it's just the way the program runs. Python is great at doing concurrency-style stuff, but Python can't actually do parallelism because of the global interpreter lock, which I'm sure people listening to this have probably heard of. That is one of its big shortcomings as a language.

38:55 So with concurrency you are thinking, "Maybe I'm going to make two web service calls at the same time and I can wait for one to come back". But parallelism is, "I'm trying to compute this financial algorithm, and it's just a computational thing, and I'm breaking that up into steps and trying to run them", and threads are not really going to do much for you there, right?

39:14 Yeah, or if you try to decode two frames from a video stream simultaneously on two different CPU cores, you can't do that in Python without actually going down to the C API. Whereas asynchronous things you can do in parallel, and you can even run threads where the computation looks like it's happening in parallel, but it is actually not. It's happening concurrently. So you can't increase your throughput of computation with threads or any kind of concurrent infrastructure in Python; you can't get parallelism.

39:47 Right. So your first piece of guidance was to use the subprocess module to manage child processes and those might be Python subprocesses or they might just be other executables, right?

39:59 Yeah, that's a cheap way to get some parallelism because at the system level, multiple processes can be doing multiple things on multiple cores, so you can really take advantage of your computer that way.
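
As a rough sketch of that approach: the helper name run_children is hypothetical, and the children here are just more Python interpreters running one-line programs, though they could be any executable. All children are started before any output is collected, so the operating system is free to schedule them on separate cores.

```python
import subprocess
import sys


def run_children(snippets):
    """Start one child interpreter per code snippet, then gather their output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            text=True,  # decode stdout as str instead of bytes
        )
        for code in snippets
    ]
    # communicate() waits for each child and returns (stdout, stderr).
    return [proc.communicate()[0].strip() for proc in procs]


print(run_children(["print(2 ** 10)", "print(sum(range(10)))"]))
```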

40:09 Yeah, nice. Another one that you had was to use threads for blocking IO but to avoid threads for CPU parallelism. And that kind of is a little bit of what we were talking about, right.

40:20 Yeah, your web service calls are exactly that. You can do a web service call in one thread and another web service call in another thread, and they will happen concurrently, but they won't actually process in parallel. So threads are great for blocking tasks or IO-bound tasks, but they don't actually help you compute faster.

40:39 Right. So, verifying my mental model here, I think I recall that if I have a thread in Python, and it does some blocking IO, that thread will release the global lock, right?

40:52 Exactly, yes, so you can use threads to avoid waiting; that's basically all you get. When a thread blocks on IO, the GIL gets unlocked and then another thread will start running, and that's nice. That's what threads are good for. But that's as far as it will go.
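
A small sketch of threads hiding blocking waits. To keep it self-contained, time.sleep stands in for a blocking web service call (an assumption made for the example); like real blocking IO, it releases the global interpreter lock while waiting. The names fake_io and run_concurrently are made up for illustration.

```python
import threading
import time


def fake_io(delay, results, index):
    """Pretend to wait on the network; the GIL is released during the sleep."""
    time.sleep(delay)
    results[index] = delay


def run_concurrently(delays):
    """Run one thread per simulated call and report the total wall-clock time."""
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i, delay in enumerate(delays)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


results, elapsed = run_concurrently([0.3, 0.3, 0.3])
print(results, round(elapsed, 2))
```

The three waits overlap, so the total time is close to the longest single wait rather than the sum. Replacing the sleep with a CPU-heavy loop would remove that benefit, which is the distinction being drawn in this part of the conversation.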

41:08 Right. Sort of the Node.js style of parallelism, maybe.

41:12 Yeah, it's very similar actually. Node has a lot of the same constraints and problems. It only has one executing thread at any time.

41:19 OK, so another one was to consider coroutines for running many functions concurrently. How does that look?

41:27 Yeah, this is my favorite item in the book, Item 40. I implement Conway's Game of Life as a set of coroutines, and I would say if you understand this one, then you've understood the Zen of Python pretty well. It's basically trying to say that you can express very complicated, high-level ideas about the way things should interact and the way whole system workflows should work, using coroutines, in this really abstract way that is extremely powerful, easy to test, and easy to expand. It's really hard to go into detail without explaining the whole thing-

42:00 Sure.

42:01 It's definitely worth checking out. It's actually online, fully published; if you search for it, I think you will find it on effectivepython.com. Because I was so proud of that.

42:08 Yeah, nice. That's awesome, you kind of pulled it out so everyone can get to it.
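
This is not the Game of Life implementation from the book, just a much smaller sketch of the underlying mechanism: a generator-based coroutine that receives values through send() and yields a result back each time. The names running_minimum and feed are illustrative.

```python
def running_minimum():
    """Coroutine: receives numbers via send() and yields the smallest seen so far."""
    current = yield              # wait for the first value
    while True:
        value = yield current    # hand back the minimum, wait for the next value
        current = min(current, value)


def feed(values):
    """Drive the coroutine with a list of values and collect what it yields."""
    coroutine = running_minimum()
    next(coroutine)              # prime it: run up to the first bare yield
    return [coroutine.send(value) for value in values]


print(feed([10, 4, 22, -1]))
```

The coroutine keeps its own state between calls and only advances when the caller sends it something, which is what lets many of them be interleaved on a single thread.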

42:11 Yeah, definitely.

42:12 Cool. And then the final concurrency one that I thought was interesting, was to consider concurrent.futures for true CPU parallelism.

42:20 Yeah, this is the kind of hack that someone came up with. There is this multiprocessing module, built into Python, that will actually farm out work to subprocesses that are also Python. It's really ridiculous how it works, but for certain cases it's extremely powerful, and it will actually speed up your program by the number of cores in your machine. It doesn't always work, and it also breaks in a lot of weird ways, but when it does work it's magical, because you can go from something slow to something that is ten times faster. So it's worth checking out, or at least trying. I just wouldn't go too far.
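
A minimal sketch of handing CPU-bound work to a process pool through concurrent.futures. The workload count_primes and the wrapper count_all are made-up examples. The wrapper takes the executor class as a parameter only so the same code can be tried with threads for comparison.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Deliberately naive CPU-bound work: count the primes up to and including limit."""
    count = 0
    for n in range(2, limit + 1):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_all(limits, executor_cls=ProcessPoolExecutor):
    """Run count_primes over every limit using the given kind of worker pool."""
    with executor_cls(max_workers=2) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: on some platforms worker processes re-import this module.
    print(count_all([20_000, 20_000, 20_000, 20_000]))
```

With ProcessPoolExecutor each worker is a separate interpreter with its own global interpreter lock, so the work can run on several cores. Passing executor_cls=ThreadPoolExecutor gives the same answers but no speed-up for this kind of workload.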

42:55 I think you just summed up multithreaded parallel programming, right there. It sometimes works, it sometimes breaks in weird ways and when it works really well it's magical.

43:06 Yeah, that's true. I guess that's no exception.

43:11 Python has its own peculiarities about how it breaks and how it works, and whatever, right.

43:16 That's a really good point. People say, "Oh, Python only has one thread, that is a problem." It's a problem. But they forget to mention that multiprocess programming and multithreading in any other language are also really hard and error prone, so let's not forget that. That is a great point.

43:30 There is a group that was referring to the bugs that you get from multithreading as heisenbugs. And I really liked that, that way of thinking.

43:38 Yeah, definitely, they are very hard to track down.

43:41 So, let's talk about just one more section, in the book- collaboration. Like, how you work well with other people in sort of like teams and so on, right?

43:52 Yeah, I think Python is a huge community, like we were talking about before, and so working with others in Python is really important.

43:58 That's cool. And one of your pieces of guidance was to use packages rather than just pass around a bunch of files, right?

44:04 Yeah, and this all depends on the size of your codebase. When you are starting out, it doesn't matter. Until you have ten files or something like that, it really doesn't matter. But you should be willing to refactor your code into packages once you realize you have enough different working pieces. Especially if you are working with different people, it's nice to split things a little earlier than you might have expected so you give them room to grow. It's kind of like if you have a kid, you buy them shoes one size bigger so they can grow into them. Packages are kind of like that. I think it's a good way to modularize your code base before it becomes too late. And I've been in code bases where it's been way too late, and we were just like, "Why has this even happened?" It was terrible.

44:48 Yeah. I can imagine. Do you recommend having like an internal private package server?

44:54 You know, it really depends on your company. I think if you're comfortable with pip, then yeah, it's pretty good, especially if you are doing your deployments off of it; it can be really useful. And if you have a security team that needs to vet things, that can be really valuable.

45:06 Maybe one team that builds API and another team that builds the web part that consumes the API or something like that.

45:14 Yeah, you could do stuff like that. It also depends on your source code repo. At Google we have just one giant repo for the entire company, so I don't need a package server because I have the code. But yeah, it's good hygiene to have people publishing APIs that are stable and to think about things in that way. It's kind of the Netflix or Amazon approach to software engineering, and it works really well. More like a service-oriented architecture, but at the API level.

45:36 Yeah, that's cool. So another piece of advice you had for collaboration was to consider the repr method for debugging output and helping people understand the state of your program.

45:50 Yeah, repr. Print is the most valuable tool for me in Python, but the second most valuable tool besides print is repr, because you can get detailed information about an object, and you can basically make an object that describes itself. So when you are debugging you can print it out and say, "Hey, what is in this thing, what are the parameters that created this object, how did I even get here?" __repr__ is really a great way to do that. And Python has built-in support for it all over the place, so it's definitely worth checking out if you don't use it.

46:29 Right, absolutely. If you just take a list and print it, you'll get that representation for all the items it contains.

46:36 That's right. And it basically evaluates back to the Python statement that it would have taken to create the same list, so you can usually copy and paste reprs back into the interpreter to test them out. It's a really nice way to do an interactive debugging session with your own code.
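
A short sketch of a self-describing object, using a hypothetical Vector class. Its __repr__ returns the expression that would recreate the instance, so printing a list of them shows something you could paste straight back into the interpreter.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # !r applies repr() to each field, so strings would keep their quotes.
        return f"Vector({self.x!r}, {self.y!r})"


points = [Vector(1, 2), Vector(3, 4)]
print(points)                    # containers use repr() for their items

clone = eval(repr(points[0]))    # round trip: the repr is valid Python source
print(clone.x, clone.y)
```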

46:50 That's cool. And the final collaboration thing, which I actually hadn't heard of, was that you can understand memory allocations and leaks with something called tracemalloc.

47:00 Yeah. So this is another Python 3 only thing, and it's wonderful if you have ever had memory leaks in Python, especially in long-running processes. It's really hard to track down where your memory usage is going. On any project of size that I've been on (we run a lot of servers at Google, so we have this a lot), you end up having this problem where you look at your garbage collection or your heap and it just keeps growing, and you are like, "Where is the actual space going?" The tools to do that in Python 2.7 and before are all bad, and with tracemalloc they finally fixed this in Python 3.

47:37 And yeah, it basically will tell you exactly: this is the object you have too many of, this is exactly where it was allocated, here are the parameters that were used at the time. So now it is orders of magnitude easier to figure out, which is really great. There is a port of that code to Python 2.7, I believe, but I don't know how well maintained it is. It has to actually modify the CPython runtime binary itself, it can't be done as a C extension module, and so it will never be added to Python 2, which is unfortunate.

48:08 Right, so you might have to build your own version of Python from source with that and then run your code on it.
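
A minimal sketch of the snapshot-and-compare workflow with tracemalloc. The functions leaky and top_allocations are hypothetical helpers written for this example; the tracemalloc calls themselves (start, take_snapshot, compare_to, stop) are the standard library API.

```python
import tracemalloc


def leaky():
    """Stand-in for code that holds on to more memory than expected."""
    return [bytes(1000) for _ in range(1000)]


def top_allocations(workload, limit=3):
    """Run workload between two snapshots and return the biggest differences."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    keep_alive = workload()      # keep the result referenced so it stays allocated
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Group by source line; entries are sorted with the largest change first.
    stats = after.compare_to(before, "lineno")
    return keep_alive, stats[:limit]


data, stats = top_allocations(leaky)
for stat in stats:
    print(stat)                  # file, line number, size difference, block count
```

Each printed entry points at the source line responsible for the growth, which is the "exactly where it was allocated" information described above.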

48:15 Which is pretty scary, but it might be worth it-

48:18 Maybe not for use in production too much.

48:20 Yeah, I wouldn't use in production.

48:22 For sure. OK, so that was a bunch of really awesome things that I think will help people of pretty much all levels.

48:29 Thanks.

48:31 Yeah, how long has the book been out?

48:34 It's been out since April, basically, and it's doing pretty well. This week we also have a video coming out, which is a lot of the same content but with me just typing out the actual code. Some people learn better from seeing demonstrations, so it might be better for them. But yeah, it's had a lot of good feedback, a lot of good reviews on Amazon and other places, and some bugs reported for things I did wrong, but not too many. Only one bug so far that I am really embarrassed about: I did a multiply instead of a divide. But the rest of them were pretty good.

49:12 Well, multiply, divide- what's the difference?

49:14 Yeah, you know I got confused.

49:19 So, people can go get it on Amazon and I'll definitely link to the book, but you have a book code that we can give to listeners so they could get a bit of a discount, right?

49:32 Yeah, the code is EFFPY, and the simplest way to use it is to go to informit.com/EFFPY. That lets you buy the book directly from the publisher, and for some places outside the US it can also be cheaper this way than going through Amazon. It gives a discount on the ebook version, the physical book, and also this new video that just came out this week.

50:02 Yeah, you just released that, you released this nice new video training version of the book, so I'll be sure to link to that as well. That'll be cool.

50:10 Yeah, thank you.

50:11 Yeah, you bet. So Brett, anything else you want to give a shout-out to, or let people know about that we somehow didn't touch on?

50:17 The only thing is, yeah, the book site is effectivepython.com, and everything you want to know about me you can find there. Please report any bugs you find on GitHub, where all the example code is; I'm trying to make sure that anything that's wrong gets addressed there. And if you have questions, there is also a link to email me directly, so if you hit an issue I'd love to hear from you. Happy to answer your questions.

50:42 Excellent. So, two final questions before I let you go: what is your current favorite code editor? If you are going to sit down and write some Python, what do you open up?

50:50 I am a Sublime fan. I used TextMate for a long time, and I still use Emacs when I'm on terminals; I use Emacs every day also. I don't use vi, but I can use less, so I can do the read part of it, I guess. And yeah, Sublime is my favorite editor right now.

51:07 Nice. And then, of all the 60,000+ (whatever it is) PyPI packages, which ones should people know about that maybe they don't?

51:18 NumPy, obviously; I guess everyone knows about that one. SQLAlchemy is an amazing tool, especially the relational package for doing database lookups. Flask is a great web framework that I like a lot. And then there is a bunch of stuff around speeding up programs that I am trying to get more into. Theano is one and Numba is another; these are things that let you use your GPU through Python, which is super cool, and you can use that for things like neural network training, stuff I haven't really spent enough time doing. So I am really excited to try out some more of these computational tools and Python packages.

51:53 Yeah, that's really awesome. If you have not looked at the parallel capabilities of GPUs it's astounding.

52:03 It's crazy, and the fact that you can use that in 3 lines of Python is crazy. I think it speaks to the community that Python has, of scientists and such a diverse group of people, so it is really great to see these packages blossoming in Python before many other languages.

52:18 Yeah, absolutely. Brett, it's been a very interesting conversation. I appreciate it.

52:23 Yeah, thanks a lot for having me on, and asking all those great questions.

52:26 Yeah, and I love the book and I encourage everyone to go check it out. They'll enjoy it.

52:31 Thanks very much.

52:32 Yeah, talk to you later.

52:32 This has been another episode of Talk Python To Me.

52:32 Today's guest was Brett Slatkin and this episode has been sponsored by Hired and Codeship. Thank you guys for supporting the show!

52:32 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.

52:32 Codeship wants you to ALWAYS KEEP SHIPPING. Check them out at codeship.com and thank them on twitter via @codeship. Don't forget the discount code for listeners, it's easy: TALKPYTHON

52:32 You can find the links from the show at talkpython.fm/episodes/show/25

52:32 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.

52:32 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on our website.

52:32 This is your host, Michael Kennedy. Thanks for listening!

52:32 Smixx, take us out of here.
