Transcript for Episode #77:
20 Python Libraries You Aren't Using (But Should)
Many of you write to me and tell me how you appreciate the way my guests and I highlight a particular Python package at the end of each episode. Well if you enjoy that little segment, you're going to love this episode.
This week you'll meet Caleb Hattingh who wrote a great book called 20 Python Libraries You Aren't Using (But Should). He and I spend an hour digging into all the very powerful and interesting packages that you probably haven't heard of but will be super excited to use after you learn about them.
This is Talk Python To Me, episode 77, recorded September 20th, 2016.
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities.
This is your host, Michael Kennedy, follow me on Twitter where I'm at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.
This episode is brought to you by Capital One and Intel. Thank them both for sponsoring the show by checking out what they're offering during their segments.
1:24 Michael: Hey everyone. I have a quick message for you before we get to Caleb and his book. In addition to writing this book for O'Reilly Caleb also wrote a screencast course on Cython. And, it looks to be one of the better Cython courses out there, so when he is talking about Cython if you are really interested in what he's up to, be sure to check out this course which is linked in the show notes. And, O'Reilly agreed to give away a free copy of his course, all you have to do to be eligible is be a friend of the show so be sure to visit talkpython.fm click on 'friends of the show' enter your email address and you'll be eligible to win. Now, let's talk to Caleb.
Caleb, welcome to the show.
2:00 Caleb: Hi Michael, it's great to be here.
2:03 Michael: Yeah. I'm super excited to talk to you, we've got some really cool stuff around a book, free ebook that you did and I found it super interesting, so I think everyone will. Basically we are going to take the last question I always put at the end of my podcast- what's your favorite PyPi package and turn that to an entire episode, and just go deep into the idea, right. Ok, so before we get to that though, let's start at the beginning, when did you get into programming in Python, that sort of thing?
2:27 Caleb: Great question. So one of the other podcasts I listen to is the C++ podcast, and just about every guest on that show says that they started programming assembler in grade school. That was not my story, I didn't get into programming at all in school or high school. I started really as a hobby in university while I was studying chemical engineering, which is kind of an odd thing to do as a hobby when you're doing something completely different, but as the years went on I got more and more into programming, and it turned out that I did my master's degree in process control, which is like a subset of chemical engineering, and that was all in MATLAB, so it was pretty much all I programmed. And that's really how I got into programming, and I learned that I really did not like MATLAB very much at all.
3:14 Michael: Yeah, I spent a decent amount of time with it, so I can sympathize. I don't love the .m files, no.
3:20 Caleb: Yeah. I got to know it really well, and I decided that I didn't really want to carry that forward. And it was really in the first or second year after I started working that I started learning a couple of languages outside of work, and the one that I really tried to focus on was Java. I signed up for a very expensive certification course, I think Java was around version 1.2, 1.3 or something at the time. And halfway through that course, I came across Andrew Kuchling's Python tutorial, which I think was for Python 1.5, and it just blew my mind. I kind of had the realization that I couldn't possibly use Java anymore to do the kind of data analysis work that I was doing, because it was so easy in Python; it was just a complete waste of time to develop all of these object oriented structures around fairly simple data pipeline processing tasks.
4:15 Michael: Yeah, that makes a lot of sense, I mean, Java has so much formality and maybe, let's say it's maybe good for large applications, maybe. But it certainly doesn't make sense for small ones, right, like you're talking about.
4:27 Caleb: Absolutely. There's definitely a place for Java, for very large programs, but for the kind of things that I was doing, and especially for shorter programs involving data pipeline processing, Java is just way more than what you need to get the job done. And that was pretty much the end of Java for me, I never finished that course. I really got stuck into Python and this was around 2001, so it was quite a few years ago. And then I watched Python become Python 2, and 2.4 was a big one for me, I used that for quite a long time. And so on. So yeah, that's pretty much how I got into Python, but along the way I did use quite a few other languages very heavily. Fortran I used quite a bit as well, because large chunks of the scientific world still use Fortran. I have written new Fortran 77 code, I've added that to the world. [laugh]
5:16 Michael: That's awesome.
5:18 Caleb: And Delphi I used for quite a few years. My career has moved into and out of software development and into chemical engineering, I've kind of straddled both worlds for the past fifteen years or so. And I worked as a software engineer in the hospitality industry, writing hotel administration software, for about four years, and that was heavily using Delphi, the IDE from what used to be Borland and then Embarcadero. So I got to know that language really well, pretty much as well as I know Python, I would say. That's very interesting to me, I kind of regard Delphi and Python as almost opposites in many ways: a GUI is very easy in the one, not quite as easy in the other; deployment is extremely easy in one, deployment is kind of difficult in the other, and so on. There are many parallels where I see Delphi and Python as direct opposites. Another good one is the GIL, the global interpreter lock, which I think is really fascinating. In the Delphi world, for many years, one of the things that developers asked for over and over again was thread safety in the library. It was one of the huge talking points, they wanted the containers and the structures inside the library to be thread safe because it was too easy to get race conditions in threads, because you could [00:06:32] native threads and have them collaborate with each other. And it's really fascinating to me that the exact inverse argument gets made in the Python world, where the presence of thread safety is the problem. So yeah, it's really interesting to be able to have a depth of knowledge in multiple programming communities, because you kind of get a sense of maybe what is really important and what is spurious; everything is a set of tradeoffs. That's the thing.
7:03 Michael: Yeah, that's what I was thinking.
7:04 Caleb: Nothing is accidental, but things are designed the way they are for good reasons, and they may not always be the best fit for every particular situation.
7:11 Michael: Right, you learn a language and maybe it's not obvious initially, you don't know the history or whatever, but there were probably some deep thoughts that went into at least all the popular programming languages like, those things have evolved with lots of thought over time, absolutely.
7:24 Caleb: Yeah, yeah.
7:25 Michael: Nice. Okay, well so a lot of Delphi, a lot of Python, a lot of scientific programming, what are you doing with Python and programming today, like what's your day job?
7:35 Caleb: Good point. So, I can give you a quick two minute, well, one minute let's say, run through of the things I've done. I started in chemical engineering doing a lot of simulation work using off the shelf tools, and as time progressed I started moving more towards problems where there were no off the shelf tools, and for those you have to write code. I began using a lot of Fortran, and then I started incorporating Python into that. Then I took a break from engineering and went into software development, where I used Delphi, but I also used Python for web development. Then I went back to engineering and started writing simulation software for coal gasification, which is what brought me to Australia. What's interesting about that job was that for the first time I really decided to use Python for the entire system, which means all of the number crunching stuff as well, and that's where Cython really came into the picture for me. I made the decision to use a full Python stack for that simulation work because it seemed to me that Cython was mature enough to give you the speed that you need to solve these mathematical problems in the background. And that definitely proved to be the case. It was a good choice on my part. Cython is not so much a break from Python really as an extension, but it gives you all of the native speed and control over the memory layout that you need if you want to make fast code. So I spent a couple of years doing that.
8:56 Michael: Yeah, would you say it's a little analogous to like inline assembler in C++ or something like that, you're like just if this one loop could be faster, let me just do this part fast?
9:05 Caleb: I think I would disagree with that. I began with that in mind, and it's easy to look at it that way, but Cython is so much more than inline assembler. There's another way to look at it, which is that you can get all the benefits by writing what looks like Python, and in one or two places you just add some types onto a couple of variables and you can get a hundred fold [00:09:23] increase in speed. Whereas inline assembler is much more of a niche application of a different technology. Unless you can read assembler really easily, which I can't.
9:33 Michael: Yeah, neither can I.
9:35 Caleb: Having inline assembler, yeah. Whereas Cython is not like that. By and large Cython is as easy to read as Python. There are a couple of things that are different in the layer, but overall, you would find Cython as easy to read as Python.
9:49 Michael: Okay, sounds good. So, back to what you're doing today.
9:51 Caleb: Yes, so I've been working for the past couple of months on a contract for GPS tracking, which has also been using a full Python stack, and I was very lucky in that they were willing to go straight to Python 3.5. So I've been writing asyncio code in Python 3.5 since February, and I feel very fortunate to have had that opportunity. And yeah, I'm starting a new job tomorrow, from the day of this recording, working for a company called Console, which was formerly called IIX, and I believe my title has something to do with network orchestration, which is going to be a whole new thing again for me to learn.
10:28 Michael: That is quite cool. Python of course plays a super important role in that space, so it makes a lot of sense.
10:37 Caleb: Yeah, absolutely. One of the big benefits of Python just as a technology choice is that you can use it just about everywhere.
10:43 Michael: Yeah, that actually is really important.
10:45 Caleb: It's extremely important.
10:46 Michael: Yeah, so I think that's a wide range of background experiences, and it gives you a nice overview of the ecosystem and the standard library, and all the different ways that Python makes you efficient, productive, and so on. And so you wrote a really cool book, you wrote it for O'Reilly, right?
11:06 Caleb: Yeah, that's right.
11:08 Michael: Yeah, as a free ebook, I think, and it's called "20 Python Libraries You Aren't Using But Should". And, I thought it was a really nice survey.
11:18 Caleb: Very BuzzFeedy title.
11:21 Michael: [laugh] You know what, it always starts with 10, 20 or like the seven things you should never say, like you know it sounds like buzzfeed, but it's really succinct, it's good.
11:32 Caleb: The idea for the book came from O'Reilly. Susan Conor at O'Reilly suggested it to me in Austin if I would be willing to write the book, and so the title was established before I got to the project, and then I provided the content. But the original working title was "10 Python Libraries You Aren't Using But Should" and I couldn't stop at ten. And in fact, if you count the major featured packages there are 20 but you'll see throughout the text of the document that I referred to a whole bunch of other libraries as well. In footnotes and-
12:03 Michael: Yeah, I thought that was interesting, we'll talk about the 20 major ones and maybe even touch on some of the ones that you pull in, like for example one of the web service bits the implementation of the web service uses a few other libraries that we can talk about later, but they're not actually part of the 20, right, so there is, I feel like you get a really good well rounded view of what's out there.
12:24 Caleb: I was quite cautious about writing the book, because it's a fairly contentious thing, the choices that people make; many people can be quite passionate about those things. And the brief of the book was: we want you to focus on libraries that other people don't know about yet. So that means I'd have to leave things out, and it also means I'd have to leave out things that may be fairly popular, which means there might be quite a widespread degree of support for libraries that I am not going to be mentioning, so I was somewhat apprehensive about that. So the libraries that I tried to focus on were specifically things that may not have had much exposure. Which is a very interesting idea for the book. I can't include things that are too niche, that could not be used for much, with very low applicability, but at the same time I did not want to include things that were very well known, because that defeats the purpose of the book. So I found it quite challenging to pitch it.
13:15 Michael: It's the ones you aren't using, not the ones that you are, right?
13:17 Caleb: Exactly. Yeah.
13:19 Michael: It is a challenge and you didn't want to go okay, let's strip off the like if you say what is the most popular library to do- this, you almost have to say ok, except for that one, what else can we do, but I thought about that when I read the title, and I was like well, this is going to be a bunch of niche things that are like second fiddle to the stuff that you actually should be using, but no I think it was really good, so maybe we could start by talking about the ones that everybody has installed already, that's the stuff that comes in the standard library. So in a first chapter you said, “hey look, there is a bunch of stuff you are not using that's built in”.
13:52 Caleb: Developers with experience do tend to look in the standard library first, because they've been burned by carrying extra dependencies which after a couple of years may not be as well maintained. The impression that I have is that more experienced developers tend to lean more heavily on the standard library when choosing technology, even if some of the time there may be third party packages that might be a better fit. That's a decision that gets made, that's really a trade-off, and I do the same: when I have to deploy an application to production, and I know that this is a core service, I tend to lean more heavily on choosing things out of the standard library when possible, as opposed to adding third party dependencies. Whereas newer developers tend to get whatever is the greatest on PyPI and run with that. So what seems to me to be the case is that more experienced developers have a much better and deeper knowledge of what is available in the standard library. So even though the book was intended to be focused on third party libraries only, I did want to squeeze in some of the absolute must have, must know standard library options, like the collections package, which is the first section in this chapter. If you watch any of Raymond Hettinger's talks, he plugs the collections module heavily, as well he should, because it's awesome. I do feel strongly about that, that people really should know more about what's in the standard library, and my original version of the book had more of it in, but we decided that we wanted to focus the book more on the third party stuff, so it got trimmed down.
15:15 Michael: Sure, that makes a lot of sense. There is nobody who is the advocate for the thing in the standard library, it's just built in. But when somebody makes their open source library, they set up some GitHub Pages thing and they've got some cool logo and, you know, there's somebody promoting it in a sense, and so I can see that- yeah, so a couple of the things that you talked about in the collections library, one of them which I think is pretty interesting is OrderedDict.
15:43 Caleb: So what's interesting about this section is that it may become redundant in December. I don't know if you've been following the discussions on the Python mailing list, but in Python 3.6, which I think has its final release in December, the dictionary is going to become ordered. Whether that is going to be advertised as a requirement of the language spec, or whether that's just going to be an implementation detail, remains to be seen.
16:12 Michael: Sure. Yeah, and so every now and then you'll see people building special dictionaries for Python that are ordered. For example, the MongoDB library exchanges dictionaries at serialization, and a change of order causes more writes on the server for documents than if the order doesn't change, so they may want to create their own. And we've got this OrderedDict, so what are some of the problems you ran into with the regular dictionary? Or, why do you care about ordering, I guess?
16:39 Caleb: The main use case for ordering that I've come across is usually when processing things that are really mappings in a sequence, and they need to be processed in the sequence in which they appear. So I think the example I gave in the text, a common example, is processing lines in a file, where the lines map to something else and you want to serialize them or persist them in some way, retaining the order that they appeared in the original list.
17:03 Michael: Right, like maybe like a CSV you're going to load and you're going to say look up by like some ID which is a column, if you write it back you want to be able to save it at the same order and not have to maintain like two data structures or something?
17:13 Caleb: Yeah, that's right, exactly.
17:15 Michael: Okay, so there is right now, in all the versions of Python, collections.OrderedDict, which is a specialized dictionary that solves this problem. It just so happens that if you live right out on the very edge of new Python, you might not need that in December, but a lot of people don't live there, right? So I think it's still totally relevant.
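As a minimal sketch of the use case described above (not the example from the book), the snippet below loads records from lines, looks one up by ID, and writes them back in their original order. The sample data and the names `lines`, `records`, and `restored` are hypothetical.

```python
from collections import OrderedDict

# Hypothetical "id,name" records, in the order they appeared in a file.
lines = ["30,carol", "10,alice", "20,bob"]

records = OrderedDict()
for line in lines:
    record_id, name = line.split(",")
    records[record_id] = name  # OrderedDict remembers insertion order

# Look up by ID, as with a regular dictionary.
print(records["10"])

# Write the records back out in the order they were read.
restored = ["{},{}".format(key, value) for key, value in records.items()]
print(restored)
```

A single structure serves both purposes here: keyed lookup and ordered output, so no second list is needed to remember the sequence.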
17:35 Caleb: Yes, and from the discussions that I've been seeing on the Python-dev mailing list, it probably is going to remain in the library; what the latest that I've seen is that the order of keyword arguments in function calls is going to be guaranteed to be maintained, but the requirement for normal dictionaries to be ordered may not be a specification of the language spec, which means that other implementations of Python may not need to maintain that.
17:59 Michael: Right, okay.
18:00 Caleb: So that's quite interesting. One of the caveats that I mentioned about OrderedDict towards the end of the section, with the big red triangle, is: beware creation with keyword arguments. Which is exactly this problem. When you create an OrderedDict and you supply keyword arguments, as you would with maybe a regular dictionary, the problem is that the order is not maintained as you specified it, because the keyword arguments first get created as a regular dictionary before they get created as an ordered dictionary. And that's going to be changing for sure in Python 3.6.
18:29 Michael: Oh excellent, it's good to know. Because that happens at the call site, before the OrderedDict class ever gets any information. It's just given a dictionary and it can do what it can do, but it's too late, the order has already changed, right?
18:42 Caleb: Yeah, exactly, that's right. And I think the other guarantee is that the dunder dict entry in classes is also going to have a guarantee of the order being maintained. Even though it's implemented as a regular dictionary, what the language spec requires and what actually happens in practice are two different things. So the developers of Python are trying to maintain the language spec as a spec even for other implementations besides CPython, which is difficult to keep in your head when all you work on is CPython, which is largely the case for me. But yeah, they're dealing with bigger problems than just whatever goes into CPython.
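A short sketch of the caveat discussed above, assuming only the standard library: building an OrderedDict from a sequence of (key, value) pairs preserves the order on every Python version, while the keyword-argument form is only guaranteed to preserve it from Python 3.6 onward (PEP 468). The keys and values are made up for illustration.

```python
from collections import OrderedDict

# Safe on any Python version: the pairs arrive as an ordered list.
from_pairs = OrderedDict([("zebra", 1), ("apple", 2), ("mango", 3)])
print(list(from_pairs))

# Keyword arguments are collected into a mapping at the call site.
# Before Python 3.6 that mapping could lose the order you typed;
# from 3.6 on, keyword argument order is preserved.
from_kwargs = OrderedDict(zebra=1, apple=2, mango=3)
print(list(from_kwargs))
```

If code has to run on interpreters older than 3.6, the list-of-pairs form is the one to rely on.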
19:17 Michael: Yeah, that's an interesting thing to keep in mind because we often just think the CPython equals Python language but there's a lot of other implementations and extensions, and forks and whatnot.
19:26 Caleb: Yeah exactly.
Capital One has a special message for you- they need Python pros who love to work with data. Put your Python experience to work at Capital One and help them use data to make life better for millions of customers. Capital One is employing the latest tools and approaches to do data analytics and data science from the ground up. They are smart, creative professionals who love to explore new ways to interact with data; they are interested in figuring out novel, advanced Python techniques and even more interested in finding more people who will help them do that. When you join their state of the art Python community, you will work with people you really like, people who might be listening to this podcast right now. Relentless innovation is their way of life, make it yours at Capital One. Visit jobs.capitalone.com/talkpython to learn more and apply today.
20:29 Michael: So another one, I would say is also one of, if I had to pick the most useful thing to come out of the collections library, I would say it's probably named tuple, which you highlighted in your book as well.
20:39 Caleb: Yeah, yeah, named tuple is kind of interesting. I have recently started using it directly when creating tuple structures, but most of my experience with named tuple really has been converting old code that used regular tuples into using named tuples, just to improve the maintainability of that. And it is very powerful in that respect.
20:58 Michael: Yeah, it doesn't change the performance much, and it's an easy thing you can do because named tuples are compatible with the existing code, but they definitely add a layer of maintainability, right? So maybe a regular tuple has three things and you need to put some new item in the middle to make it four; well, the code that was going to t[2] is now not accurate anymore, right? But if you can refer to them by the names, the property names, that's fantastic. Which is what named tuples are, it's great.
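To make the maintainability point concrete, here is a small sketch with hypothetical field names. It shows that a namedtuple still behaves like a regular tuple, and that inserting a field in the middle shifts the indexes while access by name keeps working.

```python
from collections import namedtuple

# Hypothetical record type; old code would have used a bare 3-tuple.
Reading = namedtuple("Reading", ["sensor", "value", "unit"])
r = Reading("probe-1", 21.5, "C")

by_index = r[2]    # still works, so existing tuple code keeps running
by_name = r.unit   # but the name says what the value means

# Later a field is inserted in the middle of the structure.
ReadingV2 = namedtuple("ReadingV2", ["sensor", "value", "timestamp", "unit"])
r2 = ReadingV2("probe-1", 21.5, 1474329600, "C")

print(r2[2])    # index 2 is now the timestamp, not the unit
print(r2.unit)  # access by name is unaffected
```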
21:30 Caleb: Yeah, that is good one.
21:31 Michael: Yeah, one that I've done a lot less with is contextlib, what's the story with that?
21:35 Caleb: So, what did you think of my example, just for the listeners, the example that I gave, the code snippet on the contextlib is creating a simple context manager that measures the time- well, it records the time before and after the execution of the body of the context manager and then gives you a way to calculate the performance of that section. I haven't gotten much feedback about the book yet, because it is fairly new, and I was curious what your opinion was?
22:00 Michael: Well, I got to say it did take me a moment of going back. I mean, look at this context manager implementation, it's only three lines of code- you know, basically the idea is you create a context manager instance by calling this method, and when it enters it will capture the start time, when you leave the with block or suite it captures the end time, and then it tells you how much time has passed. And so the implementation is: t0 equals perf_counter; yield a lambda which does a computation; and then compute the value that is actually used in the lambda above. And I was a little bit taken aback by that, it was interesting.
22:46 Caleb: Yeah, I was worried that it was perhaps a little bit too complex, and I didn't want the use of the lambda to overshadow the demonstration of how the context manager works. But basically, where the yield comes in is where the body of the context manager gets executed, and if you return something from the yield, that's pretty much what you get at the end of the line when you say "with timing as thing": the thing is what gets yielded out of the context manager. And the little bit of cleverness in this particular example is that the lambda is a closure over the namespace inside the timing function, so it captures the storage location of t1 and t0, and only when you evaluate the lambda later do the values of t1 and t0 actually get used.
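The pattern described in this exchange can be sketched roughly as follows. This is a reconstruction from the conversation rather than the book's listing; the `label` parameter, the timed workload, and the variable names other than `timing`, `t0`, and `t1` are assumptions.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timing(label):
    t0 = perf_counter()
    # The lambda closes over t0 and t1. t1 has no value yet, which is
    # fine because the lambda is only called after the with block ends.
    yield lambda: (label, t1 - t0)
    # Runs when the with block exits normally; now t1 gets its value.
    t1 = perf_counter()


with timing("sum of squares") as elapsed:
    total = sum(i * i for i in range(100000))

label, seconds = elapsed()  # evaluate the lambda after the block
print("%s: %.6f s" % (label, seconds))
```

Calling `elapsed()` inside the with block would fail, because `t1` has not been assigned at that point; the result is only meaningful once the block has finished.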
23:32 Michael: Yeah, it's quite clever, yeah.
23:33 Caleb: Yeah, this particular example is not imaginary, I use it quite a lot.
23:38 Michael: Yeah, that's nice. I appreciated it because it made me think and stop and not just read yep okay, yep okay, oh wait a minute, not necessarily okay, fine what's going on? And, you know, that was cool like it's nice when code makes you do that if it's not just because you're confused and it's too messy or whatever, it's cool.
23:54 Caleb: Yeah, my editor at O'Reilly, Dawn Schanafelt, she was really good about making sure that each of these steps were expanded in more detail and the editors at O'Reilly are really good, they can pick up based on the style of your writing whether you think you've expanded sufficiently or not and they can prod you to say are you sure you've explain this, it seems like you were a bit terse, perhaps add a few more points. So, all these bullets and points on the side where everything is spelled out in great detail, that wasn't driven by me, that was driven by the editors, they're really good at what they do.
24:25 Michael: Yeah, you did a good job as a team of breaking down the steps, yeah that's cool. So, the other thing that you and that was built in, was the concurrent.futures module in Python 3 and I thought that was a really interesting way to think about sort of a unifying API between process-based parallelism and thread base parallelism.
24:46 Caleb: Yeah, I wanted to push that point, because I think the underutilized aspect of concurrent.futures is that it gives you this really easy way to switch paradigms. For some problems thread-based programming is valuable, and for others process-based parallelism is equally valuable. And you get the same interface, really just about, so you can switch between those two paradigms quite easily after the fact, which is really interesting. Usually for complex code involving parallelism, you end up with a structure that is hard to change to fit into a different paradigm unless you do a rewrite, and the fact that concurrent.futures gives you the same API for both thread-based work and process-based work is a really cool superpower.
25:26 Michael: Yeah, it totally is. And it definitely is a simplification, because when you start talking about threading there are so many edge cases and interesting variations. But maybe the general rule of thumb is: if you spend most of your time waiting on the network, then thread-based parallelism is probably good, especially if you're sharing a lot of data as well; and if you're doing a lot of computational stuff, then because of the GIL, and if you're not using Cython or something, you can't really parallelise that very much with threads, so multiple processes may be a much better way to go. But yeah, with the ThreadPoolExecutor and- what was the other one called? The ProcessPoolExecutor, those two have exactly the same API, and so if you write your code against those instead of directly against multiprocessing and directly against the thread API, you literally change your import statement and it changes where stuff runs and how, which is pretty cool when you try it out.
26:24 Caleb: Yeah, that's really good. One comment that I also want to make is that when I make the choice between whether to use threads or whether to use processes, it's not because of the GIL. As you mentioned, Cython lets you drop the global interpreter lock, so that's not an issue for me: I can write my number crunching code in Cython and use Python's normal threads and still access all of the cores. The distinction for me between process-based parallelism and thread-based is really about whether I need to be able to access the entire memory space in the process. So that is the main distinction: whether the things are ok to be separated by process, or whether I really need the entire memory space to be accessible by all of the parallel parts of execution. So if that is the case, for example if the batch of work that you need to operate on has to all fit in the same memory space inside the process and you need to work on different sections of memory concurrently, then I would use threads. The presence of a global interpreter lock, while it's interesting, is not really a bottleneck anymore in CPython because of Cython, because it makes it so easy to drop the GIL.
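A minimal sketch of the shared API, assuming nothing beyond the standard library: the helper `run` (a hypothetical name) takes the executor class as a parameter, so switching between threads and processes is a one-word change. `math.factorial` stands in for real work.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run(executor_cls, numbers):
    # Both executor classes accept max_workers, work as context
    # managers, and provide map(), so this body is identical for both.
    with executor_cls(max_workers=2) as pool:
        return list(pool.map(math.factorial, numbers))


if __name__ == "__main__":
    # The guard matters for process pools on platforms that start
    # worker processes by re-importing the main module.
    print(run(ThreadPoolExecutor, [5, 10, 15]))
    print(run(ProcessPoolExecutor, [5, 10, 15]))
```

One difference that the shared interface does not hide: with a process pool, the function and its arguments have to be picklable, because they are sent to another process.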
27:28 Michael: Right, awesome, okay, yeah, that's a really interesting point, and we will definitely be coming back to Cython. But if you're working on some data structure that is really large and the threads are updating multiple parts at the same time, then yeah, you want to keep that in the same process space.
27:42 Caleb: Yeah, absolutely, it's really difficult to make that work with process-based parallelism. I have been looking at ways of doing that, and I would like to find out more about using memory mapped files to share memory between processes. But I don't have much experience with that yet, that's something that I would like to get into more.
27:58 Michael: Yeah, that would possibly be a solution, but I don't know what the performance looks like, and it's interesting.
28:06 Caleb: Yeah, me neither.
28:07 Michael: Alright, so the next one that was built-in was logging, you said look it's time to get over the print statement if you're trying to actually do debugging stuff, don't just put it out like it's almost the same as debugging but you get a lot more.
28:21 Caleb: Yes, so the experience that I have, over quite a few years now, is that I write out a new module or new script using print statements, and a couple of hours later or a couple of days later it becomes something that I actually want to use and depend on, and then I go back through the same code and I change all the print statements to logging statements. For the last couple of years I've now gotten into the habit of just beginning with logging: just put in the boilerplate, the setup line, then create your logger, and then you just run with that.
28:47 Michael: Yeah, it's pretty straightforward, right? You import logging, you call logging.getLogger, and then you can say logger.debug, logger.info or warning. And I agree with your sentiment, you know, where I find I'll be totally happy with print for a while and then I want to make the code that I was playing with a library and not an application.
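The boilerplate mentioned here is short. The sketch below shows one common arrangement; the format string and the `parse_port` function are illustrative assumptions, not something from the book.

```python
import logging

# One setup line, usually in the application's entry point.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# One logger per module, named after the module.
logger = logging.getLogger(__name__)


def parse_port(text):
    logger.debug("parsing %r", text)  # hidden at INFO level
    port = int(text)
    if port < 1024:
        logger.warning("port %d is in the privileged range", port)
    logger.info("using port %d", port)
    return port


parse_port("8080")
parse_port("80")
```

Unlike print calls, these messages can later be silenced, redirected, or reformatted from one place, which is what makes the code reusable as a library.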
29:09 Caleb: Right, yeah.
29:11 Michael: And then all those print statements, it's like super hard to make them go away or to configure them, and it's just like argh, alright just remove them. Yeah, so logging, excellent. Another one that I really like in this space, although logging is the built-in one, is Logbook; I think Logbook is a nice external one, but like you said, having stuff built in is great.
29:31 Caleb: Okay, that's a good tip, I didn't know that, I'm going to make a note.
29:33 Michael: I think that's Armin Ronacher, I can't entirely remember, I'll have to look, but it is really good. OK, let's see, so another thing that you might want to do is run something on a scheduled basis, right, like every five minutes I want to do this thing or exactly on the hour I want something to happen. And the OSes have built-in ways to do this, and I guess you could like spawn a thread or something and watch, but there's some cool stuff built-in for that, right?
30:03 Caleb: Yeah, that's right, so you're talking about the sched module, S-C-H-E-D. This is a really good example of how you have to be aware of your biases. For people who only ever work on POSIX systems, or Linux for example, cron is always there, it does what it does really well, there's a wealth of information available on the internet for how to use cron, so it seems bizarre that there would be this thing in Python that does exactly the same job. But the thing is, cron doesn't run on Windows; Windows uses a separate system. However, because Python includes the sched module, you can get the same or very similar functionality to what you might get in cron or the Windows task scheduler with a cross-platform Python module; and that's really powerful if you're writing some service or library that needs to do these jobs on a timer, or at a particular time of the day, and so on. I look at sched as a really great example of what Python provides in terms of cross-platform support: you get this kind of functionality and you can use the same code base on multiple platforms.
31:03 Michael: Yeah, it's really nice, and you basically set up the scheduler and you give it a priority and a frequency and then you say you can call this function whenever it's time, and you can do that in either an elapsed time, like ten minutes from now, or every five minutes, something like that, or you can do it on a more, like once a minute exactly at the minute, right?
31:27 Caleb: Yeah, you can control completely what the target time happens to be. I can definitely see sched becoming a part of the robotization, I guess, of the internet in a big way; automating things and creating bots and timers, and workers and so on.
31:47 Michael: Yeah, it's beautiful if you've got some embedded device running your Python code and it needs to get home every now and then, just set that up, right?
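A small sketch of the two ways of scheduling mentioned here, using the standard library sched module. The job names and the short delays are made up so the example finishes quickly. One assumption worth stating: sched itself has no "every five minutes" option, so the repeating job below simply re-schedules itself each time it runs.

```python
import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)
calls = []


def job(name):
    calls.append(name)


# enter(): run after a relative delay (seconds), with a priority.
scheduler.enter(0.2, 1, job, argument=("second",))
scheduler.enter(0.1, 1, job, argument=("first",))

# enterabs(): run at an absolute time on the scheduler's clock.
scheduler.enterabs(time.monotonic() + 0.3, 1, job, argument=("third",))

scheduler.run()  # blocks until the queue is empty
print(calls)

# A recurring job: each run puts the next run back on the queue.
ticks = []


def tick(remaining):
    ticks.append(time.monotonic())
    if remaining > 1:
        scheduler.enter(0.05, 1, tick, argument=(remaining - 1,))


scheduler.enter(0.05, 1, tick, argument=(3,))
scheduler.run()
print(len(ticks), "ticks")
```

Note that run() blocks the calling thread, so a long-lived service would typically run the scheduler in its own thread or process.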
31:55 Caleb: Yeah. absolutely. We've got to the end of the standard library section.
32:00 Michael: We have?
32:00 Caleb: There was an additional one that I had in an earlier draft of the book, but we dropped it because it was too short I guess, and that's shlex. There's a module called shlex in the standard library, which I wanted to include for no other reason than it has a split function, which will split strings like the normal split, except that it will respect quotes around sections, so you can group chunks of words with quotes just like you might imagine shell processing would process your commands. If you put quotes around sections of things, then it treats those as one thing, so the shlex module in the standard library has a split function that does that for you as well.
32:42 Michael: Nice, so you can almost escape the thing you are putting on by putting quotes, ok, awesome.
32:49 Caleb: You don't have to do any quote processing yourself, it's already in the standard library.
32:54 Michael: Yeah, excellent. Okay, very nice, so that was sort of the look inside of what's in the box if you just have Python, and then you said alright, let's look outside at external packages, and why not start with a better way to install packages?
33:06 Caleb: Yeah, absolutely. So, for anyone who doesn't know about flit, if you have found the normal process for creating and publishing a Python package to be arduous, flit absolutely is the thing that you need to look at, because for simple packages it automates almost entirely everything that you need to do. It's by Thomas Kluyver, he is very active in the Python scientific community, and I think it's just awesome. I'm using flit at the moment for several of my own smaller projects.
33:37 Michael: Yeah, that's cool, so if you want to submit something to PyPI, you have to create a setup.py with a lot of various settings, you know, set the license in the right way so people can discover it, and who's the author and where's the documentation and what version it is, all those kinds of things. And if you install flit you can basically say I'd like to initialize this package, and it basically takes you through a Q&A and then it generates the things that it needs to upload your package, right?
34:04 Caleb: That's right, yeah, and the Q&A is pretty short, I think it's four questions or something like that. Another good tip is the cookiecutter project by Audrey Roy Greenfeld, and there's a cookiecutter project for creating a skeleton for a Python package, and it's quite eye-opening when you run the cookiecutter and you see how many files it creates in a folder; there's a MANIFEST.in and there are several other extra files that are used just to create and publish a package, whereas flit does away with all of that. You've really just got the flit.ini and you can get your package on PyPI.
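To make the difference concrete, here is a short comparison of str.split and shlex.split on the same string. The command text is invented for illustration. One detail to be aware of: shlex.split keeps each quoted section together as a single item, and the quote characters themselves are removed from the result.

```python
import shlex

command = 'copy "My Documents/report final.txt" backup --note "two words"'

# Plain split breaks on every run of whitespace, quotes or not.
naive_parts = command.split()
print(naive_parts)

# shlex.split follows shell-like rules: quoted text stays together.
shell_parts = shlex.split(command)
print(shell_parts)
```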
34:37 Michael: It's quite simple the stuff in the [00:34:40] file, it's not outrageous right?
34:42 Caleb: Yeah, exactly.
34:42 Michael: Nice, and so then you can say things like flit wheel upload and it'll just take whatever active package you happen to be in, with the version specified in the files and just package it up and send it right?
34:53 Caleb: Yeah exactly. I haven't tried Flit yet for packages with extensions, so yeah, I don't want to say that it can do that as well, because I just haven't tried that myself, but that's something that I do want to dig into as well.
35:06 Michael: Yeah, absolutely okay. Another thing that is very common is to create some kind of shell utility or app that has some kind of terminal output, and there's not a lot of facilities in the standard library for, like, color [00:35:21] output in a nice sort of fancy style, graphics if you will, and so one of the things you talked about is Colorama, which I thought was pretty cool, I've looked at it a few times.
35:32 Caleb: Yeah, I feel very strongly about Colorama, and the reason is because, generally speaking, we write software for people, for other people or for ourselves, and you see the output from software, particularly in the terminal, you have to deal with that a lot. And I think that making that output friendlier and easier to read, and making the context easier to understand, for example by using green for good and red for bad, makes it a lot easier to use programs, really. If we have to write software that works in the terminal as opposed to writing graphical user interfaces, there's no reason why we can't make that output appear better. The prompt-toolkit is another good library for making interactive user interfaces in the terminal, and I didn't cover that in the book, but I think later we come to the ptpython interpreter, so we'll get to that later; but that is part of this. The use of color, I strongly believe, can help to make better user interfaces on the command line.
36:32 Michael: I totally agree with you.
36:32 Caleb: What makes Colorama so great is that they completely abstract away, again, platform differences. So your code that uses Colorama will use the correct ANSI codes in a bash shell, but when you're running it in the Windows command prompt it will also use the right color codes for that environment. I think that's really powerful, you're not really committing to a particular platform by using Colorama, and it is a well-maintained package; I think support goes back to 2.64 and includes 3.5 as well.
37:00 Michael: Nice, and you said also that you recommend colorlog as a way to add color into your log messages, so like warning is one color, error is another and so on?
37:09 Caleb: Yeah, exactly, and it's two or three lines and you get that functionality and all your existing logging messages will just get those colors. It's a really easy drop-in replacement just to make sure that you have colorization for all the different logging levels of your logging messages.
37:23 Michael: Yeah, I think it's great, if you see something red go by you know, obviously pay attention right? It's great.
37:29 Caleb: Exactly, yeah. Bold red I see for critical.
37:33 Michael: Yeah, absolutely. So, another thing that you talked about were on the terminal the CLI is accepting arguments; so built-in we have argparse, but there's maybe some better ways, and one of the ways you recommend was the Begins library? What's Begins?
37:49 Caleb: Begins is a library that I first heard about at PyCon Australia in 2014; the author, Aaron Iles, gave a very strong demonstration of Begins, and it struck me at that time how much you can really do with Python if you exploit all the features of the language that are available to you. So the point that I made in the book was that Begins, just from the perspective of API design, is extremely aggressive about exploiting everything that Python provides to you. For example, the annotations, the parameter annotation format in function definitions: Begins uses those annotations for the docstrings of each of your parameters, so that you don't have to add that anywhere else. And I really like the way the Begins API was designed to give you as much functionality as possible for as little input from you. I like that tradeoff very much.
38:40 Michael: Yeah that's really nice.
38:43 Caleb: Yes, what I have heard from many people though, is that they much prefer a slightly more rigorous specification format like what you can get now in the click library and docopt also gets a lot of love which is another way of creating your command line interface by- not docopt, I forget the name now- but there's another library where you can write out the help message of your CLI tool.
39:08 Michael: Yeah and it'll do it over, it's basically the reverse of Begins I think that is docopt, yeah.
39:13 Caleb: Okay, it is docopt, yeah. So it's the reverse of Begins: you write out your help message that will be printed when the user types help, and then it infers all your parameters. That is also fairly popular. Even so, I have found that for my own small scripts Begins gets me going much faster, and even subcommands are very easy to enable.
39:33 Michael: Right, so basically you have some method you want to give some kind of CLI to it, it takes some parameters, and you just give it a decorator or a subcommand decorator and now it is accessible and it's part of the help text and all that?
39:45 Caleb: That's correct, yeah.
39:45 Michael: Excellent.
We all love Python for its tremendous productivity benefits, but getting the best performance takes some work. What if you could get out-of-the-box easy access to high-performance Python? Intel Distribution for Python delivers just that. Get close to 100 times better performance for certain functions when using numpy, scipy, and scikit-learn linked with optimized native libraries like the Intel Math Kernel Library, and access efficient multi-threading and Python projects like Numba and Cython. Try the Intel Distribution for Python and experience performance today at talkpython.fm/intel. And profile your Python and native C, C++ applications for performance hot spots with Intel VTune Amplifier. With Intel it's all about performance.
40:53 Michael: Alright, so let's move into the GUIs, the graphical interfaces and one of the first things you talked about is creating interactive dynamic graphs and things like that and while matplotlib plays a big role there, you also talked about PyQtGraph. Why'd you pick that over say matplotlib?
41:14 Caleb: The primary reason why I selected PyQtGraph over matplotlib is interactivity. It's hard to imagine that you could have a highly performant charting library for Python, but that is exactly what PyQtGraph is. It's based on Qt as the widget toolkit that runs in the back end, but the interactivity is really good; you can have graphs that draw spectrum running at 50 frames a second quite easily, and you can drag and zoom and pan all the while animation is happening, so you could have a data stream where you're plotting the data as it comes through live; whereas with matplotlib that degree of interactivity is not really there.
41:52 Michael: I see.
41:53 Caleb: It's not because of a lack of ability, it's because Matplotlib has been designed towards producing publication ready type charts in a similar way to what matlab's charting facilities were designed. Whereas PyQtGraph has been approached with a whole different use case in mind. So in my chemical engineering work PyQtGraph has been very valuable for me to be able to plot live data and then examine in real-time pan and zoom and move my moving data sets around.
42:22 Michael: That's awesome.
42:23 Caleb: And yeah, if that was all that PyQtGraph provided that would already be enough and that was my largest use case for it. But it has a fairly feature complete widget library in the background that lets you plot, not plot but create widgets on the fly for arbitrary Python data structures, so you can get input cells and sliders and so on that can manipulate your data and PyQtGraph provides all of that as well.
42:47 Michael: Yeah that's excellent. If you want to embed some kind of like live data thing into your app, it sounds really cool for that.
42:53 Caleb: Yeah it definitely is a good choice and especially if you're already using PyQt, PyQtGraph is a drop-in replacement. You can add its chart windows as a widget inside your existing PyQt app. Yeah that's really great, and there's a lot of interesting talk around PySide coming back to the same company that does Qt and that sounds like it's going to be a real, like this is a vibrant growing area, so that's great.
43:19 Caleb: Yeah, I've got my eye on the resurgence of PySide as well.
43:22 Michael: Yeah, cool I'm totally excited for that. So then the next thing you talked about was one way to build your apps is using these graphical frameworks but a very popular one even using CSS front end frameworks like bootstrap and stuff is web development. So there's another interesting library that lets you have Python logic in your desktop application but actually presents the user interface through a GUI. Do you want to tell us about that?
43:50 Caleb: Yes, so I also included PyWebview, which is something that I found while doing the research for the book; it's not something that I had used before. But I was blown away that such a powerful tool could exist and not be better known. Most people know about the Electron framework and the Chromium Embedded Framework, which can also make these desktop apps that rely on the WebKit engine to provide the visualization layer. The interesting thing about PyWebview is that it doesn't require you to bundle something like Electron with your app; it just uses the native browser.
44:22 Michael: I see, that is amazing, is it cross-platform?
44:25 Caleb: Yeah, it's cross-platform, so it will use Internet Explorer or Edge on Windows, and it will use the webview widget on OS X, which is what powers Safari, and on Linux it will use whatever is native there.
46:08 Michael: Yeah, it looks really interesting and I kind of prefer CSS and HTML for GUI design so I may have to try this out, it's definitely worth looking into.
46:18 Caleb: I definitely recommend it. In the example that I used, I also used another Python library called Dominate, which allows you to create an HTML DOM and structures within the DOM directly from inside Python code, but that was just me being too cute, I guess. You can just write your HTML and CSS out as you normally would and load it, and that works fine.
46:39 Michael: Right, so could you do something like have a Chameleon or Jinja2 template, something like that, and pull that in?
46:47 Caleb: Yeah absolutely.
46:48 Michael: OK.
46:49 Caleb: Definitely, no question. And a big benefit of PyWebview over using Electron is, again, that you don't have to distribute a fairly large browser engine alongside your app.
47:01 Michael: Yeah, excellent.
47:02 Caleb: If you can find a way to bundle just the Python parts of your app, when you run it, it will use the native web widget of your target operating system.
47:11 Michael: Excellent, ok, I really like that one and I definitely want to have a look at it as well. So, moving on to sort of the systems management, system tool stuff. The first one you brought up was an example of something I was trying to do and one of my online classes that I was building, and like why is it so hard in the built-in process stuff and that's about managing processes with psutil.
47:37 Caleb: Yeah, so worry no more, because there's a library called psutil that does everything you could possibly want in terms of accessing information about the system, and more. I have a feeling that psutil is going to be bad for business for many monitoring companies, server monitoring frameworks, because it's so easy to run psutil in a daemon on your server and get it to send information back to you about which process is consuming how much memory, if your particular application is misbehaving in some way, or if the system itself has started to deviate from how it is supposed to be operating; psutil makes all of that really easy. I reckon in a day or two you could probably whip something up that can give you as good performance monitoring as what you could get from a cloud provider currently.
48:22 Michael: Okay, wow, yeah you can ask for things like what's the CPU percent on the system and if you have like eight cores it will give you an array of eight floating point numbers that are percents and you can say what is the current process that I'm in, how much memory is it using and things like that, it's great.
48:37 Caleb: Yeah, and you can access all the other processes as well. You can get some information from all of them, from everything that's operating on your system.
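The queries mentioned in this exchange can be sketched with psutil roughly as follows. This assumes psutil is installed (pip install psutil); the variable names are just for illustration, and the numbers will differ on every machine.

```python
import psutil

# Per-core CPU usage sampled over a short interval:
# one floating point percentage per logical core.
per_core = psutil.cpu_percent(interval=0.1, percpu=True)
print(per_core)

# Memory used by the process we are currently in
# (resident set size, in bytes).
me = psutil.Process()
rss_bytes = me.memory_info().rss
print(f"this process uses {rss_bytes / 1024 / 1024:.1f} MiB")

# Basic information about every process on the system. Fields that
# cannot be read (for example due to permissions) come back as None.
names = []
for proc in psutil.process_iter(attrs=["pid", "name"]):
    names.append((proc.info["pid"], proc.info["name"]))
print(len(names), "processes visible")
```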
48:46 Michael: Yeah, nice. So that was for watching processes and system stuff. Another thing that people often have to do is they have to watch a directory for when a file either changes or a new file arrives, like somebody's uploaded some new csv file, we got to ingest it and do work on that; and you talked about a thing called "watchdog" for that?
49:06 Caleb: Yeah, have you used that before?
49:07 Michael: I have not, but it sounds really cool and like the others you brought up is very nice that it's cross-platform even though the implementation is quite different on the different OSs.
49:15 Caleb: Yeah, that's right, and just like psutil, it abstracts away some fairly complex work into a very nice, very easy-to-use API. That is again cross-platform. One of my requirements for selection in the book throughout is that every library had to work in a very easy way on all the three big target platforms. So, they should all be cross-platform, and watchdog probably does the best job of hiding platform differences away, because these notification systems are quite different on each of the target platforms, and luckily you don't have to worry about that whatsoever; it completely hides away those differences and, as you said, it gives you a way to monitor a particular directory for any changes.
49:58 Michael: Yeah, it's really nice, so you just create a class deriving from some built-in monitor event handler type of thing, and you say call this function when a file is created, or call this one when a file is modified, and then you can just tell it to start observing, and it actually does that in the background on a background thread, right?
50:15 Caleb: Yeah, that's right.
50:15 Michael: Yeah, very cool. So, the other one you talked about, alluded to before, is ptpython, which I've not played with, but I'm thinking this is getting installed. I really don't love the REPL that much, the built-in one, but this is cool, I need to check this out. Tell us about it.
50:33 Caleb: Yes, so ptpython is based on another Python library called prompt toolkit, which is a toolkit for making user interfaces in the shell, or in a command-line view. And ptpython is a replacement Python interpreter, but it's supercharged for editing and editing history and bringing back previous functions and changing them, and it has color support and a whole bunch of other features as well, which I could not get to in the discussion. Pretty much the first thing that I install after updating [00:51:06] setuptools in a new virtualenv is ptpython, and that's the interpreter that I use for doing any of that interactive kind of work.
51:13 Michael: Yeah it makes a ton of sense. For example if you, one of the things that drives me crazy in the repl is I'll type out like a function or an if statement or a loop and more likely, and then I'll either make a mistake or I want to run it again slightly differently, and then you've got an up arrow like, ok I know I'm going to [00:51:31] up there are five times and hit enter and then like sort of unroll the history so I can get back and I got to remember the line I changed, and like this one if you say I want to go back to some multi-line thing I worked on, it actually pulls up the multi-line thing right there, which already makes it worthwhile plus the color and the auto completion and all that, it's great.
51:49 Caleb: Yeah, that's right. And so, when you press up arrow and you get that multi-line statement that you did earlier, I use the vi key bindings and that all works; I can go to the top of the line, go down, I can delete a line or [00:51:55] paste. So if you're used to Emacs they have Emacs key binding support, and if you're used to vi you can enable the vi key binding support, and you get much of the power of those keystrokes and commands inside every single line that you edit and enter inside ptpython.
52:19 Michael: Which is so much better than the built in. Yeah, that's fantastic.
52:23 Caleb: If you run in a split screen in your terminal where you have for example your editor in the top half and a command line on the bottom half, if you run Pt Python in the bottom half, what's really interesting is if the key bindings match the editor that you're using you almost begin to feel like you're working in one environment because the key bindings work in your editor, and then when you jump to the repl, it works there as well the same way. So that's really nice I worked like that almost continuously.
52:47 Michael: That's really nice, yeah I like it, I'm definitely going to install it and check it out. The next thing is, moving on to the web APIs and HTTP services and so on, is something I had not heard of but it's very nice, it's called Hug, for building APIs.
53:04 Caleb: Yes, so just like we discussed earlier with Begins, what I really liked about Hug is how they try to maximally exploit the features of Python to make as simple an interface as possible for you as a programmer to implement an API. I have had experience before with the Django REST framework, which is an awesome, industrial-strength, very well designed, very sturdy and robust REST framework, so I recommend that one strongly. Flask also has a good REST framework, those are not bad choices at all, but I did have the impression that very few people knew about Hug, and for simpler kinds of applications I think Hug makes it extremely easy to get a REST interface up.
54:12 Caleb: Yeah exactly, and you're done pretty much, and you get the documentation because it auto generates that from your function declarations; and versioning is also pretty easy to add, which I had in the later section.
54:23 Michael: Yeah, so basically if you make a request to the base URL for the host that's running the Hug service, it will actually describe all the services and how you talk to them and what the inputs are, the outputs, everything, and like you said, you can put versioning on it, so basically you don't have to go and change everything about your methods and try to somehow bolt versioning on, you can just say in your decorator this is for version two of the API. It also does argument conversion and stuff like that, right?
54:56 Caleb: That's right, yes.
54:56 Michael: Nice, cool. That helps in the documentation I guess as well; if you say here's an integer and its name is this, the documentation can say, hey, it takes an integer, called this. Nice.
55:07 Caleb: Yeah, absolutely. Documentation, particularly for things like this, it's really a pain to write by hand, and no one should ever do that. Definitely you want to use a tool that makes it really easy to produce documentation and to keep the documentation up-to-date.
55:21 Michael: That's the key part, right, keep it up-to-date, because it's easy to create it and then just leave it. "Oh, I guess that changed; sorry, that documentation was wrong." Nice, ok, so one of the things that's pretty challenging I think, let me rephrase that, is more challenging than I think it should be, is working with dates in Python. And so, you have some cool libraries to work with that, that you found?
55:44 Caleb: That's right, so the first option that I had is not that unknown, I guess. Many people who have had to deal with dates and times have used Arrow for several years now, and the key thing about Arrow, or at least the key thing for me I guess, is that it does away with this idea of having naive datetimes and so-called aware datetimes. Aware datetimes are datetime objects that carry with them the timezone that they apply to, and naive datetime objects do not have the timezone information attached. And yeah, things get really out of hand if you start mixing and matching those without an awareness of what you're doing. And because Arrow uses aware datetime objects everywhere, simply by using Arrow you can avoid a certain class of problems where you're mixing up dates and times incorrectly.
56:31 Michael: Yeah, and you run into weird problems like, if you try to subtract two datetimes normally you get a timedelta, but if one of them is timezone aware and one is not then it crashes, right?
56:44 Caleb: Yeah, that's right. And worse is when you don't get crashes and you do arithmetic operations, and the results that you're getting are not what you think you're getting; for example, in one part of your codebase you might call the now function, so datetime.now, and then you get a datetime object, and in a different part of your codebase you call a very similarly named function called utcnow. The problem is that the second one gives you the time as it is in the UTC time zone, but without a timezone object attached, and the first one gives you the time as it is now, but in your local timezone, and the problem is that as a programmer, depending on the context of the code, you may perceive those two values to mean literally this moment in time right now, but the values are vastly different; they're obviously different by the extent of the time zone difference, and so when you do operations on them you get very strange results, or results that seem strange to you because of the assumptions you made about what now actually means. So by using aware datetimes you don't have those problems anymore; the timedeltas that you obtain by doing operations between these are always correct.
57:46 Michael: Yeah, so even if one is from .now and the other is from .utcnow, it knows to normalize those to some common time zone before it does math; like if it wants to look at how far apart they are, it would say no, those are actually, you know, either the same or like one millisecond apart or something like that.
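The naive versus aware problem can be reproduced with the standard library alone; this sketch does not use Arrow. To keep the output deterministic it uses one fixed instant and an assumed local offset of UTC+2 instead of calling now and utcnow; stripping the tzinfo mimics the naive values those two calls would return on such a machine.

```python
from datetime import datetime, timedelta, timezone

# One instant, expressed as aware datetimes in two different zones.
utc_time = datetime(2016, 9, 20, 12, 0, tzinfo=timezone.utc)
plus_two = timezone(timedelta(hours=2))  # assumed "local" zone
local_time = utc_time.astimezone(plus_two)  # 14:00+02:00

# Aware arithmetic normalizes the zones first: same instant, zero gap.
aware_gap = local_time - utc_time
print(aware_gap)

# Naive values: what utcnow() and now() would have handed back.
naive_utc = utc_time.replace(tzinfo=None)
naive_local = local_time.replace(tzinfo=None)

# No crash here, just a silently wrong answer of two hours.
naive_gap = naive_local - naive_utc
print(naive_gap)

# Mixing naive and aware values is the case that raises.
mixed_error = None
try:
    utc_time - naive_local
except TypeError as exc:
    mixed_error = str(exc)
print(mixed_error)
```

The middle case is the dangerous one from the discussion: both values were meant as "right now", nothing fails, and the result is off by the UTC offset.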
58:07 Caleb: Yeah. absolutely.
58:07 Michael: Nice, so that's coming from the universe into now, you know, bringing in the time and working with it. The next library you talked about is about pulling in time that's been saved already in a bunch of different formats and processing that, because parsing time can be super challenging. I was talking on the previous episode with Anna Schneider; they were pulling together data sources from all these different utilities, and they said they have many different formats for time, that there are over 700 different formats for time out there. So trying to just deal with all that stuff is super painful, so parsedatetime, which is what you talked about, really actually does an amazing job of that, even for human type stuff.
58:48 Caleb: Yeah, that's right. And this is another library that I discovered while doing the research for the book, I had not used this one before. I tried several libraries like this but I was amazed. In the section of the book I give some examples where parsedatetime is used to parse fairly typical looking datetime strings, but in the second half of the section there are much more natural language type strings that it also parses, and does really well. I had a lot of fun doing this section, because I tried to find ways of writing my statement of what day it was in very different ways and very strange ways, and it seemed to get all of them.
59:23 Michael: Right.
59:24 Caleb: Yeah the last option that I had on my list was a string that said two weeks and three days in the future and parsedatetime correctly parsed that.
59:32 Michael: I know, it is so amazing, like when I had in mind what would work, you know, it'll say look, you can give it like 2016-07-16 or 7-16-2016 and these types of things, and it will actually parse those all correctly, but then you started to get more interesting, and you said like yesterday, 10 minutes from now, three days ago, and it just totally got all this, and then you got to the most outrageous one like you said: two weeks and three days in the future. That's awesome.
1:00:00 Caleb: Yeah, it's pretty cool, I would really want to use this in upcoming projects. I just need to find the right project. I have my PyPi package and I'm looking for the project in which to use it. [laugh]
1:00:11 Michael: Exactly. And I think I confused Arrow with parsedatetime. Arrow has the ability to give you like human relative time, so you can say on any Arrow time, you can say humanize and it'll say just now, ask it again a little bit later, it'll say seconds ago, or two hours ago, or two hours in the future, or something like that, which is really nice.
1:00:34 Caleb: Yeah. That's awesome. And the multilingual support is really good as well I see that as being hugely valuable in web services and web development.
1:00:43 Michael: I totally agree, Then, the last part that you looked at, you said okay, these are all very purpose focused packages that we talked about, parsing datetime or scheduling something to recur, but there's a couple of general-purpose libraries that you talked about, and the first one Boltons is from a multi-time guest on the show, Mahmoud Hashemi, and he put out there from the guys at PayPal which was great, so do you want to talk a little bit about what's good with Boltons?
1:01:12 Caleb: Yeah, for sure. The first thing though is that, what's quite interesting to me if you compare Python to some other languages, is because the standard library is so big and it covers a lot of ground, it's quite rare to find general-purpose libraries in the Python ecosystem. I thought that was quite interesting. Boltons is one of the few ones that I did manage to find, where the intention is literally just to be a general purpose library for use in very different spheres. Most of the packages that you get on the package index are dedicated towards a singular purpose usually, to perform some function like all the other libraries that we've looked at. So I really thought that was interesting in doing the research that there are not very many general-purpose libraries, and my conclusion is that that must be because the standard library covers already most of the general-purpose type of things that you need to do.
1:02:00 Michael: I agree that that's probably true. I wonder if another reason, a secondary reason, is that it's pretty easy to bring in a bunch of small libraries, right? It's not like you've got to download and get the header files and the lib files and do static linking, and all that kind of stuff that you normally have to deal with. If you just pip install and import a bunch of stuff, you're good to go, so maybe it's also easier to have as small a library as possible. But yeah, I think you're right that because a lot of stuff is built in, people maybe put their energies towards fixing built-in stuff, and, you know, it's a twenty-five-year-old standard library, right, it's pretty polished at this point.
1:02:38 Caleb: I think you make a lot of sense. It also makes a nice parallel with the Node ecosystem, where similarly, there aren't too many general-purpose libraries and that's probably because it's so easy to bring in a lot of smaller libraries to make up the feature set you require.
1:02:51 Michael: Yeah, absolutely. So maybe we could just really quickly touch on a few other things. So one of the things that's in there, that's pretty nice, is the caching functionality. So you start with cacheutils?
1:03:01 Caleb: That's right. So the killer feature of the cache functionality in Boltons is the way that you can share a cache among multiple function calls. It's not that easy to do with the LRU cache that you get in the standard library; it's in the functools module, so you have to import functools.lru_cache. The one that you get in Boltons is very easy to share amongst many different function calls, as a decorator. That is the main attraction for me to use the cache in Boltons versus the one in the standard library. The LRI cache is also kind of interesting. I had to stretch a little bit to come up with an application to use both caches in the same codebase, but it's definitely good to keep an eye on the LRI cache.
1:03:42 Michael: Yeah, okay. And let's see, there was some other stuff that I thought was pretty interesting in there; one of them you had talked about was the atexit function, which I'd never used.
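The idea of sharing one cache across several decorated functions can be sketched with the standard library alone. This is only an illustration of the concept, not Boltons' implementation: the decorator name `cached_in`, the plain dict used as the cache, and the two example functions are all assumptions made up for this sketch.

```python
from functools import wraps


def cached_in(cache):
    """Return a decorator that memoizes calls in a caller-supplied mapping.

    Hypothetical helper for illustration; it is not part of Boltons or functools.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = (func.__name__, args)   # scope each entry by function name
            if key not in cache:
                cache[key] = func(*args)
            return cache[key]
        return wrapper
    return decorator


shared = {}   # one cache object that every decorated function writes into


@cached_in(shared)
def square(n):
    return n * n


@cached_in(shared)
def cube(n):
    return n ** 3


square(4)
cube(4)
square(4)            # served from the shared cache, no recomputation
print(len(shared))   # one entry per distinct (function, arguments) pair
```

By contrast, each function wrapped with functools.lru_cache gets its own private cache, which is why sharing takes extra work there. Swapping the dict for a size-bounded mapping would give the eviction behavior Caleb mentions.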
1:03:53 Caleb: Yeah, the atexit function is pretty neat, you can basically set something up to run when it exits.
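As a minimal sketch of what registering a shutdown function looks like with the standard library's atexit module: the state dictionary, the file name, and the `save_state` function below are hypothetical names chosen for the example.

```python
import atexit
import json
import os
import tempfile

# Hypothetical location for the saved state; any writable path would do.
STATE_FILE = os.path.join(tempfile.gettempdir(), "app_state_example.json")

state = {"jobs_done": 0}


def save_state():
    """Write the in-memory state to disk so a later run could reload it."""
    with open(STATE_FILE, "w") as fh:
        json.dump(state, fh)


# Ask the interpreter to call save_state() during normal shutdown.
atexit.register(save_state)

state["jobs_done"] += 3   # work done while the program runs
```

Registered functions run on normal interpreter exit. They do not run if the process is killed by an unhandled signal or ends through os._exit().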
1:03:59 Michael: Right, yeah, so if you want to, like, save some data structures and reload them at startup, just register these, here's the shutdown functions to make sure you run, that's cool. Other stuff that was in there that was nice was the iterutils that would give you, like, windowed or chunked data. So for example, and you talked about displaying data interactively using PyQtGraph, you could maybe combine that with the iterutils windowed_iter behavior and take some sort of streaming data and always show the last 50 pieces of the data in, like, a few lines of code, right?
1:04:32 Caleb: Yeah, absolutely, for sure. The chunking, the chunked_iter and the windowed_iter, in my opinion, are much better than the recipes that the standard library gives in its itertools documentation, which I see as a fairly clunky way of piecing together building blocks from itertools to get the same effects. I think it would make a pretty good addition to the standard library to have these chunked_iter and windowed_iter functions.
1:04:54 Michael: Yeah, it's very possible some of these eventually just get consumed into the standard library over time, it's great. There's also some nice debugging tools, which was cool; so you can say pdb_on_signal, for example. How would you use that?
1:05:11 Caleb: Yes, so you can call the pdb_on_signal function inside your running application, and then by default a keyboard interrupt handler will automatically be added to your program, so that when a crash does occur, not a crash, but when you Ctrl-C to stop your program, your program can stop at that point in a debug session. So for example, with a long-running loop, if it's taking too long and you're wondering whether the program is doing the correct thing, or perhaps you suspect that it is no longer doing the correct thing, you can Ctrl-C and you get a debug prompt inside the loop, wherever you sent the signal.
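To make the chunking and windowing behavior concrete, here is a standard-library sketch of both ideas. The function names `chunked` and `windowed` are chosen for this example and are not the Boltons functions themselves; the rolling buffer at the end is one assumed way to keep "the last 50 pieces" of a stream.

```python
from collections import deque
from itertools import islice


def chunked(iterable, size):
    """Yield consecutive, non-overlapping lists of up to `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def windowed(iterable, size):
    """Yield overlapping tuples of `size` consecutive items."""
    window = deque(maxlen=size)
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield tuple(window)


print(list(chunked(range(7), 3)))    # [[0, 1, 2], [3, 4, 5], [6]]
print(list(windowed(range(5), 3)))   # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

# Rolling buffer: only the 50 most recent samples of a stream are kept.
recent = deque(maxlen=50)
for sample in range(200):            # stand-in for streaming data
    recent.append(sample)
```

A plotting loop could redraw from `recent` on every update, since the deque discards older samples on its own.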
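The mechanism Caleb describes can be approximated with the standard library's signal and pdb modules. This is a rough sketch of the idea rather than Boltons' own code, and the function name `install_pdb_on_signal` is an assumption made for the example. Signal handlers can only be installed from the main thread.

```python
import pdb
import signal


def install_pdb_on_signal(signum=signal.SIGINT):
    """Open a pdb session in the interrupted frame when `signum` arrives.

    Returns the previously installed handler so it can be restored later.
    """
    def handler(sig, frame):
        # `frame` is the stack frame that was running when the signal arrived,
        # so the debugger opens wherever the program happened to be.
        pdb.Pdb().set_trace(frame)

    return signal.signal(signum, handler)


# Typical use: call once at startup, then press Ctrl-C during a long loop.
# previous = install_pdb_on_signal()
```

With the handler installed, Ctrl-C no longer raises KeyboardInterrupt; typing `c` at the debugger prompt resumes the loop and `q` quits.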
1:05:46 Michael: Yeah, that's awesome.
1:05:47 Caleb: That could be pretty handy in the right situation.
1:05:48 Michael: Yeah, if you're wondering what the heck is this process doing, is it talking to the database, talking to the web service, is it just broken, let's have a look, right? Yeah, very nice. Ok, so we're almost at the end so we should wrap it up; but the last major piece that you talk about in the general libraries is Cython.
1:06:08 Caleb: Yeah, so this 20 libraries book actually came about as a follow-on from an earlier video screencast series that I did for O'Reilly, which was on Cython. It's a huge five-and-a-half-hour-long set of 75 videos covering how to get into Cython and how to start using it.
1:06:26 Michael: That sounds great, I'll be sure to link to it from the show notes for everyone.
1:06:31 Caleb: Yeah, sure, I'll give you the link. And yeah, Cython I think has started now to gain some mind share in the Python community, but not that many people are using it yet, because it does introduce some things that are more complex than what you usually have to deal with in a Python package, for example compiling C extensions. However, among many other things, Cython can give the average Python programmer two key things that have long been desired. The first one is that Cython can speed up hot spots inside your source code, easily by a factor of a hundred or more if we're talking about basic math computation. A hundred times is not something to take lightly. It's the difference between, you know, running for 100 days or running for one day, for a very big long-running process. And the second thing that Cython can give you, again for the right kind of situation, which might be mathematical computation, is an easy way to run your threads on different CPUs without the global interpreter lock interfering in any way whatsoever. Something that I tweeted just yesterday was that there seems to be a misconception in some circles that you need OpenMP support to use parallelization in Cython, and that's not the case; you can get pretty good parallelization just with normal Python threads, as long as you release the GIL inside the Cython functions where you want to enable that.
1:07:47 Michael: Right, and there's a way you can even do that with context managers, right, you can say with nogil or something like that, right?
1:07:52 Caleb: That's exactly right, yeah. So for me, as a Python programmer now in the situation that I'm in, the GIL is not really that big a problem for me, it depends on the details of the situation. But for heavy math computation where I want to be able to access all the cores and I simultaneously want my code to run faster than it normally might with just a plain CPython interpreter, Cython gives me both of those things in the same package.
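The "normal Python threads" half of that approach can be sketched in plain Python. The worker below, `sum_of_squares`, is a pure-Python stand-in invented for this example, so it still holds the GIL and gains no speed from the threads. The speedup Caleb describes would only appear if the worker were a compiled Cython function whose body runs inside a `with nogil:` block.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(bounds):
    """Stand-in for a compiled worker; pure Python, so it keeps the GIL."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into slices and sum them on a pool of ordinary threads."""
    step = max(1, -(-n // workers))   # ceiling division, at least 1
    slices = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, slices))


# The threaded result matches the single-threaded computation.
print(parallel_sum_of_squares(1_000) == sum(i * i for i in range(1_000)))
```

The calling code stays the same when the stand-in is replaced with a GIL-releasing compiled function, and no OpenMP is involved.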
1:08:16 Michael: It sounds really great, I have not had a chance to do enough scientific work, but I can see it even being useful outside of scientific computational stuff, for example, if you're writing let's just say I'm writing some kind of ORM or something, and I'm spending a ton of time taking objects off the stream and the data layer and actually turning those into objects, and just that processing there is like some big hot spot if I do a query that returns a hundred thousand records. Maybe that loop could be written in Cython, is that right?
1:08:47 Caleb: Yes, absolutely, so a good example is something that I've been doing at work for the past week, which is converting our protocol buffer code away from using Google's protocol buffer implementation to using a new tool called pyrobuf. pyrobuf is itself written in Cython, and it generates a pyx file for you to use as your object implementation of the protocol buffer, rather than just using your normal Python-based implementation of the protocol buffer. I'm not going to go into what protocol buffers are, but basically it's exactly what you were saying, object serialization that you shuffle between two places. Cython has been used to create the protocol buffer, using pyrobuf, but in addition to that, I use the particular object that that process generates inside another Cython file, which I can then use directly with no overhead from the Python interpreter.
1:09:33 Michael: Yeah, that's great, that's really cool. Alright, so you've definitely given me a broader view of where Cython is applicable, and that's cool. So let's round it out with one final awesome thing that you point out. And that's a GitHub repository or project that is just a huge collection of stuff like this that people have found awesome, like here's all the awesome Python packages for OCR, here's the awesome ones for e-commerce and so on, and that's Awesome Python on GitHub.
1:10:03 Caleb: Yeah, Awesome Python is so awesome that many of the other language communities have now begun to copy it so you can also find Awesome Go, and Awesome Ruby and many of the other variations.
1:10:17 Michael: Very cool. So Caleb, this has been really interesting. I learned a lot from your book, and not necessarily just from the pieces that we talked about, but maybe even the ones that we didn't get a chance to cover, right? There's a bunch of other interesting packages that you are using in conjunction with your demos. I'll leave it just vague and people can go check out the book, which at least a little while ago you could get as a free ebook from O'Reilly. I'll link to it, but it's highly recommended. I think it was time well spent to go through it, so thanks.
1:10:51 Caleb: Ok, it was my pleasure.
1:10:52 Michael: Yeah, so let me ask you, as I always ask everyone, but it's a bit of a bigger list to pick from: what's your favorite PyPi package, if you have one out of all of them, like okay, this is the thing people should take away if they're not going to get the book?
1:11:07 Caleb: My pick for the PyPi package is pretty much anything inside the BeeWare project. I very strongly feel that the contributions that Russell is making are very positive for the Python community and they are forward-looking. The things that he's working on at the BeeWare project are things that we need to have happen in our community and our space, so for anyone who's thinking about finding a project online that they want to contribute to, maybe to get a little bit of experience, that is a great place to go. There's the Toga framework, which is intended to be used as a way of writing platform-native graphical user applications in Python, but there are a whole bunch of other smaller projects that you can adopt and get into and dive in and play with the details. There are projects for running Python on iOS, projects for running Python on Android, and a bunch of other features. It also has a project for packaging your Python for deployment to target machines, which is another issue that many people in the Python field feel has been, I guess, under-addressed, the issue of deployment.
1:12:02 Michael: Yeah, I definitely think it's under-addressed for the desktop space, absolutely, or mobile for that matter, anywhere but the web or just your shell. Okay, very cool, that's great, check that one out.
1:12:16 Caleb: So, BeeWare the name of the project is BeeWare, if you're looking for something to work on, you could do much worse than that project.
1:12:22 Michael: If you're writing some code what editor do you use?
1:12:23 Caleb: Yeah, that's a great question. So I've been doing this a while now, and for most of those years I've been using Vim, and since January I've started using PyCharm, because the scales have tipped the balance for me and the features that PyCharm now provides outweigh what I can do, to the best of my knowledge, with Vim configuration. So, yeah, I'm now writing my Python code in PyCharm.
1:12:45 Michael: Alright, you and me both, I love that one as well, it's great. Alright, final call to action for our listeners out there? First of all, check out the BeeWare project and contribute to that if you're looking to write some code. Anything else? Get your book? Where do they get it?
1:12:58 Caleb: You can get the book at O'Reilly. I think if you search for 20 Python libraries you aren't using, that should be enough for Google to find it for you. And of course then there's my Cython course as well; if you do want to get more into Cython, you can check out my course. As far as I know, I think it's the only video course currently available for Cython, but I might be wrong about that. That's something else to check out. And then maybe the last thing I would mention is just a general comment. Sometimes you see on forums like Reddit and other places there's a lot of dissatisfaction with some of the decisions that the core Python development team make regarding certain features in the language and what gets included and what gets excluded and so on. And I would encourage people to follow the newsletters and the mailing lists to see a bit more about the discussions that go into these decisions. The core team has many difficult and complex issues to deal with regarding features that they include and exclude. And before I started following the python-dev mailing list, I had the same thoughts about why was this designed that way, why didn't they include that, why is that not done. But once you begin to follow the mailing list, and you start to see the discussions and the complexities that they have to deal with; for example in the [01:14:00] project that's another good example, once you begin to see the complexities that these teams are dealing with, you begin to understand why the decisions get made in the way they do. So I just want to make a point there that if anyone feels dissatisfied with what the core Python team has been doing, get involved and find out more about why the decisions are getting made in particular ways.
1:14:18 Michael: I think that's great advice. Certainly looking at how the trade-offs are being chosen is definitely important. Thank you so much for sharing your book and all this research you did, it's really helpful.
1:14:30 Caleb: Sure, my pleasure.
1:14:30 Michael: I think people should check out the book; they will definitely enjoy it. Thanks for being on the show.
1:14:34 Caleb: Yeah, thanks Michael.
1:14:34 Michael: You bet, bye.
1:14:36 Caleb: Ok, bye.
This has been another episode of Talk Python To Me.
Today's guest has been Caleb Hattingh and this episode has been sponsored by Intel and Capital One. Thank you both for supporting the show!
Are you a data scientist or Python developer who loves data? If you are looking for a place to work on data science with truly big data, that can affect millions of lives, then head on over to jobs.capitalone.com/talkpython and check out the wide range of jobs that Capital One is trying to fill right now.
The Intel distribution for Python delivers the high performant Intel C libraries built right into Python, get close to a 100 times better performance for certain functions when using Numpy, Scipy and Scikit Learn. Check them out at talkpython.fm/intel.
Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. If you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.
You can find the links from the show at talkpython.fm/episodes/show/77
Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.
Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. Cory just recently started selling his music on iTunes so I recommend you check it out at talkpython.fm/music. You can browse his music there and listen to the full-length version of Developers Developers Developers.
This is your host, Michael Kennedy. Thanks for listening!
Smixx, take us out of here.