#58: Create better Python programs with concurrency, libraries, and patterns Transcript

Recorded on Wednesday, May 4, 2016.

00:00 What do you focus on once you've learned the core concepts of the Python programming language and ecosystem? Obviously, knowing a few fundamental packages in your space is critical. If you're a web developer, you should probably know flask or pyramid, and sqlalchemy really well. If you're a data scientist, import pandas, numpy, matplotlib need to be something you type often and intuitively.

00:00 But then what? Well I have a few topics for you! This week you'll meet Mark Summerfield, a prolific author of many Python books. We spent time digging into the ideas behind his book Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns.

00:00 What I really like about these topics is that they have a "long shelf life". You find them relevant over time even as frameworks come and go.

00:00 This is Talk Python To Me, episode 58, recorded May 4th, 2016.

00:00 [music intro]

00:00 Welcome to Talk Python To Me, a weekly podcast on Python- the language, the libraries, the ecosystem and the personalities.

00:00 This is your host, Michael Kennedy, follow me on Twitter where I am at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and Snap CI. Thank them for supporting the show on Twitter via @hired_hq and @snap_ci.

00:00 Hey everyone, I have a couple of things to share with you. First, we'll be giving away an electronic copy of Mark's book this week, as always, just be registered as a friend of the show on talkpython.fm, I'll pick one later in the week.

00:00 Next, I had the honor to spend an hour with Cecil Philip and Richie Rump on their laid back technical but casual podcast Away From the Keyboard. I really enjoyed the conversation and I think you'll like their podcast too. Give them a listen at awayfromthekeyboard.com.

00:00 Finally, those of you taking my Python Jumpstart by Building Ten Apps course, I have a few improvements for you- I have added transcripts to the player, the website and the GitHub repository, as well as some activity tracking across devices so you know which lectures you've watched. I hope you find these additions useful.

00:00 Now, let's chat with Mark.

02:25 Michael: Mark, welcome to the show.

02:26 Mark: Oh, thank you very much I am glad to be on it.

02:29 Michael: Yeah, I am super excited to talk about all these great books you've written, and one of them in particular really caught my attention, called Python In Practice- create better programs using concurrency, libraries and patterns, and that just really speaks to me on some of the most important parts of sort of design pattern and improving your overall skills not just focused on libraries, like Flask, or something.

02:53 Mark: Yes, sure, one of the motivations for writing that particular book was I wanted to write something for people who are already comfortable writing Python, but showing more of the high level things you could do with Python. Think, if you wanted for example to do really low level networking with TCP/IP you can do that in Python, it's got the libraries, all the facilities are there; but if what you are more interested in is application programming you might not want to go so low level, and Python either built in or in third party libraries has wonderful facilities for doing high level stuff, whether it's networking, concurrency or things like that. So I wanted to look at some of the facilities that are available both built in and third party that allow you to do some fantastic things with Python without having to get right down into some nitty-gritty details.

03:49 Michael: Yeah, that's beautiful, I find that when people are new to Python and this includes myself when I am working in some area that I haven't worked in a lot, I do not realize there is some really simple thing that I can use and I think it's really great, there is a lot of those little tips and tricks in your book, but before we get into the details of your book, let's start at the beginning- how did you get into programming in Python?

04:12 Mark: I started programming on paper, in the late seventies, I started reading computer magazines, I taught myself Basic purely off the magazines, and I wrote my code on paper and I executed it on paper. Then eventually I bought a home computer, I don't know if your listeners remember what they are but they were things before PCs, very limited, but quite a lot of fun, and eventually, I went on to do a computer science degree which I absolutely loved, and that gave me a lot of the sort of theoretical background, and then I just went into software development and I had been doing that for quite a few years when I bumped into someone who suggested trying Python. And I borrowed a book from a colleague fellow developer on Python and I hated the book, and that put me off Python for about a year, and that was one of the motivations for writing my first Python book, to write one that would actually encourage people to use it. But once I started using it within a year, all of my utility programs and tools that I used for my daily work, they were in Python, because I just loved it. So that was around 1999, and now, everything I do the first choice language is always Python.

05:32 Michael: Yeah, the Python ecosystem and frankly the language is fairly different from the late nineties.

05:37 Mark: Massively different, I actually didn't like the indentation at first, that took me like 48 hours before I really clicked wow, no braces, I just don't have to bother, and that was really nice, and also of course it forces your code to be quite neat in the first place; it doesn't of course make you use good variable names and things like that, you have to learn that separately but that applies to any language. But I really liked Python pretty well from the get go.

06:06 Michael: Yeah, so did I. I think the indentation does catch a lot of people off guard, and to me it's kind of like good science fiction, you have to sort of take this moment, the suspension of disbelief, like just imagine for a minute this white space concept is a good idea, and work with it for a week, and then it just dawns on you, like wow this really is a good idea; like I went back to working in some C based languages right after I sort of learned Python and started working in it, and the white space was a shock to me at first but what was even a bigger shock was that in these C based languages that I loved, I all of a sudden hated all the parentheses, the curly braces, the semicolons, and it was such a surprise to me that I felt that way, but within a week I was just completely over the semicolon.

06:55 Mark: Sure, and of course there is no dangling else problem that you can get in C or C++, you know, if you've got an else with the correct indentation you know when it's going to be executed, you are not going to get caught because you haven't put in braces, you know. So yeah, it really works for me, and Python the language, ok it's Turing complete and so is Perl and so is C++ and Java and all of these languages, so you can do anything in one of them that you can do in the other, so why choose one particular language rather than another, and I think part of that is what the libraries and ecosystem are like and part of it is what fits the way you think best, in my case it happened to be Python that worked, but I wouldn't argue against anyone who preferred some other language if that suited them. But for me Python is a great language, and in particular Python 3, I started using that from the first alpha, I ported everything from Python 2 to Python 3 right at that stage and I think it's excellent.

07:56 Michael: Yeah, maybe that's a good segue to sort of taking the survey of the books you've written, because you've written many books, something- how many, around eight, seven?

08:05 Mark: It depends whether you count second editions, without second editions it's seven, and with them it's nine.

08:11 Michael: Ok, excellent. And one of them you wrote is a pretty sizable Python 3 book, right?

08:17 Mark: Yes, that's Programming in Python 3, yeah. That book is really aimed at people who can program in something and it's to port them over to Python 3, but it should also work for people- and the something that could be Python 2. So that is who it is aimed for, it's not aimed at beginners, the subtitle was poorly chosen by me, A Complete Introduction To The Python Language, sometimes people think that introduction means it's introductory, intention was just it's introducing everything that Python 3 has got that you are going to use in normal maintainable programming. The only things I don't tend to cover in my books are things that I think are dangerous and obscure, so for example in Python you can disassemble Python bytecode, rewrite it and put it back and that's brilliant, but I wouldn't ever cover that in my books because I want to cover stuff that people can maintain.

09:17 Michael: And understand that it runs, that's right.

09:18 Mark: Absolutely. Maintainability, understandability are really important to me because in my experience you live with code for quite a lot of time and literally years, so you don't want to torture yourself when you have to go back and fix something or do a modification to something and it's been a few years since you've seen that code.

09:40 Michael: Yeah, absolutely. So, maybe tell us about some of the other books you wrote.

09:45 Mark: Ok.

09:46 Michael: You've got some interesting topics there.

09:48 Mark: I think the first one I wrote considering Python is Rapid GUI Programming with Python and Qt which is about PyQt programming with PyQt 4, although the book is I mean I updated the examples for PySide as well, I really liked the Qt GUI toolkit, I liked cross platform GUI programming and that book, the first part of it is actually a very brief introduction to Python programming itself and I was quite pleased to have very good feedback on that, many people said well, I already knew Python but I read the introduction because, well, I bought the book, and I still learn things from it, so I was glad about that, and Qt itself I think is good, I know that it's very fashionable mobile programming and web programming and things like that, and that's great, but I personally love desktop programming, and desktop GUI applications, so it was an expression of something that I was really interested in and I really enjoyed doing.

10:46 Michael: Yeah, that’s great, I agree with you the desktop apps are really, they don't get enough love, because mobile is the super flashy thing, I actually spent most of my time writing web apps but I do very much appreciate a good desktop app and I think Qt is one of those frameworks that is cross platform but doesn't feel cross platform as a user, you are not like oh yeah, this button and this UI completely looks foreign but it technically is a UI, right?

11:15 Mark: Absolutely, yeah, I mean they really do well with native look and feel, even though they are not using native widgets unlike say wxPython which does, they emulate it all but they do very well and some of this stuff on Windows is faster than native the way that it is implemented, because when they create a window, they are not creating all those widgets which you would do using a normal Windows toolkit, they just basically create an image, so it's much cheaper and much faster in terms of resources. So yeah, I love that toolkit and that's how I am earning my living now because I am earning my living writing on the back of commercial Python applications for desktop users. So they are written in Python, Python 3, PySide and Qt.

12:05 Michael: Oh wonderful, and do you ship those with something like cx_Freeze that sort of packages them up or is it py2app?

12:12 Mark: That's exactly right, now I use cx_Freeze, and so they are quite a big download because the Qt libraries are all in the bundle, and people can download them, try them and hopefully, buy them. [laugh]

12:27 Michael: Yeah, wonderful.

12:29 Mark: And that seems to have worked for the last couple of years quite nicely.

12:32 Michael: Yeah, ok, fantastic, yeah, that's a really cool story.

12:35 Mark: It is possible to earn a living with Python not doing web programming, I mean obviously you can earn good money doing web programming, as well.

12:43 Michael: Right, of course, or data science, those are the two most common ones.

12:46 Mark: Oh absolutely, yeah.

12:48 Michael: So you also have some books on Go and C++.

12:52 Mark: I've got a Go book, I did that because I was just really interested in Go, I think Go and Rust are both sort of new languages that I am interested in, and I like the simplicity of Go, and I am also interested in concurrency, and so I started to learn the language and I thought yeah, I can write something better than what's available on this language. But one of the authors of Go has come out with a book on Go now, so I should think that would kill mine. [laugh]

13:25 Michael: You never know.

13:27 Mark: You never know. I mean well, I still think mine is a good book actually. And, in terms of C++, I have written a few but they are all C++ with Qt. So, I co-wrote with Jasmin Blanchette C++ GUI Programming with Qt 3 and then with Qt 4, and then I did a solo one, Advanced Qt Programming. They are all C++, I still use them for PySide programming, you know, to remind myself how to do things, and I actually use the C++ docs rather than the PySide docs, I can translate easily enough.

14:04 Michael: Yeah, they are a little more definitive maybe.

14:08 Mark: Yeah, I am frustrated with C++, I mean C++11 was like a huge step forward but they just didn't deprecate enough as far as I am concerned so it is getting too big and I think it's quite hard to write that language in a maintainable way now.

14:23 Michael: Yeah, that language is just ever growing and it wasn't all that simple in the beginning. So, I have another question about Qt, I definitely want to get to the topics of your books, but given your background there I just saw- when was this, back in mid April, there was an announcement saying we are bringing Pyside back to the Qt project.

14:45 Mark: That's right, I'm really delighted, fantastic, it's going to be PySide for Qt 5, now you can use PyQt with Qt 5 but PySide has been Qt 4 only so far, but they are actually putting money behind it and investing in it so PySide 2 will be for Qt 5, and I am really looking forward to that.

15:07 Michael: Yeah, so am I, do you know the timeframe on when that kind of stuff will be out?

15:10 Mark: They are doing the development in the open, I think there is like a github or some equivalent to that you know so you can look at it, but I would guess it's going to be, I think we'd be lucky to see by the end of the year something that is useful, because it's not a simple job.

15:27 Michael: Yeah, there is a pretty big break from Qt 4 to Qt 5.

15:28 Mark: Not to mention the Qt side of it, I don't think that is the hard side, and when they did PySide they invented a new way of doing bindings, and I think that hasn't proven to be quite as maintainable and flexible as they had hoped, so I think that is where they are going to have to do quite a lot of work getting that to work with Qt 5 and the new C++.

15:55 Michael: Ok, yeah, so it's more with the Pyside version than it is the Qt itself, got it.

16:00 Mark: I think so, yeah.

16:03 Michael: All right, excellent. So, I know there is a lot of interesting GUI apps from the Python perspective and maybe another time we could dig into Qt even more, but let's talk a little bit about creating better Python apps- what was the motivation for writing this book, I mean you said you aimed it at people who were in the middle but you gave it four themes, you said I am going to cover sort of these general themes of code elegance, improving speed with concurrency, networking and graphics, like how did you come to that collection?

16:33 Mark: Well graphics because I just love GUI programming. Also with networking, I've done a fair bit of network programming but I am not a low level person, I like my networking to be as easy as possible and I wanted people to be aware that you can do networking really easily with Python without having to go down to low level stuff, and do it in a reliable and pleasant way. And I covered two approaches, one is xmlrpc and I covered that because it's in the standard library, and it works really easily so it's really nice, and the other one I cover is a third party one called RPyC, there is another one called Pyro which is also widely used, and I could have gone with either of those, and the advantage of RPyC and Pyro is that they can be Python specific, so you can get better performance, whereas xmlrpc is general so the performance is not quite as good but it has the advantage that you can write a client server using xmlrpc and it will talk to anything else that uses the xmlrpc protocol. So, that's very nice interoperability and they are high level so all of the details and timeouts and all of the issues that can arise in networking are just neatly controlled in 18:02 and ok you get exceptions and things like that, you know all normal Python stuff, you don't have to worry about the details.
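
As a rough illustration of the standard-library xmlrpc approach Mark mentions, here is a minimal sketch. The `add` function, the helper names `start_server` and `call_add`, and the use of the loopback address are assumptions made for this example; none of it is taken from the book.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(x, y):
    """Hypothetical service function, used only for illustration."""
    return x + y


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    # Serve requests in a background thread so the caller is not blocked.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(port, x, y):
    # The client calls the remote function as if it were a local method.
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.add(x, y)


if __name__ == "__main__":
    server = start_server()
    print(call_add(server.server_address[1], 2, 3))
    server.shutdown()
```

Because the wire format is plain XML-RPC, a client written in another language could call the same `add` endpoint, which is the interoperability point made in the conversation.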

18:09 Michael: Yeah, it makes a lot of sense.

18:10 Mark: So I want people to be aware of that and these kind of facilities exist.

18:15 Michael: You can basically write a wide range of different types of networking apps in Python, right, you can go all the way down to the raw sockets with bytes, I just talked with Mahmoud Hashemi from PayPal and those guys are writing services that take over a billion requests a day, they are doing network programming in Python but down below the HTTP level, and these custom APIs, and then you can of course go up higher, like xmlrpc or maybe a REST service with requests, things like that, right?

18:46 Mark: Absolutely. It's the high level stuff that I was more interested in and I think that's because in my heart I'm an applications programmer and that means I know about the subjects of my application, but I don't necessarily have the expertise in particular areas that the application needs, and so for that I want high level libraries that give me the functionality that are created by experts in those fields, so I get the best of both worlds, I get the functionality I need, by excellent people who have developed it without actually having to learn about stuff myself.

19:21 Michael: Yeah, absolutely. I think the right way to start is start simple and then go do crazy network stuff if you need to improve the performance but generally, you don't have a performance problem.

19:33 Mark: No. And that actually brings us to concurrency, people say Python is slower, Python can't do concurrency and I really wanted to address those issues because how slow is Python really- well, I developed a program in C++ that was very CPU intensive, and I rewrote that program in Python and it was 50% slower, and I think that's not bad, going from C++ to an interpreted language, a bytecode interpreted language. But of course, I then made it concurrent and you could make it concurrent in C++ but it's much easier doing that in Python and so on 20:17 machine suddenly it was as fast as C++, give it more 20:20 it was faster than the C++, so even though base line yeah, it's 50% slower, on real hardware, using concurrency it was much faster and that's really what the user is going to care about, on my hardware is this thing running fast?

20:36 Michael: Right.

20:37 Mark: And of course it's much more maintainable doing concurrency in Python is so much easier, than in most other languages.

20:43 Michael: Yeah, and I think specifically around concurrency, it's easy to get yourself into a situation where you've been very clever and you thought really hard about the algorithms and the way it works, and you've kind of written something just at the limit of your understanding, like you don't really understand what you did but it's at the very edge, but of course, understanding multithreaded code and debugging it is harder than writing it, and so maybe it's like you know, you have sort of gone a little too far and like ok, I could write this but I don't really understand how to fix it when it goes wrong.

21:18 Mark: Yeah, and that is a real problem with concurrency, in some languages you are stuck because the concurrency facilities they offer are pretty basic so they don't make it easy, but Python offers higher level approaches to concurrency as well as low level. For example, you've got the concurrent.futures module which makes it much easier to create either separate threads or separate processes where Python takes care of lots of the low level details.
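
To make the threads-or-processes point concrete, here is a small sketch using `concurrent.futures`. The `count_primes` task and the `run_all` helper are hypothetical stand-ins, not code from the book; the only real API relied on is that `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same interface.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound stand-in task: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(executor_class, limits, max_workers=4):
    # Both executor classes expose the same interface, so switching between
    # threads and processes is a one-argument change for the caller.
    with executor_class(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [5_000, 10_000, 15_000]
    print(run_all(ThreadPoolExecutor, limits))   # threads: held back by the GIL here
    print(run_all(ProcessPoolExecutor, limits))  # processes: one interpreter each
```

The `if __name__ == "__main__":` guard matters for the process version, since worker processes may re-import the main module on some platforms.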

21:18 [music]

21:18 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

21:18 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

21:18 Typically, candidates receive 5 or more offers in just the first week and there are no obligations ever.

21:18 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $1,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000!

21:18 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

21:18 [music]

22:46 Michael: I think you really put this together quite nicely in terms of sort of breaking out the different types of concurrency, and it helps you understand if you sort of think ok, well what are the types of concurrency, what type of problem am I solving, then you have a pretty good recommendation for if it's this type of problem, solve it this way, so you said there were three types, and you call them threaded concurrency, process based concurrency and then concurrent waiting.

23:10 Mark: Yeah. Basically if you are doing CPU intensive work, then using threading in Python is not going to help because of the global interpreter lock, so if it is CPU intensive and you need concurrency, then you need to use a different method and Python offers for example multiprocessing where you can split your work over multiple processes rather than multiple threads, and each of those has its own separate interpreter lock, so they don't interfere with each other, but I think the key to like getting real performance from concurrency is to avoid sharing as far as you can and that means either you don't need to share in the first place, or if you've got data that needs to be looked up by your multiple threads or multiple processes, then it may be cheaper to copy that data rather than have it shared with some kind of locking to look at it.

24:08 Michael: Right, and that's a mistake I think a lot of people make, is they see the way that their program is working now, they've got some shared data, they've got a pointer that they are passing to two methods or whatever, two parts of the algorithm and they are saying well this part is going to work on this part of memory, and this one is going to work over here, and so they think when I parallelize this, this is shared memory access and of course you have to take a lock or somehow serialize access to that data and it's easy to forget you know, if that is like 1MB or even maybe 50MB of data, it might be so much easier for you and advantageous for performance to just say copy, make a copy and just pass it over and then correlate the differences later.

24:50 Mark: Absolutely, the other possibility is to share the data without copying which is fine if you never write to it, so if you've got data where you just read information like a log or some data stream and you are never changing the information you are reading you might be producing new output, so if the stuff you are reading you can read that from a shared data structure as long as you read, that's not going to be a problem, it's only when you are going to start writing that sharing becomes an issue. And then, you've got problems if you don’t lock, but the best way is still don't lock, the best way if you are writing data is save the data in separate chunks and gather it together at the end that will often be less error prone and faster.
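
A minimal sketch of that "read shared data, write separate chunks, gather at the end" idea follows. The log lines and the `count_errors` and `gather` helpers are invented for illustration. Every worker only reads the shared tuple and returns its own private dictionary, so no lock is needed anywhere.

```python
from concurrent.futures import ThreadPoolExecutor

# Read-only data shared by every worker; nothing ever writes to it.
LOG_LINES = (
    "INFO start",
    "ERROR disk full",
    "INFO retry",
    "ERROR timeout",
    "INFO done",
    "ERROR disk full",
)


def count_errors(bounds):
    """Read one slice of the shared data and return a private result."""
    start, stop = bounds
    counts = {}
    for line in LOG_LINES[start:stop]:
        if line.startswith("ERROR"):
            counts[line] = counts.get(line, 0) + 1
    return counts


def gather(chunk_size=2):
    slices = [(i, i + chunk_size) for i in range(0, len(LOG_LINES), chunk_size)]
    with ThreadPoolExecutor() as executor:
        partial_results = list(executor.map(count_errors, slices))
    # Merging happens in one place, after all workers have finished.
    totals = {}
    for partial in partial_results:
        for key, value in partial.items():
            totals[key] = totals.get(key, 0) + value
    return totals
```

The merge step is ordinary serial code, which keeps the part that writes to a combined structure easy to reason about.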

25:39 Michael: Yeah, absolutely.

25:39 Mark: But of course sometimes there is no choice, sometimes you do need to lock, and then that's when it becomes quite difficult to reason about because you've got to be clear when you need to lock, and when you need to unlock and all the rest, and that's where the difficulty comes in, but if you can avoid having to lock then you can get good performance without problems.

26:05 Michael: Absolutely. And, there are some data structures in the newer versions of Python that sort of solve that problem for you and so we'll talk about those a little bit, but when you said copy data, one way to say copy a data structure, like obviously you can't just pass the pointer over and get another variable and point to the same thing, right, because it's pass by reference not pass by value type of semantics, so there is actually a copy module in the standard library, right?

26:36 Mark: Yeah, and basically what I would contend is this- if you use copy.deepcopy in a non concurrent program, then there is almost certainly something wrong with your logic, but if you are doing concurrent programming, then it may well be the right solution for you, because deep copying can be expensive if you are using like nested data structures like dictionaries with dictionaries and things that are quite large for example, but nonetheless, it may be the right trade off, of course the only way you are going to know for sure is to profile, and actually time things because that is the other sort of big issue isn't it, that we have an intuitive feeling this will be fast or that will be slow, but unless you back it up with numbers you could be optimizing something that makes no difference whatsoever.
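
For readers who have not used the `copy` module, this short sketch shows why a deep copy, rather than a shallow one, gives a worker truly independent nested data. The `settings` dictionary and its contents are made up for the example.

```python
import copy

settings = {"paths": {"input": "in.dat", "output": "out.dat"}, "sizes": [1, 2, 3]}

shallow = copy.copy(settings)   # new outer dict, same inner objects
deep = copy.deepcopy(settings)  # new outer dict and new inner objects

# A worker that mutates its deep copy leaves the original untouched.
deep["paths"]["output"] = "worker-1.dat"
deep["sizes"].append(4)

# Mutating through the shallow copy leaks back into the original.
shallow["sizes"].append(99)
```

After this runs, `settings["sizes"]` has picked up the `99` from the shallow copy, while the changes made through `deep` are visible only in `deep`.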

27:24 Michael: Yeah, you had some really interesting points there, that I thought were both interesting and good advice, one was if you are going to write a concurrent program, write the non concurrent serial single threaded version first, if you can, and then use that as the base line for your future work, right?

27:44 Mark: I mean one of my commercial programs, it does its work using concurrency, it's written in Python and it's concurrent, but I have two modules that do the processing, one uses concurrency and one doesn't, and I have tests to make sure they produce exactly the same results and of course one is much slower than the other, but I found that incredibly useful in the early days particularly for debugging and I still use the non concurrent one if there is some tricky area that I want to focus on without having to worry about concurrency, so I found it's paid off in terms of saving my time as a programmer and that's the other kind of time, isn't it, it's not just the processing or runtime of your software, it's the time you spend not just creating the stuff but maintaining it. And, concurrency can cost you a lot of maintenance, unless you are very careful. Yeah, I am a strong advocate of getting a non concurrent version working first, and of course, it may turn out that that's actually fast enough anyway.
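
The two-implementations idea can be sketched roughly like this. Mark's actual modules are not shown in the episode, so `transform`, `process_serial`, `process_concurrent` and `check_equivalent` are hypothetical names, and a thread pool stands in for whatever concurrency his program really uses.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(value):
    """Stand-in for the real per-item work."""
    return value * value + 1


def process_serial(values):
    # The simple baseline: easy to debug and easy to time.
    return [transform(v) for v in values]


def process_concurrent(values, max_workers=4):
    # executor.map preserves input order, so the results line up
    # item for item with the serial run.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(transform, values))


def check_equivalent(values):
    """Regression check: both code paths must agree exactly."""
    values = list(values)
    return process_serial(values) == process_concurrent(values)
```

A check like `check_equivalent(range(1000))` can sit in the test suite, so any change to the concurrent path is compared against the serial baseline.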

28:51 Michael: Absolutely, and then you have saved yourself a whole lot of trouble, it's sort of the whole premature optimization issues, yeah.

28:58 Mark: Yeah, it's easy to get seduced in seeing things concurrently because it's very fashionable and you can like brag about it, but honestly, it's got to be the right solution, and you are not going to know that until you have done a non concurrent one first I think.

29:11 Michael: Yeah. That's a good point and sort of related to that is the performance story, so you have some examples in your book where you write the concurrent version and then you have the, you have the concurrent version but you write it in several ways. And you also have the serial version.

29:28 Mark: That's right, just to compare to show what different strategies may be depending on circumstances.

29:35 Michael: Right and for the CPU based one you were doing like image analyses and processing and-

29:40 Mark: Yeah, because that's expensive in terms of CPU and of course if you use threading it kills performance, [laugh]

29:46 Michael: Yeah, ironically. And CPython is actually several times slower if you do it in parallel.

29:52 Mark: Yeah, because it's only actually running on a single core at a time and you've got context switching on top of that, whereas if you use multiprocessing, zoom, it can run free, you know, it will max out your cores and it will go as fast as your machine is capable of.

30:07 Michael: The performance story around threading is super hard to see the whole picture because we obviously know when you take a lock here that it slows down both threads and the context switching is slow but then you also have the sort of memory usage, you have the L1, L2 caches and registers and so like when you switch from one thread to the other it could pull in data that trashes your cache and you've got to go get it from main memory which is like a hundred times slower, and it's just, it's very subtle.

30:37 Mark: And yeah, if you are networking, then generally using threads is fine, because the network latency is so dominant, I am not talking about in terms of if you are Google and doing like massive servers, but for a lot of more ordinary applications threading is fine for that. But of course, there is the new asyncio library.
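
As a hedged illustration of "concurrent waiting" with asyncio, the sketch below simulates three slow network calls with `asyncio.sleep`. The `fetch`, `fetch_all` and `timed_run` names and the delays are invented; no real network traffic is involved, and `asyncio.run` assumes Python 3.7 or later, which is newer than this episode.

```python
import asyncio
import time


async def fetch(name, delay):
    """Stand-in for a network call; the sleep simulates latency."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def fetch_all():
    # gather() waits on all three at once, so the total time is roughly
    # the longest single delay rather than the sum of the delays.
    return await asyncio.gather(
        fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2)
    )


def timed_run():
    start = time.perf_counter()
    results = asyncio.run(fetch_all())
    return results, time.perf_counter() - start
```

Since the work is all waiting, a single thread is enough here and the GIL is not a bottleneck.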

31:02 Michael: The GIL is basically one of the problems that means you don't get any of the concurrency that you are aiming for computationally, but you still get all the overhead.

31:09 Mark: Well, you don't get it at the Python level, you will get it at the C level, so if you have something that is running with Python threads and actually the work is being done by say a C library, if that C library is written well, it will release the GIL, do its work and then reacquire it when it needs to pass data back, so it's not a simple story no matter how you look at it. So you've still got to performance test, but you've got to have that serial version to give you a benchmark so you know whether you are getting better or worse.

31:43 Michael: Yeah, right, absolutely.

31:45 Mark: And also, just doing that serial one, it will give you insights into the problem you are solving anyway. And it's better to make mistakes with the serial one than with the concurrent one because you've got less to think about then.

31:55 Michael: Yeah, that's for sure. If you really do have a problem and it is slower, the trick is to use subprocesses.

32:02 Mark: Yeah, use the multiprocessing module which prior to 3.2 I found not terribly reliable but they did loads of improvements in 3.2 and certainly in 3.3 and 3.4 it's absolutely rock solid on both Linux and Windows which are the platforms I use. And it's brilliant, it's an absolutely excellent library.

32:25 Michael: Yeah, and I just want to point out to people, like when you hear say use subprocesses, that doesn't mean just go kick off a bunch of processes and manage it yourself, right, there is sort of a concurrent library for managing them. Multiprocessing library.

32:37 Mark: Oh the multiprocessing library- yeah, there is nothing to stop you, there is the subprocess module and you can do it manually but there is no reason to do that, I mean you can create a process pool, in one of my applications I do that and there is an asynchronous function, you can basically just give it a Python function and some arguments and say ok, go do this somewhere else on some other processor. And it will just do that work, if that's like expensive work that's great, because it doesn't stop your main process at all, and when it's done the work it lets you know and you can pick up from there. So, there is the concurrent.futures module which is a very high level module which makes it really easy to just execute stuff either with threads or processes, or you can go use the multiprocessing module itself with its pools and stuff, so you can find the level that suits what you need.
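
The "give it a function and some arguments" call Mark describes is most likely `Pool.apply_async`, although he does not name it, so treat that as an assumption. In this sketch `expensive` and `submit_jobs` are hypothetical names, and the callback is how the pool "lets you know" a job has finished.

```python
import multiprocessing


def expensive(x, y):
    """Stand-in for costly work."""
    return x ** y


def submit_jobs(pool, jobs):
    """Hand each job to the pool without blocking, then collect results."""
    finished = []
    pending = [
        # The callback runs back in the submitting process as each job completes.
        pool.apply_async(expensive, args, callback=finished.append)
        for args in jobs
    ]
    # The main process is free to do other work at this point.
    results = [job.get(timeout=30) for job in pending]
    return results, sorted(finished)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        print(submit_jobs(pool, [(2, 10), (3, 3), (5, 2)]))
```

`submit_jobs` accepts any object with the `Pool` interface, so `multiprocessing.pool.ThreadPool` can be passed in instead when trying the logic out without starting extra processes.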

33:36 Michael: Yeah, it feels to me like if you are doing Python 3.2 or above, you should really consider the concurrent.futures module first, because it's so easy to say let's do this computationally, let's do this as subprocesses, let's switch it to have a pool of subprocesses, all of those things, right.
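As a rough illustration of why `concurrent.futures` is an easy first choice: the thread-based and process-based executors share one interface, so a function can take the executor class as a parameter. The `word_count` task and the `count_all` helper are hypothetical names used only for this sketch.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def word_count(text):
    """Hypothetical task: count whitespace-separated words."""
    return len(text.split())


def count_all(texts, executor_class=ThreadPoolExecutor, workers=2):
    # Both executor classes support the same map() call, so switching between
    # threads and processes is a one-argument change at the call site.
    with executor_class(max_workers=workers) as executor:
        return list(executor.map(word_count, texts))


if __name__ == "__main__":
    texts = ["a b c", "hello world", "one"]
    print(count_all(texts, ThreadPoolExecutor))
    print(count_all(texts, ProcessPoolExecutor))
```

The swap is only this simple when the function and its arguments can be sent to another process, which is the picklability caveat raised later in the conversation.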

33:57 Mark: Yeah, the other thing about multiprocessing is that by default it doesn't share memory, which is the opposite of threading, which means you are not going to 34:05 copy yourself by writing to something that is shared. Of course, it means that if you do want to share, you have to go to extra effort and say, ok, I am setting up this thing to be shared.

33:57 [music]

33:57 Continuous delivery isn't just a buzzword, it's a shift in productivity that will help your whole team become more efficient. With Snap CI's continuous delivery tool you can test, debug and deploy your code quickly and reliably. Get your product in the hands of your users faster, and deploy from just about anywhere at any time. And did you know that ThoughtWorks literally wrote the book on continuous integration and continuous delivery? Connect Snap to your GitHub repo and they will build and run your first pipeline automagically.

33:57 Thank Snap CI for sponsoring this show by trying them for free at snap.ci/talkpython.

33:57 [music]

35:13 Michael: Yeah, there are some built-in data structures for sharing across processes, right?

35:16 Mark: There are. I mean, I've only used them personally for flags, you know. If I've got results data, I tend to get it updated separately and then join it all up together at the end, so I just use flags, basically to say I have done this bit or this stage, or a flag to say look, just stop now because the user has cancelled, you have to give me a clean termination. But you know, I covered all that sort of stuff in the book. It's a really great module, multiprocessing, but concurrent.futures gives you that high level approach which makes it as simple as it can be for this kind of stuff. I mean, I'd still advocate not using concurrency unless you need it, because it does make your program more complicated and harder to reason about.
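The flag idea can be sketched with `multiprocessing.Value`, a small piece of explicitly shared memory, while the results travel back separately through a queue and are joined at the end. The `worker` function and the chunking are hypothetical; this is one plausible reading of the approach, not the book's code.

```python
import multiprocessing


def worker(stop_flag, chunk, results):
    """Sum a chunk of numbers, checking the shared flag so it can stop cleanly."""
    total = 0
    for item in chunk:
        if stop_flag.value:  # shared int: nonzero means "stop now"
            break
        total += item
    results.put(total)  # results are reported separately, not via shared state


if __name__ == "__main__":
    stop_flag = multiprocessing.Value("i", 0)  # "i" is a C int, initially 0
    results = multiprocessing.Queue()
    chunks = [range(0, 1000), range(1000, 2000)]
    procs = [multiprocessing.Process(target=worker, args=(stop_flag, chunk, results))
             for chunk in chunks]
    for proc in procs:
        proc.start()
    # Setting stop_flag.value = 1 here would ask every worker to finish early.
    totals = [results.get() for _ in procs]
    for proc in procs:
        proc.join()
    print(sum(totals))  # partial results joined up at the end
```

Only the flag is shared; everything else is private to each process, which keeps the amount of state that needs careful reasoning very small.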

36:06 Michael: Yeah, and you know, it's easy to switch between using subprocesses and using threads in concurrent.futures, but that doesn't mean the code that you write can just be flipped from one to the other, because of the serialization issues and the locking of shared data, so that's maybe a really subtle thing you can run into.

36:24 Mark: Yeah, well, if you are doing threading the memory is shared by default, so any thread can stomp on anything, which can be a problem. On the other hand, if you use multiprocessing, any data that you are passing around has to be picklable, for example, which doesn't apply in threading because you are just accessing the same data through the same memory space. So there are differences and there are tradeoffs, and the API for multiprocessing started out mimicking the threading API, but it has actually grown considerably since then. So, it's worth digging in and learning, but I would start with concurrent.futures, because that is the easiest conceptually, and in practical terms it requires the least code to get the stuff done.
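The picklability constraint is easy to check directly with the standard `pickle` module. The helper below is a hypothetical convenience for illustration; it simply reports whether an object could be sent to another process.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Plain data is fine to pass between processes.
    print("list:  ", is_picklable([1, 2, 3]))
    # A lambda has no importable name, so pickle cannot serialize it.
    print("lambda:", is_picklable(lambda x: x * 2))
    # A thread lock is tied to this process and cannot be pickled either.
    print("lock:  ", is_picklable(threading.Lock()))
```

With threads none of this matters, because the objects are never copied; with processes, anything that fails this check has to be rebuilt inside the worker rather than passed to it.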

37:12 Michael: Yeah. Absolutely. So, both the multiprocessing and the threading are pretty good for when you are doing basic IO bound work, right, because the key thing to know about that is a thread when it waits on a network call in CPython will release the GIL, right?

37:30 Mark: Yeah, but of course, there is the asyncio module, which is designed for that kind of work. I am not a user of that module because most of my processing is CPU bound, but that is a third way, if you like.

37:46 Michael: Yeah, so in Python 3.4 they added asyncio and the concept of event loops, and I also have not used that a lot, but my understanding is that it's a little like the Node.js style of programming.

37:58 Mark: I don't know because I avoid Javascript as much as I can [laugh]

38:03 Michael: But basically it's waiting on the IO bits and releasing control to process other bits of code, other methods, while you are waiting on IO, right?

38:12 Mark: So it's going to let you not be blocked.

38:15 Michael: Right, very callback driven, yeah?

38:17 Mark: Yeah, which is perfectly good approach.

38:21 Michael: Yeah, and then Python 3.5 added the async and await keywords.
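A small sketch of the `async` and `await` style mentioned here. While one coroutine waits on (simulated) IO, the event loop runs the others. The `fetch` coroutine and its delays are hypothetical, and `asyncio.run` is assumed to be available, which means Python 3.7 or later rather than the 3.5 discussed in the episode.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a network call; while this coroutine waits,
    # the event loop is free to run the other coroutines.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in call order.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.3),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The three waits overlap, so the total time is roughly that of the longest delay rather than the sum of all three, all on a single thread.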

38:26 Mark: Yeah, which I haven't used. I am still using 3.4, partly because I had some compatibility issues with cx_Freeze at the time, and partly because of the installer. For my commercial software I release both 32 and 64 bit versions on Windows, and up to 3.4 it's really easy to install both of those side by side, it's no problem, but with the 3.5 installer what I found was that some third party libraries couldn't find one or the other. So I am a bit stuck with 3.5 on Windows at the moment.

39:04 Michael: Well, and the installer for Python 3.5 got a major reworking by Steve Dower, who was actually just on the show, what number was that, that was 53, so just a few weeks ago, and the installer is much nicer than the old one.

39:21 Mark: It is, but it doesn't do what I- [laugh]

39:24 Michael: If you need this other thing it's not doing obviously you can't use it, right.

39:30 Mark: Yeah, I need to be able to install 32 and 64 bit Python side by side, and I can do that up to 3.4. I am not saying it's not possible, I mean I have done it with 3.5, but what I haven't managed to do is get my third party stuff working, pywin32 and APSW, which I'll mention at the end. I couldn't get them working properly when I had both; they worked fine when I just had one Python, but not when I had both. But hopefully that problem will go away, because at some point I am going to stop doing 32-bit versions of my apps. [laugh]

40:03 Michael: I really want to look into the async and await stuff more because that programming model is so beautiful, it's just I haven't been writing any code that requires that type of work but-

40:17 Mark: I like that model because it's very similar to the GUI event loop model, I mean the GUI event loop basically sits there and says I'll let you know if something happens. And you say ok, well if this thing happens call this.

40:28 Michael: Yeah, GUIs are inherently event driven, right?

40:32 Mark: Absolutely.

40:33 Michael: They've got their message and everything, so, actually one of the last sections in the concurrency bit of your book, you talk about special considerations for GUIs.

40:44 Mark: Yeah, I mean, I did this using Tk 40:45 simply because that comes with Python out of the box, though personally I use PySide and Qt. But the method works with both, and I am sure it would work with wx or with PyGObject, any GUI system. The question that arose for me was: ok, I've got a GUI application, and it's got to do some CPU intensive work, but I don't want to freeze the GUI, because what if the user wants to cancel the operation, or what if they just want to quit the application? I don't want it frozen for minutes 41:19 on end when they can't do anything.

41:22 Michael: One of the quickest ways you can make a user believe that your application is crappy is to have it just lock up. On Windows you get that sort of white opaque overlay saying "not responding", or on OS X it says 41:35 and you are like, hmm, I am a little suspicious now...

41:38 Mark: Yeah, and sometimes I use messages come to heavy because sometimes, but anyway, so that was the problem that I had to address. What I found was that if I used threading, with a worker thread and a GUI thread, the GUI still froze. So what I needed was some way of not having the GUI freeze, and the model that I came up with was to have two threads: one for the GUI, and what I call, rather sarcastically, the manager thread. The thing about the manager thread is that whenever there is work to be done that is CPU intensive, the GUI gives it to the manager, but like a good manager, the manager never does any work. And that means that the GUI thread always gets all the CPU of its core, so it's never blocked. The manager is given all the work and never does any of it, and that solves the problem, because what the manager does is use multiprocessing to hand it off to other processes. If you've got multiple cores that's no problem, and I did try on single-core machines and it was still no problem.

42:37 Michael: Right, because you still have the multithreading that gives you enough time 42:40 that your user feels like your app is working.

42:43 Mark: So basically you've got two threads. The GUI thread gets all the CPU for its core, and whenever it has work it gives it to the manager, who immediately hands it on to a process in a process pool, and that process is separate and goes off and does it and lets you know, and of course it's cancelable if the user wants that.

43:04 Michael: Yeah, ok, very nice.

43:06 Mark: But it basically, it shares one int, and the int is I am going to say like you are good to go or look, they don't want you anymore, stop work.

43:14 Michael: Exit, yeah.

43:15 Mark: That model I covered in the book, and as I say, it will work with any GUI toolkit.
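A rough, GUI-free sketch of the model described above, with the main thread standing in for the GUI thread. Every name here (`manager`, `heavy`, `init_worker`, the queues) is hypothetical, and the structure is an assumption about how such a design could look, not the code from the book. The manager thread only forwards jobs to a process pool, and a single shared int acts as the stop signal.

```python
import multiprocessing
import queue
import threading

_stop = None  # set in each worker process by init_worker


def init_worker(stop_flag):
    # Runs once in every pool process so workers can see the shared int.
    global _stop
    _stop = stop_flag


def heavy(n):
    """Hypothetical CPU-intensive job that checks the shared flag as it goes."""
    total = 0
    for i in range(n):
        if _stop is not None and _stop.value:
            return None  # cancelled: stop work cleanly
        total += i * i
    return total


def manager(jobs, results, pool):
    # Like a good manager, this thread never does the work itself;
    # it only hands each job on to the process pool.
    while True:
        n = jobs.get()
        if n is None:  # sentinel: no more jobs
            break
        pool.apply_async(heavy, (n,), callback=results.put)


if __name__ == "__main__":
    stop_flag = multiprocessing.Value("i", 0)  # 0 = good to go, 1 = stop work
    jobs, results = queue.Queue(), queue.Queue()
    with multiprocessing.Pool(2, initializer=init_worker,
                              initargs=(stop_flag,)) as pool:
        thread = threading.Thread(target=manager, args=(jobs, results, pool))
        thread.start()
        for n in (1000, 2000):
            jobs.put(n)  # the "GUI" thread only enqueues work, so it never blocks
        jobs.put(None)
        thread.join()
        print(sorted(results.get() for _ in range(2)))
```

In a real GUI program the results would be delivered back through the toolkit's own event mechanism instead of a blocking `results.get()`, and setting `stop_flag.value = 1` from a Cancel button would make running jobs return early.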

43:23 Michael: I think that's a pretty good summation of the concurrency story. The other part of performance that you talked about, that I actually don't know very much about and I haven't talked about it on my show is using Cython to speed up your code. Can you tell everyone what Cython is and give a quick summary there?

43:41 Mark: Ok, Cython is basically a kind of compiler, so if you have an application written in pure ordinary Python and you run it through Cython, it will create a C version of your code. And my experience is that it will basically run twice as fast, without touching it, without doing anything, just because it's now C. But you can then go further: you can give it type hints, so basically you can say, well, this is an int, 44:18 this is a string, and if you give it hints it can optimize better. It's also got optimizations for NumPy, for people who are interested in that kind of processing, so it can produce very fast code for that.

44:33 Michael: Yeah, that's interesting, and it is worth pointing out that it's not the same concept as type hints in Python 3.5, which are more for the IDE, right?

44:40 Mark: That's right, the typing module for 3.5 in a sense has no functionality at runtime. It's purely used for static code analysis, to say, you know, whether you are being consistent with your types. In other words, you are saying, I am claiming that this is a list of strings, and it will statically analyze your code and say, well ok, you have only used it as if it were a list of strings, so that's good. But of course, a compiler could use that type 45:12 information to produce more optimized code, and I expect that's where things will go.
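A tiny illustration of the point that type hints have no effect at runtime. The `join_words` function is a hypothetical example: its annotation claims a list of strings, yet Python itself runs it happily with integers, and only a static analysis tool would flag the mismatch.

```python
from typing import List


def join_words(words: List[str]) -> str:
    # The annotations are recorded on the function but never enforced.
    return " ".join(str(word) for word in words)


if __name__ == "__main__":
    print(join_words(["type", "hints"]))
    # This call contradicts the annotation, but nothing checks it at runtime;
    # a static checker would report it instead.
    print(join_words([1, 2, 3]))
    print(join_words.__annotations__)
```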

45:18 Michael: Yeah, absolutely.

45:19 Mark: I am hoping that Cython will actually adopt that syntax, or that other compilers like Nuitka and so on may adopt it. Now that typing is a standard module, one hopes that these third party compilers will adopt it.

45:34 Michael: Yeah, at least as an option, right?

45:36 Mark: Yeah, and it would mean consistent code then, it would mean you could write your code using typing and we 45:42 for one of the compilers you chose would give you some kind of speed up.

45:46 Michael: Yeah, beautiful.

45:46 Mark: Yeah.

45:47 Michael: All right, so we don't have a lot of time left in the show, but I wanted to give you a chance to just talk about some other projects on your website, one of the ones that you are working on is something called DiffPDF and another one was the Gravitate game, I thought those were kind of interesting.

46:00 Mark: Yeah, the game was just for fun. I did it in one of my books, I think it's actually in Python in Practice, because I'd never put a game in a book and I thought, why not? I wrote it in Tkinter, but I have got Qt versions, and on the website I've got a JavaScript version I did using the canvas. It's basically the same game, but with the things gravitating to the middle rather than falling to the bottom and left, that's it. And yeah, you can do fun games with Python, no problem, and of course there is the Pygame library as well for people who are more heavily into games. DiffPDF is paying my salary, basically. It compares PDFs, and you might think, well, that's 46:44 you just compare the pixels, and it can do that, and lots of other tools can do that. But what turns out to be quite tricky is comparing the text as text, because PDF is really a graphical file format, so a PDF file doesn't actually know what a sentence is or even a word, so it can break up text in quite weird ways, and DiffPDF gives you a rational comparison.

47:09 Michael: Yeah, cool, so you can ask questions like has the essential content changed, not just is something bold or whatever, right?

47:16 Mark: Yeah, and I thought it would be used by publishers. I wanted to use it originally because if you do a second printing of a book, not a second edition but a second printing, the publisher will let you make minor corrections as long as it doesn't change the pagination, and having to check that I hadn't messed up by looking at 300 or 400 or 500 pages was pretty tiring, so that was an incentive to create this tool.

47:44 Michael: You are like, in the time it would take me to do this I could write an app and solve the problem, right?

47:48 Mark: Exactly, but it turns out it's used by finance companies, insurance companies and banks.

47:54 Michael: Ah, the lawyer types, yeah.

47:55 Mark: Yeah, but why- I don't know because they won't tell me. But they use it [laugh] And as long as they buy I don't care, I mean that's great.

48:04 Michael: That's cool, and is that written in Python?

48:06 Mark: It is, it was originally written in C++ but now it's written in Python. It uses the model of concurrency I described, and it's a Windows-specific product; it uses a third party PDF library that I bought, 48:21 and yeah, that's been paying its way. But I've come up with another program, one that I originally wanted to write more than 20 years ago, but I didn't have the skill then and the tools weren't available anyway, and that's XindeX, and it's for book indexing. There are some existing products out there for book indexing, but this one uses Python and it uses SQLite, which I adore as a database, I really like it. So, I get all the reliability and also the convenience. SQLite has full text search, and that's just absolutely superb.

49:06 Michael: Oh yeah, that's excellent.

49:07 Mark: So that application only went on sale at the end of last month.

49:12 Michael: Wow, congratulations on that, that's excellent.

49:13 Mark: Thank you. So I am really pleased about that, and I am waiting to see, people can use it for a 40-day free trial, so I'll see in a couple of months if people actually buy it. [laugh]

49:24 Michael: Well, good luck on that, that's great. So before we end the show, let me ask you a couple of questions that I always ask everyone. So, there are close to 80,000 PyPI packages out there, and everybody uses some that are super interesting that maybe don't make it onto the radar that everyone knows about, so what are the ones that you really like?

49:44 Mark: Well, obviously, I use PySide. I use the roman one as well, because I use roman numerals in the indexing thing. And I obviously use cx_Freeze, and I use pywin32, which is very useful for Windows. And I use WMI, which is Windows-

50:07 Michael: Windows management infrastructure I think.

50:08 Mark: Thank you very much, because I had forgotten. But the one that I want to sort of boost, if you like, is APSW, Another Python SQLite Wrapper. As you know, the Python standard library has a sqlite3 module which is perfectly good, nothing wrong with that, but APSW is absolutely excellent. It provides you as a Python programmer with all the access to SQLite that you would get if you were a C programmer, but with all the pleasure of programming in Python.

50:39 Michael: Wonderful.

50:39 Mark: So, you can create your own custom functions in Python that you can feed into it, you can create your own collations, 50:46 and you can even create your own virtual tables. Everything that you can do in C you can do in Python, and it's just a fantastic library. It doesn't follow the DB API 2 precisely; it does where it can, but if there is a choice and SQLite offers more, it offers you the more, because it's designed to give you everything that SQLite has to offer. I mean, if you are wanting to prototype on SQLite for transferring to another database, then use the built-in sqlite3, but if you want to use SQLite, for example, as a file format or for some other purpose where you are only going to be using SQLite, then APSW is the best module I have ever seen for doing that.
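To give a flavor of custom functions and collations written in Python, here is a sketch that deliberately uses the standard library's `sqlite3` module rather than APSW, since the standard module's `create_function` and `create_collation` calls are the ones this example can rely on. APSW's own API differs and covers more, such as the virtual tables Mark mentions. The table, function, and collation names are hypothetical.

```python
import sqlite3


def reverse_text(value):
    """Custom SQL function: return the text reversed."""
    return value[::-1]


def by_length(a, b):
    # Collation callbacks return a negative, zero, or positive number.
    return (len(a) > len(b)) - (len(a) < len(b))


def demo():
    conn = sqlite3.connect(":memory:")
    # Register the Python callables so SQL statements can use them by name.
    conn.create_function("reverse_text", 1, reverse_text)
    conn.create_collation("by_length", by_length)
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?)",
                     [("banana",), ("fig",), ("pear",)])
    rows = conn.execute(
        "SELECT word, reverse_text(word) FROM words "
        "ORDER BY word COLLATE by_length"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for word, backwards in demo():
        print(word, backwards)
```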

51:33 Michael: Yeah, that's wonderful. While we are on database packages, I'll also throw out records by Kenneth Reitz, which he calls SQL for Humans, and which is like the simplest possible alternative to the DB API that I could find. It's at the opposite end, it's super simple, not like, you know, we are going to give you access to everything.

51:52 Mark: Ok, I mean, I like writing raw SQL so APSW suits me. I know there are things like SQLAlchemy that give you a higher level, but I love APSW.

52:02 Michael: Yeah, wonderful. Ok, last question, when you write some Python code what editor do you use?

52:07 Mark: I use GVIM. So graphical Vim, now I am not saying I would recommend that, I think that it could drive someone insane trying to learn it because it is a strange editor, but I have been using it for more than 20 years now, I just would find it hard to use another one for daily work.

52:26 Michael: All right, excellent, thanks for the recommendation.

52:29 Mark: Ok, and thank you very much for having me on your show.

52:32 Michael: Yeah, Mark it's been a great conversation and I am really happy to shine a light on this whole concurrency story and the GUI story as well in Python because they don't get as much coverage as I think they should.

52:43 Mark: No, and Python is good at both of those things, Python gives you a lot of concurrency options so you've got more choice and you need to choose it with more care, but if you choose well, then it will give you great performance.

52:57 Michael: Yeah. Absolutely, well thanks for being on the show it was great to talk to you.

53:00 Mark: Thank you very much.

53:00 This has been another episode of Talk Python To Me. Today's guest was Mark Summerfield and this episode has been sponsored by Hired and Snap CI, thank you both for supporting the show.

53:00 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity presented right upfront and a special listener signing bonus of $2000.

53:00 Snap CI is modern continuous integration and delivery, build test and deploy your code directly from github all in your browser with debugging, docker and parallelism included. Try them for free at snap.ci/talkpython.

53:00 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

53:00 You can find the links from the show at talkpython.fm/episodes/show/58

53:00 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.

53:00 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.

53:00 This is your host, Michael Kennedy. Thanks for listening!

53:00 Smixx, take us outta here.
