Create better Python programs with concurrency, libraries, and patterns

Episode #58, published Tue, May 10, 2016, recorded Wed, May 4, 2016

What do you focus on once you've learned the core concepts of the Python programming language and ecosystem?

Obviously, knowing a few fundamental packages in your space is critical. If you're a web developer, you should probably know Flask or Pyramid, and SQLAlchemy really well. If you're a data scientist, import pandas, numpy, matplotlib need to be something you type often and intuitively.

But then what? Well, I have a few topics for you! This week you'll meet Mark Summerfield, a prolific author of many Python books. We spend time digging into the ideas behind his book Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns.

What I really like about these topics is that they have a "long shelf life". You'll find them relevant over time even as frameworks come and go.

Links from the show:

Mark on the web: qtrac.eu

Books:
Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns:
amzn.to/1SMkk4n
Programming in Python 3: A Complete Introduction to the Python Language:
amzn.to/24quCP1
Rapid GUI Programming with Python and Qt: The Definitive Guide to PyQt:
amzn.to/1TlYHUk
Advanced Qt Programming: Creating Great Software with C++ and Qt 4:
amzn.to/1SMkpVr
Programming in Go: Creating Applications for the 21st Century:
amzn.to/1TlYO28
Advanced Python 3 Programming Techniques:
amzn.to/1SMkvwp
Programming in Python 3: A Complete Introduction to the Python Language:
amzn.to/24quYF2

Packages:
APSW package: rogerbinns.github.io/apsw
cx_freeze: cx-freeze.sourceforge.net
pywin32: sourceforge.net/projects/pywin32
roman package: pypi.python.org/pypi/roman
wmi package: timgolden.me.uk/python/wmi
Records: SQL for Humans:
kennethreitz.org/essays/introducing-records-just-write-sql

Extras:
Michael's episode on Away From The Keyboard podcast:
awayfromthekeyboard.com
Updated course / player:
talkpython.fm/course

Episode Deep Dive

Guest introduction and background

Mark Summerfield is a seasoned Python developer, prolific author, and trainer. He began writing Python in the late 90s, shifted nearly all of his internal tools to Python within a year, and has since authored multiple Python books. Some well-known titles include Programming in Python 3, Rapid GUI Programming with Python and Qt, and Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns. Mark’s work focuses heavily on Python’s concurrency, GUI programming with Qt, and writing maintainable, clean code.

What to Know If You're New to Python

If you’re still ramping up on Python’s core concepts, remember that Python uses indentation rather than braces to delimit blocks, which was a memorable shift for Mark when coming from C-like languages. Concurrency in Python often comes down to choosing among threading, multiprocessing, and async/await. Also keep in mind that the “global interpreter lock” (GIL) means truly parallel CPU-bound work usually requires multiple processes rather than just threads. Brush up on the difference between I/O-bound and CPU-bound tasks to decide which concurrency approach fits your program.

Key points and takeaways

Using Concurrency to Create Better Python Programs
Concurrency can significantly improve performance when applied correctly, especially to CPU-bound or high-latency I/O tasks. However, you need to choose the right approach: threads for I/O and multiprocessing for CPU-intensive code. If you’re new to concurrency, be sure to start with a non-concurrent (serial) solution first for easier debugging and performance baselining.
- Links and tools:
  - concurrent.futures
  - multiprocessing
Threads vs. Processes and the GIL
Python’s GIL (Global Interpreter Lock) often means threads won’t speed up CPU-bound tasks but work fine for I/O-bound scenarios. Multiprocessing, on the other hand, spawns separate interpreter processes, each free of the GIL constraints, making it well-suited for CPU-intensive work. Mark emphasized carefully deciding which concurrency pattern, threading or multiprocessing, matches your program’s workload.
- Links and tools:
  - Threading
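The I/O-bound case can be sketched as follows. This is a minimal illustration, not code from the episode: `time.sleep` stands in for a slow network or disk call (sleeping releases the GIL, much as blocking I/O does), and the names `fake_download`, `run_serial`, `run_threaded`, and `timed` are hypothetical.

```python
import threading
import time


def fake_download(delay, results, index):
    """Pretend to wait on the network; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = f"item-{index}"


def run_serial(count, delay):
    """Do the waits one after another."""
    results = [None] * count
    for i in range(count):
        fake_download(delay, results, i)
    return results


def run_threaded(count, delay):
    """Do the waits in parallel threads; each thread writes its own slot."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_download, args=(delay, results, i))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    _, serial_s = timed(run_serial, 5, 0.2)
    _, threaded_s = timed(run_threaded, 5, 0.2)
    print(f"serial:   {serial_s:.2f}s")
    print(f"threaded: {threaded_s:.2f}s")
```

If `fake_download` were replaced by a pure-Python number-crunching loop, the threaded version would typically show little or no gain on CPython, which is where separate processes come in.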
Simplifying Concurrency with concurrent.futures
The concurrent.futures module is a high-level Python library that streamlines concurrent code with its Executor API. You can switch between threads and processes simply by changing the executor type, although you must ensure objects are picklable if you use process-based concurrency. This high-level approach removes much of the complexity of manual thread or process management.
- Links and tools:
  - concurrent.futures.ProcessPoolExecutor
  - concurrent.futures.ThreadPoolExecutor
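Switching executor types might look like the sketch below. The workload (`count_primes`) and the helper `run_all` are hypothetical names chosen for illustration; only `ThreadPoolExecutor`, `ProcessPoolExecutor`, and `Executor.map` come from the standard library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(executor_cls, limits, max_workers=4):
    """Run count_primes over `limits` with whichever executor class is passed."""
    with executor_cls(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs.
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [5_000, 10_000, 15_000, 20_000]
    # Same call, different executor: only the class name changes.
    print(run_all(ThreadPoolExecutor, limits))
    print(run_all(ProcessPoolExecutor, limits))
```

With the process-based executor, `count_primes` and its arguments are pickled to reach the workers, which is why the function lives at module level and the demo sits under the `__main__` guard.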
Concurrency in GUI Applications
Mark discussed a special case where a GUI thread remains responsive while CPU-intensive work is delegated elsewhere. His recommended pattern: keep one “manager” thread to handle communication and hand off actual heavy work to separate processes, preserving the GUI’s responsiveness. This decouples the UI from the work so a user can still interact or even cancel operations.
- Links and tools:
  - PySide (Qt for Python)
  - tkinter
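A GUI-free sketch of the manager-thread idea is shown below. It is one possible reading of the pattern, not Mark's implementation: the class `WorkManager` and the function `slow_square` are hypothetical, and a `ThreadPoolExecutor` stands in for the process pool so the example stays small (for CPU-heavy work you would pass a `ProcessPoolExecutor` instead).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class WorkManager(threading.Thread):
    """Manager thread: takes jobs from the UI, hands them to a pool, and
    posts finished results where the UI can poll for them."""

    def __init__(self, executor):
        super().__init__(daemon=True)
        self.executor = executor
        self.jobs = queue.Queue()     # UI -> manager
        self.results = queue.Queue()  # manager -> UI

    def submit(self, func, *args):
        """Called from the UI thread; returns immediately."""
        self.jobs.put((func, args))

    def stop(self):
        """Ask the manager loop to exit."""
        self.jobs.put(None)  # sentinel

    def run(self):
        while True:
            job = self.jobs.get()
            if job is None:
                break
            func, args = job
            future = self.executor.submit(func, *args)
            # The callback only enqueues the result; error handling is omitted.
            future.add_done_callback(lambda f: self.results.put(f.result()))


def slow_square(n):
    """Placeholder for expensive work."""
    return n * n


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        manager = WorkManager(pool)
        manager.start()
        for n in (2, 3, 4):
            manager.submit(slow_square, n)
        # A real GUI would poll the results queue from a timer, not block here.
        print(sorted(manager.results.get(timeout=5) for _ in range(3)))
        manager.stop()
        manager.join()
```

In a tkinter or Qt application, the UI thread would check `results` periodically with a non-blocking `get_nowait()` from a timer callback, so the event loop never waits on the work itself.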
Building the Non-Concurrent Version First
In Mark’s experience, starting with a serial implementation provides a solid baseline for correctness and performance. Only after you confirm the program’s functionality and identify real bottlenecks do you introduce concurrency. This approach makes troubleshooting simpler since you already know the single-threaded code works correctly.
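One way to put the serial-first advice into practice is to keep the serial function around and use it as the reference answer for the concurrent one. The sketch below assumes a toy workload; `word_count`, `count_serial`, and `count_concurrent` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """The unit of work; free of shared state so it is easy to test alone."""
    return len(text.split())


def count_serial(texts):
    """Step 1: the plain version. Debug and benchmark this first."""
    return [word_count(t) for t in texts]


def count_concurrent(texts, max_workers=4):
    """Step 2: same results, different scheduling."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(word_count, texts))


if __name__ == "__main__":
    docs = ["one two three", "four five", "", "six"]
    expected = count_serial(docs)
    # The serial run acts as the oracle for the concurrent one.
    assert count_concurrent(docs) == expected
    print(expected)
```

Because both functions call the same `word_count`, any mismatch between them points at the concurrency layer rather than the underlying logic.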
Copying vs. Sharing Data in Parallel Code
When using processes, Python typically requires pickled data, which inherently discourages shared-state concurrency. For CPU-bound tasks, consider copying data (e.g., copy.deepcopy) or working with data that doesn’t need shared writes. Mark emphasized that concurrency overhead and locking can be more expensive than copying data in many scenarios.
- Links and tools:
  - copy.deepcopy
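The copy-instead-of-share idea can be sketched like this. The data and function names (`BASE_CONFIG`, `render`, `render_all`) are made up for illustration; threads are used here because they are the case where sharing would otherwise happen silently.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

BASE_CONFIG = {"scale": 1, "tags": ["base"]}


def render(job_id, config):
    """Mutates only its private copy, so no lock is needed."""
    config["scale"] *= job_id
    config["tags"].append(f"job-{job_id}")
    return config


def render_all(job_ids):
    """Give every job its own deep copy of the shared starting data."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            # deepcopy also copies the nested list; a shallow copy would share it.
            executor.submit(render, job_id, copy.deepcopy(BASE_CONFIG))
            for job_id in job_ids
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(render_all([2, 3]))
    print(BASE_CONFIG)  # unchanged by the workers
```

With process-based executors the arguments are pickled on the way to each worker, so every worker receives a copy whether or not you call `copy.deepcopy` yourself.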
Cython for Extra Performance
For the rare cases where you still need more speed, Cython can compile Python into C extensions and potentially double performance out of the gate. By adding type hints, Cython can optimize even further, especially if you’re dealing with numeric or tight loops. Mark shared that he’s seen significant improvements over pure Python with moderate effort.
- Links and tools:
  - Cython
High-Level Networking Approaches in Python
While Python can do low-level socket programming, Mark finds it easier to build applications with higher-level libraries like XML-RPC or RPyC. XML-RPC simplifies cross-language service calls, while Python-specific solutions such as RPyC or Pyro can be faster and more direct if you only need Python-to-Python communication. Both keep the networking details under the hood, letting you focus on your application logic.
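For a feel of how little code the XML-RPC route needs, here is a minimal sketch using only the standard library's `xmlrpc.server` and `xmlrpc.client`. The service itself (`add`) and the helpers `start_server` and `call_add` are hypothetical; the server binds to port 0 so the operating system picks a free port.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The function exposed to remote callers."""
    return a + b


def start_server():
    """Serve `add` on a free localhost port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_add(port, a, b):
    """Client side: the remote call reads like a normal method call."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.add(a, b)


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(call_add(port, 2, 3))
    server.shutdown()
    server.server_close()
```

Sockets, HTTP framing, and argument encoding never appear in the application code; failures surface as ordinary Python exceptions on the client side.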
Advanced SQLite Features with APSW
Mark highlighted that Python’s built-in sqlite3 module is great, but APSW (Another Python SQLite Wrapper) offers more advanced features mirroring what’s available to C-based developers. APSW can create custom collations, user-defined functions, and even virtual tables in pure Python. If you rely heavily on SQLite’s advanced capabilities, APSW is worth exploring.
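To show what a user-defined function and a custom collation look like in practice, the sketch below uses the standard library's sqlite3 module rather than APSW itself, since sqlite3 also supports both and needs no installation. That substitution is an assumption made for the example; APSW's own method names differ and are not shown, and virtual tables are beyond what sqlite3 offers. The names `reverse_text`, `by_length`, and `demo` are hypothetical.

```python
import sqlite3


def reverse_text(value):
    """A user-defined SQL function implemented in Python."""
    return value[::-1]


def by_length(a, b):
    """A custom collation: order strings by length, then alphabetically."""
    return (len(a) > len(b)) - (len(a) < len(b)) or (a > b) - (a < b)


def demo():
    conn = sqlite3.connect(":memory:")
    # Register the Python callables under names usable from SQL.
    conn.create_function("reverse_text", 1, reverse_text)
    conn.create_collation("by_length", by_length)
    conn.execute("CREATE TABLE words (w TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?)", [("pear",), ("fig",), ("apple",)]
    )
    rows = conn.execute(
        "SELECT w, reverse_text(w) FROM words ORDER BY w COLLATE by_length"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The same idea, Python callables registered with the database engine and invoked from SQL, is what APSW extends to more of SQLite's C-level API.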
Python GUI Development with Qt
Mark is a longtime advocate for the Qt toolkit due to its cross-platform consistency and robust features. He’s authored books on PySide/PyQt and believes desktop GUI applications still matter alongside modern mobile and web apps. If you need a polished, native-looking cross-platform desktop interface in Python, consider PySide or PyQt.

Links and tools:
- PySide (Qt for Python)
- PyQt

Interesting quotes and stories

"It's easy to get seduced into concurrency because it's fashionable, but you only want it if you actually need it." -- Mark Summerfield

"I always recommend starting with a serial version first because it gives you a clear baseline for debugging and performance before you introduce concurrency." -- Mark Summerfield

Key definitions and terms

Concurrency: The ability of a program to deal with multiple tasks or operations at once. In Python, concurrency can be achieved with threading, multiprocessing, or async/await.
Global Interpreter Lock (GIL): A mechanism in CPython that allows only one thread at a time to execute Python code, limiting true parallelism for CPU-bound threads.
Multiprocessing: Running multiple processes (each with its own GIL) for parallel CPU-bound tasks.
Threading: Using threads within a single process, useful for I/O-bound tasks that release the GIL.
Cython: A compiler and superset of Python that can yield C-level performance improvements.
APSW: Another Python SQLite Wrapper, which gives Python developers full access to advanced SQLite features.

Overall takeaway

Mark Summerfield’s insights underscore that writing better Python programs often means striking a balance between readability, maintainability, and performance. Concurrency can be a game changer, especially for I/O-bound or CPU-intensive operations, but only when approached thoughtfully. By mastering modules like multiprocessing or concurrent.futures, experimenting with tools like Cython, and leveraging best practices for data-sharing strategies, you can make your Python apps faster and more responsive without sacrificing maintainability.

Episode Transcript

00:00 What do you focus on once you've learned the core concepts of the Python programming language

00:04 and ecosystem? Obviously, knowing a few fundamental packages in your space is critical.

00:08 So if you're a web developer, you should probably know Flask or Pyramid and SQLAlchemy really well.

00:13 If you're a data scientist, import pandas, numpy, matplotlib need to be something you type often

00:19 and intuitively. But then what? Well, I have a few topics for you. This week, you'll meet Mark

00:24 Summerfield, a prolific author of many Python books. We spend our time digging into the ideas

00:29 behind his book, Python in Practice, Create Better Programs Using Concurrency, Libraries,

00:34 and Patterns. What I really like about these topics is that they have a long shelf life.

00:38 You'll find them relevant over time, even as frameworks come and go.

00:42 This is Talk Python To Me, episode 58, recorded May 4th, 2016.

00:59 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

01:17 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

01:21 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm

01:26 and follow the show on Twitter via @talkpython. This episode is brought to you by Hired and SnapCI.

01:33 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.

01:41 Hey, everyone. A couple things to share with you. First, we'll be giving away an electronic copy of Mark's book this week.

01:46 As always, just be registered as a friend of the show on talkpython.fm. I'll pick a winner later in the week.

01:51 Next, I had the honor to spend an hour with Cecil Phillip and Richie Rump on their laid-back,

01:57 technical but casual podcast, Away from the Keyboard.

01:59 I really enjoyed the conversation, and I think you'll like their podcast, too.

02:03 Give them a listen at awayfromthekeyboard.com.

02:07 Finally, those of you taking my Python Jumpstart by Building 10 Apps course,

02:10 I have a few improvements for you.

02:12 I added transcripts to the player, the website, and the GitHub repository,

02:16 as well as added some activity tracking across devices so you know which lectures you've watched.

02:21 I hope you find these additions useful.

02:22 Now, let's chat with Mark.

02:24 Mark, welcome to the show.

02:26 Oh, thank you very much. I'm glad to be on it.

02:28 Yeah, I'm super excited to talk about all these great books you've written,

02:31 and one of them in particular really caught my attention, called Python in Practice,

02:36 Create Better Programs Using Concurrency, Libraries, and Patterns.

02:40 And that just really speaks to me on some of the most important parts of

02:45 sort of design pattern and improving your overall skill, not just focused on libraries, like Flask or something.

02:51 Yeah, sure.

02:53 One of the motivations for writing that particular book was I wanted to write something for people who were already comfortable writing Python,

03:02 but showing more of the high-level things you could do with Python.

03:06 You know, if you wanted, for example, to do really low-level networking with TCP/IP,

03:12 you can do that in Python.

03:14 It's got the libraries.

03:15 All the facilities are there.

03:17 But if what you're more interested in is application programming, you might not want to go so low-level.

03:23 And Python, either built-in or in third-party libraries, has wonderful facilities for doing high-level stuff,

03:30 whether it's networking, concurrency, and GUIs, and things like that.

03:34 So I wanted to look at some of the facilities that are available, both built-in and third-party,

03:40 that allow you to do some fantastic things with Python without having to get right down into some nitty-gritty details.

03:47 Yeah, that's beautiful.

03:49 I find that when people are new to Python, and this includes myself when I'm working in some area that I haven't worked in a lot,

03:56 I'll not realize there's some really simple thing that I can use.

04:02 And I think it's really great.

04:04 There's a lot of those little tips and tricks in your book.

04:06 But before we get into the details of your book, let's start at the beginning.

04:09 Sure.

04:10 How did you get into programming in Python?

04:11 I started programming on paper in the late 70s.

04:17 I started reading computer magazines.

04:18 I taught myself BASIC purely off the magazines.

04:21 And I wrote my code on paper, and I executed it on paper.

04:25 Then eventually, I bought a home computer.

04:29 I don't know if your listeners will remember what they are, but they were things before PCs.

04:32 Very limited, but quite a lot of fun.

04:35 And eventually, I went on to do a computer science degree, which I absolutely loved.

04:42 And that gave me a lot of the sort of theoretical background.

04:45 And then I just went into software development.

04:48 And I'd been doing that for quite a few years when I bumped into someone who suggested trying Python.

04:54 And I borrowed a book from a colleague, a fellow developer on Python.

04:59 And I hated the book.

05:00 And that put me off Python for about a year.

05:03 And that was one of the motivations for writing my first Python book, was to write one that would actually encourage people to use it.

05:10 But once I started using it, within a year, all of my utility programs and tools that I use for my daily work,

05:18 they were in Python, because I just loved it.

05:22 So that was around 1999.

05:26 And now, everything I do, the first choice language is always Python.

05:30 Yeah, the Python ecosystem and, frankly, the language is fairly different from the late 90s today.

05:36 Oh, massively different.

05:38 I mean, I actually didn't like the indentation at first.

05:41 That took me like 48 hours before it really clicked.

05:45 Wow, no braces.

05:47 I just don't have to bother.

05:48 And that was really nice.

05:50 And also, of course, it forces your code to be quite neat in the first place.

05:54 It doesn't, of course, make you use good variable names and things like that.

05:58 You have to learn that separately.

06:00 But that applies to any language.

06:02 But I really liked Python pretty well from the get-go.

06:06 Yeah, so did I.

06:07 I think the indentation does catch a lot of people off guard.

06:11 And to me, it's kind of like good science fiction.

06:14 You have to sort of take this moment, the suspension of disbelief.

06:18 Like, just imagine for a minute this white space concept is a good idea and work with it for a week.

06:25 And then it just dawns on you, like, wow, this really is a good idea.

06:28 Like, I went back to working in some C-based languages right after I sort of learned Python and started working in it.

06:35 And the white space was a shock to me at first.

06:38 But what was even a bigger shock was these C-based languages that I loved.

06:43 I all of a sudden hated all the parentheses, the curly braces, the semicolons.

06:47 And that was such a surprise to me that I felt that way.

06:50 But it was within a week.

06:52 It was just completely over the semicolon.

06:53 Sure.

06:54 And of course, there's no dangling else problem that you can get in C or C++.

06:59 You know, if you've got an else with the correct indentation, you know what's going to be executed.

07:04 You're not going to get caught because you hadn't put in braces, you know?

07:07 Yeah, absolutely.

07:08 Yeah.

07:09 It really works for me.

07:10 And Python, the language.

07:12 Okay.

07:13 It's Turing complete.

07:14 And so is Perl.

07:14 And so is C++ and Java and all of these languages.

07:17 So you can do anything in one of them that you can do in the other.

07:21 So why choose one particular language rather than another?

07:25 And I think part of that is, well, what's the libraries and ecosystem like?

07:30 And part of it is what fits the way you think best.

07:33 And in my case, it happened to be Python that works.

07:36 But, you know, I wouldn't argue against someone who preferred some other language if that suited them.

07:41 Yeah, absolutely.

07:41 But for me, Python is a great language.

07:44 And particularly Python 3.

07:46 I started using that from the first alpha.

07:49 I'd ported everything from Python 2 to Python 3 right at that stage.

07:54 And I think it's excellent.

07:55 Yeah.

07:55 You know, maybe that's a good segue to sort of taking a survey of the books that you've written.

07:59 Because you've written many books.

08:01 Is something, how many, around eight, seven?

08:05 Depends whether you count second editions.

08:07 Without second editions, it's seven.

08:09 And with them, it's nine.

08:10 Okay, excellent.

08:11 Yeah.

08:12 And one of them you wrote is a pretty sizable Python 3 book, right?

08:17 Yes.

08:17 That's Programming in Python 3.

08:19 Yeah.

08:20 That book is really aimed at people who can program in something or other.

08:25 And it's to port them over to Python 3.

08:27 But it should also work for people.

08:29 And the something or other could be Python 2.

08:32 Right, right.

08:34 So that's who it's aimed for.

08:35 It's not aimed at beginners.

08:36 The subtitle was poorly chosen by me.

08:40 A complete introduction to the Python language.

08:43 Sometimes people think that introduction means it's introductory, which wasn't the intended intention.

08:49 It was just, it's introducing everything that Python 3's got that you're going to use in normal, maintainable programming.

08:57 The only things I don't tend to cover in my books are things that I think are dangerous and obscure.

09:03 So, for example, in Python, you can disassemble Python bytecode, rewrite it, and put it back.

09:09 And that's brilliant.

09:10 But I wouldn't ever cover that in my books because I wanted to cover stuff that people can maintain.

09:15 Understand that it runs.

09:18 That's right.

09:18 Absolutely.

09:19 Absolutely.

09:19 Maintainability and understandability are really important to me because, in my experience, you live with code for quite a long time.

09:27 You know, literally years.

09:29 So you don't want to torture yourself when you have to go back and fix something or do a modification to something.

09:37 And it's been a few years since you've seen that code.

09:40 Yeah.

09:40 Oh, absolutely.

09:41 So maybe tell us some of the other books that you wrote.

09:45 Okay.

09:45 You've got some interesting topics there.

09:47 I think the first one I wrote concerning Python is Rapid GUI Programming with Python and Qt, which is about PyQt programming with PyQt 4.

09:56 Although the book is – I mean, I've updated the examples for PySide as well.

10:01 I really like the Qt GUI toolkit.

10:05 I like cross-platform GUI programming.

10:08 And that book, the first part of it is actually a very brief introduction to Python programming itself.

10:15 And I was quite pleased.

10:17 I had very good feedback on that.

10:18 Many people said, well, I already knew Python, but I read the introduction because, well, I bought the book.

10:24 And I still learned things from it.

10:26 So I was glad about that.

10:27 And Qt itself, I think, is good.

10:30 I know that it's very fashionable, you know, mobile programming and web programming and things like that.

10:35 And they're great.

10:36 But I personally love desktop programming and desktop GUI applications.

10:42 So it was an expression of something that I was really interested in and really enjoyed doing.

10:46 Yeah, that's great.

10:47 I agree with you that desktop apps are really – they don't get enough love, you know, because mobile is a super flashy thing.

10:55 I actually spend most of my time writing web apps.

10:57 But I do very much appreciate a good desktop app.

11:01 And I think Qt is one of those frameworks that is cross-platform but doesn't feel cross-platform as a user.

11:07 You're not like, oh, yeah, this button and this UI completely looks foreign.

11:12 But it technically is a UI, right?

11:14 Absolutely.

11:15 Yeah.

11:15 I mean, it really does – they really do well with native look and feel, even though they're not using native widgets, unlike, say, wxPython, which does.

11:23 They emulate it all.

11:25 But they do it very well.

11:26 And some of their stuff on Windows is faster than native.

11:29 Wow.

11:30 The way that it's implemented.

11:32 Because they're not – when they create a window, they're not creating all those widgets, which you would do using a normal Windows toolkit.

11:40 They just basically create an image.

11:43 Right, right.

11:44 So it's much, much cheaper and much faster in terms of resources.

11:47 So, yeah, I love that toolkit.

11:49 And that's how I'm earning my living now because I'm earning my living writing on the back of commercial Python applications for desktop users.

12:00 So they're written in Python, Python 3, PySide, Qt.

12:04 Oh, wonderful.

12:05 And do you ship those with something like cx_Freeze to sort of package them up or was it py2app?

12:11 That's exactly right.

12:12 No, I use cx_Freeze.

12:14 And so, yeah, they're quite a big download because the Qt libraries are all in the bundle.

12:18 And people can download them, try them, and, you know, hopefully buy them.

12:26 Yeah, wonderful.

12:28 And that seems to have worked for the last couple of years quite nicely.

12:31 Yeah, okay.

12:32 Fantastic.

12:33 Yeah, that's a really cool story.

12:34 It is possible to earn a living with Python not doing web programming.

12:38 I mean, obviously, you can earn good money doing web programming as well.

12:42 Right, of course.

12:43 Or data science.

12:44 Those are the two most common ones.

12:46 Oh, absolutely.

12:46 Yeah.

12:47 Yeah, absolutely.

12:48 So you also have some books on Go and C++.

12:52 Oh, I've got a Go book.

12:53 I did that because I was just really interested in Go.

12:57 I think Go and Rust are both sort of new languages that I am interested in.

13:02 And I like the simplicity of Go.

13:05 And I'm also interested in concurrency.

13:08 And so I started to learn the language.

13:12 And I thought, yeah, I can write something better than what's available on this language.

13:17 But one of the authors of Go has come out with a book on Go now.

13:22 So I should think that will kill mine.

13:25 You never know.

13:27 You never know.

13:28 I mean, well, I still think mine's a good book, actually.

13:30 But there we are.

13:32 In terms of C++, I have written a few.

13:36 But they're all C++ with Qt.

13:38 So I co-wrote with Jasmin Blanchette, C++ GUI Programming with Qt 3 and then with Qt 4.

13:46 And then I did a solo one, Advanced Qt Programming.

13:49 Okay.

13:50 But again, they're all C++.

13:52 I still use them for PySide programming, you know, to remind myself how to do things.

13:58 And I actually use the C++ docs rather than the PySide docs.

14:02 So, you know, I can translate easily enough.

14:05 Yeah, they're a little more definitive, maybe.

14:07 Yeah.

14:07 I am frustrated with C++.

14:10 I mean, C++11 was like a huge step forward.

14:13 But they just didn't deprecate enough as far as I'm concerned.

14:16 So it's getting too big.

14:18 And I think it's quite hard to write that language in a maintainable way now.

14:23 Yeah, that language is just ever-growing.

14:26 And it wasn't all that simple in the beginning.

14:28 No.

14:29 I have another question about Qt.

14:31 I definitely want to get to the topics of your books.

14:33 Sure.

14:33 Given your background there.

14:34 I just saw, when was this, back in mid-April, there was an announcement saying we're bringing

14:42 PySide back to the Qt project.

14:44 That's right.

14:45 I'm really delighted.

14:47 Fantastic.

14:48 It's going to be PySide for Qt 5.

14:51 Now, you can use PyQt with Qt 5.

14:54 But PySide has been Qt 4 only so far.

14:57 But they're actually putting money behind it and investing in it.

15:00 So PySide 2 will be for Qt 5.

15:04 And I'm really looking forward to that.

15:06 Yeah.

15:07 So am I.

15:07 Do you know the time frame on when that kind of stuff will be out?

15:10 They're doing the development in the open.

15:11 I think there's like a GitHub or some equivalent of that.

15:14 So you can look at it.

15:16 But I would guess it's going to be, I think we'd be lucky to see by the end of the year

15:22 something that's usable.

15:23 Right.

15:24 Because it's not a simple job.

15:26 Yeah.

15:26 There's a pretty big break from Qt 4 to Qt 5.

15:28 Not so much the Qt side of it.

15:30 I don't think that's the hard side.

15:32 As you know, programmers, we love reinventing things.

15:37 And when they did PySide, they invented a new way of doing bindings.

15:42 And I think that hasn't proven to be quite as maintainable and flexible as they'd hoped.

15:48 So I think that's where they're going to have to do quite a lot of work, getting that to

15:51 work with Qt 5 and the new C++.

15:54 Right.

15:54 Okay.

15:55 Yeah.

15:55 So it's more with the PySide version than it is the Qt itself.

15:59 Got it.

16:00 I think so.

16:01 I think so.

16:02 Yeah.

16:02 All right.

16:02 Excellent.

16:03 So I know there's a lot of interest in GUI apps from a Python perspective.

16:09 And maybe another time we can dig into Qt even more.

16:12 But let's talk a little bit about creating better Python apps.

16:15 Okay.

16:15 What was the motivation for writing this book?

16:18 I mean, you said you aimed it at people who were in the middle, but you gave it four themes.

16:23 You said, I'm going to cover sort of these general themes of code elegance, improving speed

16:28 with concurrency, networking and graphics.

16:30 How did you come to that collection?

16:32 Well, graphics, because I just love GUI programming.

16:35 So that had to go in because it's something I love.

16:38 I also, with networking, I've done a fair bit of network programming, but I'm not a low-level

16:44 person.

16:45 I like my networking to be as easy as possible.

16:49 And I wanted people to be aware that you can do networking really easily with Python without

16:54 having to go down, you know, to low-level stuff and do it in a reliable and pleasant way.

17:00 And I cover two approaches.

17:03 One is XML-RPC.

17:04 And I cover that because it's built, it's in the standard library.

17:08 And it works really easily.

17:11 So it's really nice.

17:13 And the other one I cover is a third-party one called RPyC.

17:17 There's another one called Pyro, which is also widely used.

17:23 And I could have gone with either of those.

17:25 And the advantage of RPyC or Pyro is that they can be Python-specific.

17:30 So you can get better performance, whereas XML-RPC is general.

17:35 So it hasn't got quite as good performance, but it has the advantage that you can write a

17:41 client and/or server using XML-RPC.

17:44 And it'll talk to anything else that uses the XML-RPC protocol.

17:49 So that's very nice for interoperability.

17:52 And they're high-level.

17:54 So all of the detail and, you know, timeouts and all of the issues that can arise in networking

18:00 is just neatly controlled and wrapped up.

18:02 And, OK, you get exceptions and things like that.

18:04 You know, all normal Python stuff.

18:07 So you don't have to worry about the details.

18:08 Yeah, it makes a lot of sense.

18:10 So I want people to be aware of that, that these kind of facilities exist.

18:14 You can basically write a wide range of different types of networking apps in Python, right?

18:20 You can go all the way down to the raw sockets with bytes.

18:23 I just talked with Mahmoud Hashemi from PayPal.

18:26 And those guys are writing services that take over a billion requests a day.

18:30 Wow.

18:31 They're doing network programming in Python, but down below the HTTP level.

18:36 Wow.

18:36 And these custom APIs.

18:38 And then you can, of course, go up higher, right?

18:40 Like XML-RPC or maybe a REST service with requests, things like that, right?

18:45 Absolutely.

18:47 And it's the high level stuff that I was more interested in.

18:49 And I think that's because at heart, I'm an applications programmer.

18:54 And that means I know about the subjects of my application, but I don't necessarily have the

19:00 expertise in particular areas that the application needs.

19:05 And so for that, I want high level libraries that give me the functionality that are created

19:10 by experts in those fields.

19:11 So I get the best of both worlds.

19:13 I get the functionality I need by, you know, excellent people who've developed it without

19:19 actually having to learn all that stuff myself.

19:21 Yeah, absolutely.

19:22 I think the right way to start is start simple and then, you know, then go do crazy network

19:27 stuff if you need to improve the performance.

19:29 But generally, you don't have a performance problem.

19:32 No, no.

19:33 And that actually brings us nicely to concurrency.

19:36 Python, you know, people say, oh, is Python slower?

19:40 Python can't do concurrency.

19:42 And I really wanted to address those issues because how slow is Python really?

19:47 Well, I developed a program in C++ that was very CPU intensive.

19:52 And I rewrote that program in Python and it was 50% slower.

19:57 And I think that's not bad going from C++ to an interpreted language, a bytecode interpreted

20:04 language.

20:05 But of course, I then made it concurrent.

20:08 And you could make it concurrent in C++, but it's much easier

20:12 doing that in Python.

20:13 And so on a dual core machine, suddenly it was as fast as C++.

20:19 Give it more cores.

20:20 And it was faster than the C++.

20:22 So even though baseline, yeah, it's 50% slower on real hardware using concurrency, it was much

20:29 faster.

20:30 And that's really what the user is going to care about.

20:33 You know, on my hardware is this thing running fast.

20:35 Right.

20:36 And of course, it's much more maintainable.

20:38 I mean, doing concurrency in Python is so much easier than in most other languages.

20:43 Yeah.

20:43 And I think specifically around concurrency, it's easy to get yourself into a situation where

20:50 you've been very clever and you've thought really hard about the algorithms and the way

20:55 it works.

20:56 And you've kind of written something just at the limit of your understanding.

21:00 Like you totally understood what you did, but it's at the very edge.

21:04 But of course, understanding multithreaded code, debugging it is harder than writing it.

21:09 And so maybe it's like, you know, you've sort of gone a little too far.

21:13 You're like, okay, I could write this, but I don't really understand how to fix it when it

21:17 goes wrong.

21:17 Yeah.

21:18 And that is a real problem with concurrency.

21:20 In some languages, you're stuck because the concurrency facilities they offer are pretty basic.

21:27 So they don't make it easy.

21:29 But Python offers higher level approaches to concurrency as well as low level.

21:34 For example, you've got the concurrent.futures module, which makes it much easier to create

21:42 either separate threads or separate processes where Python takes care of lots of the low level

21:47 details.
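
As a rough illustration of that point, here is a minimal sketch using the standard library's concurrent.futures. The helper names word_count and count_all are made up for this example and do not come from the book.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Stand-in for one independent unit of work."""
    return len(text.split())


def count_all(texts, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Run word_count over every text using the given kind of pool."""
    # The executor creates, feeds, and shuts down its workers for us.
    with executor_cls(max_workers=max_workers) as executor:
        # map() returns results in the same order as the inputs.
        return list(executor.map(word_count, texts))
```

Passing concurrent.futures.ProcessPoolExecutor as executor_cls should switch the same call to separate processes; in that case the call is best made from under an `if __name__ == "__main__":` guard.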

22:45 I think you really put this together quite nicely in terms of sort of breaking out the different

22:50 types of concurrency.

22:52 And it helps you understand if you sort of think, okay, well, what are the types of concurrency?

22:56 What type of problem am I solving?

22:58 Then you have a pretty good recommendation for if it's this type of problem, solve it this

23:03 way.

23:04 And so you said there were three types.

23:05 You called them threaded concurrency, process-based concurrency, and then concurrent waiting.

23:10 Yeah, yeah.

23:11 Basically, if you're doing CPU-intensive work, then using threading in Python is not going to

23:17 help because of the global interpreter lock.

23:21 So if it's CPU-intensive and you need concurrency, then you need to use a different method.

23:26 And Python offers, for example, multiprocessing, where you can split your work over multiple

23:32 processes rather than multiple threads.

23:34 And each of those has its own separate interpreter lock so that they don't interfere with each

23:40 other.
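
A minimal sketch of that idea, assuming a deliberately CPU-heavy placeholder task (counting primes by trial division). The function names are invented for illustration; only multiprocessing.Pool and its map method are real library calls.

```python
import multiprocessing


def count_primes(limit):
    """Deliberately CPU-heavy: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=None):
    """Spread the limits over worker processes, each with its own GIL."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)
```

Calling count_primes_parallel([20000, 30000, 40000]) from under an `if __name__ == "__main__":` guard would run the three counts in separate interpreters. Whether that actually beats the serial loop depends on the machine and the size of the work, so it is worth timing both.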

23:41 Right.

23:41 But I think the key to getting real performance from concurrency is to avoid sharing insofar as

23:49 you can.

23:50 And that means either you don't need to share in the first place, or if you've got data that

23:56 needs to be looked at by your multiple threads or multiple processes, then it may be cheaper

24:01 to copy that data rather than have them share some kind of locking to look at it.

24:07 Right.

24:07 And that's a mistake I think a lot of people make is they see the way that their program is

24:13 working now.

24:14 They've got some shared...

24:15 They've got a pointer they're passing to two methods or whatever, two parts of the

24:18 algorithm.

24:19 And they're saying, well, this part's going to work on this part of the memory and this

24:22 one's going to work over here.

24:23 And so they think, well, when I parallelize this, this is shared memory access.

24:28 And of course, you have to take a lock or somehow serialize access to that data.

24:33 And it's easy to forget that, you know, if that's like a meg or even maybe 50 megs of

24:39 data, it might be so much easier both for you and advantageous for performance to just say

24:45 copy.

24:45 Make a copy and just pass it over and then correlate the differences later.

24:49 Absolutely.

24:50 The other possibility is to share the data without copying, which is fine if you never write to

24:56 it.

24:57 So if you've got data where you're just reading in information like a log or some data

25:02 stream, and you're never changing the information you're reading, you might be producing new output,

25:08 but that's separate.

25:09 So if the stuff you're reading, you can read that from a shared data structure as long as

25:14 you only read.

25:14 And that's not going to be a problem.

25:16 It's only when you're going to start writing that sharing becomes an issue.

25:21 And then you've got problems if you don't lock.

25:24 But the best way is still don't lock.

25:27 The best way if you're writing data is write or save the data in separate chunks and gather

25:33 it together at the end.

25:34 That will often be less error prone and faster.
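
One way to picture "separate chunks, gathered at the end" is the sketch below. It assumes log lines whose first word is a level such as INFO; the helper names are hypothetical. Each worker builds its own Counter, and the only merging happens in the caller, so no lock is needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_levels(lines):
    """Each worker builds its own private Counter; nothing is shared."""
    return Counter(line.split(" ", 1)[0] for line in lines if line)


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_levels_concurrently(lines, chunk_size=2,
                              executor_cls=ThreadPoolExecutor):
    with executor_cls() as executor:
        partials = executor.map(count_levels, chunked(lines, chunk_size))
        # Gather: merge the separate results once, in the caller only.
        total = Counter()
        for partial in partials:
            total.update(partial)
    return total
```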

25:38 Yeah, absolutely.

25:39 But of course, sometimes there isn't a choice.

25:41 Sometimes you do need to lock.

25:43 And then that's when it becomes quite difficult to reason about because you've got to be clear

25:51 when you need to lock and when you need to unlock and all the rest of it.

25:55 And that's where the difficulty comes in.

25:57 But if you can avoid having to lock, then you can get good performance without problems.

26:03 Absolutely.

26:06 And there's some of the data structures in the newer versions of Python that sort of solve

26:13 that problem for you.

26:14 And so we'll talk about those a little bit.

26:16 But when you said copy data, one way to say copy a data structure, like obviously you can't

26:22 just pass the pointer over or get another variable and point to the same thing, right?

26:27 Because it's a pass by reference, not pass by value type of semantics.

26:30 So there's actually a copy module in the standard library, right?

26:34 Yeah.

26:35 And basically, what I would contend is this.

26:38 If you use copy.deepcopy in a non-concurrent program, then there's almost certainly something

26:46 wrong with your logic.

26:47 But if you're doing concurrent programming, then it may well be the right solution for you.

26:54 Because deep copying can be expensive if you're using like nested data structures, like dictionaries

26:59 with dictionaries and things that are quite large, for example.

27:02 But nonetheless, it may be the right tradeoff.

27:05 Of course, the only way you're going to know for sure is to profile and actually time things.

27:10 Because that's the other sort of big issue, isn't it?

27:13 That, you know, we have an intuitive feeling this will be fast or that will be slow.

27:17 But unless you back it up with numbers, you could be optimizing something that makes no

27:22 difference whatsoever.
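
To make the copying and timing points concrete, here is a small sketch with made-up sample data. It shows how copy.copy and copy.deepcopy differ on a nested structure, and uses timeit to put a number on the cost rather than guessing.

```python
import copy
import timeit

config = {"sizes": [1, 2, 3], "nested": {"tags": ["a", "b"]}}

shallow = copy.copy(config)      # new outer dict, same inner objects
deep = copy.deepcopy(config)     # fully independent copy

deep["nested"]["tags"].append("c")       # does not touch the original
shallow["nested"]["tags"].append("z")    # does: the inner list is shared


def time_deepcopy(data, number=1000):
    """Seconds taken to deep-copy `data` `number` times."""
    return timeit.timeit(lambda: copy.deepcopy(data), number=number)
```

After this runs, config["nested"]["tags"] has gained "z" through the shallow copy but not "c" from the deep copy. The figure returned by time_deepcopy will vary by machine and by data, which is exactly why it needs measuring.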

27:24 Yeah, you had some really interesting points there that I thought were both interesting and

27:29 good advice.

27:30 One was, if you're going to write a concurrent program, write the non-concurrent serial single

27:37 threaded version.

27:38 Absolutely.

27:39 first, if you can, and then use that as the baseline for your future work, right?

27:44 I mean, one of my commercial programs, it does its work using concurrency.

27:49 It's written in Python and it's concurrent.

27:50 But I have two modules that do the processing.

27:55 And one uses concurrency and one doesn't.

27:58 And the tests, I have to make sure they produce exactly the same results.

28:03 And of course, one is much slower than the other.

28:05 But I found that incredibly useful in the early days, particularly for debugging.

28:10 And I still use the non-concurrent one if there's some tricky area that I want to focus

28:15 on without having to worry about concurrency.

28:17 So I found it's paid off in terms of saving my time as a programmer.
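
The two-module arrangement described here might look something like the following sketch. It is not the code from the commercial program; process_item is a placeholder and every name is hypothetical. The point is only that the serial path serves as the reference that the concurrent path is checked against.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item):
    """Placeholder for the real per-item processing."""
    return item * item


def process_serial(items):
    return [process_item(item) for item in items]


def process_concurrent(items, executor_cls=ThreadPoolExecutor):
    with executor_cls() as executor:
        return list(executor.map(process_item, items))


def check_same_results(items):
    """Regression check: both code paths must agree exactly."""
    expected = process_serial(items)
    actual = process_concurrent(items)
    assert actual == expected, (expected, actual)
    return True
```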

28:22 And that's the other kind of time, isn't it?

28:24 It's not just the processing or runtime of your software.

28:28 It's the time you spend not just creating this stuff, but maintaining it.

28:34 And concurrency can cost you a lot of maintenance.

28:39 Oh, yes.

28:40 Unless you're very careful.

28:41 Yeah, I'm a strong advocate.

28:43 Get the non-concurrent version working first.

28:45 And of course, it may turn out that that's actually fast enough anyway.

28:50 Absolutely.

28:51 And then you've saved yourself a whole lot of trouble.

28:54 It's sort of the whole premature optimization issues.

28:57 Yeah.

28:57 It's easy to get seduced into doing things concurrently because it's very fashionable and you can brag

29:03 about it.

29:04 But quite honestly, it's got to be the right solution.

29:07 And you're not going to know that until you've done a non-concurrent one first, I think.

29:11 Yeah, that's a good point.

29:12 And sort of related to that is the performance story.

29:16 So you have some examples in your book where you write the concurrent version and then you have

29:22 the concurrent version, but you write it in several ways.

29:24 And you also have the serial version.

29:27 That's right.

29:28 Just to compare to see, to show what difference different strategies make depending on circumstances.

29:34 Right.

29:35 And for the CPU based one, you were doing like image analysis and processing.

29:39 Yeah, because that's expensive in terms of CPU.

29:41 And of course, if you use threading, it kills performance.

29:44 Yeah.

29:45 Ironically, in CPython, it's actually several times slower if you do it in parallel.

29:52 Yeah, because it's only actually running on a single core at a time and you've got context

29:58 switching on top of it.

29:59 Whereas if you use multiprocessing, zoom, it can run free.

30:03 You know, it'll max out your cores and it will go as fast as your machine's capable of.

30:07 The performance story around threading is super hard to see the whole picture because we obviously

30:13 know when you take a lock here, that slows down both the threads and the context switching

30:18 is slow.

30:19 But then you also have the sort of memory usage.

30:22 You have the L1, L2 caches and registers.

30:25 And so like when you switch from one thread to the other, it could pull in data that trashes

30:31 your cache and you've got to go get it from main memory, which is like 100 times slower.

30:34 And it's just, it's very subtle.

30:37 And yeah, if you're networking, then generally using threads is fine because the network latency

30:44 is so dominant.

30:46 I'm not talking about in terms of if you're like Google and doing like massive servers,

30:51 but for a lot of, if you like, more ordinary applications, then threading is fine for that.

30:58 But of course there is the new asyncio library.

31:01 The GIL is basically one of the problems.

31:03 It means you don't get any of the concurrency that you're aiming for computationally, but

31:07 you still get all the overhead.

31:08 Well, you don't get it at the Python level.

31:11 You will get it at the C level.

31:12 So if you have something that's running with Python threads and actually the work is being

31:19 done by say a C library, if that C library is written well, it'll release the GIL, do its

31:26 work and then reacquire it when it needs to pass data back.
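
One standard-library case that, as far as I understand the hashlib documentation, behaves this way is hashing: CPython releases the GIL while hashing buffers larger than a couple of kilobytes. The sketch below (helper names invented) hashes several large blobs from plain threads. How much real overlap you get is something to measure rather than assume.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data):
    # The heavy lifting happens in C; for large buffers CPython's
    # hashlib drops the GIL so other threads can run meanwhile.
    return hashlib.sha256(data).hexdigest()


def hash_all(blobs, max_workers=4):
    """Hash every blob on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(sha256_hex, blobs))
```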

31:29 So it's not a simple story, no matter how you look at it.

31:33 Yeah.

31:34 Okay.

31:34 So you still have, you've got a performance test, but you've got to have that serial version

31:38 to give you a benchmark so that you know whether you're getting better or worse.

31:43 Right.

31:43 Absolutely.

31:44 And also just doing that serial one, it will give you insights into the problem you're solving

31:49 anyway.

31:49 And it's better to make mistakes with the serial one than with the concurrent one because you've

31:53 got less to think about then.

31:55 Yeah, that's for sure.

31:56 If you really do have a problem and it is slower, the trick is to use subprocesses.

32:01 Yeah, use the multiprocessing module, which prior to 3.2, I found not terribly reliable,

32:09 but they did loads of improvements in 3.2 and certainly in 3.3 and 3.4.

32:15 It's absolutely rock solid, both Windows and Linux, which are the ones, the platforms I use.

32:20 And it's brilliant.

32:22 It's absolutely an excellent library.

32:24 Yeah.

32:25 And I just want to point out to people, like when you hear say use subprocesses, that doesn't

32:28 mean just go kick off a bunch of processes and manage it yourself, right?

32:31 There's sort of a concurrent library for managing them, right?

32:35 Multiprocessing library.

32:37 Oh, the multi- yeah.

32:38 I mean, there's nothing to stop you.

32:40 I mean, there is the subprocess module.

32:42 You can do it all manually, but there's no reason to do that.

32:45 I mean, you can create a process pool.

32:48 In one of my applications, I do that.

32:50 And there's an asynchronous function.

32:53 You can basically just give it a Python function and some arguments and say, okay, go do this

32:58 somewhere else on some other processor.

33:00 And it'll just do that work.

33:02 If that's like expensive work, that's great because it doesn't stop your main process at

33:09 all.

33:09 And when it's done the work, it lets you know.

33:12 And you can pick up from there.
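
The "asynchronous function" on a pool is presumably something like apply_async. The sketch below is an assumption about the shape of such code, not the application's actual source. It uses multiprocessing.pool.ThreadPool for the demo because it exposes the same apply_async API as multiprocessing.Pool; the function names are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def expensive(x, y):
    """Placeholder for work that would otherwise block the caller."""
    return x ** y


def run_in_background(pool, jobs):
    """Submit every (x, y) job without blocking; collect via callback."""
    results = []
    pending = [pool.apply_async(expensive, args=job, callback=results.append)
               for job in jobs]
    # The caller is free to do other things here; we just wait at the end.
    for handle in pending:
        handle.wait()
    return sorted(results)


def demo(pool_cls=ThreadPool):
    with pool_cls(2) as pool:
        return run_in_background(pool, [(2, 3), (3, 2), (2, 10)])
```

Passing multiprocessing.Pool as pool_cls (from under an `if __name__ == "__main__":` guard) should hand the same jobs to separate processes, with the callback still running in the parent.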

33:14 Right.

33:15 So there's the concurrent.futures module, which is a very high level module,

33:21 which makes it really easy to just execute either with threads or processes stuff.

33:27 Or you can go use the multiprocessing module itself with its pools and stuff.

33:32 So you can find the level that suits what you need.

33:35 Yeah.

33:36 It feels to me like if you're doing Python 3.2 or above, you should really consider maybe

33:42 the concurrent module first and the concurrent futures, because it's so easy to say, let's

33:47 do this computationally.

33:48 Let's do this.

33:49 Yeah.

33:49 Let's do this as subprocesses.

33:51 Let's switch it to have like a pool of subprocesses.

33:54 All of those things.

33:56 Right.

33:56 Yeah.

33:56 The other thing about multiprocessing is by default, it doesn't share memory, which is

34:01 the opposite of threading.

34:02 Which means you're not going to clobber yourself by writing to something that's shared.

34:08 Of course, it means if you do want to share, you have to actually go to extra effort and

34:13 say, OK, I'm setting up this thing to be shared.

35:13 Yeah, there's some built-in data structures for sharing across process, right?

35:17 There are.

35:17 I mean, I only use them personally for flags.

35:20 You know, I tend to gather my data separately if I've got results data and then join it all

35:27 up together at the end.

35:28 So I just use flags basically to say, like, this bit's at this stage or a flag to say, look,

35:33 just stop now because the user's cancelled.

35:36 Yeah.

35:37 To give me a clean termination.
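
A cancel flag of that kind could be sketched with multiprocessing.Value, which holds a single C value in shared memory. The worker below is a hypothetical stand-in that simply checks the flag between items.

```python
import multiprocessing


def crunch(items, stop_flag):
    """Process items one at a time, checking the shared flag between items."""
    done = []
    for item in items:
        if stop_flag.value:        # someone set the flag: stop cleanly
            break
        done.append(item * 2)      # placeholder for the real work
    return done


def make_stop_flag():
    # "i" means a C int living in shared memory; 0 means "keep going".
    return multiprocessing.Value("i", 0)
```

In a real program the flag would be passed to a worker at creation time, for example multiprocessing.Process(target=crunch, args=(items, flag)), and the parent would set flag.value = 1 when the user cancels. A Process target's return value is discarded, so results would have to come back some other way, such as a multiprocessing.Queue.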

35:39 But, you know, I mean, I cover all that sort of stuff in the book.

35:42 But it's a really great module, multiprocessing.

35:46 But concurrent futures gives you that high-level approach, which makes it as simple as it can

35:52 be for this kind of stuff.

35:54 I mean, I'd still advocate not using concurrency unless you need it, you know, because it does

36:01 make your program more complicated and harder to reason with, you know, or reason about.

36:06 Yeah.

36:06 And, you know, it is easy to switch between concurrent futures using subprocesses or using

36:12 threads.

36:13 But that doesn't mean the code that you write can be just flipped from one to the other because

36:17 of the serialization issues and all, you know, the locking shared data.

36:21 So that's maybe a really subtle thing you could run into.

36:24 Well, if you're doing threading, the memory is shared by default.

36:30 So any thread can stomp on anything, which can be a problem.

36:35 But on the other hand, if you're using multiprocessing, any data that you're passing around has to

36:41 be picklable, for example, which doesn't apply as a limit in threading because you're just

36:47 accessing the same data in the same memory space.
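
A quick way to see the picklability limit is to try pickling a few objects. The helper name is_picklable is invented for this sketch; pickle.dumps is the real call that multiprocessing relies on when it sends data between processes.

```python
import pickle
import threading


def is_picklable(obj):
    """True if obj could be sent to another process via pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

Plain data such as dicts and lists passes, as do functions that can be found by name in an importable module. A lambda or a threading.Lock does not, even though threads in one process can share either of them freely.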

36:49 So there are differences and there are tradeoffs.

36:51 And the API for multiprocessing started as mimicking the threading API, but it's actually

36:59 grown considerably since then.

37:01 So it's worth digging in and learning.

37:04 But I start with concurrent futures because that is the easiest conceptually and in practical

37:10 code.

37:10 It requires the least code to get stuff done.

37:12 Yeah, absolutely.

37:13 So both the multiprocessing and the threading are pretty good for when you're doing basic

37:21 IO bound work, right?

37:22 Because the key thing to know about that is a thread when it waits on a network call in

37:27 CPython will release the GIL, right?

37:29 Yeah.

37:30 But of course, there is the asyncio module, which is designed for that kind of work.

37:37 I'm not a user of that module because most of my processing is CPU bound.

37:43 Right.

37:43 But that is a third way, if you like.

37:45 Yeah.

37:46 So in Python 3.4, they added the asyncio and the concept of event loops.

37:51 And I also have not used that a lot.

37:53 But my understanding is that's a little like the Node.js style of programming.

37:57 I don't know because I avoid JavaScript as much as I can.

38:02 But basically waiting on the IO bits and releasing it to process other bits of code, other methods

38:09 while you're waiting on IO, right?

38:11 So it's going to let you know rather than you having to poll or be blocked.

38:15 Right.

38:15 Very callback driven.

38:16 Yeah.

38:17 Yeah.

38:17 Which is a perfectly good approach.

38:19 Yeah.

38:20 And then Python 3.5 added the async and await keywords.
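
For readers who have not seen the style, a minimal sketch of async and await follows. Neither speaker claims experience with it, so treat this as a generic illustration: asyncio.sleep stands in for a network wait, the function names are made up, and asyncio.run requires Python 3.7 or later.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a network call; sleep() yields to the event loop."""
    await asyncio.sleep(delay)
    return name


async def fetch_all():
    # All three "requests" wait at the same time on a single thread.
    return await asyncio.gather(
        fetch("a", 0.03), fetch("b", 0.01), fetch("c", 0.02)
    )


def main():
    return asyncio.run(fetch_all())
```

gather returns results in the order the awaitables were passed in, not the order they finished.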

38:24 Yeah.

38:25 Which I haven't used.

38:26 I'm still using 3.4.

38:28 Partly because I had some compatibility issues with cx_Freeze at the time.

38:34 And partly because of the installer.

38:36 For my commercial software, I released both 32 and 64 bit versions on Windows.

38:43 And up to 3.4, it's really easy to install both of those side by side.

38:52 There's no problem.

38:53 But with a 3.5 installer, what I found was that some third party libraries couldn't find one

38:59 or the other.

38:59 So I'm a bit stuck with 3.5 on Windows at the moment.

39:03 Well, and the installer for Python 3.5 got a major reworking by Steve Dower, who was actually

39:11 just on the show.

39:12 What number was that?

39:13 That was 53.

39:15 So just, you know, a few weeks ago.

39:17 And the installer is much nicer than the old one.

39:20 It is, but it doesn't do what I want.

39:23 But it has, you know, right?

39:24 If you need this other thing it's not doing, then obviously you can't use it, right?

39:28 Yeah.

39:28 I need to be able to install 32 and 64 bit Python side by side.

39:34 And I can do that up to 3.4.

39:35 I'm not saying it's not possible.

39:37 I mean, I have done it with 3.5.

39:40 But what I haven't managed to do is get my third party stuff.

39:43 PyWin32 and APSW, which I'll mention at the end.

39:48 I couldn't get them working properly with it when I had both.

39:51 They work fine when I've just got one Python, but not when I had both.

39:55 But hopefully that problem will go away.

39:57 Because, I mean, sometime I'm going to, like, stop doing 32-bit versions of my apps.

40:01 I really want to look into the async and await stuff more.

40:06 Because that programming model is so beautiful.

40:10 It's just I haven't been writing any code that requires that type of work.

40:16 I like that model because it's very similar to the GUI event loop model.

40:19 I mean, the GUI event loop basically sits there and says, I'll let you know if something happens.

40:24 And you say, okay, well, if this thing happens, call this.

40:28 Yeah.

40:28 GUIs are inherently event-driven, right?

40:31 Absolutely.

40:32 They've got their message pump and everything.

40:34 So you actually, one of the, maybe the last section in the concurrency bit of your book, you talk about special considerations for GUIs.

40:43 Yeah.

40:43 I mean, I did this using Tkinter simply because that's in the box.

40:47 You know, comes with Python out of the box.

40:49 Although, personally, I use PySide and Qt.

40:51 But it would work.

40:53 The method works with both.

40:55 And I'm sure it would work with wxPython or with PyGObject, any GUI system.

41:00 And what I discovered was, how do you make, well, the question that arose for me was, okay, I've got a GUI application.

41:08 And it's got to do some CPU intensive work.

41:11 But I don't want to freeze the GUI.

41:13 Because what if the user wants to cancel the operation?

41:15 Or what if they just want to quit the application?

41:17 I don't want it frozen for like, you know, minutes on end when they can't do anything.

41:21 One of the quickest ways you can make a user believe that your application is crappy is to have it just lock up.

41:27 And on Windows, like, get that sort of white, opaque overlay saying not responding.

41:32 Or on OS X, it says force quit.

41:35 You're like, hmm, I'm a little suspicious now.

41:38 Yeah, and sometimes those messages come too early because sometimes, you know.

41:41 But anyway, so that was the problem that I had to address.

41:44 And what I found was, if I use threading, I do have a work thread and a GUI thread, the GUI still freezes.

41:51 So what I needed was some way of not having the GUI freeze.

41:56 And the model that I came up with was I have two threads, one for the GUI, and what I call, rather sarcastically, the manager thread.

42:05 And the thing about the manager thread is the GUI gives it work.

42:08 Whenever there's work to be done that's like CPU intensive, it gives it to the manager.

42:11 But like a good manager, the manager never does any work.

42:14 And that means that the GUI thread always gets all the CPU of its core, so it's never blocked.

42:21 And the manager's given all the work and never does any work.

42:24 And that solved the problem because what the manager does, it uses multiprocessing to hand it off to other processes.

42:30 And if you've got multiple cores, that's no problem.

42:34 I did try it on single core machines, and it was still no problem.

42:37 Right, because you still have the preemptive multithreading that gives you enough time slice that your user feels like your app's working.

42:42 So basically, you've got two threads.

42:45 The GUI thread gets all the CPU for its core.

42:48 And whenever it has work, it gives it to the manager who immediately hands it on to a process in a process pool.

42:56 And that process is separate and goes off and does it and lets you know.

43:00 And, of course, is cancelable if the user wants that.

43:03 Right. Okay. Very, very nice.

43:04 Because it basically, it shares one int.

43:07 And the int is either going to say, like, you're good to go or, like, they don't want you anymore.

43:12 Stop work.

43:14 That model I cover in the book.

43:16 And as I say, it'll work with any, although I show it with Tkinter, it's not Tkinter specific.
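
A GUI-agnostic sketch of the manager idea is shown below. It is a guess at the general shape, not the book's implementation: the class name ManagerThread, the two queues, and the None sentinel are all assumptions, and a concurrent.futures executor stands in for the multiprocessing hand-off.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ManagerThread(threading.Thread):
    """Takes jobs from the GUI and immediately hands them to a pool."""

    def __init__(self, executor, results):
        super().__init__(daemon=True)
        self.executor = executor
        self.results = results          # the GUI polls this queue
        self.jobs = queue.Queue()       # the GUI puts (func, args) here

    def run(self):
        while True:
            job = self.jobs.get()
            if job is None:             # sentinel: shut down
                break
            func, args = job
            future = self.executor.submit(func, *args)
            # The manager does no work itself; the pool reports back.
            future.add_done_callback(
                lambda f: self.results.put(f.result()))
```

With a ProcessPoolExecutor the work would run in other processes, and a Tkinter GUI could poll the results queue from a function rescheduled with the widget's after() method, so the event loop never blocks.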

43:21 Yeah, absolutely.

43:22 I think that's a pretty good summation of the concurrency story.

43:26 The other part of performance that you talked about that I actually don't know very much about and I haven't talked a lot about on my show is using Cython to speed up your code.

43:37 Can you tell everyone what Cython is and give a quick summary there?

43:40 Okay.

43:41 Cython is basically, it's a kind of compiler.

43:47 So, if you have an application written in pure, ordinary Python and you run it through Cython, it will create a C version of your code.

43:58 And my experience is that will basically run twice as fast, just without touching it, without doing anything, just because it's now C.

44:08 But you can then go further.

44:10 You can actually give it hints and say, well, you can give it type hints.

44:15 So, basically, you can say, well, this is an int or this is a string.

44:18 And if you give it hints, it can optimize it better.

44:22 It's also got optimizations for NumPy.

44:25 So, for people who are interested in that kind of processing.

44:28 So, it can produce very fast code for that.

44:31 Yeah, that's interesting.

44:32 And it's, I think, worth pointing out that it's not the same concept as type hints in Python 3.5, which is just more of an IDE tool, right?

44:40 That's right.

44:41 The typing module for 3.5 is, in a sense, it has no functionality at runtime.

44:49 It is purely used for static code analysis to say, you know, whether you're being consistent with your types.

44:56 In other words, you're saying, I'm claiming that this is a list of strings and it will statically analyze your code and say, well, okay, you've only used it as if it were a list of strings.

45:07 So, that's good.
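
The "no functionality at runtime" point can be seen in a few lines. The function shout is a made-up example: the interpreter stores the hints but never checks them, so a call that a static checker such as mypy would flag still runs.

```python
from typing import List


def shout(names: List[str]) -> List[str]:
    """Upper-case every name; the annotations document the intent."""
    return [name.upper() for name in names]


# The interpreter only stores the hints; it never enforces them.
hints = shout.__annotations__

# A tuple is "wrong" according to the hint, yet this still runs.
result = shout(("ann", "bob"))
```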

45:09 But, of course, a compiler could use that type hinting information to produce more optimized code.

45:16 And I expect that's where things will go.

45:18 Yeah, absolutely.

45:19 I'm hoping that Cython will actually adopt that syntax.

45:23 And that, I mean, there are other compilers like Nuitka and so on may adopt that.

45:27 Now that typing is a standard module, one hopes that the third-party compilers will adopt it.

45:34 Yeah, at least there's an option, right?

45:35 Yeah.

45:36 And it would mean consistent code then.

45:37 It would mean you could write your code using typing and know that whichever one of the compilers you chose would give you some kind of speed up.

45:45 Yeah, yeah.

45:46 Beautiful.

45:46 All right, so we don't have a lot of time left in the show, but I wanted to give you a chance to just talk about some of the projects on your website.

45:53 One of the ones that you were working on is something called DiffPDF.

45:56 Another one was the Gravitate Game.

45:58 I thought those were kind of interesting.

45:59 Oh, yeah.

46:00 The game was just a bit of fun.

46:02 I did it in one of my books.

46:04 I think it's actually in Python in Practice because I'd never put a game in a book.

46:07 And I thought, well, why not?

46:08 I wrote it in Tkinter, but I have got Qt versions.

46:12 And on the website, I've got a JavaScript version that I did using the canvas, you know, the HTML5 canvas.

46:19 It's basically SameGame or TileFall, but with the things gravitating to the middle rather than falling to the bottom and left.

46:26 That's it.

46:27 And yeah, you can do fun games with Python, no problem.

46:30 And of course, there is a Pygame library as well for people who are more sort of heavily into games.

46:35 Yeah.

46:35 DiffPDF is paying my salary.

46:39 Basically, it compares PDFs and you might think, well, that's easy.

46:44 You just compare the pixels and it can do that.

46:47 And lots of other tools can do that.

46:48 But what turns out to be quite tricky is comparing the text as text because PDFs are really a graphical file format.

46:58 So a PDF file doesn't actually know what a sentence is or even a word.

47:02 So it can break up text in quite weird ways.

47:05 And DiffPDF gives you a sort of rational comparison.

47:08 Yeah, cool.

47:09 So you can ask questions like, is the essential content changed, not just like something bolded or whatever, right?

47:15 Yeah.

47:16 And I thought it'd be used by publishers.

47:18 I wanted to use it originally to compare.

47:20 If you do a second printing of a book, not a second edition, a second printing, the publisher will let you make minor corrections as long as it doesn't change the pagination.

47:30 And having to like check that I hadn't messed up by looking at 300 or 400 or 500 pages was pretty tiring.

47:40 So that was an incentive for creating this tool.

47:44 You're like, within the time it would take me to do this, I could write an app and solve this.

47:47 Exactly.

47:47 But it turns out it's used by finance companies, insurance companies and banks.

47:53 Ah, the lawyer types, yeah.

47:54 Well, yeah, but why?

47:56 And I don't know because they won't tell me.

47:58 But they use it.

47:59 And as long as they buy, I don't care.

48:03 I mean, that's great.

48:04 That's cool.

48:04 And is that written in Python?

48:05 It is.

48:06 It was originally written in C++, but now it's written in Python.

48:10 It uses concurrency, the model of concurrency I described.

48:12 And it's a Windows-specific product.

48:16 It uses a third-party PDF library that I bought, a royalty-free library.

48:23 And yeah, that's been paying the way.

48:26 But I've come up with another program, one that I originally wanted to write more than 20 years ago.

48:32 But I didn't have the skill then, and the tools weren't available anyway.

48:37 And that's XindeX.

48:38 And it's for book indexes.

48:39 And there are a couple of – well, there are some existing products out there for book indexes.

48:46 But this one uses Python, and it uses SQLite, which I adore as a database.

48:51 I really like it.

48:53 Yeah, I'm a fan of it too.

48:54 Yeah.

48:54 So I get all the reliability and also the convenience.

48:58 I mean, SQLite has full-text search, which I think people from Google added to it.

49:03 And that's just absolutely superb.

49:05 Oh, yeah.

49:05 That's excellent.

49:06 So that application has only gone on sale the end of last month.

49:11 Wow.

49:11 Congratulations on that.

49:12 That's excellent.

49:13 Thank you.

49:13 So I'm really pleased about that.

49:15 And I'm waiting to see – like people can use it for 40 days free trial now.

49:19 So I'll see in a couple of months if people actually buy it.

49:23 Well, good luck on that.

49:25 That's great.

49:25 Yeah.

49:26 So before we end the show, let me ask you a couple of questions that I always ask everyone.

49:30 Yeah.

49:31 So there's close to 80,000 PyPI packages out there.

49:35 And everybody uses some that are super interesting that maybe don't get the rounds that everyone knows about.

49:42 So what are ones that you really like?

49:44 Well, obviously, I use PySide.

49:46 I use the Roman one as well because I use Roman numerals like in the indexing thing.

49:51 Nice.

49:51 Yeah, I know.

49:52 But the – and obviously, I use cx_Freeze.

49:57 And I use PyWin32, which is very useful for Windows.

50:01 And I use – and another Windows one I use is WMI, which is Windows – Windows management infrastructure, I think.

50:07 Thank you very much because I'd forgotten.

50:09 But the one that I want to sort of boost, if you like, is APSW, Another Python SQLite Wrapper.

50:18 And as you know, the Python standard library has a sqlite3 module, which is perfectly good.

50:23 Nothing wrong with that.

50:24 But APSW is absolutely excellent.

50:28 It provides you as a Python programmer with all the access to SQLite that you would get if you were a C programmer, but with all the pleasure of programming in Python.

50:39 Oh, wonderful.

50:40 So you can create your own custom functions in Python that you can feed into it.

50:45 So you can create your own collations.

50:47 You can even create your own virtual tables.

50:50 Everything that you can do in C, you can do in Python.

50:53 And it's just a fantastic library.

50:56 It doesn't follow precisely the DB-API 2 standard.

51:03 It does where it can, but it favors – if there's a choice and like SQLite offers more, it offers you the more.

51:09 Because it's designed to give you everything that SQLite has to offer.

51:13 I mean, if you're wanting to prototype on SQLite for transferring to another database, then use the built-in sqlite3.

51:21 But if you want to use SQLite in its own right, for example, as a file format or for some other purpose where you're only going to be using SQLite, then APSW is the best module I've ever seen for doing that.
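The custom functions and collations mentioned above can be sketched in a few lines. This is a minimal illustration only, and it uses the standard library's sqlite3 module rather than APSW so that it runs without installing anything; APSW exposes the same ideas (plus virtual tables and more) through its own API, whose method names differ. The function `reverse_text`, the collation `by_length`, the `words` table, and the helper `demo` are all hypothetical names made up for this example.

```python
import sqlite3


def reverse_text(value):
    """Custom scalar SQL function: return the text reversed (NULL stays NULL)."""
    return None if value is None else value[::-1]


def by_length(a, b):
    """Custom collation: shorter strings sort first, ties broken alphabetically.

    A collation callable must return a negative, zero, or positive integer.
    """
    length_order = (len(a) > len(b)) - (len(a) < len(b))
    return length_order or (a > b) - (a < b)


def demo():
    """Register the function and collation, then use both from plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("reverse_text", 1, reverse_text)
    conn.create_collation("by_length", by_length)
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?)",
        [("pear",), ("fig",), ("banana",), ("kiwi",)],
    )
    rows = conn.execute(
        "SELECT word, reverse_text(word) FROM words "
        "ORDER BY word COLLATE by_length"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for word, reversed_word in demo():
        print(word, reversed_word)
```

Once registered, the Python callables are used from SQL like any built-in function or collation, which is the point being made about writing SQLite extensions in Python rather than C.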

51:33 Yeah, that's wonderful.

51:35 While we're on database packages, I'll also throw out Records by Kenneth Reitz, which he calls SQL for Humans, which is like the simplest possible sort of alternative to DB-API that I can find.

51:45 It's like the opposite end.

51:47 It's like super simple, not like, you know, we're going to give you access to everything.

51:51 Okay.

51:51 I mean, I like writing raw SQL, so APSW suits me for that as well.

51:56 I mean, I know there are things like SQLAlchemy and, you know, that give you a higher level, but I love APSW.

52:01 Yeah, wonderful.

52:02 Wonderful.

52:03 Okay.

52:03 Last question.

52:04 When you write some Python code, what editor do you use?

52:07 I use gvim.

52:08 gvim.

52:08 Okay.

52:09 Okay.

52:09 So graphical vim.

52:10 Now, I'm not saying that I'd recommend that.

52:12 I think that it could drive, you know, someone insane trying to learn it because it is a strange editor, but I've been using it for more than 20 years now.

52:21 I just, I would find it hard to use another one for daily work.

52:25 All right.

52:25 Excellent.

52:25 Well, thanks for the recommendation.

52:26 Okay.

52:27 And thank you very much for having me on your show.

52:31 Yeah, Mark.

52:32 It's been a great conversation.

52:33 And I'm really happy to shine a light on this whole concurrency story and the GUI story as well in Python because they don't get as much coverage as I think they should.

52:42 No.

52:42 And Python is good at both of those things.

52:45 You know, I don't believe people who say otherwise.

52:48 Python needs, gives you a lot of concurrency options, so you've got more choice and you need to choose with more care.

52:54 But if you choose well, then it will give you great performance.

52:57 Yeah, absolutely.

52:58 All right.

52:59 Well, thanks for being on the show.

52:59 It was great to talk to you.

53:00 Thank you very much.

53:03 This has been another episode of Talk Python To Me.

53:06 Today's guest was Mark Summerfield, and this episode has been sponsored by Hired and SnapCI.

53:11 Thank you both for supporting the show.

53:13 Hired wants to help you find your next big thing.

53:15 Visit hired.com/talkpythontome to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000.

53:24 SnapCI is modern, continuous integration and delivery.

53:28 Build, test, and deploy your code directly from GitHub, all in your browser with debugging,

53:32 Docker and parallels included.

53:34 Try them for free at snap.ci/talkpython.

53:37 Are you or a colleague trying to learn Python?

53:40 Have you tried books and videos that left you bored by just covering topics point by point?

53:44 Well, check out my online course, Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

53:53 You can find the links from today's show at talkpython.fm/episodes/show/58.

53:59 Be sure to subscribe to the show.

54:01 Open your favorite podcatcher and search for Python.

54:03 We should be right at the top.

54:04 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

54:14 Our theme music is Developers, Developers, Developers by Cory Smith, who goes by Smixx.

54:18 You can hear the entire song at talkpython.fm/music.

54:22 This is your host, Michael Kennedy.

54:24 Thanks so much for listening.

54:25 I really appreciate it.

54:26 Smixx, let's get out of here.

54:29 Staying with my voice.

54:31 There's no norm that I can feel within.

54:33 Haven't been sleeping.

54:34 I've been using lots of rest.

54:35 I'll pass the mic back to who rocked it best.

54:38 I'll pass the mic back to who rocked it best.